HumanLift: Single-Image 3D Human Reconstruction with 3D-Aware Diffusion Priors and Facial Enhancement

1 Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
3 Hong Kong University of Science and Technology
4 Cardiff University
SIGGRAPH ASIA 2025

Abstract

Creating high-quality, photorealistic 3D digital humans from a single image remains challenging. While existing methods can generate visually appealing multi-view outputs, they often suffer from inconsistencies in viewpoints and camera poses, resulting in suboptimal 3D reconstructions with reduced realism. Furthermore, most approaches focus on body generation while overlooking facial consistency, which is perceptually critical yet hard to achieve because the face occupies only a small area in full-body images (e.g., approximately 80×80 pixels out of a 512×512 image). This limited resolution, together with the low weight given to facial regions during optimization, leads to insufficient facial detail and inconsistent facial identity features across views. To address these challenges, we leverage the powerful capabilities of 2D video diffusion models to generate consistent multi-view RGB and normal images of humans, combined with the 3D SMPL-X model to ensure spatial consistency and geometric detail. By fine-tuning Diffusion Transformer (DiT) models (HumanWan-DiTs) on realistic 3D human datasets using LoRA, we achieve both generalizability and 3D visual consistency in realistic multi-view human image generation. The proposed facial enhancement is integrated into 3D Gaussian optimization to improve facial detail. For further refinement, we apply super-resolution and generative priors to reduce facial blurring, alongside SMPL-X parameter tuning and guidance from the generated multi-view normal images, resulting in photorealistic and consistent rendering from a single image. Extensive experiments demonstrate that our approach outperforms existing methods, producing photorealistic, consistent, and finely detailed human renderings.
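To make the resolution argument concrete, the back-of-the-envelope sketch below works through the 80×80 and 512×512 figures quoted in the abstract. It assumes a square face region and an equal per-pixel error everywhere, both simplifications, and the `face_weight` parameter is a hypothetical illustration of upweighting, not a description of HumanLift's facial enhancement (which, per the abstract, relies on super-resolution and generative priors).

```python
IMAGE_SIZE = 512  # full-body image side length quoted in the abstract
FACE_SIZE = 80    # approximate face side length quoted in the abstract


def face_pixel_fraction(face: int = FACE_SIZE, image: int = IMAGE_SIZE) -> float:
    """Fraction of image pixels covered by a square face region (simplification)."""
    return (face * face) / (image * image)


def face_loss_share(face_weight: float, face: int = FACE_SIZE, image: int = IMAGE_SIZE) -> float:
    """Share of a per-pixel weighted loss contributed by face pixels.

    Hypothetical: assumes every pixel has the same error, so the share depends
    only on pixel counts and the (illustrative) face_weight multiplier.
    """
    n_face = face * face
    n_rest = image * image - n_face
    weighted_face = face_weight * n_face
    return weighted_face / (weighted_face + n_rest)


if __name__ == "__main__":
    # Uniform weighting: the face drives only about 2.44% of the loss.
    print(f"{face_pixel_fraction():.2%}")
    # Upweighting face pixels 10x raises that share to about 20.0%.
    print(f"{face_loss_share(10.0):.1%}")
```

With uniform weighting the face contributes the same 2.44% of the loss as its pixel share, which is why a plain image-space objective tends to underfit facial detail.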
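The abstract names LoRA but does not state the rank, scaling factor, or which DiT layers are adapted. For readers unfamiliar with the technique, here is a minimal plain-Python sketch of the standard LoRA update, W' = W + (alpha / r) · B A, where the base weight W is frozen and only the low-rank factors A and B are trained. The matrix sizes and rank used below are hypothetical and are not HumanWan-DiT settings.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_merged_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Standard LoRA merge: W' = W + (alpha / r) * B @ A.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r); usually zero-initialized,
       so the adapted layer starts out identical to the base layer.
    """
    r = len(a)  # rank = number of rows of A
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning params, LoRA params) for one linear layer."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[0.5, -0.5]]      # rank r = 1
    b = [[1.0], [2.0]]
    print(lora_merged_weight(w, a, b, alpha=1.0))  # [[1.5, -0.5], [1.0, 0.0]]
    # Hypothetical 1024x1024 layer with rank 8: LoRA trains 1/64 of the weights.
    print(lora_param_counts(1024, 1024, 8))        # (1048576, 16384)
```

Because only A and B are trained, the pretrained video-diffusion weights stay intact, which is consistent with the abstract's goal of keeping generalizability while adapting to 3D human data.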

BibTeX

@inproceedings{yang2025humanlift,
  author    = {Yang, Jie and Zhang, Bo-Tao and Liu, Feng-Lin and Fu, Hongbo and Lai, Yu-Kun and Gao, Lin},
  title     = {HumanLift: Single-Image 3D Human Reconstruction with 3D-Aware Diffusion Priors and Facial Enhancement},
  year      = {2025},
  url       = {https://doi.org/10.1145/3757377.3763839},
  doi       = {10.1145/3757377.3763839},
  booktitle = {SIGGRAPH Asia 2025 Conference Papers (SA Conference Papers '25)},
  articleno = {31},
  numpages  = {12},
  series    = {SIGGRAPH ASIA Conference Papers '25}
}