Explore the Full Program of SIGGRAPH Asia 2025!

Presentation

PriorAvatar: Efficient and Robust Avatar Creation from Monocular Video Using Learned Priors
Description

High-fidelity avatar reconstruction from monocular videos faces significant challenges due to imperfect foreground segmentation and inaccurate body poses. Existing methods typically depend on additional components, such as explicit background modeling, which introduce extra overhead and reduce the flexibility of avatar reconstruction. We argue that these challenges need to be addressed at a more fundamental level. To this end, we propose PriorAvatar, which leverages a learned 3D human prior to guide the reconstruction of 3D avatars without increasing model complexity. At the core of our method is a learned 3D prior consisting of a multi-person feature codebook that stores the 3D shapes and appearances derived from human scans. These latent features are complemented by a shared U-Net decoder that converts them into a set of renderable 3D Gaussians. During reconstruction, the learned 3D prior allows fitting to unseen subjects in monocular videos by fine-tuning the 3D Gaussians with 2D photometric losses. This approach ensures that the reconstruction process effectively exploits the learned latent space while minimizing discrepancies with the 2D observations. In our experiments, we demonstrate the efficiency and robustness of our reconstruction scheme, as evidenced by its state-of-the-art quantitative and qualitative performance without relying on complex regularizers or additional model enhancements. Ablation studies further verify the effectiveness of incorporating a learned human prior for monocular avatar reconstruction.
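The fitting scheme described above (a per-subject latent drawn from a learned codebook, a shared decoder, and fine-tuning against 2D photometric losses) can be illustrated with a heavily simplified toy sketch. Everything below is an assumption for illustration only: the linear `render` function stands in for the paper's U-Net decoder and Gaussian rasterizer, and the dimensions, initialization, and optimizer are invented, not taken from the paper.

```python
import numpy as np

# Toy sketch of the PriorAvatar fitting idea (all names/sizes are
# illustrative assumptions, not the authors' implementation).
rng = np.random.default_rng(0)

# "Codebook": one latent feature vector per scanned subject.
num_subjects, latent_dim, pixels = 8, 16, 32
codebook = rng.normal(size=(num_subjects, latent_dim))

# Shared linear map standing in for the U-Net decoder + Gaussian
# rasterization: latent features -> rendered pixel values.
W = rng.normal(size=(latent_dim, pixels)) * 0.1

def render(z):
    # Stand-in for decoding latents to 3D Gaussians and rasterizing them.
    return z @ W

# Synthetic "monocular observation" of an unseen subject.
target = render(rng.normal(size=latent_dim))

# Initialize from the codebook mean, then fine-tune the latent code by
# gradient descent on a 2D photometric (L2) loss.
z = codebook.mean(axis=0)
initial_loss = float(np.mean((render(z) - target) ** 2))
lr = 0.05
for _ in range(200):
    residual = render(z) - target      # 2D photometric residual
    grad = 2.0 * residual @ W.T        # gradient of the squared error w.r.t. z
    z -= lr * grad

final_loss = float(np.mean((render(z) - target) ** 2))
```

The point of the sketch is only the structure of the optimization: the prior (codebook plus shared decoder) provides the initialization and the parameterization, while the 2D photometric loss drives the per-subject fit, so the loss decreases without any explicit background model.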