BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Hong_Kong
X-LIC-LOCATION:Asia/Hong_Kong
BEGIN:STANDARD
TZOFFSETFROM:+0800
TZOFFSETTO:+0800
TZNAME:HKT
DTSTART:19911015T033000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20251218T030653Z
LOCATION:Meeting Room S423+S424\, Level 4
DTSTART;TZID=Asia/Hong_Kong:20251215T135300
DTEND;TZID=Asia/Hong_Kong:20251215T140400
UID:siggraphasia_SIGGRAPH Asia 2025_sess105_papers_2057@linklings.com
SUMMARY:GenLit: Reformulating Single-Image Relighting as Video Generation
DESCRIPTION:Shrisha Bharadwaj, Haiwen Feng, Giorgio Becherini, Victoria Fe
 rnandez Abrevaya, and Michael J. Black (Max Planck Institute for Intellige
 nt Systems)\n\nManipulating the illumination of a 3D scene within a single
  image represents a fundamental challenge in computer vision and graphics.
  This problem has traditionally been addressed using inverse rendering tec
 hniques, which involve explicit 3D asset reconstruction and costly ray-tr
 acing simulations. Meanwhile, recent advancements in visual foundation mod
 els suggest that a new paradigm could soon be possible -- one that replace
 s explicit physical models with networks that are trained on large amounts
  of image and video data. In this paper, we exploit the implicit scene und
 erstanding of a video diffusion model, particularly Stable Video Diffusion
 , to relight a single image.\nWe introduce GenLit, a framework that disti
 lls the ability of a graphics engine to perform light manipulation into a 
 video-generation model, enabling users to directly insert and manipulate a
  point light in the 3D world within a given image and generate results dir
 ectly as a video sequence.\nWe find that a model fine-tuned on only a sma
 ll synthetic dataset generalizes to real-world scenes, enabling single-ima
 ge relighting with plausible and convincing shadows and inter-reflections.
  Our results highlight the ability of video foundation models to capture r
 ich information about lighting, material, and shape, and our findings indi
 cate that such models, with minimal training, can be used to perform relig
 hting without explicit asset reconstruction or ray-tracing.\n\nRegistratio
 n Category: Full Access, Full Access Supporter\n\nSession Chair: Julien Ph
 ilip (Eyeline)\n\n
END:VEVENT
END:VCALENDAR
