BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Hong_Kong
X-LIC-LOCATION:Asia/Hong_Kong
BEGIN:STANDARD
TZOFFSETFROM:+0800
TZOFFSETTO:+0800
TZNAME:HKT
DTSTART:19911015T033000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20251218T030656Z
LOCATION:Meeting Room S423+S424\, Level 4
DTSTART;TZID=Asia/Hong_Kong:20251216T131000
DTEND;TZID=Asia/Hong_Kong:20251216T132000
UID:siggraphasia_SIGGRAPH Asia 2025_sess121_papers_1719@linklings.com
SUMMARY:DvD: Unleashing a Generative Paradigm for Document Dewarping via C
 oordinates-based Diffusion Model
DESCRIPTION:Weiguang Zhang, Huangcheng Lu, and Maizhen Ning (Xi’an Jiaoton
 g-Liverpool University, University of Liverpool); Xiaowei Huang (Universit
 y of Liverpool); Wei Wang (Xi’an Jiaotong-Liverpool University); Kaizhu Hu
 ang (Duke Kunshan University); and Qiufeng Wang (Xi’an Jiaotong-Liverpool 
 University)\n\nDocument dewarping aims to rectify deformations in phot
 ographic document images to improve text readability. The task has att
 racted much attention and made great progress, but preserving document s
 tructures remains challenging. Given recent advances in diffusion models
 , it is natural to consider applying them to document dewarping. Howeve
 r, adopting diffusion models for document dewarping is far from straigh
 tforward because of their unfaithful control over highly complex docume
 nt images (e.g., at 2000×3000 resolution).\nIn this paper, we propose Dv
 D, the first generative model to tackle document Dewarping via a Diffusi
 on framework. Specifically, DvD performs coordinate-level denoising inst
 ead of the typical pixel-level denoising, generating a mapping for defor
 mation rectification. In addition, we propose a time-variant condition r
 efinement mechanism to better preserve document structures. In our expe
 riments, we find that current document dewarping benchmarks cannot eval
 uate dewarping models comprehensively. To this end, we present AnyPhoto
 Doc6300, a rigorously designed large-scale document dewarping benchmark c
 omprising 6,300 real image pairs across three distinct domains, enablin
 g fine-grained evaluation of dewarping models. Comprehensive experiment
 s demonstrate that DvD achieves state-of-the-art performance on multipl
 e metrics across various benchmarks, including DocUNet, DIR300, and Any
 PhotoDoc6300, with acceptable computational efficiency. The new benchma
 rk and code will be made publicly available at https://github.com/afdgas
 ggx/DvD.\n\nRegistration Category: Full Access, Full Access Supporter\n\nSe
 ssion Chair: Paul Debevec (Eyeline, USC Insti
 tute for Creative Technologies (ICT))\n\n
END:VEVENT
END:VCALENDAR
