BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Asia/Hong_Kong
X-LIC-LOCATION:Asia/Hong_Kong
BEGIN:STANDARD
TZOFFSETFROM:+0800
TZOFFSETTO:+0800
TZNAME:HKT
DTSTART:19911015T033000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20251218T030657Z
LOCATION:Meeting Room S423+S424\, Level 4
DTSTART;TZID=Asia/Hong_Kong:20251218T090000
DTEND;TZID=Asia/Hong_Kong:20251218T091000
UID:siggraphasia_SIGGRAPH Asia 2025_sess147_papers_1424@linklings.com
SUMMARY:ConsiStyle: Style Diversity in Training-Free Consistent T2I Genera
 tion
DESCRIPTION:Yohai Mazuz (Tel Aviv University); Janna Bruner (Tel Aviv Univ
 ersity, Amazon); and Lior Wolf (Tel Aviv University)\n\nIn text-to-image m
 odels, consistent character generation is the task of achieving text align
 ment while maintaining the subject's appearance across different prompts. 
 However, since style and appearance are often entangled, existing methods
  struggle to preserve consistent subject characteristics while adhering
  to varying style prompts. Current approaches for consistent text-to-image
  generation typically rely on large-scale fine-tuning on curated image set
 s or per-subject optimization, which either fail to generalize across prom
 pts or do not align well with textual descriptions. Meanwhile, training-fr
 ee methods often fail to maintain subject consistency across different sty
 les.\nIn this work, we introduce a training-free method that, for the fir
 st time, jointly achieves style preservation and subject consistency acros
 s varied styles. The attention matrices are manipulated such that Queries 
 and Keys are obtained from the anchor image(s) that are used to define the
  subject, while the Values are imported from a parallel copy that is not s
 ubject-anchored. Additionally, cross-image components are added to the sel
 f-attention mechanism by expanding the Key and Value matrices. To do so wi
 thout shifting from the target style, we align the statistics of the Value
  matrices.\nAs demonstrated in a comprehensive battery of qualitative and
  quantitative experiments, our method effectively decouples style from
  subj
 ect appearance and enables faithful generation of text-aligned images with
  consistent characters across diverse styles.\n\nRegistration Category: Fu
 ll Access, Full Access Supporter\n\nSession Chair: Fan Tang (Institute of 
 Computing Technology, Chinese Academy of Sciences)\n\n
END:VEVENT
END:VCALENDAR
