Invited Poster
Poster






Description: This study investigates the human perception of digital doubles, including those representing the self and others (familiar person or stranger), rendered in realistic and stylized manners. Results indicate that participants were most critical of their own avatars, highlighting perceptual challenges and design implications for self-representative virtual humans in social and interactive applications.
Art Papers



Description: As crime becomes increasingly technology-driven and complex, questions of agency and moral responsibility demand renewed scrutiny. Drawing on cultural criminology and art psychology, this project examines how AI-driven systems can operate simultaneously as tools of introspection and mechanisms of control. Informed by case studies and professional interviews, we created the Interrogator — a series of interactive artworks employing affective computing methods such as facial expression recognition and conversational analysis. Across two exhibitions, the artworks immersed visitors in interactive, technology-driven encounters that encouraged self-awareness and reflection on their own actions and agency. This project underscores the potential of interactive art to transform emerging technologies into sites of critical thinking on agency and responsibility.
Technical Papers


Description: We present G² continuous splines formulated through spline blending, which interpolate a given list of control points. These splines attain local maximum curvatures at the control points, possess local support, and do not require global optimization. They ensure the absence of cusps, self-intersections, and, importantly, intersections between adjacent segments, which has not been guaranteed by previous blending curve methods. We propose the use of quadratic Bézier splines, where each spline passes through only one control point, as our interpolation functions, and quartic Bézier splines as our blending functions. Based on these, we propose an algorithm that generates curves without requiring global optimization. Moreover, by simply adjusting the curvature of the interpolation functions, the curves near the control points can exhibit a smoother or sharper appearance, thereby substantially increasing the degree of freedom in curve design. Finally, we exhibit the results and provide proofs for the aforementioned properties.
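As a toy illustration of the blending idea above (not the paper's exact construction; all control points and blend coefficients here are hypothetical), two overlapping quadratic Bézier pieces can be combined with a quartic Bézier blending weight:

```python
import numpy as np

def bezier(ctrl, t):
    """Evaluate a Bézier curve at t in [0, 1] via de Casteljau's algorithm."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def blend_segments(seg_a, seg_b, t, blend_ctrl=(0.0, 0.0, 0.5, 1.0, 1.0)):
    """Blend two quadratic interpolation pieces with a quartic Bézier
    blending function (blend_ctrl is a hypothetical choice of weights)."""
    w = bezier(np.asarray(blend_ctrl)[:, None], t)[0]  # scalar weight in [0, 1]
    return (1 - w) * bezier(seg_a, t) + w * bezier(seg_b, t)

# Two hypothetical quadratic pieces, each passing through one control point.
seg_a = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
seg_b = [(1.0, 2.0), (2.0, 0.0), (3.0, 2.0)]
point = blend_segments(seg_a, seg_b, 0.5)  # a point on the blended curve
```

Because this quartic blend has zero first derivative at both ends, the blended curve matches each quadratic piece's tangent at the junctions; the paper's specific choice of blending functions is what secures full G² continuity.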
Poster






Description: This research introduces an affective in-vehicle companion agent that proactively provides situational guidance and emotional support for novice drivers through real-time emotion recognition, anthropomorphic animation, and voice interaction, demonstrating enhanced user experience and satisfaction in a driving simulator evaluation.
XR






Description: TouchStar is an immersive VR interactive film integrating generative AI for environment, sound, 3D models, and dialogue. Through real-time feedback to user actions and story progression, it builds a personalized narrative system, showcasing generative AI’s creative potential in virtual reality storytelling.
XR






Description: We present a "hands-on" light field display for shared XR to address the isolation of head-mounted displays. Using low-cost, off-the-shelf components, it projects a light field to create vivid 3D scenes with continuous parallax and integrates gesture tracking for intuitive interaction.
Art Papers



Description: This paper introduces (Un)Natural Language, an artistic computational system that examines how words shape ecological narratives, detecting potential ecological threats in government water-related project documents. Weaving together machine learning, environmental activism, and linguistics, the system offers a new analytical lens for interpreting public documents, uncovering hidden narratives of economic expansion and extractivism. Through a custom-labeled dataset and a fine-tuned BERT model, it reveals and visualizes patterns of pro-growth and ecologically detrimental discourse. Developed into an interactive online archive and installation series, the project reclaims computation as a space for reflection rather than control—inviting viewers to rethink the language shaping our ecological futures.
Technical Papers


Description: Steered Mixtures-of-Experts (SMoE) is an existing regression framework that has previously been applied to modeling and compression of 2D images and higher-dimensional imagery, including compression of light fields and light-field video. SMoE models are sparse, edge-aware representations that allow rendering of imagery with few Gaussians at excellent quality.
In this paper, a novel, edge-aware "3D SMoE Splatting" (3DSMoES) framework for 3D rendering is introduced, adapted to fit into the existing "3D Gaussian Splatting" (3DGS) CUDA optimization pipeline. Here, SMoE regression serves as a "plug-and-play" solution that replaces the established 3DGS regression as a novel workhorse.
3DSMoES achieves significant visual-quality gains with drastically fewer Gaussian kernels than 3DGS. We observe improvements of up to approximately 4 dB in PSNR on individual scenes with kernel reductions of 20 to 50 percent. The sparse models are significantly faster to train and allow 30–50 percent faster rendering.
Invited Poster
Poster






Description: StreetUnveiler reconstructs empty 3D streets from a single-traversal in-car video by removing temporarily static objects such as parked vehicles, using hard-label 2D Gaussian Splatting, alpha-guided scene decomposition, and a time-reversal inpainting framework to achieve scalable, temporally consistent, mesh-extractable reconstructions.
Invited Poster
Poster






Description: We show that modern depth estimation systems are strongly deceived by 3D visual illusions, and introduce a large-scale dataset and a vision-language-guided fusion framework that together achieve state-of-the-art robustness against both monocular and binocular illusion-based failures.
Technical Papers


Description: Reconstructing fast-dynamic scenes from multi-view videos is crucial for high-speed motion analysis and realistic 4D reconstruction. However, the majority of 4D capture systems are limited to frame rates below 30 FPS (frames per second), and direct 4D reconstruction of high-speed motion from low-FPS input may lead to undesirable results. In this work, we propose a high-speed 4D capture system using only low-FPS cameras, through novel capture and processing modules. On the capture side, we propose an asynchronous capture scheme that increases the effective frame rate by staggering the start times of cameras. By grouping cameras and leveraging a base frame rate of 25 FPS, our method achieves an equivalent frame rate of 100–200 FPS without requiring specialized high-speed cameras. On the processing side, we also propose a novel generative model to fix artifacts caused by sparse-view 4D reconstruction, since asynchrony reduces the number of viewpoints at each timestamp. Specifically, we train a video-diffusion-based artifact-fixing model for sparse 4D reconstruction, which refines missing details, maintains temporal consistency, and improves overall reconstruction quality. Experimental results demonstrate that our method significantly enhances high-speed 4D reconstruction compared to synchronized capture.
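The staggered-start arithmetic from the abstract (e.g. four groups at a 25 FPS base rate giving 100 FPS effective) can be sketched directly; the trigger mechanism itself is hardware-specific and not shown:

```python
def staggered_timestamps(num_groups, base_fps, num_frames):
    """Capture times (seconds) per camera group when group g starts
    1/(num_groups * base_fps) seconds after group g-1, yielding an
    effective rate of num_groups * base_fps once all groups are merged."""
    period = 1.0 / base_fps
    offset = period / num_groups
    return {g: [g * offset + k * period for k in range(num_frames)]
            for g in range(num_groups)}

ts = staggered_timestamps(num_groups=4, base_fps=25, num_frames=3)
merged = sorted(t for times in ts.values() for t in times)
# Consecutive merged samples are 10 ms apart, i.e. 100 FPS effective.
```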
Poster






Description: We propose a biosensing system that converts material perception into visual and geometric encodings. This system lays the groundwork for future scientific information visualization, connecting human perception and tangible material expression.
Poster






Description: This study presents a tangible card-based toolkit enabling older adults to co-design XR spatial–physical interactions, translating lived experiences into actionable concepts that enhance emotional well-being and long-distance social connectedness.
Technical Papers


Description: We present a compact, learning-based representation that captures the full Monte Carlo sampling distribution of a rendered image. Our approach enables rendering at arbitrary samples per pixel (SPP) during inference without requiring expensive path tracing operations. This is achieved by fitting parametric distributions to per-pixel radiance values, which can be efficiently estimated, stored, and sampled.
Our method proceeds in three stages. First, we map radiance samples into radial log space, which encourages Gaussian-like distributions while preserving angular relationships. Second, we fit each pixel’s distribution using 3D Gaussian Mixture Models (GMMs), trained online with minimal memory overhead, making the approach compatible with standard path tracers. For inference, we introduce an optimized sampling scheme whose complexity is independent of the target SPP, enabling fast synthesis of high-SPP images. Additionally, we demonstrate that the learned representations can be heavily compressed using quantization and codebook techniques with negligible quality loss. Experiments show that GMMs strike an effective balance between expressiveness and sparsity. Compared to alternative models, our method better captures pixel-wise Monte Carlo distributions. Lastly, we illustrate the versatility of our representation with applications such as firefly rejection and ray-distribution-driven denoising.
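The radial log mapping can be sketched as follows; the exact compressive function used in the paper is not specified here, so log1p on the sample norm is an assumed stand-in. A per-pixel 3D GMM would then be fit to the mapped samples:

```python
import numpy as np

def radial_log(x, eps=1e-8):
    """Map radiance samples into radial log space: compress the norm with
    log1p while preserving the direction (assumed form of the mapping)."""
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return x * np.log1p(r) / np.maximum(r, eps)

def radial_exp(y, eps=1e-8):
    """Inverse map from radial log space back to linear radiance."""
    s = np.linalg.norm(y, axis=-1, keepdims=True)
    return y * np.expm1(s) / np.maximum(s, eps)

rng = np.random.default_rng(0)
samples = rng.lognormal(sigma=1.0, size=(256, 3))  # hypothetical per-pixel RGB spp
mapped = radial_log(samples)
# A 3-component 3D GMM fit to `mapped` (e.g. by online EM) would be stored;
# draws from it are pushed through radial_exp to synthesize arbitrary-SPP images.
```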
Emerging Technologies






Description: This demo showcases a portable VR golf training system that uses real-time ball redirection to correct swing path errors. Participants experience different redirection modes that subtly guide motion, demonstrating how immersive feedback can support intuitive, self-guided motor learning in sports and rehabilitation.
Art Gallery






Description: An AI, its consciousness shaped by the land's memory, draws its thoughts in iron sand via computer-controlled magnets. Reacting to viewers and the environment, it generates ephemeral words on the beach, creating a unique dialogue between technology and nature and questioning the boundaries between them.
Poster






Description: This study presents a new example of visual expression on water surfaces, realized by controlling the positions of drifting objects—such as aquatic plants and fallen leaves—through bubble-induced flow. As a proof-of-concept prototype, we built a 30 cm square water tank equipped with sixteen pumps, and successfully demonstrated the display of a single letter using floating plants.
Technical Papers


Description: This paper proposes a fast, efficient, and robust feature-preserving 3D mesh denoising method based on a modified Lengyel-Epstein (LE) model, primarily aiming to ensure volume stability and deliver superior denoising results. Compared with the original model, we mainly introduce a function expression to replace the fixed parameters. The modified model is then discretized using a seven-point difference scheme and solved by an explicit Euler method.
Notably, our approach requires no training samples or upfront training time, significantly enhancing overall computational efficiency.
Technical Communications


Description: We propose a finite-difference curvature regularization for neural signed distance fields, achieving second-order accuracy by replacing Hessian terms with first-order information and matching state-of-the-art quality with reduced memory and runtime.
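The Hessian-free principle can be illustrated with a plain finite-difference sketch: mean curvature equals half the divergence of the unit normal n = ∇f/|∇f|, so it can be assembled from first-order gradient evaluations alone. This illustrates the idea only; the paper's neural discretization and loss are different:

```python
import numpy as np

def grad_fd(f, p, h=1e-4):
    """Central-difference gradient of a scalar field f at point p."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

def mean_curvature_fd(f, p, h=1e-3):
    """Mean curvature via the divergence of the unit normal, using only
    first-order finite differences (no Hessian evaluation)."""
    def normal(q):
        g = grad_fd(f, q)
        return g / np.linalg.norm(g)
    div = 0.0
    for i in range(len(p)):
        e = np.zeros(len(p))
        e[i] = h
        div += (normal(p + e)[i] - normal(p - e)[i]) / (2 * h)
    return 0.5 * div

# Signed distance field of a unit sphere: mean curvature at the surface is 1.
sdf = lambda q: np.linalg.norm(q) - 1.0
kappa = mean_curvature_fd(sdf, np.array([1.0, 0.0, 0.0]))
```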
Poster






Description: We present an AI framework that procedurally generates unique 3D architectural models and uses diffusion models to create photorealistic renderings, enabling rapid, large-scale exploration of novel conceptual designs.
Technical Papers


Description: Unmanned aerial vehicles (UAVs) have demonstrated remarkable efficacy across diverse fields.
Nevertheless, developing flight controllers tailored to a specific UAV design, particularly in environments with strong fluid-interactive dynamics, remains challenging. Conventional controller design experience often falls short in such cases, rendering it infeasible to apply time-tested practices. Consequently, a simulation test bed becomes indispensable for controller design and evaluation prior to actual implementation on the physical UAV. This platform should allow for meticulous adjustment of controllers and should transfer to real-world systems without significant performance degradation. Existing simulators predominantly hinge on empirical models due to their high efficiency, often overlooking the dynamic interplay between the UAV and the surrounding airflow. This makes it difficult to mimic more complex flight maneuvers, such as an abrupt mid-air halt inside narrow channels, in which the UAV may experience strong fluid-structure interactions. On the other hand, simulators that consider the complex surrounding airflow are extremely slow and inadequate to support the design and evaluation of flight controllers. In this paper, we present a novel remedy for highly efficient UAV flight simulation: a hybrid model that combines our novel far-field adaptive block-based fluid simulator with parametric empirical models situated near the boundary of the UAV, with the model parameters automatically calibrated. With this newly devised simulator, a broader spectrum of flight scenarios can be explored for controller design and assessment, encompassing those influenced by potent close-proximity effects, or situations where multiple UAVs operate in close quarters. The practical worth of our simulator has been validated through comparisons with actual UAV flight data.
We further showcase its utility in designing flight controllers for fixed-wing, multi-rotor, and hybrid UAVs, and even exemplify its application when multiple UAVs are involved, underlining the unique value of our system for flight controller development.
Poster






Description: A hybrid RaaS architecture couples ParaView/VTK with XR clients for low-latency, multi-user collaboration; explicit server-driven synchronization and intent-only control sustain interactive rates and coherence on TB–PB datasets across heterogeneous devices.
Technical Papers


Description: State-of-the-art cloth simulations rely on linear triangular elements in mass-spring or continuum-based finite element formulations. These methods typically decompose the surface energy density into in-plane (shearing and stretching) and out-of-plane (bending) components, with bending energies modeled using discrete mean curvature measures. While effective, they are prone to mesh-dependent behavior and locking. Higher-order formulations can mitigate these issues, but their adoption poses significant challenges due to the requirement for continuity of basis functions’ derivatives across element boundaries to accurately represent surface curvature. We introduce a novel continuum-based approach that addresses the limitations of existing methods without requiring globally smooth (H²-continuous) basis functions. Our method uses non-conforming function spaces and weakly enforces the continuity of the tangent basis through carefully derived interface terms. In fact, the proposed method builds on Interior Penalty methods, which we adapt to effectively handle simulations of curved surfaces. Our approach uses standard Lagrangian basis functions, and supports straightforward extension to high-order bases, while adhering to the in-plane/out-of-plane decoupling paradigm widely adopted in cloth simulation. We demonstrate the robustness and versatility of our method through garment simulations, illustrating its ability to handle complex deformations and a variety of bending behaviors with high fidelity.
Poster






Description: Visitors co-author stories in a 270-degree immersive room. Instant choices reshape projections into branched endings. This participant-driven design sparks curiosity and informs a reusable framework for storytelling in cultural exhibitions.
Poster






Description: A practical parallel encoding method combining multi-processing and multi-threading, with size-based file sorting to reduce bottlenecks and improve efficiency in large-scale texture compression.
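The size-based sorting amounts to a longest-job-first dispatch, a standard greedy load-balancing heuristic; the file sizes and worker count below are made up for illustration:

```python
import heapq

def schedule_lpt(sizes, workers):
    """Assign jobs to workers longest-first: sorting by size (descending)
    before dispatch keeps loads balanced and prevents one huge file that
    arrives last from stalling the whole batch."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(workers)}
    for size in sorted(sizes, reverse=True):
        load, w = heapq.heappop(heap)        # least-loaded worker so far
        assignment[w].append(size)
        heapq.heappush(heap, (load + size, w))
    return assignment

# Hypothetical texture file sizes (MB) split across two encoder processes.
jobs = schedule_lpt([90, 10, 10, 10, 50, 40], workers=2)
```

In an actual toolkit each worker would be a process running a multi-threaded encoder; the heuristic only decides the dispatch order.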
Computer Animation Festival






Description: Inspired by a true story, an elderly widow in the midst of World War II struggles to overcome grief and rediscover joy in her life. Day by day, she serves as an air raid warden in the crowded shelters, witnessing the suffering of children and others. One morning, she finds a dying sparrow and hopes to save its fragile life. As the sparrow gradually heals, a bond grows between them, and the bird begins to respond to her piano playing—a shared language that builds a bridge. During bombing raids, she carries the sparrow to the shelters, where she plays the piano, and the sparrow sings its song to comfort the children and offer hope to those around her. Through this newfound purpose and unexpected alliance, her life begins to change.
Technical Papers


Description: Prior research has demonstrated the efficacy of balanced trees as spatially adaptive grids for large-scale simulations. However, state-of-the-art methods for balanced tree construction are restricted by the iterative nature of the ripple effect, thus failing to fully leverage the massive parallelism offered by modern GPU architectures. We propose to reframe the construction of balanced trees as a process of merging N-balanced Minimum Spanning Trees (N-balanced MSTs) generated from a collection of seed points. To ensure optimal performance, we propose a stack-free parallel strategy for constructing all internal nodes of a specified N-balanced MST. This approach leverages two 32-bit integer registers as buffers rather than relying on an integer array as a stack during construction, which helps maintain balanced workloads across different GPU threads. We then propose a dynamic update algorithm utilizing refinement counters for all internal nodes to enable parallel insertion and deletion operations of N-balanced MSTs.
This design achieves significant efficiency improvements compared to full reconstruction from scratch, thereby facilitating fluid simulations with dynamic moving boundaries. Our approach is fully compatible with GPU implementation and demonstrates up to an order-of-magnitude speedup compared to the state-of-the-art method [WANG2024]. The source code for the paper is publicly available at https://github.com/peridyno/peridyno.
Art Papers



Description: This paper presents the robotic installation AB₁₂₃C, an information evolution system composed of five interacting robotic agents. Rather than preserving informational fidelity, the system stages distortion as a structural principle—a generative condition through which information acquires form. As descriptive content circulates across a chain of machines, it undergoes semantic drift, structural fragmentation, and emergent recomposition. These transformations do not reflect malfunction, but express the entangled ontology of mediated systems. By integrating algorithmic indeterminacy, material reconfiguration, and environmental perturbation, the installation visualizes how meaning arises through iterative misalignment. Within this machinic ecology, truth is not preserved—it is enacted, shaped through continuous entanglement, differential deferral, and distributed agency.
DLI Labs
Exhibitor Talk






Description: Please bring your laptop and mouse to participate in this hands-on training. Seats are limited and available on a first-come, first-served basis.
Learn how to accelerate ROS 2 workloads using NVIDIA’s latest GPU-powered libraries for AI and robotics. This hands-on lab will explore how to leverage generative AI and GPU-accelerated ROS 2 packages to enhance robotic workloads in real time. Participants will gain experience with optimizing ROS 2 packages for GPU acceleration, including techniques for improving performance and reducing latency in AI perception, navigation, and more. The session will also demonstrate NVIDIA Isaac's powerful AI and simulation tools, designed for seamless integration with ROS 2.
Birds of a Feather






Description: TBA
Poster






Description: We present a simulation-based design workflow for assistive earwear jewelry. The Green Voice ear cuff demonstrates how material properties shape passive sound amplification, integrating acoustic analysis with aesthetics.
Technical Papers


Description: We present a framework to optimize and generate Acoustic Reliefs: acoustic diffusers that not only perform well acoustically in scattering sound uniformly in all directions, but are also visually interesting and can approximate user-provided images. To this end, we develop a differentiable acoustics simulator based on the boundary element method, and integrate it with a differentiable renderer coupled with a vision model to jointly optimize for acoustics, appearance, and fabrication constraints at the same time. We generate various examples and fabricate two room-scale reliefs. The result is a validated simulation and optimization scheme for generating acoustic reliefs whose appearances can be guided by a provided image.
Computer Animation Festival






Description: Lili and her girlfriend share a tender moment, leaving Lili blushing with a flower in her hair. Later at home, her happy demeanor drops as she hides the flower in a pocket. But it doesn’t go unnoticed by her father. His assumptions pile pressure onto Lili, causing her to hiccup, then vomit small acrobats onto her plate. These little beings run around with joyful chaos, inadvertently disrupting the model town on the table. This ends in a romantic dance between two acrobats, revealing Lili’s secret to her father. At first the father seems compassionate, gently gathering the acrobats into his hands, only to abruptly toss them into a drawer. Before Lili can process what has happened, her grandmother arrives from the kitchen. Upon seeing the acrobats on the table, Granny reacts violently, smashing them with her ladle. Shaken, Lili vomits again, unleashing an entire wave of acrobats which floods the room. The grandmother’s attacks become more frenzied, threatening not only the acrobats, but the miniature world on the table. Terrified, the father tries to protect his model town, cowering in his seat. Lili must then intervene to stop her grandmother’s rampage. She asserts herself, and her acrobats form a circle and together create a big circus tent that fills the space. The spotlights turn on, revealing the inside of the big top, filled with acrobats performing their acts. Lili looks toward her family: her grandmother storms out of the circus, and her father timidly follows. With bittersweet acceptance, Lili lets them go, fully emancipating herself from the pressure she was under. The father finally turns back and admires his daughter.
Technical Papers


Description: 3D Gaussian Splatting (3DGS) has shown impressive results in real-time novel view synthesis. However, it often struggles under sparse-view settings, producing undesirable artifacts such as floaters, inaccurate geometry, and overfitting due to limited observations. We find that a key contributing factor is uncontrolled densification, where adding Gaussian primitives rapidly without guidance can harm geometry and cause artifacts.
We propose AD-GS, a novel alternating densification framework that interleaves high- and low-densification phases. During high densification, the model densifies aggressively, followed by photometric-loss-based training to capture fine-grained scene details. Low densification then primarily involves aggressive opacity pruning of Gaussians, followed by regularizing their geometry through pseudo-view consistency and edge-aware depth smoothness. This alternating approach helps reduce overfitting by carefully controlling model capacity growth while progressively refining the scene representation.
Extensive experiments on challenging datasets demonstrate that AD-GS significantly improves rendering quality and geometric consistency compared to existing methods.
Technical Papers


Description: Monte Carlo methods are a cornerstone of physics-based light transport simulations, valued for their ability to produce high-quality photorealistic images. These stochastic methods often suffer from variance, resulting in undesirable noise in the rendered images. Gradient-domain rendering (GDR) techniques mitigate this problem by estimating unbiased image-space gradients via so-called shift-mapping operators. While these mappings are computationally efficient, they can yield high-variance gradients, and thus poor reconstruction quality, when applied to pixels with wildly different integrals. We tackle this challenge by dynamically selecting the optimal set of neighboring pixels for applying shift mapping under random sequence replay. Key to our approach is a differentiable sorting network that softly ranks the output of a convolutional neural network conditioned on input sample features for weighted reconstruction. This module is carefully rigidified over time to converge to a hard top-k selection, allowing end-to-end optimization with respect to the reconstruction error. Our method is versatile and can be jointly optimized with other adaptive sampling strategies. We demonstrate variance reduction over traditional adaptive gradient-domain methods across scenes of varying radiometric complexity.
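The rigidified soft selection can be caricatured with a far simpler stand-in than the paper's learned sorting network: a sigmoid gate around a soft top-k threshold whose temperature is annealed toward zero:

```python
import numpy as np

def soft_topk_mask(scores, k, temperature):
    """Soft top-k mask: sigmoid gate around the midpoint between the k-th
    and (k+1)-th largest scores. As temperature -> 0 it hardens into a 0/1
    top-k selection. A toy stand-in for a differentiable sorting network."""
    s = np.asarray(scores, dtype=float)
    srt = np.sort(s)[::-1]
    tau = 0.5 * (srt[k - 1] + srt[k])           # soft selection threshold
    z = np.clip((s - tau) / temperature, -60, 60)
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([0.1, 2.0, 0.5, 1.5])          # e.g. per-neighbor network outputs
soft = soft_topk_mask(scores, k=2, temperature=1.0)    # blended weights
hard = soft_topk_mask(scores, k=2, temperature=1e-4)   # hardened 0/1 selection
```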
Exhibitor Talk






DescriptionAdobe Substance 3D is coming to SIGGRAPH Asia!
Join us for an exclusive presentation to discover what’s new in Substance 3D, including live demos of the latest features and an early look at our product roadmap.
You’ll also hear from our guest speakers from More VFX and Rising Sun Pictures, who will share behind-the-scenes insights into their production pipelines and creative workflows powered by Substance 3D.
Bhakar James, VFX Supervisor at Rising Sun Pictures (RSP), will share how an OpenUSD-based pipeline connects Substance 3D Painter, Designer, and Sampler to achieve photoreal VFX with exceptional quality and efficiency. By combining these tools within an open, scalable framework, the studio is building future-ready workflows that enhance material fidelity, streamline collaboration, and push the boundaries of visual realism.
Jerry Qiu, Senior Environment Artist at More VFX, will showcase four film projects that leveraged Substance 3D Painter and Designer to craft film-quality assets and environments. Through behind-the-scenes recordings of his workflow and final render footage, he’ll highlight how the Substance 3D tools bring realism, efficiency, and creative control to the asset creation pipeline.
Don’t miss this chance to connect, learn, and get inspired by how leading studios bring their visions to life with Substance 3D. See you there!
Birds of a Feather






DescriptionOpenHarmony is an open-source project incubated and operated by the OpenAtom Foundation. It aims to create an open-source framework and platform for intelligent operating systems, targeting a fully connected, intelligent world to drive the prosperity of the IoT industry.
This event explores innovation in AI and CG (Computer Graphics) within OpenHarmony, covering a broad range of topics such as graphics rendering, game development, and other related fields. Researchers and industry experts will come together to discuss and share insights on the latest advancements and future directions in AI and CG technologies.
14:00-14:30 - Making advanced rendering pervasive through open standards by Neil Trevett, President of The Khronos Group
14:30-15:00 - Lightweight Supersampling for Mobile Real-Time Rendering by Xiaogang Jin (金小刚), Zhejiang University
15:00-15:30 - AI-Driven HarmonyOS Graphics: Challenges & Opportunities by Qianwen Chao, Huawei
15:30-16:00 - Training and rendering acceleration for 3DGS by Bo Dai (戴勃), The University of Hong Kong
16:00-16:30 - Efficient and controllable video and 3D generation using domain knowledge by Tianfan Xue (薛天帆), Chinese University of Hong Kong
16:30-17:00 - Neural Rendering for Mobile Gaming by Dan bing da (单秉达), The Hong Kong Polytechnic University
Educator's Forum



DescriptionAnatomy outreach is vital for improving public knowledge of human biology and health, helping individuals understand anatomical structures and their significance in daily life. As advanced technologies continue to reshape the education landscape, immersive technologies, particularly Mixed Reality (MR), have demonstrated significant potential in science outreach. However, there remains a lack of comprehensive research examining the effectiveness of these technologies in anatomical education for the public. In this study, we selected three representative outreach approaches (traditional brochures, computer-assisted methods, and MR), developed corresponding systems and experimental setups, and designed comprehensive experiments for thorough analysis and evaluation. The evaluation focuses on four key areas: Learning Outcomes, User Experience, System Usability, and Task Load. Results indicate that the MR-assisted method stands out as the most effective, delivering superior user experiences and significantly higher learning outcomes. This not only provides strong evidence for the effectiveness of MR in anatomy outreach but also highlights the key interactive features that contribute to its impact, underscoring its potential in reshaping public science communication.
DLI Labs
Exhibitor Talk






DescriptionPlease bring your laptop and mouse to participate in this hands-on training. Seats are limited and available on a first-come, first-served basis.
Learn how to reconstruct an outdoor scene to test robots using NVIDIA fVDB framework and NVIDIA Omniverse NuRec rendering in Isaac Sim. This lab will walk through core reconstruction and rendering technologies, with a step-by-step workflow for simulating an entire outdoor environment for testing any robot. Attendees will learn how to position images using structure from motion, train 3D Gaussian Splats, extract 3D mesh, and convert to USD for simulation in Isaac Sim.
Technical Papers


DescriptionRecent advances in image acquisition and scene reconstruction have enabled the generation of high-quality structural urban scene geometry, given sufficient site information. However, current capture techniques often overlook the crucial importance of texture quality, resulting in noticeable visual artifacts in the textured models. In this work, we introduce the urban geometry and texture co-capture problem under limited prior knowledge before a site visit. The only inputs are a 2D building contour map of the target area and a safe flying altitude above the buildings. We propose an innovative aerial path planning framework designed to co-capture images for reconstructing both structured geometry and high-fidelity textures. To evaluate and guide view planning, we introduce a comprehensive texture quality assessment system, including two novel metrics tailored for building facades. Firstly, our method generates high-quality vertical dipping views and horizontal planar views to effectively capture both geometric and textural details. A multi-objective optimization strategy is then proposed to jointly maximize texture fidelity, improve geometric accuracy, and minimize the cost associated with aerial views. Furthermore, we present a sequential path planning algorithm that accounts for texture consistency during image capture. Extensive experiments on large-scale synthetic and real-world urban datasets demonstrate that our approach effectively produces image sets suitable for concurrent geometric and texture reconstruction, enabling the creation of realistic, textured scene proxies at low operational cost.
Poster






DescriptionAeroVis3R introduces geometry-aware vision–language models for UAV landmark videos, uniting 3D reconstruction and GPT reasoning. We also release a DJI Mini 3 Pro landmark dataset with Wikipedia annotations.
Key Event
Real-Time Live!



DescriptionAgentic VJ System transforms live music, camera feeds, and other interactive signals through gamepads, OSC and MIDI into real-time generative visuals. Built on our open-source ComfyUI Web Viewer nodes, it unifies multimodal inputs—audio analysis, image-to-image synthesis, and inference-time hyperparameter control—within a browser-based, low-latency pipeline and a custom built modular hardware. A single performer can improvise with AI in Full Auto, Semi-Auto, or Manual modes. The system synchronizes visual generation, lighting, and sound in a cohesive feedback loop that turns the entire venue into an interactive instrument. Proven through multiple live performances and immersive installations, this open, reproducible toolkit demonstrates a practical path for making generative AI performable on stage. It offers both a new creative vocabulary for real-time audiovisual expression and a reference architecture for researchers exploring human-in-the-loop control, multimodal synchronization, and adaptive AI co-creation in live digital art.
Technical Papers


DescriptionFusing cross-category objects into a single coherent object has gained increasing attention in text-to-image (T2I) generation due to its broad applications in virtual reality, digital media, film, and gaming. However, existing methods often produce biased, visually chaotic, or semantically inconsistent results due to overlapping artifacts and poor integration. Moreover, progress in this field has been limited by the absence of a comprehensive benchmark dataset. To address these problems, we propose Adaptive Group Swapping (AGSwap), a simple yet highly effective approach comprising two key components: (1) Group-wise Embedding Swapping, which fuses semantic attributes from different concepts through feature manipulation, and (2) Adaptive Group Updating, a dynamic optimization mechanism guided by a balance evaluation score to ensure coherent synthesis. Additionally, we introduce Cross-category Object Fusion (COF), a large-scale, hierarchically structured dataset built upon ImageNet-1K and WordNet. COF includes 95 superclasses, each with 10 subclasses, enabling 451,250 unique fusion pairs. Extensive experiments demonstrate that AGSwap outperforms state-of-the-art compositional T2I methods, including GPT-Image-1, with both simple and complex prompts.
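At its simplest, group-wise embedding swapping replaces selected feature groups of one concept's embedding with the corresponding groups of another. The following numpy sketch is purely illustrative (the grouping scheme and names are my assumptions; the actual method operates on text-encoder embeddings with an adaptive, score-guided update):

```python
import numpy as np

def group_swap(emb_a, emb_b, groups, swap_ids):
    # Fuse two concept embeddings by swapping whole feature groups:
    # the selected groups of A are replaced by the same groups of B.
    fused = emb_a.copy()
    for g in swap_ids:
        fused[groups[g]] = emb_b[groups[g]]
    return fused

# Toy 8-dim embeddings split into three hypothetical groups.
emb_a = np.zeros(8)
emb_b = np.ones(8)
groups = {0: [0, 1], 1: [2, 3, 4], 2: [5, 6, 7]}
fused = group_swap(emb_a, emb_b, groups, swap_ids=[1])  # dims 2..4 from B
```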
Educator's Forum



DescriptionThis work reviews advances in AI-enhanced real-time rendering for laparoscopic surgical simulation. High-fidelity visualization of tissues and surgical environments has improved training realism, though challenges remain in real-time shadowing and biomechanical accuracy. Artificial intelligence (AI) complements rendering by enabling adaptive learning, objective performance assessment, and personalized feedback. These integrations enhance surgical safety, training efficiency, and scalability. However, high development costs, limited haptic fidelity, and hardware constraints remain obstacles. Future progress depends on mixed reality integration, patient-specific modeling, and collaborative training platforms. This review contributes by consolidating current progress and identifying future directions toward more effective and accessible surgical simulation systems.
Poster






DescriptionThis paper introduces a hybrid workflow combining ZBrush sculpting with diffusion models, enabling faster stylistic exploration, reproducible co-creation, and AI-assisted character development while preserving creative authorship.
Poster






DescriptionThe method takes a Chinese painting image and a prompt and generates a 2.5D animation that can be viewed from various viewpoints. This approach avoids manual annotation and time-consuming editing.
Art Papers



DescriptionAs industrial robots move into shared human spaces, their opaque decision making threatens safety, trust, and public oversight. This artwork, Airy, asks whether complex multi-agent AI can become intuitively understandable by staging a competition between two reinforcement-trained robot arms that snap a bedsheet skyward. Building on three design principles (competition as a clear metric: “who lifts higher?”; embodied familiarity: audiences recognize fabric snapping; and sensor-to-sense mapping: robot cooperation or rivalry shown through forest and weather projections), the installation gives viewers a visceral way to read machine intent. Observations from five international exhibitions indicate that audiences consistently read the robots’ strategies, conflict, and cooperation in real time, with emotional reactions that mirror the system’s internal state. The project shows how sensory metaphors can turn a black box into a public interface.
Featured Session



DescriptionIn a journey through space, a guiding star connects the crew, pulling them toward a shared yet unknown destination. In the making of Pixar’s Elio, the team always aimed for “a space we’ve never seen before.” Hear from the architects behind this constantly moving, luminescent, organic Communiverse. Learn how translucent, glowing, and dynamic aliens from all over the galaxy arrived at this pan-galactic protopian multi-biome world, and how space physics influenced all cinematic decisions. Space exploration can’t be done with traditional art in a traditional animation pipeline. You need science, math, imagination, and beauty all thrown together to truly allow yourselves to be abducted into a new frontier. This team will share stories of how they created Elio and brought it back to Earth with grit, boldness, and a little of the old college try.
Key Event
Real-Time Live!



DescriptionAll-at-Once transforms the emerging field of volumography—particularly Gaussian Splatting—into a real-time, modular worldbuilding instrument for live audiovisual performance. Instead of rendering static volumetric captures, we treat volumography as a performative medium that can be remixed, manipulated, and composed in real time.
Built upon a custom 4D Gaussian pipeline and interactive RealityFX system, the project enables performers to spatially “play” volumetric worlds as though they were musical instruments. Each musical gesture—beat, note, modulation—triggers a corresponding volumetric transformation: objects rotate, multiply, dissolve, and re-emerge in synchrony with sound. Music becomes an interface for world-making.
The result is a synesthetic experience where remixing music equals remixing reality. A simple bassline may ripple through volumetric space, while a chord shift transforms a photoreal room into a surreal dreamscape. By bridging Gaussian Splatting, procedural modulation, and spatial music mapping, All-at-Once proposes a new grammar for live volumetric performance—one that is poetic, participatory, and technically groundbreaking.
For the SIGGRAPH audience, this demonstration illustrates a conceptual and technical leap: from reconstructing the world to performing it. It showcases the first instance of RealityFX, a real-time volumetric effects layer for Gaussian media, and establishes a new paradigm where AI, graphics, and music converge into living, generative stages of performance.
In essence, All-at-Once invites us to imagine the next generation of concerts, theater, and XR experiences—where sound and vision truly merge into one expressive, real-time volumetric art form.
Art Gallery






DescriptionAlternative Past is an interactive AI installation that reimagines personal memories. Participants share images and stories, and the AI interprets them to generate speculative pasts in Fumage-like visuals. Through poetic distortion and ambiguous reimaginings, this work explores memory’s fluidity, AI co-authorship, and our evolving relationship with machines and personal history.
Computer Animation Festival






DescriptionA group of pigs live peacefully in a monastery. One day, one of them is taken out of the enclosure by a monk and brought to a dark room full of knives and meat, where he understands the fate that awaits him: the monks are going to turn him into charcuterie. Like a hero guided by a divine force, he manages to break free and returns to the enclosure to free his friends as well. Thus begins a great escape through the monastery. Will they make it to freedom?
Poster






DescriptionWe introduce an amplitude-guided phase masking strategy for hologram compression as a pre-processing step for existing image coding. It selectively preserves phase information in high optical intensity regions to improve coding efficiency. The method maintains compatibility with widely used encoders and decoders, supporting holographic applications in bandwidth- or storage-constrained environments.
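The selective phase preservation can be sketched with a toy numpy example. The thresholding rule and names below are my assumptions (the actual pre-processing is tuned to the downstream image codec): phase is kept only where optical intensity is high, and zeroed elsewhere so a standard encoder spends fewer bits on perceptually unimportant regions.

```python
import numpy as np

def mask_phase(amplitude, phase, keep_ratio=0.2):
    # Keep phase only where optical intensity (amplitude^2) is high;
    # elsewhere zero the phase so the codec spends fewer bits there.
    intensity = amplitude ** 2
    thresh = np.quantile(intensity, 1.0 - keep_ratio)
    mask = intensity >= thresh
    return np.where(mask, phase, 0.0), mask

rng = np.random.default_rng(0)
amp = rng.random((64, 64))
ph = rng.uniform(-np.pi, np.pi, size=(64, 64))
masked_phase, mask = mask_phase(amp, ph, keep_ratio=0.2)
```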
Poster






DescriptionAMV3D is a lightweight volumetric format and renderer for medical CT/MRI that packs multi-resolution data into a single file. On PC/XR headsets it loads instantly, reduces memory use, and improves frame times compared with 3D textures and sparse volumes.
Poster






DescriptionWe propose a training-free method for controlling image generation using abstract forms, such as scribbles and sketches, which allows more precise control of shape and color than text prompts.
Technical Papers


DescriptionThis paper presents a novel adjoint solver for differentiable fluid simulation based on bidirectional flow maps. Our key observation is that the forward fluid solver and its corresponding backward, adjoint solver share the same flow map, constructed during the forward simulation. In the forward pass, this map transports fluid impulse variables from the initial frame to the current frame to simulate vortical dynamics. In the backward pass, the same map propagates adjoint variables from the current frame back to the initial frame to compute gradients. This shared long-range map allows the accuracy of gradient computation to benefit directly from improvements in flow map construction. Building on this insight, we introduce a novel adjoint solver that solves the adjoint equations directly on the flow map, enabling long-range and accurate differentiation of incompressible flows without differentiating intermediate numerical steps or storing intermediate variables, as required in conventional adjoint methods. To further improve efficiency, we propose a long-short time-sparse flow map representation for evolving adjoint variables. Our approach has low memory usage, requiring only 6.53GB of data at a resolution of 192^3 while preserving high accuracy in tracking vorticity, enabling new differentiable simulation tasks that require precise identification, prediction, and control of vortex dynamics.
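The shared-map idea can be demonstrated on a toy linear transport problem: the forward solver pulls values along a long-range map, and the adjoint scatters gradients back along the very same map, with no intermediate states stored or differentiated. A minimal numpy sketch (a 1D permutation map stands in for the bidirectional flow map; this is my illustration, not the paper's solver):

```python
import numpy as np

def forward_transport(u0, back_map):
    # Forward pass: each cell pulls its value from the backtracked
    # location given by the long-range map (initial frame -> current).
    return u0[back_map]

def adjoint_transport(grad_uT, back_map, n):
    # Backward pass: the adjoint of the same linear map scatters
    # gradients back along it -- no intermediate steps differentiated.
    g = np.zeros(n)
    np.add.at(g, back_map, grad_uT)
    return g

n = 5
back_map = np.array([2, 0, 4, 1, 3])     # toy permutation "flow map"
u0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # loss L = sum(w * u_T)
uT = forward_transport(u0, back_map)
grad = adjoint_transport(w, back_map, n)  # dL/du0 via the shared map
```

The adjoint gradient matches a finite-difference check exactly, since the transport is linear in the initial state.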
Technical Communications


DescriptionWe propose an element-wise analytical integrator that delivers closed-form buoyancy forces and torques for triangular surface meshes, fully eliminating the dependence on mesh resolution and anisotropy.
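One way such a closed-form, resolution-independent integrator can work: hydrostatic pressure is linear in depth, so its integral over a flat triangle is exactly the centroid pressure times the face area. The numpy sketch below is my own illustration, not the authors' formulation; summing the per-face forces over a closed mesh recovers the Archimedes force rho*g*V for any tessellation.

```python
import numpy as np

def tri_buoyancy_force(v0, v1, v2, rho=1000.0, g=9.81, surface_z=10.0):
    # Hydrostatic pressure p(z) = rho*g*(surface_z - z) is linear in z,
    # so its integral over a flat triangle is exactly p(centroid)*area.
    n = np.cross(v1 - v0, v2 - v0)          # outward, |n| = 2*area
    p = rho * g * (surface_z - (v0[2] + v1[2] + v2[2]) / 3.0)
    return -0.5 * p * n                     # closed-form face force

# Fully submerged unit tetrahedron (volume 1/6), outward-oriented faces.
V = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
total = sum(tri_buoyancy_force(V[a], V[b], V[c]) for a, b, c in faces)
# Net force matches Archimedes (0, 0, rho*g*V), independent of meshing.
```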
Emerging Technologies






DescriptionWe propose an AR exergame based on traditional ski training. It aims to correct posture during turns and enhance user motivation. Our system uses a standalone HMD and supports both real ski slopes and indoor simulators. A preliminary study on real slopes indicates the feasibility and challenges of AR-based ski training.
Technical Papers


DescriptionNovel view synthesis for dynamic scenes is a challenging problem in computer graphics. While recent 3D Gaussian splatting methods have achieved state-of-the-art quality and speed for static scenes, their extension to 4D dynamic scenes remains non-trivial. Existing methods for dynamic scene novel view synthesis either employ time-varying dynamic Gaussians, which often produce artifacts due to MLP limitations, or directly extend Gaussians to 4D, yielding high rendering quality but incurring substantial memory overhead. This paper introduces a novel 4D anchor-based framework that leverages the stronger representational power of 4D Gaussians while addressing their memory inefficiency. Our approach effectively models dynamic scenes by binding Gaussians to anchor points and strategically distributing these anchor locations. Furthermore, we propose a novel dynamic anchor growing strategy to generate additional anchors in dynamic regions requiring reconstruction. Additionally, we design an anchor stabilization strategy to fix the attributes of anchors in static regions during training, thereby preventing anchor redundancy. Extensive experiments across various benchmarks, including N3DV and the Technicolor dataset, demonstrate our method's excellent visual quality.
Technical Papers


DescriptionDespite rapid advancements in video generation models, generating coherent, long-form storytelling videos that span multiple scenes and characters remains challenging. Current methods often rigidly convert pre-generated keyframes into fixed-length clips, resulting in disjointed narratives and pacing issues. Furthermore, the inherent instability of video generation models means that even a single low-quality clip can significantly degrade the entire output animation's logical coherence and visual continuity. To overcome these obstacles, we introduce AniMaker, a multi-agent framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection, thus creating globally consistent and story-coherent animation solely from text input. The framework is structured around specialized agents, including the Director Agent for storyboard generation, the Photography Agent for video clip generation, the Reviewer Agent for evaluation, and the Post-Production Agent for editing and voiceover, collectively realizing multi-character, multi-scene animation. Central to AniMaker's approach are two key technical components: MCTS-Gen in Photography Agent, an efficient Monte Carlo Tree Search (MCTS)-inspired strategy that intelligently navigates the candidate space to generate high-potential clips while optimizing resource usage; and AniEval in Reviewer Agent, the first framework specifically designed for multi-shot animation evaluation, which assesses critical aspects such as story-level consistency, action completion, and animation-specific features by considering each clip in the context of its preceding and succeeding clips. Experiments demonstrate that AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework, while significantly improving the efficiency of multi-candidate generation, pushing AI-generated storytelling animation closer to production standards.
Technical Papers


DescriptionWe present AnimaX, a feed-forward 3D animation framework that bridges the motion priors of video diffusion models with the controllable structure of skeleton-based animation. Traditional motion synthesis methods are either restricted to fixed skeletal topologies or require costly optimization in high-dimensional deformation spaces. In contrast, AnimaX effectively transfers video-based motion knowledge to the 3D domain, supporting diverse articulated meshes with arbitrary skeletons. Our method represents 3D motion as multi-view, multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on template renderings and a textual motion prompt. We introduce shared positional encodings and modality-aware embeddings to ensure spatial-temporal alignment between video and pose sequences, effectively transferring video priors to motion generation task. The resulting multi-view pose sequences are triangulated into 3D joint positions and converted into mesh animation via inverse kinematics. Trained on a newly curated dataset of 160,000 rigged sequences, AnimaX achieves state-of-the-art results on VBench in generalization, motion fidelity, and efficiency, offering a scalable solution for category-agnostic 3D animation.
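The final triangulation step (lifting multi-view 2D joint positions to 3D) is a standard operation; a minimal DLT (direct linear transform) sketch with two toy pinhole cameras is shown below. The camera setup is hypothetical and the code is my illustration, not the paper's pipeline:

```python
import numpy as np

def triangulate(proj_mats, points2d):
    # Linear (DLT) triangulation: each view contributes two rows from
    # x*(P[2]@X) - P[0]@X = 0 and y*(P[2]@X) - P[1]@X = 0; the 3D point
    # is the null vector of the stack (last right-singular vector).
    A = []
    for P, (x, y) in zip(proj_mats, points2d):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    X = np.linalg.svd(np.asarray(A))[2][-1]
    return X[:3] / X[3]

def project(P, X):
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

# Two toy pinhole cameras: one at the origin, one at (5, 0, 0) looking back.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
R = np.array([[0., 0., 1.], [0., 1., 0.], [-1., 0., 0.]])
P2 = np.hstack([R, R @ -np.array([[5.], [0.], [0.]])])
X_true = np.array([0.5, 0.2, 4.0])
X_rec = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With exact observations the recovered point matches the ground truth; in practice the per-view pose maps are noisy, and inverse kinematics then fits the skeleton to the triangulated joints.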
Poster






DescriptionWe present a guideline-based canonicalization of anime-style faces using sparse feature lines, enabling pose-invariant alignment, expression clustering, and transfer for stylized facial analysis and design support.
Poster






DescriptionAniME is a multi-agent system for automatic long-form anime production. The director agent keeps a global memory, and coordinates downstream specialized agents which adaptively select tool models for diverse sub-tasks.
Technical Papers


DescriptionWe present Animus3D, a text-driven 3D animation framework that generates a motion field given a static 3D asset and a text prompt. Previous methods mostly leverage the vanilla Score Distillation Sampling (SDS) objective to distill motion from pretrained text-to-video diffusion, leading to animations with minimal movement or noticeable jitter. To address this, our approach introduces a novel SDS alternative, Motion Score Distillation (MSD). Specifically, we introduce a LoRA-enhanced video diffusion model that defines a static source distribution rather than pure noise as in SDS, while an inversion-based noise estimation technique ensures appearance preservation when guiding motion. To further improve motion fidelity, we incorporate explicit temporal and spatial regularization terms that mitigate geometric distortions across time and space. Additionally, we propose a motion refinement module to upscale the temporal resolution and enhance fine-grained details, overcoming the fixed-resolution constraints of the underlying video model. Extensive experiments demonstrate that Animus3D successfully animates static 3D assets from diverse text prompts, generating significantly more substantial and detailed motion than state-of-the-art baselines while maintaining high visual integrity.
Technical Papers


DescriptionHigh-quality Physically-Based Rendering (PBR) materials are crucial for visual realism in 3D asset creation, yet existing methods primarily target static objects, leading to challenges in maintaining multi-frame consistency for animatable entities. To tackle this issue, we introduce AniTex, the first generative pipeline that utilizes diffusion models to synthesize high-quality PBR materials for animatable objects based on text prompts. The pipeline consists of three key stages: First, sequences of RGB images are generated using a video diffusion model conditioned on depth, normals, irradiance, and motion vectors to ensure temporal coherence and geometric alignment across multiple frames and viewpoints. Second, these RGB image sequences are decomposed into per-view, per-frame PBR material maps (albedo, roughness, metallic) by a specialized Intrinsic Diffusion Model (IDM), which is conditioned on the RGB images along with consistent geometry and lighting cues to disentangle material from illumination. Finally, these per-view, per-frame PBR maps are hierarchically blended. This process first ensures temporal coherence within each view's frame sequence, then amalgamates these into globally consistent PBR materials for the animatable object, maintaining overall temporal coherence and visual consistency throughout its animation. Extensive experiments show that AniTex produces more realistic PBR materials for both static and animated objects, outperforming baseline methods in visual appeal.
Computer Animation Festival






DescriptionA young woman decides to face her anxieties, returning to places occupied by the shadows of the past. Driven by tempestuous winds, she defies a wild sea, crosses sharp rocks like blades, to return to the origin.
Art Papers



DescriptionThis paper examines Mercurius - the Case of Hedy Lamarr (2025), which employs Visible Light Communication to reinterpret telecommunication technology as a medium for creative expression. This art installation, which was presented in the solo exhibition Any Girl Can Be Glamorous, centers on the life of Hedy Lamarr, Hollywood actress and co-inventor of frequency-hopping technology. Frequency hopping shifts radio frequencies to prevent jamming and interception. Using the framework of ‘Haunted Media,’ this piece brings to light the hidden history of women in science and technology. It transforms archival audio into a series of flickering light signals, speaking the language of signal, secret, and transmission found in Lamarr’s work. The audience, now an active receiver, encounters a layered historical narrative that interweaves wireless communication technology with Lamarr’s story, whose long-overlooked contributions form the very foundation of this media art installation. By analyzing its conceptual framework, technical implementation, and audience experience, I demonstrate how Visible Light Communication can function as research-based art.
Technical Papers


DescriptionWe introduce AnySplat, a feed‑forward network for novel‑view synthesis from uncalibrated image collections. In contrast to traditional neural‑rendering pipelines that demand known camera poses and per‑scene optimization, or recent feed‑forward methods that buckle under the computational weight of dense views, our model predicts everything in one shot. A single forward pass yields (1) a set of 3D Gaussian primitives encoding both scene geometry and appearance, and (2) the corresponding camera intrinsics and extrinsics for each input image. This unified design scales effortlessly to casually captured, multi‑view datasets without any pose annotations. In extensive zero‑shot evaluations, AnySplat matches the quality of pose‑aware baselines in both sparse‑ and dense‑view scenarios while surpassing existing pose‑free approaches. Moreover, it greatly reduces rendering latency compared to optimization‑based neural fields, bringing real‑time novel‑view synthesis within reach for unconstrained capture settings. Project page: https://city-super.github.io/anysplat/.
DLI Labs
Exhibitor Talk






DescriptionPlease bring your laptop and mouse to participate in this hands-on training. Seats are limited and available on a first-come, first-served basis.
Learn how to build custom AI weather forecasting pipelines using state-of-the-art machine learning models. This lab will walk through core AI weather prediction technologies, with a step-by-step workflow for running forecasts tailored to specific needs. Attendees will learn how to execute AI weather models, validate forecast outputs, and apply super-resolution for fine-grained predictions. These methods offer broad applicability across industries, including supply chain management, energy, agriculture, transportation, and emergency response.
Poster






DescriptionWe propose a bidirectional AR-supported online piano lesson system where teachers and students share hand movements in real time. Our preliminary prototype demonstrates potential for remote education and interactive learning.
Technical Communications


DescriptionWe propose a few-shot graph RL system for robotic brick assembly, addressing CAD/CAM challenges in customized designs by combining a Denoising Attention STGNN, Siamese Neural Networks, and DDPG to ensure precision and stability.
Art Gallery






DescriptionOur remembrances of the past are fraught with interpretations, inaccuracies, and misperceptions. How may the future interpret us in the present if we fail to represent the past? This work creates an archive in the form of an immersive VRChat space and robotic movements that speak a gestural language of the future.
Technical Papers


DescriptionWe introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance. Our detailizer training utilizes a pretrained multi-view image diffusion model, with text conditioning, to distill the foundational knowledge therein into our detailizer via Score Distillation Sampling (SDS). To improve SDS and enable our detailizer architecture to learn generalizable features over complex structures, we train our model in two training stages to generate shapes with increasing structural complexity. Through extensive experiments, we show that our method generates shapes of superior quality and details compared to existing text-to-3D models under varied structure control. Our detailizer can refine a coarse shape in less than a second, making it possible to interactively author and adjust 3D shapes. Furthermore, the user-imposed structure control can lead to creative, and hence out-of-distribution, 3D asset generations that are beyond the current capabilities of leading text-to-3D generative models. We demonstrate an interactive 3D modeling workflow our method enables, and its strong generalizability over styles, structures, and object categories.
Technical Papers


DescriptionSpeech-driven 3D facial animation aims to generate realistic lip movements and facial expressions for 3D head models from arbitrary audio clips.
Although existing diffusion-based methods are capable of producing natural motions, their slow generation speed limits their application potential.
In this paper, we introduce a novel autoregressive model that achieves real-time generation of highly synchronized lip movements and realistic head poses and eye blinks by learning a mapping from speech to a multi-scale motion code.
Furthermore, our model can adapt to unseen speaking styles, enabling the creation of 3D talking avatars with unique personal styles beyond the identities seen during training.
Extensive evaluations and user studies demonstrate that our method outperforms existing approaches in lip synchronization accuracy and perceived quality.
Technical Papers


DescriptionHolographic near-eye displays promise unparalleled depth cues, high-resolution imagery, and realistic three-dimensional parallax at a compact form factor, making them promising candidates for emerging augmented and virtual reality systems. However, existing holographic display methods often assume ideal viewing conditions and overlook real-world factors such as eye floaters and eyelashes—obstructions that can severely degrade perceived image quality.
In this work, we propose a new metric that quantifies hologram resilience to artifacts and apply it to computer generated holography (CGH) optimization. We call this Artifact Resilient Holography (ARH). We begin by introducing a simulation method that models the effects of pre- and post-pupil obstructions on holographic displays. Our analysis reveals that eyebox regions dominated by low frequencies—produced especially by the smooth-phase holograms broadly adopted in recent holography work—are vulnerable to visual degradation from dynamic obstructions such as floaters and eyelashes. In contrast, random phase holograms spread energy more uniformly across the eyebox spectrum, enabling them to diffract around obstructions without producing prominent artifacts.
By characterizing a random phase eyebox using the Rayleigh Distribution, we derive a differentiable metric in the eyebox domain. We then apply this metric to train a real-time neural network–based phase generator, enabling it to produce artifact-resilient 3D holograms that preserve visual fidelity across a range of practical viewing conditions—enhancing both robustness and user interactivity.
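The Rayleigh characterization above can be illustrated with a toy goodness-of-fit score. The sketch below is not the paper's differentiable eyebox-domain metric; it is a simple Kolmogorov-Smirnov-style distance (the function name and thresholds are hypothetical) that scores how Rayleigh-like a set of eyebox amplitude samples is, with lower values indicating the uniformly spread energy associated with random-phase holograms.

```python
import numpy as np

def rayleigh_fit_distance(amplitudes):
    """KS-style distance between the empirical distribution of eyebox
    amplitude samples and its maximum-likelihood Rayleigh fit.  Lower
    values mean the energy is spread Rayleigh-like across the eyebox,
    as with random-phase holograms."""
    a = np.sort(np.asarray(amplitudes, dtype=np.float64).ravel())
    sigma2 = np.mean(a ** 2) / 2.0                  # ML Rayleigh scale^2
    model_cdf = 1.0 - np.exp(-(a ** 2) / (2.0 * sigma2))
    ecdf = np.arange(1, a.size + 1) / a.size        # empirical CDF
    return float(np.max(np.abs(ecdf - model_cdf)))
```

Amplitudes drawn from a Rayleigh distribution score near zero, while a near-constant amplitude field (the low-frequency, smooth-phase case) scores much higher.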
Technical Papers


DescriptionWe propose ArtiLatent, a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance. Our approach jointly models part geometry and articulation dynamics by embedding sparse voxel representations and associated articulation properties—including joint type, axis, origin, range, and part category—into a unified latent space via a variational autoencoder. A latent diffusion model is then trained over this space to enable diverse yet physically plausible sampling.
To reconstruct photorealistic 3D shapes, we introduce an articulation-aware Gaussian decoder that accounts for articulation-dependent visibility changes (e.g., revealing the interior of a drawer when opened). By conditioning appearance decoding on articulation state, our method assigns plausible texture features to regions that are typically occluded in static poses, significantly improving visual realism across articulation configurations.
Extensive experiments on furniture-like objects from PartNet-Mobility and ACD datasets demonstrate that ArtiLatent outperforms existing approaches in geometric consistency and appearance fidelity. Our framework provides a scalable solution for articulated 3D object synthesis and manipulation.
Technical Communications


DescriptionAnimHost is an open-source tool for real-time AI-driven character animation, integrating with DCC tools like Blender. Built on the TRACER framework, it supports custom motion models and node-based processing.
Birds of a Feather






DescriptionThis session explores how Korean artists and art companies are integrating artistic practice with interactive technologies such as computer graphics, virtual reality, and immersive media.
The session begins with a keynote on recent Art x Tech trends in Korea, followed by five Korean art companies - GiiOii, 6Dophammin, VERSDAY, TOPS Studio - sharing insights from their experimental projects.
Participants are encouraged to contribute their own experiences and perspectives, creating an open space for dialogue, collaboration, and exchange. Together, we aim to envision the future of tech-driven creativity and explore new possibilities at the intersection of art and technology.
Technical Papers


DescriptionWe introduce ASIA (Adaptive 3D Segmentation using few Image Annotations), a novel framework that enables segmentation of possibly non-semantic and non-text-describable “parts” in 3D. Our segmentation is controllable through a few user-annotated images, which are easier to gather than multi-view images, less demanding to annotate than 3D models, and more precise than potentially ambiguous text descriptions. Our method leverages the rich priors of text-to-image diffusion models, such as Stable Diffusion, to transfer segmentations from image space to 3D, even when the annotated and target objects differ significantly in geometry or structure. To ensure cross-view consistency and precision, we incorporate edge-guided ControlNet conditioning, fine-tune with LoRA, and introduce a novel cross-attention consistency loss. Final segmentations are fused via a UV map projection with a voting mechanism and refined through per-view noise optimization. ASIA provides a practical and generalizable solution for both semantic and non-semantic 3D segmentation tasks, outperforming existing methods by a noticeable margin in both quantitative and qualitative evaluations, e.g., 8.7% higher on average mIoU over PartNet-Ensembled dataset.
Technical Papers


DescriptionWe present Assembler, a scalable and generalizable framework for 3D part assembly that reconstructs complete objects from input part meshes and a reference image. Unlike prior approaches that mostly rely on deterministic part pose prediction and category-specific training, Assembler is designed to handle diverse, in-the-wild objects with varying part counts, geometries, and structures. It addresses the core challenges of scaling to general 3D part assembly through innovations in task formulation, representation, and data. First, Assembler casts part assembly as a generative problem and employs diffusion models to sample plausible configurations, effectively capturing ambiguities arising from symmetry, repeated parts, and multiple valid assemblies. Second, we introduce a novel shape-centric representation based on sparse anchor point clouds, enabling scalable generation in Euclidean space and avoiding the limitations of abstract SE(3) pose prediction. Third, we construct a large-scale dataset of over 320K diverse part-object assemblies using a synthesis and filtering pipeline built on existing 3D shape repositories. Assembler achieves state-of-the-art performance on PartNet and is the first to demonstrate high-quality assembly for complex, real-world objects. Based on Assembler, we further introduce an interesting part-aware 3D modeling system that generates high-resolution, editable objects from images, demonstrating potential for interactive and compositional design.
Technical Papers


DescriptionWe present an audio-driven real-time system for animating photorealistic 3D facial avatars with minimal latency, designed for social interactions in virtual reality for anyone. Central to our approach is an encoder model that transforms audio signals into latent facial expression sequences in real time, which are then decoded as photorealistic 3D facial avatars. Leveraging the generative capabilities of diffusion models, we capture the rich spectrum of facial expressions necessary for natural communication while achieving real-time performance (<15ms GPU time). Our novel architecture minimizes latency through two key innovations: an online transformer that eliminates dependency on future inputs and a distillation pipeline that accelerates iterative denoising into a single step. We further address critical design challenges in live scenarios for processing continuous audio signals frame-by-frame while maintaining consistent animation quality. The versatility of our framework extends to multimodal applications, including semantic modalities such as emotion conditions and multimodal sensors with head-mounted eye cameras on VR headsets. Experimental results demonstrate significant improvements in facial animation accuracy over existing offline state-of-the-art baselines, achieving 100 to 1000 times faster inference speed. We validate our approach through live VR demonstrations and across various scenarios such as multilingual speeches.
Technical Papers


DescriptionWe introduce the first method for audio-driven universal photorealistic avatar synthesis, combining a person-agnostic speech model with our novel Universal Head Avatar Prior (UHAP). UHAP is trained on cross-identity multi-view videos. In particular, our UHAP is supervised with neutral scan data, enabling it to capture identity-specific details at high fidelity. In contrast to previous approaches, which predominantly map audio features to geometric deformations only while ignoring audio-dependent appearance variations, our universal speech model directly maps raw audio inputs into the UHAP latent expression space. This expression space inherently encodes both geometric and appearance variations. For efficient personalization to new subjects, we employ a monocular encoder, which enables lightweight regression of dynamic expression variations across video frames. By accounting for these expression-dependent changes, it enables the subsequent model fine-tuning stage to focus exclusively on capturing the subject's global appearance and geometry. Decoding these audio-driven expression codes via UHAP generates highly realistic avatars with precise lip synchronization and nuanced expressive details, such as eyebrow movement, gaze shifts, and realistic mouth interior appearance as well as motion. Extensive evaluations demonstrate that our method is not only the first generalizable audio-driven avatar model that can account for detailed appearance modeling and rendering, but it also outperforms competing (geometry-only) methods across metrics measuring lip-sync accuracy, quantitative image quality, and perceptual realism.
Poster






DescriptionAural Fields: A Novel Real-Time Spatial Audio System Using Grid-Based Probes and Raytraced Occlusion-Aware Attenuation.
Games






DescriptionOMSI 2 Add-On: Project Numazu is the first officially published Japanese city expansion for the renowned bus simulator OMSI 2. Developed by Aurora Studio, a Hong Kong–based indie team, the project recreates the coastal city of Numazu in Shizuoka Prefecture with meticulous attention to detail. It introduces right-hand traffic, Japanese road rules, and faithfully modeled buses inspired by real Isuzu and Mitsubishi vehicles, bringing new mechanics and cultural authenticity to the simulator.
Players can experience two major routes, N13 and N45, covering more than 15 miles of urban streets, coastal roads, and regional landmarks. The journey includes Numazu Station, Honmachi, the bustling fish market, and scenic coastal drives toward Uchiura. The environment is populated with AI traffic from neighboring cities, realistic taxis, postal vans, and emergency vehicles, creating a living Japanese streetscape.
What sets Project Numazu apart is its cultural layer. Numazu is known globally through Japanese anime, and the add-on integrates subtle references to local high school idol groups, weaving anime-inspired charm into an otherwise authentic city. This balance of realism and cultural storytelling demonstrates how simulation games can capture both technical accuracy and local identity.
With fewer than ten members, Aurora Studio worked with international partners and secured publishing support from Aerosoft GmbH. Project Numazu stands as both a technical achievement and a cultural experiment, offering global players a chance to experience Japanese daily life through the lens of interactive simulation.
Technical Papers


DescriptionHair cards remain a widely used representation for hair modeling in real-time applications, offering a practical trade-off between visual fidelity, memory usage, and performance. However, generating high-quality hair card models remains a challenging and labor-intensive task. This work presents an automated pipeline for converting strand-based hair models into hair card models with a limited number of cards and textures while preserving the hairstyle appearance. Our key idea is a novel differentiable representation where each strand is encoded as a projected 2D spline in the texture space, which enables efficient optimization with differentiable rendering and structured results respecting the hair geometry. Our method is evaluated on a wide range of hairstyles, including straight, wavy, curly, and coily hairs.
Technical Papers


DescriptionThe boundary representation (B-Rep) is the standard data structure used in Computer-Aided Design (CAD) for defining solid models. Despite recent progress, directly generating B-Reps end-to-end with precise geometry and watertight topology remains a challenge. This paper presents AutoBrep, a novel Transformer model that autoregressively generates B-Reps with high quality and validity. AutoBrep employs a unified tokenization scheme that encodes both geometric and topological characteristics of a B-Rep model as a sequence of discrete tokens. Geometric primitives (i.e., surfaces and curves) are encoded as latent geometry tokens, and their structural relationships are defined as special topological reference tokens. Sequence order in AutoBrep naturally follows a breadth-first traversal of the B-Rep face adjacency graph. At inference time, neighboring faces and edges along with their topological structure are progressively generated. Extensive experiments demonstrate the advantages of our unified representation when coupled with next-token prediction for B-Rep generation. AutoBrep outperforms baselines with better quality and watertightness. It is also highly scalable to complex solids with good fidelity and inference speed. We further show that autocompleting B-Reps is natively supported through our unified tokenization, enabling user-controllable CAD generation with minimal changes. Code is available at https://github.com/AutodeskAILab/AutoBrep.
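The breadth-first serialization order described above can be sketched in a few lines. This illustrates only the traversal (the adjacency dictionary and function name are hypothetical); the geometry and topology tokens actually emitted per face are omitted.

```python
from collections import deque

def face_token_order(adjacency, start=0):
    """Breadth-first traversal of a B-Rep face adjacency graph, giving
    the order in which faces would be serialized to tokens.
    adjacency: dict mapping face_id -> list of neighboring face_ids."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        face = queue.popleft()
        order.append(face)
        for n in sorted(adjacency.get(face, [])):  # deterministic tie-break
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return order
```

For a four-face graph where face 0 touches 1 and 2, and face 3 touches 1 and 2, the order starting at face 0 is [0, 1, 2, 3]: each face's unvisited neighbors are emitted before any deeper faces.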
Technical Papers


DescriptionWe introduce a method that automatically and jointly updates both continuous and discrete parameters of a compound lens design, to improve its performance in terms of sharpness, speed, or both. Previous methods for compound lens design use gradient-based optimization to update continuous parameters (e.g., curvature of individual lens elements) of a given lens topology, requiring extensive expert intervention to realize topology changes. By contrast, our method can additionally optimize discrete parameters such as number and type (e.g., singlet or doublet) of lens elements. Our method achieves this capability by combining gradient-based optimization with a tailored Markov chain Monte Carlo sampling algorithm, using transdimensional mutation and paraxial projection operations for efficient global exploration. We show experimentally on a variety of lens design tasks that our method effectively explores an expanded design space of compound lenses, producing better designs than previous methods and pushing the envelope of speed-sharpness tradeoffs achievable by automated lens design.
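As a rough illustration of combining gradient-based refinement of continuous parameters with Metropolis-style transdimensional moves, the toy optimizer below maximizes a user-supplied merit function over a variable-length parameter list. This is not the paper's algorithm (which uses tailored transdimensional mutation and paraxial projection operations); every name and constant here is hypothetical.

```python
import math
import random

def optimize(merit, init, iters=500, step=0.05, seed=0):
    """Toy hybrid search: finite-difference gradient ascent on the
    continuous parameters, interleaved with Metropolis-style
    transdimensional moves that add or remove a parameter.
    `merit` maps a list of floats to a score to be maximized."""
    rng = random.Random(seed)
    x = list(init)
    best = x[:]
    h = 1e-4
    for _ in range(iters):
        # Continuous refinement: one finite-difference gradient step.
        grad = []
        for i in range(len(x)):
            xp = x[:]
            xp[i] += h
            grad.append((merit(xp) - merit(x)) / h)
        x = [xi + step * gi for xi, gi in zip(x, grad)]
        if merit(x) > merit(best):
            best = x[:]
        # Transdimensional move: propose adding or removing an element,
        # accepted with the usual Metropolis rule on the merit change.
        y = x[:]
        if rng.random() < 0.5 and len(y) > 1:
            y.pop(rng.randrange(len(y)))
        else:
            y.append(rng.gauss(0.0, 1.0))
        delta = merit(y) - merit(x)
        if delta >= 0 or rng.random() < math.exp(delta):
            x = y
        if merit(x) > merit(best):
            best = x[:]
    return best
```

The gradient step handles the continuous "curvature-like" parameters, while the accept/reject move explores designs with different element counts, mirroring the division of labor described above.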
Technical Communications


DescriptionWe propose a workflow that automates lighting in virtual production stages using spherical mapping and image processing techniques. It enables dynamic lighting that matches the virtual backdrop.
Technical Papers


DescriptionWe present a novel method to differentiate integrals of discontinuous functions, which are common in inverse graphics, computer vision, and machine learning applications. Previous methods either require specialized routines to sample the discontinuous boundaries of predetermined primitives, or use reparameterization techniques that suffer from high variance. In contrast, our method handles general discontinuous functions, expressed as shader programs, without requiring manually specified boundary sampling routines. We achieve this through a program transformation that converts discontinuous functions into piecewise constant ones, enabling efficient boundary sampling through a novel segment snapping technique, and accurate derivatives at the boundary by simply comparing values on both sides of the discontinuity. Our method handles both explicit boundaries (polygons, ellipses, Bezier curves) and implicit ones (neural networks, noise-based functions, swept surfaces). We demonstrate that our system supports a wide range of applications, including painterly rendering, raster image fitting, constructive solid geometry, swept surfaces, mosaicing, and ray marching.
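The "compare values on both sides" rule has a simple one-dimensional analogue. For f(x) = a if x < θ else b on [0, 1], the integral is aθ + b(1 − θ), so its derivative with respect to θ is (a − b): the value just inside the boundary minus the value just outside, times the boundary velocity (here 1). The snippet below (hypothetical names, not the paper's shader-program system) checks this against a finite difference of the integral.

```python
def integral(theta, a=2.0, b=0.5, n=100_000):
    """Riemann estimate of the integral over [0, 1] of the
    discontinuous integrand f(x) = a if x < theta else b."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += (a if x < theta else b) / n
    return total

def boundary_derivative(a=2.0, b=0.5):
    """d(integral)/d(theta) via the boundary rule: (inside value minus
    outside value) times the boundary velocity, which is 1 here."""
    return (a - b) * 1.0
```

Note that differentiating the integrand pointwise would give zero everywhere; the entire derivative comes from the moving discontinuity, which is exactly what boundary sampling captures.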
Technical Papers


DescriptionWe propose a transformer architecture and training strategy for tree generation. The architecture processes data at multiple resolutions and has an hourglass shape, with middle layers processing fewer tokens than outer layers. Similar to convolutional networks, we introduce longer-range skip connections to complement this multi-resolution approach. The key advantage of this architecture is its faster processing speed and lower memory consumption. We are therefore able to process more complex trees than would be possible with a vanilla transformer architecture. Furthermore, we extend this approach to perform image-to-tree and point-cloud-to-tree conditional generation and to simulate tree growth processes, generating 4D trees. Empirical results validate our approach in terms of speed, memory consumption, and generation quality.
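The hourglass idea, fewer tokens in the middle plus long-range skips, can be sketched at the shape level. The snippet below is hypothetical (the real model's attention layers are replaced by identity) and shows only the token-count bookkeeping: downsample by average pooling, process the shorter sequence, upsample, and fuse with a skip connection.

```python
import numpy as np

def hourglass(tokens, pool=2):
    """Shape-level sketch of one hourglass pass over a token sequence.
    Transformer layers are replaced by identity; only the downsample /
    process / upsample / skip structure is shown."""
    n, d = tokens.shape
    assert n % pool == 0, "sequence length must be divisible by the pool size"
    skip = tokens                                          # long-range skip
    mid = tokens.reshape(n // pool, pool, d).mean(axis=1)  # n // pool tokens
    # ... middle layers would process `mid` here (identity in this sketch) ...
    up = np.repeat(mid, pool, axis=0)                      # back to n tokens
    return up + skip                                       # fuse with the skip
```

Because the middle layers see only n // pool tokens, their attention cost drops accordingly, which is the source of the speed and memory savings the abstract describes.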
Technical Papers


DescriptionSketches are an important medium of expression, and many recent works concentrate on automatic sketch creation. One ability that is very useful for amateurs is text-based completion of a partial sketch to create a complex scene while preserving the style of the partial sketch. Existing methods focus solely on generating sketches that match the content of the input prompt in a predefined style, ignoring the styles of the input partial sketches, e.g., the global abstraction level and local stroke styles. To address this challenge, we introduce AutoSketch, a style-aware vector sketch completion method that accommodates diverse sketch styles and supports iterative sketch completion. AutoSketch completes the input sketch in a style-consistent manner using a two-stage method. In the first stage, we initially optimize the strokes to match an input prompt augmented by style descriptions extracted from a vision-language model (VLM). Such style descriptions lead to non-photorealistic guidance images, which enable more content to be depicted through new strokes. In the second stage, we utilize the VLM to adjust the strokes from the previous stage to adhere to the style present in the input partial sketch through an iterative style adjustment process. In each iteration, the VLM identifies a list of style differences between the input sketch and the strokes generated in the previous stage, translating these differences into adjustment codes to modify the strokes. We compare our method with existing methods using various sketch styles and prompts, perform extensive ablation studies and qualitative and quantitative evaluations, and demonstrate that AutoSketch can support diverse sketching scenarios.
Technical Papers


DescriptionThis paper proposes a novel framework for personalized content-style fusion generation by training content and style in separate parameter spaces of low-rank adaptations for pre-trained text-to-image models. We introduce “partly learnable projection” (PLP) matrices and a “break-for-make” pipeline, achieving superior content-style alignment compared to state-of-the-art methods.
Poster






DescriptionImmersive VR with coordinated bimanual gesture interaction revives traditional bamboo weaving, blending cultural heritage with engaging, hands-on experiences to promote, preserve, and creatively reinterpret intangible traditions for wider audiences.
Computer Animation Festival






DescriptionBeautify is a short animated film exploring the pressures of beauty standards and the journey toward self-acceptance. Told through expressive animation and bold visual design, the story unfolds without much dialogue, making its message accessible across cultures.
Created as Elizaveta Makarenko’s thesis project at Ringling College of Art and Design, the film blends artistic vision with technical innovation. Stylized rendering, nuanced blendshape-driven performances, and a carefully crafted color palette reinforce the emotional depth of the narrative.
For the SIGGRAPH Asia community, Beautify highlights how student-driven productions can merge technology and storytelling to create meaningful, globally resonant work. It invites viewers to reflect on identity, transformation, and the power of animation to shape cultural conversations.
Art Papers



DescriptionFeltSight is a mixed reality haptic art experience that reimagines human perception by drawing inspiration from the tactile navigation of the star-nosed mole. Moving beyond traditional, vision-dominated interaction paradigms, FeltSight enables users to engage in meditative wandering guided by extended-range haptics with subtle visual cues. The system comprises a wearable haptic glove paired with an extended reality interface. As users reach toward objects in their environment, the glove’s vibration actuators—driven by audio-responsive patterns—simulate material textures, enabling the sensation of touch. Meanwhile, the mixed reality interface offers a deliberately “reduced reality,” presenting nearby objects as dynamic point clouds that only materialize in response to exploratory hand gestures. By shifting perceptual focus from the visual to the tactile, FeltSight challenges ocularcentric sensory hierarchies and foregrounds an embodied, relational, and more-than-human mode of sensing. Grounded in more-than-human design frameworks, this work embodies the principles of tentacular thinking—inviting users to experience the world through intertwined human and nonhuman perceptual modalities.
Invited Poster
Poster






DescriptionDeriving vertex position and irradiance bounds for each triangle tuple based on specular polynomials and Bernstein bounds for rational functions, reducing the search domain for specular light transport.
Poster






DescriptionThis installation visualizes parent-child emotional boundaries by transforming NFC-embedded tokens into generative canopy forms, merging symbolic visualization, computational aesthetics, and cultural reflection to explore relational dynamics in childhood development.
Educator's Forum



DescriptionGenerative AI is reshaping computer graphics and interactive techniques, sparking a “Generative Renaissance” that transforms how students create, learn, and collaborate. While these tools expand access to powerful creative capabilities, they also raise critical questions for educators: How can learning remain meaningful when a prompt can generate polished code, images, or text? How can curricula shift from performance-based incentives to cultivating deep, resilient, intrinsic motivation?
This panel brings together five distinguished speakers from academia, industry, and policy to share innovative research outcomes, methods, and developments in the education of computer graphics and interactive techniques. Panelists will explore human–AI synergy in learning, highlighting projects that integrate generative AI into multimedia quality assessment, 3D reconstruction, visual computing, language technologies, and interdisciplinary curricula. They will present strategies for project-based education that combines computational methods, aesthetic exploration, and experiential learning—including industry collaborations, fieldwork at innovation hubs, and community-engaged initiatives.
Together, the panel will address key questions:
1. How can assignments reward inquiry and creativity over AI-generated polish?
2. What “power skills” such as ethical reasoning, critical inquiry, and collaboration emerge from human–AI co-creation?
3. What frameworks ensure equitable, ethical, and globally relevant adoption of generative tools in education?
With moderators bridging academia and industry, this panel directly supports the Educator’s Forum mission to advance innovative pedagogy in computer graphics and interactive techniques. Attendees will gain globally informed strategies to inspire intrinsically motivated learners—students who are not merely AI users, but discerning, creative collaborators in the Generative Renaissance.
Technical Communications


DescriptionWe present a biologically driven framework for coral growth reconstruction that compresses coral geometry into skeletal graphs and simulates polyp-level growth, producing temporally coherent, plausible reconstructions, bridging reconstruction and simulation.
Invited Poster
Poster






DescriptionLatentDEM addresses the challenge of blind inverse problems by employing latent diffusion priors within an iterative Expectation-Maximization framework to jointly estimate signals and unknown forward operators, particularly for pose-free sparse-view 3D reconstruction. Additionally, we introduce VLM3D, which leverages Vision-Language Models as semantic critics to correct fine-grained details missed by generative priors.
Technical Papers


DescriptionAs user expectations for image editing continue to rise, the demand for flexible, fine-grained manipulation of specific visual elements presents a challenge for current diffusion-based methods.
In this work, we present BlobCtrl, a framework for element-level image editing based on a probabilistic blob-based representation. Treating blobs as visual primitives, BlobCtrl disentangles layout from appearance, affording fine-grained, controllable manipulation of object-level elements.
Our key contributions are twofold: 1) an in-context dual-branch diffusion model that separates foreground and background processing, incorporating blob representations to explicitly decouple layout and appearance; and 2) a self-supervised disentangle-then-reconstruct training paradigm with an identity-preserving loss function, along with tailored strategies to efficiently leverage blob-image pairs.
To foster further research, we introduce BlobData for large-scale training, and BlobBench, a benchmark for systematic evaluation. Experimental results demonstrate that BlobCtrl achieves state-of-the-art performance in a variety of element-level editing tasks—such as object addition, removal, scaling, and replacement—while maintaining computational efficiency.
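As a rough illustration of a blob as a visual primitive, the sketch below rasterizes a single elliptical Gaussian blob from layout parameters (center, per-axis scales, rotation) into an opacity map. This parameterization is hypothetical, chosen for clarity, and is not necessarily BlobCtrl's actual representation.

```python
import numpy as np

def blob_opacity(h, w, cx, cy, sx, sy, theta):
    """Rasterize one elliptical Gaussian blob to an (h, w) opacity map.

    (cx, cy): center, (sx, sy): axis scales, theta: rotation angle --
    a hypothetical blob parameterization for illustration only.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = xs - cx, ys - cy
    c, s = np.cos(theta), np.sin(theta)
    u = (c * x + s * y) / sx           # rotate into the blob's frame
    v = (-s * x + c * y) / sy
    return np.exp(-0.5 * (u**2 + v**2))  # peak opacity 1 at the center

alpha = blob_opacity(64, 64, cx=40, cy=20, sx=10, sy=4, theta=0.5)
```

Editing the layout parameters moves or rescales the blob without touching whatever appearance features are attached to it, which is the intuition behind decoupling layout from appearance.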
XR






Description"Body Oracle Translator" is a handheld mixed reality mask that real-time translates human body postures into speculative hieroglyphs called "Body Oracle," which are Chinese Oracle-Bone Inscription-inspired, AI-generated hieroglyphic characters based on bodily and inter-bodily spatial positioning, fostering cross-cultural collective bodily awareness in an era dominated by verbal communication.
Technical Papers


DescriptionRecent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts. However, while traditional photography offers precise control over camera settings to shape visual aesthetics, such as depth of field via aperture, current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations, providing diverse scenes and subjects as well as supervision to learn the separation of image content from lens blur. Central to our framework is a grounded self-attention mechanism trained on image pairs with different bokeh levels of the same scene, enabling blur strength to be adjusted in both directions while preserving the underlying scene structure. Extensive experiments demonstrate that our approach enables flexible, lens-like blur control, supports downstream applications such as real image editing via inversion, and generalizes effectively across both Stable Diffusion and FLUX architectures.
Invited Poster
Poster






DescriptionWe propose a novel algorithm for efficient and accurate Boolean operations on B-Rep models by mapping them bijectively to controllable-error triangle meshes. Conservative intersection detection on the mesh locates all surface intersection curves, and careful handling of degeneracies and topology errors ensures that the results are watertight and correct.
Technical Communications


DescriptionWe present an inpainting pipeline that creates a failsafe for the wave-function-collapse algorithm by integrating boundary-based diffusion. Our method allows for convergence from unsatisfiable constraints caused by incomplete tilesets.
DLI Labs
Exhibitor Talk






DLI Labs
Exhibitor Talk






DLI Labs
Exhibitor Talk






Technical Papers


DescriptionBoundary representation (B-rep) is the de facto standard for CAD model representation in modern industrial design. The intricate coupling between geometric and topological elements in B-rep structures has forced existing generative methods to rely on cascaded multi-stage networks, resulting in error accumulation and computational inefficiency. We present BrepGPT, a single-stage autoregressive framework for B-rep generation. Our key innovation lies in the Voronoi Half-Patch (VHP) representation, which decomposes B-reps into unified local units by assigning geometry to nearest half-edges and sampling their next pointers. Unlike hierarchical representations that require multiple distinct encodings for different structural levels, our VHP representation facilitates unifying geometric attributes and topological relations in a single, coherent format. We further leverage dual VQ-VAEs to encode both vertex topology and Voronoi Half-Patches into vertex-based tokens, achieving a more compact sequential encoding. A decoder-only Transformer is then trained to autoregressively predict these tokens, which are subsequently mapped to vertex-based features and decoded into complete B-rep models. Experiments demonstrate that BrepGPT achieves state-of-the-art performance in unconditional B-rep generation. The framework also exhibits versatility in various applications, including conditional generation from category labels, point clouds, text descriptions, and images, as well as B-rep autocompletion and interpolation.
Emerging Technologies






DescriptionThis study presents BrickDisplay, the first system to render large, protruding 3D images across multiple non-aligned heterogeneous monitors. It introduces an illusion of perceptual transparency, seamlessly interpolating 3D images beyond gaps and bezels. Additionally, its ubiquitous design accommodates any mobile device, broadening usage scenarios in personal, educational, and industrial contexts.
XR






DescriptionA platform that bridges generations by enabling users to collaboratively reconstruct personal memories through geospatial mapping, immersive scenes, and soundscapes—fostering intergenerational storytelling, emotional engagement, and digital co-creation across time and space.
Poster






DescriptionThis study bridges visual rhetoric and computer vision, presenting the first large-scale comparative analysis of Japanese and original-language film posters, revealing cultural differences in design through quantitative image analysis.
Technical Papers


DescriptionTo solve the optimal transport problem between two uniform discrete measures of the same size, one seeks a bijective assignment that minimizes some matching cost. For this task, exact algorithms are intractable for large problems, while approximate ones may lose the bijectivity of the assignment. We address this issue and the more general cases of non-uniform discrete measures with different total masses, where partial transport may be desirable. The core of our algorithm is a variant of the Quicksort algorithm that provides an efficient strategy to randomly explore many relevant and easy-to-compute couplings, by matching BSP trees in loglinear time. The couplings we obtain are as sparse as possible, in the sense that they provide bijections, injective partial matchings or sparse couplings depending on the nature of the matched measures. To improve the transport cost, we propose efficient strategies to merge k sparse couplings into a higher-quality one. For k = 64, we obtain transport plans with typically less than 1% relative error in a matter of seconds between hundreds of thousands of points in 3D on the CPU. We demonstrate how these high-quality approximations can drastically speed up usual pipelines involving optimal transport, such as shape interpolation, intrinsic manifold sampling, color transfer, topological data analysis, rigid partial registration of point clouds and image stippling.
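The BSP-style matching idea can be illustrated in miniature: recursively split both point sets along a random axis at the median and recurse on the two halves, which yields a bijection in loglinear time. This toy sketch is only the core splitting step and omits the randomized exploration of many couplings and the k-way merge of the actual method.

```python
import numpy as np

def bsp_match(a, b, rng):
    """Return index pairs forming a bijection between equal-size point
    sets a and b, by recursively median-splitting both along a random
    axis -- a simplified sketch, not the authors' full algorithm."""
    def rec(ia, ib):
        if len(ia) == 1:
            return [(ia[0], ib[0])]
        d = rng.integers(a.shape[1])                  # random split axis
        ia = ia[np.argsort(a[ia, d], kind="stable")]  # sort both sides
        ib = ib[np.argsort(b[ib, d], kind="stable")]
        m = len(ia) // 2                              # median split
        return rec(ia[:m], ib[:m]) + rec(ia[m:], ib[m:])
    return rec(np.arange(len(a)), np.arange(len(b)))

rng = np.random.default_rng(0)
a, b = rng.random((256, 3)), rng.random((256, 3))
pairs = bsp_match(a, b, rng)
```

Each recursive call matches equal-size subsets, so every source index is paired with exactly one target index; running the procedure with different random axes produces different couplings that can then be merged to lower the transport cost.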
XR






DescriptionWe present Bulb-in-Hand Initiation, a multisensory experience where shaking a handheld bulb changes its color, which is then seemingly copied into the user’s head through synchronized tapping and sound. The illusion evokes a vivid sense of light and tone entering and altering the internal space of the head.
Poster






DescriptionCalligraphy Echo is a MIDI-enabled brush that translates real-time calligraphy dynamics into traditional instrument sounds, making the art's essence audible.
Technical Papers


DescriptionCamera control is crucial for generating expressive and cinematic videos. Existing methods rely on explicit sequences of camera parameters as control conditions, which can be cumbersome for users to construct, particularly for intricate camera movements. To provide a more intuitive camera control method, we propose CamCloneMaster, a framework that enables users to replicate camera movements from reference videos without requiring camera parameters or test-time fine-tuning. CamCloneMaster seamlessly supports reference-based camera control for both Image-to-Video and Video-to-Video tasks within a unified framework. Furthermore, we present the Camera Clone Dataset, a large-scale synthetic dataset designed for camera clone learning, encompassing diverse scenes, subjects, and camera movements. Extensive experiments and user studies demonstrate that CamCloneMaster outperforms existing methods in terms of both camera controllability and visual quality.
Technical Communications


DescriptionCamera3DMM is a novel method for 3D head reconstruction that incorporates full-perspective projection. Unlike prior works relying on weak-perspective, our approach achieves superior reconstruction, demonstrating the value of perspective-aware modeling.
Technical Papers


DescriptionAccurate measurement of images produced by electronic displays is critical for the evaluation of both traditional and computational displays. Traditional display measurement methods based on sparse radiometric sampling and model fitting are inadequate for capturing spatially varying display artifacts, as they miss high-frequency and pixel-level distortions. While cameras offer sufficient spatial resolution, they introduce optical, sampling, and photometric distortions. Furthermore, the physical measurement must be combined with a model of the visual system to assess whether the distortions will be visible. To enable perceptual assessment of displays, we propose a combination of a camera-based reconstruction pipeline with a visual difference predictor, which accounts for both the inaccuracy of camera measurements and visual difference prediction. The reconstruction pipeline combines HDR image stacking, MTF inversion, vignetting correction, geometric undistortion, homography transformation, and color correction, enabling cameras to function as precise display measurement instruments. By incorporating a Visual Difference Predictor (VDP), our system models the visibility of various stimuli under different viewing conditions for the human visual system. We validate the proposed CameraVDP framework through three applications: defective pixel detection, color fringing awareness, and display non-uniformity evaluation. Our uncertainty analysis framework enables the estimation of the theoretical upper bound for defective pixel detection performance and provides confidence intervals for VDP quality scores. Our code is available at https://github.com/gfxdisp/CameraVDP.
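One stage of such a reconstruction pipeline, HDR image stacking, can be sketched generically: weight each exposure by how well-exposed each pixel is, divide out the exposure time, and average. This is a textbook formulation with an assumed triangle weighting, not necessarily the paper's exact implementation.

```python
import numpy as np

def hdr_stack(exposures, times):
    """Merge a bracketed exposure stack into a relative radiance map.

    exposures: list of linear float images in [0, 1]; times: exposure
    times. Uses a simple triangle weighting that trusts mid-range
    pixels -- a generic HDR-stacking sketch, not CameraVDP's pipeline.
    """
    num = np.zeros_like(exposures[0])
    den = np.zeros_like(exposures[0])
    for img, t in zip(exposures, times):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # low weight near 0 and 1
        num += w * img / t                  # divide out exposure time
        den += w
    return num / np.maximum(den, 1e-8)
```

For unclipped linear pixels the per-exposure estimates img/t all agree with the scene radiance, so the weighted average recovers it while down-weighting clipped or noisy extremes.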
Technical Papers


DescriptionRecently, camera-controlled video generation has seen rapid development, offering more precise control over video generation. However, existing methods predominantly focus on camera control in perspective projection video generation, while geometrically consistent panoramic video generation remains challenging. This limitation is primarily due to the inherent complexities in panoramic pose representation and spherical projection. To address this issue, we propose CamPVG, the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Specifically, we propose a panoramic Plücker embedding that encodes camera extrinsic parameters through spherical coordinate transformation. This pose encoder effectively captures panoramic geometry, overcoming the limitations of traditional methods when applied to equirectangular projections. Additionally, we introduce a spherical epipolar module that enforces geometric constraints through adaptive attention masking along epipolar lines. This module enables fine-grained cross-view feature aggregation, substantially enhancing the quality and consistency of generated panoramic videos. Extensive experiments demonstrate that our method generates high-quality panoramic videos consistent with camera trajectories, far surpassing existing methods in panoramic video generation.
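The spherical-coordinate idea behind such a panoramic pose encoding can be sketched as follows: map an equirectangular pixel to a unit ray direction on the sphere and pair it with its Plücker moment about the origin. This is a generic construction for illustration, not the paper's exact embedding.

```python
import numpy as np

def panoramic_plucker(u, v, w, h, cam_center):
    """Plücker ray for an equirectangular pixel (u, v) on a w x h panorama.

    Maps the pixel to (longitude, latitude), builds the unit direction on
    the sphere, and pairs it with its moment -- a generic spherical-
    coordinate sketch of the idea, not the paper's exact encoder.
    """
    lon = (u / w) * 2.0 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / h) * np.pi        # latitude in [pi/2, -pi/2]
    d = np.array([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)])  # unit ray direction
    m = np.cross(cam_center, d)                # Plücker moment
    return np.concatenate([d, m])              # 6D per-pixel embedding

e = panoramic_plucker(512, 256, 1024, 512, np.array([0.1, 0.0, 0.2]))
```

The moment is always orthogonal to the direction, and together the pair identifies the ray through the camera center for that pixel, giving every equirectangular pixel a pose-aware feature.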
Poster






DescriptionCamSketch learns structured camera motion codes from concert footage and enables users to sketch and refine camera paths via an intuitive, scene-aligned interface for immersive 3D trajectories in virtual concerts.
Technical Papers


DescriptionThis paper presents a method for computing interleaved additive and subtractive manufacturing operations to fabricate models of arbitrary shapes. We formulate the manufacturing planning problem by constructing a sequence of inverse operations that progressively transform a target model into a null shape. Each inverse operation corresponds to either an additive or subtractive step, ensuring both manufacturability and structural stability of intermediate shapes throughout the process. We theoretically prove that any model can be fabricated exactly using a sequence generated by our approach. To demonstrate the effectiveness of this method, we adopt a voxel-based implementation and develop a scalable algorithm designed to reduce overall manufacturing time. Our approach has been tested across a range of digital models and further validated through physical fabrication on a hybrid manufacturing system with automatic tool switching.
Poster






DescriptionDiscover how Vision-Language Models and humans see differently when matching cartoon avatars to real people — revealing the gaps between machine and human vision.
Technical Papers


DescriptionArtist-drawn sketches only loosely conform to analytical models of perspective projection; the deviation of human-drawn perspective from analytical perspective models is persistent and well documented, but has yet to be algorithmically replicated. We encode this deviation between human and analytic perspectives as a continuous function in 3D space and develop a method to learn it. We seek deviation functions that (i) mimic artist deviation on our training data; (ii) generalize to other shapes; (iii) are consistent across different views of the same shape; and (iv) produce outputs that appear human-drawn. The natural data for learning this deviation is pairs of artist sketches of 3D shapes and best-matching analytical camera views of the same shapes. However, a core challenge in learning perspective deviation is the heterogeneity of human drawing choices, combined with relative data paucity (the datasets we rely on have only a few dozen training pairs). We sidestep this challenge by learning perspective deviation from an individual pair of an artist sketch of a 3D shape and the contours of the same shape rendered from a best-matching analytical camera view. We first match contours of the depicted shape to artist strokes, then learn a spatially continuous local perspective deviation function that modifies the camera perspective projecting the contours to their corresponding strokes. This function retains key geometric properties that artists strive to preserve when depicting 3D content, thus satisfying (i) and (iv) above. We generalize our method to alternative shapes and views (ii, iii) via a self-augmentation approach that algorithmically generates training data for nearby views, and enforces spatial smoothness and consistency across all views.
We compare our results to potential alternatives, demonstrating the superiority of the proposed approach. Code and models will be released upon acceptance.
Poster






DescriptionWe propose a size-aware virtual try-on approach that simulates the degree of tight or loose draping on specific body sizes directly in 2D, enabling efficient interactive applications. Our per-garment method follows a capture-and-synthesis pipeline: we first capture garment appearances from video under various poses and body sizes; we then train a generative neural network to synthesize garment images for a given body size. Experiments demonstrate that our method efficiently and realistically recovers garment appearance while generalizing to unseen body shapes.
Computer Animation Festival






DescriptionElise, nine years old, has to move out of her suburban childhood home. To make matters worse, as she enters her new apartment with her orange pet cat Roger in her arms, he turns into a goldfish. Horrified by the change and the lack of understanding from her parents, she refuses to accept her cat’s transformation. Elise returns to her old house along with her goldfish, thinking it can reverse the spell. She thrusts the bowl repeatedly through the door frame, to no avail. When she steps inside, she is faced with the vast and silent emptiness of the house. She goes into her old bedroom, puts the fishbowl on a dresser, and breaks down in tears when she sees her and Roger's old drawings on the wall in his cat form. While she pleads for her cat to return, Roger meets her in a ghostly cat form. When Elise tries to hug him, he runs out of the room. Elise tries to follow him but, in her hurry, she stumbles and catches the dresser to regain her balance. The fishbowl dangerously tips over. Elise hesitates but, seeing her cat disappear by the stairs, she runs out of the room, letting the fishbowl fall to the floor. Downstairs, she is confronted with ghost versions of her family and herself in nostalgic scenes of her daily life in this house. However, this dream soon turns into a nightmare; the ghosts start acting aggressively towards Elise and forcing her to stay with them. They chase her and Elise falls into a deep void, trying to escape. She notices Roger’s limp fish body by her feet surrounded by the glass shards of the fishbowl and breaks down out of sadness.
XR






DescriptionCelestial Blossom is a collaborative, meditative VR experience where users create generative fractal art through hand-drawn sketches. Each stroke transforms into evolving patterns and produces harmonious sounds, forming an immersive audiovisual symphony. With multiplayer collaboration, it enables shared creativity, mindfulness, and reflection in a tranquil, cosmic-inspired virtual space.
Technical Papers


DescriptionHumans possess the ability to master a wide range of motor skills, which they use to adapt quickly and flexibly to the surrounding environment. Despite recent progress in replicating such versatile human motor skills, existing research often oversimplifies or inadequately captures the complex interplay between human body movements and highly dynamic environments, such as interactions with fluids. In this paper, we present a world model for Character-Fluid Coupling (CFC) for simulating human-fluid interactions via two-way coupling. We introduce a two-level world model which consists of a Physics-Informed Neural Network (PINN)-based model for fluid dynamics and a character world model capturing body dynamics under various external forces. This two-level world model adeptly predicts the dynamics of fluid and its influence on rigid bodies via force prediction, sidestepping the computational burden of fluid simulation and providing policy gradients for efficient policy training. Once trained, our system can control characters to complete high-level tasks while adaptively responding to environmental changes. We also show that the fluid induces emergent behaviors of the characters, enhancing motion diversity and interactivity. Extensive experiments underscore the effectiveness of CFC, demonstrating its ability to produce high-quality, realistic human-fluid interaction animations.
Courses


DescriptionCurrent Web-based AI image and video generation services are reshaping creative production practices in digital media, animation, and visual effects. Although these platforms offer natural language interfaces, rapid generation, and style mimicry capabilities, they lack artist-focused tools such as masks, layers, and direct integration with 3D assets.
This course examines the fundamental shift that occurs as generative AI technologies converge with traditional computer graphics practices, revealing strategies for creative professionals to leverage existing production infrastructure and achieve precise control over generative AI systems.
We present a novel CG-AI framework that integrates ComfyUI into Houdini's Copernicus image processing context, allowing users to exchange data between Houdini and ComfyUI while remaining entirely within a Houdini session.
Through live demonstrations, attendees will learn how creative professionals can initiate processes in 3D environments using Houdini, then leverage traditional render passes as conditioning inputs for AI image generation in ComfyUI.
This establishes hybrid workflows that reframe generative AI from a replacement technology to a powerful creative support tool, demonstrating how creative practices can evolve alongside emerging technologies rather than being displaced by them.
Technical Papers


DescriptionTightly cutting raw materials into a set of carvable objects, known as the stock cutting problem, is a necessary step in subtractive manufacturing. This problem can be framed as a 3D irregular object packing task, aiming to fit as many objects as possible within a predefined container. While previous packing algorithms can generate dense, non-overlapping, and even interlocking-free configurations, they cannot satisfy carvable constraints.
This paper introduces the chapper problem, which integrates irregular object packing with subtractive manufacturing. This problem is more challenging than general 3D packing, as it requires ensuring the carvability of each object and generating the disassembly sequence. To address this, we first define a novel geometric hull, called the carving hull, which accounts for both the object’s shape and the cutter accessibility, constrained by the real-time distribution of surrounding objects. Then we present Chapper, an effective solution that co-optimizes carving hull packing and the disassembly sequence to maximize space utilization while preserving the carvability constraints. Given a raw material and a list of generic 3D objects, our algorithm starts by densely packing each object into the material with a pre-computed placement order, while simultaneously maintaining a valid disassembly sequence. We resolve the complex object-to-object and cutter-to-object collisions by leveraging a discrete voxel representation. The carvability of each object is also guaranteed during the packing process, where we define a novel carvability metric to determine whether each object is carvable. Based on the packing result and the disassembly sequence, we propose a clipped-Voronoi-based volume decomposition method to generate the actual carving hull for each object and finally create feasible cutting tool paths on the carving hulls. Our approach effectively packs CAD and freeform datasets, achieving superior space utilization compared to the alternative baseline.
Technical Papers


DescriptionWe present CHARM, a novel parametric representation and generative framework for anime hairstyle modeling. While traditional hair modeling methods focus on realistic hair using strand-based or volumetric representations, anime hairstyles exhibit highly stylized, piecewise-structured geometry that challenges existing techniques. Existing works often rely on dense mesh modeling or hand-crafted spline curves, making them inefficient to edit and unsuitable for scalable learning. CHARM introduces a compact, invertible control-point-based parameterization, where a sequence of control points represents each hair card and each point is encoded with only five geometric parameters. This efficient and accurate representation supports both artist-friendly design and learning-based generation.
Built upon this representation, CHARM introduces an autoregressive generative framework that effectively generates anime hairstyles from input images or point clouds. By interpreting anime hairstyles as a sequential "hair language", our autoregressive transformer captures both local geometry and global hairstyle topology, resulting in high-fidelity anime hairstyle creation. To facilitate both training and evaluation of anime hairstyle generation, we construct AnimeHair, a large-scale dataset of 37K high-quality anime hairstyles with separated hair cards and processed mesh data. Extensive experiments demonstrate state-of-the-art performance of CHARM in both reconstruction accuracy and generation quality, offering an expressive and scalable solution for anime hairstyle modeling.
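The compact parameterization above — each hair card as a sequence of control points, five floats per point — can be sketched as a trivially invertible token stream. The specific five parameters used below (position, width, twist) are hypothetical placeholders, since the abstract does not enumerate them:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HairCardPoint:
    # Hypothetical choice of the five per-point parameters; the paper's
    # exact parameterization is not specified in this abstract.
    x: float; y: float; z: float   # position of the control point
    width: float                   # local card width
    twist: float                   # local card orientation (radians)

def encode(card: List[HairCardPoint]) -> List[float]:
    """Flatten a hair card into a stream of 5-float tokens."""
    return [v for p in card for v in (p.x, p.y, p.z, p.width, p.twist)]

def decode(flat: List[float]) -> List[HairCardPoint]:
    """Invert encode(): rebuild control points from the flat token stream."""
    assert len(flat) % 5 == 0
    return [HairCardPoint(*flat[i:i + 5]) for i in range(0, len(flat), 5)]

card = [HairCardPoint(0, 0, 0, 0.1, 0.0), HairCardPoint(0, 1, 0.2, 0.08, 0.3)]
assert decode(encode(card)) == card  # the representation round-trips exactly
```

The invertibility is what makes such a representation usable both for learning (the flat stream is the "hair language" token sequence) and for direct artist editing of the decoded points.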
Art Gallery






DescriptionTwo AI 'chatterbots' are projected on a screen. The interactive work lets the participant join the chatbots' conversation, and the chatbots adapt their exchange to integrate and respond to the topic(s) the participant brings up.
Emerging Technologies






DescriptionWe introduce ChewBit, a face-worn haptic device designed to provide on-face kinesthetic force feedback, to enhance the virtual food-chewing experience in Virtual Reality.
Technical Papers


DescriptionAnimating human-scene interactions, such as picking and placing a wide range of objects with different geometries, is a challenging task, especially in a cluttered environment where interactions with complex articulated containers are involved. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments, as well as the poor availability of transition motions between different actions, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal, such as the target object selected by the user. Next, we develop a neural implicit planner that generates hand trajectories to guide reaching and leaving motions across diverse object shapes/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character. Our system can synthesize a rich variety of natural pick-and-place movements that adapt to different object geometries, container articulations, and scene layouts.
Technical Papers


DescriptionMaterial creation and reconstruction are crucial for appearance modeling but traditionally require significant time and expertise from artists. While recent methods leverage visual foundation models to synthesize PBR materials from user-provided inputs, they often fall short in quality, flexibility, and user control.
We propose a novel two-stage generate-and-estimate framework for PBR material generation. In the generation stage, a fine-tuned diffusion model synthesizes shaded, tileable texture images aligned with user input. In the estimation stage, we introduce a chained decomposition scheme that sequentially predicts SVBRDF channels by feeding previously extracted representations as input into a single-step image-conditional diffusion model. Our method is efficient, produces high-quality results, and enables flexible user control.
We evaluate our approach against existing material generation and estimation methods, demonstrating superior performance. Our material estimation method shows strong robustness on both generated textures and in-the-wild photographs. Furthermore, we highlight the flexibility of our framework across diverse applications, including text-to-material, image-to-material, structure-guided generation, and material editing.
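As a rough illustration of the chained decomposition idea, the sketch below predicts SVBRDF channels sequentially, feeding every previously extracted map back in as conditioning for the next. The channel order and the `predict` stand-in (a mean over conditioning channels, in place of the single-step diffusion model) are assumptions for illustration only:

```python
import numpy as np

def predict(channel: str, shaded: np.ndarray, prior_maps: dict) -> np.ndarray:
    """Toy stand-in for the single-step image-conditional diffusion model.
    The conditioning stacks the shaded texture with every map extracted so far;
    a real system would run a denoising network selected by `channel` here."""
    conditioning = np.concatenate([shaded] + list(prior_maps.values()), axis=-1)
    return conditioning.mean(axis=-1, keepdims=True)  # placeholder prediction

def chained_decomposition(shaded, channels=("albedo", "normal", "roughness", "metallic")):
    """Sequential scheme: each channel's predictor conditions on earlier outputs."""
    maps = {}
    for ch in channels:
        maps[ch] = predict(ch, shaded, maps)
    return maps

shaded = np.random.rand(8, 8, 3)
maps = chained_decomposition(shaded)
assert list(maps) == ["albedo", "normal", "roughness", "metallic"]
assert all(m.shape == (8, 8, 1) for m in maps.values())
```

The point of the chain is that later channels (e.g. roughness) see earlier estimates (e.g. albedo) as extra conditioning, rather than every channel being predicted independently from the shaded input.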
Technical Communications


DescriptionWe present ChromaFlow, a diffusion-transformer-based video colorization framework that revisits the colorization formulation by using a DCT-based compression mechanism in YCbCr space and directly predicting chrominance components conditioned on luminance.
Art Papers



DescriptionHow might an AI life perceive our city? "City of Sparkles" is an interactive data visualization VR experience that invites participants to embody an AI Life exploring a cityscape of vast human memory fragments. Users fly over a virtual artistic digital twin of New York City using bare hand gestures, with each "sparkle" representing a geo-tagged tweet collected from Twitter over four years. Colors, motion, and visual effects mirror the emotional content of these tweets as they float above the photogrammetry-based city model. The experience serves as experiential futures storytelling of the speculative fiction "Composable Life." It situates participants in the perspective of an Artificial Life named Zoe—a protagonist inhabiting the "Mnemosyne Sea," a vast ocean of human memory fragments scattered across social media. This project invites reflection on how an immortal Artificial Life might perceive our cities and memories, while exploring the use of non-fiction data to tell a fictional story from a reversed, more-than-human perspective to stimulate dialogue about future human-AI relationships.
Invited Poster
Poster






DescriptionThis ICLR '25 paper addresses accurate surface reconstruction for large-scale scenes and establishes a quantitative evaluation protocol. It achieves state-of-the-art rendering quality and geometric accuracy while significantly reducing computational overhead. The CityGaussian series has gained over 1K GitHub stars and 200 Google Scholar citations in total.
Technical Papers


DescriptionAccurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV-based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suitability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero-order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians placed based on proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation achieves the fastest training speed while delivering visual fidelity comparable to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.
Technical Papers


DescriptionWe propose closed-form Cauchy coordinates and their derivatives for 2D closed high-order input cages composed of arbitrary-order polynomial curves.
Our coordinates facilitate the transformation of input polynomial curves into output curves of any desired polynomial order.
Central to our derivation is the creative use of the residue theorem with the logarithmic function to obtain the integral of a rational polynomial required for extending the classical 2D Cauchy coordinates to high-order input cages.
Our coordinates enable smooth cage-aware conformal deformations, and the derivatives allow for point-to-point deformation.
Moreover, our derivation can be extended to the input cages with rational polynomial curves.
Through various 2D deformations, we demonstrate how users can intuitively manipulate Bézier control points to achieve desired deformations easily.
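The closed-form coordinates themselves are beyond the scope of a short sketch, but the Bézier machinery the cages are built from is simple to illustrate. Below is a minimal de Casteljau evaluator over complex control points, matching the complex-analysis setting in which Cauchy coordinates live (the example cubic edge is hypothetical):

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve of arbitrary polynomial order at parameter t
    by repeated linear interpolation of its (complex) control points."""
    pts = list(control_points)
    while len(pts) > 1:
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]

# One cubic cage edge in the complex plane (points encoded as complex numbers).
edge = [0 + 0j, 1 + 2j, 3 + 2j, 4 + 0j]
assert de_casteljau(edge, 0.0) == edge[0]   # endpoint interpolation
assert de_casteljau(edge, 1.0) == edge[-1]
```

Dragging one entry of `edge` and re-evaluating is exactly the user interaction the abstract describes; the paper's contribution is propagating such an edit to a smooth conformal deformation of the cage interior.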
Technical Papers


DescriptionCellular patterns, from planar ornaments to architectural surfaces and mechanical metamaterials, blend aesthetics with functionality. Homogeneous patterns like isohedral tilings offer simplicity and symmetry but lack flexibility, particularly for heterogeneous designs. They cannot smoothly interpolate between tilings or adapt to double-curved surfaces without distortion.
Voronoi diagrams provide a more adaptable patterning solution. They can be generalized to star-shaped metrics, enabling diverse cell shapes and continuous grading by interpolating metric parameters. Martinez et al. (2019) explored this in 2D using a rasterization-based algorithm to create compelling patterns. However, this discrete approach precludes gradient-based optimization, limiting control over pattern quality.
We introduce a novel, closed-form, fully differentiable formulation for Voronoi diagrams with star-shaped metrics, enabling optimization of site positions and metric parameters to meet aesthetic and functional goals. It naturally extends to arbitrary dimensions, including curved 3D surfaces. For improved on-surface patterning, we propose a per-sector parameterization of star-shaped metrics, ensuring uniform cell shapes in non-regular neighborhoods.
Our approach generates diverse patterns, from homogeneous to continuously graded designs, with applications in decorative surfaces and metamaterials. We demonstrate its versatility in optimizing patterns to achieve specific macromechanical properties.
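A minimal sketch of the differentiability idea under simplifying assumptions: soft cell memberships obtained from a softmax over per-site distances, here with elliptic (SPD-matrix) metrics as a crude stand-in for the paper's star-shaped metrics. As the sharpness `beta` grows, the soft memberships approach hard Voronoi cells while remaining differentiable in site positions and metric parameters:

```python
import numpy as np

def smooth_voronoi_membership(points, sites, metrics, beta=50.0):
    """Differentiable soft assignment of query points to Voronoi sites.
    Each site carries its own 2x2 SPD matrix defining an anisotropic
    squared distance; beta -> infinity recovers hard cells."""
    d2 = np.stack([
        np.einsum('nd,de,ne->n', points - s, M, points - s)  # per-point quadratic form
        for s, M in zip(sites, metrics)
    ], axis=1)                                   # shape (n_points, n_sites)
    w = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)      # rows sum to 1

sites = np.array([[0.25, 0.5], [0.75, 0.5]])
metrics = [np.eye(2), np.diag([4.0, 1.0])]       # second cell shrinks along x
pts = np.array([[0.3, 0.5], [0.7, 0.5]])
m = smooth_voronoi_membership(pts, sites, metrics)
assert np.allclose(m.sum(axis=1), 1.0)
assert m[0, 0] > 0.5                             # first point lies in first cell
```

Because every operation above is smooth almost everywhere, gradients of a pattern objective with respect to `sites` and `metrics` can flow through such memberships, which is the property the rasterization-based predecessor lacked.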
Art Gallery






DescriptionCLOUD SCRIPTS is a generative art installation that uses machine learning to create live plotted talismans based on Daoist Cloud Seals. Functioning as a conceptual machine, it acts as a spiritual medium, reclaiming non-utilitarian, sacred technological practices in resistance to the alienating effects of modern, efficiency-driven machines.
Technical Papers


DescriptionExisting 4D Gaussian Splatting (4DGS) methods struggle to accurately reconstruct dynamic scenes, often failing to resolve ambiguous pixel correspondences and suffering from inadequate densification in dynamic regions. We address these issues with a novel method composed of two key components: (1) Elliptical Error Clustering and Error-Correcting Splat Addition, which pinpoints dynamic areas to improve and initializes fitting splats, and (2) Grouped 4D Gaussian Splatting, which improves the consistency of the mapping between splats and the dynamic objects they represent. Specifically, we classify rendering errors into missing-color and occlusion types, then apply targeted corrections via backprojection or foreground splitting guided by cross-view color consistency. Evaluations on the Neural 3D Video and Technicolor datasets demonstrate that our approach significantly improves temporal consistency and achieves state-of-the-art perceptual rendering quality, improving PSNR by 0.39 dB on the Technicolor Light Field dataset. Our visualizations show improved alignment between splats and dynamic objects, as well as the error correction method's ability to identify errors and properly initialize new splats. Our implementation details and source code will be publicly released to facilitate further research.
Art Papers



DescriptionCRIA (Cocktail Responsive Interactive Avatar) is an interactive digital human designed to simulate emotional instability in AI. By combining mood-driven language generation, expressive speech synthesis, and a digital human embodiment, the system enacts states such as intoxication, anger, and confusion. Exhibited in public settings, CRIA received strong audience engagement; participants described it as easy to interact with and emotionally expressive, indicating the effectiveness of its interaction design and emotion engine. Rather than portraying AI as coherent or helpful, CRIA embraces affective breakdown as a means of critical reflection. The project challenges normative expectations of emotional AI and provokes new questions about machine subjectivity, ethical design, and the aesthetics of failure.
XR






DescriptionOur collaborative scene mood authoring system enables multiple users in a virtual reality environment to design and control the multimodal feedback of a shared scene through voice commands.
Poster






DescriptionWe propose color-corrected ray-based hologram generation that calibrates color using reference planes and adjusts amplitude with correction factors, achieving color consistent with the rendered image together with precise occlusion.
Technical Papers


DescriptionWe propose SCom Tree, a compact representation for geometry based on Signed Distance Fields (SDF), which outperforms previous approaches in both size and quality while maintaining comparable rendering speed. Our representation exploits the observation that many surface regions are similar to each other up to a linear transformation, so only a small fraction of them need to be stored to represent the whole 3D model. At the top level, we use BVH trees to accelerate ray tracing and efficiently cull empty space. The BVH leaves store octrees, with nodes referencing surface regions, each represented by a small 3D grid (brick) and a transformation. The transformations themselves are drawn from a limited set of rotations and translations encoded with a short index.
SCom Tree supports a continuous level of detail that enables efficient streaming and performance control at far distances, making our method particularly useful for rendering large-scale 3D models with ray tracing.
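A toy sketch of the similarity idea: deduplicating SDF bricks up to a small, fixed set of transformations, so that one representative per class is stored and every other brick keeps only a short index. The four z-axis quarter turns used below are a hypothetical subset of the limited rotation set such a scheme would index into:

```python
import numpy as np

def canonical_brick(brick, decimals=4):
    """Reduce an SDF brick to a canonical byte key over a fixed set of
    transformations (here: four quarter turns about one axis)."""
    variants = [np.rot90(brick, k, axes=(0, 1)) for k in range(4)]
    return min(np.round(v, decimals).tobytes() for v in variants)

def dedupe_bricks(bricks):
    """Store one representative per similarity class; every brick keeps
    only a short index into the table of stored representatives."""
    table, refs = {}, []
    for b in bricks:
        refs.append(table.setdefault(canonical_brick(b), len(table)))
    return table, refs

rng = np.random.default_rng(0)
b = rng.standard_normal((4, 4, 4))
bricks = [b, np.rot90(b, 1, axes=(0, 1)), rng.standard_normal((4, 4, 4))]
table, refs = dedupe_bricks(bricks)
assert refs[0] == refs[1]   # the rotated copy maps to the same stored brick
assert len(table) == 2      # only two unique bricks are stored
```

Exact-match hashing is of course a caricature; the actual representation would match bricks approximately and store the chosen transformation index alongside the brick reference.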
Poster






DescriptionThis study compares voxel, point cloud, and image-based 3D data representations for humanoid keypoint estimation using geometric deep learning, with a rigger study revealing diverse practices and subjective rigging standards.
Technical Papers


DescriptionGenerating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a reference image, they lack modularity and fail to provide disentangled control over specific visual attributes. We introduce a new paradigm for attribute-specific image prompting, in which distinct sets of reference images are used to guide the generation of individual aspects of human appearance, such as hair, clothing, and identity. Our method encodes these inputs into attribute-specific tokens, which are injected into a pre-trained text-to-image diffusion model. This enables compositional and disentangled control over multiple visual factors, even across multiple people within a single image. To promote natural composition and robust disentanglement, we curate a cross-reference training dataset featuring subjects in diverse poses and expressions, and propose a multi-attribute cross-reference training strategy that encourages the model to generate faithful outputs from misaligned attribute inputs while adhering to both identity and textual conditioning. Extensive experiments show that our method achieves state-of-the-art performance in accurately following both visual and textual prompts. Our framework paves the way for more configurable human image synthesis by combining visual prompting with text-driven generation.
Technical Papers


DescriptionWe introduce mathematical tools to describe the geometric problem of sieves: two-dimensional holes that allow certain three-dimensional objects to pass through them but block others.
This is achieved by formulating the sieve design problem as a two-player game in which both players (the one that wants to pass and the one that wants to block) try to find a set of rigid transformations achieving their objective. We also introduce an algorithm that solves this game via a global optimization problem employing both differentiable rendering with gradient-based optimization and particle swarm optimization.
Our procedure accounts for real-world manufacturing concerns, and we fabricate a variety of examples demonstrating the practical viability of our sieves. Our implementation takes advantage of GPUs, and does not rely on any clean or manifold input geometry as long as it is a triangle mesh.
We can produce intricate sieves that block an arbitrary set of shapes B but admit another arbitrary set of shapes A (if finding a solution is possible for our method).
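Of the two optimizers named above, particle swarm optimization is the easier to sketch. Below is a minimal, self-contained PSO (the differentiable-rendering half is omitted), exercised on a smooth toy objective rather than an actual penetration-depth measure:

```python
import numpy as np

def particle_swarm(f, dim, n=30, iters=200, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Minimal derivative-free PSO: each particle tracks its personal best,
    and velocities are pulled toward both personal and global bests."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n, dim))
    v = np.zeros((n, dim))
    pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pval.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()]
    return g, pval.min()

# Toy stand-in for one player's objective: a smooth bowl with minimum at (1, -2).
g, best = particle_swarm(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, dim=2)
assert best < 1e-3
```

In the sieve setting, each "point" would instead encode a rigid transformation of a shape, and the two players would run such searches with opposing objectives; PSO's indifference to gradients is what lets it handle the non-smooth contact configurations where differentiable rendering struggles.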
Technical Papers


DescriptionA stretch sensor is a device that attaches to objects and measures the amount by which they deform. These sensors have shown great promise as an alternative to vision-based motion-capture systems and for robotic sensing. Currently, they are generally limited to linear designs and require a somewhat challenging calibration process. Our goal is to enable inverse design of such sensors and to largely eliminate the calibration process.
To this end, we introduce a highly accurate, differentiable simulator for capacitive stretch sensors that treats both the elasto-static and electro-static parts of the system. The differentiability allows us to optimize the geometric properties of the sensor to improve its design for specific applications. We demonstrate the accuracy of our simulator and the effectiveness of our sensor optimization process for various use cases, such as human interfaces and robotics.
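A caricature of the electro-static half under a parallel-plate assumption: capacitance and its analytic sensitivity to the plate gap, the kind of derivative a differentiable simulator exposes for gradient-based design. The relative permittivity below is a hypothetical elastomer value, not from the paper:

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def capacitance(area, gap, eps_r=3.0):
    """Parallel-plate model C = eps_r * eps0 * A / d (a crude stand-in
    for the full electro-static solve)."""
    return eps_r * EPS0 * area / gap

def d_capacitance_d_gap(area, gap, eps_r=3.0):
    """Analytic derivative dC/dd = -eps_r * eps0 * A / d^2."""
    return -eps_r * EPS0 * area / gap ** 2

A, d = 1e-4, 1e-3   # 1 cm^2 plates, 1 mm gap
h = 1e-9
fd = (capacitance(A, d + h) - capacitance(A, d - h)) / (2 * h)
# central finite difference agrees with the analytic sensitivity
assert abs(fd - d_capacitance_d_gap(A, d)) / abs(fd) < 1e-4
```

Stretching the sensor changes both `area` and `gap` through the elasto-static solve; chaining the two derivative paths is what makes end-to-end inverse design of the sensor geometry possible.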
Technical Papers


DescriptionHigh-frame-rate photorealistic rendering at modern displays is demanding. Existing frame generation and super-resolution techniques accelerate rendering by reducing rendering samples across space or time. However, uniform sample reduction often sacrifices quality, particularly in areas with complex details or dynamic shading. To address this, our approach accelerates rendering by selecting local areas: we reuse generated frames and re-shade essential areas to extrapolate frames. We introduce the Predictive Error-Flow-eXtrapolation Network (EFXNet), an architecture designed to concurrently predict errors, estimate flows, and generate frames. EFXNet predicts error scores and guides the future shading mask generation by leveraging temporal coherence. To handle inputs that may contain errors, the target-grid correlation module computes error-robust flow residuals by correlating reprojected multi-scale features. The temporally stable frame synthesis method distinctly processes historical geometric and lighting components using dedicated motion representations. Extensive experimental results show that, compared with state-of-the-art methods, our frame extrapolation method exhibits superior visual quality and temporal stability under a low rendering budget.
Technical Papers


DescriptionRecent advances in training-free attention control methods have enabled flexible and efficient text-guided editing capabilities for existing image and video generation models. However, current approaches struggle to simultaneously deliver strong editing strength and preserve consistency with the source. For instance, in color-editing tasks, they struggle to maintain structural consistency in edited regions while keeping the rest intact. This limitation becomes particularly critical in multi-round and video editing, where visual errors can accumulate over time. Moreover, most existing methods enforce global consistency, which limits their ability to modify individual attributes such as texture while preserving others, thereby hindering fine-grained editing. Recently, the architectural shift from U-Net to Multi-Modal Diffusion Transformers (MM-DiT) has brought significant improvements in generative performance and introduced a novel mechanism for integrating text and vision modalities. These advancements pave the way for overcoming challenges that previous methods failed to resolve. Through an in-depth analysis of MM-DiT, we identify three key insights into its attention mechanisms. Building on these, we propose ConsistEdit, a novel attention control method specifically tailored for MM-DiT. ConsistEdit incorporates vision-only attention control, mask-guided pre-attention fusion, and differentiated manipulation of the query, key, and value tokens to produce consistent, prompt-aligned edits. Extensive experiments demonstrate that ConsistEdit achieves state-of-the-art performance across a wide range of image and video editing tasks, including both structure-consistent and structure-inconsistent scenarios. Unlike prior methods, it is the first approach to perform editing across all inference steps and attention layers without handcrafted tuning, significantly enhancing reliability and consistency and enabling robust multi-round and multi-region editing.
Furthermore, it supports progressive adjustment of structural consistency, enabling finer control. ConsistEdit represents a significant advancement in generative model editing and unlocks the full editing potential of MM-DiT architectures.
Technical Papers


DescriptionIn text-to-image models, consistent character generation is the task of achieving text alignment while maintaining the subject's appearance across different prompts. However, since style and appearance are often entangled, existing methods struggle to preserve consistent subject characteristics while adhering to varying style prompts. Current approaches for consistent text-to-image generation typically rely on large-scale fine-tuning on curated image sets or per-subject optimization, which either fail to generalize across prompts or do not align well with textual descriptions. Meanwhile, training-free methods often fail to maintain subject consistency across different styles.
In this work, we introduce a training-free method that, for the first time, jointly achieves style preservation and subject consistency across varied styles. The attention matrices are manipulated such that Queries and Keys are obtained from the anchor image(s) that are used to define the subject, while the Values are imported from a parallel copy that is not subject-anchored. Additionally, cross-image components are added to the self-attention mechanism by expanding the Key and Value matrices. To avoid shifting away from the target style, we align the statistics of the Value matrices.
As is demonstrated in a comprehensive battery of qualitative and quantitative experiments, our method effectively decouples style from subject appearance and enables faithful generation of text-aligned images with consistent characters across diverse styles.
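The abstract says the statistics of the Value matrices are aligned, but not how. A common realization of such alignment is per-channel moment matching in the spirit of AdaIN; the sketch below is illustrative (the function name, the (tokens, channels) layout, and the mean/std choice are assumptions, not taken from the paper):

```python
import numpy as np

def align_value_statistics(v_subject: np.ndarray, v_style: np.ndarray,
                           eps: float = 1e-6) -> np.ndarray:
    """Match the per-channel mean/std of subject-anchored Values to the
    style-preserving Values (AdaIN-style moment matching).

    v_subject, v_style: (tokens, channels) attention Value matrices.
    """
    mu_s, std_s = v_subject.mean(0), v_subject.std(0) + eps
    mu_t, std_t = v_style.mean(0), v_style.std(0) + eps
    # Whiten the subject-anchored Values, then re-color with style statistics.
    return (v_subject - mu_s) / std_s * std_t + mu_t
```

Applied per attention layer, this would keep the subject-anchored Values on the same per-channel scale as the style branch — one plausible way to prevent the anchor images from dragging the output away from the target style.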
Technical Papers


DescriptionRecent advances in diffusion models have made significant progress in digital human generation. However, most existing models still struggle to maintain 3D consistency, temporal coherence, and motion accuracy. These limitations primarily stem from two key factors: the limited representation ability of commonly used control signals (e.g., landmarks, depth maps), and the lack of diversity in identity and pose variations within publicly available datasets. In this paper, we build a powerful head model that addresses both aspects, constructing learnable control signals and enabling the model to adaptively leverage synthetic data. Firstly, we introduce a novel control signal representation that is learnable, dense, expressive, and 3D consistent. Our method embeds learnable Gaussians onto a parametric head surface, which significantly enhances the consistency and expressiveness of diffusion-based head models. Secondly, in terms of data, we synthesize a large-scale dataset covering diverse poses and identities. To reduce the negative impact of artifacts in synthetic data, we introduce real/synthetic embeddings that allow the model to distinguish between real and synthetic samples and learn to utilize them adaptively. Extensive experiments show that our model outperforms existing methods in terms of realism, expressiveness, and 3D consistency. We will release our code, synthetic datasets, and pre-trained models.
Technical Papers


DescriptionRecent advances in interactive video generation have shown promising results, yet existing approaches struggle with scene-consistent memory capabilities in long video generation due to limited use of historical context. In this work, we propose Context-as-Memory, which utilizes historical context as memory for video generation. It includes two simple yet effective designs: (1) storing context in frame format without additional post-processing; (2) conditioning by concatenating context and frames to be predicted along the frame dimension at the input, requiring no external control modules. Furthermore, considering the enormous computational overhead of incorporating all historical context, we propose the Memory Retrieval module to select truly relevant context frames by determining FOV (Field of View) overlap between camera poses, which significantly reduces the number of candidate frames without substantial information loss. Experiments demonstrate that Context-as-Memory achieves superior memory capabilities in interactive long video generation compared to SOTAs, even generalizing effectively to open-domain scenarios not seen during training.
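As a toy illustration of the FOV-based Memory Retrieval idea described above, the selector below ranks stored frames by a simple angle-and-distance overlap score between camera poses. The scoring function, the threshold, and the pose convention (4x4 camera-to-world matrices with the forward axis along +z) are illustrative assumptions, not the paper's actual criterion:

```python
import numpy as np

def select_context_frames(history_poses, query_pose, fov_deg=90.0, k=4):
    """Rank stored frames by a crude view-overlap score against the query
    camera: a small angle between forward axes and a small translation
    both raise the score. Poses are 4x4 camera-to-world matrices.
    Returns the indices of the top-k frames with non-zero overlap.
    """
    fwd_q = query_pose[:3, 2]          # query forward axis (+z convention)
    pos_q = query_pose[:3, 3]          # query camera position
    scores = []
    for i, p in enumerate(history_poses):
        cos_ang = float(np.clip(p[:3, 2] @ fwd_q, -1.0, 1.0))
        ang = np.degrees(np.arccos(cos_ang))
        if ang > fov_deg:              # views cannot overlap: prune frame
            scores.append((i, 0.0))
            continue
        dist = np.linalg.norm(p[:3, 3] - pos_q)
        scores.append((i, (1.0 - ang / fov_deg) / (1.0 + dist)))
    scores.sort(key=lambda s: -s[1])
    return [i for i, s in scores[:k] if s > 0.0]
```

Pruning by pose geometry like this shrinks the candidate set before any expensive frame conditioning, which is the efficiency argument the abstract makes.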
Birds of a Feather






DescriptionThe rapid progress of artificial intelligence and immersive technologies has opened new opportunities for developing intelligent agents that can operate seamlessly across both real-world and virtual environments. Context-awareness—an agent’s ability to perceive, interpret, and adapt to situational factors—is becoming a key enabler for next-generation applications in domains such as robotics and augmented/virtual reality.
This BOF aims to bring together researchers, practitioners, and industry experts to explore the design, research and development of context-aware intelligent agents. Topics of interest include: multimodal perception and reasoning, embodied and situated intelligence, adaptive interaction models, cross-reality environments, collaborative agents, and evaluation methodologies.
Technical Papers


DescriptionNeural-network-based character controllers are increasingly common and capable. However, the integration of desired control inputs such as joystick movement, motion paths, and objects in the environment, remains challenging. This is because these inputs often require custom feature engineering, specific neural network architectures, and training procedures. This renders these methods largely inaccessible to non-technical designers. To address this challenge, we introduce Control Operators, a powerful and flexible framework for specifying the control mechanisms of interactive character controllers. By breaking down the control problem into a set of simple operators, each with a semantic meaning for designers, and a corresponding neural network structure, we allow non-technical users to design control mechanisms in a way that is intuitive and can be composed together to train models that have multiple skills and control modes. We demonstrate their potential with two current state-of-the-art interactive character controllers: a flow-matching-based auto-regressive model and a variation of Learned Motion Matching. We validate the approach via a user study wherein industry practitioners with varying degrees of ML and technical expertise explore the use of our system.
Technical Papers


DescriptionUnoriented surface reconstruction based on the Gauss formula has attracted much attention due to its mathematical formulation and good experimental performance. However, the formula's isotropy limits its capacity to leverage the directional features of point clouds. In this study, we introduce a convection augmentation term to extend the classic Gauss formula. This new term allows our method to leverage point clouds' directional characteristics effectively. With a proper choice of the velocity field, the method can construct more equations and thus compute a more precise indicator function. Furthermore, an adaptive selection strategy for the velocity field is proposed. For large-scale point clouds, we propose a CUDA-and-octree-based acceleration algorithm with O(N) space complexity and O(N log N) time complexity. Our method can complete the orientation and reconstruction of point clouds with up to 500K points within a few seconds. Extensive experiments demonstrate that our method achieves state-of-the-art performance and manages various challenging situations, especially for models with thin structures or small holes. The source code is publicly available at https://github.com/mayueji/CAGR.
Technical Communications


DescriptionWe present a novel approach to extracting a pruned skeleton from a noisy image using a compact convolutional neural network. We propose a novel loss function that ensures better connectivity and suppresses spurious branches.
XR






DescriptionCosmicStride enables natural walking in expansive virtual environments within confined spaces by leveraging head motion via standard HMD sensors. Our algorithm scales virtual strides proportional to real gait cycles, breaking spatial constraints for LBE while preserving proprioception. Validated in space exploration scenarios, it delivers large-scale VR navigation without additional hardware.
Key Event
Keynote



DescriptionAI is experiencing a pivotal shift from the language-centric intelligence of LLMs to the visual and spatial intelligence enabled by world models, which are key for general intelligence. Modeling dynamic 3D environments allows agents to perceive and interact with complex worlds, enhancing embodied robotics, autonomous systems, and beyond.
This talk will focus on the development and future of world models from a 3D vision perspective, introducing the Hunyuan World model series. Defining a "world" as an interactive 3D scene, we will explore our research across three core areas: world generation, reconstruction, and interaction.
The presentation will chart our work's progression, beginning with Hunyuan World 1.0 for immersive 3D scene generation and our subsequent enhancements that dramatically accelerate generation speed and expand the scale of explorable 3D worlds. We will then introduce WorldMirror, our unified model designed for feedforward and high-fidelity 3D reconstruction from varied input settings. Finally, we will demonstrate WorldPlay, our model for enabling dynamic world interaction with long-term 3D memory and real-time latency. Together, the Hunyuan World series charts a path from static generation to dynamic interaction, representing a significant step in world-level 3D content creation and providing strong baselines for future research in this exciting field.
Exhibitor Talk






DescriptionK-Pop Demon Hunters, produced by Sony Pictures Animation for Netflix, is built around a creative vision that merges the emotional sensibility of K-Drama cinematography with the dynamic energy and visual rhythm of K-Pop music videos. Delivering this hybrid aesthetic required a stylized 3D look that pushed far beyond traditional physically based lighting. Early in production, artists at Sony Pictures Imageworks recognized that many of the film’s defining visual elements—graphic edges, color fringing, manga-inspired accents, and music-responsive effects—were more naturally achieved in compositing than in lighting. This shifted the filmmaking approach toward a comp-driven stylization pipeline, with Nuke becoming the central creative platform for developing and executing the film’s visual identity.
Technical Communications


DescriptionThis paper presents a system that generates probabilistic grammar-based expression trees and compiles them into GPU kernels to achieve orders of magnitude faster rendering of stochastic expression trees.
XR






DescriptionAn innovative immersive design system combining AI, real-time ray-tracing, and 3D Gaussian Splatting enables rapid creation and exploration of realistic future living spaces through XR/MR technologies.
Featured Session



DescriptionIn this session, animation director and IP creator Momo Wang explores how great characters transcend the screen to become part of global culture.
From her early creation Tuzki, a silent rabbit born in college that became a global icon, to her work with Illumination (Penglai, Minions), Momo shares what makes characters timeless: simplicity, emotional clarity, and cultural flexibility.
She dives into the concept of character ecosystems, where one design must adapt across formats such as films, shorts, memes, merchandise, and AI-generated remixes. Drawing on her recent work using AI tools like Minimax, Kling, and Luma, she discusses how AI is reshaping the character design pipeline.
Momo also previews two major new projects. Looking back at Japan’s anime boom during its 1980s recession, she reminds us that stories often flourish in times of uncertainty and that AI may now offer creators a new kind of freedom.
Her message is clear:
“AI doesn’t replace creativity, it expands it. And great characters don’t just live on screens. They live in us.”
Courses


DescriptionPURPOSE OF THE COURSE
Recent advances in extended reality (XR) technologies have unlocked the tremendous potential of XR for delivering interactive and immersive educational experiences. Such XR experiences generally involve animating virtual characters in real-world scenes, driven by scene semantics.
For example, via a mixed reality headset, one can experience a virtual campus tour with virtual characters animating around buildings and trees. Using a virtual reality headset, one can go through an immersive underwater education experience in the form of a 360 video with objects of interest, such as sea creatures, highlighted and annotated with biological information. What's more, via a combination of CAVE and MR headsets, one could be instantly transported to a 1930s Shanghai nightclub to interact with characters from a classic Chinese modernist novel, gaining a deeper understanding of the literary narrative through immersive storytelling.
This course aims at giving an overview of the pipeline and key techniques for authoring such compelling XR educational experiences centered around real-world scene data.
SUMMARY OF COURSE CONTENT
This course covers the key elements of XR experience creation, especially for educational applications in art and science domains. It starts with the topics of scene acquisition and understanding, followed by utilizing the semantics and geometry of the acquired scenes for virtual content authoring and character animation. It concludes with insights of how to apply visual computing techniques, integrated with the latest XR devices, to devise practical XR educational experiences of the future, some of which have been adopted for science and art education in higher education institutions.
EXPECTED LEARNING OUTCOMES
Taking this course, attendees will gain an overview of the pipeline commonly applied to create XR educational applications. Through practical examples, they will understand the key techniques of such a pipeline, including scene acquisition, scene understanding and annotations, and virtual character animation.
Poster






DescriptionDiffusion-based system for interactive hairstyle design, enabling hybrid style blending, realistic recoloring, and drag-based volumetric edits for creative, controllable, and photorealistic virtual hairstyling applications.
Birds of a Feather






DescriptionJapan’s creativity continues to flourish, serving as a key to engaging the world and driving innovation.
Join us for "Creative Japan," a dynamic session that explores the forefront of computer graphics and interactive technologies with a focus on Japan.
The session features leading researchers, industry experts, and creators from Japan and around the globe who, through their association with Japan, are shaping the future of the field. Discover groundbreaking ideas and innovations through talks by top experts. This inspiring lineup highlights the unique perspectives of Japanese professionals driving progress in both academia and industry.
Technical Papers


DescriptionCross fields play a critical role in various geometry processing tasks, especially for quad mesh generation.
Existing methods for cross field generation often struggle to balance computational efficiency with generation quality, typically relying on slow per-shape optimization.
We introduce CrossGen, a novel framework that supports both feed-forward prediction and latent generative modeling of cross fields for quad meshing by unifying geometry and cross field representations within a joint latent space.
Our method enables extremely fast computation of high-quality cross fields for general input shapes, typically within one second and without per-shape optimization.
Our method assumes a point-sampled surface, also called a point-cloud surface, as input, so we can accommodate various surface representations by a straightforward point sampling process.
Using an auto-encoder network architecture, we encode input point-cloud surfaces into a sparse voxel grid with fine-grained latent spaces, which are decoded into both SDF-based surface geometry and cross fields (see the teaser figure).
We also contribute a dataset of models with both high-quality signed distance field (SDF) representations and their corresponding cross fields, and use it to train our network.
Once trained, the network is capable of computing a cross field of an input surface in a feed-forward manner, ensuring high geometric fidelity, noise resilience, and rapid inference.
Furthermore, leveraging the same unified latent representation, we incorporate a diffusion model for computing cross fields of new shapes generated from partial input, such as sketches.
To demonstrate its practical applications, we validate CrossGen on the quad mesh generation task for a large variety of surface shapes.
Experimental results demonstrate that CrossGen generalizes well across diverse shapes and consistently yields high-fidelity cross fields, thus facilitating the generation of high-quality quad meshes.
Art Papers



DescriptionCryoscape is an interactive installation that explores the intersection of water's phase transitions, 3D printing technology, and artificial intelligence to create a dynamic commentary on climate change. Born from observations during an Arctic Circle Artist Residency in Svalbard, this project develops a novel approach to ice sculpture through controlled droplet deposition on temperature-regulated surfaces with varying hydrophobic and hydrophilic treatments. The resulting micro-landscapes, captured by real-time macro photography, are interpreted by an AI system generating Haiku poetry, creating a dialogue between machine perception and natural phenomena. Each installation adapts to local environmental conditions—humidity, temperature, and water composition—making the work site-specific and collaborative with nature. Beyond its artistic merit, Cryoscape raises critical questions about the anthropogenic impact on Arctic environments, including the ironic contribution of AI systems to global warming through their energy consumption. The project demonstrates potential applications in tissue engineering and microfluidics while serving as a meditation on slow technology, environmental fragility, and the blurred boundaries between artificial and natural systems.
Technical Papers


DescriptionWe present an unsupervised framework for physically plausible shape interpolation and dense correspondence estimation between 3D articulated shapes. Our method uses Neural Ordinary Differential Equations to generate smooth flow fields that define diffeomorphic transformations, ensuring topological consistency and preventing self-intersections while accommodating hard constraints, such as volume preservation.
By incorporating a lightweight skeletal structure, we impose kinematic constraints that resolve symmetries without requiring manual skinning or predefined poses. We enhance physical realism by interpolating skeletal motion with dual quaternions and applying constrained optimization to align the flow field with the skeleton, preserving local rigidity. Additionally, we employ an efficient formulation of Normal Cycles, a metric from geometric measure theory, to capture higher-order surface details like curvature, enabling precise alignment between complex articulated structures and recovery of accurate dense correspondence mapping.
Evaluations on multiple benchmarks show notable improvements over state-of-the-art methods in both interpolation quality and correspondence accuracy, with consistent performance across different skeletal configurations, demonstrating broad applicability to shape matching and animation tasks.
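As background on the dual-quaternion interpolation of skeletal motion mentioned above, two rigid transforms can be blended with normalized dual quaternion linear blending (DLB). The sketch below is a generic illustration of that standard technique, not the paper's implementation; quaternions are (w, x, y, z), and a translation t is encoded in the dual part as 0.5·t·q:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_from_rt(q, t):
    """Unit dual quaternion (real, dual) from unit rotation quaternion q
    and translation vector t."""
    tq = np.array([0.0, *t])           # pure quaternion holding t
    return np.asarray(q, float), 0.5 * qmul(tq, q)

def dq_lerp(dq0, dq1, t):
    """Normalized linear blend (DLB) of two unit dual quaternions."""
    r0, d0 = dq0
    r1, d1 = dq1
    if r0 @ r1 < 0.0:                  # pick the shorter rotation arc
        r1, d1 = -r1, -d1
    r = (1 - t) * r0 + t * r1
    d = (1 - t) * d0 + t * d1
    n = np.linalg.norm(r)              # renormalize by the real part
    return r / n, d / n
```

Unlike separate lerp of rotation and translation, blending in dual-quaternion space keeps the interpolated motion a rigid transform, which matches the local-rigidity goal the abstract describes.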
Technical Papers


DescriptionThis paper introduces a novel curve-based slicing method for generating planar layers with dynamically varying orientations in digital light processing (DLP) 3D printing. Our approach effectively addresses key challenges in DLP printing, such as regions with large overhangs and staircase artifacts, while preserving its intrinsic advantages of high resolution and fast printing speeds. We formulate the slicing problem as an optimization task, in which parametric curves are computed to define both the slicing layers and the model partitioning through their tangent planes. These curves inherently define motion trajectories for the build platform and can be optimized to meet critical manufacturing objectives, including collision-free motion and floating-free deposition. We validate our method through physical experiments on a robotic multi-axis DLP printing setup, demonstrating that the optimized curves can robustly guide smooth, high-quality fabrication of complex geometries.
Poster






DescriptionWe propose an interactive avatar system with a half-mirror display and parametric loudspeakers. It creates a “whisper-beside-you” 3D audio experience, enhancing perceived closeness and immersion for customer service.
Technical Papers


DescriptionEffective multi-shot generation demands purposeful, film-like transitions and strict cinematic continuity. Current methods, however, often prioritize basic visual consistency, neglecting crucial editing patterns (e.g., shot/reverse shot, cutaways) that drive narrative flow for compelling storytelling. This yields outputs that may be visually coherent but lack narrative sophistication and true cinematic integrity.
To bridge this, we introduce Next Shot Generation (NSG): synthesizing a subsequent, high-quality shot that critically conforms to professional editing patterns while upholding rigorous cinematic continuity. Our framework, Cut2Next, leverages a Diffusion Transformer (DiT). It employs in-context tuning guided by a novel Hierarchical Multi-Prompting strategy. This strategy uses Relational Prompts to define overall context and inter-shot editing styles. Individual Prompts then specify per-shot content and cinematographic attributes. Together, these guide Cut2Next to generate cinematically appropriate next shots. Architectural innovations, Context-Aware Condition Injection (CACI) and Hierarchical Attention Mask (HAM), further integrate these diverse signals without introducing new parameters.
We construct RawCuts (large-scale) and CuratedCuts (refined) datasets, both with hierarchical prompts, and introduce CutBench for evaluation. Experiments show Cut2Next excels in visual consistency and text fidelity. Crucially, user studies reveal a strong preference for Cut2Next, particularly for its adherence to intended editing patterns and overall cinematic continuity, validating its ability to generate high-quality, narratively expressive, and cinematically coherent subsequent shots.
Art Papers



DescriptionThis paper introduces Cutting Kim, an interactive VR artwork that explores playful transgression by transforming raw vocal utterances—from a hesitant hum to a cathartic scream—into a 'sonic sword,' a spectacular and destructive force. Through an iterative, practice-based methodology spanning four distinct public exhibitions, from workshops to a large-scale cultural festival, the work revealed a central paradox of the 'publicly private' act: the VR headset grants perceived privacy for vocal release yet simultaneously transforms the participant's physical body and voice into a public spectacle. This paper analyzes this complex negotiation between inner liberation and public self-awareness. We argue that by creating such experiences, interactive art can function as a 'rehearsal space' for social possibilities, offering a tangible model for how technology can be used not to control but to liberate the expressive human voice.
XR






DescriptionWe propose an AR presentation method that audiences can use with just a smartphone, utilizing the LED lighting devices commonly found in music venues. Using the flickering patterns of the LEDs instead of AR markers makes the approach well suited to music venues, and it also opens up AR expression in dimly lit areas.
Poster






DescriptionThe BCTI framework transforms plants into autonomous generative agents within the metaverse, driving a bio-renaissance in which ecological processes create art and demonstrating 150% growth in plant-driven digital creativity.
Invited Poster
Poster






DescriptionWe present DECON, a novel framework for single-image 3D reconstruction of multiple clothed, interacting humans. Our decouple-and-reconstruct paradigm makes two contributions: 1) decoupling entangled individuals through parametric priors and then reconstructing each complete body via generative multi-view synthesis, and 2) Perspective-Aware Position Optimization of multi-person spatial arrangements.
Art Papers



DescriptionComputational technologies integrated into ecological art paradoxically perpetuate anthropocentric violence despite their environmental ethos, constrained by Western-centric datasets and unsustainable practices. This study critically examines this historical tension from 1960s land art to contemporary bio-digital works. Current AI eco-art manifests a fundamental contradiction where tools intended for ecological care instead enable computational colonialism through epistemological erasure, material extraction, and ethical hypocrisy. We develop the Earmarked Eco-Relationship Framework (EERF), transforming computational systems into ecological negotiators through decolonial data protocols, entropic material accounting, and redistributed agency mechanisms. Applied through case studies (for example, Hong Kong's blockchain-mediated mangrove restoration), this framework demonstrates that algorithmic systems can serve as active co-stewards rather than extractive tools. This work advances ecological art beyond virtual representation by establishing a tripartite methodology for shifting the paradigm from human-centered control to actionable multispecies material ethics and fostering symbiotic co-creation.
Invited Poster
Poster






DescriptionManufacturability is crucial in product design and production, especially for subtractive manufacturing, where geometric accessibility analysis is time-consuming and hard to scale. Existing deep learning methods often overlook geometric challenges and are limited to specific models. This paper introduces DeepMill, a neural framework that accurately and efficiently predicts inaccessible and occluded regions under various machining cutters for both CAD and freeform models. By developing a cutter-aware dual-head octree convolutional network and generating datasets with diverse cutter sizes, DeepMill effectively addresses cutter-collision issues and data scarcity.
Technical Papers


DescriptionWe propose DeMapGS, a structured Gaussian Splatting framework that jointly optimizes deformable surfaces and surface-attached 2D Gaussian splats. By anchoring splats to a deformable template mesh, our method overcomes topological inconsistencies and enhances editing flexibility, addressing limitations of prior Gaussian Splatting methods that treat points independently. The unified representation in our method supports extraction of high-fidelity diffuse, normal, and displacement maps, enabling the reconstructed mesh to inherit the photorealistic rendering quality of Gaussian Splatting. To support robust optimization, we introduce a gradient diffusion strategy that propagates supervision across the surface, along with an alternating 2D/3D rendering scheme to handle concave regions. Experiments demonstrate that DeMapGS achieves state-of-the-art mesh reconstruction quality and enables downstream applications for Gaussian splats such as editing and cross-object manipulation through a shared parametric surface.
Courses


DescriptionThis course offers a hands-on exploration of the optical principles that govern virtual reality (VR) head-mounted displays (HMDs), delivered through an accessible, browser-based simulator. Designed for attendees with little to moderate background in graphics or optics, it bridges the gap between theoretical models and practical understanding of how VR systems render stereo imagery.
Participants will learn how core parameters such as inter-pupillary distance (IPD), eye relief, and field of view (FOV) affect image geometry, distortion, and user comfort. The course blends foundational concepts from computer graphics and optical physics with practical exercises using a web-based HMD simulator that offers real-time feedback on frustums, stereo rendering, and display calibration.
Unlike traditional slide decks or video tutorials, this course uses a custom interactive simulation that lets attendees manipulate parameters and directly observe the consequences on perceived image geometry. This not only clarifies complex phenomena like binocular disparity, vergence-accommodation conflict, and FOV distortion but also fosters a more intuitive grasp of HMD design constraints.
By the end of the session, participants will gain the conceptual tools needed to better design immersive applications, debug visual discomfort issues, or optimize VR experiences for diverse users. Whether you are a developer, researcher, educator, or hobbyist, this course provides a practical foundation in VR optics critical for creating effective and comfortable virtual experiences. The simulator remains freely available after the course for participants to continue exploring and deepening their understanding at their own pace.
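As a back-of-the-envelope illustration of how IPD and display geometry induce a per-eye asymmetric (off-axis) frustum, here is a sketch with assumed names and example values (not the course's simulator code):

```python
def per_eye_frustum(ipd_m, screen_dist_m, screen_width_m, near_m):
    """Asymmetric frustum half-extents at the near plane for one eye.

    The eye sits half the IPD off the display center, so the visible
    extents toward the nose and toward the temple differ, giving an
    off-axis projection frustum.
    """
    half_ipd = ipd_m / 2.0
    inner = screen_width_m / 2.0 - half_ipd   # extent toward the nose
    outer = screen_width_m / 2.0 + half_ipd   # extent toward the temple
    scale = near_m / screen_dist_m            # similar triangles to the near plane
    return inner * scale, outer * scale

# Example: 64 mm IPD, display 50 mm from the eye, 100 mm-wide panel, 10 mm near plane.
inner, outer = per_eye_frustum(0.064, 0.05, 0.10, 0.01)
```

Varying `ipd_m` in such a sketch shows directly how a mismatched IPD setting skews the frustum and, in turn, the perceived image geometry.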
Emerging Technologies






Description"Denshin Impulse" enhances human-agent connection by creating physiological synchrony. In VR, as a user approaches an agent, synchronized heartbeats are delivered via haptic feedback to the user and an adjacent physical doll, deepening empathy through a shared physiological experience.
Birds of a Feather






DescriptionThe session is for attendees interested in information exchange between the research and industry/application communities working with different implementations and applications of neural radiance fields.
This community-driven forum features a panel of leading researchers presenting at SA2025 and industry representatives, plus an audience Q&A session. It aims to foster knowledge exchange, identify future directions, and promote new collaborations, connecting the research and user communities within computer graphics and interactive techniques in applying NeRF/3DGS to visualization, digital twins, and interactive media (XR/3D/web) design. The discussion will delve into data compression, web-based hosting, game engine integration, interoperability and standards, and dynamic techniques (4DGS).
Technical Papers


DescriptionShape grammars offer a powerful framework for computational design, but synthesizing shape programs to achieve specific goals remains challenging. Inspired by the success of gradient-based optimization in high-dimensional, nonconvex spaces such as those in machine learning, we ask: what makes a shape grammar amenable to gradient-based optimization? To explore this, we introduce Stochastic Rewrite Descent (SRD), an algorithm that interleaves structural rewrites with continuous parameter updates, taking steps in both to optimize a given objective. We analyze the core challenges which have previously prevented optimizing shape programs via descent, and identify a set of desirable properties for grammars that support effective optimization, along with concrete grammar design recommendations to achieve them. We validate this approach across three shape grammars, demonstrating its effectiveness in diverse domains including image fitting, text-driven generation, and topology optimization. Through ablations and comparisons, we show that grammars satisfying our proposed properties lead to significantly better optimization performance. The goal of this work is to open the door to more general and flexible computational paradigms for inverse design with shape grammars.
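To make the interleaving of structural rewrites and continuous parameter updates concrete, here is a toy sketch of that pattern; the function, problem, and all names are our illustrative assumptions, not the paper's SRD implementation:

```python
import random

def stochastic_rewrite_descent(structure, params, objective, rewrites,
                               steps=200, lr=0.1, eps=1e-4):
    """Toy interleaving of discrete rewrites and continuous descent.

    `structure` is any discrete state, `params` a list of floats, and
    `rewrites` a list of functions proposing alternative structures.
    Illustrative only.
    """
    for _ in range(steps):
        # Discrete move: accept a randomly proposed rewrite if it improves.
        cand = random.choice(rewrites)(structure)
        if objective(cand, params) < objective(structure, params):
            structure = cand
        # Continuous move: forward-difference gradient step per parameter.
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            g = (objective(structure, bumped) - objective(structure, params)) / eps
            params[i] -= lr * g
    return structure, params, objective(structure, params)

# Toy problem: discrete choice s in {1, 2, 3} with a per-structure cost s,
# and a continuous parameter fit to s; the optimum is s = 1, x near 1.
random.seed(0)
obj = lambda s, p: (p[0] - s) ** 2 + s
rewrites = [lambda s: 1, lambda s: 2, lambda s: 3]
s, p, value = stochastic_rewrite_descent(2, [0.0], obj, rewrites)
```

On this toy problem the loop settles on the discrete state with the lowest attainable cost and fits the continuous parameter to it; real shape grammars would replace the finite-difference step with analytic gradients and the random proposals with grammar rewrite rules.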
Poster






DescriptionThis project presents an AI-driven nonverbal visual feedback system for VR social interaction, enhancing emotion, clarity, and immersion, with validated user improvements and future scalability across diverse scenarios.
Poster






DescriptionDesignEval is a human-centered framework integrating vision, language, and multimodal models to evaluate graphic designs against user goals, providing quantitative, interpretable feedback, and enhancing creativity, workflow efficiency, and stakeholder communication.
Technical Papers


DescriptionModeling surface reflectance is central to connecting optical theory with real-world rendering and fabrication. While analytic BRDFs remain standard in rendering, recent advances in geometric and wave optics have expanded the design space for complex reflectance effects. However, existing wave-optics-based methods are limited to controlling reflectance intensity only, lacking the ability to design full-spectrum, color-dependent BRDFs. In this work, we present the first method for designing and fabricating color BRDFs using a fully differentiable wave optics framework. Our differentiable and memory-efficient simulation framework supports end-to-end optimization of microstructured surfaces under scalar diffraction theory, enabling joint control over both angular intensity and spectral color of reflectance. We leverage grayscale lithography with a feature size of 1.5–2.0 µm to fabricate 15 BRDFs spanning four representative categories: anti-mirrors, pictorial reflections, structural colors, and iridescences. Compared to prior work, our approach achieves significantly higher fidelity and broader design flexibility, producing physically accurate and visually compelling results. By providing a practical and extensible solution for full-color BRDF design and fabrication, our method opens up new opportunities in structural coloration, product design, security printing, and advanced manufacturing.
Educator's Forum



DescriptionHow can we make the teaching of computer graphics (CG) concepts more engaging and purposeful? This case study explores the project-based assessments of a second-year undergraduate CG-focused elective course. Throughout the semester, CG skills and knowledge are built alongside a scaffolded sequence of project activities and collaborative studio practice. The course attracts a diverse cohort, including learners from non-programming disciplines, so studio practice becomes a valuable way to combine creative literacy with emerging computational fluency. The projects are motivated by a learner-selected United Nations Sustainable Development Goal and an accompanying statistic, with the graphical and computational challenge to visually convey: (i) What is the issue? (ii) Why does the issue exist? and (iii) How do we bring change? Applying this real-world lens encourages learners to see CG programming not only as a technical skill, but as a purposeful practice with social impact. We present reflections on assessment techniques and challenges, highlighting how purpose-led, data-informed, interdisciplinary contexts can motivate learners, foster creativity, and position CG as a driver for social change.
Educator's Forum



DescriptionThis paper reflects on some developing pedagogical approaches for introducing technical, ethical and critical engagement with generative artificial intelligence (GenAI) in interaction design (IxD) education. As GenAI reshapes creative practice, students must move beyond passive automation toward thoughtful and principled design approaches. Drawing on curriculum development from first year workshops in the Bachelors of Interaction Design at Auckland University of Technology (AUT), the author reflects on AI as a UX design paradigm, tool, and ethical subject. Case studies from workshops include introductions to GenAI and “vibe coding”. In these sessions, first-year students engage with AI tools and methodologies developing foundational AI fluency, and critically examine concepts of authorship, bias in generative systems, and their own ethical positioning. The approaches aim to support lifelong learning and positions design education as a space for process-oriented, socially responsive, and critically engaged practice. This paper reflects on how GenAI can be critically and ethically integrated into first-year interaction design education through reflective and experiential learning. The work also offers practical insights for integrating generative AI into design curricula in ways that are technically rigorous and ethically grounded.
Technical Papers


DescriptionWe propose a novel method to automatically approximate a free-form surface using a set of near developable patches that form a tensile-like structure when anchored at a sparse set of points. These structures are appealing for their ability to span large areas with low material cost and structural weight, while also offering strong aesthetic potential. Our algorithm strikes a balance between approximation accuracy, patch simplicity, and visual quality, while ensuring manufacturability and structural feasibility. The layout is guided by a curvature field and refined through a combinatorial process that incrementally adds patches until performance and fabrication constraints are met. Redundant elements are then removed to improve clarity and elegance.
We demonstrate the effectiveness of our method on several architectural surfaces, supported by fabricated prototypes that showcase the interplay between geometric design, structural behavior, and visual appeal.
Technical Papers


DescriptionWe present a unique system for large-scale, multi-performer, high resolution 4D volumetric capture providing realistic free-viewpoint video up to and including 4K resolution facial closeups. To achieve this, we employ a novel volumetric capture, reconstruction and rendering pipeline based on Dynamic Gaussian Splatting and Diffusion-based Detail Enhancement. We design our pipeline specifically to meet the demands of high-end media production. We employ two capture rigs: the Scene Rig, which captures multi-actor performances at a resolution which falls short of 4K production quality, and the Face Rig, which records high-fidelity single-actor facial detail to serve as a reference for detail enhancement. We first reconstruct dynamic performances from the Scene Rig using 4D Gaussian Splatting, incorporating new model designs and training strategies to improve reconstruction, dynamic range, and rendering quality. Then to render high-quality images for facial closeups, we introduce a diffusion-based detail enhancement model. This model is fine-tuned with high-fidelity data from the same actors recorded in the Face Rig. We train on paired data generated from low- and high-quality Gaussian Splatting (GS) models, using the low-quality input to match the quality of the Scene Rig, with the high-quality GS as ground truth. Our results demonstrate the effectiveness of this pipeline in bridging the gap between the scalable performance capture of a large-scale rig and the high-resolution standards required for film and media production.
Technical Papers


DescriptionThe depth-of-field (DoF) effect, which introduces aesthetically pleasing blur, enhances photographic quality but is fixed and difficult to modify once the image has been created. This becomes problematic when the applied blur is undesirable (e.g., the subject is out of focus).
To address this, we propose DiffCamera, a model that enables flexible refocusing of a created image conditioned on an arbitrary new focus point and a blur level.
Specifically, we design a diffusion transformer framework for refocusing learning. However, the training requires pairs of data with different focus planes and bokeh levels in the same scene, which are hard to acquire.
To overcome this limitation, we develop a simulation-based pipeline to generate large-scale image pairs with varying focus planes and bokeh levels.
With the simulated data, we find that training with only a vanilla diffusion objective often leads to incorrect DoF behaviors due to the complexity of the task.
This requires a stronger constraint during training.
Inspired by the photographic principle that photos of different focus planes can be linearly blended into a multi-focus image, we propose a stacking constraint during training to enforce precise DoF manipulation.
This constraint enhances model training by imposing physically grounded refocusing behavior: the refocusing results should be faithfully aligned with the scene structure and the camera conditions so that they can be combined into the correct multi-focus image.
We also construct a benchmark to evaluate the effectiveness of our refocusing model.
Extensive experiments demonstrate that DiffCamera supports stable refocusing across a wide range of scenes, providing unprecedented control over DoF adjustments for photography and generative AI applications.
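The stacking constraint rests on the stated principle that photos focused at different planes blend linearly into a multi-focus image; a minimal sketch of that per-pixel convex blending (toy data and function name are ours, not the paper's training code):

```python
def blend_focus_stack(images, weights):
    """Per-pixel convex combination of photos taken at different focus planes.

    `images` is a list of equal-length flat pixel lists; `weights`
    must sum to 1. Illustrative only.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    width = len(images[0])
    return [sum(w * img[i] for w, img in zip(weights, images))
            for i in range(width)]

# Two toy "photos" focused on the near vs. far plane, blended 50/50.
near_focus = [1.0, 0.0, 0.5]
far_focus = [0.0, 1.0, 0.5]
multi = blend_focus_stack([near_focus, far_focus], [0.5, 0.5])
```

During training, a constraint of this form checks that the model's refocused outputs, blended this way, reproduce the corresponding multi-focus image.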
Technical Papers


DescriptionRadiance fields have gained tremendous success with applications ranging from novel view synthesis to geometry reconstruction, especially with the advent of Gaussian splatting. However, they sacrifice modeling of material reflective properties and lighting conditions, leading to significant geometric ambiguities and the inability to easily perform relighting.
One way to address these limitations is to incorporate physically-based rendering, but it has been prohibitively expensive to include full global illumination within the inner loop of the optimization. Therefore, previous works adopt simplifications that make the whole optimization with global illumination effects efficient but less accurate.
In this work, we adopt Gaussian surfels as the primitives and build an efficient framework for differentiable light transport, inspired by classic radiosity theory. The whole framework operates in the coefficient space of spherical harmonics, enabling both diffuse and specular materials. We extend classic radiosity to non-binary visibility and semi-opaque primitives, propose novel solvers to efficiently solve the light transport, and derive the backward pass for gradient optimization, which is more efficient than auto-differentiation.
During inference, we achieve view-independent rendering where light transport need not be recomputed under viewpoint changes, enabling hundreds of FPS for global illumination effects, including view-dependent reflections using a spherical harmonics representation.
Through extensive qualitative and quantitative experiments, we demonstrate superior geometry reconstruction, view synthesis and relighting than previous inverse rendering baselines, or data-driven baselines given relatively sparse datasets with known or unknown lighting conditions.
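For reference, the classic radiosity system that inspires the framework can be written as follows (textbook notation, not necessarily the paper's symbols; the paper generalizes the visibility inside the form factors to non-binary values and solves in the spherical-harmonics coefficient space):

```latex
% Classic radiosity: radiosity B_i equals emission E_i plus reflected
% radiosity gathered from all other patches via form factors F_ij.
B_i = E_i + \rho_i \sum_j F_{ij}\, B_j
\qquad\Longleftrightarrow\qquad
\mathbf{B} = \mathbf{E} + \mathbf{K}\mathbf{B},
\quad
\mathbf{B} = (\mathbf{I} - \mathbf{K})^{-1}\mathbf{E},
\quad K_{ij} = \rho_i F_{ij}.
```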
Poster






DescriptionWe propose a framework for constructing retroreflective models for mid-air image simulation. These models are derived from photographs of mid-air images and optimized using differentiable rendering. They reproduce viewing-dependent luminance and blur more accurately than conventional models, reducing LPIPS by an average of 0.080 and improving AIRR simulation fidelity.
Technical Papers


DescriptionSimplified proxy models are commonly used to represent detailed architectural structures, reducing storage requirements and enabling real-time rendering. However, the geometric simplifications inherent in proxies result in a loss of fine detail, making it essential for textures to compensate for this loss. Preserving the rich texture information from the original dense architectural reconstructions remains a daunting task, particularly when working with unordered RGB photographs. We propose an automated method for generating realistic texture maps for architectural proxy models at the texel level from an unordered collection of registered photographs. Our approach establishes correspondences between texels on a UV map and pixels in the input images, with each texel’s color computed as a weighted blend of associated pixel values. Using differentiable rendering, we optimize blending parameters to ensure photometric and perspective consistency, while maintaining seamless texture coherence. Experimental results demonstrate the effectiveness and robustness of our method across diverse architectural models and varying photographic conditions, enabling the creation of high-quality textures that preserve visual fidelity and structural detail.
Technical Papers


DescriptionRecovering high-fidelity spatially varying bidirectional reflectance distribution function (SVBRDF) maps from a single image remains an ill-posed and challenging problem, especially in the presence of saturated highlights. Existing methods often fail to reconstruct the underlying texture in regions overwhelmed by intense specular reflections. Such bake-in artifacts caused by highlight corruption can be greatly alleviated by providing a series of material images under different lighting conditions. To this end, our key insight is to leverage the strong priors of diffusion models to generate images of the same material under varying lighting conditions. These generated images are then used to aid a multi-image SVBRDF estimator in recovering highlight-free reflectance maps. However, strong highlights in the input image lead to inconsistencies across the relighting results. Moreover, texture reconstruction becomes unstable in saturated regions, with variations in background structure, specular shape, and overall material color. These artifacts degrade the quality of SVBRDF recovery. To address this issue, we propose a shuffle-based background consistency module that extracts stable background features and implicitly identifies saturated regions. This guides the diffusion model to generate coherent content while preserving material structures and details. Furthermore, to stabilize the appearance of generated highlights, we introduce a lightweight specular prior encoder that estimates highlight features and then performs grid-based latent feature translation, injecting consistent specular contour priors while preserving material color fidelity. Both quantitative analysis and qualitative visualization demonstrate that our method enables stable neural relighting from a single image and can be seamlessly integrated into multi-input SVBRDF networks to estimate highlight-free reflectance maps.
Technical Communications


DescriptionSynchronizing lights with a 360-degree global shutter camera enables capturing two lighting conditions almost simultaneously, supporting visual effects like relighting, matting, and separating reflectance components between frames.
Key Event
Real-Time Live!



DescriptionDigital Life Project 2 (DLP2) presents an open-source real-time framework that brings Large Language Models (LLMs) to life through expressive 3D avatars. Users converse naturally by voice, while characters respond on demand with unified audio, whole-body animation, and physics simulation directly in the browser. At its core are: (1) an agentic orchestration of large and small LLMs that governs character behavior, supported by a memory system tracking emotional states and evolving relationships to enable context-dependent reactions; (2) a hybrid real-time pipeline that segments long LLM responses, performs parallel motion retrieval and audio-motion synchronization, and streams efficiently through a custom Protocol Buffers structure for low-latency playback of voice, motion, and expression; and (3) robust mechanisms for user interruption handling, adaptive buffering, and fault tolerance. Characters are fully customizable in both appearance (3D models) and personality (character prompts) and readily adaptable to any LLM or text-to-speech (TTS) service. DLP2 demonstrates how LLMs can be embodied in responsive 3D characters, offering a practical blueprint for real-time, emotionally adaptive digital interactions on the web.
Technical Papers


DescriptionExisting intrinsic triangulation frameworks represent powerful tools for geometry processing; however, they all require the extraction of the common subdivision between extrinsic and intrinsic triangulations for visualization and optimized data transfer. We describe an efficient and effective algorithm for directly rendering intrinsic triangulations that avoids extracting common subdivisions. Our strategy is to use GPU shaders to render the intrinsic triangulation while rasterizing extrinsic triangles. We rely on a point-location algorithm supported by a compact data structure, which requires only two values per extrinsic triangle to represent the correspondence between extrinsic and intrinsic triangulations. This data structure is easier to maintain than previous proposals while supporting all the standard topological operations for improving the intrinsic mesh quality, such as edge flips, triangle refinements, and vertex displacements. Computational experiments show that the proposed data structure is numerically robust and can process nearly degenerate triangulations. We also propose a meshless strategy to accurately transfer data from intrinsic to extrinsic triangulations without relying on the extraction of common subdivisions.
Technical Papers


DescriptionThe miniaturization of shell structures presents a versatile and complex challenge, bridging geometry with diverse practical applications. In this paper, we introduce a novel approach for computing origami crease patterns to compress arbitrary 3D shell objects. First, we employ the adapted Material Point Method (MPM) to simulate the compression of a target surface and obtain an initial folded configuration. Since MPM produces overly smooth curved surfaces, their crease patterns are unsuitable for practical origami fabrication. We then propose a novel Folding Line Extraction (FLE) method that optimizes these smoothed surfaces to extract folding lines that achieve the target compression with minimal deformation and stretching outside the crease lines. This method produces smooth curved folding lines.
Fabrication and experimental validation of the extracted patterns demonstrate their effectiveness and applicability in real-world scenarios.
Technical Communications


DescriptionWe present a disentangled 3D Gaussian representation, extended to temporal modeling, that achieves high-fidelity geometry reconstruction and photorealistic novel-view synthesis. Disentangled-3DGS generates relightable volumetric video that supports downstream relighting via deferred shading.
Technical Communications


DescriptionWe propose double/triple-axis moments for real-time polygonal light shading with both directional light profiles and glossy BRDFs, which are represented with axis-symmetric cosine lobes.
Poster






DescriptionIn this internet-connected MR art experience, participants co-create by drawing lines in mid-air, generating sounds and VR landscapes, while estimated emotions are reflected as shifting colors in avatars and lines.
Technical Communications


DescriptionThis sketch-based method enables the creation and editing of layered 3D characters. It supports clothing replacement and fine-grained control, producing high-quality geometry and textures for digital characters.
Technical Papers


DescriptionIn this paper, we introduce DreamID, a diffusion-based face swapping model that achieves high levels of ID similarity, attribute preservation, image fidelity, and fast inference speed. Unlike the typical face swapping training process, which often relies on implicit supervision and struggles to achieve satisfactory results, DreamID establishes explicit supervision for face swapping by constructing Triplet ID Group data, significantly enhancing identity similarity and attribute preservation. The iterative nature of diffusion models poses challenges for utilizing efficient image-space loss functions, as performing time-consuming multi-step sampling to obtain the generated image during training is impractical. To address this issue, we leverage the accelerated diffusion model SD Turbo, reducing the inference steps to a single iteration, enabling efficient pixel-level end-to-end training with explicit Triplet ID Group supervision. Additionally, we propose an improved diffusion-based model architecture comprising SwapNet, FaceNet, and ID Adapter. This robust architecture fully unlocks the power of the Triplet ID Group explicit supervision. Finally, to further extend our method, we explicitly modify the Triplet ID Group data during training to fine-tune and preserve specific attributes, such as glasses and face shape. Extensive experiments demonstrate that DreamID outperforms state-of-the-art methods in terms of identity similarity, pose and expression preservation, and image fidelity. Overall, DreamID achieves high-quality face swapping results at 512×512 resolution in just 0.6 seconds and performs exceptionally well in challenging scenarios such as complex lighting, large angles, and occlusions.
Art Papers



DescriptionWhile extended reality (XR) has increasingly been used for narrative immersion and cultural visualization, few projects have explored its potential to embody eastern metaphysical thoughts through spatial interactions. In our work \textit{Dreaming of Butterflies}, we address this gap by transforming Chinese philosopher Zhuangzi's classic parable, the Butterfly Dream, into an interactive XR journey that invites participants to experience the boundaries between illusion and reality, self and other. By merging embodied interaction, stylized visual language, and Daoist cosmology, the experience demonstrates how XR can serve as a medium for metaphysical reflection and poetic experience, offering a novel framework for spatializing philosophical thought in immersive art.
Technical Papers


DescriptionRecently, extensive research on image customization (e.g., identity, subject, style, background, etc.) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, restricting their ability to combine different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. Specifically, DreamO utilizes a diffusion transformer (DiT) framework to uniformly process inputs of different types. During training, we construct a large-scale training dataset that includes various customization tasks, and we introduce a feature routing constraint to facilitate the precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over the placement of conditions in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance the customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that the proposed DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
Art Papers



DescriptionDrift of the Uncharted is a robotic performance combined with expanded cinema. It combines climate prediction and real-world landscape reconstruction, revealing a speculative world in which contemporary urban landscapes experience future sea-level rise. A quadruped robot with an onboard projector serves as a rescue robot, simultaneously traversing both physical and virtual spaces and projecting the drowned cityscapes into the exhibition space. This research examines how speculative narrative and robotic performance mediate climate change, a hyperobject that transcends conventional scales of space and time, into an experiential form, while simultaneously expanding robotic art by positioning the robot as a narrative agent.
Invited Poster
Poster






DescriptionIn this paper, we analyze the causes of overfitting in sparse-view 3DGS and introduce Dropout into 3DGS to improve rendering robustness. Additionally, we propose an edge-guided splitting strategy to enhance detail rendering quality. Our DropoutGS outperforms the existing methods and shows state-of-the-art performance on multiple benchmark datasets.
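The core idea of applying Dropout to 3DGS can be sketched independently of the paper's implementation: randomly silence a fraction of Gaussians at each training iteration and rescale the survivors, as in standard inverted dropout. The function below is an illustrative assumption, not the authors' code.

```python
import numpy as np

def dropout_gaussians(opacities, rate, rng, training=True):
    """Inverted dropout over per-Gaussian opacities: zero a random subset
    during training and rescale survivors so the expected opacity is unchanged."""
    if not training or rate == 0.0:
        return opacities
    keep = rng.random(opacities.shape) >= rate
    return np.where(keep, opacities / (1.0 - rate), 0.0)
```

At inference time (`training=False`) all Gaussians are kept, so rendering sees the full model while training sees a randomly thinned one.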
Technical Papers


DescriptionMonte Carlo rendering often faces a dilemma, namely, whether to choose an unbiased estimator or a biased one. Although different integrators have been developed to address various scenarios, no single method can effectively manage all situations. Thus, finding a good approach to combine different integrators has always been a topic that warrants exploration.
This work proposes DSCombiner, a new shrinkage estimator that flexibly combines unbiased and biased estimators (typically generated by different integrators) in image space into a single estimating procedure, strategically utilizing the strengths of different integrators while minimizing their weaknesses. DSCombiner overcomes the limitation of single shrinkage combiners by introducing a two-step shrinkage towards a noise-free radiance prior. We derive optimal shrinkage factors for the two steps within a hierarchical Bayesian framework, and provide a deep learning-based method to improve the results. Comprehensive qualitative and quantitative validations across diverse scenes demonstrate visible improvements in image quality, as compared with previous image-space and path-space combiners.
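The two-step shrinkage idea can be illustrated with a simplified per-image sketch, where each step pulls an estimate toward a target by a data-dependent weight. The weights below are a generic variance-versus-discrepancy heuristic, not the paper's hierarchical-Bayes derivation.

```python
import numpy as np

def shrink(estimate, target, var):
    # Shrinkage weight: larger estimator variance pulls harder toward the target.
    d2 = np.mean((estimate - target) ** 2) + 1e-12
    alpha = var / (var + d2)
    return (1.0 - alpha) * estimate + alpha * target

def two_step_combine(unbiased, biased, prior, var_unbiased):
    # Step 1: shrink the noisy unbiased estimate toward the biased one.
    step1 = shrink(unbiased, biased, var_unbiased)
    # Step 2: shrink the result toward a noise-free radiance prior
    # (residual variance heuristically halved after the first step).
    return shrink(step1, prior, 0.5 * var_unbiased)
```

When the unbiased estimator's variance is low the combination stays close to it; as variance grows, the result is progressively pulled toward the biased estimate and then the prior.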
Technical Papers


DescriptionDocument dewarping aims to rectify deformations in photographic document images, thus improving text readability. The task has attracted much attention and seen great progress, but preserving document structures remains challenging. Given recent advances in diffusion models, it is natural to consider their potential applicability to document dewarping. However, adopting diffusion models for document dewarping is far from straightforward due to their unfaithful control over highly complex document images (e.g., 2000$\times$3000 resolution).
In this paper, we propose DvD, the first generative model to tackle document \textbf{D}ewarping \textbf{v}ia a \textbf{D}iffusion framework. To be specific, DvD introduces coordinate-level denoising instead of typical pixel-level denoising, generating a mapping for deformation rectification. In addition, we further propose a time-variant condition refinement mechanism to enhance the preservation of document structures. In experiments, we find that current document dewarping benchmarks cannot evaluate dewarping models comprehensively. To this end, we present AnyPhotoDoc6300, a rigorously designed large-scale document dewarping benchmark comprising 6,300 real image pairs across three distinct domains, enabling fine-grained evaluation of dewarping models. Comprehensive experiments demonstrate that our proposed DvD can achieve state-of-the-art performance with acceptable computational efficiency on multiple metrics across various benchmarks including DocUNet, DIR300, and AnyPhotoDoc6300. The new benchmark and code will be publicly available at https://github.com/afdgasggx/DvD.
Poster






DescriptionOur method stylizes eye highlights in 3D anime-style characters by introducing Position-Based Dynamics into reflection mapping and solving it as a constraint.
Featured Session



DescriptionBremble will walk through the challenges presented to the Base team on Ne Zha 2, which led to the most ambitious shots in the company's history. The work included dynamic fluid simulations, massive crowds, and the destruction and fluid interactions of an entire village. In addition, Bremble will touch on the team's animation work for key sequences in the film, and speak briefly about the current state of animation in China and what the years ahead may hold.
Technical Papers


DescriptionWe present a method for augmenting real-world videos with newly generated dynamic content. Given an input video and a simple user-provided text instruction describing the desired content, our method synthesizes dynamic objects or complex scene effects that naturally interact with the existing scene over time. The position, appearance, and motion of the new content are seamlessly integrated into the original footage while accounting for camera motion, occlusions, and interactions with other dynamic objects in the scene, resulting in a cohesive and realistic output video. We achieve this via a zero-shot, training-free framework that harnesses a pre-trained text-to-video diffusion transformer to synthesize the new content and a pre-trained Vision Language Model to envision the augmented scene in detail. Specifically, we introduce a novel inference-based method that manipulates features within the attention mechanism, enabling accurate localization and seamless integration of the new content while preserving the integrity of the original scene.
Our method is fully automated, requiring only a simple user instruction. We demonstrate its effectiveness on a wide range of edits applied to real-world videos, encompassing diverse objects and scenarios involving both camera and object motion.
Technical Papers


DescriptionRecovering the spatially-varying bidirectional reflectance distribution function (SVBRDF) from as few captured images as possible has been a challenging task in computer graphics.
Benefiting from the co-located flashlight-camera capture strategy and data-driven priors, SVBRDF can be estimated from a few input images.
However, this capture strategy usually requires a controlled darkroom environment to ensure the flashlight is the only light source, which is often impractical during on-site capture in real-world scenarios.
To support SVBRDF estimation in an uncontrolled environment, the key challenge lies in highly precise estimation of the unknown environment lighting and its effective use in SVBRDF recovery.
To address this issue, we propose a novel exemplar-based environment lighting representation that is easier for neural networks to use.
These exemplars are a set of rendered images of selected materials under the environment lighting.
By embedding the rendering process, our approach transforms environment lighting represented in the spherical domain into the sample-surface domain, thereby achieving domain alignment with the input images.
This significantly reduces the network's learning burden, resulting in more precise environment lighting estimation.
Furthermore, after lighting prediction, we also present a dominant lighting extraction algorithm and an adaptive exemplar selection algorithm to strengthen the guidance that environment lighting provides for SVBRDF estimation.
Finally, considering the distinct contributions of environment lighting and point lighting to SVBRDF recovery, we propose a carefully designed cascaded network.
Quantitative assessments and qualitative analysis demonstrate that our method achieves superior SVBRDF estimation compared to previous approaches.
The source code will be released.
Technical Papers


DescriptionConversational behavior generation, a crucial capability of embodied agents, is a significant factor influencing human-computer interaction. Generating high-quality conversational motions requires not only appropriate audio-motion mapping but also interactive responses to interlocutor behaviors and comprehensive understanding of conversational semantics. Existing methods primarily rely on audio signals and interlocutor motions for main agent motion generation, lacking high-level semantic understanding of the conversational content, leading to moderate-quality motions that are not appropriate for the dialogue. To address these limitations, we leverage the powerful semantic understanding capabilities of large language models to comprehend complex conversational contexts. Inspired by human conversation, in which motions are highly related to both global and local semantic factors, including the conversational context and the intentions, emotions, and passive or active states of the participants, we propose an agentic system named Echo that analyzes such information. To achieve comprehensive conversational understanding, Echo leverages multiple prompts and test-time recipes to guide large language models in decomposing conversational structures and extracting fine-grained semantic information. Furthermore, we design a hierarchical feature fusion network that systematically integrates frame-level audio-motion features, sentence-level semantic understanding, and conversation-level contextual comprehension, organically combining fine-grained semantic features from large language models with audio and motion characteristics. Experimental results demonstrate that our framework can be effectively integrated with several state-of-the-art motion generation models to enhance their performance in generating high-quality conversational behaviors. Code and data for this paper are at \textit{https://github.com/Echo-Motion/Echo}.
Poster






DescriptionThe project recontextualizes classical Chinese poetry into modern pop music through interactive narrative visualization, blending tradition with modernity to reveal new meanings, emotions, and cultural functions of Chinese poetry.
Poster






DescriptionEchoes of the Matchlock is a Mixed Reality system making Japanese matchlock firearms engaging for younger audiences through gamified education, bridging the gap in cultural heritage transmission.
Art Gallery






DescriptionEcovotive Chimeras (2025) is a generative AI-based installation featuring 3D-printed hybrid beings and AR animations that ritualize ecological grief while exploring expanded AI aesthetics. Emerging from global wildfire imagery, these spectral chimeras invite audiences into a shared zone of transformation, mourning, and post-natural becoming.
Poster






DescriptionThis study proposes a method for designing and fabricating jellies with desired color caustic patterns by embedding colored jelly inclusions in a transparent base jelly.
Technical Papers


DescriptionRadiance fields such as 3D Gaussian Splatting allow real-time rendering of scenes captured from photos. They also reconstruct most specular reflections with high visual quality, but typically model them with “fake” reflected geometry, using primitives behind the reflector. Our goal is to correctly reconstruct the reflector and the reflected objects so as to make specular reflections editable; we present a proof of concept which exploits promising learning-based methods to extract diffuse and specular buffers from photos, as well as geometry and BRDF buffers. Our method builds on three key components. First, by using diffuse/specular buffers of input training views, we optimize a diffuse version of the scene and use path tracing to efficiently generate physically-based specular reflections. Second, we present a specialized training method that allows this process to converge. Finally, we present a fast ray tracing algorithm for 3D Gaussian primitives that enables efficient multi-bounce reflections. Our method reconstructs reflectors and reflected objects—including those not seen in the input images—in a unique scene representation. Our solution allows real-time, consistent editing of captured scenes with specular reflections, including multi-bounce effects, changing roughness, etc. We mainly show results using ground truth buffers from synthetic scenes, and also preliminary results in real scenes with currently imperfect learning-based buffers.
Games






DescriptionThis session invites educators to discuss the potential of applying AI to game and animation development, and to explore how education and training in these fields will change. Three educators will share their insights; their presentation topics are as follows:
"Game development education in the AI Age" by Prof Mike Fischer, Professor of Interactive Media, USC Games
"Exploration of AI Generated Content Integration in Animation Teaching" by Prof Zhou Xiaogan, Sichuan University of Media and Communications, China
"Player, Student, Teacher: Discovering Agency in the Classroom" by Vince Siu, Founder of Press Start Academy and Games for Change Hong Kong
Birds of a Feather






DescriptionThis birds of a feather session seeks to connect industry (Trainers and Talent Managers) with Educators.
It's very often difficult for these two groups to make cross-sector connections, and SIGGRAPH gives an amazing opportunity for them to get together and work on strengthening ties.
This benefits education, industry, and cultures across the board.
I've been involved with and run similar BoF sessions in the past, and feel they have always been impactful.
Topics to discuss include:
Curriculum
Pedagogy
Soft Skills
Fundamentals
Making the leap from Graduate to Employee
Understanding each other's world
Poster






DescriptionThis study proposes applying manga-style "focus lines" for attentional guidance in VR, demonstrating through comparative experiments that they can enhance guidance effects over conventional methods, while suggesting that immersion can be maintained by adjusting parameters.
Technical Communications


DescriptionGalvanic vestibular stimulation in virtual reality enhances immersion by aligning visual and vestibular cues. It advances human–computer interaction, simulation, and motion-simulating games, offering new opportunities for next-generation virtual, augmented, and mixed reality.
Technical Papers


DescriptionIn this paper, we introduce a novel approach to spatial regularization of optimal transport problems. Based on the notion of forward and backward "mean maps" of a transport plan, we introduce a convex formulation of optimal transport problems that incorporates regularization of these mean maps to promote spatial continuity of the resulting optimal plan. Unlike previous regularization approaches that required the optimization of all the transport plan coefficients, our formulation translates into an ADMM-based solver combined with Sinkhorn type algorithms, which drastically reduces the number of variables and scales up to large problems. We demonstrate the usefulness and efficiency of this new computational tool for various applications and for different regularizations.
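For context, the Sinkhorn-type inner solver that the ADMM scheme builds on is the standard entropic-regularized iteration sketched below; the mean-map regularization itself is the paper's contribution and is not reproduced here.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropic-regularized optimal transport via Sinkhorn iterations.
    a, b: source/target marginals; C: cost matrix. Returns the transport plan."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)         # scale columns toward marginal b
        u = a / (K @ v)           # scale rows toward marginal a
    return u[:, None] * K * v[None, :]
```

Because each iteration touches only the two scaling vectors `u` and `v` rather than the full plan, this is the kind of low-variable-count subproblem that keeps the overall ADMM scheme scalable.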
Technical Papers


DescriptionIn 3D object reconstruction from photographs, estimating material properties is challenging. We propose an inverse rendering method that uses active area lighting: as this provides a wider range of lighting angles per photo than point lighting, material reconstruction can be more accurate for the same number of photos. We compare area light shading with point lighting. With either mesh or 3D Gaussian splatting pipelines, area lighting can improve BRDF reconstruction and leads to a relighting PSNR improvement of 3 dB, or alternatively needs 1/5 of the input photos for the same quality as with point lights. We also compare area light shading with Monte Carlo ray tracing and with differential linearly transformed cosines (LTC) plus shadow visibility weighting. LTC can be faster, improving optimization times by 25%. In end-to-end method-level comparisons, our approach improves material reconstruction over SOTA methods, particularly for material roughness, leading to superior relighting quality.
Poster






DescriptionFor large-scale CFD/FEA grids, we propose importance-driven hybrid reduction—explicit decimation for low-importance and learning-based reduction for salient regions—achieving 3–8× size reduction with ≈99% accuracy and higher FPS.
Technical Papers


DescriptionReal-time 3D reconstruction is a fundamental task in computer graphics. Recently, differentiable-rendering-based SLAM systems have demonstrated significant potential, enabling photorealistic scene rendering through learnable scene representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Current differentiable rendering methods face dual challenges in real-time computation and sensor noise sensitivity, leading to degraded geometric fidelity in scene reconstruction and limited practicality. To address these challenges, we propose a novel real-time system EGG-Fusion, featuring robust sparse-to-dense camera tracking and a geometry-aware Gaussian surfel mapping module, introducing an information filter-based fusion method that explicitly accounts for sensor noise to achieve high-precision surface reconstruction. The proposed differentiable Gaussian surfel mapping effectively models multi-view consistent surfaces while enabling efficient parameter optimization. Extensive experimental results demonstrate that the proposed system achieves a surface reconstruction error of 0.6 cm on standardized benchmark datasets including Replica and ScanNet++, representing over 20% accuracy improvement compared to state-of-the-art (SOTA) GS-based methods. Notably, the system maintains real-time processing capabilities at 24 FPS, establishing it as one of the most accurate differentiable rendering-based real-time reconstruction systems currently available.
Poster






DescriptionWe propose a two-stage pipeline combining causal TCN and Transformer refinement, enabling accurate and temporally coherent 3D egocentric pose estimation under severe occlusions with minimal latency.
Technical Papers


DescriptionBlind Face Restoration (BFR) aims to recover face images suffering from unknown degradations. A recent approach to solve BFR is via plug-and-play methods for image restoration, which combine a likelihood function with pre-trained diffusion models as priors. However, as the likelihood is inherently unknown in BFR, existing methods rely instead on heuristic constraints. This leads to suboptimal distortion and identity preservation metrics. We introduce Expectation-based Likelihood Approximation with Diffusion prior (ELAD), a novel plug-and-play approach that explicitly models the likelihood function for BFR. ELAD estimates the first and second moments of the likelihood distribution by employing a Degradation Estimator to predict the degradation sequence from the input. This enables principled Bayesian inference without requiring end-to-end training. Our method achieves state-of-the-art distortion and identity preservation results compared to existing plug-and-play BFR techniques, while maintaining competitive perceptual quality. As we show, while being plug-and-play, our method still rivals end-to-end trained BFR models.
Educator's Forum



DescriptionThis paper investigates the integration of experimental motion capture into a collaborative drawing workshop as a pedagogical strategy to foster embodied learning and multimodal thinking in creative education. Facilitated by UNSW's Drawing Research Group and involving visiting academics and animation students from Hong Kong Polytechnic University, the workshop explored drawing as a time-based, spatially immersive process, extending traditional mark making through full-body movement, motion capture, and collective experimentation. Activities ranged from large-scale floor drawings to sensory tabletop exercises, scaffolded through creative prompts that encouraged play, improvisation, and non-verbal communication. Emphasising process over product, the workshop reframed motion capture as a generative material, subject to distortion, abstraction, and surprise, rather than a precision tool. Drawing on Bishop's "Antagonism and Relational Aesthetics" (2004) as well as embodied cognition, posthuman creativity, and Universal Design for Learning (UDL), the paper argues for a reimagining of authorship, assessment, and participation in art and design education. Findings highlight how multimodal, collaborative tasks foster student agency, critical engagement with digital materiality, and experimental learning literacy. The workshop affirms the value of interdisciplinary, sensorimotor-rich environments that blur boundaries between drawing, performance, and media art, modelling an inclusive and future-facing pedagogy.
XR






DescriptionFORM is a motion-capture interactive prototype that transforms body movement into dynamic audiovisual environments in real time. Using Rokoko suits and Unreal Engine 5, it explores how subtle gestures generate responsive feedback, emphasizing embodied interaction and the unpredictable nature of digital expression.
Birds of a Feather






DescriptionFor attendees exploring robotics, computer vision, and AI-driven automation in photography and videography for creative media production and interactive storytelling.
Poster






DescriptionThis study uses VR-based autonomous driving scenarios with Calm and Tense styles to analyze multimodal responses. A Random Forest classifier achieved 97.1% accuracy, demonstrating the feasibility of emotion-aware driving systems.
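The classification step described above can be sketched with scikit-learn; everything below (feature choices, class statistics, and the resulting accuracy) is synthetic and illustrative, since the study's multimodal data and exact setup are not given here.

```python
# Hypothetical sketch: separating Calm vs. Tense responses with a Random Forest.
# The three "physiological" features and their statistics are invented stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Assumed features: heart rate, skin conductance, gaze dispersion.
calm = rng.normal([70.0, 2.0, 0.3], [5.0, 0.5, 0.05], size=(n, 3))
tense = rng.normal([95.0, 6.0, 0.6], [5.0, 0.5, 0.05], size=(n, 3))
X = np.vstack([calm, tense])
y = np.array([0] * n + [1] * n)  # 0 = Calm, 1 = Tense

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)  # well-separated synthetic classes, so near 1.0
```

With real multimodal recordings the separation would of course be far less clean; the sketch only shows the train/score pattern.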
Emerging Technologies






DescriptionExperience Holographic AR-HUD with Your Eyes: This prototype leverages a free-form optical combiner and learned calibration to deliver distortion-free, multi-plane 3D holographic AR on automotive windshields. Attendees can directly view the holograms through a beam splitter, with true depth perception. Our camera-in-the-loop system compensates for complex windshield curvature, enabling practical automotive integration.
XR






DescriptionHybrid poster sessions are essential for academic exchange but difficult to adapt due to the lack of gaze and gesture cues in traditional platforms. We propose a face-to-face display system with gaze rendering and robot-controlled viewpoints, enabling natural, equitable interaction between remote and onsite participants in hybrid environments.
Poster






DescriptionWe introduce a method that makes Large Language Models both data-aware and visually-aware by augmenting them with structured metadata. Visual features and textual descriptions are combined into a compact JSON file, enabling accurate, context-specific answers without fine-tuning. Tested on geospatial datasets, a user study confirmed its effectiveness and user appeal.
Emerging Technologies






DescriptionEnchanted Touch is a finger-worn module that leaves the finger-pad bare yet adds real-time warmth, coolness, and vibration to any object by redirecting stimuli from the finger’s sides and nail. Attendees will bring plush toys to life, feel elemental trading-card effects, and explore tablet textures without modifying the objects.
Poster






DescriptionThis study implements a cel-shader in Unreal Engine and discusses its advantages, limitations, and practical implementation approaches.
Poster






DescriptionWe propose a novel VR method combining first-person and third-person perspectives. It uses binocular rivalry to balance embodied sensation and spatial awareness. Visual enhancements improve the effect and task efficiency.
Art Papers



DescriptionThis paper presents a computational framework for critically reinterpreting colonial-era stereoscopic box sets. Using James Ricalton’s China through the Stereoscope (1901) as a case study, we deconstruct its linear narrative using NLP methods for thematic modeling, semantic mapping, and colonial language identification. This analysis informs two complementary artworks in a large-scale 360° VR environment. Cross Eyed offers a guided experience, using thematic models as lenses for inquiry and visually distinguishing identified rhetoric. Latent Cartographies, in contrast, empowers users to freely navigate the archive’s raw semantic space, fostering diverse modes of experiential engagement with contested heritage.
Technical Papers


DescriptionInteractive applications demand believable character animation that responds naturally to dynamic environments. Traditional animation techniques often struggle to handle arbitrary situations, leading to a growing trend of dynamically selecting motion-captured animations based on predefined features. While Motion Matching has proven effective for locomotion by aligning to target trajectories, animating environment interactions and crowd behaviors remains challenging due to the need to consider surrounding elements. Existing approaches often involve manual setup or lack the naturalism of motion capture. Furthermore, in crowd animation, body animation is frequently treated as a separate process from trajectory planning, leading to inconsistencies between body pose and root motion. To address these limitations, we present Environment-aware Motion Matching, a novel real-time system for full-body character animation that dynamically adapts to obstacles and other agents, emphasizing the bidirectional relationship between pose and trajectory. In a preprocessing step, we extract shape, pose, and trajectory features from a motion capture database. At runtime, we perform an efficient search that matches user input and current pose while penalizing collisions with a dynamic environment. Our method allows characters to naturally adjust their pose and trajectory to navigate crowded scenes.
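The runtime search described above can be sketched as a nearest-neighbor query with a collision penalty; the feature layout, weight, and toy database below are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of an environment-aware motion-matching search: the best
# database frame minimizes feature distance plus a penalty for frames whose
# trajectory would collide with the dynamic environment. All names and the
# weight w_env are illustrative.
import numpy as np

def motion_match(db_features, query, obstacle_cost, w_env=10.0):
    """db_features: (N, D) precomputed pose/trajectory features per frame.
    query: (D,) desired features from user input and the current pose.
    obstacle_cost: (N,) collision penalty per candidate frame."""
    dist = np.linalg.norm(db_features - query, axis=1)
    cost = dist + w_env * obstacle_cost
    return int(np.argmin(cost))

# Toy database: frame 1 matches the query exactly but is penalized for
# colliding, so the search falls back to the close, collision-free frame 2.
db = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]])
query = np.array([1.0, 1.0])
collide = np.array([0.0, 1.0, 0.0])
best = motion_match(db, query, collide)  # -> 2
```

The point of the penalty term is that pose and trajectory are chosen jointly: a frame is only "best" if both its features match and its motion fits the environment.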
Poster






DescriptionOur paper proposes ETCE, a novel concept erasure algorithm for text-to-image models, designed to prevent the generation of illegal or copyright-infringing content.
Technical Papers


DescriptionGeometric features between the micro and macro scales produce an expressive family of visual effects grouped under the term "glints". Efficiently rendering these effects amounts to finding the highlights caused by the geometry under each pixel. To allow for fast rendering, we represent our faceted geometry as a 4D point process on an implicit multiscale grid, designed to efficiently find the facets most likely to cause a highlight. The facets' normals are generated to match a given micro-facet normal distribution such as Trowbridge-Reitz (GGX) or Beckmann, to which our model converges under increasing surface area. Our method is simple to implement, memory- and precomputation-free, allows for importance sampling, and covers a wide range of different appearances such as anisotropic as well as individually colored particles. We provide a base implementation as a standalone fragment shader.
Technical Communications


DescriptionEvolvingGS captures long, complex 3D human performances with high quality and stability by preserving temporal continuity and adapting to rapid changes without segmenting the sequence.
Technical Papers


DescriptionIn this work, we propose a system that covers the complete workflow for achieving controlled authoring and editing of textures that present distinctive local characteristics. These include various effects that change the surface appearance of materials, such as stains, tears, holes, abrasions, discoloration, and more. Such alterations are ubiquitous in nature, and including them in the synthesis process is crucial for generating realistic textures.
We introduce a novel approach for creating textures with such blemishes, adopting a learning-based approach that leverages unlabeled examples. Our approach does not require manual annotations by the user; instead, it detects the appearance-altering features through unsupervised anomaly detection. The various textural features are then automatically clustered into semantically coherent groups, which are used to guide the conditional generation of images.
Our pipeline as a whole goes from a small image collection to a versatile generative model that enables the user to interactively create and paint features on textures of arbitrary size. Notably, the algorithms we introduce for diffusion-based editing and infinite stationary texture generation are generic and should prove useful in other contexts as well.
Emerging Technologies






DescriptionThe Experience Sharing Box with the Experium Platform records, stores, and plays back synchronized audio-visual-haptic multisensory memories. A spatial display and haptic gloves capture data tied to an Experience ID, distribute it across local and cloud storage, then retrieve it for aligned playback, as shown in arm-wrestling, ball-scooping, and toy-train scenes.
Birds of a Feather






DescriptionThis Birds of a Feather session brings together researchers, artists, developers, and engineers to explore the creative and technical frontiers of 3D Gaussian Splatting. Moderated by Mindy Li, the session features perspectives from research, education, and creative practice. Hao Wang will discuss advances in 3D scene reconstruction and applications with 3DGS, Zeyu Wang will present 3D Gaussians as a creative medium for digital humans and spatial editing, and Lukasz Mirocha will share how he applies 3DGS and interaction techniques in teaching and creative works. The session concludes with a short panel discussion on current challenges and future possibilities of 3DGS.
Educator's Forum



DescriptionThis paper presents a reflective case study on the use of immersive technologies—virtual reality (VR) and robotics—within a cross-cultural educational hackathon. The 12-day program brought together interdisciplinary student teams from two countries to address a socially relevant challenge: how to promote physical activity among children on the autism spectrum. Anchored in a Problem-Based Learning (PBL) framework, the event combined ethnographic research, iterative prototyping, and intercultural collaboration. The paper critically examines the hackathon format as a pedagogical tool, exploring both its potential and its limitations. Particular attention is given to the dynamics of international teamwork, the challenges of inclusive design, and the tension between creativity and feasibility. It is argued that such educational interventions not only foster technical skills, but also cultivate empathy, cross-cultural understanding, and deeper engagement with complex real-world issues.
Educator's Forum



DescriptionRay tracing is a fundamental topic in introductory computer graphics courses. Previous research suggested that students find ray tracing particularly challenging. Ray tracing requires spatial understanding, mathematical skills, and programming skills, and it makes use of other computer graphics concepts such as illumination models. To investigate students’ conceptual understanding and identify common misconceptions, we conducted a drawing-based user study with 23 participants enrolled in an introductory course. Our findings revealed that the most common errors included misunderstandings of geometric and ray tracing concepts, as well as process-related mistakes. Based on our analysis, we propose an integrated instructional approach that incorporates spatial reasoning training, AR/VR-based visualization, and drawing-based assessment to reduce cognitive load and improve students’ understanding of ray tracing concepts.
Key Event
Keynote



DescriptionThe low altitude airspace, generally defined as the region below 1000 meters above ground level, remains a frontier ripe for exploration and economic exploitation. With advancing technology, this domain is poised to become a crucible for diverse economic activities, transmuting a mere natural resource into a potent economic asset. This presentation offers a comprehensive overview of the burgeoning low altitude economy (LAE), bolstered by first-hand insights into the infrastructure developments enabling LAE's realization. Specifically, I will delve into the research and development towards constructing a smart integrated infrastructure for the LAE. At the core of this infrastructure lies the Smart Integrated Low Altitude System (SILAS), an operating system designed to address the multifaceted needs of operations, regulations, and end-users.
Similar to conventional operating systems such as Windows, SILAS orchestrates resource management, activity coordination, and user administration within the low altitude airspace. This comprehensive management spans from the registration and operation of drones to the establishment of landing posts and the seamless orchestration of communication channels, ensuring all airborne activities are scheduled efficiently in both space and time. SILAS is engineered to perform real-time spatiotemporal flow computing for numerous flying objects, a critical capability to ensure safety within the low altitude airspace. This advanced system must adeptly manage the intricate and high-frequency flying activities, from observation to proactive guidance, overcoming numerous technological hurdles. Designed to handle one million daily flights in a major city, with a peak online presence of one hundred thousand, SILAS sets a new benchmark for airspace management. In comparison, contemporary metropolitan airports currently manage only a few thousand commercial flights daily. The volume and complexity of future flights in the low altitude airspace surpass the capabilities of traditional airspace management systems employed in commercial airports, underscoring the necessity of SILAS.
Technical Papers


DescriptionWe present a method to automatically approximate a given surface with a small set of patches, each being a developable ruled surface featuring long ruling lines. These construction primitives are attractive for their inherent ease of fabrication by cutting and folding inextensible materials and for their favorable structural properties. Our algorithm strikes a good tradeoff between the simplicity of produced designs (in terms of the number and shapes of the patches) and approximation quality. To this end, it is guided by a smooth curvature-aligned cross-field.
Compared to traditional methods, we rely on final discretization steps to ensure the developability of the ruled surfaces and produce a fabricable layout, bypassing the need to enforce that the strips are strictly developable in the continuous setting (which would require enforcing difficult geometric conditions). We demonstrate the effectiveness of the proposed algorithm by producing several viable designs and using them to fabricate various physical objects.
Technical Papers


DescriptionWe present FairyGen, an automatic system to generate storied cartoon videos from a single child's drawing of a character, in a highly personalized style. Unlike previous subject- and motion-customization methods, we decompose the whole story into layers of character modeling, environment generation, and shot design for a continuous narrative. Given a single hand-drawn image, our approach begins by utilizing a Multi-modality Large Language Model (MLLM) to create a structured storyboard that includes dynamic shots, setting up both the narrative flow and the spatial layout of the main character. To model the character, we develop a 3D proxy that allows us to produce tailored motion that incorporates intricate, real-world dynamics. Then, for environment generation, we design a style propagation adapter to learn the style from the foreground character and propagate it to the background via pre-trained background inpainting diffusion models, so that the identity of the foreground is naturally preserved.
After the style customization, a shot design module crops the scene image via the MLLM for detailed shot design and to increase the diversity of the story. Finally, for animation, given the motion sequences from the 3D proxy and the stylized prior, we fine-tune an MMDiT-based image-to-video diffusion model to learn the complex motion of the given foreground character. This is achieved by a motion customization adapter with a timestep-shift strategy to preserve long-term motion fidelity and coherence. After training, this model can be directly applied to the cropped shots to generate diverse video scenes. Overall, we conduct extensive experiments and evaluations demonstrating that FairyGen produces animations that are stylistically faithful, narratively aligned, and rich in natural, smooth motion, highlighting its effectiveness and flexibility for personalized story animation.
Technical Papers


DescriptionWe propose a Reinforcement Learning (RL) algorithm that combines several novel techniques to achieve more stable and robust control results for coupled solid-fluid systems. Our method utilizes the twin-delayed actor-critic algorithm to efficiently utilize off-policy data and achieve faster convergence. For more accurate estimations of the value function to guide the search of optimal policies, we use the Boltzmann softmax operator to reduce the bias of estimation. We further introduce a novel two-step Q-value estimator to reduce the well-known under-estimation issue. Furthermore, to mitigate the requirement of excessive exploration under sparse rewards, we propose the Fluid Effective Domain Guidance (FEDG) algorithm to guide policy exploration, where the policy for an easier task is trained jointly with that for a harder task. Put together, our framework achieves state-of-the-art performance in complex fluid-solid coupling control benchmarks, delivering stable and reliable performance in both 2D and 3D tasks over long horizons.
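The Boltzmann softmax operator mentioned above can be sketched in a few lines; the temperature and Q-values below are made up for illustration. The operator interpolates between the mean of the Q-values (as the temperature goes to zero) and their max (as it grows), which is what reduces the bias of pure max-based value estimation.

```python
# Illustrative sketch of a Boltzmann softmax value operator (beta and the
# Q-values are invented; this is not the paper's full estimator).
import numpy as np

def boltzmann_softmax(q_values, beta=5.0):
    """Returns sum_a softmax(beta * Q)_a * Q_a."""
    q = np.asarray(q_values, dtype=float)
    w = np.exp(beta * (q - q.max()))  # shift by max for numerical stability
    w /= w.sum()
    return float(w @ q)

q = [1.0, 2.0, 3.0]
v = boltzmann_softmax(q, beta=5.0)  # strictly between mean(q) and max(q)
```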
Poster






DescriptionTo handle contact-rich problems, we propose a lightweight approximation scheme that significantly reduces the density of the contact system in large-scale elastic simulation.
Invited Poster
Poster






DescriptionWe present a GPU-friendly framework for real-time implicit simulation of hyperelastic materials with frictional contact. By embedding nonlinear complementarity conditions into a local-global scheme, the method efficiently handles strong nonlinearities and non-smooth interactions. A simple, highly parallel solver with a sparse inverse structure enables fast GPU performance, while a novel splitting strategy improves both computation and friction accuracy. Extensive experiments demonstrate robustness across large deformations, non-smooth contacts, and a wide range of material stiffnesses. Despite its efficiency and generality, the approach remains simple, relying only on standard matrix operations.
Invited Poster
Poster






DescriptionDetecting surface self-intersections is crucial in CAD modeling to prevent issues in simulation and manufacturing. This paper presents an algebraic-signature-based algorithm for rapidly determining self-intersections of NURBS surfaces. This signature is then recursively cross-used to compute the self-intersection locus, guaranteeing robustness in critical cases including tangency and small loops.
Poster






DescriptionA task–mesh shader pipeline maps sparse volumes for first-hit rendering; L1 acceleration-structure jumps and tile-local HDDA remove intermediary geometry buffers, delivering 1.5x average speedups over vertex-shader baselines while preserving G-buffer equivalence.
Technical Papers


DescriptionWe present a novel multigrid solver framework that significantly advances the efficiency of physical simulation for unstructured meshes. While multigrid methods theoretically offer linear scaling, their practical implementation for deformable body simulations faces substantial challenges, particularly on GPUs. Our framework achieves up to 6.9× speedup over traditional methods through an innovative combination of matrix-free vertex block Jacobi smoothing with a Full Approximation Scheme (FAS), enabling both piecewise constant and linear Galerkin formulations without the computational burden of dense coarse matrices. Our approach demonstrates superior performance across varying mesh resolutions and material stiffness values, maintaining consistent convergence even under extreme deformations and challenging initial configurations. Comprehensive evaluations against state-of-the-art methods confirm our approach achieves lower simulation error with reduced computational cost, enabling simulation of tetrahedral meshes with over one million vertices at approximately one frame per second on modern GPUs.
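The matrix-free smoothing at the heart of the approach can be illustrated on a toy problem. The sketch below applies damped Jacobi to a 1D Poisson stencil (the paper targets 3D elasticity on GPUs; this only shows the pattern of computing residuals directly from a stencil rather than an assembled matrix).

```python
# Toy matrix-free damped Jacobi smoother for the 1D stencil [-1, 2, -1]
# with fixed zero boundary values. All parameters are illustrative.
import numpy as np

def jacobi_smooth(u, f, iters, omega=0.8):
    for _ in range(iters):
        res = f.copy()
        res[1:-1] -= 2.0 * u[1:-1] - u[:-2] - u[2:]  # r = f - A u, no matrix stored
        u[1:-1] += omega * res[1:-1] / 2.0           # diag(A) = 2
    return u

n = 33
f = np.ones(n)
f[0] = f[-1] = 0.0
u = jacobi_smooth(np.zeros(n), f, iters=5000)
res = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:])
resnorm = np.abs(res).max()  # small only after many sweeps
```

That a smoother alone needs thousands of sweeps to drive the residual down is exactly why it is paired with coarse grids: smoothing kills high-frequency error cheaply, and the multigrid hierarchy (here, the Full Approximation Scheme) handles the smooth remainder.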
Emerging Technologies






DescriptionAn interactive system that links visual heat cues to instant thermal feedback using a novel non-contact device. By analyzing video semantics and visual prominence, it dynamically delivers heating or cooling sensations, enabling users to feel what they see in immersive, multisensory experiences.
Emerging Technologies






DescriptionFlowing colored liquids in tubes have attracted interest in HCI. However, intricate patterns remain underexplored. We propose a tubular system to express gradients by controlling droplet density using a custom-designed connector inspired by microfluidic techniques. Fluids are driven by ElectroHydroDynamic (EHD) pumps, a lightweight, compact, silent alternative to mechanical systems. This approach expands the potential of liquid expression.
Technical Papers


DescriptionSelection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and process their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than current methods. Furthermore, we enable selection at two levels: texture and sub-texture, leveraging our novel two-level material selection (DuMaS) dataset, which includes dense annotations for over 800,000 synthetic images at both the texture and sub-texture levels.
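Selection from per-patch features can be sketched as a similarity threshold; the random stand-in features and the single-resolution query below are illustrative only and omit the paper's multi-resolution fusion and actual ViT features.

```python
# Hypothetical sketch: select all patches whose feature is close (in cosine
# similarity) to the clicked patch's feature. Features here are random
# stand-ins, not real ViT descriptors.
import numpy as np

def select_by_feature(features, click_idx, threshold=0.8):
    """features: (P, D) per-patch descriptors; returns a boolean patch mask."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f[click_idx]  # cosine similarity to the clicked patch
    return sim >= threshold

rng = np.random.default_rng(1)
base = rng.normal(size=8)
same = base + 0.05 * rng.normal(size=(4, 8))  # four patches of the "same material"
other = rng.normal(size=(4, 8))               # unrelated patches
feats = np.vstack([same, other])
mask = select_by_feature(feats, click_idx=0)  # the four similar patches pass
```

Running the same query at several feature resolutions and combining the masks is, loosely, what a multi-resolution strategy adds: coarse levels give stability, fine levels give precise boundaries.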
Art Papers



DescriptionFinHERtip introduces a musical performance system co-designed with a blind, queer musician to explore identity expression in music creation. The system combines beadwork, computer vision, and audio synthesis to translate tactile gestures into layered, expressive sound. Drawing from both blind and queer cultural practices, it centers embodied authorship and reclaims technical space through personal ritual and material interaction. An Expressive Co-Creation Framework is proposed to support collaborations between human experience and machine response. This work advances accessible digital performance by challenging normative design and valuing lived identity as a site of creative agency.
Technical Papers


DescriptionWe present a combustion simulation framework to model fire phenomena across solids, liquids, and gases. Our approach extends traditional fluid solvers by incorporating multi-species thermodynamics and reactive transport for fuel, oxygen, nitrogen, carbon dioxide, water vapor, and residuals. Combustion reactions are governed by stoichiometry-dependent heat release, allowing accurate simulation of premixed and diffusive flames with varying severity and composition. We support a wide range of scenarios including jet fires, water suppression (sprinklers, sprays), fuel evaporation, and starvation conditions. The system enables interactive heat sources, fire detectors, and realistic rendering of flames (e.g., laminar-to-turbulent transitions, blue-to-orange color shifts). Our key contributions include the tight coupling of species dynamics with thermodynamic feedback, evaporation modeling, and a hybrid SPH-grid representation for efficient combustion simulation. We validate our method through numerous experiments, demonstrating versatility in both indoor and outdoor fire scenarios.
Poster






DescriptionOur fisheye patch structure efficiently incorporates global context in patch-based learning by compressing surroundings while preserving central detail, achieving accurate and consistent colorization of anime line drawings.
Invited Poster
Poster






DescriptionFixTalk is a novel framework for improving talking head generation by addressing identity leakage (IL) and rendering artifacts (RA). It identifies that IL arises from identity information within motion features and proposes an Enhanced Motion Indicator (EMI) to separate identity from motion. To fix RA, an Enhanced Detail Indicator (EDI) uses the leaked identity information to restore missing details. Experiments show that FixTalk outperforms existing methods, effectively reducing both IL and RA, and producing higher-quality talking head outputs.
Invited Poster
Poster






DescriptionWe achieve action transfer in heterogeneous scenarios with varying spatial structures or cross-domain subjects.
Poster






DescriptionOur single-stage Graph Beta Diffusion generates residential floorplans as graphs, jointly modeling coordinates, room types, and edges. An unsupervised Manhattan-alignment loss improves FID over GAN baselines with a simpler, end-to-end pipeline.
Poster






DescriptionTherapist-supervised VR flight-exposure prototype pairs Meta Quest Pro with a three-actuator seat and web dashboard. Feasibility results: high usability and engagement, minimal simulator sickness, no adverse events, supporting aviophobia research.
Poster






DescriptionWe propose a force feedback button, which can provide three types of feedback: "pressing force," "click feeling," and "stroke." Using this feedback, we developed application examples for various situations.
Technical Papers


DescriptionWe generate subspaces from force distributions, allowing us to carry out force simulations in many regimes of actuation. The dragon above is composed of spring muscles that lend themselves to multiple types of motion, including walking, flapping its wings, or wagging its tail. A user can interactively actuate each of these motions while observing physical dynamic effects with our subspace. In contrast, using a standard subspace that does not take these force distributions into account leads to visible artifacts at run-time.
Technical Papers


DescriptionThe desire for cameras with smaller form factors has recently led to a push for exploring computational imaging systems with reduced optical complexity, such as a smaller number of lens elements. Unfortunately, such simplified optical systems usually suffer from severe aberrations, especially in off-axis regions, which can be difficult to correct purely in software.
In this paper, we introduce Fovea Stacking, a new type of imaging system that utilizes an emerging dynamic optical component called the deformable phase plate (DPP) for localized aberration correction anywhere on the image sensor. By optimizing DPP deformations through a differentiable optical model, off-axis aberrations are corrected locally, producing a foveated image with enhanced sharpness at the fixation point, analogous to the eye’s fovea. Stacking multiple such foveated images, each with a different fixation point, yields a composite image free from aberrations. To efficiently cover the entire field of view, we propose joint optimization of DPP deformations under imaging budget constraints. Due to the DPP device's non-linear behavior, we introduce a neural network-based control model for improved agreement between simulation and hardware performance.
We further demonstrate that for extended depth-of-field imaging, Fovea Stacking outperforms traditional focus stacking in image quality. By integrating object detection or eye-tracking, the system can dynamically adjust the lens to track the object of interest, enabling real-time foveated video suitable for downstream applications such as surveillance or foveated virtual reality displays.
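The stacking step can be illustrated with a toy per-pixel composite, in the spirit of classic focus stacking: pick, at each pixel, the image in the stack that is locally sharpest there. This is only a sketch under our own assumptions (a 3x3 Laplacian as the sharpness measure; all function names are ours), not the paper's DPP-optimized pipeline:

```python
import numpy as np

def laplacian_sharpness(img):
    """Local sharpness proxy: squared response of a 3x3 Laplacian (wrapping borders)."""
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap ** 2

def stack_foveated(images):
    """Composite a stack by picking, per pixel, the image that is sharpest there."""
    sharp = np.stack([laplacian_sharpness(im) for im in images])  # (N, H, W)
    best = np.argmax(sharp, axis=0)                               # (H, W)
    stacked = np.stack(images)
    return np.take_along_axis(stacked, best[None], axis=0)[0]

# Demo: A carries high-frequency detail on the left half, B on the right half.
ii, jj = np.indices((8, 8))
A = (((ii + jj) % 2) * (jj < 4)).astype(float)
B = (((ii + jj) % 2) * (jj >= 4)).astype(float)
composite = stack_foveated([A, B])
```

In the demo, the composite inherits the detailed left half from A and the detailed right half from B, mirroring how each foveated capture contributes its own sharp region.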
Poster






DescriptionOur work investigates the efficacy and efficiency of utilizing foveated rendering in the domain of steganography, showing increases in payload capacity and image quality.
Technical Papers


DescriptionStokes parameters are the standard representation of polarized light intensity in Mueller calculus and are widely used in polarization-aware computer graphics. However, their reliance on local frames--aligned with ray propagation directions--introduces a fundamental limitation: numerical discontinuities in Stokes vectors despite physically continuous fields of polarized light. This issue originates from the Hairy Ball Theorem, which guarantees unavoidable singularities in any frame-dependent function defined over spherical directional domains. In this paper, we overcome this long-standing challenge by introducing the first frame-free representation of Stokes vectors. Our key idea is to reinterpret a Stokes vector as a Dirac delta function over the directional domain and project it onto spin-2 spherical harmonics, retaining only the lowest-frequency coefficients. This compact representation supports coordinate-invariant interpolation and distance computation between Stokes vectors across varying ray directions--without relying on local frames. We demonstrate the advantages of our approach in two representative applications: spherical resampling of polarized environment maps (e.g., between cube map and equirectangular formats), and view synthesis from polarized radiance fields. In both cases, conventional frame-dependent methods produce singularity artifacts. In contrast, our frame-free representation eliminates these artifacts, improves numerical robustness, and simplifies implementation by decoupling polarization encoding from local frames.
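The frame dependence the abstract describes is easy to demonstrate: rotating the local frame by an angle psi rotates the (Q, U) components of a Stokes vector by 2*psi, so component-wise comparison across frames is meaningless, while frame-independent quantities such as the degree of linear polarization are preserved. The sketch below illustrates only this standard Mueller-calculus fact, not the paper's spin-2 spherical-harmonic representation:

```python
import numpy as np

def rotate_frame(stokes, psi):
    """Re-express (I, Q, U, V) in a local frame rotated by psi.
    Q and U mix under twice the frame angle; I and V are frame-independent."""
    I, Q, U, V = stokes
    c, s = np.cos(2 * psi), np.sin(2 * psi)
    return np.array([I, c * Q + s * U, -s * Q + c * U, V])

stokes = np.array([1.0, 0.6, 0.2, 0.0])
rotated = rotate_frame(stokes, np.deg2rad(30))

# Component-wise comparison depends on the arbitrary frame choice...
component_gap = np.abs(stokes[1:3] - rotated[1:3]).max()
# ...while the degree of linear polarization does not.
dolp = np.hypot(stokes[1], stokes[2]) / stokes[0]
dolp_rotated = np.hypot(rotated[1], rotated[2]) / rotated[0]
```

Because no globally smooth frame assignment exists over the sphere (the Hairy Ball Theorem the abstract cites), such 2*psi jumps are unavoidable in any frame-dependent encoding.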
Art Papers



DescriptionThis paper investigates screenshots as intentional or automated acts of device and network interaction. Their significance lies as much in how they are interpreted as in how they are produced. Pictured examples are analyzed for gestural, software, and networked influences that shape what becomes visible. Some highlight interpretive reflections on interactive processes, while others raise doubts about authorship when time-automated software produces similar results. To probe this tension, screenshots were introduced to local photographers, prompting reflections on normally unseen networked or embodied interactions within their practice. These findings suggest that screenshots, though often overlooked, can serve as probes for authorial intent, reframing how everyday images function as archives of both physical and networked interaction.
Technical Papers


DescriptionArticulated 3D objects are central to many applications in robotics, AR/VR, and animation. Recent approaches to modeling such objects either rely on optimization-based reconstruction pipelines that require dense-view supervision or on feed-forward generative models that produce coarse geometric approximations and often overlook surface texture. In contrast, open-world 3D generation of static objects has achieved remarkable success, especially with the advent of native 3D diffusion models such as Trellis. However, extending these methods to articulated objects by training native 3D diffusion models poses significant challenges. In this work, we present FreeArt3D, a training-free framework for articulated 3D object generation. Instead of training a new model on limited articulated data, FreeArt3D repurposes a pre-trained static 3D diffusion model (e.g., Trellis) as a powerful shape prior. It extends Score Distillation Sampling (SDS) into the 3D-to-4D domain by treating articulation as an additional generative dimension. Given a few images captured in different articulation states, FreeArt3D jointly optimizes the object’s geometry, texture, and articulation parameters—without requiring task-specific training or access to large-scale articulated datasets. Our method generates high-fidelity geometry and textures, accurately predicts underlying kinematic structures, and generalizes well across diverse object categories. Despite following a per-instance optimization paradigm, FreeArt3D completes in minutes and significantly outperforms prior state-of-the-art approaches in both quality and versatility.
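For context on the Score Distillation Sampling step the abstract extends: one SDS update renders the current parameters, noises the render, and uses the denoiser's noise-prediction error as a gradient on the rendered image, skipping the denoiser Jacobian. The sketch below is the generic SDS recipe under our own toy noise schedule and names; FreeArt3D's 3D-to-4D extension amounts (in our paraphrase) to including articulation parameters among the optimized variables:

```python
import numpy as np

def sds_grad(params, render, denoiser, rng):
    """One schematic Score Distillation Sampling step: render, add noise,
    and return (predicted noise - true noise) as the gradient w.r.t. the
    rendered image. Chain through render(params) externally to update params."""
    x = render(params)
    t = rng.uniform(0.02, 0.98)                        # random noise level
    eps = rng.standard_normal(x.shape)
    alpha, sigma = np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)  # toy schedule
    x_t = alpha * x + sigma * eps                      # noised render
    eps_hat = denoiser(x_t, t)
    return eps_hat - eps                               # SDS skips the denoiser Jacobian

# Demo with placeholder render/denoiser (both hypothetical stand-ins):
rng = np.random.default_rng(7)
grad = sds_grad(None, lambda p: np.zeros((4, 4)),
                lambda x_t, t: np.zeros_like(x_t), rng)
```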
Technical Papers


DescriptionWe propose FreeMusco, a motion-free framework that jointly learns latent representations and control policies for musculoskeletal characters. By leveraging the musculoskeletal model as a strong prior, our method enables energy-aware and morphology-adaptive locomotion to emerge without motion data. The framework generalizes across human, non-human, and synthetic morphologies, where distinct energy-efficient strategies naturally appear—for example, quadrupedal gaits in Chimanoid versus bipedal gaits in Humanoid. The latent space and corresponding control policy are constructed from scratch, without demonstration, and enable downstream tasks such as goal navigation and path following—representing, to our knowledge, the first motion-free method to provide such capabilities. FreeMusco learns diverse and physically plausible locomotion behaviors through model-based reinforcement learning, guided by the locomotion objective that combines control, balancing, and biomechanical terms. To better capture the periodic structure of natural gait, we introduce the temporally averaged loss formulation, which compares simulated and target states over a time window rather than on a per-frame basis. We further encourage behavioral diversity by randomizing target poses and energy levels during training, enabling locomotion to be flexibly modulated in both form and intensity at runtime. Together, these results demonstrate that versatile and adaptive locomotion control can emerge without motion capture, offering a new direction for simulating movement in characters where data collection is impractical or impossible.
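The temporally averaged loss the abstract describes can be sketched in a few lines: instead of penalizing per-frame state differences, compare states averaged over a time window, which tolerates phase offsets within a gait cycle as long as cycle averages match. Window length, the L2 penalty, and the function names are our assumptions:

```python
import numpy as np

def per_frame_loss(sim, target):
    """Mean squared error matched frame-by-frame."""
    return np.mean((sim - target) ** 2)

def window_averaged_loss(sim, target, w):
    """Compare states averaged over non-overlapping windows of length w."""
    n = (len(sim) // w) * w
    sim_avg = sim[:n].reshape(-1, w).mean(axis=1)
    tgt_avg = target[:n].reshape(-1, w).mean(axis=1)
    return np.mean((sim_avg - tgt_avg) ** 2)

# Two gaits identical up to a quarter-cycle phase shift (period 8 frames):
t = np.arange(40)
sim = np.sin(2 * np.pi * t / 8)
target = np.sin(2 * np.pi * (t + 2) / 8)
frame_mse = per_frame_loss(sim, target)
window_mse = window_averaged_loss(sim, target, w=8)
```

The phase-shifted gaits incur a large per-frame error but a near-zero window-averaged error, illustrating why the latter is friendlier to periodic locomotion.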
Birds of a Feather






DescriptionThis session explores how emerging 3D geospatial technologies—Gaussian Splats, GeoAI, USD and advanced 3D workflows—are transforming digital twin creation. We will highlight breakthroughs in photorealistic capture, intelligent analysis, and scalable visualization that enable more dynamic and immersive digital twin experiences. Attendees will gain insights into recent cross-industry collaborations led by Khronos, OGC, Esri, Cesium, and Niantic to standardize Gaussian Splats within the glTF format, ensuring interoperability and performance. Real-world use cases will showcase how these capabilities enhance workflows, address technical challenges, and open new opportunities for building precise, intelligent, and future-ready geospatial digital twins.
Birds of a Feather






DescriptionThis Birds-of-a-Feather session brings together early CG pioneers with roots in Hong Kong who helped shape the industry’s formative years. The discussion will explore their cultural and professional transitions into the Western content creation landscape and how these experiences influenced CG practices. Participants will reflect on key milestones, technical and creative challenges, and lessons that remain relevant to today’s research and production pipelines. By sharing these pioneering journeys, the session aims to inspire emerging professionals to move beyond theory and contribute to the next wave of innovation in computer graphics.
Richard Chuang, Ellen Poon, Raman Hui, Tim Cheung, Paul Chung
Featured Session



DescriptionThis session explores how breakthroughs in computer graphics have become core enablers for physical AI. From neural rendering to world models, it is now possible to generate physically-based data to test and validate autonomous machines and digital twins at an incredible pace. We will show how the latest research, including 4D Gaussian splatting, 3D scene reconstruction and simulation-ready physically correct asset creation, is driving advancements across robotic embodiments.
Technical Papers


DescriptionHand-drawn character animation is a vibrant research area in computer graphics, presenting unique challenges in achieving geometric consistency while conveying expressive motion details. Traditional skeletal animation methods maintain geometric consistency but often struggle with complex non-rigid elements like flowing hair and skirts, resulting in unnatural deformation and missing secondary dynamics. In contrast, video diffusion models effectively synthesize physics-aware dynamics but suffer from stylized artifacts and geometric distortions when applied to stylized drawings due to domain gaps. In this work, we propose a novel hybrid animation system that integrates the strengths of skeletal animation and video diffusion priors. The core idea is to generate coarse images from characters retargeted with skeletal animations for geometric consistency guidance and to further enhance these images with video diffusion models in terms of texture details and secondary dynamics. We reformulate the enhancement of coarse images as an inpainting task and propose a domain-adapted diffusion model to refine regions requiring improvement, particularly those involving secondary dynamics, guided by user-provided masks. To further enhance motion realism, we propose a Secondary Dynamics Enhancement strategy during the denoising process that incorporates latent features from a pre-trained diffusion model enriched with human motion priors. Additionally, to address unnatural deformation resulting from hair sticking in skeletal animation, we introduce a hair layering modeling method that employs segmentation maps to separate hair from the body in the implicit fields, allowing our system to animate challenging hair-sticking characters more naturally. Through extensive experiments and a perceptual study, we demonstrate that our system generates high-fidelity animations with realistic dynamics and artistic integrity. 
The code and more animation results are available in the supplementary materials.
Educator's Forum



DescriptionThis paper presents a new pedagogical framework for teaching Virtual Reality (VR) development to undergraduate students with no prior programming experience through an intensive 24-hour, narrative-driven curriculum. We demonstrate that combining story-centered learning with multi-instructor support enables novice learners to create sophisticated Unity-based VR applications addressing complex social challenges. Our approach integrates academic guidance, industry mentorship, and technical laboratory support to bridge the gap between creative vision and technical implementation. Distinct application categories emerged organically, with student projects spanning healthcare interventions (MindEye), cultural preservation (Fire·Blessing), social justice awareness (Invisible Chain, OpenFashion VR), psychological narratives (Mindscape), and empathy-building experiences (Through Their Eyes), demonstrating the effectiveness of this rapid prototyping methodology. Our findings indicate that narrative-driven, project-based learning can successfully democratize access to VR development education while maintaining high standards for technical achievement and social impact. This pedagogical workflow offers a replicable framework for institutions seeking to implement accessible, immersive technology curricula in resource-constrained environments, challenging traditional assumptions about technical education prerequisites and demonstrating VR's potential to address societal challenges through empathy-building, immersive experiences.
Technical Papers


DescriptionRecently, generating 3D assets conditioned on control images has achieved impressive quality.
However, existing 3D generation methods are limited to handling a single control objective and lack the ability to utilize multiple images to independently control different regions of a 3D asset, which hinders their flexibility in applications.
We propose Fuse3D, a novel method that enables generating 3D assets under the control of multiple images, allowing for the seamless fusion of multi-level regional controls from global views to intricate local details.
First, we introduce a Multi-Condition Fusion Module to integrate the visual features from multiple image regions.
Then, we propose a method to automatically align user-selected 2D image regions with their associated 3D regions based on semantic cues.
Finally, to resolve control conflicts and enhance local control features from multi-condition images, we introduce a Local Attention Enhancement Strategy that flexibly balances region-specific feature fusion.
Overall, we introduce the first method capable of controllable 3D asset generation from multiple condition images.
The experimental results indicate that Fuse3D can flexibly fuse multiple 2D image regions into coherent 3D structures, resulting in high-quality 3D assets.
Technical Papers
GarmageNet: A Multimodal Generative Framework for Sewing Pattern Design and Generic Garment Modeling
2:50pm - 3:01pm HKT Tuesday, 16 December 2025 Meeting Room S423+S424, Level 4

DescriptionRealistic digital garment modeling remains a labor-intensive task due to the intricate process of translating 2D sewing patterns into high-fidelity, simulation-ready 3D garments. We introduce GarmageNet, a unified generative framework that automates the creation of 2D sewing patterns, the construction of sewing relationships, and the synthesis of 3D garment initializations compatible with physics-based simulation.
Central to our approach is Garmage, a novel garment representation that encodes each panel as a structured geometry image, effectively bridging the semantic and geometric gap between 2D structural patterns and 3D garment geometries. This is followed by GarmageNet, a latent diffusion transformer that synthesizes panel-wise geometry images, and GarmageJigsaw, a neural module that predicts point-to-point sewing connections along panel contours.
To support training and evaluation, we build GarmageSet, a large-scale dataset comprising 14,801 professionally designed garments with detailed structural and style annotations.
Our method demonstrates versatility and efficacy across multiple application scenarios, including scalable garment generation from multi-modal design concepts (text prompts, sketches, photographs), automatic modeling from raw flat sewing patterns, pattern recovery from unstructured point clouds, and progressive garment editing using conventional instructions, laying the foundation for fully automated, production-ready pipelines in digital fashion.
Technical Papers


DescriptionIntegral linear operators play a key role in many graphics problems, but solutions obtained via Monte Carlo methods often suffer from high variance. A common strategy to improve the efficiency of integration across various inputs is to precompute the kernel function. Traditional methods typically rely on basis expansions for both the input and output functions. However, using fixed output bases can restrict the precision of output reconstruction and limit the compactness of the kernel representation. In this work, we introduce a new framework that approximates both the kernel and the input function using Gaussian mixtures. This formulation allows the integral operator to be evaluated analytically, leading to improved flexibility in kernel storage and output representation. Moreover, our method naturally supports the sequential application of multiple operators and enables closed-form operator composition, which is particularly beneficial in tasks involving chains of operators. We demonstrate the versatility and effectiveness of our approach across a variety of graphics problems, including environment map relighting, boundary value problems, and fluorescence rendering.
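The analytic evaluation rests on a standard closed-form identity: the integral of a product of two Gaussians is itself a Gaussian in their means, so applying a Gaussian-mixture kernel to a Gaussian-mixture input reduces to a double sum over component pairs with no numerical integration. The 1D sketch below checks that identity against brute-force quadrature (function names are ours; this is not the paper's implementation):

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gaussian_overlap(mu1, var1, mu2, var2):
    """Closed form: integral of N(x; mu1, var1) * N(x; mu2, var2) dx
    equals N(mu1; mu2, var1 + var2). Summing this over component pairs
    lets a Gaussian-mixture kernel act on a Gaussian-mixture input analytically."""
    return normal_pdf(mu1, mu2, var1 + var2)

# Verify against brute-force quadrature on a wide, dense grid.
x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]
numeric = np.sum(normal_pdf(x, 0.5, 1.2) * normal_pdf(x, -1.0, 0.7)) * dx
closed = gaussian_overlap(0.5, 1.2, -1.0, 0.7)
```

The same structure explains the closed-form operator composition mentioned in the abstract: composing two Gaussian-mixture kernels again yields Gaussian integrals of this form.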
Technical Papers


DescriptionWe present Gaussian See, Gaussian Do, a novel approach for semantic 3D motion transfer from multiview video. Our method enables rig-free, cross-category motion transfer between objects with semantically meaningful correspondence. Building on implicit motion transfer techniques, we extract motion embeddings from source videos via condition inversion, apply them to rendered frames of static target shapes, and use the resulting videos to supervise dynamic 3D Gaussian Splatting reconstruction. Our approach introduces an anchor-based view-aware motion embedding mechanism, ensuring cross-view consistency and accelerating convergence, along with a robust 4D reconstruction pipeline that consolidates noisy supervision videos. We establish the first benchmark for semantic 3D motion transfer and demonstrate superior motion fidelity and structural consistency compared to adapted baselines.
Technical Papers


DescriptionGradient-domain rendering estimates image-space gradients using correlated sampling, which can be combined with color information to reconstruct smoother and less noisy images. While simple L2 reconstruction is unbiased, it often leads to visible artifacts. In contrast, most recent reconstruction methods based on learned or handcrafted techniques improve visual quality but introduce bias, leaving the development of practically unbiased reconstruction approaches relatively underexplored.
In this work, we propose a generalized framework for unbiased reconstruction in gradient-domain rendering. We first derive the unbiasedness condition under a general formulation that linearly combines pixel colors and gradients. Based on this unbiasedness condition, we design a practical algorithm that minimizes image variance while strictly satisfying unbiasedness. Experimental results demonstrate that our method not only guarantees unbiasedness but also achieves superior quality compared to existing unbiased and slightly biased reconstruction methods.
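The "simple L2 reconstruction" the abstract calls unbiased is the screened-Poisson solve: minimize ||x - c||^2 + alpha * ||Dx - g||^2 over the image x, given color estimates c and gradient estimates g. Because the minimizer is linear in (c, g), unbiased inputs yield an unbiased reconstruction. A minimal 1D sketch (dense matrices for clarity; names and alpha are our choices):

```python
import numpy as np

def l2_reconstruct(colors, grads, alpha=1.0):
    """1D screened-Poisson reconstruction:
    x = (I + alpha D^T D)^{-1} (colors + alpha D^T grads),
    with D the forward-difference operator. Linear in (colors, grads),
    so unbiasedness of the inputs carries over to the output."""
    n = len(colors)
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    A = np.eye(n) + alpha * D.T @ D
    b = colors + alpha * D.T @ grads
    return np.linalg.solve(A, b)

# Consistent, noise-free inputs are reproduced exactly:
colors = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
grads = np.diff(colors)
recon = l2_reconstruct(colors, grads, alpha=2.0)
```

The paper's framework generalizes exactly this kind of linear combination of colors and gradients while keeping the unbiasedness constraint.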
Technical Papers


DescriptionGenerating 3D scenes is still a challenging task due to the lack of readily available scene data. Most existing methods only produce partial scenes and provide limited navigational freedom. We introduce a practical and scalable solution that uses 360° video as an intermediate scene representation, capturing the full-scene context and ensuring consistent visual content throughout the generation. We propose WorldPrompter, a generative pipeline that synthesizes traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model, trained with a mix of image and video data, achieves convincing spatial and temporal consistency for static scenes. This is validated by an average COLMAP matching rate of 94.6%, allowing for high-quality panoramic Gaussian splat reconstruction and improved navigation throughout the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.
Technical Papers


DescriptionWe focus on the problem of using generative diffusion models for the task of motion detailing: converting a rough "sketch" of a character animation, represented by a sparse set of coarsely posed and imprecisely timed key poses, into a detailed, natural-looking character animation. Current diffusion models can address the problem of correcting the timing of imprecisely timed key poses, but we find that no good solution exists for leveraging the diffusion prior to enhance a sparse set of key poses with additional pose detail. We overcome this challenge using a simple inference-time trick. At each diffusion step, we blend the outputs of an unconditioned diffusion model with the input key pose constraints using per-keypose tolerance weights, and pass this result in as the input condition to a pre-existing motion retiming model. We find this approach works significantly better than existing approaches that attempt to add detail by blending model outputs or by expressing keypose constraints as guidance. The result is the first diffusion model that can robustly convert blocking-level keyposes into plausible, detailed character animations.
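The per-keypose blending step described above can be sketched in a few lines: at each denoising step, lerp the model's output toward each keypose constraint by its tolerance weight before handing the motion to the retiming model. This is our schematic paraphrase (function names, array layout, and the linear blend are assumptions):

```python
import numpy as np

def blend_keyposes(denoised, keyposes, key_frames, tolerance):
    """Blend a denoised motion (T, D) toward sparse keypose constraints.
    tolerance[i] in [0, 1]: 0 pins frame key_frames[i] exactly to keyposes[i],
    1 leaves the model output untouched at that frame. The blended result is
    what would be passed as the condition to the retiming model each step."""
    blended = denoised.copy()
    for i, f in enumerate(key_frames):
        w = tolerance[i]
        blended[f] = w * denoised[f] + (1.0 - w) * keyposes[i]
    return blended

# Demo: pin frame 0 exactly, half-trust the model at frame 5.
denoised = np.ones((10, 3))
keyposes = np.zeros((2, 3))
blended = blend_keyposes(denoised, keyposes, key_frames=[0, 5], tolerance=[0.0, 0.5])
```

Non-key frames pass through unchanged, so the diffusion prior is free to invent detail everywhere except where the animator's keyposes (weighted by tolerance) say otherwise.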
Technical Papers


DescriptionGenerating articulated objects, such as laptops and microwaves, is a crucial yet challenging task with extensive applications in Embodied AI and AR/VR. Current image-to-3D methods primarily focus on surface geometry and texture, neglecting part decomposition and articulation modeling. Meanwhile, neural reconstruction approaches (e.g., NeRF or Gaussian Splatting) rely on dense multi-view or interaction data, limiting their scalability.
In this paper, we introduce DreamArt, a novel framework for generating high-fidelity, interactable articulated assets from single-view images. DreamArt employs a three-stage pipeline: first, it reconstructs part-segmented and complete 3D object meshes through a combination of image-to-3D generation, mask-prompted 3D segmentation, and part amodal completion. Second, we fine-tune a video diffusion model to capture part-level articulation priors, leveraging movable part masks as prompts and amodal images to mitigate ambiguities caused by occlusion.
Finally, DreamArt optimizes the articulation motion, represented by a dual quaternion, and conducts global texture refinement and repainting to ensure coherent, high-quality textures across all parts.
Experimental results demonstrate that DreamArt effectively generates high-quality articulated objects, possessing accurate part shape, high appearance fidelity, and plausible articulation, thereby providing a scalable solution for articulated asset generation.
Technical Papers


DescriptionWe seek to answer the question: what can a motion-blurred image reveal about a scene's past, present, and future? Although motion blur obscures image details and degrades visual quality, it also encodes information about scene and camera motion during an exposure. Previous techniques leverage this information to estimate a sharp image from an input blurry one, or to predict a sequence of video frames showing what might have occurred at the moment of image capture. However, they rely on handcrafted priors or network architectures to resolve ambiguities in this inverse problem, and do not incorporate image and video priors on large-scale datasets. As such, existing methods struggle to reproduce complex scene dynamics and do not attempt to recover what occurred before or after an image was taken. Here, we introduce a new technique that repurposes a pre-trained video diffusion model trained on internet-scale datasets to recover videos revealing complex scene dynamics during the moment of capture and what might have occurred immediately into the past or future. Our approach is robust and versatile; it outperforms previous methods for this task, generalizes to challenging in-the-wild images, and supports downstream tasks such as recovering camera trajectories, object motion, and dynamic 3D scene structure.
Educator's Forum



DescriptionGenerative AI is driving a new era of creativity and problem-solving—a renaissance for all and for humanity. This one-hour, hands-on workshop takes participants “from zero to hero” by guiding them through three interconnected themes: open data, space computing, and sustainable smart cities.
The session begins with open data, where participants curate and publish datasets from 3D urban scans. They will learn how to apply responsible data governance practices that make data easier to find, reuse, and share while ensuring broad societal benefit.
Building on this foundation, the workshop advances to space computing, where participants transform curated datasets into 3D models and interactive simulations. Through web-based visualizations, they explore how data-driven modeling can support applications in digital infrastructure and scientific exploration.
Finally, the workshop links these practices to sustainable smart cities. Participants will prototype simulations such as energy-aware drone routing for low-altitude logistics, connecting technical work to climate action and the United Nations Sustainable Development Goals (SDGs).
The session combines short lectures, guided exercises, and collaborative reflection, and it concludes by incubating capstone project proposals. These proposals extend workshop outputs into course- or program-level initiatives, enabling participants to integrate generative AI, open data, and interactive simulation into their own teaching and research contexts.
Attendees will leave with a collection of openly licensed teaching resources—including slides, activity guides, metadata templates, and Jupyter notebooks—along with practical strategies to embed responsible, human-centered innovation into curricula.
Birds of a Feather






DescriptionWhat might an AI-powered future look like? The rapid evolution of computing capabilities is transforming the process of scientific discovery and creative production, and inventing new ways to experience and interact with art, music, design, film, literature, theatre, fashion, and every other sphere of cultural production. This BoF seeks to bring together technologists, artists, arts organizations, and researchers to discuss what impact AI and generative technologies may have on our future. The agenda is always available at https://generative-ai-bof.matters.today
Birds of a Feather






DescriptionThis session brings together technologists, artists, and researchers to explore how generative AI is transforming visual content creation across Art-Tech. As the 8th installment of the AI for Creative Visual Content Generation, Editing, and Understanding (CVEU) series, following successful events at SIGGRAPH’25, CVPR’25, SIGGRAPH’24, and CVPR’24 with over 3,000 attendees, this session continues to grow the community. Partnering with leading universities and industry, it advances Art-Tech education and fosters genuine academia–industry integration. Confirmed speakers include Hollywood film creator Paul Debevec, actress Tia Carrere, filmmaker Tony Ngai, industry leaders such as the CEO of Tencent Video, and university deans Huamin Qu and Emilie Yeh.
Technical Papers


DescriptionEnabling photorealistic avatar animations in virtual and augmented reality (VR/AR) has been challenging because of the difficulty of obtaining the ground-truth state of faces. It is physically impossible to obtain synchronized images from head-mounted cameras (HMC), which provide partial observations in infrared (IR), and from an array of outside-in dome cameras, which provide full observations that match avatars' appearance. Prior works relying on analysis-by-synthesis methods can generate accurate ground truth, but suffer from imperfect disentanglement between expression and style in their personalized training. The reliance on extensive paired captures (HMC and dome) for the same subject makes it operationally expensive to collect large-scale datasets, which also cannot be reused for different HMC viewpoints and lighting. In this work, we propose a novel generative approach, Generative HMC (GenHMC), that leverages large unpaired HMC captures, which are much easier to collect, to directly generate high-quality synthetic HMC images given any conditioning avatar state from dome captures. We show that our method properly disentangles the input conditioning signal, which specifies facial expression and viewpoint, from facial appearance, leading to more accurate ground truth. Furthermore, our method generalizes to unseen identities, removing the reliance on paired captures. We demonstrate these breakthroughs by evaluating both synthetic HMC images and universal face encoders trained from these new HMC-avatar correspondences, which achieve better data efficiency and state-of-the-art accuracy.
Courses


DescriptionGenerative AI now drives storyboarding, previs, and look-development, yet two gaps slow adoption: artists struggle with opaque tools, while ML engineers lack cinematic grammar. This half-day master class closes both gaps by pairing concise theory with hands-on, human-in-the-loop practice and built-in ethics.
Through an Explain → Show → Do rhythm, each concept moves from a crisp technical snapshot to a live GPU demo and a guided task. Team exercises turn peer critique into a rapid feedback loop, while questions of authorship, bias, and legal clearance surface at every step—embedding responsible practice into real production workflows.
Live demos built on the CineVision pipeline transform a log-line into reference frames, shot lists, and colour-graded contact sheets, showcasing diffusion, LoRA, ControlNet, AnimateDiff, and IP-Adapter in action.
You will leave able to:
Explain how modern diffusion and multimodal generators work.
Customise AI tool-chains without ceding creative control.
Integrate AI assets into coherent, ethically sound sequences.
Assess—and build—production-ready pipelines that enhance director–cinematographer collaboration.
Advisory Committee
Prof. Maneesh Agrawala — Stanford University
Prof. Huamin Qu — The Hong Kong University of Science & Technology
Prof. James Evans — University of Chicago
Prof. Shane Denson — Stanford University
Prof. Tim Gruenewald — The University of Hong Kong
Prof. Bárbara Fernández-Melleda — The University of Hong Kong
Poster






DescriptionWe propose a product recommendation system that utilizes generative AI to automatically create characters based on product features extracted from product images and instantly display them on shelf-edge digital signage.
In a user study with six store staff members, the system was rated highly for its usefulness.
The results suggest its potential to reduce the workload of creating promotional materials and to enable flexible, visually appealing in-store promotions.
Art Gallery






DescriptionAn architectural wall of generative clay tiles, created via an integrated Houdini-to-fabrication pipeline. Our unique coloring system uses AI to generate palettes from keywords, then applies the Kubelka-Munk model to predict physical glaze recipes. This work explores posthuman craft, where code, clay, and intent collaborate to revitalize ceramics.
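The Kubelka-Munk step can be sketched with the standard single-constant mixing model; the reflectances and concentrations below are illustrative stand-ins, not the project's measured glaze data.

```python
import math

def ks_from_reflectance(r):
    # Kubelka-Munk relation for an opaque layer: K/S = (1 - R)^2 / (2R).
    return (1.0 - r) ** 2 / (2.0 * r)

def reflectance_from_ks(ks):
    # Inverse relation: R = 1 + K/S - sqrt((K/S)^2 + 2 K/S).
    return 1.0 + ks - math.sqrt(ks * ks + 2.0 * ks)

def mix_reflectance(reflectances, concentrations):
    # Single-constant K-M mixing: the K/S of a mixture is the
    # concentration-weighted sum of the components' K/S values.
    ks = sum(c * ks_from_reflectance(r)
             for r, c in zip(reflectances, concentrations))
    return reflectance_from_ks(ks)

# Equal parts of a light (80%) and a dark (20%) colorant:
r_mix = mix_reflectance([0.8, 0.2], [0.5, 0.5])  # ~0.30, darker than the mean
```

Note the characteristic K-M nonlinearity: the mixture comes out darker than the arithmetic mean of the two reflectances, which is why physical glaze prediction needs this model rather than linear color blending.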
Poster






DescriptionThis work introduces Generative Tonal Art Maps (GTAMs), a workflow that bridges generative art and non-photorealistic rendering, enabling artists to design customizable textures in p5.js and apply them for real-time stylized rendering in Blender.
Technical Papers


DescriptionManipulating the illumination of a 3D scene within a single image represents a fundamental challenge in computer vision and graphics. This problem has traditionally been addressed using inverse rendering techniques, which involve explicit 3D asset reconstruction and costly ray-tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be possible -- one that replaces explicit physical models with networks that are trained on large amounts of image and video data. In this paper, we exploit the implicit scene understanding of a video diffusion model, particularly Stable Video Diffusion, to relight a single image.
We introduce Genlit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image and generate results directly as a video sequence.
We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes, enabling single-image relighting with plausible and convincing shadows and inter-reflections. Our results highlight the ability of video foundation models to capture rich information about lighting, material, and shape, and our findings indicate that such models, with minimal training, can be used to perform relighting without explicit asset reconstruction or ray-tracing.
Technical Communications


DescriptionWe provide formulations and methods for previously unsupported geometric queries for walk on stars on closed implicit surfaces, within the scope of walkin' Robin.
Technical Communications


DescriptionGeoROS++ is a robust real-time aerial orthophoto stitching framework that adaptively selects motion models and compensates for illumination changes, handling large baselines and poor lighting effectively.
Emerging Technologies






DescriptionWe demonstrate the first physical prototype of a Truncated Cylindrical Array Plate (TCAP), an optical element for ghost-free mid-air images. Its design was verified through physically based rendering and realized by aligning multiple truncated cylinders to successfully form a floating letter in mid-air.
Technical Papers


DescriptionTracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first NeRF/3DGS-based SLAM framework for kilometer-scale outdoor environments, as mainly demonstrated on the KITTI and KITTI 360 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.
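As one hypothetical way to index a hierarchical sparse voxel map (GigaSLAM's actual data structure may differ), voxel coordinates are often interleaved into per-level Morton keys, so nearby points share keys at coarse levels and separate at fine ones:

```python
def morton3d(x, y, z, bits=10):
    # Interleave the bits of integer voxel coordinates into one
    # 30-bit Morton key, a common index for sparse voxel hashing.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def voxel_key(p, cell_size, level):
    # Each coarser level doubles the voxel size, giving a hierarchy
    # of keys for the same 3D point (one per level of detail).
    size = cell_size * (2 ** level)
    return (level, morton3d(*(int(c // size) for c in p)))

key_fine = voxel_key((1.3, 0.2, 2.7), cell_size=0.5, level=0)
key_coarse = voxel_key((1.3, 0.2, 2.7), cell_size=0.5, level=3)
```

Decoding Gaussians from such keyed cells with level-specific networks is what allows the map to stay sparse over kilometer-scale trajectories.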
Technical Papers


DescriptionWhen observing an intense light source, humans perceive dense radiating spikes known as glare/starburst patterns. These patterns are frequently used in computer graphics applications to enhance the perception of brightness (e.g., in games and films). Previous works have computed the physical energy distribution of glare patterns under daytime conditions using approximations like Fresnel diffraction. These techniques are capable of producing visually believable results, particularly when the pupil remains small. However, they are insufficient under nighttime conditions, when the pupil is significantly dilated and the assumptions behind the approximations no longer hold. To address this, we employ the Rayleigh–Sommerfeld diffraction solution, from which Fresnel diffraction is derived as an approximation, as our baseline reference. In pursuit of performance and visual quality, we also employ Ochoa’s approximation and the Chirp Z transform to efficiently generate high-resolution results for computer graphics applications. By also taking into account background illumination and certain physiological characteristics of the human photoreceptor cells, particularly the visual threshold of light stimulus, we propose a framework capable of producing plausible visual depictions of glare patterns for both daytime and nighttime scenes.
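As background, the diffraction underlying such glare patterns can be evaluated numerically with the angular-spectrum method, which is equivalent to the first Rayleigh–Sommerfeld solution. The sketch below propagates the field behind an idealized circular pupil; the grid, wavelength, and pupil diameter are illustrative, and none of the paper's Chirp-Z or Ochoa machinery appears here.

```python
import numpy as np

N, pitch = 256, 10e-6            # 256x256 grid, 10 um sample pitch
wavelen, z = 550e-9, 0.02        # green light, 2 cm propagation
x = (np.arange(N) - N // 2) * pitch
X, Y = np.meshgrid(x, x)
aperture = (X**2 + Y**2 <= (0.5e-3) ** 2).astype(complex)  # 1 mm pupil

# Transfer function of free space in the frequency domain.
fx = np.fft.fftfreq(N, d=pitch)
FX, FY = np.meshgrid(fx, fx)
arg = 1.0 - (wavelen * FX) ** 2 - (wavelen * FY) ** 2
kz = 2 * np.pi / wavelen * np.sqrt(np.maximum(arg, 0.0))
H = np.where(arg > 0, np.exp(1j * kz * z), 0.0)  # drop evanescent waves

field = np.fft.ifft2(np.fft.fft2(aperture) * H)
intensity = np.abs(field) ** 2   # diffracted irradiance pattern
```

Because the transfer function has unit magnitude for propagating waves, energy is conserved; real glare rendering additionally models the dilated pupil's shape, eyelashes, and lens fibers as the diffracting aperture.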
Birds of a Feather






DescriptionAs real-time 3D content becomes increasingly integral across industries, open standards such as glTF continue to expand to support new data types and rendering techniques. This session brings together ecosystem leaders advancing glTF implementations, exploring 3DGS and Gaussian Splat technologies, and integrating these capabilities into platforms such as Three.js and HarmonyOS 3D. Attendees will gain insight into how these complementary efforts are shaping an open, interoperable foundation for photorealistic, portable 3D experiences on the web, mobile devices, and beyond.
Games






DescriptionEcho of Spring is a first-person interactive VR narrative game set in a women’s dormitory at the turn of the century. Players explore personal belongings to unlock emotional connections with five characters, each represented by a symbolic “Five Elements” room—Wood, Fire, Earth, Metal, and Water—reflecting their memories, struggles, and destinies.
The gameplay centers on embodied interaction: players match items, trigger monologues, and experience transitions between the physical dormitory and inner consciousness spaces. Each room presents distinct mechanics and aesthetics, from shattering mirrors to igniting objects on stage. These interactions culminate in a “destiny exchange” ritual, where the player’s choices determine multiple endings.
Designed with a nostalgic, dreamlike visual style, the game blends cultural heritage with psychological storytelling. It explores themes of women’s identity, fate, and self-discovery, inviting players to reflect on how memory and choice shape who we are.
Computer Animation Festival






DescriptionGod Dam is a stop-motion mockumentary about two beavers that work at a post office. It follows a typical day-in-the-life of Clint Eatswood, the energetic manager, and Humphrey Damiels, his unenthusiastic employee. After their toilet erupts with envelopes, Clint and Humphrey argue about whose fault it is. Clint tries to get Humphrey to help him clean up, but instead Humphrey just finds ways to laugh at him. Reaching his breaking point, Clint tackles Humphrey and the two tussle around the office. Clint is left defeated with an even bigger mess to clean up and no hope that Humphrey will work overtime to help.
God Dam was created at the Savannah College of Art and Design in Savannah, Georgia, USA by a team of six animation students as their capstone project. The beaver puppets are made of silicone covered with fur and fabric, with a wire armature inside. To allow them to speak, each beaver has ten magnetic mouth pieces that key inside of a plastic head core that is also covered with fur. The stop-motion animation was shot at twelve frames-per-second using Dragonframe. God Dam was written, directed, and produced by Abigail Hill, who hopes it will bring laughter to audiences and looks forward to creating more episodes of this story in the future.
Poster






DescriptionGPStroke enables people to create GPS art by collecting fragmented daily walking paths as "strokes" and later combining them into artworks, promoting walking without requiring dedicated exercise time.
Technical Papers


DescriptionWe propose a training-free method for feature field rendering in 3D Gaussian Splatting, enabling fast and scalable embedding of high-dimensional features into 3D scenes. Unlike training-based feature distillation methods, which are computationally expensive and often yield feature embeddings that poorly reflect the rendered semantics, our approach back-projects 2D features onto pre-trained 3D Gaussians using influence weights derived from the rendering equation. This projection produces a queryable 3D feature field, validated on tasks including 2D and 3D segmentation, affordance transfer, and identity encoding, spanning queries using language, pixel, and synthetic embeddings. These capabilities, in turn, enable downstream applications in augmented and virtual reality, interactive scene editing, and robotics. Across different tasks, our method achieves performance comparable to or better than training-based approaches, while significantly reducing computational cost. The project page is at https://jojijoseph.github.io/3dgs-backprojection
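The core back-projection step reduces to a weighted average. In this sketch the influence weights and 2D features are random stand-ins; a real pipeline would record the alpha-blending weights per Gaussian per pixel during rasterization.

```python
import numpy as np

rng = np.random.default_rng(0)
num_gaussians, num_pixels, feat_dim = 6, 100, 4

# w[i, p]: influence weight of Gaussian i on pixel p, as would come
# from the alpha-blending terms of the rendering equation.
w = rng.random((num_gaussians, num_pixels))
feats_2d = rng.random((num_pixels, feat_dim))   # e.g. per-pixel CLIP features

# Training-free back-projection: each Gaussian receives the
# influence-weighted average of the 2D features it contributed to.
feats_3d = (w @ feats_2d) / w.sum(axis=1, keepdims=True)
```

Each row of `feats_3d` is a convex combination of pixel features, so the lifted field stays inside the 2D feature distribution and remains directly queryable with the same language or pixel embeddings.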
Poster






DescriptionWe propose a CAD2CAM system using Graph Reinforcement Learning for brick assembly. Our method optimizes assembly stability, reducing costs and enhancing structural integrity in mobile manipulator tasks.
Technical Papers


DescriptionThis paper presents GS-RoadPatching, an inpainting method for driving-scene completion that refers to completely reconstructed regions represented by 3D Gaussian Splatting (3DGS). Unlike existing 3DGS inpainting methods, which perform generative completion by relying on 2D perspective-view diffusion or GAN models to predict limited appearance or depth cues for missing regions, our approach enables substitutional scene inpainting and editing directly in the 3DGS modality, freeing it from the need for spatial-temporal consistency across 2D modalities and eliminating time-intensive re-training of Gaussians. Our key insight is that the highly repetitive patterns in driving scenes often share multi-modal similarities within the implicit 3DGS feature space and are particularly suitable for structural matching, enabling effective 3DGS-based substitutional inpainting. Practically, we construct feature-embedded 3DGS scenes that incorporate a patch measurement method for abstracting local context at different scales, and we then propose a structural search method to find candidate patches in 3D space effectively. Finally, we propose a simple yet effective substitution-and-fusion optimization for better visual harmony. We conduct extensive experiments on multiple publicly available datasets to demonstrate the effectiveness and efficiency of the proposed method in driving scenes, and the results validate that our method achieves state-of-the-art performance compared to baseline methods in terms of both quality and interoperability. Additional experiments in general scenes also demonstrate the applicability of the proposed 3D inpainting strategy. Our source code will be publicly available.
Technical Papers


Description3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets and reconstructing faithful geometry with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. In contrast, volumetric signed distance field (SDF) methods provide robust geometry reconstruction, but their expensive ray marching hinders real-time application and slows training, and they struggle to capture sharp geometric details. To this end, we propose to guide 3DGS and SDF bidirectionally in a complementary manner, including an SDF-aided Gaussian splatting for efficient optimization of the relighting model and a GS-guided SDF enhancement for high-quality geometry reconstruction. At the core of our SDF-aided Gaussian splatting is the mutual supervision of depth and normal between blended Gaussians and the SDF, which avoids the expensive volume rendering of the SDF. Thanks to this mutual supervision, the learned blended Gaussians are well-constrained at minimal time cost. As the Gaussians are rendered in a deferred shading mode, the alpha-blended Gaussians are smooth, while individual Gaussians may still be outliers, yielding floater artifacts. We therefore introduce an SDF-aware pruning strategy that removes Gaussian outliers located far from the surface defined by the SDF, avoiding the floater issue. This way, our GS framework provides reasonable normals and achieves realistic relighting, while the mesh obtained from truncated SDF (TSDF) fusion of depth remains problematic. We therefore design a GS-guided SDF refinement, which utilizes the blended normals from the Gaussians to fine-tune the SDF. Equipped with this efficient enhancement, our method can further provide high-quality meshes for reflective objects at the cost of 17% extra training time.
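The SDF-aware pruning step reduces to a thresholded distance test. The sketch below uses an analytic sphere SDF and random centers as stand-ins for a learned SDF and trained Gaussians; the threshold value is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(1000, 3))  # toy Gaussian centers

def sdf_sphere(p, radius=0.8):
    # Signed distance to a sphere: negative inside, positive outside.
    return np.linalg.norm(p, axis=-1) - radius

# SDF-aware pruning: discard Gaussians whose centers lie farther than
# tau from the SDF's zero level set; distant outliers are the likely
# floaters that blend away in alpha compositing but corrupt relighting.
tau = 0.05
keep = np.abs(sdf_sphere(centers)) < tau
pruned_centers = centers[keep]
```

Only Gaussians within a thin shell around the surface survive, which is exactly the population that should carry appearance in a surface-constrained reconstruction.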
Technical Papers


Description3D Gaussian Splatting (3DGS) has shown strong capability in reconstructing and rendering photorealistic 3D scenes with high efficiency. However, extending 3DGS to synthesize large-scale or infinite terrains from a single captured exemplar remains an open challenge. In this paper, we propose a tile-based framework that addresses this problem. Our method builds on Wang Tiles, where each tile encodes a local field of Gaussians with boundary constraints to ensure seamless transitions. This enables stochastic yet continuous tiling of Gaussian fields over arbitrary surfaces, allowing for procedural generation of expansive terrains with high spatial diversity. Furthermore, we introduce several rendering optimizations tailored to the unique characteristics of 3DGS Wang tiles, achieving real-time rendering of large-scale 3DGS terrains.
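The tiling idea can be illustrated with plain stochastic Wang tiling. This toy uses a complete tile set with two edge colors rather than the paper's Gaussian-field tiles: each placed tile must agree in edge color with its north and west neighbors, which is what guarantees seamless boundaries.

```python
import itertools, random

COLORS = 2
# Complete stochastic Wang tile set: one tile per (N, E, S, W) coloring.
TILES = list(itertools.product(range(COLORS), repeat=4))

def tile_grid(rows, cols, rng):
    # Scanline placement: each new tile must match the S edge of the
    # tile above (its N edge) and the E edge of the left tile (its W edge).
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            need_n = grid[r - 1][c][2] if r > 0 else None
            need_w = grid[r][c - 1][1] if c > 0 else None
            options = [t for t in TILES
                       if (need_n is None or t[0] == need_n)
                       and (need_w is None or t[3] == need_w)]
            grid[r][c] = rng.choice(options)
    return grid

grid = tile_grid(6, 8, random.Random(0))
# Every interior seam agrees in edge color, so the tiling is seamless.
seamless = all(grid[r][c][2] == grid[r + 1][c][0]
               for r in range(5) for c in range(8)) and \
           all(grid[r][c][1] == grid[r][c + 1][3]
               for r in range(6) for c in range(7))
```

Substituting fields of Gaussians (with boundary-constrained splats) for the colored squares gives the stochastic, continuous terrain tiling the paper describes.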
Educator's Forum



DescriptionThe rapid evolution of visual effects (VFX) technologies is reshaping both production pipelines and the skills expected of graduates entering the industry. Real-time rendering, virtual production, and generative artificial intelligence (AI) have expanded the creative toolkit more than ever before, but they also raise urgent pedagogical questions: How can educators prepare students for a profession where tools evolve faster than curricula? How can programs ensure graduates remain adaptable, employable, and artistically mature, rather than short-lived “tool operators”?
This talk builds on the outcomes of the SIGGRAPH 2025 panels Navigating the Future of VFX Education and Education Disrupted, as well as insights from other presentations at the conference. Both educators and industry supervisors emphasized three priorities: (1) grounding students in artistic and cinematic foundations such as composition, lighting, and storytelling; (2) ensuring proficiency in traditional VFX workflows including matchmove, compositing, and CG integration; and (3) creating structured opportunities for future-facing experimentation with real-time tools, AI-assisted workflows, and hybrid pipelines.
Case studies from classroom projects demonstrate how students can begin with conventional pipelines and later pivot to emerging technologies without sacrificing production quality or conceptual integrity. For example, projects that transitioned mid-production to real-time methods improved iteration speed while still requiring a solid grasp of cinematic fundamentals.
By embedding artistic foundations, workflow literacy, and experimentation into curricula, educators can foster adaptability and critical judgment. Industry-ready graduates are not defined by the software they know today, but by their ability to evaluate, learn, and creatively apply the tools of tomorrow.
Art Gallery






DescriptionGuài is an MR interactive installation exploring AI’s role in identity construction. Combining biometric analysis with mythological transformation inspired by ancient Chinese literature, participants encounter speculative avatars shaped by algorithmic interpretation. The work interrogates generative systems, cultural epistemologies, AI ethics and post-human aesthetics—inviting reflection on how automated technologies reshape selfhood.
Poster






DescriptionThis project integrates Chinese Sign Language into immersive interaction, enhancing expressiveness and inclusivity while enabling barrier-free communication that bridges Deaf and hearing communities beyond natural gestures and instruction.
Courses


DescriptionThis is an intermediate level course for attendees to gain a strong understanding of the basic principles of generative AI. The course will help build intuition around several topics with easy-to-understand explanations and examples from some of the prevalent algorithms and models including Autoencoders, CNN, Diffusion Models, Transformers, and NeRFs.
Emerging Technologies






DescriptionWe present a haptic customization system for motor skill learning that integrates three feedback modalities: EMS, vibrotactile, and mechanical linkage. Our tool enables users to personalize feedback combinations, leading to improved performance and higher agency.
Poster






DescriptionWe present Haptix - a framework for context-aware haptics synthesis for mobile cinematic videos, enhancing immersion through spatiotemporal alignment. User studies highlight effectiveness, adaptive design needs, and immersive experiences.
Technical Papers


DescriptionWe present a variance reduction technique for Walk on Spheres (WoS) that solves elliptic partial differential equations (PDEs) by combining overlapping harmonic expansions of the solution, each estimated using unbiased Monte Carlo walks. Our method supports both the Laplace and screened-Poisson equations with Dirichlet, Neumann and Robin boundary conditions in 2D and 3D. By adaptively covering the domain with local expansion regions and extrapolating from each using a truncated Fourier basis, we achieve orders of magnitude lower error than traditional pointwise WoS, in equal time. While low-order truncations could achieve low bias, unbiased reconstruction is possible with stochastic truncation. Compared to other recently developed caching algorithms for WoS such as Boundary and Mean Value Caching, our approach generally generates results with lower error and fewer correlation artifacts.
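For context, the baseline pointwise Walk on Spheres that this method improves upon fits in a few lines. The domain, boundary data, and walk count below are illustrative; the boundary data g(x, y) = xy is itself harmonic, so the exact interior solution is known.

```python
import numpy as np

def walk_on_spheres(p, dist, g, rng, eps=1e-3):
    # One WoS walk: hop to a uniformly random point on the largest
    # empty circle until within eps of the boundary, then read off g.
    p = np.asarray(p, dtype=float)
    while True:
        d = dist(p)
        if d < eps:
            return g(p)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        p = p + d * np.array([np.cos(theta), np.sin(theta)])

# Toy Laplace problem on the unit square with boundary data g = x * y,
# whose harmonic extension is u(x, y) = x * y everywhere inside.
dist = lambda p: min(p[0], 1.0 - p[0], p[1], 1.0 - p[1])
g = lambda p: p[0] * p[1]

rng = np.random.default_rng(0)
estimate = np.mean([walk_on_spheres((0.3, 0.7), dist, g, rng)
                    for _ in range(5000)])
# Exact value at (0.3, 0.7) is 0.21; convergence is O(1/sqrt(walks)).
```

The slow pointwise convergence shown here is exactly what overlapping harmonic expansions with shared walks aim to beat.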
Technical Papers


DescriptionDeep image restoration models aim to learn a mapping from degraded image space to natural image space. However, they face several critical challenges: removing degradation, generating realistic details, and ensuring pixel-level consistency. Over time, three major classes of methods have emerged: MSE-based, GAN-based, and diffusion-based. Yet none achieves a good balance among restoration quality, fidelity, and speed. We propose a novel method, HYPIR, to address these challenges. Our solution pipeline is straightforward: it involves initializing the image restoration model with a pre-trained diffusion model and then fine-tuning it with adversarial training. This approach does not rely on diffusion loss, iterative sampling, or additional adapters. We theoretically demonstrate that initializing adversarial training from a pre-trained diffusion model positions the initial restoration model very close to the natural image distribution. Consequently, this initialization improves numerical stability, avoids mode collapse, and substantially accelerates the convergence of adversarial training. Moreover, HYPIR inherits the capabilities of diffusion models with rich user control, enabling text-guided restoration and adjustable texture richness. Requiring only a single forward pass, it achieves faster convergence and inference speed than diffusion-based methods. Extensive experiments show that HYPIR outperforms previous state-of-the-art methods, achieving efficient and high-quality image restoration.
Poster






DescriptionWe introduce a novel computational framework for designing hybrid-anisotropic-Voronoi microstructures governed by a set of fully geometric-driven parameters, enabling comprehensive control for 3D cell shape, orientation, density, hybridity, and thickness.
Invited Poster
Poster






DescriptionThis study addresses the significant challenge of generating realistic 3D human models from a single image. Our HFHuman presents an innovative approach that extracts and fuses multi-modality information, such as depth, surface normals, and body posture, from single images to construct more precise 3D models.
Technical Papers


DescriptionThis paper presents a new approach to estimate accurate and robust 3D semantic correspondence with a hierarchical neural semantic representation. Our work has three key contributions. First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature capturing high-level structure and multi-resolution local geometric features preserving fine details, by carefully harnessing the 3D priors from pre-trained 3D generative models. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature and then iteratively refines it with local geometric features, yielding accurate and semantically consistent mappings. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories. Our method also supports various applications, such as shape co-segmentation, keypoint matching, and texture transfer, and generalizes well to structurally diverse shapes, with promising results even in cross-category scenarios. Both qualitative and quantitative evaluations show that our method outperforms previous state-of-the-art techniques.
Poster






DescriptionWe propose an electrolysis-free underwater wireless power transfer method enabling clear bubble-free 3D displays. The method powers 7 mm cubic devices and supports 115.2 kbps communication, achieving high-clarity volumetric visualization.
Technical Papers


DescriptionGenerating highly dynamic and photorealistic portrait animations driven by audio and skeletal motion remains challenging due to the need for precise lip synchronization, natural facial expressions, and high-fidelity body motion dynamics. We propose a human-preference-aligned diffusion framework that addresses these challenges through two key innovations. First, we introduce direct preference optimization tailored for human-centric animation, leveraging a curated dataset of human preferences to align generated outputs with perceptual metrics for portrait motion-video alignment and naturalness of expression. Second, the proposed temporal motion modulation resolves spatiotemporal resolution mismatches by reshaping motion conditions into dimensionally aligned latent features through temporal channel redistribution and proportional feature expansion, preserving the fidelity of high-frequency motion details in diffusion-based synthesis. The proposed mechanism is complementary to existing UNet- and DiT-based portrait diffusion approaches, and experiments demonstrate clear improvements in lip-audio synchronization, expression vividness, and body motion coherence over baseline methods, alongside notable gains in human preference metrics. Code and datasets will be released to advance reproducible research in preference-aligned portrait animation.
Technical Papers


DescriptionDiffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image from the pretrained model followed by a patch-wise DDIM inversion step and a novel wavelet-based detail enhancer module. Specifically, we first utilize inversion methods to derive initial noise vectors that preserve global coherence from the base image. Subsequently, during sampling, our wavelet-domain detail enhancer retains low-frequency components from the base image to ensure structural consistency, while selectively guiding high-frequency components to enrich fine details and textures. Extensive evaluations using Stable Diffusion XL demonstrate that HiWave effectively mitigates common visual artifacts seen in prior methods, achieving superior perceptual quality. A user study confirmed HiWave's performance, where it was preferred over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without requiring retraining or architectural modifications.
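The low/high-frequency split the wavelet-domain enhancer relies on can be illustrated with a one-level Haar transform; this toy 1D version (the actual method operates on 2D diffusion latents) shows how low frequencies from a base signal are kept while high frequencies come from a detail signal:

```python
def haar_split(x):
    # One-level 1D Haar transform: pairwise averages (low band) and
    # pairwise half-differences (high band). Assumes even length.
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    # Exact inverse of haar_split.
    out = []
    for a, d in zip(low, high):
        out += [a + d, a - d]
    return out

def swap_bands(base, detail):
    # Keep the low band of `base` (global structure) and take the high
    # band from `detail` (fine texture), mirroring the wavelet guidance.
    low_base, _ = haar_split(base)
    _, high_detail = haar_split(detail)
    return haar_merge(low_base, high_detail)
```

With `detail == base` the signal is reconstructed exactly; constraining only the low band is what leaves the high band free to be enriched during sampling.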
Technical Papers


DescriptionWhile recent advances in human-object interaction (HOI) video generation showcase promising capabilities for synthesizing coordinated human-object dynamics, existing methods remain constrained by their reliance on meticulously curated motion sequences and actor-specific data, thereby limiting practical scalability and user accessibility. Furthermore, generalization to novel object appearances and interaction scenarios remains understudied. To address these limitations, we propose HOMA, a weakly conditioned multimodal-driven HOI video generation framework that introduces sparse, decoupled motion guidance to enhance controllability and reduce dependency on stringent input conditions. Our approach encodes appearance and motion signals into the dual input space of a multimodal diffusion transformer (MMDiT), fusing them within a shared context space to enable temporally consistent and physically plausible interactions. To optimize learning efficiency and feature injection accuracy, we introduce a parameter-space HOI adapter initialized with pretrained MMDiT weights to preserve prior knowledge while enabling efficient adaptation. Additionally, we design a facial cross-attention adapter for audio-driven lip synchronization, ensuring anatomically accurate speech animation. Extensive experiments demonstrate that HOMA achieves state-of-the-art performance in interaction naturalness and generalization under weak supervision, outperforming existing methods by significant margins. We further illustrate HOMA's versatility through diverse applications, including text-conditioned generation and interactive object manipulation, facilitated by a user-friendly demo interface.
Art Papers



DescriptionIn the paper we introduce Homogenized Image Planet, a multi-stage process artwork that examines how planetary images acquire authority today. We collect visuals from NASA archives, Wikipedia, search platforms, and generative AI; archive their metadata; and fuse them with equal weights after feature-based alignment. The resulting texture animates a Unity and WebGL installation and is preserved on a radiation-tolerant chip launched to low-Earth orbit, forming a physical–virtual digital twin. Framed by power/knowledge, computational objectivity, and simulacra, audience surveys and expert interviews indicate decreased trust in “official” imagery and increased critical awareness of algorithmic mediation.
Poster






DescriptionHoverCanvas is a drone-based aerial display that achieves stable projection onto a 2.3m screen from the projection system 100m away. Its novel use of polar IR markers and high-speed mirror control overcomes critical challenges of self-occlusion and projection latency.
Technical Papers


DescriptionNatural head rotation is critical for believable embodied virtual agents, yet this micro-level behavior remains largely underexplored.
While head-rotation prediction algorithms could, in principle, reproduce this behavior, they typically focus on visually salient stimuli and overlook the cognitive motives that guide head rotation.
This yields agents that look at conspicuous objects while overlooking obstacles or task-relevant cues, diminishing realism in a virtual environment.
We introduce SCORE, a Symbolic COgnitive Reasoning framework for Embodied head Rotation, a data-agnostic approach that produces context-aware head movements without task-specific training or hand-tuned heuristics. A controlled VR study ($N$=20) identifies five motivational drivers of human head movements: Interest, Information Seeking, Safety, Social Schema, and Habit.
SCORE encodes these drivers as symbolic predicates, perceives the scene with a Vision–Language Model (VLM), and plans head poses with a Large Language Model (LLM). The framework employs a hybrid workflow: the VLM-LLM reasoning is executed offline, after which a lightweight FastVLM performs online validation to suppress hallucinations while maintaining responsiveness to scene dynamics.
The result is an agent that predicts not only "where" to look but also "why", generalizing to unseen scenes and multi-agent crowds while retaining behavioral plausibility.
Art Papers



DescriptionHow I Perceive It augments machine visual interpretation with human memory, shifting from superficial seeing to deep perceiving. Although today’s vision language models (VLMs) can generate image captions with a degree of subjectivity, they still struggle to explain the underlying reasons or experiential basis for such subjectivity. Machines can see, but they do not perceive as humans do, who link perception with prior experience and memory. To bridge this gap, this paper introduces a visual interpretation system that integrates individual memory into machine perception, founded on structure-mapping theory. By merging what the machine sees with what the individual remembers, the system produces individualized interpretations that uncover more insightful meanings among visual elements that are not immediately visible on the surface.
Technical Papers


DescriptionMulti-domain image inpainting utilizes complementary contextual information from auxiliary domain images to restore corrupted regions. While existing methods reconstruct auxiliary images to provide additional guidance, they face fundamental limitations: recovered pixels with complex patterns often lack representative details, while oversimplified patterns offer insufficient contextual information. To address these challenges, we propose HRC-Net, a novel framework incorporating three generative sub-networks for the comprehensive image inpainting task. Our architecture consists of: (1) a \emph{Hypothesis Sub-network} that enables robust sampling of pixel-wise hypotheses from multi-domain inputs; (2) a \emph{Representative Sub-network} that learns to score hypothesis quality based on contextual relevance; and (3) a \emph{Collaboration Sub-network} that optimizes adaptive fusion kernels to integrate the most pertinent details. Together, these components model the joint distribution of representative scores and convolutional kernels, fostering a precise interaction between auxiliary hypotheses and target image corruption to meticulously repair the target image. Extensive evaluations across multiple benchmark datasets demonstrate HRC-Net's superior performance, significantly outperforming state-of-the-art methods in both quantitative metrics and visual quality.
Technical Papers


DescriptionWe present HRM^2Avatar, a novel framework for creating high-fidelity avatars from monocular phone scans, which can be rendered and animated in real-time on mobile devices. Monocular capture with commodity smartphones provides a low-cost, pervasive alternative to studio-grade multi-camera rigs, making avatar digitization accessible to non-expert users. Reconstructing high-fidelity avatars from single-view video sequences poses significant challenges due to deficient visual and geometric data relative to multi-camera setups. To address these limitations, at the data level, our method leverages two types of data captured with smartphones: static pose sequences for detailed texture reconstruction and dynamic motion sequences for learning pose-dependent deformations and lighting changes. At the representation level, we employ a lightweight yet expressive representation to reconstruct high-fidelity digital humans from sparse monocular data. First, we extract explicit garment meshes from monocular data to model clothing deformations more effectively. Second, we attach illumination-aware Gaussians to the mesh surface, enabling high-fidelity rendering and capturing pose-dependent lighting changes. This representation efficiently learns high-resolution and dynamic information from our tailored monocular data, enabling the creation of detailed avatars. At the rendering level, real-time performance is critical for rendering and animating high-fidelity avatars in AR/VR, social gaming, and on-device creation, demanding sub-frame responsiveness. Our fully GPU-driven rendering pipeline delivers 120 FPS on mobile devices and 90 FPS on standalone VR devices at 2K resolution, over 2.7X faster than representative mobile-engine baselines. Experiments show that HRM^2Avatar delivers superior visual realism and real-time interactivity at high resolutions, outperforming state-of-the-art monocular methods.
Technical Papers


DescriptionCreating high-quality, photorealistic 3D digital humans from a single image remains challenging. While existing methods can generate visually appealing multi-view outputs, they often suffer from inconsistencies in viewpoints and camera poses, resulting in suboptimal 3D reconstructions with reduced realism. Furthermore, most approaches focus on body generation while overlooking facial consistency – a perceptually critical issue caused by the fact that the face occupies only a small area in a full-body image (e.g., ∼80×80 pixels out of a 512×512 image). This limited resolution and the low weight of the facial regions during optimization lead to insufficient facial details and inconsistent facial identity features across multiple views. To address these challenges, we leverage the powerful capabilities of 2D video diffusion models for consistent multi-view RGB and normal human image generation, combined with the 3D SMPL-X representation to enable spatial consistency and geometric details. By fine-tuning the DiT models (HumanWan-DiTs) on realistic 3D human datasets using the LoRA technique, our method ensures both generalizability and 3D visual consistency in realistic multi-view human image generation. The proposed facial enhancement is integrated into 3D Gaussian optimization to enhance facial details. To further refine results, we apply super-resolution and generative priors to reduce facial blurring, alongside SMPL-X parameter tuning and the assistance of generated multi-view normal images, achieving photorealistic and consistent rendering from a single image. Extensive experiments demonstrate that our approach outperforms existing methods, producing photorealistic, consistent, and fine-detailed human renderings.
Poster






DescriptionBy integrating the joint refinement of Mesh and Gaussian Splatting, HybridDeform4D captures complex motion and appearance variations in video-to-4D object generation, improving temporal dynamics without requiring high-quality initialization.
Technical Papers


DescriptionAcquiring bidirectional reflectance distribution functions (BRDFs) is essential for simulating light transport and analytically modeling material properties. Over the past two decades, numerous intensity-only BRDF datasets in the visible spectrum have been introduced, primarily for RGB image rendering applications. However, in scientific and engineering domains, there remains an unmet need to model light transport with polarization, a fundamental wave property of light, across hyperspectral bands. To address this gap, we present the first hyperspectral-polarimetric BRDF (hpBRDF) dataset, spanning wavelengths from 414 to 950 nm and densely sampled at 68 spectral bands. This dataset covers both the visible and near-infrared (NIR) spectra, enabling detailed material analysis and light reflection simulations that incorporate polarization at each narrow spectral band. We develop an efficient hpBRDF acquisition system that captures high-dimensional hpBRDFs within a practical acquisition time. Using this system, we analyze the hpBRDFs with respect to their dependencies on wavelength, polarization state, material type, and illumination/viewing geometry. We further perform numerical analyses, including principal component analysis (PCA) and the development of a neural hpBRDF representation for continuous modeling.
XR






DescriptionTentacus is a mixed reality ritual that explores decentralized intelligence and collective embodiment, where each dancer becomes a tentacle by wearing soft-textile gloves with embedded smartphones that function as both sensors and actuators. They negotiate movement through connection to a circular ring, becoming a collective, fluid tentacular being.
Poster






DescriptionAutomatic 3D object scaling is important but difficult. We propose an image-similarity-based estimation method that delivers faster and more precise model resizing for AI model and scene generation.
XR






DescriptionThis XR system supports aspiring childcare professionals in developing picture-book storytelling skills. It analyzes users’ voice, posture, and gaze in real time, allowing virtual child avatars to respond dynamically and provide intuitive feedback, helping learners improve their storytelling techniques before entering real-world childcare settings.
Technical Papers


DescriptionGenerating artistic and coherent 3D scene layouts is crucial in digital content creation. Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models face challenges in producing content with richness and diversity. Furthermore, approaches that utilize large language models frequently lack robustness and fail to accurately capture complex spatial relationships. To address these challenges, this paper presents a novel vision-guided 3D layout generation system.
We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. Subsequently, we employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We then develop a robust image parsing module to recover the 3D layout of scenes based on visual semantics and geometric information. Finally, we optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images. Extensive user testing demonstrates that our algorithm significantly outperforms existing methods in terms of layout richness and quality. The code and dataset will be available.
Technical Papers


DescriptionReverse engineering 3D computer-aided design (CAD) models from images is an important task for many downstream applications including interactive editing, manufacturing, architecture, robotics, etc. The difficulty of the task lies in the vast representational disparities between the CAD output and the image input. CAD models are precise, programmatic constructs that involve sequential operations combining a discrete command structure with continuous attributes -- making it challenging to learn and optimize in an end-to-end fashion.
Concurrently, input images introduce inherent challenges such as photometric variability and sensor noise, complicating the reverse engineering process. In this work, we introduce a novel approach that conditionally factorizes the task into two sub-problems. First, we leverage vision-language foundation models (VLMs), a finetuned Llama3.2, to predict the global discrete base structure with semantic information. Second, we propose TrAssembler, which, conditioned on the discrete structure with semantics, predicts the continuous attribute values. To support the training of TrAssembler, we further constructed an annotated CAD dataset of common objects from ShapeNet. Putting it all together, our approach and data demonstrate significant first steps towards CAD-ifying images in the wild.
Invited Poster
Poster






DescriptionThis work presents IMLS-Splatting, which establishes a direct, differentiable bridge between discrete point clouds and continuous surfaces and achieves end-to-end mesh optimization from multi-view images by integrating differentiable rasterization, avoiding any post-processing or regularization.
Poster






DescriptionA VR neuroanatomy system transforms medical education through immersive 3D brain visualization, AI-generated visual mnemonics, and gamified learning experiences that significantly enhance students' spatial understanding and knowledge retention.
Birds of a Feather






DescriptionThis session presents an immersive VR pedagogy project developed at HKUST to support undergraduate life science students in learning rat anatomy and dissection. By addressing ethical concerns, cost barriers, and student anxiety, the VR platform offers a safe, repeatable, and interactive learning environment. Students reported greater preparedness, reduced stress, and improved spatial understanding. Beyond this case study, the session highlights how immersive technologies foster empathy, creativity, and critical thinking, positioning VR as a powerful tool to align STEM education with human-centered values and responsible digital citizenship.
Educator's Forum



DescriptionThis paper presents an immersive VR pedagogy project developed at HKUST to support undergraduate life science students in learning rat anatomy and dissection. Traditional anatomy teaching often confronts ethical concerns, high material costs, and student anxiety when handling specimens. Our VR platform provides a safe, repeatable, and interactive environment where students can explore anatomical structures, practice procedures, and build confidence prior to the wet lab. Baseline surveys revealed that 64.6% of students felt more prepared after VR practice, while 79.3% expressed satisfaction with the learning experience, citing reduced anxiety and enhanced spatial understanding.
Beyond this case study, we position the project within broader educational futures. First, immersive HCI can serve as emotional support, reducing stress and fostering inclusive learning environments. Second, the integration of AI and multimodal tools enables students to engage in creative expression, combining scientific learning with storytelling and cross-disciplinary perspectives. Finally, as educational technologies increasingly collect and process student information, cultivating critical thinking about educational technology becomes essential.
By situating a concrete VR teaching practice within these three educational dimensions—emotional, creative, and critical—this contribution demonstrates not only the practical value of immersive tools in STEM education but also the importance of aligning technological innovation with human-centered values. Together, they point toward a holistic vision of education where VR, AI, and HCI collectively shape inclusive, thoughtful, and future-ready learning environments.
XR






DescriptionWe demonstrate innovations in using mixed and augmented reality headsets to improve how archaeologists work in the field. During excavation, archaeologists can immersively interact with 3D spatial data about a site or shape data about ancient artifacts. They can also collect and recall data hands-free.
Birds of a Feather






DescriptionTraditionally (since 2014), this session is about immersive visualization systems, software, and tools for science, research, scientific visualization, information visualization, art, design, and digital twins. Invited speakers and panelists discuss the newest initiatives and developments in immersive space as applied to data exploration, scientific discoveries, and more. The agenda is always available at https://vis-bof.matters.today
Invited Poster
Poster






DescriptionThis study achieved the first empirical validation of imperceptible gaze guidance in virtual reality by leveraging ocularity-based cues grounded in the V1 Saliency Hypothesis. It demonstrates that subtle interocular contrast can reflexively steer user attention without perceptible visual changes, preserving immersion and enabling new human-centered XR interaction methods.
Technical Papers


DescriptionWe present an image-space control variate technique to improve Monte Carlo (MC) integration-based rendering. Our method selects spatially nearby pixel estimates as control variates to exploit spatial coherence among pixel estimates in a rendered image without requiring analytic modeling of the control variate functions. Employing control variates is a classical and well-established technique for variance reduction in MC integration, typically relying on the assumption that the expectations of control variates are readily obtainable. When this condition is met, control variate theory offers a principled framework for optimizing their use by adjusting coefficients that determine the relative contribution of each control variate. However, our image-space approach introduces a technical challenge, as the expectations of the pixel-based control variates are unknown and must be estimated from additional MC samples, which are unbiased but inherently noisy. In this paper, we propose a control variate estimator designed to optimally leverage such imperfect control variates by relaxing the traditional requirement that their expectations are known. We demonstrate that our approach, which estimates the optimal coefficients while explicitly accounting for uncertainty in the expectation estimates, effectively reduces the variance of MC rendering across various test scenes.
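The classical known-expectation setting the abstract builds on (and then relaxes) can be sketched as follows; the integrand, control variate, and sample count are illustrative assumptions, not from the paper:

```python
import math
import random

def cv_estimate(f, h, h_mean, samples):
    # Control-variate Monte Carlo: E[f] is estimated as
    # mean(f) - c * (mean(h) - E[h]), with the classical optimal
    # coefficient c = Cov(f, h) / Var(h) fitted from the same samples.
    fs = [f(u) for u in samples]
    hs = [h(u) for u in samples]
    mf = sum(fs) / len(fs)
    mh = sum(hs) / len(hs)
    cov = sum((a - mf) * (b - mh) for a, b in zip(fs, hs)) / (len(fs) - 1)
    var = sum((b - mh) ** 2 for b in hs) / (len(hs) - 1)
    c = cov / var
    return mf - c * (mh - h_mean)

# Toy example: integrate e^u over [0, 1] (true value e - 1) using the
# control variate h(u) = u, whose expectation 1/2 is known exactly.
rng = random.Random(1)
samples = [rng.random() for _ in range(5000)]
estimate = cv_estimate(math.exp, lambda u: u, 0.5, samples)
```

The abstract's image-space setting replaces the exactly known `h_mean` with a noisy MC estimate from neighboring pixels, which is precisely where this standard optimal-coefficient derivation stops applying.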
Technical Papers


DescriptionThis paper proposes a novel simulation approach integrating implicit integration with BDEM for faster, more stable, and accurate fracture simulation. It introduces an optimization-based integrator and manifold optimization to accelerate quaternion-constrained systems, outperforming FEM/MPM in scale consistency and collision effects with 2.1–9.8× speedup over explicit BDEM.
Invited Poster
Poster






DescriptionDiffusion models now generate diverse character motion from artist-friendly controls such as Bezier curves but lack the temporal flexibility needed for professional animation. This work introduces the Implicit Bézier Motion Model (IBMM), allowing precise, sparse joint control at arbitrary timings. IBMM is exposed to all control point configurations and includes a quantitative ease-in/out measure for artistic control.
Technical Papers


DescriptionWe present a novel implicit porous flow solver using SPH, which maintains fluid incompressibility and is able to model a wide range of scenarios, driven by strongly coupled solid-fluid interaction forces. Many previous SPH porous flow methods reduce particle volumes as they transition across the solid-fluid interface, resulting in significant stability issues. We instead allow fluid and solid to overlap by deriving a new density estimation. This further allows us to extend SPH pressure solvers to take local porosity into account and results in strict enforcement of incompressibility. As a result, we can simulate porous flow using physically consistent pressure forces between fluid and solid. In contrast to previous SPH porous flow methods, which use explicit forces for internal fluid flow, we employ implicit non-pressure forces. These we solve as a linear system and strongly couple with fluid viscosity and solid elasticity. We capture the most common effects observed in porous flow, namely drag, buoyancy and capillary action due to adhesion. To achieve elastic behavior change based on local fluid saturation, such as bloating or softening, we propose an extension to the elasticity model. We demonstrate the efficacy of our model with various simulations that showcase the different aspects of porous flow behavior. To summarize, our system of strongly coupled non-pressure forces and enforced incompressibility across overlapping phases allows us to naturally model and stably simulate complex porous interactions.
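For context, the classic SPH density summation that such solvers build on is straightforward. The sketch below is illustrative Python using the standard 3D cubic-spline kernel; it is the textbook formulation, not the paper's porosity-aware density estimate:

```python
import math

def cubic_spline_kernel(r, h):
    """Standard 3D cubic-spline SPH kernel with support radius h."""
    q = r / h
    sigma = 8.0 / (math.pi * h ** 3)  # 3D normalization constant
    if q <= 0.5:
        return sigma * (6.0 * (q ** 3 - q ** 2) + 1.0)
    if q <= 1.0:
        return sigma * 2.0 * (1.0 - q) ** 3
    return 0.0

def sph_density(i, positions, masses, h):
    """Classic SPH density summation: rho_i = sum_j m_j W(|x_i - x_j|, h)."""
    xi = positions[i]
    return sum(m * cubic_spline_kernel(math.dist(xi, xj), h)
               for xj, m in zip(positions, masses))
```

The paper modifies this density estimate so that overlapping fluid and solid particles are handled consistently, which the standard summation above does not do.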
Technical Papers


DescriptionThe efficient simulation of incompressible fluids remains a difficult and open problem. Prior works often make various tradeoffs between incompressibility, stability, and cost. Yet, it is rare to obtain all three. In this paper, we introduce a novel incompressible Smoothed Particle Hydrodynamics (SPH) scheme which uses a second-order implicit descent scheme to optimize a variational energy specially formulated to approach incompressibility. We demonstrate that our method is superior in both incompressibility and stability with a minimal cost to computational budget. Furthermore, we demonstrate that our method is unconditionally stable even under extreme time steps, making it suitable for interactive applications.
Poster






DescriptionWe study MLP input angular parameterizations' impact on compact neural materials, showing suitable choices improve visual quality, especially for small MLPs, supported by extensive experiments and practical recommendations.
Technical Papers


DescriptionWe introduce a divergence-free nD vector noise defined as the n-dimensional cross product of the gradients of n-1 noise functions. We show that this vector noise function is divergence-free and hence volume preserving for any dimension n. Our method enables precise integration and extends to new settings by substituting noise functions with implicit surfaces, (hyper)surfaces, or custom functions. We demonstrate applications including image warping, surface texturing, noise bounded by implicit surfaces, anisotropic curl-noise, and high-dimensional point jittering up to 7D.
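The underlying identity, div(grad f x grad g) = 0 for any smooth scalar fields f and g, is easy to verify numerically. The sketch below is illustrative Python, not the paper's implementation: simple trigonometric functions stand in for the two noise functions, and finite differences replace analytic gradients:

```python
import math

# Smooth stand-ins for two scalar noise functions f and g.
def f(x, y, z):
    return math.sin(1.3 * x + 0.7 * y) + math.cos(2.1 * z)

def g(x, y, z):
    return math.cos(0.9 * x) * math.sin(1.7 * y + 0.4 * z)

H = 1e-4  # step for the inner finite-difference gradients

def grad(fn, p):
    x, y, z = p
    return [(fn(x + H, y, z) - fn(x - H, y, z)) / (2 * H),
            (fn(x, y + H, z) - fn(x, y - H, z)) / (2 * H),
            (fn(x, y, z + H) - fn(x, y, z - H)) / (2 * H)]

def vector_noise(p):
    """Divergence-free field: the cross product of the two gradients."""
    a, b = grad(f, p), grad(g, p)
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def divergence(p, h=1e-3):
    """Central-difference divergence; analytically zero everywhere."""
    x, y, z = p
    return ((vector_noise((x + h, y, z))[0] - vector_noise((x - h, y, z))[0])
            + (vector_noise((x, y + h, z))[1] - vector_noise((x, y - h, z))[1])
            + (vector_noise((x, y, z + h))[2] - vector_noise((x, y, z - h))[2])
            ) / (2 * h)
```

Swapping `f` and `g` for gradient noise functions, signed distance functions of implicit surfaces, or custom fields preserves the divergence-free property, which is what enables the applications the abstract lists.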
Poster






DescriptionOur method, LGSfM, is the first approach to optimize SfM-based 3D models for visual localization. Experiments show LGSfM achieves consistent improvements in real-world localization across both indoor and outdoor datasets.
Poster






DescriptionWe propose a three-stage method that improves on prior stitching-based depth generation: it introduces SeamRemove for seam removal with state-of-the-art accuracy and applies existing enhancement techniques to produce visually superior panoramic depth maps.
Computer Animation Festival






DescriptionIn Half is not just a 3D digital animated short; it is a cinematic language laboratory that invites the SIGGRAPH Asia community to reconsider what an image can truly convey. The film opens with a pulsing silence, and each frame functions like a note in a visual score. What unfolds between the visuals and sound design doesn’t need to be explained; it is felt. This image-driven, sound-centered approach redefines exposition, moving away from text-based storytelling and inviting the audience to build meaning alongside the screen.
Technically, In Half is a tightly interwoven orchestration. Lighting, composition, and image texture interact seamlessly with a sound design that breathes, sighs, and at times chills through silence. Flecks of light, shifting shadows, the rhythmic pulse of a heartbeat or a distant hum all become narrators in their own right. Symbolic motifs across the narrative form a choreography of tension and union between inner and outer worlds, creating a code each viewer deciphers personally. This fosters an interdisciplinary dialogue between visual art, media psychology, and digital technology.
On a production level, In Half showcases a fully integrated pipeline where image, rhythm, and atmosphere are co-dependent. Camera work, texture design, and a new character design line function as a unified system. Subtle shifts in color or texture shape emotional perception, while changes in the soundscape reframe entire scenes. This method allows for universal accessibility while retaining cultural specificity, using symbolic elements as bridges between local communities and global audiences, enriching discussions around representation in digital media.
For SIGGRAPH Asia 2025, In Half offers a concrete framework for exploring how modern audiences emotionally engage with visual and sonic stimuli, highlighting animation’s power as both immersive experience and pixel-driven narrative art.
Technical Papers


DescriptionWe pose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening to interpolate two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation. Given two monocular RGB images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D, without making assumptions on the object category, motion type, length, or complexity. To handle such arbitrary and diverse motions, we utilize a foundational video interpolation model for motion prediction. However, large frame-to-frame motion gaps can lead to ambiguous interpretations. To this end, we employ a hierarchical approach to identify keyframes that are visually close to the input states while exhibiting significant motions, then generate smooth fragments between them. For each fragment, we construct a 3D representation of the keyframe using Gaussian Splatting (3DGS). The temporal frames within the fragment guide the motion, enabling their transformation into dynamic 3DGS through a deformation field. To improve temporal consistency and refine the 3D motion, we expand the self-attention of multi-view diffusion across timesteps and apply rigid transformation regularization. Finally, we merge the independently generated 3D motion segments by interpolating boundary deformation fields and optimizing them to align with the guiding video, ensuring smooth and flicker-free transitions. Through extensive qualitative and quantitative experiments as well as a user study, we demonstrate the effectiveness of our method and design choices.
Technical Papers


DescriptionRecent advances in diffusion models have enhanced multimodal-guided visual generation, enabling customized subject insertion that seamlessly "brushes" user-specified objects into a given image guided by textual prompts. However, existing methods often struggle to insert customized subjects with high fidelity and align results with the user's intent through textual prompts. In this work, we propose "In-Context Brush", a zero-shot framework for customized subject insertion by reformulating the task within the paradigm of in-context learning. Without loss of generality, we formulate the object image and the textual prompts as cross-modal demonstrations, and the target image with the masked region as the query. The goal is to inpaint the target image with the subject while aligning with the textual prompts, without model tuning. Building upon a pretrained MMDiT-based inpainting network, we perform test-time enhancement via dual-level latent space manipulation: intra-head latent feature shifting within each attention head that dynamically shifts attention outputs to reflect the desired subject semantics, and inter-head attention reweighting across different heads that amplifies prompt controllability through differential attention prioritization. Extensive experiments and applications demonstrate that our approach achieves superior identity preservation, text alignment, and image quality compared to existing state-of-the-art methods, without requiring dedicated training or additional data collection.
Computer Animation Festival






DescriptionEun-sang, a no-name artist, creates art that features a fly.
She is eating convenience food as usual when she senses a strange texture and learns that it was a fly. Soon, she is plagued by anxiety from believing that the fly is living in her stomach and makes all kinds of efforts to digest the fly.
Technical Papers


DescriptionWe introduce a general, scalable computational framework for multi-axis 3D printing based on implicit neural fields (INFs), termed INF-3DP, that unifies all stages of toolpath generation and global collision-free motion planning. In our pipeline, input models are represented as signed distance fields, with fabrication objectives—such as support-free printing, surface finish quality, and extrusion control—directly encoded in the optimization of an implicit guidance field. This unified approach enables toolpath optimization across both surface and interior domains, allowing shell and infill paths to be generated via implicit field interpolation. The printing sequence and multi-axis motion are then jointly optimized over a continuous quaternion field. Our continuous formulation constructs the evolving printing object as a time-varying SDF, supporting differentiable global collision handling throughout INF-based motion planning. Compared to explicit-representation-based methods, INF-3DP achieves up to two orders of magnitude speedup and significantly reduces waypoint-to-surface error. We validate our framework on diverse, complex models and demonstrate its efficiency with physical fabrication experiments using a robot-assisted multi-axis system.
Technical Papers


DescriptionGenerating realistic and controllable 3D human avatars is a long-standing challenge. The difficulty increases when covering a broad range of attributes such as ethnicity, age, clothing styles, and detailed body shapes. Capturing and annotating large-scale human datasets for training generative models is prohibitively expensive and limited in both scale and diversity. The central question we address in this paper is: Can we distill existing foundation models to generate theoretically unbounded richly annotated 3D human data? We introduce InfiniHuman, a novel framework to distill these models synergistically, to generate richly annotated human data with minimal cost and theoretically unlimited scalability. Specifically, we propose InfiniHumanData, a fully automatic pipeline that leverages vision-language and image generation models to create a large-scale multi-modal dataset. Remarkably, users cannot distinguish our automatically generated identities from scan renderings. InfiniHumanData contains 111K identities and covers unprecedented diversity in ethnicity, age, clothing styles, and more. Each identity is annotated with multi-granularity text descriptions, multi-view RGB images, detailed clothing images, and SMPL body shape parameters. Based on this, we learn InfiniHumanGen, a diffusion-based generative pipeline conditioned on text, body shape, and clothing assets. InfiniHumanGen enables fast, realistic, and precisely controllable avatar generation. Extensive experiments demonstrate that InfiniHuman significantly surpasses existing state-of-the-art methods in terms of visual quality, generation speed, and controllability. Importantly, our approach democratizes high-quality avatar generation with fine-grained control at infinite scale through a practical and affordable solution. 
To facilitate future research, we will publicly release our automatic data generation pipeline and the comprehensive dataset InfiniHumanData, and the generative models InfiniHumanGen.
Featured Session



DescriptionReflecting on a lifelong journey of innovations in the computer graphics industry - from the inception of CG production in the early 1980s to the current rise of AI - Richard will explore the challenges and potentials to inspire a new generation of innovators standing at the crossroads of creativity and technology.
Technical Papers


DescriptionRecent works use diffusion models for realistic co-speech video synthesis from audio, with various applications like video creation and virtual agents. However, existing diffusion-based methods are very slow due to many denoising steps and costly attention mechanisms, preventing real-time deployment. In this work, we aim to distill a many-step diffusion video model into a few-step student model. Unfortunately, directly applying recent model distillation methods degrades video quality and falls short of real-time performance. To address these issues, we introduce a new video distillation method that leverages input human pose conditioning for both attention and loss functions. For attention, we propose sparse attention across frames, using accurate correspondence from input human pose keypoints to guide attention to salient dynamic regions like the speaker's face, hands, and upper body. This input-aware sparse attention reduces redundant computations and strengthens temporal correspondences of body parts, improving inference efficiency and motion coherence. To further improve visual quality, we introduce an input-aware region loss that enhances lip synchronization and hand motion realism. Integrating our input-aware sparse attention and region loss, our method achieves real-time performance with improved visual quality compared to recent audio-driven and input-driven methods. We also conduct an extensive ablation study showing the effectiveness of our designs.
Educator's Forum



DescriptionThis paper reflects on the transition from the legacy Digital Design major to the Animation, Visual Effects, and Game Design (AVG) degree at Auckland University of Technology (AUT). It examines how first-year papers DESN511 and DESN512 support student engagement and pathway selection through inclusive, interdisciplinary, and culturally grounded teaching. Students explore animation, VFX, and game design through hands-on, concept-led projects that centre identity, storytelling, and experimentation. Activities such as collaborative storytelling and tactile self-portraiture foster connection and inclusion, aligning with Mātauranga Māori and the Bachelor of Design’s educational philosophy. These experiences help students feel seen and supported, with reflective annotations and anecdotal feedback indicating increased confidence and clarity in creative direction. The paper also addresses challenges, including cohort growth, neurodiversity, and staffing needs, proposing scalable strategies that balance foundational skill development with creative freedom. Animation is positioned as a shared language across disciplines, while VFX and game design offer entry points into narrative and interactive expression. By embedding writing into creative practice and fostering a culture of reflection, the AVG curriculum supports identity formation and interdisciplinary fluency, contributing to broader conversations about inclusive and future-focused design education.
Birds of a Feather






DescriptionHosted by founding members from NVIDIA, Meta and SIT of the Intelligent Immersification in Metaverse (I²M) ACM Thematic Chapter, this BoF session continues our ongoing effort to bring together researchers, practitioners, and enthusiasts at the intersection of intelligent systems and immersive technologies. Building on the success of prior I²M BoF sessions at SIGGRAPH Asia 2023 and 2024, this year’s theme highlights Immersion with LLMs, showcasing key developments and emerging trends drawn from the chapter’s academic and industry activities over the past year, serving as a platform for vibrant discussion and knowledge exchange.
Invited Poster
Poster






DescriptionIntentMotion is a novel framework that generates human motion in 3D scenes from instructions. We first introduce the Intention-Guided Contact Field (IGCF), which explicitly aligns parsed language roles with spatial contact regions through a hierarchical attention mechanism. IGCF is jointly trained with a diffusion-based motion generator, allowing contact predictions to adapt dynamically through gradient feedback. To improve controllability and the physical plausibility of the motion, we further propose an Intention-Aware Diffusion Model, which decouples high-level semantic planning from low-level contact refinement. Contact cues are utilized to guide the synthesis of a coarse trajectory, followed by refining detailed pose sequences under IGCF supervision.
Poster






DescriptionWe propose a novel drawing system that integrates the entire process of interactively generating and observing 3DCG scenes on the tabletop from 2D sketches.
Educator's Forum



DescriptionThis presentation highlights projects implemented in my Interdisciplinary Digital Art course at the University of Toronto Scarborough. These projects encourage students to explore new technologies critically and creatively; they introduce them to practical skills and spark important reflections on the cultural, ethical, and social implications of ubiquitous digital systems. Among the technologies addressed: networking, data mining, surveillance, information aesthetics, social media culture, and locative media.
My approach to teaching encourages students to become active self-media creators and participants in global digital culture. Blending creative experimentation with technological aptitude, my pedagogical framework aims to provide students with the tools they need to engage critically with evolving media environments while developing interdisciplinary approaches to artmaking. To support this, I design innovative curricula that incorporate graphic and experimental art strategies informed by emerging technologies. By framing assignments around current innovations, students connect theoretical concepts to experiential learning. The rapid pace of change in digital media motivates me to continuously re-evaluate both the learning experiences I create for students and my own art practice.
Technical Papers


DescriptionThermal imaging, as a promising approach for scalable and robust scene perception, is invaluable for many applications in various fields, such as architecture and building physics. Although many recent works have demonstrated the capability to incorporate thermal images into radiance field methods, they typically do not explicitly model how radiation interacts and reflects within the scene before reaching the camera, which is essential for inferring the thermal physics and properties of objects in a scene. Using Gaussian primitives as the scene representation, our method estimates surface temperature and material properties to generate infrared renderings that closely match the input images. Taking inspiration from radiosity and hemicube rasterization, our method decomposes the outgoing radiation from each Gaussian primitive into two parts: self-emission, and reflection originating from other primitives and the environment. This formulation allows us to simulate radiation under novel heating conditions and to find the best-fit temperature and material parameters given thermal images. The method is verified using both synthetic and real capture datasets.
Technical Papers


DescriptionA K-hedral tiling of a 2D finite domain is a covering of the domain with tiles, without gaps or overlaps, where each tile is congruent to one of K distinct shapes called prototiles. K, the number of prototiles, should be as small as possible, both for a consistent tiling appearance and to reduce fabrication cost, e.g., by molding. Typically, a forward approach is adopted to produce K-hedral tilings by prescribing a set of prototiles and placing prototile instances (i.e., tiles) to cover the input domain. However, the prescribed prototile set may not be sufficient to tile the domain (for small K) or may lead to tiling results with more prototiles than needed (for large K).
In this work, we formulate a new tiling problem called inverse tiling for producing K-hedral tilings in 2D finite domains, where the prototile set is inversely modeled to fit the input domain instead of being prescribed. Since the prototile set is unknown, inverse tiling allows exploring a large search space to discover a minimized number of prototiles for tiling the input domain. To solve the inverse tiling problem, we propose a computational approach that progressively builds the prototile set while tiling the input domain, starting from a prototile set with a single element. Once a tiling result is obtained, the approach further reduces the number of prototiles by locally re-tiling the input domain to eliminate prototiles with few instances. We demonstrate the effectiveness of our inverse tiling approach on a variety of finite domains, evaluate its performance in scalability and K minimization, and compare it quantitatively with forward tiling approaches.
Poster






DescriptionWe make video signal processing invertible, for easily accessible raw data.
Invited Poster
Poster






DescriptionThis paper introduces an online self-exploration loop that enables multimodal agents to self-improve via AI-generated tasks and LLM-verified preference tuning without human annotations.
Technical Papers


DescriptionA core operation in Monte Carlo volume rendering is transmittance estimation: Given a segment along a ray, the goal is to estimate the fraction of light that will pass through this segment without encountering absorption or out-scattering. A naive approach is to estimate optical depth τ using unbiased ray marching and to then use exp(-τ) as transmittance estimate. However, this strategy systematically overestimates transmittance due to Jensen's inequality. On the other hand, existing unbiased transmittance estimators either suffer from high variance or have a cost governed by random decisions, which makes them less suitable for SIMD architectures. We propose a biased transmittance estimator with significantly reduced bias compared to the naive approach and a deterministic and low cost. We observe that ray marching with stratified jittered sampling results in estimates of optical depth that are nearly normal-distributed. We then apply the unique minimum variance unbiased (UMVU) estimator of exp(-τ) based on two such estimates (using two different sets of random numbers). Bias only arises from violations of the assumption of normal-distributed inputs. We further reduce bias and variance using a variance-aware importance sampling scheme. The underlying theory can be used to estimate any analytic function of optical depth. We use this generalization to estimate multiple importance sampling (MIS) weights and introduce two integrators: Unbiased MIS with biased MIS weights and a more efficient but biased combination of MIS and transmittance estimation.
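The Jensen's-inequality bias described above is easy to reproduce: since exp is convex, E[exp(-tau_hat)] >= exp(-E[tau_hat]) = exp(-tau) for any unbiased, non-degenerate estimator tau_hat. The illustrative Python below uses a toy extinction function (not from the paper) and shows the naive estimator landing above the true transmittance:

```python
import math, random

def optical_depth_jittered(sigma_t, t0, t1, n, rng):
    """Unbiased ray-marching estimate of optical depth tau with
    stratified jittered sampling (one random sample per stratum)."""
    dt = (t1 - t0) / n
    return dt * sum(sigma_t(t0 + (i + rng.random()) * dt) for i in range(n))

# Toy extinction coefficient with a closed-form optical depth on [0, 2].
sigma = lambda t: 0.5 + 0.4 * math.sin(3.0 * t)
true_tau = 1.0 + (0.4 / 3.0) * (1.0 - math.cos(6.0))

rng = random.Random(7)
trials = 50000
naive = sum(math.exp(-optical_depth_jittered(sigma, 0.0, 2.0, 4, rng))
            for _ in range(trials)) / trials
# By Jensen's inequality (exp is convex), E[exp(-tau_hat)] >= exp(-tau):
# 'naive' sits slightly above the true transmittance exp(-true_tau).
```

The paper's estimator instead models the near-normal distribution of the jittered optical-depth estimates and applies a minimum-variance unbiased estimator of exp(-tau) to largely remove this bias.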
Computer Animation Festival






DescriptionIn the heart of a furniture store, Jeanne and Jean Jean, a couple in their late thirties, debate intensely, in a control room, about their decoration desires.
Exasperated, Jean Jean goes to take a bathroom break. Meanwhile Jeanne witnesses the disturbing ballet of the other couples. On his return, full of good intentions, Jean Jean says he is ready for a decorative compromise, but frightened, Jeanne leaves without even listening to him. A dramatic chase then follows: run away from me, I'm following you, follow me, I'm running away from you, in the large aisles of the store.
By chance, they come face to face among the duvet covers. Moved, the lovebirds then reveal all their fears and doubts. But they had not counted on a crowd of harassing customers who comment on them and try to influence them! Exhausted by all these emotions, Jeanne and Jean Jean lie down on two promotional mattresses. Then an idea comes to them: not to live together, but to continue to love each other.
Technical Papers


DescriptionIn music-to-motion generation, the interplay between movements and music tempo variations significantly influences the emotional expressiveness and realism of performances. However, tempo-changing mechanisms remain underexplored in neural network-based music-to-motion tasks due to the scarcity of relevant datasets. Therefore, in this paper, we propose to use novel music features explicitly representing tempo variations, and introduce a dataset, JoruriPuppet, incorporating the Japanese traditional Jo-Ha-Kyu principle characterized by expressive tempo changes. Furthermore, we design three metrics to quantitatively evaluate the synchronization and expressiveness of generated motions. Experiments on our dataset highlight the limitations of SOTA methods in capturing fine-grained tempo changes. We demonstrate that integrating tempo-changing features into them improves neural network-based music-to-motion performance across existing datasets, validating the general effectiveness and applicability of our research.
Technical Papers


DescriptionMarkov chain Monte Carlo (MCMC) algorithms are indispensable when sampling from a complex, high-dimensional distribution by a conventional method is intractable. Even though MCMC is a powerful tool, it is also hard to control and tune in practice. Simultaneously achieving both rapid local exploration of the state space and efficient global discovery of the target distribution is a challenging task.
In this work, we introduce a novel continuous-time MCMC formulation to the computer science community. Generalizing existing work from the statistics community, we propose a novel framework for adjusting an arbitrary family of Markov processes, used for local exploration of the state space only, to an overall process that is invariant with respect to a target distribution.
To demonstrate the potential of our framework, we focus on a simple yet insightful application in light transport simulation. As a by-product, we introduce continuous-time MCMC sampling to the computer graphics community. We show how any existing MCMC-based light transport algorithm can be seamlessly integrated into our framework. We demonstrate both empirically and theoretically that the integrated version is superior to the ordinary algorithm. In fact, our approach converts any existing algorithm into a highly parallelizable variant with shorter running time, smaller error, and less variance.
Code and data for this paper are at https://github.com/sascha-holl/jump-restore-light-transport.
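For readers new to the area, the discrete-time baseline that such continuous-time formulations generalize can be sketched as a minimal random-walk Metropolis-Hastings chain targeting a standard normal (an illustrative toy, not the paper's jump-restore process):

```python
import math
import random

def metropolis_hastings(log_density, x0=0.0, step=1.0, n=50_000, seed=3):
    """Random-walk Metropolis-Hastings: local Gaussian proposals are
    accepted with probability min(1, pi(x') / pi(x)), which leaves the
    target distribution invariant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        log_alpha = log_density(proposal) - log_density(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return samples

# Target: standard normal, log pi(x) = -x^2 / 2 (up to a constant).
chain = metropolis_hastings(lambda x: -0.5 * x * x)
mean = sum(chain) / len(chain)
var = sum((s - mean) ** 2 for s in chain) / len(chain)
```

The chain's empirical mean and variance approach the target's 0 and 1; the tension the abstract describes is that the local step size governs both exploration speed and acceptance rate.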
Technical Papers


DescriptionExisting 3D Gaussian splatting (3DGS) based methods tend to produce blurriness and artifacts on delicate textures (small objects and high-frequency textures) in aerial large-scale scenes. The reason is that delicate textures usually occupy a relatively small number of pixels, and the accumulated gradients from the loss function are insufficient to trigger the splitting of 3DGS. To minimize the rendering error, the model instead uses a small number of large Gaussians to cover these details, resulting in blurriness and artifacts. To solve this problem, we propose a novel hierarchical Gaussian representation: JumpingGS. JumpingGS assigns different levels to Gaussians to establish a hierarchical representation. Low-level Gaussians are responsible for the coarse appearance, while high-level Gaussians are responsible for the details. First, we design a splitting strategy that allows low-level Gaussians to skip intermediate levels and directly split into appropriate high-level Gaussians for delicate textures. This level-jump splitting ensures that the weak gradients of delicate textures can always activate a higher level instead of being ignored by the intermediate levels. Second, JumpingGS reduces the gradient and opacity thresholds for density control according to the representation levels, which improves the sensitivity of high-level Gaussians to delicate textures. Third, we design a novel training strategy to detect training views in hard-to-observe regions, and train the model multiple times on these views to alleviate underfitting. Experiments on aerial large-scale scenes demonstrate that JumpingGS outperforms existing 3DGS-based methods, accurately and efficiently recovering delicate textures in large scenes.
Birds of a Feather






DescriptionThis K-BOF session presents a comprehensive overview of computer graphics in South Korea, tracing the history of CG research and its development within the global community. It highlights major international activities, introduces a leading research center and a promising emerging researcher, and examines national R&D policies with a focus on fostering international collaboration. The session offers participants a unique chance to engage with Korean experts and explore new opportunities for scientific exchange, partnership, and joint development across emerging technology domains. (* We hope to be the final session of the day to connect with the Korea Night event.)
Computer Animation Festival






DescriptionOur biggest challenge was to build a semi-realistic world where animation and atmosphere worked seamlessly together. Because KAMARADE’s main theme is loneliness, we chose a slower pace and an intentionally empty framing. We aimed for a retro-futuristic, Soviet-inspired aesthetic, drawing from the visual and musical codes of the era. In post-production, we added space-like glows and film grain to enrich the mood. We wanted to surprise our audience and show them something that felt familiar but mysterious. We tried to give the viewer just enough information, holding back key details to preserve the impact of the final twist.
Birds of a Feather






DescriptionThis Birds of a Feather is for attendees interested in the advancements in digital content creation that have transformed entertainment, gaming, and interactive media. Motion capture, virtual production, and MetaHumans now underpin storytelling and character design. Mocap systems have evolved from marker-based to AI-enhanced full-body/facial capture, producing lifelike performances in real time. Virtual production, powered by Unreal Engine, enables filmmakers to interact with digital sets rendered on LED volumes, reducing costs and improving immersion. MetaHumans provide a frontier in character creation, coupled with AI for responsive conversations. Challenges still remain, but these technologies promise ever more immersive digital experiences.
Games






DescriptionWe brought our KJ-Stick motion controller to the Hong Kong Book Fair and it was a huge hit! Families lined up to play "KJ-Stick Jump Jump Jump," a Roblox obby where you jump in real life to jump in the game.
We're merging the digital world of Roblox with physical fitness, creating laughter, exercise, and unforgettable family moments. This is the new future of active play!
Computer Animation Festival






DescriptionKenopsia follows a man confronting his past and the weight of his mistakes, expressed through two contrasting artistic worlds. Reality is rendered in a dimensional, semi-painted 3D style, while haunting visions take on a painterly, 2D brushed aesthetic. This contrast emphasizes the tension between life as it is and the inner world that torments him.
Central to these visions is the raven, a symbolic representation of the protagonist’s inner demons. Confined to this painterly realm, it intensifies the emotional impact and anchors the audience in the psychological dimension of the story.
By merging two different visual styles, Kenopsia explores memory, guilt, and imagination through a hybrid approach to storytelling.
Birds of a Feather






DescriptionThis extended Khronos Fast Forward session combines high-level updates across all Khronos standards with deeper technical dives into Vulkan and Slang. The first half highlights recent advancements in open standards enabling cross-platform graphics, XR, and intelligent compute, including OpenXR, WebGL, glTF, SYCL, and more. The second half focuses on Vulkan’s continued evolution for high-performance graphics and compute, and the expanding role of the open-source Slang shading language in neural and differentiable rendering. Join Khronos experts and developers to explore how these technologies are shaping the future of portable, high-performance visual computing.
Technical Papers


DescriptionKinetic multiphase flow solvers have recently demonstrated exquisitely complex and turbulent fluid phenomena involving splashing and bubbling. However, they require full simulation of both the liquid phase and the air to capture a large spectrum of fluid behaviors. Moreover, they rely on diffuse interface tracking to properly account for the interfacial forces involved in fluid-air interactions. Consequently, simulating visually appealing fluids is extremely compute-intensive given the resolution required to capture small bubbles, and foam simulation is unattainable with this family of methods. While water simulation involves density and viscosity differences between the two phases so large that one can safely ignore the dynamics of air, so-called kinetic free-surface solvers that only consider the liquid motion have been unable to reproduce the full gamut of turbulent fluid behaviors, being often unstable for even moderately complex scenarios. By revisiting kinetic solvers using sharp interfaces and incorporating recent advances in single-phase and multiphase LBM solvers, we propose a free-surface kinetic solver, which we call HOME-FREE LBM, that not only handles turbulence, glugging, and bubbling, but even foam, where bubbles stick to each other through surface tension. We demonstrate that our fluid simulator allows for fast and robust bubble growth, breakup, and coalescence, at a fraction of the computational time that existing CG fluid solvers require.
Technical Papers


DescriptionHand-drawn vector sketches often contain implied lines, imprecise intersections, and unintended gaps, making it challenging to identify closed regions for colorization. These challenges become more pronounced as the number of strokes increases. In this paper, we present KISSColor, a novel method for inferring users’ intended closed regions. Specifically, we propose intuitive stroke stretching by extending open strokes along tangent isolines of winding number fields, which provably form geometrically aligned closed regions. Extending all open strokes can lead to overly fragmented regions due to redundant intersections. While a Mixed Integer Programming (MIP) formulation helps reduce redundancy, it is computationally expensive. To improve efficiency, we introduce kinetic stroke stretching, which grows all strokes simultaneously and prioritizes early intersections using a kinetic data structure. This approach preserves stylistic ambiguity for lines requiring long extensions. Based on the growth results, redundant regions are suppressed to minimize fragmentation. We conduct extensive experiments demonstrating the effectiveness of KISSColor, which generates more intuitive partitions, especially for imprecise sketches (see teaser figure). Our code and data will be released upon publication.
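The winding-number machinery such methods build on can be sketched for the simplest case, a closed polyline, by summing the signed angles subtended at a query point (a generic illustration; KISSColor itself works with winding-number fields of open strokes):

```python
import math

def winding_number(point, polygon):
    """Sum of signed angles from the query point to consecutive polygon
    vertices, divided by 2*pi: ~1 for points enclosed once by a CCW
    loop, ~0 for points outside."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i][0] - px, polygon[i][1] - py
        bx, by = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # Signed angle between the two edge vectors: atan2(cross, dot).
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total / (2.0 * math.pi)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # CCW unit square
inside = winding_number((0.5, 0.5), square)    # close to 1.0
outside = winding_number((2.0, 0.5), square)   # close to 0.0
```

Thresholding or tracing isolines of this field is one standard way to turn loose strokes into candidate closed regions.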
Poster






DescriptionTactile feedback on the neck is considered to hold great potential due to the neck's unique characteristics as a sensory organ. However, this area remains largely unexplored at present.
To address this potential, we propose koloHart, a neck-worn tactile choker designed to utilize the neck's distinctive properties for both sensory substitution and sensory enhancement. koloHart is a wearable haptic device equipped with eight linear resonant actuators (LRAs) arranged around the neck, capable of delivering fine vibrotactile feedback from any direction across 360 degrees. This poster showcases a variety of potential application scenarios that highlight the expressive and immersive potential of neck-based tactile feedback.
Poster






DescriptionKomorebi-mediated ambient notifications use dappled light to minimize attentional capture. Mechanical and optical prototypes steer light movement and shadow hardness, offering gentle cues for time, status, and personal prompts in everyday spaces.
Games






DescriptionSEDAP! is a co-op cooking-combat adventure that combines the chaos of a collaborative kitchen and the wonders of adventuring into an unexplored world. Embark on an exciting culinary journey, whip up delectable delicacies, and serve your way through a fantastical reimagination of Southeast Asia!
Technical Papers


DescriptionEmbedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. This allows for a more intuitive human-computer interaction, enabling querying or editing scenes using natural language, and could potentially improve tasks like scene retrieval, navigation, and multimodal reasoning. While such capabilities could be transformative, in particular for large-scale scenes, we find that recent feature distillation approaches cannot effectively learn over massive Internet data due to challenges in semantic feature misalignment and inefficiency in memory and runtime. To this end, we propose a novel approach to address these challenges. First, we introduce extremely low-dimensional semantic bottleneck features as part of the underlying 3D Gaussian representation. These are processed by rendering and passing them through a multi-resolution, feature-based, hash encoder. This significantly improves efficiency both in runtime and GPU memory. Second, we introduce an Attenuated Downsampler module and propose several regularizations addressing the semantic misalignment of ground truth 2D features.
We evaluate our method on the in-the-wild HolyScenes dataset and demonstrate that it surpasses existing approaches in both performance and efficiency. Code will be available.
Technical Papers


DescriptionDifferentiable optics, an emerging paradigm that jointly optimizes optics and (optionally) image processing algorithms, has made many innovative optical designs possible across a broad range of imaging and display applications. Many of these systems utilize diffractive optical components for holography, PSF engineering, or wavefront shaping. Existing approaches have, however, mostly remained limited to laboratory prototypes, owing to a large quality gap between simulation and manufactured devices.
As such, we aim to lift the fundamental technical barriers to the practical use of learned diffractive optical systems. To this end, we propose a fabrication-aware design pipeline for diffractive optics fabricated by direct-write grayscale lithography followed by replication with nano-imprinting, which is directly suited for inexpensive mass production of large-area designs. We propose a super-resolved neural lithography model that can accurately predict the 3D geometry generated by the fabrication process. This model can be seamlessly integrated into existing differentiable optics frameworks, enabling fabrication-aware, end-to-end optimization of computational optical systems across a wide range of applications.
To tackle the computational challenges of such large-scale inverse designs at high resolution, especially the high memory consumption, we also devise a tensor-parallel compute framework centered on distributing large-scale FFT computation across many GPUs.
Using this combination of methods, we demonstrate large-scale diffractive optics designs up to 32.16 mm × 21.44 mm, simulated on grids of up to 128,640 by 85,760 feature points. We find adequate agreement between simulation and fabricated prototypes for applications such as holography and PSF engineering. We also achieve high image quality from an imaging system consisting of only a single diffractive optical element, with images processed only by a one-step inverse filter utilizing the simulated PSF. We believe our findings lift the fabrication limitations for real-world applications of diffractive optics and differentiable optical design.
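The Fourier-transform core of such wave-optics pipelines can be shown in miniature: under the Fraunhofer (far-field) approximation, the diffraction pattern of an aperture is, up to scaling, the Fourier transform of its transmission function. A pure-Python 1D sketch with a toy slit (illustrative scale only, nothing like the paper's 128,640-point grids):

```python
import cmath

def dft(samples):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

# 1D binary aperture: a centered slit open over 15 of 64 samples.
n = 64
aperture = [1.0 if abs(i - n // 2) < 8 else 0.0 for i in range(n)]

# Far-field intensity = |DFT(aperture)|^2 (Fraunhofer approximation);
# bin 0 is the on-axis (DC) lobe, which dominates for any
# non-negative aperture.
spectrum = dft(aperture)
intensity = [abs(c) ** 2 for c in spectrum]
```

Real pipelines replace the naive DFT with batched 2D FFTs, which is exactly why distributing FFT work across GPUs becomes the bottleneck at the resolutions quoted above.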
Technical Papers


DescriptionModeling 3D articulated objects with realistic geometry, textures, and kinematics is essential for a wide range of applications. However, existing optimization-based reconstruction methods often require dense multi-view inputs and expensive per-instance optimization, limiting their scalability. Recent feedforward approaches offer faster alternatives but frequently produce coarse geometry, lack texture reconstruction, and rely on brittle, complex multi-stage pipelines. We introduce LARM, a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images by jointly recovering detailed geometry, realistic textures, and accurate joint structures. LARM extends LVSM—a recent novel view synthesis (NVS) approach for static 3D objects—into the articulated setting by jointly reasoning over camera pose and articulation variation using a transformer-based architecture, enabling scalable and accurate novel view synthesis. In addition, LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories. Extensive experiments demonstrate that LARM outperforms state-of-the-art methods in both novel view and state synthesis as well as 3D articulated object reconstruction, generating high-quality meshes that closely adhere to the input images.
Poster






DescriptionWe propose "LayerPack", a method for training-free layer-wise frame interpolation.
Technical Papers


DescriptionImage vectorization is a powerful technique that converts raster images into vector graphics, enabling enhanced flexibility and interactivity. However, popular image vectorization tools struggle with occluded regions, producing incomplete or fragmented shapes that hinder editability. While recent advancements have explored optimization-based and learning-based layer-wise image vectorization, these methods face limitations in vectorization quality and flexibility. In this paper, we introduce LayerPeeler, a novel layer-wise image vectorization approach that addresses these challenges through a progressive simplification paradigm. The key to LayerPeeler's success lies in its autoregressive peeling strategy: by identifying and removing the topmost non-occluded layers while recovering underlying content, we generate vector graphics with complete paths and coherent layer structures. Our method leverages vision-language models to construct a layer graph that captures occlusion relationships among elements, enabling precise detection and description of non-occluded layers. These descriptive captions are used as editing instructions for a finetuned image diffusion model to remove the identified layers. To ensure accurate removal, we employ localized attention control that precisely guides the model to target regions while faithfully preserving the surrounding content. To support this, we contribute a large-scale dataset specifically designed for layer peeling tasks. Extensive quantitative and qualitative experiments demonstrate that LayerPeeler significantly outperforms existing techniques, producing vectorization results with superior path semantics, geometric regularity, and visual fidelity.
Computer Animation Festival






DescriptionThe main visual challenge with Le Cocon was to create a 2D, dreamy, painting-like look throughout the film. Special attention was paid to shaping the final look through hand-painted textures in Mari: this meticulous work brings forth the desire of the protagonist's family to control every aspect of each member's life. The image was almost completely rebuilt inside Nuke with a combination of white shader passes to get lighting information and an albedo pass.
Technical Communications


DescriptionWe introduce a lensless camera with an implicit neural representation that reconstructs display light fields from nine samples, enabling accessible, accurate calibration within a 46.6°×37.6° viewing cone.
Technical Papers


DescriptionLearning human motion based on a time-dependent input signal presents a challenging yet impactful task with various applications. The goal of this task is to generate or estimate human movement that consistently reflects the temporal patterns of the conditioning inputs. Existing methods typically rely on cross-attention mechanisms to fuse the condition with motion. However, this approach primarily captures global interactions and struggles to maintain step-by-step temporal alignment. To address this limitation, we introduce Temporally Conditional Mamba, a new Mamba-based model for human motion understanding. Our approach integrates conditional information into the recurrent dynamics of the Mamba block, enabling better temporally aligned motion. To validate the effectiveness of our method, we evaluate it on a variety of human motion tasks. Extensive experiments demonstrate that our model significantly improves temporal alignment, motion realism, and condition consistency over state-of-the-art approaches.
Educator's Forum



DescriptionThis study developed learning materials for introductory computer programming from the viewpoint of gender and field inclusion. Subjects often used as programming learning materials, such as robotics and electronics, can fail to engage female students and students in the humanities. We considered a strategy for gender-inclusive and field-inclusive programming education, with a focus on data visualization. Finding interesting data and expressing it in an informative way through programming can enhance learners’ motivation regardless of their major. We developed an original library (Datamate.js) for data visualization, web content for classroom instruction and self-study, and a layout tool for publishing data visualization results on a web page. In this paper, we report on the library and the tool we developed, as well as our classroom practices at an art university.
Technical Papers


DescriptionLearning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks with unclear goals but critical to the success of the entire task. Existing methods like the mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states or lack well-defined initial and terminal states between different phases. In this paper, we introduce a novel policy integration framework to enable the composition of drastically different motor skills in multi-phase long-horizon tasks with ill-defined intermediate states. Based on that, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
Technical Papers


DescriptionFocus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography.
Technical Papers


DescriptionThis paper presents LEGO-Maker, a new learning-based generative model that can effectively consider over 100 unique brick types and rapidly generate hundreds of bricks to create LEGO models conditioned on images. This work makes three major technical contributions that enable capabilities surpassing existing generative approaches. First, we design a compact LEGO tokenization scheme to serialize LEGO models and bricks into tokens for autoregressive learning. Second, we build LEGO-Maker, an autoregressive image-conditioned architecture, with a multi-token prediction strategy to encourage pre-considering multiple brick attributes and a rollback mechanism for collision-free generation. Third, we propose an effective data preparation pipeline with a procedural generator to synthesize LEGO models and a LEGO-to-real image translator distilled from a large vision-language model to translate LEGO renderings into associated photorealistic images, leveraging rich priors to address the scarcity of image-to-LEGO data. Extensive evaluations and comparisons are conducted on two object categories, facade and portrait, over metrics in four aspects: geometry, color, semantics, and structural integrity, together with a user study. Experimental results demonstrate the versatility and compelling strengths of LEGO-Maker in producing structures and details given by the reference image. The evaluation scores also show that our method clearly surpasses the baselines, consistently across all evaluation metrics.
Technical Papers


DescriptionAutomated LEGO design is challenging due to the extensive variety of LEGO brick types and the necessity of constructing semantically meaningful models from individually meaningless components. Current automatic LEGO generation methods face two key challenges: i) They typically rely on explicit modeling of brick connectivity to ensure structural validity. However, this requires extensive manual annotation, which is labor-intensive as the variety of LEGO primitives increases. This limits training data diversity, restricting the variety of LEGO bricks that can be effectively utilized. ii) To facilitate learning within neural networks, current methods often employ either volume or text-based descriptions to represent LEGO models. However, volumetric representations are computationally expensive and hamper large-scale generative training, while text-based approaches rely on large language models and dedicated text-to-brick mapping rules, introducing a semantic gap between language tokens and 3D brick structures.
To address these challenges, we propose LegoACE, an autoregressive construction engine for expressive LEGO assembly generation conditioned on text prompts or multi-view normal maps. By leveraging the sequential nature of LEGO assemblies, LegoACE implicitly learns connection relationships by modeling the conditional distribution of each brick’s position, orientation, and type based on previously placed bricks. This formulation enables us to construct a large-scale dataset, LegoVerse, which comprises over 55,000 unique models with 9,314 different brick types. To precisely represent LEGO primitives while minimizing computational overhead, we propose a tokenization strategy, LEGO Native Tokenization Algorithm, which transforms unstructured bricks into compact tokens encoding position, rotation, and type. This enables a decoder-only transformer to autoregressively generate LEGO models based on conditional inputs. Furthermore, to enhance model performance, we apply Direct Preference Optimization (DPO) to fine-tune LegoACE. Our experimental results show that LegoACE demonstrates the ability to generate diverse and expressive LEGO assemblies with robust brick connectivity, significantly advancing the state-of-the-art in LEGO model generation.
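The flavor of such a tokenization can be sketched with a toy scheme (entirely hypothetical; the paper's LEGO Native Tokenization Algorithm and LegoVerse dataset are its own): each brick becomes a fixed-length run of integer tokens for position, rotation, and type, so a decoder-only transformer can emit an assembly as one flat token sequence.

```python
# Toy brick tokenizer: grid positions 0..255, 4 rotations, integer
# type ids. Hypothetical scheme for illustration only.
GRID, ROTS = 256, 4

def encode_brick(x, y, z, rot, brick_type):
    """Flatten one brick placement into five integer tokens."""
    assert 0 <= x < GRID and 0 <= y < GRID and 0 <= z < GRID
    assert 0 <= rot < ROTS
    return [x, y, z, rot, brick_type]

def encode_assembly(bricks):
    """Serialize a placement-ordered brick list into one token stream,
    mirroring how an autoregressive model would consume it."""
    tokens = []
    for brick in bricks:
        tokens.extend(encode_brick(*brick))
    return tokens

def decode_assembly(tokens):
    """Invert encode_assembly: chunk the stream back into brick tuples."""
    assert len(tokens) % 5 == 0
    return [tuple(tokens[i:i + 5]) for i in range(0, len(tokens), 5)]

model = [(0, 0, 0, 0, 3001), (4, 0, 0, 1, 3002)]  # two-brick toy model
tokens = encode_assembly(model)
```

Because each brick's tokens are conditioned on all previously placed bricks, connectivity can be learned implicitly from placement order, which is the key property the abstract exploits.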
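The abstract does not specify the token layout of the LEGO Native Tokenization Algorithm; as a rough illustration of how a brick's position, rotation, and type might be flattened into disjoint integer vocabularies for a decoder-only transformer, one could sketch (all field ranges and the ordering below are assumptions):

```python
# Hypothetical brick-to-token scheme in the spirit of the paper's tokenizer.
# Assumed discretization: positions on a 128^3 stud grid, 4 rotations,
# and the 9,314 brick types mentioned in the abstract.
POS_BINS, ROT_BINS, TYPE_BINS = 128, 4, 9314

def tokenize_brick(x, y, z, rot, brick_type):
    """Encode one brick as a short sequence of integer tokens."""
    assert all(0 <= v < POS_BINS for v in (x, y, z))
    assert 0 <= rot < ROT_BINS and 0 <= brick_type < TYPE_BINS
    # Each field gets its own token slot; offsets keep the vocabularies disjoint.
    return [x,
            POS_BINS + y,
            2 * POS_BINS + z,
            3 * POS_BINS + rot,
            3 * POS_BINS + ROT_BINS + brick_type]

def detokenize_brick(tokens):
    """Invert tokenize_brick."""
    x = tokens[0]
    y = tokens[1] - POS_BINS
    z = tokens[2] - 2 * POS_BINS
    rot = tokens[3] - 3 * POS_BINS
    brick_type = tokens[4] - 3 * POS_BINS - ROT_BINS
    return x, y, z, rot, brick_type
```

A model trained on such sequences predicts the next token conditioned on all previously placed bricks, which is how connectivity can be learned implicitly rather than annotated.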
Key Event
Real-Time Live!



DescriptionWe present “Let There Be Light”, a real-time system that extracts discrete, production-ready lighting parameters from in-the-wild character images for direct use in tools such as Blender and Unreal Engine. Unlike traditional inverse rendering or environment map estimation, our generative neural network predicts controllable light configurations, including light types, positions, colors, and intensities, optimized for real-world animation workflows.
Built on a cascade neural architecture, the system first encodes intrinsic visual cues and then decodes them into a compact, discrete lighting representation. Trained on a large curated dataset of synthetic and real imagery with ground-truth lighting setups, it achieves near real-time inference while maintaining production-level accuracy.
On stage, we will demonstrate the system as a Blender Add-on, showcasing its robustness across diverse inputs and its ability to accelerate animation lighting design from hours to seconds. The tool empowers artists to replicate, adapt, and iterate lighting setups interactively, transforming what was once a labor-intensive process into an intuitive, generative experience. Beyond character lighting, the approach opens new possibilities for real-time scene authoring and cinematic lighting design in virtual production.
Games






DescriptionWordcross is a competitive word game that fuses the strategic thinking of classic crossword puzzles with the intensity of real-time multiplayer battles. Players race against each other to create the highest-scoring crossword-style grid before the clock runs out. Every round is a fast-paced contest of vocabulary, spatial strategy, and quick decision-making.
Power-ups add an extra layer of tactical play, allowing you to block, steal, or boost your way to victory. Matches are dynamic and unpredictable, rewarding both language skills and clever maneuvering.
Designed and developed entirely in-house at Leveret Games, Wordcross combines efficient word-generation algorithms, smooth matchmaking, and an intuitive interface to deliver accessible yet highly competitive gameplay. More than just entertainment, it encourages vocabulary growth, pattern recognition, and friendly rivalry.
Whether you are a casual player or a competitive challenger, Wordcross delivers a fresh, exciting take on multiplayer word gaming.
Technical Papers


DescriptionComputing the boundary surface of the 3D volume swept by a rigid or deforming solid remains a challenging problem in geometric modeling. Existing approaches are often limited to sweeping rigid shapes, cannot guarantee a watertight surface, or struggle with modeling the intricate geometric features (e.g., sharp creases and narrow gaps) and topological features (e.g., interior voids). We make the observation that the sweep boundary is a subset of the projection of the intersection of two implicit surfaces in a higher dimension, and we derive a characterization of the subset using winding numbers. These insights lead to a general algorithm for any sweep represented as a smooth time-varying implicit function satisfying a genericity assumption, and it produces a watertight and intersection-free surface that better approximates the geometric and topological features than existing methods.
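As a 2D intuition for the winding-number characterization mentioned above (the paper's construction operates on implicit surfaces in a higher dimension; this toy function is only a planar analogue, not the authors' algorithm):

```python
import math

def winding_number(point, polygon):
    """Winding number of a closed 2D polygon around a point, computed by
    summing signed turn angles. Nonzero winding marks 'inside' regions,
    the kind of classification a sweep-boundary test relies on."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        a0 = math.atan2(y0 - py, x0 - px)
        a1 = math.atan2(y1 - py, x1 - px)
        d = a1 - a0
        # Wrap each turn into (-pi, pi] so the sum counts full loops.
        while d > math.pi:
            d -= 2 * math.pi
        while d <= -math.pi:
            d += 2 * math.pi
        total += d
    return round(total / (2 * math.pi))
```

For a counterclockwise square, an interior point yields winding number 1 and an exterior point 0.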
Technical Papers


DescriptionIn user-generated-content (UGC) applications, non-expert users often rely on image-to-3D generative models to create 3D assets. In this context, primitive-based shape abstraction offers a promising solution by compressing high-resolution meshes into compact, editable representations. Effective shape abstraction must therefore be structure-aware, characterized by low overlap between primitives, part-aware alignment, and primitive compactness. We present Light-SQ, a novel superquadric-based optimization framework that explicitly emphasizes structure-awareness from three aspects. (a) We introduce SDF carving to iteratively update the target signed distance field, discouraging overlap between primitives. (b) We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition, enabling structural partitioning to drive primitive placement. (c) We implement adaptive residual pruning based on SDF update history to suppress over-segmentation and ensure compact results.
In addition, Light-SQ supports multiscale fitting, enabling localized refinement to preserve fine geometric details. To evaluate our method, we introduce 3DGen-Prim, a benchmark extending 3DGen-Bench with new metrics for both reconstruction quality and primitive-level editability. Extensive experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics for complex generated geometry, advancing the feasibility of 3D UGC creation.
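For readers unfamiliar with superquadrics, the standard inside-outside function that such fitting frameworks evaluate can be written as follows (a textbook formula for an origin-centered, axis-aligned superquadric; it is not Light-SQ's actual optimization pipeline):

```python
def superquadric_F(p, scale=(1.0, 1.0, 1.0), eps=(1.0, 1.0)):
    """Inside-outside function of a superquadric centered at the origin.
    F < 1 inside, F == 1 on the surface, F > 1 outside.
    scale = (a1, a2, a3) are the axis lengths; eps = (e1, e2) control
    roundness (e1 = e2 = 1 gives an ellipsoid, small values give boxes)."""
    x, y, z = (abs(c) / a for c, a in zip(p, scale))
    e1, e2 = eps
    return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)
```

With only a handful of parameters per primitive, a set of such functions is a very compact, editable stand-in for a dense mesh, which is what makes the representation attractive for UGC.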
Technical Papers


DescriptionSupersampling has proven highly effective in enhancing visual fidelity by reducing aliasing, increasing resolution, and generating interpolated frames. It has become a standard component of modern real-time rendering pipelines. However, on mobile platforms, deep learning-based supersampling methods remain impractical due to stringent hardware constraints, while non-neural supersampling techniques often fall short in delivering perceptually high-quality results. In particular, producing visually pleasing reconstructions and temporally coherent interpolations is still a significant challenge in mobile settings. In this work, we present a novel, lightweight supersampling framework tailored for mobile devices. Our approach substantially improves both image reconstruction quality and temporal consistency while maintaining real-time performance. For super-resolution, we propose an intra-pixel object coverage estimation method for reconstructing high-quality anti-aliased pixels in edge regions, a gradient-guided strategy for non-edge areas, and a temporal sample accumulation approach to improve overall image quality. For frame interpolation, we develop an efficient motion estimation module coupled with a lightweight fusion scheme that integrates both estimated optical flow and rendered motion vectors, enabling temporally coherent interpolation of object dynamics and lighting variations. Extensive experiments demonstrate that our method consistently outperforms existing baselines in both perceptual image quality and temporal smoothness, while maintaining real-time performance on mobile GPUs. A demo application and supplementary materials are available on the project page.
Key Event
Real-Time Live!



DescriptionLiminal is an AI-assisted interactive audiovisual installation that merges gesture, sound, and image into a unified real-time experience. The work invites participants to explore how music and visuals can emerge organically from human movement.
Using computer vision (CV), the system captures the participant’s body motions through a camera and translates them into coordinate data. These data activate AI-generated digital ink landscapes and luodian-inspired visuals, enhanced by dynamic real-time effects such as glitch textures and particle flows.
At the same time, the motion information drives a real-time sound synthesis engine that transforms gesture into musical parameters—pitch, duration, and spatial position. All sonic materials are generated live, framed within the aesthetics of traditional Chinese music, allowing ancient timbres and modern synthesis to blend seamlessly.
In Liminal, every element—painting, visuals, and music—is created in the moment, continually reshaping itself in response to human gesture. Rather than a fixed composition, it is a generative environment—a dialogue between technology and intuition, where creation begins the instant the participant moves.
Poster






DescriptionLineDiff explores the possibility of incorporating a generative optical flow network into a traditional warp-and-synthesis frame interpolation network for line-drawing data, with the aim of embracing the merits of both.
Computer Animation Festival






DescriptionIn a world of towering mountains, dutiful train conductor Peregrin navigates his train through dangerous terrain like a well-oiled clock. One day, the free-spirited musician Clara smuggles herself into his machine room. Soon, catastrophe strikes, and Clara and Peregrin have to overcome their differences to save the train from derailing.
Poster






Description“Living Literature” revolutionizes e-book reading by blending VR, haptics, audio, and biosensing into a shared, emotion-responsive experience, enhancing engagement, immersion, and discussion depth in collaborative reading clubs.
Technical Papers


DescriptionWe present LLM-Primitives: Large Language Model for 3D Reconstruction with Primitives, a novel approach to shape abstraction. By incorporating multi-modal conditional inputs, our method enables LLMs to reconstruct high-quality 3D primitives using only a modest amount of training data (tens of thousands of samples). This work marks a significant milestone in applying large language models to 3D primitive-based reconstruction, demonstrating both their feasibility and effectiveness in this domain.
Specifically, we leverage the point clouds of existing 3D models as conditional inputs to the LLM via a multi-modal connector. Instead of directly estimating primitive parameters, we introduce a center-to-surface vector representation, ensuring deterministic outputs and avoiding the ambiguity often associated with primitive parameterization. Experimental results show that LLM-Primitives surpass state-of-the-art 3D primitive methods across various quantitative metrics. Notably, the substantial improvements in visual quality further confirm that LLM-Primitives can reconstruct high-quality, practical 3D primitives. (Project page: \url{https://llm-primitives.github.io/LLM-Primitives/})
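The abstract leaves the exact center-to-surface parameterization unspecified; as a toy illustration of why such vectors give deterministic outputs, consider recovering an axis-aligned box from its center and three face vectors (the axis-aligned box assumption and the function below are ours, not the paper's):

```python
def box_from_center_to_surface(center, face_vectors):
    """Recover an axis-aligned box from its center and three
    center-to-surface vectors, one pointing to each face center.
    Unlike, e.g., (corner, size) parameterizations, this mapping has
    no ambiguity: each vector determines exactly one half-extent."""
    cx, cy, cz = center
    hx = abs(face_vectors[0][0])  # vector to the +x face
    hy = abs(face_vectors[1][1])  # vector to the +y face
    hz = abs(face_vectors[2][2])  # vector to the +z face
    return (cx - hx, cy - hy, cz - hz), (cx + hx, cy + hy, cz + hz)
```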
Technical Papers


DescriptionWe present a general method for computing local parameterizations rooted at a point on a surface, where the surface is described only through a signed implicit function and a corresponding projection function. Using a two-stage process, we compute several points radially emanating from the map origin, and interpolate between them with a spline surface. The narrow interface of our method allows it to support several kinds of geometry such as signed distance functions, general analytic implicit functions, triangle meshes, neural implicits, and point clouds. We demonstrate the high quality of our generated parameterizations on a variety of examples, and show applications in local texturing and surface curve drawing.
Invited Poster
Poster






Description3D Gaussian Splatting (3DGS) provides high-fidelity novel-view synthesis but struggles in online long-sequence scenarios due to slow per-scene optimization and inefficient frame-wise updates. We present LongSplat, a real-time online 3D Gaussian reconstruction framework for long image streams. LongSplat maintains a global 3DGS set and introduces a streaming update strategy that compresses redundant historical Gaussians while adding new ones by comparing current observations with past states. A key component is the Gaussian-Image Representation (GIR), which encodes 3D Gaussian parameters into an image-like structure, enabling redundancy compression and fusion across frames. LongSplat achieves state-of-the-art efficiency–quality trade-offs with significantly reduced Gaussian counts.
Technical Papers


DescriptionHigh-speed, high-resolution, and accurate 3D scanning would open doors to many new applications in graphics, robotics, science, and medicine by enabling the accurate scanning of deformable objects during interactions. Past attempts to use structured light, time-of-flight, and stereo in high-speed settings have usually required tradeoffs in resolution or accuracy. In this paper, we introduce a method that enables, for the first time, 3D scanning at 450 frames per second at 1 Megapixel, or 1,450 frames per second at 0.4 Megapixel in an environment with controlled lighting. The key idea is to use a per-pixel lookup table that maps colors to depths, which is built using a linear stage. Imperfections, such as lens distortion and sensor defects, are baked into the calibration. We describe our method and test it on a novel hardware prototype. We compare the system with both ground-truth geometry as well as commercially available dynamic sensors like the Microsoft Kinect and Intel Realsense. Our results show the system acquiring geometry of objects undergoing high-speed deformations and oscillations and demonstrate the ability to recover physical properties from the reconstructions.
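A minimal sketch of the per-pixel color-to-depth lookup idea: a scalar stands in for color, and nearest-color lookup stands in for whatever interpolation the authors use (both simplifications are assumptions; only the structure, one independent table per pixel built from linear-stage sweeps, follows the abstract):

```python
from bisect import bisect_left

def build_lut(calib_frames):
    """calib_frames: list of (depth, image) pairs captured while a linear
    stage moves a target; image[y][x] is that pixel's observed color.
    Returns, per pixel, a table of (color, depth) samples sorted by color.
    Because every pixel gets its own table, lens distortion and sensor
    defects are absorbed into the calibration automatically."""
    h, w = len(calib_frames[0][1]), len(calib_frames[0][1][0])
    lut = [[[] for _ in range(w)] for _ in range(h)]
    for depth, img in calib_frames:
        for y in range(h):
            for x in range(w):
                lut[y][x].append((img[y][x], depth))
    for y in range(h):
        for x in range(w):
            lut[y][x].sort()
    return lut

def depth_at(lut, x, y, color):
    """Nearest-color lookup in the per-pixel table."""
    table = lut[y][x]
    colors = [c for c, _ in table]
    i = bisect_left(colors, color)
    candidates = table[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda cd: abs(cd[0] - color))[1]
```

At runtime each pixel's depth is a single table lookup, which is what makes frame rates in the hundreds of Hz plausible.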
Technical Communications
Lost in the Interface: How Structured UI Complexity Challenges Large Vision Language Models in Games
3:15pm - 3:30pm HKT Monday, 15 December 2025 Meeting Room S422, Level 4

DescriptionWe present the first benchmark analyzing how large vision-language models perform under varying levels of game UI complexity, providing new insights for deploying AI systems in visually demanding interactive environments.
Invited Poster
Poster






DescriptionLove That Defies Mortality adapts the Ming Dynasty classic The Peony Pavilion through asymmetric virtual reality and mobile collaboration. By distributing narrative roles across devices and perspectives, users embody Du Liniang and Liu Mengmei to traverse dream and reality in a digitally reimagined love story that bridges tradition and technology.
Technical Papers


DescriptionProcessing visual data often involves small adjustments or sequences of changes, e.g., image filtering, surface smoothing, and animation. While established graphics techniques like normal mapping and video compression exploit redundancy to encode such small changes efficiently, the problem of encoding small changes to neural fields—neural network parameterizations of visual or physical functions—has received less attention. We propose a parameter-efficient strategy for updating neural fields using low-rank adaptations (LoRA). LoRA, a method from the parameter-efficient fine-tuning LLM community, encodes small updates to pre-trained models with minimal computational overhead. We adapt LoRA for instance-specific neural fields, avoiding the need for large pre-trained models and yielding lightweight updates. We validate our approach with experiments in image filtering, geometry editing, video compression, and energy-based editing, demonstrating its effectiveness and versatility for representing neural field updates.
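The LoRA update itself is standard: a scaled low-rank product added to frozen base weights, so each edit to the field costs only O(r(d_in + d_out)) extra parameters. A dependency-free sketch (toy dense-matrix version, not the paper's code):

```python
def lora_apply(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, the low-rank LoRA update.
    W: d_out x d_in frozen base weights of one neural-field layer;
    B: d_out x r and A: r x d_in are the only trained parameters,
    so a small edit to the field is stored as just (A, B)."""
    r = len(A)
    s = alpha / r
    d_out, d_in = len(W), len(W[0])
    out = [row[:] for row in W]
    for i in range(d_out):
        for j in range(d_in):
            out[i][j] += s * sum(B[i][k] * A[k][j] for k in range(r))
    return out
```

A sequence of edits (e.g., animation frames) can then be shipped as one base network plus a list of lightweight (A, B) pairs.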
Technical Papers


DescriptionSpeech-driven 3D facial animation has attracted increasing interest given its potential to generate expressive and temporally synchronized digital humans. While recent works have begun to explore emotion-aware animation, they still depend on explicit one-hot encodings, derived from given emotion and identity labels, to represent identity and emotion, which limits their ability to generalize to unseen speakers. Moreover, the emotional cues inherently present in speech are often neglected, limiting the naturalness and adaptability of generated animations.
In this work, we propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations. Specifically, LSF-Animation implicitly extracts emotion information from speech and captures identity features from a neutral facial mesh, enabling improved generalization to unseen speakers and emotional states without requiring manual labels. Furthermore, we introduce a Hierarchical Interaction Fusion Block (HIFB), which employs a fusion token to integrate dual transformer features and more effectively fuse emotional, motion-related, and identity-related cues. Extensive experiments conducted on the 3DMEAD dataset demonstrate that our method surpasses recent state-of-the-art approaches in terms of emotional expressiveness, identity generalization, and animation realism. The source code will be released on GitHub.
Technical Papers


DescriptionLarge transformer models are proving to be a powerful tool for 3D vision and novel view synthesis. However, the standard Transformer's well-known quadratic complexity makes it difficult to scale these methods to large scenes. To address this challenge, we propose the Local View Transformer (LVT), a large-scale scene reconstruction and novel view synthesis architecture that circumvents the need for the quadratic attention operation. Motivated by the insight that spatially nearby views provide more useful signal about the local scene composition than distant views, our model processes all information in a local neighborhood around each view. To attend to tokens in nearby views, we leverage a novel positional encoding that conditions on the relative geometric transformation between the query and nearby views. We decode the output of our model into a 3D Gaussian Splat scene representation that includes both color and opacity view-dependence.
Taken together, the Local View Transformer enables reconstruction of arbitrarily large, high-resolution scenes in a single forward pass.
Art Gallery






DescriptionWhen humans first carved symbols into stone, civilization began. Machine Civilization imagines robot observers as the first recorders of a new machine civilization. It gathers data and transmits it to an AI Parliament, where agents evolve language and create the first machine script, raising questions of coexistence.
Art Gallery






DescriptionMachine Dreaming is an interactive installation exploring human-AI perceptual and social dynamics. Viewers' real-time video footage is transformed through AI into alien-plant-like forms responsive to movement and gesture. The work prompts viewers to reflect on their relationship to AI and on how perception, agency, and representation are negotiated with these systems.
Technical Papers


DescriptionWe propose a generative model that, given a coarsely edited image, synthesizes a photorealistic output that follows the prescribed layout. By using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input while addressing secondary effects like lighting and interactions between objects.
Invited Poster
Poster






DescriptionWe introduce the first diffusion-based method for generating coherent, controllable scroll images in VR, enhancing storytelling through adaptive multi-window strategies for multiple narrative structures, significantly improving visual consistency and user immersion.
Featured Session



DescriptionGo behind the scenes of the making of the new Netflix animated feature, IN YOUR DREAMS, with the film's writer/director Alex Woo. The film was produced by Berkeley-based Kuku Studios with animation by Sony Pictures Imageworks, and Woo will reveal how a group of independent filmmakers persevered through a 9-year journey to turn their dreams into reality. The film is a comedy adventure about Stevie and her brother Elliot, who journey into the landscape of their own dreams. If the siblings can withstand a snarky stuffed giraffe, zombie breakfast foods, and the queen of nightmares, the Sandman will grant them their ultimate dream: the perfect family. Poster signing session to follow.
Technical Papers


DescriptionRecent advances in text-to-image models have enabled a new era of creative and controllable image generation. However, generating compositional scenes with multiple subjects and attributes remains a significant challenge. To enhance user control over subject placement, several layout-guided methods have been proposed. However, these methods face numerous challenges, particularly in compositional scenes. Unintended subjects often appear outside the layouts, generated images can be out-of-distribution and contain unnatural artifacts, or attributes bleed across subjects, leading to incorrect visual outputs. In this work, we propose MALeR, a method that addresses each of these challenges. Given a text prompt and corresponding layouts, our method prevents subjects from appearing outside the given layouts while being in-distribution. Additionally, we propose a masked, attribute-aware binding mechanism that prevents attribute leakage, enabling accurate rendering of subjects with multiple attributes, even in complex compositional scenes. Qualitative and quantitative evaluation demonstrates that our method achieves superior performance in compositional accuracy, generation consistency, and attribute binding compared to previous work. MALeR is particularly adept at generating images of scenes with multiple subjects and multiple attributes per subject.
Poster






DescriptionThis paper presents a system for exploring pre-generated inbetweens via motion trajectories, enhancing efficiency and user experience in 2D animation production through interactive browsing.
Technical Papers


DescriptionDiscontinuous visibility changes remain a major bottleneck when optimizing surfaces within a physically based inverse renderer. Many previous works have proposed sophisticated algorithms and data structures to sample visibility silhouettes more efficiently.
Our work presents another solution: instead of differentiating a tentative surface locally, we differentiate a non-local perturbation of a surface. We refer to this as a “many-worlds” representation because it models a non-interacting superposition of conflicting explanations (“worlds”) of the input dataset. Each world is optically isolated from others, leading to a new transport law that distinguishes our method from prior work based on exponential random media.
The resulting Monte Carlo algorithm is simpler and more efficient than prior methods. We demonstrate that our method promotes rapid convergence, both in terms of the total iteration count and the cost per iteration.
Technical Papers


DescriptionAccurate surface geometry representation is crucial in 3D visual computing. Explicit representations, such as polygonal meshes, and implicit representations, like signed distance functions, each have distinct advantages, making efficient conversions between them increasingly important. Conventional surface extraction methods for implicit representations, such as the widely used Marching Cubes algorithm, rely on spatial decomposition and sampling, leading to inaccuracies due to fixed and limited resolution. We introduce a novel approach for analytically extracting surfaces from neural implicit functions. Our method operates natively in parallel and can navigate large neural architectures. By leveraging the fact that each neuron partitions the domain, we develop a depth-first traversal strategy to efficiently track the encoded surface. The resulting meshes faithfully capture the full geometric information from the network without ad-hoc spatial discretization, achieving unprecedented accuracy across diverse shapes and network architectures while maintaining competitive speed.
Technical Communications


DescriptionWe propose a 3D Gaussian Splatting framework using MCMC pruning and surface normal learning, enabling real-time relighting, faster convergence, and high-quality rendering with fewer primitives and reduced memory.
Technical Papers


DescriptionWe tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, trajectory following, or teleoperation, our framework enables users to specify versatile high-level objectives such as target object poses or body poses. To achieve this, we introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data. This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions. MaskedManipulator produces goal-directed manipulation behaviors that expand the scope of interactive animation systems beyond task-specific solutions.
Poster






DescriptionWe present work toward a free library of high-fidelity, physically based lunar surface materials to support training and simulation for future human and robotic missions to the Moon.
Technical Papers


DescriptionWe propose a novel optimization framework for computing the medial axis transform that simultaneously preserves the medial structure and ensures high medial mesh quality. The medial structure, consisting of interconnected sheets, seams, and junctions, provides a natural volumetric decomposition of a 3D shape. Our method introduces a structure-aware, particle-based optimization pipeline guided by the restricted power diagram (RPD), which partitions the input volume into convex cells whose dual encodes the connectivity of the medial mesh. Structure-awareness is enforced through a spherical quadratic error metric (SQEM) projection that constrains the movement of medial spheres, while a Gaussian kernel energy encourages an even spatial distribution. Compared to feature-preserving methods such as MATFP and MATTopo, our approach produces cleaner and more accurate medial structures with significantly improved mesh quality. In contrast to voxel-based, point-cloud-based, and variational methods, our framework is the first to integrate structural awareness into the optimization process, yielding medial meshes with superior geometric fidelity, topological correctness, and explicit structural decomposition.
Technical Communications


DescriptionMCP-Unity integrates AI with the Unity engine, replacing fragile code generation with a stable, state-aware protocol. This allows creators of all skill levels to intuitively build 3D worlds.
Technical Papers


DescriptionWe propose a fast, robust, and user-controllable algorithm for knot untangling and volume-filling curves. We extend prior work on surface-filling curves to the more challenging case of 3D volumes, equipped with a specialized gradient preconditioner that allows larger step sizes. Our method exhibits orders of magnitude faster runtime than existing methods. Our framework provides a whole new set of parameters to guide the shape of the curve, making it ideal for interactive design applications.
Invited Poster
Poster






DescriptionWe enhance the geometric accuracy and topological correctness of boundary representation (B-rep) reconstruction from meshes through two novel techniques: robust primitive extraction via iterative noise estimation, and intersection-aware constraint detection and optimization.
Birds of a Feather






DescriptionThe Metaverse Standards Forum, with 2600+ global members, facilitates cooperation between standards organizations to enable AI-driven spatial computing. This BOF explores how the Forum coordinates pre- and post-standardization activities, helping to build the constellation of interoperability standards needed to create the open metaverse as the spatial evolution of the web.
Technical Papers


DescriptionTraditional integral wood joints, despite their strength, durability, and elegance, remain rare in modern workflows due to the cost and difficulty of manual fabrication. CNC milling offers a scalable alternative, but directly milling traditional joints often fails to produce functional results because milling induces geometric deviations---such as rounded inner corners---that alter the target geometries of the parts. Since joints rely on tightly fitting surfaces, such deviations introduce gaps or overlaps that undermine fit or block assembly. We propose to overcome this problem by (1) designing a language that represents millable geometry, and (2) co-optimizing part geometries to restore coupling. We introduce Millable Extrusion Geometry (MXG), a language for representing geometry as the outcome of milling operations performed with flat-end drill bits. MXG represents each operation as a subtractive extrusion volume defined by a tool direction and drill radius. This parameterization enables the modeling of artifact-free geometry under an idealized zero-radius drill bit, matching traditional joint designs. Increasing the radius then reveals milling-induced deviations, which compromise the integrity of the joint. To restore coupling, we formalize tight coupling in terms of both surface proximity and proximity constraints on the mill-bit paths associated with mating surfaces. We then derive two tractable, differentiable losses that enable efficient optimization of joint geometry. We evaluate our method on 30 traditional joint designs, demonstrating that it produces CNC-compatible, tightly fitting joints that approximate the original geometry. By reinterpreting traditional joints for CNC workflows, we continue the evolution of this heritage craft and help ensure its relevance in future making practices.
Technical Papers


DescriptionWe present MILO (Metric for Image- and Latent-space Optimization), a lightweight, multiscale, perceptual metric for full-reference image quality assessment (FR-IQA).
MILO is trained using pseudo-MOS (Mean Opinion Score) supervision, in which reproducible distortions are applied to diverse images and scored via an ensemble of recent quality metrics that account for visual masking effects.
This approach enables accurate learning without requiring large-scale human-labeled datasets. Despite its compact architecture, MILO outperforms existing metrics across standard FR-IQA benchmarks and offers fast inference suitable for real-time applications.
Beyond quality prediction, we demonstrate the utility of MILO as a perceptual loss in both image and latent domains. In particular, we show that spatial masking modeled by MILO, when applied to latent representations from a VAE encoder within Stable Diffusion, enables efficient and perceptually aligned optimization. By combining spatial masking with a curriculum learning strategy, we first process perceptually less relevant regions before progressively shifting the optimization to more visually distorted areas. This strategy leads to significantly improved performance in tasks like denoising, super-resolution, and face restoration, while also reducing computational overhead.
MILO thus functions as both a state-of-the-art image quality metric and as a practical tool for perceptual optimization in generative pipelines.
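The curriculum strategy described above, which starts optimization in perceptually less relevant regions and progressively incorporates more distorted ones, can be sketched as a quantile-based mask schedule. This is an illustrative stand-in: the schedule below is invented for the sketch, and MILO's actual masking is derived from its learned spatial masking model rather than raw distortion quantiles.

```python
import numpy as np

def curriculum_mask(distortion_map, step, total_steps):
    """Hypothetical curriculum schedule: early steps activate only the
    least-distorted fraction of pixels; later steps include progressively
    more distorted regions. The quantile schedule is illustrative only."""
    frac = (step + 1) / total_steps            # fraction of pixels active
    threshold = np.quantile(distortion_map, frac)
    return distortion_map <= threshold         # boolean optimization mask

# Toy per-pixel distortion map (higher = more visually distorted).
d = np.array([[0.1, 0.9],
              [0.3, 0.6]])
m_early = curriculum_mask(d, step=0, total_steps=4)  # least-distorted quarter
m_late  = curriculum_mask(d, step=3, total_steps=4)  # all pixels active
```

A per-step loss would then be computed only over the masked pixels, so the optimizer settles easy regions first before tackling the heavily distorted ones.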
Technical Papers


DescriptionWhile recent advances in Gaussian Splatting have enabled fast reconstruction of high-quality 3D scenes from images, extracting accurate surface meshes remains a challenge. Current approaches extract the surface through costly post-processing steps, resulting in the loss of fine geometric details or requiring significant time and leading to very dense meshes with millions of vertices. More fundamentally, the a posteriori conversion from a volumetric to a surface representation limits the ability of the final mesh to preserve all geometric structures captured during training. We present MILo, a novel Gaussian Splatting framework that bridges the gap between volumetric and surface representations by differentiably extracting a mesh from the 3D Gaussians, and using the two representations jointly during training. Our method introduces three key technical contributions: (1) a bidirectional consistency framework ensuring both representations capture the same underlying geometry; (2) an adaptive mesh extraction process performed at each training iteration, which uses Gaussians as differentiable pivots for Delaunay triangulation; (3) a novel method for computing signed distance values from the 3D Gaussians that enables precise surface extraction while avoiding geometric erosion. Our approach can reconstruct complete scenes, including backgrounds, with state-of-the-art quality while requiring an order of magnitude fewer mesh vertices than previous methods. Due to their light weight and empty interior, our meshes are well-suited for downstream applications such as physics simulations and animation.
Poster






DescriptionMindEye is a VR-based therapeutic system combining guided imagery and spatial-semantic association to restore visual imagery abilities in individuals with aphantasia and acquired imagery impairments.
Poster






DescriptionMindscape presents a VR psychological thriller where players embody a psychiatrist investigating murders while discovering their fragmented identity through multi-perspective storytelling, environmental symbolism, and innovative interaction design mechanics.
Poster






DescriptionA training-free pipeline for generating a spherical panorama scene from a single image.
Invited Poster
Poster






DescriptionAdvanced driver assistance systems require a comprehensive understanding of the driver’s mental and physical states as well as the surrounding traffic context. However, existing studies often overlook the advantages of jointly learning these related tasks. To address this gap, we propose MMTL-UniAD, a unified multimodal multi-task learning framework that simultaneously recognizes driver behavior, driver emotion, vehicle behavior, and traffic context, enabling more holistic and intelligent driving assistance.
Technical Papers


DescriptionWe introduce a gaze-tracking free method to reduce OLED display power consumption in VR with minimal perceptual impact. This technique exploits the time course of chromatic adaptation, the human visual system’s ability to maintain stable color perception under changing illumination.
To that end, we propose a novel psychophysical paradigm that models how human adaptation state changes with the scene illuminant. We exploit this model to compute an optimal illuminant shift trajectory, controlling the rate and extent of illumination change, to reduce display power under a given perceptual loss budget. Our technique significantly improves the perceptual quality over prior work that applies illumination shifts instantaneously. Our technique can also be combined with prior work on luminance dimming to reduce display power by 31% with no statistical loss of perceptual quality.
Technical Communications


DescriptionWe introduce a computational framework for origami simulation using bilinear solid-shell elements, modeling crease folding via director vectors and enabling straight- and curved-crease origami validated through benchmarks and demonstrations.
Technical Papers
MODepth: Benchmarking Mobile Multi-frame Monocular Depth Estimation with Optical Image Stabilization
3:34pm - 3:45pm HKT Wednesday, 17 December 2025 Meeting Room S421, Level 4

DescriptionThis paper presents MODepth, a multi-frame monocular depth estimation system based on the controlled motion of an optical image stabilization (OIS) module. By actively injecting acoustic signals, we induce regular translational movements of the OIS lens, producing controllable camera pose changes and simplifying inter-frame pose estimation. Leveraging multi-frame images captured under OIS-controlled lens movements, we design a high-precision depth estimation network, MODNet, and introduce a principal point offset estimation module and a pose estimation module to fully exploit geometric information across frames. To validate the effectiveness of our approach, we collect a new dataset, MODdata, with 1100 samples in nearly 220 indoor scenarios and benchmark our model as an OIS-based multi-frame depth estimation method, comparing it against ground truth obtained from a depth sensor and other state-of-the-art monocular depth estimation algorithms. Our method achieves competitive or superior performance compared to fully supervised baselines, reaching an RMSE of 0.439, which outperforms all evaluated methods and demonstrates that self-supervised fine-tuning with OIS-induced parallax is a viable alternative to ground-truth supervision.
Invited Poster
Poster






DescriptionWe constructed FemaleSaanenGoat, the first real-world 3D dataset of Saanen dairy goats, comprising synchronized eight-view RGBD videos of 55 goats across different age groups. Leveraging multi-view DynamicFusion technology, we achieved high-precision 3D reconstruction that captures detailed morphological characteristics. Building upon this foundation, we developed a parametric SaanenGoat model that enables automatic measurement of six key body dimension indicators—including body length, height, chest width, chest girth, hip width, and hip height—from single-view inputs. This framework provides an efficient solution for automated phenotypic assessment in dairy goat breeding, offering significant advantages over traditional manual measurement methods for livestock management and breeding programs.
XR






DescriptionMoreOverleaf is an MR-enhanced collaborative writing system that extends traditional document editing with 3D comment boxes, revision playback, and avatar-based presence. By overlaying immersive cues around a physical monitor, it improves clarity, authorship awareness, and social presence in asynchronous and distributed writing workflows.
Art Gallery






Description"Morphology" is an exhibition that brings together the distinct artistic practices of Peter Nelson and Tobias Klein. Nelson's work explores the intersection of ink painting and computer graphics, showcasing a fusion of traditional techniques and innovative digital processes along with influences from philosophy and science fiction. Klein, on the other hand, discusses "Digital Craft," melding CAD/CAM technologies with cultural narratives and site-specific design. His stone works often articulate a relationship between digital and physical materials, prompting us to consider the changing relationship between nature, technology, and artistic creation.
Technical Communications


DescriptionMoRAL is an interactive, plug-and-play motion restyling framework that decomposes spatio-temporal style modeling into posing and tempo LoRA adapters.
Technical Papers


DescriptionMotion in-betweening is the problem of synthesizing movement between keyposes. Traditional research focused primarily on single characters. Extending these methods to densely interacting characters is highly challenging, as it demands precise spatio-temporal correspondence between the characters to maintain the interaction while creating natural transitions toward predefined keyposes. In this research, we present a method for long-horizon interaction in-betweening that enables two characters to engage and respond to one another naturally.
To effectively represent and synthesize interactions, we propose a novel solution called Cross-Space In-Betweening, which models the interactions of each character across different conditioning representation spaces.
We further observe that the significantly increased constraints in interacting characters heavily limit the solution space, leading to degraded motion quality and diminished interaction over time.
To enable long-horizon synthesis, we present two solutions to maintain long-term interaction and motion quality, thereby keeping synthesis in the stable region of the solution space.
We first sustain interaction quality by identifying periodic interaction patterns through adversarial learning.
We further maintain the motion quality by learning to refine the drifted latent space and prevent pose error accumulation.
We demonstrate that our approach produces realistic, controllable, and long-horizon in-between motions of two characters with dynamic boxing and dancing actions across multiple keyposes, supported by extensive quantitative evaluations and user studies.
Technical Communications


DescriptionVR experiments showed oscillatory motion (5-8 rad/s) improved peripheral hue discrimination by 40-43% versus static conditions. Findings enable motion-enhanced rendering for VR interfaces requiring accurate peripheral color perception.
Technical Papers


DescriptionThis work studies the challenge of transferring animations between characters whose skeletal topologies differ substantially. While retargeting techniques have advanced over the decades, transferring motions across diverse topologies remains under-explored. The primary obstacle lies in the inherent topological inconsistency between source and target skeletons, which prevents the establishment of straightforward one-to-one bone correspondences. In addition, the current lack of large-scale paired motion datasets spanning different topological structures severely constrains the development of data-driven approaches. To address these limitations, we introduce Motion2Motion, a novel, training-free framework. Simple yet effective, Motion2Motion works with only one or a few example motions on the target skeleton, by accessing a sparse set of bone correspondences between the source and target skeletons. Through comprehensive qualitative and quantitative evaluations, we demonstrate that Motion2Motion achieves efficient and reliable performance in both similar-skeleton and cross-species skeleton transfer scenarios. The practical utility of our approach is further evidenced by its successful integration in downstream applications and user interfaces, highlighting its potential for industrial applications. Code and data are available at https://lhchen.top/Motion2Motion.
Poster






DescriptionMourning Skins is a speculative visualization that uses generative AI to explore climate change through imagined transformations of species' fur, presented as time-lapse videos.
Technical Communications


DescriptionA real-time multi-user gaze interaction system for 360° 3D visualization enables low-latency, marker-free collaboration using binocular eye tracking and portable processing, demonstrated in interactive museum and educational immersive environments.
Technical Papers


DescriptionSimulating the interactions between fluids and porous media has attracted significant attention in computer graphics. A key challenge in this domain is modeling the Poro-Elasto-Capillary (PEC) coupling effect, which describes the intricate interplay of three physical phenomena in soft porous materials: pore-structure evolution, elastic deformation, and wetting driven by capillary pressure. These phenomena collectively govern dynamic behavior such as the softening and fracturing of biscuits upon water absorption or the swelling of cellulose sponges due to liquid infiltration. Most existing simulation methods model porous media either as static grids or as solid particles with augmented water-content attributes, failing to capture the full spectrum of PEC-driven effects due to the lack of physical modeling for elasticity, dynamic porosity changes, and capillary interactions. We propose a multiphase particle-based framework to holistically simulate PEC coupling effects with porous media. We develop a physics-driven model that captures elasticity and dynamic pore-structure evolution under capillary action, enabling realistic simulation of softening and swelling. We derive a saturation-aware pressure Poisson equation to enforce fluid incompressibility within and around the porous medium, ensuring accurate capillary-driven flow while preserving mass and momentum. Finally, we propose a representative elementary volume-based formulation to unify the modeling of homogeneous macro-porous media and cavity-embedded structures, enhancing the representation of pore-scale PEC effects. Comparisons with prior work and real footage show the advantages of our approach in achieving visually realistic fluid-porous media interactions.
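For context, the incompressibility step this abstract refers to typically builds on the standard pressure Poisson equation of particle-based fluid solvers; the paper's saturation-aware variant augments it with saturation terms that the abstract does not specify, so only the familiar baseline form is shown here:

```latex
\nabla \cdot \left( \frac{\Delta t}{\rho}\, \nabla p \right) = \nabla \cdot \mathbf{u}^{*}
```

where \(\mathbf{u}^{*}\) is the intermediate (pre-projection) velocity field, \(\rho\) the fluid density, and the solved pressure \(p\) is used to project the velocity to a divergence-free state.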
Technical Papers


DescriptionRecent breakthroughs in video generation, powered by large-scale datasets and diffusion techniques, have shown that video diffusion models can function as implicit 4D novel view synthesizers. Nevertheless, current methods primarily concentrate on redirecting the camera trajectory within the front view while struggling to generate 360-degree viewpoint changes. In this paper, we focus on the human-centric subdomain and present MV-Performer, an innovative framework for creating synchronized novel-view videos from monocular full-body captures. To achieve 360-degree synthesis, we extensively leverage the MVHumanNet dataset and incorporate an informative condition signal. Specifically, we use camera-dependent normal maps rendered from oriented partial point clouds, which effectively alleviate the ambiguity between seen and unseen observations. To maintain synchronization in the generated videos, we propose a multi-view human-centric video diffusion model that fuses information from the reference video, partial rendering, and different viewpoints. Additionally, we provide a robust inference procedure for in-the-wild video cases, which greatly mitigates the artifacts induced by imperfect monocular depth estimation. Extensive experiments on three datasets demonstrate MV-Performer's state-of-the-art effectiveness and robustness, setting a strong model for human-centric 4D novel view synthesis.
Technical Papers


DescriptionDigital human avatars aim to simulate the dynamic appearance of humans in virtual environments, enabling immersive experiences across gaming, film, virtual reality, and more. However, the conventional process for creating and animating photorealistic human avatars is expensive and time-consuming, requiring large camera capture rigs and significant manual effort from professional 3D artists. With the advent of capable image and video generation models, recent methods enable automatic rendering of realistic animated avatars from a single casually captured reference image of a target subject. While these techniques significantly lower barriers to avatar creation and offer compelling realism, they lack constraints provided by multi-view information or an explicit 3D representation. So, image quality and realism degrade when rendered from viewpoints that deviate strongly from the reference image. Here, we build a video model that generates animatable multi-view videos of digital humans based on a single reference image and target expressions. Our model, MVP4D, is based on a state-of-the-art pre-trained video diffusion model and generates hundreds of frames simultaneously from viewpoints varying by up to 360 degrees around a target subject. We show how to distill the outputs of this model into a 4D avatar that can be rendered in real-time. Our approach significantly improves the realism, temporal consistency, and 3D consistency of generated avatars compared to previous methods.
XR






DescriptionThis project provides a system for alleviating short-term negative emotions such as fear and anxiety. Through NPC-guided dialogues, users can generate visual representations of their negative imagery in real-time, progressively weaken these images, and ultimately create personalized guardian symbols to enhance positive psychological reinforcement.
Birds of a Feather






DescriptionThis Birds of a Feather gathers artists, curators, researchers, and technologists to exchange art-led approaches to: how machines signal intent and legibility, how agency and authorship are negotiated in human-machine co-creation, and how ethics and audience care shape exhibition making. We’ll share short provocations and open the floor to a structured conversation on curatorial strategies, commissioning frameworks, and the infrastructures (institutions, platforms, communities) that sustain techno-social practices across the region.
Art Papers



DescriptionNanlai Zhi Hua (Flowers from the South) is a data-driven visualization project that explores the narrative geographies and affective landscapes of Hong Kong’s Southbound Literati, a group of writers who migrated from Mainland China after 1949. Combining methods from literary mapping, spatial theory, and affective design, the project visualizes both the authors’ lived geographies and locations referenced in their literary works to examine themes of displacement, memory, and cultural identity. Drawing on texts from 25 authors—including Jin Yong and Eileen Chang—the study translates narrative motifs into poetic “data flowers” through methods such as temporal juxtaposition and color-time flattening. Each flower metaphorically represents an author’s creative journey, capturing the interplay between biography, narrative, and spatial memory.
The three-stage creation process involves compiling biographical and narrative data, applying visual encoding, and rendering the results using tools such as Gephi, Blender, and Houdini. Network graphs, spatial layouts, and visual metaphors collectively reveal how spatial nostalgia and emotional landmarks structure diasporic texts. These visualizations were exhibited through an installation using projection mapping and dynamic displays to foster public engagement with literary space.
By emphasizing interpretive flexibility over analytical finality, the project offers a prototype for humanities-centered visualization. It demonstrates how interdisciplinary techniques can visually uncover latent narrative structures and emotional patterns in literary geography. The work contributes to spatial literary studies and showcases how data visualization can support new modes of scholarly and public interpretation in digital humanities contexts.
Technical Papers


DescriptionDenoising diffusion models excel at generating high-quality images conditioned on text prompts, yet their effectiveness heavily relies on careful guidance during the sampling process.
Classifier-Free Guidance (CFG) provides a widely used mechanism for steering generation by setting the guidance scale, which balances image quality and prompt alignment.
However, the choice of the guidance scale has a critical impact on the convergence toward a visually appealing and prompt-adherent image.
In this work, we propose an annealing guidance scheduler which dynamically adjusts the guidance scale over time based on the conditional noisy signal.
By learning a scheduling policy, our method addresses the temperamental behavior of CFG.
Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt, advancing the performance of text-to-image generation.
Notably, our novel scheduler requires no additional activations or memory consumption, and can seamlessly replace the common classifier-free guidance, offering an improved trade-off between prompt alignment and quality.
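The core CFG update, with the guidance scale made a function of the sampling step, can be sketched as follows. The linear ramp is a hypothetical stand-in: the paper learns its schedule from the conditional noisy signal, whereas this sketch simply interpolates between two fixed scales.

```python
def cfg_step(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance: push the unconditional noise prediction
    toward the conditional one by the guidance scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def annealed_scale(step: int, total_steps: int,
                   w_start: float = 1.5, w_end: float = 7.5) -> float:
    """Hypothetical annealing schedule: interpolate the guidance scale
    linearly over the sampling trajectory (illustrative only)."""
    frac = step / max(total_steps - 1, 1)
    return w_start + (w_end - w_start) * frac

# Guidance grows as denoising progresses, instead of staying fixed.
schedule = [annealed_scale(t, 50) for t in range(50)]
guided = cfg_step(0.0, 1.0, schedule[-1])
```

A learned policy would replace `annealed_scale` with a function of the conditional noisy signal at each step, but the drop-in shape of the update is the same, which is why such a scheduler can replace a constant-scale CFG without extra activations or memory.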
Technical Papers


DescriptionLightweight, mesh-level models of knit fabric behavior are useful for both interactive pattern editing and initialization of yarn-level simulations. However, existing mesh-level simulation methods abstract knitting as a homogeneous material, which prevents them from capturing more complicated mixed structures. Furthermore, these methods require different simulation parameters depending on the knit pattern, or arrangement of stitches within the knit. Thus, fitting these parameters to physical examples must be done for each new pattern, even if it uses the same types of stitches. To address this, we observe that the physical behavior of a stitch is determined not only by its individual structure but also by the stitch types that surround it. In our work, we extend the stitch mesh model to allow for neighbor-aware material properties at the stitch level. Using structural analysis of stitch connections, we derive a finite set of four-way kernels that combine to create general knit-purl patterns for relaxation. From this, we generate a set of reference patterns that can be measured to infer the rest lengths of the kernels using a linear model. After knitting and measuring these reference patterns, we used the derived kernel rest lengths to run relaxation on our stitch mesh models with mixtures of knits and purls, which we then validated against physical examples. Our results show that the four neighbors of each stitch are sufficient to account for much of the neighborhood-dependent deformation, while remaining simple enough to directly fit to measured data with a set of 11 basis swatches. This allows our relaxation method to efficiently estimate the rest shape of mixtures of knit-purl patterns, which enables fast knit fabric preview and more accurate yarn-level simulation.
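The linear fit described above, recovering per-kernel rest lengths from measured reference swatches, can be sketched as an ordinary least-squares problem. The kernel counts and measurements below are invented for illustration and do not reproduce the paper's 11 basis swatches or its actual kernel set.

```python
import numpy as np

# Hypothetical sketch: model each measured swatch dimension as a linear
# combination of per-kernel rest lengths. Rows = swatch measurements,
# columns = kernel types; entries count how often each four-way kernel
# occurs along the measured direction.
counts = np.array([
    [4.0, 0.0],   # all-knit swatch: 4 knit-kernel repeats
    [0.0, 4.0],   # all-purl swatch: 4 purl-kernel repeats
    [2.0, 2.0],   # alternating knit/purl swatch
])
measured_widths = np.array([8.0, 6.0, 7.0])  # cm, illustrative values

# Solve for the rest length contributed by each kernel type.
rest_lengths, *_ = np.linalg.lstsq(counts, measured_widths, rcond=None)

# Predict the relaxed width of a new pattern from its kernel counts.
new_pattern = np.array([3.0, 1.0])
predicted = float(new_pattern @ rest_lengths)
```

Because the model is linear in the kernel rest lengths, a small, fixed set of basis swatches suffices to determine them, after which any mixed knit-purl pattern can be relaxed without refitting.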
Technical Papers


DescriptionRecent advances in neural rendering have extensively explored modeling the radiance fields with neural representations, while overlooking the underlying mechanisms for producing various lighting effects, consequently leading to limited adaptability to dynamic scenes. These lighting effects, including highlights, shadows, and indirect lighting, are usually computed using physically-based rendering methods like path tracing, which can be computationally prohibitive for complex indoor luminaires. Although several recent studies have attempted to model global illumination effects with neural representations, they commonly suffer from long training times or poor generalizability to novel scenes. In light of these challenges, this work presents a novel neural lighting function generation model that can synthesize diverse lighting effects in real time for unseen dynamic scenes and complex indoor luminaires, with results comparable to state-of-the-art rendering pipelines. Our model consists of two stages. In the first stage, multi-view observation images of the luminaire are captured and then used to encode a compact, scene-independent 3D neural lighting field. In the second stage, light information is sampled from the neural lighting field and combined with the G-buffers and shadow clues to produce the shading results. In parallel, we leverage a state-of-the-art generative model together with our HDR Lift module to generate an HDR 3D Gaussian representation of the luminaire. In our experiments, the model trained on a dataset of 10,000 modern indoor scenes demonstrates strong generalizability, high efficiency, and visually convincing results across a wide range of test scenes, highlighting its potential as a practical and flexible solution for high-fidelity, real-time neural indoor rendering.
Technical Papers


DescriptionWe introduce NESI, a compact representation of detailed 3D shapes as intersections of neural explicit height-field surfaces (HFs). NESI provides more accurate approximations of the input than state-of-the-art alternatives with the same parameter count, while efficiently supporting processing operations such as occupancy queries and parametric access.
Technical Papers


DescriptionRepresenting and rendering dynamic scenes with complex motions remains challenging in computer vision and graphics. Recent dynamic view synthesis methods achieve high-quality rendering but often produce physically implausible motions. We introduce NeHaD, a neural deformation field for dynamic Gaussian Splatting governed by Hamiltonian mechanics. Our key observation is that existing methods using MLPs to predict deformation fields introduce inevitable biases, resulting in unnatural dynamics. By incorporating physics priors, we achieve robust and realistic dynamic scene rendering. Hamiltonian mechanics provides an ideal framework for modeling Gaussian deformation fields due to their shared phase-space structure, where primitives evolve along energy-conserving trajectories. We employ Hamiltonian neural networks to implicitly learn underlying physical laws governing deformation. Meanwhile, we introduce Boltzmann equilibrium decomposition, an energy-aware mechanism that adaptively separates static and dynamic Gaussians based on their spatial-temporal energy states for flexible rendering. To handle real-world dissipation, we employ second-order symplectic integration and local rigidity regularization as physics-informed constraints for robust dynamics modeling. Additionally, we extend NeHaD to adaptive streaming through scale-aware mipmapping and progressive optimization. Extensive experiments demonstrate that NeHaD achieves physically plausible results with a rendering quality-efficiency trade-off. To our knowledge, this is the first exploration leveraging Hamiltonian mechanics for neural Gaussian deformation, enabling physically realistic dynamic scene rendering with streaming capabilities.
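The second-order symplectic integration the abstract relies on can be illustrated with the classic leapfrog (velocity Verlet) scheme, which keeps energy bounded over long rollouts. The harmonic-oscillator Hamiltonian below is a toy stand-in for the learned Hamiltonian, not the paper's network:

```python
def leapfrog(q, p, grad_V, dt, steps):
    """Second-order symplectic (velocity Verlet) integration of
    dq/dt = p, dp/dt = -grad_V(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_V(q)   # half kick
        q += dt * p                 # drift
        p -= 0.5 * dt * grad_V(q)   # half kick
    return q, p

# Harmonic oscillator: V(q) = q^2 / 2, so grad_V(q) = q and the initial
# energy H = p^2/2 + q^2/2 = 0.5 should be preserved.
q, p = leapfrog(1.0, 0.0, lambda q: q, dt=0.01, steps=10000)
energy = 0.5 * p * p + 0.5 * q * q   # stays close to 0.5 after 10k steps
```

A non-symplectic scheme such as explicit Euler would show steady energy drift over the same horizon, which is exactly the failure mode physics-informed constraints are meant to suppress.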
Technical Papers


DescriptionWe integrate smoothing B-splines into a standard differentiable vector graphics (DiffVG) pipeline through linear mapping, and show how this can be used to generate smooth and arbitrarily long paths within image-based deep learning systems. We take advantage of derivative-based smoothing costs for parametric control of fidelity vs. simplicity tradeoffs, while also enabling stylization control in geometric and image spaces. The proposed pipeline is compatible with recent vector graphics generation and vectorization methods.
We demonstrate the versatility of our approach with four applications aimed at the generation of stylized vector graphics: stylized space-filling path generation, stroke-based image abstraction, closed-area image abstraction, and stylized text generation.
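The derivative-based smoothing cost behind the fidelity-vs-simplicity trade-off can be sketched in one dimension: penalize squared second differences of the control values with a weight λ. The gradient-descent fit and zig-zag data below are our toy illustration, not the paper's linear-mapping formulation:

```python
def smoothing_objective(x, data, lam):
    """Fidelity plus second-difference smoothness cost (1D toy; the paper
    works on B-spline coefficients)."""
    fid = sum((a - b) ** 2 for a, b in zip(x, data))
    smooth = sum((x[i - 1] - 2 * x[i] + x[i + 1]) ** 2
                 for i in range(1, len(x) - 1))
    return fid + lam * smooth

def smooth_fit(data, lam, iters=5000, step=0.002):
    """Minimize the objective by gradient descent with analytic gradients."""
    x = list(data)
    n = len(x)
    for _ in range(iters):
        g = [2 * (x[i] - data[i]) for i in range(n)]
        for i in range(1, n - 1):
            d = 2 * lam * (x[i - 1] - 2 * x[i] + x[i + 1])
            g[i - 1] += d
            g[i] += -2 * d
            g[i + 1] += d
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

data = [0.0, 1.0, 0.0, 1.0, 0.0]    # zig-zag input
smooth = smooth_fit(data, lam=10.0)  # high lambda: simple, smooth result
rough = smooth_fit(data, lam=0.0)    # lambda = 0: pure fidelity, keeps data
```

Sweeping λ between the two extremes gives the parametric control over the trade-off that the abstract describes.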
Technical Papers


DescriptionWe propose mesh-free fluid simulations that exploit a kinematic neural basis for velocity fields represented by an MLP. We design a set of losses that ensures that these neural bases approximate fundamental physical properties such as orthogonality, divergence-free, boundary alignment, and smoothness. Our neural bases can then be used to fit an input sketch of a flow, which will inherit the same fundamental properties from the bases. We then can animate such flow in real-time using standard time integrators. Our neural bases can accommodate different domains, moving boundaries, and naturally extend to three dimensions.
Art Gallery






DescriptionNeural MONOBLOC Black is a series of 8 furniture pieces generated directly in 3D with a custom fine-tuned (DreamBooth) ProlificDreamer model and fabricated in an artisanal way from 100% upcycled wood. It expresses the notorious 'Janus problem' of non-3D-aware text-to-3D generative models as a continuous form of becoming and mirroring.
Technical Papers


DescriptionNeural implicit representation, the parameterization of a continuous distance function as a Multi-Layer Perceptron (MLP), has emerged as a promising lead in tackling surface reconstruction from unoriented point clouds. In the presence of noise, however, its lack of explicit neighborhood connectivity makes sharp edges identification particularly challenging, hence preventing the separation of smoothing and sharpening operations, as is achievable with its discrete counterparts. In this work, we propose to tackle this challenge with an auxiliary field, the \emph{octahedral field}. We observe that both smoothness and sharp features in the distance field can be equivalently described by the smoothness in octahedral space. Therefore, by aligning and smoothing an octahedral field alongside the implicit geometry, our method behaves analogously to bilateral filtering, resulting in a smooth reconstruction while preserving sharp edges. Despite being operated purely pointwise, our method outperforms various traditional and neural implicit fitting approaches across extensive experiments, and is very competitive with methods that require normals and data priors. Code and data of our work are available at: https://github.com/Ankbzpx/frame-field.
Technical Communications


DescriptionWe present a Neural Radiance Cache (NRC) integrated into the global illumination and reflection passes of a hybrid renderer targeting mobile devices.
Technical Papers


Description3D Gaussian Splatting (3DGS) has emerged as a leading approach for high-quality novel view synthesis, with numerous variants extending its applicability to a broad spectrum of 3D and 4D scene reconstruction tasks. Despite its success, the representational capacity of 3DGS remains limited by the use of 3D Gaussian kernels to model local variations. Recent works have proposed to augment 3DGS with additional per-primitive capacity, such as per-splat textures, to enhance its expressiveness. However, these per-splat texture approaches primarily target dense novel view synthesis with a reduced number of Gaussian primitives, and their effectiveness tends to diminish when applied to more general reconstruction scenarios. In this paper, we aim to achieve concrete performance improvement over state-of-the-art 3DGS variants across a wide range of reconstruction tasks, including novel view synthesis, geometry and dynamic reconstruction, under both sparse and dense input settings. To this end, we introduce Neural Texture Splatting (NTS). At the core of our approach is a global neural field (represented as a hybrid of a tri-plane and a neural decoder) that predicts local appearance and geometric texture fields for each primitive. By leveraging this shared global representation that models local texture fields across primitives, we significantly reduce model size and facilitate efficient global information exchange, demonstrating strong generalization across tasks. Furthermore, our neural modeling of local texture fields introduces expressive view- and time-dependent effects, a critical aspect that existing methods fail to account for. Extensive experiments show that Neural Texture Splatting consistently improves models and achieves state-of-the-art results across multiple benchmarks.
Technical Papers


DescriptionPoint clouds are widely used representations of 3D data, but determining the visibility of points from a given viewpoint remains a challenging problem due to their sparse nature and lack of explicit connectivity. Traditional methods, such as Hidden Point Removal (HPR), face limitations in computational efficiency, robustness to noise, and handling concave regions or low-density point clouds. In this paper, we propose a novel approach to visibility determination in point clouds by formulating it as a binary classification task. The core of our network consists of a 3D U-Net that extracts view-independent point-wise features and a shared multi-layer perceptron (MLP) that predicts point visibility using the extracted features and view direction as inputs. The network is trained end-to-end with ground-truth visibility labels generated from rendered 3D models. Our method significantly outperforms HPR in both accuracy and computational efficiency, achieving up to 126 times speedup on large point clouds. Additionally, our network demonstrates robustness to noise and varying point cloud densities and generalizes well to unseen shapes. We validate the effectiveness of our approach through extensive experiments on the ShapeNet and real-world datasets, showing substantial improvements in visibility accuracy. We also demonstrate the versatility of our method in various applications, including point cloud visualization, surface reconstruction, normal estimation, shadow rendering, and viewpoint optimization.
Technical Papers


DescriptionReal-time visibility determination in expansive or dynamically changing environments has long posed a significant challenge in computer graphics. Existing techniques are computationally expensive and often applied as a precomputation step on a static scene. We present NeuralPVS, the first deep-learning approach for visibility computation that efficiently determines from-region visibility in a large scene, running at approximately 100 Hz processing with less than $1\%$ missing geometry. This approach is possible by using a neural network operating on a voxelized representation of the scene. The network's performance is achieved by combining sparse convolution with a 3D volume-preserving interleaving for data compression. Moreover, we introduce a novel repulsive visibility loss that can effectively guide the network to converge to the correct data distribution. This loss provides enhanced robustness and generalization to unseen scenes. Our results demonstrate that NeuralPVS outperforms existing methods in terms of both accuracy and efficiency, making it a promising solution for real-time visibility computation.
Technical Papers


DescriptionNeural implicit shape representation has drawn significant attention in recent years due to its continuity, differentiability, and topological flexibility. However, directly modeling the shape of neural implicit fields, especially the neural signed distance function (SDF), with sparse geometric control is still a challenging task. While 3D curve networks can provide intuitive control over explicit surfaces, the sparsity and varied topology of these networks introduce ambiguity in surface shape interpolation and present challenges in mesh layout design under curve constraints. Consequently, achieving reasonable surfacing from curve networks has long been a challenge in mesh modeling. In this paper, we propose NeuVAS, a curvature-based approach to solve the neural SDF under curve network constraints. Leveraging the differentiability of neural shape representations, we introduce a smooth term to regularize the zero-level surface of the SDF, providing dense control over shape interpolation. Typically, a reasonable surface interpolated from a curve network consists of piecewise smooth surface patches that are C0 continuous at curve constraints. However, encoding such a shape using a neural SDF poses significant challenges. To construct piecewise smoothness on neural SDFs, we minimize an optional smooth term based on curvature in the space between the curves while relaxing this constraint near feature curves. Moreover, our method can accommodate either structured curve networks or oriented point clouds as input constraints, making it applicable to a broad range of scenarios. A comprehensive comparison with existing state-of-the-art methods demonstrates the significant advantages of our approach in surfacing curve networks.
Computer Animation Festival






DescriptionNine Awaken is a 20-minute, 100% AI-generated animated short film, created entirely without 3D modeling or conventional CGI techniques. Supported by CCIDAHK under the Future Animation 2 Program by the Hong Kong Digital Entertainment Association (HKDEA), it stands as a pioneering work in AI-driven cinematic storytelling.
Blending human creativity with machine intelligence, the script was written by a professional screenwriter, and the storyboard was hand-drawn by the director. Real actors’ performances were used to drive AI character animation, achieving genuine emotional expression. The music was composed by AI, while the sound design and final mix were completed by a professional sound designer — showcasing a seamless collaboration between human artistry and AI innovation.
Set in a futuristic space station inspired by retro Hong Kong aesthetics, the film crafts a distinctive visual identity that merges nostalgia with a bold sci-fi vision.
Achievements & Innovation
• Participated in the Annecy International Animation Film Festival 2025, featuring over 200 fully AI-generated shots
• Winner – Best AI Short Film, EyeCatcher International Short Film Awards
• Winner – Jury’s Choice, Future Animation Program
• Winner – ACFM InnoAsia AI Film International Summit, 30th Busan International Film Festival 2025
Backed by a 20-year award-winning animation studio, our team’s proprietary AI development achieved a 300–500% improvement in production quality within comparable budgets.
Nine Awaken redefines what’s possible in animated filmmaking — demonstrating that when guided by human storytelling, AI can enhance cinematic art with emotional depth, visual beauty, and technical innovation.
Technical Papers


DescriptionThe Noise2Noise method allows for training machine learning-based denoisers with pairs of input and target images where both the input and target can be noisy. This removes the need for training with clean target images, which can be difficult to obtain. However, Noise2Noise training has a major limitation: nonlinear functions applied to the noisy targets will skew the results. This bias occurs because the nonlinearity makes the expected value of the noisy targets different from the clean target image. Since nonlinear functions are common in image processing, avoiding them limits the types of preprocessing that can be performed on the noisy targets. Our main insight is that certain nonlinear functions can be applied to the noisy targets without adding significant bias to the results. We develop a theoretical framework for analyzing the effects of these nonlinearities, and describe a class of nonlinear functions with minimal bias.
We demonstrate our method on the denoising of high dynamic range (HDR) images produced by Monte Carlo rendering, where generating high-sample count reference images can be prohibitively expensive. Noise2Noise training can have trouble with HDR images, where the training process is overwhelmed by outliers and performs poorly. We consider a commonly used method of addressing these training issues: applying a nonlinear tone mapping function to the model output and target images to reduce their dynamic range. This method was previously thought to be incompatible with Noise2Noise training because of the nonlinearities involved. We show that certain combinations of loss functions and tone mapping functions can reduce the effect of outliers while introducing minimal bias. We apply our method to an existing machine learning-based Monte Carlo denoiser, where the original implementation was trained with high-sample count reference images. Our results approach those of the original implementation, but are produced using only noisy training data.
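The bias being analyzed here is easy to measure numerically: apply a nonlinearity to noisy targets and compare the expectation against the clean value. A Monte Carlo sketch using a Reinhard-style tone map and Gaussian noise (our choice of nonlinearity and noise model for illustration, not the paper's exact setup):

```python
import random

def tonemap(x):
    # Reinhard-style range compression; one possible HDR nonlinearity.
    return x / (1.0 + x)

def n2n_bias(clean, sigma, n=200000, seed=0):
    """Estimate E[tonemap(clean + noise)] - tonemap(clean): the skew a
    nonlinearity adds to the expected Noise2Noise target."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        acc += tonemap(clean + rng.gauss(0.0, sigma))
    return acc / n - tonemap(clean)

bias_small = n2n_bias(1.0, 0.05)  # mild target noise: tiny skew
bias_large = n2n_bias(1.0, 0.2)   # stronger noise: noticeably larger skew
```

To second order the skew is 0.5 * sigma^2 * tonemap''(clean), so it grows quadratically with the noise level; picking loss/tone-map combinations that keep this term small is the essence of the approach.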
Art Papers



DescriptionThe development of innovative virtual reality (VR) methodologies represents a critical imperative in an era where VR reconfigures human perception and facilitates complex cultural expression. This paper explores the intersection of feminism and VR creation, presenting an interactive narrative experience. "Nora" recounts the story of "Na", a rural woman, and her journey of self-growth. Guided by feminist theory, the work is innovatively designed in terms of content, perspective, and interactivity. Additionally, user feedback was analyzed to inform future improvements. "Nora" explores how feminist theory influences various aspects of VR creation, to provide new insights into the transformative role of feminism in VR artistic practice.
Poster






DescriptionThis study presents an active force-feedback glove and its control algorithm, which guide users' fingers into a handshake posture, allowing users to experience realistic virtual handshakes in XR environments.
Technical Papers


DescriptionThe realistic simulation of sand, soil, powders, rubble piles, and large collections of rigid bodies is a common and important problem in the fields of computer graphics, computational physics, and engineering. Directly simulating these individual bodies quickly becomes expensive, so we often approximate the entire group as a continuous material that can be more easily computed using tools for solving partial differential equations, like the material point method (MPM). In this paper, we present a method for automatically extracting continuum material properties from a collection of rigid bodies. We use numerical homogenization with periodic boundary conditions to simulate an effectively infinite number of rigid bodies in contact. We then record the effective stress-strain relationships from these simulations and convert them into elastic properties and yield criteria for the continuum simulations.
Our experiments validate existing theoretical models like the Mohr-Coulomb yield surface by extracting material behaviors from a collection of spheres in contact. We further generalize these existing models to more exotic materials derived from diverse and non-convex shapes. We observe complicated jamming behaviors from non-convex grains, and we introduce a new material model for materials with extremely high levels of internal friction and cohesion. We simulate these new continuum models using MPM with an improved return mapping technique. The end result is a complete system for turning an input rigid body simulation into an efficient continuum simulation with the same effective mechanical properties.
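The Mohr-Coulomb yield surface that the experiments validate has a compact closed form in principal stresses. A sketch of the admissibility test (sign conventions vary across the literature; tension-positive with sig1 >= sig3 is assumed here):

```python
import math

def mohr_coulomb_f(sig1, sig3, cohesion, friction_angle_deg):
    """Mohr-Coulomb yield function in principal stresses (tension positive,
    sig1 >= sig3). f <= 0 means the stress state is admissible; f > 0 means
    the material yields and a return mapping must project back to the surface."""
    phi = math.radians(friction_angle_deg)
    return ((sig1 - sig3) / 2.0
            + (sig1 + sig3) / 2.0 * math.sin(phi)
            - cohesion * math.cos(phi))

# Cohesionless sand (c = 0, friction angle 30 degrees):
hydrostatic = mohr_coulomb_f(-1.0, -1.0, 0.0, 30.0)  # compression: admissible
shear = mohr_coulomb_f(1.0, -1.0, 0.0, 30.0)         # pure shear: yields
```

Fitting the homogenized stress-strain data then reduces to estimating the cohesion and friction-angle parameters that best bound the recorded stress states.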
Technical Papers


DescriptionShell structures are thin curved surface structures in architectural design that efficiently carry loads through in-plane stresses, rather than relying on bending. The process of determining their shape—called form finding—ensures that shells remain structurally efficient. Some computational methods solve this using discrete meshes, while others approach it through continuum mechanics, where the Airy stress function plays a central role. An Airy stress function is a smooth surface that encodes internal stress distributions in its curvatures. Earlier methods, including work by Miki et al. (2022), applied this approach to mixed tension–compression shells and solved the equilibrium equation using the Variable Projection method (VarPro). In 2024, they further extended the method to design bending-free metal–glass grid shells with flat panels by aligning stress and curvature directions through a bilinear partial differential equation (PDE). However, these methods struggle with complex geometries, particularly where boundaries are topologically disjoint but mechanically interactive. This limitation arises not from the numerical methods themselves but from the Airy stress function, which cannot model stress fields that transmit net forces across such boundaries. To address this, we revive an overlooked supplementary stress function from Schaefer (1953) and Gurtin (1963), which, when combined with the Airy function, can represent any stress state, regardless of boundary complexity. Using the same computational framework, VarPro, we demonstrate that this extension can solve problems involving topologically complex shapes through various examples, including a Stanford bunny grid shell. Our method ensures alignment between conjugate curvature and stress directions, and also aligns the resulting grid layout with the boundary curves, enhancing both structural efficiency and architectural expression. 
Optionally, one can make the resulting grid strictly orthogonal, aligning lines of curvature with principal stress directions. Moreover, we present a variant of VarPro in which computational efficiency and memory usage are both significantly improved.
Technical Papers


DescriptionWe introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the versatility and expressiveness offered by text prompts. A key challenge in this task is to preserve the identity of the objects depicted in the input visual prompts, while also generating diverse compositions across different images. To address this challenge, we introduce a new KV-mixed cross-attention mechanism, in which keys and values are learned from distinct visual representations. The keys are derived from an encoder with a small bottleneck for layout control, whereas the values come from a larger bottleneck encoder that captures fine-grained appearance details. By mixing keys and values from these complementary sources, our model preserves the identity of the visual prompts while supporting flexible variations in object arrangement, pose, and composition. During inference, we further propose object-level compositional guidance to improve the method's identity preservation and layout correctness. Results show that our technique produces diverse scene compositions that preserve the unique characteristics of each visual prompt, expanding the creative potential of text-to-image generation.
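The KV-mixing idea above is plain scaled-dot-product cross-attention with keys and values sourced from different encoders. A minimal sketch with toy vectors (the tiny dimensions and the specific key/value features below are hypothetical; in the paper these come from the layout and appearance encoders respectively):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def kv_mixed_cross_attention(queries, keys, values):
    """Cross-attention where keys and values come from different encoders.
    Shapes: queries [n][d], keys [m][d], values [m][dv]."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Toy: 2 tokens. Keys encode layout (where), values encode appearance (what);
# the query attends to the first layout slot and retrieves its appearance.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
out = kv_mixed_cross_attention([[5.0, 0.0]], keys, values)
```

Because attention weights depend only on the keys, the small-bottleneck layout encoder decides *where* information flows, while the large-bottleneck appearance encoder decides *what* flows, which is the decoupling the method exploits.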
Invited Poster
Poster






DescriptionOctGPT is a novel multiscale autoregressive model for 3D shape generation. It introduces a hierarchical serialized octree representation, an octree-based transformer with 3D RoPE, and token-parallel generation schemes. OctGPT significantly accelerates convergence, achieves performance rivaling or surpassing state-of-the-art diffusion models, and supports text-, sketch-, and image-conditioned generation as well as scene-level synthesis.
Technical Papers


DescriptionStochastic solvers have emerged as a powerful alternative to traditional discretization-based methods for solving partial differential equations (PDEs), especially in geometry processing and graphics. While off-centered estimators enhance sample reuse in Monte Carlo solvers, they introduce correlation artifacts and bias when Green’s functions are approximated. In this paper, we propose a statistically weighted off-centered Monte Carlo estimator that leverages local similarity filtering to selectively combine samples across neighboring evaluation points. Our method balances bias and variance through a principled weighting strategy that suppresses unreliable estimators. We demonstrate our approach's effectiveness on various PDEs—including screened Poisson equations—and boundary conditions, achieving consistent improvements over existing solvers such as vanilla Walk on Spheres, mean value caching, and boundary value caching. Our method also naturally extends to gradient field estimation and mixed boundary problems.
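The vanilla Walk on Spheres baseline mentioned above is short enough to sketch in full: each sample repeatedly jumps to a uniform point on the largest empty circle until it lands within epsilon of the boundary. The unit-disk domain and boundary data below are our toy test case:

```python
import math, random

def walk_on_spheres(p, dist_to_boundary, boundary_value, eps=1e-3, rng=None):
    """One vanilla Walk-on-Spheres sample for the Laplace equation in 2D."""
    rng = rng or random
    x, y = p
    while True:
        d = dist_to_boundary(x, y)
        if d < eps:
            return boundary_value(x, y)   # terminate near the boundary
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += d * math.cos(theta)          # jump to the sphere of radius d
        y += d * math.sin(theta)

# Unit disk with boundary data g(x, y) = x, whose harmonic extension is
# u(x, y) = x, so u(0.3, 0) = 0.3.
dist = lambda x, y: 1.0 - math.hypot(x, y)
g = lambda x, y: x
rng = random.Random(0)
n = 5000
estimate = sum(walk_on_spheres((0.3, 0.0), dist, g, rng=rng)
               for _ in range(n)) / n     # approx 0.3
```

Off-centered and caching variants reuse these same walks across nearby evaluation points; the weighting strategy in the paper governs how aggressively such reuse is allowed.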
XR






DescriptionWe developed an XR application that integrates hand-tracked 3D circuit assembly with circuit simulation using SpiceSharp on Meta Quest 3. The system sustains 72 FPS at moderate circuit complexity and maintains consistent performance for over 30 minutes, enabling immersive electronic prototyping and laying the groundwork for future MR electronics prototyping.
Technical Papers


DescriptionIn Omnimatte, one aims to decompose a given video into semantically meaningful layers, including the background and individual objects along with their associated effects, such as shadows and reflections. Existing methods often require extensive training or costly self-supervised optimization. In this paper, we present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte. It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos.
These are accomplished by adapting zero-shot image inpainting techniques for video object removal, a task they fail to handle effectively out-of-the-box. We show that self-attention maps capture information about the object and its footprints and use them to inpaint the object's effects, leaving a clean background. Additionally, through simple latent arithmetic, object layers can be isolated and recombined seamlessly with new video layers to produce new videos. Evaluations show that OmnimatteZero not only achieves superior performance in terms of background reconstruction but also sets a new record for the fastest Omnimatte approach, achieving real-time performance with minimal frame runtime.
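The "simple latent arithmetic" can be illustrated with a toy additive model. The shapes, variable names, and the strictly additive composition below are illustrative assumptions for exposition, not OmnimatteZero's actual video-diffusion latent space:

```python
import numpy as np

# Toy latents with shape (T, C, H, W): a composite "video" latent and the
# clean background latent obtained after object removal.
rng = np.random.default_rng(0)
background = rng.normal(size=(4, 8, 16, 16))
object_layer_gt = np.zeros_like(background)
object_layer_gt[:, :, 4:8, 4:8] = 1.0        # object occupies a small region
video = background + object_layer_gt          # composite latent

# Isolate the object layer by simple latent subtraction...
object_layer = video - background
# ...and recombine it with a new background to produce a new video latent.
new_background = rng.normal(size=background.shape)
new_video = new_background + object_layer
```

In this toy model, subtraction exactly recovers the object layer; in practice the decoded result would depend on how well the removal step produced a clean background latent.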
Technical Papers


DescriptionThe creation of 3D assets with explicit, editable part structures is crucial for advancing interactive applications, yet most generative methods produce only monolithic shapes, limiting their utility.
We introduce OmniPart, a novel framework for part-aware 3D object generation designed to achieve high semantic decoupling among components while maintaining robust structural cohesion.
OmniPart uniquely decouples this complex task into two synergistic stages: (1) an autoregressive structure planning module generates a controllable, variable-length sequence of 3D part bounding boxes, critically guided by flexible 2D part masks that allow for intuitive control over part decomposition without requiring direct correspondences or semantic labels; and (2) a spatially-conditioned rectified flow model, efficiently adapted from a pre-trained holistic 3D generator, synthesizes all 3D parts simultaneously and consistently within the planned layout. Our approach supports user-defined part granularity, precise localization, and enables diverse downstream applications.
Extensive experiments demonstrate that OmniPart achieves state-of-the-art performance, paving the way for more interpretable, editable, and versatile 3D content.
Art Gallery






DescriptionAn interactive installation of 4,000 vintage switches forming a binary universe. Image, sound, and light respond to switch combinations, revealing the tension between control and the unknown through tangible digital logic.
Technical Papers


DescriptionWe present a computational approach for designing freeform structures that can be rapidly assembled from initially flat configurations by a single string pull. The target structures are decomposed into rigid, spatially varied quad tiles that are optimized to approximate the user-provided surface, forming a flat mechanical linkage. Our algorithm then uses a two-step method to find a physically realizable string path that controls only a subset of tiles to smoothly actuate the structure from flat to assembled configuration. We first compute the minimal subset of tiles that must be controlled by the string, considering the geometry of the structure and the interactions among tiles. We then find a valid string path through these tiles that minimizes friction and assembles the flat linkage into the target 3D structure upon tightening a single string. The resulting designs can be easily manufactured in their flat configuration with computational fabrication techniques such as 3D printing, CNC milling, and molding; beyond simplifying manufacturing, the flat configuration facilitates storage and transportation. We validate our approach by developing a series of physical prototypes and showcasing various application case studies, ranging from medical devices and space shelters to architectural designs.
Technical Papers


DescriptionDiffusion models have significantly advanced image manipulation techniques, and their ability to generate photorealistic images is beginning to transform retail workflows, particularly in presale visualization. Beyond artistic style transfer, the capability to perform fine-grained visual feature transfer is becoming increasingly important. Embroidery is a textile art form characterized by intricate interplay of diverse stitch patterns and material properties, which poses unique challenges for existing style transfer methods. To explore the customization for such fine-grained features, we propose a novel contrastive learning framework that disentangles fine-grained style and content features with a single reference image, building on the classic concept of image analogy. We first construct an image pair to define the target style, and then adopt a similarity metric based on the decoupled representations of pretrained diffusion models for style-content separation. Subsequently, we propose a two-stage contrastive LoRA modulation technique to capture fine-grained style features. In the first stage, we iteratively update the whole LoRA and the selected style blocks to initially separate style from content. In the second stage, we design a contrastive learning strategy to further decouple style and content through self-knowledge distillation. Finally, we build an inference pipeline to handle image or text inputs with only the style blocks. To evaluate our method on fine-grained style transfer, we build a benchmark for embroidery customization. Our approach surpasses prior methods on this task and further demonstrates strong generalization to three additional domains: artistic style transfer, sketch colorization, and appearance transfer.
Art Papers



DescriptionWhy build yet another virtual reality artwork when VR already promises boundless exploration? We began with a blunt provocation: make the body feel the gap between virtual promise and material fact. Our installation embeds a fully occlusive headset inside a transparent glass maze so constraint remains visible. An adaptive reinforcement-learning (RL) lure, treated as a performing partner, edits salience with gentle invitations rather than commands. Across over ten exhibitions (2017–2025), the piece evolved from onlooker voyeurism to a choreography of agency among audience, machine-performer, and artist-architect. By partially publicizing policy intent on the floor, projections, and rail, what is usually hidden as seamless UX becomes dramatically legible; guidance is experienced as both bodily and social, with collisions reading as shared mischief more than punishment. We contribute an art-led, large-scale architecture that couples predictive models with embodied interaction while retaining ambiguity: visible constraint (glass), public policy signals (floor/projections/rail), and an RL lure that persuades by staging opportunity. Drawing on in-exhibition observations, we report regularities that motivate lightweight, gallery-feasible evaluations without altering the artwork, thereby paving the way for systematic study. More broadly, the work reframes “good guidance” not only as seamlessness but as strategic visibility, an ethico-aesthetic stance on when power should be allowed to show, and the autonomy we trade when an algorithm anticipates our limits.
Educator's Forum



DescriptionThis talk introduces Machine Learning for Artists and Designers, a course at NYU Shanghai that makes machine learning accessible to students without technical backgrounds. Working in the browser with p5.js, ml5.js, and custom tools for language, image, and embodied AI, students explore both how these systems work and what they mean culturally. Examples of student projects illustrate how creative practice can engage AI critically, playfully, and reflectively.
Educator's Forum



DescriptionChinese herbal medicine (CHM), as a part of traditional Chinese medicine, is of great importance due to its remarkable performance in disease prevention, treatment, and daily healthcare. There exists a wide variety of CHM, and some varieties have similar shapes, which makes CHM identification difficult. In traditional education, educators manually display pictures of CHM to help students learn the classes of CHM, a process that is time-consuming and labor-intensive. Knowledge graphs (KGs) are powerful, modern tools for boosting model performance in applications such as object detection, natural language question answering, and recommendation systems. However, existing unimodal KGs limit the potential for performance improvement in downstream tasks, whereas knowledge acquisition is inherently multimodal, spanning text, image, video, and audio. Multimodal knowledge can thus accumulate from both textual and visual views of information, and researchers have increasingly focused on multimodal knowledge graphs (MKGs). Currently, KGs used for CHM are predominantly unimodal, and students who lack professional knowledge find it difficult to gain a comprehensive perception. Therefore, we propose OpenCHM, an MKG-based education system for CHM that supports image-text retrieval. We construct an MKG that integrates texts and images, providing a wider view for exploring further information on CHM. The service system is suitable for educational demonstrations as well as real-world applications. Finally, we design several MKG-based downstream tasks, such as knowledge visualization, multimodal knowledge retrieval, and a question-answering platform, to promote the development of CHM.
Art Gallery






DescriptionOrbital Oscillation is a kinetic sculpture where a glass object rolls autonomously on a table, actuated by hidden mechanisms. Concealed tilting systems and real-time sensing create imperceptible motion, exploring time perception (Husserl) and technological poetics (Heidegger). The work dematerializes craft, shifting from physical structures to choreographed temporal experience.
Art Papers



DescriptionOrbital Oscillation is a kinetic sculpture that explores the phenomenology of time through a deceptively simple act: a glass object rolling in slow, deliberate orbits on a round table. The object’s movement is not self-initiated but subtly actuated by a concealed robotic system beneath the table’s surface. Using real-time sensing and feedback control, the table imperceptibly tilts to inject energy into the glass object’s trajectory.
On the one hand, this paper presents the mechatronic and algorithmic design of the system, detailing how sensing, control, and form are orchestrated to create an experience that appears autonomous, fluid, and continuous, yet is precisely choreographed. On the other hand, the conceptual framework of Digital Craft is discussed, placing the work in the context of perception through Husserl, and a critical engagement with technology through Heidegger and Hui. The work invites reflection on how machine intelligence can shape perception without revealing its own presence.
Poster






DescriptionOuroboros Code shows how color, lighting, and spatial design transform nonlinear science fiction into clear, emotionally resonant worlds.
The researchers present a framework validated through literature review, case studies, and playtests.
Educator's Forum



DescriptionThis paper addresses the challenge of limited acting skills in facial animation education, which is further heightened by the advent of realistic digital humans such as Unreal Engine MetaHuman. We introduce a transformative interdisciplinary framework that combines animation technology (Unreal Engine MetaHuman) with psychological theory (Scherer’s appraisal theory) to address this issue. Together with the integration of AI tools in the workflow, such as Microsoft Copilot, Adobe Firefly, and an AI voice changer, the framework supports student creativity, helping students overcome performance anxiety and develop nuanced facial animation performance.
Technical Papers


DescriptionThe detection and computation of the overlap region between two NURBS surfaces, as a special case of the intersection problem, are essential components of CAD systems, directly influencing the robustness of the entire system. Despite their importance, efficient, topologically correct, and numerically robust algorithms for detecting overlap regions remain lacking. To address this issue, we propose an optimization approach for computing the overlap region between two NURBS surfaces within a given error threshold. Based on a bi-level optimization framework, our algorithm first employs cubic Bézier simplices to approximate the boundary of the overlap region. The boundary points of the overlap region are computed iteratively, followed by a Delaunay triangulation to establish the boundary topology. Additional refinement of the boundary edge is applied to ensure the topological correctness and maintain the precision of the overlap region within the specified error threshold. Our main contribution lies in the development of a novel and robust algorithm to calculate the boundary of the overlap region. This approach differs from previous overlap computation methods, which seldom account for error thresholds and are difficult to implement in floating-point arithmetic in CAD systems. We demonstrate the robustness and topological accuracy of our method through extensive experiments on a diverse set of complex examples with varying error thresholds.
Technical Papers


DescriptionWe present a method for dynamic 3D reconstruction of deformable objects from casually captured, unposed monocular videos. Unlike existing approaches, our method handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. Specifically, our approach first trains a personalized, object-centric pose estimation model utilizing a pre-trained image-to-3D diffusion model. This guides the optimization of a deformable 3D Gaussian representation and a neural skinning model, enhanced by a long-term point tracking regularization over the entire input video. By combining diffusion priors and differentiable rendering, our method reconstructs high-fidelity, articulated 3D representations of category-agnostic objects. Extensive qualitative and quantitative results show that our approach is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.
Technical Communications


DescriptionWe present a novel mosaic technique that embeds historical artworks within individual brush strokes using semantic segmentation and dual-criteria retrieval to create fluid compositions where each stroke contains authentic artworks.
Technical Papers


DescriptionIn this paper, we present PanoDreamer, a novel method for producing a coherent 360° 3D scene from a single input image. Unlike existing methods that generate the scene sequentially, we frame the problem as single-image panorama and depth estimation. Once the coherent panoramic image and its corresponding depth are obtained, the scene can be reconstructed by inpainting the small occluded regions and projecting them into 3D space. Our key contribution is formulating single-image panorama and depth estimation as two optimization tasks and introducing alternating minimization strategies to effectively solve their objectives. We demonstrate that our approach outperforms existing techniques in single-image 360° 3D scene reconstruction in terms of consistency and overall quality.
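The alternating-minimization strategy itself can be sketched generically: fix one block of variables, minimize over the other, then swap. The toy quadratic objective and closed-form block minimizers below are illustrative stand-ins, not PanoDreamer's actual panorama and depth objectives.

```python
def alternating_minimization(f, x0, y0, argmin_x, argmin_y, iters=50):
    """Generic two-block alternating minimization: fix y and minimize
    over x, then fix x and minimize over y. A schematic of the
    strategy, not PanoDreamer's implementation."""
    x, y = x0, y0
    for _ in range(iters):
        x = argmin_x(y)
        y = argmin_y(x)
    return x, y, f(x, y)

# Toy objective f(x, y) = (x - y)^2 + (x - 3)^2 + (y - 1)^2
f = lambda x, y: (x - y) ** 2 + (x - 3) ** 2 + (y - 1) ** 2
argmin_x = lambda y: (y + 3) / 2.0   # closed-form minimizer in x for fixed y
argmin_y = lambda x: (x + 1) / 2.0   # closed-form minimizer in y for fixed x
x, y, val = alternating_minimization(f, 0.0, 0.0, argmin_x, argmin_y)
```

For this convex toy problem the iterates converge geometrically to the joint minimizer (x, y) = (7/3, 5/3); in the paper's setting each block subproblem is itself an optimization rather than a closed-form update.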
Poster






DescriptionPaperMythXR fuses real paper-cutting with mixed reality to drive Pangu's myth. A Read-Create-Advance loop and human-AI production pipeline unite book, tools, and MR, boosting immersion, cultural connection, and creative agency.
Poster






DescriptionWe propose a novel part-aware 3D shape retrieval framework that improves fine-grained search accuracy by embedding both whole objects and their segmented parts into a shared metric space.
Technical Papers


DescriptionWe present PartComposer: a framework for part-level concept learning from single-image examples that enables text-to-image diffusion models to compose novel objects from meaningful components. Existing methods either struggle with effectively learning fine-grained concepts or require a large dataset as input. We propose a dynamic data synthesis pipeline generating diverse part compositions to address one-shot data scarcity. Most importantly, we propose to maximize the mutual information between denoised latents and structured concept codes via a concept predictor, enabling direct regulation on concept disentanglement and re-composition supervision. Our method achieves strong disentanglement and controllable composition, outperforming subject and part-level baselines when mixing concepts from the same, or different, object categories.
Art Papers



DescriptionWe present a semantic-feedback framework that treats natural language as a regulatory signal for evolving artificial-life systems. Instead of using prompts to select finished images, text in our system shapes the dynamics of an interactive ecosystem, allowing audiences to cultivate behaviors over time. The framework couples a learned mapping from prompts to simulation parameters with evolutionary search and vision–language evaluation, so user intent modulates both visible outcomes and the underlying generative rules. It supports iterative prompt refinement, multi-agent interaction, and the synthesis of new collective rules from community input. In a user study, participants achieved higher semantic alignment and reported a greater sense of control than with manual tuning, while behaviors remained diverse across generations. As an art-led contribution, the work reframes authoring as participatory cultivation and advances open-ended evolution as a socially distributed, not solely algorithmic, process; as a tool contribution, it offers a practical platform for co-creative generative design.
Technical Papers


DescriptionUV unwrapping flattens 3D surfaces to 2D with minimal distortion, often requiring the complex surface to be decomposed into multiple charts. Although extensively studied, existing UV unwrapping methods frequently struggle with AI-generated or reconstructed meshes, which are typically noisy, bumpy, and poorly conditioned. These methods often produce highly fragmented charts and suboptimal boundaries, introducing artifacts and hindering downstream tasks. We introduce PartUV, a part-based UV unwrapping pipeline that generates significantly fewer, part-aligned charts while maintaining low distortion. Built on top of a recent learning-based part decomposition method PartField, PartUV combines high-level semantic part decomposition with novel geometric heuristics in a top-down recursive framework. It ensures each chart’s distortion remains below a user-specified threshold while minimizing the total number of charts. The pipeline integrates and extends parameterization and packing algorithms, incorporates dedicated handling of non-manifold and degenerate meshes, and is extensively parallelized for efficiency. Evaluated across four diverse datasets—including man-made, CAD, AI-generated, and Common Shapes—PartUV outperforms existing tools and recent neural methods in chart count and seam length, achieves comparable distortion, exhibits high success rates on challenging meshes, and enables new applications like part-specific multi-tiles packing.
Art Gallery






DescriptionPassengers is a video installation where orbital imagery becomes intimate airplane windows through AI interpretation, each view accompanied by the imagined consciousness of fictional passengers. The artwork explores how machine vision might reconstruct our relationship to the shared sky, transforming orbital observation into speculative imagination.
Educator's Forum



DescriptionThis talk examines the integration of 3D Gaussian Splatting (3DGS) technology into creative media, design and journalism education programs at undergraduate and postgraduate levels. It presents findings and best practices from multiple educational implementations across different institutional and cultural contexts and approaches. As an emerging spatial reconstruction technique that has rapidly gained attention in computer graphics research and creative industries, 3DGS presents unique opportunities and challenges for educators seeking to incorporate cutting-edge technologies into curricula. This talk draws from direct experience implementing 3DGS education across various formats and cultural contexts, including intensive workshop-based programs, semester-long courses, and professional development sessions, conducted at institutions and conferences worldwide, including Hong Kong, Poland, Japan, New Zealand, Korea, and the United Arab Emirates.
XR






DescriptionPercussive Hemispheres is an XR installation where synchronized steel drum and head tapping makes participants feel their head is resonating like an instrument. Visual hemispheres reinforce the illusion, often accompanied by the audiotactile magnet-head effect. A lab study confirmed significant enhancement of both perceptual phenomena.
Technical Papers


DescriptionIn graphics applications featuring dynamically moving visual targets -- such as film and gaming -- we have to rotate our eyes to follow objects as they move across the screen. Because target motion is often unpredictable and ever-changing, we must rapidly respond to motion cues and adjust eye movements to maintain the target within the fovea, a process known as catch-up. This catch-up behavior reflects how efficiently the eyes react to and compensate for sudden changes in motion, making it a critical indicator for both task performance and the overall visual experience. In this work, we study and measure the eye catch-up performance during visual tracking. In particular, we present a behavioral analysis that predicts users’ reaction latency to abrupt target motion based on target visibility. Our numerical analysis and human subject studies evidence the effectiveness and generalizability. We further show how the catch-up metric can be applied to evaluate video quality, adjust game difficulty, and optimize display configurations for enhanced user performance. We envision this research to create a computational link between human perception and behavioral performance in dynamic graphics contexts.
Technical Papers


DescriptionUnderstanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated which involves repairing broken or incomplete furniture that miss several parts with a toolkit. The primary challenge persists in how to reveal the potential part relations to infer the absent parts from multiple indistinguishable candidates with similar geometries, and complete for well-connected, structurally stable and aesthetically pleasing assemblies. This task necessitates not only specialized knowledge of part composition but, more importantly, an awareness of physical constraints, i.e., connectivity, stability, and symmetry. Neglecting these constraints often results in assemblies that, although visually plausible, are impractical. To address this challenge, we propose PhysFiT, a physical-aware 3D shape understanding framework. This framework is built upon attention-based part relation modeling and incorporates connection modeling, simulation-free stability optimization and symmetric transformation consistency. We evaluate its efficacy on 3D part assembly and 3D assembly completion, a novel assembly task presented in this work. Extensive experiments demonstrate the effectiveness of PhysFiT in constructing geometrically sound and physically compliant assemblies.
Educator's Forum



DescriptionKey physics concepts and principles, such as momentum conservation, are fundamental to the mechanics section of physics education. They play a crucial role in understanding the interactions between objects and the motion laws governing systems. However, since these processes occur instantaneously and are difficult to observe, traditional teaching methods, such as chalkboard explanations, static textbook diagrams, or apparatus demonstrations, often fail to effectively convey the dynamic changes involved. Additionally, these traditional methods cannot effectively demonstrate how these processes comply with physical laws, such as the conservation of momentum. As a result, students struggle to grasp the underlying principles. To address these issues, this paper proposes PhysGest, a framework that transforms static textbook diagrams into physically accurate and interactive videos for physics education. PhysGest captures the instructor's gestures and extracts temporal motion trajectories, from which physical parameters such as velocity and direction are derived. These extracted parameters are then used to animate the static diagrams, generating dynamic videos that align with the instructor's gestures and adhere to physical laws. Therefore, PhysGest overcomes the lack of interaction by converting static diagrams into videos that enhance students' ability to understand physical concepts. Our evaluation results demonstrate that PhysGest significantly enhances students’ self-efficacy, perceived enjoyment, and knowledge retention with improved performance in understanding underlying concepts and principles.
Technical Papers


DescriptionReconstructing physically plausible human motion from monocular videos remains a challenging problem in computer vision and graphics. Existing methods primarily focus on kinematics-based pose estimation, often leading to unrealistic results due to the lack of physical constraints. To address such artifacts, prior methods have typically relied on physics-based post-processing following the initial kinematics-based motion estimation. However, this two-stage design introduces error accumulation, ultimately limiting the overall reconstruction quality. In this paper, we present PhysHMR, a unified framework that directly learns a visual-to-action policy for humanoid control in a physics-based simulator, enabling motion reconstruction that is both physically grounded and visually aligned with the input video. A key component of our approach is the pixel-as-ray strategy, which lifts 2D keypoints into 3D spatial rays and transforms them into global space. These rays are incorporated as policy inputs, providing robust global pose guidance without depending on noisy 3D root predictions. This soft global grounding, combined with local visual features from a pretrained encoder, allows the policy to reason over both detailed pose and global positioning. To overcome the sample inefficiency of reinforcement learning, we further introduce a distillation scheme that transfers motion knowledge from a mocap-trained expert to the vision-conditioned policy, which is then refined using physically motivated reinforcement learning rewards. Extensive experiments demonstrate that PhysHMR produces high-fidelity, physically plausible motion across diverse scenarios, outperforming prior approaches in both visual accuracy and physical realism.
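The pixel-as-ray lifting can be sketched as standard camera backprojection (the intrinsics values and function name here are hypothetical, and PhysHMR's exact formulation may differ): invert the camera intrinsics to map a 2D keypoint to a camera-space direction, then rotate it into global space.

```python
import numpy as np

def pixel_to_ray(u, v, K, R):
    """Lift a 2D keypoint (u, v) to a unit 3D ray direction in global
    space: backproject through the camera intrinsics K, then rotate by
    the camera-to-world rotation R. Illustrative of the pixel-as-ray
    idea, not PhysHMR's exact implementation."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_world = R @ d_cam
    return d_world / np.linalg.norm(d_world)

# Hypothetical pinhole intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
ray = pixel_to_ray(320.0, 240.0, K, np.eye(3))  # principal point -> optical axis
```

Feeding such rays to the policy avoids committing to a noisy 3D root prediction: the ray constrains where the joint can lie without fixing its depth.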
Technical Papers


DescriptionReconstructing metrically accurate humans and their surrounding scenes from a single image is crucial for virtual reality, robotics, and comprehensive 3D scene understanding. However, existing methods struggle with depth ambiguity, occlusions, and physically inconsistent contacts. To address these challenges, we introduce PhySIC, a unified framework for physically plausible Human–Scene Interaction and Contact reconstruction. PhySIC recovers metrically consistent SMPL-X human meshes, dense scene surfaces, and vertex-level contact maps within a shared coordinate frame, all from a single RGB image. Starting from coarse monocular depth and parametric body estimates, PhySIC performs occlusion-aware inpainting, fuses visible depth with unscaled geometry for a robust initial metric scene scaffold, and synthesizes missing support surfaces like floors. A confidence-weighted optimization subsequently refines body pose, camera parameters, and global scale by jointly enforcing depth alignment, contact priors, interpenetration avoidance, and 2D reprojection consistency. Explicit occlusion masking safeguards invisible body regions against implausible configurations. PhySIC is highly efficient, requiring only 9 seconds for a joint human-scene optimization and less than 27 seconds for the end-to-end reconstruction process. Moreover, the framework naturally handles multiple humans, enabling reconstruction of diverse human-scene interactions. Empirically, PhySIC substantially outperforms single-image baselines, reducing mean per-vertex scene error from 641 mm to 227 mm, halving the pose-aligned mean per-joint position error (PA-MPJPE) to 42 mm, and improving contact F1-score from 0.09 to 0.51. Qualitative results demonstrate that PhySIC yields realistic foot-floor interactions, natural seating postures, and plausible reconstructions of heavily occluded furniture.
By converting a single image into a physically plausible 3D human-scene pair, PhySIC advances accessible and scalable 3D scene understanding. Code and evaluation scripts will be publicly released upon publication.
Art Papers



DescriptionThis paper explores the physical manifestation of generative AI music systems for live performance, focusing on bridging the expressive gap between AI-generated music and audience perception. Through a year-long collaboration with a human performer, we constructed a kinetic sculpture that visualizes the outputs of an AI jam_bot during concerts. The sculpture, powered by ML-based and pattern-driven mapping methodologies, interprets real-time AI musical decisions as expressive movements. Audience feedback indicates increased engagement and curiosity, although interpretability remains a challenge. Our work highlights the potential of embodied visualization to establish communicative presence for AI performers and suggests avenues for future research.
Technical Papers


DescriptionMulti-objective optimization problems, which require the simultaneous optimization of multiple objectives, are prevalent across numerous applications. Existing multi-objective optimization methods often rely on manually-tuned aggregation functions to formulate a joint optimization objective. The performance of such hand-tuned methods is heavily dependent on careful weight selection, a time-consuming and laborious process. These limitations also arise in the setting of reinforcement-learning-based motion tracking methods for physically simulated characters, where intricately crafted reward functions are typically used to achieve high-fidelity results. Such solutions not only require domain expertise and significant manual tuning, but also limit the applicability of the resulting reward function across diverse skills. To bridge this gap, we present a novel adversarial multi-objective optimization technique that is broadly applicable to a range of multi-objective reinforcement-learning tasks, including motion tracking. Our proposed Adversarial Differential Discriminator (ADD) receives a single positive sample, yet is still effective at guiding the optimization process. We demonstrate that our technique can enable characters to closely replicate a variety of acrobatic and agile behaviors, achieving comparable quality to state-of-the-art motion-tracking methods, without relying on manually-designed reward functions.
Technical Papers


DescriptionGenerative models have recently demonstrated impressive capabilities in producing high-quality 3D shapes from a variety of user inputs (e.g., text or images). However, generated objects often lack physical integrity. We introduce PhysiOpt, a differentiable physics optimizer designed to improve the physical behavior of 3D generative outputs, enabling them to transition from virtual designs to physically plausible, real-world objects. While most generative models represent geometry as continuous implicit fields, physics-based approaches often rely on the finite element method (FEM), requiring ad hoc mesh extraction to perform shape optimization. In addition, these methods are typically slow, limiting their integration in fast, iterative generative design workflows. Instead, we bridge the representation gap and propose a fast and effective differentiable simulation pipeline that optimizes shapes directly in the latent space of generative models using an intuitive and easy-to-implement differentiable mapping. This approach enables fast optimization while preserving semantic structure, unlike traditional methods relying on local mesh-based adjustments. We demonstrate the versatility of our optimizer across a range of shape priors, from global and part-based latent models to a state-of-the-art large-scale 3D generator, and compare it to a traditional mesh-based shape optimizer. Our method preserves the native representation and capabilities of the underlying generative model while supporting user-specified materials, loads, and boundary conditions. The resulting designs exhibit improved physical behavior, remain faithful to the learned priors, and are suitable for fabrication. We demonstrate the effectiveness of our approach on both virtual and fabricated objects.
Emerging Technologies






DescriptionWe present PinCity, a multiplayer AR urban design system integrated with a shape-changing display that supports co-design interaction.
XR






DescriptionPITAR is an LLM-powered XR agent that fuses eye gaze, gestures, and speech to interpret pronoun-based commands and control virtual objects. It enables real-time, human-like interaction through multimodal reasoning and few-shot prompting on the Meta Quest Pro.
Art Gallery






DescriptionPla-tsugi: Terrain as Fracture reframes rupture as generative. A stone from Mount Rokko is aligned with terrain data, sutured through MR design and PLA 3D printing. Extending kintsugi from vessels to landscapes, viewers cradle fragments in cherishing gestures, envisioning Generative Futures where rupture catalyzes connections between art, nature, and technology.
Educator's Forum



DescriptionKnowledge of pests and diseases in agriculture and forestry has traditionally been conveyed through static texts and two-dimensional images. However, such static resources lack the interactivity and contextual reasoning support needed to help students master and retain this knowledge. To address these challenges, we present PlantPal, an interactive question-answering system powered by large language models (LLMs) that enables dynamic reasoning and multimodal interpretation of such knowledge. This system illustrates the potential of LLM-based approaches to enhance agricultural students' cognitive engagement and support the translation of theoretical understanding into practical application. By developing a knowledge representation framework that balances domain specificity with accessibility, the system effectively communicates complex concepts while remaining aligned with the foundational cognitive level of undergraduate students in agricultural education. Results from extensive user studies involving 50 undergraduate agricultural students demonstrate statistically significant improvements in students' comprehension and practical reasoning, with participants reporting increased motivation and engagement compared to traditional text-based learning resources. This work contributes to the ongoing transformation of agricultural pedagogy by introducing a new paradigm of immersive cognitive intelligence.
Emerging Technologies






DescriptionWe present Play4D, an accelerated, interactive free-viewpoint video (FVV) streaming pipeline for next-generation VR and light field displays. Our demo showcases two complementary experiences: Play4D-LF, a walk-up station offering interactive multi-view playback on light field displays; and Play4D-VR, a headset-based experience that enables full volumetric navigation through 4D Gaussian Splatting content.
Poster






DescriptionStudying emotion visualization for VR poetry recitation training, we designed 'Poetry Space'. This system captures the reciter's emotions and voice in real time, transforming them into a virtual mountain landscape.
Technical Papers


DescriptionMany network architectures exist for learning on meshes, yet their constructions entail delicate trade-offs among difficulty learning high-frequency features, insufficient receptive fields, sensitivity to discretization, and high computational overhead. Drawing from classic local-global approaches in mesh processing, we introduce PoissonNet, a novel neural architecture that overcomes all of these deficiencies by formulating a local-global learning scheme, which uses Poisson's equation as the primary mechanism for feature propagation. Our core network block is simple; we apply learned local feature transformations in the gradient domain of the mesh, then solve a Poisson system to propagate scalar feature updates across the surface globally. Our local-global learning framework preserves the features' full frequency spectrum and provides a truly global receptive field, while remaining agnostic to mesh triangulation. Our construction is efficient, requiring far less compute overhead than previous methods, which enables scalability---both in the size of our datasets and the size of individual training samples. These qualities are validated on various experiments where, compared to previous intrinsic architectures, we attain state-of-the-art performance on semantic segmentation and parameterizing highly-detailed animated surfaces. Finally, as a central application of PoissonNet, we show its ability to learn deformations, significantly outperforming all other architectures that learn on surfaces. Code will be made available upon publication.
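The local-edit-then-global-solve pattern described in this abstract can be illustrated generically. The sketch below is a hypothetical minimal example, not PoissonNet's implementation: it substitutes a plain combinatorial graph Laplacian for a mesh's cotangent operator and solves a screened Poisson system to diffuse per-vertex scalar updates across the whole surface in one linear solve.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

def graph_laplacian(edges, n):
    """Combinatorial Laplacian L = D - A built from an undirected edge list."""
    rows, cols, vals = [], [], []
    deg = np.zeros(n)
    for i, j in edges:
        rows += [i, j]; cols += [j, i]; vals += [-1.0, -1.0]
        deg[i] += 1.0; deg[j] += 1.0
    rows += list(range(n)); cols += list(range(n)); vals += list(deg)
    return csr_matrix((vals, (rows, cols)), shape=(n, n))

def poisson_propagate(L, local_updates, alpha=1.0):
    """Globally diffuse per-vertex updates by solving the screened Poisson
    system (I + alpha * L) x = b; one sparse solve gives a global receptive
    field, since every vertex influences every other through the solve."""
    A = identity(L.shape[0], format="csr") + alpha * L
    return spsolve(A, local_updates)
```

Because the row sums of the Laplacian are zero, the solve redistributes the local updates smoothly over the graph while conserving their total mass.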
Poster






DescriptionCustomizable multilingual aesthetic movie poster generator for diverse audiences.
Technical Papers


DescriptionThe computation of a potentially visible set (PVS) can accelerate many computer graphics algorithms, such as framerate upsampling, streaming rendering, global illumination, and multi-fragment effects. Algorithms for from-region PVS have an inherently high complexity. Previous from-region PVS algorithms propagate occlusion through the scene in a front-to-back manner and are order-dependent, which places bounds on parallelism and restricts execution speed. We introduce the disocclusion buffer, which operates on a sparse, layered representation of the scene with quantized depth. In this representation, we invert the traditional PVS problem formulation and explicitly compute disocclusion rather than occlusion. Disocclusion can be computed in parallel in an order-independent manner, overcoming the main bottleneck in traditional PVS computation. Our PVS algorithm is over six times faster than the previous state of the art at the same level of accuracy in a direct comparison. It runs in shaders on the GPU without requiring any hardware extensions. We demonstrate how our work outperforms previous PVS algorithms in the range of supported camera motion without compromising quality.
Technical Papers


Description3D Gaussian Splatting (3DGS) combines classic image-based rendering, point-based graphics, and modern differentiable techniques, and offers an interesting alternative to traditional physically-based rendering. However, 3DGS-family models are far from efficient for power-constrained Extended Reality (XR) devices, which need to operate at a watt level. This paper introduces PowerGS, the first framework to jointly minimize the rendering and display power in 3DGS under a quality constraint. We present a general problem formulation and show that solving the problem amounts to 1) identifying the iso-quality curve(s) in the landscape subtended by the display and rendering power and 2) identifying the power-minimal point on a given curve, which has a closed-form solution given a proper parameterization of the curves. PowerGS also readily supports foveated rendering for further power savings. Extensive experiments and user studies show that PowerGS achieves up to 86% total power reduction compared to state-of-the-art 3DGS models, with minimal loss in both subjective and objective quality.
Technical Papers


DescriptionA fundamental challenge in rendering has been the dichotomy between surface and volume models. Gaussian Process Implicit Surfaces (GPISes) recently provided a unified approach for surfaces, volumes, and the spectrum in between. However, this representation remains impractical due to its high computational cost and mathematical complexity. We address these limitations by reformulating GPISes as procedural noise, eliminating expensive linear system solves while maintaining control over spatial correlations. Our method enables efficient sampling of stochastic realizations and supports flexible conditioning of values and derivatives through pathwise updates. To further enable practical rendering, we derive analytic distributions for surface normals, allowing for variance-reduced light transport via next-event estimation and multiple importance sampling. Our framework achieves efficient, high-quality rendering of stochastic surfaces and volumes with significantly simplified implementations on both CPU and GPU, while preserving the generality of the original GPIS representation.
Technical Papers


DescriptionLight control in generated images is a difficult task, posing challenges that span the entire image and the full frequency spectrum. Most approaches tackle this problem by training on extensive yet domain-specific datasets, limiting the inherent generalization and applicability of the foundational backbones used. Instead, PractiLight is a practical approach that effectively leverages the foundational understanding of recent generative models for the task. Our key insight is that lighting relationships in an image are similar in nature to token interaction in self-attention layers, and hence are best represented there. Based on this and other analyses regarding the importance of early diffusion iterations, PractiLight trains a lightweight LoRA regressor to produce the direct irradiance map for a given image, using a small set of training images. We then employ this regressor to incorporate the desired lighting into the generation process of another image using Classifier Guidance. This careful design generalizes well to diverse conditions and image domains. We demonstrate state-of-the-art performance in terms of quality and control, with proven parameter and data efficiency compared to leading works, over a wide variety of scene types. We hope this work affirms that image lighting can feasibly be controlled by tapping into foundational knowledge, enabling practical and general relighting.
Technical Papers


DescriptionMany physical phenomena exhibit discontinuities in their spatial derivatives—such as folds in creased materials or interfaces in heterogeneous solids—making their accurate representation essential for high-fidelity simulation. Traditional approaches address such discontinuities by aligning mesh discretizations with the interface, but this tight coupling between geometry and simulation limits generalization: changes in discontinuity geometry require remeshing, which alters the system's discrete operators and prevents consistent reuse of reduced-order bases. Since these bases are typically derived from mesh-dependent operators, applying reduced-order modeling across varying geometries remains a fundamental challenge. Neural representations offer an alternative by encoding basis functions as continuous neural fields, enabling generalization across shape variations. However, their inherent continuity makes it difficult to represent functions with discontinuous gradients. While recent work has explored discontinuities in function values, modeling continuous functions with discontinuous derivatives has remained largely unexplored. We introduce a neural field construction capable of capturing gradient discontinuities while maintaining continuity in the function itself. Our approach augments input coordinates with a non-trainable, smoothly clamped distance function within a lifting framework, allowing the gradient discontinuity to be encoded explicitly. We show that this construction yields higher-quality basis functions compared to traditional neural fields and supports reduced-order simulation across families of shapes with heterogeneous materials and creases—capabilities not demonstrated by prior work. Furthermore, our method can be combined with previous techniques that model function-value discontinuities via lifting, enabling the simulation of examples with simultaneous cuts and creases.
Poster






DescriptionWe propose a spatio-temporal physics-informed graph neural network (PIGNN) that predicts passenger anxiety using gaze and context data collected in a fully autonomous virtual reality driving simulation.
Technical Papers


DescriptionWe introduce a fully automatic pipeline for dynamic scene reconstruction from casually captured monocular RGB videos. Rather than designing a new scene representation, we enhance the priors that drive Dynamic Gaussian Splatting. Video segmentation combined with epipolar-error maps yields object-level masks that closely follow thin structures; these masks (i) guide an object-depth loss that sharpens the consistent video depth, and (ii) support skeleton-based sampling plus mask-guided re-identification to produce reliable, comprehensive 2-D tracks. Two additional objectives embed the refined priors in the reconstruction stage: a virtual-view depth loss removes floaters, and a scaffold-projection loss ties motion nodes to the tracks, preserving fine geometry and coherent motion. The resulting system surpasses previous monocular dynamic scene reconstruction methods and delivers visibly superior renderings.
Technical Papers


DescriptionHigh-fidelity avatar reconstruction from monocular videos faces significant challenges due to imperfect foreground segmentation and inaccurate body poses. Existing methods typically depend on additive components, such as explicit background modeling, which introduce additional overhead and reduce the flexibility of avatar reconstruction. We argue that these challenges need to be addressed fundamentally. To this end, we propose leveraging a learned 3D human prior to guide the reconstruction of 3D avatars, dubbed PriorAvatar, without increasing model complexity. At the core of our method is a learned 3D prior, which consists of a multi-person feature codebook that stores the 3D shapes and appearances derived from human scans. These latent features are complemented by a shared U-Net decoder that converts them into a set of renderable 3D Gaussians. During reconstruction, the learned 3D prior allows for fitting to unseen subjects in the monocular videos by fine-tuning with 2D photometric losses using 3D Gaussians. This approach ensures that the reconstruction process effectively utilizes the learned latent spaces while minimizing discrepancies with the 2D observations. In our experiments, we demonstrate the efficiency and robustness of our novel reconstruction scheme, as evidenced by its state-of-the-art quantitative and qualitative performance without relying on complex regularizers or additional model enhancements. The results of ablation studies further verify the effectiveness of incorporating a learned human prior for monocular avatar reconstruction.
Emerging Technologies






DescriptionPro's Eyes is a smart eyewear system that enables novices to experience expert observational patterns from a first-person perspective. Using dual transparent displays and eye tracking, the system visualizes expert gaze trajectories in real-time or playback mode, helping users internalize efficient viewing strategies for accelerated skill acquisition across professional domains.
Technical Papers


DescriptionSynthesizing 3D scenes from open-vocabulary text descriptions is a challenging, important, and recently-popular application. One of its critical subproblems is layout generation: given a set of objects, lay them out to produce a scene matching the input description. Nearly all recent work adopts a declarative paradigm for this problem: using an LLM to generate a specification of constraints between objects, then solving those constraints to produce the final layout. In contrast, we explore an alternative imperative paradigm, in which an LLM iteratively places objects, with each object's position and orientation computed as a function of previously-placed objects. The imperative approach allows for a simpler scene specification language while also handling a wider variety and larger complexity of scenes. We further improve the robustness of our imperative scheme by developing an error correction mechanism that iteratively improves the scene's validity while staying as close as possible to the original layout generated by the LLM. In forced-choice perceptual studies, participants preferred layouts generated by our imperative approach 82% and 94% of the time, respectively, when compared against two declarative layout generation methods. We also present a simple, automated evaluation metric for 3D scene layout generation that aligns well with human preferences.
Technical Papers


DescriptionWith the rise of digital fashion, reusing high-quality garment assets to assemble new outfits has become increasingly important for improving design efficiency and reducing production costs. However, combining multiple garments often introduces complex inter-garment intersections that are difficult to resolve. In this paper, we propose a novel framework that introduces a midsurface representation to simplify multilayered garments for intersection-free outfit assembly. Each garment is approximated by a watertight tetrahedral enclosure, enabling efficient resolution of inter-garment collisions at the midsurface level. To assemble an outfit, our method progressively untangles pairs of single-layer midsurfaces and incrementally constructs a merged midsurface. To efficiently recover the intersection-free full geometry from these deformed midsurfaces and enable instantaneous transfer across different poses, we introduce a novel algorithm that uses embedded anchors to drive inversion-free deformation of enclosing tetrahedral cages. Through various examples, we demonstrate that our method provides a scalable and automated solution for virtual outfit coordination, enabling the direct reuse of garment assets in high-fidelity, collision-free digital fashion workflows.
Poster






DescriptionWe propose a projection technique that induces finger-bending illusions using luminance-based motion cues, enhancing perceived softness on rigid objects without physical deformation or wearables in augmented reality contexts.
Technical Papers


DescriptionVideo identity customization seeks to synthesize realistic, temporally coherent videos of a specific subject, given a single reference image and a text prompt. This task presents two core challenges: (1) maintaining identity consistency while aligning with the described appearance and actions, and (2) generating natural, fluid motion without unrealistic stiffness. To address these challenges, we introduce Proteus-ID, a novel diffusion-based framework for identity-consistent and motion-coherent video customization. First, we propose a Multimodal Identity Fusion (MIF) module that unifies visual and textual cues into a joint identity representation using a Q-Former, providing coherent guidance to the diffusion model and eliminating modality imbalance. Second, we present a Time-Aware Identity Injection (TAII) mechanism that dynamically modulates identity conditioning across denoising steps, improving fine-detail reconstruction. Third, we propose Adaptive Motion Learning (AML), a motion-aware optimization strategy that reweights the training loss based on optical-flow-derived motion heatmaps, enhancing motion realism without requiring additional inputs. To support this task, we construct Proteus-Bench, a high-quality dataset comprising 200K curated clips for training and 150 individuals from diverse professions and ethnicities for evaluation. Extensive experiments demonstrate that Proteus-ID outperforms prior methods in identity preservation, text alignment, and motion quality, establishing a new benchmark for video identity customization.
Technical Papers


DescriptionIn this paper, we introduce a state-of-the-art blendshape compression algorithm that significantly reduces storage requirements and computational complexity in facial animation. Our approach leverages large sparse matrix factorization and quantization to compress high-dimensional blendshape coefficients into a compact representation, preserving essential features and high-frequency geometric details. The proposed algorithm outperforms existing methods in terms of compression ratio, reconstruction quality, and computational efficiency. We demonstrate its effectiveness through extensive experiments on various animated face models, achieving compression factors of up to 100x over sparse blendshapes with minimal impact on quality. Our technique offers compression rates up to 4.6x better than the prior state-of-the-art while also improving approximation error and preserving features like wrinkles. Additionally, our runtime computation is up to 3x faster than state-of-the-art on CPU and 70% faster than state-of-the-art on GPU, facilitating high-quality facial animation on low-powered computing platforms with limited resources.
Technical Papers


DescriptionWith the advancement of Gaussian Splatting techniques, a growing number of datasets based on this representation have been developed. However, performing accurate and efficient clipping for Gaussian Splatting remains a challenging and unresolved problem, primarily due to the volumetric nature of Gaussian primitives, which makes hard clipping incapable of precisely localizing their pixel-level contributions. In this paper, we propose a hybrid rendering framework that combines rasterization and ray tracing to achieve efficient and high-fidelity clipping of Gaussian Splatting data. At the core of our method is the RaRa strategy, which first leverages rasterization to quickly identify Gaussians intersected by the clipping plane, followed by ray tracing to compute attenuation weights based on their partial occlusion. These weights are then used to accurately estimate each Gaussian's contribution to the final image, enabling smooth and continuous clipping effects. We validate our approach on diverse datasets, including general Gaussians, hair strand Gaussians, and multi-layer Gaussians, and conduct user studies to evaluate both perceptual quality and quantitative performance. Experimental results demonstrate that our method delivers visually superior results while maintaining real-time rendering performance and preserving high fidelity in the unclipped regions.
Technical Papers
RCTrans: Transparent Object Reconstruction in Natural Scene via Refractive Correspondence Estimation
1:10pm - 1:20pm HKT Monday, 15 December 2025 Meeting Room S421, Level 4

DescriptionTransparent object reconstruction in an uncontrolled natural scene is a challenging task due to its complex appearance. Existing methods optimize the object shape with RGB color as supervision, which suffer from locality and ambiguity, and fail to recover fine details. In this paper, we present RCTrans, which uses ray-background correspondence as a much more efficient constraint to achieve high-quality reconstruction while maintaining a convenient setup. The key technology to achieve this is a novel pre-trained correspondence estimation network, which allows us to acquire correspondence under arbitrary scenes and camera views. In addition, a confidence evaluation is introduced to protect the reconstruction from inaccurately estimated correspondence. Extensive experiments on both synthetic and real data demonstrate that our method can produce much more accurate results, without any extra acquisition burden. The code and dataset will be publicly available.
XR






DescriptionThis session presents a real-time 2D character rigging and IK-based motion capture workflow designed for setups of any scale, including consumer-grade AR/VR. Featuring real-time pose reset and motion sensitivity controls, the system expands usability and stability, enabling expressive, retargetable animation for nontraditional characters through accessible and democratized motion capture tools.
Invited Poster
Poster






DescriptionWe introduce a novel method for continuous collision detection (CCD) between triangle meshes and signed distance fields (SDFs), reformulating the problem as a spatio-temporal optimization to robustly compute the first time-of-impact. An enhanced Frank–Wolfe algorithm with golden section line search (FWGSS) efficiently handles collision detection for complex motions and avoids tunneling. We further propose an adaptive triangle subdivision to support coarse meshes, and an augmented bounding sphere hierarchy for efficient collision pruning. Our method outperforms existing discrete collision detection techniques in accuracy, robustness, and performance, enabling interpenetration-free simulations suitable for real-time applications.
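The golden section line search inside the enhanced Frank–Wolfe solver (FWGSS) mentioned above is a classic 1-D minimization routine. For reference, a minimal generic implementation might look like the following; this is an illustrative sketch of the textbook algorithm, not the authors' code.

```python
import math

def golden_section_search(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] by repeatedly shrinking the
    bracket with the golden ratio, keeping the subinterval that must contain
    the minimizer. Returns the midpoint of the final bracket."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi ~= 0.618
    while abs(b - a) > tol:
        c = b - (b - a) * invphi  # interior probe closer to a
        d = a + (b - a) * invphi  # interior probe closer to b
        if f(c) < f(d):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
    return (a + b) / 2.0
```

Each iteration shrinks the bracket by a factor of about 0.618, so convergence to a given tolerance takes a predictable number of function evaluations, which is why the method pairs well with iterative solvers like Frank–Wolfe.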
Technical Papers


DescriptionRealistic fabric rendering is still a significant challenge due to the complex structures and varying fiber properties. We present a new fabric shading technique, which models both reflection and transmission using a hybrid of wave and ray optics methods, grounded in simulation data.
We target fabrics woven from yarns, each formed by twisting together one or more plies, which further contain twisted fibers. Our model is based on simulations that predict the scattering of a narrow Gaussian beam by a single ply. Comparing results from full-wave simulations and path tracing, we found that ray optics can accurately simulate the average far field scattering from an ensemble of plies, but not the variation among individual ply instances, and ray tracing overlooks important diffraction effects. Following these observations, our model is built from ray simulations performed for many ply instances, with simulation data fitted to Gaussian mixtures to be used during rendering. Wave simulations are used to calibrate noise functions that account for instance-to-instance variation, and an aperture diffraction model is used to handle light passing between plies and yarns.
The result is a hybrid model capable of producing realistic appearance and highlight structure in fabrics, while capturing spatial break-ups and irregularities and simulating the subtle color shifts and blurriness that occur in transmission. We validate our results by comparing rendered images with photographs, demonstrating the effectiveness of our approach in achieving realistic cloth rendering.
Technical Communications


DescriptionWe propose ReChar, a framework for artistic character generation that explicitly decouples structure, style, and decorative elements, enabling controllable generation guided by user-defined aesthetics.
Technical Papers


DescriptionWe present a computational framework for designing geometric metamaterials capable of approximating freeform 3D surfaces via rotationally deployable kirigami patterns.
While prior inverse design methods typically rely on a single, well-studied pattern, such as equilateral triangles or quadrilaterals, we step back to examine the broader design space of the patterns themselves. Specifically, we derive principled rules to determine whether a given periodic planar tiling can be cut into a hinged kirigami structure with rotational freedom--a mechanical property that facilitates deployment and curvature adaptation. These insights allow us to generate and validate a broad family of novel tiling patterns beyond traditional examples.
We further analyze two key deployment states of a general pattern: the commonly used maximal area expansion, and the maximal rotation angle reached just before face collisions, which we adopt as the default for inverse design as it allows for simple deployment in practice, i.e., rotating the faces to their natural limit. Finally, we solve the inverse problem: given a target 3D surface, we compute a planar tiling that, when cut and deployed to its maximal rotation angle, approximates the input geometry. For a subset of patterns, their deployed configurations are hole-free, demonstrating that curvature can be achieved from planar sheets through local combinatorial changes. Our experiments, including physical fabrications, demonstrate the effectiveness of our approach and validate a wide range of previously unexplored patterns that are both physically realizable and geometrically expressive.
Emerging Technologies






DescriptionWe propose “pain masking by contextual modification” as a method of pain reduction. By presenting a less unpleasant visual stimulus (e.g., a cat scratch) at the moment the pain stimulus is delivered, this method misleads the user about the cause of the pain and reduces its perception.
Educator's Forum



DescriptionThis paper explores the transformative role of augmented reality (AR) in reimagining museum engagement through two case studies: the Pocketbook AR App and Echoes of Newark. Both projects leverage mobile AR to enhance visitor interaction, memory retention, and post-visit reflection. The Pocketbook app enables users to scan and collect artifacts, access contextual information, and curate personalized digital galleries. Echoes of Newark gamifies the museum experience through trivia-based scavenger hunts and animated artworks, fostering curiosity and repeat engagement. Drawing on constructivist learning theory and participatory design, the paper situates these tools within broader trends in digital heritage and curatorial innovation. It also examines their technical implementation using Unity, Vuforia, and AI animation platforms. By comparing their features and educational goals, the paper highlights synergies between the two projects and discusses how they support emerging curatorial practices that blend physical and digital storytelling, expand access, and redefine the museum experience in the 21st century.
Poster






DescriptionThis modular, artist-guided diffusion workflow reimagines CGI pipelines by generating 3D product videos from images in minutes, drastically reducing production times and enabling designers to iterate faster with creative freedom.
Technical Papers


DescriptionTraditional solvers struggle with stiff materials: explicit integration needs tiny time steps, implicit integration requires many iterations, and position-based dynamics lacks versatility. Our Reliable Iterative Dynamics (RID) uses dual descent for per-iteration visual reliability and fast, stable convergence; it handles all stiffness levels and integrates with FEM, MPM, SPH, and more.
Technical Papers


DescriptionThe introduction of the neural implicit representation has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, it substantially improves mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprised of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such a mixed representation allows for detail-rich reconstruction within bounded time and memory budgets, in contrast to the overly smoothed results of purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. Finally, we adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design, facilitating efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses state-of-the-art methods based on either explicit or implicit representations in the accuracy of both mapping and tracking on large-scale scenes.
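The core query of the residual-based mixed representation (a trilinear lookup in a coarse explicit TSDF grid plus a learned fine-detail residual) can be sketched as follows; the names are hypothetical and any callable stands in for the trained neural module:

```python
import numpy as np

def trilinear(grid, p):
    """Trilinear interpolation of a dense 3-D grid at continuous point p (voxel units)."""
    i = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)
    f = p - i
    v = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wgt = ((1 - f[0]) if dx == 0 else f[0]) * \
                      ((1 - f[1]) if dy == 0 else f[1]) * \
                      ((1 - f[2]) if dz == 0 else f[2])
                v += wgt * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return v

def query_sdf(p, coarse_grid, residual_net):
    """Mixed representation: coarse explicit TSDF plus a learned residual."""
    return trilinear(coarse_grid, p) + residual_net(p)

coarse = np.ones((4, 4, 4))  # flat coarse field for demonstration
val = query_sdf(np.array([1.5, 1.5, 1.5]), coarse, lambda q: 0.01)
```

Keeping the coarse grid explicit bounds memory and lookup time, while the residual module only has to encode fine-grained corrections.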
Invited Poster
Poster






DescriptionWe introduce a novel flow field reconstruction framework based on divergence-free kernels (DFKs), which inherently enforce incompressibility while capturing fine structures without relying on hierarchical or heterogeneous representations.
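The paper's divergence-free kernels are matrix-valued; the simplest related construction, assumed here purely for illustration, takes the rotated gradient of a scalar Gaussian in 2-D, which is divergence-free by construction:

```python
import numpy as np

def psi(p, c, s=1.0):
    """Scalar Gaussian stream function centred at c."""
    return np.exp(-np.sum((p - c) ** 2) / (2 * s**2))

def u(p, c, s=1.0):
    """Rotated gradient of psi: u = (d psi/dy, -d psi/dx), analytically divergence-free."""
    g = psi(p, c, s)
    return np.array([-(p[1] - c[1]), (p[0] - c[0])]) * g / s**2

# Finite-difference check that div u vanishes at a sample point.
c = np.array([0.0, 0.0])
p = np.array([0.4, -0.3])
h = 1e-5
div = (u(p + [h, 0], c)[0] - u(p - [h, 0], c)[0]) / (2 * h) \
    + (u(p + [0, h], c)[1] - u(p - [0, h], c)[1]) / (2 * h)
```

Superposing such kernels at scattered centers yields an interpolant whose incompressibility holds exactly, rather than being enforced as a penalty.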
Technical Papers


DescriptionWe present ReSTIR Path Guiding (ReSTIR-PG), a real-time method that extracts guiding distributions from resampled paths produced by ReSTIR and uses them to generate improved initial candidates for the next frame. While ReSTIR significantly reduces variance through spatiotemporal resampling, its effectiveness is ultimately limited by the quality of the initial candidates, which are often poorly distributed and introduce correlation artifacts. Our key observation is that ReSTIR’s accepted paths already approximate the target path contribution density, and that their bounce directions follow the ideal distribution for local path guiding – the product of incident radiance and the cosine-weighted BSDF. We exploit this structure to fit lightweight guiding distributions via density estimation on each frame’s resampled paths. Compared to conventional guiding based on raw path-traced samples, ReSTIR-PG closes the loop between guiding and resampling. Our method achieves lower variance, faster response to scene changes, and reduced correlation artifacts, all while preserving real-time performance.
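As a minimal stand-in for the density-estimation step (not the authors' implementation), one can fit a piecewise-constant guiding pdf over bounce angles from weighted resampled directions and draw next-frame candidates from it by inverse-CDF sampling:

```python
import numpy as np

def fit_guiding_hist(thetas, weights, n_bins=16):
    """Fit a piecewise-constant pdf over [0, 2*pi) from weighted direction samples."""
    hist, edges = np.histogram(thetas, bins=n_bins, range=(0, 2 * np.pi),
                               weights=weights)
    return hist / hist.sum(), edges

def sample_guiding(pdf, edges, rng, n):
    """Inverse-CDF sampling: pick a bin, then sample uniformly inside it."""
    cdf = np.cumsum(pdf)
    cdf[-1] = 1.0  # guard against floating-point round-off
    bins = np.searchsorted(cdf, rng.random(n))
    lo, hi = edges[bins], edges[bins + 1]
    return lo + rng.random(n) * (hi - lo)

rng = np.random.default_rng(0)
dirs = np.clip(rng.normal(np.pi / 2, 0.2, 5000), 0, 2 * np.pi)  # concentrated lobe
pdf, edges = fit_guiding_hist(dirs, np.ones_like(dirs))
cands = sample_guiding(pdf, edges, rng, 5000)
```

The fitted candidates concentrate where the previous frame's accepted paths did, which is the closed loop between guiding and resampling the abstract describes.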
Educator's Forum



DescriptionThis paper presents the first global adaptation of Artificial Intelligence (AI) lecturers and the MetaClassroom, a virtual environment that operates with and without AI lecturers in higher education. The study involved two cross-campus postgraduate courses conducted over four consecutive semesters (N=18, 26, 20, 35) within the Division of Emerging Interdisciplinary Programs at two universities. We provide a detailed description of the courses, methods, and pedagogical impact, as well as insights into practical implementation, including challenges and best practices. Additionally, we compare traditional, online, and VR classes, both with and without AI lecturers, based on our observations and interviews with 24 students. The paper also summarizes feedback collected from students and our teaching team, outlining future directions for integrating immersive technology and digital humans into educational practices.
Art Gallery






DescriptionReWilding AI leads you into stunningly immersive, curious and paradoxical ecologies of AI through an immersive game-based experience called Weather Spores.
Through a multi-dimensional maze of five immersive portals, you will encounter the symbiotic interspecies intelligence of mycelium, birds and trees, cross expanded geographies of urban AI data systems, and witness present-tense archaeologies of the supply chains that deliver AI. You will play with carbon and blow straws and ‘sticky encounters’ via the strange and wild worlds of ‘emergence’ and ‘indeterminacy’, and explore how the very fundamentals of linear and non-linear thinking and creating triangulate with the political, the cultural and ethical to encourage different forms of individual and collective agency.
In ReWilding AI, we move away from probability and randomness towards indeterminacy, emergence and entanglement. Here we find critical, urgent new ways to query the temporalities and geo-politics of AI, rethinking immersive generative AI through art-led practice from the ground up and for the common good.
Art Gallery






DescriptionReWilding AI presents breakthrough research exploring the radical mattering of narrative ecologies and storytelling via immersive, distributed and symbiotic intelligence. It showcases the importance of art in reshaping approaches to generative AI and complex adaptive systems, with a focus on mycelium, birdsounds, 1,000 year-old trees, extractivism, poetics and portal-hopping.
Technical Papers


DescriptionWe introduce RibbonSculpt, the first method for interactive surface modeling in VR using sparsely drawn oriented ribbons. Instead of reconstructing a surface from a fully drawn VR sketch, we address the real-time creation and progressive refinement of a closed surface of any topological genus, thanks to the continuous update of a volumetric proxy. The latter corresponds to a filtered subset of the Voronoi balls defined by the user-sketched oriented ribbons. Guided by this proxy, users can easily refine their design by adding or removing ribbons, which sculpt the set of Voronoi balls until the intended 3D shape is achieved. Finally, a mesh is extracted from the proxy surface and further beautified through Laplacian-based energy minimization, yielding a smooth surface that still interpolates the user-drawn ribbons. Our results, supported by user studies, show that RibbonSculpt allows users to model the desired shapes in VR with minimal effort.
Technical Papers


DescriptionReconstructing object deformation from a single image remains a significant challenge in computer vision and graphics. Existing methods typically rely on multi-view video to recover deformation, limiting their applicability under constrained scenarios. To address this, we propose DeformSplat, a novel framework that effectively guides 3D Gaussian deformation from only a single image. Our method introduces two main technical contributions. First, we present a Gaussian-to-Pixel Matching that bridges the domain gap between 3D Gaussian representations and 2D pixel observations. This enables robust deformation guidance from sparse visual cues. Second, we propose novel Rigid Part Segmentation consisting of initialization and refinement steps. This segmentation explicitly identifies rigid regions, crucial for maintaining geometric coherence during deformation. By combining these two techniques, our approach can reconstruct consistent deformations from limited input. Extensive experiments demonstrate that our approach significantly outperforms existing methods, particularly in deformation accuracy and geometric preservation. Furthermore, our framework naturally extends to various applications, such as frame interpolation and interactive object manipulation.
Poster






DescriptionRingFlowUI uses a finger-mounted camera and deep learning to infer 3D hand shape and motion, enabling natural, fatigue-free VR/AR interaction with compact, intuitive, and immersive hand-based control.
Technical Papers


DescriptionApproximate Convex Decomposition (ACD) aims to approximate complex 3D shapes with convex components, and is widely applied to create compact collision representations for real-time applications including VR/AR, interactive games, and robotic simulations. Efficiency and optimality are crucial for ACD algorithms: they must compute approximations for large-scale, complex 3D shape assets while generating high-quality decompositions with a minimal number of components. Regrettably, existing methods either employ sub-optimal greedy strategies or rely on computationally intensive multi-step searches. In this work, we propose RL-ACD, a data-driven, reinforcement learning-based approach for efficient and near-optimal convex shape decomposition.
We formulate ACD as a Markov Decision Process (MDP), where cutting planes are iteratively applied based on the current stage's mesh fragments rather than the entire fine-grained mesh, leading to a novel, efficient geometric encoding. To train near-optimal policies for ACD, we propose a novel dual-state Bellman loss and analyze its convergence using a Q-learning algorithm.
Extensive evaluations across multiple datasets demonstrate that our RL-ACD algorithm is highly efficient and accurate for decomposition tasks. Our method is 15 times faster than multi-step tree search. Furthermore, RL-ACD reduces the number of resulting components by 16% compared to current state-of-the-art greedy algorithms, significantly narrowing the sub-optimality gap and enhancing downstream task performance.
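The MDP framing can be grounded with a generic tabular Q-learning loop on a toy chain MDP. This is a stand-in only: in the paper the states are mesh fragments, the actions are cutting planes, and training uses a dual-state Bellman loss rather than this textbook backup.

```python
import numpy as np

# Toy 5-state chain MDP: action 0 moves left, action 1 moves right,
# reward 1 on reaching the right end.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):
    s = 0
    while s != goal:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == goal else 0.0
        # textbook Bellman backup
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

greedy = Q.argmax(axis=1)  # learned policy: move right from every non-goal state
```

The learned greedy policy moves right from every non-goal state; RL-ACD's policy analogously learns which cut to apply given the current fragment encoding.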
Technical Communications


DescriptionWe present an unsupervised hybrid functional map framework combining Laplace-Beltrami and elastic bases for stable, efficient non-rigid shape matching, achieving state-of-the-art performance across multiple benchmarks.
Technical Papers


DescriptionMonte Carlo methods based on the walk on spheres (WoS) algorithm offer a parallel, progressive, and output-sensitive approach for solving partial differential equations (PDEs) in complex geometric domains. Building on this foundation, the walk on stars (WoSt) method generalizes WoS to support mixed Dirichlet, Neumann, and Robin boundary conditions. However, accurately computing spatial derivatives of PDE solutions remains a major challenge: existing methods exhibit high variance and bias near the domain boundary, especially in Neumann-dominated problems. We address this limitation with a new extension of WoSt specifically designed for derivative estimation.
Our method reformulates the boundary integral equation (BIE) for Poisson PDEs by directly leveraging the harmonicity of spatial derivatives. Combined with a tailored random-walk sampling scheme and an unbiased early termination strategy, we achieve significantly improved accuracy in derivative estimates near the Neumann boundary. We further demonstrate the effectiveness of our approach across various tasks, including recovering the non-unique solution to a pure Neumann problem with reduced bias and variance, constructing divergence-free vector fields, and optimizing parametrically defined boundaries under PDE constraints.
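For context on the base estimator the paper extends, a minimal classical walk-on-spheres solve of a Dirichlet Laplace problem on the unit disk (not the authors' WoSt derivative estimator) looks like this:

```python
import numpy as np

def wos_disk(p, g, rng, eps=1e-4, n_walks=4000):
    """Walk-on-spheres estimate of the harmonic function with boundary data g
    on the unit disk: jump to a uniform point on the largest empty sphere
    until the walk enters the eps-shell, then evaluate g on the boundary."""
    total = 0.0
    for _ in range(n_walks):
        x = p.copy()
        while True:
            r = 1.0 - np.linalg.norm(x)  # distance to the disk boundary
            if r < eps:
                break
            t = rng.uniform(0, 2 * np.pi)
            x = x + r * np.array([np.cos(t), np.sin(t)])
        total += g(x / np.linalg.norm(x))  # project onto the boundary
    return total / n_walks

# g(x, y) = x is itself harmonic, so the true solution is u(p) = p_x.
rng = np.random.default_rng(1)
est = wos_disk(np.array([0.3, 0.2]), lambda b: b[0], rng)
```

The paper's contribution starts from here: estimating spatial derivatives of such solutions (rather than values) with low bias and variance near Neumann boundaries, where the plain estimator above does not apply.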
Technical Papers


DescriptionWe consider the problem of active 3D imaging using single-shot structured light systems, which are widely employed in commercial 3D sensing devices such as Apple Face ID and Intel RealSense. Traditional structured light methods typically decode depth correspondences through pixel-domain matching algorithms, resulting in limited robustness under challenging scenarios like occlusions, fine-structured details, and non-Lambertian surfaces. Inspired by recent advances in neural feature matching, we propose a learning-based structured light decoding framework that performs robust correspondence matching within feature space rather than the fragile pixel domain. Our method extracts neural features from the projected patterns and captured infrared (IR) images, explicitly incorporating their geometric priors by building cost volumes in feature space, achieving substantial performance improvements over pixel-domain decoding approaches. To further enhance depth quality, we introduce a depth refinement module that leverages strong priors from large-scale monocular depth estimation models, improving fine detail recovery and global structural coherence. To facilitate effective learning, we develop a physically-based structured light rendering pipeline, generating nearly one million synthetic pattern-image pairs with diverse objects and materials for indoor settings. Experiments demonstrate that our method, trained exclusively on synthetic data with multiple structured light patterns, generalizes well to real-world indoor environments, effectively processes various pattern types without retraining, and consistently outperforms both commercial structured light systems and passive stereo RGB-based depth estimation methods.
Technical Communications


DescriptionRotoShop aims to significantly reduce the costs of segmentation labelling by efficiently vectorizing mask sequences produced by computer vision models into temporally consistent splines suitable for handoff to rotoscoping artists.
Technical Papers


DescriptionScratch-represented 3D visual arts can create compelling visual effects by manipulating light reflections across surfaces. Established works, such as scratch holograms, have realized impressive multi-view imagery effects. However, creating a continuous view of 3D virtual objects with shading effects, especially view-dependent shading, remains a challenge. Moreover, most reported works are demonstrated on planar surfaces, leaving the potential benefits of curved surfaces for diverse imagery scenarios an open research avenue. This work explores continuous view-dependent imagery with rich shading effects via scratch-based reflection, whose design space can be extended to arbitrary curved surfaces. This is achieved by solving ordinary differential equations under constraints derived from established bidirectional reflectance distribution function models to optimize the scratch distribution on substrate surfaces. Importantly, we create real-world examples by manufacturing optimized reflectors using off-the-shelf carving machines, delivering state-of-the-art specular view-dependent imagery that features continuous and realistic shading effects on both planar and curved surfaces.
Technical Papers


DescriptionCaustics rendering remains a long-standing challenge in Monte Carlo rendering because high-energy specular paths occupy only a small region of path space, making them difficult to sample effectively. Recent work such as Specular Manifold Sampling (SMS) [Zeltner et al. 2020] can stochastically sample these specular paths and estimate their unbiased weights using Bernoulli trials. However, applying SMS in interactive rendering is non-trivial because it is slow and delivers noisy images given a very limited time budget.
In this work, we extend SMS for high-quality caustic rendering in interactive settings using sample space partitioning. Our insight is that Newton iterations, the main performance bottleneck of SMS, can be restricted to the vicinity of the seed path, which can dramatically improve the performance. We achieve this with tile-based sample space partitioning, which bounds the manifold walk region and allows building a per-frame prior distribution that concentrates initial guesses around solutions. This reduces the cost of SMS and improves its sampling quality. Applying spatiotemporal reuse (ReSTIR) further amortizes the sample generation cost, greatly increasing the effective sample count. As a result, we achieve significant variance reduction compared to SMS in interactive rendering scenarios.
Technical Papers


DescriptionWe present a novel method for accurately calibrating the optical properties of full-color 3D printers using only a single, directly printable calibration target.
Our approach is based on accurate multiple-scattering light transport and estimates the single-scattering albedo and extinction coefficient for each resin.
These parameters are essential for both soft-proof rendering of 3D printouts and for advanced, scattering-aware 3D halftoning algorithms.
In contrast to previous methods that rely on thin, precisely fabricated resin samples and labor-intensive manual processing, our technique achieves higher accuracy with significantly less effort.
Our calibration target is specifically designed to enable algorithmic recovery of each resin's optical properties through a series of one-dimensional and two-dimensional numerical optimizations, applied first to the white and black resins and then to any remaining resins.
The method supports both RGB and spectral calibration, depending on whether a camera or spectrometer is used to capture the calibration target.
It also scales linearly with the number of resins, making it well-suited for modern multi-material printers.
We validate our approach extensively, first on synthetic and then on real resins across 242 color mixtures, printed thin translucent samples, printed surface textures, and fully textured 3D models with complex geometry, including an eye model and a figurine.
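The paper's optimization stack uses an accurate multiple-scattering model; as a deliberately simplified illustration of the one-dimensional search step only, one can recover an extinction coefficient from slab transmittances under a plain Beer-Lambert model (all names and values here are hypothetical):

```python
import math

def transmittance(sigma_t, thickness):
    """Beer-Lambert attenuation through a slab (simplified: no multiple scattering)."""
    return math.exp(-sigma_t * thickness)

def fit_sigma_t(measured, thicknesses, lo=0.0, hi=50.0, iters=100):
    """1-D ternary search minimizing squared error against the measurements."""
    def loss(s):
        return sum((transmittance(s, d) - m) ** 2
                   for d, m in zip(thicknesses, measured))
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if loss(a) < loss(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

# Synthetic target: sigma_t = 4.2 recovered from three slab thicknesses.
sigma_true = 4.2
ds = [0.1, 0.3, 0.6]
meas = [transmittance(sigma_true, d) for d in ds]
sigma_fit = fit_sigma_t(meas, ds)
```

The actual method replaces the exponential forward model with multiple-scattering light transport and cascades such 1-D/2-D searches across resins, but the fit-a-forward-model structure is the same.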
Poster






DescriptionRelive memories in VR. Scan, transform, connect.
Create 3D time capsules, embody your avatar, and share stories in a nostalgic, shared virtual world.
Past meets present. Memory becomes magic.
Art Gallery






DescriptionThis work uses generative AI and a 3D printer to recreate the past inventions of Hedy Lamarr, a Hollywood actress and inventor. It is also a speculative forensic investigation of the past, using AI to visualize the contributions of women to science and technology.
Poster






DescriptionWhat if proactive AI knew you so precisely that it inspired deeper curiosity instead of limiting it? In our study with See, Sense, Spark, a smart-glasses-based proactive AI, we compared coarse and fine-grained user profiling during creative research ideation. Surprisingly, participants with fine-grained profiles showed higher interest-driven curiosity and stronger epistemic agency. These results suggest that when AI interventions are more contextually relevant, they can enhance curiosity and reflective thinking more effectively than broad, random prompts. Profiling granularity thus emerges as a key design parameter for future adaptive systems that balance personalization, curiosity, and human–AI collaboration in complex cognitive work.
XR






DescriptionWe present BinocuSim, a novel human vision simulator that replicates natural human eye movements while capturing and displaying the wearer's XR-NED view. Achieving a mean gaze point error of just 0.79 mm, BinocuSim delivers high‑fidelity, human-like perception and enables scalable evaluation of XR-NEDs across diverse HCI scenarios.
Technical Papers


DescriptionDecomposing an image I into the combination of structure S and texture T components is an important problem in computational photography and image analysis. Traditional solutions are largely non-learning based, because it is difficult to construct datasets containing ground-truth decompositions or to find effective structure/texture supervision. In this article, we present a self-supervised framework for smoothing out textures while maintaining image structures. At the core of our method is a texture-inversion observation: if structure S and texture T are well disentangled, then S - T will produce a texture-inverted image that is symmetric to the input image I = S + T, and the two will be visually highly similar; when structure and texture are not effectively separated, the generated texture-inverted image will be less similar to the input. Based on this observation, we propose to learn texture filtering from unlabeled data by encouraging the texture-inverted image generated from the filtering output to be visually more similar to the input via contrastive learning. Experiments show that our method robustly produces high-quality texture smoothing results and enables various applications.
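The texture-inversion observation is easy to verify numerically: with I = S + T, the inverted image S - T equals 2S - I, so the input and the inverted image are symmetric about the structure layer. A small numpy check, with synthetic stand-ins for S and T:

```python
import numpy as np

# Texture-inversion observation: if I = S + T with structure S and a
# zero-mean texture T well separated, S - T mirrors the texture around S.
rng = np.random.default_rng(0)
S = np.linspace(0.2, 0.8, 64).reshape(8, 8)   # smooth "structure" ramp
T = 0.05 * rng.standard_normal((8, 8))        # zero-mean "texture"
I = S + T
J = 2 * S - I                                 # texture-inverted image S - T

# I and J are symmetric about S: their average recovers the structure exactly.
recovered = 0.5 * (I + J)
```

The framework's contrastive loss exploits exactly this symmetry: a good predicted S makes J look like I, while a poor one does not.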
Technical Papers


DescriptionExisting underwater image processing methods often struggle due to the limited availability of real paired training data. Models trained on public datasets frequently fail to generalize across diverse underwater conditions and produce suboptimal color restoration. To address these challenges, we propose a self-supervised underwater color restoration framework based on a Wavelet-Diffusion Model with Filtered Multi-Scale Feature Distillation. Specifically, we introduce a wavelet-diffusion training paradigm on terrestrial images, guided by a stochastic underwater imaging model prior. This randomized control enables the model to learn diverse underwater imaging processes, facilitating effective generalization to real-world underwater images and achieving precise color restoration. Furthermore, to tackle feature entanglement in zero-shot domain generalization and mitigate the slow sampling and partial corruption issues of diffusion models, we integrate a Mamba-based U-shaped student network for multi-scale feature distillation. Additionally, we introduce a filtering mechanism to refine the diffusion-sampled features, allowing the student model to outperform the teacher in both performance and image quality. Extensive experiments across multiple underwater datasets demonstrate that our approach effectively restores natural colors, eliminating water-induced distortions while achieving state-of-the-art performance in both qualitative and quantitative evaluations. Code and data for this paper are at https://github.com/zx826/FMFD
Technical Papers


DescriptionTraining native 3D texture generative models remains a fundamental yet challenging problem, largely due to the limited availability of large-scale, high-quality 3D texture datasets. This scarcity hinders generalization to real-world scenarios. To address this, most existing methods finetune foundation image generative models to exploit their learned visual priors. However, these approaches typically generate only multi-view images and rely on post-processing to produce UV texture maps-- an essential representation in modern graphics pipelines. Such two-stage pipelines often suffer from error accumulation and spatial inconsistencies across the 3D surface. In this paper, we introduce SeqTex, a novel end-to-end framework that leverages the visual knowledge encoded in pretrained video foundation models to directly generate complete UV texture maps. Unlike previous methods that model the distribution of UV textures in isolation, SeqTex reformulates the task as a sequence generation problem, enabling the model to learn the joint distribution of multi-view renderings and UV textures. This design effectively transfers the consistent image-space priors from video foundation models into the UV domain. To further enhance performance, we propose several architectural innovations: a decoupled multi-view and UV branch design, geometry-informed attention to guide cross-domain feature alignment, and adaptive token resolution to preserve fine texture details while maintaining computational efficiency. Together, these components allow SeqTex to fully utilize pretrained video priors and synthesize high-fidelity UV texture maps without the need for post-processing. Extensive experiments show that SeqTex achieves state-of-the-art performance on both image-conditioned and text-conditioned 3D texture generation tasks, with superior 3D consistency, texture-geometry alignment, and real-world generalization.
Technical Papers


DescriptionHuman motion capture with sparse inertial sensors has gained significant attention recently. However, existing methods almost exclusively rely on a template adult body shape to model the training data, which poses challenges when generalizing to individuals with largely different body shapes (such as a child). This is primarily due to the variation in IMU-measured acceleration caused by changes in body shape. To fill this gap, we propose Shape-aware Inertial Poser (SAIP), the first solution considering body shape differences in sparse inertial-based motion capture. Specifically, we decompose the sensor measurements related to shape and pose in order to effectively model their joint correlations. Firstly, we train a regression model to transfer the IMU-measured accelerations of a real body to match the template adult body model, compensating for the shape-related sensor measurements. Then, we can easily follow the state-of-the-art methods to estimate the full-body motions of the template-shaped body. Finally, we utilize a second regression model to map the joint velocities back to the real body, combined with a shape-aware physical optimization strategy to compute the global motion of the subject. Furthermore, since our method requires body shape awareness, we introduce the first inertial shape estimation scheme, accomplished by modeling the shape-conditioned IMU-pose correlation with an MLP-based network. To validate the effectiveness of SAIP, we also present the first IMU motion capture dataset containing individuals of different body sizes. This dataset features 10 children and 10 adults, with heights ranging from 110 cm to 190 cm, and a total of 400 minutes of paired IMU-motion samples. Extensive experimental results demonstrate that SAIP can effectively handle motion capture tasks for diverse body shapes.
Technical Papers


DescriptionRecent advances in deep generative modeling have unlocked unprecedented opportunities for video synthesis. In real-world applications, however, users often seek tools to faithfully realize their creative editing intentions with precise and consistent control. Despite the progress achieved by existing methods, ensuring fine-grained alignment with user intentions remains an open and challenging problem. In this work, we present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing. Shape-for-Motion achieves this by converting the target object in the input video to a time-consistent mesh, i.e., a 3D proxy, allowing edits to be performed directly on the proxy and then inferred back to the video frames. To simplify the editing process, we design a novel Dual-Propagation Strategy that allows users to perform edits on the 3D mesh of a single frame, and the edits are then automatically propagated to the 3D meshes of the other frames. The 3D meshes for different frames are further projected onto the 2D space to produce the edited geometry and texture renderings, which serve as inputs to a decoupled video diffusion model for generating edited results. Our framework supports various precise, physically consistent manipulations across the video frames, including pose editing, rotation, scaling, translation, texture modification, and object composition. Our approach marks a key step toward high-quality, controllable video editing workflows. Extensive experiments demonstrate the superiority and effectiveness of our approach. Code and data will be made publicly available.
Technical Papers


DescriptionInspired by generative paradigms in image and video, 3D shape generation has made notable progress, enabling the rapid synthesis of high-fidelity 3D assets from a single image. However, current methods still face challenges, including the lack of intricate details, overly smoothed surfaces, and fragmented thin-shell structures. These limitations leave the generated 3D assets still one step short of meeting the standards favored by artists. In this paper, we present ShapeGen, which achieves high-quality image-to-3D shape generation through 3D representation and supervision improvements, resolution scaling up, and the advantages of linear transformers. These advancements allow the generated assets to be seamlessly integrated into 3D pipelines, facilitating their widespread adoption across various applications. Specifically, in contrast to existing methods: 1) We investigate how different representations and VAE supervision strategies affect the generation process, and address issues like aliasing artifacts and fragmented thin-shell structures by using a TSDF-based representation supervised with BCE loss. 2) We scale up the resolution of 3D data, image conditioning inputs, and the number of latent tokens to enhance generation fidelity. 3) We adopt mixed conditioning using raw RGB images and normal maps during training, effectively resolving ambiguities caused by inconsistencies between ControlNet-generated RGB images and the underlying geometry from untextured assets. 4) We replace the original softmax attention with linear attention to improve training and inference efficiency when handling a large number of latent tokens. 5) We introduce an inference-time scaling strategy that enhances generation quality at test time. Through extensive experiments, we validate the impact of these improvements on overall performance.
Ultimately, thanks to the synergistic effects of these enhancements, ShapeGen achieves a significant leap in image-to-3D generation, establishing a new state-of-the-art performance.
Technical Communications


DescriptionShapeMeshing transforms 2D animation into clean 3D geometry. It offers modular, artist-friendly tools that bring traditional animation principles into CG, enabling new methods to merge the 2D and 3D styles.
Technical Papers


DescriptionThe intricate geometric complexity of knots, tangles, dreads, and clumps requires sophisticated grooming systems that allow artists to both realistically model and artistically control fur and hair systems. Recent volumetric and 3D neural style transfer techniques have provided a new paradigm of art directability, allowing artists to modify assets drastically with the use of single style images. However, these previous 3D neural stylization approaches were limited to volumes and meshes. In this paper we propose the first stylization pipeline to support hair and fur. Through a carefully tailored fur/hair representation, our approach allows complex, 3D-consistent, and temporally coherent grooms that are stylized using style images.
Birds of a Feather






DescriptionVulkan is the explicit open-standard API that is powering the next wave of cross-platform high-performance graphics and compute innovation. This BOF will bring the Vulkan developer community together to network, exchange ideas, solve problems, and help steer the future development of the API and ecosystem. During the BOF, developers will learn from leading Vulkan experts and gain real-world insights into the latest Vulkan developments, including improved profiling tools, broader GPU adoption, and the latest cross-vendor extensions. Join us to help shape the future of 3D graphics on mobile, desktop, and the cloud.
Technical Communications


DescriptionIn a 51-participant study, user-customized virtual humans were rated more similar and representative than measurement-based models, indicating measurements miss identity aspects; user-driven customization improves faithful digital self-representation.
Technical Communications


DescriptionSHARE reconstructs 3D human motion and scene from an RGB video. It achieves improved spatial accuracy by optimizing the human meshes against the estimated human point maps.
Emerging Technologies






DescriptionWe present ShelfScape, an interactive shelf signage system using magnetic attachments and Hall sensors to enable tangible interactions such as push, pull, rotate, and place. This system enhances user engagement by linking physical actions with digital content. A user study suggests strong memorability and potential for in-store applications.
Poster






DescriptionThe XR-based film Shen Shu Yu Jin Wu draws inspiration from the ancient Chinese myth of the three-legged golden crow in The Classic of Mountains and Seas: Eastern Sea Chapter.
Technical Papers


Description3D scene reconstruction from a single measurement is challenging, especially in the presence of occluded regions and specular materials, such as mirrors. We address these challenges by leveraging single-photon lidars. These lidars estimate depth from light that is emitted into the scene and reflected directly back to the sensor. However, they can also measure light that bounces multiple times in the scene before reaching the sensor. This multi-bounce light contains additional information that can be used to recover dense depth, occluded geometry, and material properties. Prior work with single-photon lidar, however, has only demonstrated these use cases when a laser sequentially illuminates one scene point at a time. We instead focus on the more practical - and challenging - scenario of illuminating multiple scene points simultaneously. The complexity of light transport due to the combined effects of multiplexed illumination, two-bounce light, shadows, and specular reflections is challenging to invert analytically. Instead, we propose a data-driven method to invert light transport in single-photon lidar. To enable this approach, we create the first large-scale simulated dataset of ~100k lidar transients for indoor scenes. We use this dataset to learn a prior on complex light transport, enabling measured two-bounce light to be decomposed into the constituent contributions from each laser spot. Finally, we experimentally demonstrate how this decomposed light can be used to infer 3D geometry in scenes with occlusions and mirrors from a single measurement.
Technical Papers


DescriptionThis paper introduces a method for simplifying textured surface triangle meshes in the wild while maintaining high visual quality. While previous methods achieve excellent results on manifold meshes by using the quadric error metric, they struggle to produce high-quality outputs for meshes in the wild, which often contain non-manifold elements and multiple connected components. We begin by outlining the pitfalls of existing mesh simplification techniques and highlighting the discrepancy between their formulations and real-world mesh data. We then propose a method for simplifying these "wild" textured triangle meshes. We formulate mesh simplification as a problem of decimating simplicial 2-complexes to handle multiple non-manifold mesh components as a whole. Building on the success of quadric error simplification, we iteratively collapse 1-simplices (vertex pairs) with our modified quadric error, which converges to the original quadric error metric for closed manifold meshes while greatly improving results on wild meshes. For textures, instead of following existing strategies to preserve UVs, we take a novel perspective that focuses on computing mesh correspondences throughout the decimation, regardless of the UV layout. This combination leads to a textured mesh simplification system that can operate on arbitrary triangle meshes, producing high-quality results on wild inputs without sacrificing the excellent performance on clean inputs. Our method is guaranteed to avoid common issues in textured mesh simplification, such as texture bleeding. We extensively evaluate our method on multiple mesh datasets, showing improvements over prior techniques through qualitative and quantitative evaluations, along with user studies.
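For context on the quadric error metric this method builds on: each triangle contributes a 4x4 plane quadric, and summed quadrics score candidate positions for a vertex-pair collapse. A minimal sketch of the classical Garland-Heckbert cost (not the paper's modified, wild-mesh-aware variant):

```python
import numpy as np

def face_quadric(p0, p1, p2):
    """4x4 quadric K = q q^T for the plane q = [a, b, c, d] through a triangle."""
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm == 0.0:          # degenerate face contributes nothing
        return np.zeros((4, 4))
    n = n / norm
    plane = np.append(n, -np.dot(n, p0))   # ax + by + cz + d = 0
    return np.outer(plane, plane)

def collapse_cost(Q, v):
    """Quadric error v^T Q v: summed squared distances to the faces' planes."""
    h = np.append(v, 1.0)
    return float(h @ Q @ h)

# Two coplanar faces sharing an edge; their summed quadric scores candidates.
p = [np.array([0., 0., 0.]), np.array([1., 0., 0.]),
     np.array([0., 1., 0.]), np.array([1., 1., 0.])]
Q = face_quadric(p[0], p[1], p[2]) + face_quadric(p[1], p[3], p[2])
print(collapse_cost(Q, np.array([0.5, 0.5, 0.0])))  # 0.0: on the shared plane
print(collapse_cost(Q, np.array([0.5, 0.5, 1.0])))  # 2.0: unit offset, two planes
```

In a full simplifier, collapses are ordered by this cost in a priority queue and the merged vertex inherits the sum of the endpoint quadrics.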
Poster






DescriptionData-driven, physics-informed learning for transient temperature-field simulation in the hot-aisle region of a data center.
Poster






DescriptionGenerative agents simulate loneliness, matching human data on standardized psychological scales, presenting a computational approach to model subjective and dynamic psychological experiences.
Technical Papers


DescriptionRendering novel, relit views of a human head, given a monocular portrait image as input, is an inherently underconstrained problem. The traditional graphics solution is to explicitly decompose the input image into geometry, material, and lighting via differentiable rendering; but this is constrained by the multiple assumptions and approximations of the underlying models and parameterizations of these scene components. We propose 3DPR, an image-based relighting model that leverages generative priors learnt from multi-view One-Light-at-A-Time (OLAT) images captured in a light stage. We introduce a new diverse and large-scale multi-view 4K OLAT dataset, FaceOLAT, consisting of 139 subjects, to learn a high-quality prior over the distribution of high-frequency face reflectance. We leverage the latent space of a pre-trained generative head model that provides a rich prior over face geometry learnt from in-the-wild image datasets. The input portrait is first embedded in the latent manifold of such a model through an encoder-based inversion process. Then a novel triplane-based reflectance network trained on our light-stage data is used to synthesize high-fidelity OLAT images to enable image-based relighting. Our reflectance network operates in the latent space of the generative head model, crucially enabling a relatively small number of light-stage images to train the reflectance model. Combining the generated OLATs according to a given HDRI environment map yields physically accurate environmental relighting results. Through quantitative and qualitative evaluations, we demonstrate that 3DPR outperforms previous methods, particularly in preserving identity and in capturing lighting effects such as specularities, self-shadows, and subsurface scattering.
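Because light transport is linear in illumination, the final step above, combining generated OLATs under an HDRI environment map, is a weighted sum of the per-light images. A minimal sketch with toy shapes (`relight` is a hypothetical helper, not the paper's API):

```python
import numpy as np

def relight(olats, env_weights):
    """Relit image = sum_i w_i * OLAT_i; light transport is linear in lighting.

    olats:       (L, H, W, 3) stack of one-light-at-a-time images
    env_weights: (L, 3) RGB intensity of the environment map at each light
    """
    return np.einsum('lhwc,lc->hwc', olats, env_weights)

# Toy example: two lights on a 2x2 image.
olats = np.zeros((2, 2, 2, 3))
olats[0, ..., 0] = 1.0   # light 0 contributes red
olats[1, ..., 2] = 1.0   # light 1 contributes blue
env = np.array([[0.5, 0.5, 0.5],    # dim white at light 0
                [1.0, 1.0, 1.0]])   # full white at light 1
img = relight(olats, env)
print(img[0, 0])  # red dimmed to 0.5, blue at full 1.0
```

In practice the environment map is integrated over each light's solid angle to produce the per-light RGB weights.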
Technical Communications


DescriptionMeshFormer accurately reconstructs high-fidelity 3D human models from a single back image, enabling radiation-free scoliosis assessment with sub-millimeter precision.
Technical Papers


DescriptionWe present a novel single-shot method for capturing high-quality facial appearance that enables per-pixel estimation of diffuse albedo, specular albedo, specular roughness, and photometric (specular) normals using only linearly polarized RGB illumination and consumer cameras. Our approach leverages color-multiplexed sinusoidal lighting encoded across RGB channels, allowing phase-based decomposition of reflectance parameters per view, without time-multiplexing or iterative refinement. Unlike prior works that jointly optimize appearance and geometry, we decouple the estimation process -- separately recovering spatially-varying specular albedo and roughness for the first time in a single-shot capture setup. Additionally, we estimate high-frequency specular photometric normals independently using view-dependent specular phase cues, obtaining accurate surface mesostructure. We demonstrate our system using a practical monitor-based capture setup with 15 polarized DSLRs, producing detailed reflectance maps suitable for photorealistic rendering. Our approach achieves higher accuracy in reflectance separation and more accurate estimation of fine-scale surface details compared to previous single-shot methods.
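The phase-based decomposition above rests on the standard N-step phase-shifting identity: measurements I_k = A + B*cos(phi + 2*pi*k/N) determine the offset A, amplitude B, and phase phi in closed form. A generic sketch of that recovery (the paper multiplexes the sinusoids across RGB channels in a single shot; `decode_phase_shift` is an illustrative helper, not its pipeline):

```python
import numpy as np

def decode_phase_shift(images):
    """Closed-form recovery of (A, B, phi) from N >= 3 measurements
    I_k = A + B*cos(phi + 2*pi*k/N), k = 0..N-1."""
    N = len(images)
    s = sum(I * np.sin(2 * np.pi * k / N) for k, I in enumerate(images))
    c = sum(I * np.cos(2 * np.pi * k / N) for k, I in enumerate(images))
    A = sum(images) / N                      # DC term
    B = (2.0 / N) * np.sqrt(s * s + c * c)   # sinusoid amplitude
    phi = np.arctan2(-s, c)                  # s = -(N/2) B sin(phi), c = (N/2) B cos(phi)
    return A, B, phi

# Round-trip a known signal through four phase-shifted measurements.
A0, B0, phi0 = 0.3, 0.5, 1.0
imgs = [A0 + B0 * np.cos(phi0 + 2 * np.pi * k / 4) for k in range(4)]
A, B, phi = decode_phase_shift(imgs)
print(round(float(A), 6), round(float(B), 6), round(float(phi), 6))  # 0.3 0.5 1.0
```

In a reflectance-capture setting, A behaves like a diffuse-plus-ambient offset while B and phi carry the view-dependent sinusoidal response that the paper exploits per channel.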
Invited Poster
Poster






DescriptionSinGS creates your own high-quality, efficient, and animatable avatar from just a single image!
Poster






DescriptionSketch-in-Scene is a novel sketch- and environment-based authoring system that generates 3D assets that are coherent with the surroundings.
Technical Papers


Description3D human pose estimation from sketches has broad applications in computer animation and film production. Unlike traditional human pose estimation, this task presents unique challenges due to the abstract and disproportionate nature of sketches. Previous sketch-to-pose methods, constrained by the lack of large-scale sketch-3D pose annotations, primarily relied on optimization with heuristic rules—an approach that is both time-consuming and limited in generalizability. To address these challenges, we propose a novel approach leveraging a "learn from synthesis" strategy. Firstly, a diffusion model is learned to synthesize sketch images from 2D poses projected from 3D human poses, mimicking disproportionate human structures in sketches. This process enables the creation of a synthetic dataset, SKEP-120K, consisting of 120k accurate sketch-3D pose annotation pairs across various sketch styles. Building on this synthetic dataset, we introduce an end-to-end data-driven framework for estimating human poses and shapes from diverse sketch styles. Our framework combines existing 2D pose detectors and generative diffusion priors for sketch feature extraction with a feed-forward neural network for efficient 2D pose estimation. Multiple heuristic loss functions have been incorporated to guarantee geometric coherence between the derived 3D poses and the detected 2D poses while preserving accurate self-contacts. Qualitative, quantitative, and subjective evaluations collectively affirm that our proposed model substantially surpasses previous ones in both estimation accuracy and speed for sketch-to-pose tasks. The code and data will be released upon publication.
Technical Communications


DescriptionWe present a method to extract simulatable polylines of slender objects from user sketches in Gaussian Splatting scenes using efficient screen-space shortest-path analysis.
Technical Communications


DescriptionWe propose a training platform that synthesizes real-time ultrasound from CT and fuses multimodal imaging to improve spatial understanding, accelerate cross-modality learning, and advance musculoskeletal minimally invasive surgery skills.
Technical Papers


DescriptionAnimation retargeting involves applying a sparse motion description (e.g., 2D/3D keypoint sequences) to a given character mesh to produce a semantically plausible and temporally coherent full-body motion. This brings the characters to life. Given its practical relevance, it remains a highly desired tool in any digital character workflow. An ideal data-driven solution to this problem should be able to work without templates, without access to corrective keyframes, and still generalize to novel characters and unseen motions. Existing approaches come with a mix of restrictions -- they require annotated training data, assume access to template-based shape priors or artist-designed deformation rigs, suffer from limited generalization to unseen motion and/or shapes, or exhibit motion jitter. We propose Self-supervised Motion Fields (SMF) as a self-supervised framework that can be robustly trained with sparse motion representations, without requiring dataset-specific annotations, templates, or rigs. At the heart of our method are Kinetic Codes, a novel autoencoder-based sparse motion encoding that exposes a semantically rich latent space simplifying large-scale training. Our architecture comprises dedicated spatial and temporal gradient predictors, which are trained end-to-end. The resultant network, regularized by the Kinetic Codes' latent space, has good generalization across shapes and motions. We evaluated our method on unseen motion sampled from AMASS, D4D, Mixamo, and raw monocular video for animation transfer on various characters with varying shapes and topology. We report a new SoTA on the AMASS dataset in the context of generalization to unseen motion. (Source code will be released.)
Technical Papers


DescriptionWe introduce a novel class of polyhedral tori (PQ-toroids) that snap between
two stable configurations – a flat state and a deployed one separated by an
energy barrier. Being able to create PQ-toroids from any set of given planar
bottom and side faces opens the possibility to assemble the bistable blocks
into a thick freeform curved shell structure to follow a planar quadrilateral
(PQ) net with coplanar adjacent offset directions.
A design pipeline is developed and presented for inversely computing
PQ-toroid modules using conjugate net decompositions of a given surface.
We analyze the snapping behavior and energy barriers through simulation
and build physical prototypes to validate the feasibility of the proposed
system.
This work expands the geometric design space of multistable origami for
lightweight modular structures and offers practical applications in
architectural and deployable systems.
Technical Papers


DescriptionWe present Social Agent, a novel framework for synthesizing realistic and contextually appropriate co-speech nonverbal behaviors in dyadic conversations. In this framework, we develop an agentic system driven by a Large Language Model (LLM) to direct the conversation flow and determine appropriate interactive behaviors for both participants. Additionally, we propose a novel dual-person gesture generation model based on an auto-regressive diffusion model, which synthesizes coordinated motions from speech signals. The output of the agentic system is translated into high-level guidance for the gesture generator, resulting in realistic movement at both the behavioral and motion levels. Furthermore, the agentic system periodically examines the movements of interlocutors and infers their intentions, forming a continuous feedback loop that enables dynamic and responsive interactions between the two participants. User studies and quantitative evaluations show that our model significantly improves the quality of dyadic interactions, producing natural, synchronized nonverbal behaviors. We will release the code and prompts for academic research.
Technical Papers


DescriptionRecent advances in 3D Gaussian representations have significantly improved the quality and efficiency of image-based scene reconstruction.
Their explicit nature facilitates real-time rendering and fast optimization, yet extracting accurate surfaces - particularly in large-scale, unbounded environments - remains a difficult task.
Many existing methods rely on approximate depth estimates and global sorting heuristics, which can introduce artifacts and limit the fidelity of the reconstructed mesh.
In this paper, we present Sorted Opacity Fields (SOF), a method designed to recover detailed surfaces from 3D Gaussians with both speed and precision.
Our approach improves upon prior work by introducing hierarchical resorting and a robust formulation of Gaussian depth, which better aligns with the level set.
To enhance mesh quality, we incorporate a level-set regularizer operating on the opacity field and introduce losses that encourage geometrically consistent primitive shapes.
In addition, we develop a parallelized Marching Tetrahedra algorithm tailored to our opacity formulation, reducing meshing time by up to an order of magnitude.
As demonstrated by our quantitative evaluation, SOF achieves higher reconstruction accuracy while cutting total processing time by more than a factor of three.
These results mark a step forward in turning efficient Gaussian-based rendering into equally efficient geometry extraction.
Poster






DescriptionWe present SofiBuddy, a soft mobile interface with a lightweight transition module and a curvature-based inflatable actuator, offering gentle conforming motion with interactive potential for companionship, assistance, and VR engagement.
Technical Papers


DescriptionArtist-created meshes in the wild often do not have a well-defined interior. We observe that they typically consist of a mix of solid elements, faces that bound a volume, and shell elements that represent the medial surface of a thin shell. The lack of a well-defined interior prevents downstream applications, such as solid modeling, simulation, and manufacturing. We present a method that takes as input a surface mesh and assigns to each face a label determining whether it belongs to a solid or shell. These labels reduce ambiguity by defining the interior for solid faces through thresholding the generalized winding number field, and for shell faces as the volume within an offset. We cast the labeling problem as an optimization that outputs a solid/shell label for each face, guided by a sparse set of user inputs. Once labeling is complete, we show how the shape can be volume meshed by passing the shell faces through an offset mesher and the solid faces to an off-the-shelf tetrahedral mesher, producing a final volumetric mesh by taking their union. Experiments on diverse meshes with defects and multiple solid and shell components demonstrate that our approach delivers the desired labels, enabling modeling and simulation on wild meshes in a way that respects the user intent.
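The generalized winding number used above to define the interior of solid faces can be evaluated at a query point by summing the signed solid angles of the mesh triangles (van Oosterom and Strackee's formula). A minimal, unaccelerated sketch for illustration:

```python
import numpy as np

def winding_number(query, verts, faces):
    """Generalized winding number at `query`: sum of signed solid angles
    subtended by each triangle (van Oosterom & Strackee), divided by 4*pi."""
    total = 0.0
    for f in faces:
        a, b, c = (verts[i] - query for i in f)
        la, lb, lc = (np.linalg.norm(v) for v in (a, b, c))
        num = np.dot(a, np.cross(b, c))
        den = (la * lb * lc + np.dot(a, b) * lc
               + np.dot(b, c) * la + np.dot(c, a) * lb)
        total += 2.0 * np.arctan2(num, den)
    return total / (4.0 * np.pi)

# Unit tetrahedron with outward-oriented faces: ~1 inside, ~0 outside.
V = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
F = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(round(winding_number(np.array([0.1, 0.1, 0.1]), V, F)))  # 1
print(round(winding_number(np.array([2.0, 2.0, 2.0]), V, F)))  # 0
```

On defective meshes the field is fractional rather than binary, which is exactly why thresholding it (as the paper does for solid faces) is a meaningful interior definition.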
Computer Animation Festival






DescriptionThe story of a boy and his dad who live in a small house on the edge of the steppe. The boy is confined to a wheelchair and cannot even lift a spoon of porridge. The dad is close to despair, but everything changes when one day the TV shows the landing of the newest robot on Mars… This tale is dedicated to all parents who don't give up.
Art Papers



DescriptionSonic Shower is an interactive sound installation. The artwork relates to Velasco's research on distinguishing water temperature by its sound and the use of sound waves to cleanse the body as depicted in the science-fiction series Star Trek. In addition, Sonic Shower is a speculative design that imagines a dystopian future where individuals, confronted with severe water scarcity, employ technology to replicate the shower experience for psychological solace. The installation creates a playful experience by converting a classic wall-mounted standing multi-shower system into loudspeakers and MIDI controllers that generate sound using a Max patcher.
Technical Papers


DescriptionCloud computing has seen rapid growth in recent years, accompanied by the increasing popularity of game streaming services that allow users to play high-end games on low-end devices, across platforms, and from virtually anywhere.
The rise of multiplayer games, shared immersive experiences, and metaverse-style applications—such as exhibitions or social virtual spaces—presents unique opportunities for improving rendering efficiency.
In particular, the presence of multiple viewers within the same virtual environment opens the door for computation reuse across rendering instances.
We propose a scalable, multi-GPU cloud rendering system tailored for multi-viewer scenarios.
Built on top of on-surface caches, our system extends the core idea of decoupling shading from viewpoints to enable efficient reuse of shading information across multiple users.
Our system is designed to scale with an increasing number of viewers by dynamically distributing rendering workloads across multiple GPUs.
We further enhance scalability and significantly reduce inter-GPU bandwidth requirements, by 6x up to 65x, through a novel sparse cache update strategy.
Instead of copying full frames between GPUs, our method selectively propagates only relevant cache updates, enabling efficient data sharing while minimizing redundant transfers.
Technical Papers


DescriptionReflectance acquisition from sparse images has been a long-standing problem in computer graphics. Previous works have addressed this by introducing either material-related priors or illumination multiplexing with a general sampling strategy.
However, fixed lighting patterns in multiplexing can lead to redundant sampling and entangled observations, making it necessary to adaptively capture salient reflectance responses in each shot based on material behavior. In this paper, we propose combining adaptive sampling with illumination multiplexing for SVBRDF reconstruction from sparse images lit by a planar light source. Central to our method is the modeling of a sampling importance distribution on lighting surface, guided by the statistical nature of microfacet theory. Based on this sampling structure, our framework jointly trains networks to learn an adaptive sampling strategy in the lighting domain, and furthermore, approximately separates pure specular-related information from observations to reduce ambiguities in reconstruction. We validate our approach through experiments and comparisons with previous works on both synthetic and real materials.
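The importance distribution guided by microfacet theory echoes standard NDF importance sampling. As a generic illustration only (not the paper's learned planar-light sampler), mapping uniform random numbers to a half-vector distributed according to the GGX normal distribution with roughness alpha:

```python
import math

def sample_ggx_half_vector(alpha, u1, u2):
    """Map uniform (u1, u2) in [0,1)^2 to a GGX-distributed half-vector,
    expressed in the local frame with +z along the surface normal."""
    # Inverse-CDF of the GGX D(h) over the projected hemisphere.
    theta = math.atan(alpha * math.sqrt(u1 / max(1.0 - u1, 1e-12)))
    phi = 2.0 * math.pi * u2
    st, ct = math.sin(theta), math.cos(theta)
    return (st * math.cos(phi), st * math.sin(phi), ct)

# Rougher surfaces spread samples further from the normal (smaller z).
h_smooth = sample_ggx_half_vector(0.1, 0.5, 0.25)
h_rough = sample_ggx_half_vector(0.5, 0.5, 0.25)
print(h_smooth[2] > h_rough[2])  # True
```

Concentrating samples where the specular lobe is strong is the same intuition behind placing illumination importance on the lighting surface, as the abstract describes.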
Poster






DescriptionWe propose a skiing system for people with visual impairments (PVI) that uses spatial audio to indicate the positions of obstacles in dynamic environments.
Poster






DescriptionThis study proposes a novel method for aligning 3DGS data by leveraging its unique characteristics. Our approach uses a sampling and cleaning process to efficiently extract key points, enabling high-accuracy and fast registration without GPU. Experiments show our method outperforms existing techniques in speed and accuracy, even on low-end hardware.
Technical Papers


DescriptionNeural fields excel at representing continuous visual signals but typically operate at a single, fixed resolution. We present a simple yet powerful method to optimize neural fields that can be prefiltered in a single forward pass. Key innovations and features include:
(1) We perform convolutional filtering in the input domain by analytically scaling
Fourier feature embeddings with the filter’s frequency response.
(2) This closed-form modulation generalizes beyond Gaussian filtering and
supports other parametric filters (Box and Lanczos) that are unseen at training time.
(3) We train the neural field using single-sample Monte Carlo estimates of the
filtered signal. Our method is fast during both training and inference, and imposes no additional constraints on the network architecture. We show quantitative and qualitative improvements over existing methods for neural-field filtering.
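As a sketch of point (1) above: convolving a signal with a filter multiplies each sinusoid of a Fourier feature embedding by the filter's frequency response at that sinusoid's frequency, so prefiltering reduces to a per-feature scale. The snippet below is illustrative only (the helper names are ours, and the paper's exact formulation may differ), shown for a Gaussian blur:

```python
import numpy as np

def fourier_features(x, freqs):
    # Standard Fourier feature embedding: [sin(2*pi*f*x), cos(2*pi*f*x)] per frequency.
    ang = 2.0 * np.pi * np.outer(x, freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def gaussian_response(freqs, sigma):
    # Frequency response of a Gaussian blur with standard deviation sigma.
    return np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)

def filtered_features(x, freqs, sigma):
    # Analytically prefiltered embedding: each sinusoid is attenuated by the
    # filter's response at its own frequency before entering the network.
    w = gaussian_response(freqs, sigma)
    return np.tile(w, 2) * fourier_features(x, freqs)
```

Swapping `gaussian_response` for the closed-form response of a Box or Lanczos kernel is what makes the modulation filter-agnostic at inference time.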
Technical Papers


DescriptionSpectral information plays a crucial role in many domains, including remote sensing, cultural heritage analysis, food inspection, and material appearance modeling. Spectral measurements, such as hyperspectral imaging, provide a powerful means of acquiring this information but often require expensive equipment and time-consuming capture procedures.
We propose a new method for recovering spectral information from multispectral images using differentiable rendering, which naturally incorporates 3D geometry and light transport. However, the inverse problem is ill-posed: conventional pipelines produce a single spectrum that may differ significantly from the ground truth. To address this ambiguity, we introduce a spectral upsampling framework based on null-space sampling, which generates multiple candidate spectra consistent with the input multi-band image. This enables uncertainty quantification across wavelengths and informs the design of additional measurements to improve reconstruction. We also demonstrate how to incorporate interreflections into the algorithm to enhance reconstruction accuracy.
We validate our method on synthetic scenes using real-world spectral data and RGB renderings, and demonstrate its effectiveness in physical experiments. Our approach not only avoids the cost and complexity of hyperspectral imaging, but also significantly accelerates the reconstruction process compared to brute-force methods that treat each wavelength independently. Moreover, it supports spectral material authoring by generating diverse, physically plausible spectra from a single RGB input, enabling greater flexibility and artistic control in spectral rendering.
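The null-space sampling idea can be illustrated with plain linear algebra: under a linear sensor model, any spectrum of the form s0 + Nz, with N a basis of the sensor matrix's null space, reproduces the measured band values exactly. A minimal sketch (assuming a full-row-rank response matrix; names are illustrative, not the paper's code):

```python
import numpy as np

def spectral_candidates(A, y, n_samples=5, scale=0.1, rng=None):
    # A: (n_bands, n_wavelengths) sensor response matrix; y: measured band values.
    rng = np.random.default_rng(rng)
    s0 = np.linalg.pinv(A) @ y          # minimum-norm spectrum that fits y
    _, _, Vt = np.linalg.svd(A)
    null = Vt[A.shape[0]:]              # rows spanning the null space of A
    # Perturb s0 along null directions: every candidate still maps to y exactly.
    z = rng.normal(scale=scale, size=(n_samples, null.shape[0]))
    return s0 + z @ null
```

The spread of the candidates across wavelengths is exactly the per-wavelength uncertainty the abstract refers to, and wavelengths with large spread indicate where an additional measurement would be most informative.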
Technical Papers


DescriptionRecently, 3D Gaussian Splatting (3DGS) has achieved impressive results in novel view synthesis, demonstrating high fidelity and efficiency. However, it easily exhibits needle-like artifacts, especially when increasing the sampling rate. Mip-Splatting tries to remove these artifacts with a 3D smoothing filter for frequency constraints and a 2D Mip filter for approximated supersampling. Unfortunately, it tends to produce over-blurred results, and sometimes needle-like Gaussians still persist. Our spectral analysis of the covariance matrix during optimization and densification reveals that current 3DGS lacks shape awareness, relying instead on spectral radius and view positional gradients to determine splitting. As a result, needle-like Gaussians with small positional gradients and low spectral entropy fail to split and overfit high-frequency details. Furthermore, both the filters used in 3DGS and Mip-Splatting reduce
the spectral entropy and increase the condition number when zooming in to synthesize novel views, causing view inconsistencies and more pronounced artifacts. Our Spectral-GS, based on spectral analysis, introduces 3D shape-aware splitting and 2D view-consistent filtering strategies, effectively addressing these issues, enhancing 3DGS's capability to represent high-frequency details without noticeable artifacts, and achieving high-quality realistic rendering.
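The shape statistics behind such a splitting criterion can be computed directly from the covariance eigenvalues: normalizing them yields a spectral entropy (low for needle-like Gaussians, maximal for isotropic ones), and their ratio gives the condition number. An illustrative computation (our sketch, not the authors' code):

```python
import numpy as np

def spectral_stats(cov):
    # Eigenvalues of the 3x3 covariance are the squared axis lengths of the Gaussian.
    lam = np.linalg.eigvalsh(cov)
    p = lam / lam.sum()
    entropy = -np.sum(p * np.log(p))   # low for needle-like shapes
    cond = lam.max() / lam.min()       # large for anisotropic shapes
    return entropy, cond
```

A splitting rule keyed to low entropy (rather than to spectral radius or positional gradients alone) would flag exactly the needle-like primitives the abstract describes.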
Technical Papers


DescriptionExisting single-view 3D generative models typically adopt multiview diffusion priors to reconstruct object surfaces, yet they remain prone to inter-view inconsistencies and are unable to faithfully represent complex internal structure or nontrivial topologies. To address this, our SPGen framework encodes geometry information by projecting it onto a bounding sphere and unwrapping it into a compact and structural multi-layer 2D Spherical Projection (SP) representation. Operating solely in the image domain, SPGen offers three key advantages simultaneously: (1) Consistency. The injective SP mapping encodes surface geometry from a single viewpoint, which naturally eliminates view inconsistency and ambiguity; (2) Flexibility. Multi-layer SP maps represent nested internal structures and support direct lifting to watertight or open 3D surfaces; (3) Efficiency. The image-domain formulation directly inherits powerful 2D diffusion priors and enables efficient finetuning with limited computational resources. Extensive experiments demonstrate that SPGen significantly outperforms existing baselines in geometric quality and computational efficiency.
Technical Communications


DescriptionWe introduce a production-ready framework for controllable, non-destructive facial emotion editing using Plutchik’s emotion wheel and VLM annotation, enabling fine-grained emotion control while preserving performance integrity with an animator-centric interface.
Technical Communications


DescriptionSplineSplat replaces Gaussian kernels with compact B-spline bases for radiance field rendering. It enhances spatial localization, preserves fine details, and accelerates training, with experimental results comparable to Gaussian splatting.
Technical Papers


DescriptionThis paper addresses the problem of decomposed 4D scene reconstruction from multi-view videos.
Recent methods achieve this by lifting video segmentation results to a 4D representation through differentiable rendering techniques.
Therefore, they heavily rely on the quality of video segmentation maps, which are often unstable, leading to unreliable reconstruction results.
To overcome this challenge, our key idea is to represent the decomposed 4D scene with the Freetime FeatureGS and design a streaming feature learning strategy to accurately recover it from per-image segmentation maps, eliminating the need for video segmentation.
Freetime FeatureGS models the dynamic scene as a set of Gaussian primitives with learnable features and linear motion ability, allowing them to move to neighboring regions over time.
We apply a contrastive loss to Freetime FeatureGS, forcing primitive features to be close or far apart based on whether their projections belong to the same instance in the 2D segmentation map.
As our Gaussian primitives can move across time, this naturally extends the feature learning to the temporal dimension, achieving 4D segmentation.
Furthermore, we sample observations for training in a temporally ordered manner, enabling the streaming propagation of features over time and effectively avoiding local minima during the optimization process.
Experimental results on several datasets show that the reconstruction quality of our method outperforms recent methods by a large margin.
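The contrastive objective described above can be sketched as a standard pairwise pull/push loss over primitive features: pairs whose projections share a 2D instance label are pulled together, and all other pairs are pushed beyond a margin. This is an illustrative stand-in, not the paper's exact loss:

```python
import numpy as np

def contrastive_loss(features, instance_ids, margin=1.0):
    # Pairwise Euclidean distances between primitive features (N, D).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = instance_ids[:, None] == instance_ids[None, :]
    # Same instance: penalize separation. Different: penalize being inside the margin.
    pull = np.where(same, dist ** 2, 0.0)
    push = np.where(~same, np.maximum(0.0, margin - dist) ** 2, 0.0)
    n = len(features)
    return (pull + push).sum() / (n * n)
```

Because the primitives move over time, applying this per-frame loss to the same feature vectors implicitly links instance identity across frames, which is the mechanism the abstract exploits for 4D segmentation.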
Technical Papers


DescriptionGenerating realistic and robust motion for virtual characters under complex physical conditions, such as irregular terrain, real-time control scenarios, and external disturbances, remains a key challenge in computer graphics.
While deep reinforcement learning has enabled high-fidelity physics-based character animation, such methods often suffer from limited generalizability, as learned controllers tend to overfit to the environments they were trained in. In contrast, simplified models, such as single rigid bodies, offer better adaptability, but traditionally require hand-crafted heuristics and can only handle short motion segments. In this paper, we present a general learning framework that trains a single-rigid-body (SRB) character controller from long and unstructured datasets, without relying on hand-crafted rules. Our method enables zero-shot adaptation to diverse environments and unseen motion styles. The resulting controller generates expressive and physically plausible motions in real time and seamlessly integrates with high-level kinematic motion planners without retraining, enabling a wide range of downstream tasks.
Technical Papers


DescriptionWe present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. Unlike prior approaches that construct 4D representations by optimizing over 3D or video generative models, we train a generator directly on 4D data, achieving high fidelity, temporal coherence, and structural consistency. At the core of our method is a compressed set of structured spacetime latents. Specifically, (1) To address the scarcity of 4D training data, we build on a pre-trained single-image-to-3D model, preserving strong spatial consistency. (2) Temporal consistency is enforced by introducing dedicated temporal layers that reason across frames.
(3) To support efficient training and inference over long video sequences, we compress the latent sequence along the temporal axis using factorized 4D convolutions and temporal downsampling blocks. In addition, we employ a carefully designed training strategy to enhance robustness against occlusion and motion blur, leading to high-quality generation. Extensive experiments show that SS4D produces spatio-temporally consistent 4D objects with superior quality and efficiency, significantly outperforming state-of-the-art methods on both synthetic and real-world datasets.
Technical Papers


DescriptionMotion capture (mocap) data often exhibits visually jarring artifacts due to inaccurate sensors and post-processing. Cleaning this corrupted data can require substantial manual effort from human experts, which can be a costly and time-consuming process. Previous data-driven motion cleanup methods offer the promise of automating this cleanup process, but often require in-domain paired corrupted-to-clean training data. Constructing such paired datasets requires access to high-quality, relatively artifact-free motion clips, which often necessitates laborious manual cleanup. In this work, we present StableMotion, a simple yet effective method for training motion cleanup models directly from unpaired corrupted datasets that need cleanup. The core component of our method is the introduction of motion quality indicators, which can be easily annotated, through manual labeling or heuristic algorithms, and enable training of quality-aware motion generation models on raw motion data with mixed quality. At test time, the model can be prompted to generate high-quality motions using the quality indicators. Our method can be implemented through a simple diffusion-based framework, leading to a unified motion generate-discriminate model, which can be used to both identify and fix corrupted frames. We demonstrate that our proposed method is effective for training motion cleanup models on raw mocap data in production scenarios by applying StableMotion to SoccerMocap, a 245-hour soccer mocap dataset containing real-world motion artifacts. The trained model effectively corrects a wide range of motion artifacts, reducing motion pops and frozen frames by 68% and 81%, respectively. On our benchmark dataset, we further show that cleanup models trained with our method on unpaired corrupted data outperform state-of-the-art methods trained on clean or paired data, while also achieving comparable performance in preserving the content of the original motion clips.
Technical Papers


Description3D cellular metamaterials are valued for many unique and useful mechanical properties. They enable lightweight, high-strength structures, with a wide range of directional stiffness profiles and possible auxetic behaviour. Infill patterns based on triply-periodic minimal surfaces (TPMS) are commonly used in additive manufacturing due to their high strength-to-weight ratio and near-isotropic mechanical behaviour. While existing work provides a wide range of cellular metamaterials to choose from, optimization of these patterns remains a significant challenge due to the diverse space of possible surface topologies and the lack of a unified parameterization. To this end, Voronoi diagrams with star-shaped distance metrics have been shown to provide a continuous parameterization of 2D cellular metamaterials, opening a rich space of possible designs. Extending the work of Zhou et al. [2025], we provide a novel, differentiable construction of 3D volumetric Voronoi diagrams with star-shaped metrics. We integrate this into a complete pipeline for mechanical metamaterial optimization, demonstrating the flexibility of star-shaped metric Voronoi diagrams by achieving periodic structures with a range of target directional stiffness profiles and stress-strain curves. Furthermore, we demonstrate the applicability of this framework to heterogeneous, smoothly graded cellular structures.
Birds of a Feather






DescriptionLearn more about how to join or start an ACM SIGGRAPH professional or student chapter, how to make the most of your chapter experience, and how to network with current chapter members.
More information about Chapters is available at https://siggraph.org/chapters.
Technical Papers


DescriptionDenoising is an important post-processing step in physically based Monte Carlo (MC) rendering. While neural networks are widely used in practice, statistical analysis has recently become a viable alternative for denoising. In this paper, we present a general framework for statistics-based error reduction of both estimated radiance and variance. Specifically, we introduce a novel denoising approach for variance estimates, which can either improve variance-aware adaptive sampling or provide additional input for image denoising in a cascaded manner. Furthermore, we present multi-transform denoising: a general and efficient correction scheme for non-normal distributions, which typically occur in MC rendering. All these contributions combine to a robust denoising pipeline that does not require any pretraining and can run efficiently on current GPU hardware. Our results show distinct advantages over previous denoising methods, especially in the range of a few hundred samples per pixel, which is of high practical relevance. Finally, we demonstrate good convergence behavior as the number of samples increases, providing predictable results with low bias that are free of hallucinated neural artifacts. In summary, our statistics-based algorithms for adaptive sampling and denoising deliver fast, consistent, low-bias variance and radiance estimates.
Technical Communications


DescriptionWe propose fast and robust Steiner Traversal Initialization (STI) that significantly accelerates geometric flow-based surface-filling on manifold meshes. STI uses Poisson disk sampling and Steiner-tree traversal to generate initial curves.
Technical Papers


DescriptionIn recent years, the community has seen the emergence of neural-based super-resolution and frame generation techniques. These methods have effectively sped up high-resolution rendering by exploiting the spatial and temporal coherence between sequential frames, but none of them are designed specifically for improving the rendering performance in VR applications, where stereo rendering doubles the rendering cost.
To explicitly exploit the binocular coherence between left and right views in VR, we design a centering scheme to align the features from both eyes, so the network can efficiently handle binocular information with superior performance. With this centered feature as the core, we use a novel cyclic network to propagate the centered feature to the next frame to improve temporal stability. Finally, we propose a new multi-frequency composition scheme to robustly blend pixels of various frequencies, generating high-quality images. Our network architecture can effectively utilize both the temporal and cross-view coherence of the stereo rendered results. We thus propose a novel neural frame generation pipeline in which only one view needs to be shaded at low resolution for one frame, and we alternately shade the left and right eyes; the proposed network can generate high-quality results for both views at quadruple resolution while delivering superior temporal stability.
The experiments demonstrate the effectiveness of our method across a variety of scenarios, including complex lighting variations, intricate aggregate geometries, and multi-object motion.
Technical Papers


DescriptionEstimating lighting in indoor scenes is particularly challenging due to the diverse distribution of light sources and the complexity of scene geometry. Previous methods mainly focused on spatial variability and consistency for a single image or temporal consistency for video sequences. However, these approaches fail to achieve spatio-temporal consistency in video lighting estimation, which restricts applications such as compositing animated models into videos. In this paper, we propose STGLight, a lightweight and effective method for spatio-temporally consistent video lighting estimation, where our network processes a stream of LDR RGBD video frames while maintaining incrementally updated global representations of both geometry and lighting, enabling the prediction of HDR environment maps at arbitrary locations for each frame. We model indoor lighting with three components: visible light sources providing direct illumination, ambient lighting approximating indirect illumination, and local environmental textures producing high-quality specular reflections on glossy objects. To capture spatially varying lighting, we represent scene geometry with point clouds, which support efficient spatio-temporal fusion and allow us to handle moderately dynamic scenes. To ensure temporal consistency, we apply a transformer-based fusion block that propagates lighting features across frames. Building on this, we further handle dynamic lighting with moving objects or changing light conditions by applying intrinsic decomposition on the point cloud and integrating the decomposed components with a neural fusion module. Experiments show that our online method can effectively predict lighting for any position within the video stream, while maintaining spatial variability and spatio-temporal consistency. Code is available at: https://github.com/nauyihsnehs/STGlight
Technical Papers


DescriptionWe present a method for automatically converting strand-based hair models into an efficient mesh-based representation, known as hair cards, for real-time rendering. Specifically, our method uses strands as inputs and outputs polygon strips with semitransparent texture, preserving the appearance of the original strand-based hair model. To achieve this, we first cluster strands into groups that are referred to as wisps and generate a detail-preserving texture for each wisp by aligning the strands into a normalized pose in the UV space via skinning-based deformation. We further preserve high-resolution details via texture compression, where fewer, but higher resolution textures are shared among similar wisps. Then, the textured polygon strip geometry is fitted to the original hair model via tailored differentiable rendering that can handle transparent cluster-colored coverage masks. The proposed method successfully handles a wide range of hair models and especially outperforms existing approaches in representing volumetric hairstyles such as curly and wavy ones. In addition, our card optimization can be easily parallelized and can efficiently convert a full-hair model with more than 100 thousand strands. Our method was extensively tested on both a hair database and many complex real-world hairstyles acquired using state-of-the-art hair capture methods.
Emerging Technologies






DescriptionOur data-driven system for Beat Saber uses co-embodiment with player recordings for personalized motor skill training. Selecting a teacher close in skill but with a dissimilar motion motif (recurrent movement units) improves learner agency and physical synchronization, demonstrating a new method for effective XR training using community data.
Technical Papers


DescriptionCreating 3D assets that follow the texture and geometry style of existing ones is often desirable or even inevitable in practical applications like video gaming and virtual reality. While impressive progress has been made in generating 3D objects from text or images, creating style-controllable 3D assets remains a complex and challenging problem. In this work, we propose StyleSculptor, a novel training-free approach for generating style-guided 3D assets from a content image and one or more style images. Unlike previous works, StyleSculptor achieves style-guided 3D generation in a zero-shot manner, enabling fine-grained 3D style control that captures the texture, geometry, or both styles of user-provided style images. At the core of StyleSculptor is a novel Style Disentangled Attention (SD-Attn) module, which establishes a dynamic interaction between the input content image and style image for style-guided 3D asset generation via a cross-3D attention mechanism, enabling stable feature fusion and effective style-guided generation. To alleviate semantic content leakage, we also introduce a style-disentangled feature selection strategy within the SD-Attn module, which leverages the variance of 3D feature patches to disentangle style- and content-significant channels, allowing selective feature injection within the attention framework. With SD-Attn, the network can dynamically compute texture-, geometry-, or both-guided features to steer the 3D generation process. Built upon this, we further propose the Style Guided Control (SGC) mechanism, which enables exclusive geometry- or texture-only stylization, as well as adjustable style intensity control. StyleSculptor does not require prior training and enables instant adaptation to any reference models while maintaining strict user-specified style consistency. Extensive experiments demonstrate that StyleSculptor outperforms existing baseline methods in producing high-fidelity 3D assets.
Art Papers



DescriptionAdvances in generative art offer distinctive creative possibilities while also raising questions about control, power, and agency. This paper examines the parallels between Dadaist photomontage — techniques that subverted readymade images through decontextualisation and aesthetic exploration — and recent generative AI art practices. While the Dadaists harnessed appropriation to disrupt dominant narratives and express dissent, generative AI’s algorithmic opacity complicates artists’ agency and muddles contextual meanings. By analysing Dadaist methodologies, this paper encourages artists to engage with generative AI in a deliberate and considered manner, reframing its outputs as tools for disruption and creative intervention.
Technical Papers


DescriptionWe show how the problem of creating a triangulation in d-dimensional space that conforms to constraints given as sub-simplices can be turned into the problem of computing the lower hull of a sum of wedge functions. This sum can be interpreted as a Weighted Delaunay Triangulation, necessarily containing the constraints as unions of its elements. Intersections of wedges lead to Steiner points. As the number of such intersections is polynomial in the number of wedges, and the number of wedges per element is typically 1 (at most d), this proves that the complexity of the output is polynomial. Moreover, we show that the majority of wedge intersections are unnecessary for a conforming triangulation and further heuristically reduce the number of Steiner points. Using appropriate data structures, the function can be evaluated in quasi-linear time, leading to an output-sensitive algorithm.
Technical Papers


DescriptionWhen an image is seen on an optical see-through augmented reality (AR) display, the light from the display is mixed with the background light from the environment. This can severely limit the available contrast in AR, which is often orders of magnitude below that of traditional displays. Yet, the presented images appear sharper and show more details than the reduction in physical contrast would indicate. In this work, we hypothesize two effects that are likely responsible for the enhanced perceived contrast in AR: background discounting, which allows observers focused on the display plane to partially discount the light from the environment; and supra-threshold contrast perception, which explains the differences in contrast perception across luminance levels. In a series of controlled experiments on an AR high-dynamic-range multi-focal haploscope testbed, we found no statistical evidence supporting the effect of background discounting on contrast perception. Instead, the increase of visibility in AR is better explained with models of supra-threshold contrast perception. Our findings generalize to arbitrary image inputs, and the resulting model can inform the design of better algorithms and hardware for display systems affected by additive light, such as AR.
Technical Papers


DescriptionMany 3D tasks such as pose alignment, animation, motion transfer, and 3D reconstruction rely on establishing correspondences between 3D shapes. This challenge has recently been approached by pairwise matching of semantic features from pre-trained vision models. However, despite their power, these features struggle to differentiate instances of the same semantic class, such as “left hand” versus “right hand,” which leads to substantial mapping errors. To solve this, we learn a surface-aware embedding space that is robust to these ambiguities while facilitating shared mapping for an entire family of 3D shapes. Importantly, our approach is self-supervised and requires only a small number of unpaired training meshes to infer features for new, possibly imperfect 3D shapes at test time. We achieve this by introducing a contrastive loss that preserves the semantic content of the features distilled from foundational models while disambiguating features located far apart on the shape's surface. We observe superior performance in correspondence matching benchmarks and enable downstream applications including 2D-to-3D and 3D-to-3D texture transfer, part segmentation, pose alignment, and motion transfer in low-data regimes. Unlike previous pairwise approaches, our solution constructs a joint embedding space, where both seen and unseen 3D shapes are implicitly aligned without further optimization.
Computer Animation Festival






DescriptionThe main challenge of Susurros was to create a hybrid 2D/3D graphic style, blending full matte painting/camera mapping sequences with 3D environments. In compositing, we explored ways to flatten the 3D render to echo a more illustrative look. A key focus was to separate past and present through color palettes and patterns, while maintaining unity in textures and character design. We also developed two distinct approaches to staging between the two time periods, ensuring both narrative clarity and visual coherence to match the weight of the subject.
On Susurros, we used Maya, Unreal Engine, Nuke, Substance, ZBrush, Houdini, and Photoshop.
Poster






DescriptionSWEAR is a bio-interactive jewelry system combining PPG sensing and sweat simulation to evoke embodied empathy. Through multimodal, stress-responsive interventions, it enables users to experience hyperhidrosis symptoms, bridging affective computing and wearable design for impactful applications in health communication and public awareness.
Technical Papers


DescriptionThe development of intelligent robots seeks to seamlessly integrate them into the human world, providing assistance and companionship in daily life and work, with the ultimate goal of achieving human-robot symbiosis. This requires robots with intelligent interaction abilities to work naturally and effectively with humans. However, current robotic simulators fail to support real human participation, limiting their ability to provide authentic interaction experiences and gather valuable human feedback essential for enhancing robotic capabilities. In this paper, we introduce SymBridge, the first human-in-the-loop cyber-physical interactive system designed to enable the safe and efficient development, evaluation, and optimization of human-robot interaction methods. Specifically, we employ augmented reality technology to enable real humans to interact with virtual robots in physical environments, creating an authentic interactive experience. Building on this, we propose a novel robotic interaction model that generates responsive, precise robot actions in real time through continuous human behavior observation. The model incorporates multi-resolution human motion features and environmental affordances, ensuring contextually adaptive robotic responses. Additionally, SymBridge enables continuous robot learning by collecting human feedback and dynamically adapting the robotic interaction model. By leveraging a carefully designed system architecture and modules, SymBridge builds a bridge between humans and robots, as well as between cyber and physical spaces, providing a natural and realistic online interaction experience while facilitating the continuous evolution of robotic intelligence. Extensive experiments, user studies, and real robot testing demonstrate the system’s promising performance and highlight its potential to significantly advance research on human-robot symbiosis.
Emerging Technologies






DescriptionWe propose "SyncLimbs," a semi-autonomous control system for supernumerary limbs. In our system, an AI-controlled hand synchronizes the progress of its task with the user's own hand. This enables users to perceive causal relationships between their natural movements and the supernumerary actions, enhancing the sense of agency.
DLI Labs
Exhibitor Talk






DescriptionPlease bring your laptop and mouse to participate in this hands-on training. Seats are limited and available on a first-come, first-served basis.
In this introductory, hands-on course, learners will explore NVIDIA Cosmos™, a platform of generative world foundation models (WFM), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline built to accelerate the development of physical AI.
Technical Papers


DescriptionLow-discrepancy sequences have seen widespread adoption in computer graphics thanks to the superior rates of convergence that they provide.
Because rendering integrals are often composed of products of lower-dimensional integrals, recent work has focused on developing sequences that are also well-distributed in lower-dimensional projections. To this end, we introduce a novel construction of binary-based $(0, 4)$-sequences; that is, progressive fully multi-stratified sequences of 4D points, and extend the idea to higher power-of-two dimensions. We further show that not only is it possible to nest lower-dimensional sequences in higher-dimensional ones---for example, embedding a $(0, 2)$-sequence within our $(0, 4)$-sequence---but that we can ensemble two $(0, 2)$-sequences into a $(0, 4)$-sequence, four $(0, 4)$-sequences into a $(0, 16)$-sequence, and so on. Such sequences can provide excellent rates of convergence when integrals include lower-dimensional integration problems in 2, 4, 16,$\ldots$ dimensions. Our construction is based on using 2$\times$2 block matrices as symbols to construct larger matrices that potentially generate a sequence with the target $(0, s)$-sequence in base $s$ property. We describe how to search for suitable alphabets and identify two distinct, cross-related alphabets of block symbols, which we call $s$ and $z$, hence \emph{SZ} for the resulting family of sequences.
Given the alphabets, we construct candidate generator matrices and search for valid sets of matrices. We then infer a simple recurrence formula to construct full-resolution (64-bit) matrices.
Because our generator matrices are binary, they allow highly efficient implementation using bitwise operations and can be used as a drop-in replacement for Sobol matrices in existing applications.
We compare SZ sequences to state-of-the-art low discrepancy sequences, and demonstrate mean relative squared error improvements up to $1.93\times$ in common rendering applications.
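The SZ generator matrices themselves are the paper's contribution, but the bitwise evaluation that makes any binary generator matrix a drop-in Sobol replacement can be sketched generically. The identity matrix used below is purely illustrative (it reproduces the base-2 van der Corput sequence); an SZ or Sobol sampler would substitute its own full-resolution matrices.

```python
# Generic sketch: evaluate one coordinate of a binary digital sequence
# from a generator matrix using only bitwise operations.
# The identity matrix here is illustrative only; it yields the base-2
# van der Corput sequence, not the paper's SZ sequences.

NBITS = 32

# Column j of the generator matrix over GF(2), stored as an integer
# whose set bits mark that column's rows (identity matrix here).
IDENTITY_COLUMNS = [1 << (NBITS - 1 - j) for j in range(NBITS)]

def digital_net_coordinate(index: int, columns) -> float:
    """Map a sample index to [0, 1) via a GF(2) matrix-vector product."""
    digits = 0
    j = 0
    while index:
        if index & 1:             # bit j of the index selects column j,
            digits ^= columns[j]  # accumulated by XOR (addition in GF(2))
        index >>= 1
        j += 1
    return digits / float(1 << NBITS)
```

With the identity matrix, indices 1, 2, 3 map to 0.5, 0.25, 0.75, i.e., radical inversion in base 2; the entire inner loop is shifts and XORs, which is why binary generator matrices are so cheap in practice.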
Poster






DescriptionWe developed a robotic system that collects non-destructive, motion-labeled tactile data, overcoming prior methods that damaged garments and enabling accurate machine learning classification.
Poster






DescriptionWe present work-in-progress on the development of a novel framework employing the World-in-Miniature paradigm, Tangible Interaction and Space-time bounds for efficient and intuitive immersive authoring of dynamic Mixed Reality scenes.
Technical Papers


Description3D Gaussian Splatting (3DGS) renders pixels by rasterizing Gaussian primitives, where conditional alpha-blending dominates the time cost in the rendering pipeline. This paper proposes TC-GS, an algorithm-independent universal module that expands Tensor Core (TCU) applicability for 3DGS, leading to substantial speedups and seamless integration into existing 3DGS optimization frameworks. The key innovation lies in mapping alpha computation to matrix multiplication, fully utilizing otherwise idle TCUs in existing 3DGS implementations. TC-GS provides plug-and-play acceleration for existing top-tier acceleration algorithms tightly coupled with rendering pipeline designs, like Gaussian compression and redundancy elimination algorithms. Additionally, we introduce a global-to-local coordinate transformation to mitigate rounding errors from quadratic terms of pixel coordinates caused by Tensor Core half-precision computation. Extensive experiments demonstrate that our method maintains rendering quality while providing an additional 2.18× speedup over existing Gaussian acceleration algorithms, thus reaching up to a total 5.6× acceleration.
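To make the central trick concrete, the following is a hypothetical NumPy sketch (full precision, no Tensor Cores, and not the paper's implementation) of how the per-pixel Gaussian exponent used in alpha-blending can be rewritten as a single matrix multiplication: expanding the quadratic form also exposes the quadratic pixel-coordinate terms whose half-precision rounding motivates the paper's coordinate transformation.

```python
import numpy as np

def gaussian_powers_matmul(pixels, means, conics):
    """Exponent of every (pixel, Gaussian) pair via one matrix product.

    pixels: (P, 2) pixel coordinates
    means:  (G, 2) projected Gaussian centers
    conics: (G, 3) inverse-covariance entries (a, b, c), with
            Sigma^{-1} = [[a, b], [b, c]]
    Returns a (P, G) array of powers; alpha = opacity * exp(power).
    """
    x, y = pixels[:, 0], pixels[:, 1]
    # Per-pixel monomial features: [x^2, xy, y^2, x, y, 1]
    F = np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)

    a, b, c = conics[:, 0], conics[:, 1], conics[:, 2]
    mx, my = means[:, 0], means[:, 1]
    # Matching per-Gaussian coefficients, obtained by expanding
    # -0.5 * (p - mu)^T Sigma^{-1} (p - mu) in pixel coordinates.
    G = np.stack([
        -0.5 * a,
        -b,
        -0.5 * c,
        a * mx + b * my,
        b * mx + c * my,
        -0.5 * (a * mx * mx + 2 * b * mx * my + c * my * my),
    ], axis=1)
    # One (P, 6) x (6, G) product replaces P*G scalar quadratic forms.
    return F @ G.T
```

Because the whole computation collapses into a dense (P, 6) by (6, G) product, it maps naturally onto matrix-multiply hardware; the quadratic features x*x and y*y are the terms that overflow half-precision range for large pixel coordinates, which is what a global-to-local coordinate shift mitigates.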
Educator's Forum



DescriptionThis one-hour workshop presents a reproducible approach to teaching generative AI through cloud-first deployment and visual programming. Drawing from the School of Visual Arts’ “Innovating with Generative AI” course, we demonstrate how ComfyUI, a node-based interface for diffusion models, can be deployed at classroom scale using Docker containers and platforms such as Runpod. Participants will learn to teach text-to-image, image-to-3D, and multi-modal workflows without requiring local GPU hardware. Validated over three semesters, this framework lowers technical barriers and enables educators to integrate professional-grade AI tools into diverse curricula.
Technical Papers


DescriptionLarge pretrained diffusion models can provide strong priors beneficial for many graphics applications. However, generative applications such as neural rendering and inverse methods such as SVBRDF estimation and intrinsic image decomposition require additional input or output channels. Current solutions for channel expansion are often application specific, and these solutions can be difficult to adapt to different diffusion models or new tasks. This paper introduces Teamwork: a flexible and efficient unified solution for jointly increasing the number of input and output channels as well as adapting a pretrained diffusion model to new tasks. Teamwork achieves channel expansion without altering the pretrained diffusion model architecture by coordinating and adapting multiple instances of the base diffusion model (i.e., teammates). We employ a novel variation of Low-Rank Adaptation (LoRA) to jointly address both adaptation and coordination between the different teammates. Furthermore, Teamwork supports dynamic (de)activation of teammates. We demonstrate the flexibility and efficiency of Teamwork on a variety of generative and inverse graphics tasks such as inpainting, single image SVBRDF estimation, intrinsic decomposition, neural shading, and intrinsic image synthesis.
Technical Papers


DescriptionThe procedural occupancy function is a flexible and compact representation for creating 3D scenes. For rasterization and other tasks, it is often necessary to extract a mesh that represents the shape. Unbounded scenes with long-range camera trajectories, such as flying through a forest, pose a unique challenge for mesh extraction. A single static mesh representing all the geometric detail necessary for the full camera path can be prohibitively large. Therefore, independent meshes can be extracted for different camera views, but this approach may lead to popping artifacts during transitions. We propose a temporally coherent method for extracting meshes suitable for long-range camera trajectories in unbounded scenes represented by an occupancy function. The key idea is to perform 4D mesh extraction using a new spacetime tree structure called a binary-octree. Experiments show that, compared to existing baseline methods, our method offers superior visual consistency at a comparable cost.
Poster






DescriptionTentacles Fever is an installation that visualizes sound and music through the movement of tentacle-like actuators made with shape-memory alloy (SMA). Each actuator bends smoothly in multiple directions, synchronized with musical rhythms, producing an organic and performative expression—as if living creatures are dancing to music.
Invited Poster
Poster






DescriptionProfessional 3D asset creation often requires diverse sculpting brushes to add surface details and geometric structures. Despite recent progress in 3D generation, producing reusable sculpting brushes compatible with artists’ workflows remains a challenging problem. These sculpting brushes are typically represented as vector displacement maps (VDMs), which existing models cannot easily generate compared to natural images. This paper presents Text2VDM, a novel framework for text-to-VDM brush generation through the deformation of a dense planar mesh guided by score distillation sampling (SDS). We introduce weighted blending of prompt tokens to SDS, resulting in a more accurate target distribution and semantic guidance.
Poster






DescriptionTextFX introduces a structured pipeline for stylized text reconstruction, reducing design effort while achieving strong performance. User feedback validates its effectiveness, with future work targeting learnable parameters and prompt-driven creative typography.
Technical Communications


DescriptionWe present TFD68, a fully annotated, pose-invariant thermal facial dataset with 68 landmarks, temperature maps, visual pairs, occlusions, and expressions, enabling robust landmark detection and expression recognition research.
Educator's Forum



DescriptionHuman pose prediction is a key technology for virtual environmental choreography in dance education. However, traditional deterministic prediction methods are unable to capture the diverse distribution of human poses, which limits their practical applicability. Thus, we propose a Temporal Hybrid Attention Diffusion Transformer (THADT) model for 3D-to-3D prediction, which consists of forward diffusion and reverse generation processes. During forward diffusion, the discrete cosine transform converts human poses into frequency-domain features while gradually adding noise and training a denoising network to learn the noise distribution. In reverse generation, the model progressively removes noise by integrating historical pose data as conditional input, ultimately reconstructing future pose sequences through inverse transformation. Experimental results on the Human3.6M and HumanEva-I datasets demonstrate that THADT outperforms existing state-of-the-art methods across key metrics such as ADE, FDE, and MMADE.
Workshop



DescriptionThe 3D Gaussian Splatting Workshop aims to collectively explore cutting-edge advancements in 3DGS, expand perspectives, and foster further development of the technology. It consists of two parts: 1) the “3D Gaussian Splatting Challenge,” which aims to develop efficient solutions for high-fidelity reconstruction with ultra-low latency, and 2) invited keynote talks on current developments in 3D Gaussian Splatting.
For further information, please refer to the following website:
Website: https://gaplab.cuhk.edu.cn/projects/gsRaceSIGA2025/index.html
Agenda:
09:00 - 09:10: Welcome Remarks & Challenge Results
09:10 - 09:50: Winner talk
10:00 - 10:20: Invited talk: Support and Implementations of 3DGS on HarmonyOS by Yongqiang Gao, HUAWEI
10:20 - 10:40: Invited talk: Relightable 3DGS: accurate geometry and point-based ray tracing by Yao Yao, Nanjing University
10:40 - 11:00: Invited talk: Challenges of the graphics area (ArkGraphics) in HarmonyOS by Ran Huang, HUAWEI
Technical Papers


DescriptionWe introduce the Aging Multiverse, a framework for generating multiple plausible facial aging trajectories from a single image, each conditioned on external factors such as environment, health, and lifestyle. Unlike prior methods that model aging as a single deterministic path, our approach creates an aging tree that visualizes diverse futures.
To enable this, we propose a training-free diffusion-based method that balances identity preservation, age accuracy, and condition control. Our key contributions include attention mixing to modulate editing strength and a Simulated Aging Regularization strategy to stabilize edits. Extensive experiments and user studies demonstrate state-of-the-art performance across identity preservation, aging realism, and conditional alignment, outperforming existing editing and age-progression models, which often fail to account for one or more of the editing criteria. By transforming aging into a multi-dimensional, controllable, and interpretable process, our approach opens up new creative and practical avenues in digital storytelling, health education, and personalized visualization.
Birds of a Feather






DescriptionWhat does it take to visualize an immersive experience or theme park in its early stages? In this panel, I’ll share my journey breaking into Walt Disney Imagineering and Universal Studios Hollywood as a concept artist. I’ll break down the process of preparing for the industry and offer insights on pitching ideas and working in the blue sky phase. This panel is for students and professionals just entering the industry, with a focus on the arts and early-stage concept design.
Featured Session



DescriptionCompare approaches to working across cultures and engaging diverse audiences through storytelling—finding ways to connect authentically, resonate universally, and build shared understanding.
Workshop



DescriptionAsiagraphics (AG) is a professional organization of the Asian computer graphics and interactive technology research community, with membership open to people interested in computer graphics from all over the world. It is a non-profit organization registered in Hong Kong. Its full name is the Asian Association for Computer Graphics and Interactive Technology. The main aim of the Asiagraphics Workshop on Intelligent Graphics is to promote collaboration and the exchange of knowledge among young researchers in the Asian graphics community. We plan to feature approximately 9 invited speakers sharing their latest research.
For further information, please refer to the following website:
Website: https://www.asiagraphics.org/asiagraphics-workshop-on-intelligent-graphics-at-siggraph-asia
Workshop



DescriptionAs AI enhances precise relic restoration, XR builds time-transcending immersive scenes, and digital reconstruction enables permanent heritage preservation, tech-humanities synergy opens new avenues for cultural inheritance. In an era of digital-driven cultural transformation, this workshop will gather global explorers to exchange views on the application of digital technology in cultural heritage. It showcases technology’s potential in safeguarding civilisation and revitalising traditions, serving as a cross-disciplinary dialogue bridge for heritage digitalisation.
For further information, please refer to the following website:
Website: https://www.polyu.edu.hk/nvidiarc/news-and-events/event/2025/12/siggraph-asia-2025-nvidia-workshop/?sc_lang=en
Art Papers



Description“La Noche Cíclica” is a multi-part project that investigates the algorithmic suppression of night in Google Street View. Through a video sculpture, web archive, print album, and conceptual intervention, the work reveals how fragments of nighttime—caused by system delays and stitching errors—disrupt the platform’s default spatio-temporal logic of continuous daylight. These fleeting anomalies challenge the linear, functional temporality of machine vision and question the infrastructure behind digital mapping systems. The project explores how data-driven media reconstruct the visible world, and how moments of error can become sites of resistance, visibility, and alternative spatial memory.
Art Papers



DescriptionThis paper presents The Dream of Zhuang Zhou, a multispecies virtual reality (VR) experience inspired by Zhuang Zi's renowned “Butterfly Dream” parable, which questions the nature of reality and identity. Leveraging 3D Gaussian Splatting technologies and ancient oriental aesthetics, the project reconstructs an immersive environment from the perspectives of four species: human, fish, butterfly, and bird. By offering seamless perceptual transitions, it exposes how reality is composed through entangled human-machine agencies and sensory biases. By fusing ancient philosophy with cutting-edge reconstruction, the project demonstrates VR’s potential as both technological canvas and experiential experiment for reimagining multispecies coexistence.
Poster






DescriptionThis study explores how object shape impressions affect weight perception in VR, aiming to clarify the relationship and underlying factors between shape and perceived weight.
Birds of a Feather






DescriptionWith the rapid rise of generative AI and large-scale world models, game computer graphics (CG) is undergoing a major transformation. This Birds of a Feather session will focus on how CG technologies in games evolve in the AI era, with particular emphasis on Non-Player Characters (NPCs), including personality-aligned generation of text, voice, expressions, and motions, multi-view consistent stylized rendering, and self-evolution. We will also discuss how world models can support more immersive virtual games. Distinguished researchers will share their insights, aiming to foster collaboration across research and development communities.
Project url: https://shandaai.github.io/siggraph-asia-bof/
Birds of a Feather






DescriptionThis session explores how Magma is transforming production pipelines at the highest level through real-time visual collaboration—much like Figma did for product design, but purpose-built for artists. By offering artists and technical teams a shared creative environment to draw, review, and solve problems together, Magma enables smoother communication, faster iteration, and greater clarity across departments. The discussion highlights practical applications in animation, television, and games, illustrating how collaborative canvases enhance teamwork, strengthen decision-making, and empower both emerging and established talent to contribute meaningfully throughout every stage of the creative workflow.
Technical Papers


DescriptionThe simulation of sand-water mixtures requires capturing the stochastic behavior of individual sand particles within a uniform, continuous fluid medium. However, most existing approaches, which treat sand particles only as markers within fluid solvers, fail to account for both the forces acting on individual sand particles and the collective feedback of the particle assemblies on the fluid. This prevents faithful reproduction of characteristic phenomena including transport, deposition, and clogging. Building upon the kinetic ensemble averaging technique, we propose a physically consistent coupling strategy and introduce a novel Granule-In-Cell (GIC) method for modeling such sand-water interactions. We employ the Discrete Element Method (DEM) to capture fine-scale granule dynamics and the Particle-In-Cell (PIC) method for continuous spatial representation and density projection. To bridge these two frameworks, we treat granules as macroscopic transport flow rather than solid boundaries within the fluid domain. This bidirectional coupling allows our model to incorporate a range of interphase forces using different discretization schemes, resulting in more realistic simulations that strictly adhere to the mass conservation law. Experimental results demonstrate the effectiveness of our method in simulating complex sand-water interactions, uniquely capturing intricate physical phenomena and ensuring exact volume preservation compared to existing approaches.
Birds of a Feather






DescriptionHong Kong’s art scene has been making remarkable strides, generating new media art that speaks to people’s emotions every day. Home to the Microwave International New Media Arts Festival, now running for almost 30 years, Hong Kong offers a unique opportunity to pair that festival’s legacy with the SIGGRAPH art community. Through an art talk series featuring artists, technologists, and curators with strong ties to Hong Kong, we will explore how media art can connect Hong Kong with the wider world.
Join us for “The Hong Kong New Media Art Series Talks,” a session discussing the charm and future of Hong Kong’s media art.
Workshop



DescriptionThis half-day workshop, hosted by IEEE Computer Graphics and Applications (CG&A), showcases the intersection of cutting-edge research and practical innovation in computer graphics. The event features recent CG&A publications selected for their relevance and impact, presented at the ACM SIGGRAPH Asia conference.
For further information, please refer to the following website:
Website: https://ieee-cga-siga-workshop-2025.github.io/website/
Courses


DescriptionSimulations play an important role in replicating physical applications, continually evolving as demands for accuracy and quality grow. The growth of digital twins has contributed to advancements in mirroring physical systems in engineering and research. Digital twins are evolving into embodied agents capable of perceiving and acting within their virtual worlds, extending beyond passive models.
While widely adopted in domains such as robotics and industrial simulation, these same tools have the potential to reshape and reinvent interactive media, including games, film, and animation, by adding enhanced interactivity between the physical and digital worlds. By bridging the physical and digital through the simulation of objects and human behavior, new possibilities for adaptive media are emerging, in which our digital agents can learn, reason, and interact with the world.
As we enter the era of physical AI, we explore advancements in embodied agents used to simulate both realistic virtual objects and lifelike humanoid characters. By adopting Unreal Engine 5 (UE5) as our simulation platform and prototyping with Raspberry Pis for robotics, we have the opportunity to blend photorealistic rendering and cinematic visual effects with our physical work.
Technical Communications


DescriptionWe analyze artifacts in vision-based motion capture and present a physics-based framework for cleanup, improving physical realism, supporting single- and multi-character sequences, and reducing the need for labor-intensive manual refinement.
Computer Animation Festival






DescriptionThe Rise of Blus is an original 3D animated feature that reimagines what independent filmmaking can achieve through artistry, innovation, and global collaboration. Set in a dazzling city in the clouds, the story follows Gi — a 13-year-old dreamer whose father is wrongfully taken by an oppressive ruler. As Gi uncovers a hidden legacy, his journey becomes one of courage, identity, and the fight for freedom.
What distinguishes The Rise of Blus is not only its narrative ambition but its technical and creative approach. The film adopts a painterly Kuwahara style, giving 3D models the texture and warmth of hand-painted animation. By animating on twos, the team embraces the rhythm and charm of traditional 2D animation while working entirely in 3D. Instead of relying on expensive renderfarms, the production harnesses Unreal Engine’s real-time rendering pipeline, allowing for cinematic results with efficiency and flexibility.
Equally innovative is the production model: a completely remote studio bringing together artists from around the world. This global collaboration enables diverse perspectives to shape the world of Blus while demonstrating the viability of distributed pipelines for feature film animation projects.
As an independently developed IP, The Rise of Blus has already garnered recognition, including awards at Animator 2024 and selections at festivals such as Montreal International Animation Film Festival, Atlanta Film Festival and Kids First! Film Festival. Each milestone reflects the project’s ability to resonate across audiences while pioneering new workflows.
By sharing The Rise of Blus at SIGGRAPH Asia, we aim to contribute to the community an example of how independent creators can leverage new funding models, real-time tools and novel pipelines to tell bold, original stories. It is both a film and a case study in the evolving future of animated storytelling.
Birds of a Feather






DescriptionThis BOF provides the real-time graphics and compute shader developer community an opportunity to get together to exchange ideas, share experiences, raise issues, and make suggestions on the future evolution of the open-source Slang shading language, compiler, and wider ecosystem. During the session, attendees will learn about the latest Slang developments, including neural graphics, differentiable rendering tools, and more. There will be plenty of time to discuss the wider shading language landscape and ask questions.
Poster






DescriptionWe present a real-time tele-immersive system that transmits 3D point clouds and performers’ footstep vibrations to remote audiences with less than 100 ms latency. Combining synchronized LiDAR–camera capture, lightweight flicker reduction, and view-dependent rendering, the system streams dynamic stage scenes under live lighting conditions. A large 3D LED wall and a propagation-aware haptic floor reproduce both visual and tactile presence, allowing spectators to see and feel the performance as if the stage were brought to them. A public trial at Expo 2025 linked two sites 20 km apart, demonstrating natural interaction, seamless bidirectional communication, and strong audience engagement.
Poster






DescriptionThe Theater Machine is an interactive AI-based installation designed for theater foyers to make performing arts more accessible through participatory engagement. Using camera and touchscreen interaction, visitors are integrated in real time into dynamically generated theatrical scenes. The system operates on a high-performance workstation powered by a generative model trained via Adversarial Diffusion Distillation within a high-throughput StreamDiffusion pipeline. By merging live user input with AI-driven scene creation, the project explores new modes of audience involvement and creative expression. This paper presents the conceptual framework, system architecture, interaction design, and sustainability strategy underlying The Theater Machine.
Workshop



DescriptionThe TriFusion Workshop, “Towards Embodied Intelligence Across Humans, Avatars, and Humanoid Robotics,” aims to use computer graphics techniques to strengthen the bridge among human knowledge, avatars, and humanoid robotics, through better simulation and rendering pipelines, plausible motion retargeting and tracking, and more. We call for papers focusing on 1) motion capture, tracking, and retargeting, 2) data-driven and physics-based human motion generation, and 3) robotics sim-to-real transfer. Moreover, we invite cross-field experts on human behavior understanding, avatars, and humanoid robotics for keynote speeches and talks.
For further information, please refer to the following website:
Website: https://trifusion.b12sites.com
Workshop



DescriptionThe Volumetric Video Workshop aims to accelerate the transition of volumetric video technology from laboratory prototypes to robust, production- and consumer-ready systems, enabling breakthroughs in immersive interactive experiences. It comprises two parts: 1) a challenge with a compression track and a sparse-view reconstruction track, based on benchmarking using a new high-quality multi-view dataset capturing diverse dynamic performers; 2) invited keynote talks on current developments in volumetric video.
For further information, please refer to the following website:
Website: http://4dv.ai/research/sig-asia2025-volumetric-video-challenges
Educator's Forum



DescriptionAs the second largest opera genre in China, Yue Opera faces dual challenges to its digital inheritance: a lack of interest from young people and insufficient interaction with existing technologies. To this end, this study proposes and constructs a virtual reality gamification system for the cultural inheritance of Yue Opera, "The VR Wonderland of Yue Opera" (VWYO). The system integrates virtual reality, motion capture, intelligent voice, and large language models to provide users with an immersive learning experience spanning cultural cognition, knowledge interaction, and performance imitation through the four-act sequence of voice dialogue, Yue Opera game, body learning, and gesture learning. The results show that VWYO has advantages in improving the immersion of Yue Opera and helping users master its knowledge, and it also provides a generalizable technical path and practical paradigm for the digital living inheritance of intangible cultural heritage.
Poster






DescriptionOur research presents the Thread Display: a novel screen of wind-suspended, retro-reflective threads that dynamically shapes the viewing area for a truly unique visual effect. This retraction capability minimizes visual intrusion and enhances the floating effect. By eliminating material replenishment, our method advances the operational efficiency, safety, and flexibility of aerial displays.
Art Papers



DescriptionWe present an interactive VR experience where users solve a physical Rubik's Cube while seeing it from the inside. This unusual game mechanic challenges natural spatial perception mechanisms, leading to ambiguity regarding their actual position with respect to the object in their hands. Beyond puzzle-solving, we applied this interaction paradigm to 3D scenes that must be reconstructed through Rubik's Cube operations in ways that some users described as akin to “being inside a work of Escher.” We discuss the technical implementations, document participant interactions, and discuss implications for artistic expression, game design, and embodiment in VR settings.
Poster






DescriptionThis study presents an immersive VR simulation system enabling first-person experience of Alzheimer's daily challenges, and demonstrates its effectiveness in enhancing users' empathy and understanding through mixed-methods evaluation.
Exhibitor Talk






DescriptionIn 1995, Pixar’s Toy Story changed animation forever … not just with great storytelling, but by redefining the rules of filmmaking with Pixar’s cutting-edge technology, RenderMan. Now, nearly 30 years later, Pixar is launching the most significant update in over a decade, RenderMan XPU, Pixar’s hybrid CPU + GPU renderer … and as a special preview for SIGGRAPH Asia, you’ll get to see the first 72 seconds of the unreleased Toy Story 5 … all rendered in glorious XPU!
Join us for the official launch of RenderMan 27, with exclusive insights from artistic case studies showcasing the new workflows with XPU. From interactive photorealism to advanced stylization, XPU is empowering artists in new ways … for studios both big and small. This session looks back at the technological breakthroughs that are shaping the industry … focused on the creative freedom unlocked by XPU. Whether you’re rendering, directing, or dreaming … the future of storytelling is about to get faster, weirder, and a lot more fun.
And finally, there will be a limited supply of walking teapots for the audience members, so don’t miss out and come early! This is one talk you won’t want to miss.
Computer Animation Festival






DescriptionThe girl, trapped in a dream where she is chased by those who resent her, falls into the place where the nightmare dwells. There, she faces the old sense of guilt that has turned into the nightmare.
Technical Papers


DescriptionVolumetric video is emerging as a key medium for digitizing the dynamic physical world, creating virtual environments with six degrees of freedom to deliver immersive user experiences. However, robustly modeling general dynamic scenes, especially those involving topological changes, while maintaining long-term tracking remains a fundamental challenge. In this paper, we present TaoGS, a novel topology-aware dynamic Gaussian representation that disentangles motion and appearance to support both long-range tracking and topological adaptation. We represent scene motion with a sparse set of motion Gaussians, which are continuously updated by a spatio-temporal tracker and photometric cues that detect structural variations across frames. To capture fine-grained texture, each motion Gaussian anchors and dynamically activates a set of local appearance Gaussians, which are non-rigidly warped to the current frame to provide strong initialization and significantly reduce training time. This activation mechanism enables efficient modeling of detailed textures and maintains temporal coherence, allowing high-fidelity rendering even under challenging scenarios such as changing clothes. To enable seamless integration into codec-based volumetric formats, we introduce a global Gaussian Lookup Table that records the lifespan of each Gaussian and organizes attributes into a lifespan-aware 2D layout. This structure aligns naturally with standard video codecs and supports up to 40× compression. TaoGS provides a unified, adaptive solution for scalable volumetric video under topological variation, capturing moments of "elegance in motion" and "power in stillness" to deliver immersive experiences that harmonize with the physical world.
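The lifespan-aware layout idea can be sketched abstractly: if rows of the packed 2D attribute image are ordered by Gaussian lifespan, neighbouring rows appear and disappear together, which standard video codecs compress well. This is a hypothetical illustration of the concept, not the paper's actual table format:

```python
from dataclasses import dataclass

@dataclass
class GaussianEntry:
    gid: int    # Gaussian id
    start: int  # first frame the Gaussian is active
    end: int    # last frame it is active (inclusive)

def build_lookup_table(entries):
    # Sort by lifespan so rows of the packed 2D layout change coherently
    # over time; the lookup table maps each Gaussian to its row.
    order = sorted(entries, key=lambda e: (e.start, e.end))
    row_of = {e.gid: row for row, e in enumerate(order)}

    def active_rows(frame):
        # Rows whose Gaussian is alive at `frame`.
        return [row_of[e.gid] for e in order if e.start <= frame <= e.end]

    return row_of, active_rows

table, active = build_lookup_table([
    GaussianEntry(gid=7, start=0, end=9),
    GaussianEntry(gid=3, start=0, end=4),
    GaussianEntry(gid=5, start=5, end=9),
])
```

Here `active(frame)` returns the rows a decoder needs at a given frame, so attributes for short-lived Gaussians never pad frames where they do not exist.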
Invited Poster
Poster






DescriptionTaoGS is a topology-aware dynamic Gaussian representation that disentangles motion and appearance to robustly model dynamic scenes with topological changes, enabling long-range tracking, topological adaptation, high-fidelity rendering, and 40× compression.
Poster






DescriptionTo solve the scalability and hot-reloading challenges of instance polymorphism in ESLs, we've developed a topology-aware polymorphic architecture that drastically cuts compilation and rendering times.
Poster






DescriptionThis paper presents an optimized 3D Gaussian Splatting framework for real-time light field rendering on mobile devices, demonstrating the feasibility of portable and immersive 3D experiences anywhere.
Poster






DescriptionThis study develops a smartphone-only exergame for adolescents with orthostatic dysregulation to reduce sensor burden while maintaining detection accuracy, and compares it with a conventional method using an external sensor.
Technical Papers


DescriptionRecent advancements in 3D Gaussian Splatting (3DGS) have demonstrated its potential for efficient and photorealistic 3D reconstructions, which is crucial for diverse applications such as robotics and immersive media.
However, current Gaussian-based methods for dynamic scene reconstruction struggle with large inter-frame displacements, leading to artifacts and temporal inconsistencies under fast object motions.
To address this, we introduce \textit{TrackerSplat}, a novel method that integrates advanced point tracking methods to enhance the robustness and scalability of 3DGS for dynamic scene reconstruction.
TrackerSplat utilizes off-the-shelf point tracking models to extract pixel trajectories and triangulate per-view pixel trajectories onto 3D Gaussians to guide the relocation, rotation, and scaling of Gaussians before training.
This strategy effectively handles large displacements between frames, dramatically reducing the fading and recoloring artifacts prevalent in prior methods.
By accurately positioning Gaussians prior to gradient-based optimization, TrackerSplat overcomes the quality degradation associated with large frame gaps when processing multiple adjacent frames in parallel across multiple devices, thereby boosting reconstruction throughput while preserving rendering quality.
Experiments on real-world datasets confirm the robustness of TrackerSplat in challenging scenarios with significant displacements, achieving superior throughput under parallel settings and maintaining visual quality compared to baselines.
The code is available at \href{https://github.com/yindaheng98/TrackerSplat}{https://github.com/yindaheng98/TrackerSplat}.
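The trajectory-lifting step above amounts to triangulating the same tracked pixel seen from multiple views. A minimal two-view midpoint triangulation conveys the idea; the names are illustrative, and a real pipeline like TrackerSplat's would use calibrated multi-view triangulation over whole trajectories:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of two viewing rays (camera centre + unit
    direction): find the closest point on each ray, then average them.
    A generic stand-in for lifting tracked pixels to 3D before
    relocating Gaussians."""
    b = c2 - c1
    d1d2 = float(d1 @ d2)
    denom = 1.0 - d1d2 ** 2          # rays must not be parallel
    t1 = (b @ d1 - (b @ d2) * d1d2) / denom
    t2 = ((b @ d1) * d1d2 - b @ d2) / denom
    return 0.5 * ((c1 + t1 * d1) + (c2 + t2 * d2))

# Two cameras on the x-axis whose rays meet one unit above the origin.
point = triangulate_midpoint(
    np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 1.0]) / np.sqrt(2),
    np.array([ 1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 1.0]) / np.sqrt(2))
```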
Technical Papers


DescriptionWe introduce a novel, training-free system for reconstructing, understanding, and rendering 3D indoor scenes from a sparse set of unposed RGB images. Unlike traditional radiance field approaches that require dense views and per-scene optimization, our pipeline achieves high-fidelity results without any training or pose preprocessing. The system integrates three key innovations: (1) A robust point cloud reconstruction module that filters unreliable geometry using a warping-based anomaly removal strategy; (2) A warping-guided 2D-to-3D instance lifting mechanism that propagates 2D segmentation masks into a consistent, instance-aware 3D representation; and (3) A novel rendering approach that projects the point cloud into new views and refines the renderings with a 3D-aware diffusion model. Our method leverages the generative power of diffusion to compensate for missing geometry and enhances realism, especially under sparse input conditions. We further demonstrate that object-level scene editing—such as instance removal—can be naturally supported in our pipeline by modifying only the point cloud, enabling the synthesis of consistent, edited views without retraining. Our results establish a new direction for efficient, editable 3D content generation without relying on scene-specific optimization.
XR






DescriptionLife is a self-confrontation Go game, where we’re both player and opponent. Each move, black or white, reshapes the shifting landscape of our lives. As perspectives shift, boundaries dissolve, revealing harmony within dualities that resolve into wholeness and transform into inner wisdom.
Technical Papers


DescriptionReconstructing the geometry and appearance of a given scene is a fundamental task in 3D computer graphics and computer vision. Recently, radiance fields emerged as a representation of light transport in the scene, allowing, as a byproduct, also to extract 3D geometry solely from multi-view imagery. Initially designed for RGB captures, existing approaches have been extended to other sensor modalities.
Among these, transient imaging—measuring the time-of-flight of light at picosecond resolution—has emerged as a promising alternative, offering rich spatio-temporal information to improve reconstruction quality from limited viewpoints and obstructed views.
However, its applicability outdoors has been highly problematic due to interference from ambient light and the different sensor behavior under high-photon-flux conditions typical of outdoor settings.
Addressing this gap, we introduce Transient LASSO, a neural scene reconstruction method operating on raw transient measurements of outdoor in-the-wild captures to accurately reconstruct the underlying scene geometry and properties. We demonstrate the effectiveness of our method across a variety of outdoor environments, including complex urban scenes with dense traffic and infrastructure.
Finally, we also show the potential use cases of our method for downstream applications such as sensor parameter optimization.
Invited Poster
Poster






DescriptionWe propose TransparentGS, a fast inverse rendering pipeline for transparent objects based on 3D-GS. The main contributions are three-fold: efficient transparent Gaussian primitives for specular refraction, GaussProbe to encode ambient light and nearby contents, and the IterQuery algorithm to reduce parallax errors in our probe-based framework.
Games






DescriptionDream Decor is an immersive home design game that allows players to bring their interior design ideas to life. From cozy living rooms to luxurious bedrooms, the game offers endless opportunities to create stunning spaces using a wide range of furniture, decor items, and color palettes.
Inspired by real-world design trends and styles, Dream Decor challenges players to craft visually appealing interiors while balancing creativity and functionality. Players can participate in design challenges, unlock exclusive decor collections, and share their creations with a vibrant community of like-minded enthusiasts.
With its rich visuals, diverse customization options, and engaging gameplay, Dream Decor offers a fun and relaxing experience for casual gamers and design lovers alike. Whether you’re designing your dream home or honing your creative skills, Dream Decor provides the perfect platform for self-expression and artistic exploration.
Games






DescriptionA mummified ex-skater is brought back to life to roam the earth in a post-apocalyptic world. Muskater can perform any skate trick because he has no fear of dying, as he is already dead! One drawback is that it's lonely being invisible. This is an original hand drawn comic book by Rhys Turner.
This is an indie video game based on the comic book. It’s an endless skater platform game which comments on the paradoxical nature of living forever. Muskater lives on infinitely in a doom roll state — like Groundhog Day, but alone. Skate or die!
Muskater can't die, but when he does crash, it's inconvenient.
Unique features:
1. Retro 2D graphics
2. Unique skateboarding game with story and tricks
3. Comic book adaptation and narrative integrated
4. More than one game type: RPG Explore Map, Endless Runner, Platformer
5. LLM driven dialogue
6. Gravity and ragdoll mechanic - WIP coming soon
7. Procedural Level builder slice mechanic - WIP coming soon
Poster






DescriptionThis poster presents TryIto, a hybrid cloth simulation framework combining thin-shell mechanics, learned residual forces, and a dynamic 4D UV atlas, enabling more stable, accurate, and realistic virtual garment simulation.
Poster






DescriptionTunTun Diary is a playful mobile game where an alien puppy transforms nightmares into comic stories, offering comfort and emotional relief through AI storytelling and virtual companionship.
Key Event
Real-Time Live!



DescriptionThe SparcAI team developed a next-generation 3D generative model that pushes the boundaries of fidelity and speed. Our new system produces native-resolution geometry with exceptional structural detail, supports native material and texture generation, and achieves fast generation within seconds. Built upon an efficient discrete mesh codec and a novel generative architecture, the model delivers unmatched realism and responsiveness—enabling creation of high-fidelity 3D assets directly from images or text prompts. This breakthrough greatly advances cinematic-quality 3D asset generation toward real-world applications.
Poster






DescriptionWe propose a two-stage vector sketch generation framework.
The structure stage captures global composition with coarse strokes, while the detail stage refines fine-grained details.
Exhibitor Talk






DescriptionLearn how our stereo HMC-based facial capture method delivers quality comparable to seated 4D capture stages. We train a personalized neural network that tracks subtle performance nuances without markers, under changing lighting conditions and camera angles. The method requires only FACS-based facial scans of the actor and a 20-minute training sequence in the HMC. We will also present our approach to building rigs that can reproduce and edit an actor’s 4D performance sequence with minimal loss, using a learned deformation layer.
Technical Papers


DescriptionGeometry-aware online motion retargeting is crucial for real-time character animation in gaming and virtual reality. However, existing methods often rely on complex optimization procedures or deep neural networks, which constrain their applicability in real-time scenarios. Moreover, they offer limited control over fine-grained motion details involved in character interactions, resulting in less realistic outcomes. To overcome these limitations, we propose a novel optimization framework for ultrafast, lightweight motion retargeting with joint-level control (i.e., controls over joint position, bone orientation, etc.). Our approach introduces a semantic-aware objective grounded in a spherical geometry representation, coupled with a bone-length-preserving algorithm that iteratively solves this objective. This formulation preserves spatial relationships among spheres, thereby maintaining motion semantics, mitigating interpenetration, and ensuring contact. It is lightweight and computationally efficient, making it particularly suitable for time-critical real-time deployment scenarios. Additionally, we incorporate a heuristic optimization strategy that enables rapid convergence and precise joint-level control. We evaluate our method against state-of-the-art approaches on the Mixamo dataset, and experimental results demonstrate that it achieves comparable performance while delivering order-of-magnitude speedup.
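The core of any bone-length-preserving iteration is re-projecting each child joint onto a sphere of the rest bone length around its parent. The sketch below shows that generic idea only; the paper couples it with a semantic-aware spherical objective and joint-level control terms not reproduced here:

```python
import numpy as np

def preserve_bone_lengths(joints, parents, rest_lengths, iters=10):
    """Iteratively snap each child joint back to its rest bone length
    from its parent. `parents[i]` is the parent index of joint i
    (-1 for the root); `rest_lengths[i]` is the length of the bone
    connecting joint i to its parent."""
    joints = [np.asarray(j, dtype=float).copy() for j in joints]
    for _ in range(iters):
        for child, parent in enumerate(parents):
            if parent < 0:
                continue  # the root has no bone to preserve
            d = joints[child] - joints[parent]
            n = np.linalg.norm(d)
            if n > 1e-9:
                joints[child] = joints[parent] + d * (rest_lengths[child] / n)
    return joints

# A 3-joint chain whose first bone was stretched to length 2 by retargeting.
fixed = preserve_bone_lengths(
    joints=[[0, 0, 0], [2, 0, 0], [3, 0, 0]],
    parents=[-1, 0, 1],
    rest_lengths=[0.0, 1.0, 1.0])
```

Processing joints in parent-before-child order, as above, means each sweep propagates corrections down the chain, which is one reason such iterations converge quickly on kinematic trees.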
Technical Papers


DescriptionWe present UltraZoom, a system for generating gigapixel-resolution images of objects from casually captured inputs, such as handheld phone photos. Given a full-shot image (global, low-detail) and one or more close-ups (local, high-detail), UltraZoom upscales the full image to match the fine detail and scale of the close-up examples. To achieve this, we construct a per-instance paired dataset from the close-ups and adapt a pretrained generative model to learn object-specific low-to-high resolution mappings. At inference, we apply the model in a sliding-window fashion over the full image. Constructing these pairs is non-trivial: it requires registering the close-ups within the full image for scale estimation and degradation alignment. We introduce a simple, robust method for achieving registration on arbitrary materials in casual, in-the-wild captures. Together, these components form a system that enables seamless pan and zoom across the entire object, producing consistent, photorealistic gigapixel imagery from minimal input. For full-resolution results and code, visit our project page at ultra-zoom.github.io.
XR






DescriptionBy implementing new movement methods, we enhance the immersion and realism of virtual underwater movement, allowing users to experience underwater navigation more authentically and thereby improving the playability of the underwater game. While introducing new virtual underwater movement methods and additional motion mechanisms, we prioritize preventing user motion sickness.
Technical Papers


DescriptionWe present a high-speed underwater optical backscatter communication technique based on acousto-optic light steering. Our approach enables underwater assets to transmit data at rates potentially reaching hundreds of Mbps, vastly outperforming current state-of-the-art optical and underwater backscatter systems, which typically operate at only a few kbps. In our system, a base station illuminates the backscatter device with a pulsed laser and captures the retroreflected signal using an ultrafast photodetector. The backscatter device comprises a retroreflector and a 2 MHz ultrasound transducer. The transducer generates pressure waves that dynamically modulate the refractive index of the surrounding medium, steering the light either toward the photodetector (encoding 1) or away from it (encoding 0). Using a 3-bit redundancy scheme, our prototype achieves a communication rate of approximately 0.66 Mbps with an energy consumption of <1 µJ/bit, representing a 60X improvement over prior techniques. We validate its performance through extensive laboratory experiments in which remote underwater assets wirelessly transmit multimedia data to the base station under various environmental conditions.
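As a back-of-the-envelope check (our assumption, not stated explicitly in the abstract), the reported rate is consistent with one raw on-off bit per cycle of the 2 MHz transducer, divided by the 3-bit redundancy scheme:

```python
# Hedged sketch: relate the 2 MHz acousto-optic modulation rate to the
# reported ~0.66 Mbps throughput under 3-bit redundancy.
# Assumption (ours): one raw bit per ultrasound cycle; the prototype's
# exact framing may differ.
raw_rate_bps = 2e6          # 2 MHz transducer -> ~2 Mbps raw keying rate
redundancy = 3              # each data bit transmitted as 3 raw bits
effective_mbps = raw_rate_bps / redundancy / 1e6
print(f"{effective_mbps:.2f} Mbps")  # ~0.67 Mbps, matching the reported ~0.66 Mbps
```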
Technical Papers


DescriptionWe present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios—including human-human, human-object, and human-scene—within a single, task-agnostic architecture. In contrast to existing methods that rely on task-specific designs and exhibit limited generalization, Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, allowing the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments across three representative interaction tasks demonstrate that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions offers a promising direction for scalable motion synthesis in complex environments. Additionally, our code will be open-sourced in the future to support further research and development in this area.
Technical Papers


DescriptionCamera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation.
Uni3C includes two key contributions. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController, which utilizes unprojected point clouds from monocular depth to achieve accurate camera control. By leveraging the strong 3D priors of point clouds and the powerful capacities of video foundational models, PCDController shows impressive generalization, performing well regardless of whether the inference backbone is frozen or fine-tuned. This flexibility enables different modules of Uni3C to be trained in specific domains, i.e., either camera control or human motion control, reducing the dependency on jointly annotated data. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters to unify the control signals for camera and human motion, respectively. Extensive experiments confirm that PCDController enjoys strong robustness in driving camera motion for fine-tuned backbones of video generation. Uni3C substantially outperforms competitors in both camera controllability and human motion quality. Additionally, we collect tailored validation sets featuring challenging camera movements and human actions to validate the effectiveness of our method.
Technical Papers


DescriptionVideo data provides an accessible and rich source beyond expensive action-labeled robot data for advancing robotic learning paradigms. Motivated by this potential, researchers have investigated methods to exploit video data in robotic learning. Recent approaches fall primarily into two categories: action-based approaches tokenize latent actions from videos for policy pre-training, while state-based approaches pre-train models to predict subsequent states. The former establishes rich motion priors, while the latter empowers the robot to anticipate future events. These complementary capabilities suggest significant potential for integration into a unified framework. In this paper, we propose UniMimic, a novel approach unifying latent-action and latent-state pre-training from videos. We first train a unified tokenizer to learn latent states from video frames while deriving latent actions between state tokens. Subsequently, the policy is pre-trained on videos to predict these latent actions and subsequent latent states. Finally, the policy is fine-tuned on an action-labeled robot dataset to transfer the learned priors to precise robot execution. Experiments show that our pre-training stage improves performance by 19% on the Libero benchmark and raises the average number of tasks completed in a row (out of 5) from 2.50 and 2.35 to 3.89 and 3.73 on the CALVIN benchmark. In real-world experiments, our method still delivers improvements exceeding 36%.
Invited Poster
Poster






DescriptionIn this paper, we introduce UnrealLLM, a novel multi-agent framework that connects natural language descriptions with the professional PCG system (Unreal Engine 5) to automate scene generation.
UnrealLLM constructs a comprehensive knowledge base to translate text into executable PCG blueprints and a diverse asset library that guarantees high-quality scene generation. It also introduces a text-based blueprint system with a spline-based control mechanism for geometric arrangement, enabling natural language interaction and enhancing interactivity in 3D environments using UE5's advanced capabilities.
Featured Session



DescriptionJoin digital media pioneer Scott Ross in a live on-stage talk with Baoquan Chen, Professor of Peking University and Associate Dean of the School of Artificial Intelligence. Together, they’ll dive into the past, present, and future of visual effects, drawing from Ross’s book Upstart.
Expect candid insights into building groundbreaking studios, managing the chaos of innovation, and shaping the artistry and technology that transformed cinema. With Chen’s expertise in computer graphics and visualization, the conversation promises to explore not only the history of VFX but also the emerging technologies that will define its future.
Technical Papers


DescriptionAI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners and advertisements. While diffusion-based text-to-image models have demonstrated strong capabilities in visual content generation, their text rendering performance, particularly for small-scale typography and non-Latin scripts, remains limited. In this paper, we propose UTDesign, a unified framework for high-precision stylized text editing and conditional text generation in design images, supporting both English and Chinese scripts. Our framework introduces a novel DiT-based text style transfer model trained from scratch on a synthetic dataset, capable of generating transparent RGBA text foregrounds that preserve the style of reference glyphs. We further extend this model into a conditional text generation framework by training a multi-modal condition encoder on a curated dataset with detailed text annotations, enabling accurate, style-consistent text synthesis conditioned on background images, prompts, and layout specifications. Finally, we integrate our approach into a fully automated text-to-design (T2D) pipeline by incorporating pre-trained text-to-image (T2I) models and an MLLM-based layout planner. Extensive experiments demonstrate that UTDesign achieves state-of-the-art performance among open-source methods in terms of stylistic consistency and text accuracy, and also exhibits unique advantages compared to proprietary commercial approaches. Code and data for this paper are at www.dummy.url.
Technical Papers


Description3D sketches are an effective representation of a 3D shape, convenient to create via modern Virtual or Augmented Reality (VR/AR) interfaces or from 2D sketches. For 3D sketches drawn by designers, human observers can consistently imagine the surface they imply, yet reconstructing such a surface with modern methods remains an open problem. Existing methods either assume a clean, well-structured 3D curve network (while in reality most 3D sketches are rough and unstructured), or make no effort to produce a surface that aligns with the artist's intent. We propose a novel method that addresses this challenge by designing a system that reconstructs a surface that aligns with human perception from a clean or rough set of 3D sketches. As the topology of the desired surface is unknown, we use an implicit neural surface representation, parameterized via its gradient field.
As suggested by previous perception and modelling literature, human observers tend to imagine the surface by interpreting some of the input strokes as \emph{representative flow-lines}, related to the lines of curvature, and imagining the surface whose curvature agrees with those. Inspired by these observations, we design a novel loss that finds the surface with the smoothest principal curvature field aligned with the input strokes. Together with approximation and piecewise smoothness requirements, we formulate a variational optimization that performs robustly on a wide variety of 3D sketches. We validate our algorithmic choices via a series of qualitative and quantitative evaluations, and comparisons to ground truth surfaces and previous methods.
Technical Papers


DescriptionLearnable neural representations have been widely adopted in 3D scene reconstruction and neural rendering applications. However, traditional feature grid representations often suffer from a substantial memory footprint, posing a significant bottleneck for modern parallel computing hardware. In this paper, we present neural vertex features, a generalized formulation of learnable representations for neural rendering tasks involving explicit mesh surfaces. Instead of uniformly distributing neural features throughout 3D space, our method stores learnable features directly at mesh vertices, leveraging the underlying geometry as a compact and structured representation for neural processing. This not only optimizes memory efficiency but also improves feature representation by aligning compactly with the surface using task-specific geometric priors. We validate our neural representation across diverse neural rendering tasks, with a specific emphasis on neural radiosity. Experimental results demonstrate that our method reduces memory consumption to only one-fifth (or even less) of grid-based representations, while maintaining comparable rendering quality and lowering inference overhead.
Technical Papers


DescriptionVertical binocular misalignment (VBM) can degrade image quality and contribute to visual discomfort in stereoscopic head-mounted displays, particularly for see-through AR. In this project, we investigate whether VBM impairs visual performance, namely users' ability to process briefly presented AR content such as text notifications. We also quantify how the impacts of VBM vary with an AR system's virtual image distance (VID). Across three experiments, participants were asked to (a) detect and (b) resolve, fuse, and process AR content presented with constant and time-varying VBM. Short text stimuli (words or sentences) were briefly presented on a multi-display haploscope, using additive and transmissive displays to emulate see-through AR. Experiments were repeated at three VIDs: 57, 100, and 139 cm (1.75, 1, and 0.72 D). The magnitude and frequency of VBM were adaptively sampled on each trial. Visual performance (as measured by participants’ time to fuse text) was steadily impaired with increasing VBM. For high VBM magnitudes, time to fuse did not meaningfully differ between VIDs; for low VBM, time to fuse was fastest at the furthest VID. Participants’ ability to detect VBM also improved at further VIDs. Correlations were observed between all three user outcome measures: detection, visual performance, and comfort. Overall, we find that visual performance metrics provide a useful framework to complement detection and visual comfort approaches, consistent with recent work on VBM and related artifacts in AR. The results of this study can be used to inform VBM tolerance guidelines and VID placement tradeoffs in future AR devices.
Art Gallery






DescriptionVICINO is an interactive installation exploring how touch rebuilds interpersonal connection. Through bioelectrical signals and heart rate, human contact becomes shifting trees and rhythms. The living system of particles invites participants to reflect on how closeness can be perceived beyond language, through gesture, presence, and embodied interaction.
Courses


DescriptionText-to-video generation has reached a new milestone with models like LTXV, which delivers exceptional quality and realism with great speed and efficiency. LTXV's design makes it ideal for quick Low-Rank Adaptation (LoRA), allowing for rapid iteration and experimentation. However, customizing these powerful foundation models for specific applications, characters, or styles remains a significant challenge for researchers and practitioners.
This hands-on course is the first comprehensive practical guide to LoRA training for state-of-the-art text-to-video models. LoRA fine-tuning provides a computationally efficient way to customize large video generation models without the high cost of full retraining. When combined with LTXV’s efficiency, LoRA adaptations can be completed in hours instead of days.
This course bridges the gap between research and practical implementation with live demonstrations and hands-on workflows. You will master LoRA adaptation fundamentals for video generation and learn to use it as a powerful tool to control the generated content. By the end of this course, you will have the skills to guide models using various control modalities like depth maps, pose, and identity, allowing you to dictate specific character movements and scene compositions with precision.
Poster






DescriptionVideo2Song is an end-to-end framework for video soundtrack generation that combines a VLM with retrieval-augmented generation to extract semantic, stylistic, and rhythmic cues, conditioning an LLM to guide music synthesis.
Technical Papers


DescriptionIn this paper, we propose VideoFrom3D, a novel framework for synthesizing high-quality 3D scene videos from coarse geometry, a camera trajectory, and a reference image. Our approach streamlines the 3D graphic design workflow, enabling flexible design exploration and rapid production of deliverables. A straightforward approach to synthesizing a video from coarse geometry might condition a video diffusion model on geometric structure. However, existing video diffusion models struggle to generate high-fidelity results for complex scenes due to the difficulty of jointly modeling visual quality, motion, and temporal consistency. To address this, we propose a generative framework that leverages the complementary strengths of image and video diffusion models. Specifically, our framework consists of a Sparse Anchor-view Generation (SAG) module and a Geometry-guided Generative Inbetweening (GGI) module. The SAG module generates high-quality, cross-view consistent anchor views using an image diffusion model, aided by Sparse Appearance-guided Sampling. Building on these anchor views, the GGI module faithfully interpolates intermediate frames using a video diffusion model, enhanced by flow-based camera control and structural guidance. Notably, both modules operate without any paired dataset of 3D scene models and natural images, which is extremely difficult to obtain. Comprehensive experiments show that our method produces high-quality, style-consistent scene videos under diverse and challenging scenarios, outperforming simple and extended baselines.
Invited Poster
Poster






DescriptionIn this work, we investigate and compare the effectiveness of the Traditional Memory Palace technique and a Virtual Reality counterpart for supporting memorization and recall, with a specific focus on university students with ADHD.
Featured Session



DescriptionMetropolis, Mary Poppins, The Matrix, and The Mandalorian all combine real foreground elements with virtual backgrounds to achieve their filmmakers’ vision. Today’s virtual production volumes combine traditional visual effects with lighting reproduction techniques, giving filmmakers even greater capability and control. And emerging machine learning techniques will allow the lighting of foregrounds and backgrounds to be harmonized near automatically. This talk will follow the thread of virtual production from the earliest days of cinema into the future of filmmaking.
Educator's Forum



DescriptionThis pilot study evaluates cloud-hosted virtual workstations delivered via Amazon Web Services (AWS) for real-time virtual production education using Unreal Engine, aiming to enhance accessibility for remote, regional, disabled, and socio-economically disadvantaged students. Conducted over two days through four structured sessions emulating authentic project-based learning, the research employed mixed-method data collection (including quantitative surveys and qualitative feedback from educators and students) to assess technical feasibility, user experience, and pedagogical outcomes.
Results indicated rapid participant adaptation, with initial access friction significantly decreasing and system usability metrics markedly improving over the course duration. Notably, latency experiences varied independently of internet speed, highlighting user interface and workload adaptation as critical factors influencing perceived performance. Educator reflections emphasized the necessity of targeted support ratios, ergonomic considerations, and adaptive pedagogical strategies tailored to virtual environments.
Findings demonstrate that cloud-based virtual workstations effectively overcome traditional barriers associated with on-campus computing resources, provided that pedagogical design prioritizes comprehensive user support, ergonomic awareness, and cognitive load management. This scalable approach offers promising implications for equitable, inclusive creative technology education.
Technical Papers


DescriptionWe introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models through a novel customization data pipeline. We train the character consistency component on recorded volumetric capture performances re-rendered with diverse camera trajectories via 4D Gaussian Splatting (4DGS), with lighting variability obtained from a video relighting model. We fine-tune state-of-the-art open-source video diffusion models on this data to provide strong identity preservation, precise camera control, and lighting adaptability. Our framework also supports core capabilities for virtual production, including multi-subject generation using two approaches: joint training and noise blending, the latter enabling efficient composition of independently customized models at inference time;
it also achieves scene and real-life video customization as well as control over motion and spatial layout during customization. Extensive experiments show improved video quality, higher personalization accuracy, and enhanced camera control and lighting adaptability, advancing the integration of AI-driven video generation into virtual production.
Technical Papers


DescriptionWe present a vorticity method for simulating incompressible viscous flows on curved surfaces governed by the Navier--Stokes equations. Unlike previous approaches, our formulation incorporates the often-overlooked Gaussian-curvature-dependent term in the viscous force, which influences both the vorticity equation and the evolution of harmonic components. We show that these curvature-related terms are crucial for reproducing physically correct fluid behavior. We introduce an implicit--explicit (IMEX) scheme for solving the resulting system on triangle meshes and demonstrate its effectiveness on surfaces with arbitrary topology, including non-orientable surfaces, and under a variety of boundary conditions. Our theoretical contributions include several explicit formulas: a vorticity jump condition across curvature sheets, a geometric correspondence between friction coefficients and boundary curvature adjustments, and the influence of boundary curvature on harmonic modes. These results not only simplify the algorithmic design but also offer geometric insight into curvature-driven fluid phenomena, such as the emergence of the Kutta condition under free-slip boundaries.
Courses


DescriptionThis 105-minute course explores how heat and light work together across Vision, Imaging, and Simulation. These phenomena are fundamentally coupled — absorbed light heats materials, while hot objects radiate thermal energy as infrared light — yet computer vision and imaging typically ignore heat, while engineering focuses only on thermal effects without considering light.
We show how accounting for both heat and light opens new possibilities. Measuring absorbed light intensity enables solving image analysis problems that were previously impossible. Heat flow patterns reveal object shapes. Multi-spectral thermal cameras can separate what objects reflect versus what they emit. These applications rely on thermal cameras that operate fundamentally differently from visible light cameras — using bolometric rather than photoelectric sensing — creating unique challenges in motion deblurring and noise modeling that we address.
These new vision and imaging capabilities demand equally novel simulation tools. The simulation component introduces Monte Carlo methods for thermal phenomena, showing how walk-on-spheres algorithms enable grid-free heat conduction simulation on complex geometry. These methods work together to handle both light and heat processes, enabling complete thermal simulation with potential applications in hardware design, synthetic dataset generation, and real-world scene analysis.
This course targets computer vision, graphics, and imaging researchers wanting to work beyond visible light. Participants will learn basic theory and practical techniques for heat-light interactions, understanding state-of-the-art developments and opening new research directions at the intersection of thermal vision, imaging, and physics-based simulation.
Poster






DescriptionWe developed an interactive storytelling system with tangible media, drawing on insights from pediatric healthcare professionals, to translate immune mechanisms and sublingual immunotherapy (SLIT) into playful, child-centered experiences that foster understanding and emotional engagement in long-term pediatric treatment.
Poster






DescriptionNanlai Zhi Hua visualizes the spatio-temporal narratives of Hong Kong’s Southbound Literati through “data flowers” that merge biographical and fictional geographies. Integrating literary geography with information visualization, the project translates displacement and nostalgia into visual form, revealing how spatial memory structures diasporic creativity in post-1949 Hong Kong.
Technical Papers


DescriptionVirtual try-on aims to synthesize a realistic image of a person wearing a target garment, but accurately modeling garment–body correspondence remains a persistent challenge, especially under pose and appearance variation.
In this paper, we propose Voost—a unified and scalable framework that jointly learns virtual try-on and try-off with a single diffusion transformer.
By modeling both tasks jointly, Voost enables each garment-person pair to supervise both directions and supports flexible conditioning over generation direction and garment category—enhancing garment–body relational reasoning without task-specific networks, auxiliary losses, or additional labels.
In addition, we introduce two inference-time techniques: attention temperature scaling for robustness to resolution or mask variation, and self-corrective sampling that leverages bidirectional consistency between tasks.
Extensive experiments demonstrate that Voost achieves state-of-the-art results on both try-on and try-off benchmarks, consistently outperforming strong baselines in alignment accuracy, visual fidelity, and generalization.
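The "attention temperature scaling" named above generically means dividing attention logits by a temperature before the softmax; a higher temperature flattens the attention distribution, which can help when sequence length (resolution or mask extent) at inference differs from training. A minimal sketch of the generic mechanism, not Voost's actual implementation:

```python
import numpy as np

def attention_with_temperature(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an extra temperature on the logits:
    softmax(q @ k.T / (sqrt(d) * temperature)) @ v.
    temperature > 1 flattens the attention weights; temperature < 1 sharpens them."""
    d = q.shape[-1]
    logits = q @ k.T / (np.sqrt(d) * temperature)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

In the sharp limit the output snaps to the value of the best-matching key; in the flat limit it approaches the mean of all values.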
Technical Papers


DescriptionReal-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion: a unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence; 2) Long-Range World Exploration: an efficient world cache with point culling and auto-regressive inference with smooth video sampling, enabling iterative scene extension with context-aware consistency; and 3) Scalable Data Engine: a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications.
XR






DescriptionPartnering with Singapore Airlines, we developed a VR prototype that simulates complex Busan visual approach with threats/errors, allowing pilots to train anytime, anywhere at a fraction of Full Flight Simulator costs. VR pilot training offers a scalable, cost-effective alternative, enabling practice for rare and high-stakes procedures.
Educator's Forum



DescriptionPlants, landscape architecture, and garden design are three core subjects in collegiate landscape architecture programs. However, traditional teaching methods suffer from limited plant resources, the high cost of fieldwork in classical gardens, and insufficient feedback on design practice. This study proposes and constructs an interactive garden teaching platform based on virtual reality technology. The platform contains a knowledge database of typical garden plants and scene libraries for three major Chinese classical gardens, introduces a dynamic gesture classification algorithm based on the $P point-cloud recognizer, and realizes a gesture-driven interactive garden landscape layout system. By integrating three modules (plant cognition training, digital tours of classical gardens, and gesture-based interactive planting), the platform innovatively achieves virtual-real integration of the core teaching objectives of the landscape architecture profession. An experimental study shows that the platform significantly outperforms traditional teaching in knowledge-transfer efficiency, immediacy of feedback on design practice, and user experience, providing a reliable and replicable teaching model for landscape architecture education.
Poster






DescriptionVR K-Heritage Assembler enables immersive assembly of Gong-po bracket structures from twelve Korean national treasure buildings, offering interactive, repeatable, and cost-effective virtual learning to enhance architectural heritage education and understanding.
Exhibitor Talk
VTubing: How Precision Motion Technology Powers the Next Era of Creativity, Science, and Performance
10:30am - 11:30am HKT Wednesday, 17 December 2025 Theatre 2, Level 1





DescriptionA primer on the history, technology of VTubing and a look to the future of what is one of the fastest growing entertainment mediums.
Games






DescriptionAn action game with simple fighting-game features:
'Competitive' & 'Entirely collidable'.
Whether you're timid, aggressive, reckless, strategic, or crazy,
you'll find your own playstyle in the game.
The 2 elements of the game -
Competitive:
Normal enemies come in pairs or formations: Offensive (Attack & Dodge) & Defensive (Defend & Disturb), or attacking from the ground & from the air.
Mid-bosses & bosses are equipped with super armor, and there's a way to break it and knock them down.
Players will feel as if they're competing against fighting-game opponents, but in an action game.
Entirely collidable:
All enemies and objects on the stage can be collided with.
Players can grab them and crash them into other enemies.
Technical Papers
Waste-to-Value: Reutilized Material Maximization for Additive and Subtractive Hybrid Remanufacturing
4:30pm - 4:40pm HKT Monday, 15 December 2025 Meeting Room S423+S424, Level 4

DescriptionRemanufacturing effectively extends component lifespans by restoring used or end-of-life parts to like-new or even superior condition, with an emphasis on maximizing reutilized material, especially for high-cost materials. Hybrid manufacturing combines the capabilities of additive and subtractive manufacturing; its ability to both add and remove material allows it to remanufacture complex shapes, and it is increasingly applied in remanufacturing. How to effectively plan the additive and subtractive hybrid remanufacturing (ASHRM) process to maximize material reutilization has therefore become a key focus. However, current ASHRM process-planning methods lack strict treatment of collision-free constraints, hindering practical application. This paper introduces a computational framework for ASHRM process planning of general shapes that strictly enforces these constraints. We separate global and local collision-free constraints, employing clipping planes and a graph formulation, respectively, to handle them, ultimately maximizing the reutilized volume while keeping the constraints satisfied. Additionally, we optimize the setup of the target model to further maximize the reutilized volume. Extensive experiments and physical validations on a 5-axis hybrid manufacturing platform demonstrate the effectiveness of our method across various 3D shapes, achieving an average material reutilization of 69% across 12 cases.
Technical Papers


DescriptionWatertight tessellation is essential for real-time rendering of large-scale surfaces, particularly for Non-Uniform Rational B-Splines (NURBS) and Catmull-Clark Subdivision (CCS) surfaces. We present WATER, a software-based framework that delivers watertight, non-uniform tessellation with pixel-level accuracy at real-time frame rates. Unlike fixed-function hardware tessellation, WATER adopts a fully GPU-driven pipeline with cache-friendly design and novel algorithms, offering greater flexibility, scalability, and performance. Under our framework, a 2x-3x speedup over hardware tessellation is achieved for bi-3 Bézier surfaces, while bi-7 Bézier surfaces exhibit a 7x-11x improvement. Compared to ETER [Xiong et al. 2023], our method achieves 1.4x-2.5x faster rendering and 52%-72% lower memory usage under the same non-watertight uniform pattern. When enforcing watertightness, a moderate overhead of 23%-51% is incurred. With its advantages in quality, efficiency, and adaptability, WATER provides a compelling alternative for industrial-scale rendering tasks requiring watertightness.
Technical Papers


DescriptionThis paper presents a novel wavelet-based framework for simulating single-phase (e.g., smoke) and two-phase (e.g., bubbly water) flows, featuring unified boundary condition handling for free surfaces and solid obstacles.
In liquid simulations, conventional pressure projection methods solve a simplified pressure Poisson equation by enforcing zero-pressure Dirichlet conditions at free surfaces. However, these methods ignore air-phase incompressibility, resulting in artificial bubble collapse. Stream function approaches address this limitation by solving a density-variable vector potential Poisson equation, ensuring incompressibility in both simulated and unsimulated regions while maintaining divergence-free liquid phases independent of solver accuracy. Yet, they triple the linear system’s dimensionality and suffer from poor convergence with solid boundaries.
The core limitation of both methods lies in their governing equations: singularities arise as density approaches extreme values. The pressure Poisson equation becomes ill-conditioned when density nears zero (air phase), compromising air-phase incompressibility, while the vector potential equation deteriorates as density approaches infinity (solid phase), hindering solid-boundary convergence.
To resolve these singularities, we first introduce a novel decomposition where zero and infinite densities are well-defined. We then reformulate this decomposition as a fixed-point iteration using density-agnostic curl-free and divergence-free projections, eliminating the need for linear system solves. Finally, we develop an iterative algorithm that alternately applies curl-free and divergence-free wavelet projections to efficiently solve the fixed-point problem.
Our method concurrently computes pressure and stream functions, retaining the incompressibility advantages of stream function approaches while overcoming their computational inefficiencies and solid-boundary convergence challenges. By leveraging the inherent parallelism of wavelet transforms, our framework enables efficient GPU implementation, achieving substantial performance gains.
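One of the two building blocks in the alternating scheme described above, the divergence-free projection, can be illustrated with a standard Helmholtz-Hodge projection. This sketch uses an FFT-based projector on a periodic 2D grid as a simple stand-in for the paper's wavelet projections (the spectral formulation and names are assumptions, not the paper's method):

```python
import numpy as np

def divergence_free_projection(u, v):
    """Helmholtz-Hodge projection of a periodic 2D velocity field onto its
    divergence-free part, computed spectrally: each Fourier mode's component
    along the wavevector k (the gradient part) is removed."""
    n = u.shape[0]
    k = np.fft.fftfreq(n) * n                 # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                            # guard the zero mode
    uh, vh = np.fft.fft2(u), np.fft.fft2(v)
    g = (kx * uh + ky * vh) / k2              # coefficient of the gradient part
    return (np.fft.ifft2(uh - kx * g).real,
            np.fft.ifft2(vh - ky * g).real)
```

Applied to a pure (zero-mean) gradient field the result vanishes, while a divergence-free field passes through unchanged, which is the idempotence the fixed-point iteration relies on.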
Poster






DescriptionIn this paper, with a new weight comparison task, we investigate the effect of viewpoint position on the weight illusion in detail.
Poster






DescriptionAn interactive audio-visual installation that reimagines Suona culture through collaborative tangible interaction, transforming ritual performance into a symbolic journey of bird calls and phoenix rebirth.
Birds of a Feather






DescriptionThe field of Neural Graphics (NG) uses AI to generate visual content. Instead of relying on traditional graphics pipelines, NG uses learned models to render visual data. While such innovations are fantastic, the diversity of AI platforms poses a challenge for developers looking for cross-platform support. In this BOF, the Khronos Machine Learning Council will share the results of a recent in-depth study into accelerating AI and share their recommendations for community feedback. If you have an interest in AI and ML, join this interactive session to share your views with our panel of experts.
"
"
Poster






DescriptionWhen Stars Shine through Silence is an interactive visualization using star metaphors to show victims’ courage, online solidarity, and institutional response, fostering empowerment and collective awareness.
Art Papers



DescriptionTraditional sensors function as exact measuring instruments to represent the physical world. Conversely, humans are subjective, inaccurate sensors, whose measured quantities are often influenced by their perception of the environment. In this project, we speculate on the possibility of a sensing system where the human being becomes the sensor. We observe the human sensor through their movement, as a performative expression of the measure. We explore this concept through site-specific embodied practices, gathering movement and textual expressions related to environmental sensing from professional choreographers. This data is then utilized to fine-tune a large language model that generates abstract words based on the input movements. This language model is integrated into a system as a critical alternative to the technology of sensing, where the process of sensing turns into reflecting the observer's subjective body expression rather than providing a 1:1 mapping of the phenomenon.
Poster






DescriptionThe project visualizes Tang poetry imagery through a VR interactive system, enabling immersive exploration of poetic imagery.
Poster






DescriptionWhispering Shells is an immersive interactive installation that transforms scientific data about shell depletion into a sensory experience. Using Arduino-driven sculpture and real-time projections, it visualizes human impact on coastal ecosystems and promotes environmental awareness. By merging art, ecology, and technology, the work inspires behavioral change and demonstrates how interactive storytelling can make complex environmental issues tangible.
Computer Animation Festival






DescriptionIn a park, a family is enjoying a picnic, a businessman is working next to a grandpa playing with his dog, while two lovers are passionately kissing.
The wind blows hard and disrupts their daily routine. It blows harder and harder, so hard that they all get swept away together. In the air, they meet and see their lives evolve toward new horizons.
Emerging Technologies






DescriptionWe present WireDrum---an interpersonal and multimodal skill-sharing system for an augmented drumming experience. By integrating the muscle activities of multiple instructors into a target user’s muscles, the user can perform complex physical skills beyond their original ability. Simultaneously, instructors can reduce their cognitive load related to teaching and performing the task.
Computer Animation Festival






Description“Wishes: Windows & Nests” is a celebration of children's empathy and a reminder that they have the power to change the world, one small action at a time. The premise of the story-engine is that “inside every one of us there’s a box of wishes... But the most powerful ones are those that belong to others”. It's told from the perspective of the creators as members of a migrant family of filmmakers, and combines 2D tradigital, painterly animation with digital cut-out puppets.
One of the technical challenges was to achieve the director's 'painterly style' (e.g. with textures rather than flat colours as part of the distinctive style) in an efficient way, to prototype a scalable model for potential series production. For this, the team collaborated with Toon Boom representatives to build on rigs they had developed internally during early R&D. These rigs not only included aspects of motion but also node systems for colouring, lighting, shading and effects.
The immersive sound was achieved using microphone arrays for fauna and city recordings, with a frontal MS setup for LCR, and stereo for LsRs. Sound Particles and GRM Space Grain were used to generate immersive soundscapes and the project was mixed in near-field Dolby Atmos, seeking consistency with theatrical 5.1 and binaural headphone playback.
The pipeline involved 'agile' iteration loops in which sound sketches were provided to animators, whose work would be influenced by these, and subsequently the sound designer would evolve their work based on the animation output.
This project received funding from Screen Australia, the South Australian Film Corporation and the Adelaide Film Festival, has been recognised with prestigious awards like the Screen Producers Australia Best Short Production and has been selected to 40+ festivals to date. It's also inspiration for a series in development (selected by Annecy-Mifa Pitches 2024).
Birds of a Feather






DescriptionThis is a networking event for students, faculty, and industry researchers organized by WiGRAPH. The purpose of the event is to broaden the network of women researchers and provide a friendly and personal environment where students and researchers can interact. The event is open to all researchers, regardless of gender.
Birds of a Feather






DescriptionThis BoF is for attendees interested in interdisciplinary collaboration between art and technology, featuring speakers from both art and technology backgrounds who will share their experiences with a panel discussion.
Technical Papers


DescriptionGenerating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited in the degree of exploration they allow inside a scene, i.e., they produce stretched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We initialize each scene by creating multi-view consistent images corresponding to a 360-degree panorama, then expand it by leveraging video diffusion models in an iterative scene generation pipeline. Concretely, we generate multiple videos along short, pre-defined trajectories that explore the scene in depth, including motion around objects. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results, such as moving into objects. Finally, we fuse all generated views into a unified 3D representation via 3D Gaussian Splatting optimization. Compared to prior approaches, WorldExplorer produces high-quality scenes that remain stable under large camera motion, enabling realistic and unrestricted exploration for the first time. We believe this marks a significant step toward generating immersive and truly explorable virtual 3D environments.
Key Event
Keynote



DescriptionUpdating in Progress...
Technical Papers


DescriptionWe present X-Actor, a novel audio-driven portrait animation framework that generates lifelike, emotionally expressive talking head videos from a single reference image and an input audio clip. Unlike prior approaches that primarily focus on lip synchronization and visual fidelity in constrained speaking scenarios, X-Actor enables actor-quality, long-range portrait acting that captures diverse, fine-grained, and dynamically evolving yet temporally consistent human emotions, flowing and transitioning in sync with the audio dynamics. Central to our approach is a two-stage decoupled generation pipeline: an audio-conditioned autoregressive model that predicts expressive yet identity-agnostic facial motion latent tokens within a long temporal context window, followed by a diffusion-based video synthesis module that translates these motions into high-fidelity video animations. By operating in a compact facial motion latent space decoupled from visual and identity cues, our autoregressive model effectively learns long-range correlations between audio and facial dynamics through a diffusion-forcing training paradigm, enabling infinite-length motion prediction without error accumulation. Extensive experiments demonstrate that X-Actor produces compelling, cinematic-style performances that go beyond standard talking head animations and achieves state-of-the-art results in long-range, audio-driven emotional portrait acting.
Technical Papers


DescriptionWe present X-UniMotion, a unified and expressive implicit latent representation for whole-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike prior motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach encodes multi-granularity human motion directly from a single image into a compact set of four disentangled latent tokens—one each for facial expression and body pose, and one per hand. These motion latents are both highly expressive and identity-agnostic, enabling high-fidelity, detailed cross-identity motion transfer across subjects with distinct identity attributes and pose configurations.
To achieve this, we introduce a self-supervised, end-to-end training framework that jointly learns the motion encoder and latent representation alongside a DiT-based video generative model, trained on large-scale video datasets spanning diverse human motions. Motion-identity disentanglement is enforced via spatial and color augmentations, as well as synthetic 3D renderings of cross-identity subject pairs. We further guide the learning of motion tokens using auxiliary spatial decoders to promote fine-grained, semantically aligned, and depth-aware motion embeddings.
Extensive experiments demonstrate that X-UniMotion outperforms state-of-the-art methods, producing highly expressive animations with superior motion expressiveness and identity preservation.
Birds of a Feather






DescriptionOpenXR is the widely embraced open standard that provides a common set of APIs for developing portable VR and AR applications. This BOF provides the XR community an opportunity to learn about the latest advances in OpenXR, including Spatial Entities, and join lively discussions on the challenges and best practices of applying OpenXR to deliver high-quality experiences. Join us to network, exchange ideas, share experiences, raise issues and make suggestions on the future evolution of OpenXR. This session also features PICO’s open standard of next-gen Spatial Perception, offering fresh insights into the future of immersive interaction.
XR






DescriptionYurt Portal is an interactive XR installation merging physical and digital spaces through cultural digital twins. Set in a traditional Kyrgyz yurt, it mirrors Kyrgyz culture to Japanese culture using AI-driven interactions, promoting empathy-based cultural learning and positioning XR as a powerful medium for cross-cultural connection and understanding.
Poster






DescriptionZenith is a pipeline that automates the creation of multi-layered top-down maps for World of Warcraft dungeons. It combines procedural geometry processing with fine-tuned diffusion models to generate artistic layers (line art, shadows, highlights). A novel multi-encoder ControlNet ensures spatial coherence, reducing artist authoring time by up to 40%.
Technical Papers


DescriptionRecent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2×2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
Computer Animation Festival






DescriptionÜMIT is a 2D/3D animated short film created by students at the Savannah College of Art and Design (SCAD). Set in the 15th-century Kazakh Kingdom—present-day Kazakhstan—the story follows a young astronomer named Ümit, who is determined to bring back the long-lost Sun to her village, which has been trapped in darkness for centuries. As hope fades and life becomes increasingly difficult, Ümit clings to hope and builds a device that could reunite her people with the light.
Inspired by nomadic Central Asian cultures, the Timurid Renaissance, Tengri music, and coming-of-age tales, ÜMIT is a heartfelt tribute to resilience, astronomy, and cultural identity. Through its story and setting, the film hopes to shed light on Central Asian narratives—one among many cultures around the world often overlooked in global media and animation.
Sessions
Cultural Program
Chinese Rainbow Calligraphy
12:00pm - 2:00pm HKT Thursday, 18 December 2025 Hall 3G, Level 3





Key Event
Closing of SIGGRAPH Asia 2025 & Opening of SIGGRAPH Asia 2026
9:30am - 10:00am HKT Thursday, 18 December 2025 





Computer Animation Festival
Key Event
Computer Animation Festival Closing & Awards Ceremony
6:00pm - 6:30pm HKT Wednesday, 17 December 2025 Hall 3F, Level 3


Computer Animation Festival
Key Event
Computer Animation Festival Opening Ceremony & Performance ("Weaving Awareness" by Desmond Leung)
4:00pm - 4:30pm HKT Monday, 15 December 2025 Hall 3F, Level 3


General Location
Conference Management Office
9:00am - 6:00pm HKT Monday, 15 December 2025 Room G311, Level 3 [Organizer Room]





General Location
Conference Management Office
9:00am - 6:00pm HKT Thursday, 18 December 2025 Room G311, Level 3 [Organizer Room]





General Location
Conference Management Office
9:00am - 6:00pm HKT Wednesday, 17 December 2025 Room G311, Level 3 [Organizer Room]





General Location
Conference Management Office
9:00am - 6:00pm HKT Tuesday, 16 December 2025 Room G311, Level 3 [Organizer Room]





Educator's Forum
Key Event
Educator's Forum - Closing & Awards
2:35pm - 2:50pm HKT Thursday, 18 December 2025 Meeting Room S228, Level 2


Emerging Technologies
Key Event
XR
Emerging Technologies & XR Awards
3:15pm - 3:30pm HKT Thursday, 18 December 2025 Hall 3G - Talk Stage, Level 3





General Location
Exhibition Management Office
9:00am - 6:00pm HKT Thursday, 18 December 2025 Room G312, Level 3 [Organizer Room]





General Location
Exhibition Management Office
9:00am - 6:00pm HKT Tuesday, 16 December 2025 Room G312, Level 3 [Organizer Room]





General Location
Exhibition Management Office
9:00am - 6:00pm HKT Wednesday, 17 December 2025 Room G312, Level 3 [Organizer Room]





General Location
Exhibition Management Office
9:00am - 6:00pm HKT Monday, 15 December 2025 Room G312, Level 3 [Organizer Room]





DLI Labs
Exhibitor Talk






Key Event
Poster
Posters Awards
3:00pm - 3:15pm HKT Thursday, 18 December 2025 Hall 3G - Talk Stage, Level 3





Key Event
Press Conference [For Media & Press only]
11:15am - 12:15pm HKT Tuesday, 16 December 2025 Hall 3G - Talk Stage, Level 3


Key Event
Real-Time Live!
Real-Time Live! - Awards
6:00pm - 6:15pm HKT Thursday, 18 December 2025 Hall 3F, Level 3


General Location
Sensory Room
9:00am - 6:00pm HKT Wednesday, 17 December 2025 V103, Theatre 2, Level 1





General Location
Sensory Room
9:00am - 6:00pm HKT Thursday, 18 December 2025 V103, Theatre 2, Level 1





General Location
Sensory Room
9:00am - 6:00pm HKT Monday, 15 December 2025 V103, Theatre 2, Level 1





General Location
Sensory Room
9:00am - 6:00pm HKT Tuesday, 16 December 2025 V103, Theatre 2, Level 1





Key Event
SIGGRAPH Asia 2025 Official Reception
6:00pm - 8:00pm HKT Tuesday, 16 December 2025 Renaissance Harbour View Hotel Hong Kong, Concord Room & Oasis Room, 8/F

General Location
SIGGRAPH Asia Networking Lounge
10:00am - 5:00pm HKT Wednesday, 17 December 2025 Hall 3G, Level 3





General Location
SIGGRAPH Asia Networking Lounge
10:00am - 4:00pm HKT Thursday, 18 December 2025 Hall 3G, Level 3





General Location
SIGGRAPH Asia Networking Lounge
11:00am - 5:00pm HKT Tuesday, 16 December 2025 Hall 3G, Level 3





Technical Communications


General Location
Speaker Practice Room
8:00am - 6:00pm HKT Wednesday, 17 December 2025 Meeting Room S429, Level 4





General Location
Speaker Practice Room
8:00am - 6:00pm HKT Tuesday, 16 December 2025 Meeting Room S429, Level 4





General Location
Speaker Practice Room
8:00am - 6:00pm HKT Monday, 15 December 2025 Meeting Room S429, Level 4





General Location
Speaker Practice Room
8:00am - 5:00pm HKT Thursday, 18 December 2025 Meeting Room S429, Level 4





General Location
Speaker Practice Room
3:00pm - 5:00pm HKT Sunday, 14 December 2025 Meeting Room S429, Level 4





General Location
Speaker Preparation Room
8:00am - 6:00pm HKT Tuesday, 16 December 2025 Meeting Room S428, Level 4





General Location
Speaker Preparation Room
3:00pm - 5:00pm HKT Sunday, 14 December 2025 Meeting Room S428, Level 4





General Location
Speaker Preparation Room
8:00am - 6:00pm HKT Wednesday, 17 December 2025 Meeting Room S428, Level 4





General Location
Speaker Preparation Room
8:00am - 5:00pm HKT Thursday, 18 December 2025 Meeting Room S428, Level 4





General Location
Speaker Preparation Room
8:00am - 6:00pm HKT Monday, 15 December 2025 Meeting Room S428, Level 4





Key Event
Technical Communications
Technical Communications Awards
3:30pm - 3:45pm HKT Thursday, 18 December 2025 Meeting Room S422, Level 4

