Chong Xia | 夏冲

I'm a second-year PhD student in the Department of Electronic Engineering at Tsinghua University, advised by Prof. Yueqi Duan. In 2024, I obtained my B.Eng. from the Department of Automation, Tsinghua University.

My research interests lie in 3D Vision and Embodied AI. If you are interested in working with me, please feel free to drop me an email.

Email / CV / GitHub

News

2026-02: Two papers on SimReady and Online Reconstruction were accepted to CVPR 2026.

2025-06: One paper on World Generation was accepted to ICCV 2025.

2025-06: One paper on Embodied Perception was accepted to IROS 2025.

2024-02: One paper on Embodied Perception was accepted to CVPR 2024.

Preprints

*Equal contribution †Project leader

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
Mingyu Dong*, Chong Xia*, Mingyuan Jia, Weichen Lyu, Long Xu, Zheng Zhu, Yueqi Duan
arXiv, 2026
[Paper] [Code] [Project Page]

We propose ReplicateAnyScene, a framework for fully automated and zero-shot compositional 3D reconstruction from casually captured videos. Our method extracts and aligns cross-modal priors from vision foundation models to generate semantically coherent and physically plausible 3D scenes.

Selected Publications

*Equal contribution †Project leader

SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
Chong Xia, Kai Zhu, Zizhuo Wang, Fangfu Liu, Zhizheng Zhang, Yueqi Duan
Computer Vision and Pattern Recognition (CVPR), 2026, Highlight
[Paper] [Code] [Project Page]

We propose SimRecon, a compositional scene reconstruction framework that implements a "Perception-Generation-Simulation" pipeline with specialized bridging modules to ensure high visual fidelity and physical plausibility.

OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution
Chong Xia, Fangfu Liu, Yule Wang, Yize Pang, Yueqi Duan
Computer Vision and Pattern Recognition (CVPR), 2026, Findings
[Paper] [Code] [Project Page]

To enable online 3D reconstruction from streaming images, we propose OnlineX, a framework that jointly models visual appearance and language fields through a novel decoupled memory state evolution.

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
Chong Xia, Shengjun Zhang, Fangfu Liu, Chang Liu, Khodchaphun Hirunyaratsameewong, Yueqi Duan
International Conference on Computer Vision (ICCV), 2025
[Paper] [Code] [Project Page]

We introduce ScenePainter, a world generation framework that guides the iterative scene generation process with semantically consistent scene concepts and relations.

Anyview: General Indoor 3D Object Detection with Variable Frames
Zhenyu Wu, Xiuwei Xu, Ziwei Wang, Chong Xia, Linqing Zhao, Jiwen Lu, Haibin Yan
IEEE/RSJ International Conference on Intelligent Robots & Systems (IROS), 2025
[Paper] [Code] [Project Page]

We propose a network framework for indoor 3D object detection that handles a variable number of input frames in practical scenarios.

Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu*, Chong Xia*, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu
Computer Vision and Pattern Recognition (CVPR), 2024
[Paper] [Code] [Project Page]

We propose a model- and task-agnostic plug-and-play module that converts offline 3D scene perception models (which take reconstructed point clouds as input) into online perception models (which take streaming RGB-D videos as input).
