Generative models and real-time interactive applications bring us closer to the metaverse. However, achieving an immersive experience remains difficult due to three challenges. First, the pipelines, structures, and inference paradigms of generative models (e.g., GPT-3, Stable Diffusion) are more complicated than those of traditional one-shot models (e.g., ResNet). As a result, serving systems designed for one-shot models are prone to blocking and cannot meet the ultra-low latency (e.g., <100 ms) required by real-time interactive applications. Second, the output of video/3D-scene generative models is unprecedentedly large; such volumes put tremendous pressure on rule-based codecs and greatly inflate end-to-end latency. Third, a metaverse frame typically comprises multiple 3D scenes (e.g., NeRFs) with different rendering-quality requirements, so blindly allocating rendering resources by PSNR and SSIM is too crude. Beyond these, existing algorithms, hardware, and interaction modes also need to be rethought. We aim to bridge these gaps with a holistic, full-stack solution spanning algorithm- and system-level innovations toward a real metaverse.
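To make the third challenge concrete, the sketch below (Python, assuming scikit-image >= 0.19; the helper name frame_quality is ours, not from any cited work) scores a rendered frame against a reference with PSNR and SSIM. Both metrics weight every pixel of every scene uniformly, which is why using them alone to decide how much rendering budget each 3D scene deserves is crude.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def frame_quality(rendered: np.ndarray, reference: np.ndarray) -> dict:
    """Score a rendered frame against a reference frame.

    Both inputs are HxWx3 uint8 images. PSNR is a per-pixel error metric
    and SSIM a local structural one; neither knows which scenes in the
    frame matter more to the viewer.
    """
    psnr = peak_signal_noise_ratio(reference, rendered, data_range=255)
    ssim = structural_similarity(reference, rendered, channel_axis=-1, data_range=255)
    return {"psnr_db": psnr, "ssim": ssim}


if __name__ == "__main__":
    # Toy usage: a reference frame and a slightly perturbed render.
    ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    noise = np.random.randint(-10, 11, ref.shape)
    rendered = np.clip(ref.astype(np.int16) + noise, 0, 255).astype(np.uint8)
    print(frame_quality(rendered, ref))
```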

Neural Radiance Fields

  1. Mildenhall, Ben, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. "Nerf: Representing scenes as neural radiance fields for view synthesis." Communications of the ACM 65, no. 1 (2021): 99-106.
  2. Gao, Kyle, Yina Gao, Hongjie He, Dening Lu, Linlin Xu, and Jonathan Li. "Nerf: Neural radiance field in 3d vision, a comprehensive review." arXiv preprint arXiv:2210.00379 (2022).

Generative Models for 3D Scenes

  1. Li, Chenghao, Chaoning Zhang, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, and Choong Seon Hong. "Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era." arXiv preprint arXiv:2305.06131 (2023).

Generative Model Serving System

  1. Romero, Francisco, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis. "INFaaS: Automated model-less inference serving." In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pp. 397-411. 2021.
  2. Wang, Yiding, Kai Chen, Haisheng Tan, and Kun Guo. "Tabi: An Efficient Multi-Level Inference System for Large Language Models." In Proceedings of the Eighteenth European Conference on Computer Systems, pp. 233-248. 2023.
  3. Wu, Bingyang, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, and Xin Jin. "Fast Distributed Inference Serving for Large Language Models." arXiv preprint arXiv:2305.05920 (2023).

Metaverse Multimedia Streaming

  1. Zhang, Anlan, Chendong Wang, Bo Han, and Feng Qian. "YuZu: Neural-Enhanced volumetric video streaming." In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pp. 137-154. 2022.
  2. Cheng, Yihua, Anton Arapin, Ziyi Zhang, Qizheng Zhang, Hanchen Li, Nick Feamster, and Junchen Jiang. "Grace++: Loss-Resilient Real-Time Video Communication under High Network Latency." arXiv preprint arXiv:2305.12333 (2023).
  3. Yan, Francis Y., Hudson Ayers, Chenzhi Zhu, Sadjad Fouladi, James Hong, Keyi Zhang, Philip Levis, and Keith Winstein. "Learning in situ: a randomized experiment in video streaming." In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pp. 495-511. 2020.

Real-time Capturing System

  1. Lawrence, Jason, Dan B. Goldman, Supreeth Achar, Gregory Major Blascovich, Joseph G. Desloge, Tommy Fortes, Eric M. Gomez et al. "Project Starline: A high-fidelity telepresence system." (2021). https://research.google/pubs/pub50903/
  2. Zhang, Yizhong, Jiaolong Yang, Zhen Liu, Ruicheng Wang, Guojun Chen, Xin Tong, and Baining Guo. "Virtualcube: An immersive 3d video communication system." IEEE Transactions on Visualization and Computer Graphics 28, no. 5 (2022): 2146-2156. https://arxiv.org/abs/2112.06730