Statement of Need

Simulation is an ever-growing source of data for training deep learning models. In robotics, simulation has been used successfully to learn behaviors such as navigation, walking, flying, and manipulation. The value of data generated in simulation depends largely on the diversity and scale of the scene layouts. Existing datasets [1, 2, 3, 4] are limited in this regard, while purely generative models still lack the ability to create scenes that can be used in physics simulators [5, 6, 7]. Other procedural pipelines either focus on learning visual models [8, 9, 10], address specific use cases such as autonomous driving [11, 12], or are hard to extend and customize because they are tightly integrated with a particular simulation platform [13]. With scene_synthesizer we present a library that simplifies the process of writing scene randomizers in Python, with a particular focus on physics simulation for robot manipulation, while remaining fully simulator-agnostic.
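To illustrate what a scene randomizer involves, the sketch below is a minimal, self-contained Python example; it deliberately does not use the scene_synthesizer API, and all names in it are hypothetical. It randomizes object placements on a tabletop via rejection sampling so that footprints do not overlap, yielding a simulator-agnostic scene description that a downstream exporter could translate into a format such as URDF, USD, or MJCF.

```python
import random
from dataclasses import dataclass


@dataclass
class PlacedObject:
    """A simulator-agnostic record of one placed object (hypothetical schema)."""
    name: str
    half_extent: float  # half-width of the object's square footprint, in meters
    x: float
    y: float


def randomize_scene(objects, table_size=1.0, max_tries=100, seed=None):
    """Place square-footprint objects on a square table without overlap.

    `objects` is a list of (name, half_extent) pairs. Poses are drawn
    uniformly; a sample is rejected if its footprint intersects any
    previously placed object.
    """
    rng = random.Random(seed)
    placed = []
    for name, half in objects:
        for _ in range(max_tries):
            x = rng.uniform(-table_size / 2 + half, table_size / 2 - half)
            y = rng.uniform(-table_size / 2 + half, table_size / 2 - half)
            # Two axis-aligned squares are disjoint iff their Chebyshev
            # (max-norm) center distance is at least the sum of half-widths.
            if all(max(abs(x - p.x), abs(y - p.y)) >= half + p.half_extent
                   for p in placed):
                placed.append(PlacedObject(name, half, x, y))
                break
        else:
            raise RuntimeError(f"could not place {name!r} after {max_tries} tries")
    return placed
```

Because the output is plain data rather than a simulator-specific scene graph, the same randomizer can feed multiple physics backends, which is the design property the library emphasizes.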

[1] Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, and Hao Su. PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2019. doi:10.1109/cvpr.2019.00100.

[2] Alberto Garcia-Garcia, Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, and Alvaro Jover-Alvarez. The RobotriX: an extremely photorealistic and very-large-scale indoor dataset of sequences with robot trajectories and interactions. 2019. arXiv:1901.06514, doi:10.1109/iros.2018.8594495.

[3] Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. ManipulaTHOR: a framework for visual object manipulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2021. doi:10.1109/cvpr46437.2021.00447.

[4] Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, and Yuke Zhu. RoboCasa: large-scale simulation of everyday tasks for generalist robots. In Robotics: Science and Systems (RSS). 2024.

[5] Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, and Ji Hou. ControlRoom3D: room generation using semantic proxy rooms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2024. doi:10.1109/cvpr52733.2024.00593.

[6] Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. PhyScene: physically interactable 3D scene synthesis for embodied AI. 2024. arXiv:2404.09465, doi:10.1109/cvpr52733.2024.01539.

[7] Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, and Matthias Nießner. Text2Room: extracting textured 3D meshes from 2D text-to-image models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 7909–7920. October 2023. doi:10.1109/iccv51070.2023.00727.

[8] Maximilian Denninger, Dominik Winkelbauer, Martin Sundermeyer, Wout Boerdijk, Markus Knauer, Klaus H. Strobl, Matthias Humt, and Rudolph Triebel. BlenderProc2: a procedural pipeline for photorealistic rendering. Journal of Open Source Software, 8(82):4901, 2023. doi:10.21105/joss.04901.

[9] Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, and Andrea Tagliasacchi. Kubric: a scalable dataset generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2022. doi:10.1109/cvpr52688.2022.00373.

[10] Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, Alejandro Newell, Hei Law, Ankit Goyal, Kaiyu Yang, and Jia Deng. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 12630–12641. 2023. doi:10.1109/cvpr52729.2023.01215.

[11] Daniel J. Fremont, Edward Kim, Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Alberto L. Sangiovanni-Vincentelli, and Sanjit A. Seshia. Scenic: a language for scenario specification and data generation. CoRR, 2020. arXiv:2010.06580.

[12] Timm Hess, Martin Mundt, Iuliia Pliushch, and Visvanathan Ramesh. A procedural world generation framework for systematic evaluation of continual learning. arXiv preprint arXiv:2106.02585, 2021.

[13] Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, and Roozbeh Mottaghi. ProcTHOR: large-scale embodied AI using procedural generation. In Advances in Neural Information Processing Systems (NeurIPS). 2022.