A balanced video dataset is crucial for training action detection models. However, generating such data can be costly and requires significant effort. Recent work has made it possible to generate human poses and motions based on a 3D scene. In addition, state-of-the-art radiance-field models can be used to reconstruct 3D scenes from videos. Here, we use these technologies to develop a framework in Unreal Engine 5 that combines synthetic human motion generated from text prompts with 3D scenes reconstructed from real-life video. The system accepts SMPL 24-joint skeleton motions, allowing the use of various generative motion models. Gaussian splatting models are used to reconstruct real environments with high fidelity in Unreal Engine. To demonstrate the usefulness of our framework, we created a synthetic video dataset of elderly individuals falling down, an example of actions that are difficult to record in real life but are likely to be detected. Our experiments show that the generated synthetic video collection can be used to train an action recognition model to detect a wider range of actions.
@inproceedings{UnrealFall,
title={UnrealFall: Overcoming Data Scarcity through Generative Models},
author={David Mulero-Pérez and Manuel Benavent-Lledo and David Ortiz-Perez and Jose Garcia-Rodriguez},
booktitle={IJCNN},
year={2024}
}