We introduce JavisDiT, a novel, state-of-the-art (SoTA) Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG) from open-ended user prompts. We hope to set a new standard for ...
See the UMI repository for installation instructions. The temporal_agg parameter in eval.sh enables the temporal ensemble strategy described in our paper, which produces smoother robot actions. Additionally, you can use ...
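The internals of temporal_agg are not shown here, so the following is only a minimal sketch of what a temporal ensemble over action chunks can look like, assuming ACT-style exponential weighting (weight exp(-m * i), where i = 0 is the oldest prediction). The class TemporalEnsembler, its arguments, and the decay rate m are hypothetical illustrations, not the actual eval.sh implementation. At every control step the policy predicts a chunk of future actions, and the action executed at step t is a weighted average of all chunks that covered t.

```python
import numpy as np


class TemporalEnsembler:
    """Hypothetical temporal ensemble over overlapping action chunks (ACT-style weights)."""

    def __init__(self, chunk_size: int, action_dim: int, max_steps: int, m: float = 0.01):
        self.chunk_size = chunk_size
        self.m = m  # assumed decay rate; larger m favors older predictions more strongly
        # all_preds[s, t] holds the action for timestep t predicted at step s (NaN if none).
        self.all_preds = np.full((max_steps, max_steps + chunk_size, action_dim), np.nan)

    def step(self, t: int, chunk: np.ndarray) -> np.ndarray:
        """Record the chunk predicted at step t and return the ensembled action for step t."""
        self.all_preds[t, t:t + self.chunk_size] = chunk
        # Every prediction made for timestep t so far, ordered oldest first.
        preds = self.all_preds[: t + 1, t]
        preds = preds[~np.isnan(preds).any(axis=1)]
        # Exponential weights: index 0 (the oldest prediction) gets weight 1.
        weights = np.exp(-self.m * np.arange(len(preds)))
        weights /= weights.sum()
        return (weights[:, None] * preds).sum(axis=0)


if __name__ == "__main__":
    ens = TemporalEnsembler(chunk_size=3, action_dim=1, max_steps=5, m=0.0)
    print(ens.step(0, np.array([[0.0], [1.0], [2.0]])))
    print(ens.step(1, np.array([[10.0], [11.0], [12.0]])))
```

With m = 0 the ensemble is a plain average of overlapping predictions, so the second step returns the mean of 1.0 (from the first chunk) and 10.0 (from the second). Averaging overlapping chunks in this way damps abrupt switches between consecutive chunks, which is the kind of smoothing the temporal ensemble is meant to provide.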