We introduce JavisDiT, a novel, state-of-the-art Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG) from open-ended user prompts. We hope to set a new standard for ...
See the UMI repository for installation instructions. The temporal_agg parameter in eval.sh refers to the temporal ensemble strategy mentioned in our paper, which enables smoother robot actions. Additionally, you can use ...
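The snippet does not say how the temporal ensemble is computed. As a rough illustration only, the sketch below assumes it resembles the temporal ensembling commonly used with action-chunking policies: each policy call predicts a chunk of future actions, and the action executed at a given step is an exponentially weighted average of every prediction made for that step. The function name `temporal_ensemble`, the decay parameter `m`, and the array layout are hypothetical and are not taken from the repository or from eval.sh.

```python
import numpy as np


def temporal_ensemble(chunks, m=0.1):
    """Smooth actions by averaging overlapping chunk predictions.

    chunks: array of shape (T, k, action_dim); chunks[s, j] is the action
            predicted at step s for step s + j (hypothetical layout).
    m:      decay rate of the weights (assumed value); m = 0 gives a plain mean.
    Returns an array of shape (T, action_dim) with one action per step.
    """
    chunks = np.asarray(chunks, dtype=float)
    num_steps, chunk_len, _ = chunks.shape
    actions = []
    for t in range(num_steps):
        # Gather every prediction made for step t, oldest prediction first.
        first = max(0, t - chunk_len + 1)
        preds = np.stack([chunks[s, t - s] for s in range(first, t + 1)])
        # Exponential weights: the oldest prediction gets the largest weight.
        weights = np.exp(-m * np.arange(len(preds)))
        weights /= weights.sum()
        actions.append((weights[:, None] * preds).sum(axis=0))
    return np.stack(actions)
```

With `m = 0` every overlapping prediction counts equally. Larger values of `m` shift weight toward older predictions, so the executed action changes more slowly when a new chunk arrives.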