Feb 25, 2025 · Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation. Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications (see the usage sketch after these entries).

A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018. - k4yt3x/video2x

Feb 23, 2025 · Video-R1 significantly outperforms previous models across most benchmarks. Notably, on VSI-Bench, which focuses on spatial reasoning in videos, Video-R1-7B achieves a new state-of-the-art accuracy of 35.8%, surpassing the proprietary GPT-4o while using only 32 frames and 7B parameters.

Jan 21, 2025 · This work presents Video Depth Anything, built on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with other diffusion-based models, it enjoys faster inference speed, fewer parameters, and higher consistency (a per-frame baseline sketch follows below).

LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real time: it produces 30 FPS video at 1216×704 resolution faster than it takes to watch it (see the sketch below).

May 8, 2025 · Customized video generation aims to produce videos featuring specific subjects under flexible user-defined conditions, yet existing methods often struggle with identity consistency and limited input modalities. In this paper, we propose HunyuanCustom, a multi-modal customized video generation framework.

We present Step-Video-T2V, a state-of-the-art (SoTA) text-to-video pre-trained model with 30 billion parameters and the capability to generate videos up to 204 frames. To enhance both training and inference efficiency, we propose a deep compression VAE for videos, achieving 16x16 spatial and 8x temporal compression ratios (see the arithmetic check below).
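For the Wan2.1 entry above, a minimal text-to-video sketch assuming the Hugging Face diffusers integration (`WanPipeline` and `AutoencoderKLWan`, available in recent diffusers releases); the model ID, resolution, and prompt are illustrative, not prescriptive:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed model ID for the diffusers-format 1.3B T2V checkpoint.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# Load the VAE in float32 for numerical stability; run the DiT in bf16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# A prompt with on-screen text, exercising the visual text generation described above.
frames = pipe(
    prompt='A storefront at night with a neon sign that reads "OPEN", rain, cinematic',
    negative_prompt="blurry, low quality",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)
```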
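The Video Depth Anything entry builds on Depth Anything V2. As a point of reference, here is a hedged per-frame baseline using the image-only Depth Anything V2 checkpoint via transformers (model ID assumed); the video model's contribution is precisely the temporal consistency this naive frame-by-frame loop lacks:

```python
import imageio.v3 as iio
from PIL import Image
from transformers import pipeline

# Image-only Depth Anything V2 (assumed HF model ID). Per-frame inference only,
# so depth will flicker across frames -- the problem Video Depth Anything addresses.
depth = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

depth_maps = []
for frame in iio.imiter("input.mp4"):        # stream frames without loading the whole video
    result = depth(Image.fromarray(frame))   # returns a dict with a PIL "depth" image
    depth_maps.append(result["depth"])
```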
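For the LTX-Video entry, a minimal sketch assuming the diffusers `LTXPipeline` integration; resolution, frame count, and step count are illustrative, and the 30 FPS / 1216×704 real-time figure above is the model's claimed operating point, which depends on hardware:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A drone shot over a coastline at sunset, waves breaking on rocks",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery",
    width=704,           # illustrative; below the 1216x704 headline resolution
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "ltx_video.mp4", fps=24)
```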
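The Step-Video-T2V compression ratios above imply a large reduction in the latent grid the DiT operates over. A quick back-of-the-envelope check; only the 204-frame limit and the 16x16 / 8x ratios come from the entry, while the 544x992 resolution is an assumed example:

```python
# Stated above: 16x16 spatial and 8x temporal VAE compression, clips up to 204 frames.
# The 544x992 resolution is a hypothetical example value.
frames, height, width = 204, 544, 992
t_ratio, s_ratio = 8, 16

latent_frames = frames // t_ratio   # 204 // 8  = 25
latent_h = height // s_ratio        # 544 // 16 = 34
latent_w = width // s_ratio         # 992 // 16 = 62

pixels = frames * height * width
latent_cells = latent_frames * latent_h * latent_w
print(f"latent grid: {latent_frames} x {latent_h} x {latent_w}")
# 8 * 16 * 16 = 2048x reduction for exactly divisible sizes (~2089x here due to flooring).
print(f"reduction: {pixels / latent_cells:.0f}x")
```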
[2024.09.25] Our Video-LLaVA has been accepted at EMNLP 2024! We earned a meta score of 4. [2024.07.27] A fine-tuned Video-LLaVA focuses on theme exploration, narrative analysis, and character dynamics.

Jan 13, 2025 · We present HunyuanVideo, a novel open-source video foundation model whose video generation performance is comparable to, if not superior to, that of leading closed-source models. To train the HunyuanVideo model, we adopt several key technologies for model learning, including data curation.

ReCamMaster (demo video: WanVideo2_1_recammaster.mp4). TeaCache (with the old temporary WIP naive version, I2V): note that with the new version, the threshold values should be 10x higher (a trivial scaling sketch follows).
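A trivial sketch of the TeaCache migration note above; the function name and the 0.03 example value are hypothetical and not part of any node's API, and only the 10x scaling comes from the text:

```python
def migrate_teacache_threshold(old_threshold: float) -> float:
    """Scale a threshold tuned for the old WIP naive TeaCache version.

    Per the note above, thresholds for the new version should be 10x higher.
    """
    return old_threshold * 10.0

print(migrate_teacache_threshold(0.03))  # hypothetical old value -> 0.3
```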