Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control Paper • 2410.06985 • Published 7 days ago • 5
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published 8 days ago • 14
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler Paper • 2410.05651 • Published 9 days ago • 13
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published 8 days ago • 19
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published 8 days ago • 32
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations Paper • 2410.08049 • Published 6 days ago • 8
MiRAGeNews: Multimodal Realistic AI-Generated News Detection Paper • 2410.09045 • Published 5 days ago • 4
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion Paper • 2410.08168 • Published 6 days ago • 7
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published 6 days ago • 43
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published 2 days ago • 21
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published 2 days ago • 37
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published 13 days ago • 7
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published 28 days ago • 11
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published 27 days ago • 18
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation Paper • 2409.12576 • Published 27 days ago • 15
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Paper • 2410.01036 • Published 15 days ago • 14
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published 14 days ago • 38
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models Paper • 2409.19989 • Published 16 days ago • 17
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 28 days ago • 71
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 29 days ago • 24
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 29 days ago • 13
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 29 days ago • 27
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos Paper • 2409.07450 • Published Sep 11 • 10
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10 • 55
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published Sep 6 • 21
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 72
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Paper • 2409.04196 • Published Sep 6 • 11
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published Sep 5 • 25
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents Paper • 2409.03512 • Published Sep 5 • 25
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 86
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published Sep 3 • 33
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 80
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Paper • 2408.17131 • Published Aug 30 • 11
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29 • 29
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation Paper • 2408.09787 • Published Aug 19 • 6
DreamCinema: Cinematic Transfer with Free Camera and 3D Character Paper • 2408.12601 • Published Aug 22 • 28
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing Paper • 2408.08000 • Published Aug 15 • 7
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization Paper • 2408.05939 • Published Aug 12 • 13
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 36
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5 • 28
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency Paper • 2408.11054 • Published Aug 20 • 10
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Paper • 2408.10195 • Published Aug 19 • 12
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
Kalman-Inspired Feature Propagation for Video Face Super-Resolution Paper • 2408.05205 • Published Aug 9 • 8
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads Paper • 2408.05101 • Published Aug 9 • 6
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Paper • 2408.02629 • Published Aug 5 • 13
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement Paper • 2408.00653 • Published Aug 1 • 27
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Paper • 2407.17438 • Published Jul 24 • 23
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Paper • 2407.17470 • Published Jul 24 • 14
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions Paper • 2407.12435 • Published Jul 17 • 13
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person Paper • 2407.16224 • Published Jul 23 • 23