Wuvin

Wuvin's activity

upvoted 3 papers about 13 hours ago

Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

Paper • 2410.06985 • Published 7 days ago • 5

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Paper • 2410.05677 • Published 8 days ago • 14

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

Paper • 2410.05651 • Published 9 days ago • 13

upvoted 7 papers about 14 hours ago

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

Paper • 2410.06244 • Published 8 days ago • 19

Pyramidal Flow Matching for Efficient Video Generative Modeling

Paper • 2410.05954 • Published 8 days ago • 32

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Paper • 2410.08049 • Published 6 days ago • 8

Progressive Autoregressive Video Diffusion Models

Paper • 2410.08151 • Published 6 days ago • 15

MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Paper • 2410.09045 • Published 5 days ago • 4

ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Paper • 2410.08168 • Published 6 days ago • 7

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published 6 days ago • 43

upvoted 2 papers about 15 hours ago

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published 2 days ago • 21

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published 2 days ago • 37

upvoted 4 papers 7 days ago

ControlAR: Controllable Image Generation with Autoregressive Models

Paper • 2410.02705 • Published 13 days ago • 7

FlexiTex: Enhancing Texture Generation with Visual Guidance

Paper • 2409.12431 • Published 28 days ago • 11

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Paper • 2409.12957 • Published 27 days ago • 18

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12576 • Published 27 days ago • 15

upvoted 4 papers 8 days ago

Image Copy Detection for Diffusion Models

Paper • 2409.19952 • Published 16 days ago • 12

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Paper • 2410.01036 • Published 15 days ago • 14

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published 14 days ago • 38

RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models

Paper • 2409.19989 • Published 16 days ago • 17

upvoted 17 papers 27 days ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 28 days ago • 123

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 28 days ago • 71

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published 29 days ago • 24

OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.11367 • Published 29 days ago • 13

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 29 days ago • 27

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.07450 • Published Sep 11 • 10

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published Sep 10 • 55

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published Sep 6 • 21

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers

Paper • 2409.04196 • Published Sep 6 • 11

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

Paper • 2409.03718 • Published Sep 5 • 25

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

Paper • 2409.03512 • Published Sep 5 • 25

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4 • 86

FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1 • 31

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3 • 33

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 80

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Paper • 2408.17131 • Published Aug 30 • 11

upvoted 9 papers 28 days ago

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Paper • 2408.16767 • Published Aug 29 • 29

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121

T3M: Text Guided 3D Human Motion Synthesis from Speech

Paper • 2408.12885 • Published Aug 23 • 9

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Paper • 2408.09787 • Published Aug 19 • 6

DreamCinema: Cinematic Transfer with Free Camera and 3D Character

Paper • 2408.12601 • Published Aug 22 • 28

MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Paper • 2408.08000 • Published Aug 15 • 7

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Paper • 2408.05939 • Published Aug 12 • 13

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

Paper • 2408.03178 • Published Aug 6 • 36

MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Paper • 2408.02555 • Published Aug 5 • 28

upvoted 14 papers about 1 month ago

NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency

Paper • 2408.11054 • Published Aug 20 • 10

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Paper • 2408.10195 • Published Aug 19 • 12

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Paper • 2408.10198 • Published Aug 19 • 32

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8

MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

Paper • 2408.05101 • Published Aug 9 • 6

Achieving Human Level Competitive Robot Table Tennis

Paper • 2408.03906 • Published Aug 7 • 26

VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Paper • 2408.02629 • Published Aug 5 • 13

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

Paper • 2408.00653 • Published Aug 1 • 27

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 105

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

Paper • 2407.17438 • Published Jul 24 • 23

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Paper • 2407.17470 • Published Jul 24 • 14

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Paper • 2407.12435 • Published Jul 17 • 13

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23 • 23