Computer vision

TurboDiffusion makes video diffusion models 100–200× faster

TurboDiffusion (code, paper) is an acceleration framework that significantly reduces inference time and…

January 19, 2026

MinerU2.5, a vision-language model for efficient document parsing

MinerU2.5 (paper, code) is a parsing vision-language model that converts complex documents, such…

November 8, 2025

Create long AI videos locally with FramePack from Stanford

FramePack is a next-frame prediction neural network for high-quality and efficient video generation,…

April 27, 2025

InfiniteYou, photo customization with identity preservation

ByteDance introduced InfiniteYou (InfU), a powerful model that allows flexible photo modifications based…

April 6, 2025

Meta’s VGGT reconstructs 3D scenes in seconds [CVPR 2025]

VGGT (Visual Geometry Grounded Transformer) is an advanced AI model that is able…

April 1, 2025

Create dynamic multi-angle videos with CAT4D diffusion model

CAT4D is a new AI model for creating 4D scenes from single-camera videos.…

December 20, 2024

LivePortrait, a fast and free AI tool to animate portraits

LivePortrait is an AI-powered tool that creates lifelike animations from portraits. Simply provide…

August 7, 2024

Magic Insert, the new style-aware drag-and-drop technology from Google

Magic Insert is a new method proposed by Google that lets you drag-and-drop…

July 22, 2024

Depth Anything V2, a highly capable depth estimation model

Depth Anything V2 is a powerful new monocular depth estimation model, delivering significantly…

July 8, 2024

YOLOv10, a faster and more accurate object detection model

YOLOv10 is a recent advancement in real-time object detection YOLO models that achieves…

June 25, 2024
