LLMs
-

MinerU2.5, a vision-language model for efficient document parsing
MinerU2.5 (paper, code) is a document-parsing vision-language model that converts complex documents, such…
-

AgentScope 1.0, a developer-centric framework for AI agents
AgentScope 1.0 (project page, paper, code) is a new framework designed to simplify…
-

V-JEPA 2, a model that uses videos to learn real-world physics
Imagine a home robot that learns to tidy up simply by watching online…
-

FastVLM, a Vision-Language Model – CVPR 2025
Apple Research introduces FastVLM, a new architecture designed to accelerate Vision-Language Models (VLMs)…
-

AI models and book memorization: new study sparks debate
What if your favorite AI chatbot wasn’t just trained on books, but could…
-

BAGEL from ByteDance, an open-source multimodal AI
Have you ever wondered how AI can generate detailed captions from images, answer…
-

Universal geometry links multimodal embedding spaces
What if all the embedding spaces used in AI — text, images, speech,…
-

OpenAI launched HealthBench to test LLM safety in healthcare
Large language models are rapidly entering the healthcare space. But how do we…
-

Absolute Zero – AI training without any human data
Imagine an AI that improves without human-labeled data or curated datasets. Just pure…
-

Qwen3 by Alibaba, a new open-source model with hybrid reasoning
Released on April 28, 2025, Qwen3 is an open-source LLM that extends…