Multimodal
-
Google launches Gemini, its most advanced AI model
On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that…
-
DiagrammerGPT generates better diagrams using LLMs
DiagrammerGPT is a new framework that uses large language models (LLMs) to generate…
-
Introducing Lemur-70B and Lemur-70B-Chat: the open-source models harmonizing text and code for powerful language agents
Lemur and Lemur-Chat are openly accessible language models optimized for both natural language…
-
Multimodal foundation models: the future of AI assistants
Researchers from Google AI and Hugging Face present a comprehensive survey of multimodal…
-
Meta’s SeamlessM4T can translate and transcribe speech and text across nearly 100 languages
Meta launched SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), a new AI model…
-
Word-As-Image for Semantic Typography (SIGGRAPH 2023 technical paper awards)
Word-As-Image is a novel and creative approach to semantic typography, where the letters…
-
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents (SIGGRAPH 2023 technical paper awards)
GestureDiffuCLIP is a new framework that can create realistic and expressive body movements…
-
Meta-Transformer: a unified framework for multimodal learning
Meta-Transformer is a new framework for multimodal learning. It uses a single network…
-
Meta releases ImageBind, a multisensory AI model that integrates six types of data
Meta AI has developed ImageBind, a cutting-edge AI tool capable of integrating data from…
-
MultiModal-GPT: a vision and language model that can dialogue with humans
MultiModal-GPT is a generative model that can engage in multi-round conversations with humans…