
Google: Gemini Embedding 2

google/gemini-embedding-2

Gemini Embedding 2 is Google's first multimodal embedding model. We currently support mapping text and images into a unified vector space for semantic search and retrieval-augmented generation (RAG). It supports input contexts of up to 8,192 tokens and flexible output dimensions from 128 to 3,072 (recommended: 768, 1,536, or 3,072). The model is designed for cross-modal similarity: you can embed a text query and retrieve the most relevant images, or vice versa. This makes it well suited to multimodal search, recommendation, and document understanding pipelines.
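A common way to use such embeddings for text-to-image search is to rank stored vectors by cosine similarity to a query vector. The sketch below assumes you have already obtained embeddings from the model and omits the request code. Cosine similarity is our assumption, since this page does not name a metric. The toy 4-dimensional vectors and file names are made-up placeholders, and `cosine_similarity` and `top_k` are hypothetical helpers, not part of any OpenRouter or Google API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank corpus items (e.g. image embeddings) by similarity to a query embedding."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real embeddings. Real ones would
# have 128 to 3,072 dimensions, and query and corpus must use the same size.
text_query = [0.9, 0.1, 0.0, 0.2]
image_index = {
    "img_cat.jpg": [0.8, 0.2, 0.1, 0.1],
    "img_car.jpg": [0.0, 0.9, 0.4, 0.0],
    "img_dog.jpg": [0.7, 0.1, 0.3, 0.3],
}

results = top_k(text_query, image_index, k=2)
print(results)
```

Because text and images share one vector space, the same ranking code works in either direction. You could embed an image as the query and rank text passages instead. At scale, a vector index would normally replace this linear scan.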

Context window: 8,192 tokens
Input: text, image, file, audio, video
Output: embeddings

Model data sourced from OpenRouter.