Haotian Liu
LLaVA 13B
liuhaotian/llava-13b
LLaVA is a large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities and setting a new state-of-the-art accuracy on Science QA. #multimodal
- Context window: 2,048 tokens
- Input: text, image
- Output: text
View on OpenRouter. Model data sourced from OpenRouter.