India's AI landscape is evolving rapidly. Krutrim AI Labs, an Ola Group company, recently unveiled Chitrarth-1, a Vision Language Model (VLM) designed for India's linguistic and cultural diversity. It supports ten major Indian languages plus English, addressing the need for multilingual AI solutions. This article covers Chitrarth-1's architecture, training, and benchmark results, and what the model means for India's expanding AI capabilities.
Table of Contents
- What is Chitrarth-1?
- Chitrarth-1 Architecture and Specifications
- Training Data and Methodology
- Phase 1: Adapter Pre-training
- Phase 2: Instruction Tuning
- Performance and Benchmarks
- Accessing Chitrarth-1
- Chitrarth-1 in Action
- Conclusion
What is Chitrarth-1?
Chitrarth-1 (from "Chitra", image, and "Artha", meaning) is a 7.5-billion-parameter VLM that combines language and vision processing. Built to serve India's diverse linguistic needs, it supports Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, Assamese, and English. The model reflects Krutrim's stated commitment to developing AI "for our country, of our country, and for our citizens." It is trained on a multilingual dataset intended to reduce bias and deliver consistent performance across Indic languages and English, promoting equitable AI access. Research on Chitrarth-1 has been presented at academic venues including NeurIPS and the Ninth Conference on Machine Translation.
Chitrarth-1 Architecture and Specifications
Chitrarth-1 uses the Krutrim-7B LLM as its foundation, paired with a vision encoder based on the SigLIP (siglip-so400m-patch14-384) model. Key architectural components include:
- A pre-trained SigLIP vision encoder for image feature extraction.
- A trainable linear mapping layer to project image features into the LLM's token space.
- Fine-tuning with instruction-following image-text datasets for improved multimodal performance.
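The data flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not Krutrim's implementation. Only the patch count is derived from the source: the encoder name siglip-so400m-patch14-384 indicates a 384-pixel input and 14-pixel patches, which gives 27 × 27 = 729 patch features per image. The feature widths (`VISION_DIM`, `LLM_DIM`), the random weights, the helper names, and the image-first token order are all hypothetical placeholders.

```python
import random

IMAGE_SIZE = 384  # taken from the encoder name: ...-384
PATCH_SIZE = 14   # taken from the encoder name: ...-patch14-...


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches that fit in the image."""
    per_side = image_size // patch_size
    return per_side * per_side


def make_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Hypothetical stand-in for the trainable linear mapping layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, layer):
    """Apply y = W x + b to every patch feature vector."""
    weights, bias = layer
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weights, bias)]
        for feat in features
    ]


def build_llm_input(image_tokens, text_tokens):
    """Assumed layout: projected image tokens first, then text embeddings."""
    return image_tokens + text_tokens


rng = random.Random(0)
VISION_DIM, LLM_DIM = 8, 16  # toy widths, NOT the real model dimensions

n_patches = num_patches(IMAGE_SIZE, PATCH_SIZE)
# Random vectors stand in for the frozen vision encoder's output.
patch_features = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(n_patches)]
layer = make_linear(VISION_DIM, LLM_DIM, rng)
image_tokens = project(patch_features, layer)

# Zero vectors stand in for the embeddings of a 5-token text prompt.
text_tokens = [[0.0] * LLM_DIM for _ in range(5)]
sequence = build_llm_input(image_tokens, text_tokens)

print(n_patches, len(sequence), len(sequence[0]))
```

The point of the sketch is the shape change: each patch feature of width `VISION_DIM` becomes a vector of width `LLM_DIM`, so the language model can consume image patches and text tokens as one sequence.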
Training Data and Methodology
Chitrarth-1's training involved two phases using a vast, multilingual dataset:
Phase 1: Adapter Pre-training
- Pre-trained on a diverse dataset translated into multiple Indic languages using an open-source model.
- Maintained a balanced representation of English and Indic languages to ensure equitable performance.
- Designed to avoid bias towards any single language, optimizing for efficiency and robustness.
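One simple way to keep languages balanced in a training mix is to draw the same number of examples from each language pool. The source does not say how Krutrim balanced its data, so the sketch below is only one possible reading: equal per-language counts are an assumption, and `balanced_sample`, the pool sizes, and the placeholder captions are hypothetical.

```python
import random
from collections import Counter

LANGUAGES = [
    "English", "Hindi", "Bengali", "Telugu", "Tamil", "Marathi",
    "Gujarati", "Kannada", "Malayalam", "Odia", "Assamese",
]


def balanced_sample(pools, per_language, rng):
    """Draw the same number of examples from every language, then shuffle."""
    mixed = []
    for lang, examples in pools.items():
        if len(examples) < per_language:
            raise ValueError(f"not enough examples for {lang}")
        # rng.sample draws without replacement, so no example repeats.
        mixed.extend((lang, ex) for ex in rng.sample(examples, per_language))
    rng.shuffle(mixed)
    return mixed


rng = random.Random(42)

# Hypothetical pools of unequal size, mimicking uneven raw data availability.
pools = {
    lang: [f"{lang} caption {i}" for i in range(50 + 10 * idx)]
    for idx, lang in enumerate(LANGUAGES)
}

batch = balanced_sample(pools, per_language=20, rng=rng)
counts = Counter(lang for lang, _ in batch)
print(counts)
```

Equal counts are the bluntest form of balancing. A real pipeline might instead weight languages by data quality or downstream priority.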
Phase 2: Instruction Tuning
- Fine-tuned on a complex instruction dataset to enhance multimodal reasoning capabilities.
- Utilized an English-based instruction-tuning dataset and its multilingual translations.
- Included a vision-language dataset featuring diverse Indian imagery (personalities, monuments, artwork, cuisine).
- Incorporated high-quality proprietary English text data for balanced domain representation.
Performance and Benchmarks
Krutrim evaluated Chitrarth-1 against VLMs such as IDEFICS 2 (7B) and PALO 7B and reports that it outperforms them on several benchmarks while remaining competitive on tasks such as TextVQA and VizWiz. It is also reported to surpass LLaMA 3.2 11B Vision Instruct on key metrics. In addition, Krutrim introduced BharatBench, a new evaluation suite covering ten under-resourced Indic languages across three tasks, to establish a baseline for future research and to measure how well Chitrarth-1 handles these languages. Sample BharatBench results are shown below:
Language | POPE | LLaVA-Bench | MMVet |
---|---|---|---|
Telugu | 79.9 | 54.8 | 43.76 |
Hindi | 78.68 | 51.5 | 38.85 |
Bengali | 83.24 | 53.7 | 33.24 |
Malayalam | 85.29 | 55.5 | 25.36 |
Kannada | 85.52 | 58.1 | 46.19 |
English | 87.63 | 67.9 | 30.49 |
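To read the sample results at a glance, the following sketch computes the mean score per task and the top-scoring language per task. The numbers are copied verbatim from the table. The helper names are illustrative, and the averages cover only the six sample rows shown here, not the full BharatBench suite.

```python
TASKS = ("POPE", "LLaVA-Bench", "MMVet")

# Scores copied from the sample table above, in TASKS order.
ROWS = {
    "Telugu": (79.9, 54.8, 43.76),
    "Hindi": (78.68, 51.5, 38.85),
    "Bengali": (83.24, 53.7, 33.24),
    "Malayalam": (85.29, 55.5, 25.36),
    "Kannada": (85.52, 58.1, 46.19),
    "English": (87.63, 67.9, 30.49),
}


def task_means(rows):
    """Average each task's score over all listed languages."""
    n = len(rows)
    return {
        task: sum(scores[i] for scores in rows.values()) / n
        for i, task in enumerate(TASKS)
    }


def best_language(rows):
    """Return the highest-scoring language for each task."""
    return {
        task: max(rows, key=lambda lang: rows[lang][i])
        for i, task in enumerate(TASKS)
    }


means = task_means(ROWS)
best = best_language(ROWS)
for task in TASKS:
    print(f"{task}: mean={means[task]:.2f}, best={best[task]}")
```

In these sample rows, English has the highest POPE and LLaVA-Bench scores, while Kannada has the highest MMVet score.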
Accessing Chitrarth-1
Chitrarth-1 is accessible through:
- Hugging Face: for direct use or fine-tuning.
- GitHub: the accompanying code.
- Krutrim Cloud: Krutrim's own cloud platform.
Chitrarth-1 in Action
Chitrarth-1's capabilities include image analysis, image caption generation, and UI/UX screen analysis.
Conclusion
Krutrim AI Labs, part of the Ola Group, is committed to building the future of AI computing. With Chitrarth-1 and other offerings such as GPU as a Service and AI Studio, it aims to set a new standard for inclusive, culturally sensitive AI and to foster a more equitable technological landscape.