Chitrarth-1: A Multilingual VLM by Krutrim AI Labs

India's AI landscape is evolving rapidly. Krutrim AI Labs, an Ola Group company, is part of this growth and recently unveiled Chitrarth-1, a Vision Language Model (VLM) designed for India's linguistic and cultural diversity. Chitrarth-1 supports ten major Indian languages plus English, addressing the need for multilingual AI. This article covers the model's architecture, training, benchmark results, and access options.

Table of Contents

  • What is Chitrarth-1?
  • Chitrarth-1 Architecture and Specifications
  • Training Data and Methodology
    • Phase 1: Adapter Pre-training
    • Phase 2: Instruction Tuning
  • Performance and Benchmarks
  • Accessing Chitrarth-1
  • Chitrarth-1 in Action
  • Conclusion

What is Chitrarth-1?

Chitrarth-1 (from "Chitra", image, and "Artha", meaning) is a 7.5-billion-parameter VLM that combines language and vision processing. Built for India's linguistic diversity, it supports Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, Assamese, and English. The model reflects Krutrim's stated commitment to developing AI "for our country, of our country, and for our citizens." It is trained on a multilingual dataset intended to reduce bias toward any single language and to give robust performance across Indic languages and English. Research on Chitrarth-1 has been presented at academic venues, including NeurIPS and the Ninth Conference on Machine Translation.

Chitrarth-1 Architecture and Specifications

Chitrarth-1 uses the Krutrim-7B LLM as its language backbone, paired with a vision encoder based on the SigLIP (siglip-so400m-patch14-384) model. Key components include:

  • A pre-trained SigLIP vision encoder for image feature extraction.
  • A trainable linear mapping layer that projects image features into the LLM's token space.
  • Fine-tuning on instruction-following image-text datasets for improved multimodal performance.
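
To make the data flow concrete, here is a minimal pure-Python sketch of the projector step described above. It is not Krutrim's implementation. The toy widths (VISION_DIM, LLM_DIM), the random stand-in features, and the choice to place image tokens before text tokens are all assumptions made for illustration. The patch count is derived from the encoder name (384-pixel input, 14-pixel patches), assuming non-overlapping patches.

```python
import random

# siglip-so400m-patch14-384: 384x384 input split into 14x14 patches.
IMAGE_SIZE = 384
PATCH_SIZE = 14
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 27 * 27 = 729 patch tokens

# Toy widths so the example runs instantly; real models are far wider.
VISION_DIM = 8   # assumption: stand-in for the vision encoder's feature width
LLM_DIM = 16     # assumption: stand-in for the LLM's embedding width


def make_linear(in_dim, out_dim, rng):
    """Create the weight matrix and bias of one linear layer (the projector)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(features, weight, bias):
    """Map each patch feature vector into the LLM embedding space: y = Wx + b."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


def build_llm_input(image_features, text_embeddings, weight, bias):
    """Place projected image tokens in front of the text token embeddings."""
    return project(image_features, weight, bias) + text_embeddings


rng = random.Random(0)
weight, bias = make_linear(VISION_DIM, LLM_DIM, rng)

# Random stand-ins for the vision encoder's output and an embedded 5-token prompt.
image_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)] for _ in range(5)]

sequence = build_llm_input(image_features, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))
```

The script prints `734 16`: 729 projected image tokens plus 5 text tokens, each of the LLM's (toy) embedding width. Only the linear layer's weights would be learned in this part of the pipeline.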

Training Data and Methodology

Chitrarth-1's training involved two phases using a vast, multilingual dataset:

Phase 1: Adapter Pre-training

  • Pre-trained on a diverse dataset translated into multiple Indic languages using an open-source model.
  • Maintained a balanced representation of English and Indic languages to ensure equitable performance.
  • Designed to avoid bias towards any single language, optimizing for efficiency and robustness.
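
One simple way to keep languages balanced is to draw an equal quota from each language's pool. The sketch below illustrates that idea only. The article does not say how Krutrim balanced its data, so the equal-quota rule, the pool sizes, and the `balanced_sample` helper are assumptions.

```python
import random
from collections import Counter

# ISO 639-1 codes for English plus the ten supported Indic languages.
LANGUAGES = ["en", "hi", "bn", "te", "ta", "mr", "gu", "kn", "ml", "or", "as"]


def balanced_sample(pool, per_language, rng):
    """Draw the same number of examples from every language in the pool."""
    batch = []
    for lang in sorted(pool):
        examples = pool[lang]
        if len(examples) >= per_language:
            batch.extend(rng.sample(examples, per_language))
        else:
            # Small pools are sampled with replacement to reach the quota.
            batch.extend(rng.choices(examples, k=per_language))
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
# Hypothetical pool in which English is heavily over-represented.
pool = {
    lang: [(lang, f"caption-{i}") for i in range(1000 if lang == "en" else 50 + 10 * n)]
    for n, lang in enumerate(LANGUAGES)
}

batch = balanced_sample(pool, per_language=40, rng=rng)
counts = Counter(lang for lang, _ in batch)
print(counts)
```

Every language contributes 40 examples to the batch regardless of how large its pool is, so the over-represented English pool no longer dominates.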

Phase 2: Instruction Tuning

  • Fine-tuned on a complex instruction dataset to enhance multimodal reasoning capabilities.
  • Utilized an English-based instruction-tuning dataset and its multilingual translations.
  • Included a vision-language dataset featuring diverse Indian imagery (personalities, monuments, artwork, cuisine).
  • Incorporated high-quality proprietary English text data for balanced domain representation.

Performance and Benchmarks

Krutrim evaluated Chitrarth-1 against VLMs such as IDEFICS 2 (7B) and PALO 7B and reports that it outperforms them on a range of benchmarks while remaining competitive on tasks such as TextVQA and VizWiz. It is also reported to surpass Llama 3.2 11B Vision Instruct on key metrics. Krutrim additionally introduced BharatBench, a new evaluation suite covering ten under-resourced Indic languages across three tasks, which provides a baseline for future research. Sample BharatBench results are shown below:

Language     POPE     LLaVA-Bench   MMVet
Telugu       79.9     54.8          43.76
Hindi        78.68    51.5          38.85
Bengali      83.24    53.7          33.24
Malayalam    85.29    55.5          25.36
Kannada      85.52    58.1          46.19
English      87.63    67.9          30.49
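
The sample scores can be summarized with a few lines of Python. The sketch below copies the values from the table above and computes a per-benchmark mean and top-scoring language. The helper names `column_mean` and `best_language` are illustrative and not part of any BharatBench tooling.

```python
from statistics import mean

# Sample BharatBench scores, copied from the table above.
RESULTS = {
    "Telugu":    {"POPE": 79.9,  "LLaVA-Bench": 54.8, "MMVet": 43.76},
    "Hindi":     {"POPE": 78.68, "LLaVA-Bench": 51.5, "MMVet": 38.85},
    "Bengali":   {"POPE": 83.24, "LLaVA-Bench": 53.7, "MMVet": 33.24},
    "Malayalam": {"POPE": 85.29, "LLaVA-Bench": 55.5, "MMVet": 25.36},
    "Kannada":   {"POPE": 85.52, "LLaVA-Bench": 58.1, "MMVet": 46.19},
    "English":   {"POPE": 87.63, "LLaVA-Bench": 67.9, "MMVet": 30.49},
}


def column_mean(results, benchmark):
    """Average score on one benchmark across all listed languages."""
    return mean(row[benchmark] for row in results.values())


def best_language(results, benchmark):
    """Language with the highest score on one benchmark."""
    return max(results, key=lambda lang: results[lang][benchmark])


for benchmark in ("POPE", "LLaVA-Bench", "MMVet"):
    print(
        f"{benchmark}: mean={column_mean(RESULTS, benchmark):.2f}, "
        f"best={best_language(RESULTS, benchmark)}"
    )
```

In this sample, English scores highest on POPE and LLaVA-Bench, while Kannada scores highest on MMVet.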

Accessing Chitrarth-1

Chitrarth-1 is available through:

  • Hugging Face: model weights for direct use or fine-tuning.
  • GitHub: the accompanying code.
  • Krutrim Cloud: hosted access.
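
For the Hugging Face route, the published files can be fetched with the `huggingface_hub` library's `snapshot_download` function. This is a minimal sketch and not an official quick-start. The `download_model` wrapper and its default directory name are hypothetical, and the repository identifier is deliberately left as an argument: take it from the model's Hugging Face page.

```python
def download_model(repo_id, local_dir="chitrarth-checkpoint"):
    """Download all files of a Hugging Face model repository.

    repo_id:   "organization/model-name", as shown on the model page.
    local_dir: hypothetical default target directory for the files.
    Returns the local path of the downloaded snapshot.
    """
    if not repo_id or "/" not in repo_id:
        raise ValueError("repo_id must look like 'organization/model-name'")
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model("<organization>/<model-name>")` with the real identifier requires network access and `pip install huggingface_hub`. Loading the weights for inference then follows the instructions shipped with the model.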

Chitrarth-1 in Action

Examples of Chitrarth-1's capabilities include image analysis, image caption generation, and UI/UX screen analysis.

Conclusion

Krutrim AI Labs, an Ola Group company, is committed to building the future of AI computing. With Chitrarth-1 and other offerings such as GPU as a Service and AI Studio, it aims to set a standard for inclusive, culturally sensitive AI and a more equitable technological landscape.
