OpenAI Launches IndQA

Artificial general intelligence (AGI) should benefit everyone, irrespective of language or culture. Yet around 80% of the global population does not use English as a primary language, which leaves significant gaps in how well AI serves the world. Most benchmarks today focus heavily on English and on simple translation tasks. These don’t capture the nuance needed to measure whether AI systems genuinely understand context, culture, and history, and that leaves non-English speakers underserved.

The Saturation Problem: Why Existing Benchmarks Fall Short

Popular multilingual benchmarks like MMMLU, once groundbreaking, are now saturated: top models cluster near the highest scores, which makes it difficult to assess real progress in language understanding. These benchmarks also rely mainly on translation or multiple-choice questions, which say little about an AI’s ability to reason with culture-specific knowledge. Researchers and users increasingly need benchmarks that test that ability directly.

Introducing IndQA: Crafted for India’s Linguistic Richness

Recognizing the limitations of existing benchmarks, OpenAI developed IndQA, a benchmark tailored to Indian languages and cultures. India’s diversity is staggering: nearly a billion citizens don’t use English daily, and the country has 22 official languages, seven of which are spoken by over 50 million people. India is also ChatGPT’s second-largest market, making it a compelling starting point for redefining language AI evaluation.

How IndQA Works: A Culturally Attuned Approach

IndQA is designed to measure AI's knowledge and reasoning about real-life Indian contexts in 12 languages, covering 10 cultural domains from literature and history to cuisine and entertainment. The benchmark includes 2,278 questions curated by 261 Indian experts—journalists, scholars, artists, and more—ensuring true cultural authenticity.

Key features

  • Breadth: Questions span domains like Architecture & Design, Arts & Culture, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, Sports & Recreation, and Everyday Life.

  • Languages: Prompts are natively written in Bengali, English, Gujarati, Hindi, Hinglish, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu. Hinglish is included to account for the prevalence of code-switching in everyday conversation.

  • Human-Centric Evaluation: Each item includes a prompt, English translation, grading rubric, and an ideal expert answer.

  • Rubric Grading: Domain experts set weighted criteria for responses, and a model-based grader assesses how well each response meets these expectations.
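
The rubric grading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not give the scoring formula, so the sketch assumes a response's score is the weighted share of criteria judged as met. The names Criterion and rubric_score, the example criteria, and the weights are all hypothetical, and the met/not-met judgments are hard-coded here, whereas in IndQA they would come from the model-based grader.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One expert-written rubric line (hypothetical structure)."""
    description: str  # what the expert expects to see in the answer
    weight: float     # importance assigned by the expert (assumed scale)


def rubric_score(criteria, met):
    """Return the weighted fraction of criteria that were met (0.0 to 1.0).

    `met` holds one boolean per criterion. In practice a model-based
    grader would produce these judgments; here the caller supplies them.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one judgment per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(c.weight for c, ok in zip(criteria, met) if ok)
    return earned / total


if __name__ == "__main__":
    # Made-up rubric for illustration only.
    rubric = [
        Criterion("Identifies the correct tradition", 3.0),
        Criterion("Explains the regional variation", 2.0),
        Criterion("Mentions the historical origin", 1.0),
    ]
    judgments = [True, False, True]  # stand-in for grader output
    print(round(rubric_score(rubric, judgments), 2))  # prints 0.67
```

Weighting lets experts mark some expectations as more important than others: in this made-up rubric, missing the middle criterion costs a third of the score, while missing the last one would cost only a sixth.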

Building IndQA: Depth, Rigor, and Adversarial Testing

The creation of IndQA involved recruiting native-speaking domain experts from across India. They crafted reasoning-heavy prompts and detailed rubrics to ensure high standards. Importantly, all questions underwent adversarial filtering: only those that OpenAI’s best models failed to answer well were retained, which leaves headroom for measuring progress over time.
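
The adversarial filtering step can be pictured as a simple selection over per-question scores. The sketch below is an assumption-laden illustration, not OpenAI's procedure: the article does not say how "failed to answer well" was decided, so the function name adversarial_filter, the pass_threshold value, and the rule that every model must fall below the threshold are all hypothetical choices.

```python
def adversarial_filter(scores_by_question, pass_threshold=0.5):
    """Return the IDs of questions that every model answered poorly.

    `scores_by_question` maps a question ID to a list of rubric scores
    (0.0 to 1.0), one per model tested. Both the threshold and the
    "all models must fail" rule are assumptions for illustration.
    """
    kept = []
    for question_id, scores in scores_by_question.items():
        # Retain the question only if no model reached the pass threshold.
        if all(score < pass_threshold for score in scores):
            kept.append(question_id)
    return kept


if __name__ == "__main__":
    # Made-up scores for three candidate questions and two models.
    candidate_scores = {
        "q1": [0.2, 0.4],  # both models fail: retained
        "q2": [0.9, 0.3],  # one model answers well: dropped
        "q3": [0.1, 0.1],  # both models fail: retained
    }
    print(adversarial_filter(candidate_scores))  # prints ['q1', 'q3']
```

A looser rule, such as requiring only a majority of models to fail, could be swapped in by replacing the all(...) check. Either way, the retained set is by construction hard for the models used in filtering, so scores on it are not directly comparable between those models and ones that were not part of the filter.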

Every question was peer-reviewed and refined so that its ideal answer reflects expert consensus. This meticulous process makes IndQA a demanding benchmark for cultural and linguistic understanding in AI systems.

Sample IndQA Questions Showcase Cultural Depth

IndQA isn’t just about translation; it gets to the heart of India’s realities. The announcement includes sample questions in Hinglish and Bengali.

[Figure: sample IndQA questions in Hinglish and Bengali]

The Experts Behind IndQA

IndQA owes its depth to the contributions of 261 Indian domain experts, including:

  • A Nandi Award-winning Telugu actor and screenwriter with over 750 films

  • A Marathi journalist and editor at Tarun Bharat

  • A scholar of Kannada linguistics and a dictionary editor

  • An International Chess Grandmaster who coaches top-100 chess players

  • A Tamil writer, poet, and cultural activist advocating for social justice, caste equity, and literary freedom

  • An award-winning Punjabi music composer

  • A Gujarati heritage curator and conservation specialist

  • An award-winning Malayalam poet and performance artist

  • A professor of history, specializing in Bengal's rich cultural heritage

  • A professor of architecture, focusing on Odishan temples

Looking Forward: Inspiring Global Benchmarks

With the launch of IndQA, OpenAI encourages the research community to adopt similar approaches for other languages and cultures that remain underrepresented in AI benchmarks. By pushing beyond translation and embracing cultural depth, IndQA sets a “north star” for the future—where AI models are truly accessible, relevant, and insightful for everyone.