Dialect Data

We are building an authentic dataset of Arabic dialects: real voices, conversations, and cultures from across the Middle East, collected to power speech recognition, NLP, and language models that understand how people actually speak.

Explore the Data

What We Do

We collect natural conversations in Arabic dialects, recording the way people really speak across regions, ages, and cultures. Our datasets cover Levantine, Iraqi, and Yemeni Arabic and are designed for speech recognition, speaker diarization, code-switching research, and training modern language models.

Why It Matters

Arabic is not one language—it’s a tapestry of dialects. From Beirut’s streets to Yemeni markets, we’re helping AI understand real-world speech.

Our Approach

We partner with local communities and creators—ensuring authentic, ethical data collection with full consent and transparency. Every recording is paired with rich metadata and consent documentation so teams can safely use our data in production-grade AI systems.

Get Involved

Are you a content creator or researcher? Learn how to contribute, or get in touch for collaborations.

Ready to explore the dataset? See what we capture.