Dialect Data
Building the world’s most authentic Arabic dialect dataset—capturing real voices, conversations, and cultures across the Middle East, powering speech recognition, NLP, and language models that actually understand how people speak.
Explore the Data
What We Do
We collect natural conversations in Arabic dialects—capturing the way people really speak, across regions, ages, and cultures. Our datasets are designed for speech recognition, speaker diarization, code-switching research, and training modern language models in Levantine, Iraqi, and Yemeni Arabic.
Why It Matters
Arabic is not one language—it’s a tapestry of dialects. From Beirut’s streets to Yemeni markets, we’re helping AI understand real-world speech.
Our Approach
We partner with local communities and creators—ensuring authentic, ethical data collection with full consent and transparency. Every recording is paired with rich metadata and consent documentation so teams can safely use our data in production-grade AI systems.
Get Involved
Are you a content creator or researcher? Learn how to contribute, or get in touch for collaborations.
Ready to explore the dataset? See what we capture.