English-Odia Parallel corpus
Overview
- 80,437 English-Odia text pairs, each an English text followed by its Odia translation, can be downloaded from our NMT model repo.
- The parallel pairs were collected from many sources by volunteers.
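This page does not state the file layout of the downloadable pairs. The sketch below assumes a hypothetical tab-separated layout, one pair per line as `english<TAB>odia`. The function names `parse_pairs` and `load_pairs` are illustrative and are not part of the NMT model repo. Check the actual files before relying on this.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse English/Odia pairs, ASSUMING one 'english<TAB>odia' pair per line."""
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        english, sep, odia = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab: they do not match the assumed layout
        pairs.append((english.strip(), odia.strip()))
    return pairs


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Read a UTF-8 text file (hypothetical path) and parse it with parse_pairs."""
    with open(path, encoding="utf-8") as f:
        return parse_pairs(f)


if __name__ == "__main__":
    sample = ["Hello\tନମସ୍କାର\n", "\n", "no tab here\n"]
    print(parse_pairs(sample))
```

Parsing from an iterable of lines, rather than directly from a path, lets the same function handle a file, a decompressed stream, or an in-memory sample.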
Sources
- Wikipedia Data dump
- Open Parallel Corpus
- OdiEnCorp 1.0
- TDIL - Technical strings: 52,000 pairs (data needs to be cleaned)
- Global Voices - 328 sentence pairs
- Mann Ki Baat - 1,000+ pairs
- Twitter: DoctorBabu - Around 100 En-Or pairs of botanical terms
- Rupesh Ranjan Panda - Around 300 generic En-Or pairs
- Krishna Kabi - 186 En-Or pairs
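Some sources, such as the TDIL technical strings, still need cleaning. The page does not describe the cleaning procedure. The sketch below shows a few generic steps one might apply; it is not the project's own method. It normalizes whitespace, drops pairs with an empty side, drops pairs whose Odia side contains no characters from the Unicode Oriya block (U+0B00 to U+0B7F), and removes exact duplicates. The names `has_odia_script` and `clean_pairs` are hypothetical.

```python
from typing import List, Set, Tuple

# Unicode "Oriya" block, which contains the Odia script.
ODIA_START, ODIA_END = 0x0B00, 0x0B7F


def has_odia_script(text: str) -> bool:
    """Return True if at least one character falls in the Oriya Unicode block."""
    return any(ODIA_START <= ord(ch) <= ODIA_END for ch in text)


def clean_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply generic (assumed, not project-specified) cleaning steps to En-Or pairs."""
    seen: Set[Tuple[str, str]] = set()
    cleaned: List[Tuple[str, str]] = []
    for en, od in pairs:
        # Collapse runs of whitespace and trim both sides.
        en, od = " ".join(en.split()), " ".join(od.split())
        if not en or not od:
            continue  # drop pairs with an empty side
        if not has_odia_script(od):
            continue  # drop pairs whose "Odia" side has no Odia characters
        key = (en, od)
        if key in seen:
            continue  # drop exact duplicates, keeping the first occurrence
        seen.add(key)
        cleaned.append(key)
    return cleaned
```

A script check like this catches untranslated or misaligned rows cheaply. It cannot detect a wrong translation, so manual review is still needed for quality.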
Additional links
Odia Monolingual corpus
- Monolingual Odia data has been extracted from Wikipedia.
- You can use this repo to fetch the latest dataset.
- A ready-made monolingual corpus (about 17,000 Wikipedia articles), created by Gaurav, is available on Kaggle.
Odia dictionary
- The dictionary data has been extracted from Odia Purnachandra Bhashakosha.
- The source code repository for the dataset is OdiaNLP/dictionary.
Last update: 2023-03-27