All contributions are open source and freely available (with proper attribution) to society. OdiaNLP has contributed, in whole or in part, to the following projects:
Text to Speech or Speech to Text
- Speech corpora creation through Mozilla Common Voice.
- As of 21 July 2021, 201 MB of speech data has been prepared purely through volunteer effort.
- After downloading, you will get a folder structure like this:
    cv-corpus-<version>-<date>
    └── or
        ├── reported.tsv
        ├── dev.tsv
        ├── other.tsv
        ├── test.tsv
        ├── train.tsv
        ├── validated.tsv
        ├── partials/template
        └── clips
            ├── common_voice_or_<count>.mp3
            ├── common_voice_or_<count>.mp3
            ├── common_voice_or_<count>.mp3
            ├── common_voice_or_<count>.mp3
            .
            .
            .
            └── common_voice_or_<count>.mp3
.tsv files contain the Odia sentences, written in Odia script.
.mp3 files contain the recorded pronunciation of the corresponding sentences.
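As a rough illustration of how the .tsv files and the clips folder fit together, the sketch below pairs each sentence with its audio file. It assumes the usual Common Voice column names `path` and `sentence`, which are not stated above; the helper name `load_clips` and the sample rows are hypothetical.

```python
import csv
import io
from pathlib import Path


def load_clips(tsv_file, clips_dir):
    """Yield (audio_path, sentence) pairs from a Common Voice style TSV.

    Assumes the TSV has `path` and `sentence` columns.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Path(clips_dir) / row["path"], row["sentence"]


# Tiny in-memory stand-in for validated.tsv (hypothetical rows).
sample_tsv = io.StringIO(
    "client_id\tpath\tsentence\n"
    "abc123\tcommon_voice_or_0001.mp3\tନମସ୍କାର\n"
    "def456\tcommon_voice_or_0002.mp3\tଶୁଭ ସକାଳ\n"
)

pairs = list(load_clips(sample_tsv, "cv-corpus/or/clips"))
for audio_path, sentence in pairs:
    print(audio_path.name, sentence)
```

With a real download, the same function could be given a file opened with `open(tsv_path, encoding="utf-8", newline="")` instead of the in-memory sample.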
- Odia support has been added to googletrans, an unofficial Google Translate API wrapper.
    $ pip install googletrans

    >>> from googletrans import Translator
    >>> translator = Translator()
    >>> translator.translate("Hello Odia people", dest="or").text
    # 'ନମସ୍କାର ଓଡିଆ ଲୋକମାନେ |'
- Odia support has been added to Faker, a widely used data anonymization library, for generating fake names.
    $ pip install Faker

    >>> from faker import Faker
    >>> fake = Faker("or_IN")
    >>> for _ in range(10):
    ...     print(fake.name())
    ...
    ଚିତରଂଜନ ନନ୍ଦି
    ରାଜ, ରବିନାରାୟଣ
    କେଦାରନାଥ ବର୍ମା
    ଅମରନାଥ ସେଠୀ
    ସାଲୁଜା, କଳ୍ପତରୁ
    ଦେବରାଜ ରାଧାରାଣୀ
    ପୋଦ୍ଦାର ରାଧୁ
    ମତଲୁବ ଶତପଥୀ
    ରନ୍ଧାରୀ, ସୁଶାନ୍ତ
    ଗୈାତମ ଓରାମ
Named Entity Recognition
- A dataset of Odia personal names has been published on Kaggle to make it publicly available and to support further development of NER for the Odia language.
- Various localization projects to make websites and applications available in the Odia language:
Mozilla Firefox (in progress)
DuckDuckGo - privacy-focused search engine
COVID-19 website (unofficial)
OpenOdia is a consolidated toolkit built for the Odia language. It bundles commonly needed tools such as:
- Word tokenization
- Sentence tokenization
- Stopword removal
- Google Translate wrapper
- Automatic text summarization
- Odia name generation
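To make the first three items concrete, here is a minimal standard-library sketch of sentence tokenization, word tokenization, and stopword removal for Odia text. It is not OpenOdia's API: the function names, the punctuation rules, and the three-word stopword list are assumptions chosen purely for illustration.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"ଏବଂ", "ଓ", "ଏହି"}


def sentence_tokenize(text):
    """Split on the danda (।), '?' or '!' and drop empty pieces."""
    parts = re.split(r"[।?!]", text)
    return [p.strip() for p in parts if p.strip()]


def word_tokenize(sentence):
    """Split on whitespace and strip common punctuation from each token."""
    tokens = (t.strip(",;:\"'()") for t in sentence.split())
    return [t for t in tokens if t]


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


text = "ମୁଁ ଗୀତ ଏବଂ କବିତା ଲେଖେ। ଏହି ବହି ଭଲ।"
sentences = sentence_tokenize(text)
words = [remove_stopwords(word_tokenize(s)) for s in sentences]

for sentence, kept in zip(sentences, words):
    print(sentence, "->", kept)
```

A real toolkit would need a much larger stopword list and more careful handling of punctuation and numerals; the sketch only shows the shape of each step.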