Contributions
All contributions are open source and freely available to the public, with proper attribution. OdiaNLP has contributed, in whole or in part, to the following projects:
Text to Speech / Speech to Text
Mozilla Common Voice
- Odia speech corpus creation through Mozilla Common Voice.
- As of 21 July 2021, 201 MB of Odia speech data had been collected, purely through volunteer efforts.
- After downloading, the dataset has the following folder structure:
cv-corpus-<version>-<date>
└── or
    ├── reported.tsv
    ├── dev.tsv
    ├── other.tsv
    ├── test.tsv
    ├── train.tsv
    ├── validated.tsv
    ├── partials/template
    └── clips
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        .
        .
        .
        └── common_voice_or_<count>.mp3
- The .tsv files contain the Odia sentences, written in Odia script.
- The .mp3 files contain the corresponding recorded audio of those sentences.
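The split files can be read with Python's standard csv module to pair each audio clip with its sentence. The following is a minimal sketch, assuming each clip-level split (such as train.tsv, dev.tsv, test.tsv or validated.tsv) is tab-separated with a header row containing `path` and `sentence` columns, as in Common Voice releases, and that `path` is a filename inside the clips folder. `load_split` is a hypothetical helper name, not part of any Common Voice tooling.

```python
import csv
from pathlib import Path


def load_split(lang_dir, split="train"):
    """Return (clip_path, sentence) pairs for one split of a Common Voice language folder.

    Assumes <lang_dir>/<split>.tsv is tab-separated with a header row that
    includes "path" and "sentence" columns (hypothetical helper, not an official API).
    """
    lang_dir = Path(lang_dir)
    pairs = []
    with open(lang_dir / f"{split}.tsv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps any quote characters inside sentences verbatim
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # clip filenames are resolved relative to the "clips" folder
            clip = lang_dir / "clips" / row["path"]
            pairs.append((clip, row["sentence"]))
    return pairs


# Example (placeholder path, as in the folder tree above):
# pairs = load_split("cv-corpus-<version>-<date>/or", "validated")
```

reported.tsv lists reported sentences rather than recorded clips, so it may use a different column layout and is not meant for this helper.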
Machine Translation
Google Translation API wrapper
- Odia has been added to googletrans, an unofficial Python wrapper for the Google Translate API.
$ pip install googletrans
>>> from googletrans import Translator
>>> translator = Translator()
>>> translator.translate("Hello Odia people", dest="or").text
# 'ନମସ୍କାର ଓଡିଆ ଲୋକମାନେ |'
Data Anonymization
Fake Odia name generation
- Odia support for fake name generation has been added to Faker, a widely used Python library for generating fake data for anonymization and testing.
$ pip install Faker
>>> from faker import Faker
>>> fake = Faker("or_IN")
>>> for _ in range(9):
...     print(fake.name())
...
ଚିତରଂଜନ ନନ୍ଦି
ରାଜ, ରବିନାରାୟଣ
କେଦାରନାଥ ବର୍ମା
ଅମରନାଥ ସେଠୀ
ସାଲୁଜା, କଳ୍ପତରୁ
ଦେବରାଜ ରାଧାରାଣୀ ପୋଦ୍ଦାର
ରାଧୁ ମତଲୁବ ଶତପଥୀ
ରନ୍ଧାରୀ, ସୁଶାନ୍ତ
ଗୈାତମ ଓରାମ
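The generated names above are random on each run. When anonymizing a dataset, it is often also useful for the same real name to map to the same fake name every time it appears. The following is a minimal sketch of such consistent pseudonymization; `pseudonymize` is a hypothetical helper, not part of Faker, and it accepts any zero-argument name generator, such as the `fake.name` method from the session above.

```python
from typing import Callable, Dict, Iterable, List, Set


def pseudonymize(names: Iterable[str], fake_name: Callable[[], str]) -> List[str]:
    """Replace each real name with a fake one, reusing the same fake for repeated names.

    Hypothetical helper: fake_name is any zero-argument callable, e.g. Faker("or_IN").name.
    """
    mapping: Dict[str, str] = {}
    used: Set[str] = set()
    out: List[str] = []
    for name in names:
        if name not in mapping:
            candidate = fake_name()
            # draw again on collisions so two different people never share a pseudonym
            while candidate in used:
                candidate = fake_name()
            mapping[name] = candidate
            used.add(candidate)
        out.append(mapping[name])
    return out


# Example with Faker (requires `pip install Faker`):
# from faker import Faker
# fake = Faker("or_IN")
# pseudonymize(["name A", "name B", "name A"], fake.name)
```

The collision loop assumes the generator can produce more distinct names than there are distinct real names; with a very small name pool it would never terminate.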
Named Entity Recognition
Odia persons' name dataset
- A dataset of Odia person names has been published on Kaggle to make it publicly available and to support further development of NER for the Odia language.
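One common way a name list like this can support NER is as a gazetteer: tokens that match a known name are tagged as a person. The following is a minimal sketch of a longest-match gazetteer tagger producing BIO tags. The file layout of the Kaggle dataset is not described here, so loading it is left out; the `names` set, the `gazetteer_tag` function and the `PER` label are illustrative assumptions.

```python
from typing import List, Set, Tuple


def gazetteer_tag(tokens: List[str], names: Set[Tuple[str, ...]]) -> List[str]:
    """Tag tokens with BIO labels (B-PER / I-PER / O) by longest match against a name list.

    Each name is a tuple of tokens, e.g. ("ଅମରନାଥ", "ସେଠୀ"). Hypothetical helper.
    """
    max_len = max((len(n) for n in names), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # try the longest possible name starting at position i first
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + span]) in names:
                tags[i] = "B-PER"
                for j in range(i + 1, i + span):
                    tags[j] = "I-PER"
                i += span
                break
        else:
            # no name starts here; move on to the next token
            i += 1
    return tags
```

A gazetteer alone misses names absent from the list and cannot resolve ambiguous words, so it is typically used as a baseline or as a feature for a trained model.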
Localization
- Various localization projects to make websites and applications available in the Odia language:
Telegram - Open-source instant messaging app
Mozilla Firefox (in progress)
DuckDuckGo - Privacy-focused search engine
COVID-19 website (unofficial)
OpenOdia Library
OpenOdia is a consolidated library for the Odia language. It bundles several commonly needed tools, including:
- Word tokenization
- Sentence tokenization
- Stopword removal
- Google Translate wrapper
- Automatic text summarization
- Odia name generation
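To illustrate what the first few of these tasks involve, the following is a minimal standard-library sketch of sentence tokenization, word tokenization, stopword removal and frequency-based extractive summarization for Odia text. It is not OpenOdia's API or implementation: the function names, the sentence-ending characters and the tiny `STOPWORDS` set are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List, Set

# Sentence-ending marks: danda, double danda, ASCII pipe (often typed in place of a danda), ? and !
_SENT_END = re.compile(r"(?<=[।॥|?!])\s*")
# The Odia block (U+0B00-U+0B7F) covers letters, vowel signs, the virama and digits.
# Plain \w alone would split words at vowel signs, which Python does not treat as word characters.
_WORD = re.compile(r"[\u0B00-\u0B7F\u200c\u200d\w]+")

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS: Set[str] = {"ଓ", "ଏବଂ", "ଏହି", "ସେ", "ଯେ", "କି", "ମଧ୍ୟ"}


def sentence_tokenize(text: str) -> List[str]:
    """Split text after sentence-ending marks, keeping the mark with its sentence."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def word_tokenize(text: str) -> List[str]:
    """Return runs of Odia (or other word) characters; punctuation is dropped."""
    return _WORD.findall(text)


def remove_stopwords(words: List[str], stopwords: Set[str] = STOPWORDS) -> List[str]:
    return [w for w in words if w not in stopwords]


def summarize(text: str, k: int = 1) -> List[str]:
    """Frequency-based extractive summary: keep the k highest-scoring sentences."""
    sentences = sentence_tokenize(text)
    freq = Counter(remove_stopwords(word_tokenize(text)))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(sentence: str) -> float:
        # sum of normalized frequencies of the sentence's content words
        return sum(freq[w] / top for w in remove_stopwords(word_tokenize(sentence)))

    # rank sentences by score, then restore their original order for readability
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Summing word scores favours longer sentences; dividing by the number of content words is a common alternative when that bias is unwanted.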
Last update: 2023-03-27