
Contributions

All contributions are open source and freely available to the public (with proper attribution). OdiaNLP has contributed, fully or partially, to the following projects:

Text to Speech or Speech to Text

Mozilla Common Voice

  • Creation of an Odia speech corpus through Mozilla Common Voice.
  • As of 21 July 2021, 201 MB of speech data had been prepared purely through volunteer efforts.

  • After downloading, the dataset has the following folder structure:
cv-corpus-<version>-<date>
└── or
    ├── reported.tsv
    ├── dev.tsv
    ├── other.tsv
    ├── test.tsv
    ├── train.tsv
    ├── validated.tsv
    ├── partials/template
    └── clips
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        ├── common_voice_or_<count>.mp3
        .
        .
        .
        └── common_voice_or_<count>.mp3
  • The .tsv files contain the Odia sentences in Odia script.
  • The .mp3 files contain the recorded audio of the corresponding sentences.
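To pair each sentence with its audio clip, a split file can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader. It assumes that each split .tsv (for example `train.tsv`) has a header row with `path` and `sentence` columns, as Common Voice releases use, and that `path` is a filename inside `clips/`. `load_split` is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_split(lang_dir, split: str = "train") -> List[Tuple[Path, str]]:
    """Return (clip_path, sentence) pairs for one Common Voice split.

    Assumes `<lang_dir>/<split>.tsv` has a header with `path` and
    `sentence` columns and that the audio files live in `<lang_dir>/clips/`.
    """
    lang_dir = Path(lang_dir)
    pairs: List[Tuple[Path, str]] = []
    with open(lang_dir / f"{split}.tsv", encoding="utf-8", newline="") as f:
        # Tab-separated; QUOTE_NONE keeps quote characters inside
        # sentences verbatim instead of treating them as CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            pairs.append((lang_dir / "clips" / row["path"], row["sentence"]))
    return pairs
```

For example, `load_split("cv-corpus-<version>-<date>/or", "dev")` would return `(clip_path, sentence)` tuples for the dev split once the placeholder folder name is replaced with the actual download. Files such as `reported.tsv` may not have a `path` column, so the sketch is intended for the clip-level splits (train, dev, test, validated, other).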

Machine Translation

Google Translate API wrapper

$ pip install googletrans
>>> from googletrans import Translator
>>> translator = Translator()
>>> translator.translate("Hello Odia people", dest="or").text
# 'ନମସ୍କାର ଓଡିଆ ଲୋକମାନେ |'
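When translating many sentences, a single `Translator` instance can be reused. The following is a minimal sketch that relies only on the `translate(text, dest=...)` call and the `.text` attribute shown above. `translate_to_odia` is a hypothetical helper. The translator is passed in as an argument so that the loop can be exercised without network access.

```python
from typing import Iterable, List


def translate_to_odia(sentences: Iterable[str], translator) -> List[str]:
    """Translate each non-empty sentence to Odia ("or") with one translator.

    `translator` is expected to behave like googletrans' Translator:
    translate(text, dest=...) returns an object with a `.text` attribute.
    """
    results: List[str] = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines rather than sending empty requests
        results.append(translator.translate(sentence, dest="or").text)
    return results
```

With googletrans installed, this would be called as `translate_to_odia(["Hello", "Thank you"], Translator())`. Because googletrans calls Google Translate's web service, results depend on network access and may change over time.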

Data Anonymization

Fake Odia name generation

  • For fake name generation, Odia support (the or_IN locale) has been added to Faker, a widely used Python library for generating fake data for anonymization.
$ pip install Faker
>>> from faker import Faker
>>> fake = Faker("or_IN")
>>> for _ in range(10):
...     print(fake.name())
... 
ଚିତରଂଜନ ନନ୍ଦି
ରାଜ, ରବିନାରାୟଣ
କେଦାରନାଥ ବର୍ମା
ଅମରନାଥ ସେଠୀ
ସାଲୁଜା, କଳ୍ପତରୁ
ଦେବରାଜ ରାଧାରାଣୀ ପୋଦ୍ଦାର
ରାଧୁ ମତଲୁବ ଶତପଥୀ
ରନ୍ଧାରୀ, ସୁଶାନ୍ତ
ଗୌତମ ଓରାମ
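For anonymization, the same real name should usually map to the same fake name everywhere it appears. The following is a minimal sketch of this consistent pseudonymization. It assumes the Faker `or_IN` locale shown above and Faker's `unique` proxy, which avoids assigning one fake name to two different people. `make_pseudonymizer` is a hypothetical helper. A custom name generator can be passed in instead of Faker.

```python
from typing import Callable, Dict, Optional


def make_pseudonymizer(
    generate_name: Optional[Callable[[], str]] = None,
) -> Callable[[str], str]:
    """Return a function that maps each real name to a stable fake name.

    By default fake names come from Faker's Odia locale ("or_IN"); any
    zero-argument callable returning a string can be passed instead.
    """
    if generate_name is None:
        from faker import Faker  # imported lazily; requires `pip install Faker`

        fake = Faker("or_IN")
        # `fake.unique` avoids handing out the same fake name twice.
        generate_name = fake.unique.name
    mapping: Dict[str, str] = {}

    def pseudonymize(real_name: str) -> str:
        # Reuse the earlier replacement so the same person is always
        # replaced by the same fake name across a dataset.
        if real_name not in mapping:
            mapping[real_name] = generate_name()
        return mapping[real_name]

    return pseudonymize
```

After `anonymize = make_pseudonymizer()`, every call to `anonymize("ରବିନାରାୟଣ ରାଜ")` returns the same fake name. For reproducible output across runs, Faker's class-level `Faker.seed(...)` can be called before the pseudonymizer is created.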

Named Entity Recognition

Odia Persons' name dataset

  • The Odia persons' name dataset has been published on Kaggle to make it publicly available and to support further development of NER for the Odia language.
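A names list like this can serve as a gazetteer, which is a simple baseline for tagging person names. This page does not describe the Kaggle file's column layout, so the following sketch takes an in-memory list of names rather than assuming a format. It performs greedy longest-match lookup over whitespace-separated tokens and emits BIO tags. `build_gazetteer` and `tag_persons` are hypothetical helper names, and the sample names in the test are illustrative only.

```python
from typing import Iterable, List, Set, Tuple


def build_gazetteer(names: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Store each known name as a tuple of its whitespace-separated tokens."""
    return {tuple(n.split()) for n in names if n.strip()}


def tag_persons(tokens: List[str], gazetteer: Set[Tuple[str, ...]]) -> List[str]:
    """BIO-tag tokens by greedy longest match against known person names."""
    max_len = max((len(g) for g in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in gazetteer:
                match = n
                break
        if match:
            tags[i] = "B-PER"
            for j in range(i + 1, i + match):
                tags[j] = "I-PER"
            i += match
        else:
            i += 1
    return tags
```

A gazetteer misses names that are not in the list and cannot tell when a name is used as an ordinary word. For these reasons, it is usually a starting point (or a source of weak labels) for a statistical NER model rather than a full solution.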

Localization

  • Contributions to various localization projects that make websites and applications available in the Odia language:

Telegram - Open-source instant messaging app

Mozilla Firefox (in progress)

DuckDuckGo - Privacy-focused search engine

COVID-19 website (unofficial)

OpenOdia Library

OpenOdia is a consolidated toolkit for the Odia language. It includes tools for:

  1. Word tokenization
  2. Sentence tokenization
  3. Stopword removal
  4. Google Translate wrapper
  5. Automatic text summarization
  6. Odia name generation
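This page does not document OpenOdia's own API, so the following is not its implementation. It is an independent, minimal sketch of word and sentence tokenization, stopword removal, and frequency-based extractive summarization for Odia text. It makes the following assumptions:

- Sentences end with the danda `।`, the double danda `॥`, `?`, `!`, or an ASCII `|` typed in place of a danda (as in the translation output above).
- The stopword set is a tiny, hypothetical sample.

Words are matched with an explicit character class instead of `\w`, because Python's `\w` does not match Odia vowel signs or the virama and would split words apart.

```python
import re
from collections import Counter
from typing import Iterable, List

# Sentence-ending marks: danda (U+0964), double danda (U+0965), '?', '!',
# and ASCII '|', which is often typed in place of a danda.
_SENTENCE = re.compile(r"[^।॥|?!]+[।॥|?!]*")
# A word is a run of characters that are neither whitespace nor punctuation.
_WORD = re.compile(r"[^\s।॥|?!.,;:\"'()\[\]{}-]+")

# Tiny illustrative stopword list (hypothetical; not OpenOdia's list).
STOPWORDS = frozenset({"ଏବଂ", "ଓ", "ଏହି", "ସେହି", "ଏକ", "ଅଟେ", "ହେଉଛି"})


def sent_tokenize(text: str) -> List[str]:
    """Split text into sentences, keeping each sentence's end mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def word_tokenize(text: str) -> List[str]:
    """Split text into words, dropping whitespace and punctuation."""
    return _WORD.findall(text)


def remove_stopwords(tokens: Iterable[str], stopwords=STOPWORDS) -> List[str]:
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def summarize(text: str, k: int = 2, stopwords=STOPWORDS) -> List[str]:
    """Return the k sentences whose content words are most frequent
    in the whole text, in their original order."""
    sentences = sent_tokenize(text)
    freq = Counter(remove_stopwords(word_tokenize(text), stopwords))
    scored = [
        (sum(freq[w] for w in remove_stopwords(word_tokenize(s), stopwords)), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

Scoring by raw word frequency favours longer sentences. A common variant divides each score by the sentence's token count. Which variant works better depends on the text.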

Last update: 2023-03-27