Possible projects

This list contains proposals for projects that can be started for the Odia language. It was created with students, research scholars, and hobbyists in mind, who with a little knowledge of the domain can learn about and carry out these projects.

Monolingual corpus

Language detector

  • Given a text string, detect its language. The detector should identify whether the given text is in Odia.
  • An existing language detector can be found in the OpenOdia project.
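
As a starting point, Odia text can be recognised by its script alone, because Odia letters live in the Unicode block U+0B00 to U+0B7F. The sketch below is a minimal illustration of that idea, not the OpenOdia detector; the function names `odia_ratio` and `is_odia` and the 0.5 threshold are assumptions chosen for this example.

```python
ODIA_START = "\u0b00"  # first code point of the Unicode Oriya (Odia) block
ODIA_END = "\u0b7f"    # last code point of the Unicode Oriya (Odia) block


def odia_ratio(text: str) -> float:
    """Return the fraction of letters in `text` that belong to the Odia block.

    Only characters for which str.isalpha() is true are counted, so digits,
    punctuation, whitespace and combining vowel signs are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    odia_letters = sum(1 for ch in letters if ODIA_START <= ch <= ODIA_END)
    return odia_letters / len(letters)


def is_odia(text: str, threshold: float = 0.5) -> bool:
    """Treat the text as Odia when at least `threshold` of its letters are Odia.

    The threshold is an illustrative assumption and would need tuning on real data.
    """
    return odia_ratio(text) >= threshold


if __name__ == "__main__":
    for sample in ["ଥିଲେ", "Hello world", ""]:
        print(repr(sample), is_odia(sample))
```

A script check like this cannot tell Odia apart from other languages written in the same script, and it says nothing about romanised Odia, so a statistical detector would still be needed for those cases.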

Existing Algorithms

  1. Google language detector

Part of speech tagging

Given a sentence, find the parts of speech present in it.
A part of speech can be a verb, noun, adjective, pronoun, preposition, etc.

Stemming

Initial rough corpus
  • ["ଲେ", "ଠୁ", "ର", "ରେ", "ଟି", "ଟେ", "ଟା",
  • ୁଥିଲେ
  • ["ଥିଲେ", "ଥିଲ", "ଥିଲୁ", "ଥିଲି", "ଉଛେ", "ଉଛ", "ଉଛୁ", "ଉଛି", "ଇଛେ", "ଇଛ", "ଇଛୁ", "ଇଛି", "ଅଛେ", "ଅଛ", "ଅଛୁ", "ଅଛି", "ସିଛେ", "ସିଛ", "ସିଛୁ", "ସିଛି", "ଅନ୍ତେ", "ଅନ୍ତ", "ଅନ୍ତୁ", "ଅନ୍ତି", "ଇଲେ", "ଇଲ", "ଇଲୁ", "ଇଲି", "ଇବେ", "ଇବ", "ଇବୁ", "ଇବି", "ଥିବେ", "ଥିବ", "ଥିବୁ", "ଥିବି", "ଟାକୁ", "ଟାକେ", "ଟିର", "ଟିରେ", "ଟିଏ", "ମାନେ", "ଗୁଡ଼ା"]
  • ["ଗୁଡ଼ାଏ", "ଗୁଡ଼ାକ"]

Largest substring approach

  • One option is the largest suffix substring removal process, as shown in this code by Mohit for the Hindi language.
  • In Odia, however, stripping a fixed set of suffixes can remove critical information from the sentence.
  • For example, suffixes such as uthilu and uthibe indicate the tense of the sentence: whether it is future, past, or present.
  • There will also be exceptions throughout the process, so a generic set of suffixes cannot be used for stemming.
  • Therefore, a better method needs to be found.
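
The largest suffix substring idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Mohit's Hindi code: `SUFFIXES` is a small subset of the rough suffix list above, and the function name `strip_longest_suffix` and the `min_stem_len` guard are hypothetical choices for this example.

```python
# A small subset of the rough suffix list above, used only for illustration.
SUFFIXES = ["ଥିଲେ", "ଥିଲି", "ମାନେ", "ଟିରେ", "ଟିର", "ରେ", "ଟି", "ର"]


def strip_longest_suffix(word, suffixes=SUFFIXES, min_stem_len=2):
    """Remove the longest listed suffix that the word ends with.

    Suffixes are tried from longest to shortest, so "ଟିରେ" wins over "ରେ".
    A suffix is only removed if at least `min_stem_len` code points remain;
    this guard is an assumption added to limit over-stemming of short words.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no listed suffix applies, so the word is returned unchanged


if __name__ == "__main__":
    stem = "ପିଲା"
    for word in [stem + "ମାନେ", stem + "ଟିରେ", "ଘର"]:
        print(word, "->", strip_longest_suffix(word))
```

The word "ଘର" ends with the listed suffix "ର" and is left alone only because of the length guard, which shows the kind of exception the points above describe. Lengths here are counted in Unicode code points, not in visible characters, so the guard is only a rough heuristic.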

Existing work

Odia text summarization

Optical Character Recognition of Odia script


Last update: 2023-03-27