Learnings from the Odias in ML conference


  • Prof Panchanan Mohanty
  • Prof Vivekananda Pani

Key points

These are the consolidated notes from the two professors listed above:

  • Odia has two forms: written and spoken. AI/ML work on spoken Odia has not started yet. Priority needs to be given to written Odia.
  • NLP for Odia is just getting started, and there is vast opportunity in every area of NLP for Odia.
  • Four major things are needed for NLP in Odia:
    1. Standard Font
      • The Unicode font was not made in consultation with Odia linguists.
      • The makers took shortcuts and did not standardize the font.
      • ASCII creates better Odia.
      • Every algorithm will be suboptimal compared to English until this font issue is resolved.
      • We need to learn these encodings.
      • ML has two aspects:
        • Data normalization
        • Data cleaning
    2. Rules for spelling
      • 1990: the first Odia spell checker was created.
      • The Odisha Government still has not been able to integrate it with further tools.
      • Spell checking needs to be standardized.
    3. Dictionary
      • There has been no good dictionary since the Purnachandra Bhashakosha.
      • Prepare a dictionary using lexicography.
      • An exhaustive dictionary needs to be created.
      • Getting data from newspapers and publishers to prepare the dictionary is easy.
    4. Grammar
      • A descriptive standard grammar is needed.
      • The books available on the market are not Odia grammar.
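
As an illustration of the data normalization and data cleaning aspects, and of collecting candidate dictionary words from newspaper or publisher text, here is a minimal sketch in Python using only the standard library. The function names (`normalize_odia`, `clean_odia`, `count_words`), the choice of Unicode NFC normalization, and the "run of Odia-block characters" definition of a word are all assumptions made for this example; none of them were prescribed in the talks.

```python
import re
import unicodedata
from collections import Counter

# Zero-width space and byte-order mark: invisible debris often left by scraping.
# ZWJ/ZWNJ (U+200D/U+200C) are kept on purpose, since they can affect how
# Odia conjuncts are rendered.
_INVISIBLE = re.compile("[\u200b\ufeff]")
_WHITESPACE = re.compile(r"\s+")

# Assumption: a "word" is a run of characters from the Odia Unicode block
# (U+0B00..U+0B7F), optionally joined by ZWJ/ZWNJ. This is a rough
# approximation, not a linguistically validated tokenizer.
_ODIA_WORD = re.compile("[\u0b00-\u0b7f]+(?:[\u200c\u200d][\u0b00-\u0b7f]+)*")


def normalize_odia(text: str) -> str:
    """Bring canonically equivalent code point sequences to one form (NFC)."""
    return unicodedata.normalize("NFC", text)


def clean_odia(text: str) -> str:
    """Normalize, drop invisible debris, and collapse runs of whitespace."""
    text = normalize_odia(text)
    text = _INVISIBLE.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def count_words(lines) -> Counter:
    """Count Odia-script tokens across an iterable of raw text lines."""
    counts = Counter()
    for line in lines:
        counts.update(_ODIA_WORD.findall(clean_odia(line)))
    return counts


if __name__ == "__main__":
    # Hypothetical sample text; a real run would read newspaper or book corpora.
    sample = ["ଓଡ଼ିଆ ଭାଷା,  ଓଡ଼ିଆ ସାହିତ୍ୟ।"]
    for word, freq in count_words(sample).most_common():
        print(word, freq)
```

Normalizing first matters because the same visible Odia letter can be stored as different code point sequences, so without it identical words would be counted as different entries. The resulting frequency list is only raw material: a lexicographer would still need to review it before anything goes into a dictionary.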

Last update: 2023-03-27