Learnings from the Odias in ML Conference
Speakers
- Prof Panchanan Mohanty
- Prof Vivekananda Pani
Key points
These notes consolidate the points made by both speakers:
- Odia has two forms: written and spoken. AI/ML work on spoken Odia has not started yet, so the priority now should be written Odia.
- NLP for Odia is just getting started, and there are vast opportunities in every area of Odia NLP.
- Odia needs four major things for NLP:
    - Standard font
        - The Unicode font was not designed in consultation with Odia linguists.
        - Its makers took shortcuts and did not standardize the font.
        - ASCII-based fonts produce better Odia text.
        - Every algorithm for Odia will perform worse than its English counterpart until this font issue is resolved.
        - We need to learn these encodings.
        - ML has two aspects here:
            - Data normalization
            - Data cleaning
    - Spelling rules
        - The first Odia spell checker was built in 1990.
        - Even so, the Odisha Government has not been able to integrate it with further tools.
        - Spell checking needs to be standardized.
    - Dictionary
        - No good dictionary has been produced since the Purnachandra Bhashakosha.
        - A dictionary should be prepared using the methods of lexicography.
        - An exhaustive dictionary needs to be created.
        - Collecting data from newspapers and publishers makes preparing the dictionary easy.
    - Grammar
        - A standard descriptive grammar is needed.
        - The books available on the market are not proper Odia grammars.
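The data normalization and data cleaning points under the font item can be illustrated with a minimal Python sketch. It assumes the text is already in Unicode (text typed in legacy ASCII-based fonts would first need a font-specific mapping, which is not shown), uses the standard library's `unicodedata.normalize` with NFC, and treats anything outside the Oriya Unicode block (U+0B00 to U+0B7F), the danda marks (U+0964, U+0965), and whitespace as noise. The function names `normalize_odia` and `clean_odia` and the set of kept characters are illustrative assumptions, not a standard Odia pipeline.

```python
import re
import unicodedata

# Characters kept by clean_odia (an assumption for this sketch):
# the Oriya Unicode block U+0B00..U+0B7F, the danda and double danda
# (U+0964, U+0965) commonly used as Odia sentence punctuation, and whitespace.
# Everything else, including Latin letters and ASCII digits, is dropped.
_NON_ODIA = re.compile(r"[^\u0B00-\u0B7F\u0964\u0965\s]")


def normalize_odia(text: str) -> str:
    """Data normalization: map canonically equivalent sequences to one form (NFC)."""
    return unicodedata.normalize("NFC", text)


def clean_odia(text: str) -> str:
    """Data cleaning: normalize, replace non-Odia characters with spaces, collapse whitespace."""
    text = normalize_odia(text)
    text = _NON_ODIA.sub(" ", text)
    return " ".join(text.split())
```

For example, `normalize_odia` turns KA + vowel sign E + vowel sign AA (`"\u0B15\u0B47\u0B3E"`) into KA + vowel sign O (`"\u0B15\u0B4B"`), and it decomposes the precomposed letter RRA (`"\u0B5C"`) into DDA + nukta (`"\u0B21\u0B3C"`) because Unicode excludes RRA from canonical composition. Both ways of typing the same word then compare equal, which is the kind of consistency a standard encoding is meant to provide.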
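The spelling and dictionary points suggest a corpus-driven workflow: collect text from newspapers and publishers, build a word list from it, and check spellings against that list. The following is a minimal Python sketch under those assumptions. `build_wordlist`, `edit_distance`, `suggest`, the `min_count` threshold, and the Levenshtein-based suggestions are hypothetical choices for illustration. They are not the 1990 spell checker or any Odisha Government tool, and a real lexicographic dictionary needs far more than frequency counts (lemmas, senses, parts of speech).

```python
import unicodedata
from collections import Counter


def build_wordlist(corpus_lines, min_count=2):
    """Count NFC-normalized, whitespace-separated tokens and keep frequent ones.

    Assumption: tokens are split on whitespace only; punctuation handling
    and lemmatization are left out of this sketch.
    """
    counts = Counter()
    for line in corpus_lines:
        counts.update(unicodedata.normalize("NFC", line).split())
    return {word: n for word, n in counts.items() if n >= min_count}


def edit_distance(a, b):
    """Levenshtein distance over Unicode code points (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word, wordlist, max_dist=1):
    """Return [word] if known, else known words within max_dist, most frequent first."""
    word = unicodedata.normalize("NFC", word)
    if word in wordlist:
        return [word]
    candidates = [w for w in wordlist if edit_distance(word, w) <= max_dist]
    return sorted(candidates, key=lambda w: (-wordlist[w], w))
```

Because the distance counts code points, one visible change in an Odia word (for example, forming a conjunct with the virama U+0B4D) can cost more than 1, so `max_dist` would need tuning on real data.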
Last update: 2023-03-27