The biggest issue I am facing in collecting the parallel corpus is sentence alignment.
- The sentences are unordered: the first sentence in one language might correspond to the third sentence in the other language.
- second one
- Use a set of pre-translated word pairs to match sentences across the two languages.
- Use numbers that appear in both sentences of a candidate pair as matching anchors.
- Crawl the entire Purnachandra Bhashakosh and prepare a dictionary from it.
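
The word-pair and number ideas above can be combined into a simple scoring scheme. The following is only a minimal sketch under stated assumptions, not a tested pipeline: the function names (`tokens`, `numbers_in`, `score`, `align`), the whitespace tokenizer, the greedy one-to-one matching, and the toy sentences and dictionary entries are all made up for illustration. A real dictionary (for example, one prepared from the crawl) would replace `word_pairs`.

```python
import re
import string

# Punctuation stripped from token edges; the danda is an assumption for Indic text.
EDGE_PUNCT = string.punctuation + "।"

def tokens(sentence):
    """Lowercased whitespace tokens with edge punctuation removed."""
    stripped = (tok.strip(EDGE_PUNCT) for tok in sentence.lower().split())
    return {tok for tok in stripped if tok}

def numbers_in(sentence):
    """Integers found in a sentence. int() also accepts non-ASCII decimal
    digits, so numbers written in different digit scripts compare equal."""
    return {int(n) for n in re.findall(r"\d+", sentence)}

def score(src, tgt, word_pairs):
    """Count dictionary pairs and numbers shared by a candidate sentence pair."""
    src_tokens, tgt_tokens = tokens(src), tokens(tgt)
    dict_hits = sum(
        1 for s, t in word_pairs.items() if s in src_tokens and t in tgt_tokens
    )
    number_hits = len(numbers_in(src) & numbers_in(tgt))
    return dict_hits + number_hits

def align(src_sentences, tgt_sentences, word_pairs):
    """Greedy one-to-one alignment: take the best-scoring pairs first."""
    candidates = [
        (score(s, t, word_pairs), i, j)
        for i, s in enumerate(src_sentences)
        for j, t in enumerate(tgt_sentences)
    ]
    # Drop pairs with no evidence, then sort by descending score.
    candidates = [c for c in candidates if c[0] > 0]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_src, used_tgt, pairs = set(), set(), []
    for _, i, j in candidates:
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j))
    return sorted(pairs)

# Toy data, made up for illustration only.
word_pairs = {"house": "ghara", "water": "pani", "book": "bahi"}
src = ["The house was built in 1947.", "She drinks water.", "He reads a book."]
tgt = ["se bahi padhe", "ghara 1947 re tiari", "se pani pie"]

print(align(src, tgt, word_pairs))  # [(0, 1), (1, 2), (2, 0)]
```

Each returned tuple is `(source index, target index)`. Sentences with no shared dictionary entry or number stay unaligned, so coverage depends directly on the size of the dictionary.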
Last update: 2023-03-27