English-Lithuanian-English Lexicon Database Management System for MT

Presentation Transcript

  1. English-Lithuanian-English Lexicon Database Management System for MT • Singleton Labs • Gintaras Barisevicius and Elvinas Cernys • Kaunas University of Technology, Department of Software Engineering

  2. Situation in Lithuania

  3. Situation in Lithuania • General electronic dictionaries (http://www.fotonija.lt; http://www.led.lt) • Morphological analysis tools • Text corpora (100 million words; http://donelaitis.vdu.lt) • Speech recognition systems • Machine translation research

  4. Previous dictionary • Dictionary open to the user • Rigid dictionary structure • Lack of attributes • Not all parts of speech included • Indexed files used for dictionary storage • Polysemy not included • Phrases not included

  5. Requirements for the new system • Dictionary open to the user • Easy management of attributes • All parts of speech • Large-volume storage • Handling of polysemy and synonyms

  6. Project size

  7. Current system • Oriented toward MT • More attributes, easy to extend • All parts of speech included • Database used for dictionary storage • Polysemous words, domains • Automatic generation of morphological forms • Runs on various operating systems

  8. Development process • From C++ to Java • Rational Rose tool • CVS for version control • MySQL database

  9. Adding new languages

  10. System deployment locally

  11. System deployment online

  12. Future ambitions • Phrases • Text corpora • Syntax rule implementation • Additional features • Possible other translation choices • Web translation • Video subtitle translation

  13. Text corpora usage in MT • Example: "The pen is on the table." • PEN → RASIKLIS • TABLE → STALAS or LENTELE; look up the corpus usage of each candidate with RASIKLIS • RASIKLIS is used more often with STALAS, so STALAS is chosen

  14. Conclusions • Thorough analysis of the Lithuanian and English languages conducted • Additional dictionary features have to be added (phrases, syntax rules) • Filling the dictionary can be started • Machine translation is underway

  15. Thank you for your attention. • Gintaras Barisevicius • gintaras.barisevicius@stud.ktu.lt • gintaras.barisevicius@singleton-labs.lt
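
The corpus-based choice between STALAS and LENTELE on slide 13, together with the polysemous entries mentioned on slide 7, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class `LexiconSketch`, its method names, the in-memory maps (standing in for the database) and the toy corpus are all hypothetical, and simple sentence-level co-occurrence counting is just one possible way to measure "usage with RASIKLIS".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names and data are hypothetical, not the authors' design. */
public class LexiconSketch {

    // Headword -> candidate translations (a polysemous headword has several).
    private final Map<String, List<String>> translations = new HashMap<>();

    // Unordered lemma pair -> number of recorded co-occurrences.
    private final Map<String, Integer> pairCounts = new HashMap<>();

    public void addTranslation(String headword, String translation) {
        translations.computeIfAbsent(headword, k -> new ArrayList<>()).add(translation);
    }

    public List<String> candidates(String headword) {
        return translations.getOrDefault(headword, new ArrayList<>());
    }

    // Order-independent key so that (a, b) and (b, a) share one counter.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Counts each pair of lemmas that appear together in one corpus sentence. */
    public void addCorpusSentence(List<String> lemmas) {
        for (int i = 0; i < lemmas.size(); i++) {
            for (int j = i + 1; j < lemmas.size(); j++) {
                pairCounts.merge(key(lemmas.get(i), lemmas.get(j)), 1, Integer::sum);
            }
        }
    }

    public int cooccurrences(String a, String b) {
        return pairCounts.getOrDefault(key(a, b), 0);
    }

    /**
     * Returns the candidate translation of headword that co-occurs most often
     * with contextLemma. Ties keep the first listed candidate; returns null
     * when the headword has no candidates.
     */
    public String choose(String headword, String contextLemma) {
        String best = null;
        int bestCount = -1;
        for (String candidate : candidates(headword)) {
            int n = cooccurrences(candidate, contextLemma);
            if (n > bestCount) {
                best = candidate;
                bestCount = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LexiconSketch lexicon = new LexiconSketch();
        lexicon.addTranslation("pen", "RASIKLIS");
        lexicon.addTranslation("table", "STALAS");
        lexicon.addTranslation("table", "LENTELE");

        // Invented toy corpus of lemmatized sentences (assumption, not real data).
        lexicon.addCorpusSentence(Arrays.asList("RASIKLIS", "STALAS"));
        lexicon.addCorpusSentence(Arrays.asList("RASIKLIS", "STALAS", "KNYGA"));
        lexicon.addCorpusSentence(Arrays.asList("RASIKLIS", "LENTELE"));

        System.out.println(lexicon.choose("table", "RASIKLIS")); // prints STALAS
    }
}
```

In this toy data RASIKLIS appears twice with STALAS and once with LENTELE, so STALAS is selected for "table". A real system would presumably query counts from the 100-million-word corpus and the database rather than from in-memory maps.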