Dictionary Writing Systems

Adam Rambousek
Natural Language Processing Center
Faculty of Informatics, Masaryk University
xrambous@fi.muni.cz

Dictionaries

Introduction

Dictionaries
- digital formats
  - typesetting
  - electronic versions (CD, web)
  - further processing in language applications
- no widely used standard
- custom SGML, XML formats

Standards

TEI: Text Encoding Initiative
- founded in 1987
- guidelines for the digital encoding of literary and linguistic texts
- publishing TEI Guidelines (1st version in 1990)
- current version: P5 Guidelines, published in 2007
  - new support for multimedia, graphics, people and places data
  - interoperability with other XML formats
- modular, customizable
- TEI Lite: "designed to meet 90% of the needs of 90% of the TEI user community"
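
To make the TEI approach more concrete, here is a minimal sketch of what a dictionary entry could look like and how an application might read it. The element names (entry, form, orth, gramGrp, pos, sense, def) come from the dictionaries module of the TEI P5 Guidelines; the sample entry itself and the helper name summarize_entry are invented for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative entry; the headword and definition are made up.
TEI_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="dictionary">
  <form><orth>dictionary</orth></form>
  <gramGrp><pos>noun</pos></gramGrp>
  <sense n="1"><def>a reference work listing words and their meanings</def></sense>
</entry>
"""

# TEI P5 elements live in this XML namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def summarize_entry(xml_text):
    """Extract headword, part of speech and definitions from one TEI entry."""
    entry = ET.fromstring(xml_text)
    return {
        "headword": entry.findtext("tei:form/tei:orth", namespaces=NS),
        "pos": entry.findtext("tei:gramGrp/tei:pos", namespaces=NS),
        "definitions": [d.text for d in entry.findall("tei:sense/tei:def", NS)],
    }


print(summarize_entry(TEI_ENTRY))
```

Because the structure is explicit, the same entry can feed typesetting, an electronic edition, or a language application without re-parsing free text.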
XDXF: XML Dictionary Exchange Format
- founded in 2004
- to unify formats of freely available dictionaries
- supported by most open-source dictionary viewers
- over 300 bilingual and explanatory dictionaries

LMF: Lexical Markup Framework
- ISO 24613:2008
- developed since 2003, published in 2008
- "representing data in lexical databases used with monolingual and multilingual computer applications"
- main objective is machine processing and data exchange
- modular, extensible, connected with other ISO standards
- extended information: one XML tag with name and value

Dictionary Writing Systems

Systems covered: IDM DPS, iLEX, TLex, Mātāpuna, Glossword, DEB II

IDM DPS
- commercial, developed by IDM (France)
- "suitable for monolingual and bilingual dictionaries of any type and can also be used to produce a thesaurus, a biographical dictionary, a dictionary of quotations or an encyclopaedia"
- used by Oxford University Press and Cambridge University Press
- client-server application, both parts Windows only

IDM DPS: independent modules
- Central Database/Repository
  - documents stored in XML format, optionally with DTD
  - implemented in Oracle, MS SQL, or PostgreSQL
- Authoring XML Editor
  - client application for authors
  - shows the tree structure of the XML document and an XSLT-generated preview
  - documents can be downloaded for off-line editing
- Search Engine
  - custom query language, XML structure, regular expressions, lemmatization, ...
- Work Allocation and Workflow Manager
  - managing team work, batch editing
  - Microsoft Project import/export
- Proofing Tool
  - generate PDF for proofing
  - Adobe InDesign support for automatic corrections import

iLEX
- commercial, developed by Erlandsen Media Publishing (Denmark)
- current version 2.2, January 2010
- price: 400 to 7000 EUR
- developed in Java, supported on Windows, Linux and MacOS
- client-server or standalone application
- used by the Swedish Academy to compile Svenska Akademiens Ordbok

iLEX: features
- built-in XML database, own query language and XQuery
- settings stored as an XML document in a special database
- user accounts and permissions controlled by the database
- output defined by XSLT, XSL-FO or own format
- customizable user interface, selectable panels
  - document XML structure, editing form, entry preview, statistics, links, multimedia library, ...
- extensible with many plug-ins
  - spellcheck, lexicography, SmartEdit, workflow, changes tracking, ...
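
Both IDM DPS and iLEX are described above as supporting searches that combine XML structure with other criteria such as regular expressions. The following is only a rough sketch of that idea using Python's standard library; it does not reproduce either product's query language, and the element names (dictionary, entry, hw, pos, def) and the function name search are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-dictionary; element names are invented for illustration.
DICTIONARY = """
<dictionary>
  <entry><hw>lexicon</hw><pos>noun</pos><def>the vocabulary of a language</def></entry>
  <entry><hw>lexical</hw><pos>adjective</pos><def>relating to words</def></entry>
  <entry><hw>lemma</hw><pos>noun</pos><def>the canonical form of a word</def></entry>
</dictionary>
"""


def search(root, headword_pattern, pos=None):
    """Return headwords matching a regex, optionally restricted by part of speech."""
    # Structural filter: ElementTree supports a small XPath subset,
    # including child-text predicates such as entry[pos='noun'].
    path = "entry" if pos is None else f"entry[pos='{pos}']"
    regex = re.compile(headword_pattern)
    return [
        e.findtext("hw")
        for e in root.findall(path)
        if regex.search(e.findtext("hw", default=""))
    ]


root = ET.fromstring(DICTIONARY)
print(search(root, r"^lex"))             # regex only
print(search(root, r"^le", pos="noun"))  # regex plus structural constraint
```

A production system would typically run such queries inside the database (for example via XQuery) instead of loading all entries into memory.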
TLex (TshwaneLex)
- commercial, developed by TshwaneDJe (South Africa)
- current version 5, October 2010
- price: 150 to 1900 EUR
- supported on Windows and MacOS
- single user or client-server application
- important clients include Oxford University Press, Pearson/Longman, and Macmillan Publishers

TLex: features
- document structure set by DTD or graphic interface
- data stored in custom format
- import from XML or CSV, export to XML, HTML, MS Word, RTF, Adobe InDesign
- editing in forms or output preview
- localizable user interface
- multimedia support
- tracking changes and comparing documents of several users
- integrated corpus tools

Mātāpuna
- free, open-source, developed by Thinktank Consulting (New Zealand)
- not developed since 2005
- web application, programmed in Perl, data stored in PostgreSQL, export to XML
- designed for Maori dictionary project
- document structure can be changed only by source code modification
- workflow and statistics
- data checks (spellcheck, defining dictionary, ...)
- 2004 Computerworld Excellence Award for the Use of IT in Government

Glossword
- free, open-source, developed by Dmitry N. Shilnikov
- development canceled in 2009, revived in 2010 by Glossword.biz Team
- web application, programmed in PHP, data stored in MySQL
- import/export XML and CSV
- creating explanatory dictionaries, glossaries, references
- document structure can be changed only by source code modification
- multiple dictionaries with one installation
- built-in editor to customize output templates
- feedback form for each entry
- virtual keyboard with special characters

DEB II
- free, open-source, developed by NLP Center, FI MU
- data stored in Oracle Berkeley DB XML, programmed in Ruby
- client-server architecture
- libraries and modules for dictionary application development
  - data storage and manipulation
  - user and dictionary management, cooperation
  - import, export, backup
- client: GUI, web application, API
- more than 10 projects: dictionaries, ontologies, lexical databases
  - DEBDict: general dictionary browser
  - DEBVisDic: wordnet editor and browser
  - Praled: Czech lexical database editor
  - FaNUK: dictionary of family names in UK
  - ...
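
Several of the systems above (TLex, Glossword) list CSV and XML import/export among their features. As a rough illustration of what such a conversion involves, the sketch below turns a two-column CSV glossary into XML and back. It is not any product's actual format: the column names, the element names (glossary, entry, term, definition) and the function names are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical glossary data; real systems define their own columns.
CSV_DATA = (
    "term,definition\n"
    "lemma,the canonical form of a word\n"
    "gloss,a brief explanation of a word\n"
)


def csv_to_xml(csv_text):
    """Build a <glossary> tree with one <entry> per CSV row."""
    root = ET.Element("glossary")
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "term").text = row["term"]
        ET.SubElement(entry, "definition").text = row["definition"]
    return root


def xml_to_csv(root):
    """Serialize the <glossary> tree back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["term", "definition"])
    for entry in root.findall("entry"):
        writer.writerow([entry.findtext("term"), entry.findtext("definition")])
    return out.getvalue()


glossary = csv_to_xml(CSV_DATA)
print(ET.tostring(glossary, encoding="unicode"))
```

The flat CSV form only works for simple glossaries; richer entry structures (multiple senses, examples, cross-references) are the reason most of the systems listed here store data as XML.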