Researcher: Timothy R. Tangherlini

A morphological analysis tool for Old Icelandic.

ICEMorph is a morphological analysis and look-up tool for the study of Old Norse-Icelandic, among the most morphonologically complex of the Germanic languages. The analysis tool is built with FM/Haskell, a functional programming approach, to handle this complexity, and machine learning algorithms improve the system's performance. The look-up tool is based on two of the most important Old Icelandic-English dictionaries: Cleasby and Vigfússon's An Icelandic-English Dictionary (1874) and Zoëga's Old Icelandic Dictionary (1910).

ICEMorph seeks to extend earlier work on Old Icelandic/Old Norse morphology and texts, and to complement current efforts to produce corpora of accurately morpho-syntactically tagged Old Norse-Icelandic text, such as the Medieval Nordic Text Archive (MENOTA) and the Icelandic Parsed Historical Corpus (IcePaHC).

We have developed an accurate morphological analyzer that includes nearly the entire Old Icelandic lexical set (including the poetic vocabulary). A feature of this system is the use of expert feedback to rapidly increase the accuracy of morphological analysis.

There are two interrelated tools in ICEMorph: a morphological analysis and look-up environment that uses the Fornaldarsögur Norðrlanda (Guðni Jónsson 1943-1944) text corpus, and a dictionary look-up and browsing tool based on Cleasby and Vigfússon (1874) and Zoëga (1910).

The user of the morphological analysis tool can discover, for any word form encountered in the text, the form’s “lemma” or dictionary headword, grammatical information about the word including part-of-speech (POS) and inflectional class, a dictionary definition in English, and an inflectional paradigm for that word, including all of the word’s known forms.

The user of the dictionary tool can search for a dictionary headword or for any inflected form of a headword. The tool returns the headword (or, in cases of ambiguity, the candidate headwords), POS information, English-language definitions, and a table showing the inflection of the word.
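The form-to-headword look-up described above can be illustrated with a minimal sketch. It is written in Python for readability and is not the project's FM/Haskell implementation; the toy lexicon, the tag names, and the functions `build_index`, `lookup`, and `headwords` are all hypothetical and serve only to show how an index from inflected forms to headwords can surface ambiguity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str  # dictionary headword
    pos: str    # part of speech
    tag: str    # inflectional slot, e.g. "acc.pl"
    gloss: str  # short English definition


# Toy lexicon (illustrative only, not ICEMorph data):
# (lemma, POS, gloss) -> {inflectional slot: word form}
LEXICON = {
    ("hestr", "noun (m.)", "horse"): {
        "nom.sg": "hestr", "acc.sg": "hest", "dat.sg": "hesti", "gen.sg": "hests",
        "nom.pl": "hestar", "acc.pl": "hesta", "dat.pl": "hestum", "gen.pl": "hesta",
    },
    ("á", "noun (f.)", "river"): {"nom.sg": "á"},
    ("á", "prep.", "on, upon"): {"uninflected": "á"},
    ("eiga", "verb", "to own"): {"inf": "eiga", "pres.ind.1sg": "á", "pres.ind.3sg": "á"},
}


def build_index(lexicon):
    """Invert the paradigms: map every word form to all of its analyses."""
    index = defaultdict(list)
    for (lemma, pos, gloss), paradigm in lexicon.items():
        for tag, form in paradigm.items():
            index[form].append(Analysis(lemma, pos, tag, gloss))
    return dict(index)


INDEX = build_index(LEXICON)


def lookup(form):
    """Return every known analysis of a word form (empty list if unknown)."""
    return INDEX.get(form, [])


def headwords(form):
    """Return the distinct (headword, POS) candidates for a word form."""
    return sorted({(a.lemma, a.pos) for a in lookup(form)})


if __name__ == "__main__":
    for analysis in lookup("hesta"):
        print(analysis)
    print(headwords("á"))
```

In this toy data, the form "hesta" resolves to a single headword with two possible inflectional slots, while "á" yields three candidate headwords (a noun, a preposition, and the verb eiga), which is the kind of ambiguity a look-up tool has to report rather than hide.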

An important feature of ICEMorph is its dynamic nature: when experts submit feedback that corrects or extends the analysis of particular word forms, these corrections cascade through the system during nightly updates. ICEMorph also allows users to download the fully tagged corpus for use in other research environments.
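As a rough illustration of how a batch of expert corrections might propagate through a tagged corpus, consider the sketch below. It is an assumption about the general idea only, not a description of ICEMorph's actual update process; the `Token` and `Correction` records and the `apply_corrections` function are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    form: str   # word form as it appears in the text
    lemma: str  # headword currently assigned by the analyzer
    tag: str    # grammatical tag currently assigned


@dataclass(frozen=True)
class Correction:
    form: str       # word form the expert reviewed
    old_lemma: str  # analysis judged to be wrong
    new_lemma: str  # corrected headword
    new_tag: str    # corrected grammatical tag


def apply_corrections(corpus, corrections):
    """Apply a batch of corrections to every matching token in the corpus.

    Returns the updated token list and the number of tokens changed.
    """
    by_key = {(c.form, c.old_lemma): c for c in corrections}
    updated, changed = [], 0
    for token in corpus:
        correction = by_key.get((token.form, token.lemma))
        if correction is None:
            updated.append(token)
        else:
            updated.append(
                replace(token, lemma=correction.new_lemma, tag=correction.new_tag)
            )
            changed += 1
    return updated, changed


# Hypothetical example: one expert correction fixes every matching token.
corpus = [
    Token("hesta", "hesta", "unknown"),
    Token("hestr", "hestr", "noun.m"),
    Token("hesta", "hesta", "unknown"),
]
batch = [Correction("hesta", "hesta", "hestr", "noun.m")]
updated_corpus, n_changed = apply_corrections(corpus, batch)
```

The point of the sketch is that a single expert judgment is stored once and then applied to every occurrence it matches, so one correction can improve many tokens in the next update.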

ICEMorph and the tagged corpus are covered by the Creative Commons Open Archives non-commercial “by” license. The current citation for ICEMorph and the tagged corpus is:

Tangherlini, Timothy R., Kryztof Urban, Aurelijus Vijunas. 2013. ICEMorph: An Automated Morphological Analysis System and Dictionary Look-up Tool for Old Icelandic.

A long-term goal of the project is to integrate the morphological analysis and look-up tool with both normalized texts and diplomatic transcriptions of manuscripts in a rich study environment. This environment will allow for complex visualizations of textual relationships, as well as sophisticated word searches and study tools that are not limited to form-only searches, such as those available in Perseus. Automated normalization routines will increase the scope of texts available to the system, while assisted disambiguation routines will increase the power of word study tools. In this study environment, manuscript transcriptions will be coupled to images of the manuscripts on which they are based, and the scholarly community will be able to engage in ongoing annotation of, and lively debate over, the work, including discussion of how to disambiguate difficult forms.

