Approximative Indexierungstechnik für historische deutsche Textvarianten [Approximate Indexing Technique for Historical German Text Variants]

[journal article]

Heller, Markus

Further Details
Abstract Historical documents have specific properties that make them hard for traditional information retrieval techniques to handle. The lack of a standardized orthography and a generally high degree of variation in phonetic-graphemic representation, as well as in derivational morphology, make it difficult to find documents when a modern word is entered as the search term. This paper gives an overview of existing approximate string matching techniques as used in bioinformatics, and of phonetic approximation algorithms. It proposes an architecture that combines both ideas, using Jörg Michael's phonet program to derive a phonetic representation from graphemes and a Levenshtein automaton to allow fast approximate matching. The final part of the paper evaluates the suitability of the approach using the Levenshtein algorithm in its non-automaton-based implementation.
Keywords phonetics; text; information retrieval; data documentation
Classification Information Management, Information Processes, Information Economics
Document language German
Publication Year 2006
Page/Pages p. 288-307
Journal Historical Social Research, 31 (2006) 3
ISSN 0172-6404
Status Published Version; peer reviewed
Licence Creative Commons - Attribution-Noncommercial-No Derivative Works
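
Illustration The two-stage idea described in the abstract (reduce each word to a phonetic key, then match keys approximately by edit distance) might be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: `TOY_RULES`, `toy_phonetic_key`, and `find_variants` are hypothetical names, and the few replacement rules only stand in for the much larger rule set of the phonet program. The distance function is the plain dynamic-programming form of the Levenshtein algorithm, not a Levenshtein automaton.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (non-automaton form)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical toy rules for illustration only; NOT the phonet rule set.
TOY_RULES = [("th", "t"), ("ey", "ei"), ("y", "i"), ("ck", "k")]


def toy_phonetic_key(word: str) -> str:
    """Map a spelling to a crude phonetic key by ordered replacements."""
    key = word.lower()
    for old, new in TOY_RULES:
        key = key.replace(old, new)
    return key


def find_variants(query: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return vocabulary words whose phonetic key is close to the query's key."""
    query_key = toy_phonetic_key(query)
    return [
        word for word in vocabulary
        if levenshtein(query_key, toy_phonetic_key(word)) <= max_distance
    ]


if __name__ == "__main__":
    # Modern search term matched against invented historical-style spellings.
    print(find_variants("Teil", ["theyl", "teyl", "tail", "haus"]))
```

With these toy rules, "theyl" and "teyl" both reduce to the key "teil" and match the modern query exactly, while "tail" is only admitted because of the edit-distance tolerance. A linear scan like this compares the query against every vocabulary entry, which is the cost a Levenshtein automaton is meant to avoid.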