Approximative Indexierungstechnik für historische deutsche Textvarianten [Approximate indexing technique for historical German text variants]

Heller, Markus

Citation note

Please always refer to the following Persistent Identifier (PID) when citing this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-49988

[Journal article]

Abstract

Historical documents have specific properties that make them difficult for traditional information retrieval techniques. The lack of a standardized orthography and a generally high degree of variation in the phonetic-graphemic representation, as well as in derivational morphology, make it hard to find documents when a modern word is entered as the search term. This paper gives an overview of existing approximate string matching techniques as used in bioinformatics, as well as of phonetic approximation algorithms. It proposes an architecture that combines both approaches, using Jörg Michael's phonet program to derive a phonetic representation from graphemes and a Levenshtein automaton to allow fast approximate matching. The final part of the paper evaluates the suitability of the approach, using the Levenshtein algorithm in its non-automaton-based implementation.
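The combination described in the abstract (reduce each word to a phonetic key, then match keys approximately by edit distance) can be illustrated with a minimal sketch. It makes several assumptions: the rewrite rules in `TOY_RULES` are hypothetical stand-ins and are not the rule set of the phonet program, the function names are invented for illustration, and the distance is computed with the plain dynamic-programming Levenshtein algorithm rather than a Levenshtein automaton.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming table (no automaton)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # deleting i characters of a yields the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical toy rewrite rules, applied in order. NOT the phonet rules.
TOY_RULES = [("th", "t"), ("ph", "f"), ("ey", "ei"), ("y", "i"), ("v", "f")]


def toy_phonetic_key(word: str) -> str:
    """Map a spelling variant to a crude phonetic key (illustrative only)."""
    key = word.lower()
    for old, new in TOY_RULES:
        key = key.replace(old, new)
    return key


def approximate_lookup(query: str, index_terms: List[str],
                       max_distance: int = 1) -> List[str]:
    """Return the index terms whose phonetic key lies within max_distance of the query's key."""
    query_key = toy_phonetic_key(query)
    return [term for term in index_terms
            if levenshtein(query_key, toy_phonetic_key(term)) <= max_distance]
```

With these toy rules, a modern search term such as "Teil" would match the historical spellings "Theyl" (identical key) and "Tayl" (key at distance 1), but not "Haus". A real system would precompute the keys for the whole index instead of recomputing them per query.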

Thesaurus keywords
phonetics; data documentation; information retrieval; text

Classification
Information management, information processes, information economics

Document language
German

Publication year
2006

Page reference
pp. 288-307

Journal title
Historical Social Research, 31 (2006) 3

DOI
https://doi.org/10.12759/hsr.31.2006.3.288-307

ISSN
0172-6404

Status
Published version; peer reviewed

License
Creative Commons - Attribution 4.0