Citation Suggestion
Please use the following Persistent Identifier (PID) to cite this document:
https://nbn-resolving.org/urn:nbn:de:0168-ssoar-339398
Creating an Annotated Corpus for Sentiment Analysis of German Product Reviews
[research report]
Corporate Editor
GESIS - Leibniz-Institut für Sozialwissenschaften
Abstract
The availability of annotated data is an important prerequisite for the development of machine learning
algorithms for sentiment analysis. However, as manually labeling large datasets is time-consuming
and expensive, few datasets are available and most of them represent a small sample of a very narrow
domain, e.g. movie reviews or reviews of a certain product type. Additionally, many annotated datasets
are available for English texts only. However, the influence of different characteristics of the input
dataset on the performance of algorithms for sentiment analysis remains unclear if only training data
from one specific domain is available or if specific domains are mixed in the test corpus. We therefore
introduce a new dataset for German product reviews of various product types and investigate whether
even small variances in this specific domain (different product types) already exhibit different characteristics,
e.g. with regard to the difficulty of sentiment annotation. The annotation of this corpus lays
the basis for future enhanced annotations of similar corpora and for the extension of our annotations
to corpora of inherently different domains. These will then serve to investigate the influence of different
corpus characteristics on different algorithms for sentiment analysis and as a basis to apply machine
learning methods for sentence-wise sentiment analysis for German texts.
Classification
Natural Science and Engineering, Applied Sciences
Document language
English
Publication Year
2013
City
Mannheim
Page/Pages
16 p.
Series
GESIS-Technical Reports, 2013/05
ISSN
1868-9051
Status
Published Version; reviewed
Licence
Deposit Licence - No Redistribution, No Modifications