Abstract

Text mining: Approaches and applications

Miloš Radovanović, Mirjana Ivanović

The field of text mining seeks to extract useful information from unstructured textual data through the identification and exploration of interesting patterns. The techniques employed usually do not involve deep linguistic analysis or parsing, but rely on simple “bag-of-words” text representations based on the vector space model. Several approaches to the identification of patterns are discussed, including dimensionality reduction, automated classification, and clustering. Pattern exploration is illustrated through two applications from our recent work: a classification-based Web meta-search engine and a visualization of coauthorship relationships automatically extracted from a semi-structured collection of documents describing researchers in the region of Vojvodina. Finally, preliminary results concerning the application of dimensionality reduction techniques to problems in sentiment classification are presented.