Framework for Fuzzy Classification of Digitized Documents


Aleksandar Janjić




The classification of a text document with respect to a predefined set of classes is an assignment of one of the values 0 or 1 to each ordered pair (document, class), depending on whether the document belongs to the class or not. Fuzzy classification generalizes this notion by allowing membership to be expressed by any real number between 0 and 1. In this paper, we present one possible method of fuzzy classification that uses existing formulas for calculating the distance of a document from a class. As an illustration, we apply this method to form a fuzzy classification of a subset of documents from the Ebart-hier corpus. We then briefly describe the current state of the National Center for Digitization virtual library and show by example how fuzzy classification can be used to improve the organization of the Library's data and extend its querying possibilities.
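
The difference between the two notions of classification can be illustrated with a minimal sketch. It assumes that the distances of one document from each class have already been computed by some distance formula. The class names, the distance values, and the rule that turns a distance into a membership degree (one minus the distance divided by the largest distance) are hypothetical choices made only for illustration and are not the method developed in the paper.

```python
from typing import Dict


def crisp_membership(distances: Dict[str, float]) -> Dict[str, int]:
    """Classical classification: 1 for the nearest class, 0 for all others."""
    nearest = min(distances, key=distances.get)
    return {cls: int(cls == nearest) for cls in distances}


def fuzzy_membership(distances: Dict[str, float]) -> Dict[str, float]:
    """Fuzzy classification: a degree in [0, 1] for every class.

    Assumed rule (illustrative only): degree = 1 - d / d_max, so the
    farthest class gets degree 0 and a class at distance 0 gets degree 1.
    Distances are assumed to be non-negative.
    """
    d_max = max(distances.values())
    if d_max == 0:
        # The document is at distance 0 from every class.
        return {cls: 1.0 for cls in distances}
    return {cls: 1.0 - d / d_max for cls, d in distances.items()}


# Hypothetical distances of one document from three hypothetical classes.
example_distances = {"politics": 0.2, "economy": 0.5, "sport": 1.0}

crisp = crisp_membership(example_distances)
fuzzy = fuzzy_membership(example_distances)
```

Under these assumptions, the crisp assignment keeps only the nearest class, while the fuzzy assignment also records that the document is moderately related to the second class, which is the kind of information a query over the library could exploit.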