Prospective automated hierarchical classification of digitized documents


Jovana Kovačević, Jelena Graovac




This paper proposes a method for hierarchical classification of digitized documents in the NCD digital library. The classification model implements the Structured Support Vector Machine (SSVM) method, which has shown excellent performance on the Ebart corpus of documents in the Serbian language. We describe the developed model and its results on the Ebart dataset, suggest two types of class hierarchies for the NCD library based on its content, and define a protocol for applying the method to digitized documents.