Data Mining - Revealing the Sound Recordings Metadata Meaning


Vesna Aleksandrović, Ivan Pešić




What brings together a bibliographic record (its format and content), labels, names, 78 rpm gramophone records, matrix numbers, roles, and subjects? In the context of a digital library and information accessibility, the answer is certainly metadata. Every librarian knows the value of information. Things work well when there is information to offer and to receive. The problem arises when information is scarce or almost nonexistent. Here and now, we hold in our hands valuable and precious words, tones, tunes, information, and voices of the past, recorded on one of the earliest sound carriers, the 78 rpm gramophone record, which represents part of the cultural heritage of mankind. Besides its audio data, every gramophone record has a story behind it, one that can hardly be found even in the old and dusty catalogues of gramophone record publishing houses, in archives, or in the memories of the unique but informal union of 78 rpm gramophone record collectors and fans. Our obligation is to find, explore, and present these data, such as recording and publishing dates, locations, and matrix numbers; to resolve pseudonyms and initials; and to find out who composed the music or wrote the lyrics or libretto, along with many other details that vary from record to record. Furthermore, these discoveries have to be put into a user-friendly form and system and presented publicly. The topic of this paper is information extraction for the purpose of broader description and presentation of digital objects. The present COBISS2 platform (used in the National Library of Serbia) does not have a data export format suitable for this type of material. Therefore, we devised a process that extracts all necessary fields and subfields from a record, applies further processing to the data, and stores the result in an XML file. We also developed an XML schema for internal purposes, which is used in a MapForce mapping to represent the metadata in the final XML format. There is also a plan to create an additional mapping into a Dublin Core compatible format.
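
To make the extraction step more concrete, the following is a minimal sketch in Python of how fields and subfields might be pulled from a record and written to XML. It is not the process used by the authors. The plain-text record layout, the field tags and subfield codes, the output element names, and the helper names (`parse_record`, `record_to_xml`, `write_xml`) are all assumptions made for illustration, and the whitespace clean-up merely stands in for the "further processing" step.

```python
import xml.etree.ElementTree as ET

# Hypothetical record dump: one field per line, "TAG $code value $code value".
# Tags, subfield codes, and values are illustrative, not real COBISS output.
SAMPLE_RECORD = """\
200 $a Example title $f Example performer
210 $a Beograd $c Example label
071 $a MX-0001
"""

# Assumed selection of (tag, subfield code) pairs and their output element names.
WANTED = {
    ("200", "a"): "title",
    ("200", "f"): "performer",
    ("210", "a"): "place",
    ("210", "c"): "label",
    ("071", "a"): "matrixNumber",
}


def parse_record(text):
    """Yield (tag, subfield code, value) triples from the assumed layout.

    Simplification: a literal "$" inside a value is not supported.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, rest = line.partition(" ")
        # Everything before the first "$" is ignored; each later chunk
        # starts with a one-character subfield code.
        for chunk in rest.split("$")[1:]:
            chunk = chunk.strip()
            if not chunk:
                continue
            yield tag, chunk[0], chunk[1:].strip()


def record_to_xml(text):
    """Build a <record> element holding only the wanted subfields."""
    root = ET.Element("record")
    for tag, code, value in parse_record(text):
        name = WANTED.get((tag, code))
        if name and value:
            # Stand-in for further processing: collapse repeated whitespace.
            ET.SubElement(root, name).text = " ".join(value.split())
    return root


def write_xml(root, path):
    """Store the element tree in a UTF-8 encoded XML file."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# Show the resulting XML for the sample record.
print(ET.tostring(record_to_xml(SAMPLE_RECORD), encoding="unicode"))
```

Calling `write_xml(record_to_xml(SAMPLE_RECORD), "record.xml")` would store the same tree in a file. Keeping the selection of fields in a single lookup table (`WANTED`) means that adding or renaming an output element does not require touching the parsing code; a later mapping into a Dublin Core compatible format could, under the same assumptions, be expressed as a second table of this kind.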