Преглед НЦД 22 (2013), 55–66
Jakub Řihák, Kateřina Kamrádková
National Technical Library, Prague, Czech Republic
Abstract: This paper describes recent activities in the area of digitization processes at the National Technical Library, Prague (Czech Republic), in particular a new schema of the digitization workflow, its outputs and its impact on the services provided to the National Technical Library's users. Furthermore, it presents the Kramerius system and cooperation opportunities for Czech libraries in the eBooks on Demand project.
One of the significant activities of the National Technical Library (NTL) is the digitization of documents and their high-quality OCR (Optical Character Recognition) processing. NTL aims to give its users access to a variety of digitized documents in the best possible quality. Therefore, all processes within the NTL digitization workflow had to be optimized and improved.
In previous years, NTL focused on digitizing university textbooks from technical universities. These textbooks form a considerable part of NTL's library collection. NTL's priority was to digitize the most frequently borrowed university textbooks and newly published ones. For this purpose, NTL set up a dedicated workstation equipped with a document scanner and various software for image and document processing (e.g. Capture Perfect 3.0, Adobe Acrobat Professional, and Abbyy FineReader 10 for OCR processing).
At the same time, NTL began to participate actively in the eBooks on Demand - A European Library Network (EOD) project. NTL is one of the four libraries in the Czech Republic that are involved in the project. Other Czech libraries interested in an EOD partnership mostly hold rare book collections but lack the resources needed to join the EOD network (for financial reasons, or for lack of staff or hardware and software). NTL therefore offers these libraries an opportunity to cooperate and take part in the EOD project. In this way, such libraries can also offer their rare books as e-books. These e-books can be published in NTL's digital library Kramerius, online and free of charge.
NTL began to upgrade and automate the whole digitization process in 2011. Until then, the digitization process depended on manual labour and was time-consuming. To overcome these problems, the OCR process had to be automated. NTL bought a license for Abbyy Recognition Server 3.0 and began to integrate it into the digitization workflow. This software allows NTL to automate OCR processing in ways that were not possible with Abbyy FineReader 10. Recognition Server was set up on a virtual server, on which the Management Console runs. Six other workstations are currently connected to the server (each with 4 CPUs at 3.40 GHz and 8 GB RAM). Each processing station has one CPU designated for the OCR processing of documents and also serves as a workstation for library employees during the week. Thanks to the Recognition Server implementation, the time needed for OCR processing dropped to ¼ of the original time. Subsequent verification of the OCR outputs (text files) can run concurrently with OCR, without waiting for the whole process to finish. Verification can be done on any computer in the same network that is connected to the Manager Console on the server.
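The overlap between OCR and verification described here can be illustrated with a minimal Python sketch. It does not use the Recognition Server API: `run_ocr` and `verify` are hypothetical placeholders, and a thread pool with six workers merely stands in for the six processing stations.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

STATIONS = 6  # number of processing stations mentioned in the text


def run_ocr(document: str) -> str:
    """Hypothetical stand-in for OCR of one document; returns recognised text."""
    return f"text of {document}"


def verify(text: str) -> bool:
    """Hypothetical stand-in for the verification of one OCR output."""
    return text.startswith("text of ")


def process_batch(documents: list[str]) -> dict[str, bool]:
    """Run OCR in parallel and verify each output as soon as it is ready."""
    results: dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=STATIONS) as pool:
        # Map each pending OCR job back to the document it belongs to.
        futures = {pool.submit(run_ocr, doc): doc for doc in documents}
        # as_completed yields jobs in finishing order, so verification
        # starts without waiting for the rest of the batch.
        for future in as_completed(futures):
            results[futures[future]] = verify(future.result())
    return results
```

Calling `process_batch(["book-1", "book-2"])` returns a dictionary that maps each document name to its verification result.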
The next step was to upgrade NTL's digital library Kramerius. The Kramerius system serves as the main access point to digital documents at the National Technical Library as well as in various other Czech libraries. It is based on the Fedora repository software (which serves as the document repository), the Solr search platform, and a Java-based interface. Copyrighted digitized documents published in Kramerius can be accessed only in-house after authentication, while public domain documents can be accessed externally without restrictions. Further activities will focus on promoting the digital library services.
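The access rule for Kramerius documents can be written down as a small Python sketch. The `Document` class, the `can_access` function and their parameters are hypothetical and are not part of Kramerius. The sketch assumes that copyrighted documents require both an in-house connection and authentication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record for a digitized document."""
    title: str
    public_domain: bool


def can_access(doc: Document, in_house: bool, authenticated: bool) -> bool:
    """Return True if a user may open the document under the stated rule."""
    if doc.public_domain:
        # Public domain documents are open to everyone, including external users.
        return True
    # Assumption: copyrighted documents need an authenticated in-house user.
    return in_house and authenticated
```

For example, `can_access(Document("Textbook", public_domain=False), in_house=False, authenticated=True)` evaluates to `False`, because the user is not in-house.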
In connection with the activities described above, the need for significant changes in NTL's digitization workflow emerged. The new workflow had to be simple, understandable and helpful, so that all processes could run faster and deliver better quality. All of this had to be achieved with fewer staff, due to this year's budget cuts.
Keywords: digitization, digitization workflow, OCR processing, digital library