Since phonograph records are quickly disappearing from circulation, it is extremely important that libraries and archives around the world start preserving these significant cultural artifacts of the 20th century and making them more accessible.
Our team carries out research around a common theme: to create an efficient and economical workflow management system for digitizing the wealth of historical and cultural heritage material that exists in analogue formats. At present, we focus on the digitization of phonograph records.
As a case study, this project digitizes a unique set of approximately 3000 jazz recordings from the 1940s and 1950s in 78 rpm format. For more information about this research project, read the project description and the project abstract to the FQRSC (Fonds de Recherche sur la Societe et la Culture, Etablissement de nouveaux-chercheurs volet individual) award on digitization of phonograph records with automatic generation of text and metadata.
This pilot project digitizes approximately 500 LP recordings from David Edelberg's Handel collection housed in the McGill Music Library. The implementation phase of the project started in the spring of 2004. We developed a metadata schema for describing phonograph records (presented here in a relational database design), created to the finest level of granularity possible. A web data entry form based on this schema was implemented to facilitate the encoding of the data and metadata of the phonograph records. Then a database model was designed and implemented to hold the content of the digitized material. A workflow management system was drafted in June 2004 and finalized a month later. Digitization started in July 2004. Two digitizers were hired and trained to digitize audio, scan album covers and any accompanying materials, and enter data and metadata into a database. In the span of 16 working days, 14 albums were digitized in their entirety.
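To illustrate what a fine-grained relational design for phonograph records might look like, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the project's actual schema: the table names (`album`, `side`, `track`), their columns, and the helper functions are all hypothetical, chosen only to show how an album can be broken down into sides and tracks.

```python
import sqlite3

# Hypothetical schema for illustration only; not the project's real design.
SCHEMA = """
CREATE TABLE album (
    album_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE side (
    side_id  INTEGER PRIMARY KEY,
    album_id INTEGER NOT NULL REFERENCES album(album_id),
    label    TEXT NOT NULL            -- e.g. "A" or "B"
);
CREATE TABLE track (
    track_id INTEGER PRIMARY KEY,
    side_id  INTEGER NOT NULL REFERENCES side(side_id),
    position INTEGER NOT NULL,        -- order of the track on the side
    title    TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_album(conn, title, tracks_by_side):
    """Insert an album; tracks_by_side maps a side label to track titles."""
    album_id = conn.execute(
        "INSERT INTO album (title) VALUES (?)", (title,)
    ).lastrowid
    for label, titles in tracks_by_side.items():
        side_id = conn.execute(
            "INSERT INTO side (album_id, label) VALUES (?, ?)",
            (album_id, label),
        ).lastrowid
        for position, track_title in enumerate(titles, start=1):
            conn.execute(
                "INSERT INTO track (side_id, position, title) VALUES (?, ?, ?)",
                (side_id, position, track_title),
            )
    conn.commit()
    return album_id


def count_tracks(conn, album_id):
    """Count all tracks of an album by joining tracks to their sides."""
    row = conn.execute(
        "SELECT COUNT(*) FROM track JOIN side USING (side_id) "
        "WHERE side.album_id = ?",
        (album_id,),
    ).fetchone()
    return row[0]
```

Storing sides and tracks in their own tables, rather than as free text on the album record, is one way to let each level carry its own metadata and be queried independently.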
To avoid the expense of reconverting when technology requires or can use a richer digital file, we match the current technology to the task so we can fulfill our mission while ensuring the best use of resources. For audio capture, we digitize at a sampling rate of 96 kHz and a sample size of 24 bits in two channels (stereo). For image scanning, we produce all digital images at 1200 dpi and 24-bit color. At this point, we are not using any automated means of noise reduction on digital copies of analog recordings or applying any automated image processing to scanned images. We will undertake audio and image quality enhancement in the future, working from the digitized files.
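These capture settings have a direct effect on storage needs. The following sketch estimates uncompressed file sizes from the parameters stated above (96 kHz, 24 bits, stereo; 1200 dpi, 24-bit color). The function names are illustrative, the 12 x 12 inch cover size is an assumption rather than a project figure, and file-format headers and any compression are ignored.

```python
def audio_bytes(duration_s, sample_rate_hz=96_000, bits_per_sample=24, channels=2):
    """Uncompressed PCM size in bytes for a recording of the given length."""
    bytes_per_sample = bits_per_sample // 8
    return round(duration_s * sample_rate_hz * bytes_per_sample * channels)


def image_bytes(width_in, height_in, dpi=1200, bits_per_pixel=24):
    """Uncompressed bitmap size in bytes for a scan of the given dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * (bits_per_pixel // 8)


# One minute of audio at the project's capture settings.
one_minute = audio_bytes(60)          # 34,560,000 bytes (about 34.6 MB)

# Assumed 12 x 12 inch album cover (an illustrative size, not a project figure).
cover_scan = image_bytes(12, 12)      # 622,080,000 bytes (about 622 MB)
```

At these rates, audio takes 576,000 bytes per second before compression, which helps explain why disk usage is tracked per album.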
A preliminary web site for the digitization of the David Edelberg Handel Collection was launched in October 2004. Disk usage statistics by album for the Edelberg Handel Collection digitization project are also available. The Challenges in Developing Digital Collections of Phonograph Records was presented at the 5th Joint Conference on Digital Libraries (JCDL) in Denver, Colorado. The presentation is available in PowerPoint. A poster on Metadata for Phonograph Records: Facilitating New Forms of Use and Access to Analog Sound Recordings, available in JPG (~3MB), was also presented at JCDL.
For more information about this research project, read the Preliminary Project Proposal to the Richard M. Tomlinson Digital Library Innovation Awards on the digitization of the Edelberg Handel collection.
To implement the specialized document analysis required for this project, several open-source software packages are used.
|Gamera||An optical character recognition application|
|Gamera is a toolkit for the creation of domain-specific structured document recognition applications by expert users.|
|Greenstone||A suite for building and distributing digital library collections|
|Greenstone is used to create a database of audio files, text files, and metadata, which will be searchable and accessible via the web.|
|Metadata Standards||Semantic and descriptive vocabulary for describing an object or resource|
|Metadata standards are compiled to ensure consistent names in metadata to aid in the optical character recognition step and enable automatic generation of text and metadata.|