DIGITARQ PROJECT
The Archive holds electronic finding aids, mainly in ISIS format but also in ACCESS, EXCEL and WORD. Together they represent a universe of c. 150.000 record descriptions across all aggregation levels (fonds, class, series, compound document and document). The other finding aids were on paper (typed, published or handwritten) and represented c. 75.000 record descriptions.
The first step was to "clean" the EAD DTD, making some permitted adaptations, e.g. to the way dates should appear or to the description levels. The second step was to develop transformers capable of importing all electronic data into an EAD XML tagged structure. We developed a specific transformer for each electronic format (ACCESS, WORD, EXCEL, ISIS).
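As a rough illustration of what such a transformer does, the sketch below maps one flat record, such as a row exported from a spreadsheet or database, to an EAD `<c>` component. It is not the project's code: the field names (`level`, `reference`, `title`, `date_normal`, `date_text`) and the sample values are hypothetical, and putting a machine-readable date in the `normal` attribute is an assumption about how the date adaptation might look.

```python
import xml.etree.ElementTree as ET


def record_to_ead_component(record: dict) -> ET.Element:
    """Map one flat record (hypothetical field names) to an EAD <c> component."""
    component = ET.Element("c", {"level": record["level"]})
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unitid").text = record["reference"]
    ET.SubElement(did, "unittitle").text = record["title"]
    # Assumption: the display date goes in the element text and a
    # machine-readable form goes in the "normal" attribute.
    unitdate = ET.SubElement(did, "unitdate", {"normal": record["date_normal"]})
    unitdate.text = record["date_text"]
    return component


# A made-up row, as it might come out of a spreadsheet or database export.
row = {
    "level": "series",
    "reference": "EXAMPLE/001",
    "title": "Example series",
    "date_normal": "1850/1900",
    "date_text": "1850-1900",
}

component = record_to_ead_component(row)
print(ET.tostring(component, encoding="unicode"))
```

A real transformer for each source format would differ mainly in how it reads the records; the mapping to EAD elements could stay the same.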
Meanwhile, a team of archivists annotated the existing finding aids with the appropriate EAD tags, in order to give feedback to the computer experts when the time came to import all the data. The same team carried out the OCR, revision and annotation of the paper finding aids.
In the end, all data (electronic and paper) were converted to EAD, resulting in c. 700 texts organized by fonds (each text corresponding to a fonds or records group). Stylesheets were developed to produce HTML from those texts, which allowed patrons to access the information and made the texts easier to revise.
After that, a database was developed into which all the data could be imported and which would also permit the production of further descriptions. The database has a relational architecture combined with a hierarchical XML structure; the user interface is articulated through two "lazy nodes", one of them interacting with XML and the other with SQL.
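One possible way to combine a relational table with hierarchical XML is sketched below, using an in-memory SQLite database. This is only an assumption about the general idea: the table name `description`, its columns, the sample rows and the `subtree` helper are all hypothetical, and the sketch does not attempt to model the "lazy nodes" of the interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row is one description unit; parent_id carries the hierarchy and
# ead_xml keeps the unit's own EAD fragment (hypothetical schema).
conn.execute(
    """CREATE TABLE description (
           id INTEGER PRIMARY KEY,
           parent_id INTEGER REFERENCES description(id),
           level TEXT NOT NULL,
           ead_xml TEXT NOT NULL
       )"""
)
rows = [
    (1, None, "fonds", "<did><unittitle>Example fonds</unittitle></did>"),
    (2, 1, "series", "<did><unittitle>Example series</unittitle></did>"),
    (3, 2, "item", "<did><unittitle>Example document</unittitle></did>"),
]
conn.executemany("INSERT INTO description VALUES (?, ?, ?, ?)", rows)


def subtree(connection, root_id):
    """Return (id, level, depth) for a unit and all of its descendants."""
    query = """
        WITH RECURSIVE tree(id, level, depth) AS (
            SELECT id, level, 0 FROM description WHERE id = ?
            UNION ALL
            SELECT d.id, d.level, tree.depth + 1
            FROM description AS d JOIN tree ON d.parent_id = tree.id
        )
        SELECT id, level, depth FROM tree ORDER BY depth, id
    """
    return connection.execute(query, (root_id,)).fetchall()


result = subtree(conn, 1)
print(result)
```

Here SQL answers structural questions (which units sit under a fonds), while the XML column preserves each unit's EAD content unchanged.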
The digital archive architecture is compliant with the following base documents:
The basic management unit is the digital object, which comprises images; these are in fact the atomic units of the GOD (the Portuguese acronym for digital object management). Every digital object must be produced in the context of a specific project, which is reflected in its name. The Id string for each digital object complies with the following scheme: [nameOfCUstodialEntity][YearOfProduction][ProjectNumber][DigitalObjectNumber]. The first, second and fourth blocks are generated automatically by the application, which requires the operator to enter the project's Id manually.
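The Id scheme can be sketched as a simple concatenation of the four blocks. The function below is a minimal illustration, not the application's actual code: the function name, the zero-padding width of the object number, and the sample values ("ARCH", "P01") are assumptions, since the source does not give the length or format of each block.

```python
def build_digital_object_id(
    entity: str, year: int, project_id: str, object_number: int, width: int = 5
) -> str:
    """Concatenate the four blocks of the Id scheme.

    entity, year and object_number stand for the blocks the application
    generates automatically; project_id is the block typed by the operator.
    The zero-padding width is an assumption, not part of the source scheme.
    """
    if not project_id:
        raise ValueError("the operator must supply the project's Id")
    return f"{entity}{year:04d}{project_id}{object_number:0{width}d}"


example_id = build_digital_object_id("ARCH", 2004, "P01", 42)
print(example_id)  # ARCH2004P0100042
```

Fixed-width blocks keep the Id parseable without separators, which is one reason padding is assumed here.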
The DO (digital object) exists in the virtual world and it corresponds to a description unit that exists in the real world. This description unit can be digitally replicated at discrete levels:
The search engine (pesquisa.adporto.org/pesquisa) was developed in a web environment and interacts with the description and DO databases.
The project ended at the beginning of May 2004. The next step is to develop an e-commerce interface and digitisation on demand in order to provide products to patrons remotely. This follow-up project, however, has not yet been budgeted.
| archival standards | metadata |
| image capture configurations | conversion tools |
| scanning hardware | scanning software |
| development software | digital archive |
The descriptions were stored in a data structure based on the ISAD description areas, into which all the suitable EAD elements were inserted:
To see the complete compiled scheme click here.
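The project's compiled scheme is not reproduced here. As a hedged illustration of the idea of grouping EAD elements under ISAD areas, the sketch below uses the commonly cited ISAD(G)-to-EAD correspondence; the selection of elements, the dictionary name `ISAD_AREAS` and the helper `area_of` are assumptions and may differ from the scheme the project actually compiled.

```python
from typing import Optional

# ISAD(G) description areas, each holding some EAD elements usually mapped
# to it. Illustrative subset only; not the project's compiled scheme.
ISAD_AREAS = {
    "identity_statement": ["unitid", "unittitle", "unitdate", "physdesc"],
    "context": ["origination", "bioghist", "custodhist", "acqinfo"],
    "content_and_structure": ["scopecontent", "appraisal", "accruals", "arrangement"],
    "conditions_of_access_and_use": [
        "accessrestrict", "userestrict", "langmaterial", "phystech", "otherfindaid",
    ],
    "allied_materials": ["originalsloc", "altformavail", "relatedmaterial", "bibliography"],
    "notes": ["note", "odd"],
    "description_control": ["processinfo"],
}


def area_of(ead_element: str) -> Optional[str]:
    """Return the ISAD area an EAD element is filed under, or None if unmapped."""
    for area, elements in ISAD_AREAS.items():
        if ead_element in elements:
            return area
    return None


print(area_of("scopecontent"))  # content_and_structure
```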
Other metadata schemes were consulted:
From benchmarking tests made during previous digitisation projects we had defined "digitisation profiles", which consist of capture parameters obtained for a specific document typology. These profiles must be considered according to the capture hardware and software used. The typologies (or groups) were defined according to the physical features of the documents.
The profiles are intended to obtain matrix (master) images with maximum archival quality and were obtained according to the Cornell guidelines. (See also KENNEY, A. R.; CHAPMAN, S. - Digital Imaging for Libraries and Archives. Ithaca: Cornell University Library, 1996.)
The derivative configurations were obtained according to the NARA Guidelines for Digitizing Archival Materials (1998). We often used the TASI (Technical Advisory Service for Images) website as an information resource.
To see the profiles click here to download a pdf file.
Several ad hoc annotations were developed to allow automatic import into the EAD structure.
To see some statistics regarding OCR conversion please click here to download a pdf file.
Two main devices were used in order to (1) digitise the finding aids (which were afterwards submitted to an OCR process) and (2) digitise selected historical documents. The criteria for scanning this latter group of documents were based on conservation condition and rate of access by patrons.
Interface Capture drivers:
For image processing:
An application was built according to OAIS (Open Archival Information System), now ISO 14721:2003, and to the guidelines on digital preservation in the InterPARES Project deliverables.
Several normative and technical documents were used: