Digital Corpus of Literary Papyri (DCLP)
Final Report Abstract
By project end we had created ca. 15,000 EpiDoc files with metadata for nearly all published Greek and Latin literary papyri and had encoded just over 625 transcriptions of Greek papyri, totaling more than 105,000 lines of text. The original target of 1,100 texts was not met because we concentrated on longer works (an average of ca. 160 lines per papyrus) rather than on short fragments. Text encoding resulted from the combined efforts of project team members, partner projects, and volunteers from the papyrological community. Perhaps more important than these results, we also trained members of the papyrological and classics communities through training sessions and webinars (see Workshops and Outreach), with a view to ongoing data curation and the future viability of the project. A hallmark of the project was its success in drawing on existing human and IT capital: much effort went into retooling functioning technologies and ingesting existing data sets, and we benefited from the expertise of highly skilled doctoral students and professionals.

The platform offers the ability to search; to browse by papyrological series, author/work, or TM identifier; to add already published Greek and Latin texts; to emend existing transcriptions; and to create born-digital editions.

From a technical standpoint, the Heidelberg team, led by James Cowey and Carmen Lanz, accomplished the following objectives. It integrated the DCLP into a cloned version of papyri.info’s SoSOL editor, adopting the basic schemas for identifiers, translations, text, and metadata, and achieved navigator functionality through papyri.info’s number server, index search, and aggregated HTML pages. The original plan was to set up this clone at NYU, but initial attempts were unsuccessful and the installation was moved to a Linux machine in Heidelberg. In addition, Lanz and the team generated EpiDoc files from data supplied by the Leuven Database of Ancient Books and Herculaneum projects; these files contain descriptive information on ancient and modern bibliography, material, dates, and locations (a schematic example of such a file is sketched at the end of this abstract). The team provided new modules for collecting and storing metadata such as ancient authors and works, images, and modern bibliography. They set up development and production servers with backup and snapshot routines and a Maven artifact repository, transferred the existing components to a newer environment, and fixed compatibility issues. To ensure greater automation of data processing, they set up an eXist-db server for syncing data both within the text and metadata and across the DCLP dropdown menu, thereby allowing XPath requests on the underlying EpiDoc XML structures (an illustrative query of this kind also follows below). They also created documentation for the server and its procedures. Finally, Heidelberg’s technical team was responsible for the establishment of editor boards similar to those used by papyri.info.

As far as documentation and data preservation are concerned, GitHub has been our primary data repository (https://github.com/DCLP), home to the project’s various branches. GitHub also serves as the project’s issue-tracking service, where problems are reported and discussed. We used a wiki page hosted in Heidelberg to track agendas for the weekly team meetings conducted over Skype during the funded period (http://aquila.zaw.uni-heidelberg.de/paptrac/wiki/dclpRegSkypeMeetingsOld).
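
To give a concrete sense of the EpiDoc metadata files described above, the following is a minimal, purely illustrative sketch of how descriptive information (material, date, provenance, identifiers, bibliography) is commonly laid out in a TEI/EpiDoc header. The element choices follow general EpiDoc practice; the identifiers and values are invented for the example and are not drawn from the project’s actual templates.

<!-- Illustrative sketch only: a minimal TEI/EpiDoc header.
     All identifiers and values below are invented placeholders. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example literary papyrus</title>
      </titleStmt>
      <publicationStmt>
        <authority>Example project</authority>
        <idno type="TM">59999</idno>   <!-- Trismegistos number (placeholder) -->
        <idno type="LDAB">9999</idno>  <!-- LDAB number (placeholder) -->
      </publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <idno>Example inv. 1</idno>
          </msIdentifier>
          <physDesc>
            <objectDesc>
              <supportDesc>
                <support><material>papyrus</material></support>
              </supportDesc>
            </objectDesc>
          </physDesc>
          <history>
            <origin>
              <origDate notBefore="0100" notAfter="0199">II CE</origDate>
              <origPlace>Oxyrhynchos</origPlace>
            </origin>
          </history>
        </msDesc>
        <listBibl>
          <bibl>Modern edition (placeholder)</bibl>
        </listBibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="edition"><!-- transcription --></div>
    </body>
  </text>
</TEI>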
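
The eXist-db layer mentioned above makes such files queryable with XPath/XQuery. The following sketch shows the general shape of such a query, assuming a hypothetical collection path (/db/dclp) and the element choices from the sketch above; it is not the project’s actual database layout.

(: Illustrative XQuery against an eXist-db collection of EpiDoc files.
   The collection path and element/attribute names are assumptions. :)
declare namespace tei = "http://www.tei-c.org/ns/1.0";

for $t in collection('/db/dclp')//tei:TEI
let $tm    := string($t//tei:idno[@type = 'TM'][1])
let $place := string($t//tei:origPlace[1])
where $place = 'Oxyrhynchos'
return
  <hit tm="{$tm}" place="{$place}"/>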