Uninterrupted Research: Advancing the Digitization of Archives

Example archival photo box. Source: http://pharosartresearch.org/

Finding the right image or researching a particular era, writer, or artist has become easier since the creation of PHAROS in 2013. The project is an initiative to consolidate the millions of often unpublished records held in art-historical photo archives, specifically to benefit scholars accustomed to digital research. Spearheaded by The Frick Collection, the consortium also includes other art-history paragons, among them the Getty Research Institute, National Gallery of Art, Yale Center for British Art, Courtauld Institute, Institut National d’Histoire de l’Art, and Kunsthistorisches Institut in Florenz.

Currently, researchers can use PHAROS to search 100,000 images of 60,000 artworks, finding details of a work’s provenance, exhibition history, and conservation history through a keyword search. The search can be narrowed with category filters, or run by uploading an image file that the database’s image-recognition technology matches against its holdings. Much like Google’s reverse image search, this means a researcher is no longer limited to formulating a query in words. By 2020, PHAROS hopes to digitize and publish 7 million images of the 25 million documents, photographs, and artifacts across 14 art institutions in the US and Europe. Pattern recognition technology is not new, but its applications in arts institutions are expanding. Bringing visual information into digital search helps broaden the narrative of cultural literacy, as Google Arts and Culture is working toward, and it also gives smaller collections autonomy in their digitization projects.
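To give a sense of how searching by image rather than by words can work, here is a minimal sketch of a perceptual "average hash." It is not a description of the PHAROS matching engine, whose internals are not covered here. The function names, the tiny 4x4 grayscale grids, and the archive entries are all hypothetical stand-ins for real digitized photographs.

```python
from typing import Dict, List

Grid = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Return a bit fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def closest_match(query: Grid, archive: Dict[str, Grid]) -> str:
    """Return the archive key whose fingerprint is nearest to the query's."""
    query_hash = average_hash(query)
    return min(
        archive,
        key=lambda name: hamming_distance(query_hash, average_hash(archive[name])),
    )


# Hypothetical 4x4 thumbnails standing in for digitized photographs.
ARCHIVE: Dict[str, Grid] = {
    "portrait": [[200, 200, 10, 10], [200, 200, 10, 10],
                 [10, 10, 10, 10], [10, 10, 10, 10]],
    "landscape": [[10, 10, 10, 10], [10, 10, 10, 10],
                  [200, 200, 200, 200], [200, 200, 200, 200]],
}

# A slightly brighter, noisier copy of "portrait", as an uploaded file might be.
UPLOAD: Grid = [[220, 210, 30, 25], [215, 205, 20, 30],
                [25, 30, 20, 25], [30, 20, 25, 30]]

print(closest_match(UPLOAD, ARCHIVE))
```

Because the fingerprint depends only on which pixels are brighter than the image's own average, a copy that is uniformly lighter or slightly noisy still lands close to the original. That tolerance is the property a reverse image search needs.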

Directory of digitized photos, available in a browser view and sortable by filter. Source: http://images.pharosartresearch.org/search

Rather than operating as an individual entity or digital museum that works alone as an innovation center, managing its collections and engaging its community internally, PHAROS draws on the expertise of a larger community of arts institutions.

Source: http://pharosartresearch.org/about

As a growing number of researchers, amateur and professional, look for archival resources, a larger market force is driving change in these environments. Digitization helps reduce the burden of this scale. In 2015, 2,500 researchers visited the physical card library at the Frick Collection, while over 75,000 from 91 countries searched online. In addition to its books and permanent collection, the Frick Art Reference Library contains thousands of photographs collected through expeditions to the world’s museums underwritten by Helen Clay Frick. These records also carry a century’s worth of notations from visiting scholars, dealers, and museum professionals who contributed observations on attribution and iconography. Prior to digitization, researchers had to invest the time and travel expenses to physically sift through these archives.

The Frick Art Reference Library, one of the best-known art research archives in America. Source: http://pharosartresearch.org/frick-art-reference-library

To address information inequity, PHAROS follows an international standard that corresponds to the refined objectives proposed by the International Council of Museums (ICOM). Driven by the museum community’s need for standardization, the standard was first published in 2006 and has since spread to other cultural heritage institutions. It provides a common reference point for reconciling divergent and incompatible sources of information, mediating and clarifying the semantics used by museums, libraries, and archives. Through these standards, PHAROS eventually hopes to house roughly 31 million digital images as linked open data on a collaborative site, ResearchSpace. With funding from the Samuel H. Kress Foundation in 2014, PHAROS collaborated with John Resig, Dean of Computer Science at Khan Academy, to develop its Art Research Database. In addition to showing visually similar images, the database reports any documentation accompanying a related image across all affiliated photo archives.


Once the database launched, affiliates of the consortium began digitizing and converting historical documents and photo archives under the standards of the Comité International pour la Documentation (CIDOC) and its Conceptual Reference Model (CRM), in order to provide cross-museum linked open data. Shortly after the technology was designed according to industry standards, it was distributed to affiliated institutions on the expectation that they would oversee its execution according to CIDOC’s standards. The model helps define an intellectual structure for cultural documentation in logical terms. As a formal ontology, CIDOC CRM is restricted to the semantics of a database’s schema: it does not prescribe terminology but describes how terms relate to one another.
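To make the idea of describing relationships rather than terminology more concrete, the following sketch restates one flat catalogue record as a handful of CRM-style statements. The class and property labels (E22 Man-Made Object, E12 Production, E21 Person, E38 Image, P14, P102, P108, P138) come from the CRM definition. Everything else is an assumption made for illustration and is not drawn from PHAROS data: the record fields, the identifiers, the `record_to_triples` helper, and the shortcut of attaching the title as a plain string rather than as its own E35 Title entity.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Express one flat catalogue record as CRM-style statements."""
    work = f"work/{record['id']}"
    production = f"production/{record['id']}"
    artist = f"person/{record['artist']}"
    photo = f"photo/{record['photo_id']}"
    return [
        (work, "rdf:type", "E22 Man-Made Object"),
        # Simplification: the full model links to an E35 Title entity.
        (work, "P102 has title", record["title"]),
        # "P108i" follows the common convention for the inverse of P108.
        (work, "P108i was produced by", production),
        (production, "rdf:type", "E12 Production"),
        (production, "P14 carried out by", artist),
        (artist, "rdf:type", "E21 Person"),
        (photo, "rdf:type", "E38 Image"),
        (photo, "P138 represents", work),
    ]


# Hypothetical record, as a local archive database might store it.
SAMPLE = {
    "id": "w-42",
    "title": "Madonna and Child",
    "artist": "unknown-florentine-painter",
    "photo_id": "p-001",
}

for triple in record_to_triples(SAMPLE):
    print(triple)
```

The archive keeps its own wording for titles and names. What the model fixes is the shape of the statements, so another institution's software can tell that a photograph represents a work and that the work was produced by someone.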

fig. 1: Possible data flow between different kinds of CRM-compatible systems and data structures; Definition of the CIDOC Conceptual Reference Model version 6.2.2 E.S.: In Progress since [25/1/2017]

The CRM sets out to guide these affiliates toward best practices by providing a common language and defining system requirements for maintaining the digital domains of cultural material. This formal language is instrumental in identifying common features and in implementing automatic data transformation, migration, and integration across systems. It also provides a global model of the base classes and their associations, which better prepares a user to formulate specific queries. However, the CRM’s logical forms are chiefly useful for identifying related data and are not a replacement for scholarly text.
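As a rough illustration of integration and querying across systems, the sketch below merges statements from two imagined archives and asks one question of the combined set. The property labels are taken from the CRM. The archive contents, the identifiers, and the `match` and `photos_of_works_by` helpers are hypothetical. The sketch also assumes both archives already use the same identifier for the same artwork, which in practice takes reconciliation work.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical statements published by two different photo archives.
ARCHIVE_A: List[Triple] = [
    ("photo/a-1", "P138 represents", "work/w-42"),
    ("work/w-42", "P108i was produced by", "production/w-42"),
    ("production/w-42", "P14 carried out by", "person/painter-x"),
]
ARCHIVE_B: List[Triple] = [
    ("photo/b-7", "P138 represents", "work/w-42"),
    ("photo/b-8", "P138 represents", "work/w-99"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    prop: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every statement that agrees with the fields that were given."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (obj is None or t[2] == obj)
    ]


def photos_of_works_by(triples: List[Triple], person: str) -> List[str]:
    """Follow person -> production -> work -> photograph."""
    productions = {s for s, _, _ in match(triples, prop="P14 carried out by", obj=person)}
    works = {
        s for s, _, o in match(triples, prop="P108i was produced by")
        if o in productions
    }
    return sorted(
        s for s, _, o in match(triples, prop="P138 represents") if o in works
    )


# Integration here is simply concatenation, because the shapes already agree.
merged = ARCHIVE_A + ARCHIVE_B
print(photos_of_works_by(merged, "person/painter-x"))
```

Only the first archive records who made the work, yet the query also finds the second archive's photograph of it. That is the kind of cross-collection discovery a shared model is meant to allow.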

A major drawback of the CRM is the need for consistency. Any project at this scale requires a large investment of time, experience, and creativity. Instead of passing this burden on to the archive’s visitors, PHAROS has distributed it across a community of professionals in the field. In actively demonstrating its commitment to universal standards, PHAROS is giving the collections it digitizes a competitive advantage over those that are less organized. Digitization provides a clean transition into systems that expedite research and discovery in cultural institutions. As a system, it becomes more valuable to its users with each additional contributor, widening the gap between participating institutions and those that do not take part. Ultimately, archival digitization is better for researchers, archivists, and the environment they operate in.


Aaron. “Search for images with reverse image search.” Google. 2018. Accessed April 24, 2018. https://support.google.com/websearch/answer/1325808?hl=en.

“About.” Pharos Consortium. 2016. Accessed April 24, 2018. http://pharosartresearch.org/about.

“About NEH.” National Endowment for the Humanities. Accessed April 24, 2018. https://www.neh.gov/about.

Bain, Marc. “Google has built a stunning, searchable archive of 3,000 years of world fashion.” Quartz. June 11, 2017. Accessed April 24, 2018. https://qz.com/1002651/google-has-built-a-stunning-searchable-archive-of-3000-years-of-world-fashion/.

Boucher, Brian. “All-Star Museums Team Up to Digitize 25 Million Images, Putting Art History Online.” ArtNet News. May 16, 2017. Accessed April 24, 2018. https://news.artnet.com/art-world/pharos-25-million-artworks-digitized-962210.

“CIDOC.” ICOM. 2010. Accessed April 24, 2018. http://network.icom.museum/cidoc/.

“Definition of the CIDOC Conceptual Reference Model.” ICOM/CIDOC CRM Special Interest Group. 2003. Version 6.2.3. October 2017. Accessed April 24, 2018. http://www.cidoc-crm.org/sites/default/files/cidoc_crm_version_5.0.4.pdf.

Droitcour, Brian & Smith, William S. “The Digitized Museum.” Art in America. September 1, 2016. Accessed April 24, 2018. https://www.artinamericamagazine.com/news-features/magazines/the-digitized-museum/.

“eMuseum Digital Publishing Software Helps You Share your Collection on the Web.” Gallery Systems Inc. 2018. Accessed April 24, 2018. https://www.gallerysystems.com/products-and-services/emuseum/.

“Frick Art Reference Library.” Pharos Consortium. 2016. Accessed April 24, 2018. http://pharosartresearch.org/frick-art-reference-library.

“Frick Art Reference Library.” The Frick Collection. 2018. Accessed April 24, 2018. https://www.frick.org/research/library.

“Google Arts and Culture.” Google. 2015. Accessed April 24, 2018. https://artsandculture.google.com/.

“Images.” Pharos Consortium. 2018. Accessed May 13, 2018. http://images.pharosartresearch.org/.

“Initiatives.” Pharos Consortium. 2016. Accessed May 13, 2018. http://pharosartresearch.org/initiatives.

“Institutions.” Pharos Consortium. 2016. Accessed May 13, 2018. http://pharosartresearch.org/institutions.

“ISO 21127:2014: Information and documentation -- A reference ontology for the interchange of cultural heritage information.” International Organization for Standardization. October 2014. Accessed April 24, 2018. https://www.iso.org/standard/57832.html.

“News.” Pharos Consortium. 2016. Accessed May 13, 2018. http://pharosartresearch.org/news.

“Other Projects.” The President and Fellows of Harvard College. 2018. Accessed April 24, 2018. https://projects.iq.harvard.edu/yenchinglib/galleries.

“Research.” The Courtauld Institute of Art. 2017. Accessed May 13, 2018. https://courtauld.ac.uk/research.

“Research.” Max-Planck-Gesellschaft, München. 2018. Accessed May 13, 2018. http://www.khi.fi.it/Department_Wolf.

“Research.” The National Gallery of Art. 2018. Accessed May 13, 2018. https://www.nga.gov/research.html.

“Research.” The Paul Mellon Centre for Studies in British Art. 2018. Accessed May 13, 2018. https://britishart.yale.edu/research.

“ResearchSpace.” Trustees of the British Museum. 2017. Accessed April 24, 2018. http://researchspace.org/.

Resig, John. “Using Computer Vision to Increase the Research Potential of Photo Archives.” John Resig. Accessed April 24, 2018. https://johnresig.com/research/computer-vision-photo-archives/.

“Short Intro: CIDOC CRM.” International Council of Museums. International Committee for Documentation. Accessed April 24, 2018. http://www.cidoc-crm.org/node/202.

“Study and Research.” Institut National d’Histoire de l’Art. 2018. Accessed May 13, 2018. https://www.inha.fr/en/research.html.

“The Getty Research Institute.” J. Paul Getty Trust. 2018. Accessed May 13, 2018. http://www.getty.edu/research/.

Tucker, Jennifer. “How facial recognition technology came to be.” Boston Globe. November 23, 2014. Accessed April 24, 2018. https://www.bostonglobe.com/ideas/2014/11/23/facial-recognition-technology-goes-way-back/CkWaxzozvFcveQ7kvdLHGI/story.html.

Weber, Jon. “Creative Crowdsourcing: How the Smithsonian Turned Data Entry into Engagement.” Arts Management and Technology Laboratory. February 26, 2018. Accessed April 24, 2018. http://amt-lab.org/blog/2018/2/crowdsourcing-archival-digitization-and-transcription-a-case-study-of-the-smithsonian-institutions-transcription-center.

Weber, Jon. “Digital Humanities: Library of Congress Labs Opens Collections for Productivity and Play.” Arts Management and Technology Laboratory. April 24, 2018. Accessed April 24, 2018. http://amt-lab.org/blog/2018/4/digital-humanities-library-of-congress-labs-opens-collections-for-productivity-and-play.

Wolf, Eric Michael & Gottlieb-Miller, Lauren. “The Small Easy: Budget-Neutral Digital Projects at Small Libraries.” Art Documentation: Journal of the Art Libraries Society of North America, 36:2. 332-344. 2017. https://www.journals.uchicago.edu/doi/abs/10.1086/688732.