Deep Learning in the Closed Stacks: Possibilities for Archives Procedure Automation

Rotunda of the Smithsonian National Museum of Natural History. Image via Blake Patterson.

Lately, I’ve been doing a lot of reading about the origins of the “public museum”: an institution open to visitation by a general audience and sensitive to the societal needs of recreation, education, inspiration, and relevance. One of the largest categories of such museums, early on, was akin to a natural history museum: full of biological specimens, mineral samples, and other evidence of the marvels of the natural world. A major player within this particular subset of museums is the Smithsonian’s National Museum of Natural History. In fact, as of 2015, this institution was the third most-visited museum in the world.[1] It houses five million plant specimens in its Herbarium, and has recently launched a herculean project to digitize them all, along with their documentation, and put them online in a publicly searchable database.[2]

The result is a massive, mostly untapped dataset. Convinced that this collection could reveal big, important things when analyzed in the aggregate (as “big data”), the Smithsonian engaged data scientists to apply deep learning techniques to the digitized collection in order to automate sorting tasks. The published findings suggest that computers are well equipped to handle these sorts of tedious time sucks, which have normally been performed by human beings: in this case, separating mercury-stained specimens from unstained ones, and telling apart two physically similar yet distinct families of plants.

“The just-published findings are a striking proof of concept. Generated by a team of nine headed up by research botanist Eric Schuettpelz and data scientists Paul Frandsen and Rebecca Dikow, the study aims to answer two large-scale questions about machine learning and the herbarium. The first is how effective a trained neural network can be at sorting mercury-stained specimens from unsullied ones. The second, the highlight of the paper, is how effective such a network can be at differentiating members of two superficially similar families of plants—namely, the fern ally families Lycopodiaceae and Selaginellaceae.”[3]


The potential for this type of automated sorting seems pretty far-reaching. Archives, like the Herbarium’s collections, often require a great deal of physical processing and organization (by archives staff) and eventual comparison and sorting (by researchers). It is intriguing for me to speculate how the deep learning techniques employed at the Smithsonian might translate from a curatorial to an archival context. Could they be used to establish document authenticity or provenance? Might they make it easier to sort documents into artificial collections of datasets that reveal connections and new insights, without interrupting the physical and intellectual organization of archival collections?

1. Hetter, Katia. “And the World’s Top Museum Is…” CNN.com, 16 June 2016. http://www.cnn.com/travel/article/world-top-10-museums-2016/index.html

2. Smith, Ryan P. “How Artificial Intelligence Could Revolutionize Archival Museum Research.” Smithsonian.com, 3 November 2017. https://www.smithsonianmag.com/smithsonian-institution/how-artificial-intelligence-could-revolutionize-museum-research-180967065/

3. Ibid.
