Hathi Trust: Centuries of Information at Your Fingertips

From the first days of the computer age, there has been a dream of unlimited, immediate access to the information created by humanity over the centuries. Almost as soon as computers in remote locations could be connected, the first digital library was started by uploading the text of well-known documents into the system. The effort was named Project Gutenberg, in honor of the man who introduced the printing revolution to Europe, and its goal was to make the 10,000 most consulted books in the US available to the public at virtually no cost to users.

As technology progressed and the scanning of books became easier, other groups became interested, including Larry Page and Marissa Mayer of Google, who started a “secret ‘books’ project” around the turn of the twenty-first century. To have as much information as possible, Google needed as many books as possible. And to get books, Google staff went to the places that had the most of them: large universities and public libraries. After a few years of having their books scanned for the Google project, the universities of the Big Ten and the University of California System came together in 2008 to create a large-scale repository for all of the digitized content that came from their holdings; you could say it was an academic version of Google Books. This academic Google Books is known as the Hathi Trust, and it is the closest thing we have to that dream of free and immediate access to the centuries of information we have created.

Hathi (pronounced Hah-tee) is the Hindi word for elephant, a nod to the animal’s legendary memory. The resource is freely available to anyone with an internet connection and holds far more than the 10,000 most consulted books. In fact, as of March 2020, there were over 17.3 million individual volumes in the Hathi Trust. That is nearly 800 terabytes of information; in physical form, the books and documents would fill over 200 miles of shelves and weigh over 14,000 tons!

Of the 17 million+ volumes, 6.7 million are in the public domain, meaning that Hathi Trust can display those books on its website. The provisions of US copyright law are nothing if not complex, but in general, anything published before 1925 is in the public domain. So are materials published by the US Government, and so are titles published before 1978 without a notice of copyright. With these titles, Central Michigan University researchers can not only read the documents on the Hathi Trust website but also download each book in its entirety or download a selection of pages.
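
For readers who think in code, the three rules of thumb above can be written as a short Python check. This is only a rough sketch of the general rules as stated in this post: the `Volume` class and the `likely_public_domain` function are hypothetical names made up for illustration, not part of any Hathi Trust tool, and actual rights determinations weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical record holding only the facts the rules of thumb need."""
    year: int                          # year of publication
    us_government: bool = False        # published by the US Government?
    has_copyright_notice: bool = True  # was a copyright notice printed in it?


def likely_public_domain(volume: Volume) -> bool:
    """Apply the three general rules described above, in order."""
    if volume.year < 1925:
        return True   # rule 1: published before 1925
    if volume.us_government:
        return True   # rule 2: US Government publication
    if volume.year < 1978 and not volume.has_copyright_notice:
        return True   # rule 3: pre-1978 title with no copyright notice
    return False      # otherwise, assume it is still under copyright


print(likely_public_domain(Volume(year=1910)))
print(likely_public_domain(Volume(year=1995, us_government=True)))
print(likely_public_domain(Volume(year=1960, has_copyright_notice=False)))
print(likely_public_domain(Volume(year=1985)))
```

The first three sample volumes each satisfy one of the rules, while the last (a 1985 title with a copyright notice) satisfies none, so it would be searchable but not displayed.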

While many of the materials in Hathi Trust are not in the public domain, they are still useful. First, everything in Hathi Trust is being stored for long-term digital preservation, giving humanity a backup should the paper copies become inaccessible. Second, the data is incorporated into the big-data research projects Hathi Trust supports, such as statistical analyses of how the language used about a particular subject has evolved over time. Finally, and most importantly, materials that cannot be displayed via Hathi Trust’s access platform are still keyword searchable.

While having keyword access to documents one cannot view may seem odd, it is an extremely powerful tool. For instance, one can search every volume in Hathi Trust for a specific string of keywords and see exactly which books contain those search terms. This can narrow one’s universe of sources from hundreds or thousands down to dozens. Additionally, Hathi Trust can pinpoint the exact page on which the search terms were found, saving hours of research, especially when working with massive tomes.

For more information about the functionality of Hathi Trust and how you might be able to make use of it for your research, check out the embedded video above. If you have any questions, feel free to Ask a Librarian.

Hathi Trust is the closest realization we have of those dreams of a great library where all of humankind’s knowledge is housed in one place and freely accessible to us. As a member of Hathi Trust, Central Michigan University is proud not only to support this effort but also to make the most of what Hathi Trust has to offer our researchers.
