Closing the books on digital scanning

On the fourth floor of the Rawlings branch of the Pueblo City-County Library District is a large temperature-controlled vault. I stood in that vault wearing white gloves, trying not to breathe, and spent a day helping the resident archivist carefully peel apart newsprint dating back to the 1930s. The archives department holds original issues of the Pueblo Chieftain going back to the very start, in 1868. Besides the paper, there are dozens of rows of similarly irreplaceable books, historical documents, and photographs of the region. All of this material is slated to be digitized, but it could take a while... decades, in fact!

There has been a great surge of interest worldwide in digitizing printed material, partly for preservation and partly to make it widely available. An early pioneer in the field is Project Gutenberg. Founded in 1971 by Michael Hart, this collection of now over 50,000 books is the oldest digital library in the world. Newer projects, such as Google Books with over 25 million titles, are pushing the boundaries faster than ever before. On the ground, however, and in smaller local libraries, the push to digitize is a slow slog.

Even highly sophisticated scanning ventures face the same bottleneck: every single page must be turned, photographed, and turned again. There are more invasive techniques, such as cutting the spines off books and feeding the stack of loose pages into the hopper of a scanner-copier. YIKES! That is not an approach for historical documents or book collections, and especially not for the fragile, almost tissue-like newsprint that I was turning.

Closed Book Scanning

The MIT Media Lab recently demonstrated a technique that uses terahertz radio waves to penetrate the pages of a book, and a computational method to recreate an image of those pages. At terahertz frequencies (trillions of cycles per second), the radio waves act more like light in their ability to interact with solids and carry detailed information, but retain some of the penetrating power of radio. The big trick has been generating clean enough sources of this radiation, and then developing computer algorithms to resolve text information. The system is able to differentiate between pages in a stack by looking for the signature of the sliver-thin slice of air that separates them.
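For the curious, here is a toy sketch of that last idea: if each air gap between pages sends back a small echo, counting the echoes counts the pages. This is only an illustration, not the MIT team's algorithm. The function name `find_echo_peaks`, the made-up signal, and the assumption that each deeper echo comes back weaker are all mine.

```python
def find_echo_peaks(signal, threshold):
    """Return indices of local maxima in `signal` that exceed `threshold`."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            peaks.append(i)
    return peaks


# Synthetic "reflection" trace (hypothetical data): one echo per air gap,
# with each deeper echo assumed to be weaker than the one before it.
signal = [0.0] * 40
for page, idx in enumerate([5, 14, 23, 32]):
    signal[idx] = 1.0 * (0.6 ** page)

peaks = find_echo_peaks(signal, threshold=0.1)
print(len(peaks), "page boundaries found at samples", peaks)
```

Raising the threshold in this toy makes the deeper, fainter echoes drop out first, which gives a rough feel for why reading far down into a stack is the hard part.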

The current technique can only resolve the top twenty pages in a stack. The future, however, could see rows of books shuttling through a terahertz scanner on a conveyor belt, full text streaming out to the web. Though this technology will not be at your public library any time soon, it has the potential to digitize printed material on a truly industrial scale, and in a way that physically preserves the most delicate materials in a collection. This is good news for archivists around the world!

Also, I would recommend encrypting your handwritten letters, because they are now readable. ;->

Some links:

MIT Media Lab's article & video on its closed-book scanning research
Digitally unwrapping an ancient scroll
A paper about large scale change to digital libraries of books