Twelve million images spanning half a millennium are set to be published on the image-sharing site Flickr for free use. As the BBC reported, the trove of images is the result of a project initiated by American academic Kalev Leetaru, which has already published 2.6 million copyright-free photographs, images, and illustrations in a collection he has called the Internet Archive Book Images. All are extensively tagged, allowing users to find relevant images in a snap.
To create the archive, Leetaru reverse-engineered the process the Internet Archive used to digitize 600 million pages of text from libraries around the world. The original code extracted all the text from scanned book pages to build a searchable database of PDF-based documents. Leetaru used the same pages but flipped the code so that only the images, their captions, and the paragraphs immediately before and after each image were kept. Each image was then converted to an individual JPEG and uploaded to Flickr.
The photos and illustrations date back as far as 1500 but stop at 1922 because of copyright restrictions. “Most of the images that are in the books are not in any of the art galleries of the world – the original copies have long ago been lost,” Leetaru told the BBC.
Leetaru is keen to see the archive expand and plans to release his code publicly so that other libraries and organizations can process the books in their own collections. He also hopes that organizations such as Wikipedia will use the archive to enrich their own platforms. “Take a random page about a historical event and there’s probably a good chance that you’re going to find an image in here that bears in some way on that event or location,” he told the BBC. “Being able to basically enrich [them] would be huge.”