Mass Digitization    
 

An estimated 100 million or more books have been produced since Johannes Gutenberg introduced movable-type printing to Europe in the 15th century. A large part of this vast literature is now being converted to digital books and moved into the world of electronic publishing.

A significant effort is being made by a number of universities and by companies such as Google and (until recently) Microsoft to digitally capture the content of huge numbers of books. This mass digitization of the world's literature is made possible by agreements with major libraries around the world. Google reached a total of 1 million books scanned and digitized in early 2007. For newly published books, digital publishing has to some degree replaced this effort, but the vast body of literature was printed before 2010, when major publishers began to generate eBooks directly.

Google Book Search and Digitization of Books

A number of important libraries are helping Google's mass digitization of books. Initially a group of five libraries agreed to help Google: the New York Public Library, the libraries of Stanford University, Harvard University, and the University of Michigan, and the world-renowned Bodleian Library (founded in 1602) at the University of Oxford. Each of these libraries agreed (in December 2004) to allow Google to undertake mass digitization projects and to make their stock of books available in a digital library via Google Book Search (books.google.com). The combined result of the mass digitization of these five library collections will be a digital library containing about 30 million volumes.

The University of Virginia, a pioneer in digitizing public domain materials, has also joined the Google book digitization project, as has the University of California, the University of Wisconsin-Madison Libraries and the Wisconsin Historical Society Library. Google is conducting this mass digitization under the auspices of the Open Content Alliance (opencontentalliance.org).

Japan's Keio University Library was the first Japanese partner (and the 26th library overall) to join Google's mass digitization effort, the Google Books Library Project. At least 120,000 public domain books from the Keio University collections will be digitized.

Digitization of the Bavarian State Library Collection: German and Other European-Language Books

A very large and significant European-language library is being digitized in a partnership between Google and the Bayerische Staatsbibliothek (Bavarian State Library). German readers will be able to access the library's public domain works online through Google Book Search. The Bavarian State Library is one of Europe's most important and renowned international research libraries. Its partnership with the Google Books Library Project will see more than a million out-of-copyright books digitized. These range from well-loved German classics by the Brothers Grimm and Goethe to extensive collections previously available only to those able to consult the library's stacks. In addition to German-language works, the library's collection includes numerous out-of-copyright works in French, Spanish, Latin, Italian and English. Some of the works of the Bayerische Staatsbibliothek date back to the earliest days of book printing and carry great cultural significance.

Digitization of Libraries focused on Spanish Literature and Latin American History

Recently the University of Texas at Austin agreed to allow Google to mass digitize more than a million books in its libraries. Significantly, this includes the University of Texas' world-renowned Nettie Lee Benson Latin American Collection, which contains many important works from early Latin American history. Other Spanish literature collections which Google is progressively including in Google Book Search include those of the National Library of Catalonia, the Complutense University of Madrid, and the University of California.

British Library Online Project

In a separate deal, Microsoft has made arrangements with the British Library to mass digitize about 100,000 books. Microsoft scanned about 25 million pages from the British Library during 2006. In May 2008, Microsoft announced it was bringing its book search project to a close. By that date Microsoft said it had digitized 750,000 books and indexed 80 million journal articles. Microsoft said that books and scholarly publications will continue to be integrated into Microsoft Search results, but not through separate indexes.

Universal Digital Library

The Universal Digital Library (UDL) is an initiative by Carnegie Mellon University and a number of UDL partners in China, India, and Egypt to mass digitize human knowledge by capturing all books in digital format and making them freely available over the internet. The Universal Digital Library has a particular focus on digitizing (and storing safely) rare and unique books from around the world. When there is a question about a book's copyright, only 15 percent of the book is published online; the entire book is nevertheless scanned and archived by the Universal Digital Library.

An initial goal for the Universal Digital Library was the Million Book Digital Library project. The project's objective was to mass digitize over a million books to demonstrate the feasibility of the Universal Digital Library. This is less than 1% of all books in all languages ever published. (The total number of different titles indexed in OCLC's WorldCat is about 48 million.) The 1 million target was reached in April 2007, and by the end of 2007 over 1.5 million books had been digitized. A secondary objective of the Million Book Digital Library project is to provide a test bed that will support other researchers who are working on improved scanning techniques, improved optical character recognition, and improved indexing.
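The proportions mentioned above can be checked with a short calculation. The sketch below is illustrative only: it uses the 1 million target, the estimate of 100 million or more books given at the start of the article, and the WorldCat figure of about 48 million titles. The function name `percent_of` is a hypothetical helper, not part of any project's tooling.

```python
def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures taken from the text; all are rough estimates.
million_book_target = 1_000_000
books_ever_published = 100_000_000   # "100 million or more", so a lower bound
worldcat_titles = 48_000_000         # distinct titles indexed in WorldCat

share_of_all_books = percent_of(million_book_target, books_ever_published)
share_of_worldcat = percent_of(million_book_target, worldcat_titles)

print(f"Share of all books ever published: at most {share_of_all_books:.1f}%")
print(f"Share of WorldCat titles: about {share_of_worldcat:.1f}%")
```

Because the 100 million figure is a lower bound, the first result is an upper limit on the share, which is consistent with the "less than 1%" statement. Measured against WorldCat's distinct titles, the same million books would be roughly 2%.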

Market Size for Online Book Publishing, Digital Publishing, and eBooks

Most efforts to achieve the mass digitization of libraries are focused on making knowledge and documents widely accessible - particularly where documents are of historic significance.

The commercial market opportunity for sales of digitized books, magazines, and newspapers has recently begun to expand dramatically. Hand-held devices such as Amazon's Kindle and now the success of the Apple iPad, along with devices such as Borders' Kobo and the Motorola Xoom, have made eBook reading both practicable and pleasurable.

In June 2011, Apple reported that more than 130 million books had been downloaded from its iBookstore (originally an app on iOS devices like the iPad, now part of iTunes). Apple had signed up major publishers including HarperCollins, Macmillan, Simon & Schuster, Hachette Book Group, Penguin, and Random House.

At present it seems that reading books on laptops and portable book readers is far less appealing to most people than the form factor which the world has used for centuries: printed books. That said, digital publishing is clearly a growing trend, and most print publications also have online editions.

While not the only factor involved, the preference for paper books is currently influencing the market size for online book publishing. Publisher Simon & Schuster reported that its sales of electronic books in 2007 were about $1 million (out of total annual sales of roughly $1 billion), though the company plans to double the size of its catalogue of eBooks during 2008. In 2005 digital copies of books generated sales of just US$12 million. That gave digital publishing a market share of about 0.05%: the book publishing industry achieved sales of US$25 billion in 2005, according to the Association of American Publishers.
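The market-share figures quoted above follow from simple division, and a small script makes the arithmetic explicit. This is a minimal sketch using only the sales figures given in the text; the function name `market_share_percent` and the variable names are hypothetical, chosen for illustration.

```python
def market_share_percent(segment_sales: float, total_sales: float) -> float:
    """Return segment_sales as a percentage of total_sales."""
    if total_sales <= 0:
        raise ValueError("total_sales must be positive")
    return 100.0 * segment_sales / total_sales


# Figures quoted in the text, in US dollars (all approximate).
ebook_sales_2005 = 12e6            # digital book sales, 2005
industry_sales_2005 = 25e9         # total book publishing sales, 2005
publisher_ebook_sales_2007 = 1e6   # Simon & Schuster eBook sales, 2007
publisher_total_sales_2007 = 1e9   # Simon & Schuster total sales, 2007

industry_share = market_share_percent(ebook_sales_2005, industry_sales_2005)
publisher_share = market_share_percent(
    publisher_ebook_sales_2007, publisher_total_sales_2007
)

print(f"eBook share of industry sales, 2005: {industry_share:.3f}%")
print(f"eBook share of Simon & Schuster sales, 2007: {publisher_share:.3f}%")
```

On these figures, eBooks made up roughly 0.05% of industry sales in 2005 and about 0.1% of Simon & Schuster's sales in 2007, both well under half a percent.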

Commentary on Mass Digitization of Books, including Copyright issues and legal challenges.

Webcast of conference on the effects of Mass Digitization of books on Libraries at the University of Michigan.