Commentary and Controversy on the Mass Digitization of Books

The mass digitization of books is fairly controversial. Many publishers object to in-copyright works being copied in projects such as the one Google is carrying out for its Google Book Search system.

Indeed, the Authors Guild and a group of publishers backed by the Association of American Publishers have separately sued Google for making digital copies of copyrighted books from libraries without permission.

Book Digitization and Copyright

These projects to digitize the world’s books and historical documents are causing controversy among traditional book publishers who wish to protect copyright and their own electronic publishing initiatives.

Prior to its December 2004 deal with libraries, Google had been making mass digitization deals with individual publishers, but the libraries deal has made such arrangements defunct. Although it is copying entire volumes of copyright material, Google argues that the Google Book Search system is covered by the legal concept of “fair use”. No more than 20% of a copyright book will be available through Google Book Search. The system is designed to show only relevant passages, and it will provide links to sites where the book can be bought.

Prominent among other mass digitization projects is Project Gutenberg, which digitizes only public domain books and currently offers many thousands of them in its digital library. Similarly, the Open Content Alliance (OCA) is mass digitizing hundreds of thousands of books, which are freely available for browsing and downloading on the Internet Archive site. Unlike Google, which is only enabling downloads in PDF format, the OCA makes all its files available for downloading, including raw images and metadata.

A related scheme, though not as advanced, is Open WorldCat. This is an effort by OCLC aimed at making books more discoverable, including by identifying the nearest library where people can go to access them.

Microsoft Attacking Google’s Mass Digitization Programs

In March 2007, Microsoft attempted to increase pressure on Google by launching a fierce attack on its “cavalier” approach to copyright. Microsoft accused Google of exploiting books, music, films and television programmes without permission from the copyright owners. In response, Google stated that it was keeping access to full copyrighted works secure and was only showing snippets of works that are in copyright. Google’s position seems to be that such brief extracts probably stimulate sales of the full works, and that its Google Book Search system in fact helps to identify and promote works of specific interest to individuals researching particular topics.

Such attacks are curious, since at the time Microsoft was conducting mass digitization projects of its own, and Google is known to respond quickly when copyright owners ask it to remove specific works from Google Book Search.

Google’s Book Search Blog

One of the many official blogs which Google runs is its Book Search Blog. This is usually the first place where news appears about additional libraries joining Google’s mass digitization projects. It is also a good place to read how the company sees its Google Book Search service being used.