1998: ONLINE BEOWULF

[Overview]

Libraries began putting digital versions of their treasures on the web for the world to enjoy. The British Library was a pioneer in this field. Several of its treasures were online in 1998, including Beowulf, known as the first great English masterpiece. Beowulf is the earliest known narrative poem in English and one of the most famous works of Anglo-Saxon poetry. The British Library holds the only known manuscript of Beowulf, dated circa 1000. The poem itself is much older than the manuscript; some historians believe it may have been written circa 750. Scholarly debate on the date of composition and the provenance of the poem continues around the world, and researchers regularly require access to the manuscript. Taking Beowulf out of its display case for study not only raised conservation issues but also made it unavailable to the many visitors who came to the Library expecting to see this literary treasure on display. Digitizing the manuscript offered a solution to these problems while providing new opportunities for researchers and book lovers worldwide.

[In Depth (published in 1999)]

Libraries began using the web to make their treasures freely available to the world.

Here is the story of Beowulf.

Beowulf is a treasure of the British Library. "It is an Old English heroic epic poem of anonymous authorship. This work of Anglo-Saxon literature dates to between the 8th and the 11th century, the only surviving European manuscript dating to the early 11th century. At 3,183 lines, it is notable for its length." (excerpt from Wikipedia)

The manuscript was badly damaged by fire in 1731. Eighteenth-century transcripts record hundreds of words and letters that were then still visible along the charred edges but later crumbled away. To halt this process, each leaf was mounted on a paper frame in 1845.

Scholarly debate on the date of composition and the provenance of the poem continues around the world, and researchers regularly require access to the manuscript. Taking Beowulf out of its display case for study not only raised conservation issues but also made it unavailable to the many visitors who came to the Library expecting to see this literary treasure on display. Digitizing the manuscript offered a solution to these problems, while also opening new opportunities for readers around the world.

The Electronic Beowulf Project was launched as a huge database of digital images of the Beowulf manuscript and related manuscripts and printed texts. In 1998, the database included fiber-optic readings of hidden letters and ultraviolet readings of erased text in the manuscript; full electronic facsimiles of the 18th-century transcripts of the manuscript; and selections from important 19th-century collations, editions and translations. Major additions were planned, such as images of contemporary manuscripts, and links with the Toronto Dictionary of Old English Project and with the comprehensive Anglo-Saxon bibliographies of the Old English Newsletter.

The project was developed in partnership with two leading experts: Kevin Kiernan of the University of Kentucky and Paul Szarmach of the Medieval Institute at Western Michigan University. Professor Kiernan edited the electronic archive and produced a CD-ROM containing a number of electronic images.

Brian Lang, chief executive of the British Library, explained in 1998: "The Beowulf manuscript is a unique treasure and imposes on the Library a responsibility to scholars throughout the world. Digital photography offered for the first time the possibility of recording text concealed by early repairs, and a less expensive and safer way of recording readings under special light conditions. It also offers the prospect of using image enhancement technology to settle doubtful readings in the text. Network technology has facilitated direct collaboration with American scholars and makes it possible for scholars around the world to share in these discoveries. Curatorial and computing staff learned a great deal which will inform any future programmes of digitization and network service provision the Library may undertake, and our publishing department is considering the publication of an electronic scholarly edition of Beowulf. This work has not only advanced scholarship; it has also captured the imagination of a wider public, engaging people (through press reports and the availability over computer networks of selected images and text) in the appreciation of one of the primary artifacts of our shared cultural heritage." (excerpt from the 1998 website)

The British Library was a pioneer in Europe. Several other treasures of the Library were already online: Magna Carta, the first English constitutional text, sealed in 1215 with the Great Seal of King John; the Lindisfarne Gospels, dated 698; the Diamond Sutra, dated 868, which may be the world's earliest printed book; the Sforza Hours, dated 1490-1520, an outstanding Renaissance treasure; the Codex Arundel, a notebook of Leonardo da Vinci (1452-1519); and the Tyndale New Testament, the first English version of the New Testament, printed by Peter Schoeffer in Worms.

Brian Lang also stressed the importance of the paper world and the British Library's ongoing commitment to its paper collections. He added: "The importance of digital materials will, however, increase. We recognize that network infrastructure is at present most strongly developed in the higher education sector, but there are signs that similar facilities will also be available elsewhere, particularly in the industrial and commercial sector, and for public libraries. Our vision of network access encompasses all these. (…) The development of the Digital Library will enable the British Library to embrace the digital information age. Digital technology will be used to preserve and extend the Library's unparalleled collection. Access to the collection will become boundless with users from all over the world, at any time, having simple, fast access to digitized materials using computer networks, particularly the internet." (excerpt from the website)

Other national libraries also started digitizing their collections to offer free digital libraries.

In an interview with Jérôme Strazzulla published in the daily newspaper Le Figaro on June 3, 1998, Jean-Pierre Angremy, president of the French National Library, stated: "We cannot, we will not be able to digitize everything. In the long term, a digital library will only be one element of the whole library." The digital library Gallica went online in 1997 with thousands of texts and images relating to French history, life and culture. A major collection of 19th-century French texts and images was added a year later.

[Overview]

The job of librarians, which had already changed a great deal with computers, went on to change even more with the internet. Computers made catalogs much easier to handle. Instead of patiently filing cards in wooden or metal drawers, librarians could enter bibliographic records into a program that sorted books alphabetically, chronologically and by subject classification. Librarians also began using software to manage lending and acquisitions. By networking computers, the internet gave a boost to union catalogs for a state, a country or a region, and furthered interlibrary loan. Electronic mail became commonplace for internal and external communications. Librarians could subscribe to newsletters and take part in newsgroups and discussion forums. A number of librarians became webmasters, running library websites, online catalogs and digital libraries.

[In Depth (published in 1999)]

I interviewed Peter Raggett, a digital librarian at the OECD (Organization for Economic Co-operation and Development), and Bruno Didier, a digital librarian at the Pasteur Institute. Here are some excerpts.

= At the OECD Library

What is OECD? "The OECD is a club of like-minded countries. It is rich, in that OECD countries produce two thirds of the world's goods and services, but it is not an exclusive club. Essentially, membership is limited only by a country's commitment to a market economy and a pluralistic democracy. The core of original members has expanded from Europe and North America to include Japan, Australia, New Zealand, Finland, Mexico, the Czech Republic, Hungary, Poland and Korea. And there are many more contacts with the rest of the world through programmes with countries in the former Soviet bloc, Asia, Latin America - contacts which, in some cases, may lead to membership." (excerpt from the 1998 website)

The OECD's Center for Documentation and Information (CDI) provides information to OECD staff in support of their research work. In 1998, its collection comprised 60,000 monographs and 2,500 periodicals. The CDI also provides information in electronic format from databases, CD-ROMs and the internet.

Peter Raggett, head of CDI, has been a professional librarian for nearly twenty years, first working in UK government libraries and then at the OECD since 1994. He has used the internet since 1996. He built up the CDI Intranet pages, which became a main tool for the staff.

Peter wrote in June 1998: "At the OECD Library we have collected together several hundred World Wide Web sites and have put links to them on the OECD Intranet. They are sorted by subject and each site has a short annotation giving some information about it. The researcher can then see if it is possible that the site contains the desired information. This is adding value to the site references and in this way the Central Library has built up a virtual reference desk on the OECD network. As well as the annotated links, this virtual reference desk contains pages of references to articles, monographs and websites relevant to several projects currently being researched at the OECD, network access to CD-ROMs, and a monthly list of new acquisitions. The Library catalogue will soon be available for searching on the Intranet. The reference staff at the OECD Library uses the Internet for a good deal of their work. Often an academic working paper will be on the web and will be available for full-text downloading. We are currently investigating supplementing our subscriptions to certain of our periodicals with access to the electronic versions on the internet."

Peter added: "The internet has provided researchers with a vast database of information. The problem for them is to find what they are seeking. Never has the information overload been so obvious as when one tries to find information on a topic by searching the internet. When one uses a search engine like Lycos or AltaVista or a directory like Yahoo!, it soon becomes clear that it can be very difficult to find valuable sites on a given topic. These search mechanisms work well if one is searching for something very precise, such as information on a person who has an unusual name, but they produce a confusing number of references if one is searching for a topic which can be quite broad. Try and search the web for Russia AND transport to find statistics on the use of trains, planes and buses in Russia. The first references you will find are freight-forwarding firms who have business connections with Russia."

What about the future? "The internet is impinging on many peoples' lives, and information managers are the best people to help researchers around the labyrinth. The internet is just in its infancy and we are all going to be witnesses to its growth and refinement. (…) Information managers have a large role to play in searching and arranging the information on the internet. I expect that there will be an expansion in internet use for education and research. This means that libraries will have to create virtual libraries where students can follow a course offered by an institution at the other side of the world. Personally, I see myself becoming more and more a virtual librarian. My clients may not meet me face-to-face but instead will contact me by e-mail, telephone or fax, and I will do the research and send them the results electronically."

= At the Pasteur Institute Library

In 1999, Bruno Didier was the webmaster of the Pasteur Institute Library. "The Pasteur Institutes are exceptional observatories for studying infectious and parasite-borne diseases. They are wedded to the solving of practical public health problems, and hence carry out research programmes which are highly original because of the complementary nature of the investigations carried out: clinical research, epidemiological surveys and basic research work. Just a few examples from the long list of major topics of the Institutes are: malaria, tuberculosis, AIDS, yellow fever, dengue and poliomyelitis." (excerpt from the 1999 website)

In August 1999, Bruno wrote about his work as a webmaster: "The main aim of the Pasteur Institute Library website is to serve the Institute itself and its associated bodies. It supports applications that have become essential in such a big organization: bibliographic databases, cataloguing, ordering of documents and of course access to online periodicals (presently more than 100). It is also a window for our different departments, at the Institute but also elsewhere in France and abroad. It plays a big part in documentation exchanges with the institutes in the worldwide Pasteur network. I am trying to make it an interlink adapted to our needs for exploration and use of the internet. The website has existed in its present form since 1996 and its audience is steadily increasing. I build and maintain the web pages and monitor them regularly. I am also responsible for training users, which you can see from my pages. The web is an excellent place for training and is included in most ongoing discussions about training."

What about the future of librarians? "Our relationship with both the information and the users is what changes. We are increasingly becoming mediators, and perhaps to a lesser extent 'curators'. My present activity is typical of this new situation: I am working to provide quick access to information and to create effective means of communication, but I also train people to use these new tools. I think the future of our job is tied to cooperation and use of common resources. It is certainly an old project, but it is really the first time we have had the means to set it up."

[Overview]

In 1998, Randy Hobler was a consultant in internet marketing for Globalink, a company specializing in language translation software and services. Randy wrote in September 1998: "85% of the content of the web in 1998 is in English and going down. This trend is driven not only by more websites and users in non-English-speaking countries, but by increasing localization of company and organization sites, and increasing use of machine translation to/from various languages to translate websites. (…) Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'… all those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the US, as well as odd places like Spanish-speaking Morocco."

[In Depth (published in 2000, updated in 2004)]

In 1998, languages other than English began spreading on the web. In fact, the main non-English languages had been present almost from the start, but most of the web was in English. Then people from all over the world gained access to the internet and began posting pages in their own languages. The share of English-language content began to decrease slowly, from nearly 100% to 90%.

In 1998, Randy Hobler was an internet marketing consultant for Globalink, a company specializing in language translation software and services. Previously, Randy had worked as a consultant for IBM, Johnson & Johnson, Burroughs Wellcome, Pepsi, Heublein, and others.

Randy wrote in September 1998: "Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'… all those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the US, as well as odd places like Spanish-speaking Morocco."

In 1999, Jean-Pierre Cloutier was the editor of Chroniques de Cybérie, a weekly report of internet news. Jean-Pierre wrote in August 1999: "The web is going to grow in these non English-speaking regions. So we have to take into account the technical aspects of the medium if we want to reach these 'new' users. I think it is a pity there are so few translations of important documents and essays published on the web — from English into other languages and vice-versa. (…) The recent introduction of the internet in regions where it is spreading raises questions which would be good to read about. When will Spanish-speaking communications theorists and those speaking other languages be translated?"

In 1999, Marcel Grangier was the head of the French Section of the Swiss Federal Government's Central Linguistic Services, which meant he was in charge of organizing translation matters for the Swiss government. Marcel wrote in January 1999: "We can see multilingualism on the internet as a happy and irreversible inevitability. So we have to laugh at the doomsayers who only complain about the supremacy of English. Such supremacy is not wrong in itself, because it is mainly based on statistics (more PCs per inhabitant, more people speaking English, etc.). The answer is not to 'fight' English, much less whine about it, but to build more sites in other languages. As a translation service, we also recommend that websites be multilingual. The increasing number of languages on the internet is inevitable and can only boost multicultural exchanges. For this to happen in the best possible circumstances, we still need to develop tools to improve compatibility. Fully coping with accents and other characters is only one example of what can be done."

In 1998, Henri Slettenhaar was a professor at Webster University in Geneva, Switzerland. He regularly stressed the need for bilingual websites, in the original language and in English. He wrote in December 1998: "I see multilingualism as a very important issue. Local communities that are on the web should principally use the local language for their information. If they want to present it to the world community as well, it should be in English too. I see a real need for bilingual websites. I am delighted there are so many offerings in the original language now. I much prefer to read the original with difficulty than getting a bad translation."

He added in August 1999: "There are two main categories in my opinion. The first one is the global outreach for business and information. Here the language is definitely English first, with local versions where appropriate. The second one is local information of all kinds in the most remote places. If the information is meant for people of an ethnic and/or language group, it should be in that language first with perhaps a summary in English. We have seen lately how important these local websites are — in Kosovo and Turkey, to mention just the most recent ones. People were able to get information about their relatives through these sites."

He added in August 2000: "Multilingualism has expanded greatly. Many e-commerce websites are multilingual now and there are companies that sell products which make localization possible (adaptation of websites to national markets)."

Non-English-speaking users reached 50% of internet users in summer 2000. According to the company Global Reach, they made up 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (including 34.9% non-English-speaking Europeans and 29.4% Asians) and 64.2% in March 2004 (including 37.9% non-English-speaking Europeans and 33% Asians).

1999: OPEN EBOOK FORMAT

[Overview]

In 1999, there were nearly as many eBook formats as eBooks, with every company and organization creating its own format for its own eBook reader and electronic device. Feeling the need for a common format, the publishing industry published the first version of the Open eBook (OeB) format in September 1999. OeB was an eBook format based on XML (eXtensible Markup Language) and defined by the Open eBook Publication Structure (OeBPS). The Open eBook Forum was created in January 2000 to develop the OeB format and the OeBPS specifications. Since 2000, most eBook formats have been derived from, or are compatible with, the OeB format. In April 2005, the Open eBook Forum became the International Digital Publishing Forum (IDPF), and the OeB format later evolved into the ePub format, which became one of the standards of the digital publishing industry.

[Overview]

Like many artists, Jean-Paul began exploring the internet and what hyperlinks could offer to take his writing in new directions. He moved from being a print author to being a hypermedia author, and created Cotres furtifs (Furtive Cutters), a website telling stories in 3D. He also enjoyed the freedom of online self-publishing, and wrote in August 1999: "The internet allows me to do without intermediaries, such as record companies, publishers and distributors. Most of all, it allows me to crystallize what I have in my head: the print medium (desktop publishing, in fact) only allows me to partly do that." He added in June 2000: "Surfing the web is like radiating in all directions (I am interested in something and I click on all the links on a home page) or like jumping around (from one click to another, as the links appear). You can do this in the written media, of course. But the difference is striking. So the internet didn't change my life, but it did change how I write. You don't write the same way for a website as you do for a script or a play."

[In Depth (published in 2000)]

I interviewed Murray Suid, a writer of educational books who was living in Palo Alto, California. Back in Paris, I interviewed Jean-Paul, a hypermedia author, who shared some interesting thoughts about digital literature.

= Educational Books

In 1998, Murray Suid was living in Palo Alto, in the heart of Silicon Valley. He was writing educational books, books for kids, multimedia scripts and screenplays. He was among the first authors to "web-extend" his books, a solution that many others would soon adopt. He explained in September 1998: "If a book can be web-extended (living partly in cyberspace), then an author can easily update and correct it, whereas otherwise the author would have to wait a long time for the next edition, if indeed a next edition ever came out. (…) I do not know if I will publish books on the web — as opposed to publishing paper books. Probably that will happen when books become multimedia. (I currently am helping develop multimedia learning materials, and it is a form of teaching that I like a lot — blending text, movies, audio, graphics, and — when possible — interactivity)."

Murray added in August 1999: "In addition to 'web-extending' books, we are now web-extending our multimedia (CD-ROM) products — to update and enrich them." A few months later, he added: "Our company — EDVantage Software — has become an internet company instead of a multimedia (CD-ROM) company. We deliver educational material online to students and teachers."

= Hypermedia Writing

In 1999, Jean-Paul, a hypermedia author, was the webmaster of cotres.net, a site telling stories in 3D. He really enjoyed the freedom of online publishing. He wrote in August 1999: "The internet allows me to do without intermediaries, such as record companies, publishers and distributors. Most of all, it allows me to crystallize what I have in my head: the print medium (desktop-publishing, in fact) only allows me to partly do that. Then the intermediaries will take over and I will have to look somewhere else, a place where the grass is greener…"

Jean-Paul added in June 2000: "Surfing the web is like radiating in all directions (I am interested in something and I click on all the links on a home page) or like jumping around (from one click to another, as the links appear). You can do this in the print media, of course. But the difference is striking. So the internet didn't change my life, but it did change how I write. You don't write the same way for a website as you do for a script or a play.

But it wasn't exactly the internet that changed my writing, it was the first model of the Mac. I discovered it when I was teaching myself Hypercard. I still remember how astonished I was during my month of learning about buttons and links and about surfing by association, objects and images. Being able, by just clicking on part of the screen, to open piles of cards, with each card offering new buttons and each button opening onto a new series of them. In short, learning everything about the web that today seems really routine was a revelation for me. I hear Steve Jobs and his team had the same kind of shock when they discovered the forerunner of the Mac in the labs of Rank Xerox.

Since then I have been writing directly on the screen. I use a paper print-out only occasionally, to help me fix up an article, or to give somebody who doesn't like screens a rough idea, something immediate. It is only an approximation, because print forces us into a linear relationship: the words scroll out page by page most of the time. But when you have links, you have a different relationship to time and space in your imagination. And for me, it is a great opportunity to use this reading/writing interplay, whereas leafing through a book gives only a suggestion of it — a vague one because a book is not meant for that."

[Overview]

After founding A Web of Online Dictionaries (WOD) in 1995, Robert Beard included it in a larger project, yourDictionary.com, that he cofounded in early 2000. He wrote in January 2000: "The new website is an index of 1,200+ dictionaries in more than 200 languages. Besides the WOD, the new website includes a word-of-the-day-feature, word games, a language chat room, the old Web of On-line Grammars (now expanded to include additional language resources), the Web of Linguistic Fun, multilingual dictionaries; specialized English dictionaries; thesauri and other vocabulary aids; language identifiers and guessers, and other features; dictionary indices. yourDictionary.com will hopefully be the premiere language portal and the largest language resource site on the web. It is now actively acquiring dictionaries and grammars of all languages with a particular focus on endangered languages. It is overseen by a blue ribbon panel of linguistic experts from all over the world."

[In Depth (published in 2001)]

After creating A Web of Online Dictionaries in 1995, Robert Beard cofounded yourDictionary.com in early 2000. He wrote in January 2000: "A Web of Online Dictionaries (WOD) is now a part of yourDictionary.com (as of February 15, 2000). The new website is an index of 1,200+ dictionaries in more than 200 languages. Besides the WOD, the new website includes a word-of-the-day-feature, word games, a language chat room, the old Web of On-line Grammars (now expanded to include additional language resources), the Web of Linguistic Fun, multilingual dictionaries; specialized English dictionaries; thesauri and other vocabulary aids; language identifiers and guessers, and other features; dictionary indices. YourDictionary.com will hopefully be the premiere language portal and the largest language resource site on the web. It is now actively acquiring dictionaries and grammars of all languages with a particular focus on endangered languages. It is overseen by a blue ribbon panel of linguistic experts from all over the world."

Answering my question about multilingualism, Robert Beard added in January 2000: "While English still dominates the web, the growth of monolingual non-English websites is gaining strength with the various solutions to the font problems. Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world's 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression."

Answering the same question, Caoimhín Ó Donnaíle wrote in May 2001: "I would emphasize the point that as regards the future of endangered languages, the internet speeds everything up. If people don't care about preserving languages, the internet and accompanying globalization will greatly speed their demise. If people do care about preserving them, the internet will be a tremendous help."

Caoimhín Ó Donnaíle teaches computing through the medium of Gaelic at Sabhal Mòr Ostaig, a college on the Isle of Skye in Scotland. He maintains the college website, the main website worldwide for information on Scottish Gaelic, as well as European Minority Languages, a list of minority languages arranged alphabetically and by language family. He wrote in May 2001: "There has been a great expansion in the use of information technology at the Gaelic-medium college here. Far more computers, more computing staff, flat screens. Students do everything by computer, use Gaelic spell-checking, Gaelic online terminology database. More hits on our web site. More use of sound. Gaelic radio (both Scottish and Irish) now available continuously worldwide via the internet. Major project has been translation of the Opera web-browser into Gaelic - the first software of any size available in Gaelic."

Published by SIL International (SIL: Summer Institute of Linguistics), The Ethnologue: Languages of the World is a catalogue of more than 6,700 languages. A paper version and a CD-ROM are also available. Barbara Grimes was the editor of the 8th to 14th editions (1971-2000). She wrote in January 2000: "It is a catalog of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other sociolinguistic and demographic information, dates of published Bibles, a name index, a language family index, and language maps."

[Overview]

The Gutenberg Bible went online in November 2000 on the British Library website. It is considered the first major book printed in Europe with movable type. Gutenberg printed it in Mainz, Germany, in 1455, possibly in about 180 copies, of which 48 were still known to exist in 2000. Three copies, two complete and one partial, belong to the British Library. The two complete copies, which differ slightly from each other, were digitized in March 2000 by experts from Keio University in Tokyo and NTT (Nippon Telegraph and Telephone Communications).

[Overview]

Conceived in October 2000 by Charles Franks, Distributed Proofreaders was launched in March 2001 to help digitize public domain books. Its method is to break up the tedious work of checking eBooks for errors into small, manageable chunks. Originally meant to assist Project Gutenberg with shared proofreading, Distributed Proofreaders has become the main source of Project Gutenberg eBooks. In 2002, Distributed Proofreaders became an official Project Gutenberg site. The number of books processed through Distributed Proofreaders grew fast. In 2003, about 250-300 people around the world were working each day, producing a daily total of 2,500-3,000 pages, the equivalent of about two pages a minute. In 2004, an average of 300-400 proofreaders took part each day, finishing 4,000-7,000 pages per day, the equivalent of about four pages a minute. Distributed Proofreaders had processed a total of 3,000 books by February 2004, 5,000 books by October 2004, 7,000 books by May 2005, 8,000 books by February 2006 and 10,000 books by March 2007, with the help of 36,000 volunteers.
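The per-minute equivalents above can be checked quickly. This sketch assumes the figures are averaged over a full 24-hour day (1,440 minutes), which is an assumption about how they were derived, since the text does not say; the function name is hypothetical.

```python
# Convert the daily page counts quoted for Distributed Proofreaders into
# average pages per minute. Assumption: work is averaged over a full
# 24-hour day, since volunteers are spread across time zones.
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def pages_per_minute(pages_per_day: float) -> float:
    """Average number of pages finished per minute for a given daily total."""
    return pages_per_day / MINUTES_PER_DAY


# Daily page ranges (low, high) as given in the text.
daily_ranges = {
    2003: (2_500, 3_000),
    2004: (4_000, 7_000),
}

for year, (low, high) in daily_ranges.items():
    print(f"{year}: {pages_per_minute(low):.1f} to {pages_per_minute(high):.1f} pages per minute")
# 2003: 1.7 to 2.1 pages per minute
# 2004: 2.8 to 4.9 pages per minute
```

The 2004 range brackets the rounded figure of four pages a minute; its midpoint, 5,500 pages a day, works out to about 3.8 pages a minute.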

[In Depth (published in 2005, updated in 2008)]

The main "leap forward" of Project Gutenberg since 2000 is due to Distributed Proofreaders. In 2002, Distributed Proofreaders became an official Project Gutenberg site. In May 2006, Distributed Proofreaders became a separate entity and continues to maintain a strong relationship with Project Gutenberg.

Volunteers have no quota to fill, but they are encouraged to do a page a day if possible. It may not seem much, but with hundreds of volunteers it really adds up. In December 2007, thousands of volunteers were producing five books a day.

From the website, volunteers can access a program that allows several proofreaders to work on the same book at the same time, each proofreading different pages. This significantly speeds up the proofreading process. Volunteers register and receive detailed instructions so that, for example, bold, italic and underlined words, as well as footnotes, are handled the same way in every book. A discussion forum allows them to ask questions or seek help at any time. A project manager oversees the progress of each book through its different steps on the website.
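The idea of several proofreaders working on one book at once, each on different pages, can be pictured as a shared queue of pages from which every proofreader claims the next unclaimed page. The sketch below is a hypothetical illustration in Python, not Distributed Proofreaders' actual software; the function name `proofread_round` and the sample names are invented for the example.

```python
from __future__ import annotations

import queue
import threading


def proofread_round(page_count: int, proofreaders: list[str]) -> dict[int, str]:
    """Give every page of one book to exactly one proofreader in a round.

    Hypothetical model: several proofreaders work on the same book at the
    same time, each on a different page; the shared queue guarantees that
    no page is handed out twice.
    """
    pages = queue.Queue()
    for number in range(1, page_count + 1):
        pages.put(number)

    done: dict[int, str] = {}
    lock = threading.Lock()

    def work(name: str) -> None:
        while True:
            try:
                page = pages.get_nowait()  # claim the next unclaimed page
            except queue.Empty:
                return  # no pages left in this round
            # (Comparing the scanned image with the OCR text would happen here.)
            with lock:
                done[page] = name  # save the page as completed

    threads = [threading.Thread(target=work, args=(n,)) for n in proofreaders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    result = proofread_round(10, ["Ana", "Ben", "Chen"])
    print(sorted(result))  # every page from 1 to 10 appears exactly once
```

Under the same assumptions, the second round reserved for experienced proofreaders could be modeled by calling `proofread_round` again with a shorter list of names.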

The website gives a full list of the books that are: (a) completed, i.e. processed through the site and posted to Project Gutenberg; (b) in progress, i.e. processed through the site but not yet posted, because they are currently going through their final proofreading and assembly; (c) being proofread, i.e. currently being processed. On August 3, 2005, 7,639 books were completed, 1,250 books were in progress and 831 books were being proofread. On May 1, 2008, 13,039 books were completed, 1,840 books were in progress and 1,000 books were being proofread.

Each time a volunteer (proofreader) goes to the website, s/he chooses a book, any book. Then one page of the book appears in two forms side by side: the scanned image of one page and the text from that image (as produced by OCR software). The proofreader can easily compare both versions, note the differences and fix them. OCR is usually 99% accurate, which makes for about 10 corrections a page. The proofreader saves each page as it is completed and can then either stop work or do another. The books are proofread twice, and the second time only by experienced proofreaders. All the pages of the book are then formatted, combined and assembled by post-processors to make an eBook. The eBook is now ready to be posted with an index entry (title, subtitle, author, eBook number and character set) for the database. Indexers go on with the cataloging process (author's dates of birth and death, Library of Congress classification, etc.) after the release.
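The figure of about 10 corrections a page is consistent with 99% accuracy if a page holds roughly 1,000 characters; that page length is an assumption, not something the text states. The minimal Python sketch below makes the arithmetic explicit and, as a hypothetical illustration of what the proofreader does by eye, counts the characters changed between an OCR line and its corrected version using the standard `difflib` module. The function names and sample text are invented for the example.

```python
import difflib


def expected_corrections(accuracy: float, chars_per_page: int) -> float:
    """Expected number of wrong characters on a page at a given OCR accuracy."""
    return (1.0 - accuracy) * chars_per_page


def count_corrections(ocr_text: str, corrected_text: str) -> int:
    """Count characters changed between OCR output and the proofread text.

    Each replaced, deleted or inserted run counts as the length of the
    longer side of that run.
    """
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    changes = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes += max(i2 - i1, j2 - j1)
    return changes


if __name__ == "__main__":
    # Assumption: a page of about 1,000 characters.
    print(round(expected_corrections(0.99, 1000)))  # 10
    # Typical OCR confusions: "h" read as "b", "w" read as "vv".
    print(count_corrections("Tbe quick brovvn fox", "The quick brown fox"))  # 3
```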

Volunteers can also work independently, after contacting Project Gutenberg directly, by keying in a book they particularly like with any text editor or word processor. They can also scan it, convert it into text using OCR software, and then correct it by comparing it with the original. In each case, someone else will proofread it. They can use ASCII or any other format. Everybody is welcome, whatever the method and whatever the format.

New volunteers are also most welcome at Distributed Proofreaders (DP), Distributed Proofreaders Europe (DP Europe) and Distributed Proofreaders Canada (DPC). Any volunteer anywhere is welcome, for any language. There is a lot to do. As stated on the websites, "Remember that there is no commitment expected on this site. Proofread as often or as seldom as you like, and as many or as few pages as you like. We encourage people to do 'a page a day', but it's entirely up to you! We hope you will join us in our mission of 'preserving the literary history of the world in a freely available form for everyone to use'."

[Overview]

The Public Library of Science (PLoS) was founded in October 2000 by biomedical scientists Harold Varmus, Patrick Brown and Michael Eisen, from Stanford University, Palo Alto, and University of California, Berkeley. Headquartered in San Francisco, PLoS is a non-profit organization whose mission is to make the world’s scientific and medical literature a public resource. In early 2003, PLoS created a non-profit scientific and medical publishing venture to provide scientists and physicians with high-quality, high-profile journals in which to publish their most important work: PLoS Biology (launched in 2003), PLoS Medicine (2004), PLoS Genetics (2005), PLoS Computational Biology (2005), PLoS Pathogens (2005), PLoS Clinical Trials (2006), PLoS Neglected Tropical Diseases (2007). All PLoS articles are freely available online, and deposited in the free public archive PubMed Central. They can be freely redistributed and reused, including for translations, as long as the author(s) and source are cited. PLoS also hopes to encourage other publishers to adopt the open access model, or to convert their existing journals to an open access model.

[Overview]

Launched in January 2001 by Jimmy Wales and Larry Sanger (who later left the project), Wikipedia quickly grew into the largest reference website on the internet. Its multilingual content is free and written collaboratively by people worldwide. Its website is a wiki, which means that anyone can edit, correct and improve information throughout the encyclopedia. The articles remain the property of their authors and can be freely used under the GFDL (GNU Free Documentation License). Wikipedia is hosted by the Wikimedia Foundation, which runs a number of other projects, for example Wiktionary (launched in December 2002), followed by Wikibooks, Wikiversity, Wikinews and Wikiquote. In December 2004, Wikipedia had 1.3 million articles from 13,000 contributors in 100 languages. Two years later, in December 2006, it had 6 million articles in 250 languages.

[Overview]

Creative Commons (CC) was founded in 2001 by Lawrence Lessig, a professor at Stanford Law School, California. As stated on its website, "Creative Commons is a nonprofit corporation dedicated to making it easier for people to share and build upon the work of others, consistent with the rules of copyright. We provide free licenses and other legal tools to mark creative work with the freedom the creator wants it to carry, so others can share, remix, use commercially, or any combination thereof." There were one million Creative Commons licensed works in 2003, 4.7 million licensed works in 2004, 20 million licensed works in 2005, 50 million licensed works in 2006, 90 million licensed works in 2007, and 130 million licensed works in 2008. Science Commons was founded in 2005 to "design strategies and tools for faster, more efficient web-enabled scientific research." ccLearn was founded in 2007 as "a division of Creative Commons dedicated to realizing the full potential of the internet to support open learning and open educational resources."

[Overview]

MIT OpenCourseWare (MIT OCW) is a large-scale, web-based electronic publishing initiative launched by MIT (Massachusetts Institute of Technology) to promote the open dissemination of knowledge and information. A pilot version went online in September 2002, with materials from 32 MIT courses. In September 2003, the site was officially launched with materials from several hundred courses. In March 2004, materials from 500 courses were available in 33 different topics. In May 2006, materials from 1,400 courses were offered by 34 departments belonging to MIT's five schools. In November 2007, materials from all 1,800 courses were available, with 200 new and updated courses per year. In November 2005, MIT launched the OpenCourseWare Consortium (OCW Consortium) as a collaboration of educational institutions creating a broad body of open educational content using a shared model. One year later, the OCW Consortium included courses from 100 universities worldwide.

[Overview]

In January 2004, Project Gutenberg spread across the Atlantic with the launch of Project Gutenberg Europe (PG Europe) and Distributed Proofreaders Europe (DP Europe) by Project Rastko, a non-governmental cultural and educational project based in Belgrade, Serbia. DP Europe uses the software of the original Distributed Proofreaders. DP Europe is a multilingual website, with its main pages translated into several European languages by volunteer translators. In April 2004, DP Europe was available in 12 languages. The long-term goal is 60 languages and 60 linguistic teams representing all European languages. DP Europe supports Unicode so that eBooks can be proofread in numerous languages. Unicode is an encoding system that gives a unique number to every character in any language. DP Europe finished processing its 100th book in May 2005 and its 500th book in October 2008. DP Europe operates under "life +50" copyright laws. When it gets up to speed, DP Europe will provide eBooks for several national and/or linguistic digital libraries.

[In Depth (published in 2005, updated in 2008)]

In 2004, multilingualism became one of the priorities of Project Gutenberg, along with internationalization. Michael Hart traveled to Europe, with stops in Paris, Brussels and Belgrade. In Belgrade, he met the Project Rastko team to support the creation of Distributed Proofreaders Europe (launched in December 2003) and Project Gutenberg Europe (launched in January 2004).

The launching of Distributed Proofreaders Europe (DP Europe) by Project Rastko was indeed a very important step. DP Europe uses the software of the original Distributed Proofreaders and is dedicated to the proofreading of books for Project Gutenberg Europe. Since the very beginning, DP Europe has been a multilingual website, with its main pages translated into several European languages by volunteer translators. DP Europe was available in 12 languages in April 2004 and 22 languages in May 2008.

The long-term goal is 60 languages and 60 linguistic teams representing all the European languages. When it gets up to speed, DP Europe will provide books for several national and/or linguistic digital libraries. The goal is for every country to have its own digital library (within the limits of its own copyright law), as part of a continental network (for example, a European network for France) and a global network (for the whole planet).

A few lines now on Project Rastko, which launched such a difficult and exciting project for Europe, and catalyzed volunteers' energy in both Eastern and Western Europe (and anywhere else: as the internet has no boundaries, there is no need to live in Europe to register). Founded in 1997, Project Rastko is a non-governmental cultural and educational project. One of its goals is the online publishing of Serbian culture. It is part of the Balkans Cultural Network Initiative, a regional cultural network for the Balkan peninsula in south-eastern Europe.

In May 2005, Distributed Proofreaders Europe finished processing its 100th book. In June 2005, Project Gutenberg Europe was launched with these first 100 books. DP Europe supports Unicode so that books can be proofread in numerous languages. Created in 1991 and widely used since 1998, Unicode is an encoding system that gives a unique number to every character in any language, unlike the much older ASCII, which was meant only for English (and, in its 8-bit extended versions, a few other European languages).
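The difference between Unicode and ASCII can be seen directly: every character has exactly one Unicode code point, while an ASCII encoder rejects anything outside its 128 characters. This is a minimal Python sketch; the helper names and sample words (including the Serbian Cyrillic spelling of "Rastko") are chosen for illustration only and are not taken from DP Europe.

```python
def describe(text: str) -> list[str]:
    """Return each character with its Unicode code point in U+XXXX notation."""
    return [f"{ch} U+{ord(ch):04X}" for ch in text]


def fits_in_ascii(text: str) -> bool:
    """True if every character is in the 7-bit ASCII range (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for word in ("Rastko", "Растко", "Bibliothèque"):
        print(word, fits_in_ascii(word), describe(word[:2]))
```

Latin-script "Rastko" fits in ASCII, while both the Cyrillic spelling and the accented French word need code points that ASCII cannot represent.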

On August 3, 2005, 137 books were completed (processed through the site and posted to Project Gutenberg Europe), 418 books were in progress (processed through the site but not yet posted, because currently going through their final proofreading and assembly), and 125 books were being proofread (currently being processed). On May 10, 2008, 496 books were completed, 653 books were in progress and 91 books were being proofread.

[Overview]

In October 2004, Google launched the first part of Google Print, a project aimed at publishers, allowing internet users to see excerpts from their books and order them online. In December 2004, Google launched the second part of Google Print, a project intended for libraries, to build a world digital library by digitizing the collections of major partner libraries. The beta version of Google Print went live in May 2005. In August 2005, Google Print was suspended until further notice because of lawsuits filed by associations of authors and publishers for copyright infringement. The program resumed in August 2006 under the new name of Google Books. Google Books offers books digitized in the participating libraries (Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, Complutense of Madrid and New York Public Library), with either the full text for public domain books or excerpts for copyrighted books. A settlement of the lawsuit with associations of authors and publishers was announced in October 2008.

[In Depth (published in 2008)]

In October 2004, Google launched the first part of Google Print, a project aimed at publishers, allowing users to see snippets of their books and order them online. In December 2004, Google launched the second part of Google Print, a project intended for libraries, to build a digital library of 15 million books by scanning and digitizing the collections of major libraries, beginning with the University of Michigan (7 million books), Harvard, Stanford and Oxford universities, and the New York Public Library. The planned cost was an average of US $10 per book, or $150 to $200 million over ten years. The beta version of Google Print went online in May 2005. In August 2005, Google Print was suspended until further notice because of lawsuits filed by publishers for copyright infringement. The program resumed in August 2006 under the new name of Google Books.

Google Books was launched in August 2006 to replace the controversial Google Print, suspended in August 2005 because of major copyright concerns. Google Books offers excerpts of books digitized by Google in the participating libraries (Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, Complutense of Madrid and New York Public Library). Google scans 3,000 books a day, including copyrighted books. The inclusion of copyrighted books is widely criticized by authors and publishers worldwide. In the US, lawsuits were filed by the Authors Guild and the Association of American Publishers (AAP) for alleged copyright infringement. Their position is that scanning and digitizing copyrighted books in full infringes copyright law, even if only snippets are made freely available on the search engine. To counter these copyright concerns and the problems of a closed platform, the Internet Archive launched the Open Content Alliance (OCA) with the goal of digitizing only public domain books and making them searchable and downloadable through any search engine.

[Overview]

The Open Content Alliance (OCA) was conceived by the Internet Archive in early 2005 to offer broad public access to the world's culture. It was launched in October 2005 as a group of cultural, technology, non-profit and governmental organizations willing to build a permanent archive of multilingual digitized text and multimedia content. The project aims to digitize public domain books around the world and to make them searchable through any web search engine and downloadable for free. Unlike Google Print, the OCA scans and digitizes only public domain books, except when the copyright holder has expressly given permission. The first contributors to the OCA were the University of California, the University of Toronto, the European Archive, the National Archives in the United Kingdom, O’Reilly Media and Prelinger Archives. The digitized collections are freely available in the Text Archive of the Internet Archive. In December 2006, they reached a milestone of 100,000 digitized books publicly available, with 12,000 new books added per month. Two years later, in December 2008, one million books were "posted under OCA principles or otherwise public domain hosted by the Internet Archive."

[Overview]

Microsoft also took part in the Open Content Alliance (OCA), launched by the Internet Archive in October 2005. In December 2006, Microsoft released the beta version of Live Search Books, a book search engine performing keyword searches in public domain books digitized by Microsoft from the collections of the British Library, the University of California and the University of Toronto, followed in January 2007 by the New York Public Library and Cornell University. Books could be viewed in full text and downloaded as PDF files. Microsoft intended to add copyrighted works with the permission of their publishers, and in May 2007 it announced agreements with several major publishers, including Cambridge University Press and McGraw-Hill. After digitizing 750,000 books and indexing 80 million journal articles, Microsoft ended the Live Search Books program in May 2008 and closed the website.

[Overview]

WorldCat was created in 1971 by the non-profit OCLC (Online Computer Library Center) as the union catalog of the university libraries in the State of Ohio. Over the years, OCLC became a national and then worldwide library cooperative, and WorldCat the largest library catalog in the world. In 2005, WorldCat had 61 million bibliographic records in 400 languages from 9,000 member libraries (paid subscription) in 112 countries. In 2006, 73 million bibliographic records linked to 1 billion documents held in these libraries. In August 2006, WorldCat began to migrate to the web through the beta version of the new website WorldCat.org. Member libraries now provide free access to their catalogs and electronic resources: books, audio books, abstracts and full-text articles, photos, music CDs and videos. Another pioneer site was RedLightGreen, launched in spring 2004 (with a beta version in fall 2003) as the web version of the RLG Union Catalog, another major union catalog created in 1980 by the Research Libraries Group (RLG). RedLightGreen ended its service in November 2006, after a successful 3-year run, and RLG joined OCLC.

[In Depth (published in 1999)]

In 1998, two organizations, OCLC (Online Computer Library Center) and RLG (which ran RLIN, the Research Libraries Information Network), were running international bibliographic databases through the internet.

The OCLC Online Computer Library Center is a non-profit, membership, library computer service and research organization dedicated to furthering access to the world's information and reducing information costs. More than 27,000 libraries in 65 countries were using OCLC services to manage their collections and to provide online reference services. The website was available in English, Chinese, French, German, Portuguese, and Spanish.

OCLC services included: access services; collections and technical services; reference services; resource sharing; Dewey Decimal Classification (published by OCLC Forest Press); and preservation resources. From its headquarters in Dublin, Ohio, OCLC operated one of the world's largest library information networks. Libraries in the US joined OCLC through their OCLC-affiliated regional networks. Libraries outside the US received OCLC services through OCLC Asia Pacific, OCLC Canada, OCLC Europe, OCLC Latin America and the Caribbean, or via international distributors.

OCLC was also running WorldCat - the name of the OCLC Online Union Catalog - which is a merged electronic catalog of library catalogs around the world, and the world's largest bibliographic database with its 38 million records (in early 1998) in 400 languages (with transliteration for non-Roman languages), and an annual increase of 2 million records.

WorldCat stemmed from a concept that is common to all union catalogs: saving time by avoiding the repeated cataloging of the same document by many catalogers worldwide. When they are about to catalog a publication, the catalogers of the member libraries search the OCLC catalog. If they find the record, they copy it into their own catalog and add some local information. If they don't find the record, they create it in the OCLC catalog, and this new record is immediately available to all the catalogers of the member libraries worldwide.

Unlike RLIN, another major union catalog, which accepts several records for the same document (see below), the OCLC Online Union Catalog accepts only one record per document and asks its members not to create duplicate records for documents that have already been cataloged. The records are created in USMARC format (MARC: Machine-Readable Cataloging) according to the Anglo-American Cataloguing Rules, 2nd edition (AACR2).
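The copy-cataloging routine described above, and the difference between a one-record-per-document policy (WorldCat) and a several-records-per-document policy (RLIN), can be sketched as a small in-memory model. This is a simplified, hypothetical illustration in Python: the class and function names and the `doc_key` identifier are invented, real union catalogs match records on far richer MARC data, and the RLIN-style branch simplifies things by having every library add its own record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    doc_key: str      # hypothetical document identifier, e.g. an ISBN
    description: str  # the bibliographic description
    created_by: str   # library that created the record


@dataclass
class UnionCatalog:
    # True: one shared record per document (WorldCat-style policy).
    # False: several records per document are accepted (RLIN-style policy).
    one_record_per_document: bool = True
    records: dict[str, list[Record]] = field(default_factory=dict)

    def search(self, doc_key: str) -> list[Record]:
        return self.records.get(doc_key, [])

    def add(self, record: Record) -> None:
        # A new record is immediately visible to every member library.
        self.records.setdefault(record.doc_key, []).append(record)


def catalog_item(union: UnionCatalog, local_catalog: dict, doc_key: str,
                 description: str, library: str, local_note: str) -> Record:
    """Search the union catalog first; copy a hit, otherwise create a record."""
    found = union.search(doc_key)
    if found and union.one_record_per_document:
        record = found[0]  # copy cataloging: reuse the existing shared record
    else:
        record = Record(doc_key, description, library)
        union.add(record)
    # Local information (call number, holdings) stays in the library's own catalog.
    local_catalog[doc_key] = (record, local_note)
    return record


if __name__ == "__main__":
    worldcat_like = UnionCatalog(one_record_per_document=True)
    rlin_like = UnionCatalog(one_record_per_document=False)
    for library in ("Library A", "Library B", "Library C"):
        catalog_item(worldcat_like, {}, "doc-001", "Beowulf", library, "shelf 12")
        catalog_item(rlin_like, {}, "doc-001", "Beowulf", library, "shelf 12")
    print(len(worldcat_like.search("doc-001")), len(rlin_like.search("doc-001")))  # 1 3
```

The single shared record is what keeps the first catalog free of duplicates, while the second policy grows by one record per contributing library, which is consistent with RLIN holding more records than WorldCat for a comparable body of documents.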

What is the history of OCLC? "In 1967, the presidents of the colleges and universities in the state of Ohio founded the Ohio College Library Center (OCLC) to develop a computerized system in which the libraries of Ohio academic institutions could share resources and reduce costs. OCLC's first offices were in the Main Library on the campus of the Ohio State University (OSU), and its first computer room was housed in the OSU Research Center. It was from these academic roots that Frederick G. Kilgour, OCLC's first president, oversaw the growth of OCLC from a regional computer system for 54 Ohio colleges into an international network. In 1977, the Ohio members of OCLC adopted changes in the governance structure that enabled libraries outside Ohio to become members and participate in the election of the Board of Trustees; the Ohio College Library Center became OCLC, Inc. In 1981, the legal name of the corporation became OCLC Online Computer Library Center, Inc. Today, OCLC serves more than 27,000 libraries of all types in the US and 64 other countries and territories." (excerpt from the 1998 website)

In early 1998, WorldCat had 38 million records - with one record per document. RLIN (Research Libraries Information Network) had 88 million records - with several records per document.

RLIN was run by the Research Libraries Group (RLG). The central RLIN database was a union catalog of 88 million items held in major libraries belonging to RLG member institutions, including research and specialized libraries such as law, technical and corporate libraries.

RLIN included:

(1) records that described works cataloged by the Library of Congress, the National Library of Medicine, the US Government Printing Office, CONSER (Conversion of Serials Project), the British Library, the British National Bibliography, the National Union Catalog of Manuscript Collections, and RLG members and users;

(2) nearly all the books cataloged since 1968 and rapidly expanding coverage for older materials;

(3) information about non-book materials ranging from musical scores, films, videos, serials, maps, and recordings, to archival collections and machine-readable data files;

(4) unique on-line access to special resources, such as the United Nations' DOCFILE and CATFILE records, and the Rigler and Deutsch Index to pre-1950 commercial sound recordings;

(5) international book vendors' in-process records, which were passed on to bibliographers, acquisition services and catalogers to help them order items or catalog them in their own local databases.

RLIN also provided:

(1) A catalog of computer files. Machine-readable data files were useful to a growing number of disciplines. RLIN contained records describing a number of such files, from the full-text French literary works in the ARTFL Database to the statistical data collected by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan;

(2) A catalog of archives and special collections. The archival and manuscript collections of research libraries, museums, state archives, and historical societies contained essential primary resources, but information about their contents was often elusive. Archivists and curators worked with RLG to create an automated format for these collections. In 1998, there were 500,000 records available in RLIN for archival collections located throughout North America. These records described many collections by personal name, organization, subject, and format.

RLIN also hosted the English Short Title Catalogue (ESTC), an invaluable research tool for scholars in English culture, language, and literature. This file provided extensive descriptions and holdings information for letterpress materials printed in the UK or any of its dependencies in any language, from the beginnings of print to 1800, as well as for materials printed in English anywhere else in the world. Produced by the ESTC editorial offices at the University of California, Riverside, and the British Library, in partnership with the American Antiquarian Society and over 1,600 libraries worldwide, the file was updated and expanded daily. ESTC served as a comprehensive bibliography of the hand-press era and as a census of surviving copies. ESTC included 420,000 records as of June 1998, from the beginnings of print (1473) through the 18th century, including materials ranging from Shakespeare and Greek New Testaments to anonymous ballads, broadsides, songs, advertisements and other ephemera.

[Overview]

Citizendium was launched in October 2006 as a pilot project to build a new encyclopedia, at the initiative of Larry Sanger, who had cofounded Wikipedia with Jimmy Wales in January 2001 but later left over policy and content quality issues. Citizendium, which stands for "citizen's compendium of everything", is a wiki project open to public collaboration, but combining "public participation with gentle expert guidance." The project is expert-led, not expert-only. Contributors use their real names rather than anonymous pseudonyms, and they are guided by expert editors. "Editors will be able to make content decisions in their areas of specialization, but otherwise working shoulder-to-shoulder with ordinary authors." (Larry Sanger, Toward a New Compendium of Knowledge, September 2006) Constables make sure the rules are respected. Citizendium was officially launched on March 25, 2007, with 1,100 articles, 820 authors and 180 editors.

[Overview]

Launched in May 2007, the Encyclopedia of Life is a global scientific effort to document all known species of animals and plants (1.8 million) and to speed up the documentation of the millions of species yet to be discovered and catalogued (8 to 10 million). This collaborative effort is led by several major institutions: the Field Museum of Natural History, Harvard University, the Marine Biological Laboratory, the Missouri Botanical Garden, the Smithsonian Institution and the Biodiversity Heritage Library (BHL). The initial funding comes from the MacArthur Foundation (US $10 million) and the Sloan Foundation ($2.5 million). A number of pages will be available by mid-2008. The encyclopedia will be operational in 3-5 years and completed (with all known species) in 10 years. Built on the scientific integrity of thousands of experts around the globe, the Encyclopedia will be a moderated wiki-style environment, freely available to all users everywhere.

1968: ASCII: http://www.asciitable.com/
1971: Project Gutenberg: https://www.gutenberg.org/
1974: Internet: http://www.isoc.org/
1977: UNIMARC: http://www.unimarc.net/
1984: Copyleft: http://www.gnu.org/copyleft/
1990: Web: http://www.w3.org/
1991: Unicode: http://unicode.org/
1993: Online Books Page: http://onlinebooks.library.upenn.edu/
1993: PDF: http://www.adobe.com/products/acrobat/adobepdf.html
1994: Library Websites: http://lists.webjunction.org/libweb/
1994: Bold Publishers: http://www.nap.edu/
1995: Amazon.com: http://www.amazon.com/
1995: Online Press: http://www.ipl.org/div/news/
1996: Internet Archive: http://www.archive.org/
1996: New Ways of Teaching: http://www.ifip.org/
1996: Palm Pilot: http://www.palm.com/
1997: Digital Publishing: http://www.adobe.com/devnet/digitalpublishing/
1997: Logos Dictionary: http://www.logos.it/
1997: Multimedia Convergence: http://www.ilo.org/public/english/dialogue/sector/
1998: Online Beowulf: http://www.bl.uk/onlinegallery/onlineex/englit/beowulf/
1998: Digital Librarians: http://www.ifla.org/
1998: Multilingual Web: http://www.w3.org/International/
1999: Digital Authors: http://www.cotres.net/
1999: Open eBook: http://www.idpf.org/
1999: yourDictionary.com: http://www.yourdictionary.com/
2000: Online Bible of Gutenberg: http://www.bl.uk/treasures/gutenberg/homepage.html
2000: Distributed Proofreaders: http://www.pgdp.net/
2000: Public Library of Science: http://www.plos.org/
2001: Wikipedia: http://www.wikipedia.org/
2001: Creative Commons: http://creativecommons.org/
2002: MIT OpenCourseWare: http://ocw.mit.edu/
2004: Project Gutenberg Europe: http://pge.rastko.net/
2004: Google Print / Book Search: http://books.google.com/
2005: Open Content Alliance: http://www.opencontentalliance.org/
2006: Microsoft Live Search Books: http://blogs.msdn.com/livesearch/
2006: Free WorldCat: http://www.worldcat.org/
2007: Citizendium: http://en.citizendium.org/
2007: Encyclopedia of Life: http://www.eol.org/

End of Project Gutenberg's Technology and Books for All, by Marie Lebert
