Chapter 2

R.21. I tried to unzip my file, but it said the file was corrupt, or damaged.

The chances are that it didn't download correctly. Try downloading it again. If you don't succeed the second time, try downloading the unzipped version.
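
If you would like to confirm that a download really is damaged before fetching it again, a zip library can verify the archive for you. The sketch below uses Python's standard zipfile module; the function name check_zip is made up for illustration and is not part of any Project Gutenberg tool.

```python
import io
import zipfile


def check_zip(data: bytes) -> str:
    """Return a short verdict on zip data held in memory (hypothetical helper)."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() re-reads every member and checks its CRC;
            # it returns the first bad member name, or None if all are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "not a valid zip file: download it again"
    if bad_member is not None:
        return f"damaged member {bad_member}: download it again"
    return "ok"
```

To check a file on disk, read it in binary mode and pass the bytes to check_zip.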

R.22. I see gibberish onscreen when I click on a book.

To save download time, our etexts are stored in zipped form as well as text form. Zipped files are smaller, and take less time to transfer to your computer, but you need a program to unzip them. If you try to view a zipped file directly, it looks like gibberish.

You can recognize zipped files easily because their filenames end in .zip.

If this happens, either make sure you're asking your browser to Save the file rather than display it (often, you right-click the file and choose Save) or else click on the version of the file that ends in .txt instead of .zip. You don't need a zip program to view .txt files.

Looking at a zip rather than a text file is by far the most common reason for this problem, but there are some others. If you're quite sure that you're not looking at a zip file, then it could be that the file you downloaded is in a character set that your viewer doesn't recognize, like Big-5 [V.78] for Chinese texts, or Unicode [V.77]. If this is the case, you will have to find a viewer that works on your computer for the specified character set. We may also have an ASCII version of the same text available for you—we do try to have ASCII versions for everything [G.17], but some languages, like Chinese, just cannot be sensibly expressed in ASCII.

If you can see most of the characters, enough to be able to make out the text, but there are regular gibberish characters, black squares, empty boxes or obviously missing characters scattered about through words, then you are probably looking at an "8-bit" text [V.79], with accented characters, and your viewer doesn't handle the character set. See the FAQ "I can read the text file, but a few characters appear as black squares, or gibberish" [R.31].

If there are a very few gibberish characters, black squares or obviously missing characters in the text, then it's likely that this was intended to be a 7-bit text, but a few 8-bit characters like the British pound symbol or accented letters slipped through.
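
To find out where stray 8-bit characters sit in a text that was meant to be 7-bit, you can scan the file for bytes above 127. This is a minimal Python sketch; the function name find_8bit_bytes is an assumption for illustration.

```python
def find_8bit_bytes(data: bytes) -> list[tuple[int, int, int]]:
    """List (line number, column, byte value) for every byte above 127."""
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:  # outside the 7-bit ASCII range
                hits.append((line_no, col, byte))
    return hits
```

An empty result means the file is pure 7-bit ASCII; a handful of hits suggests a few 8-bit characters slipped through.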

R.23. Can I download and read your books?

Yes. That's what Project Gutenberg is all about—making texts available free to everyone!

R.24. What am I allowed to do with the books I download?

Most Project Gutenberg e-texts are in the public domain. You can do anything you like with these—you can re-post them on your site, print them, distribute them, translate them to other languages, convert them to other formats, or redistribute them in unchanged form. However, if you distribute versions under the Project Gutenberg trademark, we do impose some conditions, which are explained in the header and/or footer in each text.

Some Project Gutenberg e-texts have copyright restrictions. You can still download and read these, but you may not be allowed to reproduce, modify or distribute them. When browsing or searching on the site, you will see these copyright-restricted texts indicated in the listings. For fuller information about them, download the e-text and read the header or footer of the file, which will spell out the conditions in detail.

R.25. Does Project Gutenberg know who downloads their books?

No, and we don't want to!

As with any Internet transfer, our sites have to know the IP addresses that contact them; without that, no communication is possible. But we do not trace, hold or examine them beyond what is necessary to deal with any problems or maintain logs or statistics. We never identify IP addresses with people.

Further, we encourage people, sites and schools around the world to mirror, or copy, our texts to their sites. Once that happens, we have no control over them, and we never have any idea who or even how many people access them after that.

Even further, we encourage people to distribute the texts on disks, CDs, paper, and any other storage format they can find. We encourage them to convert the texts to other formats, and share them.

For most people reading this, anonymity is probably not an issue, but you may live in a place or time where reading Paine, or Voltaire, or the Bible, or the Koran, is considered suspicious or even subversive. We don't know who you are, and what we don't know, we can't tell.

Currently (mid-2002), by means of DRM (Digital Rights/Restrictions Management) many commercial publishers can make a list of exactly who is reading which of their eBooks. We don't know, and we don't want to know.

R.26. I've found some obvious typos in a Project Gutenberg text. How should I report them?

The first thing to remember is that the people who actually make the corrections you suggest are very experienced, and are used to seeing lots of different types of errata reports. So the exact format of your report isn't really very important—just get the report to us in any clear form that we can understand.

Beyond that, here are some tips to avoid misunderstandings.

It's always helpful if you report the full title, etext number, year and filename of the text you are correcting. We have multiple editions and versions of some texts, like Homer's "Odyssey", and unless you tell us exactly what text you mean, we may have to spend some time searching and guessing.

Especially, please check and report the exact filename of the text. It is amazingly common for people to report problems with abcde10.txt, when abcde11.txt is already posted, and has these and other errors already fixed.

When there are only a few errors, it's usually easiest to cut and paste the line or lines where the error is into your e-mail, with your comment.

It can also be useful to give the line number of the place where the error is, and some people who check texts regularly do this. If this seems natural to you, do it; if it doesn't, don't.

An ideal report for a typical errata list might look like:

Title: The Odyssey, by Homer
Translated by Butcher & Lang
April, 1999 [Etext #1728]
File: dyssy08.txt

Line 884:
back Telemachus, who bas now resided there for a month.
"bas" should be "has"

Line 1491:
Ithaca yet stands. But I wouldask thee, friend, concerning
"would" and "ask" are run together here

Line 1563:
in his father's seat and the elders gave place to him
This is the end of a paragraph, and needs a period at end.

Line 15346-7:
'Hearken to me now, ye men of Ithaca, to the
will say. Through your own cowardice, my friends, have
I think there is something missing between "the" and "will"

But the following would get the job done as well:

In Homer's Odyssey, translated by Butcher and Lang, from /etext99,
file dyssy08.txt, I found the following errors:

Telemachus, who bas now resided
change "bas" to "has"

But I wouldask thee,
"would ask" run together

and the elders gave place to him
needs period

ye men of Ithaca, to the will say.
line missing between "the" and "will"?

Where there are more than a few changes, it may be easiest all round just to submit a corrected version of the file. However, if you do this, please do not re-wrap the paragraphs unless it is really necessary; we need to check your suggestions before reposting, and if the file is very different, it is difficult and time-consuming for us to find your real changes among all of the changes in the lines.
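
A line-by-line comparison shows why unchanged wrapping matters: every line that differs is reported, so re-wrapped paragraphs bury the real fixes. The sketch below uses Python's standard difflib module to list changed lines with their original line numbers. The function name changed_lines and the report layout are assumptions for illustration, not a description of how the Posting Team checks files.

```python
import difflib


def changed_lines(original: str, corrected: str) -> list[str]:
    """Report each differing span as 'Line N: old -> new' (hypothetical format)."""
    old = original.splitlines()
    new = corrected.splitlines()
    report = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old_text = " / ".join(old[i1:i2])
        new_text = " / ".join(new[j1:j2])
        # Line numbers refer to the original file, counted from 1.
        report.append(f"Line {i1 + 1}: {old_text!r} -> {new_text!r}")
    return report
```

With wrapping left alone, a one-word fix produces a one-line report; after re-wrapping, most lines of the paragraph would show up as changed.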

R.27. I've found some obvious typos in a Project Gutenberg text. Who should I report them to?

The Posting Team, who post the books, also make the corrections, and ultimately, the corrections need to go to them.

Many producers put their e-mail addresses in their texts, specifically so that readers can contact them when errors are found. If you see that in your text, you should try to contact the producer first. This is especially true if the corrections aren't obvious, as in the case of missing words. The producer is likely to have the original book, and will probably be able to confirm your corrections without visiting a library. If the book needs the corrections, the producer can then notify the Posting Team.

If you get no response from the producer, or if there is no e-mail address listed, or if the corrections are small and obvious, you can send them to any or all of the Posting Team directly.

R.28. I've reported some typos. What will happen next?

This varies wildly. Sometimes, you may just get a response e-mail in a day or three saying thanks, and that we've fixed the typo. This is normal when you've just reported one or a few obvious typos.

Where there is some text missing, or the changes you suggest are otherwise not obvious, we may have to find someone with an eligible copy of the book to confirm the changes, and that might take time. Normally, you will get an e-mail explaining that within a week.

Sometimes, even though you've noticed only one or two small typos, one of the Posting Team who was looking at it may find many more, and decide that the whole text needs to be re-proofed. This may also take time.

If the text needs a lot of changes, we may post a new EDITION [R.35] of it, with a new filename: e.g. abcde10.txt may become abcde11.txt. In this case, you will receive a copy of the e-mail sent to the posted list announcing the new file. Our current rule of thumb is that we create a new edition when we make twelve significant changes, but we judge each on a case-by-case basis, and especially will usually not make a new edition if the original was posted recently.

R.29. I've got the text file, and I can read it, but it seems to be double-spaced or it has control characters like ^J or ^M at the end of every line.

This is most often seen on Mac or Linux. If you want to dig into why this effect happens, see the FAQ "Why use a CR/LF at end of line?" [V.85].

Perhaps viewing it in a different editor or viewer will help, but it's usually easiest just to globally replace all of the control characters (if you see them) with nothing, or to replace all double line-ends with single line-ends.
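
The global replacement described above can also be done with a short script. In control-character notation, ^M is a Carriage Return (CR) and ^J is a Line Feed (LF). This Python sketch, whose function name normalize_line_ends is an assumption, turns every CR/LF pair and every lone CR into a single LF.

```python
def normalize_line_ends(data: bytes) -> bytes:
    """Turn CR/LF pairs and lone CRs into single LF line ends."""
    # Replace the pairs first so that a CR/LF does not become two line ends.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Working on bytes rather than decoded text means the character set of the file does not matter for this step.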

R.30. When I print out the text file, each line runs over the edge of the page and looks bad.

If you have a file ending in .txt from Project Gutenberg, it is usually formatted with about 70 characters per line, and with a Carriage Return/Line Feed pair (also known as a "Hard Return" or a "Paragraph Mark") at the end of every line.

This is the most widely accepted format for text files, but it's not ideal on all computers and all programs. 70 characters per line means that if you are using an unusually large or small font to print it, lines may wrap around or not reach across the page. The hard return means that on some systems, the lines may appear double-spaced.

Unfortunately, we can't advise you how best to format texts on all systems, mostly because we don't know every system! Here are a couple of tips you might try:

If your font is too big or too small, try setting the font to Courier size 10 or Times size 12. It may not be ideal, but it mostly works.

In a word processor, you may be able to remove the Hard Returns, but beware! If you remove too many, the whole text will become one paragraph. One common formula for removing the HRs goes like this:

1. First, all paragraphs and separate lines should be separated by two HRs, so that you can see one blank line between them. Where they aren't, as in the case of a table of contents or lines of verse, add the extra HRs to make them so.
2. Replace All occurrences of two HRs with some nonsense character or string that doesn't exist in the text, like ~$~.
3. Replace All remaining HRs with a space.
4. Replace your inserted string ~$~ with one HR.
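
The formula above translates directly into code. This is a minimal Python sketch, assuming the text uses CR/LF line ends and that the marker ~$~ does not occur in it; the function name unwrap is made up for illustration.

```python
def unwrap(text: str, marker: str = "~$~") -> str:
    """Join hard-wrapped lines into paragraphs using the placeholder method."""
    hr = "\r\n"  # one Hard Return (assumed to be a CR/LF pair)
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another")
    text = text.replace(hr + hr, marker)  # step 2: protect paragraph breaks
    text = text.replace(hr, " ")          # step 3: join wrapped lines
    return text.replace(marker, hr)       # step 4: one HR per paragraph
```

As with the manual method, step 1 still has to be done by eye: verse and tables of contents need their extra HRs added before running this.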

R.31. I can read the text file, but a few characters appear as black squares, or gibberish.

The text is using some character set that your editor or viewer isn't. For example, the text is using ISO-8859-1, and your viewer is using Codepage 850, or vice versa. You can see the plain ASCII characters, but non-ASCII characters like accented letters display as nonsense.

Look at the top of the file for a clue to the character set encoding: if it's there, it may help you to find which editor, or font, or viewer you should be using.
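
A short demonstration of why the same file looks different in different viewers: the single byte that means an accented e in ISO-8859-1 means a different character in Codepage 850, while the plain ASCII bytes decode identically in both. The sketch uses Python's built-in codecs named latin-1 and cp850; the sample bytes and the function name show_in are made up for illustration.

```python
def show_in(data: bytes, charset: str) -> str:
    """Decode data the way a viewer set to the given character set would."""
    return data.decode(charset)


sample = b"caf\xe9"  # the word "café" as stored in an ISO-8859-1 file

as_latin1 = show_in(sample, "latin-1")  # the character set the text really uses
as_cp850 = show_in(sample, "cp850")     # what a Codepage 850 viewer would show

# The ASCII part survives either way; only the 8-bit byte changes.
same_ascii = as_latin1[:3] == as_cp850[:3]
same_accent = as_latin1[3] == as_cp850[3]
```

Here same_ascii is true and same_accent is false, which is exactly the "readable text with a few wrong characters" effect described above.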

R.32. Can I get a handheld device for reading PG texts? Which device should I get?

To read eBooks on a handheld, you need three things: the eBook content itself (which you can get from PG and other sites), a device (which I will sometimes call a PDA, even though technically, the RocketBook isn't a PDA) and the reader software that runs on the PDA.

In mid-2002, there are three main families of handheld devices people use for reading eBooks: Palms, Pocket PCs and RocketBooks (or their successor, REB1100s). In general, it is possible to use any of these in combination with any common type of personal computer.

Palms are very common, especially when you count not just the Palm itself, but PalmOS-based devices from other manufacturers, like:

the Franklin eBookman, the Handspring Visor and the Sony Clie.

Because of the number of makers of PalmOS-based devices, you can buy them with lots of combinations of features—color screen, audio, different memory sizes. Of course, Palms have other applications besides eBook reading. Palms are the smallest and most portable of the three classes, and tend to have the best battery life for travelling, but they also have the smallest screen. Just about all reader software will run on Palms, except the Microsoft Reader, which runs only on Pocket PCs, but you don't need the Microsoft Reader for Project Gutenberg eBooks.

In Pocket PCs, the Compaq iPaq is by far the most common in mid-2002. More expensive and bulkier than a Palm, it does have a bigger screen. Like the Palms, it can perform many functions besides reading eBooks. Only Pocket PCs can support the Microsoft Reader, but this is not necessary for reading Project Gutenberg eBooks.

The RocketBook, and its successor the Gemstar REB1100, are quite different from the others. These were built specifically for reading eBooks, and do not have additional functions. They are not, technically, PDAs. Their screens are bigger, and excellent for reading, but do not offer color. They also don't offer a choice of readers—the dedicated reader is built-in to the device. Both of them require the eBooks you load to be formatted for their reader, and files made for them usually have the extension .rb for RocketBook. The REB1100 does not come with the RocketLibrarian, which is the program you run on your PC to turn an etext into a RocketBook file, but people are still making .rb files, and the RocketLibrarian is still available and popular among an enthusiastic group of Rocket users. (The REB1200 is entirely different from the REB1100, and, as far as we know, PG etexts cannot easily be transferred to it.)

In summary, the Rocket/REB1100 is a dedicated reader, with a good screen, but limited to what it does.

Palms are relatively cheap and common, with a wide range of options, and the capacity to function as PDAs as well. They can run all common readers except the Microsoft one.

The iPaq has a good color screen, but is bulkier than a Palm, and can run lots of readers, including the Microsoft one, but not all Palm readers are available for Pocket PC. Like Palms, the iPaq can do other jobs besides displaying eBooks.

Different people make different choices among these for reading their eBooks, and they all work well; it's a matter of personal taste.

R.33. How can I read a PG eBook on my PDA (Palm, iPaq, Rocket . . .)?

To read a book on your PDA, you need to get the file into a format that your reader software understands. Each PDA reader program will work only with a specific format of file. Some will read several formats, but, in general, it's a jungle of competing options.

Unless you use a Rocket or REB1100, you will need to install at least one reader program, and many veteran readers install two or three to deal with different formats. There are many of them available. In a recent internal poll of Gutenberg volunteers who use PDAs,

C Spot Run, Mobipocket, PalmReader and Plucker

were our favored choices for reader programs.

Further, the process may be different depending on which reader software you're using. Each format that a reader understands has one or more converter programs that run on your PC, and turn the plain text file into that format. So in general, you have to:

1. Download the PG text.
2. Edit the text for the layout the converter wants (often HTML).
3. Use the converter to create a file of the format the reader wants.
4. Transfer the converted file to your PDA.

If all this sounds too complicated, remember that many people take and convert PG texts into many formats, and offer them for download from their sites. Of course, there is no guarantee that someone will have converted the particular eBook you want, but there are lots of options. Try Blackmask, which lists thousands of texts already converted for Mobipocket, iSilo, RocketBook and the Microsoft Reader.

There are many other sites that serve pre-converted PG texts.

MemoWare is also a useful resource for converted eBooks, and has lots of information, including an excellent map of the readers and formats jungle at

Tecriture hosts a service that downloads and converts PG texts on the fly, and delivers them straight to you.

If you're "rolling your own", you'll probably need to convert our plain texts to HTML at some point, because a lot of converters require HTML as input, and this is a common theme in readers' explanations of how they get texts onto their PDAs. Don't panic! You don't have to be a HTML wizard to do this—in fact, you don't need to know anything about HTML at all! Usually, it's just a matter of removing some line ends and Saving As HTML. You won't get a lot of fancy markup, or images out of thin air, but you will get the book.

One of the main things you usually have to do in making HTML is unwrap the lines. If you're making your HTML manually, this is usually done by replacing two paragraph marks with some nonsense marker like @@Z@@, replacing all single paragraph marks with a space, and replacing the nonsense marker with a paragraph mark. After unwrapping, the text can just be Saved As HTML.
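
For readers who prefer a script to a word processor, the unwrap-and-save step can be sketched in a few lines of Python. This is a bare-bones illustration that treats blank lines as paragraph breaks, not a substitute for a dedicated converter; the function name text_to_html and the page layout it emits are assumptions.

```python
import html
import re


def text_to_html(text: str, title: str = "Untitled") -> str:
    """Wrap each blank-line-separated paragraph in <p> tags (minimal sketch)."""
    text = text.replace("\r\n", "\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Joining on whitespace unwraps the lines inside the paragraph.
        joined = " ".join(block.split())
        # Escaping keeps characters like & and < from being read as markup.
        paragraphs.append(f"<p>{html.escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Like the manual method, this gives you the book with no fancy markup, and it will run verse and tables together unless they are separated by blank lines first.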

There are some applications that specifically assist with auto-converting text into HTML:

GutenMark was specifically written for the purpose, and knows enough about PG conventions to do a very good job.

InterParse is a Windows-based generic text parser that is very easy and intuitive to use.

The World Wide Web Consortium lists some other options at

If you're using a RocketBook or REB1100, you don't have either the choices or the confusion to deal with. One of our volunteers who uses a RocketBook offered this recipe for getting a PG text onto a RocketBook:

On converting to Rocket:

1. Download text file.
2. Using your utility for showing formatting, enter your word processing program's edit mode.
3. Replace all double paragraph marks with some nonsense sequence that can't possibly actually be there, such as @@Z@@.
4. Replace all single paragraph marks with one single space (enter).
5. Replace your nonsense sequence with one paragraph mark.
6. Convert all your double spaces to single spaces. Repeat this until you get "0" for how many replacements were made.
7. Save in HTML.
8. Go into your Rocket Librarian. Use "import file using Rocket Librarian." Go and pick up the file, which will be automatically converted to .rb in this process.

This sounds long, but it usually takes me under three minutes except for a very long text. I've never taken longer than five minutes. You can just go in and pick up the text file with Rocket Librarian, but what you get onscreen doing this looks very odd. Steps 2-7 are not essential, and if I'm in a hurry to read something once I might skip them, but if it's something I know I want to keep I use them.

This formula is not ideal for poetry or blank verse: if you want to keep the original line breaks, you should avoid removing the paragraph marks.
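
Step 6 of the recipe asks you to repeat the replacement because a single pass over a run of three or more spaces still leaves a double space behind. A minimal Python sketch of that loop follows; the function name collapse_spaces is made up for illustration.

```python
def collapse_spaces(text: str) -> tuple[str, int]:
    """Replace double spaces with single ones until none remain.

    Returns the cleaned text and the number of passes that changed something.
    """
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return text, passes
```

The loop stops at the point where a word processor would report "0" replacements.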

Another volunteer, who reads on Mobipocket, offered this suggestion:

I use the MobiPocket Publisher, available free from www.mobipocket.com. It wants to take a HTML file as input, so the first thing I have to do is convert my PG text to HTML.

I usually do this by running GutenMark, available at . I can also do it in Microsoft Word using the following sequence:

Edit / Replace / Special and choose Paragraph Mark twice (or, from replace, you can type in ^p^p to get two Paragraph Marks) and replace with @@@@. Replace All. This saves off real paragraph ends by marking them with a nonsense sequence.

Now Replace one Paragraph Mark (^p) with a space. Replace All. This removes the line-ends.

Finally, replace @@@@ with one Paragraph Mark. Replace All. This brings back the Paragraph Ends.

Now I can Save As HTML.

GutenMark does a better job of converting to HTML than my simple Word formula, since it recognizes standard PG features, and sometimes Mobipocket doesn't like the HTML produced from Word—it complains of a missing file, or doesn't recognize quotation marks.

Having got my HTML file, I open Mobipocket Publisher, choose "Project Gutenberg", Add the File I created, and just Publish it to MobiPocket .PRC format. Then I pick it up on my iPaq the next time I sync. The whole process takes two or three minutes, and the results, since I discovered GutenMark, are good.

I recently came across InterParse 4 at . It doesn't have the built-in knowledge of GutenMark, so the results aren't as good, but it's really easy to use, and you can see the effect of your changes onscreen as you do it. For most PG books, all you have to do is just Open the text file and choose Options / Remove all CRLFs (Except at Paragraph End), then Convert / Text to HTML and Save As the HTML filename you want. Quick and painless.

About the Files:

R.34. What types of files are there, and how do I read them?

The vast majority of our files are plain text. You can read these with any editor or text viewer or browser. Some are HTML. You can read these with any browser.

For a full listing of other file types as of mid-2002, and how to read them, please see the Formats FAQ [F.2].

R.35. What do the filenames of the texts mean?

PG files are named for the text, the edition, and the format type.

As of February, 2002, all PG files are named in "8.3" format—that is, up to eight characters, a dot, and three more characters.

The first five characters in the filename are simply a unique name for that text, for example, "Ulysses" by Joyce begins with "ulyss".

If the text has been posted as both a 7-bit and 8-bit text, then the first character of the filename will be a 7 or an 8, to indicate that. For example, we have both 7crmp10 and 8crmp10 for Dostoevsky's Crime and Punishment.

The 6th and 7th characters of the name are the edition number—01 through 99. We normally start at edition 10 (1.0); numbers lower than that indicate that we think the text needs some more work; numbers higher than that mean that someone has corrected the original edition 10.

The 8th character of the filename, if it exists, indicates either the version or the format of the file. When we get a different version of the text based on a different source, we give it an a, b, c, as for example if the text is from a different translation. Where we have posted a text in a different format, we also add an eighth character—"h" for HTML, "x" for XML, "r" for RTF, "t" for TeX, "u" for Unicode are established formats. There have been some experimental postings with "l" for LIT, and "p" for either PRC or PDB.

So, for example:

7crmp10 is our first edition of Crime and Punishment in plain ASCII
8sidd10 is our first edition of Siddhartha, as an 8-bit text
dyssy10b is our first edition of our third translation of Homer's Odyssey, in plain ASCII
jsbys11 is our second edition of Jo's Boys, in plain ASCII
vbgle10h is our HTML format of our first edition of Darwin's Voyage of the Beagle
7ldv110 is our 7-bit ASCII version of the first volume of the Notebooks of Leonardo da Vinci

To make it worse, we don't always stick to these rules, for example:

1ddc810 is our first edition of the first book of Dante's Divina Commedia in Italian, as an 8-bit text
80day10 is our first edition of Verne's Around the World in 80 days, in plain 7-bit ASCII in English.
emma10 is our first edition of Jane Austen's "Emma", with a 4-character basename instead of 5.

Some series have special, non-standard names. Shakespeare is named with a digit representing the overall source (First Folio, etc), then "ws", then a series number, so for example 0ws2610, 1ws2610 and 2ws2610 are all versions of "Hamlet". The Tom Swift series is named with a two-digit prefix denoting the series number, then "tom", so for example 01tom10 is "Tom Swift and his Motor-Cycle".

And what should we do with a text from a different source that is formatted as HTML? For example, if dyssy10b is the name of the third translation, what should the HTML version be named? dyssy10bh is obvious, but it uses 9 characters.

The problem, of course, is that we are trying to fit a lot of information into an 8-character filename, and as the collection grows, and the number of formats and versions increases, we come across more pressure on filenames, so while the filename is a good guide to the contents, it's not definitive.
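
As a rough illustration of the naming scheme, and of its limits, here is a Python sketch that splits a filename into the parts described above. The regular expression, the function name parse_pg_name and the layout of the result are assumptions for illustration. It handles only the standard pattern, so names that break the rules are either rejected or parsed wrongly.

```python
import re

# Standard pattern only: 5-character basename, 2-digit edition,
# optional 8th character, optional extension. (Assumed regex, not official.)
PG_NAME = re.compile(
    r"^(?P<base>[0-9a-z]{5})(?P<edition>\d{2})(?P<suffix>[a-z]?)"
    r"(?:\.(?P<ext>[0-9a-z]{1,3}))?$"
)

# Format letters listed in this FAQ; "l" and "p" are the experimental ones.
FORMATS = {"h": "HTML", "x": "XML", "r": "RTF", "t": "TeX", "u": "Unicode",
           "l": "LIT", "p": "PRC or PDB"}


def parse_pg_name(filename: str):
    """Split a filename into its parts, or return None if it does not fit."""
    match = PG_NAME.match(filename.lower())
    if match is None:
        return None
    base, suffix = match["base"], match["suffix"]
    return {
        "basename": base,
        # A leading 7 or 8 usually, but not always, marks 7-bit or 8-bit text.
        "bits": int(base[0]) if base[0] in "78" else None,
        "edition": int(match["edition"]),
        # Format letters map to a format; any other letter is a version.
        "format": FORMATS.get(suffix),
        "version": suffix if suffix and suffix not in FORMATS else None,
    }
```

The exceptions show up immediately: emma10 is rejected because its basename has four characters, and 80day10 is wrongly flagged as an 8-bit text. That is the sense in which the filename is a good guide but not definitive.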

R.36. What is the difference within PG between an "edition" and a "version"?

We give the name "edition" to a corrected file made from an existing PG text. For example, if someone points out some typos in our file of "War and Peace", we will fix them, and, if enough are found to warrant a "new edition", then instead of just replacing the file wrnpc10.txt, we may make a new file wrnpc11.txt, and leave the original alone. A new edition is always filed under the same year and etext number as the original—it's just an update.

We give the name "version" to a completely independent e-text made from the same original book, but a different source. For example, Homer's Odyssey was translated by many different people, but they all worked from the same book. The translations by Lang, Butler, Pope and Chapman are very different, but they all come from the same root.

Thus, these are all "versions" of Homer's Odyssey. Each gets a new etext number, but we give them all the same basename, dyssy, and add a letter to the filename to indicate that they are "versions" of the same original book:

dyssy10.txt Butler's Translation
dyssy10a.txt Butcher & Lang's Translation
dyssy10b.txt Pope's Translation

The differences don't have to be as extreme as this for us to create a new version. "Clotelle"/"Clotel", for example, was a book published multiple times in English by William Wells Brown, and each time, he changed the text. We preserve three different texts of the same book as different versions: clotl10, clotl10a and clotl10b.

R.37. What is the difference between an "etext" and an "eBook"?

If there is any, it seems to be in the eye of the Marketing Department! Michael Hart started the whole thing, and coined the word "Etext". The term "eBook" is gaining in popularity, even for texts that are not full books, so we've started using that more now.

R.38. What are the "Etext/Ebook numbers" on the texts?

These are simply a series of numbers. We give one to each etext as it is posted, so the earliest etexts have low numbers and later etexts have higher numbers. Etext number 1 is the Declaration of Independence, the first text that Michael Hart typed in to the mainframe that he was using in 1971.

A few numbers are reserved for books that we hope to have in the PG archive someday; for example, 1984 is reserved for Orwell's classic.

When we improve a text by making some corrections, we call it a new EDITION, and it keeps the same etext number, but when we post a different VERSION of the same text, from a different paper book, such as a different translation of Homer's Odyssey, each new version gets a new etext number.

R.39. What do the month and year on the text mean?

Project Gutenberg sets a production target for itself. The idea is that we try to produce X texts in a month, and we date the texts according to what month of our schedule they appear in. For example, if our target for September 2000 was 50 texts, and we actually produced 55, then the last five would be dated October 2000, and we'd get a head-start on the month. At the time of writing, in July 2002, that target is the publication of 200 books per month. However, our actual production has far outpaced our targets, with the result that the "head-start" has accumulated so much that we are currently releasing books scheduled for March, 2004!
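
The dating rule is easiest to see as arithmetic. The Python sketch below assigns each text a schedule month: once a month's target is filled, further texts spill into the next month. The function name schedule_months and the fixed monthly target are simplifying assumptions, since the real target has changed over time.

```python
def schedule_months(produced: int, target: int, start_year: int, start_month: int):
    """Return the (year, month) label for each of `produced` texts, in order."""
    if target <= 0:
        raise ValueError("target must be positive")
    labels = []
    for index in range(produced):
        months_ahead = index // target  # full monthly targets already used up
        total = (start_month - 1) + months_ahead
        labels.append((start_year + total // 12, total % 12 + 1))
    return labels
```

Calling schedule_months(55, 50, 2000, 9) labels the first 50 texts September 2000 and the last five October 2000, matching the example above.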

The fact that we're so far ahead of schedule makes this quite confusing for newcomers. If it bothers you, just don't think about it! But at least it's better than being behind schedule. We didn't always produce so many books. In the September 1994 newsletter, Michael Hart wrote:

As always, I am terrified of the prospect of doubling our output to 16 Etexts per month for next year, we really need your help!!!

That was when the Project's target was 8 Etexts per month. Today, our target is heading towards 8 eBooks per day!

Copyright FAQ

C.1. What is copyright?

Copyright is a limited monopoly granted to the author of a work. It gives the author the exclusive right, among other things, to make copies of the work, hence the name.

C.2. Does copyright differ from country to country? From state to state?

Copyright laws are constantly changing all over the world. Each country has its own copyright laws, some within the framework of international treaties, some not. Within the U.S., copyright laws are federal, and do not vary from state to state.

C.3. What are the copyright laws outside the U.S.?

Sorry, we can't advise on copyright law outside the U.S. We can point you to resources like which tries to summarize the various copyright regimes, but we can't guarantee that these are accurate. Even when they are accurate, it is very hard to express some of the subtleties of copyright law in a summary—for example, the question of what constitutes "publication" for copyright purposes is sometimes unclear.

C.4. Why does Project Gutenberg advise only on U.S. copyright issues?

The Project Gutenberg Literary Archive Foundation is registered in the U.S. as a 501(c)(3) organization, and our two posting servers are situated in the U.S., so we are subject to U.S. copyright law, and only to U.S. copyright law.

Because copyright laws are so tangled and different between countries, not only in the broad sweep but also in the detail, and because Project Gutenberg is subject only to U.S. copyright law, we just don't have the expertise, time or resources to research and advise on the law in other countries.

C.5. I don't live in the U.S. Do these rules apply to me?

Your country's copyright laws are different from those in the U.S., and understanding and dealing with them is up to you. If you have a book that is in the public domain in your country, but not in the U.S., it is perfectly legal for you to publish it personally there, but we can't.

Similarly, it may be legal for us to publish it here, but not for you to publish it, or perhaps even copy it, where you are.

There are organizations in other countries operating in more liberal copyright regimes that may be able to publish texts that we cannot. For example, Project Gutenberg of Australia at can accept many works not eligible in the U.S.

C.6. What is the public domain?

The public domain is the set of cultural works that are free of copyright, and belong to everyone equally.

C.7. What can I do with a text that is in the public domain?

Anything you want! You can copy it, publish it, change its format, distribute it for free or for money. You can translate it to other languages (and claim a copyright on your translation), write a play based on it (if it's a novel), or a novelization (if it's a play). You can take one of the characters from the novel and write a comic strip about him or her, or write a screenplay and sell that to make a movie.

You don't need to ask permission from anyone to do any of this. When a text is in the public domain, it belongs as much to you as to anyone.

(However, when some character or part of the work is also trademarked, as in the case of Tarzan, it may not be possible to release new works with that trademark, since trademark does not expire in the same way as copyright. If you propose to base new works on public domain material, you should investigate possible trademark issues first.)

C.8. How does a book enter the public domain?

A book, or other copyrightable work, enters the public domain when its copyright lapses or when the copyright owner releases it to the public domain.

U.S. Government documents can never be copyrighted in the first place; they are "born" into the public domain.

There are certain other exceptional cases: for example, if a substantial number of copies were printed and distributed in the U.S. before March, 1989 without a copyright notice, and the work is of entirely American authorship, or was first published in the United States, the work is in the public domain in the U.S.

C.9. How does a copyright lapse?

Copyrights are issued for limited periods. When that period is up, the book enters the public domain.

Copyrights can lapse in other ways. Some books published without a copyright notice, for example, have fallen into the public domain.

C.10. What books are in the public domain?

Any book published anywhere before 1923 is in the public domain in the U.S. This is the rule we use most.

U.S. Government publications are in the public domain. This is the rule under which we have published, for example, presidential inauguration speeches.

Books can be released into the public domain by the owners of their copyrights.

Some books published without a copyright notice in the U.S. prior to March 1st, 1989 are in the public domain.

Some books published before 1964, and whose copyright was not renewed, are in the public domain.

If you want to rely on anything except the 1923 rule, things can get complicated, and the rules do change with time. Please refer to our Public Domain and Copyright How-To for more detailed information.
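As a rough illustration, the rules listed in this answer can be arranged as a first-pass triage. The sketch below is in Python; the Book fields and the triage function are hypothetical names invented for this example. It works at year granularity only and simply restates the list above as it stood when this FAQ was written. Anything it marks as "needs research" is exactly the complicated territory mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical record holding only the facts the listed rules ask about."""
    year_published: int               # year of first publication
    us_government: bool = False       # a U.S. Government publication
    released_by_owner: bool = False   # owner has released it to the public domain


def triage(book: Book) -> str:
    """Return a first-pass label based on the rules listed in this FAQ answer."""
    if book.year_published < 1923:
        return "public domain (pre-1923 rule)"
    if book.us_government:
        return "public domain (U.S. Government publication)"
    if book.released_by_owner:
        return "public domain (released by copyright owner)"
    if book.year_published < 1964:
        # Only SOME of these books qualify: non-renewal or a missing notice.
        return "needs research (non-renewal or missing notice)"
    if book.year_published <= 1989:
        # The FAQ's cutoff is March 1st, 1989; a year alone cannot settle it.
        return "needs research (missing notice)"
    return "no listed rule applies"


if __name__ == "__main__":
    for b in (Book(1894), Book(1950), Book(1995, us_government=True), Book(2000)):
        print(b.year_published, "->", triage(b))
```

The pre-1923 check comes first because it is the only rule here that needs nothing beyond the date.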

C.11. My book says that it's "Copyright 1894". Is it in the public domain?

Yes.

Its copyright date is 1894, which is before 1923, so its copyright has lapsed.

C.12. How can a copyright owner release a work into the public domain?

A simple written statement, which may be placed into the work as released, is sufficient. When a copyright holder places a book into the public domain and wants PG to publish it, all we need is a letter [V.70] saying that they are or were the holder of the copyright, and that they have released it into the public domain.

C.13. When is an author not the owner of a copyright on his or her works?

An author may sell, assign, license, bequeath or otherwise transfer his or her copyright to another party, such as a publisher or heir.

C.14. What does Project Gutenberg mean by "eligible"?

A book is eligible for inclusion in the archives if we can legally publish it.

We can legally publish any material that is in the public domain in the U.S. [C.10], or for which we have the permission of the copyright holder.

C.15. I have a manuscript from 1900. Is it eligible?

Maybe not.

Works that were created but not "published" before 1978 will not enter the public domain before the end of 2002. This gets complicated, and it's not too common. If you have such a case, ask about it.

A borderline example is the classic "Seven Pillars of Wisdom" by T. E. Lawrence, which was actually printed and privately distributed, but not "published", in 1922. We haven't been able to confirm any pre-1923 "publication" for this.

C.16. How come my paper book of Shakespeare says it's "Copyright 1988"?

Shakespeare was published long enough ago to be indisputably in the public domain everywhere, so how can a Shakespeare text be copyrighted?

There are two possibilities:

1. The author or publisher has changed or edited the text enough to qualify as a "new edition", which gets a "new copyright".

2. The publisher has added extra material, such as an introduction, critical essays, footnotes, or an index. This extra material is new, and the publisher owns the copyright on it.

The problem with these practices is that a publisher, having added this copyrighted material, or edited the text even in a minor way, may simply put a copyright notice on the whole book, even though the main part of it (the text itself) is in the public domain! And as time goes on, the number of original surviving books that can be proved to be in the public domain grows smaller and smaller; and meanwhile publishers are cranking out more and more editions that have copyright notices. Eventually it becomes harder and harder to prove that a particular book is in the public domain, since there are few pre-1923 copies available as evidence.

Among the most important things PG does is preventing this creeping perpetuation of copyright by proving, once and for all, that a particular edition of a particular book is in the public domain, so that it can never be locked up again as the private property of some publisher. We do this by filing a copy of the TP&V, the title page where the copyright notice must be placed, so that if anyone ever challenges the work's public domain status, we can point to a proven public domain copy.

C.17. What makes a "new copyright"?

1. New edition

When a text is in the public domain, anyone—from you to the world's biggest publisher—can edit it and republish the edited version. When the edits are substantial enough, the edited work is deemed a "new edition", and gets a new copyright, dating from the time the new edition was created.

How substantial must the edits be to qualify as a "new edition"? That is for a court to decide in any particular case. Changing some punctuation or Americanizing British spelling would not qualify a work for a new edition. Theorizing something about Shakespeare and rewriting lots of lines in "Hamlet" to emphasize your point would make a new edition. In between those extremes is a grey area, where each new edition would have to be considered on a case-by-case basis.

A special case, that isn't quite a new edition, is when someone "marks up" a public domain text in, for example, HTML. Where this happens, the text is in the public domain, but the markup is copyrighted. We've already seen that when an editor adds footnotes to a public domain text, he owns copyright on the footnotes but not on the text: similarly, when he adds markup to the text, he owns copyright on the markup.

2. Translation

Translation is a common and justified special case of a new edition. When someone translates a public domain work from one language to another, they get a new copyright on the translation (but not on the original, of course, which stays in the public domain so that lots more people can use it.)

C.18. I have a 1990 book that I know was originally written in 1840, but the publisher is claiming a new copyright. What should I do?

From a practical point of view, there's not much you can do about it. It's a Catch-22 situation: in order to prove that the new printing should be in the public domain, you need a provably public domain copy to compare against the allegedly copyrighted edition, and if you have that, you don't need the modern edition anyway.

C.19. I have a 1990 reprint of an 1831 original. Is it eligible?

Yes, as long as we can show that it is a reprint, which usually means that it has to say that it's a reprint somewhere on the TP&V.

However, we need to be very careful in a case like this. Commonly, the book itself is eligible, but introductions, indexes, footnotes, glossaries, commentaries and other such extras may have been added by the modern publisher, so you should not include them except where you can prove that they are part of the reprinted material.

C.20. I have a text that I know was based on a pre-1923 book, but I don't have the title page. Can I submit it to PG?

Unfortunately, no.

What you "know" isn't proof that we could take into court if we were challenged about it in 20 years, and the whole problem of "new copyright" [C.17] makes it effectively impossible to tell for sure what is and isn't copyrighted anyway, without reliable evidence like the title page.

You need to find a matching paper edition for proof. See the FAQ "I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?" [V.62]

C.21. How does Project Gutenberg "clear" books for copyright?

Usually, we just look at the TP&V. If it was published before 1923, or says it is a reprint of a pre-1923 edition, that's all we have to do.

In other cases, we may look up library publication data to prove, say, that a book published in the U.S. without a copyright notice was indeed published in the years when a copyright notice was required. Or we may simply see that a particular text was published by the U.S. Government.

The bottom line is the question: if someone comes to us claiming to hold the copyright on a text, do we have proof to show that they're wrong?

Whatever proof or search we have to do, we then file it, either on paper or electronically, so that the proof will be available in 20 or 50 years' time, or whenever the challenge is made.

C.22. I want to produce a particular book. Will it be copyright cleared?

If it was published before 1923, you will have no problem with its clearance. If you're relying on one of the other rules, it may just be too much work to try and prove its public domain status.

C.23. I have some extra material (images, introduction, preface, missing chapter) that should go into an existing PG text. Do I have to copyright-clear my edition before submitting it?

Yes.

Otherwise we would have no proof that the extra material you're adding isn't copyrighted by someone. It's quite common for modern publishers to add introductions or illustrations to a public-domain novel, and we need the same standard of proof for these additions that we do for the main text.

This doesn't apply to an occasional word or two that was omitted by mistake when the text was first typed. For example, you don't need to clear another edition just to restore the words "thus perfected the" and "eliminating all" to the sentence:

And while we Country, we were also sorts of tediums, disputable possibilities, and deadlocks from the game.

while fixing typos.

C.24. I see some Project Gutenberg eBooks that are copyrighted. What's up with that?

Authors or publishers may grant Project Gutenberg an unlimited license to republish their works. In this kind of case, the copyright holders still retain their rights, but grant permission for us to share these eBooks with the world.

These copyrighted PG publications can still be copied, but the permissions granted are spelled out in their headers, and usually forbid anyone to republish them commercially.

C.25. What are "non-renewed" books?

Works published before 1964 needed to have their copyrights renewed in their 28th year, or they'd enter into the public domain. Some books originally published outside of the US by non-Americans are exempt from this requirement, under GATT. Some works from before 1964 were automatically renewed.

C.26. How can I get Project Gutenberg to clear a non-renewed book?

As of mid-2002, you probably can't. Because of all of the checks we need to do to ensure that the book wasn't renewed, or wasn't one of the exceptions that was automatically renewed, we just don't have the time to do it. But we're working on it. Right now, we're processing copyright renewal records with the aim of making them searchable.

Volunteers' FAQ

About the Basics:

V.1. How do I get started as a Project Gutenberg volunteer?

What you actually need to do to produce a PG text can be stated very simply:

1. Borrow or buy an eligible book.

2. Send us a copy of the front and back of the title page.

3. Turn the book into electronic text.

4. Send it to us.

That's it! All the rest of the producing parts of the FAQ are about the details of how different people approach these steps.

Different people find their own ways into PG work, and once in, find their own niches. If you have your own ideas, don't let anything here stop you from pursuing them.

Some people just read the FAQs, go up to their attic, pull an eligible book off the shelf, send TP&V [V.25] in, and start typing or scanning. Next time we hear from them is when they send in [V.46] the completed eBook for posting. It can be as simple as that.

Some people just download existing PG texts, re-proof them very carefully and send in corrections.

Some people find regular collaborators through gutvol-d or the Volunteers' Board or the distributed proofing sites, earn a reputation as reliable proofers, and continue working as proofers.

Most people start small, and after a little experience of distributed proofreading or other proofing, begin their PG career as producers.

If you're a typist, cheer now, because you can ignore all the complicated paraphernalia of computer interfaces, and scanners, and the quality of OCR software and the mistakes it makes. You can just sit down at the keyboard with your eligible [V.18] book.

If you're not a typist, start thinking about scanners. It may be a while before you're ready to start scanning for yourself, but it's never too early to find out about them.

As soon as you have a solid grasp of how to turn a book into an etext, please start thinking about how you're going to become a producer. While proofing work is valuable, PG can only add books when someone makes the effort to actually make etexts from them, and the people who run distributed and co-operative proofing projects have to do a lot of work before and after the proofing step; we want to spread that around as widely as possible. Project Gutenberg needs more producers!

Whatever you do, don't just hang around expecting someone to offer you a task to undertake. There is no "head office" where overworked staff occasionally need interns to do filing and odd-jobs. There are maybe 200 fairly regular contributors to PG, producers and significant proofers. We almost never meet each other in person. We have jobs, and families, and other interests. We work for PG when we can, and when we want to. In many ways, you could look at us as 200 unrelated people, each doing our own etext project, using Project Gutenberg as an umbrella group that sets loose standards, files copyright proofs and provides secure placement for the finished texts. Since we each have our own self-assigned single-person tasks, there isn't too much room to delegate some of that work to a beginner. By all means, volunteer for some tasks (on the Volunteers' Board, or in gutvol-d), but you should think in terms of defining your own tasks, and making your own contribution.

Orientation.

Absolutely everyone—scanners, typists, proofers—should first spend some time working on a distributed or co-operative proofing project. This will allow you to get a feel for what happens in making an etext from paper pages without committing you to more than a few hours' work.

This is not in any way an institutional requirement, since we don't have any institutional requirements, but it is very good advice. Many volunteers start eagerly, wanting to do lots of PG work, and then drop out because they took on too much, too fast, without understanding the nature of the work. Don't let that happen to you. Take it in small chunks.

Check out the distributed proofing sites run by Charles Franks, JC Byers and Dewayne Cushman, and spend a few hours over a couple of weeks just processing some pages for real.

While you're doing that, you should also join a couple of PG mailing lists [V.12]—gutvol-d and either the weekly or monthly Newsletter list. Reading these will start to get you connected to what's going on. Browse the Volunteers' Board—there may be some offers going, and there's a lot of experience captured in some of those "back-issues", so don't confine yourself to the front page.

Inform yourself on e-text issues generally, not just within Project Gutenberg. Explore The On-Line Books Page and the IPL [R.5] and from them find other eBooks available on-line.

Have a look at our In-Progress List and some lists of suggestions from others [B.4].

Look at sites like Blackmask and Pluckerbooks and Memoware and Bookshare to learn how our work is being used as a basis and copied and converted and amplified in many other projects.

Above all, READ a few Project Gutenberg eBooks! You don't have to read them in full; you don't need to spend weeks poring over Dostoyevsky or studying Shakespeare. Just download a few and skim them—you'll absorb what a PG text should be quite painlessly, and maybe you'll get caught up in the story! If you're looking for light reading, and can't think of something that you specifically want, how about these all-time favorites:

The Gift of the Magi, by O. Henry
The Lady, or the Tiger?, by Frank R. Stockton
A Christmas Carol, by Charles Dickens
Alice in Wonderland, by Lewis Carroll
Anne of Green Gables, by Lucy Maud Montgomery
The Marvelous Land of Oz, by L. Frank Baum
A Princess of Mars, by Edgar Rice Burroughs
Heidi, by Johanna Spyri
A Connecticut Yankee in King Arthur's Court, by Mark Twain
Black Beauty, by Anna Sewell
Tarzan of the Apes, by Edgar Rice Burroughs
Tom Swift and his Motor-Cycle, by Victor Appleton
Rebecca Of Sunnybrook Farm, by Kate Douglas Wiggin
Little Lord Fauntleroy, by Frances Hodgson Burnett
Aesop's Fables
Grimms' Fairy Tales
The Art of War, by Sun Tzu
Dracula, by Bram Stoker
Swiss Family Robinson, by Johann David Wyss
The War of the Worlds, by H.G. Wells

If you have a taste for detectives and mysteries, there's

The Adventures of Sherlock Holmes, by Arthur Conan Doyle
Monsieur Lecoq, by Emile Gaboriau
The Mysterious Affair at Styles, by Agatha Christie
Arsene Lupin, by Edgar Jepson & Maurice Leblanc
Edgar Allan Poe's "The Gold-Bug" and "The Murders in the Rue Morgue" in The Works of Edgar Allan Poe V. 1

For the excessive buckling of various swashes, see:

The Prisoner of Zenda, by Anthony Hope
The Man in the Iron Mask, by Dumas, Pere
The Three Musketeers, by Alexandre Dumas
Treasure Island, by Robert Louis Stevenson
The Scarlet Pimpernel, by Baroness Orczy

Effen youse got a hankerin' for a Western, there's:

Riders of the Purple Sage, by Zane Grey
The Virginian, Horseman Of The Plains, by Owen Wister
Back to God's Country, by James Oliver Curwood
Selected Stories by Bret Harte
Jean of the Lazy A, by B. M. Bower

Or if you prefer your fiction more domesticated, there's:

Little Women, by Louisa May Alcott
Pride and Prejudice, by Jane Austen
The Warden, by Anthony Trollope
The Heir of Redclyffe, by Charlotte M Yonge
Mother, by Kathleen Norris

For something to raise a smile, you can rely on:

The Devil's Dictionary, by Ambrose Bierce
The Wallet of Kai Lung, by Ernest Bramah
The Importance of Being Earnest, by Oscar Wilde
Three Men in a Boat, by Jerome K. Jerome
Piccadilly Jim, by P. G. Wodehouse

If poetry is your thing, you have lots to choose from:

Shakespeare's Sonnets
Project Gutenberg's Book of English Verse
The Home Book of Verse, edited by Burton Stevenson
The Complete Poems of Henry Wadsworth Longfellow
Leaves of Grass, by Walt Whitman

Now, that's just a handful from our over 5,000 eBooks, so don't tell me you can't find anything to read! If you do have ideas of your own, download GUTINDEX.ALL or PGWHOLE.TXT and browse through the whole list, or Browse by Author on the website.

Download a few. Read them on your PC, or reformat them and print them out, or convert them for your PDA. Get used to working with and formatting text. Look at the formatting decisions that earlier volunteers have made—they're not entirely consistent; different people make different choices, different books require different methods, and PG conventions have shifted slightly over the last 10 years—but they're all perfectly readable and convertible today.

If you find typos [R.26] in any of them, tell us! That's also a part of being a Gutenberg volunteer. Our eBooks improve with time!

If you're thinking of making the best use of your time looking for errors in posted texts, a good start would be to download 40 or 50 texts, and run a spelling checker and gutcheck [P.1] on them all, spending only 5 or 10 minutes on each. Having had a quick look at all of them, concentrate on the ones that seem to have most problems—where automated checkers see 10 problems, a careful human will usually be able to pick up 20.
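The "quick look, then concentrate on the worst" approach can be sketched in a few lines of Python. This is only an illustration: the three patterns are hypothetical stand-ins chosen for this example, not what gutcheck or a spelling checker actually tests, and count_suspects and rank_texts are invented names. The idea is simply to count suspicious spots in each text and sort the texts so that the most troubled come first.

```python
import re

# Hypothetical stand-in checks; real tools such as gutcheck do far more.
SUSPECT_PATTERNS = [
    re.compile(r"[A-Za-z]\d|\d[A-Za-z]"),   # digit fused to a letter, e.g. "c1ear"
    re.compile(r" [,.;:!?]"),                # space before punctuation
    re.compile(r"[ \t]+$", re.MULTILINE),    # trailing whitespace on a line
]


def count_suspects(text: str) -> int:
    """Count how many times any of the stand-in patterns matches the text."""
    return sum(len(pattern.findall(text)) for pattern in SUSPECT_PATTERNS)


def rank_texts(texts: dict[str, str]) -> list[tuple[str, int]]:
    """Return (name, suspect count) pairs, highest count first."""
    counts = [(name, count_suspects(body)) for name, body in texts.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = {
        "clean.txt": "A clean line.\nAnother one.",
        "messy.txt": "He was c1ear , and tired. \n",
    }
    for name, count in rank_texts(sample):
        print(f"{count:4d}  {name}")
```

In practice the dictionary would be filled by reading the downloaded files. The counts are only a guide to where a careful human read is most likely to pay off.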

Getting Productive

OK, so you've seen what etexts should look like, you know what we do, and proofing hasn't scared you off. It's time to step up and become a producer. If you're not a typist and you don't have a scanner, take a detour down to the Scanning FAQ [S.1] now, and come back when your scanner is set up. If you're a typist or you've already got a scanner, read on . . .

Get a book. Just do it, OK?

Ya gotta start somewhere, right? And finding an eligible book is definitely somewhere.

Finding an eligible book is a threshold for many beginning volunteers—it's the first major step on the way to producing. For a lot of people, it's also the toughest barrier they have to cross. Fortunately, the barrier is only psychological, and can be crossed in a few minutes.

It's an unfamiliar process, and one that a lot of beginners feel some anxiety about. Don't. It's quite straightforward: it's just buying a book—you've done that, haven't you? Don't over-think it, don't worry about whether you're making the "right" choice, don't spend months comparing lists and choosing. Just do it. Once you've got your first, you'll wonder what all the fuss was about. Thanks to the wonders of the internet, your book can be on its way to you in an hour if you have $20 to spend.

Typists blessed with a good local library don't even have to buy their books—they can just borrow one and type it up! (You may be able to scan a library book, but get some experience with scanning first, and avoid damage!)

Let's deal with the decisions and other issues of picking one.

Copyright

For your first book, don't try getting fancy with copyright issues. Choose one that was published before 1923, and you're in the clear for U.S. and PG copyright purposes. You can read the dates just as well as we can—with books printed before 1923, there are no hidden catches: "Pre-'23 is free". Just read the TP&V [V.25] of the book, and see that it was printed before 1923, and you have no problems. Of course, reprints [V.19] of books copyrighted pre-1923 (and various other cases) are also clear, but if you have any concerns, just stick to pre-'23 editions.

Which book?

The answer to this question is different for everyone, but see how much you agree with the following statements:

"I have a favorite book, and I'd really like to produce that."

Well, hey, this is no problem! You already know what you want. Go check out whether the book is already on-line [V.29].

"I'd like to work on an important book, but I don't know which."

Well, everybody's definition of "important" is different, but some people have put their various ideas forward already; you can see whether you agree with them! The InProg List contains some, with the notation "Suggested book to transcribe" beside them. Steve Harris keeps a list of unproduced possibles at Steveharris.net. John Mark Ockerbloom's "Books Requested" page lists titles that people have asked for. [B.4] Your problem if you fall into this category is that other people probably wanted to produce "important" books too, and lots are already done.

"I just want an easy, trouble-free book to start with."

Your first book doesn't have to be War and Peace (we've already got that anyway!). Here's a tip: try looking for children's or what we would nowadays call "Young Adult" books. These are typically short, and may have large print, which makes life much easier if you're scanning. They age well: children's stories from a century or more ago are still readable and interesting to children today. We have many children's and YA eBooks: not just the classics like Grimm and Andersen and Heidi and Oz and Peter Pan and William Tell, but lesser-known but still enchanting stories like The Counterpane Fairy, or Lang's Fairy books. There are series, like the Motor Girls, or the (Country) Twins series, or the Bobbsey Twins. There is lots and lots of material here for you to start with, and these books are relatively plentiful, since they were made to take the kind of treatment children dish out, and many of them have been in school libraries or attics for years.

Whatever your choice, pick a book that you'll like; you'll be living with it up close and personal for a while. Light reading, adventure fiction, and books aimed at younger readers are safe first choices for most people. If you admire 19th Century scientists or scholars, and want to immortalize their work, great! But don't feel that you have to dive in at the deep end just because someone else wants you to.

Getting your book: a practical exercise

The Search

At this point, you've got a list of books—maybe just one, maybe several by an author or two, maybe just a genre like "Children's Books" with some specific ideas. Maybe your mind is still wide-open.

Before used booksellers had the Net, finding a particular old book was a daunting job. Booksellers had informal networks among themselves and exchanged catalogs so that each would know something about what was available elsewhere, but, for a buyer, finding a particular book was still hit-and-miss. Now, however, a number of large sites provide a service to booksellers, where they can list their inventories for people to search from anywhere.

So now we go hunt for them on the Net. No, you don't have to buy them on the Net—you can rummage in booksales and garage sales and used bookstores, and that's its own kind of fun, though on a physical hunt, what you need is to bring a long list of "already done" books with you. But even if you never buy over the Net, it's a vast source of information about what books are available, which are plentiful, and which are cheap. It gives you some experience of what to expect when you do your in-person browsing.

Here's a story of a typical Net-hunt. And you can follow along with it at home. :-) Your results, and the sites you end up at, will be different from mine, but even if you don't end up buying a book on this hunt, you'll get some experience of what's involved. C'mon, do it with me—see if you can find a better bargain!

I'm starting with two lists, and I'll follow up whatever seems promising. I'd like to spend about $20—might go to $30. Definitely not interested in $50 and up. I'm keeping in mind that I'll have to add a bit for delivery—usually up to $10 within the U.S., but can get expensive if you're in Perth, and ordering from a bookstore in Munich.

I'm also avoiding anything that might be tricky to clear on this search, and confining myself to books printed before 1923.

