Chapter 3

Of course, by the time you read this, some of these books may already have been produced, so if you're actually thinking of buying any, check carefully first!

My first shortlist consists of books that caught my eye from David Price's In-Progress List, Steve Harris's site, and The On-Line Books Requested page [B.4], and it reads:

Louisa May Alcott: The Inheritance
E. W. Hornung: Irralie's Bushranger
E. W. Hornung: Stingaree
A. A. Milne: The Dover Road
A. A. Milne: Once on a Time
Samuel Richardson: Pamela
Oscar Wilde: The Critic as Artist

As well as following along with my list, you should try finding two or three books of your own, from those sites or from your own preferences, and search for them in the same ways that I do.

Everyone has their own searching technique and their own favorite sites to search. For this session, I'm opening up three copies of my browser: one for Alibris, one for Abebooks, and one for the Catalog of the Library of Congress. I'll do my initial searches on Alibris and Abebooks, and keep the LoC site handy for reference.

In Alibris, I head straight for the Advanced Search page, since they allow searching by date, and I immediately put "before 1923" into every search, which avoids having to scan through modern reprints. In Abebooks, I choose "Hardcover" in their advanced search, which is not quite as good a filter, but does at least screen out recent paperback editions.

In each of the sites, I just enter the author's surname and one word from the title of each book, and look at the search results.

Louisa May Alcott's "Inheritance" looks like it's going to be tough. I don't find it in either of my two bookstores. On doing a little checking with modern bookstores, I find it was her first novel, written when she was 17, and as far as I can see, not published during her life: apparently only recently published—the LoC site has nothing prior to 1997. A disappointing start to my search. I understand why it's very desirable to get it online, but this one's going to be very tough to clear, and I'm staying away from it.

E. W. Hornung's "Irralie's Bushranger" is also elusive: it doesn't show up at either of my sites, so I check out the LoC to confirm I have the title right, and yes, there it is: "Irralie's Bushranger, a story of Australian adventure, 1896." So I widen my search by visiting and searching many of the sites there. Still no luck. If I were particularly eager to get this book, there are several things I might do at this point: I might register a "want" with one of the sites, asking to be notified when a copy is listed, I might use the OCLC WorldCat search (which Abebooks calls "Find it at a local library") where I can locate libraries that have copies, or I might even contact some individual booksellers and make a request that they look for it. Some booksellers actually specialize in looking for hard-to-find books; but of course I expect I'd have to pay a bit more for it when they do find it, and given my success with the rest of my list, and my price bracket, there seems no need to go that far today.

Hornung's "Stingaree", by contrast, seems to be everywhere, in several editions, and cheap. It must have been a bestseller in its day, which is not surprising from the author of "Raffles". 1902, 1905, 1909 editions abound. The cheapest are 1910 and 1907 editions for $4.95 and $5.00 from booksellers listed at Abebooks.

Milne's "Dover Road" is available from both sites. There seems to have been a Putnam's printing in 1922 of "Three Plays: The Dover Road. The Truth About Blayds. The Great Broxopp." of which lots of copies survive. There also seem to be later printings which would qualify as reprints if I were desperate, but the 1922 edition is priced from $12.00 to $50.00, so I'll take the 1922 $12.00 copy from Abebooks. As a bonus, I don't see the other two plays listed as being online anywhere, so I'll get three texts (and short ones, too!—279 pages for all three) for the price and effort of one.

Milne's "Once on a Time" is a bit less common, but once again a Putnam's printing of 1922 keeps it in the race. There are a couple of booksellers in England selling for 15 pounds (which just about makes my $20 threshold) and 20 pounds, and an ex-library copy going for $25.

There are lots of eligible copies of "Pamela" available, ranging from a fourth edition at a mere $4,999 (no, thanks!) to a 1921 printing at $6.60 at Alibris. I'll take that one, please.

Wilde's "Critic as Artist" is fairly widely available. A 1905 edition of "Intentions: the Decay of Lying; Pen Pencil and Poison; the Critic as Artist; the Truth of Masks" is available at Alibris for $8.80, (and other copies of the same edition there and on Abebooks in the $20-$30 range) and Abebooks lists a London 1919 edition at $12.50. There are several copies listed in both places as "undated" and "reprints"—I'm avoiding these, since while it's quite likely that they might be clearable, I'm not taking risks on this search.

My second list isn't a list—just a vague category: children's books that are easy to do.

I go to Alibris' Advanced Search, and enter "Child's" in the title, and pre-1923 in the date, and, excluding titles already on-line, immediately get:

A Child's History of France $13.20
A Child's Story of the Bible $5.50
First Lessons in Botany or The Child's Book of Flowers $13.20
The Child's Book of American Biography $11.00
The Child's First Bible $8.80
The Child's Music World $8.80

and so on through quite a list.

OK. That's a good start. But my choice so far is unimaginative. I need better search terms. So I go to main search engines with the terms "children's antiquarian books" and find a half-dozen or so sites that specialize in them. I can browse around there, though it's slower going without searches to focus my results. I find , specializing in children's books. Wading through the miles and miles of Alcotts and Barries and Burnetts, which are mostly already online, I think, I find a couple of authors from them who must have been popular, because they seem to have published lots of books before 1923: Angela Brazil and Dorothy Canfield. (I only got as far as the "C"s!)

I could of course stop here and buy some, but today I want to see what else is out there.

Back at Alibris and Abebooks, armed with my authors to search by, I turn up 4 pre-1923 books under $20 for Angela Brazil:

A Terrible Tomboy
The Youngest Girl in the Fifth
A Fourth Form Friendship
A Pair of Schoolgirls

and several between $20 and $30.

Dorothy Canfield immediately yields multiple copies of:

The Brimming Cup
Home Fires in France
Hillsboro People
Understood Betsy
Rough Hewn
The Real Motive

and others, and I haven't even got to $20 yet, nor to the letter "D".

A browse through the Ebay Collectible and Antiquarian Books section also throws up a respectable list of eligibles. I won't even bother counting that.

In 20 minutes, I have found five of the seven on my search list. In less than an hour after that, I found over 16 eligible children's books, all under or around $20 and all available online.

Before committing to one, though, I would double-check that the book hasn't been transcribed online, and isn't In Progress.

Double-checking your selection

If you're concerned that the book you have chosen duplicates another that might be in progress, and want to double-check, you can e-mail the Posting Team asking them to check whether any recent clearances have come in for that title.

Duplications do happen—there's no way of avoiding them when different people are making independent decisions—but they are rare.

Dealing with used booksellers

As a class, used booksellers are very pleasant people—remarkably friendly, knowledgeable and helpful, even to people buying on a typical Gutenberger's budget.

Some of them are not, however, models of ideal data organization when it comes to Internet listings. There are lots of one- or two-person operations dealing with an inventory of many thousands of books, and having located your book online, you should check that it's still available.

You can place an order through the site and wait for the confirmation, or you can simply call the bookseller. Not all booksellers' contact details are listed, so it's not always an option, but when you do phone you're likely to be speaking immediately to someone who can tell you for sure whether the book is still there, can pull the book off the shelf and answer questions about it, and can take your credit card details on the spot and dispatch the book immediately.

Copyright Clearance

As soon as your book arrives, send us the information needed for Copyright Clearance first. Even if your book is a true-blue, no-questions-asked pre-1923 edition, we should know about it as soon as possible so that it can go onto the In-Progress list for others to see that someone has started on it.

Wait for the confirmation e-mail before starting any serious work. Some people have thought that "Copyright 1923" plus some wishful thinking would be good enough, and, unfortunately, it isn't. Some people have gone ahead and produced the whole book before sending in the clearance, only to be disappointed, all their work wasted.

Books published in 1922 or earlier are clearable, but some people, ever optimists, overlook that little "1927" in small print on the verso. Sometimes there is no copyright date on the front, and other optimists assume that these books are OK. They may be; they may not be. Don't get caught in the copyright trap.

As soon as you have what you think might be an eligible book, do not start on it. Do not ask another volunteer's opinion. Just send in the TP&V and wait for the confirmation e-mail to find out for sure.

Even when your TP&V clearly says "Copyright 1901", send it in. We need to get it into the clearance files so that we can register it as being In-Progress.
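As a rough illustration of why the small print matters, the sketch below pulls every four-digit year out of the text of a title page and verso and compares the latest one with the 1923 cutoff. The function names and the sample strings are made up for this example, and a pass here is only a hint: the confirmation e-mail is still the only thing that counts.

```python
import re

CUTOFF_YEAR = 1923  # books published in 1922 or earlier are clearable


def latest_year(tp_and_v_text):
    """Return the latest plausible year printed in the text, or None."""
    years = [int(y) for y in re.findall(r"\b(1[5-9]\d\d|20\d\d)\b", tp_and_v_text)]
    return max(years) if years else None


def looks_pre_1923(tp_and_v_text):
    """True only if at least one date is present and all dates precede the cutoff.

    A page with no date at all returns False: it may be fine, it may not be.
    """
    year = latest_year(tp_and_v_text)
    return year is not None and year < CUTOFF_YEAR


# Hypothetical verso text: the 1901 copyright is easy to spot, the 1927 is not.
print(looks_pre_1923("Copyright 1901"))                  # True
print(looks_pre_1923("Copyright 1901. Reprinted 1927"))  # False
```

Either way, the TP&V still gets sent in; the check only tells you how hopeful to be while you wait.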

Producing

If you're a typist, there's not much more you need to know from this point: you can just get on with the job, with maybe a few tips from the FAQ. In fact, if you're a typist, you might wonder why the rest of us make such a fuss about scanners, and settings, and OCR. Take pity on us! We just can't produce the way you can. Smile indulgently, ignore all the scanner jargon, and submit your completed text while we're still saying bad words about the guttering on a greyscale image of page 372. :-)

If you are using a scanner to copy a book for the first time, be patient with yourself. Some people start off with too high expectations of what they can achieve. Believe it or not, scanning does work effectively; it just doesn't work perfectly. And often, you need a little practice before your scans work right with your OCR. The Scanning FAQ [S.1] has lots of specific tips you can try. Start by scanning a double-page about a third of the way through the book. Scan in Black and White and in Greyscale, at 300dpi and 400dpi. Try 600 dpi if it seems like a good idea. Put it through your OCR and see what comes out. Move your scanner so that you can be comfortable while placing the book and turning pages. Allow yourself an hour to experiment with different settings, and different pages. Put the sample images included with the Scanning FAQ through your OCR and see how the output compares to the text produced by other packages. That first hour finding out about how your setup works will be the most valuable hour of scanning you will ever do.

Having figured out what settings you want to use for this book, make sure you implement the best speed you can. Usually this means telling the scanner to scan only as much area as the book covers. This is quite important, since the scanner will by default scan its whole area, and you don't need all that; it just wastes time and makes your images bigger.

You may also be able to set your OCR or scanner software to auto-scan pages with some preset delay, like 5 seconds. This also speeds things up, because the scanner isn't waiting for you to hit the keyboard, and you have both hands free at all times to turn the page and replace the book. It takes a few pages to get into the rhythm; if you miss a page-turn, don't worry—you can get it on the next scan.

Using a reasonably modern but quite ordinary home/office type flatbed scanner, you should be able to scan 200 pages an hour [S.9] of a typical book, at good quality. 400 pages an hour is not unheard-of. Now, it may fairly be said that scanning offers all the fun of ironing, without the sense of adventure :-), but if you have got your settings right, you will probably be able to do the whole job in less than two hours. And now you're really on the road!
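The arithmetic behind that estimate is simple enough to sketch. The function name is invented for this example, and the 372-page book is an assumed length, not a figure for any particular title; only the 200 and 400 pages-an-hour rates come from the text above.

```python
def scanning_hours(pages, pages_per_hour=200):
    """Estimate the hours of scanning needed for a book of the given length."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return pages / pages_per_hour


# An assumed 372-page book at the two rates quoted above.
for rate in (200, 400):
    print(f"372 pages at {rate} pages/hour: {scanning_hours(372, rate):.2f} hours")
```

At 200 pages an hour this comes out just under two hours, and at 400 pages an hour just under one, not counting the first hour of experimenting with settings.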

V.2. What experience do I need to produce or proof a text?

None.

For producing, you will have to be able to type pretty well, or have a scanner.

For proofing someone else's text, when you don't have a copy of the book in front of you, you should be reasonably familiar with the language used in the book, and the styles of the time—Chaucer's English was quite different from ours, and even 19th Century novelists write some phrases unfamiliar to us today.

That's it. You don't need experience in publishing, editing, or computers.

V.3. How do I produce a text?

There are acres of words in this FAQ about that, but it all boils down to 4 simple steps:

1. Get an eligible book: pre-1923, or one of the exceptions. Pull it from your attic, borrow it from a library or a friend, buy it in your local bookstore, in a flea-market or on-line. We don't care which.
2. Send us a copy of the front and back of the title page so we can file proof of copyright clearance.
3. Copy the text from the book into a computer text file. We don't care whether you type it, scan it, voice-dictate it, or think of some totally new way to do it. Just get it into a file.
4. Send us the computer text file.

That's all there is to it!

V.4. Do I need any special equipment?

You need the use of a computer of some kind, and Internet access is usual, though we have had some volunteers contribute texts on floppy disks.

If you intend to scan books, you will need a scanner, but if you're just typing or proofing you won't.

V.5. Do I need to be able to program?

Absolutely not! Very little of Project Gutenberg's work involves programming, and it is never necessary to any part of volunteering.

V.6. I am a programmer, and I would like to help by programming. What can I do?

At the risk of sounding facetious, the very best thing you can do is figure out ways that more programming can help Project Gutenberg!

A lot of programmers work on PG books, and anything easy has probably already been done. The challenge for programmers who want to write something that will help to produce etexts is not in writing the code; it's in identifying ways that programs can help.

Please see the FAQ "What programs could I write to help with PG work?" [P.2] for some ideas in this direction. Whatever you do, don't just hang around waiting for someone to ask you to write something, because that's not going to happen. Think up a project, ask volunteers if they would use it, and dig in! Better still, produce a few etexts yourself, using the existing tools, and get a feel for the kinds of problems that new software could help with.

Apart from text production, we do develop some programs to help with posting work, but as of mid-2002, we have nothing like an ongoing programming project which people can join.

V.7. What does a Gutenberg volunteer actually do?

We buy or borrow eligible books, scan, type, and proofread. There are a few other activities, but they consume only a very small fraction of volunteer time.

V.8. Can I produce a book in my own language?

Yes! We want to encourage people to produce books in all languages, and we cheer when we can add a new language to the list.

V.9. Does it have to be a book? Can I produce pieces from a magazine or other periodical?

Magazines, newspapers, and other publications are just fine. For copyright clearance, they work just the same way as a book.

You do need to check the length of your piece [V.17]; we don't want a zillion separate one- or two-page files. If the piece you have in mind isn't long enough, you can add other pieces to it, or even most or all of the magazine. If the work was serialized over multiple issues, you can join them together for your PG text, but you do have to copyright clear every issue of the magazine from which you copy material.

If you have lots of old periodicals, you could even take one piece from several, and make a new text which is a "theme" anthology of those pieces. You can give it an appropriate title: "Civil War Commentaries from X magazine 1892-1898."

V.10. Do I have to produce in plain ASCII text?

Certainly not if it doesn't make sense. To take an extreme example, if you're working in Japanese or Arabic, or creating audio files, there is no point in trying to reproduce that in ASCII!

Where the text can largely be expressed in ASCII, we do want to post an ASCII version, even if it is somewhat degraded compared to the original. However, we will post your file in as many open formats as you want to create, so that your original work is available for those who have the software to read it.

V.11. Where do I sign up as a volunteer?

You don't. We have no formal sign-up process, no list of volunteers, no roll-call. If you produce a PG eBook, or help to produce one, you are a volunteer.

V.12. How do PG volunteers communicate, keep in touch, or co-ordinate work?

We are very scattered geographically: U.S., Australia, Brazil, Taiwan, Germany, South Africa, Italy, India, England, and all over the world, so we can't really meet for coffee on Thursdays. :-)

Most co-operation and co-ordination goes on by private e-mail. This is efficient for volunteers who have worked with each other before, since they know each other's interests and skills, but not so easy for beginners to break in on, since they don't.

The Volunteers' Web Board at is a publicly accessible forum for volunteers or potential volunteers to post any question or information about how to create a PG eBook.

There are a few Project Gutenberg mailing lists. Information about joining them is available on the main site, at .

The Project Gutenberg Weekly and Monthly Newsletters, gweekly and gmonthly, are one-way announcements, which allow PG to communicate with non-volunteers who are interested in the eBooks we produce, but they also contain notes and requests for assistance from volunteers.

The Volunteers' Discussion Mailing list, gutvol-d, is an e-mail discussion forum for subscribers about any Gutenberg topic.

The Volunteers' List, gutvol-l, is for private announcements for active volunteers.

The Programmers' List, gutvol-p, is for discussion of programming topics.

There are some other, specialized, closed lists for people who do specific work within PG:

The "Posted" List, posted, is for people who perform indexing on our texts. An e-mail is sent to this list every time we post a text (see the FAQ "How does a text get produced?" [V.16] section 5: Notification) and the members of the list use it to update their catalogs.

The Whitewashers' List, pgww, is for Posting Team internal messages.

The Heroic Helpers List, hhelpers, is for people who can devote some fairly regular time to doing odd jobs.

V.13. Where can I find a list of books that need proofing?

There is no central list of this kind. There are distributed proofing projects, currently at

Charles Franks: JC Byers: Dewayne Cushman:

where you can proof parts of a book. This is advisable when you're just starting out because it gives you some feel for what the work is like.

You can also look up existing, posted texts from the archives and proof them. Just as there always seems to be one more bug in any given program, there always seems to be one more typo in any given text! Download a few, and scan quickly for problems by doing a spellcheck or other automated check; if you can find any problems quickly, then there are likely others to be discovered by a careful proofing.
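One possible "other automated check" is sketched below. It is not a Project Gutenberg tool, the function name is invented for this example, and it only looks for two patterns that often turn out to be typos: a word repeated back to back, and digits mixed into a word. It will raise false alarms (a legitimate "that that", or "1st"), so treat its output as places to look, not as errors.

```python
import re


def quick_suspects(text):
    """Return (line_number, problem, snippet) tuples for likely typos."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A word repeated back to back, such as "the the".
        for match in re.finditer(r"\b([A-Za-z]+)\s+\1\b", line, flags=re.IGNORECASE):
            suspects.append((number, "repeated word", match.group(0)))
        # Letters and digits mixed in one token, such as "wi11" for "will".
        for match in re.finditer(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b", line):
            suspects.append((number, "digit inside word", match.group(0)))
    return suspects


sample = "He said the the end was near.\nI wi11 go to town."
for line_number, problem, snippet in quick_suspects(sample):
    print(line_number, problem, repr(snippet))
```

If a quick pass like this turns up several hits in the first few chapters, the text is probably worth a careful proofing.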

V.14. Is there a list of books that Project Gutenberg wants?

No. Project Gutenberg, as such, does not "want" any specific books. Individual volunteers choose what books to produce. Nobody gives orders to volunteers about what they should work on. Nobody has an official "hit-list" of books to add to the archives.

Of course, individual volunteers and non-volunteers have their preferences, and may suggest books to transcribe, and such suggested lists pop up every so often, and are often useful to people looking for ideas.

There are usually some suggestions in David Price's InProgress list. The On-Line Books Page has a section where people can list requests, and Steve Harris has a site devoted to lists of books not yet in Gutenberg or elsewhere. Treat all of these lists with some caution, since someone may have started or even finished one of their suggestions since they were last updated.

PG Books In Progress
On-Line Requested List
Steve Harris' "To-do"s

V.15. I have one book I'd like to contribute. Can I do just that without signing up?

Well, since there is no formal sign-up, of course you can! A lot of texts have been contributed by people who just wanted to immortalize one favorite book. Many of them had already created the eBook before they even heard of Project Gutenberg, and we're always delighted to add these to the archive!

About production:

V.16. How does a text get produced?

As stated back in the Basics section, all you need to do is:

Borrow or buy an eligible book.
Send us a copy of the front and back of the title page.
Turn the book into electronic text.
Send it to us.

That's all you actually need to know in order to be a producer. But if you're interested in the details of how other people actually do this, and want to know what else happens behind the scenes, here's a full, blow-by-blow account.

1. Finding an eligible book

Volunteers find eligible books [V.18] in all sorts of ways. Some lucky people have them in their bookshelves, or their attic. A lot of people have a good library nearby, where they can find books, or request them on interlibrary loan. Some people are big eBay fans; others like to hunt for bargains on specialist booksites. And of course lots of volunteers enjoy rummaging through actual used bookstores, or local markets, or yard sales.

Even if you're not going to take on a book yourself right now, search for some on the Net and find out about how to get a copy. Next time you pass an antiquarian bookstore, or a book market, drop in and browse. Ask your local library about interlibrary loans. Eligible books aren't hard to find once you know where to look.

2. Copyright Clearance

New volunteers sometimes find it hard to understand why this is so important, and why, in particular, Project Gutenberg is so careful about it. At base, it's simple: by keeping a filed copy of the TP&V [V.25] of every book we produce, we can at any time protect our publications against claims from publishers that they "own" the work, and thus we can keep them available to the public.

The copyright laws can be difficult to understand, and sometimes it may take serious research to prove that a particular edition is actually in the public domain. If you're not legally-inclined, just keep repeating "Pre-'23 is free" if you're in the U.S.A. and stick to books published before 1923. If you do want to delve deeper, read our Copyright Rules page at and then go on to reading the Library of Congress Copyright Office official papers at . If you're in another country, find out about your own copyright laws.

Volunteers send in the TP&V from the book for us to inspect. This not only gives us the proof to file, it also lets us know that someone is really working on the text so that we can list it as being In Progress for the information of others who might be interested.

3. Scanning, typing, proofing and editing

This makes up the bulk of PG's effort, and is discussed at great length elsewhere in this FAQ. There are many, many ways to create an etext from a paper book, and different people use different methods, but it all boils down to making a text file. For a typical book, it will probably take 40 hours of a volunteer's time. All that happens here is that somebody makes the effort to transcribe one paper book into a file that can be shared around the world and for all time.

4. Posting

[Note: this information is quite specific to the process we go through now. It is quite likely to change as we improve the automation of the tasks.]

Posting is done by the Posting Team. The basic job is to receive the text from the producer, check that it has been copyright cleared, check that it conforms to Project Gutenberg standards, check it for correctness (which can be anything from XML validity to simple spelling), add the Project Gutenberg header and copy the text to the two PG servers.

In a simple case, where everything goes right, this can take as little as fifteen minutes. In a complicated case, where we have to convert formats, or there are a lot of errors in the text, or there are problems with the copyright clearance, it can take hours or even days while we wait for responses, or do a lot of editing, or find conversion tools.

Michael Hart used to do this work entirely alone, but in September 2001, he created the Posting Team to handle the load. (The Posting Team are nicknamed the "Whitewashers" in honor of Tom Sawyer's victims. :-)

Transferring the file

You send the text to us [V.46] either by Web, by FTP (with a username and password that any of the Posting Team can give you privately), or by e-mail.

If you're FTPing, you should e-mail one or more of us as well, to let us know what you've uploaded.

One problem is files that don't transfer correctly. Especially by e-mail, some files get damaged on the way. It's better to ZIP the file before sending, if possible, to prevent some common problems with text files. The use of compression formats other than Zip can also create problems. Members of the Posting Team work on multiple platforms—DOS, Windows, Linux, Solaris—and zipping and unzipping programs are commonly available for all of these. Other compression methods, like Stuffit or bzip2, are not so readily available, and may give us trouble.
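Any ordinary zip program will do for this. For those who prefer a script, here is a minimal sketch using Python's standard zipfile module; the function name and the file name in the usage note are placeholders invented for this example.

```python
import zipfile
from pathlib import Path


def zip_for_sending(text_path):
    """Put one text file into a .zip next to it and return the zip's path."""
    text_path = Path(text_path)
    zip_path = text_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the bare filename, not the directories leading to it.
        archive.write(text_path, arcname=text_path.name)
    # Re-open the archive and verify its checksums before sending.
    with zipfile.ZipFile(zip_path) as archive:
        bad_member = archive.testzip()
        if bad_member is not None:
            raise RuntimeError(f"corrupt member in archive: {bad_member}")
    return zip_path
```

Calling `zip_for_sending("mybook.txt")` would leave a `mybook.zip` beside the text file. Because the archive stores the file byte for byte, line endings and 8-bit characters arrive exactly as you sent them.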

We log in via ssh to beryl, the Unix system on which we work when posting (the same one that you FTPed the file to), unzip the file, and glance at the top of it.

Checking Clearance.

We then check it for copyright clearance. The one and only absolute rule that we NEVER bend, no matter what, is that we WILL NOT post a file that doesn't have a clearance. If it ain't in the clearance files, it don't get posted.

Most regulars know that they should include their clearance line in the e-mail submitting the text, but not everybody does, and not everybody remembers every time. This can be frustrating, when clearance is not included and not obvious.

When Michael gives you your clearance on a book, he sends you back an e-mail that has just one line, something like this:

The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok

He saves these lines in files that we posters can access. We regard this information as private, so we don't publish the details of who has cleared what.

When we get the text, we check whether the submitter has cleared it. If there is a clearance line in the e-mail notifying us about the text, there's no problem. If we can find the title of the text under the submitter's name in the clearance files, there's no problem. Unfortunately, sometimes we can't find it. There are two usual reasons: either the text submitted is part of the work cleared (for example, submitting one play from a collection), or the text hasn't been cleared yet. If the clearance isn't straightforward, we can go back and forth and round and round in e-mails for a while.

This is why it's a good idea to paste the clearance line into your e-mail.

If the title of the text you're sending isn't the same as the title of the text cleared, BE SURE to paste in the clearance line AND explain that the text you're sending is PART of the cleared book. Please also list the titles of the other parts; it really does cause confusion and delay when this is not clear.
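A producer could check a draft e-mail for a clearance line before sending it. The sketch below guesses at the layout from the single sample line shown earlier (free text, then a date written MM/DD/YY, then a one-word status); that layout is an assumption, not a documented format, and the names are invented for this example.

```python
import re

# Assumed shape, inferred from one sample line only: details, date, status.
CLEARANCE_TAIL = re.compile(
    r"^(?P<details>.+?)\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<status>\w+)\s*$"
)


def find_clearance_lines(email_body):
    """Return a dict for every line of the e-mail that looks like a clearance line."""
    found = []
    for line in email_body.splitlines():
        match = CLEARANCE_TAIL.match(line.strip())
        if match:
            found.append(match.groupdict())
    return found


draft = """Here is my text, zipped and attached.

The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok
"""
for entry in find_clearance_lines(draft):
    print(entry["date"], entry["status"], "-", entry["details"])
```

An empty result is a reminder to go and find the clearance e-mail and paste its line in before sending.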

Checking and Editing

Sometimes, people send in a book in a non-text format like Word Perfect or Microsoft Word, or send a text with unwrapped lines. In that case, we try to get the submitter to fix them, but if they can't, we have to convert the file to straight text before starting.

Some producers, particularly inexperienced ones, want to add non-standard annotations and mark-up and symbols to the text. This can get ticklish; we don't want to discourage them, but we need to keep texts reasonably standard. Usually, we can work something out. Maybe the book should be added in both text and HTML, for example.

Assuming that it's a plain text file, we next run gutcheck and a quick spellcheck on the file. This will tell immediately if it adheres to PG standards and if there is any serious problem with it.

If the file looks clean, we may skim it, looking for potential problems or formatting issues. For clean texts, the only things we usually need to change are unindented quotations or inconsistent chapter headings (a lot of people seem to mix "CHAPTER III" with "Chapter 14" and have irregular numbers of blank lines) or spacing and a few 8-bit characters. Occasionally, we have to rewrap a text. We also look out for included publishers' trademarks, which we normally prefer to remove (trademarks are NOT subject to copyright expiration: Macmillan(TM), the publishing house, is still around and trading), unnecessary or downright odd indentation or centering, stray page numbers, and prefaces or introductions or appendices that may not be in the public domain. If the file has lots of 8-bit characters, we probably need to make a separate 7-bit version, and post both.

If the gutcheck and spellcheck don't look clean, or if conversion is required, we may spend a lot more than 15 minutes on it. In a bad case, we may have to get the file re-proofed.
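To give a feel for the kind of thing these checks look at, here is a small sketch that reports over-long lines, lines with 8-bit characters, and the chapter heading styles in use. It is not gutcheck and does not reproduce its rules; the 80-character limit, the names, and the heading patterns (uppercase Roman or Arabic numbers only) are assumptions made for this example.

```python
import re

MAX_LINE_LENGTH = 80  # assumed limit for this sketch
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+(\S+)", re.IGNORECASE)


def heading_style(line):
    """Classify a chapter heading, e.g. 'upper/roman' for 'CHAPTER III'."""
    match = CHAPTER_HEADING.match(line)
    if not match:
        return None
    case = "upper" if line.split()[0].isupper() else "mixed"
    number = match.group(1).rstrip(".:")
    if number.isdigit():
        return f"{case}/arabic"
    if re.fullmatch(r"[IVXLCDM]+", number):
        return f"{case}/roman"
    return None  # "chapter of my life" and the like are not headings


def format_report(text):
    """Collect long lines, 8-bit lines, and the chapter heading styles used."""
    report = {"long_lines": [], "eight_bit_lines": [], "heading_styles": set()}
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            report["long_lines"].append(number)
        if any(ord(ch) > 127 for ch in line):
            report["eight_bit_lines"].append(number)
        style = heading_style(line)
        if style:
            report["heading_styles"].add(style)
    return report
```

More than one entry in `heading_styles` suggests the "CHAPTER III" and "Chapter 14" mixture described above, and any entry in `eight_bit_lines` suggests that a separate 7-bit version may be needed.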

If you are conscious that you're doing something non-standard, and really mean it to stay, say so in your e-mail. (For example, I recently posted a text containing a family-tree representation that had lines over 80 characters. Now, I would have left that one alone anyway, but it helped that the submitter drew my attention to it in the e-mail.) If it's too non-standard, the poster may not allow it to stay, but at least you can discuss it. When a text needs a lot of non-standard formatting or markup, you really need to ask yourself whether you shouldn't be submitting it in HTML, with all the bells and whistles, and settle for something more normal in the text variant.

Mostly, errors are obvious, and there are at least some obvious errors in most texts. When errors are completely obvious, we just fix them without feedback to the producer unless you have specifically asked for feedback in your e-mail.

We're getting more HTML formats now, which is great, but incoming HTML often needs a lot of work, because people who are not experienced with HTML often make mistakes. The W3C is the official standard for valid HTML, but, for the average volunteer, it's awkward to use. However, if you're submitting an HTML format, please use Tidy, which you can get from , to check your text before sending it.

Header and Footer

We add the PG header and footer. If there is a header and footer already there, we strip them off first, since recent changes in the header mean that a lot of people send files with headers that are out of date. We have written programs to help with this.

We get the number for the text from a program on beryl called "ticket" that Brett Fishburne wrote, that dispenses the next number. That way, if two or three of us are posting at the same time, we won't all grab the same number. We create a 5-letter base filename, checking that it hasn't been used before, and finally zip up the file.
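The idea behind a number dispenser can be sketched in a few lines. This is not the "ticket" program itself, whose workings are not described here; it is an illustration inside a single Python process, with invented names, of how a lock keeps two posters from being handed the same number.

```python
import threading


class TicketDispenser:
    """Hand out consecutive numbers, never the same one twice."""

    def __init__(self, last_issued):
        self._last_issued = last_issued
        self._lock = threading.Lock()

    def next_number(self):
        # The lock makes "read, add one, store" a single indivisible step,
        # so two callers arriving together still get different numbers.
        with self._lock:
            self._last_issued += 1
            return self._last_issued


# Illustrative starting point only.
dispenser = TicketDispenser(last_issued=5300)
print(dispenser.next_number())  # 5301
print(dispenser.next_number())  # 5302
```

A dispenser shared between separate logins on one machine would need the same guarantee across processes, for instance by locking a counter file, but the principle is the same.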

Posting

We now transfer the .ZIP and .TXT files to two servers: ftp.ibiblio.org and ftp.archive.org. (This is usually the point at which we realize that we forgot to make a change we noticed while checking. Aaaargh!)

5. Notification

At this point, the book is posted, but nobody knows about it! We need to do something about that. . . .

We compose an e-mail to the "posted" e-mail list, cc: the producer, with the line that is to go into GUTINDEX.ALL, the master list of PG files.

The "posted" list has only a few subscribers. These are the people who index and create links to PG texts, and include both PG volunteers and the maintainers of other sites that link to PG texts.

They also commonly download the texts to get more information for their indexes, and tell us if there is anything wrong with the files.

This e-mail is simply the official notification to all these people and the producer that the file has been posted. Here's a sample of such an e-mail:

To: "Posted Etexts for Project Gutenberg"
Subject: [posted] Posted (#5301, Duncan) !
From: "Jim Tinsley"
Date: Tue, 25 Jun 2002 06:21:27 -0400 (EDT)
Cc: you@example.com

Mar 2004 The Imperialist, by Sara Jeannette Duncan [SJD#4][mprlsxxx.xxx]5301

There may also be some remarks, if the text is in any way non-standard, or if files other than plain text were posted with it.

From this e-mail, you can, if you want to see any corrections made, immediately download the posted file and compare it to your version. Since the notification is made after the file has been copied to the servers, it should be there waiting for you.

To find out how to download a book that has just been posted, see the FAQ "How can I download a PG text that hasn't been cataloged yet?" [R.3]

6. Indexing

From the "posted" list, the posting line is added to GUTINDEX.ALL and our indexers begin the cataloging process for the website, which is much more thorough. This includes work like finding authors' dates of birth and death, getting the Library of Congress classification, and the other information that makes up the website's searchable index. That process takes extra time, which is why the website's searchable catalog must always lag behind the actual titles posted.

7. Corrections

It's remarkable how many people who went over and over the text to the point of hating it suddenly see problems with it when they download it a couple of days after it's posted! Something psychological there, I expect. Anyhow, if you do download your text and see problems with it, don't worry, just e-mail whoever posted it, or any other member of the Posting Team. No, you're not stupid, or if you are, you're in good company, because we've all done it! There's no big deal about replacing the posted file with a corrected copy immediately.

Over time, other readers may submit corrections. If you find an error in a PG etext, see the FAQ "I've found some obvious typos in a Project Gutenberg text. How should I report them?" [R.26]

When the corrections are small, as most are, we will just make the change to the existing text. If there are a lot of changes, we may post a new edition [R.35] with a new edition number; e.g. if the file abcde10 was corrected, we may post abcde11. We never make a new edition when we get corrections immediately after posting.
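
As a small illustration of the naming in that example, here is a sketch that bumps the edition number. It assumes, from the abcde10 example and the 5-letter base filename mentioned earlier, that a name is a five-character base followed by a two-digit edition; the function name is made up for this sketch.

```python
import re

def next_edition(name):
    """Turn a name such as 'abcde10' into 'abcde11'."""
    # Assumed layout: five-character base, then a two-digit edition number.
    match = re.fullmatch(r"(\w{5})(\d{2})", name)
    if match is None:
        raise ValueError(f"not a base-plus-edition name: {name!r}")
    base, edition = match.group(1), int(match.group(2))
    return f"{base}{edition + 1:02d}"

print(next_edition("abcde10"))  # prints abcde11
```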

V.17. How long must a text be to qualify for PG?

The rule of thumb is that we try not to post texts shorter than 25K, or about 350 lines of 70 characters. This rules out, for example, a lot of individual short poems. If you are interested in contributing this type of material, consider making a collection of similar texts—poems by the same author, or magazine articles on the same subject. We have made a few exceptions, like Martin Luther King's "I have a dream" speech, but very few.
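
If you want a quick check before you start, something like the following will do. It is only a sketch: it reads "25K" as 25,000 bytes, which is an assumption, since the rule of thumb gives no exact figure, and the function name is invented for the example.

```python
MIN_BYTES = 25_000  # assumed reading of "25K"; the FAQ gives no exact figure

def meets_length_guideline(text, minimum=MIN_BYTES):
    """Return True if the text is at least `minimum` bytes long."""
    return len(text.encode("utf-8")) >= minimum

short_poem = ("x" * 70 + "\n") * 100   # 100 lines of 70 characters
collection = ("x" * 70 + "\n") * 400   # 400 lines of 70 characters
print(meets_length_guideline(short_poem))   # prints False
print(meets_length_guideline(collection))   # prints True
```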

V.18. What books are eligible?

A book is "eligible" for posting if we can legally publish it. This is the case if:

1. it is in the public domain in the U.S.A., OR,
2. the copyright holder has granted unlimited non-exclusive distribution rights to PG.

V.19. Are reprints or facsimiles eligible?

A reprint or facsimile of a book that would be eligible is itself eligible.

For example, if a book published in 1995 is a reprint of a book published in 1900, then it is eligible. However, the onus is on us to prove that it is a reprint, and if it doesn't say on the TP&V that it is a reprint, confirming its eligibility may be impractical.

V.20. What is the difference between a reprint and a facsimile?

A facsimile retains the page layout and formatting of the original. A reprint keeps the same words, but may lay the pages out differently. For our copyright purposes, there is no difference—we can use either.

V.21. What is the difference between a reprint and a "new edition"?

A reprint contains only the words and pictures that were printed in the original. A new edition is in some way changed; it has different text, or pictures. It may be abridged, or expanded. It may have material added or changed, using other versions of the book.

A new edition gets a new copyright, and has to be cleared based on its own copyright date and status, not the date of the original printing of the title. See also the FAQ "How come my paper book of Shakespeare says it's 'Copyright 1988'?" [C.16] for an example.

Please note that we are talking here about a new edition of the printed book, not a new (corrected) edition number for Project Gutenberg naming purposes.

V.22. What book should I work on?

Nobody in Gutenberg is going to set assignments for you. You decide what book to process. Just pick one that no-one else has already done, or is working on. It's also sensible to pick one that you'll like—you'll be living with it for a while. On a practical note, it's probably better to start with a short book or even a short story, since a long book can take quite a while to produce.

Start by thinking of books written before 1923. Pick a book you like, and check it out. If it's already done or still in copyright, try other books by the same author.

Visit the Project Gutenberg site and download a full list of Gutenberg books in GUTINDEX.ALL. Have a look at the List of Books In Progress and Complete [B.1]. Look for authors you like, and see what books by them aren't yet available.

Check out your old books. Maybe you have an eligible edition that would be of great help to the project.

Try your library. They may have some eligible editions—books we can prove to be in the public domain—and you will certainly come away with ideas. Ask your librarian. Librarians are keen to help on projects like this.

Browse second-hand bookshops in your area. There are lots of treasures to be picked up very cheaply.

Search for literature pages and bookshops on the Internet.

If all else fails, you can always ask on the Volunteers' Board or try the gutvol-d mailing list [V.12] for ideas. Others may know of books that people are especially looking for, or projects already started where you could help out.

V.23. I have a book in mind, but I don't have an eligible copy.

First, determine whether there are any eligible copies of the book, by finding out the date it was published, possibly from the Catalog of the Library of Congress [B.5] and checking the Public Domain and Copyright Rules [B.1]. If there is a public domain edition, the next problem is to find one to work with.

V.24. Where can I find an eligible book?

The most commonly used outlets are used bookstores, garage sales, library sales, charity shops and any other place that sells old books.

The Internet is a wonderful medium for finding used and antiquarian books—used bookstores all over the world have found ways of co-operating and listing their inventories on the Net, so that whether you live in Los Angeles, Moscow or Perth, you can still find that book you're looking for in a shop in a laneway of Amsterdam. Most on-line listings will quote the publication year of the book, so you can check that it's pre-1923.

Two such sites that allow second-hand booksellers to list their inventory are:

Advanced Book Exchange

Alibris

The book search page at trussel.com [B.5] has a list of many such Net bookshops, or you can simply visit any search engine and search for Used or Antiquarian Bookshops. You can often buy eligible books through these sites very cheaply.

If you still can't find the book you need, post a message on the Volunteers' Board or to the gutvol-d mailing list; maybe someone else can find it for you.

Sometimes, it may be possible for you to work from a later edition, so long as somebody who has an eligible edition can check it to make sure that no changes have been made. Sometimes, you may be able to find a modern reprint; reprints may be eligible, as long as they say they are reprints of an edition that would be eligible.

If you can type, or can scan without damaging the book, you can borrow books long enough to produce them. Even if your local library doesn't have the books you want, they may well be able to get them for you on inter-library loan. Ask your librarian about it.

V.25. What is "TP&V"?

This is an abbreviation for "Title Page and Verso", and means a paper or image copy of the front and back of the title page.

Even if the back is blank, we need to have an image of it for the files, to show that it is blank, so that if, in ten years' time, somebody queries our right to publish, we can show that we haven't just lost it.

Publishers print copyright information, like title, author, copyright year and owner, and whether the book was a reprint, on the TP&V, and by filing this, we can prove that the book we produced was in the public domain.

Sending us the TP&V is the One True Way to getting PG copyright clearance [V.37].

V.26. What is "Posting"?

Posting is the final stage in the production process, where the file is given a number and official PG header, and copied onto our FTP servers for distribution. See section 4 of the FAQ "How does a text get produced?" [V.16] for a blow-by-blow account.

V.27. I think I've found an eligible book that I'd like to work on. What do I do next?

Make sure nobody else is working on it, and that it's not already online somewhere.

V.28. What books are currently being worked on?

Check out David Price's In Progress List (a.k.a. "the InProg List") online at . David gets the information from Copyright Clearances that have been done, and organizes it into a list. It can never be 100% up to date, since clearances come in all the time, but it's the best online facility we have, and it's much more clearly presented than the original clearance files.

V.29. How do I find out if my book is already on-line somewhere?

There's no foolproof method; some student somewhere could have scanned it and put it on her college web page without announcing it anywhere. However, there are some regular places to check.

It may sound obvious, but you should always look in the PG archives first. Download GUTINDEX.ALL and keep it handy. Search the InProg List [B.1].

The two other main places to search for your book are the Internet Public Library and the On-Line Books Page. These projects specialize in indexing books that people make available on-line.

If you still don't see your book on-line anywhere, hit your favorite search engine, and give it the title, author's last name, and preferably a few uncommon words from the first page of the book. Sometimes one of those solo efforts shows up in a general search.

V.30. My book is not on the In-Progress list, and I can't find it on-line. Is it safe to go ahead and buy it?

Probably. It could have been cleared, but not included in the InProg list yet. If the amount of money to buy it is a consideration, you can e-mail any of the members of the Posting Team, and ask them to check the latest clearances for you. Even this isn't foolproof; another volunteer could be placing their order at the same time you're placing yours. Such duplications do happen, but they are very rare.

V.31. My book is on-line, but not in Project Gutenberg. What should I do?

If the on-line file is from the same edition as the one you have (e.g. not a different translation) then you may be able to submit that file, perhaps slightly edited, to Gutenberg using the clearance from your paper copy. See "I've found an eligible text elsewhere on the Net, but it's not in the PG archives. Can I just submit it to PG?" [V.62] for how to do that.

And of course, you can always still make your own version for PG. It's surprising how often even very similar paper editions have small differences that can be interesting or significant.

V.32. My book is already on-line in Project Gutenberg, but my printed book is different from the version already archived. Can I add my version?

Yes! In fact, assuming that the version already there is in the public domain, you can piggyback on the work already done by what is called "comparative retyping". For example, let's say that you have a later edition than the existing file; you can just take the existing file, edit it to match your paper version, and submit it as a new file. Of course, you must have Copyright Cleared [V.37] your paper version as well.

V.33. I see a book that was being worked on three years ago. Is anyone still working on it?

Maybe, maybe not. Some people abandon books, some people who are regular producers clear them and put them at the bottom of the pile, perhaps for years (though they will get round to them sometime), and some people just simply take two or three years to produce a book.

Once, we put names and contact details on the public InProg list, but for privacy and spam-prevention reasons, we've taken them off. However, the Posting Team have access to the master list of cleared files, and will send a message on your behalf to the person who originally cleared the book, asking if the project is still active, or if the producer wants help.

So if you really want to check this situation out, e-mail one of the Posting Team.

V.34. I've decided which book to produce. How do I tell PG I'm working on it?

As soon as you get Copyright Clearance [V.37], your book is entered in the "cleared" files. David Price will take these, and add your entry in his next release of the In Progress List.

V.35. I have a two- or three-volume set. Should I submit them as one text, or one text for each volume?

Both.

Quite a lot of 18th and 19th Century books, even straightforward novels, were published as multipart sets. When you have such a set, you should usually submit one text for each volume, and a "complete" text with the contents of all volumes together.

People who do this often complete and submit one volume at a time, until they've finished, and then contribute the "complete" file.

V.36. I have one physical book, with multiple works in it (like a collection of plays). Should I submit each text separately?

If the works are clearly separate, stand-alone texts, and are long enough [V.17] to warrant inclusion on their own in the archives, then yes, you should, and you may also submit a "complete" version as well, if it seems appropriate. This most commonly happens in a collection of plays, though essays and other works may also fit the criteria. Collections of poetry rarely do, since most poems are too short to submit as stand-alone texts.

Sometimes the book includes a preface or introduction or glossary covering all the works in it. In this case, you can decide whether to include these with each of the parts, or save them for the "complete" version.

V.37. How do I get copyright clearance?

Basically we need to see images of the front and back of the title page of the book, which is where copyright information is usually shown. This is called "TP&V", for "Title Page and Verso" [V.25].

To Submit Online:

As of late 2002, we have a new automated upload procedure using a web page. This is by far the fastest and easiest way to get clearance. You need scanned images (PNG, JPEG, TIFF or GIF) of the two pages, at a resolution good enough that the text can be read clearly, though the files don't need to be huge.

Just go to and follow the instructions.

There are two other, older ways to submit a text for clearance.

To submit by paper mail, photocopy the front and back of the title page, even if the back is blank, write your e-mail address on it, and send the photocopies to:

This is called Title Page & Verso, or TP&V for short, and is needed for copyright research. A colored envelope is best, to make sure your letter is easily recognized as TP&V.

E-mail Michael (hart@pobox.com) when you send them, so he knows they're on the way. It's a good idea to check back with him by e-mail after a week or so if you haven't heard from him.

About this, Michael says: "Please include always your e-mail name and address, and mark the envelope with some distinctive mark and or color. Colored envelopes fine. Just something so I can find it easily, the mail here is slow and deep, like snow. Please send a note to: for more info."

To submit by e-mail, scan the front and back of the title page, even if the back is blank, and e-mail the images to Greg Newby as TIFF, JPEG or GIF in medium resolution. Make sure that the print is legible before you send.

Whichever method you use, you should expect to get an e-mail back after about a week, with one line containing the Author, Title, your name and date with the word "OK" at the end. This means that your text has been cleared.

A Clearance Line looks something like:

The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok

If you don't get any response, e-mail to check that your TP&V was received OK. If the word at the end of the line is not "OK", then your text is not eligible, and a comment will probably be appended explaining why it is not eligible.
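
If you keep your clearance lines in a file, a check like this one can tell you which books are cleared. It is a sketch under stated assumptions: it relies on the date being in the MM/DD/YY form shown in the sample line and on the status word coming straight after the date; the layout of a rejected line (status word followed by a comment) is a guess, and the function names are invented here.

```python
import re

# A date such as 06/14/01, followed by the status word.
_DATE_AND_STATUS = re.compile(r"(\d{2}/\d{2}/\d{2})\s+(\S+)")

def clearance_status(line):
    """Return (date, status_word) from a clearance line, or None if no date is found."""
    match = _DATE_AND_STATUS.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

def is_cleared(line):
    """True only when the word after the date is 'ok' (in any letter case)."""
    status = clearance_status(line)
    return status is not None and status[1].lower() == "ok"

sample = "The Works Of Homer [Iliad/Odyssey] Tr. George Chapman Jim Tinsley 06/14/01 ok"
print(is_cleared(sample))  # prints True
```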

Don't start work on your book until you get that OK! It's very sickening to do all that work, and then find out that your text can't legally be put on-line!

V.38. I have a two- or three-volume set. Do I have to get a separate clearance on each physical book?

Yes.

Some multi-volume works, notably reference books and translations, were published in a series, and it may be that the first volume is 1922, but the others are 1923 or later, so we have to clear each individually.

V.39. I have one physical book, with multiple works in it (like a collection of plays). Do I have to get a separate clearance for each work?

No. Since they were all printed together, one TP&V will suffice for all, but . . .

You should list each separate title included, if you intend to submit each title separately (see the FAQ "I have one physical book, with multiple works in it like a collection of plays. Should I submit each work separately?" [V.36]). If, say, you clear a "Collected Plays of Sheridan", and later submit an eBook of "The School for Scandal", we will have trouble finding your clearance unless we have made a note that "School for Scandal" is part of the contents of "Collected Plays".

In a case like this, you should include, on your paper or e-mail, something like:

George Bernard Shaw. Plays Unpleasant. 1905.Contents:Preface to Unpleasant PlaysWidower's HousesThe PhilandererMrs. Warren's Profession

You only need to do this when you are going to submit each part separately, which is commonly the case with plays, and sometimes essays, stories and novellas. Taking a different example, the "Collected Poems of Emily Dickinson", we would not need to list the contents, since we wouldn't publish each poem separately.

There is one exceptional case: if your book was printed after 1923, but contains stories or plays some of which are stated to be reprints of pre-1923 editions, you should give as much detail as possible about what you intend to submit.

V.40. Who will check up on my progress? When?

Nobody. There are no schedules or timetables. You're welcome to contact other volunteers [V.12] with comments or questions, though.

V.41. How long should it take me to complete a book?

Most books get done in between one and three months, but this varies wildly. It depends on the amount of time you can afford to give it, the length of the book and, if you're not typing, the quality of the scan—if the book scans badly, you need to put more time into proofing.

Some very productive volunteers manage to turn out an e-text a week; some books can take a year or more.

Scanning itself doesn't take too long. Even if it takes you as much as two minutes per page to scan, you will still complete a 300 page book in 10 hours, and you will probably be scanning much faster than that [S.9]. The problem is that the text generated by the scanner and your OCR package is usually faulty. There are many cute scanner errors, mistaking b for h, or e for c, so that "heard" is scanned as "beard" or "ear" as "car". Makes the story more interesting sometimes!
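
Errors like these are hard to spot because the wrong word is still a real word. One way to hunt for them is sketched below: swap the letters that scanners commonly confuse and see whether the result is another word you know. This is only an illustration, not how any particular checking tool works; the two letter pairs come from the examples above, and the vocabulary is whatever word list you supply.

```python
# Letter pairs that OCR commonly confuses, taken from the examples above.
CONFUSABLE = {"b": "h", "h": "b", "e": "c", "c": "e"}

def possible_scannos(word, vocabulary):
    """Return vocabulary words reachable from `word` by swapping one confusable letter."""
    found = set()
    for index, letter in enumerate(word):
        swap = CONFUSABLE.get(letter)
        if swap is None:
            continue
        candidate = word[:index] + swap + word[index + 1:]
        if candidate in vocabulary:
            found.add(candidate)
    return sorted(found)

words = {"heard", "beard", "ear", "car"}
print(possible_scannos("beard", words))  # prints ['heard']
print(possible_scannos("car", words))    # prints ['ear']
```

A word that comes back with alternatives is not necessarily wrong; it is just worth a second look in context.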

So now you need to do a first proof of the e-text. Read it carefully, correct scanning mistakes, and make sure that you haven't left out pages or got them in the wrong order. Unless your scan was exceptionally good, this is the time-burner in the process.

When you've done the first proof, you can either do a second proof yourself, or send it to another volunteer for second proofing.

If you're a typist, of course, you can skip right over the messy scanning and scan-correction process. Yay typists!!

V.42. I want/don't want my name published on my e-text

No problem. When you send the e-text for posting, mention exactly what, if anything, you want the Credits Line [V.47] to say.

V.43. I'd like to put a copy of my finished e-text, or another Gutenberg text, on my own web page.

Great! PG encourages the widest possible distribution of e-texts. We like to publish everything in plain text, which is the most accessible format, since everybody can read plain text. But once it's available in plain text, it's open to you or anyone else to convert it to other formats like HTML for further distribution.

If you are reposting a text, though, please be careful to check that your posting complies with the conditions spelled out in the header, especially for copyrighted works.

V.44. I've scanned, edited and proofed my text. How do I find someone to second-proof it?

You can post a request on the Volunteers' Board, or on the gutvol-d Mailing List. You will probably get some offers there. In a difficult case, you might ask Michael Hart to add it to the "Requests for Assistance" section of the next Newsletter.

In general, the best way to handle it is to make a co-operative proofing project out of it. This is like a miniature version of the distributed proofreading sites, without the page images.

There are always people looking for proofing work, but many beginners take on more than they can handle, and don't finish the job, and this can be very disappointing if you give the whole thing to one volunteer who then vanishes without trace. You can minimize the risk of this by splitting the book into chunks of about 20-30 pages, or one chapter if that's around the right size, each. Write explicit instructions about what you want them to do when they spot a suspected error, like fix it or mark it with an asterisk. (Marking is probably safer with beginners who don't have the book or an image of the page to refer to.) Give the first chapter to the first person who responds, the second to the second, and so on. As you hand out the chapters, let the proofers know that if they're not returned within three or five days, you'll assume they've quit. Three days is more than plenty of time for 20 pages. If someone returns a chapter, you can give them another. If someone doesn't get back to you within the time set, assume they're not going to, and recycle that chapter to someone else. No hard feelings, no problem. This process of "co-operative proofing" ensures that beginning proofers don't choke on the work, and that one vanishing volunteer doesn't hold up the whole project.
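
The bookkeeping for this can be sketched in a few lines. The chunk size and the time limit below are taken from the figures above; everything else (the function names, tracking hand-outs in a dictionary) is just one possible way to do it.

```python
from datetime import date, timedelta

def split_into_chunks(total_pages, chunk_size=25):
    """Return (first_page, last_page) ranges covering pages 1..total_pages."""
    return [(start, min(start + chunk_size - 1, total_pages))
            for start in range(1, total_pages + 1, chunk_size)]

def overdue_chunks(handed_out, today, limit_days=5):
    """handed_out maps each chunk to the date it was given out; return chunks to recycle."""
    limit = timedelta(days=limit_days)
    return [chunk for chunk, when in handed_out.items() if today - when > limit]

chunks = split_into_chunks(60)
print(chunks)  # prints [(1, 25), (26, 50), (51, 60)]

handed_out = {(1, 25): date(2002, 6, 1), (26, 50): date(2002, 6, 8)}
print(overdue_chunks(handed_out, today=date(2002, 6, 10)))  # prints [(1, 25)]
```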

V.45. I've gone over and over my text. I can't find any more errors, and I'm sick of looking at it. What should I do now?

We all know that feeling! Particularly with your first book, you've probably gone through a patch when you thought you'd never finish—and when you do, you can't stand the idea of looking at it again. Heh. Cheer up—the first twenty texts are the worst! :-) And you'll feel a lot better when you see your text available for everyone to read.

You have three choices:

You can send it for posting as it is. [V.46]

You can put it aside for a week or so, and come back to it with fresh eyes.

You can ask in any of the standard ways [V.12] for someone else to second-proof it for you. This has a lot to recommend it; it gets other sets of eyes looking at the text, it relieves the pressure that you may feel, it may rekindle your enthusiasm for the text, it allows you to "meet" other volunteers, and possibly form partnerships for future PG collaboration. Above all, it gives new proofers a chance to get their feet wet, and this is good for them, and good for PG. You are not only contributing a text, you're helping to train and encourage the next generation of producers.

V.46. Where and how can I send my text for posting?

As of late 2002, we have a new automated upload procedure using a web page. This has a lot of good things going for it, because we keep a record of what's uploaded, you get an e-mailed copy of the notification, you don't have to fiddle with FTP, and we can make up the header automatically from the information you enter, which saves time and prevents keying errors.

As always, it's better to ZIP your file first, because it'll take less time to transfer.
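
Any ZIP program will do. If you'd rather script it, here is a minimal sketch using Python's standard zipfile module; the function name and the choice to put the archive next to the original file are assumptions made for the example.

```python
import os
import zipfile

def zip_for_upload(text_path):
    """Create a .zip next to text_path containing just that file; return the zip's path."""
    zip_path = os.path.splitext(text_path)[0] + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part, so the archive holds only the bare filename.
        archive.write(text_path, arcname=os.path.basename(text_path))
    return zip_path
```

Calling zip_for_upload("hamlet.txt") would produce hamlet.zip in the same directory.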

Just go to , fill in the form, specify the file to upload, and hit "Send" at the bottom.

And you're done!

If, for some reason, you can't use this page, there are two backup options: you can e-mail it, or you can upload it by FTP. Whichever you use, it is always best to ZIP the file first if you can.

If you are comfortable with sending files by FTP, this is better than e-mail. First, you will need a username and password, which you can get by e-mailing any of the Posting Team.

If you already know how to use command-line FTP, here's how to do it:

Log in to beryl.ils.unc.edu using the username and password supplied and change to the work directory by typing "cd work". Change to binary mode with the "bin" command and "put" your file.

Summary instructions:
ftp beryl.ils.unc.edu
login: yourlogin
password: yourpassword
cd work
bin
put yourfile.ext
quit

Here is a sample session:

>ftp beryl.ils.unc.edu
Connected to beryl.ils.unc.edu.
220-Access from unknown@127.0.0.1 logged.
220 FTP Server
User (beryl.ils.unc.edu:(none)): xxxxxxxx
331 Password required for xxxxxxxx.
Password: xxxxxxxx
230 User xxxxxxxx logged in.
ftp> cd work
250 CWD command successful.
ftp> bin
200 Type set to I.
ftp> put MYFILE.ZIP
200 PORT command successful.
150 Opening BINARY mode data connection for MYFILE.ZIP.
226 Transfer complete.
ftp: 172313 bytes sent in 17.34Seconds 9.94Kbytes/sec.
ftp> quit

When you are in the work directory, you will not be able to list files, but they do exist and they are there.
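
The same steps can be scripted. The sketch below uses Python's standard ftplib module and follows the session above step by step: log in, change to the work directory, and send the file in binary mode. The function name and the ftp_factory parameter (there so the function can be tried against a stand-in object instead of a real server) are inventions for this example; the username and password are whatever the Posting Team gave you.

```python
import ftplib
import os

def upload_for_posting(path, user, password,
                       host="beryl.ils.unc.edu", ftp_factory=ftplib.FTP):
    """Upload one file to the server's work directory, as in the session above."""
    ftp = ftp_factory(host)            # connects, like typing "ftp host"
    try:
        ftp.login(user, password)
        ftp.cwd("work")                # "cd work"
        with open(path, "rb") as handle:
            # storbinary transfers in binary mode, so no separate "bin" step is needed.
            ftp.storbinary("STOR " + os.path.basename(path), handle)
    finally:
        ftp.quit()                     # "quit"
```

Remember that you can't list the work directory, so the script has no way to confirm the upload by looking; the e-mailed note described next is still needed.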

When you have uploaded your file, e-mail a note to any or all of the Posting Team, including your
1. filename
2. credits line as you want it on your text
3. clearance line you received [V.37]

An ideal note might be:

Subject: Beryl upload for posting: Hamlet

I have uploaded to beryl:
Hamlet, by William Shakespeare

File is: hamlet.zip

Credits line is: Produced by John Doe

Clearance was given as:
Hamlet William Shakespeare John Doe 05/03/02 ok

