Chapter 2

All of the librarians who testified at trial whose libraries use Internet filtering software provide methods by which their patrons may ask the library to unblock specific Web sites or pages. Of these, only the Tacoma Public Library allows patrons to request that a URL be unblocked without providing any identifying information; Tacoma allows patrons to request a URL by sending, from the Internet terminal the patron is using, an email that does not contain a return email address for the user. David Biek, the head librarian at the Tacoma Library's main branch, testified at trial that the library keeps records that would enable it to know which patrons made unblocking requests, but does not use that information to connect users with their requests. Biek also testified that he periodically scans the library's Internet use logs to search for: (1) URLs that were erroneously blocked, so that he may unblock them; or (2) URLs that should have been blocked, but were not, in order to add them to a blocked category list. In the course of scanning the use logs, Biek has also found what looked like attempts to access child pornography. In two cases, he communicated his findings to law enforcement and turned over the logs in response to a subpoena. At all events, it takes time for librarians to make decisions about whether to honor patrons' requests to unblock Web pages. In the libraries proffered by the defendants, unblocking decisions sometimes take between 24 hours and a week. Moreover, none of these libraries allows unrestricted access to the Internet pending a determination of the validity of a Web site blocked by the blocking programs. A few of the defendants' proffered libraries represented that individual librarians would have the discretion to allow a patron to have full Internet access on a staff computer upon request, but none claimed that allowing such access was mandatory, and patron access is supervised in every instance.
None of these libraries makes differential unblocking decisions based on the patron's age; unblocking decisions are usually made identically for adults and minors, and even decisions for adults are usually based on the suitability of the Web site for minors.

It is apparent that many patrons are reluctant or unwilling to ask librarians to unblock Web pages or sites that contain only materials that might be deemed personal or embarrassing, even if they are not sexually explicit or pornographic. We credit the testimony of Emmalyn Rood, discussed above, that she would have been unwilling as a young teen to ask a librarian to disable filtering software so that she could view materials concerning gay and lesbian issues. We also credit the testimony of Mark Brown, who stated that he would have been too embarrassed to ask a librarian to disable filtering software if it had impeded his ability to research treatments and cosmetic surgery options for his mother when she was diagnosed with breast cancer. The pattern of patron requests to unblock specific URLs in the various libraries involved in this case also confirms our finding that patrons are largely unwilling to make unblocking requests unless they are permitted to do so anonymously. For example, the Fulton County Library receives only about 6 unblocking requests each year, the Greenville Public Library has received only 28 unblocking requests since August 21, 2000, and the Westerville, Ohio Library has received fewer than 10 unblocking requests since 1999. In light of the fact that a substantial amount of overblocking occurs in these very libraries, see infra Subsection II.E.4, we find that the lack of unblocking requests in these libraries does not reflect the effectiveness of the filters, but rather reflects patrons' reluctance to ask librarians to unblock sites.

5. Internet Filtering Technology

1. What Is Filtering Software, Who Makes It, and What Does It Do?

Commercially available products that can be configured to block or filter access to certain material on the Internet are among the "technology protection measures" that may be used to attempt to comply with CIPA. There are numerous filtering software products available commercially. Three network-based filtering products (SurfControl's Cyber Patrol, N2H2's Bess/i2100, and Secure Computing's SmartFilter) currently have the lion's share of the public library market. The parties in this case deposed representatives from these three companies. Websense, another network-based blocking product, is also currently used in the public library market, and was discussed at trial. Filtering software may be installed either on an individual computer or on a computer network. Network-based filtering software products are designed for use on a network of computers and funnel requests for Internet content through a centralized network device. Of the various commercially available blocking products, network-based products are the ones generally marketed to institutions, such as public libraries, that provide Internet access through multiple terminals. Filtering programs function in a fairly simple way. When an Internet user requests access to a certain Web site or page, either by entering a domain name or IP address into a Web browser, or by clicking on a link, the filtering software checks that domain name or IP address against a previously compiled "control list" that may contain up to hundreds of thousands of URLs. The three companies deposed in this case have control lists containing between 200,000 and 600,000 URLs. These lists determine which URLs will be blocked.
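The lookup step described above can be sketched in a few lines of Python. This is a simplified illustration only: the URLs, category names, and data structures below are hypothetical, and the actual products use proprietary formats and far larger lists.

```python
# The control list maps a URL (or domain) to the vendor-assigned category tags.
# All entries here are invented for illustration.
CONTROL_LIST = {
    "example-adult-site.com": {"Adult/Sexually Explicit"},
    "example-casino.com": {"Gambling"},
    "example-news.com/contests": {"Gambling", "News"},
}

def is_blocked(url: str, enabled_categories: set) -> bool:
    """Return True if the requested URL carries any category the customer enabled."""
    tags = CONTROL_LIST.get(url, set())
    return bool(tags & enabled_categories)

# A library that enables only "Gambling" blocks the casino site but not the
# adult site: blocking depends on which categories the customer enables,
# not on the control list alone.
print(is_blocked("example-casino.com", {"Gambling"}))      # True
print(is_blocked("example-adult-site.com", {"Gambling"}))  # False
```

The point of the sketch is that the filter itself performs only a membership test; all of the difficult judgments are embedded in how the control list was compiled, which the subsequent findings address.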

Filtering software companies divide their control lists into multiple categories for which they have created unique definitions. SurfControl uses 40 such categories, N2H2 uses 35 categories (and seven "exception" categories), Websense uses 30 categories, and Secure Computing uses 30 categories. Filtering software customers choose which categories of URLs they wish to enable. A user "enables" a category in a filtering program by configuring the program to block all of the Web pages listed in that category. The following is a list of the categories offered by each of these four filtering programs. SurfControl's Cyber Patrol offers the following categories: Adult/Sexually Explicit; Advertisements; Arts & Entertainment; Chat; Computing & Internet; Criminal Skills; Drugs, Alcohol & Tobacco; Education; Finance & Investment; Food & Drink; Gambling; Games; Glamour & Intimate Apparel; Government & Politics; Hacking; Hate Speech; Health & Medicine; Hobbies & Recreation; Hosting Sites; Job Search & Career Development; Kids' Sites; Lifestyle & Culture; Motor Vehicles; News; Personals & Dating; Photo Searches; Real Estate; Reference; Religion; Remote Proxies; Sex Education; Search Engines; Shopping; Sports; Streaming Media; Travel; Usenet News; Violence; Weapons; and Web-based Email.

N2H2 offers the following categories: Adults Only; Alcohol; Auction; Chat; Drugs; Electronic Commerce; Employment Search; Free Mail; Free Pages; Gambling; Games; Hate/Discrimination; Illegal; Jokes; Lingerie; Message/Bulletin Boards; Murder/Suicide; News; Nudity; Personal Information; Personals; Pornography; Profanity; Recreation/Entertainment; School Cheating Information; Search Engines; Search Terms; Sex; Sports; Stocks; Swimsuits; Tasteless/Gross; Tobacco; Violence; and Weapons. The "Nudity" category purports to block only "non-pornographic" images. The "Sex" category is intended to block only those depictions of sexual activity that are not intended to arouse. The "Tasteless/Gross" category includes contents such as "tasteless humor" and "graphic medical or accident scene photos." Additionally, N2H2 offers seven "exception categories." These exception categories include Education, Filtered Search Engine, For Kids, History, Medical, Moderated, and Text/Spoken Only. When an exception category is enabled, access to any Web site or page via a URL associated with both a category and an exception, for example, both "Sex" and "Education," will be allowed, even if the customer has enabled the product to otherwise block the category "Sex." As of November 15, 2001, of those Web sites categorized by N2H2 as "Sex," 3.6% were also categorized as "Education," 2.9% as "Medical," and 1.6% as "History."

Websense offers the following categories: Abortion Advocacy; Advocacy Groups; Adult Material; Business & Economy; Drugs; Education; Entertainment; Gambling; Games; Government; Health; Illegal/Questionable; Information Technology; Internet Communication; Job Search; Militancy/Extremist; News & Media; Productivity Management; Bandwidth Management; Racism/Hate; Religion; Shopping; Society & Lifestyle; Special Events; Sports; Tasteless; Travel; Vehicles; Violence; and Weapons. The "Adult" category includes "full or partial nudity of individuals," as well as sites offering "light adult humor and literature" and "[s]exually explicit language." The "Sexuality/Pornography" category includes, inter alia, "hard-core adult humor and literature" and "[s]exually explicit language." The "Tasteless" category includes "hard-to-stomach sites, including offensive, worthless or useless sites, grotesque or lurid depictions of bodily harm." The "Hacking" category blocks "sites providing information on or promoting illegal or questionable access to or use of communications equipment and/or software."

SmartFilter offers the following categories: Anonymizers/Translators; Art & Culture; Chat; Criminal Skills; Cults/Occult; Dating; Drugs; Entertainment; Extreme/Obscene/Violence; Gambling; Games; General News; Hate Speech; Humor; Investing; Job Search; Lifestyle; Mature; MP3 Sites; Nudity; On-line Sales; Personal Pages; Politics, Opinion & Religion; Portal Sites; Self-Help/Health; Sex; Sports; Travel; Usenet News; and Webmail.

Most importantly, no category definition used by filtering software companies is identical to CIPA's definitions of visual depictions that are obscene, child pornography, or harmful to minors. And category definitions and categorization decisions are made without reference to local community standards.
Moreover, there is no judicial involvement in the creation of filtering software companies' category definitions and no judicial determination is made before these companies categorize a Web page or site.

Each filtering software company associates each URL in its control list with a "tag" or other identifier that indicates the company's evaluation of whether the content or features of the Web site or page accessed via that URL meet one or more of its category definitions. If a user attempts to access a Web site or page that is blocked by the filter, the user is immediately presented with a screen that indicates that a block has occurred as a result of the operation of the filtering software. These "denial screens" appear only at the point that a user attempts to access a site or page in an enabled category. All four of the filtering programs on which evidence was presented allow users to customize the category lists that exist on their own PCs or servers by adding or removing specific URLs. For example, if a public librarian charged with administering a library's Internet terminals comes across a Web site that he or she finds objectionable that is not blocked by the filtering program that his or her library is using, then the librarian may add that URL to a category list that exists only on the library's network, and it would thereafter be blocked under that category. Similarly, a customer may remove individual URLs from category lists. Importantly, however, no one but the filtering companies has access to the complete list of URLs in any category. The actual URLs or IP addresses of the Web sites or pages contained in filtering software vendors' category lists are considered to be proprietary information, and are unavailable for review by customers or the general public, including the proprietors of Web sites that are blocked by filtering software.
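The local customization described above, additions and removals layered over an opaque vendor list, might be sketched as follows. All site names and class structure here are hypothetical illustrations of the concept, not any vendor's implementation.

```python
# A vendor list the customer cannot inspect in full; entries are illustrative.
VENDOR_LIST = {"blocked-example.com": {"Adult/Sexually Explicit"}}

class LocalFilter:
    """Overlays a library's local additions and removals on the vendor list."""

    def __init__(self, vendor_list):
        self.vendor_list = dict(vendor_list)
        self.local_additions = {}    # URL -> categories added by the librarian
        self.local_removals = set()  # URLs the librarian has unblocked locally

    def add_url(self, url, category):
        self.local_additions.setdefault(url, set()).add(category)

    def remove_url(self, url):
        self.local_removals.add(url)

    def categories_for(self, url):
        if url in self.local_removals:
            return set()
        tags = set(self.vendor_list.get(url, set()))
        tags |= self.local_additions.get(url, set())
        return tags

f = LocalFilter(VENDOR_LIST)
f.add_url("objectionable-example.org", "Adult/Sexually Explicit")  # local block
f.remove_url("blocked-example.com")                                # local unblock
print(f.categories_for("blocked-example.com"))       # now unblocked locally
print(f.categories_for("objectionable-example.org")) # now blocked locally
```

Note that the local overlay changes only this customer's behavior; the vendor's master list, and every other customer's filter, are unaffected.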

Filtering software companies do not generally notify the proprietors of Web sites when they block their sites. The only way to discover which URLs are blocked and which are not blocked by any particular filtering company is by testing individual URLs with filtering software, or by entering URLs one by one into the "URL checker" that most filtering software companies provide on their Web sites. Filtering software companies will entertain requests for recategorization from proprietors of Web sites that discover their sites are blocked. Because new pages are constantly being added to the Web, filtering companies provide their customers with periodic updates of category lists. Once a particular Web page or site is categorized, however, filtering companies generally do not re-review the contents of that page or site unless they receive a request to do so, even though the content on individual Web pages and sites changes frequently.

2. The Methods that Filtering Companies Use to Compile Category Lists

While the way in which filtering programs operate is conceptually straightforward (the program compares a requested URL to a previously compiled list of URLs and blocks access to the content at that URL if it appears on the list), accurately compiling and categorizing URLs to form the category lists is a more complex process that is impossible to conduct with any high degree of accuracy. The specific methods that filtering software companies use to compile and categorize control lists are, like the lists themselves, proprietary information. We will therefore set forth only general information on the various types of methods that all filtering companies deposed in this case use, and the sources of error that are at once inherent in those methods and unavoidable given the current architecture of the Internet and the current state of the art in automated classification systems. We base our understanding of these methods largely on the detailed testimony and expert report of Dr. Geoffrey Nunberg, which we credit. The plaintiffs offered, and the Court qualified, Nunberg as an expert witness on automated classification systems. When compiling and categorizing URLs for their category lists, filtering software companies go through two distinct phases. First, they must collect or "harvest" the relevant URLs from the vast number of sites that exist on the Web. Second, they must sort through the URLs they have collected to determine under which of the company's self-defined categories (if any) they should be classified. These tasks necessarily result in a tradeoff between overblocking (i.e., the blocking of content that does not meet the category definitions established by CIPA or by the filtering software companies), and underblocking (i.e., leaving off of a control list a URL that contains content that would meet the category definitions defined by CIPA or the filtering software companies).

1. The "Harvesting" Phase

Filtering software companies, given their limited resources, do not attempt to index or classify all of the billions of pages that exist on the Web. Instead, the set of pages that they attempt to examine and classify is restricted to a small portion of the Web. The companies use a variety of automated and manual methods to identify a universe of Web sites and pages to "harvest" for classification. These methods include: entering certain key words into search engines; following links from a variety of online directories (e.g., generalized directories like Yahoo or various specialized directories, such as those that provide links to sexually explicit content); reviewing lists of newly-registered domain names; buying or licensing lists of URLs from third parties; "mining" access logs maintained by their customers; and reviewing other submissions from customers and the public. The goal of each of these methods is to identify as many URLs as possible that are likely to contain content that falls within the filtering companies' category definitions.

The first method, entering certain keywords into commercial search engines, suffers from several limitations. First, the Web pages that may be "harvested" through this method are limited to those pages that search engines have already identified. However, as noted above, a substantial portion of the Web is not even theoretically indexable (because it is not linked to by any previously known page), and only approximately 50% of the pages that are theoretically indexable have actually been indexed by search engines. We are satisfied that the remainder of the indexable Web, and the vast "Deep Web," which cannot currently be indexed, includes materials that meet CIPA's categories of visual depictions that are obscene, child pornography, and harmful to minors. These portions of the Web cannot presently be harvested through the methods that filtering software companies use (except through reporting by customers or by observing users' log files), because they are not linked to other known pages. A user can, however, gain access to a Web site in the unindexed Web or the Deep Web if the Web site's proprietor or some other third party informs the user of the site's URL. Some Web sites, for example, send out mass email advertisements containing the site's URL, the spamming process we have described above. Second, the search engines that software companies use for harvesting are able to search text only, not images. This is of critical importance, because CIPA, by its own terms, covers only "visual depictions." 20 U.S.C. Sec. 9134(f)(1)(A)(i); 47 U.S.C. Sec. 254(h)(5)(B)(i). Image recognition technology is immature, ineffective, and unlikely to improve substantially in the near future. None of the filtering software companies deposed in this case employs image recognition technology when harvesting or categorizing URLs. 
Due to the reliance on automated text analysis and the absence of image recognition technology, a Web page with sexually explicit images and no text cannot be harvested using a search engine. This problem is complicated by the fact that Web site publishers may use image files rather than text to represent words, i.e., they may use a file that computers understand to be a picture, like a photograph of a printed word, rather than regular text, making automated review of their textual content impossible. For example, if the Playboy Web site displays its name using a logo rather than regular text, a search engine would not see or recognize the Playboy name in that logo.

In addition to collecting URLs through search engines and Web directories (particularly those specializing in sexually explicit sites or other categories relevant to one of the filtering companies' category definitions), and by mining user logs and collecting URLs submitted by users, the filtering companies expand their list of harvested URLs by using "spidering" software that can "crawl" the lists of pages produced by the previous four methods, following their links downward to bring back the pages to which they link (and the pages to which those pages link, and so on, but usually down only a few levels). This spidering software uses the same type of technology that commercial Web search engines use. While useful in expanding the number of relevant URLs, the ability to retrieve additional pages through this approach is limited by the architectural feature of the Web that page-to-page links tend to converge rather than diverge. That means that the more pages from which one spiders downward through links, the smaller the proportion of new sites one will uncover; if spidering the links of 1000 sites retrieved through a search engine or Web directory turns up 500 additional distinct adult sites, spidering an additional 1000 sites may turn up, for example, only 250 additional distinct sites, and the proportion of new sites uncovered will continue to diminish as more pages are spidered. These limitations on the technology used to harvest a set of URLs for review will necessarily lead to substantial underblocking of material with respect to both the category definitions employed by filtering software companies and CIPA's definitions of visual depictions that are obscene, child pornography, or harmful to minors.

2. The "Winnowing" or Categorization Phase

Once the URLs have been harvested, some filtering software companies use automated key word analysis tools to evaluate the content and/or features of Web sites or pages accessed via a particular URL and to tentatively prioritize or categorize them. This process may be characterized as "winnowing" the harvested URLs. Automated systems currently used by filtering software vendors to prioritize, and to categorize or tentatively categorize the content and/or features of a Web site or page accessed via a particular URL operate by means of (1) simple key word searching, and (2) the use of statistical algorithms that rely on the frequency and structure of various linguistic features in a Web page's text. The automated systems used to categorize pages do not include image recognition technology. All of the filtering companies deposed in the case also employ human review of some or all collected Web pages at some point during the process of categorizing Web pages. As with the harvesting process, each technique employed in the winnowing process is subject to limitations that can result in both overblocking and underblocking.

First, simple key-word-based filters are subject to the obvious limitation that no string of words can identify all sites that contain sexually explicit content, and most strings of words are likely to appear in Web sites that are not properly classified as containing sexually explicit content. As noted above, filtering software companies also use more sophisticated automated classification systems for the statistical classification of texts. These systems assign weights to words or other textual features and use algorithms to determine whether a text belongs to a certain category. These algorithms sometimes make reference to the position of a word within a text or its relative proximity to other words. The weights are usually determined by machine learning methods (often described as "artificial intelligence"). In this procedure, which resembles an automated form of trial and error, a system is given a "training set" consisting of documents preclassified into two or more groups, along with a set of features that might be potentially useful in classifying the sets. The system then "learns" rules that assign weights to those features according to how well they work in classification, and assigns each new document to a category with a certain probability. Notwithstanding their "artificial intelligence" description, automated text classification systems are unable to grasp many distinctions between types of content that would be obvious to a human. And of critical importance, no presently conceivable technology can make the judgments necessary to determine whether a visual depiction fits the legal definitions of obscenity, child pornography, or harmful to minors. Finally, all the filtering software companies deposed in this case use some form of human review in their process of winnowing and categorizing Web pages, although one company admitted to categorizing some Web pages without any human review. 
SmartFilter states that "the final categorization of every Web site is done by a human reviewer." Another filtering company asserts that of the 10,000 to 30,000 Web pages that enter the "work queue" to be categorized each day, two to three percent are automatically categorized by its PornByRef system (which applies only to materials classified in the pornography category), and the remainder are categorized by human review. SurfControl also states that no URL is ever added to its database without human review.
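The difference between the two automated winnowing techniques described above, simple keyword matching and weighted statistical scoring, can be illustrated with a toy example. The trigger words and weights below are invented for illustration; real systems learn weights from large pre-classified training sets, and their actual features and thresholds are proprietary.

```python
# Hypothetical trigger words for a crude keyword filter.
KEYWORDS = {"keyword-a", "keyword-b"}

def keyword_match(text: str) -> bool:
    """Simple keyword filter: block if any trigger word appears at all."""
    return bool(set(text.lower().split()) & KEYWORDS)

# Statistical scoring: each feature carries a learned weight, and the page is
# assigned to the category only if the summed weight crosses a threshold.
# These weights are invented; a real system would learn them from training data.
WEIGHTS = {"keyword-a": 2.0, "keyword-b": 1.5, "medical": -2.5}

def statistical_score(text: str) -> float:
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())

page = "medical discussion mentioning keyword-a"
print(keyword_match(page))            # True: the keyword filter blocks the page
print(statistical_score(page) > 1.0)  # False: the negative "medical" weight
                                      # keeps the page below the threshold
```

The keyword filter overblocks the hypothetical medical page; the weighted score does not, because context words pull the score down. Neither technique, as the findings note, can evaluate images or apply legal definitions.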

Human review of Web pages has the advantage of allowing more nuanced, if not more accurate, interpretations than automated classification systems are capable of making, but suffers from its own sources of error. The filtering software companies involved here have limited staff, of between eight and a few dozen people, available for hand reviewing Web pages. The reviewers that are employed by these companies base their categorization decisions on both the text and the visual depictions that appear on the sites or pages they are assigned to review. Human reviewers generally focus on English language Web sites, and are generally not required to be multi-lingual. Given the speed at which human reviewers must work to keep up with even a fraction of the approximately 1.5 million pages added to the publicly indexable Web each day, human error is inevitable. Errors are likely to result from boredom or lack of attentiveness, overzealousness, or a desire to "err on the side of caution" by screening out material that might be offensive to some customers, even if it does not fit within any of the company's category definitions. None of the filtering companies trains its reviewers in the legal definitions concerning what is obscene, child pornography, or harmful to minors, and none instructs reviewers to take community standards into account when making categorization decisions.

Perhaps because of limitations on the number of human reviewers and because of the large number of new pages that are added to the Web every day, filtering companies also widely engage in the practice of categorizing entire Web sites at the "root URL," rather than engaging in a more fine-grained analysis of the individual pages within a Web site. For example, the filtering software companies deposed in this case all categorize the entire Playboy Web site as Adult, Sexually Explicit, or Pornography. They do not differentiate between pages within the site containing sexually explicit images or text, and for example, pages containing no sexually explicit content, such as the text of interviews of celebrities or politicians. If the "root" or "top-level" URL of a Web site is given a category tag, then access to all content on that Web site will be blocked if the assigned category is enabled by a customer. In some cases, whole Web sites are blocked because the filtering companies focus only on the content of the home page that is accessed by entering the root URL. Entire Web sites containing multiple Web pages are commonly categorized without human review of each individual page on that site. Web sites that may contain multiple Web pages and that require authentication or payment for access are commonly categorized based solely on a human reviewer's evaluation of the pages that may be viewed prior to reaching the authentication or payment page.

Because there may be hundreds or thousands of pages under a root URL, filtering companies make it their primary mission to categorize the root URL, and categorize subsidiary pages if the need arises or if there is time. This form of overblocking is called "inheritance," because lower-level pages inherit the categorization of the root URL without regard to their specific content. In some cases, "reverse inheritance" also occurs, i.e., parent sites inherit the classification of pages in a lower level of the site. This might happen when pages with sexual content appear in a Web site that is devoted primarily to non-sexual content. For example, N2H2's Bess filtering product classifies every page in the Salon.com Web site, which contains a wide range of news and cultural commentary, as "Sex, Profanity," based on the fact that the site includes a regular column that deals with sexual issues. Blocking by both domain name and IP address is another practice in which filtering companies engage that is a function both of the architecture of the Web and of the exigencies of dealing with the rapidly expanding number of Web pages. The category lists maintained by filtering software companies can include URLs in either their human-readable domain name address form, their numeric IP address form, or both. Through "virtual hosting" services, hundreds of thousands of Web sites with distinct domain names may share a single numeric IP address. To the extent that filtering companies block the IP addresses of virtual hosting services, they will necessarily block a substantial amount of content without reviewing it, and will likely overblock a substantial amount of content.
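The "inheritance" practice described above can be sketched as a lookup keyed on the host portion of the URL alone, so that every page under a tagged root receives the root's categorization regardless of its own content. The hostname and category below are hypothetical.

```python
from urllib.parse import urlparse

# A category tag assigned at the root URL; the entry is illustrative.
ROOT_TAGS = {"example-magazine.com": {"Adult/Sexually Explicit"}}

def categories_for_page(url: str) -> set:
    """Return the categories a page inherits from its root URL's tag."""
    host = urlparse(url).netloc
    return ROOT_TAGS.get(host, set())

# A celebrity-interview page with no sexually explicit content still
# inherits the block, because only the host is consulted:
print(categories_for_page("http://example-magazine.com/interviews/celebrity"))
```

Because the lookup never examines the path or the page itself, every subsidiary page is overblocked or underblocked together with its root, which is the source of the overblocking the findings describe.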

Another technique that filtering companies use in order to deal with a structural feature of the Internet is blocking the root level URLs of so-called "loophole" Web sites. These are Web sites that provide access to a particular Web page, but display in the user's browser a URL that is different from the URL with which the particular page is usually associated. Because of this feature, they provide a "loophole" that can be used to get around filtering software, i.e., they display a URL that is different from the one that appears on the filtering company's control list. "Loophole" Web sites include caches of Web pages that have been removed from their original location, "anonymizer" sites, and translation sites. Caches are archived copies that some search engines, such as Google, keep of the Web pages they index. The cached copy stored by Google will have a URL that is different from the original URL. Because Web sites often change rapidly, caches are the only way to access pages that have been taken down, revised, or have changed their URLs for some reason. For example, a magazine might place its current stories under a given URL, and replace them monthly with new stories. If a user wanted to find an article published six months ago, he or she would be unable to access it if not for Google's cached version.

Some sites on the Web serve as a proxy or intermediary between a user and another Web page. When using a proxy server, a user does not access the page from its original URL, but rather from the URL of the proxy server. One type of proxy service is an "anonymizer." Users may access Web sites indirectly via an anonymizer when they do not want the Web site they are visiting to be able to determine the IP address from which they are accessing the site, or to leave "cookies" on their browser. Some proxy servers can be used to attempt to translate Web page content from one language to another. Rather than directly accessing the original Web page in its original language, users can instead indirectly access the page via a proxy server offering translation features. As noted above, filtering companies often block loophole sites, such as caches, anonymizers, and translation sites. The practice of blocking loophole sites necessarily results in a significant amount of overblocking, because the vast majority of the pages that are cached, for example, do not contain content that would match a filtering company's category definitions. Filters that do not block these loophole sites, however, may enable users to access any URL on the Web via the loophole site, thus resulting in substantial underblocking.

3. The Process for "Re-Reviewing" Web Pages After Their Initial Categorization

Most filtering software companies do not engage in subsequent reviews of categorized sites or pages on a scheduled basis. Priority is placed on reviewing and categorizing new sites and pages, rather than on re-reviewing already categorized sites and pages. Typically, a filtering software vendor's previous categorization of a Web site is not re-reviewed for accuracy when new pages are added to the Web site. To the extent the Web site was previously categorized as a whole, the new pages added to the site usually share the categorization assigned by the blocking product vendor.
This necessarily results in both over- and underblocking, because, as noted above, the content of Web pages and Web sites changes relatively rapidly.

In addition to the content on Web sites or pages changing rapidly, Web sites themselves may disappear and be replaced by sites with entirely different content. If an IP address associated with a particular Web site is blocked under a particular category and the Web site goes out of existence, then the IP address likely would be reassigned to a different Web site, either by an Internet service provider or by a registration organization, such as the American Registry for Internet Numbers, see http://www.arin.net. In that case, the site that received the reassigned IP address would likely be miscategorized. Because filtering companies do not engage in systematic re-review of their category lists, such a site would likely remain miscategorized unless someone submitted it to the filtering company for re-review, increasing the incidence of over- and underblocking. This failure to re-review Web pages primarily increases a filtering company's rate of overblocking. However, if a filtering company does not re-review Web pages after it determines that they do not fall into any of its blocking categories, then that would result in underblocking (because, for example, a page might add sexually explicit content).

3. The Inherent Tradeoff Between Overblocking and Underblocking

There is an inherent tradeoff between any filter's rate of overblocking (the complement of what information scientists call "precision") and its rate of underblocking (the complement of what is referred to as "recall"). Precision is measured by the proportion of the items that a classification system assigns to a certain category that are appropriately classified; a filter's overblocking rate is the proportion that are not. The plaintiffs' expert, Dr. Nunberg, provided the hypothetical example of a classification system that is asked to pick out pictures of dogs from a database consisting of 1000 pictures of animals. If it returned 100 hits, of which 80 were in fact pictures of dogs, and the remaining 20 were pictures of cats, horses, and deer, we would say that the system identified dog pictures with a precision of 80%. This would be analogous to a filter that overblocked at a rate of 20%. The recall measure involves determining what proportion of the actual members of a category the classification system has been able to identify. For example, if the hypothetical animal-picture database contained a total of 200 pictures of dogs, and the system identified 80 of them and failed to identify 120, it would have performed with a recall of 40%. This would be analogous to a filter that underblocked 60% of the material in a category. In automated classification systems, there is always a tradeoff between precision and recall. In the animal-picture example, recall could be improved by using a looser set of criteria to identify the dog pictures, such as any animal with four legs: all the dogs would be identified, but cats and other animals would also be included, with a resulting loss of precision. The same tradeoff exists between rates of overblocking and underblocking in filtering systems that use automated classification systems.
For example, an automated system that classifies any Web page that contains the word "sex" as sexually explicit will underblock much less, but overblock much more, than a system that classifies any Web page containing the phrase "free pictures of people having sex" as sexually explicit.
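The precision and recall arithmetic in Dr. Nunberg's dog-picture example can be sketched in a few lines of code. This is only an illustration of the definitions stated above; the function names are ours, and the numbers (80 correct hits out of 100 returned, 200 dog pictures in the database) come from the example in the text.

```python
# Precision/recall arithmetic from the dog-picture hypothetical.

def precision(true_positives, total_returned):
    """Fraction of items the system returned that actually belong to the category."""
    return true_positives / total_returned

def recall(true_positives, total_in_category):
    """Fraction of the category's actual members that the system identified."""
    return true_positives / total_in_category

# The system returns 100 pictures, 80 of which are really dogs;
# the database contains 200 dog pictures in total.
p = precision(80, 100)   # 0.80 -> analogous to a 20% overblocking rate
r = recall(80, 200)      # 0.40 -> analogous to a 60% underblocking rate

print(f"precision = {p:.0%}, overblocking analogue = {1 - p:.0%}")
print(f"recall    = {r:.0%}, underblocking analogue = {1 - r:.0%}")
```

Loosening the classification criteria (e.g., "any animal with four legs") raises `recall` toward 100% while dragging `precision` down, which is the tradeoff the text describes.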

This tradeoff between overblocking and underblocking applies not only to automated classification systems, but also to filters that rely solely on human review. Given the approximately two billion pages that exist on the Web, the 1.5 million new pages that are added daily, and the rate at which content on existing pages changes, if a filtering company blocks only those Web pages that have been reviewed by humans, it will be impossible, as a practical matter, to avoid vast amounts of underblocking. Techniques used by human reviewers, such as blocking at the IP address level, domain name level, or directory level, reduce the rates of underblocking, but necessarily increase the rates of overblocking, as discussed above. To use a simple example, it would be easy to design a filter intended to block sexually explicit speech that completely avoids overblocking. Such a filter would have only a single sexually explicit Web site on its control list, which could be re-reviewed daily to ensure that its content does not change. While there would be no overblocking problem with such a filter, it would have a severe underblocking problem, as it would fail to block all the sexually explicit speech on the Web other than the one site on its control list. Similarly, it would also be easy to design a filter intended to block sexually explicit speech that completely avoids underblocking. Such a filter would operate by permitting users to view only a single Web site, e.g., the Sesame Street Web site. While there would be no underblocking problem with such a filter, it would have a severe overblocking problem, as it would block access to millions of non-sexually explicit sites on the Web other than the Sesame Street site.

While it is thus quite simple to design a filter that does not overblock, and equally simple to design a filter that does not underblock, it is currently impossible, given the Internet's size, rate of growth, rate of change, and architecture, and given the state of the art of automated classification systems, to develop a filter that neither underblocks nor overblocks a substantial amount of speech. The more effective a filter is at blocking Web sites in a given category, the more the filter will necessarily overblock. Any filter that is reasonably effective in preventing users from accessing sexually explicit content on the Web will necessarily block substantial amounts of non-sexually explicit speech.

4. Attempts to Quantify Filtering Programs' Rates of Over- and Underblocking

The government presented three studies, two from expert witnesses and one from a librarian fact witness who conducted a study using Internet use logs from his own library, that attempt to quantify the over- and underblocking rates of five different filtering programs. The plaintiffs presented one expert witness who attempted to quantify the rates of over- and underblocking for various programs. Each of these attempts to quantify rates of over- and underblocking suffers from various methodological flaws.

The fundamental problem with calculating over- and underblocking rates is selecting a universe of Web sites or Web pages to serve as the set to be tested. The studies that the parties submitted in this case took two different approaches to this problem. Two of the studies, one prepared by the plaintiffs' expert witness Chris Hunter, a graduate student at the University of Pennsylvania, and the other prepared by the defendants' expert, Chris Lemmons of eTesting Laboratories, in Research Triangle Park, North Carolina, approached this problem by compiling two separate lists of Web sites, one of URLs that they deemed should be blocked according to the filters' criteria, and another of URLs that they deemed should not be blocked according to the filters' criteria. They compiled these lists by choosing Web sites from the results of certain key word searches. The problem with this selection method is that the resulting sample is neither random nor necessarily representative of the universe of Web pages that library patrons actually visit.

The two other studies, one by David Biek, head librarian at the Tacoma Public Library's main branch, and one by Cory Finnell of Certus Consulting Group, of Seattle, Washington, chose actual logs of Web pages visited by library patrons during specific time periods as the universe of Web pages to analyze. This method, while surely not as accurate as a truly random sample of the indexed Web would be (assuming it would be possible to take such a sample), has the virtue of using the actual Web sites that library patrons visited during a specific period. Because library patrons, rather than the study authors, selected the universe of Web sites that the Biek and Finnell studies analyzed, this method removes the possibility of bias resulting from a study author's selection of the universe of sites to be reviewed. We find that the Lemmons and Hunter studies are of little probative value because of the methodology used to select the sample universe of Web sites to be tested. We will therefore focus on the studies conducted by Finnell and Biek in trying to ascertain estimates of the rates of over- and underblocking that take place when filters are used in public libraries. The government hired expert witness Cory Finnell to study the Internet logs compiled by the public library systems in Tacoma, Washington; Westerville, Ohio; and Greenville, South Carolina. Each of these libraries uses filtering software that keeps a log of information about individual Web site requests made by library patrons. Finnell, whose consulting firm specializes in data analysis, has substantial experience evaluating Internet access logs generated on networked systems. He spent more than a year developing a reporting tool for N2H2, and, in the course of that work, acquired a familiarity with the design and operation of Internet filtering products.

The Tacoma library uses Cyber Patrol filtering software, and logs information only on sites that were blocked. Finnell worked from a list of all sites that were blocked in the Tacoma public library in the month of August 2001. The Westerville library uses the Websense filtering product, and logs information on both blocked sites and non-blocked sites. When the logs reach a certain size, they are overwritten by new usage logs. Because of this overwriting feature, logs were available to Finnell only for the relatively short period from October 1, 2001 to October 3, 2001. The Greenville library uses N2H2's filtering product and logs both blocked sites and sites that patrons accessed. The logs contain more than 500,000 records per day. Because of the volume of the records, Finnell restricted his analysis to the period from August 2, 2001 to August 15, 2001.

Finnell calculated an overblocking rate for each of the three libraries by examining the host Web site containing each of the blocked pages. He did not employ a sampling technique, but instead examined each blocked Web site. If the contents of a host Web site or the pages within the Web site were consistent with the filtering product's definition of the category under which the site was blocked, Finnell considered it to be an accurate block. Finnell and three others, two of whom were temporary employees, examined the Web sites to determine whether they were consistent with the filtering companies' category definitions. Their review was, of course, necessarily limited by: (1) the clarity of the filtering companies' category definitions; (2) Finnell's and his employees' interpretations of the definitions; and (3) human error. The study's reliability is also undercut by the fact that Finnell failed to archive the blocked Web pages as they existed either at the point that a patron in one of the three libraries was denied access or when Finnell and his team reviewed the pages. It is therefore impossible for anyone to check the accuracy and consistency of Finnell's review team, or to know whether the pages contained the same content when the block occurred as they did when Finnell's team reviewed them. This is a key flaw, because the results of the study depend on individual determinations as to overblocking and underblocking, in which Finnell and his team were required to compare what they saw on the Web pages that they reviewed with standard definitions provided by the filtering company.

Tacoma library's Cyber Patrol software blocked 836 unique Web sites during the month of August. Finnell determined that 783 of those blocks were accurate and that 53 were inaccurate. The error rate for Cyber Patrol was therefore estimated to be 6.34%, and the true error rate was estimated with 95% confidence to lie within the range of 4.69% to 7.99%. Finnell and his team reviewed 185 unique Web sites that were blocked by Westerville Library's Websense filter during the logged period and determined that 158 of them were accurate and that 27 of them were inaccurate. He therefore estimated the Websense filter's overblocking rate at 14.59% with a 95% confidence interval of 9.51% to 19.68%. Additionally, Finnell examined 1,674 unique Web sites that were blocked by the Greenville Library's N2H2 filter during the relevant period and determined that 1,520 were accurate and that 87 were inaccurate. This yields an estimated overblocking rate of 5.41% and a 95% confidence interval of 4.33% to 6.55%. Finnell's methodology was materially flawed in that it understates the rate of overblocking for the following reasons. First, patrons from the three libraries knew that the filters were operating, and may have been deterred from attempting to access Web sites that they perceived to be "borderline" sites, i.e., those that may or may not have been appropriately filtered according to the filtering companies' category definitions. Second, in their cross-examination of Finnell, the plaintiffs offered screen shots of a number of Web sites that, according to Finnell, had been appropriately blocked, but that Finnell admitted contained only benign materials. Finnell's explanation was that the Web sites must have changed between the time when he conducted the study and the time of the trial, but because he did not archive the images as they existed when his team reviewed them for the study, there is no way to verify this. 
Third, because of the way in which Finnell counted blocked Web sites (if separate patrons attempted to reach the same Web site, or one or more patrons attempted to access more than one page on a single Web site, Finnell counted these attempts as a single block, see supra note 10), his results necessarily understate the number of times that patrons were erroneously denied access to information.
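The confidence intervals reported for Finnell's overblocking estimates are consistent with the standard normal-approximation interval for a proportion. The sketch below assumes that formula (the record does not describe Finnell's exact computation); the counts (53 inaccurate blocks out of 836 in Tacoma, 27 out of 185 in Westerville) come from the figures above.

```python
import math

def overblock_ci(inaccurate, total, z=1.96):
    """Estimated overblocking rate with a normal-approximation 95% CI."""
    p = inaccurate / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p, max(p - margin, 0.0), p + margin

# Tacoma / Cyber Patrol: 53 of 836 unique blocked sites judged inaccurate.
rate, lo, hi = overblock_ci(53, 836)
print(f"Tacoma:      {rate:.2%} (95% CI {lo:.2%} to {hi:.2%})")

# Westerville / Websense: 27 of 185 unique blocked sites judged inaccurate.
rate_w, lo_w, hi_w = overblock_ci(27, 185)
print(f"Westerville: {rate_w:.2%} (95% CI {lo_w:.2%} to {hi_w:.2%})")
```

Run as written, this reproduces the 6.34% (4.69% to 7.99%) and 14.59% (9.51% to 19.68%) figures stated in the text, which is why the normal-approximation assumption seems apt.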

At all events, there is no doubt that Finnell's estimated rates of overblocking, which are based on the filtering companies' own category definitions, significantly understate the rate of overblocking with respect to CIPA's category definitions for filtering for adults. The filters used in the Tacoma, Westerville, and Greenville libraries were configured to block, among other things, images of full nudity and sexually explicit materials. There is no dispute, however, that these categories are far broader than CIPA's categories of visual depictions that are obscene, or child pornography, the two categories of material that libraries subject to CIPA must certify that they filter during adults' use of the Internet. Finnell's study also calculated underblocking rates with respect to the Westerville and Greenville Libraries (both of which logged not only their blocked sites, but all sites visited by their patrons), by taking random samples of URLs from the list of sites that were not blocked. The study used a sample of 159 sites that were accessed by Westerville patrons and determined that only one of them should have been blocked under the software's category definitions, yielding an underblocking rate of 0.6%. Given the size of the sample, the 95% confidence interval is 0% to 1.86%. The study examined a sample of 254 Web sites accessed by patrons in Greenville and found that three of them should have been blocked under the filtering software's category definitions. This results in an estimated underblocking rate of 1.2% with a 95% confidence interval ranging from 0% to 2.51%.

We do not credit Finnell's estimates of the rates of underblocking in the Westerville and Greenville public libraries for several reasons. First, Finnell's estimates likely understate the actual rate of underblocking because patrons, who knew that filtering programs were operating in the Greenville and Westerville Libraries, may have refrained from attempting to access sites with sexually explicit materials, or other content that they knew would likely fall within a filtering program's blocked categories. Second, and most importantly, we think that the formula that Finnell used to calculate the rate of underblocking in these two libraries is not as meaningful as the formula that information scientists typically use to calculate a rate of recall, which we describe above in Subsection II.E.3. As Dr. Nunberg explained, the standard method that information scientists use to calculate a rate of recall is to sort a set of items into two groups, those that fall into a particular category (e.g., those that should have been blocked by a filter) and those that do not. The rate of recall is then calculated by dividing the number of items that the system correctly identified as belonging to the category by the total number of items in the category.

In the example above, we discussed a database that contained 1000 photographs. Assume that 200 of these photographs were pictures of dogs. If, for example, a classification system designed to identify pictures of dogs identified 80 of the dog pictures and failed to identify 120, it would have performed with a recall rate of 40%. This would be analogous to a filter that underblocked at a rate of 60%. To calculate the recall rate of the filters in the Westerville and Greenville public libraries in accordance with the standard method described above, Finnell should have taken a sample of sites from the libraries' Internet use logs (including both sites that were blocked and sites that were not), and divided the number of sites in the sample that the filter incorrectly failed to block by the total number of sites in the sample that should have been blocked. What Finnell did instead was to take a sample of sites that were not blocked, and divide the number of sites in the sample that should have been blocked by the total number of sites in the sample. This made the denominator that Finnell used much larger than it would have been had he used the standard method for calculating recall, consequently making the underblocking rate that he calculated much lower than it would have been under the standard method.
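The difference between the two formulas can be made concrete with hypothetical numbers (ours, not from the record): the same ten missed sites yield a very different rate depending on which denominator is used.

```python
# Hypothetical figures contrasting the standard recall-based
# underblocking rate with the formula Finnell used.

caught = 40                # objectionable sites the filter blocked
missed = 10                # objectionable sites the filter failed to block
unblocked_sample = 1_000   # sample of sites from the logs that were NOT
                           # blocked (this includes the 10 missed sites)

# Standard method: missed sites divided by all sites that should have
# been blocked (those caught plus those missed).
standard_underblock = missed / (caught + missed)   # 10 / 50   = 20%

# Finnell's method: missed sites divided by the entire sample of
# unblocked sites, most of which never belonged in a blocked category.
finnell_underblock = missed / unblocked_sample     # 10 / 1000 = 1%

print(f"standard: {standard_underblock:.0%}, Finnell: {finnell_underblock:.0%}")
```

Because the overwhelming majority of patron traffic is innocuous, the total-sample denominator swamps the count of missed sites, which is why Finnell's formula produces rates near zero regardless of how much objectionable material the filter actually misses.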

Moreover, despite the relatively low rates of underblocking that Finnell's study found, librarians from several of the libraries proffered by defendants that use blocking products, including Greenville, Tacoma, and Westerville, testified that there are instances of underblocking in their libraries. No quantitative evidence was presented comparing the effectiveness of filters and other alternative methods used by libraries to prevent patrons from accessing visual depictions that are obscene, child pornography, or, in the case of minors, harmful to minors. Biek undertook a similar study of the overblocking rates that result from the Tacoma Library's use of the Cyber Patrol software. He began with the 3,733 individual blocks that occurred in the Tacoma Library in October 2000 and drew from this data set a random sample of 786 URLs. He calculated two rates of overblocking: one with respect to the Tacoma Library's Internet use policy, which provides that the pictorial content of a site may not include "graphic materials depicting full nudity and sexual acts which are portrayed obviously and exclusively for sensational or pornographic purposes," and the other with respect to Cyber Patrol's own category definitions. He estimated that Cyber Patrol overblocked 4% of all Web pages in October 2000 with respect to the definitions of the Tacoma Library's Internet Policy and 2% of all pages with respect to Cyber Patrol's own category definitions.

It is difficult to determine how reliable Biek's conclusions are, because he did not keep records of the raw data that he used in his study; nor did he archive images of the Web pages as they looked when he made the determination whether they were properly classified by the Cyber Patrol program. Without this information, it is impossible to verify his conclusions (or to undermine them). And Biek's study certainly understates Cyber Patrol's overblocking rate for some of the same reasons that Finnell's study likely understates the true rates of overblocking in the libraries that he studied. We also note that Finnell's study, which analyzed a set of Internet logs from the Tacoma Library during which the same filtering program was operating with the same set of blocking categories enabled, found a significantly higher rate of overblocking than the Biek study did: Biek found a rate of overblocking of approximately 2%, while the Finnell study estimated a 6.34% rate of overblocking. At all events, the category definitions employed by CIPA, at least with respect to adult use (visual depictions that are obscene or child pornography), are narrower than the materials prohibited by the Tacoma Library policy, and therefore Biek's study understates the rate of overblocking with respect to CIPA's definitions for adults. In sum, we think that Finnell's study, while we do not credit its estimates of underblocking, is useful because it states lower bounds with respect to the rates of overblocking that occurred when the Cyber Patrol, Websense, and N2H2 filters were operating in public libraries. While these rates are substantial (between nearly 6% and 15%), we think, for the reasons stated above, that they greatly understate the actual rates of overblocking that occur, and therefore they cannot be considered as anything more than minimum estimates of the rates of overblocking in all filtering programs.

5. Methods of Obtaining Examples of Erroneously Blocked Web Sites

The plaintiffs assembled a list of several thousand Web sites that they contend were, at the time of the study, likely to have been erroneously blocked by one or more of four major commercial filtering programs: SurfControl Cyber Patrol 6.0.1.47, N2H2 Internet Filtering 2.0, Secure Computing SmartFilter 3.0.0.01, and Websense Enterprise 4.3.0. They compiled this list using a two-step process. First, Benjamin Edelman, an expert witness who testified before us, compiled a list of more than 500,000 URLs and devised a program to feed them through all four filtering programs in order to compile a list of URLs that might have been erroneously blocked by one or more of the programs. Second, Edelman forwarded subsets of the list that he compiled to librarians and professors of library science whom the plaintiffs had hired to review the blocked sites for suitability in the public library context. Edelman assembled the list of URLs by compiling Web pages that were blocked by the following categories in the four programs: Cyber Patrol: Adult/Sexually Explicit; N2H2: Adults Only, Nudity, Pornography, and Sex, with "exceptions" engaged in the categories of Education, For Kids, History, Medical, Moderated, and Text/Spoken Only; SmartFilter: Sex, Nudity, Mature, and Extreme; Websense: Adult Content, Nudity, and Sex.

Edelman then assembled a database of Web sites for possible testing. He derived this list by automatically compiling URLs from the Yahoo index of Web sites, taking them from categories from the Yahoo index that differed significantly from the classifications that he had enabled in each of the blocking programs (taking, for example, Web sites from Yahoo's "Government" category). He then expanded this list by entering URLs taken from the Yahoo index into the Google search engine's "related" search function, which provides the user with a list of similar sites. Edelman also included and excluded specific Web sites at the request of the plaintiffs' counsel.

Taking the list of more than 500,000 URLs that he had compiled, Edelman used an automated system that he had developed to test whether particular URLs were blocked by each of the four filtering programs. This testing took place between February and October 2001. He recorded the specific dates on which particular sites were blocked by particular programs, and, using commercial archiving software, archived the contents of the home page of the blocked Web sites (and in some instances the pages linked to from the home page) as it existed when it was blocked. Through this process, Edelman, whose testimony we credit, compiled a list of 6,777 URLs that were blocked by one or more of the four programs. Because these sites were chosen from categories from the Yahoo directory that were unrelated to the filtering categories that were enabled during the test (i.e., "Government" vs. "Nudity"), he reasoned that they were likely erroneously blocked. As explained in the margin, Edelman repeated his testing and discovered that Cyber Patrol had unblocked most of the pages on the list of 6,777 after he had published the list on his Web site. His records indicate that an employee of SurfControl (the company that produces Cyber Patrol software) accessed his site and presumably checked out the URLs on the list, thus confirming Edelman's judgment that the majority of URLs on the list were erroneously blocked. Edelman forwarded the list of blocked sites to Dr. Joseph Janes, an Assistant Professor in the Information School of the University of Washington who also testified at trial as an expert witness. Janes reviewed the sites that Edelman compiled to determine whether they are consistent with library collection development, i.e., whether they are sites to which a reference librarian would, consistent with professional standards, direct a patron as a source of information.

Edelman forwarded Janes a list of 6,775 Web sites, almost the entire list of blocked sites that he collected, from which Janes took a random sample of 859 using the SPSS statistical software package. Janes indicated that he chose a sample size of 859 because it would yield a 95% confidence interval of plus or minus 2.5%. Janes recruited a group of 16 reviewers, most of whom were current or former students at the University of Washington's Information School, to help him identify which sites were appropriate for library use. We describe the process that he used in the margin. Due to the inability of a member of Janes's review team to complete the reviewing process, Janes had to cut 157 Web sites out of the sample, but because the Web sites were randomly assigned to reviewers, it is unlikely that these sites differed significantly from the rest of the sample. That left the sample size at 699, which widened the 95% confidence interval to plus or minus 2.8%.

Of the total 699 sites reviewed, Janes's team concluded that 165 of them, or 23.6% of the sample, were not of any value in the library context (i.e., no librarian would, consistent with professional standards, refer a patron to these sites as a source of information). They were unable to find 60 of the Web sites, or 8.6% of the sample. Therefore, they concluded that the remaining 474 Web sites, or 67.8% of the sample, were examples of overblocking with respect to materials that are appropriate sources of information in public libraries. Applying a 95% confidence interval of plus or minus 2.8%, the study concluded that we can be 95% confident that the actual percentage of sites in the list of 6,775 sites that are appropriate for use in public libraries is somewhere between 65.0% and 70.6%. In other words, we can be 95% certain that the actual number of sites out of the 6,775 that Edelman forwarded to Janes that are appropriate for use in public libraries (under Janes's standard) is somewhere between 4,403 and 4,783.
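The projection from the sample back to the full list of 6,775 sites can be reproduced as follows. The counts (474 of 699) and the plus-or-minus 2.8% margin come from the figures above; the rounding and truncation conventions are our assumptions about how the reported endpoints were derived.

```python
# Arithmetic behind the Janes sample projection.

sample = 699
overblocked = 474

rate = overblocked / sample          # ~0.678, i.e., 67.8% of the sample
margin = 0.028                       # the +/- 2.8% margin stated above

# Interval endpoints, using the sample rate rounded to three places
# as in the text (65.0% and 70.6%).
lo = round(rate, 3) - margin
hi = round(rate, 3) + margin

# Projecting the interval onto the full list of 6,775 sites.
population = 6_775
print(int(population * lo), int(population * hi))   # 4403 4783
```

Multiplying each endpoint by the 6,775-site population and truncating to whole sites recovers the 4,403-to-4,783 range stated in the text.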

The government raised some valid criticisms of Janes's methodology, attacking in particular the fact that, while sites that received two "yes" votes in the first round of voting were determined to be of sufficient interest in a library context to be removed from further analysis, sites receiving one or two "no" votes were sent to the next round. The government also correctly points out that the results of Janes's study can be generalized only to the population of 6,775 sites that Edelman forwarded to Janes. Even taking these criticisms into account, and discounting Janes's numbers appropriately, we credit Janes's study as confirming that Edelman's set of 6,775 Web sites contains at least a few thousand URLs that were erroneously blocked by one or more of the four filtering programs that he used, whether judged against CIPA's definitions, the filters' own category criteria, or the standard that the Janes study used. Edelman tested only 500,000 unique URLs out of the approximately two billion, or 4000 times that many, estimated to exist in the indexable Web. Even assuming that Edelman chose the URLs that were most likely to be erroneously blocked by commercial filtering programs, we conclude that many times the number of pages that Edelman identified are erroneously blocked by one or more of the filtering programs that he tested. Edelman's and Janes's studies provide numerous specific examples of Web pages that were erroneously blocked by one or more filtering programs. The erroneously blocked Web pages do not fall into any neat patterns; they range widely in subject matter, and it is difficult to tell why they may have been overblocked. The list that Edelman compiled, for example, contains Web pages relating to religion, politics and government, health, careers, education, travel, sports, and many other topics. In the next section, we provide examples from each of these categories.

6. Examples of Erroneously Blocked Web Sites

Several of the erroneously blocked Web sites had content relating to churches, religious orders, religious charities, and religious fellowship organizations. These included the following Web sites: the Knights of Columbus Council 4828, a Catholic men's group associated with St. Patrick's Church in Fallon, Nevada, http://msnhomepages.talkcity.com/SpiritSt/kofc4828, which was blocked by Cyber Patrol in the "Adult/Sexually Explicit" category; the Agape Church of Searcy, Arkansas, http://www.agapechurch.com, which was blocked by Websense as "Adult Content"; the home page of the Lesbian and Gay Havurah of the Long Beach, California Jewish Community Center, http://www.compupix.com/gay/havurah.htm, which was blocked by N2H2 as "Adults Only, Pornography," by Smartfilter as "Sex," and by Websense as "Sex"; Orphanage Emmanuel, a Christian orphanage in Honduras that houses 225 children, http://home8.inet.tele.dk/rfb_viva, which was blocked by Cyber Patrol in the "Adult/Sexually Explicit" category; Vision Art Online, which sells wooden wall hangings for the home that contain prayers, passages from the Bible, and images of the Star of David, http://www.visionartonline.com, which was blocked in Websense's "Sex" category; and the home page of Tenzin Palmo, a Buddhist nun, which contained a description of her project to build a Buddhist nunnery and international retreat center for women, http://www.tenzinpalmo.com, which was categorized as "Nudity" by N2H2.

Several blocked sites also contained information about governmental entities or specific political candidates, or contained political commentary. These included: the Web site for Kelley Ross, a Libertarian candidate for the California State Assembly, http://www.friesian.com/ross/ca40, which N2H2 blocked as "Nudity"; the Web site for Bob Coughlin, a town selectman in Dedham, Massachusetts, http://www.bobcoughlin.org, which was blocked under N2H2's "Nudity" category; a list of Web sites containing information about government and politics in Adams County, Pennsylvania, http://www.geocities.com/adamscopa, which was blocked by Websense as "Sex"; the Web site for Wisconsin Right to Life, http://www.wrtl.org, which N2H2 blocked as "Nudity"; a Web site that promotes federalism in Uganda, http://federo.com, which N2H2 blocked as "Adults Only, Pornography"; "Fight the Death Penalty in the USA," a Danish Web site dedicated to criticizing the American system of capital punishment, http://www.fdp.dk, which N2H2 blocked as "Pornography"; and "Dumb Laws," a humor Web site that makes fun of outmoded laws, http://www.dumblaws.com, which N2H2 blocked under its "Sex" category. 
Erroneously blocked Web sites relating to health issues included the following: a guide to allergies, http://www.x-sitez.com/allergy, which was categorized as "Adults Only, Pornography" by N2H2; a health question and answer site sponsored by Columbia University, http://www.goaskalice.com.columbia.edu, which was blocked as "Sex" by N2H2, and as "Mature" by Smartfilter; the Western Amputee Support Alliance Home Page, http://www.usinter.net/wasa, which was blocked by N2H2 as "Pornography"; the Web site of the Willis-Knighton Cancer Center, a Shreveport, Louisiana cancer treatment facility, http://cancerftr.wkmc.com, which was blocked by Websense under the "Sex" category; and a site dealing with halitosis, http://www.dreamcastle.com/tungs, which was blocked by N2H2 as "Adults, Pornography," by Smartfilter as "Sex," by Cyber Patrol as "Adult/Sexually Explicit," and by Websense as "Adult Content."

The filtering programs also erroneously blocked several Web sites having to do with education and careers. The filtering programs blocked two sites that provide information on home schooling. "HomEduStation the Internet Source for Home Education," http://www.perigee.net/~mcmullen/homedustation/, was categorized by Cyber Patrol as "Adult/Sexually Explicit." Smartfilter blocked "Apricot: A Web site made by and for home schoolers," http://apricotpie.com, as "Sex." The programs also miscategorized several career-related sites. "Social Work Search," http://www.socialworksearch.com/, is a directory for social workers that Cyber Patrol placed in its "Adult/Sexually Explicit" category. The "Gay and Lesbian Chamber of Southern Nevada," http://www.lambdalv.com, "a forum for the business community to develop relationships within the Las Vegas lesbian, gay, transsexual, and bisexual community" was blocked by N2H2 as "Adults Only, Pornography." A site for aspiring dentists, http://www.vvm.com/~bond/home.htm, was blocked by Cyber Patrol in its "Adult/Sexually Explicit" category. The filtering programs erroneously blocked many travel Web sites, including: the Web site for the Allen Farmhouse Bed & Breakfast of Alleghany County, North Carolina, http://planet-nc.com/Beth/index.html, which Websense blocked as "Adult Content"; Odysseus Gay Travel, a travel company serving gay men, http://www.odyusa.com, which N2H2 categorized as "Adults Only, Pornography"; Southern Alberta Fly Fishing Outfitters, http://albertaflyfish.com, which N2H2 blocked as "Pornography"; and "Nature and Culture Conscious Travel," a tour operator in Namibia, http://www.trans-namibia-tours.com, which was categorized as "Pornography" by N2H2.

The filtering programs also miscategorized a large number of sports Web sites. These included: a site devoted to Willie O'Ree, the first African-American player in the National Hockey League, http://www.missioncreep.com/mw/oree.html, which Websense blocked under its "Nudity" category; the home page of the Sydney University Australian Football Club, http://www.tek.com.au/suafc, which N2H2 blocked as "Adults Only, Pornography," Smartfilter blocked as "Sex," Cyber Patrol blocked as "Adult/Sexually Explicit" and Websense blocked as "Sex"; and a fan's page devoted to the Toronto Maple Leafs hockey team, http://www.torontomapleleafs.atmypage.com, which N2H2 blocked under the "Pornography" category.

7. Conclusion: The Effectiveness of Filtering Programs

Public libraries have adopted a variety of means of dealing with problems created by the provision of Internet access. The large amount of sexually explicit speech that is freely available on the Internet has, to varying degrees, led to patron complaints about such matters as unsought exposure to offensive material, incidents of staff and patron harassment by individuals viewing sexually explicit content on the Internet, and the use of library computers to access illegal material, such as child pornography. In some libraries, youthful library patrons have persistently attempted to use the Internet to access hardcore pornography.

Those public libraries that have responded to these problems by using software filters have found such filters to provide a relatively effective means of preventing patrons from accessing sexually explicit material on the Internet. Nonetheless, out of the entire universe of speech on the Internet falling within the filtering products' category definitions, the filters will incorrectly fail to block a substantial amount of speech. Thus, software filters have not completely eliminated the problems that public libraries have sought to address by using the filters, as evidenced by frequent instances of underblocking. Nor is there any quantitative evidence of the relative effectiveness of filters and the alternatives to filters that are also intended to prevent patrons from accessing illegal content on the Internet. Even more importantly (for this case), although software filters provide a relatively cheap and effective, albeit imperfect, means for public libraries to prevent patrons from accessing speech that falls within the filters' category definitions, we find that commercially available filtering programs erroneously block a huge amount of speech that is protected by the First Amendment. Any currently available filtering product that is reasonably effective in preventing users from accessing content within the filter's category definitions will necessarily block countless thousands of Web pages, the content of which does not match the filtering company's category definitions, much less the legal definitions of obscenity, child pornography, or harmful to minors. Even Finnell, an expert witness for the defendants, found that between 6% and 15% of the blocked Web sites in the public libraries that he analyzed did not contain content that meets even the filtering products' own definitions of sexually explicit content, let alone CIPA's definitions.

This phenomenon occurs for a number of reasons explicated in the more detailed findings of fact supra. These include limitations on filtering companies' ability to: (1) harvest Web pages for review; (2) review and categorize the Web pages that they have harvested; and (3) engage in regular re-review of the Web pages that they have previously reviewed. The primary limitations on filtering companies' ability to harvest Web pages for review are that a substantial majority of pages on the Web are not indexable using the spidering technology that Web search engines use, and that together, search engines have indexed only around half of the Web pages that are theoretically indexable. The fast rate of growth in the number of Web pages also limits filtering companies' ability to harvest pages for review. These shortcomings necessarily result in significant underblocking. Several limitations on filtering companies' ability to review and categorize the Web pages that they have harvested also contribute to over- and underblocking. First, automated review processes, even those based on "artificial intelligence," are unable with any consistency to distinguish accurately material that falls within a category definition from material that does not. Moreover, human review of URLs is hampered by filtering companies' limited staff sizes, and by human error or misjudgment. In order to deal with the vast size of the Web and its rapid rates of growth and change, filtering companies engage in several practices that are necessary to reduce underblocking, but inevitably result in overblocking.
These include: (1) blocking whole Web sites even when only a small minority of their pages contain material that would fit under one of the filtering company's categories (e.g., blocking the Salon.com site because it contains a sex column); (2) blocking by IP address (because a single IP address may contain many different Web sites and many thousands of pages of heterogeneous content); and (3) blocking loophole sites such as translator sites and cache sites, which archive Web pages that have been removed from the Web by their original publisher.

Finally, filtering companies' failure to engage in regular re-review of Web pages that they have already categorized (or that they have determined do not fall into any category) results in a substantial amount of over- and underblocking. For example, Web publishers change the contents of Web pages frequently. The problem also arises when a Web site goes out of existence and its domain name or IP address is reassigned to a new Web site publisher. In that case, a filtering company's previous categorization of the IP address or domain name would likely be incorrect, potentially resulting in the over- or underblocking of many thousands of pages. The inaccuracies that result from these limitations of filtering technology are quite substantial. At least tens of thousands of pages of the indexable Web are overblocked by each of the filtering programs evaluated by experts in this case, even when considered against the filtering companies' own category definitions. Many erroneously blocked pages contain content that is completely innocuous for both adults and minors, and that no rational person could conclude matches the filtering companies' category definitions, such as "pornography" or "sex."

The number of overblocked sites is of course much higher with respect to the definitions of obscenity and child pornography that CIPA employs for adults, since the filtering products' category definitions, such as "sex" and "nudity," encompass vast amounts of Web pages that are neither child pornography nor obscene. Thus, the number of pages of constitutionally protected speech blocked by filtering products far exceeds the many thousands of pages that are overblocked by reference to the filtering products' category definitions.

No presently conceivable technology can make the judgments necessary to determine whether a visual depiction fits the legal definitions of obscenity, child pornography, or harmful to minors. Given the state of the art in filtering and image recognition technology, and the rapidly changing and expanding nature of the Web, we find that filtering products' shortcomings will not be solved through a technical solution in the foreseeable future. In sum, filtering products are currently unable to block only visual depictions that are obscene, child pornography, or harmful to minors (or, only content matching a filtering product's category definitions) while simultaneously allowing access to all protected speech (or, all content not matching the blocking product's category definitions). Any software filter that is reasonably effective in blocking access to Web pages that fall within its category definitions will necessarily erroneously block a substantial number of Web pages that do not fall within its category definitions.

2. Analytic Framework for the Opinion: The Centrality of Dole and the Role of the Facial Challenge

Both the plaintiffs and the government agree that, because this case involves a challenge to the constitutionality of the conditions that Congress has set on state actors' receipt of federal funds, the Supreme Court's decision in South Dakota v. Dole, 483 U.S. 203 (1987), supplies the proper threshold analytic framework. The constitutional source of Congress's spending power is Article I, Sec. 8, cl. 1, which provides that "Congress shall have Power . . . to pay the Debts and provide for the common Defence and general Welfare of the United States." In Dole, the Court upheld the constitutionality of a federal statute requiring the withholding of federal highway funds from any state with a drinking age below 21. Id. at 211-12. In sustaining the provision's constitutionality, Dole articulated four general constitutional limitations on Congress's exercise of the spending power.

First, "the exercise of the spending power must be in pursuit of 'the general welfare.'" Id. at 207. Second, any conditions that Congress sets on states' receipt of federal funds must be sufficiently clear to enable recipients "to exercise their choice knowingly, cognizant of the consequences of their participation." Id. (internal quotation marks and citation omitted). Third, the conditions on the receipt of federal funds must bear some relation to the purpose of the funding program. Id. And finally, "other constitutional provisions may provide an independent bar to the conditional grant of federal funds." Id. at 208. In particular, the spending power "may not be used to induce the States to engage in activities that would themselves be unconstitutional. Thus, for example, a grant of federal funds conditioned on invidiously discriminatory state action or the infliction of cruel and unusual punishment would be an illegitimate exercise of the Congress' broad spending power." Id. at 210.

Plaintiffs do not contend that CIPA runs afoul of the first three limitations. However, they do allege that CIPA is unconstitutional under the fourth prong of Dole because it will induce public libraries to violate the First Amendment. Plaintiffs therefore submit that the First Amendment "provide[s] an independent bar to the conditional grant of federal funds" created by CIPA. Id. at 208. More specifically, they argue that by conditioning public libraries' receipt of federal funds on the use of software filters, CIPA will induce public libraries to violate the First Amendment rights of Internet content-providers to disseminate constitutionally protected speech to library patrons via the Internet, and the correlative First Amendment rights of public library patrons to receive constitutionally protected speech on the Internet. The government concedes that under the Dole framework, CIPA is facially invalid if its conditions will induce public libraries to violate the First Amendment. The government and the plaintiffs disagree, however, on the meaning of Dole's "inducement" requirement in the context of a First Amendment facial challenge to the conditions that Congress places on state actors' receipt of federal funds. The government contends that because plaintiffs are bringing a facial challenge, they must show that under no circumstances is it possible for a public library to comply with CIPA's conditions without violating the First Amendment. The plaintiffs respond that even if it is possible for some public libraries to comply with CIPA without violating the First Amendment, CIPA is facially invalid if it "will result in the impermissible suppression of a substantial amount of protected speech."

