Electronic Publishing, Virtual Libraries, and the Internet

Benjamin Franklin would have loved the Internet. Consider his passion for communications technologies: he was an author and publisher of journals and almanacs; he was a founder and user of public lending libraries; he saw the mail as an important mass communications medium -- not just a way for individuals to talk to one another. Transported to today's time, Franklin would no doubt enthusiastically embrace the Internet -- as a purveyor of his own writings, as a publisher of others' texts, and probably as an Internet service provider. Poor Richard's Almanac and the Pennsylvania Gazette would be among the first electronic publications. We do not have Franklin as an advocate and creator of Internet publishing mechanisms, but we do have a number of talented individuals and visionary organizations leading the way towards electronic publishing. These groups include libraries, university presses, commercial publishers, and scholarly organizations. This chapter will explore the efforts of some of these electronic publishing pioneers, and the role the Internet will play in this exciting arena.

What is Electronic Publishing?

For years, computers have been used to assist in the preparation of text for print publication. Whether a document is printed on a dot matrix printer, a laser printer, or a 1200 dot per inch typesetting machine, computer hardware and software are likely to play a major role in the preparation of printed matter. How does this relate to electronic publishing? In a nutshell, the idea of electronic publishing is to eliminate the last step of the process, physical printing. Instead, the document is captured in some sort of electronic form, and distributed to remote recipients for viewing online or for printing at the user's site.

Electronic publishing is not just a theoretical concept in the Internet context. Already, serious scholarly journals are being offered to Internet readers in electronic form. Some of these e-journals follow a formal editing regimen that includes peer review. Publishers of newspapers and other popular periodicals are also finding ways to publish via the Internet.

Electronic publishing can offer some major advantages:

Electronic publishing also faces challenges:

There is a distinction between electronic publishing and more conventional online databases. With online databases, the emphasis is upon searching and random access. The user is hunting for a particular piece of information across a vast database or set of databases. Electronic publishing seeks to deliver actual documents intended for more-or-less sequential reading. (The use of hypertext confounds the concept of sequential reading somewhat.) Documents to be read electronically have authors, editors, and definite titles, and often are delivered as parts of online magazines or electronic journals. Of course, there is some overlap between the area of electronic publishing and online databases; an example is the full-text database of The New York Times offered on the commercial Nexis database.

Despite the relative newness of electronic publishing, and despite these challenges, the Internet has already become an important medium over which electronic publishing is conducted.

Electronic Publishing Technologies

Let's say you want to send a monograph you have written to a friend via the Internet. Assume your favorite word processor is Microsoft Word, and that your friend uses PFS Write. If you send your document in Word's .doc format, your friend will have a bit of a problem viewing what you wrote online, or printing it locally. The problem of formats extends to all examples of electronic publishing. Despite numerous attempts to devise "the" standard for electronic document interchange, the practical reality is that none of these standards is universally accepted.

As a result, many of the early efforts at deploying electronic journals over the Internet have adopted a "least-common-denominator" approach: text is delivered in flat ASCII form. This means that the features that give richness to printed documents are lost; there are no bold headlines, no variations in fonts, and no photographs or diagrams.
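To make the cost of the flat ASCII approach concrete, here is a minimal sketch in Python of what reducing a document to that form involves. The function name, the table of replacements, and the 72-column width are our own assumptions for illustration, not part of any e-journal's actual production process.

```python
import textwrap
import unicodedata

# Hypothetical replacements for common typographic characters.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en and em dashes
}

def to_flat_ascii(text, width=72):
    """Reduce text to 7-bit ASCII and wrap it to a fixed line width."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # Decompose accented letters, then drop anything outside ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Wrap each paragraph separately so blank lines survive.
    paragraphs = [textwrap.fill(p, width) for p in text.split("\n\n")]
    return "\n\n".join(paragraphs)

sample = "The raison d\u2019\u00eatre of \u201cflat\u201d text \u2014 portability."
flat = to_flat_ascii(sample)
```

Everything that cannot be expressed in plain characters (accents, typographic quotes, and of course fonts and images) is either approximated or silently dropped, which is exactly the loss described above.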

The format of the document is not the only issue; means of access is also a concern. Many authors and publishers want to reach the broadest possible audience on the Internet, and that means delivering information to users with e-mail-only access. Consequently, LISTSERV and other mailing list processors have been popular for many pioneering e-journals. But these tools were really designed as discussion mechanisms, not ways to deliver documents. In particular, a list of back issues with readable titles must be offered as a separately maintained document. A tool like Gopher or the World-Wide Web can offer vastly superior presentation of a collection of document titles. Some of the early e-journals are now also available via these methods.
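As a rough illustration of why Gopher suits a list of back issues, the sketch below builds a Gopher menu in which each issue appears under a readable title. It follows the menu line layout of the Gopher protocol as we understand it (item type, display string, selector, host, and port, separated by tabs). The journal titles, selectors, and host name are hypothetical.

```python
def gopher_menu(issues, host, port=70):
    """Build a Gopher menu listing one text document per back issue."""
    lines = []
    for title, selector in issues:
        # Item type "0" marks a plain text file; fields are tab-separated.
        lines.append(f"0{title}\t{selector}\t{host}\t{port}\r\n")
    lines.append(".\r\n")  # a lone period ends the menu
    return "".join(lines)

# Hypothetical journal, selectors, and host name.
back_issues = [
    ("Vol. 1, No. 1: Franklin as Publisher", "/ejournal/v1n1.txt"),
    ("Vol. 1, No. 2: Libraries on the Net", "/ejournal/v1n2.txt"),
]
menu = gopher_menu(back_issues, "gopher.example.org")
```

The reader sees only the titles; the selector strings that locate each file stay hidden behind the menu, which is the kind of index a mailing list processor leaves the editor to maintain by hand.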

Publishers and vendors of technology are not satisfied with the flat ASCII model of delivery, and are working towards broadly accepted alternatives. For years now some authors have supplied documents to their readers in PostScript form. Adobe Corporation's PostScript is the language most commonly used to drive laser printers. An author distributing a document in that form assumes that the reader will have ready access to a PostScript printer, or to an online viewing tool that will display the PostScript file on the user's screen.
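For readers who have never looked inside a PostScript file, the following sketch generates a very small one. It is only an illustration under our own assumptions: the function names, the choice of Times-Roman, and the page coordinates are hypothetical, and real word processors emit far more elaborate PostScript.

```python
def escape_ps(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def make_postscript(lines, font="Times-Roman", size=12):
    """Return a one-page PostScript program that prints the given lines."""
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = 720  # start one inch below the top of a US Letter page (792 points)
    for line in lines:
        out.append(f"72 {y} moveto ({escape_ps(line)}) show")
        y -= size + 2  # move down one line
    out.append("showpage")
    return "\n".join(out) + "\n"

page = make_postscript(["Poor Richard's Almanac", "(a sample page)"])
```

Note that the program refers to its font by name rather than carrying the font along, so the result depends on what the receiving printer or viewer has installed.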

By sending documents in PostScript form, an author can supply the features missing in flat ASCII files, including multiple fonts, mathematical symbols, diagrams, and photographs. Unfortunately, although there is a rigorous standard for each version of PostScript, various dialects have evolved, and these dialects are not portable. The result is that a recipient may send a file to the laser printer, only to find the file is utterly unprintable. The printer usually communicates its dissatisfaction with the useless semaphore of a blinking light. The author is unable to help, as the file prints perfectly well on his or her printer. Even in cases where a document appears to print satisfactorily, often the printer does not have the same fonts installed as the author's word processor expected, and ill-fitting substitutes may be used.

In 1993, Adobe tried to address these problems by introducing a new Portable Document Format, and an Acrobat family of products capable of creating and displaying these files. Under the Acrobat model, the author of a document uses a special print driver to create a PDF file, or after the fact an author or publisher uses a tool called the Distiller to convert a PostScript file to PDF form. Acrobat has features designed to enhance portability; for instance it attempts to perform "intelligent font matching" so that a reader can view a document appropriately even if he or she lacks the particular software fonts used by the author. PostScript files can be quite large compared to their flat ASCII equivalents; PDF addresses this by using a more compact language and compression of embedded images. The following diagram summarizes the Acrobat family of tools:

Here is an example of what a file looks like as rendered by Acrobat. This is a sample page from a product of The New York Times Company called TimesFax. TimesFax was originally developed by the Times as a brief (approximately eight page) summary of the major articles in a given day's issue of the complete paper. The original TimesFax market was envisioned to be those aboard cruise ships and businesspeople overseas; the delivery medium is traditional facsimile. As of this writing, the Times is exploring the possibility of delivering TimesFax over the Internet using the Portable Document Format.

A tool like Acrobat offers the promise of delivering documents in their richest form, greatly advancing the possibilities for electronic publishing. But there are concerns: it takes a rather fast processor to scroll through such graphically rich documents. Unless one is using a state-of-the-art computer, paging through an electronic file with Acrobat is nowhere near as brisk as flipping pages of a printed document. Furthermore, the user of Acrobat must obtain and install the Acrobat Reader, which is a commercial product, before viewing any PDF files. Although Adobe markets the Reader with volume pricing discounts, some organizations resist the idea of buying a proprietary tool and installing it on hundreds or thousands of computers. [1]

Some advocates of Internet publishing see the Hypertext Markup Language used in the World-Wide Web as a viable, and preferable, alternative to a proprietary solution like Acrobat. They argue that HTML can produce eminently readable documents, and that the networked hypertext orientation of HTML is better suited to Internet document delivery. Indeed, we have seen examples of the rich quality of display a browser like Mosaic can offer. Although HTML is generally used to deliver documents over a live network connection, it also could be used to offer a large set of documents to be viewed locally via Mosaic. An author might offer a large document as a collection of interwoven HTML files, to be fetched en masse for viewing offline; such a collection might be delivered via anonymous FTP or even via floppy. Thus HTML competes with portable document solutions both in the context of Internet hypertext and in delivery of large documents to be viewed outside the Internet context.
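An HTML file names the parts of a document (a heading, a paragraph, a list entry) and leaves their appearance to the browser. The sketch below is a toy stand-in for a browser, not a description of how Mosaic works: it renders the same HTML as plain text under two different sets of reader preferences. The class name, the preference keys, and the sample document are all hypothetical.

```python
from html.parser import HTMLParser

class PlainRenderer(HTMLParser):
    """Render a few semantic HTML elements as text, styled by reader preferences."""

    def __init__(self, prefs):
        super().__init__()
        self.prefs = prefs   # reader-chosen presentation settings (hypothetical)
        self.lines = []      # finished output lines
        self.tag = None      # element currently being read
        self.buf = []        # text collected for that element

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p", "li"):
            self.tag, self.buf = tag, []

    def handle_data(self, data):
        if self.tag:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag != self.tag:
            return
        text = " ".join("".join(self.buf).split())  # collapse whitespace
        if tag == "h1":
            if self.prefs["heading"] == "upper":
                text = text.upper()
            else:
                text = text + "\n" + "=" * len(text)
        elif tag == "li":
            text = self.prefs["bullet"] + " " + text
        self.lines.append(text)
        self.tag = None

def render(html, prefs):
    """Render an HTML string as text according to the reader's preferences."""
    renderer = PlainRenderer(prefs)
    renderer.feed(html)
    renderer.close()
    return "\n".join(renderer.lines)

doc = "<h1>Poor Richard</h1><p>Early to bed.</p><ul><li>Almanac</li><li>Gazette</li></ul>"
shouting = render(doc, {"heading": "upper", "bullet": "*"})
underlined = render(doc, {"heading": "underline", "bullet": "-"})
```

The author's file is identical in both cases; only the reader's settings differ. One reader sees the heading in capitals with starred list entries, the other sees it underlined with dashes.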

There are key differences between an approach like Acrobat and the HTML model. The main difference is one of philosophy: PDF allows an author to prepare a document that will be rendered on screen, or printed on paper, in a close approximation to the layout and appearance chosen by the author. In contrast, as we have seen, the Web follows a philosophy of allowing the reader to configure a browser such as Mosaic to present information according to personal preference; the raison d'etre of HTML is to allow authors to mark the semantic elements of a document, not to control the precise presentation on screen or on paper. The following chart summarizes some of the differences between Acrobat and HTML.

Comparison of Online Publishing Technologies: PDF versus HTML

Feature           Portable Document Format      HyperText Markup Language

Origin            Adobe Corporation             World-Wide Web community (CERN,
                                                other institutions worldwide)

Base technology   PostScript printer language;  Standard Generalized Markup
                  JPEG still image format       Language

Availability      Commercial                    Public Domain

Hypertext-        Yes, supports internal        Yes, inherently hypertext-
capable?          hypertext links within        oriented.  Links in the
                  documents; for instance, a    form of Uniform Resource
                  table of contents could have  Locators point to other
                  links to the relevant pages.  documents to retrieve

Authoring tools   Acrobat Distiller (given a    Text editors, with markup
                  PostScript file) or Exchange  manually inserted, plus browser
                  (used as a print driver in    such as Mosaic for verification.
                  application such as MS-Word)  [Various attempts at WYSIWYG
                                                editors underway]

Viewing software  Acrobat Reader (or Exchange)  Mosaic (or other browser)
used by end user

Who controls      Generally, the author:        Generally, the user: The author
layout/           chooses layout, specific      identifies the key elements of
presentation      fonts, etc.  Acrobat viewers  the document (headers,
                  can use "intelligent font     paragraphs, list entries) & the
                  matching" to simulate fonts   client program (browser)
                  that do not exist on a        displays text according to
                  user's workstation            various settings under user
                                                control.

Basic unit        An entire PDF document        A "page" including hypertext
transmitted over  (could be a page, a chapter,  links; conceivably arbitrarily
a network?        or a book)                    long, in practice usually no
                                                longer than a few screens of

Acrobat is not the only commercial effort in the area of portable document technology. No Hands Software offers a tool known as Common Ground. Farallon Inc. also offers a similar tool, known as Replica; the Replica Viewer is offered for Macintosh and Windows platforms, and is freely distributable. These products are marketed in a different fashion; the tools that allow users to read documents are given away at no charge, and the tools used to create the portable documents are sold to authors and publishers. As of this writing it is too early to tell whether one of these tools will achieve market dominance, or whether HTML or some other public domain alternative will become the Internet standard for electronic document delivery.

Nor are Acrobat and its recent competitors the only tools that aspire to offer document portability. IBM defined a Document Content Architecture (DCA) during the heyday of dedicated word processors in the early 1980s. Although a corresponding MIME type has been defined, in practice one does not encounter many files or applications that tout DCA compliance on the Internet today.

Note that a tool called Ghostscript is used by many installations as a way to view PostScript files online. This tool, freely available under the GNU license, generally is able to display PostScript files reasonably well despite the problem of dialects. A commercial product called Freedom of the Press provides the same function.
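With flat ASCII, PostScript, and PDF files all in circulation, a recipient's first task is often to work out which kind of file has arrived. The sketch below guesses from the opening bytes: PDF files begin with "%PDF-", and PostScript files conventionally begin with "%!". The function name and the category labels are our own, and a real tool would need to recognize many more formats.

```python
def sniff_format(data):
    """Guess how a received document should be viewed from its first bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"          # e.g. open with Acrobat Reader
    if data.startswith(b"%!"):
        return "postscript"   # e.g. print it, or view it with Ghostscript
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "unknown"      # some binary or word-processor format
    return "ascii"            # flat text, readable anywhere

kind = sniff_format(b"%!PS-Adobe-2.0\n72 720 moveto\n")
```

A file that falls into the "unknown" category is the situation described at the start of this section: the recipient needs the author's own software, or a converter, to read it.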

Besides PostScript and various portable document forms, there are some disciplines in which specialized formats are normally used for exchanging draft papers, and in some cases even e-journals. For instance, many mathematicians and scholars who use math in their work use a language called TeX, the brainchild of Stanford computer scientist Dr. Donald Knuth. TeX is a powerful language all its own, extremely adept at complex publishing tasks such as dealing with mathematical formulae. Some users of TeX become so fluent that they can read an article in its raw TeX form and understand it. The more normal case is to convert the TeX "source code" into a language such as PostScript for local printing.
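The conversion from TeX source to PostScript is commonly done in two steps: the tex program produces a device-independent (.dvi) file, and a driver such as dvips turns that file into PostScript. The sketch below assembles those two commands and runs them only if both programs are installed. The function names and the file name paper.tex are hypothetical, and we assume the source file sits in the current directory.

```python
import shutil
import subprocess
from pathlib import Path

def tex_to_ps_commands(source):
    """Return the commands that turn a TeX source file into PostScript."""
    stem = Path(source).stem
    dvi = stem + ".dvi"   # device-independent output written by tex
    ps = stem + ".ps"     # PostScript output written by dvips
    return [["tex", str(source)], ["dvips", dvi, "-o", ps]]

def run_pipeline(source):
    """Run the commands, but only if both programs are installed."""
    if not (shutil.which("tex") and shutil.which("dvips")):
        return False
    for cmd in tex_to_ps_commands(source):
        subprocess.run(cmd, check=True)
    return True

commands = tex_to_ps_commands("paper.tex")
```

The resulting .ps file can then be sent to a PostScript printer or viewed on screen like any other PostScript document.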

Chapter 18, Section 1. Copyright (c) 1994, Richard W. Wiggins. All rights reserved.