Electronic publishing is not just a theoretical concept in the Internet context. Already, serious scholarly journals are being offered to Internet readers in electronic form. Some of these e-journals follow a formal editing regimen that includes peer review. Publishers of newspapers and other popular periodicals are also finding ways to publish via the Internet.
Electronic publishing can offer some major advantages:
Despite the relative newness of electronic publishing, and despite these challenges, the Internet has already become an important medium over which electronic publishing is conducted.
As a result, many of the early efforts at deploying electronic journals over the Internet have adopted a "least-common denominator" approach: text is delivered in flat ASCII form, which means that the features that give richness to printed documents are lost. There are no bold headlines, no variations in fonts, and no photographs or diagrams.
The format of the document is not the only issue; means of access is also a concern. Many authors and publishers want to reach the broadest possible audience on the Internet, and that means delivering information to users with e-mail-only access. Consequently, LISTSERV and other mailing list processors have been popular for many pioneering e-journals. But these tools were really designed as discussion mechanisms, not as ways to deliver documents. In particular, a list of back issues with readable titles must be offered as a separately maintained document. A tool like Gopher or World-Wide Web can offer vastly superior presentation of a collection of document titles. Some of the early e-journals are now also available via these methods.
Publishers and vendors of technology are not satisfied with the flat ASCII model of delivery, and are working towards broadly-accepted alternatives. For years now some authors have supplied documents to their readers in PostScript form. Adobe Corporation's PostScript is the language most commonly used to drive laser printers. An author distributing a document in that form assumes that the reader will have ready access to a PostScript printer, or to an online viewing tool that will display the PostScript file on the user's screen.
By sending documents in PostScript form, an author can supply the features missing in flat ASCII files: multiple fonts, mathematical symbols, diagrams, and photographs. Unfortunately, although there is a rigorous standard for each version of PostScript, various dialects have evolved, and these dialects are not portable. The result is that a recipient may send a file to the laser printer, only to find the file is utterly unprintable. The printer usually communicates its dissatisfaction with the useless semaphore of a blinking light. The author is unable to help, as the file prints perfectly well on his or her printer. Even when a document appears to print satisfactorily, the printer often lacks the fonts that the author's word processor expected, and ill-fitting substitutes are used instead.
In 1993, Adobe tried to address these problems by introducing a new Portable Document Format, and an Acrobat family of products capable of creating and displaying these files. Under the Acrobat model, the author of a document uses a special print driver to create a PDF file, or after the fact an author or publisher uses a tool called the Distiller to convert a PostScript file to PDF form. Acrobat has features designed to enhance portability; for instance it attempts to perform "intelligent font matching" so that a reader can view a document appropriately even if he or she lacks the particular software fonts used by the author. PostScript files can be quite large compared to their flat ASCII equivalents; PDF addresses this by using a more compact language and compression of embedded images. The following diagram summarizes the Acrobat family of tools:
Here is an example of what a file looks like as rendered by Acrobat. This is a sample page from a product of The New York Times Company called TimesFax. TimesFax was originally developed by the Times as a brief (approximately eight page) summary of the major articles in a given day's issue of the complete paper. The original TimesFax market was envisioned to be those aboard cruise ships and businesspeople overseas; the delivery medium is traditional facsimile. As of this writing, the Times is exploring the possibility of delivering TimesFax over the Internet using the Portable Document Format.
A tool like Acrobat offers the promise of delivering documents in their richest form, greatly advancing the possibilities for electronic publishing. But there are concerns: it takes a rather fast processor to scroll through such graphically-rich documents. Unless one is using a state-of-the-art computer, paging through an electronic file with Acrobat is nowhere near as brisk as flipping pages of a printed document. Furthermore, the user of Acrobat must obtain the Acrobat Reader, which is a commercial product, and install the Reader, prior to viewing any PDF files. Although Adobe markets the Reader with volume pricing discounts, some organizations resist the idea of buying a proprietary tool and installing it on hundreds or thousands of computers. [1]
Some advocates of Internet publishing see the Hypertext Markup Language used in the World-Wide Web as a viable, and preferable, alternative to a proprietary solution like Acrobat. They argue that HTML can produce eminently readable documents, and that the networked hypertext orientation of HTML is better suited to Internet document delivery. Indeed, we have seen examples of the rich quality of display a browser like Mosaic can offer. Although HTML is generally used to deliver documents over a live network connection, it also could be used to offer a large set of documents to be viewed locally via Mosaic. An author might offer a large document as a collection of interwoven HTML files, to be fetched en masse for viewing offline; such a collection might be delivered via anonymous FTP or even via floppy. Thus HTML competes with portable document solutions both in the context of Internet hypertext and in delivery of large documents to be viewed outside the Internet context.
There are key differences between an approach like Acrobat and the HTML model. The main difference is one of philosophy: PDF allows an author to prepare a document that will be rendered on screen, or printed on paper, in a close approximation to the layout and appearance chosen by the author. In contrast, as we have seen, the Web follows a philosophy of allowing the reader to configure a browser such as Mosaic to present information according to personal preference; the raison d'etre of HTML is to allow authors to mark the semantic elements of a document, not to control the precise presentation on screen or on paper. The following chart summarizes some of the differences between Acrobat and HTML.
Feature             Portable Document Format      HyperText Markup Language
--------------------------------------------------------------------------------
Origin              Adobe Corporation             World-Wide Web community
                                                  (CERN, other institutions
                                                  worldwide)

Base technology     PostScript printer language;  Standard Generalized Markup
                    JPEG still image format       Language

Availability        Commercial                    Public Domain

Hypertext-capable?  Yes, supports internal        Yes, inherently hypertext-
                    hypertext links within        oriented. Links in the form
                    documents; for instance, a    of Uniform Resource Locators
                    table of contents could have  point to other documents to
                    links to the relevant pages.  retrieve

Authoring tools     Acrobat Distiller (given a    Text editors, with markup
                    PostScript file) or Exchange  manually inserted, plus
                    (used as a print driver in    browser such as Mosaic for
                    application such as MS-Word)  verification. [Various
                                                  attempts at WYSIWYG editors
                                                  underway]

Viewing software    Acrobat Reader (or Exchange)  Mosaic (or other browser)
used by end user

Who controls        Generally, the author:        Generally, the user: The
layout/presentation chooses layout, specific      author identifies the key
                    fonts, etc. Acrobat viewers   elements of the document
                    can use "intelligent font     (headers, paragraphs, list
                    matching" to simulate fonts   entries) & the client
                    that do not exist on a        program (browser) displays
                    user's workstation            text according to various
                                                  settings under user control.

Basic unit          An entire PDF document        A "page" including hypertext
transmitted over    (could be a page, a chapter,  links; conceivably arbitrarily
a network?          or a book)                    long, in practice usually no
                                                  longer than a few screens of
                                                  information

Acrobat is not the only commercial effort in the area of portable document technology. No Hands Software offers a tool known as Common Ground. Farallon Inc. also offers a similar tool, known as Replica; the Replica Viewer is offered for Macintosh and Windows platforms, and is freely distributable.
These products are marketed in a different fashion; the tools that allow users to read documents are given away at no charge, and the tools used to create the portable documents are sold to authors and publishers. As of this writing it is too early to tell whether one of these tools will achieve market dominance, or whether HTML or some other public domain alternative will become the Internet standard for electronic document delivery.
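The difference in philosophy between the two models can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the element names, the preference dictionaries, and the render function are all hypothetical, and none of them corresponds to how Mosaic or any other real browser is configured. It shows a document whose author has marked only the semantic elements, displayed two different ways according to two readers' settings.

```python
# Hypothetical sketch: these names are invented for illustration and do not
# come from Mosaic or any real browser.

# The author marks semantic elements only: (element, text) pairs.
DOCUMENT = [
    ("h1", "Electronic Publishing"),
    ("p", "Scholarly journals are being offered in electronic form."),
    ("li", "Flat ASCII"),
    ("li", "PostScript"),
]


def render(document, prefs):
    """Display each element according to the reader's preferences."""
    lines = []
    for element, text in document:
        style = prefs.get(element, {})  # no preference means plain text
        if style.get("upper"):
            text = text.upper()
        lines.append(style.get("prefix", "") + text)
    return "\n".join(lines)


# Two readers configure their (imaginary) browsers differently.
READER_A = {"h1": {"upper": True}, "li": {"prefix": "* "}}
READER_B = {"h1": {"prefix": "== "}, "li": {"prefix": "  - "}}

print(render(DOCUMENT, READER_A))
print()
print(render(DOCUMENT, READER_B))
```

The same marked-up document yields a different display for each reader, and the author has no say in the result. A portable document format takes the opposite position: layout decisions travel with the file.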
Nor are Acrobat and its recent competitors the only tools that aspire to offer document portability. IBM defined a Document Content Architecture (DCA) during the heyday of dedicated word processors in the early 1980s. Although a corresponding MIME type has been defined, in practice one does not encounter many files or applications that tout DCA compliance on the Internet today.
Note that a tool called Ghostscript is used by many installations as a way to view PostScript files online. This tool, freely available under the GNU license, generally is able to display PostScript files reasonably well despite the problem of dialects. A commercial product called Freedom of the Press provides the same function.
Besides PostScript and various portable document forms, there are some disciplines in which specialized formats are normally used for exchanging draft papers, and in some cases even e-journals. For instance, many mathematicians and scholars who use math in their work use a language called TeX, the brainchild of Stanford computer scientist Dr. Donald Knuth. TeX is a powerful language all its own, extremely adept at complex publishing tasks such as dealing with mathematical formulae. Some users of TeX become so fluent that they can read an article in its raw TeX form and understand it. The more normal case is to convert the TeX "source code" into a language such as PostScript for local printing.
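For readers who have not seen TeX source, the short fragment below gives the flavor. It is an illustrative example only, written in plain TeX and not drawn from any particular paper or e-journal; the quadratic formula was chosen simply because it is familiar.

```tex
% Illustrative plain TeX source; the formula is an arbitrary example.
The roots of $ax^2 + bx + c = 0$ are
$$ x = {-b \pm \sqrt{b^2 - 4ac} \over 2a}. $$
\bye
```

A practiced reader can pick out the fraction, the square root, and the plus-or-minus sign directly from the source, without first converting the file for printing.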