"Sunny's Random hAIku Generator and Why It Matters"

by Laura Fillmore
President, Open Book Systems (OBS)

Presented to Text21 Conference
at Alfred University
March 16, 1996

Copyright © 1996–2024 by Laura Fillmore; written permission required to reprint.
laura@obs.com

Sometimes it seems as if the bit business has invaded even the most everyday fields of perception. Everything is broken down into units these days: chunks of text, units of thought, blinks in a clickstream. It's hardly a still picture! But maybe it's not a new picture; maybe the phone wires and computer networks are only realizing what Lucretius imagined some two thousand years ago. He conceived of sensory and mental experience as an interactive particle stream flowing between the perceiver and the perceived.

We all accept the fact that sound waves enable you to hear what I am saying, and that light reflected off me enables you to see my hands waving, but units of thought, idea packets, or memes? Only by using the global and recorded medium of the Internet are we beginning to apprehend the possibilities of marketing our memes to one another. By packaging and marketing our ideas---made tangible through technology---and by monitoring their movement as they travel from person to person or group to group, we make a business out of thinking. This fresh sense of publishing assumes that through their use, ideas will evolve. Through the medium of electronic machines we publish our ideas, incorporate the ideas of others, carry on recorded conversations, and invigorate the collective thought process---all with a degree of intimacy and at a speed unknown in traditional publishing.

Publishing Transformed

Because we are recorded when we communicate online, and because we own the rights to our recorded thoughts and words, we can publish as confidently as, and more frequently than, ever before. When we submit a poem to the "Il Postino" Web site at OBS or post a bed-and-breakfast critique to an online forum, we are publishing. But this way of publishing bears only a faint resemblance to the concrete and familiar discipline of paper publishing, where ownership of ideas is anchored in ink on a page and where author, reader, and publisher each have a distinct role and financial responsibility in the business of information transfer.

Awakening to the Internet, many print publishers use the new electronic medium purely as a superior distribution mechanism, a means to market and distribute their existing wares worldwide. Retrofitting a product- or copy-based business into a kinetic electronic environment has led to a sudden blossoming of Web sites that offer bait in the form of cover art, text samples, and tables of contents, all aimed at luring readers to buy or download a print product for pay. But downloading is dull, and printing out and squirreling away paper copies of favorite Web sites makes scant use of a medium that today boasts lights, music, video, and instantaneous global transmission.

Fearful of the Internet's openness and immediacy, and of irreverent cookie-cracking Java high jinks from high school kids impatient with textbooks and established procedures, many publishers respond protectively, using encryption and other security measures to guard the copy-based products that form the foundation of their businesses. Their online strategies are shadowed by the dangers of illicitly downloaded files and unpaid-for photocopies of books bound for the beach or the bedside. Yet the life of the medium finds expression in ways more creative than the simple proliferation of printed copies.

The effective extension of publishing online involves creating opportunities for real-time applications of the ideas and information in a book rather than just transmitting the information contained in the book. The one-way transmission of files---even the static advertising lure of cover art and tables of contents---fails to harness the essence of the "meme machine" that is today's Internet.

The current trend toward encouraging readers to share in authorship and publishing is irresistible. Yet the original work in its encapsulated forms---printed pages, audiotapes, CD-ROMs---can exist simultaneously with complementary versions of it that allow readers to interact with the product and its original author in real time. Authors and publishers who venture onto the Internet should be prepared to engage in cooperative interactions with readers---to the creative advantage of all the players. And the achievement of economic goals must follow as well.

To illustrate a kinetic, application-oriented approach to publishing, let's think for a minute about the Java and cookies mentioned above. When a publisher uses the Internet simply as a means to distribute books by selling files for download, the reader might go to the online bookstore, purchase the official Acrobat files of a book such as Stephen King's "Insomnia," print out the pages, and sit, book in lap, before a fire, cookies and Java at hand. That's paper-based publishing supplemented by the added twist of instantaneous distribution of files through the Net.

Now put the cookies and the Java---as well as the King files---into the machine. When your reader, Joe, logs onto his online bookstore, the server will recognize the Cookie stored by his browser, welcome him by name---Hi, Joe!---and proceed to light up Joe's screen with the page of the King story where he left off last time. Cookies, a new feature of the Netscape software (and probably of other browser/server software as well), offer a truly customized reading experience. They enable client and server software to conduct a "handshake" exchange of information every time Joe visits a site that uses Cookies. It is like the book talking back to the reader, creating a software-mediated environment of understanding.
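The Cookie "handshake" described above can be sketched with Python's standard `http.cookies` module. This is a minimal modern illustration, not the software OBS or Netscape ran in 1996: the cookie names `reader` and `last_page` and both helper functions are hypothetical, and a real site would also set expiry and path attributes and would check any page number the browser sends back rather than trusting it.

```python
from http.cookies import SimpleCookie
from typing import Optional


def remember_reader(reader: str, last_page: int) -> list[str]:
    """Build Set-Cookie values recording who the reader is and where he stopped.

    The cookie names "reader" and "last_page" are hypothetical.
    """
    cookie = SimpleCookie()
    cookie["reader"] = reader
    cookie["last_page"] = str(last_page)
    # OutputString() renders "name=value" without the "Set-Cookie:" prefix.
    return [morsel.OutputString() for morsel in cookie.values()]


def greet(cookie_header: Optional[str]) -> str:
    """Read the Cookie header the browser sends back and personalize the page."""
    cookie = SimpleCookie(cookie_header or "")
    if "reader" not in cookie:
        return "Welcome! Start at page 1."
    name = cookie["reader"].value
    page = cookie["last_page"].value if "last_page" in cookie else "1"
    return f"Hi, {name}! Resuming at page {page}."


# First visit: the server sends these values in Set-Cookie response headers.
headers = remember_reader("Joe", 112)
# Next visit: the browser returns them together in one Cookie request header.
print(greet("; ".join(headers)))  # Hi, Joe! Resuming at page 112.
print(greet(None))                # Welcome! Start at page 1.
```

The state lives in the browser and travels with each request, which is what lets the server "remember" Joe without keeping a session of its own.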

Were we to look at OBS's recent bed-and-breakfast title, Bernice Chesler's "Bed & Breakfast in New England," through the lens of Cookies and Java, we might see a whole new realm of interactive and custom publishing, built with today's tools rather than tomorrow's dreams. Instead of offering pages of type occasionally peppered with black-and-white state maps, a B&B publishing site customized by Cookies might aid a repeat visitor by reminding her what she looked at last time---B&Bs in Vermont, perhaps---and pointing her to new enhancements or features in that subject area. It might allow her to post questions or suggestions to the author. A Java applet might let her see words or pictures in motion, watching the leaves on the trees turn with the seasons. The point, of course, is that the more customized the experience, and the more closely each person's version of a publishing server is tailored to that individual, the less significant the whole issue of copies becomes.

The Internet, relentlessly expanding, waits for no one to catch up and learn about such innovations as Java and Cookies. The Net's distributed architecture, not to mention its users, encourages and thrives on Cookie- and Java-type innovations and abhors protective barriers and walls around product-centered ideas. Its incessant growth decentralizes business structures by enabling people to participate in the protean publishing economy all the time and from anywhere. The very verbs "to publish" and "to read" lose their familiar meanings when translated into bits. Anyone who advances ideas and information on the Net, who lives a recorded life online, is a publisher; every mouse-clicking icon pusher among us is a reader, made hungrier by the publishing servers he visits that feed his browser Cookies.

From Product to Process

In Internet publishing, process will take its place as an equal of product. That means process not only from the publisher's perspective, speedily preparing and posting publications that may change daily, but also in the way readers interact with the ideas in those publications. We have gone beyond posting files on an Internet server, beautifully formatted in PostScript, and waiting for customers to download and read them. Tools such as Java and Cookies---and, still the most powerful of all, plain old email---enable editors and readers to breathe life into those static files. Readers are anonymous and separate no more. The publications they experience are kinetic, and the resulting interactive infoswap has less to do with disseminating (or protecting) copies of things than with enabling readers to access and participate in the evolution of the information and ideas those publications contain.

Key to this process are the readers: the recorded readers in the guise of site visitors, and the recorded session paths they leave behind. OBS has been involved with many of the prevalent business models for publishing online since 1992, and our experience as online publishers has led us toward increasing recorded interactivity between an author's text and its readers. So to discover what workable commercial publishing model might emerge from today's roiling online publishing scene, it will be productive to look both at the existing publishing models and at the increasing involvement of readers with texts online.

Emerging Business Models

Since 1993 the predominant publishing model has involved marketing information products. Hundreds of thousands of hits on a server should yield increased print sales, increased box office traffic, increased hits on a sponsor's server. Developing this business model involves adapting traditional publishing roles. New functions emerge that blur the boundaries between editorial and sales, marketing and production. When link editors solicit and encourage links and online contributions to a site, they increase site traffic and thus its marketing appeal. To gain traffic and satisfy readers, the link editor licenses or simply initiates external links, building context around her content by linking to existing resources elsewhere on the Net. Generally, the success of this marketing model is measured in hits, or files downloaded.

Gross hit counts, however, do not reflect the number of people visiting a site but rather the number of times people or machines such as Web crawlers have downloaded individual files. A single reader who opens one page containing five inline images, for example, registers six hits. So the total hit count on a site may be a misleading economic indicator of the site's success or popularity, as hits can say more about the site's file structure than about the number of readers visiting it.
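The gap between hits and readers is easy to see in a toy server log. The sketch below assumes a hypothetical log record holding a client host, the requested path, and a user-agent string, and it uses a crude keyword test to set crawler traffic aside; both the record layout and the crawler keywords are illustrative assumptions. Counting distinct hosts is itself only an approximation of counting people, since several readers can share one address and one reader can appear under several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    host: str   # client address as logged by the server
    path: str   # file that was downloaded
    agent: str  # user-agent string (a hypothetical field for this sketch)


# Crude, assumed heuristic for spotting Web crawlers by user-agent.
CRAWLER_MARKERS = ("crawler", "spider", "robot")


def is_crawler(req: Request) -> bool:
    agent = req.agent.lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def summarize(log: list[Request]) -> dict[str, int]:
    """Contrast gross hits with crawler-free hits and distinct visiting hosts."""
    reader_hits = [r for r in log if not is_crawler(r)]
    return {
        "gross_hits": len(log),
        "reader_hits": len(reader_hits),
        "visitors": len({r.host for r in reader_hits}),
    }


# One reader opens a page with five inline images: six hits.
page = ["/dusk/index.html"] + [f"/dusk/still{i}.gif" for i in range(1, 6)]
log = [Request("joe.example.net", p, "Mozilla/2.0 (Macintosh)") for p in page]
# A crawler indexes three of the same files: three more hits, no readers.
log += [Request("crawler.example.com", p, "WebCrawler/3.0") for p in page[:3]]

print(summarize(log))  # {'gross_hits': 9, 'reader_hits': 6, 'visitors': 1}
```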

The hit-based model has its antecedent in applause---the noise of popularity. Despite the commercial value of popularity as measured in gross hits---and this value today stands at tens of thousands of dollars per month for a sponsor's hot icon on a heavily trafficked site---such blind applause alone does not reflect the lasting value of an idea or a work of art presented online. What is a hit anyway? It is the mere downloading of a file, and in the case of automatic crawler-type hits, it is less than that: a mere acknowledgment of a file's existence. The more important question is, What becomes of a file or an idea once it is hit?

Rapid communication and the capability to break publications up into memes or idea pieces that can be named or encoded, incorporated into one document, extracted, reorganized, revised, reincorporated into a new document, and assigned a newly determined value as a result of new content and context---this is collective thinking in a climate of constant change. This is readers interacting with and changing the work they are reading. The extent to which we can both encourage and monitor this process of change will determine our effectiveness as online publishers. Tracking what becomes of a meme or idea unit once it is downloaded by its readers---how it is used---should yield a more productive publishing model than simply counting the number of hits on a file. The human editor, the judge of quality and value, plays an essential role in this tracking and evaluation process, a role that is beginning to be seen in the commercial publishing models of today.

The Building Blocks of the Meme Machine: Subscription, For-Pay Download, Site Licensing, and Microbilling Models

Using the subscription model, a publisher such as Harvard Medical School sells subscriptions to a secure site hosting its medical newsletters and admits, for example, only visitors whose Internet Protocol (IP) address matches the subscription list. Readers get their information immediately and can respond electronically in this copy-based online publishing model. Bridging paper and online, the Medical School can market both bits and what Negroponte calls atoms. So, too, the per-copy download model preserves the sanctity of copies. Practiced by many publishers who view the Net as a superior means of global product distribution, this method generally involves posting the front matter and a chapter or two for free in the hope of encouraging complete for-pay downloads of entire texts, and prohibiting redistribution of copies once they have been unzipped or decrypted.
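Address-based access control of the kind described for the subscription model can be sketched with Python's standard `ipaddress` module. The subscriber list below is hypothetical and uses address blocks reserved for documentation; it does not describe Harvard Medical School's actual setup. Matching by network block rather than by single address reflects the common case of an institution subscribing for all its machines, though an address check alone cannot tell which person is at the keyboard.

```python
import ipaddress

# Hypothetical subscription list: one institutional block and one single machine.
SUBSCRIBERS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_subscriber(client_ip: str) -> bool:
    """Admit a visitor only if the address falls inside a subscribed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: refuse rather than guess
    # Membership is simply False when address and network differ in IP version.
    return any(addr in network for network in SUBSCRIBERS)


for ip in ("192.0.2.45", "198.51.100.7", "198.51.100.8", "not-an-ip"):
    print(ip, is_subscriber(ip))
```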

Quick to market, updatable, and responsive to market needs, such publications benefit from their liberation from paper, but at this time, on the predominantly free Internet, they don't seem to be generating much revenue. The site licensing model, in which content such as a news feed is licensed on a per-hour or per-subscriber basis to a closed online service such as AOL or CompuServe, makes better use of the medium, preserving copy protection while generating usage-based revenue.

Perhaps the most enticing of the copy-based models is the microbilling model, where the user can pay pennies for chunks of information. Here the free and for-pay content is delicately balanced, so that the more customized and timely the information is for each user, the more likely he is to pay for it. The user may access New England B&B information for free but pay for a detailed road map from Logan Airport to a B&B on Cape Cod. Because the phone companies have long practice in this type of microbilling for time spent talking on the wire, and because cooperative settlement capabilities exist across international phone networks, a microbilling model for memes seems a natural fit for that industry. Two key questions arise, though, when we turn from billing for ephemeral conversations across phone wires to selling chunks of information (such as road maps) for pennies: What is the inherent value of a meme, and what is it worth to the user at the moment he requests it? In some of the following practical examples I hope to point toward answers, raise more questions, and encourage an open-minded approach to the medium.
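The free/pay balance in the microbilling model can be illustrated with a small metering sketch. The catalog entries, the 25-cent price for the Logan-to-Cape-Cod map, and the `Account` class are all invented for illustration. Amounts are kept in whole cents so that many tiny charges add up without floating-point rounding error; settlement with a phone company or bank is left out entirely.

```python
from dataclasses import dataclass, field

# Hypothetical catalog: price per request in whole cents (0 means free).
CATALOG = {
    "new-england-bnb-listings": 0,
    "map-logan-to-cape-cod": 25,
}


@dataclass
class Account:
    user: str
    charges: list[tuple[str, int]] = field(default_factory=list)

    def request(self, item: str) -> int:
        """Serve one chunk of information and record a charge if it is not free."""
        price = CATALOG[item]  # raises KeyError for an unknown item
        if price > 0:
            self.charges.append((item, price))
        return price

    def balance_cents(self) -> int:
        return sum(price for _, price in self.charges)


visitor = Account("repeat-visitor")
visitor.request("new-england-bnb-listings")  # free browsing
visitor.request("map-logan-to-cape-cod")     # 25 cents
print(visitor.balance_cents())               # 25
```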

Early Exercises in Interactivity

Nicholas Negroponte's "Being Digital"

After experimenting with the download-and-distribute method of online publishing for a few years, OBS slowly began to let our readers inside our books. One of the first forums we experimented with was for Nicholas Negroponte's "Being Digital," for which we agreed with the publisher to select, edit, and forward to the author only the top three emails each month for him to respond to. In this way we took the first steps in bringing author and readers together through the mediation of our link editor. This method points toward process-based publishing but improves on the "letter to the editor" experience only in that it happens more quickly---usually. It relies on the mediation of human editors, who tend to take weekends and holidays off, or to delete mail accidentally. What the method lacks in immediacy, however, it makes up for in content: the Negroponte forum grows more interesting as the global discussion about the book increases.

As the Negroponte forum grows and changes, it expands on the author's ideas independently of the author! Meanwhile, the book at the center retains its place as product, following a well-known pattern of published materials: a flurry of hoopla and sales on publication, followed by a steady pattern of sales, usually decreasing over time.

But once readers are let inside the online book, an independent, complementary process begins, and the interactive version of the book takes on a life of its own, becoming a meme-set of sorts. As the number of respondents increases, so does the traffic. Word spreads through Web-crawling search engines and along the human thought paths of the Net. A steady stream of readers results, and the forum serves as the leavening, the living aspect of the site. As the static book files age and linkrot erodes the vitality of the external links, people simply bookmark their way into the vitality of the hyperforum, where the book talks to them and they talk to the book.

At this point, one might justifiably ask whether the reader or the author is contributing more to the evolution of the book's online meme-set. And as the value of the kinetic aspect becomes more apparent, a forward-looking publisher might develop a compensation mechanism to encourage quality meme evolution, building incentives for readers to keep using and contributing to the forum and to invest creativity and care in the discourse that occurs there. Such incentives might in fact be called for by the question of who owns the forum memes. Who owns the copyright on public Internet postings, words and ideas that draw traffic to the Negroponte site and presumably result in additional sales of the original author's copyrighted book?

Assume that the author of each posting owns his own words and deserves compensation for contributing them to the "Being Digital" site. One way to approach compensation, and to encourage increased and ongoing repeat participation in the site, would be to apply the microbilling model, rewarding the most hit-upon files and links in a site by paying their authors a fixed portion of the sponsorship fees. Say the sponsorship fees at a site are $1,000 per month and a percentage, say 25% ($250), is allocated for meme royalty payments, to be paid to the authors of the most hit-upon files of that site, whether those files were generated by the book's author or by readers. Implementing this model would involve securing site sponsors who pay on a per-hit basis, registering readers, and of course defining carefully how hits are counted and restricting repeat hits. Presumably, the prospect of earning income by participating would help overcome reader reluctance to register formally at a site. Such a meme-based model gets beyond today's prevalent gimmicks and contests, which lure user participation by pushing the simple buttons of curiosity, love of gaming, and plain old greed.
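The arithmetic of the meme royalty pool can be made concrete. With $1,000 a month in sponsorship and 25% set aside, the pool is $250. The sketch below splits that pool among the top few files in proportion to their hits; the proportional rule, the `top_n` cutoff, and the file and author names are assumptions for illustration, since the text says only that the most hit-upon files would be rewarded. Amounts are handled in whole cents, and any cents lost when shares are rounded down go to the files with the largest fractional shares, so the pool is paid out exactly.

```python
def meme_royalties(fee_cents: int, royalty_pct: int, hits: dict[str, int],
                   authors: dict[str, str], top_n: int = 3) -> dict[str, int]:
    """Split a royalty pool among the authors of the most-hit files.

    Assumes hits have already been de-duplicated under whatever repeat-hit
    rules the site adopts; that counting step is not modeled here.
    """
    pool = fee_cents * royalty_pct // 100
    # Rank files by hits, breaking ties by name so results are repeatable.
    top = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    total = sum(h for _, h in top)
    if total == 0:
        return {}
    share = {f: pool * h // total for f, h in top}
    remainder = {f: pool * h % total for f, h in top}
    # Largest-remainder rounding: hand out the cents lost to integer division.
    leftover = pool - sum(share.values())
    for f in sorted(remainder, key=lambda f: (-remainder[f], f))[:leftover]:
        share[f] += 1
    payouts: dict[str, int] = {}
    for f, cents in share.items():
        payouts[authors[f]] = payouts.get(authors[f], 0) + cents
    return payouts


hits = {"chapter1.html": 500, "forum/post17.html": 300,
        "forum/post42.html": 200, "forum/post3.html": 50}
authors = {"chapter1.html": "author", "forum/post17.html": "reader_a",
           "forum/post42.html": "reader_b", "forum/post3.html": "reader_c"}
print(meme_royalties(100_000, 25, hits, authors))
# {'author': 12500, 'reader_a': 7500, 'reader_b': 5000}
```

Keeping money in integer cents and rounding explicitly avoids the situation where a pool of thousands of tiny payments drifts by a few cents from the amount the sponsors actually paid.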

"From Dusk Till Dawn"

There's no substitute for the thrill of what circus people call "working without a net." In live forums, the experience of typing in an opinion, seeing it posted immediately on the global forum, and getting immediate feedback proves alluring. Not only are the postings immediate; they can also be (relatively) eternal. Old postings never die! The first easy-to-write-on live forum we posted at the OBS site was for our Internet Projection of Quentin Tarantino's "From Dusk Till Dawn." Our January 1996 site focused on censorship, in the form of the Communications Decency Act, which President Clinton signed into law that February. That focus, combined with the popularity of this new vampire movie, brought an impressive influx of people writing everything from thoughtful treatises about censorship to lewd and vulgar scribblings.

Reflecting the medium itself, our decision at this point was on or off, a zero or a one. If we mediated and censored the objectionable prose, we would make ourselves liable for all the content appearing at the whim and will of the readers. So we opted for zero and shut off the live forum, replacing it with a more staid forum mediated by an email editor, such as we had been using for the Negroponte project. But the message from "Dusk" was clear---activity breeds activity, which makes the site grow in interest. Our hit counts reflected this.

Participation for "Dusk" involved not only the forum, which required original generation of prose, but also our Web site contest. Just as recorded conversations can keep a book alive and the meme-set inside it evolving in response to the mental labor of the readers, so too with a film: one can put into the hands of its audience the stuff that shapes the film itself. We invited the Internet public to join us in creating an Internet Projection of the movie, offering a private screening of "From Dusk Till Dawn" for fifty friends to whoever created the most innovative "Dusk" site on the topic of censorship. The digital "paintbox" included access to the official movie pieces: portions of the script, the stills, and the clips. In creating movie Web sites, the audience helped themselves and others actively think about the movie. Our role as publisher of the Internet Projection evolved from creating a Web marketing site to facilitating others' experiments with the movie in the context of current political events. No longer passive before the screen, the aficionados could actively weave themselves into the movie, get inside the film, and play with it.

Surely such participatory movie making is a step beyond video on demand, beyond the simple marketing and distribution of a film. No longer untouchable, the movies that used to be viewed, rewound, and played over and over again could now be talked back to or changed. We found we could capture the creativity and energy of the Tarantino enthusiasts by offering them access and priming the pump with some ideas and linking suggestions. We tried not to advance any point of view about the movie ourselves, as would be done in a straight distribution or promotion, but rather to set up an environment where users could create something new, which likely served the marketing purpose of our client, Miramax Films, as well as or better than a straight promotion would have.

Again, as with the Negroponte forum, the familiar issues of ownership, responsibility, and copyright arise: how can a business model offer incentives and compensation to participatory readers? First, we need to examine that noun "readers" for a moment. It doesn't seem that people online actually do a whole lot of reading. We skim for the hot links, click through messages, highlight blurbs and snippets. We surely don't do the kind of "deep reading" Sven Birkerts talks about in "The Gutenberg Elegies." Rather, we create moving pictures out of stills, words and links, bites and clips---all part of the tool box for our "Dusk" Web site contest. But who owns the results?

Because we are not charging for access to the site, and because we awarded a prize for the contest, we are not bound to compensate the contestants we link to; we are participating in today's prevailing Internet climate, where readers' attention is the operative currency. Just by pointing to another site we patronize it and give it value. Building a tollbooth for the clickstream of our emerging meme machine might involve a three-way instead of a two-way street: a triangle incorporating the original author (or publisher), the readers, and the machine itself.

If we look at the "Dusk" forum for the three points of this triangle, the first point is the author, Dimension Films, our client. The recorded readers are the second point, and they contribute to the forum and to the Web site contest. And the machine (flesh and chips at this point) is fueled by the people at OBS, editors of concepts, links, and systems. The machine doesn't breathe on its own. Absent the spontaneously writable forum, users cannot post directly to the site. We on the machine develop systems to facilitate the interactive functions built into the site and automate as need requires. The site is like a fire that keeps going out without humans readjusting and rekindling it.

"The Postman (Il Postino)"

In the OBS Internet Projection of "Il Postino," the machine point of the triangle took on new life with Sunny's Random hAIku Generator. "Poetry belongs to those who use it, not those who write it!" the hero Mario exclaims in "The Postman" by Antonio Skármeta. He uses the poetic devices he learns from Pablo Neruda---the magic of metaphors---to woo and win his Beatriz.

In producing a marketing front end for Miramax Films this month, in celebration of the film's five Oscar nominations, we focused on the central role of poetry in the film, extending the interactive components we had used in previous projects: the interactive forum and the invitation to post complete works. Two differences, one human and one mechanical, set this site apart from previous OBS work. We hired Máighréad, an Irish poet, to oversee the poetry postings and maintain the quality of the site. But we also offer readers more than the ability to post their own works and read the works of others: the Living Poetry section of "Il Postino" also points to an automatic Web-based poetry machine, Sunny's Random hAIku Generator.

The Generator is a program built by Sunny Gleason, a Rockport high school student. He feeds words into his poetry program, which arranges them into the three-line haiku configuration of 5, 7, and 5 syllables. As he examines the results, determining what makes a poem out of a random selection of words, he adapts his code to reflect newly discovered esthetic algorithms. Having filled out such topic areas as coffee, computers, and winter, Sunny turned to populating a Neruda word field when we pointed to his site from "Il Postino." He fed hundreds of Neruda's words (translated into English) into his machine and put it on the Web, where people can click to generate poems or set the Generator on automatic pilot and watch it crank out a new "Neruda-bot" poem every twenty seconds.
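Sunny's code is not reproduced here, but the 5-7-5 constraint he works within can be sketched. The version below assumes a small hand-tagged lexicon mapping each word to its syllable count (the words and counts are illustrative, not his Neruda database) and fills each line by drawing random words that still fit the remaining syllable budget. It models only the form; the esthetic algorithms that separate keepers from nonsense are exactly the part this sketch leaves out.

```python
import random
from typing import Optional

# Hypothetical hand-tagged lexicon: word -> syllable count.
# It must contain at least one one-syllable word so every line can be completed.
LEXICON = {
    "sea": 1, "salt": 1, "bell": 1, "wind": 1, "net": 1,
    "postman": 2, "letter": 2, "island": 2, "shadow": 2,
    "metaphor": 3, "bicycle": 3, "poetry": 3, "beautiful": 3,
}

HAIKU_PATTERN = (5, 7, 5)


def make_line(syllables: int, rng: random.Random,
              lexicon: dict[str, int] = LEXICON) -> str:
    """Draw random words until the line holds exactly `syllables` syllables."""
    words: list[str] = []
    remaining = syllables
    while remaining > 0:
        fits = [w for w, s in lexicon.items() if s <= remaining]
        word = rng.choice(fits)
        words.append(word)
        remaining -= lexicon[word]
    return " ".join(words)


def make_haiku(seed: Optional[int] = None,
               lexicon: dict[str, int] = LEXICON) -> str:
    """Return a three-line 5-7-5 poem; the same seed gives the same poem."""
    rng = random.Random(seed)
    return "\n".join(make_line(n, rng, lexicon) for n in HAIKU_PATTERN)


print(make_haiku())
```

Filtering to words that fit the remaining budget means a line can never overshoot its syllable count, so no backtracking is needed.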

To soften the transition between the blank page and the automatic haiku generator, we introduced a hybrid of human and machine in the form of three poetry bots, rudimentary adaptations of three of the book's main characters: Mario, Beatriz, and Mom. They work the "Well of Metaphors," where visitors to the site can draw metaphors for everyday use, an exercise less complicated and difficult than penning your own poem. The whole theme of the book and the movie turns on metaphors. Mario and Beatriz fall in love because Mario learns the transformative power of metaphors from Pablo Neruda, the poet in exile in their tiny village. Beatriz's mother is not so romantic about Mario's newfound gift for words, saying, "There isn't a drug in the world worse than all that blah-blah-blah." This hybrid bot feature of the site is like an interactive game, whereas the haiku generator is more like an engine.

The boundaries between the three poetry sources at the Living Poetry section of "Il Postino" seem clear: the blank page supports original verse, while the poetry bot pages feature interaction between the readers and the human-powered poetry bots. The haikus are the products of the machine---or so it would seem. When one reviewer challenged Sunny's program, saying he could "expect it to generate only a finite set of machine-generated poems," Sunny responded,

"There are also a 'finite' number of possible chess games, but that finite value is rather immense. (I remember reading somewhere that the number of possible games is greater than the number of atoms in the entire universe.)

"Given roughly 12 words/poem, and 220 words in the Neruda database, there are approximately 220^12 'random' poems possible. Even if only 0.000000001% of the poems are 'understandable,' that leaves 128,550,026,310,500,000 poems.

"So even if everybody on earth goes to my site and generates 20 million [!] poems, it is unlikely that there would be any duplication of 'good' poems. (Just as a side note, the fact that humans can pick one 'good' poem out of one billion is quite a testament to the human machine.)

"The randomly generated poetry of my site is NOT, in my opinion, solely the work of the machine. I believe the real art is in programming the algorithms of 'esthetics' into the generator.

"Just as a paintbrush is an extension of the artist, the algorithms are an extension of the programmer. Like a sculptor, he chisels away the 'incoherent' poetry, revealing 'works of art.' (However, until I learn more about AI, my hAIku are destined to be finger paintings and play-doh.) 8)

"Even if one tried to remove the 'man' from the 'machine' (perhaps by programming a machine to write its own haiku programs), the original programmer is still hopelessly a part of the machine, just as your random DNA 'program' is a product of your grandfather's DNA 'program,' and on and on into prehistory.

"The fun lies in finding who the original programmer is. 8)"

So with Sunny's Random hAIku Generator, we glimpse the possibilities of a human/machine publishing process, one in which the machine is a key component of meme generation. The human (in the case of "Il Postino," Sunny, the Irish poet, or a voluntary reader) selects the "keeper" haikus; Sunny, the programmer and author of the machine's software, writes esthetic algorithms that will program the machine to generate more rather than fewer "keepers," and in so doing feeds the products of the machine back into its maw. With slow gulps we start the process.
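The estimate in Sunny's reply can be checked with exact integer arithmetic. Like his figure, the count below treats a poem as any sequence of 12 words drawn, with repetition, from the 220-word Neruda database; it ignores the 5-7-5 syllable constraint, which would shrink the total. Expressed as a fraction, 0.000000001 percent is one in 10^11, that is, one in a hundred billion.

```python
from fractions import Fraction

WORDS_IN_DATABASE = 220
WORDS_PER_POEM = 12

# Sequences of 12 words chosen with repetition from 220 words.
possible_poems = WORDS_IN_DATABASE ** WORDS_PER_POEM

# 0.000000001 percent, expressed exactly as a fraction (1 in 10**11).
understandable_share = Fraction(1, 10**9) / 100
understandable = possible_poems * understandable_share

print(f"{possible_poems:.4e}")       # 1.2855e+28
print(f"{int(understandable):,}")    # 128,550,026,310,492,160
```

Sunny's 128,550,026,310,500,000 is this exact value rounded.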

It's a collaborative process. The machine is ever-productive and never sleeps. The problem is, it can't think, can't select a "keeper" from a random collection of words. But judgment is made easy at the Random hAIku Generator in the form of a "Submit" button, so whenever your poetical slot machine generates a poem that makes you stop and think and feel, you can submit it for posting to the site. The Irish poet editor determines which of those submitted "keepers" are indeed kept in a public spot on the "Il Postino" server, and Sunny determines how he will further adapt his poetry generator in response to the "keeper" poems as well as the poems that are discarded because they violate rules of syntax or sense.

Summary

As we trace the progress of online publishing from the simple downloading or linking of static files to the generation, adaptation, and regeneration of live poetry, we might reconsider the whole issue of copies as a basis for determining the value of a published work. When literature becomes an exercise, and poetry a game, the issue of copies pales beside the intrigue of collaborating with an ever-expanding machine to keep creating something new.

That something new calls for new methods of naming, accounting, and ownership before an economy can be built around it. Say we wanted to offer recompense to the authors of the "keepers" at the "Il Postino" site: what method would be fair? The words were penned in a Nobel Prize-winning sequence and form by Neruda, yet he did not hold copyright on the individual words and didn't write English words in the first place! The Random hAIku Generator was built and is copyrighted by Sunny Gleason, yet to yield its fruits, it waits to be clicked upon by readers. Are these mouse-clicking readers, then, the authors of the resulting haikus? Complexities unfold as we seek out the owner of the haikus: our client, Miramax Films, paid us a fee to create an Internet Projection of the film; Antonio Skármeta wrote "The Postman" in the first place; and OBS hired Sunny and decided to point to his haiku generator as part of the poetry site it built to market the movie about poetry. The individual readers, as well as the poetry editor, judge what is or is not a "keeper." Don't they also qualify as owners of the haikus?

The confusion indicates we are getting close to a turning point, when the simple hit count model will be supplanted by a newer paradigm. Through such human/machine collaborations as Cookies and Sunny's Random hAIku Generator, the publishing process expands to include not only its base---a business of products and copies---but also a business involving memes. In our recorded electronic environment, the successful publishing models will be based on access to and use of ideas and information, not on the simple possession of information files. This use or application of ideas, when shared with others on an ongoing basis, becomes the foundation of a meme machine---an important part of tomorrow's publishing company.

