The Walt Whitman Archive is guided by pragmatic principles in its attempt to achieve two goals: 1) to edit the vast textual corpus Whitman produced, and 2) to provide access to a wide range of related materials that shed light on his writings, including photos, reviews, translations, criticism of his writings and accounts of his life, finding guides to his manuscripts, and a bibliography of criticism. We aim to achieve the best possible results within existing technical, legal, and financial constraints.
Whitman's writings survive in many forms, including manuscripts, notebooks, marginalia, periodical printings, and books. When we digitally represent and electronically transmit his vast oeuvre, we are, for better and worse, changing media. We recognize that the Archive cannot serve all purposes, nor can all editorial goals be pursued. We explain our practices to assist scholars in making informed judgments about their own use of the Archive. Text encoding, though crucial to our editorial practice, is excluded from this overview. See "Encoding Guidelines" for a detailed description of our text encoding practices.
The Whitman Archive's approach to editing is to establish documentary texts rather than to reconstruct what are assumed to be authorially intended texts. We are concerned with historical and social aspects of the texts, including how the texts made their way into the world and the multiple agents that brought them into being. Every step of the digitization process, however—from deciding which pieces of paper together form the manuscript to deciphering, transcribing, and encoding handwritten or printed marks—involves judgment. As a result, although authorial intention is not the primary determining factor in our editorial decisions, we do, when dealing with manuscripts, rely on intellectual continuity as well as physical evidence such as paper and the ordering of materials in repositories to determine the boundaries between texts.
While the early work of the Archive emphasized Whitman's poetry manuscripts and the books he saw through the press, the Archive has expanded over time to include other primary materials, including his correspondence, notebooks, prose manuscripts, and periodical publications. The Archive also publishes translations, reviews, criticism, portraits of Whitman, and biographies.
For Whitman's manuscripts, periodical printings, and original printed volumes, we publish authoritative electronic versions (facsimile page images and transcriptions) with an emphasis on fidelity to original documents. For other materials, we do not provide facsimile page images of original documents but instead offer searchable electronic transcriptions.
In keeping with Whitman's own "spinal ideas" that organized his poetry, the Whitman Archive uses a special designation—the Whitman Archive Object—to anchor its work with Whitman-related primary and secondary sources. For textual materials, the Whitman Archive Object is a document-based categorization and corresponds to a physical document or a set of writing surfaces. A Whitman Archive Object's boundaries are the largest possible structure of a physical document (e.g., an entire notebook, a set of manuscript leaves). A Whitman Archive Object can have multiple authors; can feature writing from several different times; can span multiple writing surfaces; can reside in multiple repositories; and can include writing in multiple genres. Each Whitman Archive Object is identified with a unique Whitman Archive Object ID.
A Whitman Archive Object also comprises one or more textual units or items, each of which is assigned its own unique identifier. These textual units within a Whitman Archive Object correspond to linguistic or semantic items within the document. The designation of these textual items is based on a combination of material, linguistic, and semantic signals (changes in handwriting, genre, or content) that suggest a distinct idea, subject, or conceptual structure. In making a distinction between textual items, we declare semantic units based on our interpretation of the evidence derived from the document's physical and linguistic characteristics. The textual item's coherence as a compositional unit is determined by the ability of Archive staff to distinguish it semantically from other units, which may appear even on the same page.
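The relationship between a document-level object and its textual items can be pictured as a simple data structure. The following is only a minimal sketch: the class names, field names, and identifier formats are hypothetical illustrations and do not describe the Archive's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextualItem:
    """A linguistic or semantic unit identified within a document."""
    item_id: str   # hypothetical identifier format
    genre: str     # e.g. "poetry" or "prose" (illustrative labels)
    summary: str


@dataclass
class ArchiveObject:
    """A document-level object: one physical document or set of writing surfaces."""
    object_id: str  # hypothetical identifier format
    repositories: list[str] = field(default_factory=list)  # may be more than one
    items: list[TextualItem] = field(default_factory=list)

    def add_item(self, item: TextualItem) -> None:
        # Each textual item carries its own unique identifier.
        if any(existing.item_id == item.item_id for existing in self.items):
            raise ValueError(f"duplicate item id: {item.item_id}")
        self.items.append(item)

    def items_by_genre(self, genre: str) -> list[TextualItem]:
        # One object can hold writing in several genres.
        return [item for item in self.items if item.genre == genre]
```

A book historian might work with the whole `ArchiveObject`, while a literary critic might select only the items of interest, for example with `items_by_genre("poetry")`.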
By defining the object at the document level and identifying its textual units, the Whitman Archive works to enable as many analytical approaches as possible. From a book historical perspective, for example, it would make little sense to separate the various other items that were repurposed into, say, a set of notes, from the notes themselves. From a literary critical perspective, however, individual textual units might be the sole or primary matter of interest.
Our definition of the Whitman Archive Object has emerged out of years of work on Whitman’s documents. As a result, our efforts to bring the material available on the Whitman Archive into alignment with this definition are ongoing.
We provide both facsimile images and transcriptions that emphasize the semantic content of Whitman's poetry manuscripts. We display text without correction or regularization (e.g., end-of-line hyphens and obvious misspellings remain); regularized forms are encoded but suppressed in the default display.
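One way to picture "encoded but suppressed" is with the TEI `choice` element, which pairs an original reading (`sic` or `orig`) with a corrected or regularized one (`corr` or `reg`). The sketch below assumes that kind of markup; the sample line is invented for illustration, and the Archive's actual encoding and stylesheets may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample line, not a transcription of any Whitman document.
SAMPLE = (
    f'<l xmlns="{TEI_NS}">I <choice><sic>recieve</sic>'
    "<corr>receive</corr></choice> the letter</l>"
)


def _tags(*names):
    """Return namespace-qualified TEI tag names."""
    return {f"{{{TEI_NS}}}{name}" for name in names}


def render(elem, show_regularized=False):
    """Flatten an element to text, hiding one side of each choice pair."""
    hidden = _tags("sic", "orig") if show_regularized else _tags("corr", "reg")
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in hidden:
            parts.append(render(child, show_regularized))
        # Tail text belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(SAMPLE)
default_text = render(line)                             # original spelling shown
regularized_text = render(line, show_regularized=True)  # corrected form shown
```

Because both readings live in the file, the choice of which one to show is a display decision that can change without re-transcribing.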
Our practice for transcribing prose commentary written on a Whitman poetry manuscript differs according to the perceived relationship of the prose and poetry. If the prose is by Whitman or intervenes in the text, it is transcribed. Otherwise, it is noted but not transcribed. For example, Whitman drafted many poems on the backs of envelopes addressed to him by autograph seekers; in these cases, we note this information without fully transcribing the address and whatever other non-authorial writing might be present. When we render our transcriptions on the web, transcribed non-authorial prose appears verbatim in the notes. Our description of non-authorial prose that we have not transcribed also appears in the notes.
We first obtain a high-resolution digital image (ordinarily a TIFF scanned or photographed at 600 DPI) and enter it into a tracking database. This high-quality image is the basis for the transcription. To prepare it for web presentation, the TIFF image is cropped and then used to derive three JPEG images of different sizes to accommodate various users' needs.
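The derivation of several JPEG sizes from one master image amounts to scaling the same dimensions to different targets. The sketch below computes only the target dimensions; the three size names and pixel values are hypothetical, since the Archive's actual derivative sizes are not stated here, and the resampling itself would be done with an imaging tool.

```python
# Hypothetical longest-edge targets in pixels; not the Archive's real values.
HYPOTHETICAL_EDGES = {"thumbnail": 200, "medium": 800, "large": 1600}


def derivative_sizes(width, height, max_edges=HYPOTHETICAL_EDGES):
    """Scale (width, height) so the longer edge matches each target.

    The aspect ratio is preserved and images are never enlarged.
    """
    longest = max(width, height)
    sizes = {}
    for name, edge in max_edges.items():
        if edge >= longest:
            # The master is already small enough; keep its dimensions.
            sizes[name] = (width, height)
        else:
            sizes[name] = (
                max(1, round(width * edge / longest)),
                max(1, round(height * edge / longest)),
            )
    return sizes


# A cropped master of 4800 x 6000 pixels (an 8 x 10 inch leaf at 600 DPI).
sizes = derivative_sizes(4800, 6000)
```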
Transcription and encoding constitute the second step, which is completed by two Whitman Archive staff members. A staff member transcribes the text and validates the markup against the Archive's encoding standard. A more senior staff member or editor then reviews, corrects, and verifies the encoding and transcription by proofreading against the image. The reviewer also flags textual cruxes, uncertain encodings, and similar issues for further review.
Then, in consultation with a senior staff member, one of the general editors of the Archive performs a complete review of each poetry manuscript. This review includes proofreading the transcription and, when possible, resolving cruxes and uncertainties. The general editor works with staff members to date each manuscript and determine its relationship to other manuscripts and published writings. This information is included as metadata, and the updated file is then approved for publication.
Next, a senior editor publishes the poetry manuscript on the development server, reviews the manuscript's digital publication form, and, if necessary, modifies stylesheets (XSLT and CSS) for optimal presentation. The manuscript is then made available on the public site.
The electronic versions of the printed editions of Whitman's poetry have been prepared in collaboration with several institutions. Our aim has always been to reproduce the original printed volumes accurately.
As part of Major Authors on CD-ROM: Walt Whitman (1998), edited by Ed Folsom and Kenneth M. Price, Primary Source Media (PSM) transcribed all six American editions of Leaves of Grass, plus a seventh text, the so-called deathbed edition. Their transcriptions were entered into a proprietary Borland database. At the request of the Whitman Archive, the Electronic Text Center at the University of Virginia stripped out PSM's proprietary encoding and replaced it with Text Encoding Initiative (TEI)-conformant Standard Generalized Markup Language (SGML) encoding. Later, the staff of the Whitman Archive converted the SGML files into eXtensible Markup Language (XML). These converted files were proofread against the high-quality scans of original printed editions of Leaves of Grass that we received from special collections departments at the University of Virginia and the University of Iowa. We then published the corrected transcriptions, along with the page images.
We define a notebook as a series of pages, either bound, sewn, or continuous, that primarily feature notes in Whitman's hand. Whitman would occasionally bind draft manuscripts of whole poetic or prose works: these we have not classified as notebooks. He also kept several scrapbooks, which consist primarily of clippings or works composed by other people. These, also, we have not classified as notebooks.
Where possible, we offer facsimile images and transcriptions. In cases where high-resolution digital images are not available—if the current location of the notebook is unknown, for instance—we provide digital scans of microfilm reproductions or, in some cases, scans of photocopies, or we indicate the source of the text of the transcription. The transcription and encoding process is similar to that for the poetry manuscripts. A staff member transcribes, encodes, proofreads, and supplies metadata for the text. Materials are dated in consultation with general editors. Initial transcriptions are proofed against a printed original, a high-resolution digital image of the original, or a microfilm facsimile. Each transcription is checked by editorial staff and a general editor. Final rounds of checking assess the transcription and encoding as well as the web display. The remaining publishing process matches that used for the poetry manuscripts.
For the notebooks, we prepare a semi-diplomatic plain text transcription. We encode additions and deletions where they appear in the text, and the text itself is transcribed in the order it appears in the notebook. We follow Whitman’s title whenever he has provided one. If no title is given, we derive a title from the first words that appear in Whitman's hand. We display text without correction or regularization, but errors and idiosyncratic, antiquated, or other variant spellings are marked in the encoding.
In the web display of the documents, we do not attempt to reproduce the location of the text on the page, with the exception of poetic lines, for which we follow Whitman's line-end hyphenation and his style of hanging indentation. We note transposition marks, where applicable, with an insertion (in green). Where possible, we reproduce the mark itself in green; otherwise, we add the bracketed words: [transposition mark]. We also indicate gaps, where text is cut away or illegible, and words that are unclear. For a more detailed discussion and display of the colors we have used in the text, see the "Color key" link at the bottom of the metadata section for each notebook. We also offer "View XML" and "View page image" options in the metadata section of each notebook file. Although we transcribe archival notes and other notes in the notebooks that are not in Whitman's hand, we do not display these notes in our transcriptions, other than remarking on any labels on the cover of the notebook in the "Editorial Notes" section of the metadata.
Our process for editing Whitman's poems published in periodicals is similar to that for poetry manuscripts. The preparation and publication of periodical poems differs from that of manuscripts in the following particulars: initial transcriptions may be based on a printed original, a high-resolution digital image, or a microfilm facsimile. A staff member transcribes, encodes, proofreads, and supplies metadata for the text. A senior editor proofreads, reviews textual cruxes, and writes headnotes to individual periodicals. One of the general editors proofreads and critiques all of the work after it has been completed. The remaining editorial and publishing process matches that used for the poetry manuscripts.
For periodical printings, Archive staff have not attempted to replicate the display of the originals in the visual rendering of our transcriptions. Thus, centered titles and right-aligned bylines, notes, and other prose information have all been left-aligned for presentation on the Archive, although basic placement features are encoded in the TEI files. To the extent possible, we have preserved the formatting of the poems, including indentation of poetic lines in both encoding and web display; line breaks in poetry are always encoded and represented. Users interested in the way typeface, ornamentation, and other aspects of layout may have affected the meaning of Whitman's periodical poems should consult the page images we supply.
Whitman's fiction was published in periodicals, and so our process for editing the fiction is similar to the process we have developed for the journalism and poems in periodicals. Initial transcriptions are drawn from Major Authors on CD-ROM when possible and proofed against a printed original, a high-resolution digital image, or a microfilm facsimile. A staff member transcribes, encodes, proofreads, and supplies metadata for the text. A senior editor proofreads, reviews textual cruxes, and writes headnotes for individual pieces of fiction. One of the general editors proofreads and supplies corrections for all of the work after it has been completed. The remaining publishing process matches that used for the poetry manuscripts. Footnotes are composed by Archive staff.
As has been our practice in the case of Whitman's journalism, Archive staff have not attempted to replicate the display of the newspapers or periodicals with our web display of the fiction. Centered titles and right-aligned bylines, notes, and other information have been left-aligned for presentation on the Archive. Users interested in the way typeface, ornamentation, and other aspects of layout may have affected the meaning of Whitman's fiction may consult the page images we supply. We have retained original spellings, based on our digital images of the original periodicals, but misspellings and misprintings are indicated in gray, and a mouse-over produces the corrected spelling. Punctuation and other accidentals have been corrected and line-end hyphenation has been removed to create a clean reading copy. The source of any supplied text has been noted in the "Source" section of the metadata.
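The gray display with a mouse-over correction can be approximated in HTML by a `span` whose `title` attribute holds the corrected spelling, since browsers show `title` text on hover. This is a minimal sketch: the function name, class name, and inline style are hypothetical and are not taken from the Archive's stylesheets.

```python
from html import escape


def correction_span(original, corrected):
    """Wrap an original misspelling so the corrected form appears on hover."""
    # The original spelling stays in the running text, shown in gray;
    # the corrected spelling is carried in the title attribute.
    return (
        f'<span class="sic" style="color: gray" '
        f'title="{escape(corrected, quote=True)}">{escape(original)}</span>'
    )


# Invented example pair, not drawn from Whitman's fiction.
html_fragment = correction_span("recieve", "receive")
```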
The translations section of the Archive features full-length translations of Whitman's works as well as many versions of Whitman's poem "Poets to Come." For translations, we provide page images whenever possible along with transcriptions. In the transcriptions we do not attempt to capture the so-called bibliographic codes—the appearance of margins, fonts, and ornaments in the original printed documents. Most other features of the printed page are preserved: capitalization, hyphenation, punctuation (for French translations, this sometimes means preserving a space between a word and punctuation), and page breaks. Note that in the case of translations of "Poets to Come," page breaks are recorded in the XML/TEI file, but they are not displayed as such in the HTML display. Our electronic transcriptions preserve typographical errors present in the original; corrected forms are also included in the encoding. When transliteration of non-Roman characters is necessary, as in describing the Russian editions available on the Archive, we follow the Library of Congress ALA-LC Romanization Tables.
The Whitman Archive's correspondence project, which presents Whitman's outgoing and incoming letters and letters of the Whitman family, brings together previously edited printed material and freshly edited material that has never appeared in print. Archive staff have transcribed letters from digital scans of the original manuscripts, from microfilm reproductions, or from previously edited printed volumes of correspondence. The source text for every transcription is identified for users. In our treatment of Whitman's correspondence, we privilege access and accuracy of transcriptions of the main letter text. Because of changes to the editorial policy for the letters over the history of the Archive's work on this material, there are some inconsistencies in our treatment of the correspondence, particularly in what non-Whitman script on a letter has been transcribed or described. We are working to bring all published letters into alignment with our current editorial policy and transcription and encoding practices.
Those letters for which the Archive has digital images have been freshly transcribed and edited, often for the first time. For now, we follow the practices of other editors of Whitman's correspondence by remaining as unobtrusive as possible and presenting an inclusive text representing as nearly as possible a clean, reading version of the letter. In general, we do not record deletions, note authors' insertions, or attempt to duplicate the appearance of the original holographs. We also omit metacommentary in the form of cues such as "(over)" that were relevant to the reader of the original letter as a physical object but are more distracting than helpful in an electronic environment. We standardize the placement of salutations, signatures, and postscripts, though basic placement is described with TEI markup. In addition, we are in the process of transcribing and encoding letterhead for these documents. These decisions have been made on a pragmatic basis and to create consistency among the materials presented. As we secure more digital images of original letters, and as we have time, we will update our XML files and encode all deletions and insertions. In the future, we hope to allow users of the Archive to choose between two different ways of viewing the correspondence, either as clean, reading versions or as diplomatic transcriptions.
In instances where the Archive presents transcriptions derived from earlier print volumes, both the source text and repository of the original manuscript are identified. Letter transcriptions derived from earlier printed volumes reproduce the text of the letter as found in the source text. Editorial content, including footnotes, is a combination of new information composed by Archive staff and existing material from printed editions. To avoid redundancies, Archive staff have omitted some notes, such as those identifying repositories, since we provide this information in our metadata. In other instances, where new information has become available since the publication of the print editions, footnotes have been added or revised. Editorial material that appears prior to the text of a letter in a print volume has been moved to a footnote following the letter transcription. In these instances, the language of the original editorial material has been revised (e.g., we have changed "the following letter" to "the above letter") and footnotes in leading material have been transcribed as parenthetical citations. Given these changes, users should not assume that footnote numbers, as they appear on the Whitman Archive, correspond with the numbers as they appear in the print volumes.
This section aims to provide access to Whitman's written reactions to the works he read during his lifetime. We have focused on Whitman's notes that comment on other writers' works, whether "annotations," which we define as notes entirely in manuscript, or "marginalia," which we define as manuscript notes in the margins of a printed text by another author (such as a book or clipping from a periodical). Documents drafted or edited by Whitman that led to his own compositions, whether poetry or prose, are treated in other sections of the Archive. The editing process for the annotations and marginalia is similar to that for the notebooks.
We are focused on enabling users to read Whitman's reactions to the writings available to him, so the preservation of the texts to which he was reacting has been only a secondary priority. Accordingly, we include predominantly pages that Whitman marked, and have not for the most part supplied text that he cut away or that has become garbled. Names for the annotations files are supplied from the first few words found on the first page of the original, while marginalia are named by the title of the document Whitman annotated. We have preserved running heads in this section of the Archive (which other sections do not) because Whitman at times made marks on or edited running heads and other common features of the printed page. Finally, while we have tried to include dating information, this often proves difficult to ascertain, particularly in the cases of obscure texts or texts that Whitman re-read over many years. As a result, we have emphasized year ranges over specific dates.
All of the documents that we present in our section devoted to Whitman's work as a clerk in the Attorney General’s office are transcribed and encoded by staff of the Whitman Archive and then checked by two or more members of the project staff. Prior to being published on the Archive, each letter receives one or more rounds of additional review by senior Archive staff and general editors. Final rounds of checking assess the transcription and encoding as well as the web display.
Unlike documents treated for other parts of the Whitman Archive, which are more richly encoded, we have so far only done minimal structural encoding of the scribal documents. This decision was made to expedite the encoding process, in order to make this vast trove of material available to users. At a later stage, we may enrich the encoding to mark place names or other named entities or to identify various structural features of the texts.
The documents are presented as transcriptions and facsimile page images. The process and specifications for obtaining page images and for presenting them on the Archive follow those for the poetry manuscripts. Although some documents are on loose leaves, most of the documents have been photographed or scanned from large, bound letter books that often include several letters on a single page. Currently, we make available only cropped images of individual documents. In the future, we may also provide full page images for greater context.
In the web display of our transcriptions, we do not attempt to duplicate the appearance of the original holographs, though users have access to facsimile images of these documents. In the display, all text is left-justified, regardless of how it appears on the manuscript page. In addition, we display a short horizontal line to separate the text of the body of the document and the text of marginal annotations by Whitman and others. We omit metacommentary in the form of cues such as "(over)" and catchwords that were relevant to the reader of the original document as a physical object but are more distracting than helpful in an electronic environment. We display text without correction or regularization, but errors and idiosyncratic, antiquated, or other variant spellings are marked in the encoding.
When editing reviews we record what appeared in the original source document. Any deviations from that original source—insertions of obviously omitted words or alterations of spelling, for example—are marked by brackets. Transcriptions of reviews also include all authorial or editorial footnotes that appeared in the original document. Because we have not represented page breaks in our transcription and encoding of the reviews, authorial and editorial notes in the original appear at the end of the transcription, rather than at page breaks. Footnotes in the original are often indicated by Arabic numerals. To distinguish these notes from editorial notes added by the Archive, we have changed the superscript numerals to asterisks (first footnote), daggers (second footnote), and double-daggers (third footnote). Similarly, in instances where the original review uses the same symbol to mark two or more footnotes appearing on different pages, we have replaced succeeding symbols of the same type with daggers (second footnote) and double-daggers (third footnote).
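The marker substitution described above can be expressed as a small lookup by footnote position. The sketch below covers only the three symbols named in the text; what happens for a fourth or later footnote is not specified here, so the function raises an error rather than guessing. The function names are illustrative.

```python
# Symbols in the order given above: first, second, third footnote.
SYMBOLS = ["*", "\u2020", "\u2021"]  # asterisk, dagger, double dagger


def footnote_symbol(position):
    """Return the marker for the nth footnote (1-based) of a review."""
    if not 1 <= position <= len(SYMBOLS):
        # Beyond the third footnote the practice is not described.
        raise ValueError(f"no symbol defined for footnote {position}")
    return SYMBOLS[position - 1]


def relabel(original_markers):
    """Map each original marker, in order of appearance, to its new symbol."""
    return {
        marker: footnote_symbol(index)
        for index, marker in enumerate(original_markers, start=1)
    }


mapping = relabel(["1", "2", "3"])
```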
As elsewhere, Archive staff have not attempted to replicate with our transcriptions the display or typographic features—typeface, ornamentation, and other aspects of layout—of the newspapers or periodicals in which the reviews originally appeared. To the extent possible, we have preserved the formatting of the poetry in the reviews, including indentation of poetic lines; line breaks in poetry are always encoded and represented.
This section prioritizes access to the editions of Horace Traubel's record of his conversations with Whitman, titled With Walt Whitman in Camden and issued in nine volumes between 1906 and 1996 by various publishing houses. This collection is a major resource for Whitman scholars, but because it is not strictly Whitman-authored, the Archive has striven to create an accurate and complete transcription of the printed volumes rather than an extensively tagged version. Deeper encoding and integration with other resources of the Archive may be pursued in the future.
For this section of the Archive, we provide a digital facsimile of all pages containing a non-textual image (for example, a picture of one of Whitman's associates) but not of pages containing only text. Our policy is to record the textual content of the printed page accurately. Capitalization and punctuation are preserved. The transcription and encoding processes are followed by proofreading of the transcription against the original source document by editorial assistants and another round of proofing of the file as displayed by the stylesheet, performed by the contributing editor. Typographical errors deemed obvious are encoded as alternate forms, but the original printed form is displayed.
This section makes available selected current articles, monographs, and essay collections of scholarly work on Whitman. In all cases, the Archive has received permission from the rights holders to publish an electronic edition of the work. As time allows, we also add to this section some out-of-copyright commentary.
The Archive privileges the textual content of the critical work and does not attempt to present design aspects of the original (font, spacing, ornaments, etc.). Archive staff have regularized the encoding and display of tables of contents. Page breaks are not encoded, and page images are not provided. Obvious textual errors in the original are corrected in the electronic edition, and Archive staff track these changes and publish them in "Changes to Criticism Texts in the Electronic Editions." A link to this document appears at the end of every critical text that contains Archive corrections.
When possible, Archive staff members work from an electronic text (.txt) file of the original article or book, which has been provided by the author or publisher, or created from a PDF. Staff members also consult PDF page images of the printed work, or the actual book or journal. A staff member adds the TEI markup to the text files. In cases where text files are not available, an Archive staff member first transcribes the critical text from the original and then encodes the document, again using page images or hard copy as a point of reference. Another Archive staff member then copyedits the text against the original, and a test version of the document is made available on the site. A senior editor looks for textual and display inconsistencies and modifies the encoding or stylesheet (XSLT) as necessary. A general editor of the Archive then reviews the test version and either suggests changes or recommends publication on the Archive. The latter two steps are repeated until the general editor gives approval for publication.