So you want to adopt JATS. What decisions do you need to make?

The following is a reproduction of a conference proceedings paper by B. Tommie Usdin, who originally presented it at the 2016 JATSCon meeting held in April 2016 at the National Library of Medicine in Bethesda, Md. This article was posted here with permission from the author. Copyright © 2016 Mulberry Technologies, Inc.

 Introduction

Newcomers to JATS often think that the decision to use JATS is the tough decision, and that once they have made it they are ready to get started. While this has never been the case (even NLM 1.0 was available in Archiving and Publishing versions), there are more decisions facing an adopter now, and they are more complex than ever before. JATS users must decide which model to use (Archiving, Publishing, Authoring), which table model to adopt, and how to handle math. In addition, they should select a citation model and style, an approach to encoding contributor names and affiliations, and an approach to multi-language names; decide whether to provide information for accessibility; and set policies for their metadata, including how they want to handle funding, permissions, and licensing information. The new user should also decide which of several available sets of tagging guidelines to adopt, including those published by PubMed Central, the JATS4R group, and the archives, libraries, and publishing partners with which they want to interchange documents.

There are several “levels” of choices to be made in implementing JATS:

  • Model Choice (tag set choice)
  • How Much to Include in the XML File
  • How Much to Enhance the XML File
  • Consistency and Style
  • Which Guidelines to Adopt

An intimidating number of variations are available for JATS version 1.1[1]. Each of the following models is available in three grammars: DTD, XSD (W3C XML Schema), and RNG (RELAX NG):

  • JATS Archiving with XHTML tables and MathML2
  • JATS Archiving with XHTML tables and MathML3
  • JATS Archiving with OASIS and XHTML tables and MathML2
  • JATS Archiving with OASIS and XHTML tables and MathML3
  • JATS Publishing with XHTML tables and MathML2
  • JATS Publishing with XHTML tables and MathML3
  • JATS Publishing with OASIS and XHTML tables and MathML2
  • JATS Publishing with OASIS and XHTML tables and MathML3
  • JATS Authoring with XHTML tables and MathML2
  • JATS Authoring with XHTML tables and MathML3
  • JATS Authoring with OASIS and XHTML tables and MathML2
  • JATS Authoring with OASIS and XHTML tables and MathML3
  • BITS with XHTML Tables
  • BITS with OASIS and XHTML Tables

This really comes down to a small number of choices, which we will discuss individually.

What Color: Which of the JATS Tag Sets

DECISION:

The first question a new JATS user must answer is “What Color”, meaning which of the JATS tag sets to adopt. There are three JATS tag sets:

  • Journal Archiving and Interchange — usually called ‘Archiving’ and nicknamed: Green[2]
  • Journal Publishing — usually called ‘Publishing’ and nicknamed: Blue[3]
  • Article Authoring — usually called ‘Authoring’ and nicknamed: Pumpkin[4]

And there is an associated tag set for books and book-like material:

  • Book Interchange Tag Set — usually called ‘BITS’ and nicknamed: Chocolate[5]

(The nicknames are the colors of the documentation for each of the Tag Sets, and are used as shorthand for the names of the Tag Sets.)

ADVICE:

Archiving

If existing documents, especially existing XML documents, are to be converted to JATS, Archiving is the most likely target. This is because Archiving is the most flexible, and it is likely to require less reorganization and regrouping of the existing content to convert it to XML according to Archiving than it would take to get the same material into the Publishing tag set. The archiving model even provides an element to record processing that may have been done in the pre-JATS XML (<x>), which may be useful to retain information from existing non-JATS XML documents. Archiving is intended for libraries and archives who must accept input from a wide variety of sources and convert that input to JATS (as expeditiously as possible).

Publishing

The Publishing tag set is a good choice for publishers converting content from non-XML source material, such as proprietary typesetting formats and word processing files. It enables the tagging of metadata that is created during the publishing process, such as publication date, journal, volume, issue, and page numbers, and is flexible enough to accommodate most publisher styles. Publishing is more restrictive than Archiving, which reduces the variation in the files, reduces the options available for tagging, and makes editing the XML files more comfortable.

If the user is going to use the JATS content to create new journal articles for search and display or for repurposing such as combining fragments of multiple documents into new documents, Publishing may be the best choice because the content will be in a more predictable and thus tractable form than if it were tagged using Archiving.

Authoring

A publisher soliciting new material from article authors in JATS probably wants them to use the Article Authoring tag set for incoming content. (Similarly, an author in some utopian future where XML tools are common on general purpose desktops would use the Authoring tag set to create articles because it is not specific to any particular publisher or journal.) There are few other scenarios in which Authoring is appropriate.

The Authoring tag set was designed to allow as few tagging options as possible while enabling the expression of the full content of a journal article. Authoring does not provide tagging for metadata that will not be known at the time an article is authored, such as publication date, page number, history, or journal name.

BITS

Many organizations have used one or another of the JATS tag sets to tag materials other than journal articles: technical reports, books, pamphlets, posters, text books, and even letters have been tagged using JATS. JATS is not always a good fit for these materials. For example, even the most flexible JATS model (Archiving) allows Reference Lists only in the back matter of an article or at the end of a section. This may not be appropriate in some other types of materials. Conversely, the JATS model does not accommodate some common book structures such as tables of contents, indices, and questions and answers.

The BITS model was designed for books, parts of books such as chapters, and for non-journal-article materials that may be more loosely structured than journal articles.

Tables

After choosing a tag set the user must decide how tables will be handled in their documents.

DECISION:

Will tables in the documents be tagged using the XHTML-based table model[6] or the OASIS/CALS Exchange Table Model[7]?

BACKGROUND:

In order to know which JATS model to choose, we need to discuss tables. Most scholarly, technical, or medical publications, and many other publications that get into detail of any sort, contain tables. Even if you do not expect to have any tables in your JATS documents, you must choose a table model because this is part of what determines which tag set to use.

JATS includes the two most commonly used XML table models: one based on the XHTML table model and one that adopts the OASIS/CALS Exchange table model. Each variation of the three JATS models (colors) is available in two versions: one that uses only the XHTML-based table model and one that enables both the XHTML-based model and the OASIS/CALS Exchange table model.

ADVICE:

Technically, since there are versions of the grammars (that is, DTD, XSD, and RNG forms of the tag set rules) that support both table models, there is no need to decide between the table models. But not deciding is inviting chaos.

While the grammars support use of both, most users will be better served by selecting one and sticking with it. In the unusual case that a user decides to use both table models, I recommend clearly identifying which should be used in what situation.

Users who have investments in tools that create and format pages using the OASIS/CALS table model probably want to use that table model for their JATS. Certainly, if these users are going to use JATS early in the document life-cycle (for editing and typesetting) they should use the OASIS/CALS table model because that is what their typesetting engine can handle. Even these users might consider converting their tables to the XHTML table model before passing them to publishing partners, archives, and other users of the documents. This is because in many cases the JATS documents are going to end up displayed on the Web. In other words, they will be converted to HTML for display in web browsers, ebooks, or other web-based technologies.

Conversion of tables from the OASIS/CALS table model to the XHTML table model is usually straightforward and lossless. But not always. If the publisher converts the tables from OASIS/CALS to XHTML, then the publisher has the option of checking the converted tables to ensure that they accurately convey the intended meaning. If the publisher passes OASIS/CALS tables to an archive that will publish using HTML, the archive will do that conversion either as part of their intake process or for web display.
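To make the "usually straightforward, but not always" point concrete, here is a minimal Python sketch of such a conversion. It is an illustration under stated assumptions, not a production converter: the input uses unprefixed CALS element names and a single tgroup, cells are assumed to hold plain text only, and the function name `cals_to_xhtml` is hypothetical. Cells that carry spanning attributes are rejected rather than guessed at, since those are the cases most likely to need a human check.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified CALS input: no namespace prefix, one tgroup, no spans.
CALS = """<table>
  <tgroup cols="2">
    <thead>
      <row><entry>Breed</entry><entry>Group</entry></row>
    </thead>
    <tbody>
      <row><entry>Bichon Frise</entry><entry>Companion</entry></row>
    </tbody>
  </tgroup>
</table>"""

# CALS attributes this sketch does not translate; they need real span logic.
UNSUPPORTED = ("namest", "nameend", "morerows", "spanname")


def cals_to_xhtml(cals_table):
    """Convert a simple CALS table element to an XHTML-style table element."""
    html = ET.Element("table")
    tgroup = cals_table.find("tgroup")
    for section in ("thead", "tbody"):
        src = tgroup.find(section)
        if src is None:
            continue
        dest = ET.SubElement(html, section)
        cell_name = "th" if section == "thead" else "td"
        for row in src.findall("row"):
            tr = ET.SubElement(dest, "tr")
            for entry in row.findall("entry"):
                if any(attr in entry.attrib for attr in UNSUPPORTED):
                    raise ValueError("spanning cell found; review by hand")
                cell = ET.SubElement(tr, cell_name)
                cell.text = entry.text  # plain text only; inline markup is ignored
    return html


xhtml = cals_to_xhtml(ET.fromstring(CALS))
print(ET.tostring(xhtml, encoding="unicode"))
```

Whoever runs a conversion like this is also the party in a position to inspect the result, which is the argument for the publisher doing it before handing the files on.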

For some users I see no reason to consider any model other than the HTML/XHTML table model. If your JATS XML is:

    • created after page production
    • used to produce pages using HTML/XHTML tables
    • produced in an environment in which there are no pages

then all tables can be created with the XHTML-based table model and can be easily displayed using web technology.

Math

The third question the user must answer (after which tag set and how to handle tables) is how to handle math.

DECISION:

How should mathematical expressions be encoded? Should you use a version of the grammar that supports MathML2 or MathML3?

BACKGROUND:

JATS provides several ways to encode mathematical expressions including: as a graphic, as text, as TeX or LaTeX, and in MathML2[8] or MathML3[9]. It also allows the user to provide the same expression in two or more of these formats and identify them as alternative versions of the same expression.

Because the MathML2 and MathML3 models cannot coexist in the same tag set, users must choose one or the other, even if they have no math in their documents or plan not to use MathML to encode the math they have (perhaps choosing to use graphics or TeX or LaTeX instead of MathML).

ADVICE:

If your document collection contains relatively few mathematical expressions and/or if the math in your documents is relatively unimportant, treat the math as graphics. Encoding math is difficult, expensive, and requires substantial amounts of quality assurance. If math is not important in your environment, the investment is inappropriate.

If math is important in your document collection and you can commit to the investment not only of creating MathML but of checking it, consider the advantages of MathML as well as the costs. Advantages include: scalable electronic display, multi-media display, accessibility, and searchability.

If you are just starting to create MathML tagged expressions, start with MathML3. Use MathML2 only if you already have a substantial investment in MathML2 documents and tools and are unable to convert that MathML2 to MathML3*.

If you publish in an environment in which authors and users assume that all documents, and thus all math, are created in TeX or LaTeX, keep your math in TeX or a clearly defined profile of LaTeX.

Since support for display of MathML and TeX is not always available, wrapping math in an <alternatives> wrapper and also providing a graphical version of the expression is a good way to maximize the chances that a user will be able to read the math in your documents.
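A policy like this is easy to state and easy to drift away from, so it may be worth checking mechanically. The Python sketch below reports formulas that offer no graphical version. The sample markup, the ids, and the function name `formulas_without_graphic` are illustrative assumptions; the check itself simply looks for a graphic or inline-graphic anywhere inside each formula.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"

# Assumed sample: e1 has a graphic fallback inside <alternatives>, e2 does not.
SAMPLE = f"""<sec xmlns:mml="{MML}" xmlns:xlink="{XLINK}">
  <disp-formula id="e1">
    <alternatives>
      <mml:math><mml:mi>x</mml:mi></mml:math>
      <graphic xlink:href="e1.png"/>
    </alternatives>
  </disp-formula>
  <disp-formula id="e2">
    <mml:math><mml:mi>y</mml:mi></mml:math>
  </disp-formula>
</sec>"""


def formulas_without_graphic(root):
    """Return the ids of formulas that contain no graphical version."""
    missing = []
    for tag in ("disp-formula", "inline-formula"):
        for formula in root.iter(tag):
            has_graphic = (
                formula.find(".//graphic") is not None
                or formula.find(".//inline-graphic") is not None
            )
            if not has_graphic:
                missing.append(formula.get("id"))
    return missing


missing_ids = formulas_without_graphic(ET.fromstring(SAMPLE))
print(missing_ids)
```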

Choose your model

Once you know which of the JATS models is most appropriate for your use and you have decided how you want to handle tables and math in your JATS documents, selecting a model is simply a matter of finding the right combination of choices in the list of options. That intimidating list we looked at before is no longer intimidating:

Archiving

  • XHTML tables only
    • MathML2
    • MathML3
  • XHTML & OASIS/CALS tables
    • MathML2
    • MathML3

Publishing

  • XHTML tables only
    • MathML2
    • MathML3
  • XHTML & OASIS/CALS tables
    • MathML2
    • MathML3

Authoring

  • XHTML tables only
    • MathML2
    • MathML3

BITS

  • XHTML tables only
  • XHTML & OASIS/CALS tables

It would be wonderful if that were all you needed to decide when you get started with JATS, but it is not. In fact, those were the easy decisions!

How much to include in the XML file

Some JATS users include all of the text that a reader of the usual display of the document would see. They look at a print or PDF representation of the print version of the article as the “real” content of the article and put all of the characters that are visible to the reader of that version into the XML. This includes labels (such as section numbers and bullets on list items), separators, punctuation, etc. This is not a bad approach, but it may or may not be suitable for a particular publication.

Some JATS users prefer to have the XML document contain only the content that is best provided by human beings, and to have content that can be more reliably created by software generated on display. Reference numbers are an example: many publishers prefer to have them generated on display rather than to place content such as [USDIN 2016] or [B] in the XML file. This allows changes that affect these numbers to be made at the last minute before publication without forcing last-minute renumbering.

After discussing labels, below, we discuss formatting of bibliographic citations, a far more complex prospect than formatting list item numbers.

Labels as document content or computed display

DECISION:

Should the labels that appear in the document be included as content in the XML file or not?

BACKGROUND:

Labels include such things as section numbers, bullets or numbers on list items, footnote symbols or numbers, the text of cross-references, and the numbers of bibliographic references.

ADVICE:

For example, if a list might be displayed/printed as:

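    Favorite Dog Breeds
    1. Companion Dogs
        a. Bichon Frise
        b. Coton de Tulear
    2. Hound Dogs
    3. Sporting Dogs
    4. Working Dogs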

It could be represented in the XML with tagged labels:

<list>
<title>Favorite Dog Breeds</title>
<list-item><label>1. </label><p>Companion Dogs
<list>
<list-item><label>a. </label><p>Bichon Frise</p></list-item>
<list-item><label>b. </label><p>Coton de Tulear</p></list-item>
</list>
</p></list-item>
<list-item><label>2. </label><p>Hound Dogs</p></list-item>
<list-item><label>3. </label><p>Sporting Dogs</p></list-item>
<list-item><label>4. </label><p>Working Dogs</p></list-item>
</list>

However, if this is a “live” document, to be edited, this method of tagging means that if the editor inserts a list item before an existing one or resequences the list items, the labels must be changed. More disruptively, if an editor adds, removes, or resequences a reference, all of the reference numbers from that point on will need to be updated. Also, this tagging may well mean that people spend their time and attention counting and numbering, which computers are far better at than are people. In an environment in which the XML is used for editing, it might be better to tag as shown below and let the display engine, XSLT conversion, or some CSS provide the numbers:

<list list-type="order">
<title>Favorite Dog Breeds</title>
<list-item><p>Companion Dogs
<list list-type="order">
<list-item><p>Bichon Frise</p></list-item>
<list-item><p>Coton de Tulear</p></list-item>
</list>
</p></list-item>
<list-item><p>Hound Dogs</p></list-item>
<list-item><p>Sporting Dogs</p></list-item>
<list-item><p>Working Dogs</p></list-item>
</list>
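As a rough illustration of what "let the display engine provide the numbers" can mean, the Python sketch below walks an unlabeled list and computes the labels at render time. The numbering convention (1. 2. 3. at the top level, a. b. c. when nested) and the helper names `label_for` and `render` are assumptions made for this example; a real system would more likely do this in XSLT or CSS, as noted above.

```python
import xml.etree.ElementTree as ET

LIST_XML = """<list list-type="order">
<title>Favorite Dog Breeds</title>
<list-item><p>Companion Dogs
<list list-type="order">
<list-item><p>Bichon Frise</p></list-item>
<list-item><p>Coton de Tulear</p></list-item>
</list>
</p></list-item>
<list-item><p>Hound Dogs</p></list-item>
</list>"""


def label_for(position, depth):
    """Hypothetical house style: numbers at top level, letters when nested."""
    if depth == 0:
        return f"{position}."
    return f"{chr(ord('a') + position - 1)}."  # fine for up to 26 items


def render(list_el, depth=0):
    """Yield display lines with computed labels for a <list> element."""
    for position, item in enumerate(list_el.findall("list-item"), start=1):
        para = item.find("p")  # assumes every list item starts with a <p>
        text = (para.text or "").strip()
        yield f"{'  ' * depth}{label_for(position, depth)} {text}"
        # A nested list may sit inside the paragraph or beside it.
        for nested in para.findall("list") + item.findall("list"):
            yield from render(nested, depth + 1)


for line in render(ET.fromstring(LIST_XML)):
    print(line)
```

Because the labels are derived from position, inserting or reordering items changes the display without anyone touching a number.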

Additionally, in some environments where the documents were edited without labels and labels were automatically added before creation of an archival version of the XML, it would be appropriate to indicate what content was automatically generated**:

 <list>
 <title>Favorite Dog Breeds</title>
 <list-item><p><x>1. </x>Companion Dogs</p>
 <list>
 <list-item><p><x>a. </x>Bichon Frise</p></list-item>
 <list-item><p><x>b. </x>Coton de Tulear</p></list-item>
 </list>
 </list-item>
 <list-item><p><x>2. </x>Hound Dogs</p></list-item>
 <list-item><p><x>3. </x>Sporting Dogs</p></list-item>
 <list-item><p><x>4. </x>Working Dogs</p></list-item>
 </list>
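The step from the unlabeled editing form to this archival form can itself be automated. The following Python sketch, offered only as one possible approach, inserts a generated label wrapped in an x element at the start of each item's paragraph. The numbering scheme and the function name `add_generated_labels` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Unlabeled list as it might leave the editing environment (assumed sample).
EDITED = (
    "<list>"
    "<list-item><p>Companion Dogs</p>"
    "<list>"
    "<list-item><p>Bichon Frise</p></list-item>"
    "<list-item><p>Coton de Tulear</p></list-item>"
    "</list>"
    "</list-item>"
    "<list-item><p>Hound Dogs</p></list-item>"
    "</list>"
)


def add_generated_labels(list_el, depth=0):
    """Prefix each item's paragraph with an <x> element holding its label."""
    for position, item in enumerate(list_el.findall("list-item"), start=1):
        label = f"{position}. " if depth == 0 else f"{chr(96 + position)}. "
        para = item.find("p")   # assumes every list item starts with a <p>
        x = ET.Element("x")
        x.text = label
        x.tail = para.text      # the original text now follows the <x> element
        para.text = None
        para.insert(0, x)
        for nested in item.findall("list"):
            add_generated_labels(nested, depth + 1)


root = ET.fromstring(EDITED)
add_generated_labels(root)
print(ET.tostring(root, encoding="unicode"))
```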

Citations

Most journal articles, and I would venture to guess most content marked up using JATS, contain bibliographic citations. So the question is not whether to tag citations, but how to tag them.

DECISION:

There are two reasonable ways to tag citations in JATS: <element-citation> and <mixed-citation>. (Technically, there is a third, but <nlm-citation> has been deprecated in the tag set documentation for years and should not be even considered by a new user.) A user should choose a citation style.

BACKGROUND:

Element citation allows all of the meaningful content of the citation to be tagged but does not allow spacing or punctuation between those elements. That is, the tagged citation allows element content only. No punctuation is allowed between the elements and, by definition, any spaces between the elements are meaningless and XML processors can and will ignore or discard them.

Mixed citation allows punctuation between the elements, and spaces between the elements are significant (and collapsible but not completely removable). This means that any place there is one or more spaces in the tagged XML file, an XML processor may collapse that to one space but at least one space will be retained.

As an example, let's look at a citation that looks like this in display or print for publication[10]: Petitti DB, Crooks VC, Buckwalter JG, Chiu V. Blood pressure levels before dementia. Arch Neurol. 2005 Jan;62(1):112-116.

This may look cryptic to readers who do not know the conventions of this style. It may not, for example, be obvious that this article appeared in the first issue of the sixty-second volume of Archives of Neurology. But this is far easier to read than: PetittiDBCrooksVCBuckwalterJGChiuVBlood pressure levels before dementiaArch Neurol2005Jan621112116, which is what we might see if the tags were removed from the display and no punctuation or spaces added. (This is an extreme example, but I have actually seen this in live systems.)
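The run-together string above is what falls out when a display tool simply drops the tags from an element-only citation. A short Python sketch, using the same citation data and a hypothetical helper named `naive_text`, reproduces it:

```python
import xml.etree.ElementTree as ET

CITATION = """<element-citation publication-type="journal">
<name><surname>Petitti</surname><given-names>DB</given-names></name>
<name><surname>Crooks</surname><given-names>VC</given-names></name>
<name><surname>Buckwalter</surname><given-names>JG</given-names></name>
<name><surname>Chiu</surname><given-names>V</given-names></name>
<article-title>Blood pressure levels before dementia</article-title>
<source>Arch Neurol</source>
<year>2005</year>
<month>Jan</month>
<volume>62</volume>
<issue>1</issue>
<fpage>112</fpage>
<lpage>116</lpage>
</element-citation>"""


def naive_text(citation):
    """Concatenate element text, discarding the whitespace between elements."""
    return "".join(chunk.strip() for chunk in citation.itertext())


stripped = naive_text(ET.fromstring(CITATION))
print(stripped)
```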

Element citation

Element citation allows all of the meaningful content of the citation to be tagged but does not allow spacing or punctuation between those elements. PMC Citation Tagging at http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/citations/v3/journals1.html#d3e52 provides this <element-citation>:

<element-citation publication-type="journal" publication-format="print">
<name>
<surname>Petitti</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Crooks</surname>
<given-names>VC</given-names>
</name>
<name>
<surname>Buckwalter</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Chiu</surname>
<given-names>V</given-names>
</name>
<article-title>Blood pressure levels before dementia</article-title>
<source>Arch Neurol</source>
<year>2005</year>
<month>Jan</month>
<volume>62</volume>
<issue>1</issue>
<fpage>112</fpage>
<lpage>116</lpage>
</element-citation>

Using element citation means that if the content of the JATS XML article is to be displayed to human users, software must add the punctuation and spacing needed to make the citation comprehensible to the reader. This can be done before display or in the display tool, but it will be done after the XML is created and generally by an organization other than the organization that created the XML.
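To give a sense of what that downstream software has to do, here is a minimal Python sketch of a formatter for journal citations in the NLM-like display form shown earlier. It is an illustration only: the function name `format_nlm_journal` is hypothetical, and the code assumes that every element it reads is present, which real citations will not guarantee.

```python
import xml.etree.ElementTree as ET

CITATION = """<element-citation publication-type="journal">
<name><surname>Petitti</surname><given-names>DB</given-names></name>
<name><surname>Crooks</surname><given-names>VC</given-names></name>
<name><surname>Buckwalter</surname><given-names>JG</given-names></name>
<name><surname>Chiu</surname><given-names>V</given-names></name>
<article-title>Blood pressure levels before dementia</article-title>
<source>Arch Neurol</source>
<year>2005</year><month>Jan</month>
<volume>62</volume><issue>1</issue>
<fpage>112</fpage><lpage>116</lpage>
</element-citation>"""


def format_nlm_journal(cit):
    """Add the punctuation and spacing that the XML itself does not carry."""
    authors = ", ".join(
        f"{name.findtext('surname')} {name.findtext('given-names')}"
        for name in cit.findall("name")
    )
    get = cit.findtext  # returns None for a missing element (not handled here)
    return (
        f"{authors}. {get('article-title')}. {get('source')}. "
        f"{get('year')} {get('month')};{get('volume')}({get('issue')}):"
        f"{get('fpage')}-{get('lpage')}."
    )


formatted = format_nlm_journal(ET.fromstring(CITATION))
print(formatted)
```

Every punctuation mark in the result comes from the formatter, not from the XML. A formatter written for a different citation style, or fed a citation type it does not anticipate, would produce something else.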

Mixed citation

Mixed citation allows spacing and punctuation to be interleaved with the elements that identify the meaningful content of the citation. PMC Citation Tagging at http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/citations/v3/journals1.html#d3e52 provides this for the same content:

<mixed-citation publication-type="journal" publication-format="print"><name><surname>Petitti</surname><given-names>DB</given-names></name>,
<name><surname>Crooks</surname><given-names>VC</given-names></name>,
<name><surname>Buckwalter</surname><given-names>JG</given-names></name>,
<name><surname>Chiu</surname><given-names>V</given-names></name>.
<article-title>Blood pressure levels before dementia</article-title>.
<source>Arch Neurol</source>. 
<year>2005</year> <month>Jan</month>;<volume>62</volume>(<issue>1</issue>):<fpage>112</fpage>-<lpage>116</lpage>.</mixed-citation> 

Another way to tag the same citation uses <mixed-citation> and replaces <name> with <string-name>, so all spacing and punctuation are present in the XML file:

<mixed-citation publication-type="journal" publication-format="print">
<string-name><surname>Petitti</surname> <given-names>DB</given-names></string-name>,
<string-name><surname>Crooks</surname> <given-names>VC</given-names></string-name>,
<string-name><surname>Buckwalter</surname> <given-names>JG</given-names></string-name>,
<string-name><surname>Chiu</surname> <given-names>V</given-names></string-name>.
<article-title>Blood pressure levels before dementia</article-title>.
<source>Arch Neurol</source>.
<year>2005</year> <month>Jan</month>;<volume>62</volume>(<issue>1</issue>):<fpage>112</fpage>-<lpage>116</lpage>.</mixed-citation>
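With spacing and punctuation carried in the XML, a display tool needs no knowledge of citation style at all. The Python sketch below, with a hypothetical helper named `display_text`, simply joins the text in document order and collapses runs of whitespace, which is the treatment of mixed content described above.

```python
import xml.etree.ElementTree as ET

MIXED = """<mixed-citation publication-type="journal">
<string-name><surname>Petitti</surname> <given-names>DB</given-names></string-name>,
<string-name><surname>Crooks</surname> <given-names>VC</given-names></string-name>,
<string-name><surname>Buckwalter</surname> <given-names>JG</given-names></string-name>,
<string-name><surname>Chiu</surname> <given-names>V</given-names></string-name>.
<article-title>Blood pressure levels before dementia</article-title>.
<source>Arch Neurol</source>.
<year>2005</year> <month>Jan</month>;<volume>62</volume>(<issue>1</issue>):<fpage>112</fpage>-<lpage>116</lpage>.
</mixed-citation>"""


def display_text(citation):
    """Join all text in document order and collapse runs of whitespace."""
    return " ".join("".join(citation.itertext()).split())


shown = display_text(ET.fromstring(MIXED))
print(shown)
```

Note that a line break in the source counts as whitespace, so a break placed between the issue and the first page would surface as a visible space in the output.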

ADVICE:
The differences look subtle, but have significant implications when the citations are rendered for display to human readers.

Element citation:
It is unlikely that a system that displays JATS documents will completely fail to format element citations for display. Using <element-citation> ensures that the most commonly cited document types (journal articles and, to a lesser extent, books) will be consistently formatted for display because the formatting is provided automatically.

However, less frequently cited and less easily formatted citations (personal communications, standards, laws, patents, and web sites) are generally less successfully auto-formatted for display when tagged as <element-citation>. (<mixed-citation> allows the content creator to include punctuation and spacing in the tagged data, making the display of less common citations far more successful.)

When you tag citations with <element-citation> you are trusting that all future systems that are going to format your content for display to human readers will have both the will and the skill to format them “correctly”, and you assume that your concept of “correctly” formatted is compatible with that of the displaying system. I suggest that relying on not just the kindness of strangers but also their competence is a decision to be taken only after considerable thought.

Mixed citation
I suggest that if your documents cite materials other than journal articles and you have the editorial expertise to format your citations as you want them displayed, you put the spacing and punctuation in your <mixed-citation>s. I’ll go further: I suggest tagging names in citations as <string-name> so that you can include in the XML both the sequence in which you want to display the parts of the name and the punctuation you want between them.

Especially if you already have fully formatted citations for production, throwing that information away and trusting others to re-create it in ways that you will like is not something that would make me comfortable. Using <mixed-citation> allows the creator of the XML to take control of the display of the citations.

Misguided Citation Kludges: DO NOT DO THIS

Oh, and a caution against trying to have it both ways: this is a recipe for disaster! What do I mean by having it both ways? Using <element-citation> and including the desired punctuation and spacing inside the elements in the citation. (I know it is dangerous to publish examples of what should not be done, but here is one anyway.) Do NOT do this:

 <element-citation publication-type="journal" publication-format="print"> <name><surname>Petitti </surname><given-names>DB, </given-names></name>
<name><surname>Crooks </surname><given-names>VC, </given-names></name> 
<name><surname>Buckwalter </surname><given-names>JG, </given-names></name>
<name><surname>Chiu </surname><given-names>V. </given-names>
</name> 
<article-title>Blood pressure levels before dementia. </article-title> 
<source>Arch Neurol. </source><year>2005
</year><month>Jan;</month><volume>62
</volume><issue>(1):</issue>
<fpage>112-</fpage><lpage>116.</lpage> </element-citation>

For completeness, I include a version of tagging that does provide punctuation and spacing in a way that is technically allowable, but that is not conventional practice. Formatters are unlikely to expect this, and thus it, too, is likely to result in unfortunate rendering for the human reader. Do NOT do this:

<element-citation publication-type="journal" publication-format="print">
<name><surname>Petitti</surname><given-names>DB</given-names></name>
<comment>, </comment>
<name><surname>Crooks</surname><given-names>VC</given-names></name>
<comment>, </comment>
<name><surname>Buckwalter</surname><given-names>JG</given-names></name>
<comment>, </comment>
<name><surname>Chiu</surname><given-names>V</given-names></name>
<comment>. </comment> 
<article-title>Blood pressure levels before dementia</article-title>
<comment>. </comment> 
<source>Arch Neurol</source><comment>. </comment> 
<year>2005</year><comment> </comment><month>Jan</month>
<comment>; </comment>
<volume>62</volume><comment>(</comment><issue>1</issue>
<comment>):</comment>
<fpage>112</fpage><comment>-</comment><lpage>116</lpage>
<comment>.</comment>
</element-citation>

Varying citation styles increases the risks

  • How much of an author name to include (surname and initials; surname, given name and initials; full name)
  • How to sequence the parts of author names (surname, given name of first author followed by given names and surname of all other authors)
  • How many author names to include in the citation if there are many authors
  • Whether the title of the article is included in the citation
  • What information, if any, is presented in bold or italic
  • Sequence of the information presented, including whether the first item is the name of an author or the title of the work

It is noteworthy that none of the JATS citation models allow the creator to state what citation style was used for the citation, yet knowing the citation style is essential to graceful, or even meaningful, rendering of Element Citations for human readers. To illustrate this, here is a citation in several widely used citation styles tagged with JATS <element-citation>, and rendered using a formatter that expects <element-citation>s in NLM citation format:

APA style:
Expected Display: Petitti, D. B., Crooks, V. C., Buckwalter, J. G., & Chiu, V. (2005, January). Blood pressure levels before dementia. Archive of Neurology, 62(1), 112-116.

Display Using Formatter for NLM Citation: Petitti D. B., Crooks V. C., Buckwalter J. G., Chiu V.. 2005 January;Blood pressure levels before dementia. Archive of Neurology. 62(1):112–116.

Chicago style:
Expected Display: Petitti, Diana B., Valerie C. Crooks, J. Galen Buckwalter, and Vicki Chiu. “Blood pressure levels before dementia.” Archive of Neurology 62, no. 1 (Jan. 2005): 112-116.

Display Using Formatter for NLM Citation: Petitti Diana B.. Valerie C.Crooks; J. GalenBuckwalter; VickiChiu. Blood pressure levels before dementia. Archive of Neurology. 62(1)Jan.. 2005. p. 112–116.

Another Chicago style:
Expected Display: Petitti, Diana B., Valerie C. Crooks, J. Galen Buckwalter, and Vicki Chiu. 2005. “Blood pressure levels before dementia.” Archive of Neurology 62 (1 Jan.): 112-116.

Display Using Formatter for NLM Citation: Petitti Diana B.. Valerie C.Crooks; J. GalenBuckwalter; VickiChiu. 2005 Blood pressure levels before dementia.Archive of Neurology. 62(1)Jan.:p. 112–116.

MLA style:
Expected Display: Petitti, Diana B., Valerie C. Crooks, J. Galen Buckwalter, and Vicki Chiu. “Blood pressure levels before dementia.” Archive of Neurology 62.1 (Jan. 2005): 112-116. Print.

Display Using Formatter for NLM Citation: Petitti Diana B.. Valerie C.Crooks; J. GalenBuckwalter; VickiChiu. Blood pressure levels before dementia. Archive of Neurology. 62(1)Jan.. 2005. p. 112–116. Print.

ADVICE:

Remember that one of the big selling points of XML is that it creates documents that (are intended to) live a long time and that will be used in a variety of systems. Always assume that someone at some time will render your citations for people to read and assume that they will not have access to you or to any of your documentation. This means that the people/systems that format your documents for display may or may not know how you would format your citations, and that they may or may not know what citation style was used. I suggest that the most future-strong tagging for citations is to use Mixed Citation with String Name.

How much to enhance the XML file

How much enriching information, information that does not typically display in the end user version of an article, will the producer add to the XML to make it more useful?

JATS provides many elements and metadata constructions that a user may choose to use or to ignore. Adding tagging costs money in both initial tagging and in Quality Control to ensure correct tagging. While some XML constructs are legally or contractually required for deposit in some archives or repositories, many others are provided because they add value to the documents, both enhancing the user experience and providing data for collection analysis.

Each JATS user must balance the costs and benefits of this “optional” or “enhancing” material.

Alternatives

DECISION:

Alternatives <alternatives> and the related elements <citation-alternatives>, <collab-alternatives>, and <name-alternatives> provide a mechanism for content producers to provide multiple versions of content with the explicit declaration that they are all equivalent. For example, <name-alternatives> may be used to provide a contributor name in two scripts:

<name-alternatives>
<name name-style="eastern" xml:lang="ja-Jpan">
<surname>中西</surname><given-names>秀彦</given-names></name>
<name name-style="western" xml:lang="en">
<surname>Nakanishi</surname><given-names>Hidehiko</given-names></name>
</name-alternatives>

If alternate versions will be provided, the content provider should decide what alternatives will be provided, how they will be identified, and what the receiving system is expected to do with the alternatives.

BACKGROUND:
If there are two names inside a <name-alternatives>, they are two different expressions of the name of the same person. A count of people should include this person only once, although it might be appropriate to display two or even more versions of the name. Similarly, if there are several graphics inside an <alternatives>, these are several versions of the same image. The one most appropriate to the medium should be displayed, but only one should be displayed or printed at a time.

For example, many people think that MathML is the most valuable form in which to exchange math expressions. However, there are many environments in which MathML cannot be rendered for display. A common approach is to provide all mathematical expressions in both MathML and a graphical format.
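One way a receiving system might honor "display only one" is to walk an ordered list of the formats it can render and take the first alternative that matches. The Python sketch below illustrates the idea. The function name `choose_alternative`, the preference lists, and the sample markup are assumptions for this example, not part of JATS.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"

# Assumed sample: one expression offered as MathML, TeX, and a graphic.
SAMPLE = """<alternatives xmlns:mml="http://www.w3.org/1998/Math/MathML"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mml:math><mml:mi>x</mml:mi></mml:math>
  <tex-math>x</tex-math>
  <graphic xlink:href="x.png"/>
</alternatives>"""


def choose_alternative(alternatives, supported):
    """Return the first child matching the renderer's ordered preferences."""
    for tag in supported:
        for child in alternatives:
            if child.tag == tag:
                return child
    return None  # nothing this renderer can show


alts = ET.fromstring(SAMPLE)
web = choose_alternative(alts, [MML + "math", "graphic"])  # MathML-capable viewer
plain = choose_alternative(alts, ["graphic"])              # graphics-only viewer
print(web.tag, plain.tag)
```

The content provider still has to say what the receiving system is expected to prefer; the markup declares the versions equivalent but does not rank them.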

ADVICE:
Providing several versions of a graphic (perhaps a high-resolution version for print, a medium-resolution color version to view on screen, and a thumbnail for quick rendering) adds significant value to your electronic documents in areas where graphics convey significant content. In subject areas where math is important, users may find value in the provision of mathematical expressions in graphical form, MathML, and perhaps TeX or LaTeX. If many of your authors have names that must be simplified for indexing or rendered in scripts other than their native scripts for publication, they are likely to find your publications welcoming if you provide their names in a form familiar to their mothers as well as one comfortable for Abstracting and Indexing services.

Be aware that the creation, tagging, and checking of alternatives is costly; don’t take it on without some thought to the schedule, budget, and benefits.

Accessibility

DECISION:
Many documents include content that is not easily accessible to the visually impaired. This includes graphics, mathematical expressions, and large tables. The creator of JATS documents has the ability, and some have the obligation, to add the information necessary to make their documents accessible. Are you going to make your JATS documents accessible?

BACKGROUND:
Section 508 of the US Rehabilitation Act of 1973 requires federal agencies to provide software and website accessibility to people with disabilities. The guidelines on accessibility provided in Section 508, with some variations, have been adopted by many organizations and made a requirement in many publishing environments.

If you need to provide 508-compliant JATS documents, there may be a lot of “additional” information you need to provide.

Section 508 deals with such visual and handling aspects as screen flicker, electronic forms, scripting, and color. … Since the … Tag Set does not deal with the look and feel or the behavior of a journal article, but rather with the intellectual content, many of the Section 508 guidelines and WCAG 2.0 techniques do not apply directly to this Tag Set. But certain elements and attributes in this Tag Set enable a publisher, archive, author, aggregator, or other interested party to implement Section 508-compliant or WCAG-accessible display of material based on XML documents tagged with this Tag Set. …

For example, the Section 508 website (http://www.section508.gov), under 508 Standards, Subpart B — Technical Standards, § 1194.22 Web-based intranet and internet information and applications, states that “A text equivalent for every non-text element shall be provided”. This Tag Set does not require that each graphic, for example, also have a non-text companion, but there are three enabling elements available within each <graphic> element to make that possible. A <graphic> may contain:

  • an <alt-text> element, to hold a brief description of the graphic for pronouncing software;
  • a <long-desc> element, to hold a full description of the graphic; and/or
  • an <ext-link> element and/or a <uri> to hold a link to an even more complete description of the graphic.

—Excerpted from the JATS 1.1 Tag Library at: http://jats.nlm.nih.gov/archiving/tag-library/1.1/chapter/accessibility.html
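If you do commit to accessibility information, a simple audit can show where it is missing. The sketch below lists graphics that have no non-empty alt-text; the sample fragment, its file names, and the function name are hypothetical, and a real check would probably also look at long descriptions, tables, and math.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fragment: the first graphic has a description, the second does not.
SAMPLE = """
<sec xmlns:xlink="http://www.w3.org/1999/xlink">
 <fig id="a1">
  <graphic xlink:href="fig1.tif">
   <alt-text>Bar chart comparing three treatment groups</alt-text>
   <long-desc>Group A scored highest, followed by group B and group C.</long-desc>
  </graphic>
 </fig>
 <fig id="a2">
  <graphic xlink:href="fig2.tif"/>
 </fig>
</sec>
"""


def graphics_missing_alt_text(tree):
    """Return the xlink:href of every <graphic> without non-empty <alt-text>."""
    missing = []
    for graphic in tree.iter("graphic"):
        text = (graphic.findtext("alt-text") or "").strip()
        if not text:
            missing.append(graphic.get(XLINK_HREF))
    return missing


root = ET.fromstring(SAMPLE.strip())
missing = graphics_missing_alt_text(root)
```

A report like this only shows presence or absence; it says nothing about whether the descriptions are any good, which still needs a human reader.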

ADVICE:
My only advice with respect to providing accessibility information is to be sure that you have the resources to carry through if you start. Readers seem to be far more critical of publications with partial or low-quality accessibility information than of publications with none. (No, I don’t have any hard data to back up that claim; it is a personal observation based on a few conversations.)

Metadata enhancements to an XML file

DECISION:
How much metadata, beyond that necessary to display the articles as traditionally seen in print, should you add to the XML?

BACKGROUND:
Among the information a JATS document can contain are:

  • Funding information, including a prose description of the source of the funding and the name of each funding source, identification of the award, the principal award recipient, and the principal investigator
  • Permissions information relating to the whole article or to parts of it such as figures, tables, or appendices. This information may include copyright statements, copyright year and holder, and licensing information, including the information described in NISO’s Access and License Indicators Recommended Practice[16]
  • DOIs (Digital Object Identifiers) for the article as a whole, for cited articles, and for parts of the article such as tables, figures, and appendices
  • Information on the contributors to the article, which may include their role in creating the article, whether or not they were equal contributors with the other contributors, whether the contributor is deceased at publication time, and identifiers such as ORCID, JST, and NII IDs for each person
  • Identifiers for organizations, especially the affiliations of contributors, including Ringgold and ISNI identifiers
  • Associated articles, including related articles (companion articles on the same topic, letters about this article, news related to the topic, or article(s) this article corrects, amends, or supplements)
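One way to keep track of which of these enhancements a given file actually carries is a small inventory run over each article. The sketch below is an assumption-laden illustration: the sample article, its DOI, and its all-zero ORCID are placeholders, and the paths reflect one common way of tagging these items rather than the only one JATS allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed article; the DOI and ORCID are placeholders.
SAMPLE = """
<article>
 <front>
  <article-meta>
   <article-id pub-id-type="doi">10.9999/example.123</article-id>
   <contrib-group>
    <contrib contrib-type="author">
     <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0000-0000-0000</contrib-id>
     <name><surname>Doe</surname><given-names>Jane</given-names></name>
    </contrib>
   </contrib-group>
   <permissions>
    <copyright-year>2016</copyright-year>
   </permissions>
  </article-meta>
 </front>
</article>
"""

# Label -> path of an element whose presence signals the enhancement.
CHECKS = {
    "doi": ".//article-meta/article-id[@pub-id-type='doi']",
    "orcid": ".//contrib-id[@contrib-id-type='orcid']",
    "license": ".//permissions/license",
    "funding": ".//article-meta/funding-group",
    "related-article": ".//related-article",
}


def enrichment_report(article):
    """Return {label: True/False} for each enhancement in CHECKS."""
    return {label: article.find(path) is not None for label, path in CHECKS.items()}


report = enrichment_report(ET.fromstring(SAMPLE.strip()))
```

Run across a whole collection, a report like this makes it easy to notice when an enhancement that used to be present has quietly stopped appearing.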

ADVICE:

The decision to provide enriching information in JATS should be made carefully. Publishers should consult the appropriate guidelines (see Which Guidelines to Adopt below). Once you start providing DOIs, ORCIDs, licensing information, long descriptions for the visually impaired, or any of the other enhancements that make an article more useful, it is important that you continue to provide them. Reducing the quality of electronic content is likely to be noticed!

Consistency and style

For many constructs (such as associating contributors with their affiliations) JATS provides many ways to encode the same information. This is by design. Since JATS was originally formulated as (and Green is still used largely as) a conversion target, JATS enables an archive to encode the styles we identified as common at the time JATS was written.

However, I encourage users to be as consistent as possible in the way they encode their JATS documents. Consistency makes Quality Assurance easier and, more important, makes the documents easier to use in databases and reduces insignificant but confusing variation for the reader/user of the documents. The only overall advice I want to give all JATS users is: be consistent. There is no value in encoding the same type of information in different ways, and it can lead to misunderstandings, lost documents, poor retrieval, and increased expense.

Contributors and affiliations

DECISION:

JATS provides several ways to encode contributors (authors, illustrators, statisticians, etc.) and several ways to associate affiliations with contributors. Chaos, and costs, are significantly reduced if only one of these styles is selected.

The first decision is: will you standardize how contributors are tagged and how they are associated with their affiliations? The second is: which of the many options will you choose?

BACKGROUND:

These examples are taken from “Tagging Affiliations” in the “Common Tagging Practice” section of the JATS Tag Libraries[11]. A simple way to tag contributors and their affiliations is to put the name of the person and their affiliation(s) inside the Contributor element for each person.

<contrib-group>
 <contrib contrib-type="author" corresp="yes">
 <name>
 <surname>Tanabe</surname>
 <given-names>Lorraine</given-names>
 </name>
 <aff>National Center for Biotechnology Information, National Library of
 Medicine, NIH, 8600 Rockville Pike, Bethesda, MD, USA</aff>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Thom</surname>
 <given-names>Lynne H.</given-names>
 </name>
 <aff>Consolidated Safety Services, 10335 Democracy Lane, Suite 202,
 Fairfax, VA, USA</aff>
 </contrib>
</contrib-group>

A more compact method might be to group all of the contributors who have the same affiliations into one contributor group with that affiliation for the entire group.

 <contrib-group>
 <contrib contrib-type="author">
 <name>
 <surname>Marth</surname>
 <given-names>Gabor T.</given-names>
 </name>
 <xref ref-type="author-notes" rid="FN1">1</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Czabarka</surname>
 <given-names>Eva</given-names>
 </name>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Murvai</surname>
 <given-names>Janos</given-names>
 </name>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Sherry</surname>
 <given-names>Stephen T.</given-names>
 </name>
 </contrib>
 <aff id="FN1">National Center for Biotechnology Information, National Library 
 of Medicine, National Institutes of Health, Bethesda, Maryland 20894</aff>
</contrib-group>

However, since the sequence of contributors is important, that approach, too, might result in repetition of some affiliations. Another option is to list all of the contributors and then (after the contributor group) all of the affiliations, and use cross references to associate them.

 ...
<contrib-group>
 <contrib contrib-type="author">
 <name>
 <surname>Moroz</surname>
 <given-names>Olga V.</given-names>
 </name>
 <xref ref-type="aff" rid="A1">1</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Harkiolaki</surname>
 <given-names>Maria</given-names>
 </name>
 <xref ref-type="aff" rid="A1">1</xref>
 <xref ref-type="aff" rid="A2">2</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Galperin</surname>
 <given-names>Michael</given-names>
 </name>
 <xref ref-type="aff" rid="A3">3</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Vagin</surname>
 <given-names>Alexei A.</given-names>
 </name>
 <xref ref-type="aff" rid="A1">1</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>González-Pacanowska</surname>
 <given-names>Delores</given-names>
 </name>
 <xref ref-type="aff" rid="A4">4</xref>
 </contrib>
 <contrib contrib-type="author">
 <name>
 <surname>Wilson</surname>
 <given-names>Keith S.</given-names>
 </name>
 <xref ref-type="aff" rid="A1">1</xref>
 <xref ref-type="author-notes" rid="FN1">*</xref>
 </contrib>
</contrib-group>
<aff id="A1">
 <label>1</label>Structural Biology Laboratory, Department of Chemistry,
 University of York, Heslington, York YO10 5YW, UK;
</aff>
<aff id="A2">
 <label>2</label>Cancer Research UK Cell Signalling Group and Weatherall
 Institute of Molecular Medicine, Oxford, OX3 9DS, UK;
</aff>
<aff id="A3">
 <label>3</label>National Center for Biotechnology Information, National
 Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894,
 USA;
</aff>
<aff id="A4">
 <label>4</label>Instituto de Parasitología y Biomedicina
 "López-Neyra", C/Ventanilla, 11. 18001 Granada, Spain
</aff>
...
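A receiving system can resolve this cross-referenced style with a simple ID lookup. The sketch below is illustrative only: it uses a shortened version of the example above (two contributors, abbreviated affiliation text, wrapped in <article-meta>), and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the cross-referenced example.
SAMPLE = """
<article-meta>
 <contrib-group>
  <contrib contrib-type="author">
   <name><surname>Moroz</surname><given-names>Olga V.</given-names></name>
   <xref ref-type="aff" rid="A1">1</xref>
  </contrib>
  <contrib contrib-type="author">
   <name><surname>Harkiolaki</surname><given-names>Maria</given-names></name>
   <xref ref-type="aff" rid="A1">1</xref>
   <xref ref-type="aff" rid="A2">2</xref>
  </contrib>
 </contrib-group>
 <aff id="A1"><label>1</label>University of York</aff>
 <aff id="A2"><label>2</label>Weatherall Institute of Molecular Medicine</aff>
</article-meta>
"""


def aff_text(aff):
    """Return the affiliation text without its <label>, whitespace-normalized."""
    parts = [aff.text or ""]
    for child in aff:
        if child.tag != "label":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text that follows the child element
    return " ".join("".join(parts).split())


def affiliations_by_contributor(tree):
    """Map each contributor surname to the affiliations its <xref>s point at."""
    affs = {a.get("id"): aff_text(a) for a in tree.iter("aff") if a.get("id")}
    result = {}
    for contrib in tree.iter("contrib"):
        surname = contrib.findtext("name/surname")
        rids = [x.get("rid") for x in contrib.findall("xref[@ref-type='aff']")]
        result[surname] = [affs[r] for r in rids if r in affs]
    return result


by_contributor = affiliations_by_contributor(ET.fromstring(SAMPLE.strip()))
```

Keying the result on surname is a simplification for the example; two contributors with the same surname would collide, so a real system would key on something more reliable.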

Other options include simply providing the affiliations as part of the text of the article and not tagging them or associating them with the contributors. [Not all options are of equivalent quality; these are all weaker than the options above.]

 <contrib-group>
 <contrib>
 <name>
 <surname>Leverenz</surname>
 <given-names>James B.</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Fishel</surname>
 <given-names>Mark A.</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Peskind</surname>
 <given-names>Elaine R.</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Montine</surname>
 <given-names>Thomas J.</given-names>
 </name>
 <degrees>MD</degrees>
 <degrees>PhD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Nochlin</surname>
 <given-names>David</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Steinbart</surname>
 <given-names>Ellen</given-names>
 </name>
 <degrees>RN, MA</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Raskind</surname>
 <given-names>Murray A.</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Schellenberg</surname>
 <given-names>Gerard D.</given-names>
 </name>
 <degrees>PhD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Bird</surname>
 <given-names>Thomas D.</given-names>
 </name>
 <degrees>MD</degrees>
 </contrib>
 <contrib>
 <name>
 <surname>Tsuang</surname>
 <given-names>Debby</given-names>
 </name>
 <degrees>MD, MS</degrees>
 </contrib>
 <aff>Author Affiliations: Parkinson's Disease (Dr Leverenz), Mental Illness 
 (Drs Leverenz, Peskind, Raskind, Schellenberg, and Tsuang and Ms Steinbart), 
 Research, Education, and Clinical Centers, Veterans Affairs Puget Sound Health 
 Care System, Seattle, Wash; and Departments of Neurology (Drs Leverenz, 
 Fishel, and Bird) and Psychiatry and Behavioral Science (Drs Leverenz, 
 Peskind, Raskind, and Tsuang), Division of Neuropathology, Department of 
 Pathology (Drs Montine and Nochlin), and Division of Gerontology/Geriatrics, 
 Department of Medicine (Dr Schellenberg), University of Washington School 
 of Medicine, Seattle.</aff>
</contrib-group>

ADVICE:

In most publications the association of contributors with their affiliations is important. People will want to search for content written by people with specific organizational affiliations and may even want to find papers written by someone while that person was affiliated with a specific organization. I strongly suggest tagging the names of affiliations and associating them with contributors.

For publications that are likely to have long lists of authors or other contributors many of whom work for the same organization, I suggest putting the affiliation names after all of the contributor names and linking them with ID/IDREFs. However, my only firm advice is to be consistent.

Naming and identification

DECISION:

Everything in a document that is cross-referenced must be named/numbered. In addition, many users provide IDs for things that they think others might want to cross-reference. JATS allows ID identifiers on all elements. How many to identify and how much to reference them is up to each user. JATS also allows users to provide Object Identifiers <object-id> for many portions of the document: abstract, boxed-text, figures, citations (element- and mixed-), and tables among others. Publishers may register Digital Object Identifiers for these portions of the document and put these DOIs into the document using the <object-id> element.

BACKGROUND:
If your documents have references and cross-references to those references, if they have graphics, or if they have internal cross-references, you will have IDs and IDREFs. Someone will decide on a naming/numbering scheme for the cross-references in your JATS documents. The questions are:

  • how much, if any, information will be embedded in the IDs
  • how complex the IDs will be, and
  • who will design the naming scheme

The only rule for identifiers in JATS is that they be legal XML IDs. The only rule in JATS for the content of an <object-id> is that it be plain character data (no emphasis or internal elements).
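A validating parser enforces the ID rules for you, but pipelines that parse without validation may want an explicit check. The following sketch looks for illegal, duplicate, and dangling identifiers. It is a simplification: the pattern accepts only ASCII names (XML permits a wider character range), it treats rid as a space-separated list of targets, and the sample and function name are made up.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ASCII-only approximation of a legal XML ID: letter or underscore first,
# then letters, digits, periods, hyphens, or underscores.
ASCII_ID = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

# Hypothetical fragment with one duplicate, one illegal, and one dangling ID.
SAMPLE = """
<body>
 <p>See <xref ref-type="table" rid="t-a">Table 1</xref>
 and <xref ref-type="fig" rid="f-x">Figure 1</xref>.</p>
 <table-wrap id="t-a"/>
 <table-wrap id="t-a"/>
 <fig id="1fig"/>
</body>
"""


def id_problems(tree):
    """Return a list of human-readable problems with id and rid values."""
    ids = Counter(el.get("id") for el in tree.iter() if el.get("id") is not None)
    problems = []
    for value, count in ids.items():
        if not ASCII_ID.match(value):
            problems.append(f"not a legal ID: {value!r}")
        if count > 1:
            problems.append(f"duplicate ID: {value!r}")
    for el in tree.iter():
        for target in (el.get("rid") or "").split():
            if target not in ids:
                problems.append(f"dangling reference: {target!r}")
    return problems


problems = id_problems(ET.fromstring(SAMPLE.strip()))
```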

ADVICE:

Document a plan for naming the cross-references in your documents. Make it as simple and stupid as possible. If you are capable of totally resisting the impulse to embed any information in the IDs except that this is an ID, you will benefit in the long run. If, like most organizations, you cannot resist that impulse, embed the least amount of information you can stomach in the IDs.

Especially if you create JATS before the document content is frozen, such as before the final edit, resist the impulse to encode document sequence in IDs. Once IDs are assigned, editors should be able to reorganize sections and add or remove cross-references or cross-referenced objects without triggering a requirement that IDs be reassigned. For example, if the IDs on tables are Obj1, Obj2, Obj3, Obj4a, Obj4b, Obj4c, Obj5, etc. and an editor moves table Obj4b into supplementary materials and moves the section that references table Obj2 to the end of the document, the tables will now appear in an order that looks like an error: Obj1, Obj3, Obj4a, Obj4c, Obj5, Obj2. If the table IDs did not reflect the sequence in which they occur in the document, there would be no confusion caused by these editorial changes.
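As one illustration of IDs that carry no information at all, the sketch below assigns random identifiers to objects that lack one and leaves existing IDs alone. The "id" prefix, the use of UUIDs, and the list of element names are assumptions chosen for the example; any scheme that produces unique, legal XML IDs would serve.

```python
import uuid
import xml.etree.ElementTree as ET


def new_opaque_id():
    """Return an ID that encodes neither object type nor document sequence."""
    # The fixed leading letters keep the value a legal XML ID even when the
    # random hex string happens to start with a digit.
    return "id" + uuid.uuid4().hex


def assign_missing_ids(tree, tags=("table-wrap", "fig", "boxed-text")):
    """Give every listed element without an id a new opaque one, in place."""
    for el in tree.iter():
        if el.tag in tags and el.get("id") is None:
            el.set("id", new_opaque_id())


# Hypothetical fragment: one table already has an ID, the other objects do not.
body = ET.fromstring('<body><table-wrap id="keep"/><table-wrap/><fig/><p/></body>')
assign_missing_ids(body)
```

Because the values say nothing about position, moving a table to the end of the document or into supplementary material leaves every ID, and every reference to it, untouched.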

Wait, wait, some of you are shouting at the page. Why weren’t the table IDs T1, T2, T3, or better yet Table-1, Table-2, Table-3, etc.? Surely this is the natural way to identify tables! This is a common naming scheme; many users include the type of object in the ID for the object. It isn’t necessary; software looking at the ID can easily see what it is the ID of. But if you can’t resist, go ahead. But be aware that this means you are storing the same information twice, and this raises the question of what should happen if your data is imperfect and you end up with an ID of “Box-14” on a table that says “Table 3-g”.

Which guidelines to adopt

DECISION:

Which of the tagging guidelines and encoding specifications should you adopt in addition to the rules enforced by the version of the JATS tag set you have selected?

BACKGROUND:

JATS constraint languages (DTD, XSD, RNG) enforce basic structural rules. Interoperability, whether among your business partners or inside your own database, requires a deeper level of consistency than can be expected just based on the tag set grammar. There is a growing list of guidelines that the producers of JATS documents are being encouraged to follow. For any particular user some are probably irrelevant. In some situations the various advice/requirements may be contradictory. Among the possibilities are:

  • Archive guidelines/requirements (e.g., PMC[10], ITHAKA/Portico)
  • JATS4R (JATS for Reuse) (http://jats4r.org/)[12]
  • DataCite (https://www.datacite.org/)[13]
  • Force11, in particular their Markup target area (https://www.force11.org/)[14]
  • Service provider guidelines (Atypon, HighWire, SilverChair, … )
  • NISO RP-15-2013, Recommended Practices for Online Supplemental Journal Article Materials[15]
  • NISO RP-22-2015, Access and License Indicators[16]

Each of these guidelines was created to help a community of users solve a common problem. Each is promulgated by enthusiastic advocates. With the exception of the vendor guidelines, each seems to have advocates who say that everyone should always follow their guidelines.

ADVICE:

Look carefully at the goals of each of these guidelines and consider carefully if they apply to you and your documents. Consider how much value you and the users for whom you are creating content will get from any added costs to follow these guidelines.

Just because a document says that all JATS users should do something does not mean that you must do it. Just because a specification is promulgated as being for the good of all does not mean that you must do it. When you hear that all JATS users should do anything, I suggest you translate that in your mind to “All the JATS users I know and care about should …” whatever is being promoted.

For example, there is great value to many in clearly, precisely, and consistently tagging citations to data. It seems to me that if your content contains (or is likely to contain) a significant number of citations to data sources or if external data sources are key to the credibility of your content then you want to pay close attention to guidelines on how to encode citations of data. However, if you are using JATS to encode information on the bands that will be playing at the venue you manage, don’t waste your energy on DataCite.

Something you don’t need to decide

Many people get excited about the format of the programmatic constraints used to validate the tagging of XML documents. There is a significant amount of noise in the literature about the benefits of W3C XML Schema or RELAX NG over DTDs, and a few articles defending the use of DTDs despite the existence of these newer formats.

In the JATS environment, choice of grammar expression is a minor technical matter, easily changed and of little consequence. JATS tag sets are provided in all three of the common forms: DTD, XSD (W3C XML Schema), and RNG (RELAX NG). You may use only one, or choose to use one for production and a different one for your database or other process. Some tools work best with one form of the constraint language, others work better with, or only with, a different one. Don’t get excited about this; use the one that fits best into your workflow. If some of the people working with your documents use the W3C XML Schema and others use a DTD, it doesn’t matter. If you don’t ask, you probably won’t know which they used. If you tell your vendors which form you want them to use, they will tell you that is what they do. You’ll never know if that’s true, and you don’t need to know.

Put your energy where it matters; make decisions about what should be in your documents and how it should appear in the documents. Your content is far more important than someone’s tools!

Can you ignore all of this?

Of course. You can outsource as much as you want to outsource. If you want to tell a conversion vendor “Make these articles into JATS” and let them do what they do, you can. They will make these decisions, and more. Chances are their choices will be a combination of their typical processing, the requirements of the user/system you have told them is your XML consumer, and what they are doing for some other client. Without guidance from you they will choose the options that are most cost-effective for them (which is not necessarily a bad thing, but might not be the choice you would have made).

ADVICE:

Be informed. Be involved. Make decisions. Do QA.

References

  1. American National Standards Institute/National Information Standards Organization (ANSI/NISO) Z39.96-2015, JATS: Journal Article Tag Suite. Version 1.1. 6 January 2016. Available at: http://www.niso.org/apps/group_public/download.php/15933/z39_96-2015.pdf.
  2. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Archiving and Interchange Tag Set. Available at: http://jats.nlm.nih.gov/archiving/. [The JATS Archiving/Green tag set, including DTD, XSD, and RNG forms of the grammar and extensive documentation in a Tag Library.]
  3. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Publishing Tag Set. Available at: http://jats.nlm.nih.gov/publishing/. [The JATS Publishing/Blue tag set, including DTD, XSD, and RNG forms of the grammar and extensive documentation in a Tag Library.]
  4. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Article Authoring Tag Set. Available at: http://jats.nlm.nih.gov/articleauthoring/. [The JATS Authoring/Pumpkin tag set, including DTD, XSD, and RNG forms of the grammar and extensive documentation in a Tag Library.]
  5. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Book Interchange Tag Set: JATS Extension. Available at: http://jats.nlm.nih.gov/extensions/bits/. [The BITS or Book/Chocolate tag set, including DTD, XSD, and RNG forms of the grammar and extensive documentation in a Tag Library.]
  6. World Wide Web Consortium (W3C) Proposed Recommendation, XHTML™ Modularization 1.1. Section 5.6.2. Tables Modules. 13 February 2006. Available at: http://www.w3.org/TR/2006/PR-xhtml-modularization-20060213/.
  7. Organization for the Advancement of Structured Information Standards (OASIS) Technical Memorandum TR 9901:1999, XML Exchange Table Model Document Type Definition. 29 September 1999. Available at: http://www.oasis-open.org/specs/tm9901.htm.
  8. World Wide Web Consortium (W3C) Recommendation, Mathematical Markup Language (MathML) Version 2.0 (Second Edition). 21 October 2003. Available at: http://www.w3.org/TR/2003/REC-MathML2-20031021/.
  9. World Wide Web Consortium (W3C) Recommendation, Mathematical Markup Language (MathML) Version 3.0 (Second Edition). 10 April 2014. Available at: http://www.w3.org/TR/2014/REC-MathML3-20140410/.
  10. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), PubMed Central (PMC). PubMed Central Tagging Guidelines. Available at: http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html.
  11. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Archiving and Interchange Tag Set, Tagging Affiliations. Available at: http://jats.nlm.nih.gov/archiving/tag-library/1.1/chapter/tag-affiliations.html.
  12. JATS for Reuse (JATS4R). Available at: http://jats4r.org.
  13. DataCite. Available at: https://www.datacite.org/.
  14. Force11. Available at: https://www.force11.org/.
  15. National Information Standards Organization (NISO) RP-15-2013, Recommended Practices for Online Supplemental Journal Article Materials. Version 1.0. 24 January 2013. Available at: http://www.niso.org/apps/group_public/download.php/10055/RP-15-2013_Supplemental_Materials.pdf.
  16. National Information Standards Organization (NISO) RP-22-2015, Access License and Indicators. Version 1.0. 5 January 2015. Available at: http://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf.

Footnotes

  • *Converting existing MathML2 into MathML3 may or may not be easy. The MathML committee says that it should be easy; the changes are that some bugs were fixed and some new features were added. However, there may be complications. MathML2 dropped the @name attribute on the
    element, but it was not removed from the DTD version of the grammar. This means that users who validated MathML to the DTD instead of the XSD version of the grammar may have included @name values, which were not allowed in MathML2 but were DTD valid. The @name attribute was removed from the MathML3 DTD, so those expressions are not DTD valid in MathML3. More important, some of the “bugs” identified in MathML2 and corrected in MathML3 may affect the display of the existing MathML2. Any MathML2 moved to MathML3 and re-rendered may need to be re-proofread to ensure that it still expresses the author’s intention.
  • **The <x> tag is used in the Archiving model to indicate that it contains generated text or punctuation.

 
