General recommendations

Status: PUBLISHED

Context

JATS articles as XML documents

Description

These recommendations apply to an article’s XML as a whole; i.e., they are best practices for setting up a JATS article XML document.

Which JATS DTD?

N.B. All JATS4R recommendations are applicable to NISO JATS Z39.96-2012 (version 1.0 of the tag set) or later.

When recommendations are applicable to only specific version(s) of JATS, this will be indicated. The validator will not produce warnings or failure messages when testing those recommendations against a document that conforms to some other version.

The tool and recommendations are intended to apply to both Publishing (Blue) and Archiving (Green) schemas. When recommendations only apply to one of the schemas, that will be indicated in the recommendations document, and the validator will not produce error or warning messages if the article declares itself as conforming to the other.

The JATS Article Authoring schema (also known as the “Pumpkin” tag set) is not addressed by JATS4R recommendations, because the Authoring schema was designed to allow an author to draft original work in XML. Articles in this schema are in an early stage of the publication process, and are not meant for reuse. While JATS4R recommendations should always be considered “best practice”, and could be adopted by tools that produce articles in the Authoring model, these recommendations are not intended to apply to articles at that stage.

Recommendations

  1. XML document structure. All JATS4R-compliant article XML files must self-identify which version of JATS they conform to, by unambiguously referencing one JATS schema. The JATS schemas are available in three languages, and one (and only one) of these should be referenced. Those three languages are:
    • DTD
    • Relax-NG
    • W3C Schema (XSD)

    There are three different methods that an article XML file can use to refer to its schema. These are given in the following table, which also indicates which method can be used for each language.

    Reference method                          Schema language
                                              DTD    W3C XSD    Relax-NG
    DOCTYPE declaration                       yes    no         no
    xsi:noNamespaceSchemaLocation attribute   no     yes        no
    <?xml-model?> processing instruction      yes    yes        yes

    [[Validator results:

    • Error if the document does not reference a schema.
    • Error if the document references two or more different schemas.
    • Warning if it references two or more schemas, and they agree on the exact version of JATS.]]
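The checks above can be sketched in Python. This is only an illustration, not the JATS4R validator's implementation: the function names are hypothetical, the root element is assumed to be <article>, and only the prolog and the root start tag are inspected. It reports which reference methods are present but leaves the version comparison (error versus warning) to the caller.

```python
import re

# Patterns for the three reference methods listed in the table above.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+article\b")
XSI_RE = re.compile(r"\bxsi:noNamespaceSchemaLocation\s*=")
XML_MODEL_RE = re.compile(r"<\?xml-model\s.*?\?>", re.DOTALL)


def schema_reference_methods(xml_text: str) -> list:
    """Return the reference methods found in the prolog and root start tag."""
    # Simplification: assumes no '>' inside the root element's attribute values.
    root = re.search(r"<article\b[^>]*>", xml_text)
    head = xml_text[: root.end()] if root else xml_text
    found = []
    if DOCTYPE_RE.search(head):
        found.append("DOCTYPE")
    if XSI_RE.search(head):
        found.append("xsi:noNamespaceSchemaLocation")
    # A document may carry more than one xml-model processing instruction.
    found.extend("xml-model" for _ in XML_MODEL_RE.finditer(head))
    return found


def check_schema_references(xml_text: str) -> str:
    """Hypothetical summary: 'ok', an error, or a request to compare versions."""
    methods = schema_reference_methods(xml_text)
    if not methods:
        return "error: the document does not reference a schema"
    if len(methods) > 1:
        return "review: several references found; compare the schemas they name"
    return "ok"
```

A document with a single DOCTYPE declaration yields "ok"; one that adds an xml-model processing instruction is flagged for review, since whether that is an error or a warning depends on whether the two references agree.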
  2. DOCTYPE declarations. Most JATS articles use a DOCTYPE declaration to reference the specific version of the JATS DTD. To comply with JATS4R, when using a DOCTYPE declaration, it must include the public identifier, and the complete, absolute URL of the system identifier. For example:
    <!DOCTYPE article
     PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN"
     "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
    <article dtd-version="1.1" 
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:mml="http://www.w3.org/1998/Math/MathML">
     ...
    </article>

    [[Validator results:

    • Error if the doctype declaration only uses the system identifier.
    • Error if the doctype declaration uses a public or system identifier that doesn’t exactly match one of the official NLM or NISO JATS DTDs.
    • Error if the system identifier is not the full, canonical URL of the JATS DTD.
    • Warning if the referenced version of JATS schema is earlier than JATS 1.0.]]
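As a rough illustration of the first and third checks, the following Python sketch extracts the identifiers from a DOCTYPE declaration. It is an assumption-laden example rather than the validator's code: check_doctype is a hypothetical name, identifiers are assumed to be double-quoted, and the comparison against the list of official NLM and NISO identifiers is left out.

```python
import re

# Matches either 'PUBLIC "public-id" "system-id"' or 'SYSTEM "system-id"'.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+article\s+'
    r'(?:PUBLIC\s+"(?P<public>[^"]*)"\s+|SYSTEM\s+)'
    r'"(?P<system>[^"]*)"'
)


def check_doctype(xml_text: str) -> list:
    """Return a list of problems found in the DOCTYPE declaration."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return ["no DOCTYPE declaration found"]
    problems = []
    if match.group("public") is None:
        problems.append("DOCTYPE uses only a system identifier")
    if not re.match(r"https?://", match.group("system")):
        problems.append("system identifier is not an absolute URL")
    return problems
```

An empty list means the declaration has both identifiers and an absolute system URL; it does not yet confirm that they name an official JATS DTD.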
  3. @noNamespaceSchemaLocation. An article XML file could also reference the W3C XSD version of the JATS schema, by using this attribute. If so, the full URL of the XSD file must be used. For example:
    <article dtd-version="1.1"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:mml="http://www.w3.org/1998/Math/MathML"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation=
     "http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1.xsd">
     ...
    </article>
    

    [[Validator results:

    • Error if the value of the attribute doesn’t exactly match the full, canonical URL of the W3C XSD version of an official NLM or NISO JATS schema.
    • Warning if the referenced version of JATS schema is earlier than JATS 1.0.]]
  4. xml-model processing instruction. An article can use the <?xml-model?> processing instruction (see the specification) to refer to any of the three languages of JATS. When using the <?xml-model?> processing instruction, place it before the root element. The processing instruction MUST have an @href pseudo-attribute, and the content of this pseudo-attribute must be the absolute, complete URL of the schema. For example, using the processing instruction to refer to the DTD:
    <?xml-model type="application/xml-dtd"
     href="http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd"?>
    <article dtd-version="1.1" 
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:mml="http://www.w3.org/1998/Math/MathML">
     ...
    </article>
    

    The following is an example of using the xml-model processing instruction to reference the Relax-NG version of JATS:

    <?xml-model schematypens="http://relaxng.org/ns/structure/1.0"
     href="http://jats.nlm.nih.gov/publishing/1.1/rng/JATS-journalpublishing1.rng"?>
    <article dtd-version="1.1" 
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:mml="http://www.w3.org/1998/Math/MathML">
     ...
    </article>
    

    And finally, to reference the W3C XSD version:

    <?xml-model schematypens="http://www.w3.org/2001/XMLSchema"
     href="http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1.xsd"?>
    <article dtd-version="1.1" 
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:mml="http://www.w3.org/1998/Math/MathML">
     ...
    </article>
    

    [[Validator results:

    • Error if the @type or @schematypens pseudo-attributes don’t match the schema language, i.e.:
      • If @href refers to a DTD, then @type must contain “application/xml-dtd”
      • If @href refers to a Relax-NG file, then @schematypens must have the value “http://relaxng.org/ns/structure/1.0”
      • If @href refers to a W3C XSD, then @schematypens must have the value “http://www.w3.org/2001/XMLSchema”
    • Error if the value of @href doesn’t exactly match the full, canonical URL of the corresponding language-version of an official NLM or NISO JATS schema.
    • Warning if the referenced version of JATS schema is earlier than JATS 1.0.]]
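The pseudo-attribute rules above could be checked along these lines. This Python sketch makes several assumptions that are not part of the recommendation: check_xml_model is a hypothetical name, pseudo-attribute values are double-quoted, only the first processing instruction is examined, and the schema language is inferred from the file extension of @href (which holds for the URLs in the examples above).

```python
import re

PI_RE = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
PSEUDO_ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

# Expected pseudo-attribute for each schema language, keyed by file extension.
EXPECTED = {
    ".dtd": ("type", "application/xml-dtd"),
    ".rng": ("schematypens", "http://relaxng.org/ns/structure/1.0"),
    ".xsd": ("schematypens", "http://www.w3.org/2001/XMLSchema"),
}


def check_xml_model(xml_text: str) -> list:
    """Return a list of problems found in the first xml-model instruction."""
    match = PI_RE.search(xml_text)
    if match is None:
        return ["no xml-model processing instruction found"]
    attrs = dict(PSEUDO_ATTR_RE.findall(match.group(1)))
    href = attrs.get("href")
    if not href:
        return ["missing href pseudo-attribute"]
    problems = []
    if not re.match(r"https?://", href):
        problems.append("href is not an absolute URL")
    for suffix, (name, value) in EXPECTED.items():
        if href.endswith(suffix) and attrs.get(name) != value:
            problems.append(f"{name} must be '{value}' for a {suffix} href")
    return problems
```

The three examples above each produce an empty list; an instruction that pairs a Relax-NG @href with type="application/xml-dtd" produces one problem.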
  5. Character encoding. All JATS XML documents should be encoded either as UTF-8 or UTF-16. Note that a byte order mark (BOM)
    • May be present if the encoding is UTF-8
    • Must be present if the encoding is UTF-16

    Also note that US-ASCII, which is commonly used, is a subset of UTF-8, so files that are restricted to US-ASCII are perfectly fine, but if an XML declaration is given, the encoding should be specified as “utf-8”.

    [[Validator results:

    • Error if the encoding is not UTF-8 or UTF-16.
    • Error if the encoding is UTF-16, and there is no BOM. In fact, note that the current implementation of the validator will not correctly read any XML file not in UTF-8 or UTF-16, so the error message produced will simply indicate that the file is not readable.]]
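A minimal sketch of the encoding rules, assuming the file is available as raw bytes. detect_encoding is a hypothetical helper, not part of the validator. It treats a NUL character as a sign of UTF-16 without a BOM, because U+0000 is not allowed in XML and so cannot occur in a valid UTF-8 document.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Return 'utf-8' or 'utf-16', or raise ValueError if neither fits."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # UTF-16 must carry a BOM; Python's 'utf-16' codec reads and strips it.
        codec, name = "utf-16", "utf-16"
    else:
        # A BOM is optional for UTF-8; 'utf-8-sig' strips it when present.
        codec, name = "utf-8-sig", "utf-8"
    try:
        text = data.decode(codec)
    except UnicodeDecodeError as exc:
        raise ValueError("file is not readable as UTF-8 or UTF-16") from exc
    if "\x00" in text:
        # Typical symptom of UTF-16 (or UTF-32) content that was misread.
        raise ValueError("NUL characters found; possibly UTF-16 without a BOM")
    return name
```

A plain US-ASCII file is reported as "utf-8", consistent with the note above that US-ASCII is a subset of UTF-8.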
  6. Character entity references. Do not use named character entity references to represent special characters (see rationale, below). Instead, use either a non-escaped form of the character or a numeric character reference.
    • Rationale and examples:
    • The JATS DTDs define a large set of character entity references (CERs) that they inherit from MathML. (See MathML Chapter 6, Characters, Entities and Fonts, for a list.) These appear in an XML document as, for example, “&copy;”, and are translated by the XML parser into Unicode code points. To correctly parse XML files that use CERs, a tool would be required to fetch and parse the entire DTD. Because configuring a system to correctly fetch a DTD, either from the Internet or from an internally cached version, can be quite burdensome, JATS4R recommends that instance documents do not use CERs. Note that this does not include the five “built-in” XML character entities, “&lt;”, “&gt;”, “&apos;”, “&quot;” and “&amp;”. All XML-compliant parsers are able to expand these without reference to the DTDs.
    • So, for example, the following copyright-statement would not be JATS4R-compliant:
    <copyright-statement>&copy; 2014 Surname et al.</copyright-statement>
    

    Instead, the copyright symbol should be included either directly in the document in non-escaped form (preferred):

    <copyright-statement>© 2014 Surname et al.</copyright-statement>

    or, as a numeric character reference:

     <copyright-statement>&#xA9; 2014 Surname et al.</copyright-statement>

    [[Validator result: Error if named character entity references are used]]
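One way to find offending references before parsing is a scan of the raw text. The Python sketch below is an illustration only: named_entity_references is a hypothetical helper, and it does not skip comments or CDATA sections, where an ampersand sequence is not a real reference.

```python
import re

# The five entities every XML parser expands without a DTD.
BUILT_IN = {"lt", "gt", "amp", "apos", "quot"}

# A named reference starts with a letter or underscore; numeric references
# such as &#xA9; start with '#' and therefore never match.
ENTITY_RE = re.compile(r"&([A-Za-z_][\w.-]*);")


def named_entity_references(xml_text: str) -> list:
    """Return the names of all non-built-in entity references, in order."""
    return [name for name in ENTITY_RE.findall(xml_text) if name not in BUILT_IN]
```

Applied to the three copyright-statement examples above, only the first (the one using the named reference) returns a non-empty list.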

  7. @dtd-version. JATS4R articles should use this attribute on <article>, with the correct value for the version of JATS that they are using.
    [[Validator results:

    • Warning if this attribute is missing
    • Error if the value of this attribute doesn’t match that required by the referenced version of JATS.]]
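The @dtd-version check might look like the following Python sketch. It rests on an assumption that is not stated in the recommendation: that the version can be read from a path segment of the schema URL (for example, the "1.1" in the URLs used above). check_dtd_version is a hypothetical name, and the schema URL is passed in separately rather than extracted from the document.

```python
import re
import xml.etree.ElementTree as ET

# Assumed URL layout: the version is a path segment such as /1.1/ or /1.2d1/.
VERSION_IN_URL_RE = re.compile(r"/(\d+\.\d+(?:d\d+)?)/")


def check_dtd_version(xml_text: str, schema_url: str) -> str:
    """Compare @dtd-version on the root element with the schema URL's version."""
    root = ET.fromstring(xml_text)
    declared = root.get("dtd-version")
    if declared is None:
        return "warning: dtd-version attribute is missing"
    match = VERSION_IN_URL_RE.search(schema_url)
    if match is None:
        return "unknown: no version found in the schema URL"
    if match.group(1) != declared:
        return f"error: dtd-version is {declared}, schema is {match.group(1)}"
    return "ok"
```

For instance, an article with dtd-version="1.0" that references the 1.1 DTD URL would be reported as an error.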