General XML recommendation

Status: Published
Version: 1.0
License: this recommendation document is licensed under CC BY-ND 2.0 UK

Context

JATS articles as XML documents

Description

This recommendation applies to an article’s XML as a whole; i.e. best practices for setting up a JATS article XML document.

Which JATS DTD?

Most JATS4R recommendations are applicable to NLM 3.0 or later. However, in certain cases, a more current version of the JATS DTD (NISO JATS Z39.96-2015) may be required to follow the recommendation. In such cases, the recommendation will clearly indicate which version of JATS is needed.

When recommendations are applicable to only specific version(s) of JATS, this will be indicated. The validator will not produce warnings or failure messages when testing those recommendations against a document that conforms to some other version.

The tool and recommendations are intended to apply to both Publishing (Blue) and Archiving (Green) schemas. When recommendations only apply to one of the schemas, that will be indicated in the recommendations document, and the validator will not produce error or warning messages if the article declares itself as conforming to the other.

The JATS Article Authoring schema (also known as the “Pumpkin” tag set) is not addressed by JATS4R recommendations, because the Authoring schema was designed to allow an author to draft original work in XML. Articles in this schema are in an early stage of the publication process, and are not meant for reuse. While JATS4R recommendations should always be considered ‘best practice’, and could be adopted by tools that produce articles in the Authoring model, these recommendations are not intended to apply to articles at that stage.

Recommendation

Reference method                           Schema language
                                           DTD    W3C XSD    Relax-NG
DOCTYPE declaration                        yes    -          -
xsi:noNamespaceSchemaLocation attribute    -      yes        -
<?xml-model?> processing instruction       yes    yes        yes
  1. XML document structure. All JATS4R-compliant article XML files must self-identify which version of JATS they conform to, by unambiguously referencing one JATS schema. The JATS schemas are available in three languages, and one (and only one) of these should be referenced. Those three languages are:
    DTD
    Relax-NG
    W3C Schema (XSD)

    There are three different methods that an article XML file can use to refer to its schema. These are given in the table above, which also indicates which method can be used for each language.

    [[Validator tool result: if the document does not reference a schema ERROR]]

    [[Validator tool result: if the document references two or more different schemas ERROR]]

    [[Validator tool result: if it references two or more schemas and they agree on the exact version of JATS WARNING]]
  2. DOCTYPE declarations. Most JATS articles use a DOCTYPE declaration to reference the specific version of the JATS DTD. To comply with JATS4R, a DOCTYPE declaration must include both the public identifier and a system identifier that is the complete, absolute URL of the DTD. See example 1.

    [[Validator tool result: if the doctype declaration only uses the system identifier ERROR]]

    [[Validator tool result: if the doctype declaration uses a public or system identifier that doesn’t exactly match one of the official NLM or NISO JATS DTDs ERROR]]

    [[Validator tool result: if the system identifier is not the full, canonical URL of the JATS DTD ERROR]]

    [[Validator tool result: if the referenced version of JATS schema is earlier than JATS 1.0 WARNING]]
  3. @noNamespaceSchemaLocation. An article XML file can also reference the W3C XSD version of the JATS schema by using this attribute. If so, the full URL of the XSD file must be used. See example 2.

    [[Validator tool result: if the value of the attribute doesn’t exactly match the full, canonical URL of the W3C XSD version of an official NLM or NISO JATS schema ERROR]]

    [[Validator tool result: if the referenced version of JATS schema is earlier than JATS 1.0 WARNING]]
  4. xml-model processing instruction. An article can use the <?xml-model?> processing instruction (see the specification) to refer to any of the three languages of JATS. When using the <?xml-model?> processing instruction, place it before the root element. The processing instruction MUST have an @href pseudo-attribute, and the content of this pseudo-attribute must be the absolute, complete URL of the schema. See example 3 for using the processing instruction to reference the DTD, example 4 to reference the Relax-NG version of JATS, and example 5 to reference the W3C XSD version.

    [[Validator tool result: if @href refers to a DTD, then @type must contain “application/xml-dtd” ERROR]]

    [[Validator tool result: if @href refers to a Relax-NG file, then @schematypens must have the value “http://relaxng.org/ns/structure/1.0” ERROR]]

    [[Validator tool result: if @href refers to a W3C XSD, then @schematypens must have the value “http://www.w3.org/2001/XMLSchema” ERROR]]

    [[Validator tool result: if the value of @href doesn’t exactly match the full, canonical URL of the corresponding language-version of an official NLM or NISO JATS schema ERROR]]

    [[Validator tool result: if the referenced version of JATS schema is earlier than JATS 1.0 WARNING]]
  5. Character encoding. All JATS XML documents should be encoded either as UTF-8 or UTF-16. Note that a byte order mark (BOM) MAY be present if the encoding is UTF-8 but MUST be present if the encoding is UTF-16.

    Also note that US-ASCII, which is commonly used, is a subset of UTF-8, so files that are restricted to US-ASCII are perfectly fine, but if an XML declaration is given, the encoding should be specified as “utf-8”.

    [[Validator tool result: if the encoding is not UTF-8 or UTF-16 ERROR]]

    [[Validator tool result: if the encoding is UTF-16, and there is no BOM. In fact, note that the current implementation of the validator will not correctly read any XML file not in UTF-8 or UTF-16, so the error message produced will simply indicate that the file is not readable ERROR]]
  6. Character entity references. Do not use named character entity references to represent special characters (see rationale, below). Instead, use either a non-escaped form of the character or a numeric character reference.

    Rationale and examples:
    The JATS DTDs define a large set of character entity references (CERs) that they inherit from MathML (see MathML Chapter 6, Characters, Entities and Fonts, for a list). These appear in an XML document as, for example, “&copy;”, and are translated by the XML parser into Unicode code points. To correctly parse XML files that use CERs, a tool must fetch and parse the entire DTD. Because configuring a system to correctly fetch a DTD, either from the Internet or from an internally cached version, can be quite burdensome, JATS4R recommends that instance documents do not use CERs. Note that this does not include the five “built-in” XML character entities, “&lt;”, “&gt;”, “&apos;”, “&quot;” and “&amp;”. All XML-compliant parsers are able to expand these without reference to the DTDs.

    So, for example, the following copyright-statement would not be JATS4R-compliant:
<copyright-statement>&copy; 2014 Surname et al.</copyright-statement>

Instead, the copyright symbol should be included either directly in the document in non-escaped form (preferred):

<copyright-statement>© 2014 Surname et al.</copyright-statement>

or, as a numeric character reference: 

<copyright-statement>&#xA9; 2014 Surname et al.</copyright-statement>

[[Validator tool result: if named character entity references are used ERROR]]

7. @dtd-version. JATS4R articles should use this attribute on <article>, with the correct value for the version of JATS that they are using.

[[Validator tool result: if this attribute is missing WARNING]]

[[Validator tool result: if the value of this attribute doesn’t match that required by the referenced version of JATS ERROR]]
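
The check described in item 6 (named character entity references) could be approximated with a short script. The following is only a minimal sketch in Python, not the JATS4R validator's implementation: the function name `find_named_entities` is hypothetical, and the sketch scans the raw text, so it would also flag entity-like strings inside comments or CDATA sections.

```python
import re

# The five entities that every XML parser expands without a DTD.
BUILT_IN = {"lt", "gt", "amp", "apos", "quot"}

# Matches &name; but not numeric references such as &#xA9; or &#169;,
# because the character after "&" must be a name-start character.
NAMED_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


def find_named_entities(xml_text):
    """Return the names of named entity references that are not built in."""
    return [
        match.group(1)
        for match in NAMED_REF.finditer(xml_text)
        if match.group(1) not in BUILT_IN
    ]
```

Under these assumptions, a non-empty result corresponds to the ERROR case for item 6, while a document that uses only literal characters, numeric character references, or the five built-in entities yields an empty list.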

Examples

Example 1

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd"> 
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"> 
... 
</article>
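
As an illustration of the DOCTYPE rules in item 2, the following Python sketch inspects a declaration like the one in Example 1. It is a simplified approximation under stated assumptions: the function names are hypothetical, only double-quoted identifiers are recognised, and the identifiers are not compared against the list of official NLM or NISO JATS DTDs, which a real validator would need.

```python
import re
from urllib.parse import urlparse

# Assumption: identifiers are written with double quotes, as in Example 1.
PUBLIC_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"')
SYSTEM_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]*)"')


def is_absolute_url(value):
    """True if value is an http(s) URL with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_doctype(xml_text):
    """Classify the DOCTYPE declaration of a document given as a string."""
    match = PUBLIC_RE.search(xml_text)
    if match:
        # public_id is extracted but not checked against official values here.
        public_id, system_id = match.groups()
        if not is_absolute_url(system_id):
            return "ERROR: system identifier is not an absolute URL"
        return "OK"
    if SYSTEM_RE.search(xml_text):
        return "ERROR: DOCTYPE uses only a system identifier"
    return "no DOCTYPE declaration found"
```

For the declaration in Example 1 this sketch returns "OK"; a declaration that uses the SYSTEM keyword alone, or a relative path as the system identifier, is reported as an error.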

Example 2

<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation= "http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1.xsd"> 
... 
</article>
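
To show how a tool might read the attribute used in Example 2, here is a minimal Python sketch based on the standard library's ElementTree parser. The function name `xsd_reference` is hypothetical, and the sketch only retrieves the value from the root element; comparing it with the canonical URL of an official JATS XSD is left out.

```python
import xml.etree.ElementTree as ET

# Namespace that the xsi prefix must be bound to.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def xsd_reference(xml_text):
    """Return the xsi:noNamespaceSchemaLocation value of the root element,
    or None if the attribute is absent."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    return root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)
```

Looking the attribute up by namespace rather than by the literal prefix "xsi" means the sketch still works if a document binds that namespace to a different prefix.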

Example 3: Using the processing instruction to refer to the DTD

<?xml-model type="application/xml-dtd" href="http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd"?> 
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"> 
... 
</article>

Example 4: Using the xml-model processing instruction to reference the Relax-NG version of JATS

<?xml-model schematypens="http://relaxng.org/ns/structure/1.0" href="http://jats.nlm.nih.gov/publishing/1.1/rng/JATS-journalpublishing1.rng"?> 
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"> 
... 
</article>

Example 5: Using the xml-model processing instruction to reference the W3C XSD version of JATS

<?xml-model schematypens="http://www.w3.org/2001/XMLSchema" href="http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1.xsd"?> 
<article dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"> 
... 
</article>
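
The pseudo-attribute rules for the xml-model processing instruction (item 4, Examples 3 to 5) can be sketched in Python as follows. This is an illustrative approximation, not the validator's code: the function name `xml_model_references` is hypothetical, the instruction is located with a regular expression rather than a full XML parser, and the @href value is not compared with the official schema URLs.

```python
import re

# Matches an xml-model processing instruction and captures its content.
PI_RE = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
# Matches name="value" or name='value' pseudo-attributes.
ATTR_RE = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

# Values of @schematypens named in the recommendation.
LANGUAGE_BY_NS = {
    "http://relaxng.org/ns/structure/1.0": "Relax-NG",
    "http://www.w3.org/2001/XMLSchema": "W3C XSD",
}


def xml_model_references(xml_text):
    """Return a list of (schema language, href) pairs, one per xml-model PI."""
    references = []
    for pi in PI_RE.finditer(xml_text):
        attrs = {name: dq or sq for name, dq, sq in ATTR_RE.findall(pi.group(1))}
        if attrs.get("type") == "application/xml-dtd":
            language = "DTD"
        else:
            language = LANGUAGE_BY_NS.get(attrs.get("schematypens"), "unknown")
        references.append((language, attrs.get("href")))
    return references
```

A result with more than one entry would feed into the checks in item 1, which distinguish references to different schemas (ERROR) from references that agree on the exact version of JATS (WARNING).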
Updated on October 18, 2023
