openaccess_epub.article package

Module contents

openaccess_epub.article defines an abstract article representation.

The openaccess_epub.article module contains the Article class, which is instantiated once per article XML file. Basic structural elements of the article are collected, and metadata is parsed according to DTD rules into a Python data structure. The Article class forms a basic unit both in analyzing articles and in converting them to EPUB.

class openaccess_epub.article.Article(xml_file, validation=True)

Abstract class for journal article; parses XML to data structure.

The Article class operates on an abstract level to execute some common processing tasks for all journal articles. It first parses the journal article XML to an lxml.etree structure, then inspects the file to discover the DTD and version under which the article was published. It optionally validates the article against its DTD and then, if validation succeeds, recursively parses all metadata into a tree data structure. This allows easy access to nested elements through attribute paths such as “Article.metadata.front.journal_meta.publisher”.

Parameters:
  • xml_file (str) – Path to the XML file to parse.
  • validation (bool, optional) – DTD validation is performed when this evaluates to True; its use is strongly advised.
doi

str

The full DOI string for the article.

dtd

lxml.etree.DTD object

The parsed DTD object used for validation and metadata parsing.

dtd_name

str

The name of the DTD, such as “JPTS”.

dtd_version

float

The version of the DTD, such as 3.0.

metadata

namedtuple object

The metadata attribute is a tree structure of nested namedtuples. For JPTS, the metadata holds two attributes, ‘front’ and ‘back’. Each namedtuple under metadata has an attribute for every allowed child element defined by the DTD, a dictionary of XML attributes held in the ‘attrs’ attribute, and a ‘node’ attribute for the lxml.etree Element itself. If any would-be attribute name conflicts with a Python keyword, it is prefixed with ‘l’.
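The nested-namedtuple layout described above can be illustrated with a small sketch. This is not the package's own implementation: safe_name and to_namedtuple are hypothetical helpers, the standard-library xml.etree.ElementTree stands in for lxml.etree, hyphens in tag names are assumed to map to underscores, and fields are created only for children actually present rather than for every child the DTD allows.

```python
import keyword
import xml.etree.ElementTree as ET
from collections import namedtuple


def safe_name(tag):
    """Turn an XML tag into a valid namedtuple field name (hypothetical helper)."""
    name = tag.replace('-', '_')
    # Python keywords cannot be used as field names, so prefix them with 'l'.
    if keyword.iskeyword(name):
        name = 'l' + name
    return name


def to_namedtuple(element):
    """Recursively convert an element into nested namedtuples (sketch only)."""
    children = {}
    for child in element:
        # Simplification: a repeated child tag overwrites the earlier one.
        children[safe_name(child.tag)] = to_namedtuple(child)
    fields = ['attrs', 'node'] + sorted(children)
    node_type = namedtuple(safe_name(element.tag), fields)
    return node_type(attrs=dict(element.attrib), node=element, **children)


xml = """
<front>
  <journal-meta>
    <publisher><publisher-name>Example Press</publisher-name></publisher>
  </journal-meta>
  <article-meta class="demo"/>
</front>
"""
front = to_namedtuple(ET.fromstring(xml))
print(front.journal_meta.publisher.publisher_name.node.text)
print(front.article_meta.attrs)
```

Each level exposes its children as attributes, while ‘attrs’ and ‘node’ give access to the XML attributes and the underlying element.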

publisher

str

A standardized, concise name for the publisher of the article, such as “PLoS” or “Frontiers”.

get_DOI()

This method defines how the Article tries to detect the DOI.

It attempts to determine the article DOI string by DTD-appropriate inspection of the article metadata. This method should be made as flexible as necessary to properly collect the DOI for any XML publishing specification.

Returns: doi (str or None) – The full (publisher/article) DOI string for the article, or None on failure.
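As an illustration of DTD-appropriate inspection, the sketch below looks for the DOI where journal article XML of this kind commonly stores it: an article-id element with pub-id-type="doi" under front/article-meta. The function name find_doi is hypothetical, the standard-library ElementTree stands in for lxml.etree, and the DOI in the sample is a made-up placeholder; the package's own get_DOI may work differently.

```python
import xml.etree.ElementTree as ET


def find_doi(root):
    """Return the DOI text from an article tree, or None if it is absent."""
    # Assumed location of the DOI for this sketch.
    path = "./front/article-meta/article-id[@pub-id-type='doi']"
    node = root.find(path)
    if node is None or not node.text:
        return None
    return node.text.strip()


# Minimal sample document; the DOI value is a placeholder, not a real article.
sample = ET.fromstring(
    "<article><front><article-meta>"
    "<article-id pub-id-type='doi'>10.1371/journal.example.0000000</article-id>"
    "</article-meta></front></article>"
)
print(find_doi(sample))
```

Returning None rather than raising mirrors the documented behavior of reporting failure through the return value.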
get_publisher()

This method defines how the Article tries to determine the publisher of the article.

This method relies on the success of the get_DOI method to fetch the full DOI for the article. It then takes the DOI prefix, which corresponds to the publisher, and uses it to attempt to load the correct publisher-specific code. This may fail: if the DOI prefix is not mapped to a code file, if it is mapped but the code file cannot be located, or if the mapped code file is malformed, this method will log an informative error message and return None. This method will not try to infer the publisher from any metadata other than the DOI of the article.

Returns: publisher (Publisher instance or None)
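The prefix lookup that get_publisher describes can be sketched as follows. The mapping table and the function publisher_from_doi are hypothetical, and the sketch returns only a publisher name, whereas the real method loads publisher-specific code and returns a Publisher instance.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical mapping of DOI prefix to publisher name, for illustration only.
PUBLISHER_BY_PREFIX = {
    '10.1371': 'PLoS',
    '10.3389': 'Frontiers',
}


def publisher_from_doi(doi):
    """Map a full DOI to a publisher name, or return None on failure."""
    if doi is None:
        log.error('No DOI available; cannot determine the publisher')
        return None
    # The prefix is everything before the first slash.
    prefix = doi.split('/', 1)[0]
    name = PUBLISHER_BY_PREFIX.get(prefix)
    if name is None:
        log.error('No publisher is mapped to DOI prefix %s', prefix)
    return name


print(publisher_from_doi('10.1371/journal.example.0000000'))
```

Because the DOI is the only input, an unmapped prefix simply yields None and no other metadata is consulted.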
openaccess_epub.article.dtd_tuple

alias of DTD_Tuple

openaccess_epub.article.iskeyword()

x.__contains__(y) <==> y in x.