openaccess_epub.utils package

Submodules

openaccess_epub.utils.css module

For the creation of default CSS files.

openaccess_epub.utils.element_methods module

Advanced XML manipulation methods built on core lxml functionality.

openaccess_epub.utils.element_methods.append_new_text(destination, text, join_str=None)

This method provides the functionality of adding text appropriately underneath the destination node. This will be either to the destination’s text attribute or to the tail attribute of the last child.
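The text/tail rule described above can be sketched as follows. This is a minimal illustration using the standard library's xml.etree.ElementTree, which shares lxml's .text/.tail model, and not the package's own code. The name append_text_sketch is hypothetical, and join_str is assumed here to be a separator placed between existing and new text.

```python
import xml.etree.ElementTree as ET


def append_text_sketch(destination, text, join_str=None):
    """Add text under destination (illustrative sketch, not the real method)."""
    if join_str is None:
        join_str = ''
    if len(destination) == 0:
        # No children: the text belongs in the element's own .text
        existing = destination.text
        destination.text = text if existing is None else existing + join_str + text
    else:
        # Children exist: the text follows the last child, so it goes in its .tail
        last = destination[-1]
        existing = last.tail
        last.tail = text if existing is None else existing + join_str + text


p = ET.fromstring('<p>Hello <b>bold</b></p>')
append_text_sketch(p, ' world')
result = ET.tostring(p, encoding='unicode')

empty = ET.Element('p')
append_text_sketch(empty, 'a')
append_text_sketch(empty, 'b', join_str=' ')
```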

openaccess_epub.utils.element_methods.append_all_below(destination, source, join_str=None)

Compared to xml.dom.minidom, lxml’s treatment of text as .text and .tail attributes of elements is an oddity. It can even be a little frustrating when one is attempting to copy everything underneath some element to another element; one has to write in extra code to handle the text. This method provides the functionality of adding everything underneath the source element, in preserved order, to the destination element.
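The extra text handling mentioned above might look roughly like this. The sketch uses the standard library's xml.etree.ElementTree in place of lxml, copies the children rather than moving them, and leaves out join_str; the name append_all_below_sketch is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def append_all_below_sketch(destination, source):
    """Copy source's leading text and children, in order, under destination."""
    if source.text:
        if len(destination) == 0:
            # Destination has no children: extend its own .text
            destination.text = (destination.text or '') + source.text
        else:
            # Otherwise the text must follow the last existing child
            last = destination[-1]
            last.tail = (last.tail or '') + source.text
    for child in source:
        # deepcopy keeps each child's subtree together with its .tail text
        destination.append(copy.deepcopy(child))


src = ET.fromstring('<a>x<b>y</b>z</a>')
dst = ET.fromstring('<d>start<c>q</c>mid</d>')
append_all_below_sketch(dst, src)
combined = ET.tostring(dst, encoding='unicode')
```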

openaccess_epub.utils.element_methods.all_text(element)

Extends lxml’s functionality by finding and concatenating all text data that exists one level immediately underneath the given element. Unlike etree.tostring(element, method='text'), this will not recursively walk the entire underlying tree. It merely combines the element’s text attribute with the tail attribute of each child.
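A short sketch of the difference between shallow and recursive text collection, assuming the standard library's xml.etree.ElementTree as a stand-in for lxml. The name all_text_sketch is hypothetical.

```python
import xml.etree.ElementTree as ET


def all_text_sketch(element):
    """Join element.text with the .tail of each direct child (no recursion)."""
    parts = [element.text or '']
    for child in element:
        parts.append(child.tail or '')
    return ''.join(parts)


p = ET.fromstring('<p>one <b>two</b> three <i>four</i> five</p>')
shallow = all_text_sketch(p)      # skips the text inside <b> and <i>
deep = ''.join(p.itertext())      # recursive walk, for comparison
```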

openaccess_epub.utils.element_methods.comment(node)

Converts the node received to a comment, in place, and will also return the comment element.

openaccess_epub.utils.element_methods.elevate_element(node, adopt_name=None, adopt_attrs=None)

This method serves a specialized function. It is most often needed for block-level elements that may not be contained within paragraph elements, but which appear in the source document as inline elements (inside a paragraph element).

It would be inappropriate to merely insert the block element at the level of the parent, since this disorders the document by placing the child out of place with its siblings. So this method will elevate the node to the parent level and also create a new parent to adopt all of the siblings after the elevated child.

The adopting parent node will have identical attributes and tag name as the original parent unless specified otherwise.
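The elevation described above can be sketched as below. Because the standard library's xml.etree.ElementTree has no parent pointers, this illustration takes the grandparent and parent explicitly, which differs from the real signature. The name elevate_sketch is hypothetical, and the adopt_name and adopt_attrs options are omitted.

```python
import xml.etree.ElementTree as ET


def elevate_sketch(grandparent, parent, node):
    """Lift node out of parent; later siblings move into a new copy of parent."""
    children = list(parent)
    index = children.index(node)
    # The adopting parent reuses the original parent's tag and attributes
    adopter = ET.Element(parent.tag, dict(parent.attrib))
    # Text that followed the node becomes the adopter's leading text
    adopter.text = node.tail
    node.tail = None
    for sibling in children[index + 1:]:
        parent.remove(sibling)
        adopter.append(sibling)
    parent.remove(node)
    # Text that followed the original parent now follows the adopter
    adopter.tail = parent.tail
    parent.tail = None
    position = list(grandparent).index(parent)
    grandparent.insert(position + 1, node)
    grandparent.insert(position + 2, adopter)
    return adopter


body = ET.fromstring(
    '<body><p class="x">before <table>t</table> after <i>x</i></p></body>')
para = body[0]
elevate_sketch(body, para, para[0])
elevated = ET.tostring(body, encoding='unicode')
```

After the call, the table sits between two paragraphs, so document order is preserved.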

openaccess_epub.utils.element_methods.get_attribute(element, attribute)

Gets the attribute value in a safe way, useful for optional attributes.

Note

Deprecated in OpenAccess_EPUB 0.5.5: get_attribute achieves the same functionality as dict.get on the attrib dictionary of an Element. In the future, Element.attrib.get('foo') should be used.

Note

OpenAccess_EPUB 0.6.0 replaces this precisely with Element.attrib.get('foo'); further updates will remove this method completely.

lxml Elements possess a dictionary called ‘attrib’. Because many attributes are optional and the key may not always be present, optional-safe attribute access is needed.

Parameters:
  • element (lxml.etree.Element object) – The element whose attribute value is being sought
  • attribute (str) – The name of the attribute whose value is being sought
Returns:

attr_value (str or None) – The string value of the attribute, None if it does not exist.
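For illustration, optional-safe access through the attrib mapping looks like this. The example uses the standard library's xml.etree.ElementTree; lxml's attrib mapping offers the same get method. The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

graphic = ET.fromstring('<graphic id="g1"/>')

present = graphic.attrib.get('id')                   # attribute exists
missing = graphic.attrib.get('position')             # absent: None, no KeyError
fallback = graphic.attrib.get('position', 'float')   # absent: explicit default
```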

openaccess_epub.utils.element_methods.insert_before(old, new)

A simple way to insert a new element node before the old element node among its siblings.

openaccess_epub.utils.element_methods.ns_format(element, namespaced_string)

Provides a convenient method for adapting a tag or attribute name to use lxml’s format. Use this for tags like ops:switch or attributes like xlink:href.
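lxml addresses namespaced names in the form {namespace-uri}localname. A minimal sketch of the conversion follows; it assumes a plain prefix-to-URI dictionary is supplied, whereas the real function takes an element. The name ns_format_sketch and the NSMAP dictionary are hypothetical.

```python
def ns_format_sketch(nsmap, namespaced_string):
    """Turn 'prefix:name' into '{uri}name' using a prefix-to-URI mapping."""
    if ':' not in namespaced_string:
        # Nothing to expand for an unprefixed name
        return namespaced_string
    prefix, local = namespaced_string.split(':', 1)
    return '{' + nsmap[prefix] + '}' + local


# Example mapping; in practice this would come from the document itself
NSMAP = {
    'xlink': 'http://www.w3.org/1999/xlink',
    'ops': 'http://www.idpf.org/2007/ops',
}
href = ns_format_sketch(NSMAP, 'xlink:href')
switch = ns_format_sketch(NSMAP, 'ops:switch')
```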

openaccess_epub.utils.element_methods.remove(node)

A simple way to remove an element node from its tree.

openaccess_epub.utils.element_methods.remove_all_attributes(element, exclude=None)

This method will remove all attributes of any provided element.

A list of strings may be passed to the keyword argument “exclude”; the attributes it names will not be removed.
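A rough sketch of this behaviour, using the standard library's xml.etree.ElementTree in place of lxml. The name remove_all_attributes_sketch and the sample element are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_all_attributes_sketch(element, exclude=None):
    """Delete every attribute of element except those named in exclude."""
    keep = set(exclude or [])
    # Iterate over a copy of the names, since the mapping changes in the loop
    for name in list(element.attrib.keys()):
        if name not in keep:
            del element.attrib[name]


fig = ET.fromstring('<fig id="f1" position="float" orientation="portrait"/>')
remove_all_attributes_sketch(fig, exclude=['id'])
remaining = dict(fig.attrib)
```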

openaccess_epub.utils.element_methods.rename_attributes(element, attrs)

Renames the attributes of the element. Accepts the element and a dictionary of string values. The keys are the original names, and their values will be the altered names. This method treats all attributes as optional and will not fail on missing attributes.
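The renaming can be sketched as below, again with the standard library's xml.etree.ElementTree standing in for lxml. The name rename_attributes_sketch and the sample attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def rename_attributes_sketch(element, attrs):
    """Rename attributes per the {old: new} mapping, skipping missing ones."""
    for old_name, new_name in attrs.items():
        if old_name in element.attrib:
            # pop removes the old key and hands back its value
            element.attrib[new_name] = element.attrib.pop(old_name)


td = ET.fromstring('<td char="." align="left"/>')
# 'valign' is absent from the element and is silently ignored
rename_attributes_sketch(td, {'char': 'data-char', 'valign': 'data-valign'})
renamed = dict(td.attrib)
```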

openaccess_epub.utils.element_methods.replace(old, new)

A simple way to replace one element node with another.

openaccess_epub.utils.element_methods.serialize(element, strip=False)

A handy way to serialize an element to text.

openaccess_epub.utils.element_methods.uncomment(comment)

Converts the comment node received to a non-commented element, in place, and will return the new node.

This may fail, primarily due to special characters within the comment that the XML parser is unable to handle. If it fails, this method will log an error and return None.

openaccess_epub.utils.epub module

Utilities related to the making and managing of EPUB files

openaccess_epub.utils.epub.epub_zip(outdirect)

Zips up the input file directory into an EPUB file.
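As an illustration of zipping a directory into an EPUB, the sketch below relies on the EPUB container rule that the mimetype entry comes first and is stored uncompressed. The two-argument signature, the name epub_zip_sketch, and the demo paths are assumptions and do not reflect the package's implementation.

```python
import os
import tempfile
import zipfile


def epub_zip_sketch(source_dir, epub_path):
    """Zip source_dir into epub_path, writing 'mimetype' first, uncompressed."""
    with zipfile.ZipFile(epub_path, 'w') as archive:
        archive.write(os.path.join(source_dir, 'mimetype'), 'mimetype',
                      compress_type=zipfile.ZIP_STORED)
        for folder, _subdirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                arcname = os.path.relpath(full, source_dir)
                if arcname == 'mimetype':
                    continue  # already written as the first entry
                archive.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)


# Build a tiny throwaway directory tree and zip it
build = tempfile.mkdtemp()
os.makedirs(os.path.join(build, 'book', 'META-INF'))
with open(os.path.join(build, 'book', 'mimetype'), 'w') as handle:
    handle.write('application/epub+zip')
with open(os.path.join(build, 'book', 'META-INF', 'container.xml'), 'w') as handle:
    handle.write('<container/>')
epub_zip_sketch(os.path.join(build, 'book'), os.path.join(build, 'book.epub'))
with zipfile.ZipFile(os.path.join(build, 'book.epub')) as archive:
    entries = archive.infolist()
```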

openaccess_epub.utils.epub.make_EPUB(parsed_article, output_directory, input_path, image_directory, config_module=None, epub_version=None, batch=False)

Standard workflow for creating an EPUB document.

make_EPUB is used to produce an EPUB file from a parsed article. In addition to the article, it requires a path to the appropriate image directory, which it will insert into the EPUB file, as well as the output directory location for the EPUB file.

Parameters:
  • parsed_article (openaccess_epub.article.Article instance) – parsed_article is an Article instance for the XML document to be converted to EPUB.
  • output_directory (str) – output_directory is a string path to the directory in which the EPUB will be produced. The name of the directory will be used as the EPUB’s filename.
  • input_path (str) – input_path is a string absolute path to the input XML file, used to locate input-relative images.
  • image_directory (str) – image_directory is a string path indicating an explicit image directory. If supplied, other image input methods will not be used.
  • config_module (config module, optional) – config_module is a pre-loaded config module for OpenAccess_EPUB; if not used then this function will load the global config file. Might be useful in certain cases to dynamically alter configuration.
  • epub_version ({None, 2, 3}) – epub_version dictates which version of EPUB is to be created. An error will be raised if the specified version is not supported for the publisher. If left at the default, the created version will defer to the publisher default version.
  • batch (bool, optional) – batch indicates that batch creation is being used (such as with the oaepub batch command). In this case, directory conflicts will be automatically resolved (in favor of keeping previous data, skipping creation of EPUB).
Returns:

False in the case of a fatal error, True if successful.

openaccess_epub.utils.epub.make_epub_base(location)

Creates the base structure for an EPUB file in a specified location.

This function creates constant components for the structure of the EPUB in a specified directory location.

Parameters:
  • location (str) – A path string to a local directory in which the EPUB is to be built

openaccess_epub.utils.images module

Utility suite for handling images.

openaccess_epub.utils.images.explicit_images(images, image_destination, rootname, config)

The method used to handle an image directory explicitly defined by the user as a parsed argument.

openaccess_epub.utils.images.fetch_frontiers_images(doi, output_dir)

Fetch the images from Frontiers’ website. This method may fail to properly locate all the images and should be avoided if the files can be accessed locally. The preferred means of accessing images are to download them to an appropriate directory in the cache or to a directory specified by a passed argument.

openaccess_epub.utils.images.fetch_plos_images(article_doi, output_dir, document)

Fetch the images for a PLoS article from the internet.

PLoS images are known through the inspection of <graphic> and <inline-graphic> elements. The information in these tags is then parsed into appropriate URLs for downloading.

openaccess_epub.utils.images.get_images(output_directory, explicit, input_path, config, parsed_article)

Main logic controller for the placement of images into the output directory

Controlling logic for placement of the appropriate image files into the EPUB directory. This function interacts with interface arguments as well as the local installation config.py file. These may change the behavior of this function in terms of how it looks for images relative to the input, where it finds explicit images, whether it will attempt to download images, and whether successfully downloaded images will be stored in the cache.

Parameters:
  • output_directory (str) – The directory path where the EPUB is being constructed/output
  • explicit (str) – A directory path to a user specified directory of images. Allows * wildcard expansion.
  • input_path (str) – The absolute path to the input XML file.
  • config (config module) – The imported configuration module
  • parsed_article (openaccess_epub.article.Article object) – The Article instance for the article being converted to EPUB

openaccess_epub.utils.images.image_cache(article_cache, img_dir)

The method to be used by get_images() for copying images out of the cache.

openaccess_epub.utils.images.input_relative_images(input_path, image_destination, rootname, config)

The method used to handle Input-Relative image inclusion.

openaccess_epub.utils.images.make_image_cache(img_cache)

Initiates the image cache if it does not exist

openaccess_epub.utils.images.move_images_to_cache(source, destination)

Handles the movement of images to the cache. It must behave helpfully if it finds that the folder for this article already exists.

openaccess_epub.utils.inputs module

The methods in this module all utilize their string argument to specify and (optionally) download an input xml file. They all return the base name of their input file; that is, the extension-less name of the file. These base names then provide the basis for instantiating Article class objects and the naming of output files.

openaccess_epub.utils.inputs.doi_input(doi_string, download=True)

This method accepts a DOI string and attempts to download the appropriate XML file. If successful, it returns a path to that file. As with all URL input types, the success of this method depends on supporting per-publisher conventions and will fail on unsupported publishers.

openaccess_epub.utils.inputs.frontiersZipInput(zip_path, output_prefix, download=None)

This method provides support for Frontiers production using base zipfiles as the input for ePub creation. It expects a valid pathname for one of the two zipfiles, and that both zipfiles are present in the same directory.

openaccess_epub.utils.inputs.plos_doi_to_xmlurl(doi_string)

Attempts to resolve a PLoS DOI into a URL path to the XML file.

openaccess_epub.utils.inputs.url_input(url_string, download=True)

This method expects a direct URL link to an xml file. It will apply no modifications to the received URL string, so ensure good input.

openaccess_epub.utils.logs module

OpenAccess_EPUB utilities for logging.

openaccess_epub.utils.logs.config_logging(no_log_file, log_to, log_level, silent, verbosity)

Configures and generates a Logger object, ‘openaccess_epub’, based on common parameters used for console interface script execution in OpenAccess_EPUB.

These parameters are:
  • no_log_file – Boolean. Disables logging to file. If set to True, log_to and log_level become irrelevant.
  • log_to – A string name indicating a file path for logging.
  • log_level – Logging level, one of: ‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’
  • silent – Boolean
  • verbosity – Console logging level, one of: ‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’

This method currently only configures a console StreamHandler with a message-only Formatter.

openaccess_epub.utils.logs.get_level(level_string)

Returns an appropriate logging level integer from a string name
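One way such a lookup could be written, shown only as a sketch. The name get_level_sketch and the case-insensitive matching are assumptions.

```python
import logging

# Map the level names used in this module to the logging integer constants
LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
    'critical': logging.CRITICAL,
}


def get_level_sketch(level_string):
    """Return the logging level integer for a name such as 'info'."""
    return LEVELS[level_string.lower()]


info_level = get_level_sketch('info')
error_level = get_level_sketch('ERROR')
```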

openaccess_epub.utils.logs.null_logging()

Configures the Logger for ‘openaccess_epub’ to do nothing

openaccess_epub.utils.logs.replace_filehandler(logname, new_file, level=None, frmt=None)

This utility function will remove a previous Logger FileHandler, if one exists, and add a new filehandler.

Parameters:
  • logname – The name of the log to reconfigure, ‘openaccess_epub’ for example
  • new_file – The file location for the new FileHandler
  • level – Optional. Level of FileHandler logging; if not used, the new FileHandler will have the same level as the old. Pass in name strings, ‘INFO’ for example
  • frmt – Optional. String format of the Formatter for the FileHandler; if not used, the new FileHandler will inherit the Formatter of the old. Pass in format strings, ‘%(message)s’ for example

It is best practice to use the optional level and frmt arguments to account for the case where a previous FileHandler does not exist. If they are not used and a previous FileHandler is not found, then the level will be set to logging.DEBUG and the frmt will be set to openaccess_epub.utils.logs.STANDARD_FORMAT as a matter of safety.
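The inheritance and fallback rules above can be sketched with the standard logging module. This is an illustration only: the name replace_filehandler_sketch is hypothetical, the '%(message)s' fallback stands in for STANDARD_FORMAT (whose value is not given here), and returning the new handler is an addition made for the demonstration.

```python
import logging
import os
import tempfile


def replace_filehandler_sketch(logname, new_file, level=None, frmt=None):
    """Swap any existing FileHandler on the named logger for a new one."""
    log = logging.getLogger(logname)
    # Fallbacks used when no previous FileHandler exists
    new_level = logging.DEBUG
    formatter = logging.Formatter('%(message)s')  # stand-in for STANDARD_FORMAT
    for handler in list(log.handlers):
        if isinstance(handler, logging.FileHandler):
            # Inherit settings from the old handler, then retire it
            new_level = handler.level
            if handler.formatter is not None:
                formatter = handler.formatter
            log.removeHandler(handler)
            handler.close()
    if level is not None:
        new_level = getattr(logging, level.upper())
    if frmt is not None:
        formatter = logging.Formatter(frmt)
    new_handler = logging.FileHandler(new_file)
    new_handler.setLevel(new_level)
    new_handler.setFormatter(formatter)
    log.addHandler(new_handler)
    return new_handler


log_dir = tempfile.mkdtemp()
first = replace_filehandler_sketch(
    'sketch_log', os.path.join(log_dir, 'first.log'), level='INFO', frmt='%(message)s')
# No level or frmt given: both are inherited from the handler being replaced
second = replace_filehandler_sketch('sketch_log', os.path.join(log_dir, 'second.log'))
file_handlers = [h for h in logging.getLogger('sketch_log').handlers
                 if isinstance(h, logging.FileHandler)]
```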

Module contents

Common utility functions

openaccess_epub.utils.Identifier

alias of Identifer

class openaccess_epub.utils.OrderedSet(iterable=None)

Bases: collections.abc.MutableSet

add(key)
discard(key)
pop(last=True)

openaccess_epub.utils.base_epub_location()

Returns the expected location of the base_epub directory

openaccess_epub.utils.cache_location()

Cross-platform placement of cached files

openaccess_epub.utils.config_location()

Returns the expected location of the config file

openaccess_epub.utils.dir_exists(directory)

If a directory already exists that will be overwritten by some action, this will ask the user whether or not to continue with the deletion.

If the user responds affirmatively, then the directory will be removed. If the user responds negatively, then the process will abort.

openaccess_epub.utils.epubcheck(epubname, config=None)

This method takes the name of an EPUB file as an argument. That name is the input for the Java execution of a locally installed epubcheck-.jar. The location of this .jar file is configured in config.py.

openaccess_epub.utils.evaluate_relative_path(working='/var/build/user_builds/openaccess-epub/checkouts/latest/docs', relative='')

This function receives two strings representing system paths. The first is the working directory and it should be an absolute path. The second is the relative path and it should not be absolute. This function will render an OS-appropriate absolute path, which is the normalized path from working to relative.
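The normalization described above can be approximated with os.path, as in this sketch. The name evaluate_relative_path_sketch and the example paths are hypothetical, and the real function may handle edge cases differently.

```python
import os


def evaluate_relative_path_sketch(working, relative):
    """Join relative onto working and collapse any '..' or '.' segments."""
    return os.path.normpath(os.path.join(working, relative))


working_dir = os.path.join(os.sep, 'home', 'user', 'docs')
resolved = evaluate_relative_path_sketch(working_dir, os.path.join('..', 'images'))
unchanged = evaluate_relative_path_sketch(working_dir, '')
```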

openaccess_epub.utils.file_root_name(name)

Returns the root name of a file from a full file path.

It will not raise an error if the result is empty, but a warning will be issued.
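A small sketch of this behaviour using os.path. The name file_root_name_sketch, the logger name, and the warning text are assumptions.

```python
import logging
import os


def file_root_name_sketch(name):
    """Strip the directory and the final extension from a file path."""
    base = os.path.basename(name)
    root = os.path.splitext(base)[0]
    if not root:
        # An empty result is reported but does not raise
        logging.getLogger('openaccess_epub').warning(
            'file_root_name returned an empty root name from "%s"', name)
    return root


root = file_root_name_sketch('/tmp/journal.pone.0012345.xml')
empty_root = file_root_name_sketch('/tmp/articles/')
```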

openaccess_epub.utils.files_with_ext(extension, directory='.', recursive=False)

Generator function that will iterate over all files in the specified directory and return a path to the files which possess a matching extension.

You should include the period in your extension, and matching is not case sensitive: ‘.xml’ will also match ‘.XML’ and vice versa.

An empty string passed to extension will match extensionless files.
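The matching rules above can be sketched as a generator built on os.listdir and os.walk. The name files_with_ext_sketch is hypothetical, and the demo files are created in a temporary directory purely for illustration.

```python
import os
import tempfile


def files_with_ext_sketch(extension, directory='.', recursive=False):
    """Yield paths of files whose extension matches, ignoring case."""
    wanted = extension.lower()
    if recursive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if os.path.splitext(name)[1].lower() == wanted:
                    yield os.path.join(root, name)
    else:
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            # Only regular files count; subdirectories are skipped
            if os.path.isfile(path) and os.path.splitext(name)[1].lower() == wanted:
                yield path


demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, 'sub'))
for relative in ('a.xml', 'B.XML', 'notes', os.path.join('sub', 'c.xml')):
    open(os.path.join(demo, relative), 'w').close()

flat = sorted(os.path.basename(p) for p in files_with_ext_sketch('.xml', demo))
deep = sorted(os.path.basename(p)
              for p in files_with_ext_sketch('.XML', demo, recursive=True))
bare = [os.path.basename(p) for p in files_with_ext_sketch('', demo)]
```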

openaccess_epub.utils.get_absolute_path(some_path)

This function will return an appropriate absolute path for the path it is given. If the input is absolute, it will return unmodified; if the input is relative, it will be rendered as relative to the current working directory.

openaccess_epub.utils.get_output_directory(args)

Determination of the directory for output placement involves possibilities for explicit user instruction (absolute path or relative to execution) and implicit default configuration (absolute path or relative to input) from the system global configuration file. This function is responsible for reliably returning the appropriate output directory which will contain any log(s), ePub(s), and unzipped output of OpenAccess_EPUB.

It utilizes the parsed args, passed as an object, and is self-sufficient in accessing the config file.

All paths returned by this function are absolute.

openaccess_epub.utils.load_config_module()

If the config.py file exists, import it as a module. If it does not exist, call sys.exit() with a request to run oaepub configure.

openaccess_epub.utils.mkdir_p(dir)

openaccess_epub.utils.publisher_plugin_location()

Returns the expected location of the publisher_plugins directory.