Usage Guide¶
Installing OpenAccess_EPUB will add the openaccess_epub module to your python libraries (see Installation for installation instructions), along with a command-line user interface to access some built-in utilities for EPUB production. This Guide will focus on how to use the command interface; if you are a developer wishing to primarily make use of the openaccess_epub module or sub-modules, you may wish to skip directly to the Module Documentation for OpenAccess_EPUB
oaepub Interface¶
The oaepub command should be available once you have installed OpenAccess_EPUB. oaepub will be the command prefix for every action executed by OpenAccess_EPUB. oaepub provides access to several sub-commands for different jobs. If you have ever used git, then you are familiar with this kind of interface with sub-commands (git add, git commit, etc.)
Using the -h or --help option will always tell you how to use the command or sub-command. Try the following:
oaepub --help
It should print the following to your console:
OpenAccess_EPUB
Usage:
oaepub [--version | --help] [options] COMMAND [ARGS ...]
Options:
-h --help show this help message and exit
-v --version show program version and exit
-V --verbose print additional information about command execution
The available commands are:
batch Convert all the contents of a directory to individual EPUB
clearcache Delete some, or all, of the contents of OpenAccess_EPUB's cache
collection Convert multiple articles into a single omnibus EPUB
configure Configure some settings for your OpenAccess_EPUB install
convert Convert explicit input(s) individually to EPUB
epubzip Zip an unzipped EPUB file back into a valid EPUB
publishers Show which publishers are currently supported by OpenAccess_EPUB
validate Validate article XML files according to their specification
See 'oaepub COMMAND --help' for more information on a specific command.
For more help, visit the documentation:
You should observe the command configure among the list of available commands, which will allow you to define configuration variables for your use of OpenAccess_EPUB.
Configuration¶
Before the majority of OpenAccess_EPUB commands will work, you must set up some configuration options. Configuration may be done by interactive prompt using the command
oaepub configure
Or the configuration may be set to normal defaults using
oaepub configure --default
The information for your configuration will be stored in a Python file called config.py. It will be located in the base directory of your OpenAccess_EPUB cache, to find out where this is, use the command
oaepub configure where
At any point, you may re-run the configure command to change your settings or edit the config.py file in a text editor. The oaepub clearcache manual command will print out the location of the cache itself as well as attempt to launch a platform-appropriate file browser at that location.
Interactive Configuration¶
The command oaepub configure will launch an interactive script for setting you configuration. Each setting has a default value contained in square brackets “[]”, if you do not wish to change the default setting you may simply press “Enter” to accept the default and move on. Each setting should provide plenty of explanation, but it is worthwhile to re-iterate some general concepts here.
Some variable settings require path values that may be either absolute paths or relative paths; absolute paths will be treated the same in any context by OpenAccess_EPUB while relative paths will be treated as relative to a given input file. If you are on Windows, it should be okay for you to use “\” or the unix-style “/” in your paths without issues.
For settings that allow multiple values, make sure that each individual value is separated by a comma ”,”.
The wildcard expansion using the “*” character in some options will expand using the name of the input file. For “journal_article.xml” that name will be “journal_article”. [1]
Overview of Conversion Commands with Examples¶
There are three commands for the purpose of converting OpenAccess journal articles to EPUB documents: convert, batch, and collection.
- The convert command is a general tool for converting one to a few journal article input files into individual EPUB files.
- The batch command is specialized for converting large quantities of journal article input files (contained within a directory) into EPUB individual files.
- The collection command is specialized to take advantage of a powerful feature of OpenAccess_EPUB; it can convert several journal article input files into a single EPUB file representing a collection or omnibus.
Speaking of input files...
Input, what Input?¶
OpenAccess_EPUB always operates on a special XML (.xml extension) file produced by the journal publisher that contains all of the data and metadata for the article. These .xml files are constructed according to various versions of a standard called the Journal Publishing Tag Set . That said, the convert command will also work with the appropriate DOI or URL for a journal article if online-fetching support has been provided for the specific publisher. In this case, it will download the XML file automatically. If this fails, you will need to download the file manually.
convert¶
Let’s suppose that we wish to convert the PLoS Computational Biology article located at this URL http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003450 to an EPUB file. We have the following three options for specifying this article to the convert command:
oaepub convert "http://www.ploscompbiol.org/article/fetchObjectAttachment.action?uri=info%3Adoi%2F10.1371%2Fjournal.pcbi.1003450&representation=XML"
oaepub convert doi:10.1371/journal.pcbi.1003450
oaepub convert path/to/journal.pcbi.1003450.xml
For the first option we copied and pasted the link to the XML file from the sidebar, and enclosed it in quotes to avoid issues with special characters. For the second option we simply find the DOI listed for the article on the web page. For the third option we download the XML file from the web page’s sidebar to our hard drive, then provide the path to the file as the input argument. In the first two options, the command would download the XML file to the working directory in which the command was executed as the first step. We’ll procede from the third example.
Assuming default configuration, the output EPUB file would be located in the same directory as the input XML file as path/to/journal.pcbi.1003450.epub. If we wanted to place the output somewhere else, we could use the --output (-o) option like:
oaepub convert -o my/articles/folder path/to/journal.pcbi.1003450.xml
If we wanted to have no information printed out during conversion, we could use --silence (-s) like:
oaepub convert -s path/to/journal.pcbi.1003450.xml
Or we could have more information printed out using --verbosity (-V) like:
oaepub convert --verbosity DEBUG path/to/journal.pcbi.1003450.xml
For more information and options with the convert command refer to oaepub convert --help
batch¶
The batch command’s job is to convert all articles in a directory (or multiple directories) to EPUB. This is somewhat like running convert on each article (oaepub convert ./*.xml would work ), but provides a few useful additional features for large batch jobs. It will not stop to ask about file or directory name conflicts, it will simply skip converting the article at issue. A simple example might be:
oaepub batch articles_dir/
where “articles” is a directory which contains XML files for journal articles. This example would create a log file for each article it processed to EPUB, what if we wanted to only log all errors into a single file for the whole batch? We might use the --log-to (-l) and --log-level options like this:
oaepub batch --log-to batch_errors.log --log-level ERROR articles_dir/
OpenAccess_EPUB can locate the images for an article on your local machine either from the cache or relative to the input (this behavior is configurable see Configuration). If we want to explicitly specify a pattern for finding images for an article based on “*” name matching, we could use the --images (-i) option like this for input-relative images:
oaepub batch --images ./images/* articles_dir/
which would find image directories of with the path “articles_dir/images/{input-file-name}”.
The --recursive (-r) option will instruct the batch command to recursively traverse sub-directories of each listed directory input to convert their contained XML files as well.
For more information and options with the batch command refer to oaepub batch --help
collection¶
The collection command is similar to the batch command, however it expects a text file as its input. The name of the text file will become the name of the EPUB, as well as the name of the single log file (unless --log-to is employed). Each line of the file should contain a path to a local XML file; the order of the files listed will be the order of the articles in the EPUB document. An example collection text file could read like this (all lines are valid):
./first_article.xml
../second_article.xml
nested/directory/third_article.xml
/absolute/path/to/fourth_article.xml
If this file is named Reading_List_2014.txt, then the following command could be used to create the collection EPUB.
oaepub collection Reading_List_2014.txt
For more information and options with the collection command refer to oaepub collection --help
Footnotes
[1] | The use of the star “*” for wildcard-expansion is usable in the config.py file and in certain command-line options (such as oaepub batch --images). In many shells “*” is a special character so you may need to use the following syntax to avoid its special treatment: oaepub batch --images="spam-*" articles_dir/ |