Plugin Developer’s Guide¶
Purpose of this document¶
Meeting the basic technical requirements of a plugin is straightforward. Simple examples include the demo1 and demo2 plugins in the main Djerba repository.
The goal of this document is to describe how to write a useful plugin, to process production data for a clinical report. It outlines features of Djerba intended to make this easier:
Convenience methods and classes, eg. for manipulating INI config structures
Validation of output, to ensure it is correctly formatted
It also describes how and where to write the plugin code.
Definitions¶
What is a plugin?¶
A plugin works as a mini-Djerba:
Configure an INI file
Extract data to JSON
Render data as HTML
A plugin is a subclass of the Djerba plugin base class.
Other definitions¶
Core: The Djerba framework to detect and run plugins. The core runs all relevant components to produce an INI file, JSON file, and HTML/PDF documents, similarly to classic Djerba.
Workspace: Shared directory in which we can write temporary files
Merger: A type of component which merges and deduplicates JSON from one or more plugins, to write HTML at the render step
Helper: A type of component which writes information to the workspace for use by other components, in the configure or extract steps
Component: Any sub-unit of Djerba: A plugin, merger, or helper. We can define multiple plugins, helpers, and mergers as needed.
Identifiers¶
Each plugin must have a unique identifier. The identifier must be:
A valid Python identifier – no special characters, spaces, reserved Python keywords, etc.
Unique to the plugin – not used by any other plugin, helper or merger
Not the string “core” which is reserved for Djerba core parameters
The identifier is:
The header for the INI file section corresponding to the plugin
The name of the directory containing the plugin code – which makes up a Python package
Djerba relies on the identifier format to distinguish plugins, helpers, and mergers:
Helper identifiers must end with
_helper
. Conversely, any identifier ending with_helper
must be for a helper class.Merger identifiers must end with
_merger
. Conversely, any identifier ending with_merger
must be for a merger class.Plugin identifiers have no special requirements.
Getting started: Files and directories¶
Every plugin is a Python package. In terms of the package hierarchy, it must be a subpackage of djerba.plugins
.
To write a new plugin named foo
:
Make a directory
src/lib/djerba/plugins/foo
in the Djerba repositoryCreate an empty file
src/lib/djerba/plugins/foo
/__init__.py
Create a file
src/lib/djerba/plugins/foo
/plugin.py
and make a subclass of the plugin base classOverride the
configure
,extract
, andrender
methods of the base class as described belowWrite tests to check the plugin functions correctly
Plugin code does not necessarily have to be in the main Djerba repository. It can be anywhere, as long as the plugin is a submodule of djerba.plugins
and visible for import as a Python class (ie. the appropriate directory is on the PYTHONPATH environment variable).
Configure step: The INI file structure¶
Djerba uses the INI file format for configuration. The INI parameters set out what information the plugin needs to do its job.
Reserved parameters¶
Certain parameters are used by the Djerba core to control plugin behaviour. They must be present for all plugins.
Priorities¶
A plugin has three priority parameters: One each for the configure, extract, and render steps.
Priorities determine the order in which components are executed at runtime.
Priority values must be integers.
Components are executed starting with the lowest priority number, and ending with the highest.
The configure, extract, and render steps all have separate priorities, which may be set independently of each other.
Priority at the render step determines the order of sections in the HTML output document.
If two components have the same priority, the order in which they will run is not guaranteed. This may be acceptable for configure/extract, but is unlikely to be desirable for render.
Priority parameters are identified by the following keywords:
configure_priority
extract_priority
render_priority
These are reserved, and may not be used for any other INI parameters.
Priority tips and conventions
We set priorities in increments of 100 or more – this makes it easier to insert a priority in between two others, if needed later
If component A is dependent on component B, it is important to set the priorities so B runs before A
If there are no dependencies, the component still has a priority, but the value is unimportant
Attributes¶
Attributes can be thought of as tags applied to individual plugins. This enables core functions to categorize the plugins.
By default, known attributes are:
clinical
supplementary
The plugin INI may have an attributes
parameter. The associated value may be empty; or one or more of the permitted attribute names, in any order, separated by commas. An incorrectly formatted attributes list will result in an error. Unknown attribute tags will result in a warning. The list of known attribute tags is set in the component superclass configurable
, and may be overridden in subclasses by setting the KNOWN_ATTRIBUTES
class variable.
Similarly to the priority names, “attributes” is a reserved keyword, and may not be used for any other INI parameters.
When it parses a comma-separated list from an INI file, Djerba will automatically remove duplicates and sort the output. This applies to both attributes and dependencies (see below).
Dependencies¶
A plugin may depend on other plugins or helpers.
For example, a helper may write a data file to the workspace, which the plugin needs to do its analysis.
We can explicitly declare a dependency with INI parameters:
depends_configure
depends_extract
at the configure and extract steps, respectively.
Each of these parameters may be empty; otherwise, it expects a comma-separated list of strings. A non-empty list contains the the identifiers of one or more plugins or helpers, in any order, separated by commas. At runtime, Djerba will check the list against the components in the INI file, and raise an error if a dependency is not present.
As with other reserved keywords, depends_configure
and depends_extract
may not be used for any other INI parameters.
“Depends” is not defined for mergers¶
There is no dependency INI parameter for mergers. In keeping with the Prime Directive of Rendering, all dependencies are resolved after the extract step is completed.
Similarly, a merger name should not appear in a dependency parameter.
“Depends” is not required¶
Plugin authors do not have to explicitly declare dependencies with the “requires” keyword. But doing so is recommended, as it allows additional error checking at runtime. (The alternative is implicit dependency, where the onus is on whoever writes the INI file to make sure any dependencies are satisfied by running the needed components.)
Plugin-specific parameters¶
Other INI parameters may be defined as needed. Parameters may have any name which is not a reserved keyword.
A required parameter must be supplied by the user.
A default parameter has a default value, which is read by the plugin code at runtime. Defaults may be recorded in any way which is convenient – hard-coded in the Python module, read from a file, etc.
All (non-reserved) parameters are either required or default, and must be defined as such in the plugin code. If the INI input contains parameters which have not been defined as required or default, an error will result.
Substituting environment variables¶
Djerba supports setting INI parameters with environment variables. This is done by variable substitution using Python template strings and the mapping of environment variables from Python os.environ.
For example:
Suppose
DJERBA_DATA_DIR
is an environment variable with the directory path/etc/data/djerba
; and an INI parameter has the value$DJERBA_DATA_DIR/data_file.txt
Then inside the
configure
function, the plugin can substitute$DJERBA_DATA_DIR
with the value of the corresponding environment variableSo in this case, the config parameter is changed from
$DJERBA_DATA_DIR/data_file.txt
to/etc/data/djerba/data_file.txt
To do the variable substitution, simply call one of these methods inherited from the parent class configurable
:
apply_my_env_templates
– applies only to the current componentapply_env_templates
– applies to all components. This breaks the convention that a plugin only modifies its own config section, so use with caution.
See the template string documentation for substitution rules. Note that Djerba uses substitute
, not safe_substitute
– so attempting to substitute an environment variable which does not exist will raise an error.
Environment variables and defaults
Environment variable substitution happens after defaults are applied – so it is permissible, and useful, for a default to have an environment variable to be filled in at runtime.
Environment variable conventions
We define some conventions for environment variables in Djerba:
DJERBA_DATA_DIR
: Directory with data files, eg. Ensembl ID conversion tableDJERBA_TEST_DIR
: Directory with shared files for unit testsDJERBA_PRIVATE_DIR
: Directory with restricted access permissions, used for files with private information such as passwords / access tokens
Using the workspace¶
The workspace is a directory which can be used to read/write temporary files.
This enables data to be passed between components: A component can write a file, which is later read by another component. Doing so creates dependencies between components, which can be represented by correct use of priority values and the depends
parameters.
Every plugin has a djerba.core.workspace object as an instance variable called workspace
. Methods of the workspace class can be accessed in the usual way. For example, self.workspace.read_string(my_file.txt)
will return the contents of my_file.txt
as a string.
Example INI section¶
In this example, we have all the INI parameters for lego_movie_plugin
. Note that:
The
depends_extract
value is emptymotto
,lucky_number
, andinstructions
are parameters specific to this plugininstructions
uses environment variable substitution.All reserved parameters required by the Djerba core are present.
[lego_movie_plugin]
configure_priority = 100
extract_priority = 100
render_priority = 100
attributes = clinical
depends_configure =
depends_extract = batman,star_wars
motto = "everything is awesome"
lucky_number = 42
instructions = $DJERBA_INSTRUCTION_DIR/instructions.pdf
Implementation: How to write the code¶
How Djerba configuration works¶
At the configure step, we take a partially configured INI as input, and output a fully specified INI. The standard way in Python to handle an INI document is with a ConfigParser object. The Djerba config step essentially consists of updating a ConfigParser into the form we want.
As noted above, a plugin is a subclass of the Djerba plugin base class. We implement the configuration for a plugin by overriding the configure
and specify_params
methods of the base class. Method overriding is explained in detail in the Python tutorial – but in short, we write a method with the same input and output arguments as the method of the same name in the parent class, but which may have different internal operations.
For
configure
, the input and output are both ConfigParsersFor
specify_params
, there is no input or output; instead, we call methods of the parent class to modify instance variables (ie. the internal state of the plugin object).
At runtime, the Djerba core will call the configure
method of each plugin, giving it the ConfigParser output by the previous plugin, and expecting to receive back a ConfigParser with all parameters for the plugin completed. It does this for each component in priority order, completing the config section for each plugin, and so building up the complete INI document.
Editing convention
Input to the configure
method of each plugin is a ConfigParser with parameters for all components; but by convention, the plugin only edits its own section of the ConfigParser.
Configurable methods¶
The plugin base inherits from the core configurable class, which contains a number of useful methods to support INI processing.
Detailed documentation and examples are TODO, but method names/comments in the code itself should be instructive.
The specify_params()
method¶
All plugins must define a method called specify_params()
, which will be called by the __init__()
method of the base plugin class – in other words, it will be run when the plugin is loaded for use.
Any required and default parameters of the plugin must be defined in specify_params()
. This is done using the following methods of the configurable class:
add_ini_required(name)
: Add a required parameter calledname
set_ini_default(key, value)
: Add the given(key, value)
pair as a defaultset_priority_defaults(priority)
: Set defaults for all relevant priorities to the non-negative integerpriority
At a minimum, specify_params()
must set priorities for the plugin, either individually using set_ini_default
or all at once using set_priority_defaults
.
All INI parameters must be defined as either required or default in specify_params()
. Using an undefined parameter in the INI will result in a runtime error.
The config wrapper¶
The get_config_wrapper
method provides a config wrapper object customized for that plugin.
As the name suggests, a config wrapper wraps around a ConfigParser to provide functions specific for a Djerba plugin.
For example, a wrapper obtained from get_config_wrapper
is aware of the plugin name, and has methods like get_my_integer(parameter_name)
to get the named integer from the current plugin’s config section. Doing the same with a ConfigParser would require something like get_integer(plugin_name, parameter_name)
which is less convenient.
Usage:
wrapper = self.get_config_wrapper(config)
gets a wrapperUse methods of the wrapper to get/set variables
Finally, do
return wrapper.get_config()
to output the completed ConfigParser
The apply_defaults()
method¶
As the name suggests, this method populates the INI with any default values defined for the plugin. Defaults are defined in the specify_params
method (see below).
If any defaults have been defined, the developer must call apply_defaults
in the configure
method of the plugin, in order to apply them to the INI at runtime.
Usage: updated_config = self.apply_defaults(config)
The get_module_dir()
method¶
This method returns a path to the directory where the plugin module has been installed. This is very useful for running additional code (eg. R scripts) or reading data files which may be part of the plugin.
Usage: my_module_dir = self.get_module_dir()
Summary and checklist¶
To summarize, the way to write the configure step for a new plugin is:
Make a subclass of the plugin base class.
In the subclass, make a method called
specify_params
. Use the methods of the parent class to provide names of all required parameters, and names and values of all default parameters.In the subclass, make a method called
configure
, which will override theconfigure
method of the parent class.configure
will take a ConfigParser as input, and return a ConfigParser as output.In the new
configure
method, useget_config_wrapper(config)
to get a wrapper for the input ConfigParser.Write whatever code is needed to update values in the config.
All required parameters must receive values.
Default values may also be overwritten, depending on user input or the results of previously run plugins (eg. files read from the workspace).
Access the workspace as needed, using methods of
self.workspace
Do environment variable substitution if desired, by calling
apply_my_env_templates()
Use the
get_config()
method of the config wrapper to get back an updated ConfigParserFinally, return the updated ConfigParser as output from the new
configure
method.
Extract step: From INI to JSON¶
Overview¶
The extract step takes as input a fully specified INI configuration, and outputs a data structure to be written in JSON format. The output should contain only data specific to that plugin – it is a component of the final JSON output, not the entire document. (This contrasts with the configure step, where the input/output is a ConfigParser with parameters for all components.) The JSON output for a plugin should be self-contained, in accordance with the Prime Directive of Rendering.
At the extract step, components have priorities and can read/write files in the workspace, in exactly the same ways as the configure step.
The JSON schema¶
What is a JSON schema?¶
Data output by a plugin must satisfy the plugin JSON schema.
A schema is a JSON document which specifies a permitted structure for other JSON documents. JSON generated by a plugin is automatically validated against the schema.
JSON schema syntax is documented in the JSON schema home page and tutorial. Schema validation in Djerba is done using the Python jsonschema package.
The plugin schema¶
The JSON schema document for plugins is on Github. Required keys include:
plugin_name
: Identifier of the pluginversion
: Version of the pluginpriorities
: Integers determining order at runtimeattributes
: Keywords applied to this pluginrequired
: Comma-separated list of component names required by this componentmerge_inputs
: Input to mergers (if any), see below for detailsresults
: Catch-all attribute to contain data produced by the plugin
The priorities
, attributes
, and required
keys implement the respective concepts, already discussed for the configure step.
The results
value may contain any data produced by the plugin, so long as it can be represented in JSON: Text, additional data structures, base-64 encoded images, etc. (Very large results may cause issues with uploading to the Djerba CouchDB instance.)
The merge_inputs key¶
Djerba merger components expect their input in a specific format. Therefore, each merger has its own JSON schema.
An example is the gene information merger schema. It expects an array of objects, each of which has key/value pairs for gene, gene URL, chromosome, and summary text.
The contents of the merge_inputs
value in plugin output are key/value pairs, where the key is the name of a merger, and the value is a data structure satisfying the merger schema.
Implementation: How to write the code¶
In the plugin subclass created for the
configure
step, override theextract
methodInput is a ConfigParser with all parameters fully specified
Write code to extract the data for the given parameters
Return a data structure satisfying the plugin JSON schema:
The overall structure must match the general plugin schema
Any entries in
merge_inputs
must match the schema of the appropriate merger
Render step: From JSON to HTML¶
The final step for a plugin is to render JSON output to HTML. (Conversion of HTML to PDF is done by the Djerba core.)
Once again, the plugin class must override the render
method of the parent class. Input to render
is the JSON data generated by the plugin in the extract step; and output is a string, intended to make up part of an HTML document.
The output string may be generated in any way that is convenient. The use of Mako templates to interpolate JSON variables to HTML is strongly recommended, for consistency with other plugins.
The final HTML document will contain CSS style headers inserted by the Djerba core, similar to those used by classic Djerba. The CSS features can be used to ensure consistent formatting between report sections. Explicitly defining CSS headers in core Djerba is TODO.
The Prime Directive of Rendering¶
JSON from each plugin is rendered independently of every other plugin.
It is an important principle of Djerba that the JSON document is self-contained; in other words, it is possible to generate a complete report using only the JSON document. In particular, we archive JSON to a database with the expectation that it can be rendered to an HTML document if desired.
To enable this, the JSON element for a plugin must be sufficient to generate HTML for that plugin. The render step cannot depend on output from other plugins (whether in the JSON document, or in the workspace).
Dependencies at the configure and extract steps are fine (and Djerba has extensive features to support this), but any dependencies must be resolved by the completion of the extract step.
Do not use the workspace¶
The workspace
attribute of the plugin can in theory be accessed at the render step. In practice, developers must not do this, as it will break the Prime Directive of Rendering. If data from the workspace is needed, it must be inserted into the JSON in the extract step.
Testing a plugin¶
A simple test class¶
Djerba provides a simple test framework for plugins in plugin_tester.py. As a bare minimum, each plugin should include the plugin tester.
An example can be seen in the demo test. Required steps are:
Create a subdirectory
test
within the plugin directoryCreate an empty file in the test directory named
__init__.py
Create a file with an appropriate name like
plugin_test.py
Start with a shebang such as
#! /usr/bin/env python3
to allow execution on the command lineMake a subclass of
djerba.plugins.plugin_tester
Write INI and JSON files in the test directory, and find the MD5 checksum of the expected output
Create a dictionary with the INI filename, JSON filename, and MD5 string
Call the
run_basic_test
method of the parent class, with the plugin directory and the above dictionary as argumentsThe test can then be run as usual for a Python unit test script, directly or using test discovery
The basic test will:
Run
configure
with the given INI file as inputRun
extract
with the fully specified INI file generated in theconfigure
stepCompare the JSON output of
extract
with the given JSON file, to ensure they matchIf step 3 is successful, run
render
with the JSON as inputCompare the MD5sum of the string output by
render
with the given checksum, again ensuring they match
This is the simplest possible test of running a plugin from start to finish.
Bootstrapping test development¶
Bootstrap, noun: The technique of starting with existing resources to create something more complex and effective.
The generate_ini.py
and main djerba.py
scripts can be used to do preliminary testing and prepare the INI and JSON inputs for the plugin tester.
Write a first draft of the plugin
Try to run
generate_ini.py
to create an INI file for the plugin and any dependencies. Fix any bugs in the initial draft which causegenerate_ini.py
to crash.Update the auto-generated INI file to run Djerba. At a minimum, fill in any parameters which must be specified manually; other parameters may also need to be modified for testing.
Run
djerba.py
with the generate INI file to generate a report. Again, fix any crash-inducing bugs.Check the HTML/PDF output by eye. (Sometimes bugs occur silently but result in incorrect output.)
Use your INI file, and the JSON generated by
djerba.py
, to provide inputs to the simple test class.Run the simple test. Update the expected MD5 sum for HTML output if necessary.
Example of running a test¶
$ generate_ini.py --verbose -o test.ini provenance_helper patient_info
2023-08-02_17:06:45 djerba.core.ini_generator INFO: Generating config for components: ['core', 'provenance_helper', 'patient_info']
2023-08-02_17:06:45 djerba.core.ini_generator INFO: Finished generating config.
# edit test.ini and fix bugs
$ djerba.py report -i test.ini -o report -w work
Further remarks on testing¶
Additional testing is strongly encouraged! See the Python unittest documentation for details. Through testing, we seek to avoid the situation in this XKCD cartoon:
Also important for testing is the ability to think outside the box:
Conclusion¶
This document has set out instructions for how to write a Djerba plugin.
It is not intended to be exhaustive! Rather, it is meant as a reasonably compact tour of the essential plugin features.
Writing mergers and helpers is very similar to writing plugins, but not explicitly covered here.
Questions/comments/bug reports can be sent to the Djerba development team.
Happy coding!