Plugin Developer’s Guide

Purpose of this document

Meeting the basic technical requirements of a plugin is straightforward. Simple examples include the demo1 and demo2 plugins in the main Djerba repository.

The goal of this document is to describe how to write a useful plugin, to process production data for a clinical report. It outlines features of Djerba intended to make this easier:

  • Convenience methods and classes, eg. for manipulating INI config structures

  • Validation of output, to ensure it is correctly formatted

It also describes how and where to write the plugin code.

Definitions

What is a plugin?

A plugin works as a mini-Djerba:

  • Configure an INI file

  • Extract data to JSON

  • Render data as HTML

A plugin is a subclass of the Djerba plugin base class.

Other definitions

  • Core: The Djerba framework to detect and run plugins. The core runs all relevant components to produce an INI file, JSON file, and HTML/PDF documents, similarly to classic Djerba.

  • Workspace: Shared directory in which we can write temporary files

  • Merger: A type of component which merges and deduplicates JSON from one or more plugins, to write HTML at the render step

  • Helper: A type of component which writes information to the workspace for use by other components, in the configure or extract steps

  • Component: Any sub-unit of Djerba: A plugin, merger, or helper. We can define multiple plugins, helpers, and mergers as needed.

Identifiers

Each plugin must have a unique identifier. The identifier must be:

  • A valid Python identifier – no special characters, spaces, reserved Python keywords, etc.

  • Unique to the plugin – not used by any other plugin, helper or merger

  • Not the string “core” which is reserved for Djerba core parameters

The identifier is:

  • The header for the INI file section corresponding to the plugin

  • The name of the directory containing the plugin code – which makes up a Python package

Djerba relies on the identifier format to distinguish plugins, helpers, and mergers:

  • Helper identifiers must end with _helper. Conversely, any identifier ending with _helper  must be for a helper class.

  • Merger identifiers must end with _merger. Conversely, any identifier ending with _merger  must be for a merger class.

  • Plugin identifiers have no special requirements.

Getting started: Files and directories

Every plugin is a Python package. In terms of the package hierarchy, it must be a subpackage of djerba.plugins.

To write a new plugin named foo:

  1. Make a directory src/lib/djerba/plugins/foo in the Djerba repository

  2. Create an empty file src/lib/djerba/plugins/foo/__init__.py

  3. Create a file src/lib/djerba/plugins/foo/plugin.py and make a subclass of the plugin base class

  4. Override the configure, extract, and render methods of the base class as described below

  5. Write tests to check the plugin functions correctly

Plugin code does not necessarily have to be in the main Djerba repository. It can be anywhere, as long as the plugin is a submodule of djerba.plugins and visible for import as a Python class (ie. the appropriate directory is on the PYTHONPATH environment variable).

Configure step: The INI file structure

Djerba uses the INI file format for configuration. The INI parameters set out what information the plugin needs to do its job.

Reserved parameters

Certain parameters are used by the Djerba core to control plugin behaviour. They must be present for all plugins.

Priorities

A plugin has three priority parameters: One each for the configure, extract, and render steps.

  • Priorities determine the order in which components are executed at runtime.

  • Priority values must be integers.

  • Components are executed starting with the lowest priority number, and ending with the highest.

  • The configure, extract, and render steps all have separate priorities, which may be set independently of each other.

  • Priority at the render step determines the order of sections in the HTML output document.

  • If two components have the same priority, the order in which they will run is not guaranteed. This may be acceptable for configure/extract, but is unlikely to be desirable for render.

Priority parameters are identified by the following keywords:

  • configure_priority

  • extract_priority

  • render_priority

These are reserved, and may not be used for any other INI parameters.

Priority tips and conventions

  • We set priorities in increments of 100 or more – this makes it easier to insert a priority in between two others, if needed later

  • If component A is dependent on component B, it is important to set the priorities so B runs before A

  • If there are no dependencies, the component still has a priority, but the value is unimportant

Attributes

Attributes can be thought of as tags applied to individual plugins. This enables core functions to categorize the plugins.

By default, known attributes are:

  • clinical

  • supplementary

The plugin INI may have an attributes parameter. The associated value may be empty; or one or more of the permitted attribute names, in any order, separated by commas. An incorrectly formatted attributes list will result in an error. Unknown attribute tags will result in a warning. The list of known attribute tags is set in the component superclass configurable , and may be overridden in subclasses by setting the KNOWN_ATTRIBUTES class variable.

Similarly to the priority names, “attributes” is a reserved keyword, and may not be used for any other INI parameters.

When it parses a comma-separated list from an INI file, Djerba will automatically remove duplicates and sort the output. This applies to both attributes and dependencies (see below).

Dependencies

A plugin may depend on other plugins or helpers.

For example, a helper may write a data file to the workspace, which the plugin needs to do its analysis.

We can explicitly declare a dependency with INI parameters:

  • depends_configure

  • depends_extract

at the configure and extract steps, respectively.

Each of these parameters may be empty; otherwise, it expects a comma-separated list of strings. A non-empty list contains the the identifiers of one or more plugins or helpers, in any order, separated by commas. At runtime, Djerba will check the list against the components in the INI file, and raise an error if a dependency is not present.

As with other reserved keywords, depends_configure and depends_extract may not be used for any other INI parameters.

“Depends” is not defined for mergers

There is no dependency INI parameter for mergers. In keeping with the Prime Directive of Rendering, all dependencies are resolved after the extract step is completed.

Similarly, a merger name should not appear in a dependency parameter.

“Depends” is not required

Plugin authors do not have to explicitly declare dependencies with the “requires” keyword. But doing so is recommended, as it allows additional error checking at runtime. (The alternative is implicit dependency, where the onus is on whoever writes the INI file to make sure any dependencies are satisfied by running the needed components.)

Plugin-specific parameters

Other INI parameters may be defined as needed. Parameters may have any name which is not a reserved keyword.

A required parameter must be supplied by the user.

A default parameter has a default value, which is read by the plugin code at runtime. Defaults may be recorded in any way which is convenient – hard-coded in the Python module, read from a file, etc.

All (non-reserved) parameters are either required or default, and must be defined as such in the plugin code. If the INI input contains parameters which have not been defined as required or default, an error will result.

Substituting environment variables

Djerba supports setting INI parameters with environment variables. This is done by variable substitution using Python template strings and the mapping of environment variables from Python os.environ.

For example:

  • Suppose DJERBA_DATA_DIR  is an environment variable with the directory path /etc/data/djerba; and an INI parameter has the value $DJERBA_DATA_DIR/data_file.txt

  • Then inside the configure  function, the plugin can substitute $DJERBA_DATA_DIR with the value of the corresponding environment variable

  • So in this case, the config parameter is changed from $DJERBA_DATA_DIR/data_file.txt to /etc/data/djerba/data_file.txt

To do the variable substitution, simply call one of these methods inherited from the parent class configurable :

  • apply_my_env_templates  – applies only to the current component

  • apply_env_templates – applies to all components. This breaks the convention that a plugin only modifies its own config section, so use with caution.

See the template string documentation for substitution rules. Note that Djerba uses substitute , not safe_substitute – so attempting to substitute an environment variable which does not exist will raise an error.

Environment variables and defaults

Environment variable substitution happens after defaults are applied – so it is permissible, and useful, for a default to have an environment variable to be filled in at runtime.

Environment variable conventions

We define some conventions for environment variables in Djerba:

  • DJERBA_DATA_DIR : Directory with data files, eg. Ensembl ID conversion table

  • DJERBA_TEST_DIR : Directory with shared files for unit tests

  • DJERBA_PRIVATE_DIR : Directory with restricted access permissions, used for files with private information such as passwords / access tokens

Using the workspace

The workspace is a directory which can be used to read/write temporary files.

This enables data to be passed between components: A component can write a file, which is later read by another component. Doing so creates dependencies between components, which can be represented by correct use of priority values and the depends  parameters.

Every plugin has a djerba.core.workspace object as an instance variable called workspace. Methods of the workspace class can be accessed in the usual way. For example, self.workspace.read_string(my_file.txt)  will return the contents of my_file.txt  as a string.

Example INI section

In this example, we have all the INI parameters for lego_movie_plugin. Note that:

  • The depends_extract value is empty

  • motto , lucky_number, and instructions  are parameters specific to this plugin

  • instructions  uses environment variable substitution.

  • All reserved parameters required by the Djerba core are present.

[lego_movie_plugin]
configure_priority = 100
extract_priority = 100
render_priority = 100
attributes = clinical
depends_configure =
depends_extract = batman,star_wars
motto = "everything is awesome"
lucky_number = 42
instructions = $DJERBA_INSTRUCTION_DIR/instructions.pdf

Implementation: How to write the code

How Djerba configuration works

At the configure step, we take a partially configured INI as input, and output a fully specified INI. The standard way in Python to handle an INI document is with a ConfigParser object. The Djerba config step essentially consists of updating a ConfigParser into the form we want.

As noted above, a plugin is a subclass of the Djerba plugin base class. We implement the configuration for a plugin by overriding the configure and specify_params methods of the base class. Method overriding is explained in detail in the Python tutorial – but in short, we write a method with the same input and output arguments as the method of the same name in the parent class, but which may have different internal operations.

  • For configure, the input and output are both ConfigParsers

  • For specify_params, there is no input or output; instead, we call methods of the parent class to modify instance variables (ie. the internal state of the plugin object).

At runtime, the Djerba core will call the configure  method of each plugin, giving it the ConfigParser output by the previous plugin, and expecting to receive back a ConfigParser with all parameters for the plugin completed. It does this for each component in priority order, completing the config section for each plugin, and so building up the complete INI document.

Editing convention

Input to the configure  method of each plugin is a ConfigParser with parameters for all components; but by convention, the plugin only edits its own section of the ConfigParser.

Configurable methods

The plugin base inherits from the core configurable class, which contains a number of useful methods to support INI processing.

Detailed documentation and examples are TODO, but method names/comments in the code itself should be instructive.

The specify_params() method

All plugins must define a method called specify_params(), which will be called by the __init__()  method of the base plugin class – in other words, it will be run when the plugin is loaded for use.

Any required and default parameters of the plugin must be defined in specify_params(). This is done using the following methods of the configurable class:

  • add_ini_required(name) : Add a required parameter called name

  • set_ini_default(key, value) : Add the given (key, value) pair as a default

  • set_priority_defaults(priority) : Set defaults for all relevant priorities to the non-negative integer priority

At a minimum, specify_params() must set priorities for the plugin, either individually using set_ini_default or all at once using set_priority_defaults.

All INI parameters must be defined as either required or default in specify_params(). Using an undefined parameter in the INI will result in a runtime error.

The config wrapper

The get_config_wrapper method provides a config wrapper object customized for that plugin.

As the name suggests, a config wrapper wraps around a ConfigParser to provide functions specific for a Djerba plugin.

For example, a wrapper obtained from get_config_wrapper is aware of the plugin name, and has methods like get_my_integer(parameter_name) to get the named integer from the current plugin’s config section. Doing the same with a ConfigParser would require something like get_integer(plugin_name, parameter_name) which is less convenient.

Usage:

  1. wrapper = self.get_config_wrapper(config) gets a wrapper

  2. Use methods of the wrapper to get/set variables

  3. Finally, do return wrapper.get_config() to output the completed ConfigParser

The apply_defaults()  method

As the name suggests, this method populates the INI with any default values defined for the plugin. Defaults are defined in the specify_params method (see below).

If any defaults have been defined, the developer must call apply_defaults  in the configure  method of the plugin, in order to apply them to the INI at runtime.

Usage: updated_config = self.apply_defaults(config)

The get_module_dir()  method

This method returns a path to the directory where the plugin module has been installed. This is very useful for running additional code (eg. R scripts) or reading data files which may be part of the plugin.

Usage: my_module_dir = self.get_module_dir()

Summary and checklist

To summarize, the way to write the configure step for a new plugin is:

  1. Make a subclass of the plugin base class.

  2. In the subclass, make a method called specify_params. Use the methods of the parent class to provide names of all required parameters, and names and values of all default parameters.

  3. In the subclass, make a method called configure, which will override the configure  method of the parent class. configure will take a ConfigParser as input, and return a ConfigParser as output.

  4. In the new configure  method, use get_config_wrapper(config)  to get a wrapper for the input ConfigParser.

  5. Write whatever code is needed to update values in the config.

    1. All required parameters must receive values.

    2. Default values may also be overwritten, depending on user input or the results of previously run plugins (eg. files read from the workspace).

    3. Access the workspace as needed, using methods of self.workspace

  6. Do environment variable substitution if desired, by calling apply_my_env_templates()

  7. Use the get_config()  method of the config wrapper to get back an updated ConfigParser

  8. Finally, return the updated ConfigParser as output from the new configure  method.

Extract step: From INI to JSON

Overview

The extract step takes as input a fully specified INI configuration, and outputs a data structure to be written in JSON format. The output should contain only data specific to that plugin – it is a component of the final JSON output, not the entire document. (This contrasts with the configure step, where the input/output is a ConfigParser with parameters for all components.) The JSON output for a plugin should be self-contained, in accordance with the Prime Directive of Rendering.

At the extract step, components have priorities and can read/write files in the workspace, in exactly the same ways as the configure step.

The JSON schema

What is a JSON schema?

Data output by a plugin must satisfy the plugin JSON schema.

A schema is a JSON document which specifies a permitted structure for other JSON documents. JSON generated by a plugin is automatically validated against the schema.

JSON schema syntax is documented in the JSON schema home page and tutorial. Schema validation in Djerba is done using the Python jsonschema package.

The plugin schema

The JSON schema document for plugins is on Github. Required keys include:

  • plugin_name: Identifier of the plugin

  • version: Version of the plugin

  • priorities: Integers determining order at runtime

  • attributes: Keywords applied to this plugin

  • required: Comma-separated list of component names required by this component

  • merge_inputs : Input to mergers (if any), see below for details

  • results : Catch-all attribute to contain data produced by the plugin

The priorities , attributes , and required keys implement the respective concepts, already discussed for the configure step.

The results  value may contain any data produced by the plugin, so long as it can be represented in JSON: Text, additional data structures, base-64 encoded images, etc. (Very large results may cause issues with uploading to the Djerba CouchDB instance.)

The merge_inputs key

Djerba merger components expect their input in a specific format. Therefore, each merger has its own JSON schema.

An example is the gene information merger schema. It expects an array of objects, each of which has key/value pairs for gene, gene URL, chromosome, and summary text.

The contents of the merge_inputs  value in plugin output are key/value pairs, where the key is the name of a merger, and the value is a data structure satisfying the merger schema.

Implementation: How to write the code

  1. In the plugin subclass created for the configure  step, override the extract  method

  2. Input is a ConfigParser with all parameters fully specified

  3. Write code to extract the data for the given parameters

  4. Return a data structure satisfying the plugin JSON schema:

    1. The overall structure must match the general plugin schema

    2. Any entries in merge_inputs  must match the schema of the appropriate merger

Render step: From JSON to HTML

The final step for a plugin is to render JSON output to HTML. (Conversion of HTML to PDF is done by the Djerba core.)

Once again, the plugin class must override the render method of the parent class. Input to render is the JSON data generated by the plugin in the extract step; and output is a string, intended to make up part of an HTML document.

The output string may be generated in any way that is convenient. The use of Mako templates to interpolate JSON variables to HTML is strongly recommended, for consistency with other plugins.

The final HTML document will contain CSS style headers inserted by the Djerba core, similar to those used by classic Djerba. The CSS features can be used to ensure consistent formatting between report sections. Explicitly defining CSS headers in core Djerba is TODO.

The Prime Directive of Rendering

JSON from each plugin is rendered independently of every other plugin.

It is an important principle of Djerba that the JSON document is self-contained; in other words, it is possible to generate a complete report using only the JSON document. In particular, we archive JSON to a database with the expectation that it can be rendered to an HTML document if desired.

To enable this, the JSON element for a plugin must be sufficient to generate HTML for that plugin. The render step cannot depend on output from other plugins (whether in the JSON document, or in the workspace).

Dependencies at the configure and extract steps are fine (and Djerba has extensive features to support this), but any dependencies must be resolved by the completion of the extract step.

Do not use the workspace

The workspace attribute of the plugin can in theory be accessed at the render step. In practice, developers must not do this, as it will break the Prime Directive of Rendering. If data from the workspace is needed, it must be inserted into the JSON in the extract step.

Testing a plugin

A simple test class

Djerba provides a simple test framework for plugins in plugin_tester.py. As a bare minimum, each plugin should include the plugin tester.

An example can be seen in the demo test. Required steps are:

  1. Create a subdirectory test  within the plugin directory

  2. Create an empty file in the test directory named __init__.py

  3. Create a file with an appropriate name like plugin_test.py

    1. Start with a shebang such as #! /usr/bin/env python3  to allow execution on the command line

    2. Make a subclass of djerba.plugins.plugin_tester

  4. Write INI and JSON files in the test directory, and find the MD5 checksum of the expected output

  5. Create a dictionary with the INI filename, JSON filename, and MD5 string

  6. Call the run_basic_test method of the parent class, with the plugin directory and the above dictionary as arguments

  7. The test can then be run as usual for a Python unit test script, directly or using test discovery

The basic test will:

  1. Run configure with the given INI file as input

  2. Run extract with the fully specified INI file generated in the configure step

  3. Compare the JSON output of extract with the given JSON file, to ensure they match

  4. If step 3 is successful, run render with the JSON as input

  5. Compare the MD5sum of the string output by render with the given checksum, again ensuring they match

This is the simplest possible test of running a plugin from start to finish.

Bootstrapping test development

Bootstrap, noun: The technique of starting with existing resources to create something more complex and effective.

The generate_ini.py  and main djerba.py  scripts can be used to do preliminary testing and prepare the INI and JSON inputs for the plugin tester.

  1. Write a first draft of the plugin

  2. Try to run generate_ini.py to create an INI file for the plugin and any dependencies. Fix any bugs in the initial draft which cause generate_ini.py to crash.

  3. Update the auto-generated INI file to run Djerba. At a minimum, fill in any parameters which must be specified manually; other parameters may also need to be modified for testing.

  4. Run djerba.py with the generate INI file to generate a report. Again, fix any crash-inducing bugs.

  5. Check the HTML/PDF output by eye. (Sometimes bugs occur silently but result in incorrect output.)

  6. Use your INI file, and the JSON generated by djerba.py, to provide inputs to the simple test class.

  7. Run the simple test. Update the expected MD5 sum for HTML output if necessary.

Example of running a test

$ generate_ini.py --verbose -o test.ini provenance_helper patient_info
2023-08-02_17:06:45 djerba.core.ini_generator INFO: Generating config for components: ['core', 'provenance_helper', 'patient_info']
2023-08-02_17:06:45 djerba.core.ini_generator INFO: Finished generating config.
# edit test.ini and fix bugs
$ djerba.py report -i test.ini -o report -w work

Further remarks on testing

Additional testing is strongly encouraged! See the Python unittest documentation for details. Through testing, we seek to avoid the situation in this XKCD cartoon:

Also important for testing is the ability to think outside the box:

Conclusion

  • This document has set out instructions for how to write a Djerba plugin.

  • It is not intended to be exhaustive! Rather, it is meant as a reasonably compact tour of the essential plugin features.

  • Writing mergers and helpers is very similar to writing plugins, but not explicitly covered here.

  • Questions/comments/bug reports can be sent to the Djerba development team.

  • Happy coding!