Modules¶
pirec is a module for recording the activity of file processing pipelines.
Main pirec module containing the Pipeline class and function recording methods.
-
class
pirec.processresult.OutputRecorder¶ Holds commands used via the call function and their resulting output.
-
reset()¶ Clear the stored commands and output.
-
-
class
pirec.processresult.Pipeline¶ Main class managing the recording of a processing pipeline.
-
record(process)¶ Record a process in this pipeline.
Parameters: process (pirec.processresult.ProcessOutput) – The new result.
-
run(name, pipeline_func, base_dir, *inputs, **kwargs)¶ Execute a function as a recorded pipeline.
Parameters: - name (str) – The name of the pipeline - used to name the output file.
- pipeline_func (function) – The function to be run.
- base_dir (str) – The directory in which to save the pipeline output, also used as the root directory for input filenames if the filenames given are not absolute.
- *inputs – The inputs to the pipeline.
Keyword Arguments: - metadata (dict) – Additional information to be included in the result JSON.
- filename (str) – String template for the result filename.
- result_recorder (object) – An instance of a class implementing a write() method that accepts the report dictionary.
- result_names (iterable of str) – The names for any values returned by the pipeline.
- report_name (str) – Filename for the JSON report (default: report.json).
- sentry (raven.Client) – A Sentry.IO client.
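The result_recorder argument only needs an object with a write() method that accepts the report dictionary. A minimal sketch of such an object follows. The class name InMemoryRecorder and the sample report keys are hypothetical and not part of pirec.

```python
class InMemoryRecorder:
    """Hypothetical recorder: keeps each report dictionary in a list."""

    def __init__(self):
        self.reports = []

    def write(self, report):
        # Store a shallow copy so later changes to the caller's dict
        # do not alter what was recorded.
        self.reports.append(dict(report))


recorder = InMemoryRecorder()
# The keys below are illustrative; the real report layout is defined by pirec.
recorder.write({"name": "example", "status": "ok"})
print(recorder.reports)
```

An instance like this could presumably be passed as result_recorder=recorder, since only the write() method is required by the description above.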
-
save(exception=None, report_name='report.json')¶ Save a record of the pipeline execution.
Creates a JSON file with information about the pipeline then saves it to a gzipped tar file along with all files used in the pipeline.
Keyword Arguments: exception (exceptions.Exception or None) – The exception which caused the pipeline run to fail.
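The behaviour described for save() (write a JSON report, then bundle it with the pipeline's files into a gzipped tar file) can be sketched with the standard library alone. This is not pirec's implementation: the function name save_sketch, the archive layout, and the sample report are assumptions made for illustration.

```python
import json
import os
import tarfile
import tempfile


def save_sketch(report, files, archive_path, report_name="report.json"):
    """Write `report` as JSON and bundle it with `files` into a .tar.gz."""
    with tempfile.TemporaryDirectory() as tmp:
        report_path = os.path.join(tmp, report_name)
        with open(report_path, "w") as fh:
            json.dump(report, fh, indent=2)
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(report_path, arcname=report_name)
            for path in files:
                # Assumed layout: each file is stored under its bare name.
                tar.add(path, arcname=os.path.basename(path))


workdir = tempfile.mkdtemp()
input_file = os.path.join(workdir, "input.txt")
with open(input_file, "w") as fh:
    fh.write("data\n")

archive = os.path.join(workdir, "example.tar.gz")
save_sketch({"name": "example"}, [input_file], archive)

with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```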
-
-
class
pirec.processresult.ProcessOutput(func, args, kwargs, commands, output, exception, started, finished, **output_images)¶ A record of one stage within a pipeline.
Parameters: - func (function) – The function that was run.
- args (list) – The arguments passed to the function.
- kwargs (dict) – The keyword arguments passed to the function.
- output (str) – Text printed to stdout or stderr during execution.
- exception (exceptions.Exception or None) – The exception that occurred running the stage, if applicable.
- started (datetime.datetime) – When the stage was started.
- finished (datetime.datetime) – When the stage finished executing.
- **output_images (pirec.artefacts.Artefact) – Images produced by the stage.
-
as_dict()¶ Serialize this output as a dict.
-
pirec.processresult.call(cmd, cwd=None, shell=False)¶ Execute scripts and applications in a pipeline with output capturing.
Parameters: - cmd (list) – List containing the program to be called and any arguments, e.g. ['tar', '-x', '-f', 'file.tgz'].
- cwd (str) – Working directory in which to execute the command.
- shell (bool) – Execute the command in a shell.
Returns: The output from the called command on stdout and stderr.
Return type: str
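Capturing stdout and stderr together as a single string, as described for call(), can be sketched with subprocess. The function call_sketch below is a hypothetical stand-in, not pirec's code. In particular, raising on a non-zero exit status (check=True) is a choice made here, not documented behaviour.

```python
import subprocess
import sys


def call_sketch(cmd, cwd=None, shell=False):
    """Run `cmd` and return its combined stdout and stderr as text."""
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout stream
        universal_newlines=True,   # decode bytes to str
        check=True,                # raise CalledProcessError on failure
    )
    return completed.stdout


# Use the running interpreter so the example does not depend on other tools.
output = call_sketch([sys.executable, "-c", "print('hello')"])
print(output)
```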
-
pirec.processresult.record(*output_names)¶ Decorator for wrapping pipeline stages.
Parameters: *output_names (str) – The names of each returned variable.
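A decorator that takes output names and pairs them with a stage's return values could look roughly like the sketch below. It only illustrates the general pattern. The name record_sketch, the last_run attribute, and the recorded fields are hypothetical and do not reflect how pirec stores results.

```python
import datetime
import functools


def record_sketch(*output_names):
    """Hypothetical decorator that labels a function's return values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.datetime.now()
            result = func(*args, **kwargs)
            finished = datetime.datetime.now()
            # Treat a non-tuple return value as a single output.
            values = result if isinstance(result, tuple) else (result,)
            wrapper.last_run = {
                "func": func.__name__,
                "outputs": dict(zip(output_names, values)),
                "started": started,
                "finished": finished,
            }
            return result
        return wrapper
    return decorator


@record_sketch("total", "count")
def summarise(numbers):
    return sum(numbers), len(numbers)


summarise([1, 2, 3])
print(summarise.last_run["outputs"])
```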
Module containing the pirec.artefacts.Artefact base class and subclasses.
-
class
pirec.artefacts.Artefact(filename, extension, exists=True)¶ Base class for Pirec artefacts (files consumed by and generated by processes).
Parameters: - filename (str) – The filename of the artefact.
- extension (str) – The extension of the artefact’s filename.
Keyword Arguments: exists (boolean) – If true raise an exception if the file does not exist.
Raises: - exceptions.ValueError – If filename does not end with extension.
- exceptions.IOError – If filename does not exist.
-
abspath¶ The file’s absolute path.
-
basename¶ The filename without the extension.
>>> Artefact('/dir/file.txt').basename
'/dir/file'
-
checksum()¶ Calculate the SHA-1 checksum of the file.
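For reference, a SHA-1 checksum of a file can be computed with hashlib as sketched below. The helper name sha1_of_file and the chunked reading are assumptions for illustration. Whether pirec reads the file in chunks or returns a hex string is not stated in this reference.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"abc")
checksum = sha1_of_file(path)
os.remove(path)
print(checksum)
```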
-
dereference()¶ Remove any directory components from the filename.
>>> a = Artefact('/dir/file.txt')
>>> a.dereference()
>>> a.filename
'file.txt'
-
dirname¶ Return the directory component of the filename.
>>> Artefact('/dir/file.txt').dirname
'/dir'
-
exists()¶ Return True if Artefact.filename exists.
-
filename¶ The artefact’s filename.
-
justname¶ The filename without the extension and directory components.
>>> Artefact('/dir/file.txt').justname
'file'
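The relationship between basename, dirname and justname matters most for multi-part extensions such as .nii.gz, where os.path.splitext alone would strip only .gz. The sketch below derives the three values from a filename and an explicit extension. The function split_artefact_name is hypothetical and only mirrors the documented examples.

```python
import os


def split_artefact_name(filename, extension):
    """Derive basename, dirname and justname from an explicit extension."""
    if not filename.endswith(extension):
        raise ValueError("%r does not end with %r" % (filename, extension))
    # Slice by length so that an empty extension leaves the name unchanged.
    basename = filename[: len(filename) - len(extension)]
    return {
        "basename": basename,
        "dirname": os.path.dirname(filename),
        "justname": os.path.basename(basename),
    }


parts = split_artefact_name("/dir/scan.nii.gz", ".nii.gz")
print(parts)
```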
-
class
pirec.artefacts.NiiGzImage(filename, exists=True)¶ An artefact for .nii.gz images.
Parameters: filename (str) – The filename of the artefact.
Keyword Arguments: exists (boolean) – If true raise an exception if the file does not exist.
-
class
pirec.artefacts.TextFile(filename, exists=True)¶ An artefact for .txt files.
Parameters: filename (str) – The filename of the artefact.
Keyword Arguments: exists (boolean) – If true raise an exception if the file does not exist.
-
pirec.artefacts.get_targz_artefact(archive_filename, filename, artefact_cls, strip_dirname=True)¶ Get an artefact from a .tar.gz file.
Parameters: - archive_filename (str) – The filename of the container.
- filename (str) – The filename of the artefact.
- artefact_cls (Artefact) – The class of the artefact.
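Pulling a single member out of a .tar.gz archive can be sketched with tarfile as shown below. This is an illustration only: extract_member is a hypothetical helper, it returns a path instead of an Artefact, and the reading of strip_dirname as "drop the member's directory components when writing it out" is a guess, since that argument is not described in this reference.

```python
import io
import os
import tarfile
import tempfile


def extract_member(archive_filename, filename, dest_dir, strip_dirname=True):
    """Copy one member of a .tar.gz archive into `dest_dir`."""
    with tarfile.open(archive_filename, "r:gz") as tar:
        source = tar.extractfile(filename)  # KeyError if the member is missing
        if source is None:
            raise ValueError("%r is not a regular file" % filename)
        # Assumed meaning of strip_dirname: keep only the bare file name.
        name = os.path.basename(filename) if strip_dirname else filename
        target = os.path.join(dest_dir, name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out:
            out.write(source.read())
    return target


# Build a small archive in a temporary directory to exercise the helper.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.tar.gz")
payload = b"hello\n"
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo(name="results/notes.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

extracted = extract_member(archive, "results/notes.txt", workdir)
print(os.path.basename(extracted))
```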
Module containing the get_environment function.
-
pirec.environment.get_environment()¶ Obtain information about the executing environment.
- Captures:
- installed Python packages using pip (if available),
- hostname
- uname
- environment variables
Returns: a dict with the keys python_packages, hostname, uname and environ
Return type: dict
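A dictionary with the four keys listed above can be assembled from the standard library, as in this sketch. It is not pirec's implementation: environment_sketch is a hypothetical name, and installed packages are listed with importlib.metadata here rather than through pip, so the format of python_packages may differ from what pirec produces.

```python
import os
import platform
import socket
from importlib import metadata


def environment_sketch():
    """Collect basic facts about the executing environment."""
    # importlib.metadata is used as a stand-in for querying pip.
    packages = sorted(
        "%s==%s" % (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
    )
    return {
        "python_packages": packages,
        "hostname": socket.gethostname(),
        "uname": list(platform.uname()),
        "environ": dict(os.environ),
    }


env = environment_sketch()
print(sorted(env))
```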
Module containing functions for recording results to files and databases.
-
class
pirec.recorders.CSVFile(path, values)¶ Records results to a CSV file.
Parameters: - path (str) – The file to which results should be written
- values (dict) – a mapping from table columns to values
-
write(results)¶ Write results to the file specified.
Parameters: results (dict) – A dictionary of results to record
Note
If the specified file does not exist, it will be created and a header will be written; otherwise the new result is appended.
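The create-with-header-or-append behaviour in the note can be sketched with the csv module. The helper append_row and the sample columns are hypothetical. How CSVFile resolves its values mapping against the results dictionary is not covered here.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append `row` to a CSV file, writing a header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_row(path, {"subject": "s01", "volume": 1200})
append_row(path, {"subject": "s02", "volume": 1350})

with open(path, newline="") as fh:
    rows = list(csv.reader(fh))
print(rows)
```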
-
class
pirec.recorders.SQLDatabase(uri, table, values, json_column=None)¶ Record results to a database supported by SQLAlchemy.
Parameters: - uri (str) – database server URI, e.g. mysql://username:password@localhost/dbname
- table (str) – table name
- values (dict) – a mapping from database table columns to values
Keyword Arguments: json_column (str) – If supplied the complete result dictionary will be written to this column
-
write(results)¶ Write the results to the database table specified at initialisation.
Parameters: results (dict) – A dictionary of results to record
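Writing selected values to table columns, with the complete result dictionary optionally stored as JSON in one extra column, can be sketched as below. The sketch uses the standard-library sqlite3 module in place of SQLAlchemy so that it runs without extra dependencies. The function write_result, the table layout, and the way column values are chosen are assumptions, not pirec's behaviour.

```python
import json
import sqlite3


def write_result(conn, table, row, results, json_column=None):
    """Insert `row` into `table`, optionally adding `results` as JSON."""
    record = dict(row)
    if json_column is not None:
        record[json_column] = json.dumps(results, sort_keys=True)
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    # Table and column names cannot be bound as parameters, so they must
    # come from trusted code, never from user input.
    conn.execute(
        "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders),
        list(record.values()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (subject TEXT, volume REAL, report TEXT)")
results = {"subject": "s01", "volume": 1200.0}
write_result(conn, "runs", results, results, json_column="report")
stored = conn.execute("SELECT subject, volume, report FROM runs").fetchone()
print(stored)
```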
-
class
pirec.recorders.MongoDB(uri, database, collection)¶ Records results to a MongoDB database.
Parameters: - uri (str) – MongoDB server URI, e.g. mongodb://localhost:27017
- database (str) – database name
- collection (str) – collection name
Note
Use of this class requires the installation of the pymongo module.
-
write(results)¶ Insert results into the database.
-
class
pirec.recorders.StdOut(values)¶ Print results to stdout.
Parameters: values (dict) – key-value pairs to be printed
-
write(results)¶ Print the results to stdout.
-
-
class
pirec.recorders.Slack(url, channel, values)¶ Send a Slack notification when a pipeline completes.
Parameters: - url (str) – Slack Webhook URL
- channel (str) – The channel name to post to
- values (dict) – A mapping of result keys to report
Note
Use of this class requires the installation of the slackclient module.
-
write(results)¶ Send a message to Slack.
Parameters: results (dict) – A dictionary of results to record
Exposes the CSVFile result recorder.
-
class
pirec.recorders.csvfile.CSVFile(path, values)¶ Records results to a CSV file.
Parameters: - path (str) – The file to which results should be written
- values (dict) – a mapping from table columns to values
-
write(results)¶ Write results to the file specified.
Parameters: results (dict) – A dictionary of results to record
Note
If the specified file does not exist, it will be created and a header will be written; otherwise the new result is appended.
Exposes the MongoDB recorder class.
-
class
pirec.recorders.mongodb.MongoDB(uri, database, collection)¶ Records results to a MongoDB database.
Parameters: - uri (str) – MongoDB server URI, e.g. mongodb://localhost:27017
- database (str) – database name
- collection (str) – collection name
Note
Use of this class requires the installation of the pymongo module.
-
write(results)¶ Insert results into the database.
Exposes the Slack result recorder.
-
class
pirec.recorders.slack.Slack(url, channel, values)¶ Send a Slack notification when a pipeline completes.
Parameters: - url (str) – Slack Webhook URL
- channel (str) – The channel name to post to
- values (dict) – A mapping of result keys to report
Note
Use of this class requires the installation of the slackclient module.
-
write(results)¶ Send a message to Slack.
Parameters: results (dict) – A dictionary of results to record
Exposes the SQLDatabase result recorder.
-
class
pirec.recorders.sqldatabase.SQLDatabase(uri, table, values, json_column=None)¶ Record results to a database supported by SQLAlchemy.
Parameters: - uri (str) – database server URI, e.g. mysql://username:password@localhost/dbname
- table (str) – table name
- values (dict) – a mapping from database table columns to values
Keyword Arguments: json_column (str) – If supplied the complete result dictionary will be written to this column
-
write(results)¶ Write the results to the database table specified at initialisation.
Parameters: results (dict) – A dictionary of results to record
Exposes the StdOut recorder.