Introduction to GeneFlow
GeneFlow, a Python-based workflow engine, was originally developed for the
CDC and provides a framework for building generalized data analysis
workflows that leverage modular, reusable components. Version 3 of GeneFlow
(in a pre-release state as of this writing) simplifies the workflow
definition to make it easier for users to build and run workflows. This post
steps through the installation process and runs a simple workflow to
highlight the basic functionality of GeneFlow. Additional details will be provided in future posts.
More information, as well as the GeneFlow source code, can be found here.
Installation
The easiest way to install GeneFlow is via a Python virtual
environment:
python3 -m venv gfpy
source gfpy/bin/activate
These commands create a Python virtual environment in the "gfpy" folder and activate it. Next, install GeneFlow using pip:
pip install geneflow3
Check the version of GeneFlow to validate that it installed correctly:
gf --version
You should see output similar to the following (your version might differ, but should be >= 3.x):
gf 3.0.0-alpha.1
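If you want to script this check, the major version can be pulled out of that line with plain shell parameter expansion. This is only a sketch: the helper name gf_major is hypothetical, and it assumes the "gf <major>.<minor>..." format shown above.

```shell
# Hypothetical helper: extract the major version from a line such as
# "gf 3.0.0-alpha.1". Assumes the "<name> <major>.<minor>..." format.
gf_major() {
    ver=${1#* }                  # drop the leading "gf "
    printf '%s\n' "${ver%%.*}"   # keep everything before the first dot
}

# Only query gf when it is on PATH (i.e., the virtual environment is active).
if command -v gf >/dev/null 2>&1; then
    version_line=$(gf --version 2>/dev/null)
    major=$(gf_major "$version_line")
    if [ "$major" -ge 3 ] 2>/dev/null; then
        echo "GeneFlow major version $major: OK"
    else
        echo "Unexpected GeneFlow version: $version_line" >&2
    fi
fi
```

If nothing is printed, gf is not on your PATH, which usually means the virtual environment is not activated in the current shell.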
Install a Workflow
Now install an example workflow from the GeneFlow Workflows GitHub
repository, which is located here. Use the "install-workflow" command:
gf install-workflow --make-apps -c -g \
    https://github.com/geneflow-workflows/minimap2-gf3 --git-branch 2.22-01 \
    minimap2-gf3
This command clones the example workflow repository into the specified
folder (minimap2-gf3 in this case) and builds the apps.
Run the Workflow
Note: this specific workflow depends on Docker to execute containers. Docker
must be properly installed and functional before running this workflow.
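One quick way to confirm that Docker is usable is to check that the docker client is on your PATH and that it can reach the daemon. The following is a minimal sketch using standard Docker CLI commands; the helper name require_cmd is hypothetical and not part of GeneFlow.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if ! require_cmd docker; then
    echo "docker not found on PATH" >&2
elif ! docker info >/dev/null 2>&1; then
    # The client exists, but it cannot talk to the daemon
    # (daemon not running, or insufficient permissions).
    echo "docker is installed but the daemon is not reachable" >&2
else
    echo "docker is ready"
fi
```

If the second message appears, starting the Docker service or adjusting your user's permissions is usually the next thing to look at.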
This example workflow runs a single step, which uses Minimap2 to align
short-read sequences to a reference sequence. More information about this
bioinformatics tool can be found here.
Before running the workflow, take a look at the workflow's command-line
requirements with the following command:
gf help minimap2-gf3
You should see the following output:
To run the workflow, change to the directory where the example workflow was
installed and call "gf run":
cd minimap2-gf3
gf run .
By default, the output is placed in ~/geneflow-output and should be structured as follows:
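To inspect the layout on your own system, you can list the files under the output directory. This is a minimal sketch that assumes the default ~/geneflow-output location mentioned above; the helper name list_outputs is hypothetical.

```shell
# Hypothetical helper: list every file under a directory, sorted so the
# listing is stable between runs.
list_outputs() {
    find "$1" -type f | sort
}

out_dir="$HOME/geneflow-output"   # default location, per the text above
if [ -d "$out_dir" ]; then
    list_outputs "$out_dir"
else
    echo "No output directory at $out_dir yet" >&2
fi
```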