Introduction to GeneFlow

GeneFlow, a Python-based workflow engine, was originally developed for the CDC and provides a framework for building generalized data analysis workflows that leverage modular, reusable components. Version 3 of GeneFlow (in a pre-release state as of this writing) simplifies the workflow definition to make it easier for users to build and run workflows. This post steps through the installation process and runs a simple workflow to highlight the basic functionality of GeneFlow. Additional details will be provided in future posts.

More information, as well as the GeneFlow source code, can be found here.

Installation

The easiest way to install GeneFlow is via a Python virtual environment:

python3 -m venv gfpy
source gfpy/bin/activate

These commands create a Python virtual environment in the "gfpy" folder and activate it. Next, install GeneFlow using pip:

pip install geneflow3
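If the install appears to succeed but later commands are not found, a quick check from Python can help. The following is only a minimal sketch using the standard library: it reports whether the interpreter is running inside a virtual environment and whether a distribution named "geneflow3" (the name passed to pip above) is visible to it. The helper names are illustrative and are not part of GeneFlow.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("geneflow3 version:", installed_version("geneflow3"))
```

Run it with the same python3 that created the "gfpy" environment. A result of None for the version suggests that pip installed into a different environment.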

Check the version of GeneFlow to validate that it installed correctly:

gf --version

You should see output similar to the following (your version might differ, but should be 3.0 or later):

gf 3.0.0-alpha.1
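For scripted setups, the same check can be automated. The sketch below assumes the version output keeps the shape shown above (a program name followed by a major.minor.patch number, optionally with a pre-release suffix). The function names are hypothetical helpers, not part of GeneFlow.

```python
import re
from typing import Tuple


def parse_gf_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'gf 3.0.0-alpha.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_supported(output: str, minimum_major: int = 3) -> bool:
    """Return True when the reported major version is at least minimum_major."""
    return parse_gf_version(output)[0] >= minimum_major


if __name__ == "__main__":
    print(parse_gf_version("gf 3.0.0-alpha.1"))  # (3, 0, 0)
    print(is_supported("gf 3.0.0-alpha.1"))      # True
```

The pre-release suffix ("-alpha.1") is deliberately ignored here, since only the major version matters for this check.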

Install a Workflow

Now install an example workflow from the GeneFlow Workflows GitHub repository, which is located here. Use the "install-workflow" command:

gf install-workflow --make-apps -c -g https://github.com/geneflow-workflows/minimap2-gf3 --git-branch 2.22-01 minimap2-gf3

This command clones the example workflow repository into the specified folder (here, minimap2-gf3) and builds the apps.

Run the Workflow

Note: This workflow depends on Docker to execute containers. Docker must be installed and running before you run the workflow.
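One way to confirm the Docker prerequisite before starting is a small pre-flight check. This sketch is not part of GeneFlow. It assumes that the "docker" executable is on the PATH and that "docker info" exits with a non-zero status when the daemon cannot be reached, which holds for recent Docker releases but is worth verifying on your system.

```python
import shutil
import subprocess


def docker_available(timeout: float = 10.0) -> bool:
    """Return True if the docker CLI exists and 'docker info' succeeds."""
    # Step 1: is the docker executable on the PATH at all?
    if shutil.which("docker") is None:
        return False
    # Step 2: can the CLI reach a running daemon within the timeout?
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_available():
        print("Docker looks ready.")
    else:
        print("Docker is missing or not running.")
```

A False result means either the CLI is not installed or the daemon is unreachable. Running "docker info" by hand shows which.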

This example workflow runs a single step: Minimap2, which aligns short read sequences to a reference sequence. More information about this bioinformatics tool can be found here.

Before running the workflow, take a look at the workflow's command-line requirements with the following command:

gf help minimap2-gf3

You should see the following output:


To run the workflow, go to the directory where the example workflow was installed and run it from there:

cd minimap2-gf3
gf run .


By default, the output is placed in ~/geneflow-output.
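To inspect that directory after a run, a short sketch like the one below prints its contents as an indented tree. It assumes only the default output location mentioned above and makes no assumption about which files or subfolders the workflow creates. The function name is illustrative.

```python
from pathlib import Path


def format_tree(root: Path, indent: str = "    ") -> str:
    """Return an indented listing of every file and folder under root."""
    lines = [root.name + "/"]
    # Sorting Path objects compares them part by part, so each folder's
    # contents are listed directly beneath that folder.
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append(indent * depth + path.name + suffix)
    return "\n".join(lines)


if __name__ == "__main__":
    output_dir = Path.home() / "geneflow-output"
    if output_dir.is_dir():
        print(format_tree(output_dir))
    else:
        print(f"{output_dir} does not exist yet; run the workflow first.")
```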
