SEQ-flow Platform

The UNSW Workflow Platform (SEQ-flow) implements commonly used pipelines for the analysis of next-generation sequencing data on the UNSW High-Performance Computing (HPC) Infrastructure.

Overview of Workflows for Sequencing Data Analysis

Workflows are data analysis pipelines implemented using a workflow manager. In general, workflow managers simplify the use of complex pipelines composed of multiple software tools by handling software installation and versions and optimizing the use of computing resources. Implementing pipelines as workflows on the UNSW HPC has the following advantages:

Shareability: Pipelines implemented as workflows are easy for any UNSW research group to share and use, since the workflow manager handles software installation and version management.
Reproducibility: Workflow managers were developed to address the challenge of reproducibility in bioinformatics. A standardized, well-documented pipeline ensures that data are analyzed in a consistent manner.
Productivity: Productivity increases when the data analysis pipelines are linked to the Ramaciotti Centre data production. In addition, the bioinformatics community is increasingly producing best-practice data analysis pipelines that are easy to adopt in a workflow format, which makes it easier to work with new data types.

For more details on the goals and advantages of bioinformatics workflows see:

Wratten et al. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nature Methods 2021.

Overview of the UNSW SEQ-flow Platform

The analysis of next-generation sequencing (NGS) data, regardless of its type, has two main stages: (i) alignment of sequencing reads to the reference genome, feature quantification, and quality control assessments, and (ii) extraction of biological information from the quantified features, for example by differential expression analyses or co-expression networks (RNA-seq), differential accessibility (ATAC-seq), or identification of differentially methylated regions (BS-seq).

The SEQ-flow Platform focuses on implementing pipelines for the first stage of NGS data analysis, which is computationally intensive, requires more storage space and more advanced coding skills, and can be easily automated. It does not focus on the second stage, which requires project-specific fine-tuning of the data analysis and can often be carried out on a personal laptop with basic bioinformatics skills (e.g., R/Bioconductor). We do, however, suggest good options for some common downstream analyses for students with limited coding experience.

SEQ-flow is implemented as both a Web User Interface and a Command Line Interface using the workflow manager Nextflow.

It supports input sequencing data that are:

stored on UNSW Katana servers.
automatically downloaded to Katana from a Ramaciotti sequencing run.
automatically downloaded to Katana from NCBI SRA.

SEQ-flow allows users to run any of the large collection of Nextflow Core pipelines.

Nextflow Core (nf-core) is an active bioinformatics community that develops open-source pipelines based on Nextflow and aims to implement the best practices in the field. For more details on nf-core pipelines see:

Ewels et al. The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology 2022.
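To give a sense of what running an nf-core pipeline with Nextflow generally looks like, here is a minimal Python sketch that assembles a typical `nextflow run` command. It illustrates the general nf-core launch pattern only and is not the SEQ-flow Command Line Interface itself. The helper name `build_nfcore_command`, the file names, and the choice of the `singularity` profile are assumptions made for illustration; the profile appropriate for Katana may differ.

```python
import shlex


def build_nfcore_command(pipeline, samplesheet, outdir,
                         profile="singularity", revision=None):
    """Assemble the argument list for launching an nf-core pipeline.

    Hypothetical helper for illustration; it only builds the command
    and does not execute anything.
    """
    cmd = ["nextflow", "run", f"nf-core/{pipeline}"]
    if revision is not None:
        # -r pins a specific pipeline release, which helps reproducibility
        cmd += ["-r", revision]
    # -profile selects the software/container configuration;
    # --input and --outdir are the usual nf-core pipeline parameters
    cmd += ["-profile", profile, "--input", samplesheet, "--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # Print the command as a single shell-quoted line
    print(shlex.join(build_nfcore_command("rnaseq", "samplesheet.csv", "results")))
```

Pinning a release with `-r` means that rerunning the same command later uses the same pipeline version, which is one of the ways workflow managers support reproducibility.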

The nf-core pipelines fall into two categories in terms of their use in SEQ-flow:

Pre-tested pipelines, which we have extensively tested and documented. These include the bulk RNA-seq pipeline (for any organism) and the scRNA-seq pipeline.
User-adopted pipelines, which are any nf-core pipelines not yet tested by the SEQ-flow team. We can offer support to UNSW users interested in using and testing additional nf-core pipelines, to expand the list of pre-tested pipelines.
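As an illustration of the input that a bulk RNA-seq pipeline such as nf-core/rnaseq typically expects, the sketch below builds a sample sheet from a list of FASTQ paths. The column names (`sample`, `fastq_1`, `fastq_2`, `strandedness`) follow the nf-core/rnaseq sample sheet format. The `_R1`/`_R2` file naming convention, the helper name `build_samplesheet`, and the example paths are assumptions, and SEQ-flow may prepare this file differently or automatically.

```python
import csv
import io
import re

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def build_samplesheet(paths, strandedness="auto"):
    """Return sample sheet CSV text for the given FASTQ paths.

    Hypothetical helper: files that do not match the assumed naming
    convention are skipped, and single-end samples get an empty fastq_2.
    """
    samples = {}
    for path in sorted(paths):
        filename = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            continue  # not a FASTQ file following the assumed convention
        samples.setdefault(match["sample"], {})[match["read"]] = path

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for sample in sorted(samples):
        reads = samples[sample]
        if "1" not in reads:
            continue  # a lone R2 file cannot form a valid row
        writer.writerow([sample, reads["1"], reads.get("2", ""), strandedness])
    return out.getvalue()


if __name__ == "__main__":
    example = [
        "/data/liver_R1.fastq.gz",
        "/data/liver_R2.fastq.gz",
        "/data/brain_R1.fastq.gz",
    ]
    print(build_samplesheet(example), end="")
```

Other nf-core pipelines define their own sample sheet columns, so the header row would need to be adapted to the pipeline being run.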

Users can also deploy custom Nextflow workflows in SEQ-flow.
