x.FASTQ

x.FASTQ is a suite of Bash wrappers for original and third-party software, designed to make RNA-Seq data analysis more automated and more accessible to wet-lab biologists without a specific bioinformatics background.

Modules

x.FASTQ provides several modules to cover the entire RNA-Seq data analysis workflow, from raw read retrieval to count matrix generation. Each module is launched from the command line with its own Bash command:

Module Name	Performed Task
getFASTQ	downloads NGS raw data in FASTQ format from the ENA database
trimFASTQ	performs adapter and quality trimming by running BBDuk
anqFASTQ	aligns reads and quantifies transcript abundance by running STAR and RSEM
qcFASTQ	runs quality-control tools, such as FastQC and MultiQC
tabFASTQ	merges counts from multiple samples into a single expression table
metaharvest	fetches metadata from GEO and/or ENA databases
x.FASTQ	performs common tasks of general utility (disk usage monitor, dependency report...)

Usage

Assuming that you have identified a study of interest from GEO (e.g., GSE138309), have already created a project folder somewhere (mkdir '<anyPath>'/GSE138309), and have moved into it (cd '<anyPath>'/GSE138309), here are some possible sample workflows.
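
The project folder setup described above can be sketched as follows. The base path (base_dir) and the variable names are placeholders chosen for illustration only; any location on the server works, and x.FASTQ does not require this particular layout.

```bash
# Hypothetical base path: replace it with any location you prefer
base_dir="${TMPDIR:-/tmp}/xfastq_projects"
gse_id="GSE138309"                 # GEO series ID of the study of interest
project_dir="$base_dir/$gse_id"

# Create the project folder (and any missing parent), then move into it
mkdir -p "$project_dir"
cd "$project_dir" && pwd
```

Naming the folder after the GEO series ID is just a convenient convention that keeps each study in its own directory.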

Minimal Step-by-Step Workflow

As an example of a minimal workflow, we can think of the following command set to retrieve the FASTQs, align and quantify them, and generate the gene-level count matrix.

# Download FASTQs, align, quantify, and assemble a gene-level count matrix
getfastq -u GSE138309 > ./GSE138309_wgets.sh
getfastq GSE138309_wgets.sh
anqfastq .
tabfastq .

Complete Step-by-Step Workflow

A more complete workflow might include the download of metadata, a read trimming step, multiple quality control steps, and the inclusion of gene annotations and experimental design information in the count matrix.

# Download 12 (PE) FASTQs in parallel and fetch GEO-ENA cross-referenced metadata
getfastq --urls GSE138309 > ./GSE138309_wgets.sh
getfastq --multi GSE138309_wgets.sh
metaharvest --geo --ena GSE138309 > GSE138309_meta.csv

# Trim and QC
qcfastq --out=FastQC_raw .
trimfastq .
qcfastq --out=FastQC_trim .

# Align, quantify, and QC
anqfastq .
qcfastq --tool=QualiMap .
qcfastq --tool=MultiQC .

# Clean up
rm *.fastq.gz

# Assemble an isoform-level count matrix with annotation and experimental design
groups=(Ctrl Ctrl Ctrl Treat Treat Treat)
tabfastq --isoforms --names=human --design="${groups[*]}" --metric=expected_count .

# Explore samples through PCA
qcfastq --tool=PCA .
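
The --design argument above relies on standard Bash array expansion: "${groups[*]}" joins all array elements into a single word, separated by the first character of IFS (a space by default). The following sketch only illustrates this shell behavior, not how tabfastq parses the resulting string; design_string and count_words are illustrative names that are not part of x.FASTQ.

```bash
# Six samples: three controls followed by three treated (same array as above)
groups=(Ctrl Ctrl Ctrl Treat Treat Treat)

# "${groups[*]}" expands to ONE word, with elements joined by the first IFS character
design_string="${groups[*]}"
echo "$design_string"

# Helper that reports how many arguments it receives (illustrative only)
count_words() { echo "$#"; }

# [*] passes a single argument, whereas [@] passes one argument per element
n_star=$(count_words "${groups[*]}")
n_at=$(count_words "${groups[@]}")
echo "[*] -> $n_star argument(s); [@] -> $n_at argument(s)"
```

This is why the whole experimental design reaches the option as one quoted value rather than as six separate arguments.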

Complete Workflow in Batch Mode

Due to the typical hardware requirements for read alignment and subsequent transcript abundance quantification, x.FASTQ has been designed to be installed on one (or a few) remote Linux servers and accessed by multiple client users via SSH. Accordingly, each x.FASTQ module runs by default in the background and persistently (i.e., ignoring the HUP hangup signal), so that the user is not forced to keep the connection active for the entire duration of the analysis, but only for job scheduling. In this way, each x.FASTQ module can be run independently as a single analysis step.

Alternatively, multiple modules can be chained together in a single pipeline to automate the entire analysis workflow by using the -w | --workflow option for foreground execution. Here is the batched version of the previous workflow:

#!/bin/bash
## Prototypical x.FASTQ pipeline

# Download 12 (PE) FASTQs in parallel and fetch GEO-ENA cross-referenced metadata
getfastq --urls GSE138309 > ./GSE138309_wgets.sh
getfastq -w --multi GSE138309_wgets.sh
metaharvest --geo --ena GSE138309 > GSE138309_meta.csv

# Check FASTQ fileset completeness before going on
if ! getfastq --progress-complete; then
   echo "FASTQ file possibly missing! Aborting the pipeline..."
   exit 1
fi

# Trim and QC
qcfastq -w --out=FastQC_raw .
trimfastq -w .
qcfastq -w --out=FastQC_trim .

# Align, quantify, and QC
anqfastq -w .
qcfastq -w --tool=QualiMap .
qcfastq -w --tool=MultiQC .

# Clean up
rm *.fastq.gz

# Assemble an isoform-level count matrix with annotation and experimental design
groups=(Ctrl Ctrl Ctrl Treat Treat Treat)
tabfastq -w --isoforms --names=human --design="${groups[*]}" --metric=expected_count .

# Explore samples through PCA
qcfastq -w --tool=PCA .

Save this pipeline as a single script file (e.g., pipeline.xfastq) and run the entire workflow in the background with nohup:

nohup bash pipeline.xfastq &
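
As a reminder of what this launch pattern does at the shell level, the sketch below uses a stand-in script (demo_pipeline.sh, hypothetical) in place of a real x.FASTQ pipeline. It shows one way to redirect the output to a log file of your choice and to record the job's PID; the file names and the redirection are assumptions made for the demo, not x.FASTQ requirements.

```bash
# Stand-in for pipeline.xfastq (hypothetical content, for illustration only)
workdir=$(mktemp -d)
cat > "$workdir/demo_pipeline.sh" <<'EOF'
#!/bin/bash
echo "step 1 done"
echo "step 2 done"
EOF

# nohup makes the job ignore SIGHUP; '&' sends it to the background.
# Without a redirection, nohup appends output to nohup.out when stdout is a terminal.
nohup bash "$workdir/demo_pipeline.sh" > "$workdir/pipeline.log" 2>&1 &
pipeline_pid=$!
echo "Pipeline running with PID $pipeline_pid"

# Waiting here only keeps the demo deterministic; in practice you can log out
wait "$pipeline_pid"
pipeline_status=$?
cat "$workdir/pipeline.log"
```

Recording the PID makes it easy to check later whether the job is still alive (e.g., with ps -p "$pipeline_pid") after reconnecting to the server.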

Complete Workflow with Moliere

Alternatively, a similar workflow can be performed in a single command using Moliere, a "precasted" Python script that runs, in order, getfastq, qcfastq, trimfastq, qcfastq (again), anqfastq, and tabfastq, covering the whole analysis process with sensible defaults.

nohup moliere analyse GSE138309 &

Documentation

Each module (including Moliere) has its own -h | --help option, which provides detailed information on possible arguments and command syntax.

The full x.FASTQ documentation, including the installation procedure on the server machine, can be found in the docs folder.

A PDF version is also available as a preprint from Preprints.org with DOI: 10.20944/preprints202507.0213.

