HiReNET – Higher-order Repeat Network Exploration Tool

Introduction:

HiReNET is a graph-based pipeline for detecting and analyzing higher-order repeats (HORs) in genomic sequences. The pipeline has been applied to both Arabidopsis thaliana and maize, providing insights into centromeric and knob repeat structure.

Installation:

Not finished yet.

Dependencies:

HiReNET relies on common bioinformatics tools. Please make sure these are installed and available in your PATH:

HMMER
BLAST+
BLAT
SeqKit
BEDTools
bioawk

R with packages:

dplyr
stringr
reshape2
igraph
ggplot2
cowplot

Usage:

Step 1: Prepare profile hidden Markov models (phmm’s) for repeat monomer identification.

Before starting this pipeline, whole-genome sequencing short reads should be analyzed with RepeatExplorer TAREAN to generate consensus repeats and repeat variants for each repeat type. RepeatExplorer is available through the public Galaxy server at https://repeatexplorer-elixir.cerit-sc.cz/. Using the outputs from RepeatExplorer TAREAN, you can then select the repeat of interest and generate profile hidden Markov models (pHMMs).

Usage:
	
HiReNET getphmm -i test_con_variant.fasta -o phmm -p test

Step 2: Identify repeat monomers and arrange monomers in the customized bins.

Repeat monomers can be identified using profile hidden Markov models (pHMMs) and then organized into customized bins. A bin size of 10 kb is typically a good starting point.

Usage:
	
HiReNET arrayfind -g genome.fasta -c consensus.fasta -o arrayout -p test
HiReNET monomerfind \
	--arrays-dir arrayout \
	--outdir monomerout \
	--prefix test --hmm phmm/test.hmm \
	--chr "chr1,chr2"
HiReNET arrangemonomer \
	--monomer-dir monomerout/test_monomers \
	--outdir arrangemonomer_10kb \
	--prefix test \
	--bin 10000 \
	--chr "chr1,chr2"

Step 3: Classify repeat bins into three classes (Order, HOR, Disorder) using the pre-trained LDA model.

Within each bin, monomers are compared in an all-to-all manner using BLAT. The resulting output is processed to calculate Jaccard index score, which are used to construct a network. Network structure and monomer information are then combined into a feature table that serves as input for a pre-trained LDA model. Each bin is classified with the LDA model, after which adjacent bins sharing the same class and threshold are merged, and their monomers are rearranged into the merged bins.

Usage: 
	
HiReNET comparemonomer \
	--bins-dir arrangemonomer_10kb/test_bin_monomers \
	--outdir comparemonomers_wcon \
	--consensus consensus.fasta 
	
HiReNET comparemonomer \
	--bins-dir arrangemonomer_10kb/test_bin_monomers \
	--outdir comparemonomers2_ncon

HiReNET classprediction \
	--blatsub comparemonomers_wcon/blat_output_sub \
	--outdir classpred_out \
	--prefix test \
	--bin 10000
	
HiReNET classprediction \
	--blatsub comparemonomers_wcon/blat_output_sub \
	--outdir classpred_out2 \
	--prefix test \
	--bin 10000 \
	--plot

Step 4: Annotate local HOR patterns through kmer-based analysis for each bin.

For each merged bin, monomers are compared in an all-to-all manner again and a network is constructed again using the optimal similarity threshold. Monomers are then grouped based on network communities, and higher-order repeat (HOR) patterns are identified within each merged bin.

Usage:
	
HiReNET rearrangemonomers \
  --bins classpred_out/test_fin_bins_combined.txt \
  --class HOR \
  --prefix test \
  --monomer-dir monomerout/test_monomers \
  --outdir rearrange_monomers_mergebin \
  --chr "chr1,chr2"

HiReNET comparemonomer \
	--bins-dir rearrange_monomers_mergebin/re_arrange_monomers \
	--outdir compare_rearrangemonomers
	
HiReNET networkHOR \
	--blatsub compare_rearrangemonomers/blat_output_sub \
	--bins classpred_out/AthCEN178_fin_bins_combined.txt \
	--coor rearrange_monomers_mergebin/test_monomer_bed_inbin.txt 
	--outdir network_HOR_mergebin

Step 5: Find shared HOR patterns on the chromosome level or the genome level.

In each merged HOR bin, monomers with the same label are extracted to generate consensus HOR monomers. Consensus HOR monomers that share the same threshold are combined, and these are compared in an all-to-all manner across thresholds ranging from 0.90 to 0.99. Monomers are then relabeled, and shared HOR patterns are identified for each threshold.

Usage:
	
HiReNET arrangeHORmonomer \
	--groupdir network_HOR_mergebin \
	--monomer-dir monomerout/test_monomers \
	--outdir network_mergebin_consensus
	
HiReNET consensusHORmonomer \
	--outdir network_mergebin_consensus \
	--threads 4 \
	--chroms "chr1,chr2"
	
HiReNET compareConsensus \
	--chr "chr1" --consensdir network_mergebin_consensus/all_recluster_consensus_monomer \
	--outdir compare_consensusHOR_chr1
	
HiReNET sharedHOR\
	--chr "chr1" --datadir compare_consensusHOR_chr1/blat_sub \
	--outdir shared_out_chr1 \
	--letter network_HOR_mergebin/mergebin_string_outputs

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
AthCEN178_phmm		AthCEN178_phmm
R		R
bin		bin
scripts		scripts
AthCEN178_consensus.fasta		AthCEN178_consensus.fasta
AthCEN178_consensus_onemer.fasta		AthCEN178_consensus_onemer.fasta
AthCEN178_consensus_variant.fasta		AthCEN178_consensus_variant.fasta
HiReNET_Arabidopsis.sh		HiReNET_Arabidopsis.sh
LICENSE		LICENSE
README.md		README.md
make_dispatcher.sh		make_dispatcher.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

HiReNET – Higher-order Repeat Network Exploration Tool

Introduction:

Installation:

Dependencies:

Usage:

Step 1: Prepare profile hidden Markov models (phmm’s) for repeat monomer identification.

Step 2: Identify repeat monomers and arrange monomers in the customized bins.

Step 3: Classify repeat bins into three classes (Order, HOR, Disorder) using the pre-trained LDA model.

Step 4: Annotate local HOR patterns through kmer-based analysis for each bin.

Step 5: Find shared HOR patterns on the chromosome level or the genome level.

About

Uh oh!

Releases

Packages

Languages

Uh oh!

License

Uh oh!

dawelab/HiReNET

Folders and files

Latest commit

History

Repository files navigation

HiReNET – Higher-order Repeat Network Exploration Tool

Introduction:

Installation:

Dependencies:

Usage:

Step 1: Prepare profile hidden Markov models (phmm’s) for repeat monomer identification.

Step 2: Identify repeat monomers and arrange monomers in the customized bins.

Step 3: Classify repeat bins into three classes (Order, HOR, Disorder) using the pre-trained LDA model.

Step 4: Annotate local HOR patterns through kmer-based analysis for each bin.

Step 5: Find shared HOR patterns on the chromosome level or the genome level.

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages