dawelab/HiReNET

HiReNET – Higher-order Repeat Network Exploration Tool

Introduction:

HiReNET is a graph-based pipeline for detecting and analyzing higher-order repeats (HORs) in genomic sequences. The pipeline has been applied to both Arabidopsis thaliana and maize, providing insights into centromeric and knob repeat structure.

Installation:

Installation instructions are not finished yet.

Dependencies:

HiReNET relies on common bioinformatics tools. Please make sure these are installed and available in your PATH:

  • HMMER
  • BLAST+
  • BLAT
  • SeqKit
  • BEDTools
  • bioawk

R, with the following packages:

  • dplyr
  • stringr
  • reshape2
  • igraph
  • ggplot2
  • cowplot
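
A quick way to confirm that the command-line dependencies are visible on your PATH is sketched below. The executable names are assumptions (hmmsearch for HMMER, blastn for BLAST+, Rscript for R); HiReNET may call other programs from the same packages. The helper name check_tools is made up for this example and is not part of HiReNET.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
# check_tools is a hypothetical helper, not part of HiReNET.
check_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the dependencies listed above.
missing=$(check_tools hmmsearch blastn blat seqkit bedtools bioawk Rscript)

if [ -n "$missing" ]; then
    printf 'Missing from PATH:\n%s\n' "$missing"
else
    echo "All listed tools were found."
fi
```

The script only reports what is missing; it does not check tool versions or the R packages.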

Usage:

Step 1: Prepare profile hidden Markov models (pHMMs) for repeat monomer identification.

Before starting this pipeline, whole-genome sequencing short reads should be analyzed with RepeatExplorer TAREAN to generate consensus repeats and repeat variants for each repeat type. RepeatExplorer is available through the public Galaxy server at https://repeatexplorer-elixir.cerit-sc.cz/. Using the outputs from RepeatExplorer TAREAN, you can then select the repeat of interest and generate profile hidden Markov models (pHMMs).

Usage:
	
HiReNET getphmm -i test_con_variant.fasta -o phmm -p test

Step 2: Identify repeat monomers and arrange them into customized bins.

Repeat monomers can be identified using profile hidden Markov models (pHMMs) and then organized into customized bins. A bin size of 10 kb is typically a good starting point.
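
To make the idea of binning concrete, the sketch below assigns each monomer in a BED-like table to a fixed-width 10 kb bin by integer division of its start coordinate. This only illustrates the concept. The input columns, the toy coordinates, and the rule of binning by start position are all assumptions, and arrangemonomer may assign monomers differently.

```shell
#!/bin/sh
# Toy monomer coordinates (chrom, start, end, name), tab-separated.
# These values are invented for illustration.
monomers=$(printf 'chr1\t150\t328\tm1\nchr1\t9990\t10168\tm2\nchr1\t20500\t20678\tm3\n')

bin_size=10000

# Assign each monomer to the bin containing its start coordinate:
# bin_start = floor(start / bin_size) * bin_size
binned=$(printf '%s\n' "$monomers" | awk -v bin="$bin_size" 'BEGIN { OFS = "\t" } {
    bin_start = int($2 / bin) * bin
    print $1, bin_start, bin_start + bin, $4
}')

printf '%s\n' "$binned"
```

Under this rule, m2 starts at 9,990 and therefore lands in the first bin (0 to 10,000) even though it ends in the second. How boundary-spanning monomers are actually handled is not stated in this README.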

Usage:
	
HiReNET arrayfind -g genome.fasta -c consensus.fasta -o arrayout -p test
HiReNET monomerfind \
	--arrays-dir arrayout \
	--outdir monomerout \
	--prefix test \
	--hmm phmm/test.hmm \
	--chr "chr1,chr2"
HiReNET arrangemonomer \
	--monomer-dir monomerout/test_monomers \
	--outdir arrangemonomer_10kb \
	--prefix test \
	--bin 10000 \
	--chr "chr1,chr2"

Step 3: Classify repeat bins into three classes (Order, HOR, Disorder) using the pre-trained LDA model.

Within each bin, monomers are compared in an all-to-all manner using BLAT. The resulting output is processed to calculate Jaccard index scores, which are used to construct a network. Network structure and monomer information are then combined into a feature table that serves as input for a pre-trained LDA model. Each bin is classified with the LDA model, after which adjacent bins sharing the same class and threshold are merged, and their monomers are rearranged into the merged bins.
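
For reference, the Jaccard index of two sets A and B is the size of their intersection divided by the size of their union. The formula below is the general definition. This README does not state which sets HiReNET derives from the BLAT output for each monomer pair, so A and B are left abstract here.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```

A score of 1 means the two sets are identical, and a score of 0 means they share nothing.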

Usage: 
	
HiReNET comparemonomer \
	--bins-dir arrangemonomer_10kb/test_bin_monomers \
	--outdir comparemonomers_wcon \
	--consensus consensus.fasta 
	
HiReNET comparemonomer \
	--bins-dir arrangemonomer_10kb/test_bin_monomers \
	--outdir comparemonomers2_ncon

HiReNET classprediction \
	--blatsub comparemonomers_wcon/blat_output_sub \
	--outdir classpred_out \
	--prefix test \
	--bin 10000
	
HiReNET classprediction \
	--blatsub comparemonomers_wcon/blat_output_sub \
	--outdir classpred_out2 \
	--prefix test \
	--bin 10000 \
	--plot

Step 4: Annotate local HOR patterns through k-mer-based analysis for each bin.

For each merged bin, monomers are again compared in an all-to-all manner, and the network is rebuilt using the optimal similarity threshold. Monomers are then grouped based on network communities, and higher-order repeat (HOR) patterns are identified within each merged bin.

Usage:
	
HiReNET rearrangemonomers \
  --bins classpred_out/test_fin_bins_combined.txt \
  --class HOR \
  --prefix test \
  --monomer-dir monomerout/test_monomers \
  --outdir rearrange_monomers_mergebin \
  --chr "chr1,chr2"

HiReNET comparemonomer \
	--bins-dir rearrange_monomers_mergebin/re_arrange_monomers \
	--outdir compare_rearrangemonomers
	
HiReNET networkHOR \
	--blatsub compare_rearrangemonomers/blat_output_sub \
	--bins classpred_out/test_fin_bins_combined.txt \
	--coor rearrange_monomers_mergebin/test_monomer_bed_inbin.txt \
	--outdir network_HOR_mergebin

Step 5: Find shared HOR patterns at the chromosome or genome level.

In each merged HOR bin, monomers with the same label are extracted to generate consensus HOR monomers. Consensus HOR monomers that share the same threshold are combined, and these are compared in an all-to-all manner across thresholds ranging from 0.90 to 0.99. Monomers are then relabeled, and shared HOR patterns are identified for each threshold.

Usage:
	
HiReNET arrangeHORmonomer \
	--groupdir network_HOR_mergebin \
	--monomer-dir monomerout/test_monomers \
	--outdir network_mergebin_consensus
	
HiReNET consensusHORmonomer \
	--outdir network_mergebin_consensus \
	--threads 4 \
	--chroms "chr1,chr2"
	
HiReNET compareConsensus \
	--chr "chr1" --consensdir network_mergebin_consensus/all_recluster_consensus_monomer \
	--outdir compare_consensusHOR_chr1
	
HiReNET sharedHOR \
	--chr "chr1" --datadir compare_consensusHOR_chr1/blat_sub \
	--outdir shared_out_chr1 \
	--letter network_HOR_mergebin/mergebin_string_outputs  
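
The last two commands above process chr1 only. One way to repeat them for several chromosomes is a shell loop, sketched below as a dry run that prints each command instead of executing it. It assumes the per-chromosome directory naming used above (compare_consensusHOR_<chr>, shared_out_<chr>) carries over to other chromosomes, and it uses only the flags shown in this section. The helper name print_step5_commands and the chromosome list are examples.

```shell
#!/bin/sh
# Dry run: print the Step 5 comparison commands for each chromosome given
# as an argument. print_step5_commands is a hypothetical helper name.
print_step5_commands() {
    for chr in "$@"; do
        echo HiReNET compareConsensus \
            --chr "$chr" \
            --consensdir network_mergebin_consensus/all_recluster_consensus_monomer \
            --outdir "compare_consensusHOR_${chr}"

        echo HiReNET sharedHOR \
            --chr "$chr" \
            --datadir "compare_consensusHOR_${chr}/blat_sub" \
            --outdir "shared_out_${chr}" \
            --letter network_HOR_mergebin/mergebin_string_outputs
    done
}

print_step5_commands chr1 chr2
```

To run the commands for real, drop the echo in front of each HiReNET call.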
