HiReNET wiki

Mingyu-Wang edited this page Nov 5, 2025 · 5 revisions

Usage

Part 1: Prepare profile hidden Markov models (pHMMs) for repeat monomer identification.

Before starting this pipeline, whole-genome sequencing short reads should be analyzed with RepeatExplorer TAREAN to generate consensus repeats and repeat variants for each repeat type. RepeatExplorer is available through the public Galaxy server at https://repeatexplorer-elixir.cerit-sc.cz/. Using the outputs from RepeatExplorer TAREAN, you can then select the repeat of interest and generate profile hidden Markov models (pHMMs).

Step 1: Generate profile hidden Markov models (pHMMs)

Generating the pHMMs requires the sequences of the repeat variants of interest, as output by RepeatExplorer TAREAN (here, data/AthCEN178_consensus_variant.fasta).

HiReNET getphmm -i data/AthCEN178_consensus_variant.fasta -o phmm -p test
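Before building the pHMMs, it can help to confirm how many variant sequences the TAREAN FASTA contains and how long they are. A minimal sketch, assuming a standard (possibly line-wrapped) FASTA file; `fasta_lengths` is a hypothetical helper, not a HiReNET command:

```bash
# fasta_lengths: hypothetical helper (not part of HiReNET).
# Prints "<record name><TAB><sequence length>" for every record in a FASTA file.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len  # flush the previous record
      name = substr($1, 2)                 # header text up to the first space
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }  # sequence lines may be wrapped
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

For example, `fasta_lengths data/AthCEN178_consensus_variant.fasta` lists each variant with its length.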

Part 2: Identify repeat monomers and arrange them into custom bins.

Repeat monomers are identified with the pHMMs and then organized into bins of a customizable size. A bin size of 10 kb is typically a good starting point.

Step 2: Find repeat arrays

Repeat monomers are typically sparse in non-repeat sequences, such as transposable elements (TEs). To minimize noise from these regions, HiReNET restricts downstream analysis to repeat-enriched intervals (repeat arrays). The output directory, PREFIX_arrayout, contains the repeat array sequences (FASTA files) and their coordinates (BED files).

HiReNET arrayfind -g data/test.fasta -c data/AthCEN178_consensus.fasta -o test_arrayout -p test
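For a quick overview of the detected arrays, a BED file from PREFIX_arrayout can be summarized per chromosome. A sketch assuming standard BED columns (chromosome, 0-based start, end-exclusive end); the individual BED file names are not listed on this page, so the path is passed explicitly. `bed_summary` is hypothetical:

```bash
# bed_summary: hypothetical helper (not part of HiReNET).
# Prints "<chrom><TAB><number of arrays><TAB><total bp>" per chromosome.
bed_summary() {
  awk 'BEGIN { OFS = "\t" }
       /^(#|track|browser)/ { next }        # skip BED comment/header lines
       NF >= 3 { n[$1]++; bp[$1] += $3 - $2 }
       END { for (c in n) print c, n[c], bp[c] }' "$1" | LC_ALL=C sort -k1,1
}
```

For example, `bed_summary path/to/arrays.bed` reports how many arrays each chromosome has and how many bases they cover.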

Step 3: Detect monomers

The default minimum monomer length is 120 bp; change this threshold with the --min-monomer-len option. Use the --chr flag to restrict the analysis to specific chromosomes (e.g., --chr chr1,chr3,chr5). All monomer sequences are written to PREFIX_monomerout/PREFIX_monomers, one FASTA file per monomer, named after its coordinates (e.g., chr1_14389767_14389941.fa). The three examples below analyze chr1 and chr3 together, chr1 alone, and chr1 alone with a 150 bp minimum monomer length.

HiReNET monomerfind \
  --arrays-dir test_arrayout \
  --chrom-dir test_arrayout/split_seq \
  --outdir test_monomerout \
  --prefix test \
  --hmm test_phmm/test.hmm \
  --chr chr1,chr3

HiReNET monomerfind \
  --arrays-dir test_arrayout \
  --chrom-dir test_arrayout/split_seq \
  --outdir test_monomerout_chr1 \
  --prefix test \
  --hmm test_phmm/test.hmm \
  --chr chr1

HiReNET monomerfind \
  --arrays-dir test_arrayout \
  --chrom-dir test_arrayout/split_seq \
  --outdir test_monomerout_chr1_2 \
  --prefix test \
  --hmm test_phmm/test.hmm \
  --chr chr1 \
  --min-monomer-len 150
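Because each monomer file is named after its coordinates, simple statistics can be derived from the file names alone. The sketch splits a name such as chr1_14389767_14389941.fa into chromosome, start, end, and length. It assumes the chromosome_start_end pattern shown above and computes length as end - start (0-based, end-exclusive, BED-style); if the coordinates are 1-based inclusive, the true length is one larger. `parse_monomer_name` is hypothetical:

```bash
# parse_monomer_name: hypothetical helper (not part of HiReNET).
# Prints "<chrom><TAB><start><TAB><end><TAB><end - start>" for a monomer file name.
# Fields are split from the right, so chromosome names containing "_" stay intact.
parse_monomer_name() {
  local base=${1##*/}   # drop any directory part
  base=${base%.fa}      # drop the .fa extension
  local end=${base##*_}
  local rest=${base%_*}
  local start=${rest##*_}
  local chrom=${rest%_*}
  printf '%s\t%s\t%s\t%s\n' "$chrom" "$start" "$end" $(( end - start ))
}
```

For example, to count monomers of at least 150 bp, assuming the FASTA files sit directly in test_monomerout/test_monomers: `for f in test_monomerout/test_monomers/*.fa; do parse_monomer_name "$f"; done | awk -F'\t' '$4 >= 150' | wc -l`.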

Step 4: Arrange monomers into bins

The default bin size is 10 kb. HiReNET performs network analysis on the monomers within each bin, so changing the bin size can change which monomers are grouped together and, in turn, the detected HOR patterns. Keep the default bin size unless you have a specific reason to change it. The monomer sequences (FASTA) in each bin are written to PREFIX_arrangemonomer_10kb/PREFIX_bin_monomers and named according to their coordinates, e.g., chr1_14389697_14926924_14389767_14399767.fa. The examples below bin chr1 and chr3 together, chr3 alone, and the chr1-only monomers from Step 3.

HiReNET arrangemonomer \
  --arrays-dir test_arrayout \
  --genomic-bed-dir test_monomerout \
  --monomer-dir test_monomerout/test_monomers \
  --outdir test_arrangemonomer_10kb \
  --prefix test \
  --bin 10000 \
  --chr chr1,chr3

HiReNET arrangemonomer \
  --arrays-dir test_arrayout \
  --genomic-bed-dir test_monomerout \
  --monomer-dir test_monomerout/test_monomers \
  --outdir test_arrangemonomer_10kb_chr3 \
  --prefix test \
  --bin 10000 \
  --chr chr3

HiReNET arrangemonomer \
  --arrays-dir test_arrayout \
  --genomic-bed-dir test_monomerout_chr1 \
  --monomer-dir test_monomerout_chr1/test_monomers \
  --outdir test_arrangemonomer_10kb_chr1 \
  --prefix test \
  --bin 10000 \
  --chr chr1
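The idea of fixed-size bins can be illustrated by tiling one repeat array into consecutive windows. This is a simplified sketch, not HiReNET's binning code: HiReNET may anchor bin boundaries differently (in the example file name above, the bin starts at 14389767 rather than at the array start 14389697). `make_bins` is hypothetical, and its 10 kb default simply mirrors --bin 10000:

```bash
# make_bins: hypothetical helper (not part of HiReNET).
# Tiles [start, end) on one chromosome into consecutive BED-style bins;
# the last bin is truncated at the interval end.
make_bins() {
  local chrom=$1 start=$2 end=$3 size=${4:-10000}
  awk -v c="$chrom" -v s="$start" -v e="$end" -v w="$size" 'BEGIN {
    OFS = "\t"
    for (b = s; b < e; b += w) print c, b, (b + w < e ? b + w : e)
  }'
}
```

For example, `make_bins chr1 14389697 14926924 10000 | head -n 3` shows the first three 10 kb windows of that array.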

Part 3: Classify repeat bins into three classes (Order, HOR, Disorder) using the pre-trained LDA model.

Within each bin, monomers are compared in an all-to-all manner using BLAT. The resulting output is processed to calculate the Jaccard index score, which is used to construct a network. The network structure and monomer information are then combined into a feature table, which serves as input for a pre-trained LDA model. Each bin is classified with the LDA model, after which adjacent bins sharing the same class and threshold are merged, and their monomers are rearranged into the merged bins.
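The Jaccard index of two sets is J(A, B) = |A ∩ B| / |A ∪ B|. This page does not spell out which sets HiReNET derives from the BLAT output, so the sketch assumes, for illustration only, that each monomer is represented by the set of monomer IDs it matches. `to_set` and `jaccard` are hypothetical helpers:

```bash
# to_set: hypothetical helper. Turns a whitespace-separated list into sorted unique lines.
to_set() { tr -s '[:space:]' '\n' <<<"$1" | sed '/^$/d' | LC_ALL=C sort -u; }

# jaccard: hypothetical helper (not part of HiReNET).
# Prints |A intersect B| / |A union B| with four decimals (0 when both sets are empty).
jaccard() {
  local inter union
  inter=$(LC_ALL=C comm -12 <(to_set "$1") <(to_set "$2") | wc -l)
  union=$(printf '%s\n%s\n' "$(to_set "$1")" "$(to_set "$2")" \
            | sed '/^$/d' | LC_ALL=C sort -u | wc -l)
  awk -v i="$inter" -v u="$union" \
    'BEGIN { if (u + 0 == 0) print "0.0000"; else printf "%.4f\n", i / u }'
}
```

For example, `jaccard "m1 m2 m3" "m2 m3 m4"` compares two monomers that share two of four distinct hits.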

Step 5: Compare monomers (self + consensus)

In this step, BLAT compares all monomers within each bin against each other. When the --consensus option is provided, each monomer is also compared against the consensus repeat sequence, which allows sequence similarity to be assessed at the chromosome or genome level. blat_output and blat_con_output contain the raw BLAT output; blat_output_sub and blat_con_output_sub contain the processed output, reduced to the information needed to calculate the Jaccard index.

HiReNET comparemonomer \
  --bins-dir test_arrangemonomer_10kb/test_bin_monomers \
  --outdir test_comparemonomers \
  --consensus data/AthCEN178_consensus.fasta
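BLAT writes its results in PSL format by default. Assuming the files under blat_output are standard PSL (this page does not document their layout, nor that of the processed *_sub files), the sketch lists non-self hits above an identity cutoff. Identity is simplified here to (matches + repMatches) / (matches + repMatches + misMatches), which ignores gaps and is not necessarily the measure HiReNET uses. `psl_identity` is hypothetical:

```bash
# psl_identity: hypothetical helper (not part of HiReNET).
# Prints "<qName><TAB><tName><TAB><identity>" for non-self PSL hits with identity >= cutoff.
# Identity = (matches + repMatches) / (matches + repMatches + misMatches), gaps ignored.
psl_identity() {
  local psl=$1 min=${2:-0.9}
  awk -v min="$min" 'BEGIN { OFS = "\t" }
    $1 !~ /^[0-9]+$/ || NF < 21 { next }   # skip psLayout header and blank lines
    $10 == $14 { next }                    # skip self-hits (qName == tName)
    {
      aligned = $1 + $3 + $2
      if (aligned == 0) next
      id = ($1 + $3) / aligned
      if (id >= min) print $10, $14, sprintf("%.4f", id)
    }' "$psl"
}
```

For example, `psl_identity path/to/hits.psl 0.95` keeps only pairs at 95% identity or higher.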

Step 6: Predict HOR classes

Install the R packages listed in the Dependencies section before running this step. Adding the --plot flag generates a network plot for each HOR bin; the second example runs the same prediction without plots.

HiReNET classprediction \
  --blatsub test_comparemonomers/blat_output_sub \
  --outdir test_classpred_out \
  --prefix test \
  --bin 10000 \
  --plot

HiReNET classprediction \
  --blatsub test_comparemonomers/blat_output_sub \
  --outdir test_classpred_out_noplot \
  --prefix test \
  --bin 10000 
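Step 6 depends on R, and Step 5 calls BLAT, so a small preflight check can catch missing executables before a long run. The tool names Rscript and blat are assumptions about how these dependencies are installed; the R packages themselves are listed in the Dependencies section and are not checked here. `check_tools` is hypothetical:

```bash
# check_tools: hypothetical helper (not part of HiReNET).
# Reports each executable missing from PATH on stderr; returns 1 if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_tools HiReNET Rscript blat || echo "Install the missing tools before continuing."`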

Part 4: Annotate local HOR patterns in each bin through k-mer-based analysis.

For each merged bin, monomers are again compared all-to-all, and a network is rebuilt using the optimal similarity threshold. Monomers are then grouped by network community, and higher-order repeat (HOR) patterns are identified within each merged bin.
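As a toy illustration of what a HOR pattern looks like, suppose each monomer in a merged bin has been replaced by its community label, giving a string such as ABCABCABC. The smallest period of that string (here 3) corresponds to a three-monomer HOR unit. This is not HiReNET's k-mer-based method, only a sketch of the underlying idea; the single-letter labels and `min_period` are hypothetical:

```bash
# min_period: hypothetical helper (not part of HiReNET).
# Prints the smallest p such that label i equals label i - p for every position i.
# A trailing partial copy is allowed (ABCABCAB -> 3); an empty string prints 0.
min_period() {
  local s=$1 n=${#1} p i ok
  for (( p = 1; p <= n; p++ )); do
    ok=1
    for (( i = p; i < n; i++ )); do
      if [[ ${s:i:1} != "${s:i-p:1}" ]]; then ok=0; break; fi
    done
    if (( ok )); then echo "$p"; return 0; fi
  done
  echo 0
}
```

For example, `min_period ABCABCABC` reports a period of 3, while a string with no repetition, such as ABCD, reports its full length.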

Step 7: Rearrange monomers by class

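The file test_fin_bins_combined.txt, written to the Step 6 output directory, contains all information for each merged HOR bin. The --class option selects the bin class to process (HOR here).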

HiReNET rearrangemonomers \
  --bins test_classpred_out/test_fin_bins_combined.txt \
  --class HOR \
  --prefix test \
  --monomer-dir test_monomerout/test_monomers \
  --outdir test_rearrange_monomers_mergebin \
  --chr chr1

Step 8: Compare rearranged monomers

HiReNET comparemonomer \
  --bins-dir test_rearrange_monomers_mergebin/re_arrange_monomers \
  --outdir test_compare_rearrangemonomers

Step 9: Build HOR network for merged HOR bins

HiReNET networkHOR \
  --blatsub test_compare_rearrangemonomers/blat_output_sub \
  --bins test_classpred_out/test_fin_bins_combined.txt \
  --coor test_rearrange_monomers_mergebin/test_monomer_bed_inbin.txt \
  --outdir test_network_HOR_mergebin
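Part 4 describes the HOR annotation as k-mer-based. To illustrate the idea, counting the k-mers in a string of monomer labels highlights repeated units: in ABCABCABD, each 3-mer of the ABC cycle (ABC, BCA, CAB) occurs twice. The format of the label strings HiReNET writes (for example, in mergebin_string_outputs, used in Step 11) is not described on this page, so `count_kmers` (hypothetical) takes the string directly:

```bash
# count_kmers: hypothetical helper (not part of HiReNET).
# Prints "<k-mer><TAB><count>" for every k-mer in a label string,
# most frequent first, ties ordered alphabetically.
count_kmers() {
  local s=$1 k=$2
  awk -v s="$s" -v k="$k" 'BEGIN {
    if (k < 1) exit
    for (i = 1; i + k - 1 <= length(s); i++) c[substr(s, i, k)]++
    for (m in c) print m "\t" c[m]
  }' | LC_ALL=C sort -k2,2nr -k1,1
}
```

For example, `count_kmers ABCABCABD 3` lists ABC, BCA, and CAB twice each before ABD.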

Part 5: Find shared HOR patterns at the chromosome or genome level.

In each merged HOR bin, monomers with the same label are extracted to generate consensus HOR monomers. Consensus HOR monomers that share the same threshold are combined and compared all-to-all across similarity thresholds from 0.90 to 0.99. The monomers are then relabeled, and shared HOR patterns are identified for each threshold.
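How HiReNET computes a consensus HOR monomer is not described here, and a real workflow would first align the sequences (multiple sequence alignment). Assuming the sequences sharing a label are already aligned to equal length, a column-wise majority rule can be sketched as follows; `majority_consensus` is hypothetical:

```bash
# majority_consensus: hypothetical helper (not part of HiReNET).
# Reads pre-aligned sequences (one per line) on stdin and prints, for each column,
# the most frequent symbol; ties go to the symbol that first reaches the winning
# count when scanning sequences top to bottom. Gap characters are treated like bases.
majority_consensus() {
  awk '
    { seq[NR] = toupper($0); if (length($0) > len) len = length($0) }
    END {
      out = ""
      for (i = 1; i <= len; i++) {
        split("", cnt); best = ""; max = 0   # reset per-column counts
        for (r = 1; r <= NR; r++) {
          ch = substr(seq[r], i, 1)
          if (ch == "") continue
          if (++cnt[ch] > max) { max = cnt[ch]; best = ch }
        }
        out = out best
      }
      print out
    }'
}
```

For example, `printf 'ACGT\nACGA\nACGT\n' | majority_consensus` takes T in the last column because it appears in two of three sequences.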

Step 10: Arrange HOR monomers for consensus

HiReNET arrangeHORmonomer \
  --groupdir test_network_HOR_mergebin \
  --monomer-dir test_monomerout/test_monomers \
  --outdir test_network_mergebin_consensus

Step 11: Build consensus HORs per chromosome
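
For each chromosome, consensusHORmonomer builds the consensus HOR monomers, compareConsensus compares them all-to-all, and sharedHOR identifies the shared HOR patterns. The three sharedHOR commands differ only in the --plotv setting (V1, V2, or V3) and all write to test_shared_out_chr1. The final loop repeats the whole step for chr1 and chr3.
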
HiReNET consensusHORmonomer \
  --outdir test_network_mergebin_consensus \
  --threads 10 \
  --chroms chr1

HiReNET compareConsensus \
  --chr chr1 \
  --consensdir test_network_mergebin_consensus/all_recluster_consensus_monomer \
  --outdir test_compare_consensusHOR_chr1

HiReNET sharedHOR \
  --chr chr1 \
  --datadir test_compare_consensusHOR_chr1/blat_sub \
  --outdir test_shared_out_chr1 \
  --letter test_network_HOR_mergebin/mergebin_string_outputs \
  --plotv V2

HiReNET sharedHOR \
  --chr chr1 \
  --datadir test_compare_consensusHOR_chr1/blat_sub \
  --outdir test_shared_out_chr1 \
  --letter test_network_HOR_mergebin/mergebin_string_outputs \
  --plotv V1

HiReNET sharedHOR \
  --chr chr1 \
  --datadir test_compare_consensusHOR_chr1/blat_sub \
  --outdir test_shared_out_chr1 \
  --letter test_network_HOR_mergebin/mergebin_string_outputs \
  --plotv V3
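Because the three sharedHOR commands above differ only in --plotv, they can be generated in a loop. The sketch wraps them in a function; `run`, `shared_hor_all_plots`, and the DRY_RUN variable are hypothetical conveniences (not HiReNET options) that print the commands instead of running them when DRY_RUN=1:

```bash
# run: hypothetical wrapper. With DRY_RUN=1 it prints the shell-quoted command
# instead of executing it, which is handy for previewing long pipelines.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# shared_hor_all_plots: hypothetical helper that runs sharedHOR once per plot
# version (V1, V2, V3) with the same inputs and output directory as above,
# stopping at the first failing run.
shared_hor_all_plots() {
  local chr=$1 v
  for v in V1 V2 V3; do
    run HiReNET sharedHOR \
      --chr "$chr" \
      --datadir "test_compare_consensusHOR_${chr}/blat_sub" \
      --outdir "test_shared_out_${chr}" \
      --letter test_network_HOR_mergebin/mergebin_string_outputs \
      --plotv "$v" || return 1
  done
}
```

Preview the commands with `DRY_RUN=1 shared_hor_all_plots chr1`, then run them with `shared_hor_all_plots chr1`.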

# Use a loop to build consensus HORs for each chromosome
for chr in chr1 chr3; do
  HiReNET consensusHORmonomer \
    --outdir test_network_mergebin_consensus \
    --threads 10 \
    --chroms "$chr"

  HiReNET compareConsensus \
    --chr "$chr" \
    --consensdir test_network_mergebin_consensus/all_recluster_consensus_monomer \
    --outdir "test_compare_consensusHOR_${chr}"

  HiReNET sharedHOR \
    --chr "$chr" \
    --datadir "test_compare_consensusHOR_${chr}/blat_sub" \
    --outdir "test_shared_out_${chr}_2" \
    --letter test_network_HOR_mergebin/mergebin_string_outputs \
    --plotv V2
done