Review
Nat Rev Genet. 2024 Feb;25(2):123-141.
doi: 10.1038/s41576-023-00638-1. Epub 2023 Sep 6.

Computational methods for analysing multiscale 3D genome organization

Yang Zhang et al. Nat Rev Genet. 2024 Feb.

Abstract

Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1.
Overview of multiscale 3D genome features and assays. a. Schematic view of multiscale 3D chromatin organization: DNA is packaged into chromosome territories, where it is intertwined with nuclear bodies such as nuclear speckles. Packaging is achieved through progressively finer-resolution structural motifs such as compartments (megabase scale), chromatin domains such as TADs (100 kb to a few Mb) and chromatin loops (10 kb to 100 kb apart, mediated by architectural proteins such as cohesin and CTCF). b. Experimental methods such as Hi-C can be used to capture chromatin contact frequency between pairwise loci or across multiple loci. c. Ligation-free methods (such as DNA Adenine Methyltransferase Identification (DamID), Tyramide Signal Amplification sequencing (TSA-seq), and genomic loci positioning by sequencing (GPSeq)) measure distance and contact frequency relative to nuclear bodies or positioning within the nucleus. d. Single-cell Hi-C can be used to detect variation in chromatin interactions among cells in complex tissues, for example, through adding multiplexed barcodes to individual cells. e. Multiplexed DNA Fluorescence In Situ Hybridization (DNA-FISH) provides direct spatial location of DNA loci and traces chromatin conformations in the nucleus.
Figure 2.
Computational methods for identifying 3D genome features from Hi-C data. a. Hi-C reads are first aligned to the reference genome and filtered to remove invalid read pairs. The genome is then divided into bins of equal size, and valid interactions are assigned to these bins to generate the raw contact frequency map, which is further normalized and assessed by various quality control metrics. The quality of the Hi-C contact frequency map can be evaluated through a distance-dependent contact frequency decay curve or stratum-adjusted correlation based on distance-based stratification. b. A/B compartments are identified from the 2D contact frequency maps. Applying Principal Component Analysis (PCA) to the Pearson correlation matrix calculated from the observed over expected (O/E) matrix allows genomic bins to be assigned to A/B compartments, which can be further separated into sub-compartments. c. Topologically Associating Domains (TADs) are domain structures along the diagonal of the Hi-C contact frequency map. Some TADs have subTADs nested within a meta-TAD, and they may show partially overlapping structures (left). The schematic chromatin structure corresponding to the contact frequency map is shown as a cartoon (middle). TADs and subTADs have distinct characteristics (right). d. Loops and stripes are fine-scale structures on contact frequency maps. Two different approaches are used to identify significant chromatin interactions (left). The first approach selects strong chromatin interactions by comparing contact frequency with neighboring bins on the contact frequency map (center). The second approach fits a global distribution between contact frequency and 1D distance and selects significant chromatin interactions as outliers based on the fitted model (right).
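The compartment-calling step described in panel b of Figure 2 can be illustrated with a minimal sketch. It assumes a dense, symmetric, already normalized contact matrix for a single chromosome held in a NumPy array; the function names `observed_over_expected` and `compartment_pc1` are hypothetical and are not taken from any tool discussed in the review. The expected value at each genomic distance is approximated here by the mean of the corresponding diagonal, which is one common choice rather than the only one.

```python
import numpy as np


def observed_over_expected(contacts):
    """Divide every diagonal of a symmetric contact matrix by its mean.

    The mean of diagonal d is used as the expected contact frequency at
    genomic distance d (an assumption made for this sketch).
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        diag = np.diagonal(contacts, d)
        mean = diag.mean()
        if mean > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / mean
            oe[idx + d, idx] = diag / mean
    return oe


def compartment_pc1(contacts):
    """Return the first principal component score of each genomic bin.

    Steps: O/E matrix -> Pearson correlation between rows -> PCA via SVD.
    The sign of the component is arbitrary; in practice it is usually
    oriented with an external track such as GC content or gene density.
    """
    oe = observed_over_expected(contacts)
    corr = np.nan_to_num(np.corrcoef(oe))
    centered = corr - corr.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]
```

Bins with positive and negative scores form the two compartment classes. Which class corresponds to A and which to B cannot be decided from the contact map alone, so the sketch leaves the orientation step out.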
Figure 3.
Machine learning-based approaches for predicting 3D genome features. a. Significant chromatin interactions, such as CTCF-CTCF loops and enhancer-promoter interactions, can be predicted using machine learning methods that take both sequence and epigenomic signals at loop anchors and features between anchors as input. A supervised method is trained on a subset of true interactions and evaluated on unseen test data. TF, transcription factor. b. Deep neural network models predict genome-wide contact frequency between loci using large stretches of DNA sequence context as input. Some methods also pre-train the model by predicting 1D epigenetic signals from DNA sequence and then transferring the learned DNA feature representations to a second model to predict 2D contact frequency maps.
Figure 4.
Incorporation of mechanisms of genome folding in modelling approaches. a. The process of loop extrusion is shown, whereby a cohesin molecule attaches to the chromatin fiber and starts extruding it into a loop; the process stops when cohesin falls off or encounters another cohesin or a bound CTCF protein. Loading and unloading factors facilitate the process. Loop extrusion accounts for both loops and Topologically Associating Domains (TADs) observed in Hi-C contact frequency maps. b. The mechanism underlying phase separation is shown. Chromatin segments with different affinities (represented by different colors) micro-phase separate within the nucleus owing to attractive interactions between regions of the same affinity class, spatial restraints from the polymer chain, and competition with other interactions. This mechanism accounts for chromatin compartmentalization as observed in the characteristic Hi-C contact frequency map checkerboard pattern.
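The loop extrusion process in panel a of Figure 4 can be sketched as a toy one-dimensional lattice simulation. This is a deliberately simplified illustration, not one of the polymer models covered by the review: it assumes a single cohesin at a time, treats every CTCF site as an absolute barrier regardless of orientation, and ignores collisions between cohesins. The names `extrude_once` and `simulate_ensemble` and all parameters are hypothetical.

```python
import random
from collections import Counter


def extrude_once(n_sites, ctcf_sites, load_site, n_steps, unload_prob=0.0, rng=None):
    """Extrude one loop on a 1D lattice and return its (left, right) anchors.

    Both cohesin legs start at load_site and move outward one site per step.
    A leg stops once it sits on a CTCF site or at the end of the lattice.
    At every step the cohesin falls off with probability unload_prob, in
    which case None is returned.
    """
    if rng is None:
        rng = random.Random(0)
    barriers = set(ctcf_sites)
    left = right = load_site
    for _ in range(n_steps):
        if rng.random() < unload_prob:
            return None  # cohesin unloaded before the end of the run
        if left > 0 and left not in barriers:
            left -= 1
        if right < n_sites - 1 and right not in barriers:
            right += 1
    return left, right


def simulate_ensemble(n_sites, ctcf_sites, n_cohesins, n_steps, unload_prob, seed=0):
    """Load cohesins at random sites and count the loops that remain."""
    rng = random.Random(seed)
    loops = Counter()
    for _ in range(n_cohesins):
        load_site = rng.randrange(n_sites)
        result = extrude_once(n_sites, ctcf_sites, load_site, n_steps, unload_prob, rng)
        if result is not None:
            loops[result] += 1
    return loops
```

With no unloading and enough steps, every cohesin loaded between two CTCF sites ends with its anchors on those sites, which is the toy analogue of a CTCF-CTCF loop and of a domain whose interior is brought into frequent contact.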
Figure 5.
Data-driven genome modelling methods. a. Spatial information from data generated by a variety of 3D genome mapping methods, such as Hi-C, Tyramide Signal Amplification sequencing (TSA-seq), and imaging data, is used to simulate 3D genome structures by minimizing the deviation of the model prediction from the experimental input. b. The rationale behind the resampling (left) and deconvolution (right) modelling methods is shown using ensemble Hi-C data as experimental input. In the resampling approach, the same set of contacts (shown as coloured dumbbells) is expressed in all the sampled structures. By contrast, in the deconvolution method, different batches of contacts are allocated into different structures. c. The simulated structures are used to compute 3D genome features that comprehensively characterize the local and global nuclear microenvironment of all chromatin loci, such as the average radial position and distance to speckle of each chromatin locus and their variabilities across the ensemble of single-cell models. SPRITE, Split-Pool Recognition of Interactions by Tag Extension; GAM, Genome Architecture Mapping; DamID, DNA Adenine Methyltransferase Identification; FISH, Fluorescence In Situ Hybridization; HIPMap, High-throughput Imaging Position Mapping.
Figure 6.
A typical workflow for processing and analyzing single-cell Hi-C (scHi-C) data. a. scHi-C data provides insights into cell-to-cell variability and temporal cellular processes by separating sequencing reads into individual cells based on cellular barcodes. b. Computational methods are used to transform the contact frequency map into lower-dimensional space (embeddings), impute missing values, and enhance data quality. Hypergraph representation learning (middle) can perform embedding and data imputation jointly. c. Using the embeddings, downstream analyses can reveal cell types (through clustering), multiscale 3D genome features, heterogeneity and variability of 3D genome organization, and association with other cellular processes such as DNA methylation. TAD, topologically associating domain.
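The embedding step in panel b of Figure 6 can be illustrated with a basic linear baseline. The sketch below is not the hypergraph representation learning approach mentioned in the caption; it swaps in plain PCA on flattened, depth-normalized contact maps, and it assumes that all cells share the same binning and fit in memory as one dense array. The function name `embed_cells` is hypothetical.

```python
import numpy as np


def embed_cells(contact_maps, n_components=2):
    """Embed single-cell contact maps with PCA.

    contact_maps: array of shape (n_cells, n_bins, n_bins).
    Returns an array of shape (n_cells, n_components).
    """
    maps = np.asarray(contact_maps, dtype=float)
    n_bins = maps.shape[1]
    # Use the upper triangle (without the diagonal) as the feature vector.
    rows, cols = np.triu_indices(n_bins, k=1)
    features = maps[:, rows, cols]
    # Depth normalization: scale each cell so that its contacts sum to 1.
    totals = features.sum(axis=1, keepdims=True)
    features = features / np.where(totals > 0, totals, 1.0)
    # PCA via SVD of the column-centered feature matrix.
    features = features - features.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

The resulting coordinates can be passed to any clustering method to group cells, as in panel c. Real scHi-C maps are far sparser than this toy setting, which is why the imputation and enhancement steps named in the caption are usually needed before or during embedding.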
