BMC Genomics. 2012 Nov 4:13:591.
doi: 10.1186/1471-2164-13-591.

Copynumber: Efficient algorithms for single- and multi-track copy number segmentation

Gro Nilsen et al. BMC Genomics. 2012.

Abstract

Background: Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, there is a demand for highly computationally efficient segmentation algorithms, due to the emergence of very high density scans of copy number.

Results: A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented.

Conclusions: The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
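The penalized least squares principle named in the Results can be illustrated with a small dynamic program. The following is a minimal sketch in Python, not the package's R implementation and not the vector-based algorithm the authors propose: the function name `pcf_sketch` is hypothetical, and the penalty is assumed to be charged once per segment, which differs from a per-breakpoint penalty only by a constant and therefore selects the same segmentation.

```python
from typing import List, Tuple


def pcf_sketch(y: List[float], gamma: float) -> Tuple[List[int], List[float]]:
    """Hypothetical illustration: piecewise constant fit minimizing
    (sum of squared residuals) + gamma * (number of segments).

    Returns the start index of each segment and the fitted value per probe.
    """
    n = len(y)
    if n == 0:
        return [], []

    # Prefix sums of y and y^2 give any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(j: int, k: int) -> float:
        # Squared error of y[j:k] around its own mean.
        total = s1[k] - s1[j]
        return (s2[k] - s2[j]) - total * total / (k - j)

    # best[k] is the optimal penalized cost for the first k probes;
    # prev[k] is the start index of the last segment in that solution.
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for k in range(1, n + 1):
        for j in range(k):
            cost = best[j] + sse(j, k) + gamma
            if cost < best[k]:
                best[k], prev[k] = cost, j

    # Walk back through prev to recover the segment start indices.
    starts: List[int] = []
    k = n
    while k > 0:
        starts.append(prev[k])
        k = prev[k]
    starts.reverse()

    # Each probe is fitted with the mean of its segment.
    fitted: List[float] = []
    bounds = starts + [n]
    for a, b in zip(bounds, bounds[1:]):
        fitted.extend([(s1[b] - s1[a]) / (b - a)] * (b - a))
    return starts, fitted
```

For a fixed set of breakpoints the segment means are the least squares estimates, which is the sense in which the fit is optimal conditional on the number of breakpoints. This exhaustive version takes quadratic time in the number of probes, so it only illustrates the objective and does not reflect the efficiency the paper reports.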

Figures

Figure 1
An overview of the copynumber package. Depending on the aim of the analysis, the input will be copy number data and possibly allele frequencies from one or more experiments. Preprocessing tools are available for outlier handling and missing data imputation, and three different methods handle single sample, multi-sample and allele-specific segmentation. Several options are also available for the graphical visualization of data and segmentation results.
Figure 2
The effect of changing the penalty γ in PCF. The plot in the upper left corner shows the copy number data for a selected chromosome (in this case, chromosome 17), while the lower right plot shows the number of segments found by PCF as a function of γ. The remaining plots show the segmentation curves for ten different values of γ. The plot was created with the function plotGamma in copynumber.
Figure 3
Aberration calling accuracy. The ROC curves show the sensitivity and specificity for a sequence of thresholds, calculated by comparing aberration calls to the classifications made in an MLPA analysis of the same data. In panel (a), classifications were made based on PCF segmentations found for a wide range of γ-values. Notably, the classification accuracy is not affected much by the choice of γ, except to some extent for very low values. Panel (b) shows that aberration calls based on multi-sample PCF segmentations are about as accurate as those based on single sample PCF. In panel (c), ROC curves are shown for calls based on the segmentations found by PCF and CBS, on a running median with window size 50, and on raw data. In terms of aberration calling accuracy, PCF and CBS give nearly the same results, while the running median gives slightly less accurate classifications. Using only raw data leads to much poorer accuracy. Note the range on the ordinate axis.
Figure 4
Comparison of results from single sample and multi-sample PCF. In single sample PCF, γ=40 was used, while in multi-sample PCF, γ=120 was used to limit the number of segments. Note that the estimated aberration patterns are quite similar, indicating that the multi-sample PCF estimates (panel b) should be well suited as variables in statistical analyses. On a more detailed level there are differences; for example, longer segments in the single sample analysis (panel a) are divided into subsegments with slightly different estimates in the multi-sample analysis. The plot was created with the function plotHeatmap in copynumber.
Figure 5
Analysis of disseminated tumor cells (DTCs) with multi-sample PCF. The top panel shows the primary tumor and the three panels below show single cells morphologically classified as DTCs (all for chromosome 2). High noise levels make separate analyses of each DTC difficult; co-analyzing multiple DTCs, possibly together with a primary tumor, thus facilitates an evaluation of the degree of correspondence between the aberration patterns. In the present case, two DTCs seem to have aberration patterns similar to the primary tumor, while the last cell has an essentially flat (balanced) pattern and is probably a hematopoietic cell misclassified as a DTC. The plot was created with the function plotChrom in copynumber.
Figure 6
Whole-genome view of aberrations in the follicular lymphoma data. The plot is based on all 100 biopsies, and aberrations were defined as copy number estimates above 0.05 (for gains) or below -0.05 (for losses). Aberration frequencies are shown in red for gains and green for losses. Correlations between the copy number activity at different genomic locations are shown as arcs (blue for positive correlations and yellow for negative correlations), using a correlation threshold of ±0.68 to determine which correlations to display. Aberration frequencies are based on the segmentation found with single sample PCF (with γ=16 and kmin=3), while correlations are based on the segmentation found with multi-sample PCF (with γ=6). The plot was created with the function plotCircle in copynumber.
Figure 7
Allele-specific PCF analysis of SNP array data. Results are shown for a breast carcinoma sample in the MicMa cohort for chromosome 1 (panel a) and chromosome 17 (panel b). The points in the upper two panels show observed total copy numbers (logR) while the points in the lower two panels show observed B allele frequencies (BAF). The red curves show the result of applying the allele-specific PCF segmentation method to the data. The plot was created with the function plotAllele in copynumber.
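The aberration definition in the Figure 6 caption (gains above 0.05, losses below -0.05) can be turned into per-position aberration frequencies with a few lines of code. This is a minimal sketch under stated assumptions: the function name `aberration_frequencies` and the input layout (one list of segmented copy number estimates per sample, all aligned to the same genomic positions) are hypothetical and are not taken from the copynumber package.

```python
from typing import List, Tuple


def aberration_frequencies(
    estimates: List[List[float]],
    gain: float = 0.05,
    loss: float = -0.05,
) -> Tuple[List[float], List[float]]:
    """Hypothetical illustration: fraction of samples called as gain or loss.

    estimates[s][p] is the segmented copy number estimate for sample s at
    position p. A value strictly above `gain` counts as a gain and a value
    strictly below `loss` counts as a loss; anything else is unaltered.
    """
    n_samples = len(estimates)
    if n_samples == 0:
        return [], []
    n_pos = len(estimates[0])

    gain_freq = [
        sum(row[p] > gain for row in estimates) / n_samples
        for p in range(n_pos)
    ]
    loss_freq = [
        sum(row[p] < loss for row in estimates) / n_samples
        for p in range(n_pos)
    ]
    return gain_freq, loss_freq
```

Applying the thresholds to segment estimates rather than to raw probe values matches the caption, which bases the frequencies on the single sample PCF segmentation.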
