Genome-Wide Analysis of Copy Number Variations in Normal Population Identified by SNP Arrays
Abstract
Gene copy number change is an essential characteristic of many types of cancer. However, it is important to distinguish copy number variation (CNV) in the genomes of normal individuals from bona fide abnormal copy number changes of genes specific to cancers. Using Affymetrix 50K single nucleotide polymorphism (SNP) array data, we identified genome-wide copy number variations among 104 normal subjects from three ethnic groups that were used in the HapMap project. Our analysis revealed 155 CNV regions, of which 37% were gains and 63% were losses. About 21% (30) of the CNV regions are concordant with earlier reports. These 155 CNV regions are located on more than 100 cytobands across all 23 chromosomes. The CNVs range from 68 bp to 18 Mb in length, with a median length of 86 kb. Eight CNV regions were selected for validation by quantitative PCR. Analysis of genomic sequences within and adjacent to CNVs suggests that repetitive sequences such as long interspersed nuclear elements (LINEs) and long terminal repeats (LTRs) may play a role in the origin of CNVs by facilitating non-allelic homologous recombination. Thirty-two percent of the CNVs identified in this study are associated with segmental duplications. CNVs were not preferentially enriched in gene-encoding regions. Among the 364 genes that are completely encompassed by these 155 CNVs, genes related to olfactory sensing, response to chemical stimuli, and other physiological responses are significantly enriched. A statistical analysis of CNVs by ethnic group revealed distinct patterns in CNV location and gain-to-loss ratio. The CNVs reported here will help build a more comprehensive map of genomic variation in the human genome and facilitate the differentiation between copy number variation and somatic changes in cancers. The potential roles of certain repeat elements in CNV formation, as corroborated by other studies, shed light on the origin of CNVs and will improve our understanding of the mechanisms of genomic rearrangements in the human genome.
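
The summary statistics quoted above (gain and loss fractions, median CNV length, and the share of CNVs associated with segmental duplications) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the CNV calls, the segmental duplication intervals, and the helper names `overlaps` and `summarize` are hypothetical, and "associated with" is assumed here to mean simple coordinate overlap on the same chromosome.

```python
from statistics import median

# Hypothetical CNV calls: (chromosome, start, end, kind). Illustrative values only.
cnvs = [
    ("chr1", 1_000, 5_000, "loss"),
    ("chr1", 20_000, 120_000, "gain"),
    ("chr2", 500, 900, "loss"),
    ("chr3", 10_000, 60_000, "loss"),
]

# Hypothetical segmental duplication intervals: (chromosome, start, end).
seg_dups = [
    ("chr1", 4_000, 6_000),
    ("chr3", 70_000, 80_000),
]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def summarize(cnvs, seg_dups):
    """Compute gain/loss fractions, median length, and segmental duplication overlap."""
    n = len(cnvs)
    gains = sum(1 for _, _, _, kind in cnvs if kind == "gain")
    # Closed-interval length in base pairs.
    lengths = [end - start + 1 for _, start, end, _ in cnvs]
    # Assumption: a CNV is "associated" with a segmental duplication if it
    # overlaps one on the same chromosome.
    with_sd = sum(
        1
        for chrom, start, end, _ in cnvs
        if any(
            chrom == sd_chrom and overlaps(start, end, sd_start, sd_end)
            for sd_chrom, sd_start, sd_end in seg_dups
        )
    )
    return {
        "gain_fraction": gains / n,
        "loss_fraction": (n - gains) / n,
        "median_length": median(lengths),
        "seg_dup_fraction": with_sd / n,
    }


summary = summarize(cnvs, seg_dups)
```

With real data, the same function would be applied to the full list of CNV regions and to a published segmental duplication annotation; a stricter definition of association (for example, a minimum overlap fraction or a flanking window) would change only the `overlaps` test.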