Genome-Wide Analysis of Copy Number Variations in Normal Population Identified by SNP Arrays

Jian Wang1, 2, Tsz-Kwong Man1, 3, 4, Kwong Kwok Wong1, Pulivarthi H. Rao1, 3, 4, Hon-Chiu Eastwood Leung1, 3, 4, 5, Rudy Guerra6, Ching C. Lau*, 1, 2, 3, 4
1 Texas Children’s Cancer Center and Hematology Service, Texas Children's Hospital
2 Program in Structural and Computational Biology and Molecular Biophysics
3 Dan L. Duncan Cancer Center
4 Department of Pediatrics
5 Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, 77030, USA
6 Department of Statistics, Rice University, Houston, TX 77004, USA
Present address: Department of Gynecologic Oncology, M.D. Anderson Cancer Center, Houston, Texas 77030, USA

* Address correspondence to this author at the Texas Children’s Hospital, 6621 Fannin St. MC 3-3320, Houston, TX 77030, USA; Tel: (832) 824-4543; E-mail:


Gene copy number change is an essential characteristic of many types of cancer. However, it is important to distinguish copy number variation (CNV) in the genomes of normal individuals from bona fide abnormal gene copy number changes specific to cancers. Using Affymetrix 50K single nucleotide polymorphism (SNP) array data, we identified genome-wide copy number variations among 104 normal subjects from three ethnic groups that were used in the HapMap project. Our analysis revealed 155 CNV regions, of which 37% were gains and 63% were losses. About 21% (30) of the CNV regions are concordant with earlier reports. These 155 CNV regions are located on more than 100 cytobands across all 23 chromosomes. The CNVs range from 68 bp to 18 Mb in length, with a median length of 86 kb. Eight CNV regions were selected for validation by quantitative PCR. Analysis of genomic sequences within and adjacent to CNVs suggests that repetitive sequences such as long interspersed nuclear elements (LINEs) and long terminal repeats (LTRs) may play a role in the origin of CNVs by facilitating non-allelic homologous recombination. Thirty-two percent of the CNVs identified in this study are associated with segmental duplications. CNVs were not preferentially enriched in gene-encoding regions. Among the 364 genes that are completely encompassed by these 155 CNVs, genes related to olfactory sensory perception, response to chemical stimulus, and other physiological responses are significantly enriched. A statistical analysis of CNVs by ethnic group revealed distinct patterns in CNV location and gain-to-loss ratio. The CNVs reported here will help build a more comprehensive map of genomic variations in the human genome and facilitate the differentiation between copy number variation and somatic changes in cancers.
The potential roles of certain repeat elements in CNV formation, as corroborated by other studies, shed light on the origin of CNVs and will improve our understanding of the mechanisms of genomic rearrangements in the human genome.
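
The summary statistics quoted above (gain and loss fractions, median CNV length, and the share of CNVs associated with segmental duplications) can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the study: the region coordinates, the `overlaps` and `summarize` helpers, and the use of simple interval overlap as the criterion for "associated with" are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical CNV regions: (chromosome, start, end, type).
# Coordinates are illustrative only, not data from the study.
cnv_regions = [
    ("chr1", 1_000, 51_000, "loss"),
    ("chr1", 200_000, 286_000, "gain"),
    ("chr2", 5_000, 405_000, "loss"),
]

# Hypothetical segmental duplication intervals: (chromosome, start, end).
seg_dups = [
    ("chr1", 40_000, 60_000),
    ("chr3", 10_000, 20_000),
]


def overlaps(a, b):
    """Return True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def summarize(regions, annotations):
    """Summarize CNV calls: gain/loss fractions, median length, annotation overlap."""
    n = len(regions)
    gains = sum(1 for r in regions if r[3] == "gain")
    lengths = [r[2] - r[1] for r in regions]
    # Assumption: a CNV counts as "associated" if it overlaps any annotated interval.
    n_overlapping = sum(
        1 for r in regions if any(overlaps(r[:3], a) for a in annotations)
    )
    return {
        "n_regions": n,
        "gain_fraction": gains / n,
        "loss_fraction": (n - gains) / n,
        "median_length": median(lengths),
        "n_overlapping": n_overlapping,
        "overlap_fraction": n_overlapping / n,
    }


summary = summarize(cnv_regions, seg_dups)
print(summary)
```

With real calls, the same summary would be computed over all CNV regions and a genome-wide segmental duplication annotation. A sorted-interval or interval-tree search would be preferable to the quadratic scan used here for clarity.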

Keywords: Copy number variation, SNP arrays, genomic variation.