Determination of the SNP-SNP Interaction between Breast Cancer Related Genes to Analyze the Disease Susceptibility




Abstract—Investigating single nucleotide polymorphism (SNP)-SNP interaction models can facilitate the analysis of disease susceptibility. Such a model explains the association between genotypes and disease risk in a case-control study. Many statistical methods, such as the odds ratio (OR), the chi-square test, and the error rate, are widely applied to identify statistically significant models. However, huge numbers of data sets have been found to limit the ability of these statistical methods to identify significant models. In this study, we propose a novel method, complementary-logic particle swarm optimization (CLPSO), to improve the efficiency of identifying significant models in case-control studies. The complementary logic is implemented to improve the search ability of PSO and to identify better SNP-SNP interaction models. Twenty-three SNPs in six important breast cancer genes, together with a large number of simulated data sets, were selected as the test data. PSO and CLPSO were applied to identify two-way to five-way SNP-SNP interactions. The OR was used to evaluate the breast cancer risk of each identified SNP-SNP interaction model; compared with the corresponding non-interaction model, an OR greater than 1 indicates that the model is associated with increased risk in cases relative to controls. The results showed that CLPSO identified significant models for specific two-way to five-way SNP-SNP interactions (OR: 1.153-1.391; 95% confidence interval (CI): 1.05-1.79; p-value: 0.003-0.01). These models suggest that the genes ESR1, PGR, and SHBG may play an important role in the interactive effects on breast cancer. In addition, we compared the search abilities of PSO and CLPSO in identifying significant models. The results revealed that CLPSO identified better models than PSO in terms of the difference between cases and controls, suggesting that CLPSO can be used to identify better SNP-SNP interaction models.

Index Terms—Single nucleotide polymorphism (SNP), particle swarm optimization (PSO), breast cancer.

I. INTRODUCTION

SNPs are common biomarkers in genomes and have been widely used in association analyses of diseases [1], cancers [2], and pharmacogenomics [3]. These analyses reported that SNPs have specific associations with the risk of certain diseases. However, most SNP analyses have focused on single SNPs. SNPs with low or marginal significance may be excluded, even though they may be significantly associated with disease when combined in a SNP-SNP interaction model. Thus, identifying appropriate interaction models is an important issue in SNP analysis.

Manuscript received April 1, 2014; revised May 30, 2014. This work was supported in part by the National Science Council in Taiwan (NSC101-2320-B-037-049, NSC101-2622-E-151-027-CC3, NSC102-2221-E-214-039 and NSC102-2221-E-151-024-MY3).

Mei-Lee Hwang and Li-Yeh Chuang are with the Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan (e-mail: mlhwang@isu.edu.tw, chuang@isu.edu.tw).

Yu-Da Lin and Cheng-Hong Yang are with the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan (e-mail: e0955767257@yahoo.com.tw, chyang@cc.kuas.edu.tw).

A SNP-SNP interaction model includes the SNPs and their corresponding genotypes (AA, Aa, and aa). Therefore, the number of possible models increases rapidly with the number of SNPs and their corresponding genotypes. This huge number of potential SNP-SNP interaction models makes it difficult for statistical methods to identify the significant ones. Machine learning methods, such as particle swarm optimization (PSO) [4]-[6] and the genetic algorithm (GA) [7], [8], have therefore been applied to help statistical methods identify appropriate interaction models. PSO and GA are randomized search and optimization techniques whose working principles derive from simulations of organism behavior. They provide fast identification in high-dimensional problems, e.g., biomarker selection [9], and can thus be employed to identify an optimal SNP combination among a huge number of possible combinations. Although they overcome the excessive computational time needed to identify significant models, their search ability remains a challenge. Thus, an improved method is required to explore SNP-SNP interactions.
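To make this growth concrete, the short Python sketch below counts k-way models under a simple assumption of ours (not a definition stated in this excerpt): a model selects k distinct SNPs out of the 23 studied and fixes one of the three genotypes (AA, Aa, aa) for each, giving C(23, k) × 3^k candidate models. The function name `count_models` is illustrative only.

```python
from math import comb


def count_models(n_snps: int, k: int, n_genotypes: int = 3) -> int:
    """Number of k-way SNP-SNP interaction models.

    Assumption: a model picks k distinct SNPs and one genotype
    (AA, Aa, or aa) for each selected SNP.
    """
    return comb(n_snps, k) * n_genotypes ** k


for k in range(2, 6):
    print(f"{k}-way: {count_models(23, k):,}")
# 2-way: 2,277
# 3-way: 47,817
# 4-way: 717,255
# 5-way: 8,176,707
```

Under this assumption, the five-way search space alone holds more than eight million candidate models, which illustrates why exhaustive statistical testing becomes costly and why randomized search methods such as PSO and GA are attractive.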

Here, we propose a complementary-logic PSO, named CLPSO, to identify significant models associated with SNP-SNP interactions. In this study, we hypothesize that interactions between gene polymorphisms may have a synergistic or non-additive effect on the pathogenesis of a disease, and that such interactions may explain differences in disease risk between cases and controls. Twenty-three SNPs in six breast cancer related genes (COMT, CYP19A1, ESR1, PGR, SHBG, and STS) were selected to simulate a large number of data sets. The results indicate that CLPSO can identify appropriate interaction models in breast cancer from the large number of simulated data sets, and they provide important information for determining the SNP-SNP interaction model with the maximal difference between cases and controls.

II. METHODS

A. Problem Definition

For the SNP-SNP interaction problem, a vector such as X = [x1, x2,


Mei-Lee Hwang, Yu-Da Lin, Li-Yeh Chuang, and Cheng-Hong Yang

International Journal of Machine Learning and Computing, Vol. 4, No. 5, October 2014

DOI: 10.7763/IJMLC.2014.V4.456


B. Analysis of Breast Cancer Susceptibility from 23 SNPs in Six Genes

The odds ratios and their 95% CIs for all SNPs of the six genes (COMT, CYP19A1, ESR1, PGR, SHBG, and STS) show that 13 SNPs, including COMT-rs6269, ESR1-rs3020314, ESR1-rs2175898, ESR1-rs1709182, ESR1-rs9478249, ESR1-rs1514348, ESR1-rs532010, PGR-rs660149, PGR-rs500760, SHBG-rs858518, SHBG-rs272428, SHBG-rs858524, and STS-rs2017591, display statistically significant ORs (p < 0.05) for breast cancer; the OR values range from 0.846 to 1.268.
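An OR and its 95% CI can be computed from a 2×2 table of genotype carriers versus non-carriers in cases and controls. The sketch below uses Woolf's logit method for the CI; this excerpt does not state which CI method was used, so treat the method as an assumption. The function name `odds_ratio_ci` and the counts in the example are hypothetical, not data from this study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with an approximate 95% CI (Woolf's logit method).

    Assumed 2x2 layout:
        a = cases with the genotype,    b = controls with the genotype
        c = cases without the genotype, d = controls without the genotype
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    or_value = (a * d) / (b * c)
    # Standard error of log(OR) under the logit approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(60, 40, 40, 60))
```

A CI whose lower bound exceeds 1 corresponds to a risk association at roughly the 5% level, which is how the CIs in the following subsections can be read.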

C. Analysis of SNP-SNP Interaction Model with Difference between the Cases and Controls

Table II shows the SNP-SNP interaction models identified for the 2-way to 5-way interactions. The left side of Table II lists the combinations of two to five SNPs with their corresponding genotypes; for example, the 2-way model indicates that rs3020314-Aa is associated with rs660149-AA. The difference column gives the difference between the numbers of cases and controls. For example, the 3-way model of CLPSO shows a difference of 93 between cases and controls (602 vs. 509). The other models of PSO and CLPSO are read in the same way. In CLPSO, the difference between cases and controls decreases from 126 for the two-way model to 41 for the five-way model. In PSO, the difference decreases from 126 for the two-way model to 15 for the five-way model. A larger difference between cases and controls represents a better model.
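The difference value can be sketched as a simple count: subjects who carry every (SNP, genotype) pair of a model are tallied separately among cases and controls, and the controls are subtracted from the cases (as in 602 vs. 509, giving 93). The encoding below (genotypes coded 0 = AA, 1 = Aa, 2 = aa; a model as a list of `(snp_index, genotype)` pairs) and the helper names are assumptions for illustration, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: each subject is a list of genotype codes per SNP
# (0 = AA, 1 = Aa, 2 = aa); a model is a list of (snp_index, genotype) pairs.
Model = List[Tuple[int, int]]


def carries(subject: Sequence[int], model: Model) -> bool:
    """True if the subject has every genotype specified by the model."""
    return all(subject[snp] == genotype for snp, genotype in model)


def case_control_difference(cases: Sequence[Sequence[int]],
                            controls: Sequence[Sequence[int]],
                            model: Model) -> Tuple[int, int, int]:
    """Return (carrier cases, carrier controls, cases minus controls)."""
    n_cases = sum(carries(s, model) for s in cases)
    n_controls = sum(carries(s, model) for s in controls)
    return n_cases, n_controls, n_cases - n_controls


cases = [[1, 0, 2], [1, 0, 0], [0, 0, 2], [1, 0, 1]]
controls = [[1, 2, 2], [1, 0, 0], [2, 1, 0]]
model = [(0, 1), (1, 0)]  # SNP 0 = Aa together with SNP 1 = AA
print(case_control_difference(cases, controls, model))  # (3, 1, 2)
```

A 2-way model such as rs3020314-Aa with rs660149-AA would be two such pairs; a search method that maximizes the third returned value favors the "better" models described above.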

D. Estimation of the Best Interaction Model Generated by PSO and CLPSO Using OR and 95% CI in Breast Cancer

Table II shows the best interaction models for the 2-way to 5-way SNP-SNP interactions. For every model, the number of cases exceeds the number of controls, which means that all models show a risk association with breast cancer. The right side of Table II shows the effects evaluated using the OR, 95% CI, and p-value. In PSO, the OR values of the 2-way to 5-way SNP-SNP interaction models range from 1.109 to 1.459, and the 95% CIs of the ORs range from 0.89 to 2.28. Only the 2-way SNP-SNP interaction model is statistically significant (p < 0.05). In contrast, in CLPSO, the OR values of the 2-way to 5-way models range from 1.153 to 1.391, and the 95% CIs range from 1.05 to 1.79. All of the CLPSO SNP-SNP interaction models (2-way to 5-way) show significant OR values (p < 0.05).
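The p-values reported alongside the ORs can be sketched with a chi-square test on the same kind of 2×2 table. This excerpt does not state the exact test behind Table II (the abstract only lists the chi-square test among common methods), so the Pearson statistic without continuity correction used here is an assumption, as are the function name and the example counts. For 1 degree of freedom, the upper-tail probability is erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value, 1 df.

    Same assumed layout as an OR table: a/b = carrier cases/controls,
    c/d = non-carrier cases/controls.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = chi_square_2x2(60, 40, 40, 60)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Wald CIs and the Pearson test usually agree on significance for moderate counts but are not guaranteed to match exactly near the 0.05 boundary.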

E. Comparison of PSO and CLPSO for the Interaction Model of Breast Cancer

The results show that the SNP-SNP interaction models identified by CLPSO have better p-values than those identified by PSO. PSO appears to provide a better OR value for the four-way SNP-SNP interaction model; however, its p-value shows that this model is not statistically significant for breast cancer. The computational complexity of CLPSO is evaluated in terms of objective function computations. If n generations are run in a test, the computational complexity of PSO is O(n). The complementary logic only adds an update function, i.e., equation 10. Thus, PSO and CLPSO have the same computational complexity, but CLPSO is superior to PSO in identifying the best SNP-SNP interaction model.
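The complexity argument can be illustrated with a plain global-best PSO that counts objective evaluations. This is a generic PSO sketch, not the authors' CLPSO: the complementary-logic update (equation 10) is not implemented, and the particle encoding (k continuous SNP slots plus k genotype slots, rounded to integers), the penalty for repeated SNPs, the inertia and acceleration constants, and all function names are assumptions. The fitness is the case-control difference described in subsection C.

```python
import random
from typing import List, Sequence, Tuple

Model = List[Tuple[int, int]]  # (snp_index, genotype) pairs; genotype in {0, 1, 2}


def decode(pos: Sequence[float], n_snps: int, k: int) -> Model:
    """Round a continuous position (k SNP slots, then k genotype slots) to a model."""
    snps = [min(n_snps - 1, max(0, round(x))) for x in pos[:k]]
    genotypes = [min(2, max(0, round(x))) for x in pos[k:]]
    return list(zip(snps, genotypes))


def fitness(model: Model, cases, controls) -> int:
    """Case-control difference; models that repeat a SNP are heavily penalized."""
    if len({snp for snp, _ in model}) < len(model):
        return -10**9

    def carries(subject) -> bool:
        return all(subject[snp] == g for snp, g in model)

    return sum(map(carries, cases)) - sum(map(carries, controls))


def pso(cases, controls, n_snps: int, k: int, n_particles: int = 20,
        n_gen: int = 50, w: float = 0.729, c1: float = 1.49445,
        c2: float = 1.49445, seed: int = 0):
    """Plain global-best PSO; returns (best model, best fitness, objective evaluations)."""
    rng = random.Random(seed)
    upper = [n_snps - 1] * k + [2] * k
    dim = 2 * k
    pos = [[rng.uniform(0, upper[j]) for j in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    evals = 0

    def evaluate(p) -> int:
        nonlocal evals
        evals += 1
        return fitness(decode(p, n_snps, k), cases, controls)

    pbest = [p[:] for p in pos]
    pbest_fit = [evaluate(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_gen):  # one objective evaluation per particle per generation
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] = min(float(upper[j]), max(0.0, pos[i][j] + vel[i][j]))
            f = evaluate(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest, n_snps, k), gbest_fit, evals


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical random genotypes for 23 SNPs, for illustration only.
    cases = [[data_rng.randrange(3) for _ in range(23)] for _ in range(200)]
    controls = [[data_rng.randrange(3) for _ in range(23)] for _ in range(200)]
    model, best, evals = pso(cases, controls, n_snps=23, k=3)
    print(model, best, evals)  # evals == 20 * (50 + 1) == 1020
```

With the swarm size fixed, the evaluation count is n_particles × (n_gen + 1), i.e., linear in the number of generations n, which matches the O(n) statement. Per the text, CLPSO only adds an update function on top of such a loop, so the order of its complexity would be the same.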

IV. CONCLUSION

In this study, a novel method, CLPSO, is proposed to identify statistically significant SNP-SNP interaction models among breast cancer related genes. These models can be used to analyze disease susceptibility and provide information on the SNPs located in these genes and their associated pathways. We tested PSO and CLPSO on a large number of simulated data sets; the results indicate that CLPSO can robustly search for statistically significant models, as measured by the difference between cases and controls, among the huge number of SNPs involved in real data sets.

REFERENCES

[1] I. C. Gray, D. A. Campbell, and N. K. Spurr, “Single nucleotide polymorphisms as tools in human genetics,” Human Molecular Genetics, vol. 9, pp. 2403-2408, 2000.

[2] C. R. Cantor, “The use of genetic SNPs as new diagnostic markers in preventive medicine,” Longevity Health Sciences: The Phoenix Conference, vol. 1055, pp. 48-57, 2005.

[3] D. A. Roses, A. M. Saunders, Y. Huang, J. Strum, K. H. Weisgraber et al., “Complex disease-associated pharmacogenetics: drug efficacy, drug safety, and confirmation of a pathogenetic hypothesis (Alzheimer's disease),” Pharmacogenomics Journal, vol. 7, pp. 10-28, 2007.

[4] H. W. Chang, C. H. Yang, C. H. Ho, C. H. Wen, and L. Y. Chuang, “Generating SNP barcode to evaluate SNP-SNP interaction of disease by particle swarm optimization,” Computational Biology and Chemistry, vol. 33, pp. 114-119, 2009.

[5] C. H. Yang, H. W. Chang, Y. H. Cheng, and L. Y. Chuang, “Novel generating protective single nucleotide polymorphism barcode for breast cancer using particle swarm optimization,” Cancer Epidemiol, vol. 33, pp. 147-154, 2009.

[6] L. Y. Chuang, H. W. Chang, M. C. Lin, and C. H. Yang, “Chaotic particle swarm optimization for detecting SNP-SNP interactions for CXCL12-related genes in breast cancer prevention,” European Journal of Cancer Prevention, vol. 21, pp. 336-342, 2011.

[7] H. W. Chang, L. Y. Chuang, C. H. Ho, P. L. Chang, and C. H. Yang, “Odds ratio-based genetic algorithms for generating SNP barcodes of genotypes to predict disease susceptibility,” OMICS-a Journal of Integrative Biology, vol. 12, pp. 71-81, 2008.

[8] C. H. Yang, L. Y. Chuang, Y. J. Chen, H. F. Tseng, and H. W. Chang, “Computational analysis of simulated SNP interactions between 26 growth factor-related genes in a breast cancer association study,” OMICS-a Journal of Integrative Biology, vol. 15, pp. 399-407, 2011.

[9] H. W. Ressom, R. S. Varghese, M. Abdel-Hamid, S. A. L. Eissa, D. Saha et al., “Analysis of mass spectral serum profiles for biomarker selection,” Bioinformatics, vol. 21, pp. 4039-4045, 2005.

[10] L. E. Mechanic, B. T. Luke, J. E. Goodman, S. J. Chanock, and C. C. Harris, “Polymorphism interaction analysis (PIA): a method for investigating complex gene-gene interactions,” BMC Bioinformatics, vol. 9, p. 146, 2008.

[11] P. D. P. Pharoah, J. Tyrer, A. M. Dunning, D. F. Easton, B. A. J. Ponder et al., “Association between common variation in 120 candidate genes and breast cancer risk,” PLoS Genetics, vol. 3, pp. 401-406, 2007.

Mei-Lee Hwang received the MS degree from the Department of Chemistry, University of Memphis in 1987 and the PhD degree from the Department of Chemistry, University of Cincinnati in 1992. She is an associate professor in the Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering at I-Shou University, Kaohsiung, Taiwan. Her main areas of research are physical chemistry and molecular spectrum.


Yu-Da Lin received the MS degree from the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan, in 2011. He is currently working toward the PhD degree in the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan. He has extensive experience in computer programming, database design and management, and systems programming and design. His main areas of research are bioinformatics and computational biology.

Li-Yeh Chuang received the MS degree from the Department of Chemistry, University of North Carolina in 1989 and the PhD degree from the Department of Biochemistry, North Dakota State University in 1994. She is a professor in the Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering at I-Shou University, Kaohsiung, Taiwan. Her main areas of research are bioinformatics, biochemistry, and genetic engineering.

Cheng-Hong Yang received the MS and PhD degrees in computer engineering from North Dakota State University in 1988 and 1992, respectively. He is a professor in the Department of Electronic Engineering at the National Kaohsiung University of Applied Sciences, Taiwan. His main areas of research are evolutionary computation, bioinformatics, and assistive tool implementation.