
Vol.5, No.3, 359-367 (2013) Natural Science doi:10.4236/ns.2013.53049

SNP barcodes generated using particle swarm optimization to detect susceptibility to breast cancer

Cheng-Hong Yang1, Yu-Da Lin1, Li-Yeh Chuang2*, Hsueh-Wei Chang3*

1Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan

2Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan; *Corresponding Author: chuang@isu.edu.tw

3Department of Biomedical Science and Environmental Biology, Center of Excellence for Environmental Medicine, Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan; *Corresponding Author: changhw@kmu.edu.tw

Received 31 October 2012; revised 30 November 2012; accepted 14 December 2012

ABSTRACT

Considerable research has been devoted to investigating variations in disease susceptibility using single nucleotide polymorphisms (SNPs), individually and in co-occurrence, that contribute to genetic and phenotypic variability. Without the raw genotype data, these association studies are difficult to conduct and often omit SNP interactions, which limits their reliability and potential applicability. In this study, we apply a particle swarm optimization (PSO) algorithm to detect and identify the best protective SNP barcodes (i.e., SNP combinations and genotypes with a maximum difference between cases and controls) associated with breast cancer. SNP barcodes containing different numbers of SNPs were computed. We evaluated the combined effects on breast cancer of 27 SNPs related to nine published epigenetic modifier-related genes. Eleven different SNP combinations were found to be protective against breast cancer (odds ratio, OR < 1.0; p-value < 0.05). The results suggest that SNPs 1 and 2 (gene BAT8), 9, 10, 11 and 13 (DNMT3A), 20 and 21 (EHMT1), 24 (HDAC2), 25 (MBD2), and 27 (SETDB1) are statistically significant and that interactive effects among them may play a role in the prevalence of breast cancer. A PSO-based process using the chi-square test allowed us to quickly identify the significant SNP combinations in a multi-locus association analysis and then to detect interactive effects on complex genotypes among the SNPs. The PSO algorithm is robust and precisely identifies the best protective SNP barcodes. It can identify potential combinations of epigenetic modifier-related genes, together with the SNP barcodes that were deemed protective against breast cancer by in silico analysis.

Keywords: Single Nucleotide Polymorphism; Particle Swarm Optimization; SNP-SNP Interaction; Breast Cancer

1. INTRODUCTION

Genome-wide association studies (GWAS) involve large amounts of single nucleotide polymorphism (SNP) data from several genes; comparing genotype frequencies between cases and controls allows these data to be used to investigate disease susceptibility. Studies of gene variations associated with hereditary phenotypes are becoming increasingly popular and contribute to the detection of significant effects on disease susceptibility [1-6].

A total of 27 SNPs from nine epigenetic modifier-related genes (BAT8, DNMT1, DNMT3A, DNMT3B, EHMT1, HDAC2, MBD2, MTHFR and SETDB1) were selected to investigate their association with breast cancer [7]. Previous research considered only the effects of individual SNPs, but investigating the joint associations of SNPs can provide deeper insight into disease susceptibility. Although the individual roles of these epigenetic modifier-related genes were addressed in [7], the combined effect of gene (or SNP) interactions in relation to breast cancer was not. Like many association studies, [7] published only genotype frequencies, without supplementary raw genotype data.

C.-H. Yang et al. / Natural Science 5 (2013) 359-367 360

Copyright © 2013 SciRes. Openly accessible at http://www.scirp.org/journal/ns/

Analysis of SNP-SNP interactions is used to investigate polygenic diseases. However, collecting large-scale SNP data and analyzing all possible SNP-SNP interactions remains a challenge. Evaluating multiple SNPs simultaneously generates many possible combinations of alleles in SNP-SNP interactions. The number of possible combinations of SNP interactions between cases and controls is estimated to be

$$C(N, M) \times 3^{M} = \frac{N!}{M!\,(N-M)!} \times 3^{M},$$

where N is the total number of SNPs or factors and M is the number of selected SNPs. Machine learning and data mining methods are widely used in GWAS data analysis, but current methods are not robust enough to evaluate the complex interactions among all tested SNPs simultaneously, although some computational approaches have been developed to examine epistasis in family-based and case-control association studies [8-16].
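To make the size of this search space concrete, the short Python sketch below evaluates $C(N, M) \times 3^{M}$, assuming (as the factor $3^{M}$ implies) that each selected SNP takes one of three genotypes. The function name `n_candidate_barcodes` is illustrative only and is not part of the authors' method.

```python
from math import comb


def n_candidate_barcodes(n_snps: int, m_selected: int) -> int:
    """Count SNP barcodes of size m_selected drawn from n_snps SNPs.

    C(N, M) ways to choose the SNPs, times 3**M genotype assignments,
    assuming each selected SNP takes one of three genotypes.
    """
    return comb(n_snps, m_selected) * 3 ** m_selected


if __name__ == "__main__":
    # N = 27 SNPs, as in this study; M from 2 to 5
    for m in range(2, 6):
        print(f"M = {m}: {n_candidate_barcodes(27, m):,} candidate barcodes")
```

For N = 27, two-SNP barcodes give 351 × 9 = 3,159 candidates and three-SNP barcodes give 2,925 × 27 = 78,975, and the count keeps growing combinatorially as M and N increase.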

We hypothesize that interactions between the polymorphisms of epigenetic modifier-related genes may have a synergistic or non-additive effect on the pathogenesis of a disease and can explain differences in disease susceptibility. We propose a PSO method that generates SNP barcodes of genotypes to predict disease susceptibility and evaluate risk factors. The best combination of SNPs with genotypes can be verified by determining its risk in terms of the odds ratio and confidence interval. We systematically evaluated the joint effects of combinations of 27 SNPs from nine related genes involved in breast carcinogenesis. The SNP barcodes generated by the PSO algorithm were statistically evaluated by the odds ratio (OR) to predict susceptibility to breast cancer.
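The odds-ratio check described above can be sketched as follows, assuming that each barcode splits the study population into a 2×2 table: cases and controls, each divided into individuals whose genotypes match the barcode and those who do not. The Woolf (log-scale) interval used here is one common choice; this section does not say which interval method the authors used, and `odds_ratio_ci` is a hypothetical name.

```python
import math


def odds_ratio_ci(case_with, case_without, ctrl_with, ctrl_without, z=1.96):
    """Odds ratio for carrying a SNP barcode (cases vs controls) with a
    Woolf (log-scale) confidence interval; z=1.96 gives roughly 95%.

    case_with / ctrl_with: individuals whose genotypes match the barcode;
    case_without / ctrl_without: individuals who do not match it.
    """
    counts = (case_with, case_without, ctrl_with, ctrl_without)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_with * ctrl_without) / (case_without * ctrl_with)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from this study
    or_, lo, hi = odds_ratio_ci(20, 80, 40, 60)
    print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With these hypothetical counts the OR is 0.375 and the whole interval lies below 1, the pattern that would mark a barcode as protective. Zero cells are rejected rather than silently corrected, since the choice of correction is itself a modelling decision.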

2. METHODS

We introduce a particle swarm optimization method that generates the best SNP barcodes, that is, combinations of SNPs with their corresponding genotypes. A characteristic of PSO is its fast convergence, which allows optimal solutions to be identified quickly in a wide solution space; this makes it suitable for searching for the optimal protective SNP barcodes.
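The abstract describes the search as PSO combined with the chi-square test, but this section does not spell out the objective function. The sketch below is therefore an assumption: it scores a candidate barcode by the Pearson chi-square statistic of its 2×2 case/control table, so that barcodes with a larger case/control difference receive higher fitness. `chi_square_2x2` and `barcode_fitness` are hypothetical names.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                  barcode   no barcode
        cases        a          b
        controls     c          d

    Returns (statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column carries no evidence
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def barcode_fitness(a, b, c, d):
    """Hypothetical PSO fitness: a larger chi-square means a larger
    difference in barcode frequency between cases and controls."""
    return chi_square_2x2(a, b, c, d)[0]
```

On its own, the statistic does not distinguish protective from risk barcodes, so a search for protective barcodes would also need to check that OR < 1.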

2.1. Particle Swarm Optimization

PSO is an efficient evolutionary computation algorithm developed by Kennedy and Eberhart [17] that models an automatically evolving system by simulating the social behavior of organisms, e.g., birds in a flock or fish in a school. PSO was designed for use in practical applications and simulates social behavior based on information exchange. Within a problem space, each potential solution can be regarded as a vector in a swarm, and this vector is referred to as a particle. Each particle uses its own memory and the knowledge gained by the swarm as a whole to find an optimal solution. Each particle is evaluated by an objective function, and the particles share their best experiences with the rest of the swarm. These experiences inform the search direction and lead the swarm toward the optimal solution. This strategy effectively searches the optimal regions of complex search spaces. The basic elements of PSO are as follows:

1) Particle: In this study, each particle represents a candidate solution to the problem.

2) Population: A swarm population consisting of n particles.

3) Particle position, $x_i$: Each candidate solution can be represented by a D-dimensional vector; the ith particle can be described as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $x_{id}$ is the position of the ith particle in the dth dimension ($d = 1, \ldots, D$). The dimensions of the particle position are defined by the number of selected SNPs and the corresponding genotypes of those SNPs.

4) Particle velocity, $v_i$: The velocity of the ith particle is represented by $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, where $v_{id}$ is the velocity of the ith particle in the dth dimension. The new position of a particle is obtained by adding $v_i$ to its current position $x_i$, so PSO steers the search by adjusting $v_i$. In addition, the velocity of a particle is restricted to $[V_{\min}, V_{\max}]^{D}$.

5) Inertia weight, w: The inertia weight is used to control the impact of a particle’s previous velocity on its current velocity. This control parameter affects the trade-off between the particle’s abilities for exploration and exploitation.

6) Individual best value, $pbest_i$: $pbest_i$ is the position at which the ith particle has achieved the highest value of the objective function so far. It can be regarded as the best solution found by the ith particle up to the current iteration.

7) Global best value, gbest: The best position among the pbest positions of all particles is called the global best, gbest. It can be regarded as the best SNP barcode found so far by the whole swarm.

8) Termination criteria: The process is stopped after the maximum allowed number of iterations is reached.
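Item 3 states that the particle position is defined by the selected SNPs and their genotypes, and Section 2.2 adds that SNPs cannot be repeatedly selected, but the exact mapping from real-valued coordinates to SNPs is not given here. The Python sketch below shows one hypothetical scheme, not the authors' encoding: a position with 2M coordinates in [0, 1], where the first M coordinates pick SNPs and the last M pick genotype codes 0, 1 or 2. `decode_position` and `_clamp01` are illustrative names.

```python
def _clamp01(value: float) -> float:
    """Clamp a coordinate to the unit interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def decode_position(x, n_snps, m_selected):
    """Hypothetical decoding of a particle position into a SNP barcode.

    x holds 2 * m_selected coordinates: the first half selects SNPs
    (0-based indices into n_snps), the second half selects genotype
    codes 0, 1 or 2 for those SNPs. Returns a list of (snp, genotype).
    """
    if not 0 < m_selected <= n_snps or len(x) != 2 * m_selected:
        raise ValueError("need 0 < m_selected <= n_snps and len(x) == 2 * m_selected")
    chosen = []
    for value in x[:m_selected]:
        idx = min(int(_clamp01(value) * n_snps), n_snps - 1)
        while idx in chosen:          # a SNP cannot be selected twice,
            idx = (idx + 1) % n_snps  # so move to the next unused SNP
        chosen.append(idx)
    genotypes = [min(int(_clamp01(v) * 3), 2) for v in x[m_selected:]]
    return list(zip(chosen, genotypes))


if __name__ == "__main__":
    # Two SNPs from 27; both SNP coordinates map to index 1, so the
    # second is shifted to index 2 to avoid a repeated SNP.
    print(decode_position([0.05, 0.05, 0.4, 0.9], n_snps=27, m_selected=2))
```

The example prints `[(1, 1), (2, 2)]`. Indices are 0-based, so index 0 would be the first SNP in the panel; shifting a duplicate to the next unused index is only one of several ways to enforce the no-repeat rule.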

The PSO procedure is shown in Figure 1 and can be divided into the following steps: 1) initialization of particles; 2) particle evaluation with an objective function; 3) selection of the particles’ pbest and gbest; and 4) updating of the particles’ velocity and position. These procedures are repeated in successive iterations until the termination conditions are reached.
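The four steps above can be sketched as a minimal PSO loop in Python. This section does not state the velocity-update equation or the parameter values, so the sketch assumes the standard inertia-weight update from the PSO literature, $v_{id} \leftarrow w\,v_{id} + c_1 r_1 (pbest_{id} - x_{id}) + c_2 r_2 (gbest_{d} - x_{id})$, with random $r_1, r_2 \in [0, 1)$. The acceleration constants $c_1, c_2$, the velocity bound, the search over $[0, 1]^D$, and the toy objective are all illustrative assumptions, not the paper's settings.

```python
import random


def pso_maximize(objective, dim, n_particles=20, max_iter=100, w=0.729,
                 c1=1.494, c2=1.494, v_max=0.2, seed=0):
    """Minimal inertia-weight PSO that maximizes `objective` over [0, 1]**dim.

    Parameter values are common textbook choices, not the paper's settings.
    """
    rng = random.Random(seed)
    v_min = -v_max
    # Step 1: initialize positions and velocities at random
    x = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(v_min, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # Steps 2-3: evaluate particles and record pbest and gbest
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    best = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for _ in range(max_iter):  # termination: maximum number of iterations
        for i in range(n_particles):
            # Step 4: update velocity (clamped to [Vmin, Vmax]) and position
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = min(max(vel, v_min), v_max)
                x[i][d] = min(max(x[i][d] + v[i][d], 0.0), 1.0)
            # Steps 2-3 again: evaluate the moved particle, update pbest/gbest
            val = objective(x[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


def toy_objective(p):
    """Toy function with its maximum (0.0) at (0.3, 0.3, ...)."""
    return -sum((t - 0.3) ** 2 for t in p)


if __name__ == "__main__":
    print(pso_maximize(toy_objective, dim=3))
```

For the SNP problem, `objective` would decode each position into a barcode and score it against case/control counts; that wiring is left out because the study's data and exact fitness function are not reproduced in this section.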

2.2. Encoding Schemes

In PSO, each particle is encoded in a format that expresses a particular number of SNPs and their genotype combinations. A particle is defined as a vector that consists of the selected SNPs and their corresponding genotypes; SNPs cannot be selected more than once. In this paper, we define the SNP barcode as a solution consisting of the selected SNPs and their corresponding genotypes. The particle encoding can thus be represented by:


account for complex SNP interactions and provides the best SNP barcode profile for predicting breast cancer cases. This suggests that the method is suitable for the systematic exploration of genome-wide SNP interactions.

6. ACKNOWLEDGEMENTS

This work was partly supported by the National Science Council in Taiwan under grants NSC100-2221-E-151-049-MY3 and NSC100-2221-E-151-051-MY2.

REFERENCES

[1] Li, J., Humphreys, K., Darabi, H., Rosin, G., Hannelius, U., Heikkinen, T., et al. (2010) A genome-wide association scan on estrogen receptor-negative breast cancer. Breast Cancer Research, 12, R93. doi:10.1186/bcr2772

[2] Kraft, P. and Haiman, C.A. (2010) GWAS identifies a common breast cancer risk allele among BRCA1 carriers. Nature Genetics, 42, 819-820. doi:10.1038/ng1010-819

[3] Thomas, G., Jacobs, K.B., Kraft, P., Yeager, M., Wacholder, S., Cox, D.G., et al. (2009) A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nature Genetics, 41, 579-584. doi:10.1038/ng.353

[4] Meindl, A. (2009) Identification of novel susceptibility genes for breast cancer – Genome-wide association studies or evaluation of candidate genes? Breast Care, 4, 93-99. doi:10.1159/000211991

[5] Fanale, D., Amodeo, V., Corsini, L.R., Rizzo, S., Bazan, V. and Russo, A. (2012) Breast cancer genome-wide association studies: There is strength in numbers. Oncogene, 31, 2121-2128. doi:10.1038/onc.2011.408

[6] Yu, J.C., Hsiung, C.N., Hsu, H.M., Bao, B.Y., Chen, S.T., Hsu, G.C., et al. (2011) Genetic variation in the genome-wide predicted estrogen response element-related sequences is associated with breast cancer development. Breast Cancer Research, 13, R13. doi:10.1186/bcr2821

[7] Pharoah, P.D.P., Tyrer, J., Dunning, A.M., Easton, D.F., Ponder, B.A.J. and SEARCH Investigators (2007) Association between common variation in 120 candidate genes and breast cancer risk. PLoS Genetics, 3, 401-406. doi:10.1371/journal.pgen.0030042

[8] Chang, H.-W., Yang, C.-H., Ho, C.-H., Wen, C.-H. and Chuang, L.-Y. (2009) Generating SNP barcode to evaluate SNP-SNP interaction of disease by particle swarm optimization. Computational Biology and Chemistry, 33, 114-119. doi:10.1016/j.compbiolchem.2008.07.029

[9] Yang, C.H., Chang, H.W., Cheng, Y.H. and Chuang, L.Y. (2009) Novel generating protective single nucleotide polymorphism barcode for breast cancer using particle swarm optimization. Cancer Epidemiology, 33, 147-154. doi:10.1016/j.canep.2009.07.001

[10] Moore, J.H., Asselbergs, F.W. and Williams, S.M. (2010) Bioinformatics challenges for genome-wide association studies. Bioinformatics, 26, 445-455. doi:10.1093/bioinformatics/btp713

[11] Yang, C.-H., Chuang, L.-Y., Chen, Y.-J., Tseng, H.-F. and Chang, H.-W. (2011) Computational analysis of simulated SNP interactions between 26 growth factor-related genes in a breast cancer association study. OMICS: A Journal of Integrative Biology, 15, 399-407. doi:10.1089/omi.2010.0028

[12] Yang, P., Ho, J.W., Yang, Y.H. and Zhou, B.B. (2011) Gene-gene interaction filtering with ensemble of filters. BMC Bioinformatics, 12, S10. doi:10.1186/1471-2105-12-S1-S10

[13] Chuang, L.-Y., Chang, H.-W., Lin, M.-C. and Yang, C.-H. (2012) Chaotic particle swarm optimization for detecting SNP-SNP interactions for CXCL12-related genes in breast cancer prevention. European Journal of Cancer Prevention, 21, 336-342. doi:10.1097/CEJ.0b013e32834e31f6

[14] Chuang, L.-Y., Lin, Y.-D., Chang, H.-W. and Yang, C.-H. (2012) An improved PSO algorithm for generating protective SNP barcodes in breast cancer. PLoS ONE, 7, e37018. doi:10.1371/journal.pone.0037018

[15] Steen, K.V. (2012) Travelling the world of gene-gene interactions. Briefings in Bioinformatics, 13, 1-19. doi:10.1093/bib/bbr012

[16] Yang, C.-H., Chuang, L.-Y., Cheng, Y.-H., Lin, Y.-D., Wang, C.-L., Wen, C.-H., et al. (2012) Single nucleotide polymorphism barcoding to evaluate oral cancer risk using odds ratio-based genetic algorithms. Kaohsiung Journal of Medical Sciences, 28, 362-368. doi:10.1016/j.kjms.2012.02.002

[17] Kennedy, J. and Eberhart, R.C. (1995) Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks, Perth, 27 November-1 December 1995, 1942-1948. doi:10.1109/ICNN.1995.488968

[18] Shi, Y.-H. and Eberhart, R.C. (1999) Empirical study of particle swarm optimization. Proceedings of the 1999 Congress on Evolutionary Computation, Washington DC, 6-9 July 1999, 1948-1950.

[19] Ratnaweera, A., Halgamuge, S. and Watson, H. (2004) Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation, 8, 240-255. doi:10.1109/TEVC.2004.826071

[20] Mechanic, L.E., Luke, B.T., Goodman, J.E., Chanock, S.J. and Harris, C.C. (2008) Polymorphism Interaction Analysis (PIA): A method for investigating complex gene-gene interactions. BMC Bioinformatics, 9, 146. doi:10.1186/1471-2105-9-146

[21] Lin, G.-T., Tseng, H.-F., Yang, C.-H., Hou, M.-F., Chuang, L.-Y., Tai, H.-T., et al. (2009) Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS: A Journal of Integrative Biology, 13, 165-172. doi:10.1089/omi.2008.0050

[22] Smith, T.R., Levine, E.A., Freimanis, R.I., Akman, S.A., Allen, G.O., Hoang, K.N., et al. (2008) Polygenic model of DNA repair genetic polymorphisms in human breast cancer risk. Carcinogenesis, 29, 2132-2138. doi:10.1093/carcin/bgn193

[23] Briollais, L., Wang, Y., Rajendram, I., Onay, V., Shi, E., Knight, J., et al. (2007) Methodological issues in detecting gene-gene interactions in breast cancer susceptibility: A population-based study in Ontario. BMC Medicine, 5, 22. doi:10.1186/1741-7015-5-22
