Published by Oxford University Press on behalf of the International Epidemiological Association International Journal of Epidemiology 2005;34:711–714
Letters to the Editor
Case recruitment in genetic association studies: larger sample size or greater homogeneity?
From CHUEN-FEI CHEN and WEN-CHUNG LEE
Graduate Institute of Epidemiology, College of Public Health, National Taiwan University, No. 1, Jen-Ai Road, 1st Section, Taipei, Taiwan.
E-mail: [email protected]
doi:10.1093/ije/dyi086 Advance Access publication 15 April 2005
Sirs—To study the genetic components of complex human diseases, researchers now rely heavily on case–control association designs. However, they often face a dilemma. Should they recruit as many affected cases as possible, even though the cases recruited in this way may form a heterogeneous group? Or should they insist on a stricter case definition to achieve greater homogeneity, even though this may result in a much smaller sample size?
For example, suppose that a researcher intends to map the susceptibility gene(s) of cardiovascular diseases. Should he/she recruit both patients with myocardial infarction and patients with stroke as the case group in order to increase the sample size? Or, for the sake of homogeneity, should he/she recruit only patients with myocardial infarction (or only patients with stroke)? As a second example, suppose that another researcher wishes to locate the susceptibility gene(s) of prostate cancer. He/she knows that the genetic contribution is larger in early-onset than in late-onset prostate cancer cases, but the early-onset cases are limited in number. How should this researcher choose between greater homogeneity (recruiting only the early-onset cases) and a larger sample size (recruiting both the early-onset and the late-onset cases)?
We propose a simple rule to help resolve this dilemma. Assume that there are two types of affected cases. The allele frequency in the first (second) type of cases is denoted by $p_1$ ($p_2$), and the number of subjects by $n_1$ ($n_2$). The control group has allele frequency $p_0$ and $n_0$ subjects. Without loss of generality, we assume that the first type of affected cases is more genetically determined than the second, i.e. $p_1 - p_0 > p_2 - p_0$. The proposed decision rule is that if
$$\frac{p_2 - p_0}{p_1 - p_0} > \frac{n_1 + n_2}{n_2}\sqrt{\frac{1/(n_1 + n_2) + 1/n_0}{1/n_1 + 1/n_0}} - \frac{n_1}{n_2},$$
both types of affected cases should be recruited; otherwise, only the first type should be recruited. Recruiting cases according to this rule yields whichever of the two designs has the higher statistical power (a proof is available from the authors). The allele frequencies $p_1$, $p_2$, and $p_0$ should be obtained from a literature survey or estimated in a pilot study.
In the earlier example of cardiovascular diseases, suppose that the researcher has at his/her disposal 300 patients with myocardial infarction ($n_1 = 300$), 300 patients with stroke ($n_2 = 300$), and 600 control subjects ($n_0 = 600$). A literature search revealed that the allele frequencies are $p_1 = 0.158$, $p_2 = 0.149$, and $p_0 = 0.095$, respectively.¹ Since
$$\frac{0.149 - 0.095}{0.158 - 0.095} = 0.857 > \frac{300 + 300}{300}\sqrt{\frac{1/(300 + 300) + 1/600}{1/300 + 1/600}} - \frac{300}{300} = 0.633,$$
this researcher should combine the patients with myocardial infarction and the patients with stroke into a single case group.
As for the example of prostate cancer, suppose that the researcher has at his/her disposal a total of 100 prostate cancer cases, 20 of which were diagnosed at age ≤55 years ($n_1 = 20$) and the remaining 80 at age >55 years ($n_2 = 80$), together with 100 control subjects ($n_0 = 100$). The a priori allele frequencies are $p_1 = 0.430$, $p_2 = 0.090$, and $p_0 = 0.017$, respectively.²⁻⁴ Since
$$\frac{0.090 - 0.017}{0.430 - 0.017} = 0.177 < \frac{20 + 80}{80}\sqrt{\frac{1/(20 + 80) + 1/100}{1/20 + 1/100}} - \frac{20}{80} = 0.472,$$
this time the researcher should discard the late-onset prostate cancer cases and focus on comparing the 20 early-onset cases with the 100 control subjects.
References
1. Helgadottir A, Manolescu A, Thorleifsson G et al. The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nature Genet 2004;36:233–39.
2. Carter BS, Beaty TH, Steinberg GD, Childs B, Walsh PC. Mendelian inheritance of familial prostate cancer. Proc Natl Acad Sci USA 1992;89:3367–71.
3. Grönberg H, Damber L, Damber JE, Iselius L. Segregation analysis of prostate cancer in Sweden: support for dominant inheritance. Am J Epidemiol 1997;146:552–57.
4. Schaid DJ, McDonnell SK, Blute ML, Thibodeau SN. Evidence for autosomal dominant inheritance of prostate cancer. Am J Hum Genet 1998;62:1425–38.
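The decision rule and the two worked examples in this letter can be checked numerically. The Python sketch below is illustrative only. The function names `rule_sides`, `recruit_both`, and `pooled_beats_type1` are hypothetical, and the sketch evaluates the inequality as stated rather than reproducing the authors' power proof. The last function checks one algebraically equivalent reading, which is our interpretation and not a claim made in the letter. For $p_1 > p_0$, the inequality holds exactly when the pooled-case difference $\bar p - p_0$, with $\bar p = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ and scaled by $\sqrt{1/(n_1+n_2) + 1/n_0}$, exceeds $p_1 - p_0$ scaled by $\sqrt{1/n_1 + 1/n_0}$. This reading assumes a variance factor common to both comparisons, which cancels.

```python
import math


def rule_sides(p0, p1, p2, n0, n1, n2):
    """Return (lhs, rhs) of the case-recruitment inequality.

    lhs = (p2 - p0) / (p1 - p0)
    rhs = (n1 + n2)/n2 * sqrt((1/(n1 + n2) + 1/n0) / (1/n1 + 1/n0)) - n1/n2

    Hypothetical helper name. Type 1 must be the more genetically
    determined group, so we require p1 > p2 and p1 > p0.
    """
    if not (p1 > p2 and p1 > p0):
        raise ValueError("expected p1 > p2 and p1 > p0 (type 1 is the stronger group)")
    lhs = (p2 - p0) / (p1 - p0)
    # ratio of the 1/n terms: pooled design over type-1-only design
    variance_ratio = (1 / (n1 + n2) + 1 / n0) / (1 / n1 + 1 / n0)
    rhs = (n1 + n2) / n2 * math.sqrt(variance_ratio) - n1 / n2
    return lhs, rhs


def recruit_both(p0, p1, p2, n0, n1, n2):
    """True: recruit both case types; False: recruit type 1 only."""
    lhs, rhs = rule_sides(p0, p1, p2, n0, n1, n2)
    return lhs > rhs


def pooled_beats_type1(p0, p1, p2, n0, n1, n2):
    """Equivalent reading (our assumption): compare scaled differences.

    The binomial variance factor is assumed common to both comparisons
    and is therefore omitted; only the 1/n terms are kept.
    """
    pbar = (n1 * p1 + n2 * p2) / (n1 + n2)  # allele frequency in pooled cases
    pooled = (pbar - p0) / math.sqrt(1 / (n1 + n2) + 1 / n0)
    type1_only = (p1 - p0) / math.sqrt(1 / n1 + 1 / n0)
    return pooled > type1_only


if __name__ == "__main__":
    examples = {
        # myocardial infarction (type 1) vs stroke (type 2)
        "cardiovascular": dict(p0=0.095, p1=0.158, p2=0.149, n0=600, n1=300, n2=300),
        # early-onset (type 1) vs late-onset (type 2) prostate cancer
        "prostate cancer": dict(p0=0.017, p1=0.430, p2=0.090, n0=100, n1=20, n2=80),
    }
    for name, args in examples.items():
        lhs, rhs = rule_sides(**args)
        choice = "recruit both types" if recruit_both(**args) else "recruit type 1 only"
        print(f"{name}: lhs={lhs:.3f}, rhs={rhs:.3f} -> {choice}")
```

Run as a script, it prints `lhs=0.857, rhs=0.633 -> recruit both types` for the cardiovascular example and `lhs=0.177, rhs=0.472 -> recruit type 1 only` for the prostate cancer example, matching the values in the letter.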