
We selected two regions containing a total of 30 SNPs. The first region, about 260 kb long, contains the disease locus D; the second, about 340 kb long, lies far from locus D.

Locus D has a direct effect on RA risk but a low allele frequency. The two regions are separated by more than 12,662 kb, or 27 cM (centimorgans). At this distance we can say that the second region is not associated with the disease, so we treat it as the null region. Our goal was to estimate power in the causal region, to see which combination of methods performs best, and to estimate the type 1 error in the null region for comparison. A method with high power in the causal region and low power in the null region is the method we want to select. We first used pair-wise LD plots to characterize the differences between the two regions and the differences in the number of tag SNPs selected by the four methods. We generated the pair-wise LD plots with the Haploview software; the four LD plots for the two regions are shown in Figures 2 to 5.

In these plots, a deeper color indicates higher LD. The second region (Figure 3, Figure 5) has higher LD than the first region (Figure 2, Figure 4), which affects the proportion of tag SNPs selected from the 30 SNPs. For each of the four tag SNP methods, the proportion of tag SNPs selected did not differ among the four samples (Table 1, Table 2). For the haplotype+CI and haplotype+SSLD methods we considered two situations. In the first situation, motivated by tagging, the two methods select tag SNPs within blocks and also keep all remaining SNPs outside the blocks (Table 1). In the second situation, motivated by association testing, tag SNPs are selected only within blocks (Table 2). The first situation necessarily yields more tag SNPs than the second; whether the additional SNPs are useful is analyzed later.
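To make the notion of pair-wise LD concrete, the following is a minimal sketch of how D' and r² can be computed for two biallelic SNPs from phased haplotypes. It is an illustration only, not Haploview's implementation; the function names `pairwise_ld` and `ld_matrix` and the 0/1 allele coding are assumptions introduced here.

```python
from itertools import combinations


def pairwise_ld(hap_a, hap_b):
    """Return (D', r^2) for two biallelic SNPs.

    hap_a and hap_b hold the 0/1 allele carried by each phased haplotype
    at SNP A and SNP B (hypothetical input format).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0, 0.0                       # monomorphic SNP: report 0 by convention
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / denom


def ld_matrix(haplotypes):
    """Build a symmetric r^2 matrix from rows of 0/1 alleles (one row per haplotype)."""
    n_snps = len(haplotypes[0])
    cols = [[row[j] for row in haplotypes] for j in range(n_snps)]
    r2 = [[1.0] * n_snps for _ in range(n_snps)]
    for i, j in combinations(range(n_snps), 2):
        _, value = pairwise_ld(cols[i], cols[j])
        r2[i][j] = r2[j][i] = value
    return r2
```

A region with high LD would show many off-diagonal entries close to 1 in this matrix, which corresponds to the deeply colored cells in an LD plot.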

Figure 2. LD plot based on 1500 cases and 1500 controls, CI blocking, low-LD region

Figure 3. LD plot based on 1500 cases and 1500 controls, CI blocking, high-LD region

Figure 4. LD plot based on 1500 cases and 1500 controls, SSLD blocking, low-LD region

Figure 5. LD plot based on 1500 cases and 1500 controls, SSLD blocking, high-LD region

In the first situation, all tag SNP methods select more tag SNPs in the causal region than in the null region, because the causal region has lower pair-wise LD than the null region. In the second situation, however, haplotype+CI shows the opposite result, because blocks in the high-LD region become larger and contain more SNPs (Figure 2, Figure 3).

Because haplotype+SSLD selects almost all SNPs as tag SNPs, it still has a high proportion of tag SNPs in the high-LD region (Figure 4, Figure 5).

In the first situation, tagger and haplotype+1-Block select fewer markers than haplotype+CI and haplotype+SSLD, possibly because they are not restricted to fully representing each LD block. Haplotype+CI and haplotype+SSLD select almost the same number of tag SNPs (Table 1). In the second situation, haplotype+CI selects the fewest tag SNPs, because tag SNPs are selected only within blocks and the blocks of haplotype+CI are smaller than those of haplotype+SSLD (Table 2).

Haplotype+1-Block and haplotype+CI select the fewest tag SNPs, so an analysis based on these methods would cost less. Whether these methods retain the same power to detect the disease locus is examined in the next step.
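The trade-off between LD and the number of tag SNPs can be illustrated with a simplified greedy pair-wise tagging procedure. This is a sketch under stated assumptions, not the algorithm used by any of the four methods compared here: the function name `greedy_tag_snps`, the input r² matrix, and the threshold of 0.8 are all choices made for this illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy pair-wise tagging on a symmetric r^2 matrix (list of lists).

    Repeatedly picks the SNP that captures the most still-untagged SNPs,
    where "captures" means r^2 >= threshold (an assumed cut-off).
    Returns the sorted indices of the selected tag SNPs.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []

    def coverage(i):
        # SNPs still untagged that SNP i would capture (a SNP captures itself)
        return {j for j in untagged if j == i or r2[i][j] >= threshold}

    while untagged:
        # ties are broken in favor of the lowest SNP index
        best = max(range(n), key=lambda i: (len(coverage(i)), -i))
        tags.append(best)
        untagged -= coverage(best)
    return sorted(tags)


def tag_proportion(r2, threshold=0.8):
    """Number of tag SNPs divided by the total number of SNPs."""
    return len(greedy_tag_snps(r2, threshold)) / len(r2)
```

With such a procedure, a region in which many SNP pairs exceed the threshold needs fewer tag SNPs, which is consistent with the lower tag SNP proportions seen in the high-LD region for the methods that do not use blocks.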

The goal of our study was to compare different combinations of SNP tagging methods and association methods on a simulated dataset in order to select the best combination.

First, we drew a random sample of 50 cases, taking one individual from an affected sibling pair in each of the 1500 families, and 50 controls from the 2000 unaffected families. We then drew a sample of 100 cases and 100 controls in the same way. The tag SNP methods were applied to each of these four samples.
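The sampling step can be sketched as follows. The data layout is hypothetical: each case family is represented by a pair of sibling identifiers, and each control family by a single individual. The function name `draw_tagging_samples` and the `seed` parameter are assumptions added for reproducibility.

```python
import random


def draw_tagging_samples(case_families, control_families, n, seed=None):
    """Draw n cases and n controls for tag SNP selection.

    case_families: list of (sib1, sib2) pairs, one pair per affected family.
    control_families: list of individuals, one per unaffected family.
    Both layouts are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # keep one randomly chosen sibling from each affected sibling pair
    case_pool = [rng.choice(pair) for pair in case_families]
    cases = rng.sample(case_pool, n)                 # without replacement
    controls = rng.sample(list(control_families), n)
    return cases, controls
```

Calling the function with `n=50` and again with `n=100` would give the case and control samples of both sizes.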

Table 1. Proportion of tag SNPs (number of tag SNPs divided by the total number of SNPs), averaged over 100 replicates, with tag SNPs selected both inside and outside blocks.

Method             Region     Control 50  Case 50   Control 100  Case 100
tagger             Locus D    0.587333    0.617     0.581667     0.627
tagger             Locus D’   0.506333    0.510333  0.510667     0.521
haplotype-CI       Locus D    0.699333    0.683333  0.678667     0.66733
haplotype-CI       Locus D’   0.654667    0.667667  0.624        0.62633
haplotype-SSLD     Locus D    0.701       0.696667  0.692333     0.67933
haplotype-SSLD     Locus D’   0.624333    0.643333  0.615667     0.62867
haplotype-1-Block  Locus D    0.417667    0.435333  0.445667     0.464
haplotype-1-Block  Locus D’   0.38        0.385     0.411667     0.41933

Table 2. Proportion of tag SNPs (number of tag SNPs divided by the total number of SNPs), averaged over 100 replicates, with tag SNPs selected only inside blocks.

Method             Region     Control 50  Case 50   Control 100  Case 100
tagger             Locus D    0.587333    0.617     0.581667     0.627
tagger             Locus D’   0.506333    0.510333  0.510667     0.521
haplotype-CI       Locus D    0.299333    0.259     0.310333     0.25267
haplotype-CI       Locus D’   0.384667    0.384667  0.413        0.421
haplotype-SSLD     Locus D    0.632333    0.638     0.613        0.60433
haplotype-SSLD     Locus D’   0.548667    0.569333  0.534667     0.54833
haplotype-1-Block  Locus D    0.417667    0.435333  0.445667     0.464
haplotype-1-Block  Locus D’   0.38        0.385     0.411667     0.41933

Second, we carried out the association study on a mixed sample of 500 cases and 500 controls drawn at random from the populations, and on another sample of 200 cases and 200 controls. We did this to understand whether the association sample size affects the power to detect the disease locus. For the haplotype association study we used three blocking methods: Gabriel blocking, SSLD blocking, and a single block covering the whole region. We defined the Bonferroni-corrected significance level as α = 0.05 / (number of haplotypes + number of tag SNPs outside blocks) for the multi-SNP test, and as α = 0.05 / (number of tag SNPs) for the single-SNP test. We then drew 100 repeated random samples and estimated power as the proportion of replicates with a p-value below the corrected significance level. Results are shown in Tables 3 to 6.
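The two Bonferroni thresholds and the power estimate can be written out as a short sketch. The function names are hypothetical, and the code assumes that each replicate is summarized by its smallest p-value across all tests performed in that replicate, which the text does not state explicitly.

```python
def bonferroni_alpha(n_tests, alpha=0.05):
    """Bonferroni-corrected significance level for n_tests tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def multi_snp_alpha(n_haplotypes, n_tags_outside_blocks, alpha=0.05):
    """Threshold for the multi-SNP test: haplotypes plus tag SNPs outside blocks."""
    return bonferroni_alpha(n_haplotypes + n_tags_outside_blocks, alpha)


def single_snp_alpha(n_tag_snps, alpha=0.05):
    """Threshold for the single-SNP test: one test per tag SNP."""
    return bonferroni_alpha(n_tag_snps, alpha)


def estimate_power(min_p_values, threshold):
    """Proportion of replicates whose smallest p-value is below the threshold.

    min_p_values holds one value per replicate (assumed summary).
    """
    hits = sum(1 for p in min_p_values if p < threshold)
    return hits / len(min_p_values)
```

For example, with 8 haplotypes and 2 tag SNPs outside blocks the multi-SNP threshold is 0.05 / 10 = 0.005. Applied to the null region, the same `estimate_power` calculation gives the empirical type 1 error rather than power.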

For the tag SNP methods, there was no significant difference between sample sizes of 50 and 100 within the same Tag-Association combination, and the powers were nearly the same. We therefore concluded that choosing a sample size of 50 or 100 does not affect the power results, and that a sample size of 50 is enough to decide which Tag-Association combination is better. This may still depend on the complexity of the disease gene.

We distinguished two situations when selecting tag SNPs with haplotype+CI and haplotype+SSLD, and we first note some findings common to both. Among the association methods, the three multi-SNP methods (multi-SNP+1-Block, multi-SNP+SSLD and multi-SNP+CI) have higher power than the single-SNP test. The gap is large for multi-SNP+1-Block and multi-SNP+SSLD. For multi-SNP+CI the gap is smaller, but its power is still slightly higher than that of the single-SNP test. From this we concluded that testing multiple SNPs jointly is a major factor in the level of power, and that within multi-SNP tests the blocking approach is important. With haplotype+SSLD there was no difference between the two situations in any of the four tag SNP samples, so it is a consistent method.

We then considered how the power of all methods differs across the four tag SNP samples. There were two significant differences, at the combination of tagger with multi-SNP+SSLD and at the combination of haplotype+1-Block with multi-SNP+SSLD: power was higher when tag SNPs were selected from cases, by about 30% to 40% (Table 3.a). When we reduced the association sample size to 200 cases and 200 controls, tagger and haplotype+1-Block had higher power with all three multi-SNP tests (Table 4.a). We expected power to increase when the tag SNP selection sample contained cases, because cases are more likely to carry the disease haplotype. The reason may be that the disease allele at locus D is rare. This had no effect, however, on the block-structured methods, haplotype+CI and haplotype+SSLD. We therefore concluded that the block-based tag SNP methods are not affected by the type of tag SNP sample but are affected by the association sample size, that the non-blocking methods are affected by both samples, and that with the single-SNP test the result is the same whichever tag SNP method is used.

The second situation gave a different result. With the haplotype+CI tag SNP method, the pattern across case and control samples was reversed: power was higher when tag SNPs were selected from controls, for association sample sizes of both 500 and 200 and for all three multi-SNP tests. The reason may be that the tag SNPs selected by haplotype+CI capture more information in controls, and that this information lies almost entirely within blocks rather than in SNPs outside blocks. These distinctions informed our later decisions.

Whichever association method is used, a sample size of 500 gives higher power than a sample size of 200. However, a sample size of 500 also costs more, so it is important to find a balance between sample size and power. After comparing the methods in the causal region, we considered how the power of each method varies in the null region, because a good method must have not only high power in the causal region but also low power in the null region. The multi-SNP tests and the single-SNP test had the same low level of power in the null region (Table 5, Table 6).
