Differential Power Analysis - Power Analysis Attacks

4.2 Differential Power Analysis

DPA attacks use statistics to find the correlation between the measured power consumption and the predicted power consumption. Because the power prediction model takes into account both the secret key and the processed data blocks, the result of the statistical calculation can be used to disclose the possible secret key in the cryptographic device.

Figure 4.3: The flow of the DPA attack.

4.2.1 DPA Attack Flow

A brief flow of the DPA attack is shown in Fig. 4.3. The attacker prepares N different patterns for en-/decryption and records the power traces of these patterns. These N power traces, each consisting of T sample points, are first arranged as an N-by-T measured power array for further processing. The same plaintexts and all possible key hypotheses are then used to generate predicted power values by an appropriate power prediction model. The power prediction model is a method to estimate the power consumption at either the behavioral or the algorithmic level. The most commonly used power models are the Hamming-distance model and the Hamming-weight model. The DPA attack efficiency is largely dependent on the

divided into 16 8-bit sub-keys, which greatly reduces the key space from 2¹²⁸ to 16 × 2⁸. As shown in Fig. 4.3, the predicted power values are arranged as an N-by-K array, where N is the number of plaintexts and K is the number of key hypotheses. Each column of the array holds the predicted power values of one key hypothesis for all N plaintexts.
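The N-by-K array of predicted power values can be sketched as follows. This is a minimal illustration, not the prediction model used in this work: the intermediate value `p ^ k` (one plaintext byte XORed with one 8-bit sub-key hypothesis), the Hamming-weight mapping, and the names `hamming_weight` and `predicted_power_array` are all assumptions made for the example.

```python
def hamming_weight(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def predicted_power_array(plaintext_bytes, num_keys=256):
    """Return an N-by-K list of lists of predicted power values.

    Row n belongs to plaintext byte n, column k to key hypothesis k.
    Assumption: the attacked intermediate value is simply p XOR k.
    """
    return [
        [hamming_weight(p ^ k) for k in range(num_keys)]
        for p in plaintext_bytes
    ]


plaintext_bytes = [0x00, 0x0F, 0xA5, 0xFF]  # N = 4 made-up plaintext bytes
H = predicted_power_array(plaintext_bytes)

print(len(H), len(H[0]))  # 4 256  (N rows, K = 2^8 columns)
print(H[1][0x0F])         # 0      (0x0F XOR 0x0F = 0x00)
```

Each 8-bit sub-key needs only 2⁸ columns, which is why attacking the sub-keys one at a time is tractable. Practical attacks usually predict a non-linear intermediate value, such as an S-box output, rather than the plain XOR used here.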

The final step of the DPA attack is to find the correlation by a statistical calculation such as the difference-of-means or the correlation coefficient. For the difference-of-means, introduced by Kocher et al. [19], the power traces are divided into two groups according to the corresponding predicted power values. The difference between the averages of these two groups then indicates the correlation between the power traces and the predicted power values. The correlation coefficient, proposed by Brier et al. [60], considers not only the means but also the variances, which reduces the required number of measurements.

4.2.2 Power Models

In a DPA attack, the intermediate values of each key hypothesis are mapped to predicted power values for analysis. This amounts to a simple power simulation of the cryptographic device, and the resulting power values only need to be related in some way to the actual power consumption. The Hamming-weight (HW) and Hamming-distance (HD) models are the two most often used power models, and they are briefly introduced in this sub-section.

Hamming-Weight Model

The HW model is a simple model that is applied when the attacker has little knowledge about the cryptographic device. In the HW model, the power consumption is assumed to be proportional to the number of bits that are set in the intermediate value. Strictly speaking, this makes the HW model unsuitable for describing the power consumption of CMOS circuits, because the power consumption of CMOS circuits depends on the number of transitions rather than on the processed value.

In practice the Hamming weight of an intermediate value is still somewhat related to the power consumption. For example, a pre-charged or pre-discharged data bus will be all 1s or 0s before processing the data. Then the Hamming weight of the processed data is related to the number of transitions.
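The bus example above can be made concrete with a short sketch. The 8-bit bus width and the function names are assumptions chosen for illustration.

```python
def hamming_weight(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def transitions_from_precharged(value, width=8):
    """Bus starts at all 1s: every 0 bit in `value` causes a 1 -> 0 transition."""
    return width - hamming_weight(value)


def transitions_from_predischarged(value, width=8):
    """Bus starts at all 0s: every 1 bit in `value` causes a 0 -> 1 transition."""
    return hamming_weight(value)


value = 0b11100000
print(hamming_weight(value))                  # 3
print(transitions_from_precharged(value))     # 5
print(transitions_from_predischarged(value))  # 3
```

In both cases the number of transitions is an affine function of the Hamming weight, which is why the HW model still correlates with the power consumption of such a bus.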

However, the data bus is not always pre-charged or pre-discharged, and the HW model is not sufficient in these cases. The HD model must then be used for mapping intermediate values to predicted power values.

Hamming-Distance Model

The basic idea of the HD model is to count the number of transitions, both 1 → 0 and 0 → 1, that occur in the circuit during a specific period. The number of transitions is then used to describe the power consumption of the circuit in this period. Although the number of transitions is not the actual power consumption in watts, it is assumed to be proportional to it.

Two assumptions are made when using the HD model to predict the power consumption of the circuit. First, all 1 → 0 and 0 → 1 transitions contribute equally to the power consumption. Although in practice the power consumption of the two kinds of transition differs slightly, this assumption makes the power model simpler. Second, all 0 → 0 and 1 → 1 cases (no transition) contribute equally to the power consumption. Parasitic capacitances of wires and cells are neglected in the HD model. Static power consumption, such as internal power or leakage power, is also ignored for simplicity.

Since the HD model is simple and describes CMOS circuits better than the HW model, it is commonly used for power simulations. A power simulation based on the HD model can provide a rough estimate of the power consumption of a real chip. The HD model can be formalized as HD(v0, v1) = HW(v0 ⊕ v1), where v0 and v1 are two successive values that appear on a data bus at different time instances.
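The formula HD(v0, v1) = HW(v0 ⊕ v1) translates directly into code. The sequence of bus values below is made up for illustration.

```python
def hamming_weight(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(v0, v1):
    """Number of bit positions that toggle when the bus goes from v0 to v1."""
    return hamming_weight(v0 ^ v1)


# Four successive values on a hypothetical 8-bit data bus.
bus_values = [0x00, 0xFF, 0x0F, 0x0E]
transitions = [
    hamming_distance(a, b) for a, b in zip(bus_values, bus_values[1:])
]
print(transitions)  # [8, 4, 1]
```

The XOR marks exactly the bit positions that differ between the two values, so 1 → 0 and 0 → 1 transitions are counted alike, in line with the first assumption of the model.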

4.2.3 Statistical Analysis

The most commonly used statistical analysis methods are the difference-of-means [19] and the correlation coefficient [60]. The difference-of-means is described first, followed by the correlation coefficient, in this sub-section.

Difference-of-Means

The basic idea of the difference-of-means is to determine the relationship between the columns of the measured power array and of the predicted power array shown in Fig. 4.3. To check whether a key hypothesis $K_i$ is correct, the power trace array is divided into two sets according to the predicted power values of that key hypothesis. That is, the N predicted power values for the key hypothesis $K_i$ are used to categorize the N power traces. If the power value is larger than a threshold, the corresponding power trace is assigned to set 1. If the power value is lower than the threshold, the corresponding power trace is assigned to set 0. The means of these two sets are calculated by the following equations:
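Written in a form consistent with the definitions that follow, the two means can be sketched as below. The trace notation $t_{l,j}$ (measured power of the $l$-th trace at sample point $j$) is borrowed from Eq. (4.2), and placing values equal to the threshold in set 0 is an assumption:

$$
m_{i,j}^{1} = \frac{1}{n_i^{1}} \sum_{l \,:\, h_{l,i} > \theta_h} t_{l,j},
\qquad
m_{i,j}^{0} = \frac{1}{n_i^{0}} \sum_{l \,:\, h_{l,i} \le \theta_h} t_{l,j},
$$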

where $i$ and $j$ are the indices of the key hypothesis $K_i$ and of the sample point of the power trace, respectively. $\theta_h$ is a pre-determined threshold used to group the power traces, usually set to half of the maximum of $h_{l,i}$. $n_i^1$ and $n_i^0$ are the numbers of power traces in set 1 and set 0, respectively. The vectors $m_i^1$ and $m_i^0$ denote the means of the rows in set 1 and set 0, respectively. If the key hypothesis $K_i$ is incorrect, the grouping resembles a random partition and the two sets have similar means. If the key hypothesis $K_i$ is correct, the difference between $m_i^0$ and $m_i^1$ is significant at some point in time.

The difference between $m_i^0$ and $m_i^1$ indicates the correlation between the predicted power values of key hypothesis $K_i$ and the power traces at a given time. The difference is significant at the points in time when the data and the secret key are processed. At all other time instances, the difference between $m_i^0$ and $m_i^1$ approaches zero. If the key hypothesis is incorrect, the difference is essentially zero at all time instances.
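The grouping and averaging for a single key hypothesis can be sketched as follows. The function name, the toy traces, and the choice to place predicted values equal to the threshold in set 0 are assumptions; the text only specifies the "larger than" and "lower than" cases.

```python
def difference_of_means(traces, predictions, threshold):
    """Difference-of-means trace for one key hypothesis.

    traces      -- N rows, each a list of T measured samples
    predictions -- N predicted power values for this key hypothesis
    threshold   -- grouping threshold (values equal to it go to set 0,
                   an assumption of this sketch)
    Returns a list of T values: mean of set 1 minus mean of set 0.
    """
    set1 = [t for t, h in zip(traces, predictions) if h > threshold]
    set0 = [t for t, h in zip(traces, predictions) if h <= threshold]
    if not set1 or not set0:
        raise ValueError("threshold leaves one of the two sets empty")

    num_samples = len(traces[0])
    mean1 = [sum(t[j] for t in set1) / len(set1) for j in range(num_samples)]
    mean0 = [sum(t[j] for t in set0) / len(set0) for j in range(num_samples)]
    return [m1 - m0 for m1, m0 in zip(mean1, mean0)]


# Toy data: only sample point 1 depends on the processed data.
predictions = [1, 7, 2, 6]
traces = [
    [5.0, 1.0, 5.0],
    [5.0, 7.0, 5.0],
    [5.0, 2.0, 5.0],
    [5.0, 6.0, 5.0],
]
diff = difference_of_means(traces, predictions, threshold=4)
print(diff)  # [0.0, 5.0, 0.0]
```

The threshold of 4 is half of the maximum Hamming weight of a byte. The result is zero at the data-independent sample points and shows a clear peak where the predicted values match the leakage.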

Correlation Coefficient

The correlation coefficient is a common statistical method to determine the relationship between two sets of data. The relationship between the measured power and the predicted power can therefore also be determined by this method. Brier et al. proposed a form of the method that is suitable for the DPA attack. The correlation coefficient is calculated as follows:

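$$
r_{i,j} = \frac{\sum_{n=1}^{N} (h_{n,i} - \bar{h}_i) \cdot (t_{n,j} - \bar{t}_j)}{\sqrt{\sum_{n=1}^{N} (h_{n,i} - \bar{h}_i)^2 \cdot \sum_{n=1}^{N} (t_{n,j} - \bar{t}_j)^2}}, \tag{4.2}
$$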

where $h_{n,i}$ is the predicted power value of key hypothesis $K_i$ for the $n$-th input pattern; $t_{n,j}$ is the measured power at sample time $j$ for the $n$-th input pattern; $\bar{h}_i$ and $\bar{t}_j$ are the mean values of $h_{n,i}$ and $t_{n,j}$ over all $N$ input patterns. The correlation coefficient is thus an index used for disclosing the secret key. For example, if the key hypothesis is wrong, the predicted power values and the measured power are independent and the correlation coefficient approaches 0 at all sample times. On the contrary, if the key hypothesis is correct, the correlation coefficient is much higher at the sample times at which the related operations are performed.
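Eq. (4.2) can be evaluated for every key hypothesis and sample time with a few loops. The sketch below is a toy under stated assumptions: the traces are noise-free and simulated, the leaking intermediate value is the XOR of a plaintext byte with a hypothetical secret sub-key `0x3C`, and a coefficient whose denominator is zero is reported as 0.0.

```python
import math


def hamming_weight(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def correlation_matrix(predictions, traces):
    """Return the K-by-T matrix r[i][j] of Eq. (4.2).

    predictions -- N-by-K predicted power values h[n][i]
    traces      -- N-by-T measured power values t[n][j]
    """
    n = len(traces)
    num_keys = len(predictions[0])
    num_samples = len(traces[0])

    # Centre each trace column once: (t[n][j] - mean_j) and its sum of squares.
    dt_cols, dt_sq = [], []
    for j in range(num_samples):
        mean_t = sum(row[j] for row in traces) / n
        col = [row[j] - mean_t for row in traces]
        dt_cols.append(col)
        dt_sq.append(sum(d * d for d in col))

    r = [[0.0] * num_samples for _ in range(num_keys)]
    for i in range(num_keys):
        mean_h = sum(row[i] for row in predictions) / n
        dh = [row[i] - mean_h for row in predictions]
        dh_sq = sum(d * d for d in dh)
        for j in range(num_samples):
            denom = math.sqrt(dh_sq * dt_sq[j])
            if denom > 0.0:  # zero variance: leave r at 0.0 (assumption)
                r[i][j] = sum(a * b for a, b in zip(dh, dt_cols[j])) / denom
    return r


secret_key = 0x3C              # hypothetical sub-key, for simulation only
plaintexts = list(range(256))  # N = 256 input patterns
# Simulated traces with T = 3 samples; only sample 1 depends on the data.
traces = [[1.0, float(hamming_weight(p ^ secret_key)), 2.0] for p in plaintexts]
predictions = [[hamming_weight(p ^ k) for k in range(256)] for p in plaintexts]

r = correlation_matrix(predictions, traces)
best_key = max(range(256), key=lambda i: max(r[i]))
print(hex(best_key))  # 0x3c
```

The hypotheses are ranked here by their largest signed coefficient rather than by magnitude. With the XOR intermediate value assumed in this toy, the bitwise complement of the correct sub-key yields a coefficient of -1, so ranking by absolute value would not separate the two.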

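From: Design and Security Analysis of an Advanced Encryption Standard Core with Differential Power Analysis Attack Countermeasures (具差動功率攻擊防禦之先進加密標準核心設計與安全性分析), pp. 78-83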