Probabilistic properties of mutually independent AMACs

Chapter 3 Definitions

3.6 Probabilistic properties of mutually independent AMACs

If the symbols of an AMAC tag are mutually independent, the properties of the AMAC are determined by the probability that a single tag symbol changes. We denote this probability by PA; the tag distance δt then follows a binomial distribution with success probability PA.

PA can be written as a function of δm. By the distance-preserving property, PA increases strictly as δm increases, so two different values of δm produce two different binomial distributions.
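The binomial model can be illustrated with a short Python sketch. It is only an illustration under the independence assumption stated above; the function names, the tag length of 128 symbols and the two PA values are hypothetical and are not taken from the experiments.

```python
from math import comb

def tag_distance_pmf(num_symbols, p_a):
    """Return [P(delta_t = d) for d = 0..num_symbols] when each tag symbol
    changes independently with probability p_a (binomial model)."""
    return [comb(num_symbols, d) * p_a**d * (1 - p_a)**(num_symbols - d)
            for d in range(num_symbols + 1)]

def expected_distance(num_symbols, p_a):
    """Mean of the binomial distribution: the expected number of changed symbols."""
    return num_symbols * p_a

# Two hypothetical message distances give two values of PA and therefore
# two different binomial distributions of the tag distance.
pmf_small = tag_distance_pmf(128, 0.10)
pmf_large = tag_distance_pmf(128, 0.20)
```

A larger PA shifts the whole distribution of δt to the right, which is what makes a threshold on δt usable for verification.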

Chapter 4 Our AMAC Algorithm

Our AMAC is a probabilistic checksum calculated using a pseudo-random permutation, masking via a modulo sum operation, and the MODE function, such that a small difference between two messages tends to result in a small difference between their AMACs. N is the symbol size of the messages; for any message, N can easily be changed depending on the verifier.

For N=2, the N-ary AMAC reduces to the binary AMAC, where the modulo sum operator reduces to the XOR operation and the MODE function reduces to the MAJORITY function.

Let m be the input N-ary message of length l_m. The ith element of the message is denoted as m(i). We are given a secret key k generated by K and a pseudo-random number generator PRG. As with conventional MACs, the length L of the AMAC is typically chosen in the range 128 ≤ L ≤ 1024 bits. We compare different AMACs at the same length L; alternatively, we say one AMAC is better than another if they have the same properties while one has a shorter L.

First, the raw data m of length l is input to authentication tag generation, and random sampling reduces the size of m from l to L*H. After re-formatting, the matrix M is masked by a random matrix P generated with the key k. The masked matrix Z is then input to the tag function column by column, and the output after quantization is the final AMAC tag of L bits.

Figure 3. The flow chart of our AMAC scheme

4.1 Initialization

The verifier and the owner share the secret key k, which is input to the Pseudo-Random Generator (PRG) as a seed. The output of the PRG must be available to both the verifier and the owner. The PRG is used repeatedly as a source of N-ary pseudo-random numbers.

4.2 Feature extraction

A feature vector that represents the media content is extracted from the original message and hashed into a small digest. The digest is then signed by a standard digital signature algorithm. Since only the semantic information is extracted for authentication, incidental noise can be tolerated. Different features can be used to represent the content of an image, such as edge information, DCT coefficients, and color or intensity histograms; the histogram feature is used in our AMAC.

Our AMAC can detect only the number of errors, not perceptual data errors. If an attacker changes the data with a number of errors below the threshold, he will not be detected by our AMAC, since that number of errors is acceptable. The error position and error distance are not measured in the tag function; the histogram feature of the data only considers the number of errors. To strengthen our AMAC against attackers, it is helpful to preprocess the multimedia data to extract perceptual features. Many works extract different types of multimedia features. We assume that the extracted features are suitable for further histogram feature extraction, which means that the errors in the multimedia features are not correlated with location or distance. Thus, we can first apply feature extraction suited to the type of multimedia data and then apply our AMAC, so that the final AMAC can detect the attacker.

4.3 Random sampling

The accuracy of the AMAC decreases less than proportionally to the fraction of the message that takes part in its computation, so we use random sampling to reduce the computation.

We use m_old to denote the original message of length l, and L is the length of the tag; namely, the tag has L symbols. We sample L*H symbols from the original message by using the PRG. The other message symbols do not take part in the computation of the AMAC.

The PRG is used to form a sample table, and the elements of the message selected by the sample table form a new matrix. The verifier and the message sender use the shared key k for the PRG. The purpose of the pseudo-random sampling is not only to destroy any existing spatial correlation between neighboring elements but also to enhance security against attack.
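The sampling step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Python's `random.Random` seeded with the key stands in for the keyed PRG (it is not cryptographically secure), indices are drawn independently so a position may be selected more than once, and the names `sample_indices` and `sample_message` are hypothetical.

```python
import random

def sample_indices(key, message_length, L, H):
    """Build the sample table: L*H pseudo-random positions derived from the key."""
    rng = random.Random(key)  # assumption: stand-in for the keyed PRG
    return [rng.randrange(message_length) for _ in range(L * H)]

def sample_message(message, key, L, H):
    """Keep only the symbols selected by the sample table."""
    return [message[i] for i in sample_indices(key, len(message), L, H)]

# Sender and verifier share the key, so both obtain the same sample table.
message = [i % 16 for i in range(1000)]
sampled_by_sender = sample_message(message, key=2024, L=8, H=5)
sampled_by_verifier = sample_message(message, key=2024, L=8, H=5)
```

Because the table depends only on the key and the message length, nothing besides the shared key has to be exchanged for the verifier to repeat the sampling.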

4.4 Masking

The N-ary message of length L*H is denoted as m = (m(0), m(1), ..., m(LH-1)). The message is then re-formatted into a matrix, denoted as

\[
M = \begin{pmatrix}
m(0) & m(1) & \cdots & m(L-1) \\
m(L) & m(L+1) & \cdots & m(2L-1) \\
\vdots & \vdots & & \vdots \\
m(LH-L) & \cdots & \cdots & m(LH-1)
\end{pmatrix}
\]

Let P be the pseudo-random matrix of the same size as M generated from the PRG. The matrix M is then masked with P, element by element, by a modulo N operator. Denote the masked matrix as Z = (M + P) mod N, where zij = (mij + pij) mod N.

The modulo operation leads to variables zij that are independent of each other and unbiased whenever the samples {pij} are mutually independent and unbiased, which means that they obey a discrete uniform distribution on {0, 1, …, N-1}.
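The masking step can be sketched in Python as follows. The sketch assumes that `random.Random` stands in for the keyed PRG and that the matrix is stored row by row; the names `make_pad`, `mask` and `reshape` are hypothetical.

```python
import random

def make_pad(key, length, N):
    """Pseudo-random N-ary pad, one symbol per message symbol."""
    rng = random.Random(key)  # assumption: stand-in for the keyed PRG
    return [rng.randrange(N) for _ in range(length)]

def mask(symbols, pad, N):
    """Element-wise modulo-N sum of the message symbols and the pad."""
    return [(m + p) % N for m, p in zip(symbols, pad)]

def reshape(symbols, L):
    """Row-major re-formatting into rows of L symbols."""
    return [symbols[r:r + L] for r in range(0, len(symbols), L)]

# For N = 2 the modulo sum is the XOR operation.
binary_masked = mask([1, 0, 1, 1], [1, 1, 0, 1], N=2)
matrix = reshape(list(range(6)), L=3)
```

Since the pad is uniform on {0, 1, …, N-1}, each masked symbol is also uniform regardless of the message symbol it hides.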

4.5 Feature extraction: Tag function

After random sampling and masking, the tag t of length L symbols is computed from the masked matrix, which we again write as M: t = Tag(M). Because of the random sampling, we can simply divide M into columns to compute each tag symbol without a further permutation. Each tag symbol is computed by ti = Tag(Mi), where Mi = {m(i), m(L+i), …, m(L(H-1)+i)}; thus the tag symbols are mutually independent.

MODE

The MODE is defined as the most common value in a set. If a tie occurs, the MODE operation breaks the tie by comparing the adjacent values.

Example:

MODE(0, 1, 1, 1)=1, MODE(1, 3, 0, 3, 2)=3

MODE2

The MODE2 is defined as the appearance frequency of the most common value in a set. If a tie occurs, the MODE operation breaks the tie by comparing the adjacent values.

Example:

MODE2(0, 1, 1, 1)=3, MODE2(1, 3, 0, 3, 2)=2
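The two tag functions can be sketched in Python as follows. The tie-breaking rule of MODE is not spelled out in enough detail to reproduce here, so this sketch assumes a simple stand-in (the smallest of the tied values wins); that rule and the function names `mode` and `mode2` are assumptions. MODE2 returns a frequency, so ties do not affect its output.

```python
from collections import Counter

def mode(values):
    """Most common value; ties go to the smallest value (assumed rule)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

def mode2(values):
    """Appearance frequency of the most common value."""
    return max(Counter(values).values())

# The examples given in the text.
examples = (mode([0, 1, 1, 1]), mode([1, 3, 0, 3, 2]),
            mode2([0, 1, 1, 1]), mode2([1, 3, 0, 3, 2]))
```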

4.6 Feature reduction: Quantization

Example:

tag t=(0, 1, 2, 3), Quantization(t, 2)=(0, 1, 0, 1)

mi=(0, 1, 2, 3, 3), Tag(mi) = ti = 3

After transmission, mi becomes mi’=(0, 1, 2, 3, 1) and Tag(mi’) = ti’ = 1, where ti ≠ ti’.

After quantization, ti = Quantization(ti,2) = 1 and ti’ = Quantization(ti’,2) = 1, which is equal to ti. In such cases, the tag does not change if we quantize it.

Quantization benefits accuracy because the number of symbols in the tag increases, but it also decreases the probability that each tag symbol changes. The final AMAC accuracy is a tradeoff between these two factors.
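The quantization examples above are consistent with reducing each tag symbol modulo q. The following Python sketch makes that assumption explicit; the modulo rule is inferred from the examples only, and the name `quantize` is hypothetical.

```python
def quantize(tag, q):
    """Reduce every tag symbol modulo q (rule inferred from the examples)."""
    return [t % q for t in tag]

quantized = quantize([0, 1, 2, 3], 2)

# The symbols 3 and 1 from the transmission example collapse to the same value,
# so this particular tag change is no longer visible after quantization.
collapsed = quantize([3], 2) == quantize([1], 2)
```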

Figure 4. False alarm versus true positive for different quantization

From Figure 4 above, we can observe that under the same conditions but with different quantization parameters, more quantization gives better accuracy in these three cases.

(Figure 4 legend: MODE2 with s=0.06 and q=2, q=4, q=16.)

Sample rate: 3%

N, L        Pa[0.03]   Pa[0.04]   P[0.03]   P[0.04]   P[0.03]-P[0.04]   T
2, 128*8    0.234      0.284      0.986     0.037     0.949             265
4, 128*4    0.376      0.453      0.996     0.053     0.943             212
16, 128*2   0.518      0.591      0.91      0.133     0.776             143

Figure 5. Different symbol size with fixed tag length

In Figure 5, we use the same tag length in bits but different symbol sizes without quantization. Although a smaller N seems to perform better, comparing N=2 with N=4 shows that the AMAC with N=2 does not significantly dominate the AMAC with N=4. This means that the AMAC does not always benefit from quantizing its symbols more. A practical approach is to find the best quantization factor for the AMAC by experiment.

4.7 Verification

The resultant N-ary AMAC tag t, together with the initialization data, is sent along with the message m. The receiver compares the received AMAC t with the AMAC t’ constructed from the received message m’. The distance between two AMACs is measured by a distance function dt; we use the Hamming distance here. Over an N-ary alphabet, the Hamming distance between two vectors is the number of positions in which they differ. Although other distance functions such as the Euclidean distance could also be considered, the Hamming distance between two AMACs is effective in showing the differences between two messages.

The larger the distance between t and t’, the larger the difference between m and m’ is judged to be. We then compare the distance between t and t’, δt, with the thresholds c1 and c2:

if dt(t, t’) < c1, return 1

if c1 ≤ dt(t, t’) ≤ c2, don’t care

if dt(t, t’) > c2, return 0
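The three-way decision can be sketched in Python as follows. The function names are hypothetical, and returning `None` for the don't-care region is an assumption made for this sketch.

```python
def hamming_distance(t, t_prime):
    """Number of positions in which two equal-length tags differ."""
    if len(t) != len(t_prime):
        raise ValueError("tags must have the same length")
    return sum(a != b for a, b in zip(t, t_prime))

def decide(t, t_prime, c1, c2):
    """Return 1 (accept), 0 (reject) or None (don't care)."""
    distance = hamming_distance(t, t_prime)
    if distance < c1:
        return 1
    if distance > c2:
        return 0
    return None  # c1 <= distance <= c2

accepted = decide([0, 1, 2, 3], [0, 1, 2, 3], c1=1, c2=3)
undecided = decide([0, 1, 2, 3], [0, 1, 0, 1], c1=1, c2=3)
rejected = decide([0, 0, 0, 0], [1, 1, 1, 1], c1=1, c2=3)
```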


Figure 6. The scheme of AMAC verification

Tag Algorithm

input: the secret key k, data m, sample rate s, L, H

1 generate index set a, where ai = PRNG(k, i), i from 0 to LH - 1

2 m = (m(a0), m(a1), …, m(aLH-1))

3 generate r, where ri = PRNG(k, i + LH), i from 0 to LH - 1

4 m = (m + r) mod N

5 mi = (m(i), m(L+i), …, m((H-1)L+i)), i from 0 to L - 1

6 t = (MODE2(m0), MODE2(m1), …, MODE2(mL-1))

7 return t
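The steps of the tag algorithm can be put together in a short Python sketch. It is not the implementation used in the experiments: `random.Random` seeded with the key stands in for PRNG(k, i), the grouping of symbols follows Section 4.5 (symbol i, L+i, 2L+i, … form group i), the symbol size N is passed explicitly, and the name `amac_tag` is hypothetical.

```python
import random
from collections import Counter

def amac_tag(message, key, N, L, H):
    """Sample L*H symbols, mask them modulo N, and apply MODE2 per group."""
    rng = random.Random(key)  # assumption: stand-in for the keyed PRNG
    total = L * H
    # Steps 1-2: pseudo-random index set and the sampled message.
    indices = [rng.randrange(len(message)) for _ in range(total)]
    sampled = [message[a] for a in indices]
    # Steps 3-4: pseudo-random pad and modulo-N masking.
    pad = [rng.randrange(N) for _ in range(total)]
    masked = [(m + r) % N for m, r in zip(sampled, pad)]
    # Step 5: group i holds the symbols at positions i, L+i, ..., (H-1)L+i.
    groups = [masked[i::L] for i in range(L)]
    # Step 6: MODE2 is the frequency of the most common value in each group.
    return [max(Counter(group).values()) for group in groups]

message = [i % 16 for i in range(1000)]
tag = amac_tag(message, key=7, N=16, L=8, H=5)
```

Each tag symbol lies between 1 and H, since it counts how often the most common value occurs among the H symbols of its group.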

Verification Algorithm

input: the secret key k, modified data m’, tag t, thresholds c1, c2

1 t’ = Tag(m’, k)

2 δt = dt(t, t’)

3 if δt < c1, return 1

if c1 ≤ δt ≤ c2, don’t care

if δt > c2, return 0

Chapter 5 Experiment

5.1 Experiment environment

We use an 8-bit 800*600 image as our example, so the raw data can be seen as a gray-scale image, and the desired AMAC length is 128 to 1024 bits. We use a computer with an Intel Core i7 Q720 1.6GHz CPU and 4GB RAM, and code with Bloodshed Dev-C++ 4.9.9.2. The key is used as the seed for the pseudo-random number generator. In this chapter, we first discuss how to compare two different AMACs, then discuss several factors that affect our AMAC and compare it with the AMAC of [18]. The distance function we use for both data and tags is the Hamming distance, which counts the number of symbols that differ; the magnitude of the difference within each symbol is not considered in our AMAC.

5.2 Comparison of two different AMACs

To compare two different AMACs, we compare the probability that each AMAC outputs the correct authentication decision given the same input data; the keys of the AMACs are generated randomly.

We simulate the authentication many times with different keys and compute the probability that the AMAC makes the correct authentication decision. Errors are added to the image randomly, byte by byte; for example, if an error occurs at a pixel with original value 100, the pixel is changed to a random value in 0~255 other than 100. The number of errors added to the image is set right at the boundary of the acceptable or the unacceptable number of errors; we discuss this later.
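The error model can be sketched in Python as follows. The sketch assumes that errors fall on distinct, uniformly chosen pixels, which is one reading of the description above; the function name `add_errors` and its parameters are hypothetical.

```python
import random

def add_errors(pixels, num_errors, seed=None, levels=256):
    """Change num_errors distinct pixels to a different random value."""
    rng = random.Random(seed)
    noisy = list(pixels)
    for position in rng.sample(range(len(noisy)), num_errors):
        # Draw from the levels-1 other values so the pixel always changes.
        new_value = rng.randrange(levels - 1)
        if new_value >= noisy[position]:
            new_value += 1
        noisy[position] = new_value
    return noisy

original = [100] * 1000
corrupted = add_errors(original, num_errors=10, seed=1)
changed = sum(a != b for a, b in zip(original, corrupted))
```

Skipping over the original value guarantees that the Hamming distance between the original and the corrupted data equals the requested number of errors.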

5.2.1 The length of AMAC tag

It is not difficult to see that an AMAC with a longer tag has an advantage in distinguishable ability over one with a shorter tag. Suppose we compare two AMACs from the same AMAC family, one with a 128-bit tag and the other with a 256-bit tag, and the 256-bit tag is divided into two parts. The first 128 bits are the same as the 128-bit AMAC tag, and the latter 128 bits are additional information that is not contained in the 128-bit AMAC tag. In the worst case for the 256-bit AMAC, we simply drop the latter 128 bits, and the authentication result is the same as that of the 128-bit AMAC; thus the distinguishable ability of the 256-bit AMAC is equal to or better than that of the 128-bit AMAC. In addition, the longer the tag is, the higher the probability of a tag error, or the more effort and redundancy we need to protect the tag. Thus, to compare AMACs fairly, we compare them at the same tag length.

5.2.2 AMAC distinguishable ability measurement

We measure an AMAC by its distinguishable ability, which is defined as follows:

P1 = P [m’ passes AMAC verification | m’ is acceptable]

P2 = P [m’ passes AMAC verification | m’ is unacceptable]

Comparing two AMACs at the same level of P2, the AMAC with the higher P1 has the advantage.

Since we consider AMACs with mutually independent symbols, the properties of an AMAC are determined by the probability that one AMAC symbol changes, denoted PA. PA is a function of δm, denoted fPA(δm), and fPA(δm) should be a strictly increasing function of δm, because more errors in the message increase the probability that an AMAC symbol changes. From Figure 7, we can observe that PA is strictly increasing as the error ratio increases, where the error ratio is δm/|m|.

Figure 7. The probability that one AMAC symbol changes under different error ratio

Since the symbols of our AMAC are independent, the expected total number of tag symbol changes E(δt) can be calculated simply as L·PA. We define the accuracy of an AMAC as:

accuracy = P[true positive] - P[false alarm]

where P[true positive] = P[Pass | δm = c1] is the probability that data at the acceptable error level passes, and P[false alarm] = P[Pass | δm = c2] is the probability that data at the unacceptable error level passes.

This definition is simple, and it follows from the penalty described below:

penalty = a1 * P[Reject | δm < c1] + a2 * P[Pass | δm > c2]

where a1 and a2 are the penalty coefficients for wrongly rejecting acceptable data and for wrongly passing unacceptable data, respectively.

Since we do not know the true error probability and distribution of the environment, we cannot decide the coefficients a1 and a2. We assume a1 = a2, so the penalty becomes:

penalty = a1 * (P[Reject | δm < c1] + P[Pass | δm > c2])

Removing the constant factor a1 gives

penalty = P[Reject | δm < c1] + P[Pass | δm > c2]

Because P[Reject | δm < c1] = 1 - P[Pass | δm < c1], evaluating both terms at the boundary values δm = c1 and δm = c2 gives

1 - penalty = P[Pass | δm = c1] - P[Pass | δm = c2] = P[true positive] - P[false alarm]

Thus, 1 - penalty = accuracy.

A lower penalty is better; conversely, a higher accuracy is better.
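Under the binomial model of independent tag symbols, the accuracy can be evaluated for every threshold. The following Python sketch does this for the N=2 row of the 3% sample table (L = 128*8 = 1024 symbols, Pa[0.03] = 0.234, Pa[0.04] = 0.284). It assumes that a message passes when δt is strictly below the threshold; the function names are hypothetical, and the result is a model prediction, not a measured value.

```python
from math import exp, lgamma, log

def pass_probabilities(num_symbols, p_a):
    """cdf[c] = P(delta_t < c) for c = 0..num_symbols+1 under Binomial(num_symbols, p_a)."""
    cdf = [0.0]
    for d in range(num_symbols + 1):
        # Log-space binomial term avoids overflow for long tags.
        log_term = (lgamma(num_symbols + 1) - lgamma(d + 1)
                    - lgamma(num_symbols - d + 1)
                    + d * log(p_a) + (num_symbols - d) * log(1 - p_a))
        cdf.append(cdf[-1] + exp(log_term))
    return cdf

def accuracy_curve(num_symbols, pa_c1, pa_c2):
    """accuracy(c) = P(Pass | c1) - P(Pass | c2) for every threshold c."""
    pass_c1 = pass_probabilities(num_symbols, pa_c1)
    pass_c2 = pass_probabilities(num_symbols, pa_c2)
    return [a - b for a, b in zip(pass_c1, pass_c2)]

curve = accuracy_curve(1024, 0.234, 0.284)
best_threshold = max(range(len(curve)), key=curve.__getitem__)
best_accuracy = curve[best_threshold]
```

The best threshold and accuracy computed this way can be compared with the T and P[0.03]-P[0.04] columns of the table; differences are expected because the PA values are rounded and the model is idealized.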

5.3 Error estimation with tags

Figure 8. δm with different number of errors and threshold.

Figure 8 shows an example where the acceptable data error rate is 1% and the unacceptable data error rate is 2%. Data with 1% errors and data with 2% errors are each generated 5000 times, and the resulting δt is counted. Data with 2% errors generally have a higher value of δt, which is consistent with our expectation.

In Figure 8, the regions for 1% errors and 2% errors overlap. This means that no matter what threshold we use in this case, the probability that the threshold-based authentication decision is incorrect is not equal to 0.

Figure 9. δm with different number of errors and different thresholds

In Figure 9, consider two different thresholds, threshold 1 shown as the black line and threshold 2 as the green line. With threshold 1 = 34, the number of false alarms is 168 and the number of false positives is 200. With threshold 2 = 28, the number of false alarms is 1131 and the number of false positives is 13. If the penalty of a false positive is equal to that of a false alarm, it is suitable to choose threshold 1 = 34.

By estimating the penalty of each threshold, we can decide which threshold is most suitable for a given environment.
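The threshold choice can be written as a small Python sketch using the error counts quoted above. Counts are used in place of probabilities, which is equivalent when both error levels are simulated the same number of times; the function name `penalty` and the weights a1 and a2 as keyword arguments are assumptions of this sketch.

```python
def penalty(false_alarms, false_positives, a1=1.0, a2=1.0):
    """Weighted sum of the two kinds of wrong decisions."""
    return a1 * false_alarms + a2 * false_positives

# threshold -> (false alarms, false positives), as quoted in the text
candidates = {34: (168, 200), 28: (1131, 13)}

# Equal weights, as assumed in the text.
best_equal = min(candidates, key=lambda c: penalty(*candidates[c]))

# Hypothetical environment in which a false positive costs ten times more.
best_strict = min(candidates, key=lambda c: penalty(*candidates[c], a2=10.0))
```

With equal weights the penalties are 368 and 1144, so the threshold 34 is selected; the second call only illustrates that a different weighting can reverse the choice.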

5.4 The effect of different sample rate

Choosing an appropriate sample rate is key to our AMAC, so we compare the effect of different sample rates on the distance between tags. We can see from Figure 10 that a higher sample rate increases the probability PA for both MODE1 and MODE2. MODE2 has a higher value of E(δt), and its curve changes significantly when the sample rate changes from 0.03 to 0.06. We can conclude that MODE2 is more adjustable. Adjustability is important in a real environment: different types of data have different threshold requirements, and different verifiers may define different threshold pairs c1, c2, so a more adjustable AMAC can achieve better accuracy in many situations.

Figure 10. The effect of different sample rates to tag distance

Figure 11. The accuracy comparison of different size of N

Figure 11 shows the accuracy comparison for symbol sizes N=16 and N=256. Unlike the work [18], we find that a greater N does not necessarily give better accuracy, because if we increase N, both the number of symbols that take part in the tag function and the number of tag symbols decrease. The work [18] did not consider that the tag length should be fixed in bits. Through experiments, we found that N=16 has better accuracy in general, so we set N=16 in our experiments.

(Figure 11 plot: x-axis threshold, y-axis true positive - false alarm; series AMAC2 N=256 and AMAC2 N=16.)

5.5 Accuracy comparison of different tag function

H0: message is authentic

Figure 12. The accuracy comparison of different AMACs

We compare p1 at the same level of p2. From Figure 12, we can see that the AMAC with MODE2, at both sample rates 0.06 and 0.12, has an advantage over the AMAC with MODE at sample rate 0.12. Figure 12 also supports the idea that a higher sample rate does not ensure better accuracy: the AMAC with MODE2 at sample rate 0.06 dominates the AMAC with MODE2 at sample rate 0.12 at every level of p2.

Figure 13. The effect of threshold to accuracy

In Figure 13 above, the green line shows the difference P[0.01] - P[0.04], and the value is maximized when the threshold is equal to 11.

Figure 14. AMACs comparison for error rate c1=0.01, c2=0.03

(Figure 14 plot: x-axis threshold, y-axis accuracy; series N-ary AMAC and Our AMAC.)

5.6 Effect of different thresholds

Figure 15. Effect of different thresholds

In Figure 15 above, when the threshold is below 30, data with error rates of both 3% and 4% cannot pass the authentication, and when the threshold is greater than 90, data with error rates of both 3% and 4% pass the authentication with almost 100% probability. When the threshold is between 30 and 90, data with error rates of 3% and 4% pass the authentication with different probabilities, so the two error rates can be distinguished. We can observe that a threshold close to 55 gives the largest probability difference.

(Figure 15 plot title: AMAC, 1000 times, 800*600, 0.03; series labeled "error=0.03%" and "error=0.04%".)

5.7 Effect of quantization

Figure 16. The accuracy of different quantization

Quantizing the tag symbols into binary or another n-ary alphabet, n<N, can reduce the number of tag bits while not affecting the accuracy of the AMAC significantly. The reason is that if we quantize the tag symbols, a tag of the same length can contain more symbols. In Figure 16 above, we compare q = 2, q = 4 and q = 16 with the tag length fixed at 256 bits. For q = 2, there are 256 tag symbols; for q = 4, there are 128 tag symbols; and for q = 16, there are 64 tag symbols. We can observe that under the same conditions and a fixed tag length but different quantization parameters, more quantization gives better accuracy in this case.

(Figure 16 legend: MODE2 with s=0.06 and q=2, q=4, q=16.)

Figure 17. Our AMAC with different quantization

More symbols naturally improve the accuracy significantly. But if we fix the length of the tag, the only way to increase the number of symbols is to compress each symbol. The disadvantage is that the probability that each tag symbol changes decreases. From Figure 17, we can observe that there exist upper bounds for both q = 2 and q = 16. This is because when the error ratio is close to 1, the data is changed so extremely that it looks like new data, and so does the final tag. The new tag symbol is then effectively independent of the original one, so the two differ with probability at most 1 - 1/q; for q = 2 this bound is 1/2.

(Figure 17 plot: series our AMAC with q=2 and our AMAC with q=16.)

5.8 Accuracy under different condition

Figure 18. Accuracy of our AMAC

We test the accuracy of our AMAC with MODE2 for the acceptable error rate c1=0.01 and the unacceptable error rate c2=0.02, and compare the accuracy under different threshold parameters of the AMAC. Setting different thresholds gives several pairs (p1, p2); the second pair, (0.986, 0.013), shows that the AMAC can distinguish data with error rates c1 and c2 with about 98.5% accuracy.

Figure 19. Different error condition

From Figure 19, we can observe that distinguishing the error parameters c1=0.01, c2=0.02 gives better accuracy than distinguishing the error parameters c1=0.03, c2=0.04. The reason is that for c1=0.01, c2=0.02, the number of errors at c2 is twice the number of errors at c1. But in the case of c1=0.03, c2=0.04, the number of errors at c2 is only 1.33 times the number of errors at c1, so we can expect the average distance of tags is about
