Pseudocode of ReduceG function in modified scheme

        Poly′ ← ReducedRowEchelon(Poly)

        g ← g′ ∈ Poly′, HT(g′) = HT(g)

    end

end

Output: G
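The two visible steps of the pseudocode (row-reduce the set, then replace each g by the reduced element with the same head term) can be illustrated with a small sketch. This is not the thesis implementation: it assumes dense coefficient rows whose columns are ordered from the largest monomial to the smallest, it works over the prime field GF(7) rather than GF(16) to keep the arithmetic short, and the names rref_mod_p, head_column, and reduce_g are hypothetical. It also assumes the head terms in G are distinct, and it needs Python 3.8 or later for the modular inverse.

```python
def rref_mod_p(rows, p):
    """Reduced row echelon form of `rows` over GF(p); zero rows are dropped."""
    mat = [[x % p for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(mat):
            break
        # Find a row at or below pivot_row with a non-zero entry in this column.
        src = next((r for r in range(pivot_row, len(mat)) if mat[r][col]), None)
        if src is None:
            continue
        mat[pivot_row], mat[src] = mat[src], mat[pivot_row]
        inv = pow(mat[pivot_row][col], -1, p)  # modular inverse (Python 3.8+)
        mat[pivot_row] = [(x * inv) % p for x in mat[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(mat)):
            if r != pivot_row and mat[r][col]:
                f = mat[r][col]
                mat[r] = [(a - f * b) % p for a, b in zip(mat[r], mat[pivot_row])]
        pivot_row += 1
    return [row for row in mat if any(row)]


def head_column(row):
    """Index of the first non-zero coefficient, i.e. the head term's column."""
    return next(i for i, x in enumerate(row) if x)


def reduce_g(rows, p):
    """Replace each row g by the reduced row g' with the same head column."""
    by_head = {head_column(r): r for r in rref_mod_p(rows, p)}
    return [by_head[head_column(g)] for g in rows]


# Columns (assumed order): x^2, xy, y^2, x, y, 1
G = [
    [1, 3, 0, 0, 1, 0],  # x^2 + 3xy + y
    [0, 1, 2, 0, 0, 1],  # xy + 2y^2 + 1
    [0, 0, 1, 4, 0, 0],  # y^2 + 4x
]
reduced = reduce_g(G, 7)
```

In this example every reduced row keeps the head column of the row it replaces, which is the property the last assignment in the pseudocode relies on.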

Chapter 4 Experiment Results

In the previous chapter, we presented our modification to the F4 algorithm, which decreases the minimum memory requirement. We also added a parameter, size, to control the total memory consumption of the modified F4 algorithm so that the program can run under different memory budgets. The modified F4 scheme is also more efficient than the original F4 given the same amount of memory.

Under limited memory, most of the information cannot be stored, and the calculation cannot be done in one pass. Therefore, the modified scheme needs more computation, for example, more transformations between matrix and polynomial forms. This overhead makes the modified scheme slower than the original F4 when there is enough memory. However, we will show in this chapter that the modified scheme has a better time-memory product than the original F4 algorithm. This means that we can spend a little more time and save a lot of memory. Furthermore, users can trade time for memory to various degrees, which makes the modified scheme flexible.

In this chapter, we show our experimental results and compare them with those of the original F₄ algorithm.

First, we implement both F₄ and the modified scheme in C++. The two programs use the same data structures and lower-level field operations. The polynomial ring of the input data is GF(16)[X] with X = {x₁, x₂, ..., xₙ}.

Following the convention of multivariate cryptography, we let n be the number of variables and m be the number of input polynomials. The experiments are performed with the following three parameter settings to see how the modified scheme works on ordinary systems as well as overdetermined systems:

1. m = n + 2

2. m = 1.5n

3. m = 2n

Section 4.1 shows the experiment result of the input parameter m = n+2 with n ranging from 6 to 13. Section 4.2 shows the experiment result of the input parameter m = 1.5n with n ranging from 8 to 15. Section 4.3 shows the experiment result of the input parameter m = 2n with n ranging from 10 to 18. The last Section 4.4 presents a generalized analysis of the three different input parameters.

The experiments shown in this chapter are performed on a computer with two Intel Xeon E5620 CPUs running at 2.40 GHz. Each processor has 128 KB of L1 cache, 1 MB of L2 cache, and 8 MB of L3 cache. The main memory is 24 GB running at 1333 MHz.

Each data point here represents the average of 20 runs.

4.1 m = n + 2

We examine the case m = n + 2 in this section. We compare the running time of the F₄ algorithm with that of our modified scheme for n = 6 to n = 13.

Figure 4.1: Memory curve for m = n+2 (unit:int)

First, we see from Figure 4.1 that the space complexity of the F₄ algorithm is indeed exponential, noting that the y-axis is in logarithmic scale. Here we run the F₄ program first and record its memory usage. We then run the modified scheme with 1/3, 1/4, and 1/10 of that memory. We use the ratios 1/3 and 1/4 to see how the program performs under different memory budgets, and the ratio 1/10 to see how it performs when the memory budget is extremely low. Notice that for n = 6 we use 1/5 of the memory of F₄ instead of 1/10, because the memory usage of F₄ at n = 6 is so small that 1/10 of it is not enough to run the modified program.

From Table 4.1 and Figure 4.2, we see how the programs perform for different n and different amounts of memory. In Figure 4.2, we notice that the time complexity of the F₄ algorithm is again exponential, as the y-axis is in logarithmic scale, and so is that of the modified scheme. The two programs have the same time complexity.

Table 4.1: Running Time for m = n+2 (unit: s)

n         6      7      8      9      10      11       12        13
F₄        0.01   0.10   0.57   3.29   16.14   76.24    868.29    4567.00
size/3    0.05   0.14   0.62   3.38   23.43   114.76   1041.52   4956.00
size/4    0.01   0.10   0.57   3.19   25.90   119.10   1031.48   5140.00
size/10   0.05   0.25   1.00   5.10   37.67   183.95   1334.67   7578.50

Figure 4.2: Time curve for m = n+2 (unit: s)

In Table 4.1, we also see that the running times with 1/3 memory and 1/4 memory do not differ very much, as their memory usages are also similar. However, when the memory is extremely low, the running time becomes noticeably longer: it takes more than 2 times the running time of F₄ when n is small, and about 1.5 times when n is larger.
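The slowdown factors discussed here can be recomputed directly from Table 4.1. The following sketch only divides the tabulated running times by the F₄ running time for the same n; the names N_VALUES, TIMES, and slowdown are illustrative and not part of the thesis code.

```python
# Running times in seconds, copied from Table 4.1 (m = n + 2).
N_VALUES = [6, 7, 8, 9, 10, 11, 12, 13]
TIMES = {
    "F4":      [0.01, 0.10, 0.57, 3.29, 16.14, 76.24, 868.29, 4567.00],
    "size/3":  [0.05, 0.14, 0.62, 3.38, 23.43, 114.76, 1041.52, 4956.00],
    "size/4":  [0.01, 0.10, 0.57, 3.19, 25.90, 119.10, 1031.48, 5140.00],
    "size/10": [0.05, 0.25, 1.00, 5.10, 37.67, 183.95, 1334.67, 7578.50],
}


def slowdown(label):
    """Running time of one configuration divided by that of F4, for each n."""
    return {
        n: t / base
        for n, t, base in zip(N_VALUES, TIMES[label], TIMES["F4"])
    }


for label in ("size/3", "size/4", "size/10"):
    print(label, {n: round(r, 2) for n, r in slowdown(label).items()})
```

Because the timings for the smallest n are only a few hundredths of a second, the ratios at n = 6 and n = 7 are sensitive to measurement granularity and are best read as rough indications.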

In Figure 4.3, we use the time-memory product as the metric to see how the programs trade off between time and memory. We calculate the time-memory product at every point and compare it with that of the F4 algorithm. At n = 6, since the running time is small (less than one second), the ratio is large and we get an unusual value. In most cases, if we use 1/3 memory, we need to run about 1.2 times longer. For 1/4 memory, we also need around 1.2 times the running time, which means that the running time stays almost the same if we do not change the memory usage much. However, for 1/10 memory, it takes about 2 times longer.

Figure 4.3: Time * Memory curve for m = n+2
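As a worked illustration of the metric, the time-memory product ratio can be written as (t_modified / t_F4) × (memory fraction). The sketch below assumes that the modified scheme uses exactly the budgeted fraction of the F₄ memory, which is a simplification; the function name tm_product_ratio is hypothetical.

```python
def tm_product_ratio(t_mod, t_f4, mem_fraction):
    """Time-memory product of the modified scheme relative to F4.

    Assumes the modified scheme uses exactly `mem_fraction` of the
    memory that F4 used on the same input.
    """
    return (t_mod / t_f4) * mem_fraction


# Rules of thumb stated in the text: about 1.2x time at 1/3 and 1/4 memory,
# about 2x time at 1/10 memory.
rule_third = tm_product_ratio(1.2, 1.0, 1 / 3)    # about 0.4
rule_quarter = tm_product_ratio(1.2, 1.0, 1 / 4)  # about 0.3
rule_tenth = tm_product_ratio(2.0, 1.0, 1 / 10)   # about 0.2

# The same ratio from the n = 13 column of Table 4.1 (times in seconds).
n13_tenth = tm_product_ratio(7578.50, 4567.00, 1 / 10)
```

A ratio below 1 means that the modified scheme has a smaller time-memory product than F₄ on that input.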

We also see from Figure 4.3 that there is a small peak between n = 9 and n = 12. This is because we do not have good control over the size of G, which affects the free memory we can use to check SPolynomials. However, the peak is not pronounced, since the memory usage of F4 in the case m = n + 2 is large and grows very fast. When n is larger, the ratio decreases again.

Notice that the metric function can vary. If the memory is more important to the user, then it can be changed to, e.g., time · memory² or time · memory³.
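A minimal sketch of such a weighted metric follows. It assumes, as a simplification, that the modified scheme uses exactly the budgeted memory fraction; the name weighted_ratio and the parameter mem_exponent are illustrative only.

```python
def weighted_ratio(time_ratio, mem_fraction, mem_exponent=1):
    """Ratio of time * memory**k between the modified scheme and F4.

    time_ratio   : running time of the modified scheme divided by that of F4
    mem_fraction : memory of the modified scheme divided by that of F4
    mem_exponent : k, the weight given to memory in the metric
    """
    if mem_exponent < 0:
        raise ValueError("mem_exponent must be non-negative")
    return time_ratio * mem_fraction ** mem_exponent


# About 2x time at 1/10 memory versus about 1.2x time at 1/3 memory.
for k in (1, 2, 3):
    tenth = weighted_ratio(2.0, 1 / 10, k)
    third = weighted_ratio(1.2, 1 / 3, k)
    print(f"k={k}: 1/10 memory -> {tenth:.4f}, 1/3 memory -> {third:.4f}")
```

With a larger exponent the smaller memory budget is favored more strongly, which matches the intent of giving memory more weight in the metric.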

4.2 m = 1.5n

We discuss the case m = 1.5n in this section. We compare the running time of the F4 algorithm from n = 8 to n = 15 with that of the modified scheme using 1/3, 1/4, and 1/10 memory.

Table 4.2: Running Time for m = 1.5n (unit:s)

n         8      9      10     11      12      13       14       15
F₄        0.14   0.86   1.95   8.81    21.95   92.57    211.33   2805.24
size/3    0.14   1.05   2.43   11.71   31.38   123.24   395.33   3010.90
size/4    0.10   1.00   2.52   11.71   32.62   182.43   647.62   3392.95
size/10   0.19   0.86   4.05   19.48   60.14   358.62   976.48   4703.29

Figure 4.4: Time curve for m = 1.5n (unit:s)

From Table 4.2 and Figure 4.4, we see the curves of F₄ and the modified scheme in overdetermined systems. Obviously, the time complexity is exponential, and the exponents are similar in both schemes.

In Table 4.2, we also see that the running times of the 1/3-memory and 1/4-memory cases do not differ very much except when n = 14. When the memory usage is 1/10, the slowdown is larger than in the case m = n + 2: when n is small, it takes 2 to 3 times longer, and when n is larger, it takes 1.5 to 2 times longer.

Notice that when n = 14, the ratio of running time is larger. This is because in overdetermined systems it takes more memory to record G and P, and thus the memory left for storing the matrix decreases. However, when n becomes larger, the ratio decreases again.
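The position of this peak can be checked against Table 4.2 by locating, for each memory budget, the n with the largest running-time ratio. The sketch below uses only the tabulated values; the names N_VALUES, F4, MODIFIED, and peak_slowdown are illustrative.

```python
# Running times in seconds, copied from Table 4.2 (m = 1.5n).
N_VALUES = [8, 9, 10, 11, 12, 13, 14, 15]
F4 = [0.14, 0.86, 1.95, 8.81, 21.95, 92.57, 211.33, 2805.24]
MODIFIED = {
    "size/3":  [0.14, 1.05, 2.43, 11.71, 31.38, 123.24, 395.33, 3010.90],
    "size/4":  [0.10, 1.00, 2.52, 11.71, 32.62, 182.43, 647.62, 3392.95],
    "size/10": [0.19, 0.86, 4.05, 19.48, 60.14, 358.62, 976.48, 4703.29],
}


def peak_slowdown(times):
    """Return (n, ratio) where the running-time ratio to F4 is largest."""
    ratios = [t / base for t, base in zip(times, F4)]
    worst = max(range(len(ratios)), key=ratios.__getitem__)
    return N_VALUES[worst], ratios[worst]


peaks = {label: peak_slowdown(times) for label, times in MODIFIED.items()}
for label, (n, ratio) in peaks.items():
    print(f"{label}: largest slowdown {ratio:.2f}x at n = {n}")
```

For all three budgets the largest ratio in the table occurs at n = 14, and it drops again at n = 15.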

Figure 4.5: Time * Memory curve for m = 1.5n

Figure 4.5 shows the time-memory product ratio. We see that there is a larger peak at n = 14, as the ratio of running time is larger there, and the ratio then decreases again. The ratio is around 0.4 for the 1/3-memory case, around 0.3 for the 1/4-memory case, and around 0.2 for the 1/10-memory case. This is the same as for m = n + 2. That is, in overdetermined systems with m = 1.5n, using 1/10 memory also takes about 2 times longer.

4.3 m = 2n

We discuss the case m = 2n in this section. These are also overdetermined systems, and the input data contains more information than in the cases m = n + 2 and m = 1.5n. We compare the running time of the F₄ algorithm from n = 10 to n = 18 with that of the modified scheme using 1/3, 1/4, and 1/8 memory. We do not use 1/10 memory because it is simply too small: it cannot even record all the information of G and P.

Table 4.3: Running Time for m = 2n (unit:s)

n        10     11     12     13      14      15       16       17        18
F₄       0.57   1.38   3.57   7.67    18.10   42.05    163.00   406.10    1707.86
size/3   0.57   1.33   3.43   7.24    21.81   75.10    415.95   965.43    1906.29
size/4   0.52   1.24   3.10   7.00    40.48   108.76   459.00   943.05    1919.38
size/8   0.43   1.19   5.19   18.67   69.86   223.67   569.95   1059.05   3120.95

Figure 4.6: Time curve for m = 2n (unit:s)

From Table 4.3 and Figure 4.6, we see the curves of F₄ and the modified scheme in overdetermined systems with m = 2n. The time complexity is exponential, and the exponents are also similar.

There is also a peak in Figure 4.7, and this peak is larger than those for m = n + 2 and m = 1.5n. However, when n becomes larger, the ratio decreases again. The ratio is around 0.3 to 0.4 for 1/3 memory, around 0.2 to 0.3 for 1/4 memory, and around 0.1 to 0.2 for 1/8 memory.

Figure 4.7: Time * Memory curve for m = 2n

4.4 Analysis

Figure 4.8: Memory curve for each case (unit:int)

Figure 4.8 shows the total memory usage of the F₄ algorithm in the cases m = n + 2, m = 1.5n, and m = 2n. In each of these cases, the space complexity of the F₄ algorithm is exponential. However, when there is more information in the input F, F₄ needs less memory. Solving overdetermined systems with m = 2n needs less memory and has a smaller exponent in the exponential time complexity.

Since F₄ uses less memory for m = 1.5n and m = 2n, there is less memory pressure and less room to reduce the memory. The modified scheme does not work as well as it does for m = n + 2, since there are obvious peaks. Nevertheless, it still has a smaller time-memory product than the F₄ algorithm.

Figure 4.9: Time * Memory curve for each case for 1/10 and 1/8 memory

In our implementation, we focus on the matrix size and do not control the size of G as well as we control the size of the matrix. We do, however, have an efficient way to control the matrix size.

Chapter 5 Conclusion

We mitigate the problem of memory usage explosion in the F4 algorithm in this thesis. In Chapter 3, we showed that by (1) modifying the SelectPair function, (2) modifying the Reduction step of the algorithm, and (3) adding the ReduceG function to the algorithm, the modified scheme can run under a memory limit as long as the limit is greater than the least necessary memory (the size of G plus the size of P). Moreover, the running time of the modified scheme remains acceptable, as supported by the experimental results in Chapter 4.
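The memory condition stated above can be expressed as a small check. This is only a sketch of the bookkeeping, not the thesis implementation: the function name matrix_budget and the example numbers are hypothetical, and all sizes are assumed to be counted in the same unit (for example, ints, as in the memory figures).

```python
def matrix_budget(total_budget, size_g, size_p):
    """Memory left for the matrix once G and P are stored.

    The scheme can run only if the budget is strictly greater than the
    least necessary memory, size(G) + size(P).
    """
    required = size_g + size_p
    if total_budget <= required:
        raise ValueError(
            f"budget {total_budget} does not exceed size(G) + size(P) = {required}"
        )
    return total_budget - required


# Hypothetical numbers: a budget of 1000 ints with G and P taking 300 and 200.
free_for_matrix = matrix_budget(1000, 300, 200)
```

As G and P grow, the value returned here shrinks, which is one way to read the running-time peaks observed in Chapter 4.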

The modified scheme in this thesis is a practical way to add memory consumption control to the F4 algorithm. The modified scheme can now fit within most memory limits. Users can make a time-memory trade-off based on their demands and the hardware they have.

In the modified scheme, fewer SPolynomials are checked in one Reduction loop, so the number of check steps and of transformations between matrix and polynomial forms increases. Due to this overhead, the modified scheme is not as fast as the original F4 algorithm when there is enough memory. However, when memory is not sufficient, the modified scheme works better than the original F4 algorithm. The running time of the original F4 algorithm is too long to be practical under limited memory. On the other hand, the modified F4 scheme can be executed with 1/10 of the memory at only about 2 times the running time. This achieves our goal in this thesis: to let F4 run on different hardware and still be efficient.

Our scheme in this thesis is not perfect. Indeed, the scheme shown here is only a concept of the basic structure. There is room for improvement to make the program run faster. Users can optimize many parts of the program to improve its efficiency. For example, users may optimize the transformation between polynomials and matrices, since this operation is executed many times.

Users may also want to optimize the ReduceG function, which is not in the original F4 algorithm. Indeed, we do not provide an optimized operation for reducing G in this thesis. Users can change every step in ReduceG as long as G is minimized and the head terms in G do not change. This is an important step of the modified scheme. If it can be optimized, the efficiency of the modified scheme would improve greatly.

In the Reduction step of the modified F4 scheme, the size of the matrix becomes smaller and smaller. Although we do not use this free space in the scheme in this thesis, users can try to use this extra memory to enhance the performance, for example by checking the divisibility of the monomials next to the head terms, which have the highest probability of becoming head terms of new remainders. Efficient use of this memory would help the performance considerably.

There are also some applications of the modified scheme that can be explored in the future. Having the advantage of lower memory consumption, the modified scheme can be ported to computation systems with limited memory, such as FPGAs. We can even port the program to a parallel or distributed system that has very few resources. For example, we can select some pairs and assign the Reduction step to a node, which needs only a small amount of computation resources and passes the set of new remainders R back to the main program.

