
1.2.1 Codes for RAID

The current mainstream RAID (Redundant Array of Independent Disks, a technology for massive data storage) cannot always recover lost data from an equal amount of redundant data once the disk array grows large, a limitation that stems from its erasure correcting code (ECC). For example, with the current ECC and 3 checksum disks, we can recover the data of any 3 lost data disks in a disk array of 255 disks in total. With 4 checksum disks, however, some combinations of 4 lost data disks cannot be recovered from the 4 checksum disks, even in a disk array of only 28 disks in total.

Therefore, building on the current ECC, we present an erasure correcting code named RAIDq that partly fixes this issue. RAIDq can recover the data of any 4 lost data disks from 4 checksums in a disk array of 96 disks in total. It is designed to be backward compatible with the current RAID in both algorithm and redundant data, and this compatibility restricts its data recovery capability. More importantly, RAIDq emphasizes high performance, which is arguably one of the main strengths of current RAID technology. Its performance comes from minimizing the arithmetic operations needed to generate checksums and from exploiting high-performance instructions on modern CPUs.

Implementation with SIMD instructions is a basic technique for high-performance software, especially for arithmetic in binary fields. Through this application, we show how SIMD arithmetic facilitates the implementation of erasure correcting codes for RAID.
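As a rough illustration of generating checksums with few arithmetic operations, the following sketch computes two checksums in the style of conventional RAID-6. It is not the RAIDq code: the field GF(2^8) with reduction polynomial 0x11d, the generator 2, and the function names `gf256_mul2`, `gf256_mul`, and `pq_checksums` are all assumptions made for this example. The point is that Horner's rule lets the second checksum be built from multiplications by 2 only, which are cheap shifts and XORs.

```python
def gf256_mul2(a):
    """Multiply a GF(2^8) element by 2 (i.e. by x), reducing by the assumed polynomial 0x11d."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11D
    return a


def gf256_mul(a, b):
    """General GF(2^8) multiplication by shift-and-add (reference only, not optimized)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = gf256_mul2(a)
        b >>= 1
    return r


def pq_checksums(data):
    """Return (P, Q) for one byte per data disk.

    P is the XOR of all data bytes; Q is the sum of 2^i * data[i].
    Horner's rule evaluates Q with one multiplication by 2 per disk.
    """
    p = 0
    q = 0
    for d in reversed(data):
        p ^= d
        q = gf256_mul2(q) ^ d
    return p, q
```

With a single lost data byte, XORing P with the surviving bytes restores it; recovering more losses requires solving a small linear system over the field, which this sketch does not cover.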

This chapter on codes is based on joint work with Bo-Yin Yang and Chen-Mou Cheng published in [CYC13].

1.2.2 MPKCs

Multivariate Public Key Cryptosystems (MPKCs) are often touted as future-proof cryptosystems against quantum computers [DY08] and are usually advertised for their high-performance implementations [CCC+09]. In this chapter, we review Rainbow/TTS [DS05, DYC+08], a derivative of MPKC signatures, and its implementations. With the powerful instruction sets of modern CPUs, we show techniques for high-performance implementation of the central components of MPKCs, including evaluating multivariate quadratic polynomials and solving linear equations with Gaussian elimination.

In practice, a security system can be broken because of its implementation instead of its cryptography. A famous example is that some AES implementations were attacked through leaked side-channel information [BM06]. Side-channel resilience is therefore an essential requirement for cryptographic software. In other words, memory access patterns should be independent of secret data, and the program should run in constant time when processing secret data. Hence, arithmetic implementations designed for general-purpose multiplication may not be suitable in the cryptographic world.

Through the application of MPKCs, we show implementations of field arithmetic that meet these cryptographic requirements. We implement constant-time field multiplications with SIMD instructions for high-performance software and demonstrate different strategies for constant-time implementation of the various components of MPKCs.
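To make the two requirements concrete (no secret-dependent memory access, no secret-dependent branching), here is a minimal sketch of a multiplication in GF(2^8) that uses neither lookup tables nor data-dependent branches. The field representation (the AES polynomial, low byte 0x1b) and the name `gf256_mul_ct` are assumptions for this example, not the implementation described in the text. Python itself offers no timing guarantees, so the sketch only shows the structure that a C or SIMD implementation would follow.

```python
def gf256_mul_ct(a, b):
    """Multiply two GF(2^8) elements with a fixed sequence of operations.

    The loop always runs 8 times and selects values with bit masks
    instead of `if` statements or table lookups indexed by the inputs.
    """
    r = 0
    for _ in range(8):
        mask = -(b & 1) & 0xFF      # 0xff if the low bit of b is set, else 0x00
        r ^= a & mask               # conditionally accumulate a
        hi = -(a >> 7) & 0xFF       # 0xff if doubling a overflows 8 bits
        a = ((a << 1) & 0xFF) ^ (0x1B & hi)  # double a and reduce
        b >>= 1
    return r
```

A table-based multiplication (for example, via logarithm and exponent tables) is often faster in general-purpose code, but its memory addresses depend on the operands, which is exactly what the requirement above rules out for secret data.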

This chapter is based on joint work with Wen-Ding Li, Bo-Yuan Peng, Bo-Yin Yang, and Chen-Mou Cheng published in [CLP+18].

1.2.3 The Additive FFT and its Implementations

A fast Fourier transform (FFT) is an algorithm that evaluates a polynomial at a particular set of points; it is a handy tool in many areas of computer science. The additive FFT evaluates polynomials at the points of an additive subgroup instead of the multiplicative subgroups used in usual FFTs. It was developed by Cantor [Can89], Gao and Mateer [GM10], Lin, Chung, and Han [LCH14], and Lin, Al-Naffouri, and Han [LANH16]. We discuss its variant for binary fields, specifically with respect to the Cantor basis. In this variant, the algorithm is divided into two main subroutines: a basis conversion (for polynomials) and a butterfly network.
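The following sketch shows what an additive transform computes, not how the fast algorithm works. It evaluates a polynomial over GF(2^8) at every point of the additive subgroup {0, 1, ..., 2^k - 1} (closed under XOR) by plain Horner evaluation, and it shows the smallest butterfly: a degree-1 polynomial f0 + f1·x evaluated at a and a + 1 with a single multiplication. The field polynomial 0x11b, the choice of subgroup, and all function names are assumptions for this example; in particular, no Cantor basis or basis conversion is involved.

```python
def gf256_mul(a, b, poly=0x11B):
    """Shift-and-add multiplication in GF(2^8); `poly` is an assumed reduction polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def evaluate(coeffs, x):
    """Horner evaluation of sum(coeffs[i] * x**i) over GF(2^8)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf256_mul(acc, x) ^ c
    return acc


def naive_additive_transform(coeffs, k):
    """Evaluate at all 2^k points of the additive subgroup {0, ..., 2^k - 1}."""
    return [evaluate(coeffs, x) for x in range(1 << k)]


def butterfly(f0, f1, a):
    """Evaluate f0 + f1*x at a and at a + 1 (i.e. a ^ 1) using one multiplication."""
    u = f0 ^ gf256_mul(f1, a)   # f(a)
    return u, u ^ f1            # f(a + 1) = f(a) + f1 in characteristic 2
```

The naive transform costs one Horner evaluation per point; the fast algorithm instead arranges many such butterflies into a network so that intermediate results are shared across points.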

For the high-performance implementation of the additive FFT, we use techniques for multiplying by subfield elements to accelerate the field multiplications in the butterfly network. Although the basis conversion has higher asymptotic complexity, the butterfly network consumes most of the computation in practice and can be optimized with subfield multiplications. Besides the field multiplications, we also describe a fast calculation of the constants in the butterflies and a better memory access model for implementing the algorithm.

The material on the implementations is based on joint work with Chen-Mou Cheng, Po-Chun Kuo, Wen-Ding Li, and Bo-Yin Yang. A preprint version can be found in [CCK+17].

1.2.4 Multiplying Boolean Polynomials

The last application, in Chapter 7, presents a method for multiplying Boolean polynomials with the best-known performance. This fundamental problem in computer science has attracted much research, e.g., [BGTZ08, HvdHL16, CCK+17, vdHLL17], focusing on fast implementations on modern CPUs based on various FFTs. For practitioners, it was surprising that van der Hoeven et al. could still obtain a new multiplier in 2017 with a twofold improvement over their previous implementation [HvdHL16]. The new multiplier multiplies polynomials with a new Frobenius FFT [vdHL17] instead of the usual Kronecker substitution [GG13, Chap. 8].

In this chapter, we show how to combine the Frobenius FFT technique with the additive FFT to obtain a new algorithm for multiplying Boolean polynomials.

We also present the implementation of the new multiplier and compare its performance with previous software. In our implementation, we perform the field multiplications with the PCLMULQDQ instruction, a hardware instruction for multiplying 64-bit Boolean polynomials, since we target multiplications in 64-bit or 128-bit fields. The other techniques include the truncated additive FFT and performing the truncated FFT with linear transformations. Finally, we note that this application itself can serve as a critical component for implementing multiplication in even larger fields.
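For readers unfamiliar with carry-less multiplication, the sketch below is a bit-by-bit software model of the product of two Boolean polynomials of at most 64 coefficients each, packed into integers with bit i holding the coefficient of x^i. PCLMULQDQ produces this kind of product in hardware; the function name `clmul64` is hypothetical, and the loop is meant only to define the operation, not to suggest how a fast implementation proceeds.

```python
def clmul64(a, b):
    """Carry-less (XOR-based) product of two 64-bit Boolean polynomials.

    The result has at most 127 significant bits, since the degrees add
    and no carries propagate between bit positions.
    """
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must fit in 64 bits")
    r = 0
    i = 0
    while b >> i:
        if (b >> i) & 1:
            r ^= a << i     # add (XOR) a shifted copy of a for each set bit of b
        i += 1
    return r
```

For instance, `clmul64(0b11, 0b11)` models (x + 1)^2 = x^2 + 1 over GF(2), where the middle term cancels because 1 + 1 = 0. A field multiplication in a 64-bit or 128-bit binary field would follow such products with a reduction modulo the field polynomial, which is not shown here.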

This chapter is based on the joint work with Wen-Ding Li, Po-Chun Kuo, Chen-Mou Cheng, and Bo-Yin Yang published in [LCK+18].
