Extended Genetic Algorithm for Codesign Optimization of DSP Systems in FPGAs


Matthew J W Savage, Zoran Salcic, George Coghill, Grant Covic

Department of Electrical and Computer Engineering, University of Auckland, New Zealand

Abstract

The multi-objective genetic algorithm is an effective solution to the complex problem of hardware software codesign. An extended genetic algorithm (EGA) has been developed that implements a novel selection method with function scaling, adaptive crossover and mutation. This EGA is applied in a codesign optimization stage for data-flow oriented applications and synthesis on Field-Programmable Gate Arrays (FPGAs). Its effectiveness is illustrated on the problem of codesign of a self-tuning regulator considering area and performance.

1. Introduction

Hardware software codesign problems consist of concurrent development of both software and hardware for a system utilizing tradeoffs in design parameters such as area, performance, and power.

This differs from more traditional approaches where hardware and software are designed separately and then integrated. The problem is multi-objective in that there are usually a number of objectives of interest to a designer. These could be the chip area used, performance (latency or throughput), and average or peak power. The objectives are often non-linear functions of design decisions, with complex dependencies among objectives. A simple example could be a device with a high throughput that takes a large amount of area and is power intensive, versus a device with a low throughput but requiring a smaller chip area and with moderate power requirements. In general, there is no design that is optimal for all objectives, but rather a selection of designs that trade off various objectives.

There has been a progressive development of algorithms being applied to the problem of codesign. One of the earliest systems, called SOS, considered the problem as a Mixed Integer Linear Program. This was slow and only practical for small problems [1]. Cosyma [2] and Vulcan [3] followed; both included heuristic methods as an approach to the problem. Vulcan started with an all-hardware solution and moved operations to software to reduce cost, while Cosyma started with an all-software solution and moved operations to hardware to meet performance requirements. Following this, simulated annealing was used as a non-deterministic solution to reduce the computational cost of codesign. Recently, genetic algorithms have been employed to solve the problem [4, 5]. Methods of extending the genetic algorithm are under development [6, 7]. Tan et al. [6] proposed some performance measures and test problems, and gave a comparison of some of the prior work on genetic algorithms. Valenzuela [7] developed a simplified selection scheme based on replacement, although the resulting scheme (SEAMO) was applied only to the multiple knapsack problem.

The decision described in this paper was to select a genetic algorithm for optimization of the objective functions, after consideration of the elements within the codesign problem space. The problem space is too large for an exhaustive search. A linear programming solution would be too slow for moderate to large problems. A deterministic method would have to be either very complex or highly domain specific. A heuristic method would have to avoid the many local optima and allow for complex interactions among variables. The problem is regarded as NP-complete and possibly intractable [5]. A method was required that could search a subset of the space in polynomial time with a good probability of success. This would require sampling the space, which was most similar to the genetic algorithm's population of individuals. Genetic algorithms have been applied effectively to this type of problem before [8].

A Simple Genetic Algorithm (SGA) usually takes weights to convert a multi-objective problem to a single objective problem, and requires crossover and mutation rates to be given. The intention was to produce a genetic algorithm that could adapt itself to the problem and remove some of the limitations of the SGA's implementation while at the same time producing an algorithm that was easy to apply.

The major contribution presented in this paper is the development of an EGA suitable for codesign of DSP systems in FPGAs. The principal features of this contribution are:

1. A simple selection scheme that incorporates multiple objectives of varying probability distributions without range problems.

2. A crossover scheme that is simple and yet preserves the relative frequency of loci values.

3. A mutation scheme that allows mutation to be adaptive and on an individual basis.

4. A demonstration of the effectiveness of the EGA on the non-trivial problem of a Self-Tuning Regulator.

0-7803-8652-3/04/$20.00 © 2004 IEEE. ICFPT 2004.

Section 2 defines the codesign problem and introduces the elements of the design flow, of which the EGA optimization is a part. Section 3 presents the extended genetic algorithm (EGA) and outlines how it is applied to the codesign problem. Results of optimization on an example of a Self-Tuning Regulator are presented in Section 4. Sections 5 and 6 contain the discussion, and the conclusions and future work.

2. The Codesign Problem

The overall design flow implemented here is presented in Figure 1. It consists of: specification, verification, optimization, and synthesis.

Specification entails capturing the functionality of the design independently of its implementation.

Verification is where this functionality is checked, in this case, by simulation. Optimization is where the EGA is applied to map a specification to an efficient implementation. Synthesis involves generating the VHDL files that implement the system.

For the first prototype only data-flow systems were considered. These systems are typical when implementing functions that use digital signal processing algorithms. For simplicity, these designs were initially constrained to contain only addition, subtraction, multiplication, division and modulus operations. The parameters for the design were whether to implement a component in hardware or software and how resource sharing would be employed. Resource sharing for this purpose means the reuse of components in a system to perform multiple operations. For instance, a multiplier may compute the product of A and B at clock cycle 1, and then be reused at clock cycle 5 to compute the product of C and D.

The software implementation of a component assumes the use of a simple processor. A DSP core was designed that was configurable. The core contained a minimal instruction set in order to reduce its size.

The work done by Maunder [9] in Synthbase provided a reference for the development of a codesign methodology. Extensions were made to allow processor cores and hierarchical development. This means that previously developed systems can be reused in a new system design by properly connecting their input and output ports to the surrounding system. Currently, as in Synthbase, scheduling is cyclostatic using an As-Soon-As-Possible (ASAP) strategy.
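As an illustration of ASAP scheduling, the following sketch assigns each operation the earliest cycle at which all of its operands are available. The operation names, the dependency dictionary, and the latencies are assumptions made for the example; they are not taken from Synthbase or from the tool described here, and resource sharing is ignored.

```python
def asap_schedule(deps, latency):
    """Return the earliest start cycle of every operation.

    deps maps an operation name to the operations it depends on;
    latency maps an operation name to its duration in cycles.
    Both structures are hypothetical stand-ins for a real data-flow graph,
    which is assumed to be acyclic.
    """
    start = {}

    def visit(op):
        if op not in start:
            # An operation starts once its slowest predecessor has finished.
            start[op] = max((visit(p) + latency[p] for p in deps[op]), default=0)
        return start[op]

    for op in deps:
        visit(op)
    return start


# A small graph for an expression of the form a*p + b*q + c*r.
deps = {
    "m1": [], "m2": [], "m3": [],   # the three multiplications
    "a1": ["m1", "m2"],             # first addition
    "a2": ["a1", "m3"],             # second addition produces the result
}
latency = {"m1": 2, "m2": 2, "m3": 2, "a1": 1, "a2": 1}
schedule = asap_schedule(deps, latency)
```

With these assumed latencies the multiplications all start at cycle 0, the first addition at cycle 2, and the second at cycle 3.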

Figure 1. Codesign Flow. (The optimization stage covers scheduling and partitioning.)

The basic components of a design are profiled in their use of area and performance. These profiles are based on a Stratix II device [10] implementation, although other devices could be used. Profiles include the number of basic logic elements used, the number of embedded multipliers used, and the latency.

When designing a new system, specifications are taken as lists of difference equations that are interpreted as being concurrent. For example:

$$y[n] = a \times y[n-1] + b \times x[n] + c \times x[n-1]$$

$$z[n] = d \times x[n] + e \times x[n-1] + f \times x[n-2]$$

$$e[n] = y[n] - z[n]$$

would mean a simple Proportional-Integral-Derivative (PID) controller y[n] and a simple Finite Impulse Response (FIR) filter z[n] would be calculated, and then the difference taken and assigned to e[n]. At present only integer data types are implemented.
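To make the concurrent interpretation concrete, the sketch below steps the three equations over a short integer input sequence. The coefficient values, the zero initial conditions, and the function name `simulate` are assumptions chosen for illustration. The output signal e[n] is stored in `err` to avoid a clash with the coefficient e.

```python
def simulate(x, a, b, c, d, e, f):
    """Evaluate y[n], z[n] and e[n] for an integer input sequence x.

    Samples before n = 0 are taken as zero (an assumption of this sketch).
    """
    y, z, err = [], [], []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0   # x[n-1]
        x2 = x[n - 2] if n >= 2 else 0   # x[n-2]
        y1 = y[n - 1] if n >= 1 else 0   # y[n-1]
        y.append(a * y1 + b * x[n] + c * x1)
        z.append(d * x[n] + e * x1 + f * x2)
        err.append(y[n] - z[n])          # e[n] = y[n] - z[n]
    return y, z, err


y, z, err = simulate([1, 2, 3], a=1, b=2, c=1, d=1, e=1, f=1)
```

All arithmetic stays in integers, matching the integer-only data types mentioned above.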

The inputs to the methodology are a specification, a list of constraints and priorities of objectives. The outputs of the methodology are a set of VHDL files that implement the specified system according to a tradeoff of the design objectives.

At present, only adaptive filters and DSP tasks can be easily represented.


3. Extended Genetic Algorithm

The Simple Genetic Algorithm [11] mimics natural evolution in refining and evolving solutions.

The algorithm typically consists of selecting individuals for reproduction, crossing individuals, and mutating individuals. Selection, crossing, and mutation were extended to make the Extended Genetic Algorithm. The extensions are general in that they can be used for a wider range of problems requiring trade-offs between multiple objectives.

The selection method in the Extended Genetic Algorithm (EGA) calculates fitness as:

$$F_j = \sum_{i=1}^{N} f_i(\bar{o}_i, o_{ij})$$

where $F_j$ is the fitness of individual $j$, $N$ is the number of objectives, $\bar{o}_i$ is the average value in the population for objective $i$, $o_{ij}$ is the value of objective $i$ for individual $j$, and $f_i(\bar{o}_i, o_{ij})$ is $\mathit{WeightMin}_i$ if $o_{ij} < \bar{o}_i$, $\mathit{WeightMax}_i$ if $o_{ij} > \bar{o}_i$, and $\mathit{WeightEqual}_i$ if $o_{ij} = \bar{o}_i$. Proportional selection is then used. An extension was written to support constraints, where a different set of weights was used if a constraint was violated.
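The selection step can be sketched as follows, assuming the fitness of an individual is the sum of its per-objective weight terms. The weight values, the tuple layout `(weight_min, weight_equal, weight_max)`, and the function names are hypothetical; constraint handling is omitted.

```python
import random


def fitness(objectives, averages, weights):
    """Sum the per-objective weights for one individual.

    weights[i] is a (weight_min, weight_equal, weight_max) tuple, used when
    objective i is below, equal to, or above the population average.
    Summing the terms is an assumption of this sketch.
    """
    total = 0.0
    for value, average, (w_min, w_equal, w_max) in zip(objectives, averages, weights):
        if value < average:
            total += w_min
        elif value > average:
            total += w_max
        else:
            total += w_equal
    return total


def select(population, objective_values, weights, count, rng=random):
    """Proportional (roulette-wheel) selection using the fitness above."""
    n_obj = len(weights)
    averages = [sum(o[i] for o in objective_values) / len(objective_values)
                for i in range(n_obj)]
    scores = [fitness(o, averages, weights) for o in objective_values]
    return rng.choices(population, weights=scores, k=count), scores


# Two objectives to minimise (area, latency): being below average earns more.
weights = [(2.0, 1.0, 0.5), (2.0, 1.0, 0.5)]
objective_values = [(100, 10), (200, 30), (300, 20)]
parents, scores = select(["p0", "p1", "p2"], objective_values, weights,
                         count=4, rng=random.Random(0))
```

Because each term depends only on how an objective compares with the population average, objectives measured on very different scales contribute on an equal footing, which is consistent with the claim that the scheme avoids range problems.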

In addition to the DNA required to represent an individual, two extra fields are added for the crossover and mutation numbers. For example:

F1 F2 F3 F4 F5 F6 F7 F8 M1 M2 M3 M4 C1

where the Fi represent DNA affecting the mapping from specification to implementation, the Mi hold the mutation number, and C1 is the crossover number.

The cardinalities of the Fi are problem-specific. The cardinality of each Mi is 2, and the cardinality of C1 is equal to the population size.

The crossover number is an integer occupying 1 locus. During crossover, individuals with the same crossover number are all crossed uniformly in their group. This allows the number of parents used, and which parents to cross to be determined via evolution. Thus if a population of individuals exists such that

AAAAAAAAAAAA 1

BBBBBBBBBBBB 1

CCCCCCCCCCCC 2

DDDDDDDDDDDD 2

EEEEEEEEEEEE 1

where three individuals have crossover number 1, and two individuals have crossover number 2, then following crossover:

ABABEABEEAEB 1

BAEAAEEBBEBA 1

CDCDCDDCDCDC 2

DCDCDCCDCDCD 2

EEBEBBAAABAE 1

where the individuals with crossover number 1 have been uniformly crossed, and likewise the individuals with crossover number 2 have been uniformly crossed.
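In the example above, every locus of a crossed group holds the same set of values before and after crossover. One way to obtain that behaviour is to shuffle the values at each locus among the members of a group, as in the sketch below. This per-locus shuffle is an interpretation of "crossed uniformly", and the function name and data layout are hypothetical.

```python
import random
from collections import defaultdict


def group_crossover(population, rng=random):
    """Cross individuals that share a crossover number.

    Each individual is a (genes, crossover_number) pair. Within a group, the
    values at every locus are shuffled among the members, so the group keeps
    the same multiset of values at each locus. This per-locus shuffle is one
    reading of "crossed uniformly", not a confirmed detail of the EGA.
    """
    groups = defaultdict(list)
    for index, (_, number) in enumerate(population):
        groups[number].append(index)

    children = [list(genes) for genes, _ in population]
    for members in groups.values():
        for locus in range(len(children[members[0]])):
            column = [population[i][0][locus] for i in members]
            rng.shuffle(column)
            for i, value in zip(members, column):
                children[i][locus] = value
    return [("".join(genes), number)
            for genes, (_, number) in zip(children, population)]


population = [("A" * 12, 1), ("B" * 12, 1), ("C" * 12, 2),
              ("D" * 12, 2), ("E" * 12, 1)]
offspring = group_crossover(population, rng=random.Random(1))
```

Since values are only permuted within a group, the relative frequency of values at each locus is preserved across the population.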

The mutation number occupies multiple loci and is a Gray-coded number giving the number of loci whose values are changed in an individual. This allows the mutation probability itself to evolve.
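A minimal sketch of this mutation step is given below. It decodes the Gray-coded mutation loci into a count and changes that many randomly chosen design loci. The bit order (most significant first), the restriction to design loci, and the uniform choice of a replacement value are assumptions of the sketch.

```python
import random


def gray_to_int(bits):
    """Decode a Gray-coded bit list (most significant bit first)."""
    value = 0
    current = 0
    for bit in bits:
        current ^= bit          # running XOR recovers each binary digit
        value = (value << 1) | current
    return value


def mutate(design, mutation_bits, cardinalities, rng=random):
    """Change as many design loci as the Gray-coded mutation number says.

    Only the design loci are mutated here; whether the EGA also mutates the
    mutation and crossover fields is not specified, so that is left out.
    """
    count = min(gray_to_int(mutation_bits), len(design))
    mutated = list(design)
    for locus in rng.sample(range(len(design)), count):
        choices = [v for v in range(cardinalities[locus]) if v != mutated[locus]]
        if choices:
            mutated[locus] = rng.choice(choices)
    return mutated


design = [0, 1, 2, 0, 1, 2, 0, 1]          # eight design loci, cardinality 3 each
mutation_bits = [0, 0, 1, 1]               # Gray 0011 = binary 0010 = 2
child = mutate(design, mutation_bits, [3] * 8, rng=random.Random(2))
```

In a Gray code, consecutive counts differ in exactly one bit, so a neighbouring mutation count is always reachable by changing a single mutation locus.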

4. Results

Since the EGA is probabilistic, all trials were repeated at least 50 times to ensure the results were consistent. The SGA and EGA were applied to ten trial problems of different topologies (including De Jong functions 1 and 4, Branin's RCOS function, and Shubert's function) selected from [12]. It was found that the EGA performed at least as well as a well-tuned SGA on seven out of ten of the trial problems.

Simple codesign problems such as the average of 16 values, a system containing 16 divides, an 8-tap FIR filter, and a Matrix Multiplication (4 by 4 times 4 by 1) were tried before undertaking the Self-Tuning Regulator [13] problem. It was found that software components were eliminated due to an inefficient communication model.

Software was disabled for optimization of the Self-Tuning Regulator. Equal weighting of logic elements, embedded multipliers, and latency was used.

The Self-Tuning Regulator when converted into scalar form as presented in [13], for second-order systems, has 17 subtracts, 44 multiplies, 23 additions, and 20 divisions totaling 104 operations.

Even a conservative estimate of $2^{104}$ gives $2.03 \times 10^{31}$ combinations for the number of different operation-to-resource bindings. A bit-width of 32 bits was assumed for all operations.
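The operation count and the size of the search space can be checked with a few lines of arithmetic. The dictionary keys are illustrative labels only; the counts are those quoted above.

```python
operations = {"subtract": 17, "multiply": 44, "add": 23, "divide": 20}
total_operations = sum(operations.values())

# Two choices per operation is the conservative estimate quoted in the text.
combinations = 2 ** total_operations
summary = f"{combinations:.2e}"
```

Here `total_operations` is 104 and `summary` is the string `2.03e+31`, which agrees with the figures in the text.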

Figure 2 shows the median number of logic elements, embedded multipliers, and latency normalized to the initial population for each generation. A 16% increase in latency has been accepted for a 33% decrease in logic elements, and a 61% decrease in embedded multipliers.

Blindly grouping operations into shared groups of 2 to 3 (minimal communication overhead), to give a potentially good solution, would give 14,714 to 18,761 logic elements, 135 to 198 embedded multipliers, and a latency of 22 to 25 cycles. The implementation obtained used approximately 12,965 logic elements and 95 embedded multipliers, and had a latency of 22 cycles. This was better than blind grouping.


Figure 2. EGA Normalized Median Values.

Specifying the Self-Tuning Regulator and checking its equations took 20 minutes. The EGA took an average of 16 minutes to run on the Self-Tuning Regulator problem on a Pentium 2.66 GHz machine. Designing the Self-Tuning Regulator manually took Cao et al. two months [13].

5. Discussion

Like the SGA, the EGA is still computationally intensive, with the major time cost being the evaluation of individuals. The code could be altered so that evaluation could be distributed over a network as in Synthbase [9].

The integer number format and the representation of designs by difference equations have limited the range of problems that can be considered.

The EGA was able to give good median performance, meeting the goals for performance and area as set by blind grouping.

In general the EGA is easier to apply than the SGA as there is no need to determine a rate for crossover or mutation, and weights are easier to apply (they represent priority and the algorithm is less subject to poor definition of these values).

6. Conclusions and Future Work

In this paper the optimization of a hardware-based Self-Tuning Regulator (STR) was investigated to determine if the proposed Extended Genetic Algorithm (EGA) could optimize the implementation and produce a result competitive with traditional design approaches. This Self-Tuning Regulator took 20 minutes to specify and 16 minutes to optimize and synthesize on a Pentium 2.66 GHz machine. How the EGA scales to larger problems has yet to be determined.

The Extended Genetic Algorithm presented gave good results and was in general easier to apply than the Simple Genetic Algorithm, as fewer parameters were required. The setting of these parameters and their effects is still to be tested.

The input representation presented uses difference equations that limit the types of system that can be specified for this type of work. This input representation needs to be extended so a larger operation set is supported allowing a wider range of designs to be implemented. The extensions would include loops, conditional statements, and more complicated mathematical functions such as sine functions.

10. References

[1] Wolf, W. "A Decade of Hardware/Software Codesign", Computer, April 2003, pp 38-43.

[2] Ernst, R. Henkel, J. Benner, T. (1993) "Hardware-Software Cosynthesis for Microcontrollers", IEEE Design and Test of Computers, vol 10. pp 64-75.

[3] Gupta, R.K. De Micheli, G. (1993) "Hardware-Software Cosynthesis for Digital Systems", IEEE Design and Test of Computers, vol 10. pp 29-41.

[4] Gajski, D.D. Vahid, F. Narayan, S. Jie Gong. "SpecSyn: An Environment Supporting the Specify-Explore-Refine Paradigm for Hardware/Software System Design". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, March 1998, pp 84-100.

[5] Wildman, R.A. Kramer, J.I. Weile, D.S. Christie, P. "Multi-Objective Optimization of Interconnect Geometry". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, February 2003, pp 15-23.

[6] Tan, K.C. Lee, T.H. Khor, E.F. (2001) "Evolutionary Algorithms for Multi-Objective Optimization: Performance Assessments and Comparisons". Proceedings 2001 Congress on Evolutionary Computation (Seoul, South Korea), May 27-30. pp 979-986.

[7] Valenzuela, C.L. (2002) "A Simple Evolutionary Algorithm for Multi-Objective Optimization (SEAMO)". Proceedings 2002 Congress on Evolutionary Computation (Honolulu, HI USA), May 12-17. pp 717-722.

[8] Palesi, M. Givargis, T. (2002) "Multi-Objective Design Space Exploration using Genetic Algorithms". Proceedings Tenth International Symposium on Hardware/Software Codesign (Estes Park, CO USA), May 6-8. pp 67-72.

[9] Maunder, R.B. (2002) "A Methodology for the Optimisation and Synthesis of Digital FPGA-Based Circuits", PhD Thesis, University of Auckland.

[10] Stratix II Device Handbook. See www.altera.com/literature/lit-stx2.jsp.

[11] Goldberg, D.E. (1989) "Genetic Algorithms in Search, Optimization and Machine Learning". Addison-Wesley Publishing Company, USA.

[12] Michalewicz, Z. (1996) "Genetic Algorithms + Data Structures = Evolution Programs". Third Edition. Springer-Verlag Berlin Heidelberg, United States of America.

[13] Cao, J. Salcic, Z. Nguang, S.K. (2004) "A Floating-point All Hardware Self-Tuning Regulator for Second Order Systems", Proceedings of IEEE Region 10 Technical Conference on Computers, Communications, Control and Power Engineering, IEEE, 2002, vol 3. pp 1733-1736.
