Leader-length Minimization on Two-Side Labeling

Chapter 3 One-Side Rerouted Leader and Two-Side Label Placement

3.3 Leader-length Minimization on Two-Side Labeling

Leader-length minimization of two-side labeling is another criterion of our solution.

Without loss of generality, when we try to improve the annotation system, we would like to place each label as near to its site as possible. This means that it is better to minimize the leader length, so that each label stays close to the object it annotates.

Instance: Given $k$, a set $S$ of $n$ sites lying on the horizontal lines of a set $L$, where each site $s_i$ has a rectangular label of dimension $w_i \times h_i$, and a set $R = \{ r_i \mid r_i \text{ is the shortest length from site } s_i \text{ to the right boundary} \}$.

Question: Does there exist a legal opo-labeling, with labels on the left side and the right side of a boundary R, such that the total leader length is at most M?

Theorem 4: Minimizing total leader length of Two-Side map boundary labeling with opo-leader is NP-complete.

Proof: To show that the problem is in NP, we guess a position for each label on the boundary of L and check that (i) the labels do not overlap each other, (ii) the leaders do not intersect each other, and (iii) the sum of the leader lengths is no more than M. For hardness, we reduce from the following version of the partition problem.

Given positive integers $a_1, a_2, \ldots, a_{2m}$, is there a subset $I$ of $J = \{1, 2, \ldots, 2m\}$ such that $I$ contains exactly one of $\{2i-1, 2i\}$ for $i = 1, 2, \ldots, m$, and $\sum_{i \in I} a_i = \sum_{i \in J \setminus I} a_i$? We reduce an instance P of the partition problem to an instance (S, L) of our problem such that P can be partitioned if and only if there is a two-side boundary labeling of S with corresponding labels for L. The problem can be restricted to the case in which many sites lie on one vertical line and every label must be placed on either the right side or the left side.
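To make the partition variant concrete, the following is a minimal brute-force sketch. It is only an illustration of the decision question and not part of the reduction; the function name `has_balanced_pair_partition` and the 0-based pairing of consecutive list entries are assumptions made here.

```python
from itertools import product

def has_balanced_pair_partition(a):
    """Brute-force check of the partition variant (hypothetical helper).

    `a` holds 2m positive integers. Entries (a[0], a[1]), (a[2], a[3]), ...
    play the role of the pairs {2i-1, 2i}; exactly one entry of each pair
    goes into I, and I must sum to half of the total.
    """
    if len(a) % 2 != 0:
        raise ValueError("expected an even number of integers")
    total = sum(a)
    if total % 2 != 0:
        return False  # an odd total can never be split evenly
    m = len(a) // 2
    # choice[i] == 0 picks the first entry of pair i, 1 picks the second.
    for choice in product((0, 1), repeat=m):
        picked = sum(a[2 * i + c] for i, c in enumerate(choice))
        if 2 * picked == total:
            return True
    return False
```

The search enumerates all 2^m choices, so it is usable only for very small instances, which is consistent with the problem being NP-complete.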

If more than one site lies on a horizontal line, we can treat them as distinct sites that are very close to each other on the target vertical line (see Figure 3-2). At the same time, every site lies at the center of the line on which it is placed.

Figure 3-2: Transformation to two-side labeling on a line.

We can prove this by reducing the partition problem to it. We are given an integer set $A = \{a_1, a_2, \ldots, a_n\}$, where $a_i$ represents the length of the p-segment of the leader for each pair of site and label. Because all the sites have the same distance to the boundary, we can obtain another integer set $B$ with $b_i \in B$, where

$b_i = a_i + \text{length of } L + \text{width of track routing area}$

Then the problem corresponds directly to partition, which is well known to be NP-complete: if we can find an answer to the two-side boundary labeling problem, we also obtain an answer to the corresponding partition problem. Therefore, minimizing the total leader length of two-side map boundary labeling with opo-leaders is NP-complete.

Both label height minimization and leader length minimization are NP-complete problems, and solving one of them optimally does not, in general, give an optimal solution of the other. Solving both problems at the same time is harder than solving either one alone. We therefore use a genetic algorithm, which handles both objectives at the same time and runs quickly.

Chapter 4 Genetic Algorithm on Two-Side Labeling

Genetic algorithms are stochastic global search methods that have proved to be successful for many kinds of optimization problems. Genetic algorithms are categorized as global search heuristics. These algorithms work with a population of candidate solutions and try to optimize the answer by using three basic principles: selection, crossover (also called recombination), and mutation. The initial population should be chosen randomly. Then, in each subsequent generation, new candidate solutions are produced by selecting two individuals, with a higher probability of selection for better individuals. We then recombine parts of these individuals to form one or two offspring, and mutate (change slightly) the resulting offspring. Finally, the new offspring is inserted back into the population and the worst individual is deleted.

The genetic algorithm can be summarized as follows:

Pseudo-code algorithm

1. Choose initial population

2. Evaluate the fitness of each individual in the population

3. Repeat

(a) Select best-ranking individuals to reproduce

(b) Breed new generation through crossover and mutation (genetic operations) and give birth to offspring

(c) Evaluate the fitness of the offspring

(d) Replace worst ranked part of population with offspring

4. Until termination

The basic loop is depicted in Figure 4-1. The implementation of each step is discussed in more detail in the rest of this chapter.

Figure 4-1: The basic loop of a genetic algorithm.
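The loop of the pseudo-code can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the implementation used for the experiments: it treats a smaller fitness value as better, breeds one offspring per iteration, uses one-point crossover, and all names (`steady_state_ga`, `p_mut`, and so on) are hypothetical.

```python
import random

def steady_state_ga(fitness, n_bits, pop_size=20, generations=200,
                    p_mut=0.05, seed=0):
    """Minimal steady-state GA on bit lists (smaller fitness is better)."""
    rng = random.Random(seed)
    # 1. Choose the initial population at random.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    # 2. Evaluate every individual; keep the population sorted, best first.
    pop.sort(key=fitness)
    for _ in range(generations):  # 3. Repeat ... 4. until termination
        # Select two parents from the better half of the population.
        p1, p2 = rng.sample(pop[:max(2, pop_size // 2)], 2)
        # Breed one offspring: one-point crossover, then bit-flip mutation.
        cut = rng.randrange(1, n_bits)
        child = p1[:cut] + p2[cut:]
        child = [b ^ 1 if rng.random() < p_mut else b for b in child]
        # Evaluate the offspring; it replaces the worst individual
        # only if it is better, so the best solution is never lost.
        if fitness(child) < fitness(pop[-1]):
            pop[-1] = child
            pop.sort(key=fitness)
    return pop[0]
```

For example, `steady_state_ga(sum, 12)` searches for a 12-bit vector with as few ones as possible. Because only the worst individual is ever replaced, the returned solution is never worse than the best individual of the initial population.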

4.1 Genetic Algorithm Modeling

Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.

Genetic algorithms find application in bioinformatics, phylogenetics, computer science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

A typical genetic algorithm requires two things to be defined:

1. a genetic representation of the solution domain,

2. a fitness function to evaluate the solution domain.

A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operation. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in Genetic programming and graph-form representations are explored in Evolutionary programming.

The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem we want to maximize the total value of objects that we can put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack.

The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
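The knapsack fitness described above can be written down directly. The following is a small sketch; the function and parameter names are chosen here for illustration only.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Fitness of a bit-array knapsack solution (hypothetical helper).

    bits[i] == 1 means object i is packed. Returns the total value of the
    packed objects, or 0 when their total size exceeds the capacity.
    """
    total_size = sum(s for b, s in zip(bits, sizes) if b)
    if total_size > capacity:
        return 0  # invalid representation
    return sum(v for b, v in zip(bits, values) if b)
```

With values 10, 20, 30, sizes 1, 2, 3 and capacity 4, the bit array `[1, 0, 1]` is valid and has fitness 40, whereas `[1, 1, 1]` exceeds the capacity and has fitness 0.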

Once we have the genetic representation and the fitness function defined, GA proceeds to initialize a population of solutions randomly, and then improve it through repetitive application of mutation, crossover, inversion, and selection operators.

4.1.1 Individual

An individual represents a candidate solution. In this problem, it represents a legal label placement without crossings. The individuals are stored as binary vectors. An element “0” of the vector means that the corresponding label is connected to the left side boundary, whereas an element “1” means that the label is placed on the right side boundary. Although it is conceivable that different genetic representations influence the optimization behavior significantly, we choose this representation intuitively: because we only need two groups, one connecting to the left side and one connecting to the right side, a binary representation is exactly what we need.

4.1.2 Initialization

At the beginning of the genetic algorithm, the individuals in the population have to be initialized. Initially, many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found. In our case, initialization is done randomly. We generate a label placement within the given restricted area (including rectangle R, the track routing area, and the space for labels) and let bad offspring be eliminated by selection.
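Random initialization of binary individuals can be sketched as follows. This is an illustrative sketch only: the names `random_population`, `pop_size` and `n_sites` are assumptions, and the check for legal (crossing-free) placements is deliberately left out.

```python
import random

def random_population(pop_size, n_sites, seed=None):
    """Create pop_size random bit vectors, one bit per site.

    Bit 0 sends the label of a site to the left side, bit 1 to the right.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_sites)]
            for _ in range(pop_size)]
```

Passing a fixed `seed` makes a run reproducible, which is convenient when comparing parameter settings.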

4.1.3 Evaluation

The choice of the evaluation function plays a crucial role in the design of a genetic algorithm. A big advantage of using an evaluation function is that one can measure desired criteria on the resulting placement and weight these criteria to suit personal preferences. Because genetic algorithms are often applied to multi-objective problems, we can then analyze how important these criteria are.

Among the criteria we test are

• No two leaders cross each other.

• No two labels overlap each other.

• The labels should be balanced between the left side and the right side.

• The larger of the total label heights on the left side and the right side is minimized.

• The total leader length is the minimum over all combinations.

The results presented in Chapter 5 show some cases of legal placements.

4.1.4 Selection

Selection is important to a genetic algorithm, since only selection drives the search towards more promising regions of the search space. In our implementation, we select individuals for reproduction (i.e. parents) according to the common linear ranking selection scheme, i.e. individuals are selected according to their rank, with better individuals receiving a higher chance of being selected. This makes the selective pressure independent of the actual fitness values, which may be important, since it is not known beforehand in which range of fitness values the optimal solution is located. The algorithm is of the steady-state type, i.e. the offspring is introduced into the population and the least fit individual is deleted. This way, the best solution so far is never lost. Also, in our problem, an individual whose placement contains a crossing is not counted in the population.
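Linear ranking selection can be sketched as below. This is one simple variant, given as an assumption rather than as the exact scheme of our implementation: the individual of rank r (0 is best) among n individuals receives weight n - r, smaller fitness counts as better, and parents are drawn with replacement. The name `linear_rank_select` is hypothetical.

```python
import random

def linear_rank_select(population, fitness, k=2, rng=random):
    """Draw k parents with probability linear in rank (best rank first)."""
    ranked = sorted(population, key=fitness)  # best (smallest) first
    n = len(ranked)
    weights = [n - r for r in range(n)]       # n, n-1, ..., 1
    # Only the rank enters the weights, never the raw fitness value.
    return rng.choices(ranked, weights=weights, k=k)
```

Since the weights depend on ranks only, rescaling all fitness values leaves the selection probabilities unchanged.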

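We define the fitness function as follows:

$f = \lambda_1 \dfrac{\sum_{i=1}^{n} c_i}{n \left( |t_R - b_R| + |r_R - l_R| + \varepsilon \right)} + \lambda_2 \dfrac{|R_h - L_h|}{R_h + L_h}$

To normalize the function, we divide each term by its intuitive maximum value. Thus, without loss of generality, $\lambda_1$ and $\lambda_2$ can be chosen between 0 and 1 and satisfy $\lambda_1 + \lambda_2 = 1$.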
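A weighted, normalized fitness of this kind might be computed as in the sketch below. It encodes one reading of the fitness function, namely a leader-length term divided by an assumed maximum total leader length plus a balance term divided by the total label height. All parameter names are hypothetical, and the maximum total leader length is passed in directly instead of being derived from the dimensions of R.

```python
def placement_fitness(leader_lengths, right_height, left_height,
                      max_total_leader, lam1=0.5, lam2=0.5):
    """Weighted sum of two normalized criteria (smaller is better).

    Assumes lam1 + lam2 == 1 and right_height + left_height > 0.
    """
    # Criterion 1: total leader length, scaled by its assumed maximum.
    length_term = sum(leader_lengths) / max_total_leader
    # Criterion 2: imbalance of label heights between the two sides,
    # scaled by the total label height.
    balance_term = (abs(right_height - left_height)
                    / (right_height + left_height))
    return lam1 * length_term + lam2 * balance_term
```

Setting `lam1=1.0, lam2=0.0` scores only the leader length, and `lam1=0.0, lam2=1.0` scores only the balance of label heights.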

4.1.5 Recombination

To get better results, we combine two good parents into new offspring, which may or may not be better than its parents. The purpose of the crossover operator is to recombine sub-placements of different individuals to produce an offspring. Since we expect that good parts of a placement are connected, we perform crossover by randomly choosing a connected part of the placements of the two parents and swapping this sub-placement.

Unfortunately, this operator has a drawback: a combination of two good parents may yield a poor offspring. Such a poor offspring will be deleted during the natural selection process.
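Swapping a randomly chosen connected part between two parents can be sketched as follows. This is an illustration under the assumption that individuals are equal-length Python lists of bits; `segment_crossover` is a name introduced here.

```python
import random

def segment_crossover(parent_a, parent_b, rng=random):
    """Swap one random contiguous segment between two equal-length parents."""
    n = len(parent_a)
    start = rng.randrange(n)               # first swapped position
    end = rng.randrange(start + 1, n + 1)  # one past the last swapped position
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b
```

Both children keep the length of their parents, and every position holds the bit of exactly one parent, so no assignment of a site is lost or duplicated.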

4.1.6 Mutation

Mutation is a crucial step of a genetic algorithm. Using recombination alone, we can only find new combinations of material that is already present in the population; information that is not in the population may be lost forever. Mutation introduces new material into the population by slightly changing individuals. This is necessary and reasonable, because new material increases the probability of finding better answers. In our implementation, mutation is done by randomly changing the binary vector with a given small probability: we move one leader from the right side to the left side (or from the left side to the right side). This way, we have a chance of obtaining a better individual from the present individual, and the result differs from that of the recombination (or crossover) process.
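The single-bit mutation can be sketched as below. The probability value and the names `mutate` and `p_mut` are assumptions for illustration; the actual mutation probability is not fixed here.

```python
import random

def mutate(chromosome, p_mut=0.05, rng=random):
    """With probability p_mut, flip one randomly chosen bit.

    Flipping a bit moves one label from the right side to the left side
    or the other way round. The input list is left unchanged.
    """
    child = list(chromosome)
    if rng.random() < p_mut:
        i = rng.randrange(len(child))
        child[i] ^= 1
    return child
```

With `p_mut=1.0` exactly one bit always changes, and with `p_mut=0.0` the individual is returned as an unchanged copy.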

4.2 An Example of GA on Two-Side Labeling

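We now give a simple example of how we apply the genetic algorithm to two-side labeling. At the beginning, we fix a rectangle R of 400 by 300 units as the target map and a fixed width for the track routing area, which is assumed to be sufficient for placing all the leaders. Initially, we randomly generate the number of sites and the heights of the labels. We also obtain the parameters (including the width of R, the width of the track routing area, and the total label height) for the fitness function shown below:

$f = \lambda_1 \dfrac{\sum_{i=1}^{n} c_i}{n \left( |t_R - b_R| + |r_R - l_R| + \varepsilon \right)} + \lambda_2 \dfrac{|R_h - L_h|}{R_h + L_h}$

Here $c_i$ is the length of leader $i$, the first denominator is built from the dimensions of R and the width of the track routing area, and the second denominator is the total label height.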

After that, to evaluate the fitness function we only need to calculate the leader lengths of each placement generated by the algorithm and the combination of labels. As an example, we give 4 chromosomes, each represented as a 20-bit vector (see Figure 4-2). An element “0” of the vector represents a connection to the left side boundary, and an element “1” represents a connection to the right side boundary.

chromosome [0] = 10100110011101110101   fitness = 0.1946
chromosome [1] = 01001101011111000011   fitness = 0.2077
chromosome [2] = 11111011011111001110   fitness = 0.2359
chromosome [3] = 10111010111001011011   fitness = 0.2335

Figure 4-2: Four chromosomes represented as a 20-bits vector.

In this genetic algorithm, a smaller fitness value indicates a better individual, which differs from the usual definition of fitness. In iteration (i), we choose the two individuals with the smallest fitness, chromosome [0] and chromosome [1], and recombine them in order to get better offspring (see Figure 4-3). In the program, the number of bits to swap is chosen randomly. In this case, we swap the first 4 bits of chromosome [0] with the first 4 bits of chromosome [1] and obtain CrossOverChromosome [2] and CrossOverChromosome [3]. After recombination, CrossOverChromosome [2] is better than both of its parents, while CrossOverChromosome [3] is worse. Obviously, the worse individuals will be eliminated by natural selection in this iteration.

CrossOverChromosome [0] = 10100110011101110101   fitness = 0.1946
CrossOverChromosome [1] = 01001101011111000011   fitness = 0.2077
CrossOverChromosome [2] = 01000110011101110101   fitness = 0.1856
CrossOverChromosome [3] = 10101101011111000011   fitness = 0.2200

Figure 4-3: Four chromosomes represented as a 20-bits vector in iteration (i).
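The recombination step of this example can be reproduced with plain string slicing. The helper `swap_prefix` is a name introduced here for illustration; the bit strings are the ones listed for chromosome [0] and chromosome [1].

```python
def swap_prefix(a, b, k):
    """Exchange the first k bits of two equal-length bit strings."""
    return b[:k] + a[k:], a[:k] + b[k:]

parent0 = "10100110011101110101"  # chromosome [0]
parent1 = "01001101011111000011"  # chromosome [1]

# Swapping the first 4 bits gives the two offspring of iteration (i).
child2, child3 = swap_prefix(parent0, parent1, 4)
print(child2)  # 01000110011101110101
print(child3)  # 10101101011111000011
```

The two printed strings coincide with CrossOverChromosome [2] and CrossOverChromosome [3].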

The iteration stops when all four individuals have the same chromosome. In this situation, we ignore the possibility of a future mutation, because it is not worth waiting for one to happen: mutation only occurs with a given small probability. In this algorithm, mutation selects a single bit of the vector and changes it. Even though this is not always useful, it helps when we need more diverse material in the population. The GA result and the optimal result are shown in Figure 4-4 and Figure 4-5.

Figure 4-4: Type-opo leader of GA solution on two-side labeling.

Figure 4-5: Type-opo leader of optimal solution on two-side labeling.

Chapter 5 Simulation Results

We test our algorithms on a number of example graphs, using rerouted leaders for one-side labeling and the genetic algorithm for two-side labeling. As described above, the annotation system of Microsoft Office Word has some disadvantages.

When the number of input labels is small, users may want to enlarge the label height to see more details in the labels. On the other hand, when the number of input labels becomes larger, the system should not discard labels easily. We provide the following methods to improve the annotation system: (i) When there are few labels, we enlarge the label size to fit the height of the paper sheet and apply the rerouted leader labeling method. We can also use this method to improve the visualization by combining dotted lines into one; we show the details in the following sections. (ii) When there are many labels, we can also combine the labels of sites on one line. (iii) When there are many labels, in order to provide more space for labels, we reduce the space for text and provide one more column for labels on each page.

This way, we can apply the two-side labeling problem. We separate the cases into three basic groups to see how important the two objectives, minimum label height and minimum leader length, are.

5.1 Rerouted Leaders on One-Side Labeling

Recall from Chapter 3 that the algorithm for rerouted leader label placement in one-side labeling runs in polynomial time O(n ). The total leader length of the original placement is 1880 units, and that of the rerouted leader placement is 1120 units (see Figure 5-1). One-side labeling with rerouted leaders thus improves the annotation system. In the next section, we show how it looks when applied to Word.

(a) Original placement. (b) Rerouted leader placement.

Figure 5-1: Easy sample result of non-uniform rectangular label placement.

5.2 Genetic Algorithm on Two-Side Labeling

In this section, we slightly change $\lambda_1$ and $\lambda_2$ in order to get a better visualization of the two-side labeling problem. We also want to know how these parameters affect the final result.

5.2.1 Leader Length Minimization

In some situations, we may focus on the objective “leader length minimization”. We can adjust $\lambda_1$ and $\lambda_2$ to suit this goal, so we try typical settings to see how important the objectives are under our constraints.

Table 2: Details of our GA algorithm and optimal solution with $\lambda_1 = 1.0$ and $\lambda_2 = 0.0$.

                    Total leader length    Difference of label height
GA algorithm        5842 units             44 units
Optimal solution    4710 units             48 units

Here the number of sites is 20. The total leader length in Figure 5-2 is 5842 units, and the height difference between the left labels and the right labels is 44 units. The total leader length in Figure 5-3 is 4710 units, and the height difference between the left labels and the right labels is 48 units. In this case, we assume that the maximum possible leader length is 16020 units and the total label height is 12080 units.
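As a quick consistency check, assume that with $\lambda_1 = 1.0$ and $\lambda_2 = 0.0$ the fitness reduces to the total leader length divided by the assumed maximum leader length. The labels $f_{\mathrm{GA}}$ and $f_{\mathrm{opt}}$ are introduced here only for this check. The values of Table 2 then give:

```latex
\[
f_{\mathrm{GA}} = \frac{5842}{16020} \approx 0.3647,
\qquad
f_{\mathrm{opt}} = \frac{4710}{16020} \approx 0.2940
\]
```

The first value is in line with the best fitness of 0.364 listed with Figure 5-4.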

Figure 5-2: Type-opo leader of GA solution on two-side labeling.

Figure 5-3: Type-opo leader of optimal solution on two-side labeling.

When only one of these objectives is important, we may set $\lambda_1 = 1.0$ and $\lambda_2 = 0.0$. This is reasonable because of the normalization. The genetic algorithm works, because the average fitness finally converges to the best fitness (see Figure 5-4), and it converges quickly. In other cases we may see some unexpected points; these are caused by the mutation process, and the tendency of convergence can still be observed. Even though the leader length is smaller, the result does not look very good, because the labels on the two sides are not balanced as usual.

Figure 5-4: The GA convergence with $\lambda_1 = 1.0$ and $\lambda_2 = 0.0$.

Generation      0      1      2      3      4      5      6      7      8      9      10
Best fitness    0.376  0.369  0.364  0.364  0.364  0.364  0.364  0.364  0.364  0.364  0.364
AVG fitness     0.411  0.385  0.373  0.367  0.364  0.364  0.364  0.364  0.364  0.364  0.364

5.2.2 Label Height Minimization

In some situations, we may focus on the objective “label height minimization”. We can adjust $\lambda_1$ and $\lambda_2$ to suit this goal, so we try typical settings to see how important the objectives are under our constraints.

Table 3: Details of our GA algorithm and optimal solution with $\lambda_1 = 0.0$ and $\lambda_2 = 1.0$.

                    Total leader length    Difference of label height
GA algorithm        6478 units             28 units
Optimal solution    6232 units             0 units

Here the number of sites is 20. The total leader length in Figure 5-5 is 6478 units, and the height difference between the left labels and the right labels is 28 units. The total leader length in Figure 5-6 is 6232 units, and the height difference between the left labels and the right labels is 0 units. In this case, we assume that the maximum possible leader length is 16020 units and the total label height is 12080 units.

Figure 5-5: Type-opo leader of GA solution on two-side labeling.

Figure 5-6: Type-opo leader of optimal solution on two-side labeling.

When only one of these objectives is important, we may set $\lambda_1 = 0.0$
