
Data Structure and Algorithm Homework #5

Due: 2:20pm, Thursday, May 30, 2013

TA email: dsa1@csie.ntu.edu.tw

=== Homework submission instructions ===

• For Problem 1, submit your source code, a Makefile to compile the source, and a brief documentation file to the SVN server (katrina.csie.ntu.edu.tw). You should create a new folder “hw5” and put these three files in it.

• The filenames of the Makefile and the documentation file should be “Makefile” and “report.txt”, respectively. You can use any filenames for the source code, but the executable file generated by the “make” command must be named “main.exe”. Your grade will be penalized if your submission does not follow the naming rule. The documentation file should be in plain text format (.txt file). In the documentation file you should explain how your code works, list your references, and include anything you would like to convey to the TAs.

• For Problems 2 through 5, submit the answers via the SVN server (electronic copy) or to the TA at the beginning of class on the due date (hard copy). Please hand in your homework on A4 paper.

• Except for the programming assignment, each student may submit the homework in only one way: either all in hard copy or all via SVN. If you submit your homework partially in one way and partially in the other, you might only get the score of the part submitted as hard copy or the part submitted via SVN (whichever part the grading TA chooses).

• If you choose to submit the answers of the writing problems through SVN, please combine the answers of all writing problems into only ONE file in the pdf format, with the file name in the format of “hw5 [student ID].pdf” (e.g. “hw5 b01902888.pdf”); otherwise, you might only get the score of one of the files (the one that the grading TA chooses).

• Discussions with others are encouraged. However, you should write down your solutions in your own words. In addition, for each problem you have to specify the references (the Internet URLs you consulted or the people you discussed with) on the first page of your solution to that problem. For the programming assignment, put the references in report.txt instead.

• NO LATE SUBMISSION IS ALLOWED for the homework submission in hard copies - no score will be given for the part that is submitted after the deadline. For submissions via SVN (including the programming assignment and electronic copies of the writing problems), up to one day of delay is allowed; however, the score of the part that is submitted after the deadline will get some penalties according to the following rule (the time will be in seconds):


Problem 1. The DSA Kingdom (25%)

Once upon a time, there was a Devil Scary Awful Kingdom. There were T islands in the kingdom and several cities on each island. To simplify the problem, the cities are assumed to be scattered on a 2-D plane.

One day, a new communication technology was developed. It can send and receive messages between two cities by laser. The king then wanted to deploy a communication network on each island such that any two cities on the same island can communicate with each other.

A naïve way to build this communication network is to build laser connections between all pairs of cities. However, these laser connections would cost a lot and consume too much energy. A more efficient method is to allow some cities to act as relay stations and transfer the messages for other cities.

On the other hand, due to their inaccurate positioning technology, the laser could only be launched toward 4 directions: north, south, east, and west. In other words, only the cities with the same x coordinates or the cities with the same y coordinates can construct laser connections between them.

The cost of a laser connection depends on the distance between the two cities that it connects.

In this problem, the distance is defined as the Euclidean distance, which means that the distance between (x1, y1) and (x2, y2) is √((x1 − x2)² + (y1 − y2)²).

You were a chancellor in the DSA kingdom. The king summoned you and gave you an order, “build a minimum cost laser communication network for every island.”

Input Format

The test data will be given in the following format.

The 1st line contains an integer T, followed by the descriptions of T islands.

For each island description, the 1st line contains an integer n, indicating the number of cities on this island.

In the next n lines, the i-th line contains 2 integers x_i, y_i which are separated by a space character, indicating the coordinates of the i-th city.

All cities have distinct coordinates - no two cities are at the same location.

Output Format

For each island, your program should output an integer indicating the cost (in terms of distance) of the minimum cost network. The sample output below shows -1 for an island on which no such network exists.


Grading

• 3 points for the report, including the reference.

Remember to write the reference of Problem 1 in report.txt.

• 2 points for Makefile, including whether the compilation of your source codes is successful.

• 2 points for each set of test data; 20 points in total.

The execution time limit for each set of test data is 3 seconds.

You MUST use Kruskal’s Algorithm if you need to find a minimum spanning tree (MST); other algorithms are not allowed.

The following are conditions that you can assume for the test data.

• For 3 sets of test data, n ≤ 2000, 0 ≤ x_i, y_i ≤ n.

• For all test data, T ≤ 10, n ≤ 100 000, −10⁹ ≤ x_i, y_i ≤ 10⁹.

• For 5 sets of test data, the solutions of all islands always exist.

Sample Input

3
5
1 1
2 1
2 2
0 2
0 0
2
1 1
0 0
1
0 0

Sample Output

6
-1
0

The minimum cost network of the first island is (0, 0) ↔ (0, 2) ↔ (2, 2) ↔ (2, 1) ↔ (1, 1). The cost is 2 + 2 + 1 + 1 = 6.


Hints and Technical Reminders

• Some laser connections can be proven useless.

For example, suppose there are three cities at coordinates (0, a), (0, b), and (0, c), with a < b < c. The direct connection (0, a) ↔ (0, c) can be replaced by the two connections (0, a) ↔ (0, b) ↔ (0, c) at the same cost. Though the costs are the same, the latter is better since it connects 3 cities instead of 2.

It follows that a network has minimum cost only if all connections are between pairs of adjacent cities.

By this result, the maximum number of possible connections on an island is 4n = O(n) (one per city in each of the 4 directions).

• Please make sure that your algorithm and data structure are efficient enough.

Your time complexity should be o(n²) (note the small-o notation here). For example, the official solution is O(n lg n).

• You do not need to use double or float data types for this problem. (Think about why.)

• Some answers can NOT be stored in a 32-bit integer, so an overflow may occur with some test data. Therefore, use long long int when calculating the cost. Use scanf("%lld") to read and printf("%lld") to write. For WinXP users, use "%I64d" when testing your program locally. Don’t forget to change it back to "%lld" before committing.

• A readable version of the official solution is about 150 lines of C code.

• Each set of test data can be solved in 2 seconds on the CSIE 217 workstations by the official solution without any compiler optimization.
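The hints above can be combined roughly as follows. This is only a minimal sketch under stated assumptions, not the official solution: the names `City`, `Edge`, and `island_cost` are hypothetical, the coordinates are assumed to arrive as two parallel arrays, allocation failures are not checked, and -1 is returned when the cities cannot all be connected (as in the sample output). Candidate connections are generated only between cities that are adjacent in the same column or the same row, and Kruskal's algorithm with a union-find structure then picks the cheapest ones.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long long x, y; int id; } City;   /* hypothetical helper types */
typedef struct { long long w; int u, v; } Edge;

static int by_xy(const void *a, const void *b) {   /* sort by x, then y */
    const City *p = a, *q = b;
    if (p->x != q->x) return p->x < q->x ? -1 : 1;
    return (p->y > q->y) - (p->y < q->y);
}
static int by_yx(const void *a, const void *b) {   /* sort by y, then x */
    const City *p = a, *q = b;
    if (p->y != q->y) return p->y < q->y ? -1 : 1;
    return (p->x > q->x) - (p->x < q->x);
}
static int by_w(const void *a, const void *b) {    /* sort edges by cost */
    const Edge *p = a, *q = b;
    return (p->w > q->w) - (p->w < q->w);
}

static int find_root(int *parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];              /* path halving */
        v = parent[v];
    }
    return v;
}

/* Cost of the cheapest network for one island, or -1 if none exists. */
long long island_cost(const long long *xs, const long long *ys, int n) {
    assert(n >= 1);
    City *c = malloc(n * sizeof *c);
    Edge *e = malloc(2 * (size_t)n * sizeof *e);    /* at most 2(n-1) edges */
    int *parent = malloc(n * sizeof *parent);
    int m = 0, used = 0;
    long long cost = 0;
    for (int i = 0; i < n; i++) {
        c[i].x = xs[i]; c[i].y = ys[i]; c[i].id = i;
        parent[i] = i;
    }
    qsort(c, n, sizeof *c, by_xy);                  /* vertical neighbours */
    for (int i = 0; i + 1 < n; i++)
        if (c[i].x == c[i + 1].x)
            e[m++] = (Edge){ c[i + 1].y - c[i].y, c[i].id, c[i + 1].id };
    qsort(c, n, sizeof *c, by_yx);                  /* horizontal neighbours */
    for (int i = 0; i + 1 < n; i++)
        if (c[i].y == c[i + 1].y)
            e[m++] = (Edge){ c[i + 1].x - c[i].x, c[i].id, c[i + 1].id };

    qsort(e, m, sizeof *e, by_w);                   /* Kruskal: cheapest first */
    for (int i = 0; i < m && used < n - 1; i++) {
        int a = find_root(parent, e[i].u), b = find_root(parent, e[i].v);
        if (a != b) { parent[a] = b; cost += e[i].w; used++; }
    }
    free(c); free(e); free(parent);
    return used == n - 1 ? cost : -1;
}
```

Because every candidate connection is axis-aligned, its Euclidean length is just a coordinate difference, so the whole computation stays in `long long`. On the three sample islands this sketch returns 6, -1, and 0.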


Problem 2. Quicksort Execution (20%)

2.1. (10%) Partitioning is the key to the quicksort algorithm. It rearranges the input array A[p...r] into two subarrays A[p...q-1] and A[q+1...r]. After partitioning, every element in A[p...q-1] is less than or equal to A[q] and every element in A[q+1...r] is greater than or equal to A[q]. In the lectures, we learned one implementation of the partition algorithm. Here, we introduce a different version, described on page 171 of the textbook (Cormen). The pseudocode of this partition algorithm is as follows.

PARTITION(A, p, r)
    x = A[r]
    i = p - 1
    for j = p to r - 1
        if A[j] <= x
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i+1] with A[r]
    return i + 1
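For reference, the pseudocode above could be transcribed into C roughly as follows. This is a sketch only: the names `partition` and `swap_int` and the use of `int` elements are assumptions, not part of the assignment.

```c
#include <assert.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partition A[p..r] around the pivot x = A[r]; return the pivot's final index. */
int partition(int *A, int p, int r) {
    assert(p <= r);
    int x = A[r];                 /* pivot is the last element */
    int i = p - 1;                /* A[p..i] holds the elements <= x seen so far */
    for (int j = p; j <= r - 1; j++) {
        if (A[j] <= x) {
            i = i + 1;
            swap_int(&A[i], &A[j]);
        }
    }
    swap_int(&A[i + 1], &A[r]);   /* move the pivot between the two parts */
    return i + 1;
}
```

For instance, calling `partition` on {3, 8, 1, 5} with p = 0 and r = 3 rearranges the array to {3, 1, 5, 8} and returns 2.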

Let A[0...12] = {16, 19, 7, 28, 22, 20, 4, 26, 5, 14, 1, 3, 17}. Please fill in the following table with the content of array A[0...12] after each iteration of the for loop and before the return, to show the progress of PARTITION(A, 0, 12).

initial 16 19 7 28 22 20 4 26 5 14 1 3 17 i= -1

j=0 i=

j=1 i=

j=2 i=

j=3 i=

j=4 i=

j=5 i=

j=6 i=

j=7 i=

j=8 i=

j=9 i=

j=10 i=

j=11 i=

final i=


2.2. (10%) The following pseudo code implements quicksort. The PARTITION procedure is given in 2.1.

QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q-1)
        QUICKSORT(A, q+1, r)
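A C rendering of this recursion might look like the sketch below. The function names and the `int` element type are assumptions, and the partition step from 2.1 is repeated so that the block compiles on its own. Note that the call on the left part A[p..q-1] finishes completely, including all of its nested calls, before the call on the right part A[q+1..r] begins.

```c
#include <assert.h>

/* Partition step from 2.1: pivot is A[r]; returns the pivot's final index. */
static int partition(int *A, int p, int r) {
    assert(p <= r);
    int x = A[r], i = p - 1;
    for (int j = p; j < r; j++) {
        if (A[j] <= x) {
            i++;
            int t = A[i]; A[i] = A[j]; A[j] = t;
        }
    }
    int t = A[i + 1]; A[i + 1] = A[r]; A[r] = t;
    return i + 1;
}

/* Sort A[p..r] in place. */
void quicksort(int *A, int p, int r) {
    if (p < r) {
        int q = partition(A, p, r);
        quicksort(A, p, q - 1);   /* left part is fully sorted first */
        quicksort(A, q + 1, r);   /* then the right part */
    }
}
```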

Let A[0...9]={29, 25, 5, 27, 8, 9, 15, 26, 13, 22}. Please use the method shown on slide 22 in the “sorting” lecture slides to show the progress of QUICKSORT(A, 0, 9). Note that you need to carefully examine the order of the recursive function calls to fill up the table with correct answers.

initial 29 25 5 27 8 9 15 26 13 22


Problem 3. Quicksort Evaluation (25%)

3.1. (5%) Suppose that we’d like to use quicksort to sort 15 elements: {1, 2, . . . , 15}. Initially, these 15 elements are in random order. In each stage of quicksort, we always choose the first element as the pivot. What is the minimum number of comparisons used in quicksort? How about the maximum? For both questions, please give examples (i.e., the initial order of these 15 elements) and show that these cases will result in the minimum/maximum. (You don’t have to prove that your answers can generate the minimum/maximum)

3.2. (5%) Assume we choose the pivot randomly instead of always choosing the first element in quicksort. Out of the two cases you have given in the previous question, which do you expect to run faster? Explain (or prove if you can) your answer.

3.3. (5%) We have seen that the method of choosing the pivot in quicksort is very important: if the pivot is too small or too large, quicksort slows down. So let’s try to put more effort into choosing the pivot. Think about this: when we are going to choose a pivot, we randomly pick 3 elements (from the numbers to be sorted), then choose the median of the 3 to be the pivot. Prove that the maximum number of comparisons is still O(N²) in the worst case when sorting N elements.

3.4. (5%) Let’s modify the previous strategy a little bit: when we are going to choose a pivot, we randomly pick a elements and sort them using insertion sort. Then we choose the median of these a elements as the pivot. Give an example in which the number of comparisons is O(N²/a + Na). (If you can, prove that this is the worst case - the maximum number of comparisons that can happen if we use this method to select a pivot.)

3.5. (5%) For a given N, based on the result of the previous question, please choose the best a that minimizes the time complexity.


Problem 4. How do you count these inversions? (15%)

Given a sequence of numbers ⟨an⟩ with N distinct numbers, we call a pair of two numbers (ai, aj) an inversion if and only if i < j and ai > aj. Let I(⟨an⟩) be the set of inversions in ⟨an⟩.

For example, if ⟨an⟩ = ⟨13, 78, 90, 47⟩, I (⟨an⟩) = {(78, 47), (90, 47)}.
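To make the definition concrete, the sketch below counts inversions by checking every pair (i, j) with i < j directly. The function name `count_inversions_naive` and the `int` element type are assumptions. This is a plain transcription of the definition, not one of the efficient methods asked for later in this problem.

```c
#include <assert.h>

/* Count pairs (i, j) with i < j and a[i] > a[j], straight from the definition. */
long long count_inversions_naive(const int *a, int n) {
    assert(n >= 0);
    long long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[i] > a[j])
                count++;
    return count;
}
```

For the example sequence ⟨13, 78, 90, 47⟩ it returns 2, matching the two inversions (78, 47) and (90, 47).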

In this problem, we will learn to count the number of inversions |I(⟨an⟩)| in an efficient way.

4.1. (2%) Design a naïve brute-force algorithm to calculate the number of inversions |I(⟨an⟩)| in O(N²). The algorithm should be written in pseudocode or C code. Briefly argue the correctness of your code and show that it indeed runs in O(N²).

4.2. (3%) Consider the sequence ⟨an⟩ = ⟨bn⟩⟨cn⟩. It means ⟨an⟩ is a concatenation of two sequences ⟨bn⟩ and ⟨cn⟩. For example, if ⟨bn⟩ = ⟨13⟩ and ⟨cn⟩ = ⟨78, 90, 47⟩, ⟨an⟩ will be ⟨13, 78, 90, 47⟩.

Let inv(⟨bn⟩, ⟨cn⟩) be the number of pairs (bi, cj) where bi ∈ ⟨bn⟩ and cj ∈ ⟨cn⟩ such that bi > cj. Prove that |I(⟨an⟩)| = |I(⟨bn⟩)| + |I(⟨cn⟩)| + inv(⟨bn⟩, ⟨cn⟩).
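As a quick sanity check of this identity (an illustration, not a proof), take the example split ⟨bn⟩ = ⟨13⟩ and ⟨cn⟩ = ⟨78, 90, 47⟩. The single-element sequence ⟨bn⟩ has no inversions, ⟨cn⟩ has the two inversions (78, 47) and (90, 47), and 13 is smaller than every element of ⟨cn⟩, so no cross pair counts:

```latex
\begin{aligned}
|I(\langle a_n \rangle)|
  &= |I(\langle b_n \rangle)| + |I(\langle c_n \rangle)|
     + \operatorname{inv}(\langle b_n \rangle, \langle c_n \rangle) \\
  &= 0 + 2 + 0 = 2 .
\end{aligned}
```

This agrees with the set I(⟨an⟩) = {(78, 47), (90, 47)} listed earlier.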

4.3. (5%) You are given two sorted sequences ⟨bn⟩ and ⟨cn⟩. Design an efficient algorithm to calculate inv (⟨bn⟩ , ⟨cn⟩), as defined in problem 4.2. If the two sequences contain M and K numbers, respectively, your algorithm should run in O(M + K) time. The algorithm should be written in pseudo code or C code. Please also show the correctness of your code and that your code indeed runs in O(M + K).

4.4. (5%) You are given an arbitrary unsorted sequence ⟨an⟩. Design an efficient algorithm to calculate |I (⟨an⟩)|. If the sequence contains N numbers, your algorithm should run in O(N log N ) time. The algorithm should be written in pseudo code or C code. Please also show the correctness of your code and that your code indeed runs in O(N log N ).

Hint 1. Try to split the original sequence ⟨an⟩.

Hint 2. The merge sort algorithm may help you a lot.


Problem 5. Maximum Gap (15%)

You are given an array A which contains N distinct numbers. The findGap function calculates the maximum value of y − x for all x, y in A where x < y and ∄z ∈ A such that x < z < y. The function definitions are shown below. For each subproblem, write down your answer in C code and show that it meets the running time requirement.
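To make the definition concrete: the pairs (x, y) with no z in between are exactly the neighbours in sorted order, so the maximum gap can be read off a sorted copy of A. The sketch below does this with the standard `qsort`. The name `find_gap_by_sorting` is hypothetical, and since it runs in O(N log N) it serves only as a reference for checking answers and does NOT meet the running time required by either subproblem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Reference only: O(N log N). Returns the largest difference between
   neighbouring values in sorted order, or 0.0 when N <= 1. */
double find_gap_by_sorting(const double *A, int N) {
    assert(N >= 0);
    if (N <= 1) return 0.0;
    double *s = malloc(N * sizeof *s);   /* sorted copy; A is left untouched */
    memcpy(s, A, N * sizeof *s);
    qsort(s, N, sizeof *s, cmp_double);
    double gap = 0.0;
    for (int i = 0; i + 1 < N; i++)
        if (s[i + 1] - s[i] > gap)
            gap = s[i + 1] - s[i];
    free(s);
    return gap;
}
```

For A = {1, 7, 3, 10, 4} the sorted order is 1, 3, 4, 7, 10, so the result is 3.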

5.1. (5%) Design a findGap function that runs in O(N + M ) time. Assume that all numbers in A are natural numbers and less than M .

int findGap(int* A, int N, int M) {
    int i, gap = 0;
    if (N <= 1) return gap;
    if (N == 2) return A[0] < A[1] ? A[1] - A[0] : A[0] - A[1];
    int *array = (int*) malloc(M * sizeof(int));
    for (i = 0; i < M; i++) array[i] = 0;
    // TODO: add some statements here

    return gap;
}

5.2. (10%) Design a findGap function that runs in O(N ) time. Assume that all numbers in A are real numbers.

double findGap(double* A, int N) {
    double gap = 0.0;
    if (N <= 1) return gap;
    if (N == 2) return A[0] < A[1] ? A[1] - A[0] : A[0] - A[1];
    // TODO: add some statements here

    return gap;
}

Hint 1. As mentioned in the lecture, the lower bound of the time complexity for comparison-based sorting algorithms is Ω(N log N). To get below this bound, you need to use a non-comparison sorting algorithm such as counting sort or bucket sort.

Hint 2. Let M and m be the maximum and minimum numbers in A. We can put all numbers in A in the range [m, M] into N − 1 buckets. The interval of each bucket is (M − m) ÷ (N − 1).

Hint 3. Since there are N − 2 numbers other than M and m and there are N − 1 buckets, at least one of the buckets is empty if M and m are not considered. Thus, any two numbers that are in the same bucket cannot have the largest gap - we only need to store the largest number and the smallest number in each bucket.
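One possible reading of Hints 2 and 3 is sketched below. It is an illustration under assumptions, not the reference answer: the name `find_gap_buckets` and the arrays `bmin`, `bmax`, and `seen` are hypothetical, the maximum is clamped into the last bucket, allocation failures are not checked, and floating-point rounding in the bucket index is not analysed.

```c
#include <assert.h>
#include <stdlib.h>

/* Bucket-based maximum gap for N distinct numbers, following Hints 2 and 3. */
double find_gap_buckets(const double *A, int N) {
    assert(N >= 0);
    if (N <= 1) return 0.0;
    double lo = A[0], hi = A[0];
    for (int i = 1; i < N; i++) {
        if (A[i] < lo) lo = A[i];
        if (A[i] > hi) hi = A[i];
    }
    if (N == 2) return hi - lo;            /* a single bucket would hide the gap */

    int nb = N - 1;                        /* number of buckets (Hint 2) */
    double width = (hi - lo) / nb;         /* > 0 because the numbers are distinct */
    double *bmin = malloc(nb * sizeof *bmin);
    double *bmax = malloc(nb * sizeof *bmax);
    char *seen = calloc(nb, 1);            /* seen[k] != 0 once bucket k is used */
    for (int i = 0; i < N; i++) {
        int k = (int)((A[i] - lo) / width);
        if (k >= nb) k = nb - 1;           /* the maximum goes into the last bucket */
        if (!seen[k]) {
            bmin[k] = bmax[k] = A[i];
            seen[k] = 1;
        } else {
            if (A[i] < bmin[k]) bmin[k] = A[i];
            if (A[i] > bmax[k]) bmax[k] = A[i];
        }
    }
    /* Only gaps between consecutive non-empty buckets can be the largest (Hint 3). */
    double gap = 0.0, prev = bmax[0];      /* bucket 0 always contains the minimum */
    for (int k = 1; k < nb; k++) {
        if (!seen[k]) continue;
        if (bmin[k] - prev > gap) gap = bmin[k] - prev;
        prev = bmax[k];
    }
    free(bmin); free(bmax); free(seen);
    return gap;
}
```

Each of the three passes touches every number or every bucket once, so the sketch does O(N) work and uses O(N) extra memory.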