
Michael Tsai 2017/3/28

(2)

Sorting

Definition:

Input: a sequence of n numbers <a_1, a_2, ..., a_n>

Output: a permutation (reordering) <a'_1, a'_2, ..., a'_n> of the original sequence, such that a'_1 ≤ a'_2 ≤ ... ≤ a'_n

In reality, a_i is the key of a record (of multiple fields)

(e.g., student ID)

In a record, the data fields other than the key are called satellite data

If the satellite data is large, we sort only the pointers to the records (to avoid moving the data)

 

(3)

Applications of Sorting

Example 1: Looking for an item in a list

Q: How do we look for an item in an unsorted list?

A: We likely can only linearly traverse the list from the beginning.

Q: What if it is sorted?

A: We can do binary search

But how much time do we need for sorting? (pre-processing)
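The two lookups in this example can be sketched as follows. This is only an illustration: the function names and the sample list are assumptions, and the binary search relies on Python's standard `bisect` module.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Scan from the beginning; works on unsorted lists, O(n) comparisons."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Requires sorted input; O(log n) comparisons."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


unsorted = [26, 5, 37, 1, 61, 11]      # hypothetical sample data
print(linear_search(unsorted, 61))
print(binary_search(sorted(unsorted), 61))
```

The call to `sorted` is the pre-processing cost that the question above asks about.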

 

(4)

Applications of Sorting

Example 2:

Compare to see if two lists are identical (list all different items)

The two lists are n and m in length

Q: What if they are unsorted?

Compare the 1st item in list 1 with (m-1) items in list 2

Compare the 2nd item in list 1 with (m-1) items in list 2

Compare the n-th item in list 1 with (m-1) items in list 2

O(nm) time is needed

Q: What if they are sorted?

A: O(n + m) time is needed, by scanning both lists once in parallel

Again, do not forget we also need time for sorting. But, how much?
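A minimal sketch of the sorted case, assuming both lists are already sorted in ascending order. The function name `list_differences` is hypothetical; it walks both lists in a single pass and collects the items that appear in only one of them.

```python
def list_differences(sorted_a, sorted_b):
    """Return (only_in_a, only_in_b) for two ascending lists in one pass."""
    i = j = 0
    only_a, only_b = [], []
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            # Same item in both lists: advance both indices.
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            only_a.append(sorted_a[i])
            i += 1
        else:
            only_b.append(sorted_b[j])
            j += 1
    # Whatever remains in either list has no partner in the other.
    only_a.extend(sorted_a[i:])
    only_b.extend(sorted_b[j:])
    return only_a, only_b


print(list_differences([1, 3, 5, 7], [1, 4, 5, 8, 9]))
```

Each loop iteration advances at least one index, so the loop runs at most n + m times.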

 

(5)

Categories of Sorting Algo.

Internal Sort:

Place all data in the memory

External Sort:

The data is too large to fit entirely in memory.

Some of it needs to be temporarily placed on other (slower) storage, e.g., hard drive, flash disk, network storage, etc.

In this lecture, we will only discuss internal sort.

Storage is cheap nowadays. In most cases, only internal sort is needed.

(6)

Some terms related to sorting

Stability:

If k_i = k_j (equal key values), then the two records keep the same relative order before and after sorting.

In-place:

Directly sort the keys at their current memory locations. Therefore, only O(1) additional space is needed for sorting.

Adaptability:

If part of the sequence is already sorted, then the running time of the sorting algorithm is reduced.
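Stability can be seen in a small sketch. The records below are made-up (student ID, score) pairs, and the example relies on the documented fact that Python's built-in `sorted` is stable.

```python
# Hypothetical records: (student ID, score). Two pairs share the same score.
records = [("B123", 90), ("A456", 85), ("C789", 90), ("D012", 85)]

# Sort by score only; the student ID is satellite data here.
by_score = sorted(records, key=lambda record: record[1])

print(by_score)
```

Among the records with score 85, A456 still comes before D012, and among those with score 90, B123 still comes before C789, exactly as in the input.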

 

(7)

How fast can we sort?

Assumption: compare and swap

Compare: compare two items in the list

Swap: Swap the locations of these two items

How much time do we need in the worst case?

[Figure: decision tree for sorting three items. Each internal node compares two positions (1:2, 2:3, 1:3) and branches on Yes/No; each leaf ("stop") corresponds to one possible ordering of the input.]

(8)

Decision tree for sorting

Every node represents a comparison & swap

Sorting is completed when reaching the leaf

How many leaves?

n!, since there are that many possible permutations

(9)

How fast can we sort?

Therefore, the time needed in the worst case is the height of this binary tree.

Suppose the decision tree has height h and l leaves.

l ≥ n!, since we have at least n! outcomes (leaves)

l ≤ 2^h, since a binary tree (decision tree) of height h has at most 2^h leaves

Hence 2^h ≥ n!, so h ≥ log2(n!) = Ω(n log n)

Summary: Any "comparison-based" sorting algorithm has worst-case time complexity of Ω(n log n).
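To get a feeling for the bound: a decision tree with at least n! leaves must have height at least log2(n!). The sketch below simply evaluates that quantity next to n·log2(n); the function name is an assumption made for illustration.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest possible height of a binary decision tree with n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 10, 100, 1000):
    lower_bound = min_worst_case_comparisons(n)
    print(n, lower_bound, round(n * math.log2(n)))
```

For n = 3 the bound is 3 comparisons, which matches the height of the three-item decision tree.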

 

(10)

Review: Selection Sort

Select the smallest, move it to the first position.

Select the second smallest, move it to the second position.

….

The last item will automatically be placed at the last position.
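The steps above can be sketched in Python as follows. This is one straightforward reading of the description, not necessarily the version used in the lecture; it sorts a copy so the caller's list is left untouched.

```python
def selection_sort(items):
    """Return a sorted copy using selection sort (always O(n^2) comparisons)."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        # Find the smallest item in the unsorted part a[i:].
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # Move it to position i; the last item falls into place automatically.
        a[i], a[smallest] = a[smallest], a[i]
    return a


print(selection_sort([2, 3, 6, 5, 1, 4]))
```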


(11)

Review: Selection Sort

Selection sort performs the same steps regardless of the initial order of the input.

It always goes through the entire unsorted part of the array in each iteration.

Therefore, its best-case, worst-case, and average-case running times are all O(n^2)

Not adaptive!

In-place

 

(12)

Insertion Sort

In each iteration, add one item to a sorted list of i items,

turning it into a sorted list of (i+1) items

2 3 6 5 1 4

2 3 6 5 1 4

2 3 6 5 1 4

2 3 5 6 1 4

2 3 5 6 1 4

1 2 3 5 6 4

1 2 3 5 6 4

1 2 3 4 5 6

(13)

Pseudo code
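The pseudo code on this slide did not survive extraction. A minimal Python sketch of insertion sort, written from the description on the previous slide rather than from the original pseudo code, could look like this:

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):
        key = a[i]              # next item to insert into the sorted prefix a[:i]
        j = i - 1
        # Shift larger items one position to the right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key          # drop the item into the gap
    return a


print(insertion_sort([2, 3, 6, 5, 1, 4]))
```

The strict comparison `a[j] > key` never moves an item past an equal key, which is what keeps this version stable.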

(14)

Insertion Sort

Q: How much time is needed?

A: In the worst case, the item needs to be placed at the beginning in each and every iteration.

(Spending time linear in the size of the sorted part, so O(n^2) in total)

Average-case complexity: O(n^2). (Why?)

Possible variations: (do these improve the time complexity?)

1. Use binary search to look for the location to insert.

2. Use a linked list to store the items. Then moving takes only O(1)!

 

(15)

What’s good about insertion sort

Simple (small constant factor in the time complexity)

Good choice when sorting a small list

Stable

In-place

Adaptive

Example: a sequence with only two inversions, <5,3> and <5,4>.

The running time for insertion sort: O(n+d), where d is the number of inversions

Best case: O(n) (no inversions, already sorted)

Online:

No need to know all the numbers to be sorted in advance. It is possible to sort and take input at the same time.
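The O(n+d) claim can be checked with a small experiment: every shift performed by insertion sort removes exactly one inversion, so the number of shifts equals d. The sketch below counts both. The function names and the sample sequence `[1, 2, 5, 3, 4]` are assumptions chosen so that the only inversions are <5,3> and <5,4>.

```python
def insertion_sort_count_shifts(items):
    """Return (sorted copy, number of single-position shifts performed)."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts


def count_inversions(items):
    """Brute-force count of pairs (i, j) with i < j and items[i] > items[j]."""
    n = len(items)
    return sum(1 for i in range(n) for j in range(i + 1, n) if items[i] > items[j])


sample = [1, 2, 5, 3, 4]   # hypothetical input with inversions <5,3> and <5,4>
print(insertion_sort_count_shifts(sample)[1], count_inversions(sample))
```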

 

(16)

Merge Sort

Use Divide-and-Conquer strategy

Divide-and-Conquer:

Divide: Split the big problem into small problems

Conquer: Solve the small problems

Combine: Combine the solutions to the small problems into the solution of the big problems.

Merge sort:

Divide: Split the n numbers into two sub-sequences of n/2 numbers

Conquer: Sort the two sub-sequences (use recursive calls to delegate to the clones)

Combine: Combine the two sorted sub-sequences into one sorted sequence

(17)

Merge Sort

To see that the MERGE procedure runs in Θ(n) time, where n = r - p + 1, observe that each of lines 1–3 and 8–11 takes constant time, the for loops of lines 4–7 take Θ(n1 + n2) = Θ(n) time,[7] and there are n iterations of the for loop of lines 12–17, each of which takes constant time.

We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The procedure MERGE-SORT(A, p, r) sorts the elements in the subarray A[p..r]. If p ≥ r, the subarray has at most one element and is therefore already sorted. Otherwise, the divide step simply computes an index q that partitions A[p..r] into two subarrays: A[p..q], containing ⌈n/2⌉ elements, and A[q+1..r], containing ⌊n/2⌋ elements.[8]

MERGE-SORT(A, p, r)

    if p < r

        q = ⌊(p + r)/2⌋            // Divide

        MERGE-SORT(A, p, q)        // Conquer x2

        MERGE-SORT(A, q + 1, r)

        MERGE(A, p, q, r)          // Combine

To sort the entire sequence A = <A[1], A[2], ..., A[n]>, we make the initial call MERGE-SORT(A, 1, A.length), where once again A.length = n. Figure 2.4 illustrates the operation of the procedure bottom-up when n is a power of 2. The algorithm consists of merging pairs of 1-item sequences to form sorted sequences of length 2, merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on, until two sequences of length n/2 are merged to form the final sorted sequence of length n.

2.3.2 Analyzing divide-and-conquer algorithms

When an algorithm contains a recursive call to itself, we can often describe its running time by a recurrence equation or recurrence, which describes the overall running time on a problem of size n in terms of the running time on smaller inputs.

We can then use mathematical tools to solve the recurrence and provide bounds on the performance of the algorithm.

[7] We shall see in Chapter 3 how to formally interpret equations containing Θ-notation.

[8] The expression ⌈x⌉ denotes the least integer greater than or equal to x, and ⌊x⌋ denotes the greatest integer less than or equal to x. These notations are defined in Chapter 3. The easiest way to verify that setting q to ⌊(p + r)/2⌋ yields subarrays A[p..q] and A[q+1..r] of sizes ⌈n/2⌉ and ⌊n/2⌋, respectively, is to examine the four cases that arise depending on whether each of p and r is odd or even.

(18)

Merge Sort: Example

(19)

How to combine (merge)?

Original array: 1 4 5 8 | 2 3 6 9 (index i scans the left half, index j scans the right half)

Temporary storage: 1 2 3 4 5 6 8 9

Running time: O(n1 + n2), where n1 and n2 are the lengths of the two sub-sequences.

A temporary storage of size O(n) is needed during the merge process

(20)

Implementation: Merge
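The implementation shown on this slide did not survive extraction. As a stand-in, here is a minimal Python sketch of the merge step and of merge sort built on top of it. It returns new lists instead of working on index ranges A[p..r], so it is a simplification rather than a transcription of the MERGE procedure.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right))."""
    merged = []                      # temporary storage of size O(n)
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left list on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items using divide-and-conquer."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2                       # Divide
    left = merge_sort(items[:mid])              # Conquer x2
    right = merge_sort(items[mid:])
    return merge(left, right)                   # Combine


print(merge([1, 4, 5, 8], [2, 3, 6, 9]))
print(merge_sort([26, 5, 37, 1, 61, 11, 59, 15, 48, 19]))
```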

(21)

Merge sort

Every item to be sorted is processed once per "pass"

How many passes are needed?

The length of the sub-sequences doubles every pass, until it finally becomes the whole sequence of n numbers

Therefore, ⌈log2 n⌉ passes.

Total running time: O(n log n)

Worst-case, best-case, average-case: O(n log n)

(Not adaptive)

Not in-place: needs additional storage for the sorted sub-sequences

Additional space: O(n)

 

(22)

Quick Sort

Find a pivot, then rearrange the items so that:

(1) all items to its left are smaller or equal (unsorted),

(2) all items to its right are larger

Recursively call itself to sort the left and right sub-sequences.

26 5 37 1 61 11 59 15 48 19

26 5 37 1 61 11 59 15 48 19

26 5 19 1 61 11 59 15 48 37

26 5 19 1 15 11 59 61 48 37

11 5 19 1 15 26 59 61 48 37

(23)

Pseudo Code

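Divide: partition the sequence around the pivot

Conquer x2: recursively sort the left and right sub-sequences

No Combine step is needed!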
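The pseudo code itself is missing from the extracted slide. The sketch below is a reconstruction under assumptions: it uses the leftmost key as the pivot and a two-index scan, chosen because this reproduces the swaps shown in the worked example (37 with 19, 61 with 15, then the pivot 26 with 11). The names `partition` and `quick_sort` are illustrative.

```python
def partition(a, left, right):
    """Partition a[left..right] around the pivot a[left]; return its final index."""
    pivot = a[left]
    i, j = left + 1, right
    while True:
        while i <= right and a[i] <= pivot:   # find an item larger than the pivot
            i += 1
        while a[j] > pivot:                   # find an item not larger than the pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[left], a[j] = a[j], a[left]             # put the pivot between the two parts
    return j


def quick_sort(a, left=0, right=None):
    """Sort the list a in place and return it."""
    if right is None:
        right = len(a) - 1
    if left < right:
        p = partition(a, left, right)         # Divide
        quick_sort(a, left, p - 1)            # Conquer x2
        quick_sort(a, p + 1, right)
    return a                                  # no Combine step


data = [26, 5, 37, 1, 61, 11, 59, 15, 48, 19]
partition(data, 0, len(data) - 1)
print(data)
print(quick_sort(data))
```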

(24)

Quick Sort

11 5 19 1 15 26 59 61 48 37

1 5 11 19 15 26 59 61 48 37

1 5 11 19 15 26 59 61 48 37

1 5 11 15 19 26 59 61 48 37

1 5 11 15 19 26 59 61 48 37

1 5 11 15 19 26 48 37 59 61

1 5 11 15 19 26 37 48 59 61

1 5 11 15 19 26 37 48 59 61

(25)

Quick Sort: Worst & Best case

But the worst-case running time is still O(n^2)

Q: Give an example which produces the worst-case running time for the quick sort algorithm.

In this case, the running time is O(n^2)

Best case?

The pivot splits the sequence into two sub-sequences of equal size.

Therefore, T(n) = 2T(n/2) + Θ(n)

T(n) = Θ(n log n)

 

(26)

Randomized Quick Sort

Prevent the worst case from happening frequently

Randomly select a pivot (not always the leftmost key)

Reduce the probability of the worst case

However, the worst-case running time is still O(n^2)

26 5 37 1 61 11 59 15 48 19

Randomly select a pivot

Swap it with the leftmost key in advance
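A minimal sketch of the randomized variant, assuming a leftmost-pivot partition like the one used in the Quick Sort example: the only change is that a randomly chosen key is swapped into the leftmost position before partitioning. The function name is hypothetical; `random.randint` picks an index from the inclusive range.

```python
import random


def randomized_quick_sort(a, left=0, right=None):
    """Sort the list a in place with a randomly chosen pivot and return it."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return a
    # Randomly select a pivot and swap it to the leftmost position in advance.
    pivot_index = random.randint(left, right)
    a[left], a[pivot_index] = a[pivot_index], a[left]
    pivot = a[left]
    i, j = left + 1, right
    while True:
        while i <= right and a[i] <= pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[left], a[j] = a[j], a[left]
    randomized_quick_sort(a, left, j - 1)
    randomized_quick_sort(a, j + 1, right)
    return a


print(randomized_quick_sort([26, 5, 37, 1, 61, 11, 59, 15, 48, 19]))
```

With a random pivot, an already sorted input is no longer a guaranteed worst case, although an unlucky sequence of choices can still lead to quadratic time.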

(27)

Average running time

It is better if the selected pivot splits the sequence evenly into two sub-sequences of equal size

Why is the average running time close to the best-case one?

Assume a rather bad situation: every partition splits the sequence 1:9

T(n) = T(9n/10) + T(n/10) + cn

T(9n/10): time needed for the "9/10 subsequence"

T(n/10): time needed for the "1/10 subsequence"

cn: time needed for partitioning

(28)

Average running time

As long as the pivot always partitions the sequence according to a fixed ratio (even one that is not close to 50%), we can still obtain O(n log n) running time!
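The argument can be written out as a short recursion-tree calculation for the 1:9 split. This is a sketch, and the constant c (the per-item cost of partitioning) is an assumed symbol:

```latex
\begin{align*}
T(n) &= T(9n/10) + T(n/10) + cn \\
\text{depth of the recursion tree} &\le \log_{10/9} n
      = \frac{\log_2 n}{\log_2 (10/9)} \approx 6.58 \log_2 n \\
\text{total partitioning cost per level} &\le cn \\
T(n) &\le cn \left( \log_{10/9} n + 1 \right) = O(n \log n)
\end{align*}
```

The split ratio only changes the base of the logarithm, which is a constant factor hidden by the O-notation.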

 

(29)

Average running time

Case 1:

Worst case for the first-level partition, but best case for the second level.

Partition time for the first level: Θ(n)

Partition time for the second level: Θ(n − 1)

Total: Θ(n) + Θ(n − 1) = Θ(n)

Case 2:

Best case for the first-level partition

Partition time for the first level: Θ(n)

Same!

(Case 1 has a larger constant)

The better-partitioned level would "absorb" the extra running time of the worse-partitioned level.

(30)

Comparing the Big Four

Insertion sort: quick with small input size n. (small constant)

Quick sort: Best average performance (fairly small constant)

Merge sort: Best worst-case performance

Heap sort: Good worst-case performance, no additional space needed.

Real-world strategy: a hybrid of insertion sort + others. Use input size n to determine the algorithm to use.
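A hybrid of this kind can be sketched as follows: small inputs go to insertion sort, larger ones are split and merged. The cutoff value of 16 and the function name are assumptions for illustration only; real libraries tune such thresholds empirically. The merge step uses `heapq.merge` from the standard library.

```python
from heapq import merge

CUTOFF = 16   # hypothetical threshold, not taken from the lecture


def hybrid_sort(items):
    """Return a sorted copy: insertion sort for small inputs, merge sort otherwise."""
    a = list(items)
    if len(a) <= CUTOFF:
        # Insertion sort: small constant factor, fast for short lists.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a
    mid = len(a) // 2
    # Merge sort on top: O(n log n) in the worst case for large inputs.
    return list(merge(hybrid_sort(a[:mid]), hybrid_sort(a[mid:])))


print(hybrid_sort([26, 5, 37, 1, 61, 11, 59, 15, 48, 19]))
```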

                  Worst         Average       Additional Space?

Insertion sort    O(n^2)        O(n^2)        O(1)

Merge sort        O(n log n)    O(n log n)    O(n)

Quick sort        O(n^2)        O(n log n)    O(1)

Heap sort         O(n log n)    O(n log n)    O(1)    (not covered today!)
