Data Structures and Algorithms (NTU, Spring 2013) instructor: Hsuan-Tien Lin

Homework #5

TAs' email: dsata AT csie DOT ntu DOT edu DOT tw

RELEASE DATE: 05/03/2013

DUE DATE: 05/16/2013, noon

Specification of 5.2

We provide two data sets, heart and wine, for you to test your program. We may use other data sets to evaluate your performance. If you are interested in trying more data sets, you can find them in the UCI Machine Learning Repository.

http://archive.ics.uci.edu/ml/datasets.html

Input Format

The first argument of your program should be the training data file, which contains the examples. The second argument should be θ, the criterion for deciding whether to stop branching. For example:

./tree heart 3
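As an illustration of reading these two arguments, here is a minimal sketch. The `Options` struct and `parse_options` function are hypothetical names, and treating θ as a floating-point number is an assumption; the handout only shows the value 3.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for the two command-line arguments.
struct Options {
    std::string train_path;  // first argument: training data file
    double theta;            // second argument: stopping criterion (assumed numeric)
};

// Fills `out` from argv. Returns false when an argument is missing
// or when the second argument is not a complete number.
bool parse_options(int argc, char **argv, Options &out) {
    if (argc < 3) return false;
    assert(argv != nullptr);
    char *end = nullptr;
    double theta = std::strtod(argv[2], &end);
    if (end == argv[2] || *end != '\0') return false;  // nothing parsed, or trailing junk
    out.train_path = argv[1];
    out.theta = theta;
    return true;
}
```

Calling `parse_options(argc, argv, opts)` at the top of `main` for `./tree heart 3` would leave `opts.train_path` equal to `"heart"` and `opts.theta` equal to 3.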

Data Format

The first line contains two integers n and m: n is the number of examples and m is the total number of factors. Each of the following n lines represents one example in the following format, with the numbers separated by spaces:

label factor[0] factor[1] ... factor[m-1]

For instance, in the line

1 14.23 1.71 2.43 15.6 127 2.8 3.06 0.28 2.29 5.64 1.04 3.92 1065

the first number, 1, is the label and the rest are the factors.
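The data format above can be read with a short loop. The following is a sketch only: `Example` and `read_examples` are hypothetical names, and storing the label as an `int` assumes labels are always written as integers such as 1 and -1.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical container for one line of the data file.
struct Example {
    int label;                   // assumed to be an integer such as 1 or -1
    std::vector<double> factor;  // factor[0] ... factor[m-1]
};

// Reads "n m", then n lines of "label factor[0] ... factor[m-1]".
// Returns false if the stream ends early or a token is not a number.
bool read_examples(std::istream &in, std::vector<Example> &out) {
    int n = 0, m = 0;
    if (!(in >> n >> m) || n < 0 || m < 0) return false;
    out.clear();
    out.reserve(n);
    for (int i = 0; i < n; ++i) {
        Example ex;
        ex.factor.resize(m);
        if (!(in >> ex.label)) return false;
        for (int j = 0; j < m; ++j) {
            if (!(in >> ex.factor[j])) return false;
        }
        out.push_back(ex);
    }
    assert(out.size() == static_cast<std::size_t>(n));
    return true;
}
```

The function accepts any `std::istream`, so it can be given an `std::ifstream` opened on the training file or an `std::istringstream` when experimenting.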

Output Format

Decision Tree

Please output your tree as a function in C/C++ language. The function must follow this signature:

int tree_predict(double *attr);

The only argument is a double array that contains the factors of one example, in the same order as in the input. The function should return the predicted label of the example (1 or -1 for heart, for instance). Also, please name your output file "tree_pred.h". Then you can compile and run the provided "tree_predictor.cpp" to check how good your decision tree is (see README). For example, your "tree_pred.h" should look like:

int tree_predict(double *attr){
    if(attr[0] > 5){
        return 1;
    } else{
        return -1;
    }
}
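One possible way to produce output in this shape is to walk the learned tree recursively and print nested if/else blocks. The sketch below assumes a hypothetical `Node` struct with a single `attr[index] > threshold` test per internal node; the handout does not prescribe any in-memory representation, and `emit_node` and `emit_tree` are illustrative names.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical in-memory tree node; not part of the assignment.
struct Node {
    bool leaf;          // true: return `label`; false: test attr[index] > threshold
    int label;          // used only by leaves
    int index;          // used only by internal nodes
    double threshold;   // used only by internal nodes
    const Node *above;  // subtree taken when attr[index] > threshold
    const Node *below;  // subtree taken otherwise
};

// Prints the subtree rooted at `node`, indented by four spaces per level.
void emit_node(std::ostream &out, const Node *node, int depth) {
    assert(node != nullptr);
    std::string pad(4 * depth, ' ');
    if (node->leaf) {
        out << pad << "return " << node->label << ";\n";
        return;
    }
    assert(node->above != nullptr && node->below != nullptr);
    out << pad << "if(attr[" << node->index << "] > " << node->threshold << "){\n";
    emit_node(out, node->above, depth + 1);
    out << pad << "} else{\n";
    emit_node(out, node->below, depth + 1);
    out << pad << "}\n";
}

// Prints the whole tree_predict function with the required signature.
void emit_tree(std::ostream &out, const Node *root) {
    out << std::setprecision(17);  // enough digits to reproduce a double threshold
    out << "int tree_predict(double *attr){\n";
    emit_node(out, root, 1);
    out << "}\n";
}
```

Passing an `std::ofstream` opened on "tree_pred.h" to `emit_tree` would write the header file; the precision setting is there so that thresholds are not rounded to the stream's default six significant digits.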

Data Set Description

In this section we provide the meaning of factors and label in the data sets.

Wine

The label distinguishes two types of wine from two different cultivars.

factors are:

1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline

Heart

The data set describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images.

The label distinguishes two categories of patients: normal and abnormal.

factors are:

1. F1R: continuous (count in ROI (region of interest) 1 in rest)
2. F1S: continuous (count in ROI 1 in stress)
3. F2R: continuous (count in ROI 2 in rest)
4. F2S: continuous (count in ROI 2 in stress)
5. F3R: continuous (count in ROI 3 in rest)
6. F3S: continuous (count in ROI 3 in stress)
7. F4R: continuous (count in ROI 4 in rest)
8. F4S: continuous (count in ROI 4 in stress)
...

- all continuous attributes have integer values from 0 to 100
