Scikit-learn offers an implementation of SVM using two C++ libraries (with a C API to interface with other languages) developed at the National Taiwan University: A Library for Support Vector Machines (LIBSVM), for SVM classification and regression (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), and LIBLINEAR, for classification problems using linear methods on large and sparse datasets (http://www.csie.ntu.edu.tw/~cjlin/liblinear/). Since both libraries are free to use, fast in computation, and already tested in many other solutions, all Scikit-learn implementations in the sklearn.svm module rely on one or the other, making Python just a convenient wrapper. (The LogisticRegression class can also use LIBLINEAR, by the way, through its 'liblinear' solver; the Perceptron class, instead, shares the SGD-based implementation of SGDClassifier.)
On the other hand, SGDClassifier and SGDRegressor use a different implementation, since neither LIBSVM nor LIBLINEAR has an online implementation; both are batch learning tools. Moreover, the LIBSVM-based classes (SVC, NuSVC, OneClassSVM, SVR, and NuSVR) perform best when allocated suitable memory for kernel operations via the cache_size parameter, expressed in megabytes. LinearSVC, based on LIBLINEAR, computes no kernel and has no such parameter.
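To make the batch versus online distinction concrete, here is a minimal sketch, assuming a recent Scikit-learn and a synthetic dataset from make_classification (chosen only for illustration). SGDClassifier is updated one mini-batch at a time through partial_fit, while SVC has no partial_fit and must see all the data in a single fit call; the cache_size value of 500 MB is an arbitrary example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# Hypothetical data, used only to illustrate the two learning styles
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
classes = np.unique(y)

# Online learning: SGDClassifier (hinge loss = linear SVM) is updated
# one mini-batch of 100 examples at a time
online_clf = SGDClassifier(loss='hinge', random_state=0)
for start in range(0, X.shape[0], 100):
    batch = slice(start, start + 100)
    # classes is required on the first call; repeating it later is harmless
    online_clf.partial_fit(X[batch], y[batch], classes=classes)

# Batch learning: SVC (built on LIBSVM) must see the whole dataset at once;
# cache_size, in megabytes, is the memory reserved for caching kernel values
batch_clf = SVC(kernel='rbf', cache_size=500)
batch_clf.fit(X, y)

print('SVC has partial_fit:', hasattr(batch_clf, 'partial_fit'))
print('online accuracy: %0.3f' % online_clf.score(X, y))
print('batch accuracy: %0.3f' % batch_clf.score(X, y))
```

A larger cache_size mainly pays off on larger datasets, where recomputing kernel values would otherwise dominate training time.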
The implementations for classification are as follows:
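| Class | Purpose | Hyperparameters |
|---|---|---|
| sklearn.svm.SVC | The LIBSVM implementation for binary and multiclass classification | C, kernel, degree, gamma |
| sklearn.svm.NuSVC | Same as above | nu, kernel, degree, gamma |
| sklearn.svm.OneClassSVM | Unsupervised detection of outliers | nu, kernel, degree, gamma |
| sklearn.svm.LinearSVC | Based on LIBLINEAR, it is a binary and multiclass linear classifier | penalty, loss, C |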
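The classification classes in the table share the usual fit/predict interface. The following sketch, assuming a recent Scikit-learn and the Iris data, fits each of them with illustrative (not tuned) hyperparameters; note that OneClassSVM is fitted on the features only and labels each point +1 (inlier) or -1 (outlier).

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, NuSVC, LinearSVC, OneClassSVM

X, y = load_iris(return_X_y=True)

# Supervised classifiers: same interface, different hyperparameters
classifiers = {
    'SVC': SVC(C=1.0, kernel='rbf', gamma='scale'),
    'NuSVC': NuSVC(nu=0.1, kernel='rbf', gamma='scale'),
    'LinearSVC': LinearSVC(C=1.0, penalty='l2', loss='squared_hinge',
                           dual=False),
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, 'training accuracy: %0.3f' % clf.score(X, y))

# OneClassSVM is unsupervised: fitted on X only, it predicts
# +1 for inliers and -1 for outliers
detector = OneClassSVM(nu=0.05, kernel='rbf', gamma='scale')
labels = detector.fit(X).predict(X)
print('flagged as outliers:', (labels == -1).sum())
```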
As for regression, the solutions are as follows:
| Class | Purpose | Hyperparameters |
|---|---|---|
| sklearn.svm.SVR | The LIBSVM implementation for regression | C, kernel, degree, gamma, epsilon |
| sklearn.svm.NuSVR | Same as above | nu, C, kernel, degree, gamma |
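The practical difference between the two regressors is which quantity you fix. In the sketch below, which uses hypothetical noisy sine data and arbitrary hyperparameters, SVR is given the width of the no-penalty tube directly through epsilon, whereas NuSVR is given nu and works out the tube width by itself.

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

# Hypothetical 1-D data: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# SVR: epsilon sets the tube width around the targets directly
svr = SVR(kernel='rbf', C=10.0, gamma=1.0, epsilon=0.1).fit(X, y)

# NuSVR: nu sets a lower bound on the fraction of support vectors
# (and an upper bound on the fraction of points outside the tube);
# the tube width is then determined during training
nusvr = NuSVR(kernel='rbf', C=10.0, gamma=1.0, nu=0.5).fit(X, y)

for name, model in [('SVR', svr), ('NuSVR', nusvr)]:
    print(name, 'support vectors:', len(model.support_),
          'R^2: %0.3f' % model.score(X, y))
```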
Fast SVM Implementations
As you can see, there are quite a few hyperparameters to tune for each version, which makes SVMs good learners with default parameters and excellent ones when properly tuned by cross-validation, using GridSearchCV (found in the grid_search module of older Scikit-learn versions, and in the model_selection module since version 0.18).
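As a minimal sketch of such tuning, assuming a recent Scikit-learn where GridSearchCV is imported from sklearn.model_selection, the following searches C and gamma of an RBF SVC over the np.logspace(-3, 3, 7) ranges suggested in this section. The Iris data and cv=5 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 7 x 7 = 49 combinations, each evaluated with 5-fold cross-validation
param_grid = {'C': np.logspace(-3, 3, 7),
              'gamma': np.logspace(-3, 3, 7)}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5,
                      scoring='accuracy')
search.fit(X, y)
print(search.best_params_, 'CV accuracy: %0.3f' % search.best_score_)
```

Since the grid grows multiplicatively, it is usually wiser to fix the most influential parameters first, as the rule of thumb in this section suggests, rather than to search every parameter at once.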
As a rule of thumb, some parameters influence the result more and should therefore be fixed first, since the best values of the others depend on them. According to this empirical rule, you have to correctly set the following parameters (in order of importance):
• C: This is the penalty value that we discussed before. Decreasing it makes the margin larger, thus ignoring more noise but also requiring more computations. A good value can usually be searched for in the np.logspace(-3, 3, 7) range.
• kernel: This is the non-linearity workhorse, because an SVM can be set to linear, poly, rbf, sigmoid, or a custom kernel (for experts!). The most widely used one is certainly rbf.
• degree: This works with kernel='poly', setting the degree of the polynomial expansion. It is ignored by other kernels. Usually, values from 2 to 5 work best.
• gamma: This is a coefficient for 'rbf', 'poly', and 'sigmoid'; higher values make the model fit the training data more closely, at the risk of overfitting. The suggested grid search range is np.logspace(-3, 3, 7).
• nu: This is used for classification and regression with NuSVC and NuSVR (and also by OneClassSVM); this parameter approximates the fraction of training points that are not classified with confidence, that is, misclassified points and correctly classified points inside or on the margin (formally, it is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors). It should be a number in the range (0, 1], as it is a proportion relative to your training set. In the end, it acts like C, with higher proportions enlarging the margin.
• epsilon: This parameter specifies how much error an SVR will accept, by defining a tube of ±epsilon around the true value of each point within which no penalty is applied. The suggested search range is np.insert(np.logspace(-4, 2, 7),0,[0]).
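• penalty, loss, and dual: For LinearSVC, these parameters accept the ('l1','squared_hinge',False), ('l2','hinge',True), ('l2','squared_hinge',True), and ('l2','squared_hinge',False) combinations. The ('l2','hinge',True) combination is roughly equivalent to the SVC(kernel='linear') learner, although LinearSVC also penalizes the intercept and handles multiclass problems one-vs-rest (whereas SVC uses one-vs-one), so the results can differ slightly.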
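The following sketch, assuming a recent Scikit-learn and standardized Iris data, fits LinearSVC with each of the four accepted (penalty, loss, dual) combinations listed above and shows that an unsupported one, such as ('l1','hinge',True), is rejected with a ValueError when fit is called. The max_iter value is an illustrative choice to help convergence.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# (penalty, loss, dual) combinations accepted by LinearSVC
valid = [('l1', 'squared_hinge', False),
         ('l2', 'hinge', True),
         ('l2', 'squared_hinge', True),
         ('l2', 'squared_hinge', False)]
accuracies = {}
for penalty, loss, dual in valid:
    clf = LinearSVC(penalty=penalty, loss=loss, dual=dual, max_iter=10000)
    clf.fit(X, y)
    accuracies[(penalty, loss, dual)] = clf.score(X, y)
    print(penalty, loss, dual, 'accuracy: %0.3f' % clf.score(X, y))

# Any other combination is refused by LIBLINEAR's solver selection
rejected = False
try:
    LinearSVC(penalty='l1', loss='hinge', dual=True).fit(X, y)
except ValueError as err:
    rejected = True
    print('Rejected:', err)
```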
Chapter 3
First, we will load the Iris dataset:
In: from sklearn import datasets
iris = datasets.load_iris()
X_i, y_i = iris.data, iris.target
Then, we will fit an SVC with an RBF kernel (C and gamma were chosen on the basis of other known examples in Scikit-learn) and test the results using the cross_val_score function:
In: import numpy as np
from sklearn.svm import SVC
# cross_val_score lived in sklearn.cross_validation before version 0.18
from sklearn.model_selection import cross_val_score
h_class = SVC(kernel='rbf', C=1.0, gamma=0.7, random_state=101)
scores = cross_val_score(h_class, X_i, y_i, cv=20, scoring='accuracy')
print('Accuracy: %0.3f' % np.mean(scores))
Out: Accuracy: 0.969
The fitted model can tell you which of your training examples are support vectors, through the index stored in its support_ attribute:
In: h_class.fit(X_i, y_i)
print(h_class.support_)
Out: [ 13 14 15 22 24 41 44 50 52 56 60 62 63 66 68 70 72 76 77 83 84 85 98 100 106 110 114 117 118 119 121 123 126 127 129 131 133 134 138 141 146 149]
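Besides support_, a fitted SVC exposes the support vectors themselves and their count per class. This short sketch, assuming a recent Scikit-learn, refits the same model on Iris and checks that support_vectors_ contains exactly the training rows indexed by support_, while n_support_ splits them by class.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X_i, y_i = iris.data, iris.target
h_class = SVC(kernel='rbf', C=1.0, gamma=0.7).fit(X_i, y_i)

# support_: row indices of the support vectors in the training set
# support_vectors_: the corresponding feature rows
# n_support_: number of support vectors for each class
same_rows = (X_i[h_class.support_] == h_class.support_vectors_).all()
print('support_vectors_ matches X_i[support_]:', same_rows)
print('support vectors per class:', h_class.n_support_)
```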
Here is a graphical representation of the support vectors selected by the SVC for the Iris dataset, with the decision boundaries shown as colored regions (we predicted the class of every point on a fine, discrete grid of values, so that each area of the chart shows the class the model would assign there):
If you are interested in replicating the same charts, you can have a look at and tweak this code snippet from http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html.
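The grid-prediction idea behind such charts can be sketched without any plotting. Assuming a recent Scikit-learn, the model below is trained on only the first two Iris features (so that the plane can be gridded), and a class is predicted for each point of a regular grid with a step of 0.02; the step and the one-unit padding around the data are illustrative choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X2 = iris.data[:, :2]  # two features only, so the plane can be gridded
y = iris.target
model = SVC(kernel='rbf', C=1.0, gamma=0.7).fit(X2, y)

# Regular grid covering the data range, padded by one unit on each side
step = 0.02
x_min, x_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
y_min, y_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                     np.arange(y_min, y_max, step))

# Predict a class for every grid point; this array is what gets colored
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
print(Z.shape, np.unique(Z))
```

With matplotlib available, plt.contourf(xx, yy, Z) would draw the colored regions, and model.support_vectors_ could be scattered on top of them.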
To test an SVM regressor, we decided to try SVR with the Boston dataset. First, we load the dataset into memory and then we randomize the ordering of the examples because, notably, this dataset is ordered in a subtle way, which would make the results of cross-validation without shuffling invalid:
In: import numpy as np
# load_boston was removed in Scikit-learn 1.2: this requires an older version
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
boston = load_boston()
# Shuffle the rows, then standardize the features
shuffled = np.random.permutation(boston.target.size)
X_b = scaler.fit_transform(boston.data[shuffled,:])
y_b = boston.target[shuffled]
Because we used the permutation function from the random module of NumPy, you may obtain a differently shuffled dataset and, consequently, a slightly different cross-validated score from the following test. Moreover, since the features have different scales, it is good practice to standardize them so that they have zero mean and unit variance. Standardization is especially crucial when using SVMs with kernels.
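To make the shuffle reproducible and to check what standardization does, the sketch below seeds a NumPy RandomState (seed 101 is arbitrary) instead of calling np.random.permutation directly. Since load_boston is no longer available in current Scikit-learn releases, it uses a hypothetical dataset with three features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(101)  # fixed seed: same permutation on every run

# Hypothetical data: three features on very different scales
X = np.column_stack([rng.normal(0, 1, 200),
                     rng.normal(50, 10, 200),
                     rng.normal(1000, 300, 200)])
y = rng.normal(size=200)

# Shuffle rows and targets with the same permutation, then standardize
shuffled = rng.permutation(y.size)
X_s = StandardScaler().fit_transform(X[shuffled, :])
y_s = y[shuffled]

print(np.round(X_s.mean(axis=0), 6))  # about 0 for every column
print(np.round(X_s.std(axis=0), 6))   # 1 for every column
```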
Finally, we can fit the SVR model (we chose C, gamma, and epsilon values that we know work fine) and evaluate it by cross-validation, using the mean squared error:
In: from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
h_regr = SVR(kernel='rbf', C=20.0, gamma=0.001, epsilon=1.0)
# The scorer returns the negated MSE (higher is better), hence abs() below
scores = cross_val_score(h_regr, X_b, y_b, cv=20,
                         scoring='neg_mean_squared_error')
print('Mean Squared Error: %0.3f' % abs(np.mean(scores)))
Out: Mean Squared Error: 28.087
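The mean squared error is expressed in squared target units; taking its square root gives the root mean squared error, in the same units as the target (for an MSE of 28.087, about 5.30). The sketch below shows the conversion, assuming a recent Scikit-learn; since load_boston was removed, it swaps in the diabetes dataset, and it lets KFold(shuffle=True) randomize the order inside cross-validation instead of permuting the rows by hand. Its numbers are therefore not comparable with the Boston result.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# load_diabetes stands in for the Boston data, removed in Scikit-learn 1.2
X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# KFold with shuffle=True randomizes the example order inside the CV
cv = KFold(n_splits=10, shuffle=True, random_state=101)
h_regr = SVR(kernel='rbf', C=20.0, gamma=0.001, epsilon=1.0)
neg_mse = cross_val_score(h_regr, X, y, cv=cv,
                          scoring='neg_mean_squared_error')

mse = -np.mean(neg_mse)   # scores are negated so that higher is better
rmse = np.sqrt(mse)       # back to the units of the target
print('MSE: %0.3f  RMSE: %0.3f' % (mse, rmse))
```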