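SVM is widely used for binary classification, and some of its variants are applied to anomaly detection. This post covers one such anomaly detection algorithm, OCSVM (One-Class SVM).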
When:
In this case the origin lies together with the data, on the "data side" of the hyperplane.
$$ \begin{array}{c} \max_{w, \rho} \frac{\rho}{\|w\|} \\ \text{s.t. } w^{T} x_{i}-\rho \geq 0, \quad i \in\{1,2, \ldots, n\} \end{array} $$
When, that is:
To separate the data set from the origin, we solve the following quadratic program:
$$ \begin{array}{c} \min _{w, \zeta_{i}, \rho} \frac{1}{2}\|w\|^{2}+\frac{1}{\nu n} \sum_{i=1}^{n} \zeta_{i}-\rho \\ \text{s.t. } w^{T} \phi\left(x_{i}\right) \geq \rho-\zeta_{i}, \quad i=1, \ldots, n \\ \zeta_{i} \geq 0 \end{array} $$
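The parameter $\nu$ in the quadratic program trades off the slack penalty against the margin. In the standard analysis of this formulation it acts as an upper bound on the fraction of training points left outside the frontier and a lower bound on the fraction of support vectors. The sketch below checks this empirically with scikit-learn's `OneClassSVM`. The toy data, the seed, and the helper name `nu_fractions` are assumptions made for illustration, and because the solver is numerical the observed fractions only approximately respect the bounds.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical toy data: a single Gaussian blob (seed fixed only for reproducibility)
rng = np.random.RandomState(0)
X = 0.3 * rng.randn(200, 2)


def nu_fractions(X, nu, gamma=0.1):
    """Fit a One-Class SVM and return (fraction flagged as outliers,
    fraction of training points that are support vectors)."""
    clf = OneClassSVM(nu=nu, kernel="rbf", gamma=gamma).fit(X)
    frac_flagged = float(np.mean(clf.predict(X) == -1))
    frac_sv = len(clf.support_) / len(X)
    return frac_flagged, frac_sv


for nu in (0.05, 0.2, 0.5):
    frac_flagged, frac_sv = nu_fractions(X, nu)
    print(f"nu={nu:.2f}  flagged={frac_flagged:.3f}  support vectors={frac_sv:.3f}")
```

A larger $\nu$ therefore makes the model stricter: more training points end up outside the learned frontier.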
The resulting decision function is:
$$ f(x)=\operatorname{sign}\left(w^{T} \phi(x)-\rho\right)=\operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} K\left(x, x_{i}\right)-\rho\right) $$
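To make the kernel form of the decision function concrete, the sketch below recomputes $\sum_i \alpha_i K(x, x_i)-\rho$ by hand from a fitted scikit-learn model and compares it with `decision_function`. It assumes an RBF kernel; the toy data and the helper name `manual_decision` are illustrative only. scikit-learn may scale its dual coefficients differently from the $\alpha_i$ in the formula, but a common positive scaling of $\alpha_i$ and $\rho$ does not change the sign.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

# Hypothetical toy data (seed fixed only for reproducibility)
rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(100, 2)
X_new = rng.uniform(low=-2, high=2, size=(5, 2))

gamma = 0.1
clf = OneClassSVM(nu=0.5, kernel="rbf", gamma=gamma).fit(X_train)


def manual_decision(clf, X, gamma):
    """Evaluate sum_i alpha_i * K(x, x_i) - rho over the support vectors."""
    K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)  # shape (n_samples, n_sv)
    alpha = clf.dual_coef_.ravel()                        # one coefficient per support vector
    rho = -clf.intercept_[0]                              # scikit-learn stores -rho as intercept_
    return K @ alpha - rho


scores = manual_decision(clf, X_new, gamma)
print(scores)
print(clf.decision_function(X_new))  # should agree with the manual scores
```

Only the support vectors contribute to the sum, since every other training point has $\alpha_i = 0$.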
The code below is quoted from another author's article:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
from sklearn import svm
"""
Anomaly detection: generate data, and fit the model using scikit-learn OneClassSVM.
scikit-learn Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html
"""
# Generate train/test/abnormal data
X = 0.3 * np.random.randn(100, 2)
XX = 0.3 * np.random.randn(20, 2)
X_train = np.r_[X + 2, X - 2]
X_test = np.r_[XX + 2, XX - 2]
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
# fit the model
clf = svm.OneClassSVM(nu=0.5, kernel='rbf', gamma=0.1)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)  # +1 for inliers, -1 for outliers
y_pred_test = clf.predict(X_test)  # +1 for inliers, -1 for outliers
y_pred_outliers = clf.predict(X_outliers)
# count errors: regular points flagged as outliers, and outliers accepted as regular
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
"""
Visualization of the result.
"""
xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
# plot the line, the points, and the nearest vectors to the plane
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(10,6))
plt.title('Novelty Detection')
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
s = 40
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=s, edgecolors='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='blueviolet', s=s,
                 edgecolors='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=s,
                edgecolors='k')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
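# ContourSet.collections is not available in recent matplotlib releases;
# legend_elements() returns the line handles of the contour instead
plt.legend([a.legend_elements()[0][0], b1, b2, c],
           ['learned frontier', 'training observations',
            'new regular observations', 'new abnormal observations'],
           loc='upper left',
           prop=matplotlib.font_manager.FontProperties(size=11))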
plt.xlabel(
    'error train: %d/200 ; errors novel regular: %d/40 ; '
    'errors novel abnormal: %d/20'  # X_outliers holds 20 points, not 40
    % (n_error_train, n_error_test, n_error_outliers))
plt.show()