René's URL Explorer Experiment

Title: A Quick Tour of Anomaly Detection Algorithms (Python) · Issue #18 · aialgorithm/Blog · GitHub

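Description: 1. Introduction to Anomaly Detection. Anomaly detection uses data mining methods to find data points that are inconsistent with the distribution of the rest of the data set; it is also known as outlier detection. 1.1 When Anomaly Detection Applies. Anomaly detection algorithms suit scenarios with the following characteristics: (1) labels are missing or the classes are extremely imbalanced; (2) anomalous data differ substantially from the majority of the samples; (3) anomalous data make up a very small share of the overall sample. Common applications include: Finance: identifying "fraudulent users" from financial data, such as credit card application fraud, stolen-card transactions, and loan fraud; Security: judging fluctuations in traffic data and whether a system is under attack; E-commerce: from transaction...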

Opengraph URL: https://github.com/aialgorithm/Blog/issues/18

X: @github

direct link

Domain: github.com

Hey, it has json ld scripts:

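{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"A Quick Tour of Anomaly Detection Algorithms (Python)","articleBody":"# 1. Introduction to Anomaly Detection\r\n\r\nAnomaly detection uses data mining methods to find data points that are inconsistent with the distribution of the rest of the data set. It is also known as outlier detection.![](https://upload-images.jianshu.io/upload_images/11682271-9e73016e04ae093b.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n## 1.1 When Anomaly Detection Applies\r\nAnomaly detection algorithms suit scenarios with the following characteristics:\r\n(1) labels are missing or the classes are extremely imbalanced;\r\n(2) anomalous data differ substantially from the majority of the samples;\r\n(3) anomalous data make up a very small share of the overall sample. Common applications include:\r\n\r\nFinance: identifying \"fraudulent users\" from financial data, such as credit card application fraud, stolen-card transactions, and loan fraud;\r\nSecurity: judging fluctuations in traffic data and whether a system is under attack;\r\nE-commerce: identifying \"malicious buyers\" from transaction and other data, such as promotion abusers and malicious spam groups;\r\nEcological disaster early warning: predicting possible extreme weather from weather indicator data;\r\nMedical monitoring: finding anomalous readings in medical device data that may indicate disease.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-ffd5e7555dc99ff1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n## 1.2 Challenges in Anomaly Detection\r\nAnomaly detection is an active research area, but because anomalies are unknown in advance, heterogeneous, context-specific, and diverse, the field still faces many challenges:\r\n\r\n- 1) One of the hardest problems is achieving high recall. Because anomalies are very rare and heterogeneous, it is difficult to identify all of them.\r\n- 2) Improving the precision of an anomaly detection model usually requires deep integration of business-specific features; otherwise results are poor, and the model can easily become algorithmically biased against minority groups.\r\n\r\n\r\n\r\n# 2. Anomaly Detection Methods\r\n\r\nDepending on whether the training set contains anomalies, methods divide into outlier detection (the training set contains anomalies) and novelty detection (it does not); One-Class SVM is a representative novelty detection method.\r\n\r\nBy type of anomaly, anomaly detection divides into point anomaly detection (e.g., users with abnormal spending), contextual anomaly detection (e.g., time series anomalies), and group anomaly detection (e.g., anomalous groups of accounts).\r\n\r\nBy learning paradigm, anomaly detection divides into supervised anomaly detection, semi-supervised anomaly detection, and unsupervised anomaly detection. In practice, labeled anomalies are hard to collect, so real-world problems are often unlabeled and unsupervised anomaly detection is the most widely used.\r\n\r\nBy underlying idea, unsupervised anomaly detection algorithms fall roughly into the following categories:\r\n\r\n\r\n\r\n\r\n## 2.1 Clustering-Based Methods\r\nClustering-based anomaly detection usually relies on one of the following assumptions: 1) normal instances belong to a cluster, while anomalous instances belong to no cluster; 2) normal instances lie close to their nearest cluster centroid, while anomalous instances lie far from it; 3) normal instances belong to large, dense clusters, while anomalous instances belong to small or sparse clusters. After the data are assigned to clusters, the anomalies are the points that fall in small clusters, belong to no cluster, or lie far from a cluster center.\r\n\r\n- Treat points far from the cluster center as anomalies:\r\nMethods of this kind include SOM, K-means, expectation maximization (EM), and algorithms based on the semantic anomaly factor;\r\n\r\n- Treat points in small clusters as anomalies:\r\nA representative method is K-means clustering;\r\n\r\n- Treat points that belong to no cluster as anomalies:\r\nRepresentative methods are DBSCAN, ROCK, and SNN clustering.\r\n\r\n\r\n\r\n\r\n## 2.2 Statistical Methods\r\nStatistical methods assume that the data set follows some distribution (such as a normal, Poisson, or binomial distribution) or probabilistic model, and detect anomalies by judging whether a data point fits that distribution or model, that is, by flagging low-probability events. By type of probabilistic model they divide into: \r\n- 1) Parametric methods, which estimate model parameters from data with a known distribution (such as a Gaussian model) 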
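, the simplest of which assumes that the samples follow a univariate normal distribution and flags a data point as an anomaly when it lies more than two or three standard deviations from the mean;\r\n - 2) Non-parametric methods: when the distribution is unknown, build a histogram from the training set and flag data that fall outside its populated bins. Measures of dispersion (such as mean deviation, standard deviation, coefficient of variation, and interquartile range) can also be used to find anomalous points.\r\n\r\n## 2.3 Depth-Based Methods\r\nThese methods map the data into a layered structure in k-dimensional space and assume that anomalies lie on the periphery while normal points lie near the center of the layers (where depth is higher).\r\n- Half-space depth (the ISODEPTH method): compute the depth of each point and identify anomalies by their depth values.\r\n- Minimum volume ellipsoid (MVE) estimator: fit the smallest ellipsoid boundary (the solid ellipse in the figure) to the probability distribution of the majority of the data points (usually more than 50%); points outside this boundary are judged to be anomalies.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-884a5a9bed23eb05.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n-  Isolation forest: the time complexity of the two basic depth-based models above grows exponentially with the feature dimension k, so they are usually applied only when k ≤ 3. Isolation forest changes the way depth is computed and can therefore also handle high-dimensional data.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-3841345b16333d79.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nIsolation forest is an ensemble-based anomaly detection method with linear time complexity. It is also fairly accurate and fast on large data sets, so it is widely used in industry. The basic idea is to partition the sample space at random with tree models: dense clusters must be cut many times before cutting stops (that is, before each point sits alone in its own subspace), whereas sparsely distributed points (the anomalies) mostly end up isolated in a subspace very early. The algorithm proceeds as follows:\r\n1) Randomly select Ψ samples from the training data and use them to train a single tree.\r\n\r\n2) Randomly pick a dimension (attribute) q and randomly generate a split point p for the data in the current node. The split point p lies between the minimum and maximum values of dimension q in the current node's data.\r\n\r\n3) The split point defines a hyperplane that divides the current node's data space into two subspaces: points whose value in the chosen dimension is less than p go to the left branch, and points whose value is greater than or equal to p go to the right branch;\r\n\r\n4) Recursively apply steps 2 and 3 to the left and right branches, creating new child nodes until a leaf holds a single data point (and cannot be split further) or the tree reaches the configured height limit.\r\n(A maximum height is set for each tree because anomalous records are few and have short path lengths, and we only need to separate normal records from anomalous ones. It is therefore enough to look at the part of the tree below the average height, which makes the algorithm more efficient.)\r\n\r\n5) Because each tree partitions the feature space completely at random, an ensemble is needed for the result to converge: build several trees and average their results. For each sample x, the overall anomaly score s is computed with the formula below.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-0fe5e0f3432011fc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\nh(x) is the path length of x in each tree, and c(Ψ) is the average path length for a given sample size Ψ, used to normalize the path length h(x) of sample x.\r\n\r\n\r\n## 2.4 Classification-Based Methods\r\nThe representative method is One-Class SVM. It looks for a hyperplane that encloses the positive examples in the sample, and prediction uses this hyperplane as the decision boundary: samples inside the enclosed region are considered positive. Because kernel computation is time-consuming, it is seldom used on very large data sets.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-232364f85fc0c728.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n\r\n\r\n\r\n## 2.5 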
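Proximity-Based Methods\r\nThe underlying assumption is that normal instances lie in dense neighborhoods, while anomalous instances have few neighbors nearby. These methods subdivide into density-based and distance-based approaches:\r\n- Density-based: compute the density of each region of the data set and treat low-density regions as outlier regions. The classic method is the local outlier factor (LOF). Unlike the traditional binary definition of an outlier, LOF defines outliers locally: each data point is assigned an LOF value relative to its neighborhood, and the larger the LOF, the lower the point's density relative to its neighbors and the more likely it is an anomaly. However, the minimum neighborhood size is hard to choose in LOF, and computational and time complexity increase as the data dimension grows.\r\n\r\n-  Distance-based: detect anomalies by computing and comparing the distance between a data point and its set of nearest neighbors. Normal points are similar to their neighbors, while anomalous points differ from theirs.\r\n\r\n\r\n\r\n## 2.6 Deviation-Based Methods\r\nGiven a data set, deviation-based methods find the points that do not fit the characteristics of the data set as a whole; the variance of the data set decreases as these anomalies are removed. The approach divides into sequential exception techniques, which compare data points one by one, and OLAP data cube techniques. It currently sees little practical use.\r\n\r\n## 2.7 Reconstruction-Based Methods\r\nThe representative method is PCA. There are broadly two ways to use PCA for anomaly detection. One maps the data into a low-dimensional feature space and then examines how far each data point deviates from the rest along the different dimensions of that space. The other maps the data into a low-dimensional feature space, maps it back to the original space to reconstruct the original data from the low-dimensional features, and looks at the size of the reconstruction error.\r\n\r\n\r\n## 2.8 Neural-Network-Based Methods\r\n\r\n\r\nRepresentative methods include the autoencoder (AE) and long short-term memory (LSTM) networks.\r\n- LSTM can be used for anomaly detection on time series data: train the model on historical sequences and flag points that differ substantially from the predicted values.\r\n- Anomaly detection with an autoencoder\r\nAn autoencoder is essentially a neural network that produces a low-dimensional representation of a high-dimensional input. It resembles principal component analysis (PCA), but with nonlinear activation functions it overcomes PCA's linearity limitation.\r\nThe basic assumption is that anomalies follow a different distribution. An autoencoder trained on normal data can reconstruct normal samples, but it cannot reconstruct points that deviate from the normal distribution well, so their reconstruction error is large. A point is flagged as an anomaly when its reconstruction error exceeds a threshold.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-e93634a401495610.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n##### Summary: the key elements of unsupervised anomaly detection are selecting relevant features and choosing an algorithm whose assumptions fit the data; getting both right yields better detection results.\r\n\r\n# 4. Hands-On Project: Credit Card Fraud Detection\r\nThe project is the classic credit card fraud detection task on Kaggle. The data set is of high quality, and the ratio of positive to negative samples is extremely imbalanced. Here we mainly use unsupervised novelty detection with an autoencoder and identify fraudulent samples by their reconstruction error.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-d507c8d8fab44da3.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n```\r\n#!/usr/bin/env python\r\n# coding: utf-8\r\n\r\nimport warnings\r\nwarnings.filterwarnings(\"ignore\")\r\n\r\nimport pandas as pd\r\nimport numpy as np\r\nimport pickle\r\nimport matplotlib.pyplot as plt\r\nplt.style.use('seaborn')\r\nimport tensorflow as tf\r\nimport seaborn as sns\r\nfrom sklearn.model_selection import train_test_split\r\nfrom keras.models import Model, load_model\r\nfrom keras.layers import Input, Dense\r\nfrom keras.callbacks import ModelCheckpoint\r\nfrom keras import 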
regularizers\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.metrics import roc_curve, auc, precision_recall_curve\r\n# Recommended: PyOD, a Python library for anomaly detection https://github.com/yzhao062/Pyod\r\n\r\n# Load the data. Credit card fraud data set: https://www.kaggle.com/mlg-ulb/creditcardfraud\r\nd = pd.read_csv('creditcard.csv')\r\n\r\n# Check the class balance\r\nnum_nonfraud = np.sum(d['Class'] == 0)\r\nnum_fraud = np.sum(d['Class'] == 1)\r\nplt.bar(['Fraud', 'non-fraud'], [num_fraud, num_nonfraud], color='dodgerblue')\r\nplt.show()\r\n\r\n# Drop the Time column and standardize Amount\r\ndata = d.drop(['Time'], axis=1)\r\ndata['Amount'] = StandardScaler().fit_transform(data[['Amount']])\r\n\r\n# For unsupervised novelty detection, take only the negative (non-fraud) samples and split them 8:2 into training and test sets\r\nmask = (data['Class'] == 0)\r\nX_train, X_test = train_test_split(data[mask], test_size=0.2, random_state=0)\r\nX_train = X_train.drop(['Class'], axis=1).values\r\nX_test = X_test.drop(['Class'], axis=1).values\r\n\r\n# Take all positive (fraud) samples as part of the test set\r\nX_fraud = data[~mask].drop(['Class'], axis=1).values\r\n\r\n# Build the autoencoder network\r\n# Layer sizes: 16, 8, 8, then an output layer of input_dim units\r\n# 5 epochs, batch size 32\r\ninput_dim = X_train.shape[1]\r\nencoding_dim = 16\r\nnum_epoch = 5\r\nbatch_size = 32\r\n\r\ninput_layer = Input(shape=(input_dim, ))\r\nencoder = Dense(encoding_dim, activation=\"tanh\", \r\n                activity_regularizer=regularizers.l1(10e-5))(input_layer)\r\nencoder = Dense(int(encoding_dim / 2), activation=\"relu\")(encoder)\r\ndecoder = Dense(int(encoding_dim / 2), activation='tanh')(encoder)\r\ndecoder = Dense(input_dim, activation='relu')(decoder)\r\nautoencoder = Model(inputs=input_layer, outputs=decoder)\r\nautoencoder.compile(optimizer='adam', \r\n                    loss='mean_squared_error', \r\n                    metrics=['mae'])\r\n\r\n# Save the best model to model.h5 and start training\r\ncheckpointer = ModelCheckpoint(filepath=\"model.h5\",\r\n                               verbose=0,\r\n                               save_best_only=True)\r\nhistory = autoencoder.fit(X_train, X_train,\r\n                          epochs=num_epoch,\r\n          
                batch_size=batch_size,\r\n                          shuffle=True,\r\n                          validation_data=(X_test, X_test),\r\n                          verbose=1, \r\n                          callbacks=[checkpointer]).history\r\n\r\n\r\n# Plot the loss curves\r\nplt.figure(figsize=(14, 5))\r\nplt.subplot(121)\r\nplt.plot(history['loss'], c='dodgerblue', lw=3)\r\nplt.plot(history['val_loss'], c='coral', lw=3)\r\nplt.title('model loss')\r\nplt.ylabel('mse'); plt.xlabel('epoch')\r\nplt.legend(['train', 'test'], loc='upper right')\r\n\r\nplt.subplot(122)\r\nplt.plot(history['mae'], c='dodgerblue', lw=3)\r\nplt.plot(history['val_mae'], c='coral', lw=3)\r\nplt.title('model mae')\r\nplt.ylabel('mae'); plt.xlabel('epoch')\r\nplt.legend(['train', 'test'], loc='upper right')\r\n\r\n\r\n# Load the saved model\r\nautoencoder = load_model('model.h5')\r\n\r\n# Reconstruct the test set with the autoencoder\r\npred_test = autoencoder.predict(X_test)\r\n# Reconstruct the fraud samples\r\npred_fraud = autoencoder.predict(X_fraud)  \r\n\r\n# Compute the reconstruction MSE and MAE\r\nmse_test = np.mean(np.power(X_test - pred_test, 2), axis=1)\r\nmse_fraud = np.mean(np.power(X_fraud - pred_fraud, 2), axis=1)\r\nmae_test = np.mean(np.abs(X_test - pred_test), axis=1)\r\nmae_fraud = np.mean(np.abs(X_fraud - pred_fraud), axis=1)\r\nmse_df = pd.DataFrame()\r\nmse_df['Class'] = [0] * len(mse_test) + [1] * len(mse_fraud)\r\nmse_df['MSE'] = np.hstack([mse_test, mse_fraud])\r\nmse_df['MAE'] = np.hstack([mae_test, mae_fraud])\r\nmse_df = mse_df.sample(frac=1).reset_index(drop=True)\r\n\r\n# Plot the reconstruction MAE and MSE of the positive and negative test samples\r\nmarkers = ['o', '^']\r\ncolors = ['dodgerblue', 'coral']\r\nlabels = ['Non-fraud', 'Fraud']\r\n\r\nplt.figure(figsize=(14, 5))\r\nplt.subplot(121)\r\nfor flag in [1, 0]:\r\n    temp = mse_df[mse_df['Class'] == flag]\r\n    plt.scatter(temp.index, \r\n                temp['MAE'],  \r\n                alpha=0.7, \r\n                marker=markers[flag], \r\n                c=colors[flag], \r\n                
label=labels[flag])\r\nplt.title('Reconstruction MAE')\r\nplt.ylabel('Reconstruction MAE'); plt.xlabel('Index')\r\nplt.subplot(122)\r\nfor flag in [1, 0]:\r\n    temp = mse_df[mse_df['Class'] == flag]\r\n    plt.scatter(temp.index, \r\n                temp['MSE'],  \r\n                alpha=0.7, \r\n                marker=markers[flag], \r\n                c=colors[flag], \r\n                label=labels[flag])\r\nplt.legend(loc=[1, 0], fontsize=12); plt.title('Reconstruction MSE')\r\nplt.ylabel('Reconstruction MSE'); plt.xlabel('Index')\r\nplt.show()\r\n# The plots show the MAE and MSE reconstruction errors. Orange points are fraud, i.e., anomalies; blue points are normal. The reconstruction error of the anomalies is clearly higher overall.\r\n\r\n# Plot the precision-recall curves\r\nplt.figure(figsize=(14, 6))\r\nfor i, metric in enumerate(['MAE', 'MSE']):\r\n    plt.subplot(1, 2, i+1)\r\n    precision, recall, _ = precision_recall_curve(mse_df['Class'], mse_df[metric])\r\n    pr_auc = auc(recall, precision)\r\n    plt.title('Precision-Recall curve based on %s\\nAUC = %0.2f'%(metric, pr_auc))\r\n    plt.plot(recall[:-2], precision[:-2], c='coral', lw=4)\r\n    plt.xlabel('Recall'); plt.ylabel('Precision')\r\nplt.show()\r\n\r\n# Plot the ROC curves\r\nplt.figure(figsize=(14, 6))\r\nfor i, metric in enumerate(['MAE', 'MSE']):\r\n    plt.subplot(1, 2, i+1)\r\n    fpr, tpr, _ = roc_curve(mse_df['Class'], mse_df[metric])\r\n    roc_auc = auc(fpr, tpr)\r\n    plt.title('Receiver Operating Characteristic based on %s\\nAUC = %0.2f'%(metric, roc_auc))\r\n    plt.plot(fpr, tpr, c='coral', lw=4)\r\n    plt.plot([0,1],[0,1], c='dodgerblue', ls='--')\r\n    plt.ylabel('TPR'); plt.xlabel('FPR')\r\nplt.show()\r\n# With either MAE or MSE as the score, the model performs quite well: PR AUC is 0.51 and 0.44 respectively, and ROC AUC reaches 0.95 for both.\r\n\r\n# Scatter plot of MSE against MAE\r\nmarkers = ['o', '^']\r\ncolors = ['dodgerblue', 'coral']\r\nlabels = ['Non-fraud', 'Fraud']\r\n\r\nplt.figure(figsize=(10, 5))\r\nfor flag in [1, 0]:\r\n    temp = mse_df[mse_df['Class'] == flag]\r\n    plt.scatter(temp['MAE'], \r\n                temp['MSE'],  \r\n                alpha=0.7, \r\n                
marker=markers[flag], \r\n                c=colors[flag], \r\n                label=labels[flag])\r\nplt.legend(loc=[1, 0])\r\nplt.ylabel('Reconstruction MSE'); plt.xlabel('Reconstruction MAE')\r\nplt.show()\r\n\r\n```\r\n\r\n---\r\nThis article was first published on the WeChat official account 算法进阶; the account's \"Read the original\" link leads to the [GitHub project source](https://github.com/aialgorithm/Blog)","author":{"url":"https://github.com/aialgorithm","@type":"Person","name":"aialgorithm"},"datePublished":"2021-07-29T08:04:27.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/aialgorithm/Blog/issues/18"}
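
The isolation forest steps in section 2.3 of the article body above can be sketched in plain Python. This is a minimal illustration, not the article's code: the score formula in the article is an image that did not survive extraction, so the sketch assumes the standard formulation s(x, Ψ) = 2^(-E[h(x)] / c(Ψ)) with c(Ψ) = 2H(Ψ-1) - 2(Ψ-1)/Ψ and H(i) ≈ ln(i) + 0.5772. The function names (`build_tree`, `path_length`, `anomaly_score`), the toy data, and the parameter values are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used to approximate H(i)


def c(n):
    """Average path length c(n) that normalizes h(x); defined as 0 for n <= 1."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Steps 2-4: split on a random attribute q at a random point p, recursively."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}          # leaf: isolated or height limit hit
    q = rng.randrange(len(points[0]))         # random attribute
    lo = min(pt[q] for pt in points)
    hi = max(pt[q] for pt in points)
    if lo == hi:                              # cannot split identical values
        return {"size": len(points)}
    p = rng.uniform(lo, hi)                   # random split point in [lo, hi]
    left = [pt for pt in points if pt[q] < p]
    right = [pt for pt in points if pt[q] >= p]
    return {"q": q, "p": p,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """h(x): edges traversed, plus c(size) for points left unsplit in the leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, n_trees=100, psi=64, seed=0):
    """Steps 1 and 5: train each tree on psi randomly chosen samples."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(psi))
    return [build_tree(rng.sample(data, psi), 0, max_depth, rng)
            for _ in range(n_trees)]


def anomaly_score(x, forest, psi=64):
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
    mean_h = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_h / c(psi))


# Toy data (an assumption for illustration): a Gaussian cluster plus one far point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
data.append((8.0, 8.0))

forest = build_forest(data)
s_inlier = anomaly_score((0.0, 0.0), forest)
s_outlier = anomaly_score((8.0, 8.0), forest)
print(f"inlier score:  {s_inlier:.3f}")
print(f"outlier score: {s_outlier:.3f}")
```

The far point is isolated after only a few splits, so its average path length is short and its score is higher than that of a point in the middle of the cluster. For real workloads a maintained implementation such as scikit-learn's `IsolationForest` is the more sensible choice.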

Links:

A Quick Tour of Anomaly Detection Algorithms (Python)	https://github.com/aialgorithm/Blog/issues/18#top
aialgorithm	https://github.com/aialgorithm
on Jul 29, 2021	https://github.com/aialgorithm/Blog/issues/18#issue-955573322
GitHub project source	https://github.com/aialgorithm/Blog

Viewport: width=device-width