René's URL Explorer Experiment

Title: What to do when machine learning data is not identically distributed? · Issue #63 · aialgorithm/Blog · GitHub

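Description: As a science, machine learning cannot escape the fact that science itself generalizes from experience by induction, so there will always be cases where past experience does not hold in the future (science must be falsifiable). A short story from the philosopher Russell fits well here: a flock of turkeys lives on a farm, and the farmer comes to feed them every day. After long observation, one turkey (the scientist among the turkeys) concludes, “Every morning when the farmer comes to the coop, I get food,” and every day after that confirms its conclusion. But one day the farmer comes to the coop without food and roasts it instead, because it is Christmas and it becomes the Christmas turkey. Drawing a seemingly correct general rule from limited observation ends like...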

Opengraph URL: https://github.com/aialgorithm/Blog/issues/63

X: @github

direct link

Domain: github.com

Hey, it has json ld scripts:

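{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"What to do when machine learning data is not identically distributed?","articleBody":"As a science, machine learning cannot escape the fact that **science** itself generalizes from experience by induction, so there will always be cases where past experience does not hold in the future (science must be falsifiable). A short story from the philosopher Russell fits well here:\r\n\u003e![](https://upload-images.jianshu.io/upload_images/11682271-da1521a81a968c00.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\nA flock of turkeys lives on a farm, and the farmer comes to feed them every day. After long observation, one turkey (the scientist among the turkeys) concludes: “Every morning when the farmer comes to the coop, I get food.” Every day after that confirms its conclusion.\r\nBut one day the farmer comes to the coop without food and roasts it instead, because it is Christmas and it becomes the Christmas turkey.![](https://upload-images.jianshu.io/upload_images/11682271-7c4359edbd868697.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nSuch is the fate of drawing a seemingly correct general rule from limited observation. From this angle, applications of AI and machine learning offer many similar examples.\r\n\r\nMachine learning is the science of using computers to simulate or implement human learning. It generalizes from experience under a set of assumptions (most basically, the independent and identically distributed assumption) and then makes predictions.\r\n\r\nInevitably, the data seen at prediction time may not follow the same distribution as the training data, so **the experience in historical data no longer applies so well!** Prediction quality degrades or the model fails outright. It is like sitting an exam and finding a type of question you have never seen before: you are stuck...\r\n\r\n\r\n\r\n### 1. What it means for data not to be identically distributed\r\n\r\nThe problem of prediction data and training data not following the same distribution is known as dataset shift, and it is an important problem in machine learning.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-98328fa7b19c98ee.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nBy the product rule of probability (the identity behind Bayes' theorem), P(y,x) = P(y|x) * P(x) = P(x|y) * P(y). When any one of the marginal distribution of the input space P(x), the label distribution of the output space P(y), or the conditional distribution P(y|x) that represents the learning task shifts, so that the joint distribution P(y,x) differs between training data and prediction data, we have dataset shift.\r\n\r\nThe different factors correspond to the following three kinds of shift:\r\n- Covariate shift (a covariate in statistics is what machine learning calls a feature): a shift caused by a change in the marginal distribution of the input space P(x), that is, in the distribution of the input features x. This is probably the most common kind. For example, in an image recognition task the training face images have no masks, while at prediction time many masked faces appear. Another example is fraud detection, where fraudsters upgrade their behavior so that it differs from the behavioral features in the training data.\r\n\r\n- Prior probability shift: a shift caused by a difference in the label distribution P(y). For example, in fraud detection, the share of fraudulent users online suddenly becomes much larger than in the training data for some period.\r\n\r\n- Concept shift: a change in the distribution P(y|x), that is, in the mapping x-\u003e y. Take the farm turkey: originally x is [morning / farmer / comes to / the coop] and y is [turkey gets fed], but on Christmas Day the relationship suddenly changes: x is still [morning / farmer / comes to / the coop] but y is now [turkey gets roasted].. haha, drooling in sympathy..\r\n\r\n### 2. Why data is not identically distributed\r\n\r\nTwo common causes of data not being identically distributed are:\r\n\r\n- (1) Sample selection bias: the distributions differ because the training data was collected in a biased way.\r\n\r\nFor example, credit customers in finance are acquired through certain channels and rules; later we add new marketing channels or relax the customer admission rules. The actual customer population then becomes more diverse than the population at training time (a distribution difference).\r\n\r\n- (2) Non-stationary environments: the training and test environments differ because of changes over time or space.\r\n\r\nFor example, in finance, take the task of predicting whether a user will repay a loan. A small group of users can repay their debts when the economy is good, but when the pandemic or other factors weaken the macroeconomy, they no longer can.\r\n\r\n\r\n\r\n### 3. How to detect whether data is identically distributed\r\n\r\nA model may perform well on the training, validation and test sets, yet degrade on OOT (out-of-time) samples or in online prediction. In that case we cannot simply blame overfitting caused by model complexity; a change in the distribution of the prediction data may also be the cause. The following common methods check whether the distribution has changed:\r\n\r\n#### 3.1 Statistical metrics\r\nDetecting distribution differences with statistical metrics is the most direct approach. We usually use the Population Stability Index (PSI) to measure whether the score distribution of future samples (such as the test set) stays consistent with that of the training samples, and thereby assess the stability of the data and model (as a rule of thumb, PSI\u003c0.1 indicates a small distribution difference).![](https://upload-images.jianshu.io/upload_images/11682271-29e3b303b08b14d0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\nLikewise, PSI can be applied to individual feature values to assess stability at the feature level. The formula is PSI = SUM over score bins of ( (actual share - expected share) * ln(actual share / expected share) ). For an introduction see: [Metrics](http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247486137\u0026idx=1\u0026sn=abbc4f6c241a812dcfec690ed3431a25\u0026scene=19#wechat_redirect). Other methods such as the KS test and KDE (kernel density estimation) plots are covered in reference [2].\r\n\r\n\r\n#### 3.2 Anomaly (novelty) detection\r\nTrain a model (such as a one-class SVM) on the training set and use it to judge which samples differ from the training distribution (anomaly probability). For anomaly detection methods see: [Overview of anomaly detection algorithms](http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247485161\u0026idx=1\u0026sn=379c206b399e7ed11a8b18016a3c3cc2\u0026scene=19#wechat_redirect)\r\n![](https://upload-images.jianshu.io/upload_images/11682271-928b6d1b25e7599f.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n#### 3.3 Classification\r\nMix the training data and the test data (when test data is available), label them '1' and '0' respectively, and train a classifier. If a model can separate training instances from test instances with good accuracy, the feature distributions of the two sets differ substantially and covariate shift is present.\r\n\r\nAccordingly, the features that contribute most to this classifier are the features whose distributions differ most, and the samples it classifies most accurately (the easy samples) are the ones that deviate most.\r\n\r\n### 4. How to deal with data that is not identically distributed\r\n\r\n#### 4.1 Add more data    \r\nMore data is the best remedy: if the training data is large enough to have seen every situation, performance on test data follows naturally.\r\n\r\nAs in the example above, if the scientist turkey on the farm had observed data over a full cycle and across all scenarios, or had been given some prior knowledge, it could have predicted its fate more accurately.\r\n\r\nIn reality, however, business constraints often mean that more data cannot be obtained. Methods such as federated learning and data augmentation follow the same idea.\r\n\r\n#### 4.2 Data augmentation\r\nWhen new data cannot be collected, data augmentation is a fallback: without substantively adding data, it derives more representations from the original data, improving its quantity and quality so as to approach the value of a larger dataset.\r\n\r\n![](https://upload-images.jianshu.io/upload_images/11682271-5c572a369390525a.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nIt works by injecting prior knowledge into the original data to produce more representations, which helps the model tell apart statistical noise in the data, strengthens learning of intrinsic features, reduces overfitting and improves generalization. See: [Data augmentation methods](https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247484169\u0026idx=1\u0026sn=54d58e40b31ee34c7fe3c44c096a4221\u0026scene=19#wechat_redirect)\r\n\r\n#### 4.3 Select data\r\n\r\nWe can train on data whose distribution is closer to that of the samples to be predicted, so that performance on those samples improves.\r\n\r\nThis looks somewhat opportunistic, but it is common in data competitions with volatile data, where training on the full dataset does not necessarily give the best result. Changing the random seed of the dataset split (for example, brute-force looping over seeds and comparing results), or manually selecting samples close to the online samples in business type and time for training (or giving those samples a higher learning weight), can improve predictions on online data.\r\n\r\n#### 4.4 Semi-supervised learning\r\n[Semi-supervised learning](https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247484300\u0026idx=1\u0026sn=ae608281023850efdb56a35abb92d907\u0026scene=19#wechat_redirect) sits between traditional supervised and unsupervised learning. The idea is to bring unlabeled samples directly into model training to capture the underlying distribution of the data as a whole, mitigating the blindness of traditional unsupervised learning and the poor results of supervised learning when labeled samples are scarce.\r\n\r\n ![](https://upload-images.jianshu.io/upload_images/11682271-69e65eeac7546c47.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nBecause semi-supervised training captures the overall underlying distribution, it can likewise ease the problem of shifted prediction data. A common practice in semi-supervised classification is to pick unlabeled samples by business meaning or with a model, assign each its most probable label (a pseudo-label), add them to the training data, and check whether performance on the samples to be predicted improves.\r\n\r\nA classic example is reject inference in consumer credit (reference [6]). Users whose loan applications were rejected have no \"defaulted or not\" label. We can use the existing credit default model (the application scorecard) to predict the default probability of these rejected users, and add those the model considers very likely to default to the training set as bad samples, to improve generalization.\r\n\r\n#### 4.5 Feature selection\r\nFor the common case of covariate shift, feature selection works well. We can analyze each feature's distribution stability (for example its PSI) and drop features whose distributions differ a lot. Note that this applies to features of moderate importance and poor stability. If important features also shift heavily, things get difficult, and it is better to go back and work on the data or find other strong features. For feature selection methods see: [Python feature selection](http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247483996\u0026idx=1\u0026sn=1659cedcc0268f2bee803e96eceabab5\u0026scene=19#wechat_redirect)\r\n![](https://upload-images.jianshu.io/upload_images/11682271-65f6a6fd87189b07.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n#### 4.6 Balanced learning\r\nBalanced learning applies to dataset shift caused by differences in the label distribution (prior probability shift). It can be summarized as: by some method, make samples of different classes contribute comparably to the loss (or gradient) during learning, removing the model's bias toward particular classes so that it learns a more fundamental decision rule.\r\n\r\nFor example, suppose the good-to-bad ratio in the original fraud training set is 1000:1, but at prediction time it is sometimes 10:1. Without balanced learning, a model learned directly from the training samples assumes a priori that fraud is very unlikely, so many fraudulent samples are missed.\r\n\r\n![](https://upload-images.jianshu.io/upload_images/11682271-4e613c0206b8012c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nFor imbalanced tasks, one can apply balanced learning through cost-sensitive learning, sampling and similar methods; one can also evaluate the model with suitable metrics (such as AUC) to reduce the influence of class imbalance. See: [Solving class imbalance in one article (complete)](https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==\u0026mid=2247487430\u0026idx=1\u0026sn=abb25dfb333c53634f435c101e1fb8dd\u0026scene=19#wechat_redirect)\r\n\r\nFinally, machine learning is a practice-oriented science: verify results in practice and keep exploring the principles. This article is dedicated to us data scientist turkeys.\r\n\r\n--------------\r\n\u003eReferences:\r\n1. Understanding dataset shift https://zhuanlan.zhihu.com/p/449101154\r\n2. [Summary of solutions for inconsistent train/test distributions](https://mp.weixin.qq.com/s?__biz=MzIyNjM2MzQyNg==\u0026mid=2247627421\u0026idx=1\u0026sn=bd04c00ba684e6fa2f4bb418e700fc60\u0026chksm=e87d23d0df0aaac613f84293b715b0661c3d72d667fbdf17c89a957c0cca8a29302216b3343b\u0026mpshare=1\u0026scene=1\u0026srcid=1012tyIXlyLmXqueq3g5qAi4\u0026sharer_sharetime=1665535882449\u0026sharer_shareid=154fc80c9534d9efd371a48cf67a483a\u0026version=4.0.16.6007\u0026platform=win#rd)\r\n3. Are there good ways to handle a large distribution gap between training and test sets? https://www.zhihu.com/question/265829982/answer/1770310534\r\n4. Data shift between training and test sets (dataset shift or drifting) https://zhuanlan.zhihu.com/p/304018288\r\n5. Dataset Shift\u0026Domain Shift https://zhuanlan.zhihu.com/p/195704051\r\n6. How to quantify the impact of sample bias on credit risk models? https://zhuanlan.zhihu.com/p/350616539","author":{"url":"https://github.com/aialgorithm","@type":"Person","name":"aialgorithm"},"datePublished":"2022-12-20T12:29:13.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/aialgorithm/Blog/issues/63"}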
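
The PSI formula quoted in section 3.1 of the post can be sketched in a few lines. This is a minimal illustration and not the author's code: the function name `psi`, the parameters `n_bins` and `eps`, the equal-width binning taken from the training scores, and the small floor for empty bins are all assumptions made for the example. Quantile bins are another common choice in practice.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples of scores.

    `expected` plays the role of the training scores, `actual` the new scores.
    Bin edges are equal-width over the range of `expected` (an assumption).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the training range land in an edge bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        total = len(values)
        # The eps floor keeps ln() finite when a bin is empty.
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # SUM over bins of (actual share - expected share) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical toy scores: identical samples versus a shifted sample.
train_scores = list(range(100))
shifted_scores = [v + 50 for v in train_scores]
psi_same = psi(train_scores, train_scores)
psi_shifted = psi(train_scores, shifted_scores)
```

The same function can be applied per feature instead of per model score, which is how section 4.5 of the post uses PSI to screen unstable features.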

Links:

What to do when machine learning data is not identically distributed?	https://github.com/aialgorithm/Blog/issues/63#top
aialgorithm	https://github.com/aialgorithm
on Dec 20, 2022	https://github.com/aialgorithm/Blog/issues/63#issue-1504485554
Metrics	http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247486137&idx=1&sn=abbc4f6c241a812dcfec690ed3431a25&scene=19#wechat_redirect
Overview of anomaly detection algorithms	http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247485161&idx=1&sn=379c206b399e7ed11a8b18016a3c3cc2&scene=19#wechat_redirect
Data augmentation methods	https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247484169&idx=1&sn=54d58e40b31ee34c7fe3c44c096a4221&scene=19#wechat_redirect
Semi-supervised learning	https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247484300&idx=1&sn=ae608281023850efdb56a35abb92d907&scene=19#wechat_redirect
Python feature selection	http://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247483996&idx=1&sn=1659cedcc0268f2bee803e96eceabab5&scene=19#wechat_redirect
Solving class imbalance in one article (complete)	https://mp.weixin.qq.com/s?__biz=MzI4MDE1NjExMQ==&mid=2247487430&idx=1&sn=abb25dfb333c53634f435c101e1fb8dd&scene=19#wechat_redirect
https://zhuanlan.zhihu.com/p/449101154	https://zhuanlan.zhihu.com/p/449101154
Summary of solutions for inconsistent train/test distributions	https://mp.weixin.qq.com/s?__biz=MzIyNjM2MzQyNg==&mid=2247627421&idx=1&sn=bd04c00ba684e6fa2f4bb418e700fc60&chksm=e87d23d0df0aaac613f84293b715b0661c3d72d667fbdf17c89a957c0cca8a29302216b3343b&mpshare=1&scene=1&srcid=1012tyIXlyLmXqueq3g5qAi4&sharer_sharetime=1665535882449&sharer_shareid=154fc80c9534d9efd371a48cf67a483a&version=4.0.16.6007&platform=win#rd
https://www.zhihu.com/question/265829982/answer/1770310534	https://www.zhihu.com/question/265829982/answer/1770310534
https://zhuanlan.zhihu.com/p/304018288	https://zhuanlan.zhihu.com/p/304018288
https://zhuanlan.zhihu.com/p/195704051	https://zhuanlan.zhihu.com/p/195704051
https://zhuanlan.zhihu.com/p/350616539	https://zhuanlan.zhihu.com/p/350616539
