René's URL Explorer Experiment

Title: How to Do Multi-Label Classification (Python) · Issue #64 · aialgorithm/Blog · GitHub

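Description: 1. Introduction. First, a brief look at how multi-label classification relates to multi-class classification and multi-task learning. Multi-class learning: the classifier distinguishes among several classes, but each sample belongs to exactly one class; the classes are mutually exclusive. For example, a classifier that decides whether an animal is a cat, a dog, or a pig assigns each sample a single class, which makes it a three-class task. Common approaches are OvR (one-vs-rest) and softmax multi-class classification. Multi-label learning: each sample may carry several classes (labels), so unlike multi-class tasks the classes are not mutually exclusive. For example, a movie can have several tags, such as some movies tagged...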

Opengraph URL: https://github.com/aialgorithm/Blog/issues/64

X: @github

Domain: github.com

Hey, it has json ld scripts:

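{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"How to Do Multi-Label Classification (Python)","articleBody":"\r\n### 1. Introduction\r\n\r\nFirst, a brief look at how multi-label classification relates to multi-class classification and multi-task learning:\r\n\r\n- Multi-class learning: the classifier distinguishes among several classes, but each sample belongs to exactly one class; the classes are mutually exclusive. For example, a classifier that decides whether an animal is a cat, a dog, or a pig assigns each sample a single class, which makes it a three-class task. Common approaches are OvR (one-vs-rest) and softmax multi-class classification.![](https://upload-images.jianshu.io/upload_images/11682271-73b567afae27b5cc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n\r\n- Multi-label learning: each sample may carry several classes (labels), so unlike multi-class tasks the classes are not mutually exclusive. For example, a movie can have several tags: some movies are tagged [sci-fi, action], others [action, romance, espionage]. Note that a sample may have one label or several. The labels are also usually correlated: a movie with sci-fi elements is quite likely to have action elements as well.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-cb1a2a341061bde0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n\r\n- Multi-task learning:\r\nBuilt on a shared representation, multi-task learning improves generalization by pooling the examples from several tasks (which can be viewed as a soft constraint on the parameters). Just as additional training examples push the model parameters toward values that generalize better, when part of a model is shared across tasks, that part is more strongly constrained toward good values (assuming the sharing is justified), and it often generalizes better. From a certain angle, multi-label classification can be seen as a simple form of multi-task learning.\r\n![](https://upload-images.jianshu.io/upload_images/11682271-3b4a29c187a2ad69.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n### 2. Implementing multi-label classification\r\nAlgorithms for multi-label classification include DNN, KNN, ML-DT, Rank-SVM, and CML. Models such as decision trees (DT) and k-nearest neighbors (KNN) can in principle be adapted to multi-label tasks directly (the algorithm adaptation approach): rank the labels by their share among the samples that fall in the same split or neighborhood.\r\n\r\nThis section focuses on the more general approaches to multi-label classification, of which there are roughly four:\r\n\r\n#### Method 1: Treat it as multi-class\r\nSimple and blunt: treat each distinct label combination as one class and learn the problem as a multi-class task. For example, [sci-fi, action], [action, romance, espionage], and [sci-fi, romance] can be treated as a three-class task. This only works when the number of label combinations is fairly small; otherwise the classes become too sparse to be useful.\r\n\r\n#### Method 2: OvR binary classification\r\nAlso quite simple: turn the multi-label problem into several binary classification tasks. If movies have K labels in total, build K datasets and train K binary models, one per label [is it sci-fi, is it action, ..., label K], then predict K times for each sample to assemble its final label combination.\r\n\r\nThis approach is simple and flexible, but it has an obvious drawback: each label is learned independently (even though whether a movie is sci-fi may bear on whether it is action), so the relationships between labels are ignored and a lot of information is lost.\r\n\r\nIn sklearn this corresponds to OneVsRestClassifier:\r\n```\r\nfrom xgboost import XGBClassifier\r\nfrom sklearn.multiclass import OneVsRestClassifier\r\nimport numpy as np\r\n\r\n# One independent binary XGBoost model per label\r\nclf_multilabel = OneVsRestClassifier(XGBClassifier())\r\n\r\ntrain_data = np.random.rand(500, 100)  # 500 samples, 100 features each\r\ntrain_label = np.random.randint(2, size=(500, 20))  # 20 binary targets\r\n\r\nval_data = np.random.rand(100, 100)\r\n\r\nclf_multilabel.fit(train_data, train_label)\r\nval_pred = clf_multilabel.predict(val_data)  # shape (100, 20)\r\n\r\n```\r\n\r\n#### Method 3: Improved binary classification\r\nThis improves on Method 2 by taking the relationships between labels into account. Each classifier's prediction is passed to the next classifier as an extra feature and takes part in predicting the next label; this setup is commonly called a classifier chain. The drawback is that the order of the classifiers can have a large effect on model performance.\r\n\r\n#### Method 4: A neural network with multiple outputs\r\nThis resembles the multi-class approach, except that the network has multiple outputs: the output layer consists of several sigmoid units trained with cross-entropy, rather than a softmax whose outputs are mutually exclusive.\r\n\r\n\r\nThe code below builds a multi-label model that outputs probabilities for 3 labels. The model shares one set of network parameters, and each output is an independent (Bernoulli) label probability.\r\n\r\n![](https://upload-images.jianshu.io/upload_images/11682271-46698b6b8a6eca87.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\n\r\n```\r\n## Multi-label classification\r\nfrom keras.models import Model\r\nfrom keras.layers import Input, Dense\r\n\r\ninputs = Input(shape=(15,))\r\nhidden = Dense(units=10, activation='relu')(inputs)\r\n# One sigmoid unit per label, so the outputs are not mutually exclusive\r\noutput = Dense(units=3, activation='sigmoid')(hidden)\r\nmodel = Model(inputs=inputs, outputs=output)\r\nmodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])\r\nmodel.summary()\r\n\r\n# Train the model: x holds the features (15 columns), y holds the label columns\r\n# (x and y come from the author's own dataset and are not defined in this snippet)\r\nmodel.fit(x, y.loc[:, ['LABEL', 'LABEL1', 'LABEL3']], epochs=3)\r\n\r\n```\r\n![](https://upload-images.jianshu.io/upload_images/11682271-8c2fa0bdf4c9be3b.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)\r\n\r\nCompleting the multi-label task with shared model parameters takes the relationships between labels into account, and the shared network parameters also act as a regularizer, which may help the model generalize (in my own experiments, test-set AUC rose by about 1%). This ties in closely with multi-task learning, which I plan to study properly when I have time.\r\n","author":{"url":"https://github.com/aialgorithm","@type":"Person","name":"aialgorithm"},"datePublished":"2022-12-20T12:29:51.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/aialgorithm/Blog/issues/64"}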
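
The third approach in the issue body above (passing each binary classifier's prediction to the next classifier as an extra feature) is described without code. Below is a minimal sketch of that idea, assuming scikit-learn's `ClassifierChain` with a `LogisticRegression` base model and the same kind of random toy data as the `OneVsRestClassifier` example. The variable names, the 5-label setup, and the choice of base model are illustrative assumptions, not the author's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Random toy data (assumption): 500 training samples, 20 features, 5 binary labels
rng = np.random.default_rng(0)
train_data = rng.random((500, 20))
train_label = rng.integers(0, 2, size=(500, 5))
val_data = rng.random((100, 20))

# Each model in the chain sees the original features plus the labels
# that come earlier in the chain; order="random" shuffles the label order.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=0)
chain.fit(train_data, train_label)

val_pred = chain.predict(val_data)  # one 0/1 column per label
print(val_pred.shape)
print(chain.order_)  # the label order actually used by this chain
```

Because the label order affects the result, one common option is to fit several chains with different random orders and average their predictions.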
