Title: Split Your Dataset With scikit-learn's train_test_split() – Real Python
Open Graph Title: Split Your Dataset With scikit-learn's train_test_split() – Real Python
Description: In this tutorial, you'll learn why splitting your dataset in supervised machine learning is important and how to do it with train_test_split() from scikit-learn.
Open Graph Description: In this tutorial, you'll learn why splitting your dataset in supervised machine learning is important and how to do it with train_test_split() from scikit-learn.
Opengraph URL: https://realpython.com/train-test-split-python-data/
X: @realpython
Domain: realpython.com
{
"@context": "http://schema.org",
"@type": "Article",
"headline": "Split Your Dataset With scikit-learn's train_test_split()",
"image": {
"@type": "ImageObject",
"url": "https://files.realpython.com/media/Splitting-Datasets-With-sklearns-train_test_split_Watermarked.13dcac93b15d.jpg",
"width": 1920,
"height": 1080
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://realpython.com/train-test-split-python-data/",
"lastReviewed": "2025-01-29",
"author": {
"@type": "Person",
"name": "Mirko Stojiljkovi\u0107",
"image": "https://realpython.com/cdn-cgi/image/width=240,height=240,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/ms.fdcd0bdc2f4a.png",
"url": "https://realpython.com/team/mstojiljkovic/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
"reviewedBy": [
{
"@type": "Person",
"name": "Aldren Santos",
"image": "https://realpython.com/cdn-cgi/image/width=500,height=500,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/Aldren_Santos_Real_Python.6b0861d8b841.png",
"url": "https://realpython.com/team/asantos/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Brenda Weleschuk",
"image": "https://realpython.com/cdn-cgi/image/width=320,height=320,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/IMG_3324_1.50b309355fc1.jpg",
"url": "https://realpython.com/team/bweleschuk/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Geir Arne Hjelle",
"image": "https://realpython.com/cdn-cgi/image/width=800,height=800,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/gahjelle.470149ee709e.jpg",
"url": "https://realpython.com/team/gahjelle/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Joanna Jablonski",
"image": "https://realpython.com/cdn-cgi/image/width=800,height=800,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/jjablonksi-avatar.e37c4f83308e.jpg",
"url": "https://realpython.com/team/jjablonski/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Jacob Schmitt",
"image": "https://realpython.com/cdn-cgi/image/width=400,height=400,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/profile-small_js.2f4d0d8da1ca.jpg",
"url": "https://realpython.com/team/jschmitt/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Kyle Stratis",
"image": "https://realpython.com/cdn-cgi/image/width=400,height=400,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/KEK9iuEG_400x400.28b60a4581c0.jpg",
"url": "https://realpython.com/team/kstratis/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
{
"@type": "Person",
"name": "Martin Breuss",
"image": "https://realpython.com/cdn-cgi/image/width=456,height=456,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/martin_breuss_python_square.efb2b07faf9f.jpg",
"url": "https://realpython.com/team/mbreuss/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
}
]
},
"datePublished": "2025-01-29T14:00:00+00:00",
"dateModified": "2025-01-29T14:09:23.603078+00:00",
"publisher": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": {
"@type": "ImageObject",
"url": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png",
"width": 512,
"height": 512
},
"description": "Real Python is a leading provider of online Python education and one of the largest language-specific online communities for software developers. It publishes high-quality learning resources, such as tutorials, books, and courses to an audience of millions of developers, data scientists, and machine learning engineers each month.",
"slogan": "Become a Python Expert",
"email": "info@realpython.com",
"sameAs": [
"https://github.com/realpython",
"https://www.youtube.com/realpython",
"https://twitter.com/realpython",
"https://x.com/realpython",
"https://www.linkedin.com/company/realpython-com/",
"https://www.facebook.com/learnrealpython",
"https://www.instagram.com/realpython",
"https://www.tiktok.com/@realpython.com"
]
},
"author": {
"@type": "Person",
"name": "Mirko Stojiljkovi\u0107",
"image": "https://realpython.com/cdn-cgi/image/width=240,height=240,fit=crop,gravity=auto,format=auto/https://files.realpython.com/media/ms.fdcd0bdc2f4a.png",
"url": "https://realpython.com/team/mstojiljkovic/",
"affiliation": {
"@type": "Organization",
"@id": "https://realpython.com/#organization",
"name": "Real Python",
"url": "https://realpython.com",
"logo": "https://realpython.com/static/real-python-logo-square-512.157ae6bf64ed.png"
}
},
"description": "In this tutorial, you'll learn why splitting your dataset in supervised machine learning is important and how to do it with train_test_split() from scikit-learn.",
"hasPart": {
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is train_test_split()?",
"acceptedAnswer": {
"@type": "Answer",
"text": "train_test_split() is a function from scikit-learn that you use to split your dataset into training and test subsets, which helps you perform unbiased model evaluation and validation."
}
},
{
"@type": "Question",
"name": "What do x_train and y_train mean?",
"acceptedAnswer": {
"@type": "Answer",
"text": "x_train and y_train are the parts of your dataset that you use to train, or fit, your machine learning model. x_train contains the input data (features), while y_train contains the corresponding outputs (targets), such as class labels or numeric values."
}
},
{
"@type": "Question",
"name": "What does test_size=0.2 mean?",
"acceptedAnswer": {
"@type": "Answer",
"text": "When you set test_size=0.2 in train_test_split(), you specify that 20% of your dataset should be used as the test set for evaluating your model, with the remaining 80% used for training."
}
},
{
"@type": "Question",
"name": "Can train_test_split() handle imbalanced datasets?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Yes, to a degree. If you pass the labels to the stratify parameter, for example stratify=y, train_test_split() keeps the class proportions in the training and test sets as close as possible to those of the original dataset. It doesn't rebalance the classes themselves."
}
}
]
}
}
| Meta tag | Content |
| --- | --- |
| author | Real Python |
| twitter:card | summary_large_image |
| twitter:image | https://files.realpython.com/media/Splitting-Datasets-With-sklearns-train_test_split_Watermarked.13dcac93b15d.jpg |
| og:image | https://files.realpython.com/media/Splitting-Datasets-With-sklearns-train_test_split_Watermarked.13dcac93b15d.jpg |
| twitter:creator | @realpython |
| og:type | article |
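The FAQ answers above cover train_test_split(), the x_train/y_train naming, test_size=0.2, and the stratify parameter. Here's a minimal sketch that ties them together. It assumes scikit-learn and NumPy are installed, and the toy arrays x and y plus the random_state value are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 20 samples with 2 features each
x = np.arange(40).reshape(20, 2)
# Imbalanced labels (assumed for illustration): 15 samples of class 0, 5 of class 1
y = np.array([0] * 15 + [1] * 5)

# test_size=0.2 puts 20% of the samples in the test set;
# stratify=y keeps the 75% / 25% class ratio in both subsets;
# random_state makes the shuffle reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=4, stratify=y
)

print(x_train.shape, x_test.shape)    # (16, 2) (4, 2)
print(np.bincount(y_train).tolist())  # [12, 4]
print(np.bincount(y_test).tolist())   # [3, 1]
```

Because the class counts divide evenly here, stratification reproduces the original 3:1 ratio exactly in both subsets. With other sizes, scikit-learn rounds the per-class counts, so the proportions are only approximately preserved.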