| code-ram | https://github.com/code-ram |
| DAT8 | https://github.com/code-ram/DAT8 |
| Forked from justmarkham/DAT8 | https://github.com/justmarkham/DAT8 |
| code | https://github.com/code-ram/DAT8/tree/master/code |
| data | https://github.com/code-ram/DAT8/tree/master/data |
| homework | https://github.com/code-ram/DAT8/tree/master/homework |
| notebooks | https://github.com/code-ram/DAT8/tree/master/notebooks |
| other | https://github.com/code-ram/DAT8/tree/master/other |
| project | https://github.com/code-ram/DAT8/tree/master/project |
| slides | https://github.com/code-ram/DAT8/tree/master/slides |
| .gitignore | https://github.com/code-ram/DAT8/blob/master/.gitignore |
| README.md | https://github.com/code-ram/DAT8/blob/master/README.md |
| requirements.txt | https://github.com/code-ram/DAT8/blob/master/requirements.txt |
| DAT8 Course Repository | https://github.com/code-ram/DAT8#dat8-course-repository |
| General Assembly's Data Science course | https://generalassemb.ly/education/data-science/washington-dc/ |
| Data School blog | http://www.dataschool.io/ |
| email newsletter | http://www.dataschool.io/subscribe/ |
| YouTube channel | https://www.youtube.com/user/dataschool |
| http://mybinder.org/repo/justmarkham/DAT8 |
| Introduction to Data Science | https://github.com/code-ram/DAT8#class-1-introduction-to-data-science |
| Command Line, Version Control | https://github.com/code-ram/DAT8#class-2-command-line-and-version-control |
| Data Reading and Cleaning | https://github.com/code-ram/DAT8#class-3-data-reading-and-cleaning |
| Exploratory Data Analysis | https://github.com/code-ram/DAT8#class-4-exploratory-data-analysis |
| Visualization | https://github.com/code-ram/DAT8#class-5-visualization |
| Machine Learning | https://github.com/code-ram/DAT8#class-6-machine-learning |
| Getting Data | https://github.com/code-ram/DAT8#class-7-getting-data |
| K-Nearest Neighbors | https://github.com/code-ram/DAT8#class-8-k-nearest-neighbors |
| Basic Model Evaluation | https://github.com/code-ram/DAT8#class-9-basic-model-evaluation |
| Linear Regression | https://github.com/code-ram/DAT8#class-10-linear-regression |
| First Project Presentation | https://github.com/code-ram/DAT8#class-11-first-project-presentation |
| Logistic Regression | https://github.com/code-ram/DAT8#class-12-logistic-regression |
| Advanced Model Evaluation | https://github.com/code-ram/DAT8#class-13-advanced-model-evaluation |
| Naive Bayes and Text Data | https://github.com/code-ram/DAT8#class-14-naive-bayes-and-text-data |
| Natural Language Processing | https://github.com/code-ram/DAT8#class-15-natural-language-processing |
| Kaggle Competition | https://github.com/code-ram/DAT8#class-16-kaggle-competition |
| Decision Trees | https://github.com/code-ram/DAT8#class-17-decision-trees |
| Ensembling | https://github.com/code-ram/DAT8#class-18-ensembling |
| Advanced scikit-learn, Clustering | https://github.com/code-ram/DAT8#class-19-advanced-scikit-learn-and-clustering |
| Regularization, Regex | https://github.com/code-ram/DAT8#class-20-regularization-and-regular-expressions |
| Course Review | https://github.com/code-ram/DAT8#class-21-course-review-and-final-project-presentation |
| Final Project Presentation | https://github.com/code-ram/DAT8#class-22-final-project-presentation |
| Python Resources | https://github.com/code-ram/DAT8#python-resources |
| Codecademy's Python course | http://www.codecademy.com/en/tracks/python |
| Dataquest | https://www.dataquest.io |
| Google's Python Class | https://developers.google.com/edu/python/ |
| Introduction to Python | http://introtopython.org/ |
| Python for Informatics | http://www.pythonlearn.com/book.php |
| slides | https://drive.google.com/folderview?id=0B7X1ycQalUnyal9yeUx3VW81VDg&usp=sharing |
| videos | https://www.youtube.com/playlist?list=PLlRFEj9H3Oj4JXIwMwN1_ss1Tk8wZShEJ |
| A Crash Course in Python for Scientists | http://nbviewer.ipython.org/gist/rpmuller/5920182 |
| Python 2.7 Quick Reference | https://github.com/justmarkham/python-reference/blob/master/reference.py |
| Beginner | https://github.com/code-ram/DAT8/blob/master/code/00_python_beginner_workshop.py |
| intermediate | https://github.com/code-ram/DAT8/blob/master/code/00_python_intermediate_workshop.py |
| Python Tutor | http://pythontutor.com/ |
| Course project | https://github.com/code-ram/DAT8/blob/master/project/README.md |
| Comparison of machine learning models | https://github.com/code-ram/DAT8/blob/master/other/model_comparison.md |
| Comparison of model evaluation procedures and metrics | https://github.com/code-ram/DAT8/blob/master/other/model_evaluation_comparison.md |
| Advice for getting better at data science | https://github.com/code-ram/DAT8/blob/master/other/advice.md |
| Additional resources | https://github.com/code-ram/DAT8#additional-resources-1 |
| Class 1: Introduction to Data Science | https://github.com/code-ram/DAT8#class-1-introduction-to-data-science |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/01_course_overview.pdf |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/01_intro_to_data_science.pdf |
| requirements | https://github.com/code-ram/DAT8/blob/master/project/README.md |
| example projects | https://github.com/justmarkham/DAT-project-examples |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/01_types_of_data.pdf |
| public data sources | https://github.com/code-ram/DAT8/blob/master/project/public_data.md |
| command line tutorial | http://generalassembly.github.io/prework/command-line/#/ |
| command line reference | https://github.com/code-ram/DAT8/blob/master/code/02_command_line.md |
| Introduction to Git and GitHub | https://www.youtube.com/playlist?list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD |
| Pro Git | http://git-scm.com/book/en/v2 |
| setup checklist | https://github.com/code-ram/DAT8/blob/master/other/setup_checklist.md |
| Analyzing the Analyzers | http://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf |
| Win-Vector | http://www.win-vector.com/blog/2012/09/on-being-a-data-scientist/ |
| Datascope Analytics | http://datascopeanalytics.com/what-we-think/2014/07/31/six-qualities-of-a-great-data-scientist |
| data science topic FAQ | https://www.quora.com/Data-Science |
| event calendar | http://www.datacommunitydc.org/calendar |
| weekly newsletter | http://www.datacommunitydc.org/newsletter |
| Class 2: Command Line and Version Control | https://github.com/code-ram/DAT8#class-2-command-line-and-version-control |
| code | https://github.com/code-ram/DAT8/blob/master/code/02_command_line.md |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/02_git_github.pdf |
| command line homework assignment | https://github.com/code-ram/DAT8/blob/master/homework/02_command_line_chipotle.md |
| beginner | https://github.com/code-ram/DAT8/blob/master/code/00_python_beginner_workshop.py |
| intermediate | https://github.com/code-ram/DAT8/blob/master/code/00_python_intermediate_workshop.py |
| Introduction to Python | http://introtopython.org/ |
| Python for Informatics | http://www.pythonlearn.com/html-270/ |
| Codecademy | http://www.codecademy.com/en/tracks/python |
| DataQuest's Learning Python | https://www.dataquest.io/course/learning-python |
| Python Challenge | http://www.pythonchallenge.com/ |
| What is machine learning, and how does it work? | https://www.youtube.com/watch?v=elojMnjn4kk |
| IPython notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/01_machine_learning_intro.ipynb |
| A Visual Introduction to Machine Learning | http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ |
| example student projects | https://github.com/justmarkham/DAT-project-examples |
| Pro Git | http://git-scm.com/book/en/v2 |
| Git Immersion | http://gitimmersion.com/ |
| forks and pull requests | http://www.dataschool.io/simple-guide-to-forks-in-github-and-git/ |
| GitRef | http://gitref.org/ |
| Git quick reference for beginners | http://www.dataschool.io/git-quick-reference-for-beginners/ |
| Cracking the Code to GitHub's Growth | https://growthhackers.com/growth-studies/github |
| Markdown Cheatsheet | https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet |
| Mastering Markdown | https://guides.github.com/features/mastering-markdown/ |
| Data Science at the Command Line | http://shop.oreilly.com/product/0636920032823.do |
| companion website | http://datascienceatthecommandline.com/ |
| csvkit | http://csvkit.readthedocs.org/ |
| Class 3: Data Reading and Cleaning | https://github.com/code-ram/DAT8#class-3-data-reading-and-cleaning |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/02_git_github.pdf |
| solution | https://github.com/code-ram/DAT8/blob/master/homework/02_command_line_chipotle.md |
| code | https://github.com/code-ram/DAT8/blob/master/code/03_file_reading.py |
| data | https://github.com/code-ram/DAT8/blob/master/data/airlines.csv |
| article | http://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/ |
| code | https://github.com/code-ram/DAT8/blob/master/code/03_python_homework_chipotle.py |
| data | https://github.com/code-ram/DAT8/blob/master/data/chipotle.tsv |
| article | http://www.nytimes.com/interactive/2015/02/17/upshot/what-do-people-actually-order-at-chipotle.html |
| Python homework assignment | https://github.com/code-ram/DAT8/blob/master/code/03_python_homework_chipotle.py |
| Want to understand Python's comprehensions? Think in Excel or SQL | http://blog.lerner.co.il/want-to-understand-pythons-comprehensions-think-like-an-accountant/ |
| My code isn't working | http://www.tecoed.co.uk/uploads/1/4/2/4/14249012/624506_orig.png |
| PEP 8 | https://www.python.org/dev/peps/pep-0008/ |
| Loop Like A Native | http://nedbatchelder.com/text/iter.html |
| Python Names and Values | http://nedbatchelder.com/text/names1.html |
| Class 4: Exploratory Data Analysis | https://github.com/code-ram/DAT8#class-4-exploratory-data-analysis |
| code | https://github.com/code-ram/DAT8/blob/master/code/04_pandas.py |
| data | https://github.com/code-ram/DAT8/blob/master/data/u.user |
| data dictionary | http://files.grouplens.org/datasets/movielens/ml-100k-README.txt |
| website | http://grouplens.org/datasets/movielens/ |
| data | https://github.com/code-ram/DAT8/blob/master/data/drinks.csv |
| article | http://fivethirtyeight.com/datalab/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/ |
| data | https://github.com/code-ram/DAT8/blob/master/data/ufo.csv |
| website | http://www.nuforc.org/webreports.html |
| How Software in Half of NYC Cabs Generates $5.2 Million a Year in Extra Tips | http://iquantny.tumblr.com/post/107245431809/how-software-in-half-of-nyc-cabs-generates-5-2 |
| Anscombe's Quartet, and Why Summary Statistics Don't Tell the Whole Story | http://data.heapanalytics.com/anscombes-quartet-and-why-summary-statistics-dont-tell-the-whole-story/ |
| API Reference | http://pandas.pydata.org/pandas-docs/stable/api.html |
| What I do when I get a new data set as told through tweets | http://simplystatistics.org/2014/06/13/what-i-do-when-i-get-a-new-data-set-as-told-through-tweets/ |
| Class 5: Visualization | https://github.com/code-ram/DAT8#class-5-visualization |
| solution | https://github.com/code-ram/DAT8/blob/master/code/03_python_homework_chipotle.py |
| detailed explanation | https://github.com/code-ram/DAT8/blob/master/notebooks/03_python_homework_chipotle_explained.ipynb |
| code | https://github.com/code-ram/DAT8/blob/master/code/04_pandas.py |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/05_pandas_visualization.ipynb |
| Pandas homework assignment | https://github.com/code-ram/DAT8/blob/master/code/05_pandas_homework_imdb.py |
| IMDb data | https://github.com/code-ram/DAT8/blob/master/data/imdb_1000.csv |
| Jupyter Notebook | http://jupyter.readthedocs.org/en/latest/install.html |
| three-part tutorial | http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/ |
| introduction | https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section2_5-Introduction-to-Pandas.ipynb |
| data wrangling | https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section2_6-Data-Wrangling-with-Pandas.ipynb |
| Python for Data Analysis | http://shop.oreilly.com/product/0636920023784.do |
| joins in Pandas | https://github.com/code-ram/DAT8/blob/master/notebooks/05_pandas_merge.ipynb |
| pivot tables | https://beta.oreilly.com/learning/pivot-tables |
| GeoPandas | http://geopandas.org/index.html |
| tutorial | http://michelleful.github.io/code-blog/2015/04/24/sgmap/ |
| Look at Your Data | https://www.youtube.com/watch?v=coNDCIMH8bk |
| notebook | https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section2_7-Plotting-with-Pandas.ipynb |
| visualization page | http://pandas.pydata.org/pandas-docs/stable/visualization.html |
| notebook on matplotlib | https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section2_4-Matplotlib.ipynb |
| similar notebook | https://github.com/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb |
| Overview of Python Visualization Tools | http://pbpython.com/visualization-tools-1.html |
| Choosing a Good Chart | http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf |
| The Graphic Continuum | http://www.coolinfographics.com/storage/post-images/The-Graphic-Continuum-POSTER.jpg |
| R Graph Catalog | http://shiny.stat.ubc.ca/r-graph-catalog/ |
| PowerPoint presentation | http://www2.research.att.com/~volinsky/DataMining/Columbia2011/Slides/Topic2-EDAViz.ppt |
| Harvard's Data Science course | http://cs109.github.io/2014/ |
| Visualization Goals, Data Types, and Statistical Graphs | http://cm.dce.harvard.edu/2015/01/14328/L03/screen_H264LargeTalkingHead-16x9.shtml |
| slides | https://docs.google.com/file/d/0B7IVstmtIvlHLTdTbXdEVENoRzQ/edit |
| Class 6: Machine Learning | https://github.com/code-ram/DAT8#class-6-machine-learning |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/05_pandas_visualization.ipynb |
| Iris dataset | http://archive.ics.uci.edu/ml/datasets/Iris |
| Iris photo | http://sebastianraschka.com/Images/2014_python_lda/iris_petal_sepal.png |
| Notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/06_human_learning_iris.ipynb |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/06_machine_learning.pdf |
| human learning notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/06_human_learning_iris.ipynb |
| requests | http://www.python-requests.org/en/latest/user/install/ |
| Beautiful Soup 4 | http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup |
| What is machine learning, and how does it work? | https://www.youtube.com/watch?v=elojMnjn4kk |
| associated notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/01_machine_learning_intro.ipynb |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| Learning Paradigms | http://work.caltech.edu/library/014.html |
| Caltech's Learning From Data course | http://work.caltech.edu/telecourse.html |
| Real-World Active Learning | https://beta.oreilly.com/ideas/real-world-active-learning |
| overview of the supervised learning process | https://github.com/rasbt/pattern_classification/blob/master/machine_learning/supervised_intro/introduction_to_supervised_machine_learning.md |
| Data Science, Machine Learning, and Statistics: What is in a Name? | http://www.win-vector.com/blog/2013/04/data-science-machine-learning-and-statistics-what-is-in-a-name/ |
| The Emoji Translation Project | https://www.kickstarter.com/projects/fred/the-emoji-translation-project |
| characteristics of your zip code | http://www.esri.com/landing-pages/tapestry/ |
| 67 distinct segments | http://doc.arcgis.com/en/esri-demographics/data/tapestry-segmentation.htm |
| scikit-learn and the IPython Notebook | https://www.youtube.com/watch?v=IsXXlYVBt1M |
| associated notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/02_machine_learning_setup.ipynb |
| Notebook tutorials | https://github.com/jupyter/notebook/blob/master/docs/source/examples/Notebook/Examples%20and%20Tutorials%20Index.ipynb |
| Reddit discussion | https://www.reddit.com/r/Python/comments/3be5z2/do_you_prefer_ipython_notebook_over_ipython/ |
| Class 7: Getting Data | https://github.com/code-ram/DAT8#class-7-getting-data |
| solution | https://github.com/code-ram/DAT8/blob/master/code/05_pandas_homework_imdb.py |
| solution | https://github.com/code-ram/DAT8/blob/master/notebooks/06_human_learning_iris.ipynb |
| code | https://github.com/code-ram/DAT8/blob/master/code/07_api.py |
| OMDb API | http://www.omdbapi.com/ |
| code | https://github.com/code-ram/DAT8/blob/master/code/07_web_scraping.py |
| IMDb: robots.txt | http://www.imdb.com/robots.txt |
| Example web page | https://github.com/code-ram/DAT8/blob/master/data/example.html |
| IMDb: The Shawshank Redemption | http://www.imdb.com/title/tt0111161/ |
| web scraping code | https://github.com/code-ram/DAT8/blob/master/code/07_web_scraping.py |
| install Seaborn | http://stanford.edu/~mwaskom/software/seaborn/installing.html |
| query the U.S. Census API | https://github.com/laurakurup/census-api |
| Mashape | https://www.mashape.com/explore |
| Apigee | https://apigee.com/providers |
| Python API wrapper | http://www.pythonforbeginners.com/api/list-of-python-apis |
| Data Science Toolkit | http://www.datasciencetoolkit.org/ |
| API Integration in Python | https://realpython.com/blog/python/api-integration-in-python/ |
| Face Detection API | https://www.projectoxford.ai/demo/face#detection |
| How-Old.net | http://how-old.net/ |
| Beautiful Soup documentation | http://www.crummy.com/software/BeautifulSoup/bs4/doc/ |
| specifying a parser | http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use |
| Web Scraping 101 with Python | http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/ |
| scraping Craigslist | https://github.com/Alexjmsherman/DataScience_GeneralAssembly/blob/master/Final_Project/1.%20Final_Project_Data%20Scraping.ipynb |
| notebook | http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html |
| notebook | https://github.com/cs109/2014/blob/master/lectures/2014_09_23-lecture/data_scraping_transcript.ipynb |
| video | http://cm.dce.harvard.edu/2015/01/14328/L07/screen_H264LargeTalkingHead-16x9.shtml |
| Web Scraping with Python | https://www.youtube.com/watch?v=p1iX0uxM1w8 |
| slides | https://docs.google.com/presentation/d/1uHM_esB13VuSf7O1ScGueisnrtu-6usGFD3fs4z5YCE/edit#slide=id.p |
| code | https://github.com/kjam/python-web-scraping-tutorial |
| Scrapy | http://scrapy.org/ |
| documentation | http://doc.scrapy.org/en/1.0/index.html |
| tutorial | https://github.com/rdempsey/ddl-data-wrangling |
| robotstxt.org | http://www.robotstxt.org/robotstxt.html |
| import.io | https://import.io/ |
| Kimono | https://www.kimonolabs.com/ |
| How a Math Genius Hacked OkCupid to Find True Love | http://www.wired.com/2014/01/how-to-hack-okcupid/all/ |
| How Netflix Reverse Engineered Hollywood | http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/?single_page=true |
| Class 8: K-Nearest Neighbors | https://github.com/code-ram/DAT8#class-8-k-nearest-neighbors |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/08_pandas_review.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/08_knn_sklearn.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/08_nba_knn.ipynb |
| data | https://github.com/justmarkham/DAT4-students/blob/master/kerry/Final/NBA_players_2015.csv |
| data dictionary | https://github.com/justmarkham/DAT-project-examples/blob/master/pdf/nba_paper.pdf |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/08_bias_variance.ipynb |
| bias-variance tradeoff | https://github.com/code-ram/DAT8/blob/master/homework/09_bias_variance.md |
| introduction to reproducibility | http://www.dataschool.io/reproducibility-is-not-just-for-researchers/ |
| guide to creating a reproducible analysis | https://github.com/jtleek/datasharing |
| Colbert Report video | http://thecolbertreport.cc.com/videos/dcyvro/austerity-s-spreadsheet-error |
| Getting started in scikit-learn with the famous iris dataset | https://www.youtube.com/watch?v=hd1W4CyPX58 |
| Training a machine learning model with scikit-learn | https://www.youtube.com/watch?v=RlQuVL6-qe8 |
| distance metrics | http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html |
| Mahalanobis distance | http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance |
| takes the scale of the data into account | http://blogs.sas.com/content/iml/2012/02/15/what-is-mahalanobis-distance.html |
| A Detailed Introduction to KNN | https://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/ |
| Image Classification | http://cs231n.github.io/classification/ |
| object recognition | http://vlm1.uta.edu/~athitsos/nearest_neighbors/ |
| satellite image enhancement | http://land.umn.edu/documents/FS6.pdf |
| document categorization | http://www.ceng.metu.edu.tr/~e120321/paper.pdf |
| gene expression analysis | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.208.993 |
| detailed tutorials | http://web.stanford.edu/~mwaskom/software/seaborn/tutorial.html |
| example gallery | http://web.stanford.edu/~mwaskom/software/seaborn/examples/index.html |
| Data visualization with Seaborn | https://beta.oreilly.com/learning/data-visualization-with-seaborn |
| Visualizing Google Forms Data with Seaborn | http://pbpython.com/pandas-google-forms-part2.html |
| How to Create NBA Shot Charts in Python | http://savvastjortjoglou.com/nba-shot-sharts.html |
| Class 9: Basic Model Evaluation | https://github.com/code-ram/DAT8#class-9-basic-model-evaluation |
| solution | https://github.com/code-ram/DAT8/blob/master/code/07_web_scraping.py#L136 |
| introduction | http://www.dataschool.io/reproducibility-is-not-just-for-researchers/ |
| Colbert Report video | http://thecolbertreport.cc.com/videos/dcyvro/austerity-s-spreadsheet-error |
| cabs article | http://iquantny.tumblr.com/post/107245431809/how-software-in-half-of-nyc-cabs-generates-5-2 |
| Tweet | https://twitter.com/jakevdp/status/519563939177197571 |
| creating a reproducible analysis | https://github.com/jtleek/datasharing |
| Classic rock | https://github.com/fivethirtyeight/data/tree/master/classic-rock |
| student project 1 | https://github.com/jwknobloch/DAT4_final_project |
| student project 2 | https://github.com/justmarkham/DAT4-students/tree/master/Jonathan_Bryan/Project_Files |
| bias-variance tradeoff | https://github.com/code-ram/DAT8/blob/master/homework/09_bias_variance.md |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/09_model_evaluation.ipynb |
| module reference | http://scikit-learn.org/stable/modules/classes.html |
| user guide | http://scikit-learn.org/stable/user_guide.html |
| Data science in Python | https://www.youtube.com/watch?v=3ZWuPVWq7p4 |
| associated notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/06_linear_regression.ipynb |
| The Easiest Introduction to Regression Analysis | https://www.youtube.com/watch?v=k_OB1tWX9PM |
| Comparing machine learning models in scikit-learn | https://www.youtube.com/watch?v=0pP4EwWJgIU |
| estimating prediction error | https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s |
| visualizing bias and variance | http://work.caltech.edu/library/081.html |
| Random Test/Train Split is Not Always Enough | http://www.win-vector.com/blog/2015/01/random-testtrain-split-is-not-always-enough/ |
| What We've Learned About Sharing Our Data Analysis | https://source.opennews.org/en-US/articles/what-weve-learned-about-sharing-our-data-analysis/ |
| Software development skills for data scientists | http://treycausey.com/software_dev_skills.html |
| Data science done well looks easy - and that is a big problem for data scientists | http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/ |
| Class 10: Linear Regression | https://github.com/code-ram/DAT8#class-10-linear-regression |
| article | http://blog.dominodatalab.com/10-interesting-uses-of-data-science/ |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/10_linear_regression.ipynb |
| Capital Bikeshare dataset | https://github.com/code-ram/DAT8/blob/master/data/bikeshare.csv |
| Data dictionary | https://www.kaggle.com/c/bike-sharing-demand/data |
| Predicting User Engagement in Corporate Collaboration Network | https://github.com/mikeyea/DAT7_project/blob/master/final%20project/Class_Presention_MYea.ipynb |
| homework assignment | https://github.com/code-ram/DAT8/blob/master/homework/10_yelp_votes.md |
| Yelp data | https://github.com/code-ram/DAT8/blob/master/data/yelp.csv |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| related videos | http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/ |
| quick reference guide | http://www.dataschool.io/applying-and-interpreting-linear-regression/ |
| introduction to linear regression | http://people.duke.edu/~rnau/regintro.htm |
| assumptions of linear regression | http://pareonline.net/getvn.asp?n=2&v=8 |
| interactive visualization | http://setosa.io/ev/ordinary-least-squares-regression/ |
| Statsmodels | http://statsmodels.sourceforge.net/ |
| DAT7 lesson on linear regression | https://github.com/justmarkham/DAT7/blob/master/notebooks/10_linear_regression.ipynb |
| confidence intervals | http://www.quora.com/What-is-a-confidence-interval-in-laymans-terms/answer/Michael-Hochster |
| Hypothesis Testing: The Basics | http://20bits.com/article/hypothesis-testing-the-basics |
| Statistics Without the Agonizing Pain | https://www.youtube.com/watch?v=5Dnw46eC-0o |
| summary | http://www.scientificamerican.com/article/scientists-perturbed-by-loss-of-stat-tools-to-sift-research-fudge-from-fact/ |
| response | http://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412 |
| paper | http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf |
| Science Isn't Broken | http://fivethirtyeight.com/features/science-isnt-broken/ |
| Accurately Measuring Model Prediction Error | http://scott.fortmann-roe.com/docs/MeasuringError.html |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| visualizations of the bikeshare data | https://www.kaggle.com/c/bike-sharing-demand/scripts?outputType=Visualization |
| Class 11: First Project Presentation | https://github.com/code-ram/DAT8#class-11-first-project-presentation |
| probability | https://www.youtube.com/watch?v=o4QmoNfW3bI |
| odds | https://www.youtube.com/watch?v=GxbXQjX7fC0 |
| An Intuitive Guide To Exponential Functions & e | http://betterexplained.com/articles/an-intuitive-guide-to-exponential-functions-e/ |
| Demystifying the Natural Logarithm (ln) | http://betterexplained.com/articles/demystifying-the-natural-logarithm-ln/ |
| brief summary | https://github.com/code-ram/DAT8/blob/master/notebooks/12_e_log_examples.ipynb |
| Class 12: Logistic Regression | https://github.com/code-ram/DAT8#class-12-logistic-regression |
| solution | https://github.com/code-ram/DAT8/blob/master/notebooks/10_yelp_votes_homework.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/12_logistic_regression.ipynb |
| Glass identification dataset | https://archive.ics.uci.edu/ml/datasets/Glass+Identification |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/12_titanic_confusion.ipynb |
| data | https://github.com/code-ram/DAT8/blob/master/data/titanic.csv |
| data dictionary | https://www.kaggle.com/c/titanic/data |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/12_confusion_matrix.pdf |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/12_titanic_confusion.ipynb |
| Intuitive sensitivity and specificity | https://www.youtube.com/watch?v=U4_3fditnWg |
| The tradeoff between sensitivity and specificity | https://www.youtube.com/watch?v=vtYDyGGeQyo |
| ROC curves and AUC | https://github.com/code-ram/DAT8/blob/master/homework/13_roc_auc.md |
| cross-validation | https://github.com/code-ram/DAT8/blob/master/homework/13_cross_validation.md |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| first three videos | http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/ |
| machine learning course | https://www.coursera.org/learn/machine-learning/home/info |
| related lecture notes | http://www.holehouse.org/mlclass/06_Logistic_Regression.html |
| guide | http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm |
| lecture notes | http://www.unm.edu/~schrader/biostat/bio2/Spr06/lec11.pdf |
| explanation | http://scikit-learn.org/stable/modules/calibration.html |
| Supervised learning superstitions cheat sheet | http://ryancompton.net/assets/ml_cheat_sheet/supervised_learning.html |
| simple guide to confusion matrix terminology | http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/ |
| Amazon Machine Learning | https://aws.amazon.com/blogs/aws/amazon-machine-learning-make-data-driven-decisions-at-scale/ |
| graphic | https://media.amazonwebservices.com/blog/2015/ml_adjust_model_1.png |
| how to calculate "expected value" | https://github.com/podopie/DAT18NYC/blob/master/classes/13-expected_value_cost_benefit_analysis.ipynb |
| Class 13: Advanced Model Evaluation | https://github.com/code-ram/DAT8#class-13-advanced-model-evaluation |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/13_advanced_model_evaluation.ipynb |
| video/reading assignment | https://github.com/code-ram/DAT8/blob/master/homework/13_roc_auc.md |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/13_drawing_roc.pdf |
| video/reading assignment | https://github.com/code-ram/DAT8/blob/master/homework/13_cross_validation.md |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/13_cross_validation.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/13_bank_exercise.ipynb |
| data | https://github.com/code-ram/DAT8/blob/master/data/bank-additional.csv |
| data dictionary | https://archive.ics.uci.edu/ml/datasets/Bank+Marketing |
| spam filtering | https://github.com/code-ram/DAT8/blob/master/homework/14_spam_filtering.md |
| Introduction to Probability | https://docs.google.com/presentation/d/1cM2dVbJgTWMkHoVNmYlB9df6P2H8BrjaqAcZTaLe9dA/edit#slide=id.gfc3caad2_00 |
| OpenIntro Statistics textbook | https://www.openintro.org/stat/textbook.php?stat_book=os |
| visualization | http://setosa.io/conditional/ |
| wealth and happiness | http://www.quora.com/What-is-an-intuitive-explanation-of-Bayes-Rule/answer/Michael-Hochster |
| ducks | https://planspacedotorg.wordpress.com/2014/02/23/bayes-rule-for-ducks/ |
| legos | http://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego |
| ROC Curves | https://www.youtube.com/watch?v=21Igj5Pr6u4 |
| An introduction to ROC analysis | http://people.inf.elte.hu/kiss/13dwhdm/roc.pdf |
| comparing different feature sets | http://research.microsoft.com/pubs/205472/aisec10-leontjeva.pdf |
| comparing different classifiers | http://www.cse.ust.hk/nevinZhangGroup/readings/yi/Bradley_PR97.pdf |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| K-fold and leave-one-out cross-validation | https://www.youtube.com/watch?v=nZAM5OXrktY |
| cross-validation the right and wrong ways | https://www.youtube.com/watch?v=S06JpVoNaA0 |
| paper | http://www.jcheminf.com/content/pdf/1758-2946-6-10.pdf |
| GridSearchCV and RandomizedSearchCV | http://scikit-learn.org/stable/modules/grid_search.html |
| How to find the best model parameters in scikit-learn | https://www.youtube.com/watch?v=Gol_qOgRqfA |
| associated notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/08_grid_search.ipynb |
| model evaluation | http://scikit-learn.org/stable/modules/model_evaluation.html |
| Counterfactual evaluation of machine learning models | https://www.youtube.com/watch?v=QWCSxAKR-h0 |
| slides | http://www.slideshare.net/MichaelManapat/counterfactual-evaluation-of-machine-learning-models |
| Visualizing Machine Learning Thresholds to Make Better Business Decisions | http://blog.insightdatalabs.com/visualizing-classifier-thresholds/ |
| Class 14: Naive Bayes and Text Data | https://github.com/code-ram/DAT8#class-14-naive-bayes-and-text-data |
| Slides | https://github.com/code-ram/DAT8/blob/master/slides/14_bayes_theorem.pdf |
| Visualizing Bayes' theorem | http://oscarbonilla.com/2009/05/visualizing-bayes-theorem/ |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/14_bayes_theorem_iris.ipynb |
| Slides | https://github.com/code-ram/DAT8/blob/master/slides/14_naive_bayes.pdf |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/14_naive_bayes_spam.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/14_text_data_sklearn.ipynb |
| CountVectorizer | http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html |
| data | https://github.com/code-ram/DAT8/blob/master/data/sms.tsv |
| data dictionary | https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection |
| homework assignment | https://github.com/code-ram/DAT8/blob/master/homework/14_yelp_review_text.md |
| Yelp data | https://github.com/code-ram/DAT8/blob/master/data/yelp.csv |
| TextBlob | https://textblob.readthedocs.org/ |
| Naive Bayes and Text Classification | http://sebastianraschka.com/Articles/2014_naive_bayes_1.html |
| slides | https://docs.google.com/presentation/d/1psUIyig6OxHQngGEHr3TMkCvhdLInnKnclQoNUr4G4U/edit#slide=id.gfc69f484_00 |
| OpenIntro Statistics textbook | https://www.openintro.org/stat/textbook.php?stat_book=os |
| airport security | http://www.quora.com/In-laymans-terms-how-does-Naive-Bayes-work/answer/Konstantin-Tt |
| Naive Bayes classifier | http://en.wikipedia.org/wiki/Naive_Bayes_classifier |
| Naive Bayes spam filtering | http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering |
| Q&A | http://stats.stackexchange.com/questions/21822/understanding-naive-bayes |
| GaussianNB | http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html |
| MultinomialNB | http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/14_types_of_naive_bayes.ipynb |
| description | https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_naive_Bayes |
| example | https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Sex_classification |
| slides | http://www.umiacs.umd.edu/~jbg/teaching/DATA_DIGGING/lecture_05.pdf |
| paper | http://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf |
| his follow-up article | http://www.paulgraham.com/better.html |
| related paper | http://www.merl.com/publications/docs/TR2004-091.pdf |
| categorizing businesses | http://engineeringblog.yelp.com/2011/02/towards-building-a-high-quality-workforce-with-mechanical-turk.html |
| Class 15: Natural Language Processing | https://github.com/code-ram/DAT8#class-15-natural-language-processing |
| solution | https://github.com/code-ram/DAT8/blob/master/notebooks/14_yelp_review_text_homework.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/15_natural_language_processing.ipynb |
| Kaggle competition | https://inclass.kaggle.com/c/dat8-stack-overflow |
| Kaggle: How it Works | https://www.youtube.com/watch?v=PoD84TVdD-4 |
| project presentation video | https://www.youtube.com/watch?v=HGr1yQV3Um0 |
| slides | https://speakerdeck.com/justmarkham/allstate-purchase-prediction-challenge-on-kaggle |
| video lectures | https://class.coursera.org/nlp/lecture |
| slides | http://web.stanford.edu/~jurafsky/NLPCourseraSlides.html |
| Coursera course | https://www.coursera.org/course/nlp |
| key NLP terms | https://github.com/ga-students/DAT_SF_9/blob/master/16_Text_Mining/DAT9_lec16_Text_Mining.pdf |
| Natural Language Processing with Python | http://www.nltk.org/book/ |
| Natural Language Toolkit | http://www.nltk.org/ |
| A Smattering of NLP in Python | https://github.com/charlieg/A-Smattering-of-NLP-in-Python/blob/master/A%20Smattering%20of%20NLP%20in%20Python.ipynb |
| notebook from DAT5 | https://github.com/justmarkham/DAT5/blob/master/notebooks/14_nlp.ipynb |
| spaCy | http://spacy.io/ |
| Stanford CoreNLP | http://nlp.stanford.edu/software/corenlp.shtml |
| HashingVectorizer | http://scikit-learn.org/stable/modules/feature_extraction.html#vectorizing-a-large-text-corpus-with-the-hashing-trick |
| Automatically Categorizing Yelp Businesses | http://engineeringblog.yelp.com/2015/09/automatically-categorizing-yelp-businesses.html |
| Modern Methods for Sentiment Analysis | http://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis |
| Identifying Humorous Cartoon Captions | http://www.cs.huji.ac.il/~dshahaf/pHumor.pdf |
| DC Natural Language Processing | http://www.meetup.com/DC-NLP/ |
| Class 16: Kaggle Competition | https://github.com/code-ram/DAT8#class-16-kaggle-competition |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/16_kaggle.pdf |
| Predict whether a Stack Overflow question will be closed | https://inclass.kaggle.com/c/dat8-stack-overflow |
| Complete code file | https://github.com/code-ram/DAT8/blob/master/code/16_kaggle.py |
| Minimal code file | https://github.com/code-ram/DAT8/blob/master/code/16_kaggle_minimal.py |
| Explanations of log loss | http://www.quora.com/What-is-an-intuitive-explanation-for-the-log-loss-function |
| peer review guidelines | https://github.com/code-ram/DAT8/blob/master/project/peer_review.md |
| A Visual Introduction to Machine Learning | http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ |
| Graphviz | http://www.graphviz.org/ |
| Specialist Knowledge Is Useless and Unhelpful | http://www.slate.com/articles/health_and_science/new_scientist/2012/12/kaggle_president_jeremy_howard_amateurs_beat_specialists_in_data_prediction.html |
| Getting in Shape for the Sport of Data Science | https://www.youtube.com/watch?v=kwt6XEh7U3g |
| Learning from the best | http://blog.kaggle.com/2014/08/01/learning-from-the-best/ |
| Feature Engineering Without Domain Expertise | https://www.youtube.com/watch?v=bL4b1sGnILU |
| passengers at a train station | https://medium.com/@chris_bour/french-largest-data-science-challenge-ever-organized-shows-the-unreasonable-effectiveness-of-open-8399705a20ef |
| fraudulent users of an online store | https://docs.google.com/presentation/d/1UdI5NY-mlHyseiRVbpTLyvbrHxY8RciHp5Vc-ZLrwmU/edit#slide=id.p |
| bots in an online auction | https://www.kaggle.com/c/facebook-recruiting-iv-human-or-bot/forums/t/14628/share-your-secret-sauce |
| subscribe to the next season of an orchestra | http://blog.kaggle.com/2015/01/05/kaggle-inclass-stanfords-getting-a-handel-on-data-science-winners-report/ |
| quality of e-commerce search engine results | http://blog.kaggle.com/2015/07/22/crowdflower-winners-interview-3rd-place-team-quartet/ |
| Our perfect submission | https://www.kaggle.com/c/restaurant-revenue-prediction/forums/t/13950/our-perfect-submission |
| public leaderboard | https://www.kaggle.com/c/restaurant-revenue-prediction/leaderboard/public |
| Class 17: Decision Trees | https://github.com/code-ram/DAT8#class-17-decision-trees |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/17_decision_trees.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/17_bikeshare_exercise.ipynb |
| data | https://github.com/code-ram/DAT8/blob/master/data/bikeshare.csv |
| data dictionary | https://www.kaggle.com/c/bike-sharing-demand/data |
| Human Ensemble Learning | http://mlwave.com/human-ensemble-learning/ |
| Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? | http://jmlr.csail.mit.edu/papers/volume15/delgado14a/delgado14a.pdf |
| comment | https://news.ycombinator.com/item?id=8719723 |
| decision trees | http://scikit-learn.org/stable/modules/tree.html |
| Introduction to Data Mining | http://www-users.cs.umn.edu/~kumar/dmbook/index.php |
| A Brief History of Classification and Regression Trees | https://drive.google.com/file/d/0B-BKohKl-jUYQ3RpMEF0OGRUU3RHVGpHY203NFd3Z19Nc1ZF/view |
| The Science of Singing Along | http://www.doc.gold.ac.uk/~mas03dm/papers/PawleyMullensiefen_Singalong_2012.pdf |
| identifying psychosis | http://www.psychcongress.com/sites/naccme.com/files/images/pcn/saundras/psychosis_decision_tree.pdf |
| Class 18: Ensembling | https://github.com/code-ram/DAT8#class-18-ensembling |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/17_decision_trees.ipynb |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/18_ensembling.ipynb |
| Major League Baseball player data | https://github.com/code-ram/DAT8/blob/master/data/hitters.csv |
| Data dictionary | https://cran.r-project.org/web/packages/ISLR/ISLR.pdf |
| ensemble methods | http://scikit-learn.org/stable/modules/ensemble.html |
| Kaggle Ensembling Guide | http://mlwave.com/kaggle-ensembling-guide/ |
| solution paper | https://docs.google.com/viewer?url=https://raw.githubusercontent.com/ChenglongChen/Kaggle_CrowdFlower/master/Doc/Kaggle_CrowdFlower_ChenglongChen.pdf |
| CrowdFlower competition | https://www.kaggle.com/c/crowdflower-search-relevance |
| Interpretable vs Powerful Predictive Models: Why We Need Them Both | https://medium.com/@chris_bour/interpretable-vs-powerful-predictive-models-why-we-need-them-both-990340074979 |
| Not Even the People Who Write Algorithms Really Know How They Work | http://www.theatlantic.com/technology/archive/2015/09/not-even-the-people-who-write-algorithms-really-know-how-they-work/406099/ |
| How do random forests work in layman's terms? | http://www.quora.com/Random-Forests/How-do-random-forests-work-in-laymans-terms/answer/Edwin-Chen-1 |
| Large Scale Decision Forests: Lessons Learned | http://blog.siftscience.com/blog/2015/large-scale-decision-forests-lessons-learned |
| Unboxing the Random Forest Classifier | http://nerds.airbnb.com/unboxing-the-random-forest-classifier/ |
| Understanding Random Forests: From Theory to Practice | http://arxiv.org/pdf/1407.7502v3.pdf |
| Class 19: Advanced scikit-learn and Clustering | https://github.com/code-ram/DAT8#class-19-advanced-scikit-learn-and-clustering |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/19_advanced_sklearn.ipynb |
| StandardScaler | http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html |
| Pipeline | http://scikit-learn.org/stable/modules/pipeline.html |
| slides | https://github.com/code-ram/DAT8/blob/master/slides/19_clustering.pdf |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/19_clustering.ipynb |
| documentation | http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html |
| visualization 1 | http://tech.nitoyon.com/en/blog/2013/11/07/k-means/ |
| visualization 2 | http://www.naftaliharris.com/blog/visualizing-k-means-clustering/ |
| documentation | http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html |
| visualization | http://www.naftaliharris.com/blog/visualizing-dbscan-clustering/ |
| Understanding the Bias-Variance Tradeoff | http://scott.fortmann-roe.com/docs/BiasVariance.html |
| guiding questions | https://github.com/code-ram/DAT8/blob/master/homework/09_bias_variance.md |
| bias-variance tradeoff | http://work.caltech.edu/library/081.html |
| regularization | http://work.caltech.edu/library/121.html |
| feature scaling | https://github.com/rasbt/pattern_classification/blob/master/preprocessing/about_standardization_normalization.ipynb |
| Practical Data Science in Python | http://radimrehurek.com/data_science_python/ |
| GridSearchCV and RandomizedSearchCV | http://scikit-learn.org/stable/modules/grid_search.html |
| How to find the best model parameters in scikit-learn | https://www.youtube.com/watch?v=Gol_qOgRqfA |
| associated notebook | https://github.com/justmarkham/scikit-learn-videos/blob/master/08_grid_search.ipynb |
| tutorials and examples | https://github.com/rasbt/pattern_classification |
| tools and extensions | http://rasbt.github.io/mlxtend/ |
| book | https://github.com/rasbt/python-machine-learning-book |
| blog | http://sebastianraschka.com/blog/ |
| mailing list | https://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/index.html |
| Introduction to Data Mining | http://www-users.cs.umn.edu/~kumar/dmbook/index.php |
| types of clustering | http://scikit-learn.org/stable/modules/clustering.html |
| PowerPoint presentation | http://www2.research.att.com/~volinsky/DataMining/Columbia2011/Slides/Topic6-Clustering.ppt |
| K-means clustering | https://www.youtube.com/watch?v=aIybuNt9ps4&list=PL5-da3qGB5IBC-MneTc9oBZz0C6kNJ-f2 |
| hierarchical clustering | https://www.youtube.com/watch?v=Tuuc9Y06tAc&list=PL5-da3qGB5IBC-MneTc9oBZz0C6kNJ-f2 |
| hierarchical clustering | https://joyofdata.shinyapps.io/hclust-shiny/ |
| mean shift clustering | http://spin.atomicobject.com/2015/05/26/mean-shift-clustering/ |
| K-modes algorithm | http://www.cs.ust.hk/~qyang/Teaching/537/Papers/huang98extensions.pdf |
| Python implementation | https://github.com/nicodv/kmodes |
| A Statistical Analysis of the Work of Bob Ross | http://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/ |
| data and Python code | https://github.com/fivethirtyeight/data/tree/master/bob-ross |
| How a Math Genius Hacked OkCupid to Find True Love | http://www.wired.com/2014/01/how-to-hack-okcupid/all/ |
| characteristics of your zip code | http://www.esri.com/landing-pages/tapestry/ |
| Class 20: Regularization and Regular Expressions | https://github.com/code-ram/DAT8#class-20-regularization-and-regular-expressions |
| notebook | https://github.com/code-ram/DAT8/blob/master/notebooks/20_regularization.ipynb |
| Ridge | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html |
| RidgeCV | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html |
| Lasso | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html |
| LassoCV | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html |
| LogisticRegression | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html |
| Pipeline | http://scikit-learn.org/stable/modules/pipeline.html |
| GridSearchCV | http://scikit-learn.org/stable/modules/grid_search.html |
| Baltimore homicide data | https://github.com/code-ram/DAT8/blob/master/data/homicides.txt |
| Regular expressions 101 | https://regex101.com/#python |
| Reference guide | https://github.com/code-ram/DAT8/blob/master/code/20_regex_reference.py |
| Exercise | https://github.com/code-ram/DAT8/blob/master/code/20_regex_exercise.py |
| A Few Useful Things to Know about Machine Learning | http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf |
| Generalized Linear Models | http://scikit-learn.org/stable/modules/linear_model.html |
| An Introduction to Statistical Learning | http://www-bcf.usc.edu/~gareth/ISL/ |
| ridge regression | https://www.youtube.com/watch?v=cSKzqb0EKS0&list=PL5-da3qGB5IB-Xdpj_uXJpLGiRfv9UVXI&index=6 |
| lasso regression | https://www.youtube.com/watch?v=A5I1G1MfUmA&index=7&list=PL5-da3qGB5IB-Xdpj_uXJpLGiRfv9UVXI |
| original paper | http://statweb.stanford.edu/~tibs/lasso/lasso.pdf |
| machine learning course | https://www.coursera.org/learn/machine-learning/ |
| related lecture notes | http://www.holehouse.org/mlclass/07_Regularization.html |
| notebook | https://github.com/luispedro/PenalizedRegression/blob/master/PenalizedRegression.ipynb |
| Building Machine Learning Systems with Python | https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python |
| Cross Validated Q&A | https://stats.stackexchange.com/questions/69568/whether-to-rescale-indicator-binary-dummy-predictors-for-lasso |
| blog post | http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models |
| introductory lesson | https://developers.google.com/edu/python/regular-expressions |
| video | https://www.youtube.com/watch?v=kWyoYtvJpe4&index=4&list=PL5-da3qGB5IA5NwDxcEJ5dvt8F9OQP7q5 |
| chapter | http://www.pythonlearn.com/html-270/book012.html |
| mbox.txt | http://www.py4inf.com/code/mbox.txt |
| mbox-short.txt | http://www.py4inf.com/code/mbox-short.txt |
| Breaking the Ice with Regular Expressions | https://www.codeschool.com/courses/breaking-the-ice-with-regular-expressions/ |
| RexEgg | http://www.rexegg.com/ |
| 5 Tools You Didn't Know That Use Regular Expressions | http://blog.codeschool.io/2015/07/30/5-tools-you-didnt-know-that-use-regular-expressions/ |
| Exploring Expressions of Emotions in GitHub Commit Messages | http://geeksta.net/geeklog/exploring-expressions-emotions-github-commit-messages/ |
| Emojineering | http://instagram-engineering.tumblr.com/post/118304328152/emojineering-part-2-implementing-hashtag-emoji |
| Class 21: Course Review and Final Project Presentation | https://github.com/code-ram/DAT8#class-21-course-review-and-final-project-presentation |
| Data science review | https://docs.google.com/document/d/19gBCkmrbMpFFLPX8wa5daMnyl7J5BXhMV8JNJwgp1pk/edit?usp=sharing |
| machine learning map | http://scikit-learn.org/stable/tutorial/machine_learning_map/ |
| Choosing a Machine Learning Classifier | http://blog.echen.me/2011/04/27/choosing-a-machine-learning-classifier/ |
| Classifier comparison | http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html |
| Comparing supervised learning algorithms | http://www.dataschool.io/comparing-supervised-learning-algorithms/ |
| Supervised learning superstitions cheat sheet | http://ryancompton.net/assets/ml_cheat_sheet/supervised_learning.html |
| Machine Learning Done Wrong | http://ml.posthaven.com/machine-learning-done-wrong |
| Machine Learning Gremlins | https://www.youtube.com/watch?v=tleeC-KlsKA |
| Clever Methods of Overfitting | http://hunch.net/?p=22 |
| Common Pitfalls in Machine Learning | http://danielnee.com/?p=155 |
| Practical machine learning tricks from the KDD 2011 best industry paper | http://blog.david-andrzejewski.com/machine-learning/practical-machine-learning-tricks-from-the-kdd-2011-best-industry-paper/ |
| Advice for applying machine learning | http://cs229.stanford.edu/materials/ML-advice.pdf |
| An Empirical Comparison of Supervised Learning Algorithms | http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf |
| talk | http://videolectures.net/solomon_caruana_wslmw/ |
| Class 22: Final Project Presentation | https://github.com/code-ram/DAT8#class-22-final-project-presentation |
| What's next? | https://github.com/code-ram/DAT8/blob/master/other/advice.md |
| Additional Resources | https://github.com/code-ram/DAT8#additional-resources-1 |
| Tidy Data | https://github.com/code-ram/DAT8#tidy-data |
| Good Data Management Practices for Data Analysis | https://www.prometheusresearch.com/good-data-management-practices-for-data-analysis-tidy-data-part-2/ |
| Hadley Wickham's paper | http://www.jstatsoft.org/article/view/v059i10 |
| Bob Ross | https://github.com/fivethirtyeight/data/blob/master/bob-ross/elements-by-episode.csv |
| NFL ticket prices | https://github.com/fivethirtyeight/data/blob/master/nfl-ticket-prices/2014-average-ticket-price.csv |
| airline safety | https://github.com/fivethirtyeight/data/blob/master/airline-safety/airline-safety.csv |
| Jets ticket prices | https://github.com/fivethirtyeight/data/blob/master/nfl-ticket-prices/jets-buyer.csv |
| Chipotle orders | https://github.com/TheUpshot/chipotle/blob/master/orders.tsv |
| unreadable by computers | https://bosker.wordpress.com/2014/12/05/the-government-statistical-services-terrible-spreadsheet-advice/ |
| tips for releasing data in spreadsheets | http://www.clean-sheet.org/ |
| answer | http://stats.stackexchange.com/questions/83614/best-practices-for-creating-tidy-data/83711#83711 |
| Databases and SQL | https://github.com/code-ram/DAT8#databases-and-sql |
| GA slide deck | https://github.com/justmarkham/DAT5/blob/master/slides/20_sql.pdf |
| Python script | https://github.com/justmarkham/DAT5/blob/master/code/20_sql.py |
| SQL Bootcamp | https://github.com/brandonmburroughs/sql_bootcamp |
| GA notebook | https://github.com/podopie/DAT18NYC/blob/master/classes/17-relational_databases.ipynb |
| SQLZOO | http://sqlzoo.net/wiki/SQL_Tutorial |
| Mode Analytics | http://sqlschool.modeanalytics.com/ |
| Khan Academy | https://www.khanacademy.org/computing/computer-programming/sql |
| Codecademy | https://www.codecademy.com/courses/learn-sql |
| Datamonkey | http://datamonkey.pro/guess_sql/lessons/ |
| Code School | http://campus.codeschool.com/courses/try-sql/contents |
| advanced tutorial | https://www.codeschool.com/courses/the-sequel-to-sql/ |
| w3schools | http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all |
| Reddit Comments | https://www.kaggle.com/c/reddit-comments-may-2015/data |
| What Every Data Scientist Needs to Know about SQL | http://joshualande.com/data-science-sql/ |
| Introduction to SQL for Data Scientists | http://bensresearch.com/downloads/SQL.pdf |
| 10 Easy Steps to a Complete Understanding of SQL | https://web.archive.org/web/20150402234726/http://tech.pro/tutorial/1555/10-easy-steps-to-a-complete-understanding-of-sql |
| Query Planning | http://www.sqlite.org/queryplanner.html |
| A Comparison Of Relational Database Management Systems | https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems |
| 14 mini-courses | https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about |
| Blaze | http://blaze.pydata.org |
| Recommendation Systems | https://github.com/code-ram/DAT8#recommendation-systems |
| GA slide deck | https://github.com/justmarkham/DAT4/blob/master/slides/18_recommendation_engines.pdf |
| Python script | https://github.com/justmarkham/DAT4/blob/master/code/18_recommenders_soutions.py |
| Mining of Massive Datasets | http://infolab.stanford.edu/~ullman/mmds/bookL.pdf |
| A Programmer's Guide to Data Mining | http://guidetodatamining.com/ |
| Netflix Recommendations: Beyond the 5 stars | http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html |
| Winning the Netflix Prize: A Summary | http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/ |
| A Perspective on the Netflix Prize | http://www2.research.att.com/~volinsky/papers/chance.pdf |
| paper | http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf |
| Stack Overflow Q&A | http://stackoverflow.com/questions/2323768/how-does-the-amazon-recommendation-feature-work |
| Facebook | https://code.facebook.com/posts/861999383875667/recommending-items-to-more-than-a-billion-people/ |
| Etsy | https://codeascraft.com/2014/11/17/personalized-recommendations-at-etsy/ |
| The Global Network of Discovery | http://www.gnod.com/ |
| The People Inside Your Machine | http://www.npr.org/blogs/money/2015/01/30/382657657/episode-600-the-people-inside-your-machine |
| course | https://www.coursera.org/learn/recommender-systems |