This page will eventually consist of a number of resources related to Data Science as practiced in Python. Right now I’m working on Pandas.
An article I have found quite helpful as an introductory, opinionated guide is Minimally Sufficient Pandas by Ted Petrou (1/2019).
Articles
- Kirit Thadaka. Data Science, the Good, the Bad, and the…Future. kite, 2019.
- Pranathi V.N. Vemuri. Image Segmentation with Python. kite, 7/18/19.
- T.J. Simmons. 10 Essential Data Science Packages for Python. kite, 5/27/19.
- Daniel Pyrathon. Practical Machine Learning with Python and Keras. kite, 1/30/19.
- Ray Johns. PyTorch vs TensorFlow for Your Python Deep Learning Project. realpython, 9/2/2020.
- The Kite Team. Tensorflow or PyTorch? A Guide to Python Machine Learning Libraries (with examples!). kite, 10/25/18.
- Eleanor Stribling. Python, NLTK, and the Digital Humanities: Finding Patterns in Gothic Literature. kite, 10/4/18.
- Bryan Weber. Scientific Python: Using SciPy for Optimization. realpython, 1/20/20.
- Joseph Lee Wei En. How to Get Started with Python for Deep Learning and Data Science. freecodecamp, 3/6/19.
- Brad Solomon. Python Histogram Plotting: NumPy, Matplotlib, Pandas & Seaborn. realpython.
- Rahul Agarwal. Data Scientists, the 5 Graph Algorithms that You Should Know. towardsdatascience, 2019.
- Mirko Stojiljković. NumPy, SciPy, and Pandas: Correlation with Python. realpython, 12/23/19.*
- Chris Moffitt. Python Tools for Record Linking and Fuzzy Matching.* pbpython, 2/18/2020.
- Real Python Staff. Python for Social Scientists. realpython.
Statistics
- Mirko Stojiljković. Python Statistics Fundamentals: How to Describe Your Data. realpython, 12/16/19.
- Tirtha Sarkar. Statistical Modeling with Python: How-to & Top Libraries. kite, 2019.
PySpark
- Luke Lee. First Steps with PySpark and Big Data Processing. realpython, 7/31/2019.
Linear Regression
- Mirko Stojiljković. Linear Regression in Python. realpython.
Matplotlib
- Shaumik Daityari. How to Plot Charts in Python with Matplotlib. sitepoint, 7/10/2019.
- Hennadii Madan. Matplotlib Explained. kite, 3/5/19.
- Brad Solomon. Python Plotting with Matplotlib (Guide). realpython.
NumPy
- NumPy – Stars: 16.2k – Updated: 2/2021 – Checked: 2/2021 – “The fundamental package for scientific computing with Python.”
- Mirko Stojiljković. NumPy arange(): How to use np.arange(). realpython, 7/22/2019.
- Beau Carnes. Learn NumPy and Start Doing Scientific Computing in Python. freecodecamp, 8/9/19.
- Stephen Gruppetta. np.linspace(): Create Evenly or Non-Evenly Spaced Arrays. realpython, 11/2020.
- Jay Alammar. A Visual Intro to NumPy and Data Representation. 2019.
Natural Language Processing (NLP)
- Shaumik Daityari. Getting Started with Natural Language Processing in Python. 3/2019.
- spaCy – NLP
- Taranjeet Singh. Natural Language Processing with spaCy in Python. realpython.
- Natural Language Toolkit (NLTK) – Stars: 9.6k – Updated: 1/2021 – Checked: 2/2021 – “a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.”
Pandas
Pandas – Stars: 28.4k – Updated: 2/2021 – Checked: 2/2021 – “Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.”
- Bryan Weber. Pandas Project: Make a Gradebook with Python & Pandas. realpython, 7/2020.*
- Nick McCullum. The Ultimate Guide to the Pandas Library for Data Science in Python. freecodecamp, 7/2020.
- Zachary Wilson. The Pandas Library for Python. kite, 3/25/19.
- Bhavani Ravi. Python Pandas–Basics to Beyond. hackernoon, 2019.
- Bhavani Ravi. Learn Python Pandas in 5 Mins.
- Part 2. 2/7/19.
- Parul Pandey. Time Series Analysis with Pandas. kite, 2019.
- Alex DeBrie. How to Use Pandas GroupBy, Counts and Value Counts. kite, 7/18/2019.
- Mokhtar Ebrahim. Python Pandas Tutorial: Getting Started with DataFrames. likegeeks, 2/2019.
- Brad Solomon. Pandas GroupBy: Your Guide to Grouping Data in Python. realpython, 11/18/19.
- Alex DeBrie. Pandas Pivot: A Guide with Examples. kite, 6/29/19.
- T.J. Simmons. The Quickest Ways to Sort Pandas DataFrame Values. kite, 6/25/19.
- Alex DeBrie. Pandas Merge, Join, and Concat: How To and Examples. kite, 5/3/2019.
- Alex DeBrie. Guide: Pandas DataFrames for Data Analysis. kite, 4/3/19.
- Dataframe Visualization with Pandas Plot. kanoki, 2019.
- Brad Solomon. Python Pandas: Tricks & Features You May Not Know. realpython.
- Chris Moffitt. Effectively Using Matplotlib. pbpython, 4/25/17.
- Peter Nistrup. Exploring Your Data with Just 1 Line of Python. towardsdatascience, 9/25/19.
- Reka Horvath. Using Pandas and Python to Explore Your Dataset. realpython, 1/6/20.
- Mirko Stojiljković. Pandas: How to Read and Write Files. realpython, 12/2/19.
- Malay Agarwal. Pythonic Data Cleaning with Pandas and NumPy. realpython.
- Joe Wyndham. Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects. realpython.
- Mirko Stojiljković. SettingWithCopyWarning in Pandas: Views vs Copies. realpython, 6/2020.
- Kyle Stratis. Combining Data in Pandas with merge(), .join(), and concat(). realpython, 4/2020.
- Reka Horvath. Plot with Pandas: Python Data Visualization for Beginners. realpython, 9/2020.
Scikit-Learn
Scikit-Learn – Stars: 44.5k – Updated: 2/2021 – Checked: 2/2021 – “Simple and efficient tools for predictive data analysis…built on NumPy, SciPy, and matplotlib.”
- Mirko Stojiljković. Split Your Dataset with scikit-learn’s train_test_split(). realpython, 11/2020.
Neural Networks
- Padmaja Bhagwat. Introduction to Artificial Neural Networks in Python. kite, 7/18/19.
- CNTK – Stars: 17k – Updated: 3/2020 – Checked: 1/2021 – “The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.”
Deep Learning
- Keras – Deep learning library that can run on top of TensorFlow, CNTK, or Theano.
- Victor Zhou. Keras for Beginners: Implementing a Convolutional Neural Network. 2019.
- Nikolai Janakiev. Practical Text Classification with Python and Keras. realpython.
- PyTorch – Stars: 46.1k – Updated: 2/2021 – Checked: 2/2021 – Deep Learning.
TensorFlow
- TensorFlow – Stars: 153k – Updated: 2/2021 – Checked: 2/2021 – machine learning.
- TensorFlow Cookbook – Stars: 2.8k – Updated: 2/2020 – Checked: 2/2021.
- TensorFlow Examples – Stars: 39.9k – Updated: 12/2020 – Checked: 2/2021.
- TensorFlow Course – Stars: 15.4k – Updated: 11/2020 – Checked: 2/2021.
Books
- Jake VanderPlas. Python Data Science Handbook. O’Reilly, 2018. – Stars: 28k – Updated: 11/2018 – Checked: 2/2021.
- The full text is available as Jupyter notebooks in the GitHub repo.
Tools
- Jupyter
- Mike Driscoll. Jupyter Notebook: An Introduction. realpython.
- Renato Candido. Setting Up Python for Machine Learning on Windows. realpython.
- Data Science Python Notebooks – 18.5k Stars – 2019. Covers deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop, MapReduce, HDFS), Matplotlib, pandas, NumPy, SciPy, etc.
- Homemade Machine Learning* – 14k Stars – 2020 – “Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained.”
- Streamlit – “open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours!”
- SciPy – Data science and analysis ecosystem. Includes NumPy, the SciPy library, Matplotlib, IPython, pandas, SymPy, and nose.
- Spyder – Scientific Python Development Environment.
- TextBlob – Stars: 7.5k – Updated: 1/2021 – Checked: 2/2021 – Text processing including sentiment analysis.