Python Data Science

This page will eventually collect resources on data science as practiced in Python. Right now I’m working on pandas.

An introductory, opinionated guide I have found quite helpful is Minimally Sufficient Pandas by Ted Petrou (1/2019).

Articles

Statistics

PySpark

  • Luke Lee. First Steps with PySpark and Big Data Processing. Real Python, 7/31/2019.

Linear Regression

  • Mirko Stojiljkovic. Linear Regression in Python. Real Python.

Matplotlib

NumPy

  • NumPy – Stars: 16.2k – Updated: 2/2021 – Checked: 2/2021 – “The fundamental package for scientific computing with Python.”

Natural Language Processing (NLP)

Pandas

  • Pandas – Stars: 28.4k – Updated: 2/2021 – Checked: 2/2021 – “Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.”

Scikit-Learn

  • Scikit-Learn – Stars: 44.5k – Updated: 2/2021 – Checked: 2/2021 – “Simple and efficient tools for predictive data analysis…built on NumPy, SciPy, and matplotlib.”

Neural Networks

  • Padmaja Bhagwat. Introduction to Artificial Neural Networks in Python. Kite, 7/18/2019.
  • CNTK – Stars: 17k – Updated: 3/2020 – Checked: 1/2021 – “The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.”

Deep Learning

TensorFlow

Books

  • Jake VanderPlas. Python Data Science Handbook. O’Reilly, 2018. – Stars: 28k – Updated: 11/2018 – Checked: 2/2021.
    • The full text is available as Jupyter notebooks in the GitHub repo.

Tools

  • Jupyter
  • Renato Candido. Setting Up Python for Machine Learning on Windows. Real Python.
  • Data Science Python Notebooks – Stars: 18.5k – 2019 – Covers deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop, MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, etc.
  • Homemade Machine Learning – Stars: 14k – 2020 – “Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained.”
  • Streamlit – “open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours!”
  • SciPy – Data science and analysis toolset. Includes NumPy, SciPy, Matplotlib, IPython, pandas, SymPy, nose.
  • Spyder – Scientific Python Development Environment.
  • TextBlob – Stars: 7.5k – Updated: 1/2021 – Checked: 2/2021 – Text processing including sentiment analysis.