Python Libraries Needed for Machine Learning

This is the second tutorial in the series. In this tutorial, we will study various Python libraries and the functions from them that we will use throughout the series.

ML Python Libraries

 
In the previous tutorial, we brushed up on our Python basics. In this tutorial, we will study various Python libraries and their features. We will discuss the following Python libraries:
  1. Numpy
  2. Pandas
  3. Scikit-learn (sklearn)
  4. Matplotlib
  5. Seaborn
  6. Tensorflow

Numpy

 
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package, Numarray, was also developed and offered some additional functionality. In 2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray into Numeric. The open-source project has many contributors.
 
NumPy, or Numerical Python, is a Python library that provides the following:
  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier Transform and random number capabilities.
It can also serve as an efficient multi-dimensional container of generic data, and arbitrary data types can be defined. The official website is www.numpy.org.
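The features listed above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy is installed; the array values and the seed are made up for the example.

```python
import numpy as np

# N-dimensional array object: a 2x3 array of integers
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Broadcasting: the 1-D row is stretched across both rows of `a`
row = np.array([10, 20, 30])
shifted = a + row

# Linear algebra: solve the system m @ x = b for x
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(m, b)

# Fourier transform of a constant signal: everything lands in the first bin
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator, so repeated runs give the same values
rng = np.random.default_rng(seed=0)
samples = rng.random(3)

print(shifted)
print(x)
```

Here `shifted` holds `a` with 10, 20 and 30 added to its three columns, and `x` is the solution of the small linear system.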
 

Pandas 

 
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.
 
The original author is Wes McKinney. Pandas was first released on 11 January 2008. The official website is www.pandas.pydata.org. Its main features are:
  • DataFrame object for data manipulation with integrated indexing.
  • Tools for reading and writing data between in-memory data structures and different file formats.
  • Data alignment and integrated handling of missing data.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, fancy indexing, and subsetting of large data sets.
  • Data structure column insertion and deletion.
  • Group by engine allowing split-apply-combine operations on data sets.
  • Data set merging and joining.
  • Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
  • Time series-functionality: Date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging.
  • Data filtering.
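A few of the features in the list above can be tried out as follows. This is a rough sketch, assuming pandas and NumPy are installed; the `sales` and `regions` tables and their column names are hypothetical sample data, not part of any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
sales = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "units": [10, np.nan, 7, 5],
})

# Handling of missing data: replace NaN with 0
sales["units"] = sales["units"].fillna(0)

# Column insertion: derive a new column from an existing one
sales["revenue"] = sales["units"] * 2.5

# Filtering: keep only the rows with more than 6 units
big = sales[sales["units"] > 6]

# Group by: split by city, sum the units, combine into one Series
totals = sales.groupby("city")["units"].sum()

# Merging and joining: attach a region to every row by matching on city
regions = pd.DataFrame({"city": ["Pune", "Delhi"],
                        "region": ["West", "North"]})
merged = sales.merge(regions, on="city", how="left")

# Time series: daily date range and a 2-day moving average
dates = pd.date_range("2024-01-01", periods=4, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)
rolling_mean = series.rolling(window=2).mean()

print(totals)
print(merged)
```

The first entry of `rolling_mean` is NaN because a 2-day window has no complete data on the first day.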

Scikit-Learn

 
Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. 
 
It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. 
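To give a feel for how these algorithms are used, here is a small sketch that trains a random forest classifier and runs k-means clustering. It assumes scikit-learn is installed and uses the Iris data set that ships with the library; the split size, the number of trees and the random seeds are arbitrary choices made for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the bundled Iris data as NumPy arrays (features X, labels y)
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Classification: fit a random forest and measure accuracy on unseen data
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Clustering: group all samples into 3 clusters without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print("Test accuracy:", accuracy)
print("First ten cluster labels:", cluster_labels[:10])
```

Most scikit-learn estimators follow this same pattern: create the object, call `fit`, then call `predict` or `score`.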
 
The scikit-learn project started as scikits.learn, a Google Summer of Code project by David Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy.
 
Work on the project began in 2007, and the first public release followed in February 2010. The official website is www.scikit-learn.org.
 

Matplotlib

 
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.
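The object-oriented API mentioned above can be sketched like this. The example assumes Matplotlib and NumPy are installed. It selects the non-interactive "Agg" backend and writes the image to an in-memory buffer so that it runs without a display; both are choices made for this illustration, and you can pass a file name to `savefig` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no GUI available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np

# Sample data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)

# Object-oriented style: work with explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A simple line plot")
ax.legend()

# Render the figure as PNG into a buffer (a file name works here too)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Working with `fig` and `ax` directly, rather than the pylab interface, keeps it clear which plot each call modifies.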
 
Matplotlib was originally written by John D. Hunter, has an active development community, and is distributed under a BSD-style license. It was first released in 2003. The official website is www.matplotlib.org.
 

Seaborn

 
Seaborn is a library for making statistical graphics in Python. It is built on top of matplotlib and closely integrated with pandas data structures.
 
Here is some of the functionality that seaborn offers:
  • A dataset-oriented API for examining relationships between multiple variables
  • Specialized support for using categorical variables to show observations or aggregate statistics
  • Options for visualizing univariate or bivariate distributions and for comparing them between subsets of data
  • Automatic estimation and plotting of linear regression models for different kinds of dependent variables
  • Convenient views onto the overall structure of complex datasets
  • High-level abstractions for structuring multi-plot grids that let you easily build complex visualizations
  • Concise control over matplotlib figure styling with several built-in themes
  • Tools for choosing colour palettes that faithfully reveal patterns in your data
Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.
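As a rough sketch of this dataset-oriented style, the example below draws an aggregated bar plot and a linear regression plot straight from a dataframe. It assumes seaborn, pandas and Matplotlib are installed; the `df` table and its column names are hypothetical, and the "Agg" backend is selected only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no GUI available, so draw off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data: study hours and test scores for two groups
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "hours": [1, 2, 3, 1, 2, 3],
    "score": [52, 60, 71, 48, 59, 65],
})

# One of the built-in themes
sns.set_theme(style="whitegrid")

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Categorical plot: seaborn averages the scores within each group
sns.barplot(data=df, x="group", y="score", ax=axes[0])

# Regression plot: scatter points plus a fitted straight line
sns.regplot(data=df, x="hours", y="score", ci=None, ax=axes[1])
```

Notice that we only name the columns; seaborn does the grouping, averaging and line fitting internally.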
 
The official website is seaborn.pydata.org 
 

Tensorflow 

 
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google.
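To illustrate what differentiable programming means in practice, here is a minimal sketch, assuming TensorFlow 2.x is installed. The function being differentiated is made up for the example.

```python
import tensorflow as tf

# A trainable scalar value
x = tf.Variable(3.0)

# Record the operations applied to x so they can be differentiated
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x

# Derivative of y with respect to x, which is 2*x + 2 evaluated at x = 3
dy_dx = tape.gradient(y, x)

print(dy_dx.numpy())
```

Neural network training relies on this same mechanism to work out how each weight should change to reduce the error.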
 
TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache License 2.0 on November 9, 2015. The official website is www.tensorflow.org.
 

Conclusion 

 
In this article, I introduced you to NumPy, pandas, scikit-learn, seaborn, Matplotlib and TensorFlow. In the next articles, we will study each of these libraries in detail.
 
Hope to see you in the next article.
 
Congratulations!!! You now have a basic idea about all the Python ML Libraries.
 
Next article in this series >> Numpy