Python Libraries for Data Science

Python's libraries play a big role in making it the Swiss Army knife of programming languages. Data science requires data cleaning, data manipulation, and data presentation, and each of these libraries covers one or more of those areas. Set aside the hype and initial buzz surrounding Python and AI: these libraries genuinely help.

Below are a few Python libraries that are very useful.

NumPy

Why use NumPy? It makes working with matrices and arrays easy, and both are important concepts in machine learning. Let me give you an example. If I am building a recommendation system, I would model the system as a matrix, with users as its rows and various attributes of those users as its columns.
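To make the matrix idea concrete, here is a minimal sketch. The attribute columns and the numbers are made up for illustration, and cosine similarity is just one possible way to compare users, not something a recommendation system must use.

```python
import numpy as np

# Hypothetical user-attribute matrix: one row per user, one column per attribute.
# Assumed columns: age, purchases per month, average rating given.
users = np.array([
    [25, 4, 3.5],
    [31, 1, 4.0],
    [47, 9, 2.5],
])

n_users, n_attributes = users.shape

# Average of each attribute across all users (one value per column).
column_means = users.mean(axis=0)

# Cosine similarity between the first user and every user in the matrix.
norms = np.linalg.norm(users, axis=1)
similarity = users @ users[0] / (norms * norms[0])

print(n_users, n_attributes)
print(column_means)
print(similarity)
```

The whole matrix is handled in a few vectorized operations, with no explicit loop over users. A real system would normally scale the columns first, since an attribute with large values such as age would otherwise dominate the similarity.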

SciPy

SciPy stands for Scientific Python. This library offers great support for statistics-related tasks.
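As one example of a statistics task, the sketch below runs a two-sample t-test with scipy.stats.ttest_ind. The two groups of measurements are invented for illustration.

```python
from scipy import stats

# Made-up measurements from two groups, for example two versions of a web page.
group_a = [12.1, 11.8, 12.5, 12.0, 11.9]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8]

# Independent two-sample t-test: do the two groups share the same mean?
result = stats.ttest_ind(group_a, group_b)
t_stat, p_value = result.statistic, result.pvalue

print(t_stat, p_value)
```

A small p-value suggests the difference between the group means is unlikely to be due to chance alone. This default form of the test assumes both groups have equal variance.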

Pandas

Pandas provides DataFrames, which are like tables. You can read a CSV or an Excel file into a DataFrame and perform all sorts of calculations and manipulations on it. Ever heard of data cleaning? Handling missing values, removing duplicate rows, and similar cleanup tasks are all possible with pandas.
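Here is a small sketch of reading CSV data into a DataFrame and cleaning it. The CSV content is made up and is read from an in-memory string so the example is self-contained. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up CSV with a duplicate row and two missing values.
csv_text = """name,age,city
Asha,29,Pune
Ben,,London
Asha,29,Pune
Chen,41,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Remove exact duplicate rows, then fill in the missing values.
clean = df.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("unknown")

mean_age = clean["age"].mean()
print(clean)
print(mean_age)
```

Filling missing ages with the median and missing cities with a placeholder is only one possible policy. Depending on the data, dropping incomplete rows with dropna may be the better choice.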

Matplotlib & seaborn

These are used for visualization. Seaborn is built on top of Matplotlib. It has good support for more complex plots, offers attractive default styles, and integrates well with the pandas library.
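A minimal Matplotlib sketch follows. The sales figures are made up, and the non-interactive "Agg" backend is an assumption chosen so the code also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed so no display is needed
import matplotlib.pyplot as plt

# Made-up monthly sales figures.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 160, 150]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales (made-up data)")

# Render the figure as a PNG into memory; pass a filename instead to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Seaborn follows the same pattern at a higher level. For data held in a pandas DataFrame, a function such as seaborn.lineplot takes the DataFrame and the column names, and draws onto Matplotlib axes like the ones created here.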

scikit-learn

We haven't touched machine learning yet. scikit-learn is the library that has all sorts of models built in and can be used for machine learning tasks.

Once you have a basic grasp of these Python libraries for data science, building models becomes much easier. If you are not comfortable installing Python and its related libraries on your own machine, which can get difficult, I recommend trying Amazon SageMaker.
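As a sketch of how a built-in model is used, the example below trains a classifier on the iris dataset that ships with scikit-learn. The choice of logistic regression, the 25% test split, and the fixed random seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small example dataset bundled with scikit-learn: flower measurements and species.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Fraction of test rows classified correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Most scikit-learn models share this fit and score (or predict) interface, so trying a different model is usually a matter of changing the import and the line that creates it.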
