In this exercise, we will perform some simple data exploration using pandas in Python. We will use a dataset with information about various car models, stored in the CSV file mtcars.csv.
The notebook for this tutorial along with the dataset can be found here.
We can start by importing pandas and loading the data into a DataFrame.
import pandas as pd
data = pd.read_csv('mtcars.csv')
Now that we have our data in a DataFrame, we can take a peek at the data.
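A peek like this is usually done with `DataFrame.head()`, which returns the first five rows by default. The following is a minimal sketch, assuming a small hand-built stand-in for the CSV: the `sample` frame is illustrative and holds only a few rows and columns. In the notebook you would call the same method on `data`.

```python
import pandas as pd

# Illustrative stand-in for the loaded CSV: a few rows and columns only.
sample = pd.DataFrame({
    "model": ["Mazda RX4", "Toyota Corolla", "Fiat 128", "Cadillac Fleetwood"],
    "mpg": [21.0, 33.9, 32.4, 10.4],
    "cyl": [6, 4, 4, 8],
    "hp": [110, 65, 66, 205],
})

# head() returns the first five rows by default; pass n for a different count.
first_rows = sample.head(2)
print(first_rows)
```

`DataFrame.tail()` works the same way for the last rows.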
We can also quickly get some statistics on the data by using the describe function.
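As a rough sketch of what `describe()` gives back, the example below runs it on a small illustrative frame (the `sample` name and its reduced set of rows are assumptions made so the snippet runs without the CSV). By default only numeric columns are summarized, so the text column `model` is left out.

```python
import pandas as pd

# Illustrative stand-in for the loaded CSV: a few rows and columns only.
sample = pd.DataFrame({
    "model": ["Mazda RX4", "Toyota Corolla", "Fiat 128", "Cadillac Fleetwood"],
    "mpg": [21.0, 33.9, 32.4, 10.4],
    "hp": [110, 65, 66, 205],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column.
stats = sample.describe()
print(stats)

# A single statistic can be read by row label and column name.
mean_mpg = stats.loc["mean", "mpg"]
print(mean_mpg)
```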
We can also get information about each column: its name, its data type, and its count of non-null values.
RangeIndex: 32 entries, 0 to 31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   model   32 non-null     object
 1   mpg     32 non-null     float64
 2   cyl     32 non-null     int64
 3   disp    32 non-null     float64
 4   hp      32 non-null     int64
 5   drat    32 non-null     float64
 6   wt      32 non-null     float64
 7   qsec    32 non-null     float64
 8   vs      32 non-null     int64
 9   am      32 non-null     int64
 10  gear    32 non-null     int64
 11  carb    32 non-null     int64
dtypes: float64(5), int64(6), object(1)
memory usage: 3.1+ KB
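A summary in this format is what `DataFrame.info()` prints. The sketch below shows the call on a small illustrative frame (`sample` is an assumption standing in for `data`). Since `info()` prints its report rather than returning it, the example passes a buffer through the `buf` parameter to capture the text.

```python
import io

import pandas as pd

# Illustrative stand-in for the loaded CSV: a few rows and columns only.
sample = pd.DataFrame({
    "model": ["Mazda RX4", "Toyota Corolla", "Fiat 128", "Cadillac Fleetwood"],
    "mpg": [21.0, 33.9, 32.4, 10.4],
    "cyl": [6, 4, 4, 8],
})

# info() writes to stdout and returns None; buf= redirects the report.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

In a notebook, calling `data.info()` with no arguments is enough to display the report.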
We can also look for null values in the dataframe.
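One common way to do this is to chain `isnull()` and `sum()`, which counts the missing entries in each column. The sketch below uses an illustrative frame in which one `hp` value has been deliberately blanked so there is something to find; that gap is an assumption for the example and is not present in the real dataset, which reports 32 non-null values in every column.

```python
import pandas as pd

# Illustrative stand-in for the loaded CSV; one hp value is blanked on purpose.
sample = pd.DataFrame({
    "model": ["Mazda RX4", "Toyota Corolla", "Fiat 128", "Cadillac Fleetwood"],
    "mpg": [21.0, 33.9, 32.4, 10.4],
    "hp": [110, None, 66, 205],
})

# isnull() marks missing entries as True; sum() counts them per column.
null_counts = sample.isnull().sum()
print(null_counts)

# A single yes/no answer for the whole frame.
has_nulls = bool(sample.isnull().values.any())
print(has_nulls)
```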
Now let's say we want to see which model has the maximum MPG. We can do that by finding the row in which the mpg column has its highest value.
model Toyota Corolla
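One way to find that row, not necessarily the one used in the original notebook, is to combine `idxmax()` with `loc`. The sketch below runs on a small illustrative frame (`sample` is an assumption standing in for `data`).

```python
import pandas as pd

# Illustrative stand-in for the loaded CSV: a few rows and columns only.
sample = pd.DataFrame({
    "model": ["Mazda RX4", "Toyota Corolla", "Fiat 128", "Cadillac Fleetwood"],
    "mpg": [21.0, 33.9, 32.4, 10.4],
    "cyl": [6, 4, 4, 8],
})

# idxmax() returns the index label of the row with the largest mpg.
best_idx = sample["mpg"].idxmax()

# loc[] retrieves that whole row as a Series.
best_row = sample.loc[best_idx]
print(best_row)

# Alternative: a boolean mask keeps every row tied for the maximum.
tied = sample[sample["mpg"] == sample["mpg"].max()]
print(tied)
```

`idxmax()` returns only the first matching row, so the boolean mask is the safer choice when several models could share the top value.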