A tutorial on data visualization in Python with Matplotlib, Seaborn, and Plotly

  • Two basic Python packages are used for visualization in this tutorial:

    1. Matplotlib – a Python plotting library with complete 2D support and limited 3D graphics support. It is useful for producing publication-quality figures in interactive environments across platforms.
    2. Seaborn – built on Matplotlib, Seaborn provides built-in themes, color palettes, and functions for visualizing univariate and bivariate distributions, linear regressions, data matrices, and time series, which makes it easier to build more complex visualizations.
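  • The sample dataset used in this tutorial is an Excel workbook; the code below reads its "sample" sheet and uses the Age, Gender, Sales, BMI, and Income columns.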

  • Import the libraries and load the dataset:

  • import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    import seaborn as sns
    df = pd.read_excel("pathtodataset", sheet_name="sample") #Read the "sample" sheet of the workbook
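  • If the Excel file is not at hand, a small synthetic DataFrame can stand in for it. The sketch below is an assumption, not the tutorial's dataset: the column names are taken from the code in this tutorial, but the values, the value ranges, and the BMI categories are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "sample" sheet; all values are random.
rng = np.random.default_rng(seed=0)
n_rows = 50
df = pd.DataFrame({
    "Age": rng.integers(20, 60, size=n_rows),
    "Gender": rng.choice(["M", "F"], size=n_rows),
    # Assumed categorical BMI groups; the real sheet may store numbers instead.
    "BMI": rng.choice(["Underweight", "Normal", "Overweight", "Obesity"], size=n_rows),
    "Sales": rng.integers(100, 1000, size=n_rows),
    "Income": rng.integers(10, 300, size=n_rows),
})
print(df.head())
```

    With this DataFrame in place of the pd.read_excel call, the plotting snippets in this tutorial should run, although the resulting charts will of course differ from those produced by the real data.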
  • Create histogram:

  • fig=plt.figure() #Plots in matplotlib reside within a Figure object; use plt.figure to create a new figure
    #Create one or more subplots using add_subplot, because you can't plot on a blank figure
    ax = fig.add_subplot(1,1,1)
    #Variable
    ax.hist(df['Age'],bins = 7) #Here you can play with the number of bins
    #Labels and title
    plt.title('Age distribution')
    plt.xlabel('Age')
    plt.ylabel('#Employee')
    plt.show()
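  • To see what changing the number of bins does, it can help to look at the counts directly. Matplotlib's hist relies on NumPy's histogram routine, so the minimal sketch below calls np.histogram on a handful of made-up ages (not the tutorial's data) with two different bin counts.

```python
import numpy as np

# Made-up ages, used only to illustrate the binning.
ages = np.array([22, 25, 31, 38, 41, 45, 52, 58])

for n_bins in (3, 7):
    # counts has n_bins entries; edges has n_bins + 1 equally spaced boundaries.
    counts, edges = np.histogram(ages, bins=n_bins)
    print(n_bins, "bins -> counts", counts, "edges", np.round(edges, 1))
```

    Fewer bins smooth the distribution at the cost of detail, while more bins show detail but can leave some bins empty on small samples.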
  • Create boxplot:

  • fig=plt.figure()
    ax = fig.add_subplot(1,1,1)
    #Variable
    ax.boxplot(df['Age'])
    plt.show()
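  • The box in a boxplot spans the first to the third quartile, with a line at the median; with Matplotlib's default setting (whis=1.5), the whiskers reach the most extreme points within 1.5 times the interquartile range of the box, and anything beyond is drawn as an outlier. The sketch below computes these quantities with NumPy on made-up ages, as a rough way to check what the plot shows.

```python
import numpy as np

# Made-up ages, used only for illustration.
ages = np.array([22, 25, 31, 38, 41, 45, 52, 58])

q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Limits beyond which points count as outliers (default whis=1.5).
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = ages[(ages < lower_limit) | (ages > upper_limit)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", outliers)
```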
  • Create violin plot:

  • sns.violinplot(x=df['Age'], y=df['Gender']) #Variable plot; pass x and y as keywords (required in recent seaborn versions)
    sns.despine()
  • Create bar chart:

  • var = df.groupby('Gender').Sales.sum() #grouped sum of sales at Gender level
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1)
    ax1.set_xlabel('Gender')
    ax1.set_ylabel('Sum of Sales')
    ax1.set_title("Gender wise Sum of Sales")
    var.plot(kind='bar', ax=ax1) #draw on the axes created above
  • Create line chart:

  • var = df.groupby('BMI').Sales.sum() #grouped sum of sales at BMI level
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1)
    ax1.set_xlabel('BMI')
    ax1.set_ylabel('Sum of Sales')
    ax1.set_title("BMI wise Sum of Sales")
    var.plot(kind='line', ax=ax1) #draw on the axes created above
  • Create stacked column chart:

  • var = df.groupby(['BMI','Gender']).Sales.sum()
    var.unstack().plot(kind='bar',stacked=True,  color=['red','blue'], grid=False)
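  • The unstack call is what makes the stacking possible: it moves the inner index level (Gender) into columns, so each BMI group becomes one row and each gender one stacked segment. A minimal sketch on a tiny made-up DataFrame (the values are hypothetical) shows the reshaping:

```python
import pandas as pd

# Tiny made-up dataset with the same column names as the tutorial's code.
df = pd.DataFrame({
    "BMI": ["Normal", "Normal", "Overweight", "Overweight", "Normal"],
    "Gender": ["F", "M", "F", "M", "F"],
    "Sales": [100, 150, 200, 50, 25],
})

# Series indexed by (BMI, Gender) pairs.
var = df.groupby(["BMI", "Gender"]).Sales.sum()

# DataFrame with BMI as the row index and one column per gender.
wide = var.unstack()
print(wide)
```

    Each column of the reshaped table is drawn as one color in the stacked bars, which is why the color list in the plotting call has two entries.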
  • Create scatter plot:

  • fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.scatter(df['Age'],df['Sales']) #You can also add more variables here to represent color and size.
    plt.show()
  • Create bubble plot:

  • fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.scatter(df['Age'],df['Sales'], s=df['Income']) #Added third variable Income as the size of the bubble
    plt.show()
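  • The s argument of scatter is the marker area in points squared, so passing a raw column can produce bubbles that are far too large or too small, depending on its units. One possible approach, sketched below with made-up incomes and hypothetical area limits, is to rescale the values into a fixed range before plotting.

```python
import numpy as np

# Made-up incomes; the tutorial's Income column may use different units.
income = np.array([20_000, 45_000, 80_000, 150_000])

# Hypothetical smallest and largest marker areas, in points squared.
min_area, max_area = 20.0, 400.0

# Min-max scale to [0, 1], then stretch to [min_area, max_area].
scaled = (income - income.min()) / (income.max() - income.min())
sizes = min_area + scaled * (max_area - min_area)
print(sizes)
```

    The resulting array can then be passed as s=sizes in place of the raw column.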
  • Create pie chart:

  • var=df.groupby(['Gender']).sum(numeric_only=True).stack() #sum the numeric columns per gender
    temp=var.unstack()
    x_list = temp['Sales']
    label_list = temp.index
    plt.axis("equal") #Older Matplotlib versions draw the pie as an oval by default; pyplot.axis("equal") makes it a circle
    #To show the percentage of each pie slice, pass an output format to the autopct parameter
    plt.pie(x_list,labels=label_list,autopct="%1.1f%%")
    plt.title("Pastafarianism expenses")
    plt.show()
  • Create heat map:

  • #Generate random numbers; you can use your own data values instead
    data = np.random.rand(4,2)
    rows = list('1234') #row categories
    columns = list('MF') #column categories
    fig,ax=plt.subplots()
    #Advanced color controls
    ax.pcolor(data,cmap=plt.cm.Reds,edgecolors='k')
    #Center the ticks on each cell
    ax.set_xticks(np.arange(0,2)+0.5)
    ax.set_yticks(np.arange(0,4)+0.5)
    #Here we position the tick labels for the x and y axes
    ax.xaxis.tick_bottom()
    ax.yaxis.tick_left()
    #Label text for each tick
    ax.set_xticklabels(columns,minor=False,fontsize=20)
    ax.set_yticklabels(rows,minor=False,fontsize=20)
    plt.show()
