Creating Professional-Quality Plots in Python: A Comprehensive Guide
Chapter 1: Introduction to Data Visualization
Data science revolves around effective communication. Those of us engaged in this field dedicate our time to mastering numerical methods, data manipulation, and regression analysis to derive insights from data sets. However, the true value of these skills lies in our ability to convey our findings effectively.
This underscores my belief that the essence of data science is communication. The key is to present your data analysis conclusions in a manner that persuades others to take notice. Since many individuals are reluctant to delve into raw data or complex statistical models, data visualization becomes a crucial communication tool in a data scientist's toolkit.
Fortunately, Python has specialized libraries for creating polished, report-ready plots, which are instrumental in sharing insights with supervisors and clients. One of my favorite libraries for this purpose is Bokeh. This guide shows how to use Bokeh to produce professional-quality plots.
Section 1.1: Getting Started with Python
If you are new to Python, you need a bit of setup before you can start plotting. There are several ways to install and run Python, so the choice is largely personal. I recommend the Anaconda distribution, which bundles the essential scientific packages along with the Spyder IDE.
Subsection 1.1.1: Installing Bokeh
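Bokeh is included in the Anaconda distribution. If you use a different setup, install it by running pip install bokeh (or conda install bokeh) from the command line. Once Bokeh is available, the first steps are to import the necessary Bokeh functions and obtain a data set. In this tutorial, we generate a simple data set with NumPy to serve as our plotting data.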
To import the required functions, use the following code:
from bokeh.plotting import figure, save, output_file
import numpy as np
The figure function creates the plot object, output_file sets where the plot will be written, and save writes it to that file. Importing NumPy under its conventional alias, np, gives us the array functions we need to generate the plot data.
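Before going further, it can help to confirm that both libraries import cleanly and to note which Bokeh version you have, because some plotting keyword arguments (the legend argument, for example) have changed between Bokeh releases. A minimal check, assuming Bokeh was installed through Anaconda or pip:

```python
# Confirm that Bokeh and NumPy import cleanly and report their versions.
import bokeh
import numpy as np

print("Bokeh version:", bokeh.__version__)
print("NumPy version:", np.__version__)
```

If either import fails, install the missing package before continuing.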
Section 1.2: Creating Your First Plot
We will now create our first plot using the data generated with NumPy. Start by creating the data arrays:
x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)
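np.arange(start, stop, step) excludes the stop value, so both calls above produce 10 values. Matching lengths matter because Bokeh expects every data column passed to a single glyph call to have the same length. A quick sanity check:

```python
import numpy as np

x = np.arange(0, 10, 1)    # 0, 1, ..., 9 (the stop value 10 is excluded)
y = np.arange(0, 5, 0.5)   # 0.0, 0.5, ..., 4.5 (the stop value 5 is excluded)

# Both arrays must hold the same number of values to be plotted together.
assert len(x) == len(y), "x and y must be the same length"
print(len(x), len(y))
```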
To create a plot object, call the figure function and assign the result to a variable. All of figure's arguments are optional, but a few of them make the difference between a default plot and a report-ready one:
- width: the plot width in pixels; 800 works well for a standard report figure.
- height: the plot height in pixels; 400 is a typical choice.
- x_axis_label: a descriptive label for the x-axis, including units.
- y_axis_label: a descriptive label for the y-axis, including units.
The code below generates the plot object:
p1 = figure(
    width=800,
    height=400,
    x_axis_label='Time Since Experiment Start (Minutes)',
    y_axis_label='Distance Driven by Test Car (Kilometers)',
)
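Next, add the x and y data to the plot as circle markers. Current Bokeh versions draw markers with the scatter method and name the series with legend_label (the older legend argument was removed in Bokeh 3.0):
p1.scatter(x, y, marker='circle', legend_label='Honda', color='red')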
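To save the plot, specify the output file's location and name, then call save. The r prefix makes the path a raw string, so Python does not treat the backslashes in the Windows path as escape sequences:
output_file(r'C:\Users\JSmith\Desktop\FirstPlot.html', title='First Plot')
save(p1)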
Opening the resulting HTML file will display your plot.
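For reference, here is the Chapter 1 workflow as one self-contained script. It is a sketch under two assumptions: it uses scatter with legend_label, the form recent Bokeh releases expect, and it writes the HTML file to the current working directory instead of a hard-coded Windows path so it runs on any operating system.

```python
import os

import numpy as np
from bokeh.plotting import figure, output_file, save

# Sample data: 10 evenly spaced points per array.
x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)

# Report-sized figure with descriptive axis labels.
p1 = figure(
    width=800,
    height=400,
    x_axis_label="Time Since Experiment Start (Minutes)",
    y_axis_label="Distance Driven by Test Car (Kilometers)",
)

# Circle markers for the first series; legend_label creates the legend entry.
p1.scatter(x, y, marker="circle", legend_label="Honda", color="red")

# Portable output location (assumption: the current working directory).
out_path = os.path.join(os.getcwd(), "FirstPlot.html")
output_file(out_path, title="First Plot")
save(p1)
print("Saved plot to", out_path)
```

Open the printed path in a web browser to view the plot.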
Chapter 2: Enhancing Your Plot
The first plot has two common readability problems:
- The legend obscures some of the data points.
- The axis labels and tick labels are too small to read comfortably.
To move the legend and enlarge the fonts, add the following lines before calling save:
p1.legend.location = "bottom_right"               # move the legend away from the data
p1.xaxis.axis_label_text_font_size = "16pt"       # axis titles
p1.yaxis.axis_label_text_font_size = "16pt"
p1.xaxis.major_label_text_font_size = "14pt"      # tick labels
p1.yaxis.major_label_text_font_size = "14pt"
p1.legend.label_text_font_size = "16pt"           # legend entries
These adjustments will lead to a more legible and visually appealing plot.
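If you produce many figures, it can be convenient to wrap these settings in a function so every plot in a report gets the same styling. A minimal sketch; apply_report_style is a hypothetical helper name, and its defaults simply mirror the values used above:

```python
from bokeh.plotting import figure


def apply_report_style(p, label_size="16pt", tick_size="14pt",
                       legend_size="16pt", legend_location="bottom_right"):
    """Apply report-friendly font sizes and legend placement to a figure.

    Hypothetical helper bundling the attribute assignments above. Call it
    after adding glyphs with legend_label, otherwise no legend exists yet.
    """
    for axis in (p.xaxis, p.yaxis):
        axis.axis_label_text_font_size = label_size   # axis title text
        axis.major_label_text_font_size = tick_size   # tick label text
    p.legend.location = legend_location
    p.legend.label_text_font_size = legend_size
    return p


# Usage: build a figure, add a labeled series, then style it in one call.
p = figure(width=800, height=400, x_axis_label="x", y_axis_label="y")
p.scatter([0, 1, 2], [0, 1, 4], legend_label="Series A", color="red")
apply_report_style(p)
```

Keeping the sizes as keyword arguments lets you override them for a single figure without changing the house style.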
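Adding more data series enriches the story your plot tells. For example, generate a second set of y-values and plot them with diamond markers, using scatter with marker='diamond' (the form recent Bokeh releases recommend over the older diamond method):
y2 = np.arange(0, 20, 2)
p1.scatter(x, y2, marker='diamond', legend_label='Ferrari', color='blue')
p1.legend.location = "top_left"                   # top-left stays empty once the steeper series is added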
This results in a more informative plot that allows for effective comparisons between the data series, making it ideal for presentation.
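When a plot compares more than two series, a loop over a small table of labels, markers, and colors avoids repeating near-identical calls. A sketch assuming every series shares the same x-values; the series dictionary layout is illustrative, not something Bokeh requires:

```python
import numpy as np
from bokeh.plotting import figure

x = np.arange(0, 10, 1)

# Illustrative series table: label -> (y-values, marker name, color).
series = {
    "Honda": (np.arange(0, 5, 0.5), "circle", "red"),
    "Ferrari": (np.arange(0, 20, 2), "diamond", "blue"),
}

p1 = figure(
    width=800,
    height=400,
    x_axis_label="Time Since Experiment Start (Minutes)",
    y_axis_label="Distance Driven by Test Car (Kilometers)",
)

# One scatter call per series; each legend_label adds its own legend entry.
for label, (y_values, marker, color) in series.items():
    p1.scatter(x, y_values, marker=marker, legend_label=label, color=color)

p1.legend.location = "top_left"
```

Adding a third car then only requires one more entry in the dictionary.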
This is how you can create impactful, report-ready plots that enhance your data storytelling capabilities. If you enjoyed this tutorial and want to learn more about Python programming, consider reading my book, 1000x Faster: How to Automate Laboratory Data Analysis with Python.