This tutorial will teach you how to use Python to analyze millions of rows of Baltimore City parking ticket data. More specifically, we will be using a Python package called pandas.
It is not humanly possible to comprehend a dataset with millions of rows. That’s why we will use the data analysis tool pandas to help us gain some insights into our dataset. If you have never used pandas before, don’t sweat it. I’ll walk you through all the fundamentals that you need to start crunching your own datasets.
Baltimore City Parking Ticket Data
Below is a preview of the data that we will be working with in this tutorial. I created this dataset based on publicly available data through Baltimore’s Open Data website. The dataset used in this tutorial can be downloaded as a CSV file from the dropdown menu of the table below. Note that the CSV file is a few hundred megabytes in size.
I stumbled upon this dataset after I got my very own Baltimore City parking citation. I used to crunch large financial datasets on a daily basis as a software engineer at a hedge fund, so I figured that I’d do an analysis on parking tickets in Baltimore and see how much money the city was pulling in from these fines.
Getting Set Up with Python and Pandas
I’ll assume that you have a recent version of Python installed on your machine, but if you don’t, you can learn how to install virtualenv with Python 3.
The first thing you’ll want to do is install the pandas package. We’ll do that with the pip package manager. In a terminal, type the following.
pip install pandas
That’s it! You now have pandas installed.
If you haven’t already done so, download the Baltimore_City_Parking_Tickets.csv file from the Export tab. Place this file in your home directory.
Finally, let’s hop into a Python interpreter in your terminal from your home directory. Simply type python into your terminal window.
python
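Once the interpreter is open, a quick sanity check is to import pandas and print its version. This is only a small sketch to confirm the install worked; the version number you see will depend on your machine.

```python
import pandas as pd

# If this import succeeds, pandas is installed for this Python interpreter.
# The printed version string varies from machine to machine.
print(pd.__version__)
```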
Reading a CSV File with Pandas
The pandas Python package makes it super easy to work with delimited datasets such as CSV files. To read our CSV file, we will use the read_csv function.
import pandas as pd
df = pd.read_csv('Baltimore_City_Parking_Tickets.csv')
We now have a variable called df with a sequential integer index that holds over 1 million rows and 7 columns of parking tickets from Baltimore City. We called our variable df, which is short for dataframe. This is common practice when working with pandas. You can think of a dataframe as an Excel-like spreadsheet object.
You can see a preview of the data by typing the name of the df variable into your Python interpreter. Examine your dataframe by accessing some of these properties and methods.
df
df.columns
df.head()
df.tail()
This is a very simple way to quickly read a CSV file, but it is also a very dumb way to read one. Right now, pandas has had to guess the type of every column on its own, and our dates are stored as plain strings (the object dtype). In other words, pandas doesn’t know that our dates are actually dates, and we haven’t told it which columns hold numbers and which hold text. Additionally, we want to use the violation date column as our index. Ultimately, we want a sorted DatetimeIndex in order to do time series analysis. Let’s reread our CSV file into memory, but this time, be smarter about it.
index_col = 'ViolDate'
dtype = {'Description': str, 'ViolFine': float, 'Address': str, 'Citation': str, 'Tag': str, 'State': str}
df = pd.read_csv('Baltimore_City_Parking_Tickets.csv', index_col=index_col, dtype=dtype)
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %I:%M:%S %p')
df = df.sort_index()
Now we have a dataframe with a DatetimeIndex, over 1 million rows, and 6 typed columns. Perfect!
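If you want to try the typed read without loading the full file, here is a minimal sketch that runs the same steps on three made-up rows held in memory. The column names match the ones used above, but every value (citations, tags, addresses, fines) is invented for illustration, and the column order is an assumption.

```python
import io
import pandas as pd

# Three invented rows laid out like the real file (values are NOT real data).
sample_csv = io.StringIO(
    "Citation,Tag,State,ViolDate,ViolFine,Description,Address\n"
    "10000001,1AB2345,MD,10/06/2016 01:30:00 PM,32,All Other Parking Meter Violations,100 EXAMPLE ST\n"
    "10000002,2CD6789,MD,10/05/2016 09:15:00 AM,40,Fixed Speed Camera,200 SAMPLE AVE\n"
    "10000003,3EF0123,VA,01/02/2017 11:45:00 PM,75,Red Light Violation,300 DEMO RD\n"
)

dtype = {'Description': str, 'ViolFine': float, 'Address': str,
         'Citation': str, 'Tag': str, 'State': str}

# Same steps as above: typed columns, date index, then sort by date.
sample_df = pd.read_csv(sample_csv, index_col='ViolDate', dtype=dtype)
sample_df.index = pd.to_datetime(sample_df.index, format='%m/%d/%Y %I:%M:%S %p')
sample_df = sample_df.sort_index()

print(sample_df.dtypes)       # one dtype per column
print(type(sample_df.index))  # should now be a DatetimeIndex
```

Checking df.dtypes after any read is a cheap way to confirm that pandas interpreted each column the way you intended.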
Baltimore City Parking Ticket Statistics with Python and Pandas
We are ready to start analyzing this dataset! There’s a lot of insight we can gain from this data. Let’s walk through a few examples.
How Much Money Does Baltimore Make From Parking Tickets?
You’re probably wondering, as was I, how much money Baltimore is pulling in from parking tickets. We can easily find this answer by using the sum aggregate function.
df['ViolFine'].sum()
As you’ll see, Baltimore City made a staggering sum of roughly $68 million from parking tickets over the course of two years, which matches the totals of the revenue tables later in this tutorial.
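By default, sum returns a bare floating point number, which is hard to read at this scale. As a small sketch, Python’s format specification can add the dollar sign and thousands separators. The fines below are invented purely to demonstrate the formatting.

```python
import pandas as pd

# Invented fines: thirty $40 tickets and ten $75 tickets.
fines = pd.Series([40.0] * 30 + [75.0] * 10, name="ViolFine")

total = fines.sum()

# ',' inserts thousands separators and '.0f' drops the decimals.
formatted_total = f"${total:,.0f}"
print(formatted_total)  # prints $1,950
```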
How Much Money Does Baltimore Make From Speed Cameras?
Speed cameras seem to be around every corner in Baltimore. You’re probably in the minority if you live in Baltimore and have never received a speed camera ticket.
All joking aside, to find out how much Baltimore has made from speed camera tickets, we first need to only consider rows with a description of ‘Fixed Speed Camera’.
df['Description'] == 'Fixed Speed Camera'
As you’ll see, this returns a boolean series that is True for every row whose description is ‘Fixed Speed Camera’ and False for every other row. We need to use this boolean series to index into our dataframe and keep only the rows with a value of True.
df[df['Description'] == 'Fixed Speed Camera']
Perfect. You’ll see right away that Baltimore issued 498,840 speed camera tickets over the course of these two years. This is already a staggering insight.
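You can also get the ticket count directly from the boolean series instead of reading it off the printed dataframe. The sketch below uses a tiny invented dataframe; because True counts as 1, summing the mask gives the number of matching rows and taking its mean gives their share.

```python
import pandas as pd

# Tiny invented sample (not real ticket data).
sample_df = pd.DataFrame({
    "Description": ["Fixed Speed Camera", "Red Light Violation",
                    "Fixed Speed Camera", "Expired Tags"],
    "ViolFine": [40.0, 75.0, 40.0, 32.0],
})

is_speed_camera = sample_df["Description"] == "Fixed Speed Camera"

ticket_count = int(is_speed_camera.sum())  # number of True values
ticket_share = is_speed_camera.mean()      # fraction of rows that are True

print(ticket_count, ticket_share)  # prints 2 0.5
```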
Moving on though, let’s use sum like we did before to get the total dollar amount.
df[df['Description'] == 'Fixed Speed Camera']['ViolFine'].sum()
Wow! Baltimore City issued $19,953,600 in speed camera tickets between October 5, 2016 and October 5, 2018.
It’s worth noting at this point that you don’t have to chain the Python code on a single line. You can also create intermediate variables if that helps you understand these concepts better. The following code is equivalent to the one-liner above and yields the same result.
# Boolean series: True where the ticket came from a fixed speed camera
is_speed_camera = df['Description'] == 'Fixed Speed Camera'
# Series of fines for just those rows
speed_camera_fines = df[is_speed_camera]['ViolFine']
speed_camera_fines.sum()
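A common alternative to indexing twice is to select the rows and the column in one step with .loc. This is just a sketch on invented data, not a change to the approach above; both forms return the same total here.

```python
import pandas as pd

# Tiny invented sample (not real ticket data).
sample_df = pd.DataFrame({
    "Description": ["Fixed Speed Camera", "Red Light Violation",
                    "Fixed Speed Camera", "Expired Tags"],
    "ViolFine": [40.0, 75.0, 40.0, 32.0],
})

is_speed_camera = sample_df["Description"] == "Fixed Speed Camera"

# Two-step indexing, as in the tutorial.
chained_total = sample_df[is_speed_camera]["ViolFine"].sum()

# One-step selection: rows from the mask, column by name.
loc_total = sample_df.loc[is_speed_camera, "ViolFine"].sum()

print(chained_total, loc_total)  # prints 80.0 80.0
```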
How Much Money Does Baltimore Make From Each Type of Ticket?
Now that we know Baltimore City is pulling in nearly $20 million over the course of two years from speed camera tickets alone, how much are they making from other types of tickets? We can determine this with a single line of code.
df.groupby('Description')['ViolFine'].sum().sort_values(ascending=False)
This might seem complicated, but let’s take it one function at a time from left to right. First, we use groupby to put all rows with the same violation description into the same bucket. Then we sum all the violation fines in each bucket. Finally, we sort in descending order so that the largest totals appear first.
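To watch those three steps happen one at a time, here is a sketch on six invented tickets. The descriptions are borrowed from the dataset, but the rows and most of the fine amounts are made up.

```python
import pandas as pd

# Six invented tickets (not real data).
sample_df = pd.DataFrame({
    "Description": ["Fixed Speed Camera", "Expired Tags", "Fixed Speed Camera",
                    "Red Light Violation", "Expired Tags", "Fixed Speed Camera"],
    "ViolFine": [40.0, 32.0, 40.0, 75.0, 32.0, 40.0],
})

# Step 1: bucket the rows by description and pick the fine column.
fines_by_type = sample_df.groupby("Description")["ViolFine"]

# Step 2: add up the fines inside each bucket.
revenue = fines_by_type.sum()

# Step 3: largest totals first.
revenue = revenue.sort_values(ascending=False)

print(revenue)
```

On this sample the order comes out as Fixed Speed Camera (120), Red Light Violation (75), then Expired Tags (64).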
What we end up with is the following table, which shocked me, and will probably shock you too. Remember that these numbers are over the course of two years… but still.
Baltimore City Ticket Revenue by Violation
Violation | Revenue |
---|---|
Fixed Speed Camera | $19,953,600 |
Red Light Violation | $11,178,675 |
All Other Parking Meter Violations | $6,549,440 |
No Stop/Park Street Cleaning | $4,782,388 |
No Stopping/Standing Tow Away Zone | $3,664,088 |
Right on Red | $2,914,125 |
No Stop/Park Handicap | $2,671,342 |
Residential Parking Permit Only | $2,536,144 |
No Stopping/Standing Not Tow-Away Zone | $2,331,286 |
Expired Tags | $2,323,840 |
Abandoned Vehicle | $1,515,136 |
No Parking/Standing In Bus Stop/Bus Lane | $1,474,200 |
Obstruct/Impeding Movement of Pedestrian | $971,817 |
All Other Stopping or Parking Violations | $951,106 |
No Parking/Standing In Transit Stop | $829,192 |
No Stopping//Parking Stadium Event Camden | $683,196 |
Less Than 15 feet from Fire Hydrant | $613,230 |
Commercial Veh/Residence under 20,000 lbs | $610,848 |
Obstruct/Impeding Flow of Traffic | $398,433 |
Obstructing/Imped Traffic Xwalk/inter/school | $266,868 |
Exceeding 48 Hours | $201,760 |
Passenger Loading Zone | $199,424 |
Commercial Veh/Residence over 20,000 lbs | $163,150 |
Commercial Vehicle Obstruct/Imped Traffic Flow | $111,132 |
Fire Lane/Handicapped Violation | $86,421 |
No Parking/Standing In Bike Lanes | $49,385 |
Blocking Garage or Driveway | $20,898 |
No Parking/Stand Motor Home/Campr/Travel Trailer | $5,544 |
No Parking/Standing Vendor Truck | $3,514 |
No Stopping or No Parking Pimlico Event | $3,162 |
Unlawful Dumping/Waste Hauler w/o Permit | $2,510 |
In Taxicab Stand | $2,368 |
The second highest-grossing ticket in Baltimore is the red light violation ticket. I would guess that, like the speed camera tickets, these red light violations are issued automatically by the cameras that you see fixed to poles at intersections.
The type of ticket that I got this year was “Obstruct/Impeding Movement of Pedestrian” which brought in nearly a cool million bucks over two years.
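Keep in mind that the table ranks tickets by revenue, not by how often they are issued. If you want both views, one option is to aggregate with count and sum together. The sketch below uses invented rows, and on this sample the two rankings differ.

```python
import pandas as pd

# Six invented tickets (not real data).
sample_df = pd.DataFrame({
    "Description": ["Fixed Speed Camera", "Expired Tags", "Fixed Speed Camera",
                    "Red Light Violation", "Expired Tags", "Fixed Speed Camera"],
    "ViolFine": [40.0, 32.0, 40.0, 75.0, 32.0, 40.0],
})

# 'count' is the number of tickets, 'sum' is the revenue per description.
summary = (
    sample_df.groupby("Description")["ViolFine"]
    .agg(["count", "sum"])
    .sort_values("count", ascending=False)
)

print(summary)
```

Here Expired Tags is second by count but last by revenue, which is why the two questions deserve separate answers.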
What is Baltimore’s Monthly Ticket Revenue?
Now that we know how much Baltimore City is making from each type of ticket, let’s find out how much Baltimore City is making on tickets every month.
We can resample the data on a monthly basis (i.e. ‘M’) and aggregate this result with a sum like before. You can also resample on a daily basis with ‘D’ or on a yearly basis with ‘Y’. Note that pandas 2.2 and later prefer the aliases ‘ME’ and ‘YE’ and will warn about ‘M’ and ‘Y’.
df['ViolFine'].resample('M').sum()
This yields a series with 25 rows, one for each month between October 5, 2016 and October 5, 2018. Notice that the first and last rows are relatively smaller because these October months are only partial months.
Baltimore City Ticket Revenue by Month
Month | Revenue |
---|---|
October 2016 | $1,324,135 |
November 2016 | $1,435,148 |
December 2016 | $1,397,615 |
January 2017 | $1,416,006 |
February 2017 | $1,368,683 |
March 2017 | $1,532,398 |
April 2017 | $1,396,936 |
May 2017 | $1,560,212 |
June 2017 | $1,487,301 |
July 2017 | $1,513,177 |
August 2017 | $2,826,269 |
September 2017 | $2,947,868 |
October 2017 | $3,815,125 |
November 2017 | $3,800,495 |
December 2017 | $3,160,081 |
January 2018 | $3,488,668 |
February 2018 | $3,147,869 |
March 2018 | $3,487,466 |
April 2018 | $4,159,604 |
May 2018 | $4,532,835 |
June 2018 | $4,495,594 |
July 2018 | $5,002,747 |
August 2018 | $4,992,486 |
September 2018 | $3,583,117 |
October 2018 | $200,757 |
You’ll quickly see that Baltimore is making well over $1 million each and every month from tickets and in some cases over $5 million in a single month.
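Here is a small sketch of monthly resampling on four invented tickets, so you can see what it does to a date-indexed series. The helper resample_monthly is a hypothetical name of my own; it tries the ‘ME’ alias first and falls back to ‘M’ on pandas releases that do not recognize it.

```python
import pandas as pd

# Four invented tickets; the dates and fines are made up.
fines = pd.Series(
    [40.0, 75.0, 32.0, 40.0],
    index=pd.to_datetime([
        "2016-10-05 09:15", "2016-10-20 14:00",
        "2016-11-03 08:30", "2017-01-15 17:45",
    ]),
    name="ViolFine",
)

def resample_monthly(series):
    """Hypothetical helper: monthly resampler for newer and older pandas."""
    try:
        return series.resample("ME")  # month-end alias, pandas 2.2 and later
    except ValueError:
        return series.resample("M")   # alias used by older releases

monthly = resample_monthly(fines).sum()
print(monthly)
```

The result has one row per calendar month, labeled by the month’s last day. December 2016 has no tickets in this sample, so it shows up with a sum of 0 rather than being skipped.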
Which Month Did Baltimore City Issue the Most Tickets?
Now that we know how much money Baltimore is making from tickets each month, we can easily find out how much revenue the best month brought in by using max.
df['ViolFine'].resample('M').sum().max()
Baltimore issued a record $5,002,747 in tickets in a single month, which the table above shows was July 2018. Mind blowing!
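Since max only returns the dollar amount, you can use idxmax to have pandas report the month itself. The sketch below uses four invented tickets and the same hypothetical resample_monthly helper idea (try ‘ME’, fall back to ‘M’). It also counts tickets per month, which answers the "most tickets" question separately from the "most revenue" one.

```python
import pandas as pd

# Four invented tickets; the dates and fines are made up.
fines = pd.Series(
    [40.0, 75.0, 32.0, 40.0],
    index=pd.to_datetime([
        "2016-10-05 09:15", "2016-10-20 14:00",
        "2016-11-03 08:30", "2017-01-15 17:45",
    ]),
    name="ViolFine",
)

def resample_monthly(series):
    """Hypothetical helper: monthly resampler for newer and older pandas."""
    try:
        return series.resample("ME")  # month-end alias, pandas 2.2 and later
    except ValueError:
        return series.resample("M")   # alias used by older releases

monthly_revenue = resample_monthly(fines).sum()
best_month = monthly_revenue.idxmax()   # index label of the largest sum
best_amount = monthly_revenue.max()

monthly_tickets = resample_monthly(fines).count()
busiest_month = monthly_tickets.idxmax()  # month with the most tickets

print(best_month.strftime("%B %Y"), best_amount)
print(busiest_month.strftime("%B %Y"), monthly_tickets.max())
```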
Final Thoughts
I’m sure by now you see the power of the Python package pandas. We were able to effortlessly comb through millions of rows of Baltimore City parking ticket data and gain many insights that would be very hard to reach by examining the raw data by hand.
For more tutorials like this, check out some of my other Python blog posts.
As a resident of Baltimore for 10 years, I was surprised by all of the insights that we derived from the data. If nothing else, I hope that this tutorial helped you better understand how to use pandas to do basic data analysis on large datasets.
Let me know in the comments below what your thoughts are about the insights that we gained from the data or if you were able to find out anything else interesting about this Baltimore City parking ticket dataset.
Keep on crunching that data!