Working with JSON Data in Modeling Crime: Topics of Data Analytics

Introduction to JSON Data

The Topics of Data Analytics series aims to deliver the tools that provide the greatest utility. The series began by explaining how to work with CSV data: what CSV data consists of and the structure that defines it, followed by the methods for importing it, extracting particular subsets, storing it, and plotting it. In this article, we pursue the same end for JSON data and walk through the same topics: importing, extracting, storing, and modeling. Let us proceed.

What is JSON Data?

JSON, short for JavaScript Object Notation, is a text-based data format which, like CSV, imposes a particular structure on the data stored therein. It is frequently used because it is easy to read, transfer, and parse, and because it is supported across a wide range of platforms and programming languages.

We know from previous discussions that the unique attribute of CSV data is the fact that it is organized by the differential placement of commas (or alternative delimiters) which segregate various data objects. JSON differs from CSV in that its structure relies upon organizing data in the form of a highly codified dictionary.

In this article, we will be working with the Maryland Violent crime statistics from ‘data.gov’, a file which can be found here. Data.gov has this file available in the form of CSV, XML, JSON, and HTML. Here, we will specifically utilize the JSON version of this data. Take a look at its structure below:

Conceptualizing JSON Structure

JSON relies on two fundamental structural components: (1) a collection of key-value pairs, which constitutes the dictionary feature, and (2) an ordered list of values, organized as an array. In JSON, an object is a set of key-value pairs surrounded by curly braces ‘{}’, while an array is a series of values collected in square brackets ‘[]’. Within an array, values are separated by commas and may be numbers or strings. The Boolean values ‘true’ and ‘false’ (which Python reads as ‘True’ and ‘False’), the empty value ‘null’, and nested objects or arrays may also be used.
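To make these two building blocks concrete, here is a minimal sketch using Python's built-in ‘json’ module. The record is invented purely for illustration (its field names and numbers are not taken from the Maryland file); it simply shows an object holding a string, a number, a Boolean, a null, and an array.

```python
import json

# A made-up JSON document (not from the Maryland data set): one object
# whose values are a string, a number, a Boolean, a null, and an array.
sample_text = """
{
    "jurisdiction": "Example County",
    "year": 2000,
    "reviewed": true,
    "notes": null,
    "monthly_counts": [3, 5, 2]
}
"""

record = json.loads(sample_text)

# JSON objects become Python dicts and JSON arrays become Python lists.
print(type(record).__name__)                    # dict
print(type(record["monthly_counts"]).__name__)  # list

# JSON's lowercase true/false/null map to Python's True/False/None.
print(record["reviewed"], record["notes"])      # True None
```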

Importing JSON Data

When you have a JSON file you want to work with, the first order of business is either saving the file under a particular name in your working directory or noting the full path of the file on your computer. Once this is settled, we import the ‘json’ library into the program. This library, which ships with Python, provides a variety of functions for manipulating JSON data and files.

With the library imported, the next step is to open the JSON file using the built-in Python function ‘open()’. We explicitly specify read mode and call the file's ‘read()’ method to pull in the contents of the entire JSON file as a single string. Subsequently, we store this string in the crime_data variable.

After this, we call the ‘json.loads’ function on the crime_data variable, which parses the string into Python objects, and store the result in the data variable. We then iterate through the data with a for-loop and print out the content. The code for executing this functionality appears as follows:
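A minimal sketch of these steps follows. The file name ‘crime_data.json’ is a placeholder, and so that the snippet runs on its own it first writes a tiny stand-in file to a temporary directory. The stand-in's layout (a top-level ‘meta’ key alongside a ‘data’ key holding one array per record) is an assumption about how the download is organized, and its values are invented.

```python
import json
import os
import tempfile

# Stand-in for the downloaded file so this sketch is runnable by itself.
# The layout and the values are assumptions, not real records.
sample = {
    "meta": {"view": {"name": "Example crime data"}},
    "data": [
        ["Example County A", "2000", "1000", "2"],
        ["Example County B", "2000", "2500", "4"],
    ],
}
file_path = os.path.join(tempfile.mkdtemp(), "crime_data.json")
with open(file_path, "w") as handle:
    json.dump(sample, handle)

# Open the file in read mode and pull its entire contents into a string.
with open(file_path, "r") as json_file:
    crime_data = json_file.read()

# json.loads parses a JSON string into Python objects (here, a dict).
data = json.loads(crime_data)

# Iterating over a dict yields its top-level keys; print each with its content.
for key in data:
    print(key, data[key])
```

When the raw text is not needed separately, ‘json.load(json_file)’ reads and parses the open file in a single step.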

When we execute the code above, the output result looks like this:

Extracting JSON Content

Looking at the image above, the data appears to simply be a complex list of numbers and we have no idea what they reference. It’s important that we dig into the JSON data to get an idea of what it is we’re working with. One portion of the file appears as follows:

In line 134, we see that this portion of the data refers to the jurisdiction, or county, from which the information derives. The data has 37 other fields at this level of detail. These include the year, population, number of murders, rapes, robberies, assaults, breaking and entering, thefts, vehicle thefts, total crimes, percent change in crime over time, total violent crime, percentage of violent crime, change in percentage of violent crime, total property crime, its percentage and percent change, crime rate per 100,000, and many more.

Now that we know what the data means and how it is organized, we can begin to extract the actual items we desire. Because we can calculate the per-category rates per 100,000 on our own using the crime numbers and population, we will leave those columns alone and focus on the core counts and summary figures.
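The per-100,000 calculation is simple arithmetic: multiply the number of offences by 100,000 and divide by the population. A minimal sketch, using invented numbers rather than values from the Maryland file and a hypothetical helper name:

```python
def rate_per_100k(count, population):
    """Return the number of offences per 100,000 residents."""
    return count * 100_000 / population

# Invented figures for illustration: 2 offences in a population of 1,000.
murder_rate = rate_per_100k(2, 1000)
print(murder_rate)  # 200.0
```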

For our purposes, we will extract the data related to: (1) Jurisdiction, (2) Year, (3) Population, (4) Murder, (5) Rape, (6) Robbery, (7) Assault, (8) Break-ins, (9) Petty Theft, (10) Vehicular Theft, (11) Total Crime, (12) Percentage Change in Total Crime, (13) Total Violent Crime, (14) Percent of Violent Crime, (15) Percentage Change in Violent Crime, (16) Total Property Crime, (17) Percent of Property Crime, (18) Percentage Change in Property Crime, (19) Overall Crime Rate Per 100,000.

Parsing JSON

Because the JSON file is organized as a dictionary, in order to iterate through it, our for-loop must take into account both the keys and the values. We know that the data we desire is held within the key ‘data’ so we can specify this directly with an ‘if’ statement. Then we can iterate through each item in this region and parse out the data we desire. For every array within the key, we extract the data and append it to a list to create a nested list, and also specify the jurisdiction in its own list. This is because the jurisdiction will serve as the index later on. The code to execute this appears as follows:
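A minimal sketch of this loop, not the original listing. The dictionary below is an invented stand-in for the parsed file, and the column positions (0 for jurisdiction, 1 for year, 2 for population, 3 for murder) are hypothetical; the real file's positions would need to be read off the file itself.

```python
# Stand-in for the parsed file. Column positions are hypothetical:
# 0 = jurisdiction, 1 = year, 2 = population, 3 = murder.
data = {
    "meta": {"view": {"name": "Example crime data"}},
    "data": [
        ["Example County A", "2000", "1000", "2"],
        ["Example County B", "2000", "2500", "4"],
    ],
}

jurisdictions = []  # becomes the data frame index later on
values = []         # nested list: one inner list per record

# Walk the keys and values together, keeping only the 'data' entry.
for key, content in data.items():
    if key == "data":
        for row in content:
            jurisdictions.append(row[0])
            record = []
            record.append(row[1])  # year
            record.append(row[2])  # population
            record.append(row[3])  # murder
            values.append(record)

print(jurisdictions)
print(values)
```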

Now, we could have very easily done this with a slice, but I wanted the code to be explicitly demonstrated so that it could be followed with ease by the reader.
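For comparison, a slice-based version might look like the sketch below. It assumes a hypothetical layout in which column 0 is the jurisdiction and columns 1 to 3 are the fields of interest; the rows are invented.

```python
# Invented rows; column 0 = jurisdiction, columns 1-3 = fields to keep.
rows = [
    ["Example County A", "2000", "1000", "2"],
    ["Example County B", "2000", "2500", "4"],
]

# row[1:4] takes the items at positions 1, 2, and 3 in one step.
jurisdictions = [row[0] for row in rows]
values = [row[1:4] for row in rows]

print(jurisdictions)
print(values)
```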

Creating Data Structure

It is quite important that once the data has been extracted, we have a succinct and organized means of storing the data. As we observed in our discussion of CSV files, the quickest means of organizing data is with Pandas DataFrames.

Before we get ahead of ourselves and start throwing numbers into a data structure, we have to ask ourselves what we intend to do with this information. If our goal is to plot the data which has been received, then our first order of business is to make sure that the data values are in the form of integers or floats. To do this, we must iterate through each nested list and each item in each nested list. The code for doing so appears as follows:
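A minimal sketch of the conversion, assuming every value arrives as a numeric string. The sample values are invented and ‘to_number’ is a hypothetical helper name.

```python
# Invented sample: every item is still a string after extraction.
values = [
    ["2000", "1000", "2", "-12.5"],
    ["2000", "2500", "4", "3.0"],
]

def to_number(item):
    """Convert a numeric string to int when possible, otherwise to float."""
    try:
        return int(item)
    except ValueError:
        return float(item)

# Visit each nested list and each item inside it, replacing it in place.
for row in values:
    for position, item in enumerate(row):
        row[position] = to_number(item)

print(values)
```

Missing entries (a JSON ‘null’ arrives in Python as ‘None’) are not handled here and would need their own rule, such as skipping the row or substituting a default.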

Now, once the data has been converted to a usable format, we may create the Pandas data frame. When we do this, we specify the data items as the nested lists in the ‘values’ nested list. Furthermore, we specify the jurisdictions as the index for the data frame. After we have done this, we give a title to each of the columns. The code for creating this data frame looks as follows:
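A minimal sketch of building the data frame, assuming Pandas is installed. The jurisdictions, values, and the three column titles are invented placeholders; the full data set would carry one title per extracted field.

```python
import pandas as pd

# Invented placeholders for the lists built in the earlier steps.
jurisdictions = ["Example County A", "Example County B"]
values = [
    [2000, 1000, 2],
    [2000, 2500, 4],
]

# Each inner list becomes a row; the jurisdictions label the rows.
crime_frame = pd.DataFrame(values, index=jurisdictions)

# Give each column a title, in the same order as the inner lists.
crime_frame.columns = ["Year", "Population", "Murder"]

print(crime_frame)
```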

When we execute this code, the structure of the data frame looks like:

The Take Away

Matplotlib makes it quite simple to plot the data from the Pandas data frame. Beyond plotting, the present article has provided an extensive discussion of the methods for working with JSON data. From the structure of JSON to importing, extracting, and storing this data, we have covered the techniques necessary for efficient data modeling. Hopefully this tutorial has proven helpful. Our next article will focus on the intricacies of working with XML data, followed by a thorough discussion of working with databases. We hope to see you there.
