Open data and citizen data science

Data analysis and visualizations are the most useful end products of our work as BI professionals, and even as data scientists: they give the end user actionable insights.

With all the open data initiatives and the people working on them, there are now many examples of government open data being used to improve communities.
But I’ve yet to see many Freedom of Information Act (FOIA) datasets being used or visualized.

This post uses an enriched dataset about deaths in California police custody from 2013 to 2015, acquired through a FOIA request.

Interested in getting the enriched dataset and analysing it yourself?
Read on!

The dataset

It started with Dan Nguyen (blog | twitter) retweeting a post about this dataset yesterday.


I was late to notice it, but got intrigued when I did.
The link goes to an article on MuckRock showing what appears to be a breakdown of deaths by race and gender in the 2014 police custody dataset.

At the end of the article, there’s a link to a page that breaks down the FOIA request and all the data received so far. It’s an interesting read and sheds some light on the data quality issues in the dataset:

  • There are 3 files, one for each year.
  • In total there are 1500 deaths recorded
    • 2013: 729
    • 2014: 679
    • 2015: 92
  • The 2015 dataset is currently incomplete: it only goes up to February
  • Fields change between the datasets
  • The explanation for 1 custody offense code is missing (code “752”)
  • For 20 deaths, the agency name corresponding to the agency number is missing
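Checks like the ones above are easy to repeat yourself once the three yearly files are loaded. The sketch below is a minimal example using only the Python standard library. It assumes the files have already been mapped to common, hypothetical column names (`year`, `agency_name`, `custody_offense_code`), because the real headers differ between years. The rows in the usage example are made up; only the unexplained code “752” comes from the FOIA notes.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical column names: the real yearly files use their own (and
# differing) headers, so rename them to these before running the checks.
YEAR_COL = "year"
AGENCY_NAME_COL = "agency_name"
OFFENSE_COL = "custody_offense_code"


def load_rows(paths: Iterable[str]) -> List[Dict[str, str]]:
    """Read several CSV files (e.g. one per year) into one list of row dicts."""
    rows: List[Dict[str, str]] = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


def quality_report(rows: List[Dict[str, str]], known_codes: Set[str]) -> Dict[str, object]:
    """Count deaths per year and flag missing agency names and unexplained codes."""
    per_year = Counter(row[YEAR_COL] for row in rows)
    # DictReader yields None for missing trailing fields, so guard with `or ""`.
    missing_agency = [r for r in rows if not (r.get(AGENCY_NAME_COL) or "").strip()]
    unexplained = {r[OFFENSE_COL] for r in rows if r[OFFENSE_COL] not in known_codes}
    return {
        "total": len(rows),
        "per_year": dict(per_year),
        "missing_agency_name": len(missing_agency),
        "unexplained_codes": sorted(unexplained),
    }


# Usage with made-up rows; "100" is a placeholder for a documented code.
sample = [
    {"year": "2013", "agency_name": "Agency A", "custody_offense_code": "100"},
    {"year": "2014", "agency_name": "", "custody_offense_code": "752"},
    {"year": "2015", "agency_name": "Agency B", "custody_offense_code": "100"},
]
report = quality_report(sample, known_codes={"100"})
print(report)
```

On the real data you would call `load_rows` with the three yearly file paths and expect the totals listed above (1500 deaths, 20 rows without an agency name).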

Apart from documenting the data quality issues, I enriched the data with extra fields to make analysis easier. These include, for example, a calculated age and English translations of the codes, …
Get the full enriched dataset here: CaliforniaDeathInCustodyData_2013-2015.
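If you want to reproduce this kind of enrichment on the raw files, a minimal sketch could look like the one below. It is not necessarily how the enriched dataset was built. It assumes hypothetical columns `date_of_birth`, `date_of_death` (ISO dates) and `gender_code`, and the `CODE_LABELS` lookup table is a placeholder for the real code lists that come with the FOIA data.

```python
from datetime import date
from typing import Dict


def age_at(birth: date, event: date) -> int:
    """Whole years between a birth date and an event date (e.g. date of death)."""
    years = event.year - birth.year
    # Subtract one if the birthday hasn't been reached yet in the event year.
    if (event.month, event.day) < (birth.month, birth.day):
        years -= 1
    return years


# Placeholder translation table; replace with the official code list.
CODE_LABELS: Dict[str, str] = {"M": "Male", "F": "Female"}


def enrich(row: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the row with a calculated age and a translated code."""
    out: Dict[str, object] = dict(row)
    birth = date.fromisoformat(row["date_of_birth"])
    death = date.fromisoformat(row["date_of_death"])
    out["age"] = age_at(birth, death)
    code = row["gender_code"]
    # Keep unknown codes visible instead of silently dropping them.
    out["gender"] = CODE_LABELS.get(code, f"unknown code {code}")
    return out


example = enrich({"date_of_birth": "1980-06-15", "date_of_death": "2014-06-14", "gender_code": "F"})
print(example["age"], example["gender"])
```

Computing the age from month and day, rather than dividing the number of days by 365, avoids off-by-one errors around birthdays and leap years.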

What now?

Some ideas for working with this dataset:

  • Add locations and plot them on a map (check the free
  • Bin fields. For example, group ages into bands (kid, young adult, adult, …) and look for patterns.
  • Combine several fields into new fields (features) and look for patterns.
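As a starting point for the binning and feature ideas, here is a small standard-library sketch. The age cut-offs (18, 25, 65), the band names beyond those in the list above, the `race` column name and the sample rows are all hypothetical choices for illustration; pick bands that suit the question you are asking.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

# Hypothetical age bands: <18 kid, 18-24 young adult, 25-64 adult, 65+ senior.
AGE_EDGES = [18, 25, 65]
AGE_LABELS = ["kid", "young adult", "adult", "senior"]


def age_group(age: int) -> str:
    """Map an age to its band; bisect_right finds how many edges are <= age."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def count_combined(rows: List[Dict[str, object]], fields: List[str]) -> Counter:
    """Count how often each combination of (already binned) field values occurs."""
    return Counter(tuple(str(r[f]) for f in fields) for r in rows)


# Made-up rows; "race" is a hypothetical column name with placeholder values.
rows: List[Dict[str, object]] = [
    {"age": 17, "race": "A"},
    {"age": 30, "race": "A"},
    {"age": 33, "race": "A"},
    {"age": 70, "race": "B"},
]
for r in rows:
    r["age_group"] = age_group(int(r["age"]))  # new binned feature

counts = count_combined(rows, ["age_group", "race"])
for combo, n in counts.most_common():
    print(combo, n)
```

Counting tuples of field values keeps the combined feature readable, and `Counter.most_common()` lists the most frequent combinations first.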

And when you’re done, blog about it, tweet it or send me a link to the result!
I’m interested in what you’ll do with it.

Need more datasets?

Read more about this and other FOIA datasets on Dan Nguyen’s blog.

Share your thoughts and what you did with the data.
Just comment here or let me know via Twitter, or both 🙂
