Free datasets

We all have those default vendor datasets.
But from time to time, we all just want a newer, fresher dataset, small or large, to play with.
Here you can find a list of real and useful open data to play with.

 

I’ll keep this list updated when I see new, incredible or just fun datasets.

Got a dataset that you think needs to be listed as well?
Post it in the comments!

2015/08/16 – Update: Added a realtime dataset section and added UK and EU opengov historical data.
2015/08/02 – First published: Added several useful datasets from my favorites


Real-time Data

City of Philadelphia Bike Share Stations
API info
Data relating to the Indego BikeShare program, including station locations and the number of available bikes.
More information about the program is available at: https://www.rideindego.com/

Live traffic information from the UK Highways Agency
Several APIs, depending on the information you want
Live data showing traffic conditions on the strategic road network in England, maintained by the Highways Agency.


Historical Data

New York City OpenData
1300+ recent datasets, formatted for ease of use
NYC OpenData makes the wealth of public data generated by various New York City agencies and other City organizations available for public use.

Reddit 1.7 Billion Public Comments
Original reddit post
Torrent of complete archive | Torrent of 54M comment subset
This dataset in Google BigQuery (direct link)
A dataset that is 250GB when compressed, over a terabyte uncompressed. Talk about a lot of data for all your text analysis needs…

Airline on-time performance data
zip files per year
The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, and it takes up 1.6 gigabytes of space compressed and 12 gigabytes uncompressed.
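At 12 gigabytes uncompressed, the yearly files are easier to stream row by row than to load in one go. Below is a minimal sketch of that idea using only the Python standard library. The column names `UniqueCarrier` and `ArrDelay`, the `NA` marker for missing values, and the tiny sample are assumptions for illustration, so check them against the header of the file you actually download.

```python
import csv
import io
from collections import defaultdict


def mean_arrival_delay(lines, carrier_col="UniqueCarrier", delay_col="ArrDelay"):
    """Stream CSV rows and return {carrier: mean arrival delay}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        try:
            delay = float(row.get(delay_col, ""))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric delay, e.g. "NA"
        carrier = row[carrier_col]
        totals[carrier] += delay
        counts[carrier] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in totals}


# Made-up sample standing in for one yearly file (column names are assumed).
sample = io.StringIO(
    "Year,UniqueCarrier,ArrDelay\n"
    "2007,AA,10\n"
    "2007,AA,20\n"
    "2007,WN,NA\n"
    "2007,WN,-4\n"
)
result = mean_arrival_delay(sample)
print(result)
```

For a real file, pass an open file handle instead of the sample, for example `open("2007.csv", newline="")`, where the file name is hypothetical. Only the running totals stay in memory, so the same function works for any file size.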

MuckRock
4000+ FOIA datasets in all forms (email, excel, …)
MuckRock is a collaborative news site that brings together journalists, researchers, activists, and regular citizens to request, analyze & share government documents.
Data is requested through Freedom of Information Acts.

New York State
2500+ charts, maps, calendars and of course regular datasets
Not only New York City, but also the state has opened up its data.
Anything ranging from Citi Bike System Data to Subway Entrances to even Bicycle Routes.

World Country information
Site with API links
Get information about countries via a RESTful API

UK Open Government Project
23000+ datasets in different forms (CSV, XML, API, …)
Everything you ever wanted to know about the UK and more 🙂

European Union Open Data Portal
8600+ datasets available
Ranging from employment to industry to education data

Belgian Open Data Initiative
Approaching 2000 pure Belgian datasets

Opendata and citizen datascience

Data analysis and visualizations are the most useful end products for BI professionals and even data scientists. They give actionable insights to the end user.

With all the data initiatives and people working with it, there are now a lot of examples of government open data being used to better the community.
But I’ve yet to see many Freedom of Information Act (FOIA) datasets being used or visualized.

This post uses an enriched dataset about the deaths in California Police Custody during the 2013 – 2015 period acquired using the FOIA.

Interested in getting the enriched dataset and analysing it yourself?
Read on!


PASS Performance – SQLPalooza 2015!

Today I noticed I missed some great sessions from the PASS Performance virtual chapter. I’ll list them all below for you.

Do check performance.sqlpass.org for more great sessions!
And don’t forget to follow @SQLPASS_PVC on twitter

#1: Make Your SQL Server Queries Go Faster
#2: Building High Performance SQL Servers Virtual Machines on AWS and Azure
#3: Performing a SQL Server Health Check
#4: Columnstore Indexes – Questions and Answers
#5: Performance Troubleshooting Using DMVs
#6: Maximizing SSIS Package Performance
#7: Troubleshooting Secondary Replica Latency
#8: SQL Server Benchmarking: The Powershell Speedometer


Huge free dataset with all public Reddit comments

Do you know Reddit?
Did you know all 1.7 billion+ public comments are now available as an open data set?
Are you ready to unleash all your Text Analytics fantasies? 🙂
Check out this post on Reddit!

A friendly user also uploaded this to Google BigQuery.

If you want example queries or visualisations, check out this thread on reddit.
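If you'd rather poke at the raw archive than use BigQuery, here is a minimal sketch for a first pass. It assumes the dump is newline-delimited JSON with one comment object per line and a `subreddit` field. The post above doesn't describe the format, so treat that layout, the field name, and the made-up sample as assumptions.

```python
import io
import json
from collections import Counter


def top_subreddits(lines, n=3):
    """Count comments per subreddit from newline-delimited JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        comment = json.loads(line)
        counts[comment.get("subreddit", "unknown")] += 1
    return counts.most_common(n)


# Made-up sample; the field names are assumed, not taken from the archive.
sample = io.StringIO(
    '{"subreddit": "datasets", "body": "Nice dump!"}\n'
    '{"subreddit": "sql", "body": "Loading it now."}\n'
    '{"subreddit": "datasets", "body": "Thanks for sharing."}\n'
)
top = top_subreddits(sample)
print(top)
```

Because the function reads one line at a time, it can be pointed at a decompressed file handle without loading the whole archive into memory.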

Windows as a Service and Windows as a panic attack

Over the last week I’ve noticed several articles on Windows 10 passing by. I only linked 2, but some of these articles have very suggestive titles, and others are just full of suggestive language. Both kinds suggest that the free update is a trap and that we’ll all have to pay through the nose for Windows 10 after a free period.
It's a trap!

I understand that these major sites run on hits generated by all the fuss they can create, but this is just starting to seem crazy.

Am I the only one thinking that Windows will just follow the same great path that Office has paved for it?

To me that would be the only logical path, as the Office 365 model is the way forward, while people who don’t need all the fancy stuff can still buy each Office version separately. The same will hold true for Windows 10 or Windows 365.

It also will make licensing and managing the computer park a lot easier for a lot of businesses. And the home users will always be safe with the latest and greatest Windows version.

I tend to get bored every now and then

And when I get bored, the internet reaches out to me…
MVA has become a hobby it seems, even more so than browsing LinkedIn Pulse, Twitter, 9GAG, … 🙂


 

The Know It Prove It Challenge was fun to do in February and I completed the last module of the SharePoint course on the last day of February.

Today, as I started yet another video series (this time a C# for beginners course), I got bored when nearing the 1-hour mark. That isn’t so abnormal; we all need breaks once in a while.

The thing is that I went into my MVA dashboard and noticed that I’m spending waaaaay too much time on there.

MVA Ranking 201503019

 

Out of 2.880.150 people on MVA I rank #9534.
And out of 12.404 Belgians I rank #76.

Something else I wasn’t expecting is that most (about 2/3rds) completed courses were Azure/cloud related instead of SQL Server!

That only reinforces my idea of continuing on the Azure path in my spare time.

So I’ll talk to you soon about some Azure stuff!

 
