How do you go from Personal BI to Personal Data Science?
Isn’t Data Science only for those rare unicorns that are smarter than most of us combined?
Can we even democratize Data Science within the enterprise?
Kimberly Hermans (twitter) and I presented on this topic last week, on the 27th of November, at the Microsoft UK office. Jen Stirrup (blog | twitter) had organized a great event for the community there: “Data Culture Day London – Power BI Edition”.
This is a write-up of the presentation “Personal BI to Personal Data Science”. Hopefully it will make you want to attend the real thing.
Currently you can still attend it at the next SQL Server Usergroup event in Belgium on the 8th of December.
After that date, take a look at my calendar to see if I’m delivering this session somewhere else.
I’ve read a lot about this fancy data science thing. I’ve played around a bit with Azure Machine Learning, participated in the Kaggle Titanic case, and even had a decent submission (0.271544) in the closed Caterpillar case.
This was all done with little data science background and with Microsoft tools.
The only statistical knowledge I used was to apply log(x - 1) and exp(x) + 1 to the results.
And that’s something my coworkers helped me with.
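To give an idea of how small that statistical step is, here is a minimal sketch of a log transform and its inverse. It assumes the common log(x + 1) / exp(x) - 1 pairing (NumPy’s `log1p` and `expm1`), which may not match the exact offsets used in my submission. The function names and the sample prices are made up for illustration.

```python
import numpy as np


def to_log_space(values):
    """Compress a skewed, non-negative target with log(x + 1)."""
    return np.log1p(np.asarray(values, dtype=float))


def from_log_space(log_values):
    """Undo to_log_space with exp(x) - 1, returning the original scale."""
    return np.expm1(np.asarray(log_values, dtype=float))


# Hypothetical prices, purely illustrative.
prices = np.array([0.0, 9.0, 99.0])

# A model would be trained on the transformed target...
log_prices = to_log_space(prices)

# ...and its predictions converted back before submitting.
restored = from_log_space(log_prices)
```

The idea is that a model trained on the transformed values is less dominated by a few very large prices, as long as the predictions are converted back to the original scale afterwards.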
So while everyone seemed to think that a data scientist does some kind of unique job, I started to notice the similarities their work and tools have with the ones that are used in the good old business intelligence world.
My own experience even made me think that you might not need to get an expensive data scientist for most of the work.
There are already these “Big data” developers running around, and they help data scientists with their Hadoop clusters, for example.
Could we possibly offload other tasks? Could it be done with all the great new things Microsoft has introduced?
Apart from those ideas, the suggestion that anybody else could do data science work seems to bump into a lot of resistance.
And I’ll tell you that, because of this resistance, I had my doubts as well.
Until I saw this inspiring presentation by Amir Netz (twitter) on the age of data culture.
Now rewatch the part where he describes the evolution of Microsoft throughout the ages and how, under the leadership of Satya Nadella, Power BI really came into existence.
Is this not the most inspiring thing you’ve heard in a while?
In case you haven’t watched the video, I’ll summarize what it is about:
Power BI is not only built to enable IT or analysts to work with data. It is built to enable everyone to take action on existing data.
You read that right, everyone should be able to get insight from data. And by extension data should be actionable for everyone.
I know what you’re thinking, and you are probably right.
Putting data in everyone’s hands is like putting a loaded gun in everyone’s hands.
Do we really want to do that?
We don’t let everyone develop their own business intelligence solution. Instead we give them a governed playground.
Specialists set up and maintain servers, develop data models, create a data warehouse…
And in the end, users get to connect to the end result, do and re-do their own analysis and get the illusion that they’re in full control.
In reality, users are in control. But only over that one last step in the process.
But that last step is what people need. Even more, it’s what they actually want.
No business user wants to set up a server or do any of the other theoretical and technical work leading up to their report.
They want their insights and they want them as soon as possible, in as many forms as possible.
Users don’t just want to visualize their data. Some of them want to combine corporate data with public data. Some want to combine data from different corporate sources. And almost all of them just want to work with the data that they create on a daily basis, because it is this data that enables them to do a better job.
And that basically is what we tend to call self-service BI: giving users access to the right data and letting them fulfill almost all of their reporting needs.
What you won’t hear most of the time is how hard it can be to get this right.
People tend to think that just giving users access to all databases is enough.
Jen Underwood (blog | twitter) has a great article about the self-service fantasy most people have.
Personal Data Science
If we stretch this personal BI idea to data science, we’d get what I call personal data science.
As you know, there are different skills required for data science. Keep in mind that users can’t fulfill all of these roles, and that there are certain roles you shouldn’t even allow them to take.
So what can users do?
Users tend to be a huge source of domain expertise in business intelligence projects.
They can help you with the exploratory data analysis for example.
But they can also help with the monitoring and evaluating of your predictive models if you haven’t automated that yet.
When I started bouncing this idea off of data scientists I stumbled upon some natural resistance and a healthy dose of scepticism. And they had all the right reasons for it as well.
- Users don’t and won’t want to learn and use R or Python
- Users usually don’t have the background in maths or stats
- Users making the wrong choices can lead to very costly errors
If you ask me, this is the same resistance that early personal BI promoters received.
But are there any reasons why we should have a user work together with a data scientist on a predictive model? Sure!
- Users tend to have very intimate business knowledge
- Users love nothing more than to improve their own job
- There are more users in a company than there are data scientists, and they are cheaper
During the presentation we discuss what users can actually do in an enterprise data science process.
We take you on a journey through why you would want that and what is required to make sure your experience ends with a successful predictive model.
You can view the slides here.
Just remember that it’s no substitute for the real presentation 😉
Data is data, whether it’s in a BI or a data science context. The business wants and needs that data.
Even more important is the huge role the business, regular or power users, can play in helping you in your BI and data science projects.
In the same way, remember that tools are just tools. Whether you work with RStudio or SPSS doesn’t matter in the end. What does matter is how the user experiences your work.
Remember that users experience data through their own tool, so just make sure that it’s the best one.
Currently, that best tool seems to be Power BI because it enables your users to do almost anything they want.
On top of that, the pace at which it keeps getting better is just incredible.
What are your thoughts on users getting more involved in the data science process?
How do you see the divide between business intelligence and data science?
Have you implemented Power BI yet?
Leave a comment below and don’t forget to tweet, share and like this post!