[A version of this post appears on the O’Reilly Radar.]
The O’Reilly Data Show Podcast: Duncan Ross on the evolution of analytics, data mining, and data philanthropy.
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.
In this episode of the O’Reilly Data Show, I spoke with one of Strata + Hadoop World’s most popular teachers—Duncan Ross, data and analytics director at TES Global. In his long career in data, Ross has seen several stages of the evolution of tools, techniques, and training programs, and along the way he has interacted with business managers in many countries and regions. In keeping with his wide-ranging interests, we discussed many topics, including business analytics, data science training programs, data philanthropy and data for good, and university rankings.
Here are some highlights from our conversation:
Democratizing big data and data science
If you look over the last 30 years, in some ways things have moved on a lot. There is more flexibility and choice around the software that’s available now. However, in terms of strict usability, if anyone tries to claim that R is as elegant or usable as Clementine, which is what we had back in the late ’90s, it clearly isn’t. I mean Clementine was specifically designed to allow non-technical people to do data analysis.
As we’ve developed down into the world of Hadoop, and R, and Python, et cetera, in some ways we’ve taken a step back, because now, in order to use those tools effectively, or at the most detailed level, you need to have people who have the ability to do some level of programming. You may say, ‘well that’s actually quite a good thing, because those are good skills to have.’ Then there is this counter-argument that says, ‘if we want this truly to be democratized, we want someone who has a marketing focus to be able to pick up these technologies and use them effectively.’ Then, either we need to simplify the software, or we need to find other mechanisms of giving them control.
Data for evil and data for good
Using Data for Evil is our annual roundup of examples of how people have done things badly over the year. We use this as a way of highlighting what you shouldn’t do, and hopefully inspiring people instead to do things for good. We will be updating our Necromantic Quadrant of Evil to show which organizations have improved their evilness from previous years. We will be looking at particularly great examples of malfeasance with data. Increasingly, we see that the ability for organizations to do just plain evil stuff with data grows every year.
… We have definitely had people who have come to these events, come to the presentations, and actually used this as the springboard into data philanthropy, and actually giving back their time, and their commitment, to using data for good. I hope we can have a wider impact, and that maybe we can help turn around that oil tanker of evil heading for the coast. That’s a horrible metaphor, but you get the idea.
… The positive spin is that it’s getting easier because people are more aware of when data is being misapplied, and therefore, it’s reported more, so we have more cases. Actually, I think there is a whole new category of evil that is coming up this year, and I will be talking about that at Using Data for EVIL IV – The Journey Home.
Ranking the world’s universities
As you might imagine, we have to use data that is directly comparable. There are many other university missions, for example the teaching mission, which is really important, but it is a nightmare to try and measure as soon as you go across an international boundary. I’ll give you a really clear example of that: imagine we wanted to rank or evaluate universities by graduate employment rates. The challenge there is the graduate employment rate is affected by the natural unemployment rates where you are. If you have a university in New York, what’s the New York unemployment rate? … Then you have Singapore, which has an official unemployment rate of 0%. As soon as you look across international boundaries, you hit these challenges, so we have to have metrics that are consistent, that have some meaning. We look at some measures around teaching, but they are mostly focused on the input, so how much resource a university has to put into the teaching mission. We look at some metrics around research, both input to research and also a measure of the output. Then we look at some measures around internationalization and industry links. … We effectively use a purchasing power parity measure, so that allows us to say how many units of local currency it would take to buy $1 worth of stuff. … It gives us a way of saying, yes, you’re a university in Singapore, but your cost basis isn’t the same as a university in California.
Of those factors, I think one of the interesting ones that stands out, and we keep coming back to, is this idea of internationalization. All of the work that’s been done suggests that universities that have more of an international outlook are more successful. They have a better learning environment, a better teaching environment, and a better research environment. There is some evidence around the citations as well; if you do international research collaborations, they tend to have a better record when you look at the bibliometric data.
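As a rough illustration of the parity adjustment Ross describes (how many units of local currency it takes to buy $1 worth of goods), here is a minimal Python sketch. It is not the method used in the rankings: the function name `to_parity_dollars`, the field names, and every figure below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def to_parity_dollars(amount_local: float, parity_factor: float) -> float:
    """Convert a local-currency amount into parity-adjusted dollars.

    parity_factor is the number of local currency units needed to buy
    what $1 buys in the reference economy (an assumed input here).
    """
    if parity_factor <= 0:
        raise ValueError("parity_factor must be positive")
    return amount_local / parity_factor


# Hypothetical universities and figures, for illustration only.
universities = [
    {"name": "University A", "income_local": 100.0, "parity_factor": 2.0},
    {"name": "University B", "income_local": 40.0, "parity_factor": 0.5},
]

# Add a comparable, parity-adjusted income figure to each record.
for u in universities:
    u["income_adjusted"] = to_parity_dollars(u["income_local"], u["parity_factor"])

# Order the universities by the adjusted figure, largest first.
ranked = sorted(universities, key=lambda u: u["income_adjusted"], reverse=True)
```

In this made-up case University A reports the larger income in local currency, but after adjustment University B's resources go further, which is the kind of cost-basis difference the adjustment is meant to surface.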
Editor’s note: Duncan Ross and Francine Bennett will co-present two sessions at Strata + Hadoop World London: Using data for EVIL IV – The Journey Home and The best university in the world.
- Using Data for Evil, Strata + Hadoop World, London 2015
- How we amplify privilege with supervised machine learning
- Five principles for applying data science for social good
- Haunted by data
- Marketing and Consumer Research Learning Path