Network structure and dynamics in online social systems
Understanding information cascades, viral content, and significant relationships. [A version of this post appears on the O’Reilly Radar blog.] I rarely work with social network data, but I’m familiar with the standard problems confronting data scientists who work in this area. These include questions pertaining to network structure, viral content, and the dynamics of information…
Tag Archives: radar
The evolution of GraphLab
[A version of this post appears on the O’Reilly Radar blog.] Editor’s note: Carlos Guestrin will be part of the team teaching Large-scale Machine Learning Day at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program. I only really started playing around with GraphLab…
Building and deploying large-scale machine learning pipelines
[A version of this post appears on the O’Reilly Radar blog.] There are many algorithms with implementations that scale to large data sets (this list includes matrix factorization, SVM, logistic regression, LASSO, and many others). In fact, machine learning experts are fond of pointing out: if you can pose your problem as a simple optimization…
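To make the "pose your problem as a simple optimization" idea concrete, here is a minimal sketch using logistic regression, one of the algorithms listed above. It is an illustration under stated assumptions, not the method from the post: it assumes pure Python, plain minibatch stochastic gradient descent, and hypothetical function names (`sgd_logistic_regression`, `log_loss`). The property worth noticing is that the gradient is a sum of per-example terms, which is one common reason such objectives parallelize well across machines.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, b, data):
    # Average negative log-likelihood: the objective being minimized.
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def sgd_logistic_regression(data, epochs=50, lr=0.1, batch_size=8, seed=0):
    # Hypothetical helper: fits weights w and bias b by minibatch SGD.
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    data = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = [0.0] * dim
            grad_b = 0.0
            # Each example contributes an independent term to the gradient,
            # so batches (or data partitions) can be summed separately.
            for x, y in batch:
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
                for j in range(dim):
                    grad_w[j] += err * x[j]
                grad_b += err
            n = len(batch)
            w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
            b -= lr * grad_b / n
    return w, b
```

For example, on a toy one-dimensional data set where points with positive coordinates are labeled 1, `sgd_logistic_regression(data, epochs=100, lr=0.5)` should return parameters with a lower `log_loss` than the all-zero starting point. In a distributed setting, the inner loop over a batch is the piece that would typically be farmed out to workers, with their partial gradient sums combined before each update.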
A brief look at data science’s past and future
[A version of this post appears on the O’Reilly Radar blog.] Back in 2008, when we were working on what became one of the first papers on big data technologies, one of our first visits was to LinkedIn’s new “data” team. Many of the members of that team went on to build interesting tools and…
Lessons from next-generation data wrangling tools
[A version of this post appears on the O’Reilly Radar blog.] One of the trends we’re following is the rise of applications that combine big data, algorithms, and efficient user interfaces. As I noted in an earlier post, our interest stems from both consumer apps and tools that democratize data analysis. It’s no…
Building Apache Kafka from scratch
[A version of this post originally appeared on the O’Reilly Radar blog.] In this episode of the O’Reilly Data Show Podcast, Jay Kreps talks about data integration, event data, and the Internet of Things. At the heart of big data platforms are robust data flows that connect diverse data sources. Over the past few years, …
Decoding bitcoin and the blockchain
[A version of this post originally appeared on the O’Reilly Radar blog.] When the creators of bitcoin solved the “double spend” problem in a decentralized manner, they introduced techniques that have implications far beyond digital currency. Our newly announced one-day event, Bitcoin & the Blockchain: An O’Reilly Radar Summit, is in line with…
The science of moving dots: the O’Reilly Data Show Podcast
Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data. [This post originally appeared on the O’Reilly Radar blog.] Editor’s note: you can subscribe to the O’Reilly Data Show Podcast through iTunes, SoundCloud, or our RSS feed. Many data scientists are comfortable working with structured operational data and…
Announcing Spark Certification
I’m happy to announce the Databricks/O’Reilly Developer Certification for Apache Spark! For more details, please read my post on the O’Reilly Radar.
Scaling up Data Frames
New frameworks for interactive business analysis and advanced analytics fuel the rise in tabular data objects. [A version of this post appears on the O’Reilly Radar blog.] Long before the advent of “big data”, analysts were building models using tools like R (and its forerunners S/S-PLUS). Productivity hinged on tools that made data wrangling, data…
