Time-turner: Strata NYC 2014, day 1

There are so many good talks happening at the same time that it’s impossible to not miss out on good sessions. But imagine I had a time-turner necklace and could actually “attend” 2 (maybe 3) sessions happening at the same time. Taking into account my current personal interests and tastes, here’s how my day would…

Streamlining Feature Engineering

Researchers and startups are building tools that enable feature discovery [A version of this post appears on the O’Reilly Data blog.] Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables…

A growing number of applications are being built with Spark

Many more companies are willing to talk about how they’re using Apache Spark in production [A version of this post appears on the O’Reilly Data blog.] One of the trends we’re following closely at Strata is the emergence of vertical applications. As components for creating large-scale data infrastructures enter their early stages of maturation, companies…

Welcome to Intelligence Matters

Casting a critical eye on the exciting developments in the world of AI [A version of this post appears on the O’Reilly Radar blog and Forbes.] Editor’s note: this post was co-authored by Ben Lorica and Roger Magoulas. Today the O’Reilly Radar is kicking off Intelligence Matters (IM), a new series exploring current issues in…

Network Science Dashboards

Network graphs can be used as primary visual objects, with conventional charts used to supply detailed views [A version of this post appears on the O’Reilly Data blog.] With Network Science well on its way to being an established academic discipline, we’re beginning to see tools that leverage it. Applications that draw heavily from this…

Verticalized Big Data solutions

General-purpose platforms can come across as hammers in search of nails [A version of this post appears on the O’Reilly Data blog and Forbes.] As much as I love talking about general-purpose big data platforms and data science frameworks, I’m the first to admit that many of the interesting startups I talk to are focused…

Advanced Analytics on Relational Data with Spark SQL

I’ll be hosting a webcast on Spark SQL featuring Michael Armbrust of Databricks: In this webcast, we’ll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature…

5 Fun Facts about HBase that you didn’t know

HBase has made inroads in companies across many industries and countries [A version of this post appears on the O’Reilly Data blog.] With HBaseCon right around the corner, I wanted to take stock of one of the more popular components in the Hadoop ecosystem. Over the last few years, many more companies have come to…

Crowdsourcing Feature discovery

More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists [A version of this post appears on the O’Reilly Data blog and Forbes.] Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons…