Deep Learning for Hackers

How do you get started using Deep Learning? In a previous post, I noted how many of the tools and best practices are locked away in “oral traditions” shared among practitioners. But recently, open source tools have made Deep Learning somewhat more accessible to hackers. In an upcoming webcast, I’m hosting noted hacker and startup…

PredictionIO: an open source machine learning server

PredictionIO, a startup that produces an open source machine learning server, has raised a seed round of $2.5M. The company’s engine allows developers to quickly integrate machine learning into products and services. The server is available on Amazon Web Services. As an open source package, the company hopes…

Databricks Cloud makes it easier to build Data Products

Here is a link to Ali Ghodsi’s talk and demo that took the Spark Summit by storm. The demo really captures the power of Databricks Cloud: complex, high-performance, big data analytics at massive scale, accessible to anyone who can write simple scripts (it currently supports SQL, Python, and Scala). The demo culminates when Ali shows how easy…

There are many use cases for graph databases and analytics

Business users are becoming more comfortable with graph analytics. [A version of this post appears on the O’Reilly Radar blog.] The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people – Cisco estimates 50 billion connected…

Super Simple Real-Time Big Data Backend

I recently had a great conversation with Jodok Batlogg, co-founder and CEO of Crate Data. We talked about how his experience as CTO of StudiVZ and CEO of Lovely Systems informed how they designed and built CrateDB. A few months ago Crate ended up as the top story on Hacker News, which caught the founders by…

Don’t miss the keynotes at the 2014 Spark Summit

There will be major announcements – particularly during the Monday morning keynotes. Fortunately, the organizers will livestream the talks (sign up here). An added bonus if you sign up for the livestream: I’ll be interviewing keynote speakers and key members of the Spark community throughout the first two days of the summit.

Scalable Data Science on a Laptop

I’ll be hosting a webcast featuring one of Strata’s most popular speakers: machine-learning expert Alice Zheng. Here is what data science looks like today:
1. Munge some data:
   a. Process raw data. Stuff it into a database.
   b. Query for specific data. Coax results out through a straw.
   c. Munge data into a format required…

Streamlining Feature Engineering

Researchers and startups are building tools that enable feature discovery. [A version of this post appears on the O’Reilly Data blog.] Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables…

Bits from the Data Store

Semi-regular field notes from the world of data: I’m always on the lookout for interesting tools and ideas for reproducing and collaborating on long data workflows. Reproducibility and collaboration are topics that we’re following closely at Strata (both topics remain on the radar of many data scientists and data engineers I speak with). At the…

Data Analysis on Streams

If you’re struggling with analyzing streaming data, I have just the event for you. I’ll be hosting a webcast on June 12th, featuring Mikio Braun, co-founder of streamdrill: Analyzing real-time data poses special kinds of challenges, such as dealing with large event rates, aggregating activities for millions of objects in parallel, and processing queries with…