Scalable Data Science on a Laptop

I’ll be hosting a webcast featuring one of Strata’s most popular speakers, machine-learning expert Alice Zheng.

Here is what data science looks like today:

1. Munge some data:

    a. Process raw data. Stuff it into a database.
    b. Query for specific data. Coax results out through a straw.
    c. Munge data into a format required for the next stage.

2. Do some analysis:

    a. Figure out how to use a data analytics library to generate the results you need.
    b. Dump results out to file/database/hand truck.
    c. Parse out the chunk of output you need. Look at it.
    d. Decide something is not right. Repeat all of the above.

3. Oh right, speed!

    a. Repeat all steps in native code to make it fast.

4. Wait, what about scale?

    a. Repeat all steps with five other tools, then write more code to scale up.

In this webcast, we’ll demonstrate scalable data science using GraphLab Create, an end-to-end platform for prototyping and deploying data products. You can munge data, query statistics, build sophisticated models, and deploy to the cloud, all from *one* platform running on your laptop. With disk-backed data stores, an intuitive Python front-end, and an efficient C++ back-end, GraphLab Create squeezes all the power out of a single machine, which can be orders of magnitude faster than MapReduce.
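GraphLab Create's own API isn't shown in this post, so the following is only a generic sketch of the idea behind disk-backed processing on one machine, written in plain Python rather than with GraphLab Create. It streams rows from a file on disk and keeps only running aggregates in memory, so a laptop can summarize a dataset larger than its RAM. The function name `streaming_group_mean`, the column names, and the sample data are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Tuple


def streaming_group_mean(path: str, key_col: str, value_col: str) -> Dict[str, Tuple[int, float]]:
    """Compute per-key (count, mean) by streaming rows from a CSV file on disk.

    Hypothetical helper for illustration only (not GraphLab Create's API).
    Only the current row plus one running count and sum per distinct key
    are held in memory, so the file itself can be much larger than RAM.
    """
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # rows are read lazily, one at a time
            key = row[key_col]
            counts[key] += 1
            sums[key] += float(row[value_col])
    return {k: (counts[k], sums[k] / counts[k]) for k in counts}


if __name__ == "__main__":
    import os
    import tempfile

    # Hypothetical sample data: user ratings stored on disk.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
        tmp.write("user,rating\nalice,4\nbob,3\nalice,5\n")
        path = tmp.name
    try:
        print(streaming_group_mean(path, "user", "rating"))
        # prints {'alice': (2, 4.5), 'bob': (1, 3.0)}
    finally:
        os.remove(path)
```

The design choice here is to trade random access for a single sequential pass: memory use grows with the number of distinct keys, not with the number of rows.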
