Scalable Data Science on a Laptop

I’ll be hosting a webcast featuring one of Strata’s most popular speakers, machine-learning expert Alice Zheng.

Here is what data science looks like today:

1. Munge some data:

    a. Process raw data. Stuff it into a database.
    b. Query for specific data. Coax results out through a straw.
    c. Munge data into a format required for the next stage.

2. Do some analysis:

    a. Figure out how to use a data analytics library to generate the results you need.
    b. Dump results out to file/database/hand truck.
    c. Parse out the chunk of output you need. Look at it.
    d. Decide something is not right. Repeat all of the above.

3. Oh right, speed!

    a. Repeat all steps in native code to make it fast.

4. Wait, what about scale?

    a. Repeat all steps with five other tools, write more code to scale up.
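To make the hand-offs in steps 1 and 2 concrete, here is a minimal sketch of that workflow using only the Python standard library. Everything in it is an assumption made for illustration: the toy `RAW_CSV` data, the `ratings` table, and the function names are hypothetical and do not come from the webcast or from any particular tool.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw data; real input would come from logs or exports.
RAW_CSV = """user,item,rating
alice,book,4
bob,book,5
alice,film,2
carol,film,3
"""


def load_into_database(raw_text):
    """Step 1a: process raw data and stuff it into a database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (user TEXT, item TEXT, rating REAL)")
    reader = csv.DictReader(io.StringIO(raw_text))
    records = [(r["user"], r["item"], float(r["rating"])) for r in reader]
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def query_item(conn, item):
    """Step 1b: query for specific data (returns a list of row tuples)."""
    cursor = conn.execute(
        "SELECT rating FROM ratings WHERE item = ? ORDER BY user", (item,)
    )
    return cursor.fetchall()


def munge(rows):
    """Step 1c: reshape row tuples into the flat list the next stage expects."""
    return [row[0] for row in rows]


def analyze(values):
    """Step 2a: hand the reshaped data to an analytics routine."""
    return statistics.mean(values)


def run_pipeline(raw_text, item):
    """Chain the stages; each hand-off is a format conversion."""
    conn = load_into_database(raw_text)
    try:
        return analyze(munge(query_item(conn, item)))
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV, "book"))
```

Even at this toy size, the data changes shape at every stage: text, table rows, tuples, a flat list. Steps 3 and 4 in the list amount to redoing each of those conversions with different tools.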

In this webcast, we’ll demonstrate scalable data science using GraphLab Create, an end-to-end platform for prototyping and deploying data products. You can munge data, query statistics, build sophisticated models, and deploy to the cloud, all from *one* platform, on your laptop. With disk-backed data stores, an intuitive Python front end, and an efficient C++ back end, GraphLab Create squeezes all the power out of a single machine and can be orders of magnitude faster than MapReduce.
