Building enterprise data applications with open source components
[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Dean Wampler on bounded and unbounded data processing and analytics. Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. I first found myself having to learn Scala when I started…
Author Archives: Ben Lorica
From search to distributed computing to large-scale information extraction
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. February 2016 marks the 10th anniversary of Hadoop, at a point in time when many IT organizations actively use Hadoop and/or one of the open source big data projects that originated after, and in some…
Introduction to Tachyon and a deep dive into Baidu’s production use case
I’m pleased to announce that I’ll be hosting a webcast featuring the co-creator of Tachyon (full disclosure: I’m an advisor to Tachyon Nexus) alongside one of the architects behind Baidu’s big data platform. I hope to see you online on Sept 14th! Tachyon is a memory-centric, fault-tolerant distributed storage system, which enables reliable…
Celebrating the real-time processing revival
[A version of this article appears on the O’Reilly Radar.] Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015. A few months ago, I noted the resurgence in interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the…
Bringing Apache Spark closer to bare metal
Fans and users of Apache Spark will want to attend a webcast I’ll be hosting next week (Sept 3rd), featuring Josh Rosen, one of the early developers behind PySpark: “Deep dive into Project Tungsten: Bring Spark closer to bare metal.” Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark…
Bridging the divide: Business users and machine learning experts
[A version of this article appears on the O’Reilly Radar.] Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. As tools for advanced analytics become more accessible, data scientists’ roles will evolve. Most media stories emphasize a need for expertise in algorithms and quantitative techniques…
Pattern recognition and sports data
[A version of this article appears on the O’Reilly Radar.] One of my favorite books from the last few years is David Epstein’s engaging tour through sports science using examples and stories from a wide variety of athletic endeavors. Epstein draws on examples from individual sports (including track and field, winter sports) and major U.S. …
Understanding neural function and virtual reality
[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Poppy Crum explains that what matters is efficiency in identifying and emphasizing relevant data. Like many data scientists, I’m excited about advances in large-scale machine learning, particularly recent success stories in computer vision and speech recognition. But I’m also cognizant…
6 reasons why I like KeystoneML
[A version of this article appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ben Recht on optimization, compressed sensing, and large-scale machine learning pipelines. As we put the finishing touches on what promises to be another outstanding Hardcore Data Science Day at Strata + Hadoop World in New York, I sat down with…
Apache Spark in the Enterprise and in China
Enterprise adoption: IBM’s announcements at the recent Spark Summit in SF bode well for enterprise adoption of Spark. Ben Horowitz jokingly referred to IBM’s endorsement as akin to a rabbi blessing Spark as kosher for use in an enterprise. I recently sat down with a set of luminaries at the Spark Summit and asked them…
