Data architectures for streaming applications

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Dean Wampler on streaming data applications, Scala and Spark, and cloud computing.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the O’Reilly Data Show, I sat down with O’Reilly author Dean Wampler, big data architect at Lightbend. We talked about new architectures for stream processing, Scala, and cloud computing.

Our interview dovetailed with conversations I’ve had lately, where I’ve been emphasizing the distinction between streaming and real time. Streaming connotes an unbounded data set, whereas real time is mainly about low latency. The distinction can be blurry, but it’s something that seasoned solution architects understand. While most companies deal with problems that fall under the realm of “near real time” (end-to-end pipelines that run somewhere between five minutes and an hour), they still need to deal with data that is continuously arriving. Part of what’s interesting about the new Structured Streaming API in Apache Spark is that it opens up streaming (or unbounded) data processing to a much wider group of users (namely data scientists and business analysts).

Here are some highlights from our conversation:
Continue reading

Strata NYC 2016 is next week

Complete schedule is HERE.

Data science for humans and data science for machines

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Michael Li on the state of data engineering and data science training programs.

In this episode of the O’Reilly Data Show, I spoke with Michael Li, co-founder and CEO of the Data Incubator. We discussed the current state of data science and data engineering training programs, Apache Spark, quantitative finance, and the misunderstanding around the term “data science.”

Here are some highlights from our conversation:
Continue reading

The importance of emotion in AI systems

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Rana el Kaliouby on deep learning, emotion detection, and user engagement in an attention economy.

While I was in Beijing for Strata + Hadoop World, several people reminded me of the chatbot Xiaoice—one of the most popular accounts on the Chinese social media site Weibo. Developed by Microsoft researchers, Xiaoice comes with a personality and is able to engage users in extended conversations on Weibo. These types of capabilities highlight that in an attention economy, systems that are able to forge an emotional connection will garner more loyalty and engagement from users.

In this episode of the O’Reilly Data Show, I sat down with Rana el Kaliouby, co-founder and CEO of Affectiva, one of the leading experts in emotion sensing systems. We talked about the impact of deep learning and computer vision, Affectiva’s large facial expression database, and privacy and ethics in an era of multimodal systems.

Here are some highlights from our conversation:
Continue reading

Building human-assisted AI applications

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Adam Marcus on intelligent systems and human-in-the-loop computing.

In this episode of the O’Reilly Data Show, I spoke with Adam Marcus, co-founder and CTO of B12, a startup focused on building human-in-the-loop intelligent applications. We talked about Orchestra, the open source platform for coordinating human-in-the-loop projects; the current wave of human-assisted AI applications; best practices for reviewing and scoring experts; and flash teams.

Here are some highlights from our conversation:

Continue reading

Enabling enterprise adoption of AI technologies

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Jana Eggers on building applications that rely on synaptic intelligence.

In this episode of the O’Reilly Data Show, I spoke with Jana Eggers, CEO of Nara Logics. Eggers’ involvement with AI dates back to her days as a researcher at the Los Alamos National Laboratory. Most recently she has been helping companies across many industries adopt AI technologies as a way to enable a range of intelligent data applications.

Here are some highlights from our conversation:
Continue reading

Beijing Restaurants: Strata 2016

Here’s a partial list of the many memorable restaurants we visited in Beijing during the week of Strata 2016: