Algorithms are shaping our lives – here’s how we wrest back control

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Kartik Hosanagar on the growing power and sophistication of algorithms.

In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications that draws on his extensive experience at the intersection of business and technology.

We had a great conversation spanning many topics, including:

  • The types of unanticipated consequences of which algorithm designers should be aware.
  • The predictability-resilience paradox: as systems become more intelligent and dynamic, they also become less predictable, which creates trade-offs that algorithm designers must face.
  • Managing risk in machine learning: AI application designers need to weigh considerations such as fairness, security, privacy, explainability, safety, and reliability.
  • A bill of rights for humans impacted by the growing power and sophistication of algorithms.
  • Some best practices for bringing AI into the enterprise.


Why your attention is like a piece of contested territory

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: P.W. Singer on how social media has changed war, politics, and business.

In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book is essential reading for anyone interested in how social media has become an important new battlefield in a diverse set of domains and settings.

We had a great conversation spanning many topics, including:

  • In light of the 10th anniversary of his earlier book Wired for War, we talked about progress in robotics over the past decade.
  • The challenge posed by the fact that social networks reward virality, not veracity.
  • How the internet has emerged as an important new battlefield.
  • How this new online battlefield changes how conflicts are fought and unfold.
  • How many of the ideas and techniques covered in LikeWar are trickling down from nation-state actors influencing global events to consulting companies offering services that companies and individuals can use.

Continue reading “Why your attention is like a piece of contested territory”

The technical, societal, and cultural challenges that come with the rise of fake media

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Siwei Lyu on machine learning for digital media forensics and image synthesis.

In this episode of the Data Show, I spoke with Siwei Lyu, associate professor of computer science at the University at Albany, State University of New York. Lyu is a leading expert in digital media forensics, a field of research into tools and techniques for analyzing the authenticity of media files. Over the past year, many stories have been written about the rise of tools for creating fake media (mainly images, video, and audio files). Researchers in digital image forensics haven’t exactly been standing still, though. As Lyu notes, advances in machine learning and deep learning have also found a receptive audience in the forensics community.

We had a great conversation spanning many topics, including:

  • The many indicators used by forensic experts and forgery detection systems.
  • Balancing “open” research with the risks that come with it, including “tipping off” adversaries.
  • State-of-the-art detection tools today, and what the research community and funding agencies plan to work on over the next few years.
  • Technical, societal, and cultural challenges that come with the rise of fake media.

Here are some highlights from our conversation:
Continue reading “The technical, societal, and cultural challenges that come with the rise of fake media”

Using machine learning and analytics to attract and retain employees

[A version of this post appears on the O’Reilly Radar blog.]

The O’Reilly Data Show Podcast: Maryam Jahanshahi on building tools to help improve efficiency and fairness in how companies recruit.

In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of AI technologies. Many companies are exploring a variety of internal and external programs to train staff on new tools and processes. The other route is to hire new talent. But recent reports suggest that demand for data professionals is strong and competition for experienced talent is fierce. Jahanshahi and her team are building natural language and statistical tools that can help companies improve their ability to attract and retain talent across many key areas.

Here are some highlights from our conversation:

Optimal job titles

The conventional wisdom in our field has always been that you want to optimize for “the number of good candidates” divided by “the number of total candidates.” … The thinking is that one of the ways in which you get a good signal-to-noise ratio is if you advertise for a more senior role. … In fact, we found the number of qualified applicants was lower for the senior data scientist role.

… We saw from some of our behavioral experiments that people felt that was too senior a role for them to apply to. What we would call the “confidence gap” was kicking in at that point. It’s a pretty well-known phenomenon that some groups in the population are less confident. This has been best characterized in terms of gender: the idea that most women only apply for jobs when they meet 100% of the qualifications, whereas most men will apply even with 60% of the qualifications. That was actually manifesting.

Highlighting benefits

We saw a lot of big companies that would offer 401(k), that would offer health insurance or family leave, but wouldn’t mention those benefits in the job descriptions. This had an impact on how candidates perceived these companies. Even though it’s implied that Coca-Cola is probably going to give you 401(k) and health insurance, not mentioning it changes the way you think of that job.
Continue reading “Using machine learning and analytics to attract and retain employees”

How machine learning impacts information security

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Andrew Burt on the need to modernize data protection tools and strategies.

In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information security in the age of big data and AI. They list important changes to the information landscape and offer suggestions on how to alleviate some of the new risks introduced by the rise of machine learning and AI.

We discussed their new white paper, cybersecurity (Burt was previously a special advisor at the FBI), and an exciting new Strata Data tutorial that Burt will be co-teaching in March.
Continue reading “How machine learning impacts information security”

In the age of AI, fundamental value resides in data

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Haoyuan Li on accelerating analytic workloads, and innovation in data and AI in China.

In this episode of the Data Show, I spoke with Haoyuan Li, CEO and founder of Alluxio, a startup commercializing the open source project with the same name (full disclosure: I’m an advisor to Alluxio). Our discussion focuses on the state of Alluxio (the open source project that has roots in UC Berkeley’s AMPLab), specifically emerging use cases here and in China. Given the large-scale use in China, I also wanted to get Li’s take on the state of data and AI technologies in Beijing and other parts of China.

Here are some highlights from our conversation:
Continue reading “In the age of AI, fundamental value resides in data”

Tools for generating deep neural networks with efficient network architectures

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Alex Wong on building human-in-the-loop automation solutions for enterprise machine learning.

In this episode of the Data Show, I spoke with Alex Wong, associate professor at the University of Waterloo, and co-founder of DarwinAI, a startup that uses AI to address foundational challenges with deep learning in the enterprise. As the use of machine learning and analytics becomes more widespread, we’re beginning to see tools that enable data scientists and data engineers to scale, tackle many more problems, and maintain more systems. These include automation tools for the many stages involved in data science, such as data preparation, feature engineering, model selection, and hyperparameter tuning, as well as tools for data engineering and data operations.

Wong and his collaborators are building solutions for enterprises, including tools for generating efficient neural networks and for the performance analysis of networks deployed to edge devices.

Here are some highlights from our conversation:

Using AI to democratize deep learning

I’ve worked in machine learning and deep learning for more than a decade, in both academia and industry, and it became very evident to me that there’s a significant barrier to widespread adoption. One of the main issues is that it is very difficult to design, build, and explain deep neural networks, especially when you need to meet operational requirements. The process involves way too much guesswork and trial and error, so it’s hard to build systems that work in real-world industrial settings.
Continue reading “Tools for generating deep neural networks with efficient network architectures”