Algorithms are shaping our lives – here’s how we wrest back control

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Kartik Hosanagar on the growing power and sophistication of algorithms.

In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications, which draws from his extensive experience at the intersection of business and technology.

We had a great conversation spanning many topics, including:

  • The types of unanticipated consequences of which algorithm designers should be aware.
  • The predictability-resilience paradox: as systems become more intelligent and dynamic, they also become more unpredictable, so there are trade-offs algorithm designers must face.
  • Managing risk in machine learning: AI application designers need to weigh considerations such as fairness, security, privacy, explainability, safety, and reliability.
  • A bill of rights for humans impacted by the growing power and sophistication of algorithms.
  • Some best practices for bringing AI into the enterprise.

You created a machine learning application. Now make sure it’s secure.

[A version of this post appears on the O’Reilly Radar.]

The software industry has demonstrated, all too clearly, what happens when you don’t pay attention to security.

By Ben Lorica and Mike Loukides.

In a recent post, we described what it would take to build a sustainable machine learning practice. By “sustainable,” we mean projects that aren’t just proofs of concept or experiments. A sustainable practice means projects that are integral to an organization’s mission: projects by which an organization lives or dies. These projects are built and supported by a stable team of engineers, and backed by a management team that understands what machine learning is, why it’s important, and what it’s capable of accomplishing. Finally, sustainable machine learning means that as many aspects of product development as possible are automated: not just building models, but cleaning data, building and managing data pipelines, testing, and much more. Machine learning will penetrate our organizations so deeply that it won’t be possible for humans to manage them unassisted.

Organizations throughout the world are waking up to the fact that security is essential to their software projects. Nobody wants to be the next Sony, the next Anthem, or the next Equifax. But while we know how to make traditional software more secure (even though we frequently don’t), machine learning presents a new set of problems. Any sustainable machine learning practice must address machine learning’s unique security issues. We didn’t do that for traditional software, and we’re paying the price now. Nobody wants to pay the price again. If we learn one thing from traditional software’s approach to security, it’s that we need to be ahead of the curve, not behind it. As Joanna Bryson writes, “Cyber security and AI are inseparable.”

Why your attention is like a piece of contested territory

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: P.W. Singer on how social media has changed war, politics, and business.

In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book is essential reading for anyone interested in how social media has become an important new battlefield in a diverse set of domains and settings.

We had a great conversation spanning many topics, including:

  • In light of the 10th anniversary of his earlier book Wired for War, we talked about progress in robotics over the past decade.
  • The challenge posed by the fact that social networks reward virality, not veracity.
  • How the internet has emerged as an important new battlefield.
  • How this new online battlefield changes how conflicts are fought and unfold.
  • How many of the ideas and techniques covered in LikeWar are trickling down from nation-state actors influencing global events, to consulting companies offering services that companies and individuals can use.


The evolution and expanding utility of Ray

[A version of this post appears on the O’Reilly Radar.]

There are growing numbers of users and contributors to the framework, as well as libraries for reinforcement learning, AutoML, and data science.

In a recent post, I listed some of the early use cases described in the first meetup dedicated to Ray, a distributed programming framework from UC Berkeley’s RISE Lab. A second meetup took place a few months later, and both events featured some of the first applications built with Ray. On the development front, the core API has stabilized and a lot of work has gone into improving Ray’s performance and stability. The project now has around 5,700 stars on GitHub and more than 100 contributors across many organizations.

At this stage of the project, how does one describe Ray to those who aren’t familiar with the project? The RISE Lab team describes Ray as a “general framework for programming your cluster or cloud.” To place the project into context, Ray and cloud functions (FaaS, serverless) currently sit somewhere in the middle, between extremely flexible systems on one end and systems that are much more targeted and emphasize ease of use on the other. More precisely, users can currently avail themselves of extremely flexible cluster management and virtualization tools on one end (Docker, Kubernetes, Mesos, etc.), or domain-specific systems on the other end of the flexibility spectrum (Spark, Kafka, Flink, PyTorch, TensorFlow, Redshift, etc.).

Using machine learning and analytics to attract and retain employees

[A version of this post appears on the O’Reilly Radar blog.]

The O’Reilly Data Show Podcast: Maryam Jahanshahi on building tools to help improve efficiency and fairness in how companies recruit.

In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of AI technologies. Many companies are exploring a variety of internal and external programs to train staff on new tools and processes. The other route is to hire new talent. But recent reports suggest that demand for data professionals is strong and competition for experienced talent is fierce. Jahanshahi and her team are building natural language and statistical tools that can help companies improve their ability to attract and retain talent across many key areas.

Here are some highlights from our conversation:

Optimal job titles

The conventional wisdom in our field has always been that you want to optimize for “the number of good candidates” divided by “the number of total candidates.” … The thinking is that one of the ways in which you get a good signal-to-noise ratio is if you advertise for a more senior role. … In fact, we found the number of qualified applicants was lower for the senior data scientist role.

… We saw from some of our behavioral experiments that people were feeling like that was too senior a role for them to apply to. What we would call the “confidence gap” was kicking in at that point. It’s a pretty well-known phenomenon that there are different groups of the population that are less confident. This has been best characterized in terms of gender. It’s the idea that most women only apply for jobs when they meet 100% of the qualifications versus most men will apply even with 60% of the qualifications. That was actually manifesting.

Highlighting benefits

We saw a lot of big companies that would offer 401(k), that would offer health insurance or family leave, but wouldn’t mention those benefits in the job descriptions. This had an impact on how candidates perceived these companies. Even though it’s implied that Coca-Cola is probably going to give you 401(k) and health insurance, not mentioning it changes the way you think of that job.

How machine learning impacts information security

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Andrew Burt on the need to modernize data protection tools and strategies.

In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information security in the age of big data and AI. They list important changes to the information landscape and offer suggestions on how to alleviate some of the new risks introduced by the rise of machine learning and AI.

We discussed their new white paper, cybersecurity (Burt was previously a special advisor at the FBI), and an exciting new Strata Data tutorial that Burt will be co-teaching in March.

7 data trends on our radar

[A version of this post appears on the O’Reilly Radar.]

From infrastructure to tools to training, here’s what’s ahead for data.

Whether you’re a business leader or a practitioner, here are key data trends to watch and explore in the months ahead.

Increasing focus on building data culture, organization, and training

In a recent O’Reilly survey, we found that the skills gap remains one of the key challenges holding back the adoption of machine learning. The demand for data skills (“the sexiest job of the 21st century”) hasn’t dissipated. LinkedIn recently found that demand for data scientists in the US is “off the charts,” and our survey indicated that the demand for data scientists and data engineers is strong not just in the US but globally.

With the average shelf life of a skill today at less than five years and the cost to replace an employee estimated at between six and nine months of the position’s salary, there is increasing pressure on tech leaders to retain and upskill rather than replace their employees in order to keep data projects (such as machine learning implementations) on track. We are also seeing more training programs aimed at executives and decision makers, who need to understand how these new ML technologies can impact their current operations and products.