One Simple Chart: Spark NLP goes international

About a year ago, I posted a chart that showed the geographic distribution of users of Spark NLP, an open-source natural language processing library built on top of Apache Spark. With the recent release of version 2.5, Spark NLP now provides support for 14 new languages (more languages than any other open-source library). I decided to ask David Talby for an update on the geo-demographic data of visitors to the project’s homepage:

  • The US and India are still the source of many visitors to the project homepage, but their combined share is down to 49% from 56%.
  • Of the thousands of visitors to the site, 42% are from the Americas, 24% from Asia-Pacific, and 30% from the EMEA region.

As David Talby describes it, version 2.5 is a major release and includes: “87 new pre-trained models, support for 14 new languages, ALBERT & XLNet transformers out-of-the-box, a new context-based spell checker with state-of-the-art accuracy, and a new multi-class sentiment detector with 2 pre-trained SOTA models.”

It will be interesting to see how support for new languages changes the geographic distribution of users of Spark NLP over the next year. Spark NLP has made steady inroads among enterprise users, many of whom already use Spark and need a scalable, production-ready NLP library. On the product side, David and his colleagues at John Snow Labs are focused on specific verticals: they have built a commercial NLP platform (that uses Spark NLP) focused on the healthcare and pharmaceutical domains.

For more on Spark NLP, see David Talby's talk at the Spark+AI Summit, which takes place June 22-26. The conference will be virtual and free!

