The term “NLP” (natural language processing) encompasses a wide range of business use cases that are mostly text based. People use text to record and transmit much of their communication, which makes it one of the most widely available and “interoperable” data formats. While some industry sectors such as finance and healthcare have long used text mining approaches, much more sophisticated use cases have been gaining traction in industry, especially during the late 2010s.
To uncover in-depth insights and current industry trends, we ran the Natural Language Processing (NLP) Industry Survey online from July 5 to August 14, 2020. A selection of highlights from the survey follows. For complete survey results, download the full survey report.
Company Size refers to the number of employees in a company (Small = 500 or fewer employees; Medium = 501 to 5,000 employees; Large = more than 5,000 employees).
Scale of Documents refers to the number of documents an NLP system supports (Low = fewer than 50,000 documents monthly or overall; Medium = 50,000 to 500,000 documents per month; High = more than 500,000 documents per month).
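To make the bucket boundaries concrete, here is a minimal Python sketch of the two groupings defined above. The function names `company_size` and `document_scale` are hypothetical and chosen only for illustration; the survey report does not describe any code, so treat this as one possible reading of the thresholds rather than the method actually used.

```python
def company_size(employees: int) -> str:
    """Map an employee count to a company-size bucket (hypothetical helper)."""
    if employees <= 500:
        return "Small"    # 500 or fewer employees
    if employees <= 5_000:
        return "Medium"   # 501 to 5,000 employees
    return "Large"        # more than 5,000 employees


def document_scale(docs_per_month: int) -> str:
    """Map a monthly document count to a scale bucket (hypothetical helper)."""
    if docs_per_month < 50_000:
        return "Low"      # fewer than 50,000 documents
    if docs_per_month <= 500_000:
        return "Medium"   # 50,000 to 500,000 documents
    return "High"         # more than 500,000 documents
```

Under this reading, a count of exactly 50,000 or exactly 500,000 documents falls in Medium, since Low is defined as "fewer than" and High as "more than" those values.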
Respondents came from a wide variety of industry sectors. Along with the Technology sector, companies in Healthcare and Financial Services have long used text mining technology.
Stage of NLP Adoption
In the report, we group respondents based on their response to a question that attempts to measure their stage of adoption of NLP technologies:
- Using NLP: respondents who have deployed NLP to production (44%)
- Exploring NLP: respondents who have not yet deployed NLP to production (56%)
Industry Use Cases
Respondents who work at companies that already use NLP in production reported document classification and NER (named entity recognition) as by far their most common use cases. The accompanying Treemap displays the share of respondents within a given group: 63% of respondents at companies Using NLP indicated that they use NLP for document classification, while 27% of respondents at companies Exploring NLP use it for NER.
With many more libraries and models to choose from than five years ago, this is truly a great time to be an NLP user. What criteria do respondents use to gauge NLP tools? When evaluating an NLP library, survey respondents cite accuracy as the most important requirement. Accuracy refers to the effectiveness of the models (typically pre-trained models) that now come with many NLP libraries. These models allow users to input text into a pipeline and get common outputs (e.g., tokens, lemmas, part-of-speech (POS) tags, similarity scores, and named entities).
When reviewing these libraries, it’s important to understand them as multi-stage pipelines, where a cascade of models is applied across the stages of processing. Therefore, there is no single metric for accuracy, but instead a cumulative measure for any given NLP application based on how these models get applied. According to our survey results, users are increasingly comparing libraries based on the accuracy of such pre-trained models.
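One rough way to see why accuracy is cumulative in a multi-stage pipeline is to multiply per-stage accuracies. The Python sketch below does this under two simplifying assumptions that are ours, not the survey's: errors at each stage are independent, and a downstream result is correct only if every upstream stage was correct. The function name `cumulative_accuracy` and the per-stage figures are hypothetical and made up for illustration; they are not survey results or measurements of any library.

```python
from math import prod


def cumulative_accuracy(stage_accuracies: dict[str, float]) -> float:
    """Estimate end-to-end accuracy as the product of per-stage accuracies.

    Assumes independent errors across stages (a simplification).
    """
    for name, acc in stage_accuracies.items():
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy for {name!r} must be in [0, 1], got {acc}")
    return prod(stage_accuracies.values())


# Hypothetical per-stage accuracies, invented purely for illustration.
stages = {"tokenizer": 0.99, "pos_tagger": 0.97, "ner": 0.90}
estimate = cumulative_accuracy(stages)
print(f"Estimated end-to-end accuracy: {estimate:.3f}")
```

Even with strong individual stages, the estimate comes out lower than the accuracy of the weakest stage, which is one reason a single headline number for a library can be misleading. Real pipelines may not satisfy the independence assumption, so measuring the full application end to end is the more reliable check.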
A total of 571 respondents from more than 50 countries completed the survey. A quarter of all respondents held technical leadership roles. Respondents were recruited via social media, online advertising, the Gradient Flow Newsletter, and through industry partners and contacts.