Natural language generation (NLG) is a subset of natural language processing (NLP) that aims to produce natural language from structured data. It can be used in chatbot conversations, but also for various types of content creation, such as summarizing data and generating product descriptions for online shopping. Companies in the space offer various use cases for this type of automated content creation, but the technology requires human oversight, a necessity that is likely to remain for the near future.


The Current State of NLG for Content Creation

Experts agree that in its current state, using NLG for the purpose of creating content still requires significant human input. “From a technological standpoint, is it possible to turn on the machines and let them generate content at 100%? Absolutely. Do you want them to do that in their current state across the broad market? Absolutely not. … [You want] a balance between augmenting your human workforce to allow them to focus on higher-value tasks while letting the machine automate what can be data-driven,” says John Hathorn, director of marketing at Automated Insights, a company that provides a self-service NLG platform called Wordsmith.

Orion Montoya, engineering manager at Textio, a company that provides an augmented writing platform, agrees. “The state of NLG depends on who’s generating, for what purpose, and how much is being generated. Everyone is now familiar with keyboards that can give you options for the next word that you might be typing. … At the other end of that is where people are trying to write whole sentences or whole paragraphs or documents on a topic, and that’s where there’s still a lot of luck involved,” Montoya says. Montoya adds that human input is essential if a company is looking to use the technology to generate content that takes into account details such as brand language and the style of a particular market.
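The next-word keyboard suggestions Montoya mentions can be illustrated, at their simplest, with a bigram model: count which word follows which in a corpus, then suggest the most frequent follower. The corpus and words below are illustrative, not drawn from any product discussed in this article.

```python
# Toy bigram model of next-word suggestion: count word pairs in a
# corpus, then suggest the most frequent follower of a given word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ran".split()  # toy corpus

# Map each word to a Counter of the words that follow it.
followers = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    followers[w1][w2] += 1

def suggest(word: str) -> str:
    """Return the most common word seen after `word`, or "" if unseen."""
    return followers[word].most_common(1)[0][0] if followers[word] else ""

print(suggest("the"))  # "cat" follows "the" twice, "mat" only once
```

Production systems use far larger corpora and neural language models, but the shape of the task, predicting a likely continuation from observed text, is the same.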

According to Jane Nemcova, managing director of machine intelligence at Lionbridge, a provider of translation and localization services, “any content that is repeatable and somewhat formulaic is a good candidate for NLG.” To illustrate this, she provides the example of how a person might “translate” a graph of company revenue for a 2-year period into a verbal description. “An NLG system has to be taught by a human what to look for and how to talk about it. Programmatic expression of the content is required: What should the content highlight as significant? What, for example, constitutes ‘strong growth’ or ‘steady decline’? What is most significant?” Nemcova adds that language style “would also need to be specified or trained, based on the audience: should it be in earnings call style or less formal, for example.”
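The rule-based approach Nemcova describes can be sketched in a few lines: a human defines what counts as “strong growth” or a decline, and a template turns the numbers into a sentence. The thresholds and figures below are illustrative assumptions, not Lionbridge’s actual rules.

```python
# Minimal rule-based data-to-text sketch: a human specifies the
# thresholds (what "strong growth" means) and the phrasing; the
# system applies them to a two-year revenue series.

def describe_revenue(year1: float, year2: float) -> str:
    change = (year2 - year1) / year1  # fractional year-over-year change
    # Hypothetical thresholds a human analyst would define:
    if change >= 0.15:
        trend = "strong growth"
    elif change > 0.02:
        trend = "modest growth"
    elif change >= -0.02:
        trend = "flat revenue"
    else:
        trend = "a decline"
    verb = "rose" if change > 0 else "fell" if change < 0 else "held steady"
    return (f"Revenue {verb} from ${year1:,.0f}M to ${year2:,.0f}M, "
            f"reflecting {trend}.")

print(describe_revenue(120, 150))
```

Swapping the phrasing table is how the same pipeline could shift between an earnings-call register and a less formal one, per Nemcova’s point about audience-specific style.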

She goes on to say that using NLG for repetitive content can “free up personnel for more meaningful activities” and that current candidates for the technology include business intelligence, daily reporting such as most-clicked-on stories, and structured stories such as sports reports.


Current Use Cases of NLG for Content Creation

Companies are using NLG technology to create content in various ways. For Automated Insights, there are three main focuses: personalized communications, the ability to create content at scale, and business intelligence.

When it comes to personalized communications, Hathorn uses Automated Insights’ work with Yahoo Fantasy Football as an example. “If a client has unbelievable data on their user set, clients, and prospective clients, they can actually custom tailor content to the individual. … One of the most pertinent examples in that space is the work we do with Yahoo Fantasy Football. … After the draft, you get a report card and a draft recap, and then every single week those matchups are analyzed—and you get a custom-tailored report that’s only pertinent to you, which people love. Over the season, when we are in full season swing, running test productions and live deployments, we’re talking 6, 7 million a week and pushing 70-plus million in a season,” he elaborates.

With regard to the ability to generate content at scale, Hathorn cites the company’s work with the Associated Press as an example. “[There are] two key areas, one being corporate earnings reports, which are very data-driven. There’s a lot of structured data from the stock market, and they’re really boring to write. They had a staff that would focus on this quarterly, and they could get a few hundred of these reports out every quarter. What we did is allow the Associated Press to completely automate that process, and what they’ve seen is a 15x increase in coverage,” he says.
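Earnings stories lend themselves to this kind of automation because the inputs are uniformly structured. A minimal sketch of the general template-filling idea (not AP’s or Automated Insights’ actual system; company and figures are hypothetical) might look like this:

```python
# Illustrative template-based earnings recap: structured data in,
# a sentence of unstructured text out.

earnings = {
    "company": "Example Corp",   # hypothetical data
    "eps": 1.42,
    "eps_expected": 1.30,
    "revenue_bn": 3.8,
}

def earnings_recap(e: dict) -> str:
    beat = "beat" if e["eps"] > e["eps_expected"] else "missed"
    return (f"{e['company']} reported earnings of ${e['eps']:.2f} per share, "
            f"which {beat} analyst expectations of ${e['eps_expected']:.2f}, "
            f"on revenue of ${e['revenue_bn']:.1f} billion.")

print(earnings_recap(earnings))
```

Because every quarterly filing supplies the same fields, one template scales to thousands of companies, which is what makes the 15x increase in coverage Hathorn cites plausible for this class of story.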

The other area where Automated Insights has helped the Associated Press is in covering minor league baseball. “You can imagine that within these pockets where these smaller teams are, they do have a loyal fan base, but there’s just not enough human capital to ensure coverage of all those games. So we started automating content and recaps. … The key thing that you draw across those use cases is the fact that both are very data-driven: Baseball revolves around statistics; the financial markets revolve around statistics,” Hathorn says. “Our bread and butter is not necessarily helping people create a movie script or a blog, … but really augmenting a journalist’s ability to focus on the uniquely human aspects of journalism, investigative journalism, interviewing, and allow us to do the numbers side,” he adds.

As for business intelligence, Hathorn says that Automated Insights aims to help organizations improve internal reporting: “Where we come in is adding a contextual layer to dashboards, so you have the data behind the dashboard that’s helping you build these beautiful visualizations. But what we can do is tag onto that, add the contextual layer, and deliver written analytics directly within your business intelligence platform that not only tell the person what’s going on and why it’s going on, but most importantly, what they should do about it.”

Narrative Science, a company that provides an NLG platform, aims to use the technology to “transform data into plain-English stories,” according to its website. “We’re a company based on the sole belief that today, 5 years from now, 10 years from now, language will be the dominant interface for how we communicate with our data and our insight,” says Cassidy Shield, VP of marketing at Narrative Science. “Today, we have about 100 customers, typically large, enterprise brands … where we’re taking business data and turning it essentially into a story—a story that their customers can understand, a story that their employees can understand, and so forth,” he adds.

Shield goes on to say that the company conceptualizes three stages of evolution in transforming data into language: static, real-time, and interactive. He says that turning static data into language is “historically the way we’ve done it” and that “you can think of this in financial reporting where there’s a discrete, defined output and you’re converting it to language once and reporting out on that.”

With regard to real-time transformation of data into language, he uses the example of describing in language what’s happening with the data in a dashboard. “It’s real-time because as you change the data in the dashboard, the language changes with it. That’s the evolution we’re in today, … taking data that’s evolving and changing, describing that in language, and putting the story next to the data,” Shield explains.

As for the interactive stage, he says that this is the evolution that the company is currently working on and describes it as “data communicating directly with the end user through language and through stories,” without the need for a dashboard. “You have the evolution from static data, which is just language as an output to real-time data, which is language that evolves sitting next to the visualization, and the third is really interactive, which is what we believe the future is: people communicating with their data and insight through language.”

According to Textio’s website, augmented writing is “the power to see the future your words will create.” The company offers a hiring solution called Textio Hire that leverages augmented writing with the goal of using language that engages prospects. “Textio is built on the outcome data of half a billion documents, and so what Textio is trying to do is based on all the documents and the data and language that it understands. Your document will reach this sort of response in the market when you write it in Textio and take the real-time insights that the product provides to you,” says Marissa Coughlin, senior director of communications at Textio.

She goes on to say that the technology can help with consistency in brand voice. “One of the things that we find with our customers is they have invested heavily in their brand, and they’re trying to portray their culture and values. … But most job posts or recruiting emails are written by everyone in the company, and there’s not always a brand person sitting on the shoulder of someone saying, ‘Hey, we’re going to use this word now.’ And so what happens is that language infiltrates these job posts and recruiting emails that may not actually be reflective of the culture,” Coughlin says. “One of the things that Textio found for one of our customers … [was that] their number one cultural value was ‘We are owners.’ And so it turns out that when they used the word ‘ownership’ in their job posts, those jobs filled faster and that’s probably because that’s the value that the company has ascribed to. They’re putting it into their job posting, and then when they’re actually interviewing and screening candidates, they’re looking that those candidates meet up with those values, so that ownership value starts from the very first connection. Having an understanding of what works for individual companies to fill their roles is really improving the way these companies not only write but express themselves in the job market.”


Moving Forward With NLG for Content Creation

Experts identify a few potential sticking points in the evolution of NLG. One such limitation concerns data; for Hathorn, structured data specifically. “As a subset of artificial intelligence, we fall academically in natural language processing, which is taking unstructured text and turning it into structured data. We’re just on the opposite end taking structured data and turning it into unstructured text. In the technology’s current state, the biggest limitation is having structured data,” he elaborates. He adds, however, that this is “becoming less of a problem because the Big Data trend started almost a decade ago, so people really do have great, valuable, structured information.”

For Nemcova, the necessity of data brings up another issue: privacy. “Privacy continues to be a concern for a data-driven industry. The better and more voluminous the data, the better and more appropriate the real-world applications can be for a business’ customers and partners. However, restrictions apply on how company and customer data can be used. Often data must be anonymized or synthesized, which can increase the cost and complexity of NLG applications,” she says.

Going forward, Nemcova says that companies will “want NLG to be invisible: to have that human feel in length, style, and content appropriate to the ‘intent’ of the communication.” She suggests that human oversight can help train systems to improve future output, adding that in the near future, she expects to see “more niche providers of NLG solutions, where content generation models can be trained internally on data that companies have available to them or developed via cloud service providers and third parties,” and larger companies may develop NLG systems in-house.

Montoya sees NLG evolving to become a tool that can help content creators craft more informed content: “The combination of generation and a big content library can really help people who are required to write for a purpose have some guardrails to ensure that they are meeting that purpose independent of just asking a bunch of other people to look at what they’ve written.”