Anthony Brown/123RF

Google Brain is Teaching Bots to Write Wikipedia Articles

  • 17 February 2018
  • Sam Mire

For those who make a living or find pleasure writing Wikipedia articles, be aware that you may want to get as many wikis as possible completed while you still can. The bots might be coming for your gig sooner rather than later.

A paper recently published through the Cornell University Library documents how Google is teaching bots to gather information from various websites and combine their findings into a single Wikipedia-style article. In other words, to do just what humans have been doing since January 2001, when Wikipedia was launched.

A team working within Google Brain, the segment of Google dedicated to machine learning innovation, is teaching bots to build an article in two stages. First, through a process called ‘extractive summarization’, the bot scours the top ten web pages on a given subject and picks out the information relevant to the topic, learning along the way how the authorities speak about it. Then a ‘neural abstractive model’ organizes that material into the article. In practice, the abstractive model essentially takes longer, unoriginal sentences and whittles them down into shorter ones.
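To make the extractive stage concrete, here is a minimal Python sketch of one way a bot could pick out the passages most relevant to a topic. It is an illustration only, not Google Brain's implementation: the TF-IDF scoring, the function names `tokenize` and `rank_passages`, and the `top_k` parameter are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(topic, passages, top_k=2):
    """Return the top_k passages most relevant to the topic.

    Relevance is a simple TF-IDF score (an assumption for this sketch):
    topic words count for more when they are frequent in a passage
    and rare across the other passages.
    """
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    query = set(tokenize(topic))

    def score(doc):
        if not doc:
            return 0.0
        tf = Counter(doc)
        # Smoothed IDF keeps the weight positive even for common words.
        return sum(
            (tf[w] / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w in query
        )

    ranked = sorted(zip(passages, docs), key=lambda pd: score(pd[1]), reverse=True)
    return [passage for passage, _ in ranked[:top_k]]


if __name__ == "__main__":
    pages = [
        "Google Brain is a deep learning research team at Google.",
        "The weather in Paris is mild in spring.",
        "Bananas are rich in potassium.",
    ]
    print(rank_passages("Google Brain", pages, top_k=1))
```

In a full pipeline, the selected passages would then be handed to the abstractive model, which does the actual rewriting; that second stage is a trained neural network and is not sketched here.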

While the team says it has seen its algorithms create ‘fluent, coherent, multi-sentence paragraphs’ and even entire articles, human Wikipedia writers can rest easy knowing that the process is far from perfect.

Even the most advanced machine learning algorithms struggle to produce coherent text that spans more than a few sentences, and the Google Brain bot is no exception, though it is a slight improvement over its predecessors. The authors of the Cornell paper admit that, ideally, their bot would be able to support longer inputs and thus create longer, more fluent Wikipedia articles that are on par with those created by humans.

“Ideally, we would like to pass all the input from reference documents,” Mohammad Saleh, a software engineer on Google AI’s team, said. “Designing models and hardware that can support longer input sequences is currently an active area of research that can alleviate these limitations.”
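The input-length limitation Saleh describes can be pictured with a short Python sketch. It assumes, purely for illustration, that the reference passages have already been ranked by relevance and that the model accepts only a fixed number of tokens; the function name `fit_to_budget`, the `max_tokens` parameter, and the whitespace tokenization are hypothetical and are not taken from the paper.

```python
def fit_to_budget(ranked_passages, max_tokens):
    """Concatenate passages in ranked order until the token budget is spent.

    Tokens are split on whitespace here, a simplification for this sketch.
    Anything past the budget is dropped and never reaches the model.
    """
    kept = []
    for passage in ranked_passages:
        room = max_tokens - len(kept)
        if room <= 0:
            break  # Budget exhausted: remaining passages are discarded.
        kept.extend(passage.split()[:room])
    return " ".join(kept)


if __name__ == "__main__":
    passages = ["most relevant passage here", "less relevant passage here"]
    print(fit_to_budget(passages, max_tokens=6))
```

A larger budget lets more of the source material through, which is why models and hardware that support longer input sequences would ease the limitation.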

So, Wikipedia authors can continue their writing. For now, at least.

About Sam Mire

Sam is a Market Research Analyst at Disruptor Daily. He’s a trained journalist with experience in the field of disruptive technology. He’s versed in the impact that blockchain technology is having on industries of today, from healthcare to cannabis. He’s written extensively on the individuals and companies shaping the future of tech, working directly with many of them to advance their vision. Sam is known for writing work that brings value to industry professionals and the generally curious – as well as an occasional smile to the face.