Language is difficult for Google’s computers

Joakim Nivre, professor of computational linguistics, has been researching a method of teaching computers grammar.

Computers that can speak like humans have long existed in science fiction. In reality, it has proved difficult to get computers to understand the nuances of language. Joakim Nivre, professor of computational linguistics, researches how to teach computers to understand language better.


Joakim Nivre mainly teaches computers to analyse the component parts of a sentence. He has recently been a visiting researcher at Google, helping the company develop improved language analysis software.

‘The methods they use are largely based on my research.’

In the early days of search engines, keywords were simply matched against web pages. If a word occurred many times, the page was placed near the top of the search results. Now search companies want to access more of the content, i.e. to build question-answering systems.
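The early keyword approach described above can be sketched roughly as follows. This is only an illustration: the function name `rank_by_keyword_count` and the three toy pages are made up for the example, and real search engines were never quite this simple.

```python
def rank_by_keyword_count(pages, query):
    """Rank page names by how often the query words occur in each page."""
    keywords = query.lower().split()
    scores = {}
    for name, text in pages.items():
        words = text.lower().split()
        # The score is simply the total number of keyword occurrences.
        scores[name] = sum(words.count(keyword) for keyword in keywords)
    # Pages with the most occurrences come first.
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical toy "web" of three pages.
pages = {
    "a": "nokia phones and nokia tablets",
    "b": "microsoft bought nokia",
    "c": "weather report for uppsala",
}
ranking = rank_by_keyword_count(pages, "nokia")
print(ranking)
```

Page "a" ranks above page "b" only because it repeats the word more often, even though "b" is the page that says something about a purchase.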

‘When someone asks “who bought Nokia?”, it is not enough that the computer can find documents where all keywords are present. It must also be able to determine that Microsoft is the subject and Nokia the object.’
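A small sketch can show why keywords alone are not enough here. The two sentences below contain exactly the same words, and only the grammatical relations tell them apart. The analyses are written by hand for the example and are not the output of any real parser; the relation labels `nsubj` and `obj` follow Universal Dependencies naming, and the helper `find_subject_and_object` is hypothetical.

```python
def find_subject_and_object(parsed_sentence, verb):
    """Return the (subject, object) of `verb` from (word, head, relation) triples."""
    subject = None
    obj = None
    for word, head, relation in parsed_sentence:
        if head == verb and relation == "nsubj":
            subject = word
        elif head == verb and relation == "obj":
            obj = word
    return subject, obj


# Hand-annotated analyses (assumed for illustration): word, head word, relation.
sentence_1 = [
    ("Microsoft", "bought", "nsubj"),
    ("bought", None, "root"),
    ("Nokia", "bought", "obj"),
]
sentence_2 = [
    ("Nokia", "bought", "nsubj"),
    ("bought", None, "root"),
    ("Microsoft", "bought", "obj"),
]

# Both sentences contain the same set of words ...
same_keywords = (
    sorted(word for word, _, _ in sentence_1)
    == sorted(word for word, _, _ in sentence_2)
)

# ... but the grammatical analysis gives different answers to "who bought Nokia?".
buyer_1, bought_1 = find_subject_and_object(sentence_1, "bought")
buyer_2, bought_2 = find_subject_and_object(sentence_2, "bought")
print(same_keywords, buyer_1, bought_1, buyer_2, bought_2)
```

Only the first sentence answers the question, and a system needs the subject and object relations to see that.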

Nowadays almost all searchable texts have undergone a grammatical analysis.

‘Google, for example, has its own copy of the web, which is updated daily. Information is stored about the content of each page and which words occur; facts are extracted and relationships mapped. Search queries and what people click on are also stored and matched.’

Making a linguistic analysis of the entire web involves managing incredible amounts of data. Of course it is important to have sufficiently fast algorithms.

‘If you take the software that has the world record for accurate analysis of English, it would take 300 years to analyse the entire Web on a computer. That's what I'm working on - to produce sufficiently fast software without losing too much accuracy.’

There is a great deal of irony in grammatical analysis. It is one of the most data- and compute-intensive areas, yet the biggest bottleneck is neither storage space nor processing power.

‘In order for the software to learn to understand the texts, we first need to feed it with example sentences marked up with a grammatical analysis. So people need to sit and mark up a sufficient amount of text.’

In this world of data, researchers also share large amounts of data.

‘It can provide additional credits in a publication if you have assisted with data that the article is based on.’

Sharing is more difficult with data that companies own, even when they take part in the data exchange, and with data that is under copyright. Another type of problem comes with privacy-protected data such as e-mail and SMS.

‘At the same time, for example, the disaster in Haiti a few years ago showed that SMS was an important channel for emergency information. It is then important to be able to automatically analyse such text in real time.’

Kim Bergström
