The AI tool quickly became good at Swedish

For the past two years, Joakim Nivre has been involved in developing language models based on Swedish texts. Photo: Daniel Olsson

The AI tool ChatGPT has stunned the world with its fluent language use. But how did the language model become so good at Swedish? We asked Joakim Nivre, Professor of Computational Linguistics, who for the past two years has been involved in developing language models based on Swedish texts.

“The idea of building language models has been around for a long time, at least since the 1950s,” says Joakim Nivre. Claude Shannon, known as the “father of information theory”, realised that you could measure the amount of information in language by guessing the next word in a text. The more difficult it was to guess the next word, the more information there was in the text.

By having a computer model try to guess the next word and giving a feedback signal on how good it is, the model can be trained. If it is good enough at guessing the next word, it has also learned something about the language.
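Shannon's guessing game and the training signal described above can be sketched with a toy bigram model. This is a hypothetical illustration with made-up example text, not how modern neural language models are actually built; it only shows the principle that a word which is harder to guess carries more information:

```python
from collections import Counter, defaultdict
import math

# Toy training text (hypothetical example data).
corpus = ("the model guesses the next word and "
          "the model learns from the next word it guesses").split()

# Count how often each word follows each preceding word (a bigram model).
follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def next_word_prob(prev, word):
    """Estimated probability that `word` follows `prev` in the corpus."""
    counts = follow[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# Shannon's insight: the harder a word is to guess, the more
# information it carries. Surprisal = -log2(probability), in bits.
p = next_word_prob("the", "model")   # "the" is followed by "model" 2 of 4 times
surprisal = -math.log2(p)            # 1 bit of information
```

Training a real model works on the same signal: it is rewarded for assigning high probability to the word that actually comes next.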

“Since the 1950s, it has become possible to scale this up and make it vastly more powerful. The statistical probability models now have billions of parameters. Moreover, they can be trained on an enormous range of texts, perhaps amounting to trillions of words.”

Knowledge of the language and the world

The training takes several months and results in the model storing a lot of knowledge about not only the language but also the world and what is reasonable to talk about.

When you interact with a model like ChatGPT, you enter a prompt, a piece of text, often a question. The model then responds with a likely continuation of that text.
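The "prompt in, likely continuation out" behaviour can be sketched with the same toy bigram idea. This is a hypothetical example with made-up data; a model like ChatGPT predicts with a large neural network rather than word counts, but the principle of extending the text one likely word at a time is the same:

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical example data) for a bigram model.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def continue_prompt(prompt, n_words=4):
    """Greedily append the most probable next word, n_words times."""
    words = prompt.split()
    for _ in range(n_words):
        counts = follow.get(words[-1])
        if not counts:
            break  # no known continuation for the last word
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)

print(continue_prompt("the"))
```

The continuation is fluent-looking precisely because it follows the statistics of the training text; nothing checks whether it is true, which is the point made in the quote below.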

“It always sounds very convincing and very fluent, but there is no guarantee whatsoever that the answer is correct, because everything is based on probabilities,” says Joakim Nivre.

It all depends on how much data the model has been trained on. The more data that goes in, the more knowledge it has about the language and the subject area.

ChatGPT is superior

In autumn 2022, ChatGPT, a language model developed by the company OpenAI, was released with surprisingly good language capabilities. At that time, Joakim Nivre, together with researchers at AI Sweden and RISE, had already started building Swedish language models.

“Within the project, we have trained several models of different sizes, the largest of which has 40 billion parameters. This is about a quarter of what GPT-3 (the predecessor of ChatGPT) has and about a tenth of the largest models. It is one of the largest models available for a language other than English or Chinese.”

But there is no denying it: ChatGPT and its successor GPT-4 are superior, not only in English but also in their use of Swedish, especially when it comes to providing relevant answers to questions.

Need for Swedish language models

It is currently an open question as to how the Swedish language model will continue to be developed within the project, which is run by AI Sweden. In parallel, a project funded by Vinnova is underway in which different organisations will be involved in exploring the technology. For example, Region Västra Götaland and Region Halland want to explore the possibilities of using language models in healthcare.

“Some of these things can actually be done much better by ChatGPT, which is a bit demotivating. At the same time, there are sometimes reasons not to use ChatGPT, such as when dealing with sensitive data and personal data that you do not want, or are not legally allowed, to send over the internet to ChatGPT.”

For example, hospitals are not allowed to share that kind of data and would need their own language model that can run in a more closed system. The problem is that the models require a lot of computing power when they are used.

“Perhaps Sweden’s municipalities and regions should have a centralised IT infrastructure to be able to use not only language models but also other AI, such as that used in healthcare to interpret X-rays.”

Develop a European language model

Another reason to build smaller, local language models is to avoid becoming dependent on large American companies. Chinese companies are also investing in the development of AI and there is less and less transparency in this area.

“Europe is lagging behind, both commercially and in terms of research. If you really want to do something on a large scale, the Swedish context is a bit too small, but one possibility is to create a European language model.”

“There are 7,000 languages in the world, and most of them are not even close to benefiting from this technology,” says Joakim Nivre. Photo: Getty Images

The EU has long had a policy of supporting all official languages in Europe, and there is much to do in the future in the field of AI, according to Joakim Nivre.

“There are 7,000 languages in the world, and most of them are not even close to benefiting from this technology.”

Swedish is one of the major languages

ChatGPT is nonetheless good at Swedish, because there is a lot of Swedish text to train on.

“We often say that Swedish is a small language, but of these 7,000 languages, Swedish is in the top 100 in terms of number of speakers. If we then talk about digital resources and internet presence, the ranking is much higher than that. For example, we have the world's fourth largest Wikipedia.”

In the future, Joakim Nivre sees a great need for research on how AI technology can be adapted to handle even smaller languages, such as Sweden's minority languages Sami and Meänkieli.

“For these, we cannot just copy models that already exist because there will never be that much data. We need to find smarter methods that can reach the same level or at least a similar level in a more efficient way with less data.”

Annica Hulth
