Digital humanities – new tools for historians
Computational linguist Eva Pettersson has developed a tool which will make it easier for historians to find what they are looking for in very old documents. For the ‘Gender and Work’ project, led by Professor Maria Ågren, software that makes the time-consuming manual work more efficient would be a dream come true.
Eva Pettersson’s language technology work for Gender and Work is also the subject of her PhD thesis. Over four years she has developed a tool that both she and Professor Ågren consider promising.
“I have developed a comprehensive solution. Now it needs refining,” says Eva Pettersson.
The researchers thank their linguist colleague Ingrid Almqvist, who first realised that the two fields could benefit each other. They find it rewarding to work across departmental boundaries.
“Often we don’t know what we have to offer each other,” says Eva Pettersson. Professor Ågren adds:
“It takes a broker, someone to recognise the opportunities.”
Highlights verbs about work
Eva Pettersson and Maria Ågren know what they want to achieve through their cooperation – software that can highlight all verbs in a text and also rank phrases describing work the highest.
“That would be the dream scenario,” says Maria Ågren.
Her research team have a very time-consuming task. They manually go through hand-written court minutes, which take special training to read. They identify verb phrases describing work, copy the text and file it in a database. Today the open, searchable database contains 20,000 verb phrases.
“Thanks to large research grants, we have been able to have many people working on this over five years. But there are huge differences in data gathering between the natural sciences and the humanities. That’s why it is so important for us to further develop our digital tools,” says Maria Ågren.
Teaching the model modern spelling
The first part of Eva Pettersson’s work was to manually translate historical spelling into modern spelling. The normalised spelling is then used as training data which ‘teaches’ the model how historical words are spelled today. This is necessary for the ‘tagger’, the software that identifies word classes, to be able to find the verbs in the historical texts.
“We have tested these two steps with good results. The two subsequent steps are more complicated – teaching the tool to find verb phrases and to rank them,” says Eva Pettersson.
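The idea of learning modern spellings from manually normalised examples can be illustrated with a minimal sketch. It assumes a simple lookup table built from historical/modern word pairs, with an edit-distance fallback against a list of modern words for unseen forms. This is not the method used in the project, which the article does not describe in detail; the function names, the distance threshold and the word pairs are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Learn the most frequent modern spelling for each historical form."""
    counts = defaultdict(Counter)
    for historical, modern in pairs:
        counts[historical.lower()][modern.lower()] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalise(word, model, lexicon, max_distance=2):
    """Return a modern spelling for `word`, or the word itself if none is close."""
    word = word.lower()
    if word in model:
        return model[word]  # seen in the training data
    if not lexicon:
        return word
    # Fallback (an assumption of this sketch): nearest modern word by edit distance.
    best = min(lexicon, key=lambda entry: (edit_distance(word, entry), entry))
    return best if edit_distance(word, best) <= max_distance else word


# Illustrative toy pairs, not taken from the project's data.
training_pairs = [("hafwer", "har"), ("effter", "efter"), ("hafwer", "har")]
modern_lexicon = {"har", "efter", "köpt"}

model = train(training_pairs)
print(normalise("hafwer", model, modern_lexicon))
print(normalise("kiöpt", model, modern_lexicon))
```

Once the words are normalised in this way, an off-the-shelf tagger trained on modern text has a better chance of recognising the verbs.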
She has tested parsers, the software used to find the words related to a verb, on the normalised texts with some success. But historical texts can have sentence structures that the parsers do not recognise, so Eva Pettersson is fine-tuning the tool to find verb phrases. The same goes for the ranking. The researchers want to be sure that verb phrases are ranked by relevance so that phrases describing work end up at the top.
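One way to picture the ranking step is the sketch below. It assumes that each extracted phrase already comes with its main verb, and that the manually collected phrases can serve as labelled examples of which verbs usually describe work. The scoring rule (a smoothed share of work-labelled occurrences per verb), the function names and the example phrases are all hypothetical; the article does not say how the project's ranking is actually computed.

```python
from collections import Counter


def train_work_scores(labelled, smoothing=1.0):
    """Estimate, per verb, the smoothed share of occurrences labelled as work."""
    work, total = Counter(), Counter()
    for verb, is_work in labelled:
        total[verb] += 1
        if is_work:
            work[verb] += 1
    return {verb: (work[verb] + smoothing) / (total[verb] + 2 * smoothing)
            for verb in total}


def rank_phrases(phrases, scores, default=0.5):
    """Sort (verb, phrase) pairs so that likely work descriptions come first."""
    return sorted(phrases,
                  key=lambda item: scores.get(item[0], default),
                  reverse=True)


# Illustrative toy data: (verb, was this occurrence labelled as work?)
labelled = [("sälja", True), ("sälja", True), ("säga", False),
            ("säga", False), ("bära", True)]
scores = train_work_scores(labelled)

candidates = [("säga", "säga sanningen"), ("sälja", "sälja fisk"),
              ("gå", "gå hem"), ("bära", "bära ved")]
for verb, phrase in rank_phrases(candidates, scores):
    print(f"{scores.get(verb, 0.5):.2f}  {phrase}")
```

Verbs that never appear in the labelled data get a neutral default score, so unseen phrases land in the middle of the list rather than being dropped.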
No time savings yet
So far, the tool has not saved the historians any time, but work towards that goal continues.
“Eva uses the material we type in to train her model and at the same time we are given reason to reflect on our manually acquired results. Have we maybe gone wrong somewhere or missed a verb phrase?
“A perhaps even larger problem to solve in order to save time is how hand-written documents are to be automatically converted into digital text, without manual labour. That is something we will look into going forward,” says Maria Ågren.
Facts: Gender and Work
The Gender and Work project studies how men and women in Sweden made a living between 1550 and 1800. The researchers look for verb phrases which describe work in hand-written court minutes. One interesting finding is that marital status had much greater significance than sex for the kind of jobs people had. Project manager Maria Ågren’s grant as a Wallenberg Scholar has now been extended by SEK 3 million per year for another five years. The next step is to investigate working life between 1720 and 1880.
9 November 2016