State Helpline Project: Topic Analysis Based on Call Transcription

Sander Tars
March 10th, 2021

Helpline

The Estonian Government’s information line is 1247. Its main purpose has been to answer citizens’ questions related to the coronavirus crisis. In addition, most other information numbers (e.g. road information) are now also routed through 1247. These calls need to be analysed to better understand what people ask and how to resolve their issues more efficiently. With more than 100 000 calls per year, manual analysis of call topics is not practical.

The goal of this project was to explore and implement machine learning approaches for gender, age and language detection, sentiment analysis, sensitive information retrieval, and topic detection. Since the time frame for these practical experiments was limited to one month, the experiments were deliberately kept brief and exploratory.

For the experiments, we used two types of data: text and sound. Sound data is directly available from the 1247 calls, whereas the text data had to be produced by transcribing the calls with our previously developed automatic speech recognition model.

We applied gender, language, and age detection to the sound data. More specifically, we used mel-frequency cepstral coefficients (MFCCs) generated from five-second call snippets.

  • Gender detection experiments produced usable scores already in the experimental tests.
  • Language detection appears achievable. The experimental scores are not yet usable as they stand, but they strongly indicate that minor improvements would bring language detection to a sufficient level.
  • Age detection requires a dedicated labelled dataset to be feasible.
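To make the feature extraction step concrete, here is a minimal sketch of how MFCCs could be computed from five-second snippets using only NumPy. It is not the pipeline used in the project: the function names, the 25 ms frame and 10 ms hop, the 26 mel filters, the 13 coefficients and the FFT size are all illustrative assumptions, chosen as common defaults for telephone-rate audio.

```python
import numpy as np

def five_second_snippets(signal, sample_rate, seconds=5.0):
    """Cut a mono signal into consecutive fixed-length snippets (trailing remainder dropped)."""
    size = int(sample_rate * seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc(snippet, sample_rate, n_mfcc=13, n_filters=26,
         frame_ms=25, hop_ms=10, n_fft=512):
    """Return an array of shape (n_frames, n_mfcc). All defaults are assumptions;
    n_fft=512 presumes frames of at most 512 samples (e.g. 8 or 16 kHz audio)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(snippet) - frame_len) // hop_len
    # Index matrix that slices the snippet into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = snippet[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)   # small constant avoids log(0)
    # Unnormalised DCT-II over the filter axis, keeping the first n_mfcc coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc),
                                    2 * np.arange(n_filters) + 1) / (2 * n_filters))
    return log_mel @ basis.T
```

One possible way to feed such features to a classifier is to average them over time, for example `mfcc(snippet, 8000).mean(axis=0)`, which yields one fixed-length vector per snippet. Whether the project used this kind of pooling is not stated in the text.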

On the transcribed text data, we applied sentiment analysis, sensitive information removal, and topic detection.

  • Sentiment analysis lacked suitable freely available datasets for Estonian. (For English, this would not be a problem.) Indicative tests on related datasets suggest that, given a dedicated dataset, sentiment analysis is also achievable.
  • Sensitive information retrieval (names, ID codes, etc.) experiments were highly successful. However, it is not yet clear whether other tasks, such as topic detection, would remain feasible once sensitive information has been removed.
  • Topic detection produced moderate experimental results. We are confident that accurate topic detection is feasible with additional work.
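As an illustration of one small part of sensitive information removal, the sketch below masks Estonian personal identification codes in a transcript. It is a rule-based example under stated assumptions, not the method used in the experiments: the function names and the `[ID-CODE]` placeholder are hypothetical, it only handles codes written as 11 consecutive digits, and it does not cover names or numbers that the speech recogniser spells out as words.

```python
import re

# 11 digits, the first of which (1-8) encodes century and sex.
ID_CODE = re.compile(r"\b[1-8][0-9]{10}\b")

# Weights for the two-stage modulo-11 check digit of the Estonian ID code.
WEIGHTS_1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
WEIGHTS_2 = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)

def valid_id_code(code):
    """Return True if the last digit matches the checksum of the first ten."""
    digits = [int(c) for c in code]
    check = sum(d * w for d, w in zip(digits, WEIGHTS_1)) % 11
    if check == 10:
        check = sum(d * w for d, w in zip(digits, WEIGHTS_2)) % 11
        if check == 10:
            check = 0
    return check == digits[10]

def redact_id_codes(text, placeholder="[ID-CODE]"):
    """Replace checksum-valid ID codes with a placeholder; leave other numbers alone."""
    def mask(match):
        return placeholder if valid_id_code(match.group()) else match.group()
    return ID_CODE.sub(mask, text)

# 38001085718 is a commonly used test code with a valid checksum.
print(redact_id_codes("minu isikukood on 38001085718"))
# prints "minu isikukood on [ID-CODE]"
```

Validating the checksum before masking keeps other 11-digit numbers intact, which may help preserve content that later steps such as topic detection rely on.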

Of the ML approaches, we consider gender and language detection, sensitive information retrieval, and topic detection feasible with limited additional effort. Age detection and sentiment analysis require more effort and business-related consideration. All of the experiments would benefit from additional 1247-specific labelled data. None of the proposed directions appears infeasible; given sufficient resources and business interest, all of them deserve a more thorough look.

Want to learn more about our products?

Kristjan Jansons
Co-founder, CEO

Kristjan has been studying and working on machine learning projects for more than 7 years.
After acquiring a Master’s Degree in Computer Science and Machine Learning, he started working at Milrem Robotics as the Team Lead for Autonomous Vehicles, helping to build self-driving vehicles.

Kristjan also has experience in building intelligent systems for data centers, robots and electric formula cars, as well as in computer vision and image recognition. He is especially fascinated by how people from different industries combine their knowledge with data science, arrive at new insights and help to accelerate innovation.
