State Helpline Project: Topic Analysis Based on Call Transcription

Sander Tars
March 10th, 2021


The Estonian Government’s information helpline is 1247. Its main purpose has been to answer citizens’ questions about the coronavirus crisis. In addition, most other information numbers (e.g. road-info) are now also routed through 1247. These calls need to be analysed to better understand what people ask and how to resolve their issues more efficiently. With more than 100 000 calls per year, analysing call topics manually is not practical.


The goal of this project was to explore and implement machine learning approaches for gender, age, and language detection, sentiment analysis, sensitive information retrieval, and topic detection. Since the time frame for these practical experiments was limited to one month, the experiments were deliberately kept brief and exploratory.

For the experiments, we used two types of data: sound and text. Sound data is directly available from the 1247 calls, whereas the text data had to be produced by transcribing the calls with our previously developed automatic speech recognition model.

We applied gender, language, and age detection to the sound data. More specifically, we used mel-frequency cepstral coefficients (MFCCs) computed from five-second call snippets.

  • Gender detection already produced usable scores in the experimental tests.
  • Language detection is achievable. The experimental scores are not yet usable as they stand, but they strongly indicate that minor improvements would bring language detection to a sufficient level.
  • Age detection requires a dedicated labelled dataset to be feasible.
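As an illustration of the snippet step mentioned above, the following is a minimal sketch of how a call recording could be cut into five-second pieces before feature extraction. The function name, the 8 kHz sample rate, and the choice to drop a trailing partial snippet are assumptions made for this example and do not describe the project's actual pipeline. MFCC extraction itself would then be run on each snippet with an audio library of choice.

```python
from typing import List, Sequence

SNIPPET_SECONDS = 5  # snippet length mentioned in the text


def split_into_snippets(samples: Sequence[float], sample_rate: int,
                        seconds: int = SNIPPET_SECONDS) -> List[Sequence[float]]:
    """Cut a mono signal into consecutive, non-overlapping snippets.

    A trailing remainder shorter than one snippet is dropped (an assumption;
    padding it would be an equally reasonable choice).
    """
    if sample_rate <= 0 or seconds <= 0:
        raise ValueError("sample_rate and seconds must be positive")
    size = sample_rate * seconds
    full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(full)]


# Hypothetical 12-second call at 8 kHz (assumed rate), filled with silence.
call = [0.0] * (12 * 8000)
snippets = split_into_snippets(call, sample_rate=8000)
print(len(snippets), len(snippets[0]))  # 2 snippets of 40000 samples each
```

Non-overlapping snippets keep the sketch simple; overlapping windows would yield more training examples per call at the cost of correlated samples.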

To the transcribed text data we applied sentiment analysis, sensitive information removal, and topic detection.

  • Sentiment analysis lacked a suitable freely available dataset for Estonian. (This would not be a problem for English.) Indicative tests on related datasets showed that, with a dedicated dataset, sentiment analysis is also achievable.
  • Sensitive information retrieval (names, ID-codes, etc.) experiments were highly successful. However, it is not yet clear whether other tasks, such as topic detection, would remain feasible once the sensitive information has been removed.
  • Topic detection gave average experimental results. We are confident that, given additional attention, accurate topic detection is feasible.
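To make the sensitive information step more concrete, here is a small rule-based sketch that masks ID-codes in a transcript. It is only an illustration and not the method used in the project: it assumes that an ID-code appears in the text as a single 11-digit token starting with a digit from 1 to 6, and the names `ID_CODE`, `mask_id_codes`, and the `[ID_CODE]` placeholder are hypothetical. Names and other free-form details would need a different approach, such as named entity recognition.

```python
import re

# Assumption: an 11-digit token whose first digit is 1-6 is treated as an ID-code.
ID_CODE = re.compile(r"\b[1-6]\d{10}\b")


def mask_id_codes(text: str, placeholder: str = "[ID_CODE]") -> str:
    """Replace every token that looks like an ID-code with a placeholder."""
    return ID_CODE.sub(placeholder, text)


# Made-up transcript line; the digit string is invented for this example.
example = "my id code is 49912310000 and I called yesterday"
masked = mask_id_codes(example)
print(masked)
```

Replacing the value with a placeholder, rather than deleting it, keeps a trace that an ID-code was mentioned, which may help downstream tasks such as topic detection still work on the redacted text.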

Of the ML approaches explored, we deem gender and language detection, sensitive information retrieval, and topic detection feasible with limited additional effort. Age detection and sentiment analysis require more effort and further business-related consideration. All of the experiments would benefit from additional 1247-specific labelled data. None of the proposed directions appears infeasible, and, given sufficient resources and business interest, all of them deserve a more thorough look.

Want to learn more about our products?

Harry Liimal

Harry has years of experience working in business development, strategy, and digitalization for some of the world’s largest companies. He is a people person who enjoys fast-paced environments and multiple responsibilities. He is especially curious about transformation and disruptive technology.

Harry holds a Master’s degree in Law from the University of Tartu and is pursuing an MBA at the Estonian Business School. He has also studied innovation management in Japan at the Nagoya University of Commerce and Business.

Talk to us
