The Estonian Government’s information line is 1247. Its main purpose has been to answer citizens’ questions about the coronavirus crisis. In addition, most other information numbers (e.g. road info) are now also routed through 1247. These calls need to be analysed to better understand what people ask and how to resolve their issues more efficiently. With more than 100 000 calls per year, analysing call topics manually is not practical.
The goal of this project was to explore and implement machine learning approaches for gender, age, and language detection, sentiment analysis, sensitive information retrieval, and topic detection. Since the time frame for these practical experiments was limited to one month, they were deliberately kept brief and exploratory.
For the experiments, we used two types of data: sound and text. Sound data is directly available from the 1247 calls, whereas text data had to be produced by transcribing the calls with our previously developed automatic speech recognition model.
We applied gender, language, and age detection to the sound data. More specifically, we used mel-frequency cepstral coefficients (MFCCs) computed from five-second call snippets.
- Gender detection produced usable scores already in the experimental tests.
- Language detection is achievable. The experimental scores are not usable as they stand, but they strongly indicate that minor improvements would bring language detection to a sufficient level.
- Age detection requires a dedicated labelled dataset to be feasible.
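The five-second snippets mentioned above can be illustrated with a minimal sketch. It only covers cutting a call into fixed-length snippets; MFCC extraction would then be run on each snippet with an audio library and is not shown. The function name `split_into_snippets`, the `keep_remainder` flag, and the 8 kHz sample rate in the usage lines are assumptions made for illustration, not details from the project.

```python
def split_into_snippets(samples, sample_rate, snippet_seconds=5.0, keep_remainder=False):
    """Cut a mono waveform (a sequence of samples) into fixed-length snippets.

    Hypothetical helper: the project only states that five-second snippets
    were used, not how they were cut.
    """
    size = int(sample_rate * snippet_seconds)  # samples per snippet
    if size <= 0:
        raise ValueError("sample_rate * snippet_seconds must be at least one sample")

    # Non-overlapping windows; the last one may be shorter than `size`.
    snippets = [samples[start:start + size] for start in range(0, len(samples), size)]

    # Drop the trailing partial snippet unless the caller asks to keep it.
    if snippets and len(snippets[-1]) < size and not keep_remainder:
        snippets.pop()
    return snippets


# Usage: a 12-second call at an assumed 8 kHz telephone sample rate.
call = [0.0] * (8000 * 12)
full_snippets = split_into_snippets(call, sample_rate=8000)
all_snippets = split_into_snippets(call, sample_rate=8000, keep_remainder=True)
```

Dropping the trailing partial snippet keeps every feature matrix the same shape, which simplifies training; whether the experiments did this is not stated.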
On transcribed text data we applied sentiment analysis, sensitive information removal, and topic detection.
- Sentiment analysis was limited by the lack of suitable freely available datasets for Estonian. (This would not be a problem for English.) Indicative tests on related datasets showed that, with a dedicated dataset, sentiment analysis is also achievable.
- Sensitive information retrieval (names, ID codes, etc.) experiments were highly successful. However, it is not clear whether other tasks, such as topic detection, would still be feasible after the sensitive information has been removed.
- Topic detection gave average experimental results. We are confident that, with additional attention, accurate topic detection is feasible.
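To make the sensitive information step more concrete, here is a minimal rule-based sketch that masks Estonian personal ID codes in a transcript. It is a stand-in for illustration only, not the model used in the experiments, and it does not handle names. The function names and the `<ID_CODE>` placeholder are hypothetical; the sketch assumes an ID code is an 11-digit number starting with 1 to 8 whose last digit is a modulo-11 check digit.

```python
import re

# Assumption: a candidate ID code is 11 digits, the first one between 1 and 8.
CANDIDATE = re.compile(r"\b[1-8][0-9]{10}\b")

# Two rounds of weights used by the modulo-11 check digit.
WEIGHTS_FIRST = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
WEIGHTS_SECOND = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)


def has_valid_checksum(code):
    """Return True if the 11th digit matches the modulo-11 check digit."""
    digits = [int(ch) for ch in code]
    for weights in (WEIGHTS_FIRST, WEIGHTS_SECOND):
        remainder = sum(d * w for d, w in zip(digits[:10], weights)) % 11
        if remainder < 10:
            return remainder == digits[10]
    # Both rounds gave remainder 10: the check digit is defined as 0.
    return digits[10] == 0


def redact_id_codes(text, placeholder="<ID_CODE>"):
    """Replace every checksum-valid ID code in `text` with a placeholder."""
    def replace(match):
        code = match.group()
        return placeholder if has_valid_checksum(code) else code
    return CANDIDATE.sub(replace, text)


masked = redact_id_codes("minu isikukood on 37605030299")
```

Validating the checksum reduces false positives on other long numbers, but a digit misrecognised by speech recognition would slip through, so a production system might prefer to mask every 11-digit candidate.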
Of the ML approaches, we consider gender and language detection, sensitive information retrieval, and topic detection feasible with limited additional effort. Age detection and sentiment analysis require more effort and business-related consideration. All of the experiments would benefit from additional 1247-specific labelled data. None of the proposed directions appears infeasible, and given sufficient resources and business interest, all of them deserve a more thorough look.