Case study: Web Scraping and Recommendation System for Soulie

Irina Kolesnikova
May 31st, 2023

What is Soulie?

Soulie is the first app that allows users to create their personal algorithms and customize their social media experience. Unlike other apps, Soulie takes into consideration each user’s unique personal goals, interests, and wishes. In this way, users can control their online behavior and avoid mindless scrolling.

With Soulie, users can state what they want to see, how much of it they want to see, and what topics they are interested in. This level of customization is something that other platforms don’t offer: instead of being forced to consume content that may not be relevant or interesting, users can now take control of their online experience.

MindTitan built the foundations of Soulie's backend system, including the web scraping and recommendation system components.


Ann Margit Järvekülg, Soulie co-founder & CMO:

At Soulie, we were conducting an applied research project together with researchers from the University of Helsinki, and we needed machine learning experts.

Thanks to our collaboration with MindTitan, the project was delivered on time and to an A+ standard.

Though hiring an agency like MindTitan can be a costly option for startups, it can definitely speed up the project’s progress if managed well. I’d say completing the project would have taken us twice as long without MindTitan.

Results of the case study

Soulie takes the first step toward avoiding the frustration and guilt that come with mindless scrolling on social media platforms. In the Alpha testing version, more than 800 users used this personal algorithm to enjoy a more customized and intentional online experience.

Andres Tiko, Soulie co-founder & CTO:

We recently launched the Alpha version of our app, which serves as a technical demo.

The app features a recommendation system that we developed in collaboration with MindTitan, enabling users to input the topics they’re interested in.

The system performed exceptionally well, and we garnered around 800 users to test the technology.

This phase of the project was crucial in determining whether our idea would work and meet our requirements.

People continued to use the app even after the Alpha phase. We have now closed access to the Alpha and will be launching our Beta version soon.

The problem described

Many people are feeling frustrated and trapped by the social media platforms they use. It’s easy to get caught in a cycle of scrolling through content, wasting hours without even realizing it. However, the blame cannot solely be placed on the users.

Social media platforms have algorithms designed to keep users engaged and coming back for more. These algorithms utilize infinite scrolling and recommend content that will keep users on the platform for longer periods of time.

Realizing that something needed to change, a group of individuals came up with an idea: Soulie.

Ann Margit Järvekülg, Soulie co-founder & CMO, says that, as a startup, Soulie always aimed to have its own technical team with expertise in machine learning, AI, and recommendation systems. However, building such a team from scratch takes time, which in this case they did not have. Especially in the early stages of a startup, it's crucial to move fast and get your product to market as soon as possible, which means making strategic decisions about where to invest your resources.

“For us, hiring MindTitan was the logical choice. Their expertise in AI and machine learning was exactly what we needed to get our product off the ground. Working with MindTitan allowed us to hit the ground running and focus on developing our product, without the added stress of rushing to build a technical team from scratch to meet our project deadlines.” (Ann Margit Järvekülg, Soulie co-founder & CMO)

What made Soulie’s mission particularly inspiring was its focus on reducing mindless scrolling and promoting serendipitous discovery. Their goal was to provide users with content that was not only interesting but also valuable and relevant.

“What made this project even more interesting was the opportunity to experiment with a different type of presentation. Instead of using neural networks to recommend items, we were able to leverage an open search engine to quickly find recommendations for users. The architecture of the project was complex, with many moving components, which made it all the more exciting for us to work on.” (Marco Piscopello, MindTitan business analyst and data scientist)
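The quote above mentions using a search engine instead of a neural network to find recommendations. As a rough illustration only, assuming an OpenSearch index whose documents have a `text` field (a hypothetical field name), a user's weighted topics could be turned into a query-DSL body that rewards topic matches and excludes items the user has already seen. The function `build_recommendation_query` is hypothetical and only builds the request body; it does not connect to OpenSearch and is not MindTitan's actual implementation.

```python
def build_recommendation_query(topics, seen_ids, size=20):
    """Build an OpenSearch query-DSL body (a plain dict) for topic-based recommendations.

    Hypothetical helper. Assumes documents have a "text" field.
    topics:   list of (topic, weight) pairs, e.g. taken from a user's profile
    seen_ids: IDs of documents the user has already been shown
    """
    should = [
        # Each stated topic is a match clause; heavier topics get a higher boost.
        {"match": {"text": {"query": topic, "boost": weight}}}
        for topic, weight in topics
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                # Documents matching more (and more heavily weighted) topics score higher.
                "should": should,
                "minimum_should_match": 1,
                # Never recommend something the user has already seen.
                "must_not": [{"ids": {"values": sorted(seen_ids)}}],
            }
        },
    }


query = build_recommendation_query([("diy", 2.0), ("gardening", 1.0)], seen_ids={"a1"})
```

The resulting dictionary would be sent as the body of a search request against the index. Keeping the query as plain data makes it easy to log and test separately from the search cluster.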


Marco Piscopello, MindTitan business analyst and data scientist:

It was a great match between what our team was capable of and what the Soulie team wanted to achieve.

Fetching content from online sources, something we have a lot of experience with, was a core functionality of their project, since content can only be analyzed once it has been scraped.

After all, NLP models and content scraping are our core areas of expertise. Using this know-how, we set up the initial approach to Soulie's AI, paving the way for their team to continue developing the app and expanding its functionality.

Creating the recommendation model

The MindTitan team was involved in the planning process from the start and had a clear roadmap of how to proceed. The approach was to gather diverse inputs from various parts of the app, such as profiles, feedback, and favorites, to better understand the users. This follows the typical approach for recommendation models, but in this first version a randomizing component was also added to promote serendipitous discovery.
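A randomizing component like the one described above can be sketched as a simple epsilon-style mix: each slot in the feed is filled from a pool of random candidates with a small probability and otherwise from the ranked recommendations. The function name `mix_feed` and the default `epsilon=0.2` are hypothetical choices for illustration, not values from the Soulie system.

```python
import random


def mix_feed(ranked, random_pool, n, epsilon=0.2, rng=None):
    """Build a feed of up to n items (hypothetical helper).

    Each slot is taken from the shuffled random pool with probability epsilon,
    otherwise from the top of the ranked recommendations. When one source runs
    out, the other is used. No item appears twice.
    """
    rng = rng or random.Random()
    ranked = list(ranked)
    # Random candidates that are not already among the ranked items.
    pool = [item for item in random_pool if item not in ranked]
    rng.shuffle(pool)
    feed = []
    while len(feed) < n and (ranked or pool):
        take_random = bool(pool) and (not ranked or rng.random() < epsilon)
        source = pool if take_random else ranked
        item = source.pop(0)
        if item not in feed:
            feed.append(item)
    return feed
```

A fixed `epsilon` is the simplest choice; a real system might instead lower it for users whose preferences are already well understood.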

The system keeps analyzing feedback on, and interactions with, this random content to identify patterns that traditional algorithms may not recognize. For instance, a user who enjoys DIY may also be interested in DIY gardening, which a regular algorithm may not recommend. Soulie is designed with a learning mechanism that serves random content connected to the topics users have responded positively to, which could improve recommendations in the future.
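The DIY-to-DIY-gardening example suggests exploring topics adjacent to those a user reacted well to. A minimal sketch, assuming a hand-written map of related topics (`RELATED_TOPICS` is hypothetical; a real system might derive such links from co-occurrence data) and feedback given as `(topic, score)` pairs, could pick new topics like this:

```python
import random
from collections import Counter

# Hypothetical map of related topics, for illustration only.
RELATED_TOPICS = {
    "diy": ["diy gardening", "woodworking", "home repair"],
    "gardening": ["diy gardening", "botany"],
    "cooking": ["baking", "fermentation"],
}


def exploration_topics(feedback, interests, k=2, rng=None):
    """Suggest up to k new topics adjacent to topics the user reacted positively to.

    feedback:  list of (topic, score) pairs; score > 0 means a positive reaction
    interests: topics the user has already stated (never suggested again)
    """
    rng = rng or random.Random()
    liked = Counter(topic for topic, score in feedback if score > 0)
    candidates = Counter()
    for topic, count in liked.items():
        for neighbour in RELATED_TOPICS.get(topic, []):
            if neighbour not in interests:
                candidates[neighbour] += count
    names = list(candidates)
    weights = [candidates[name] for name in names]
    chosen = []
    # Sample without replacement, favouring topics linked to more positive feedback.
    while names and len(chosen) < k:
        pick = rng.choices(names, weights=weights)[0]
        idx = names.index(pick)
        chosen.append(names.pop(idx))
        weights.pop(idx)
    return chosen
```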

The scraping module enables the system to retrieve data from open sources, such as websites, while the app also collects user-related data, such as profiles, likes, and favorites. This data is transformed into a suitable format for the machine learning component, which analyzes it and recommends not only content the user is likely to prefer but also suggestions that broaden their horizons and help them step out of the typical social media information bubble.
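To make the "transform, then recommend" step concrete, here is a deliberately simple content-based sketch: stated profile topics and liked items are turned into a term-count vector, and scraped items are ranked by cosine similarity to it. This bag-of-words approach is a swapped-in illustration, not the model MindTitan built, and the item format (dicts with `url` and `text`) is an assumption.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a simple stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def vector(text):
    return Counter(tokens(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profile(profile_topics, liked_texts):
    """Combine stated topics and liked/favorited items into one term-count vector."""
    profile = Counter()
    for topic in profile_topics:
        profile.update(tokens(topic))
    for text in liked_texts:
        profile.update(tokens(text))
    return profile


def rank_scraped(profile, scraped):
    """Rank scraped items (hypothetical dicts with 'url' and 'text') by similarity to the profile."""
    scored = [(cosine(profile, vector(item["text"])), item["url"]) for item in scraped]
    return [url for _, url in sorted(scored, reverse=True)]
```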

Because of the technical peculiarities of social media, monitoring was essential.

“It was an interesting process that came with many challenges. We used DynamoDB, a serverless database, to store user-related data because it is scalable and offers parallelization benefits. However, it required a lot of tuning, and we had to monitor it thoroughly. For that, we developed a comprehensive monitoring system and a development pipeline that gave Soulie controlled ways to interact with the system, reducing the risk of errors.” (Marco Piscopello, MindTitan business analyst and data scientist)

 

ML model for Soulie

Business understanding hurdles and solutions

Creating a clear vision for a project is crucial for success, and the Soulie and MindTitan collaboration faced a major challenge during the initial stages.

“When starting the project, we were not completely certain about the technical functionality needed for the prototype. This lack of clarity made it difficult to effectively communicate our needs and set the exact scope for the project.

Initially, we only had a vague idea and hoped that MindTitan’s technical team would be able to come up with a solution that would meet our needs. Once we had a working prototype, the essential functionality and what we could do without became clearer. This meant revisiting our original plans and re-evaluating what we wanted to achieve.

As a result of this, we ended up with a slightly different idea than what we started with. This process of reconsideration and re-evaluation was essential in helping us define our goals and objectives more clearly. Ultimately, we were able to overcome this challenge and create a more defined vision for our project.” (Andres Tiko, Soulie co-founder & CTO)

To pin down the idea from the very beginning, a business should ask the right questions and look for objectives that AI can help improve. Examples include “What is the value proposition of the use case?” or the What+Why+Who+How combination (What are we trying to do? Why is it important? Who are the users? How can success be measured?).

Then, with the help of a machine learning canvas and a team of experts, the business problem is turned into an AI use case.

Figure: machine learning canvas

Technical complexity hurdles and solutions

Additionally, the project faced several real-world technical challenges. First, web scraping is inherently fragile: a scraper can stop working at any time, for example when a site changes its layout, and troubleshooting can be difficult. Moreover, while changing Python code is straightforward, adjusting the OpenSearch setup requires a deeper understanding of what is and isn't working, including identifying and tuning non-obvious settings that may affect query performance.
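One practical way to catch a scraper that has silently stopped working is to compare how many items each run returns against that source's recent history, since a changed page layout often produces an empty or near-empty result rather than an error. The function `scrape_looks_broken` and its thresholds are hypothetical and would need tuning per source.

```python
from statistics import median


def scrape_looks_broken(history, latest, min_ratio=0.3, min_runs=3):
    """Flag a source whose latest scrape returned far fewer items than usual.

    Hypothetical health check.
    history: item counts from previous successful runs of this source
    latest:  item count from the current run
    A layout change typically makes a parser return zero or very few items
    without raising an error, so exception handling alone would not catch it.
    """
    if latest == 0:
        return True
    if len(history) < min_runs:
        return False  # not enough history to judge yet
    return latest < min_ratio * median(history)
```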

Figure: Soulie testing

These problems became more apparent as the user base rapidly grew from 10 to close to a thousand. Coordinating monitoring across different systems and software was also a challenge, as some processes ran twice a day while others required constant attention. Overall, the combination of web scraping, OpenSearch, and complex system interactions made the project difficult but ultimately rewarding, as all the issues were resolved.
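Because the processes ran on very different schedules, a staleness check that compares each job's last successful run with its own expected interval is one way to monitor them uniformly. The job names and intervals below are hypothetical examples loosely based on the schedules mentioned above (twice a day versus near-continuous).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical jobs and their expected run intervals.
EXPECTED_INTERVAL = {
    "scrape_sources": timedelta(hours=12),  # runs twice a day
    "update_index": timedelta(minutes=5),   # runs near-continuously
}


def stale_jobs(last_success, now=None, grace=1.5):
    """Return jobs whose last successful run is older than grace * expected interval.

    last_success maps job name -> timezone-aware datetime of its last successful run.
    A job that has never reported is treated as stale.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for job, interval in EXPECTED_INTERVAL.items():
        last = last_success.get(job)
        if last is None or now - last > interval * grace:
            stale.append(job)
    return stale
```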

“Whenever any unexpected event happened, MindTitan was able to take charge and fix the issue. We were able to overcome the errors in communication that can easily happen in complex projects like this one. From my side, I can attest to the team’s friendliness and responsiveness.” (Ann Margit Järvekülg, Soulie co-founder & CMO)

Conclusion

Soulie anticipates a future in which the traditional model of infinite scrolling feeds, over which people have little control, is replaced. Its vision is for individuals to be the sole owners of their personal algorithms or data sets, which they could rent out to other platforms for financial gain. For instance, people might choose to rent out their data exclusively to one social media platform, allowing that platform to use the data to show them relevant content.

However, people would retain complete control over what platforms can and cannot use. Currently, social media platforms rely on user data to display ads, but in the future the Soulie team is building, individuals will be able to monetize their attention and earn money from the companies that seek to capture it.

At the moment, Soulie is training its own in-house team to take over all aspects of its AI and machine learning and to further develop the technology initially created in collaboration with MindTitan.

“However, we couldn’t have done it without MindTitan. They were a key partner in our early stages, and their expertise allowed us to focus on what we do best – building a great product. We will always be grateful for their contribution to our success, and we look forward to working with them in the future as well.” (Andres Tiko, Soulie co-founder & CTO)

“It was exciting to be a part of this journey, and we were glad to contribute to their vision in any way we could. Although we had some challenges, overall, the system performed well, and we are proud of what we accomplished.” (Marco Piscopello, MindTitan business analyst and data scientist)
