How to put together a machine learning team for an AI project

When starting a new project in the artificial intelligence field, business leaders often face questions: how do you build a successful machine learning team? Which roles does an effective machine learning team need to develop an AI project? How should the team members interact? These questions matter because mistakes in building a machine learning team cost time and money, and can leave the project delayed and inefficient.

To help avoid these pitfalls, MindTitan’s team of experts, who have successfully implemented more than 100 projects in 20 countries worldwide, have created a guide on how to put together a machine learning team.

Part 1. The roles of a basic team for an AI project

When a business faces a data science project, people often assume that a couple of data scientists are enough and will do everything. But to handle complex tasks quickly and well, many more roles need to be covered.

Broadly, the roles can be divided into three groups: the problem owner side, the AI solution side, and the connectors.
The problem owner and the technical support specialist represent the problem owner side. The data scientist, data engineer, machine learning engineer, analyst, and software engineer belong to the AI solution side, and the project manager is the bridge connecting both sides.

The problem owner

The problem owner is the person who feels the pain in a business process and understands both the process and the pain very well. The great thing about an AI project is that it starts with someone from the business side who has a problem. It makes no difference whether this person is from the accounting, legal, or mechanics department. What matters is that they are ready to put considerable effort and energy into the AI project and accept that it is not just another IT development project, where you place a request and then wait for a brighter future while the IT wizards work their magic.

Responsibilities:

Defines the problem to be solved and explains why it is important.
Prepares the gap analysis and a machine learning canvas describing the objective differences between how the process works today and how it should ideally work in the future.
Picks the right metrics and KPIs for measuring the success of the AI project.
Sets targets for the whole team based on those KPIs.
Constantly evaluates the project’s success from a business perspective, giving the team feedback and guiding them toward the best business results.
Sets up the necessary processes to keep the AI project in line with the business after the development phase has ended.

The technical support specialist

This role is necessary, although it can be filled by the same person as the problem owner. The technical person provides access to existing systems, such as previously deployed applications, and knows the peculiarities of the production systems. This person does not develop the AI but facilitates and speeds up the process.

Responsibilities:

Gives the AI team access to the necessary systems.
Knows where the data is.

Analyst

Analysts are needed from the very beginning of the project to map out what needs to be done from the business perspective and what the technical demands of the project are. This stage may also require help from an architect, who figures out the overall structure of the system while the business problem is translated into technical terms. Analysts may be needed again during application development, but their involvement usually drops after the initial mapping.

Responsibilities:

“Translates” the business problem into technical terms.
Determines technical requirements for the project.

Data engineer

Data engineers handle raw data and set up a suitable architecture for refining and processing it. They decide how long the data is stored and which platform or type of data storage is best for the job. The first round of data cleaning (checking and fixing errors, and organizing the data so it is more presentable and easier to understand) is also their responsibility. This usually involves a multi-step process of constructing, testing, and maintaining architectures such as databases and large-scale processing systems.

Responsibilities:

Analyzes and organizes raw data, creating datasets.
Develops and manages the infrastructure to transfer the data.
Builds data pipelines from a data source or several sources.
Prepares the data for the data scientists.
Remains “forever” responsible for the quality and usability of the datasets your AI uses.
Prepares data for prescriptive and predictive modeling.
Explores ways to enhance data quality and reliability.
Identifies opportunities for data acquisition.
Collaborates with data scientists and the machine learning engineer.

Data scientist

The data scientist is the one who actually makes the magic of AI happen: they train ML models to solve the business problem using data. In other words, data scientists take the data cleaned, processed, and supplied by the data engineers, feed it into an algorithm built with the help of machine learning engineers, and (voilà!) the result is on the table, driving progress and benefiting the business.

Jokes aside, a great deal of analytical data processing lies behind that magic. Data scientists are tasked with finding data-driven solutions to business problems. For example, they might analyze user data to find meaningful user segments and build models that classify users into those segments, so that the end-user experience can be differentiated and engagement increased.

Responsibilities:

Selects appropriate data representation methods.
Cleans the data so it can be used for building a model, and selects appropriate datasets.
Verifies data quality.
Finds the patterns and trends in the data that are relevant for solving this particular business problem.
Creates, trains, and retrains the machine learning models.
Identifies differences in data distribution that affect model performance.
Evaluates the results and how they correlate with the business KPIs.
Uses the results to improve models.
Visualizes and communicates the results to all involved parties.
Researches and implements ML algorithms and tools.
Performs statistical analysis.

Machine learning engineer

A machine learning engineer (ML engineer) is an IT specialist who designs and builds artificial intelligence (AI) systems that companies use in production. ML engineers help get the models developed by data scientists into production, and they glue together the work of the data scientist and the data engineer. They design and create the algorithms capable of learning and making predictions that define machine learning (ML).

Responsibilities:

Designs machine learning systems.
Transforms data science prototypes into production-ready systems.
Runs machine learning tests.
Extends machine learning libraries.
Develops machine learning apps according to client requirements.

The picture below shows how the machine learning process moves from data ingestion to model deployment. The data engineer, as mentioned above, works with raw data and manages data storage and pipelines. The data scientist handles model development and validation. The point of connection between the responsibilities of the data engineer and the data scientist is called the feature store. It automates and centrally manages the data processes and allows data practitioners to build and deploy features quickly and reliably. The ML engineer deploys and monitors the ML model, building a machine learning pipeline together with the data scientist.

Depiction of the connections between a data engineer, a data scientist and a machine learning engineer. — Source: Valohai blog
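To make the feature store idea a little more concrete, here is a minimal sketch in Python. It assumes a toy, in-memory setup: the class FeatureStore, its methods register and compute, and the example features calls_per_day and is_long_tenure are hypothetical names chosen for illustration, not the API of any real feature store product. What it shows is the core principle: a feature is defined once, in one shared place, and then reused, so the data scientist training a model and the ML engineer serving it compute features the same way.

```python
from typing import Callable, Dict, List

# A raw input record, e.g. one customer's activity summary (hypothetical schema).
Row = Dict[str, float]


class FeatureStore:
    """Toy in-memory feature store: one shared registry of feature definitions."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[Row], float]] = {}

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        # Each feature is defined exactly once, so training and serving use the same logic.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def compute(self, rows: List[Row], names: List[str]) -> List[List[float]]:
        # Return one feature vector per raw row, in the order given by `names`.
        return [[self._features[n](row) for n in names] for row in rows]


# The data engineer registers the (hypothetical) feature definitions once...
store = FeatureStore()
store.register("calls_per_day", lambda r: r["calls"] / max(r["days_active"], 1.0))
store.register("is_long_tenure", lambda r: 1.0 if r["days_active"] >= 365 else 0.0)

# ...and the data scientist reuses them to build a training matrix.
raw = [{"calls": 30.0, "days_active": 10.0}, {"calls": 400.0, "days_active": 400.0}]
training_matrix = store.compute(raw, ["calls_per_day", "is_long_tenure"])
# training_matrix == [[3.0, 0.0], [1.0, 1.0]]
```

Production feature stores add persistent storage, versioning, and point-in-time lookups on top of this idea; the sketch only illustrates the shared-definition principle that connects the data engineer's and the data scientist's work.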

Developer

The developer, or software engineer, analyzes and modifies existing software, and designs, builds, and tests end-user applications that meet user needs, all by means of programming.

Responsibilities:

Develops, tests, and quality-assures software.
Analyzes user requirements, software, and code.
Conducts systems risk and reliability analysis.
Monitors systems performance.
Performs maintenance and software integrations for existing systems.
Meets or exceeds industry standards.
Identifies and assesses new technologies prior to implementation.
Develops and executes project plans.
Creates technical specifications.

Project manager

The project manager oversees the project. These professionals are adept change agents who are comfortable working in a complex, dynamic environment. In addition to strong analytical skills, project managers have excellent soft skills, which build trust and smooth communication among all stakeholders. A good project manager for AI knows what “regular IT” is and is familiar with common programming languages, but also understands what machine learning is and how it differs from regular IT.

Responsibilities:

Understands and explains how the business units operate and how business value is created.
Has good people skills and knows how to communicate with both sides (and how to explain and translate between them).
Knows how to lead people without formal authority.
Keeps the team productive.
Explains the expectations to each participant.
Plans and establishes project goals.
Sets milestones and tracks progress.
Ensures timely delivery.
Assesses and improves the work, ironing out glitches along the way.

Additional project members might be required along the way to bring the project to life, such as more data engineers, DevOps engineers, sysadmins, and UI/UX designers. As the project matures, anything other than the AI model development itself can become a bottleneck, so be prepared to extend the team when that happens.

However, building such a team in-house can take a very long time, because it is a complex, demanding, and time-consuming task. It is hard to know which team composition will be most effective, especially if you have never hired machine learning experts for an AI project before, because AI development is not only model development but a whole sequence of processes. To assemble the right team and keep these processes moving at a reasonable pace, it is a good idea to ask AI experts for help, or at least for advice.

Part 2. Choose an in-house or outsourced machine learning team

When starting a new project in the artificial intelligence field, business leaders, among other important considerations, often face a dilemma: is it better to hire an in-house team or to outsource machine learning development?

The main problem of in-house teams: uneven workload

The graph shows that the workload is unevenly distributed among the different roles.

Analysts work from the very beginning of the machine learning project, mapping out the process, determining the goals from the business perspective, and establishing the technical demands of the project. They may be involved again during application development, but their participation is usually less intensive after the initial mapping.
Data scientists develop the model itself; their involvement diminishes once the model is launched.
Developers work only at the application development stage, building the application that delivers the benefits of the AI model once its first usable version is ready.
A project manager is part of the process the entire time, but less intensively in the later phases.

Hiring an external machine learning team can easily solve this problem. The workload is also uneven over time: as the picture below shows, the core AI development phase is followed by a maintenance phase in which the effort usually decreases with every iteration, so contracting machine learning specialists rather than keeping a full team on staff can be beneficial.

There are further differences between an in-house and an outsourced machine learning team, such as the price-quality ratio and the learning curve.

Kristjan Jansons, MindTitan CEO and co-founder, explains that pauses in machine learning projects are not rare. For example, there may be a long wait for data collection or labeling, or one person’s unfinished work can hold up the entire project because it is an input for others’ work. MindTitan solves this easily by switching people between projects. With an internal team, a single project at hand, and a limited number of specialists, weeks or even months can be wasted while people simply wait.

“Of course, people can always come to sit at work and do something, but are the salary outlays in such a case justified?” Kristjan Jansons continues. “Also, what do you do with the in-house team when the core of the AI project is finished, and it enters the maintenance phase? Most AI projects do not need endless and constant effort from the machine learning team. Most AI projects will have diminishing returns once a certain target metric has been achieved.”

The MindTitan team of experts has seen hundreds of potential AI use cases by now, and based on that experience, we would recommend an outsourced team in more than 90% of cases.

How to recognize the right machine learning team if you outsource development

They have a well-defined process laying out how they work.
They have a track record with similar projects. This point can be tricky, however, because in data science projects business people do not always judge well what “similar” means. Projects that look different to a businessperson may be quite similar from a data scientist’s point of view.
As a team, they have the range of skills and competencies you are looking for. This point requires extra attention and consultation with ML experts, because business leaders new to tech projects don’t always know which skills to look for.
They can work both independently and as part of your team to create a successful project. However strong the AI team is, if it doesn’t cooperate with the business side, the project will not reach its full efficiency and may even fail due to misaligned goals.
Last but not least, find out what the machine learning team is going to deliver: just an AI model, or a turnkey solution? The second option is better because the project will not get stuck during deployment and maintenance.

Instead of a conclusion: collaboration for success

To launch a successful AI project, it is crucial to have all the sides described above on board. In general, and especially in the early stages, assigning business people to the initiative is vital to make sure you are on the right track and solving the right problem.

Keep in mind that AI development and training are iterative: new data arrives after the initial implementation, which leads to a new training cycle and sometimes even new business goals. For example, as a call center automation project matures, the KPI could shift from pure automation to a higher customer satisfaction level. That’s why the contact between the business and technical parts of the AI project team should be maintained continually.

Working closely with business people, a good machine learning team will identify the key areas where AI can bring the most value, develop a roadmap for action and implement ML successfully, bringing the most benefit to the business.
