Technical Capabilities of Machine Learning Operations (MLOps)

Until recently, exposure to the standard software development life cycle (SDLC) was an integral part of a software engineer's training. The cycle starts with requirements analysis, followed by planning, engineering and design, development, testing, deployment, and finally maintenance. Many engineers also learn the waterfall, iterative, and agile software development models.

Almost every organization is now trying to apply machine learning and artificial intelligence (AI) in its business. The growing need to build machine learning systems extends some of the principles of the SDLC, and this combination has given rise to a new engineering discipline called Machine Learning Operations, or MLOps.

What Is MLOps?

MLOps is an engineering discipline that aims to unify the development (dev) and deployment (ops) of machine learning systems in order to standardize and optimize the continuous delivery of high-performance models to production.

Simply put, it’s a way to take the pain out of the development process and make it easier to deliver machine learning-based software, not to mention streamlining work for every team member.

Historically, data science teams dealt with manageable amounts of data and only a few models, on a small scale. Now the situation is changing, and businesses have to automate decisions across a wide range of applications. This goes hand in hand with a whole host of technical challenges that come with building and deploying machine learning-based systems.

To better understand MLOps, we first need to look at the lifecycle of machine learning systems, which usually involves engaging several development teams at once, covering different aspects of working with data. This often involves:

Business development team or software product development team – defining business goals using key performance indicators (KPIs).
Data engineers – data collection and preparation.
Data scientists – designing machine learning solutions and developing models.
IT or DevOps – deployment and monitoring, in collaboration with the data science team.

Several teams at Google have done extensive research on the technical challenges involved in building systems based on machine learning. The NeurIPS article on Hidden Technical Debt in Machine Learning Systems clearly demonstrates that model development is only a small part of the whole process. There are many other processes, configurations, and tools that need to be integrated into AI systems.

In order to optimize this whole system, a new engineering culture of machine learning was formed. And it involves everyone from top management with minimal technical skills to data scientists, DevOps, and machine learning engineers.

What Tasks Does MLOps Solve?

When managing systems on a large scale, there are many bottlenecks that need to be taken into account:

Lack of data scientists capable of developing and deploying scalable web applications. A new profile of machine learning engineers is emerging that stands at the intersection of data science and DevOps.
Changing business goals. Models depend on ever-changing data, must maintain performance standards, and have to stay under control, so retraining a model in response to shifting business goals is a constant need.
Communication gaps. Mutual misunderstandings between technical departments and business teams make it difficult to find a common language within a project. Most often, it is this misunderstanding that causes large projects to break down.
Risk assessment. The “black box” nature of machine learning and deep learning systems is a matter of constant debate. Since models tend to deviate from what they were originally intended for, assessing the risks and costs of such deviations is a very important step. For example, an inaccurate YouTube video recommendation costs far less than flagging an innocent person as a scammer and then rejecting their loan applications.
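
The cost asymmetry described under risk assessment can be made concrete with a small calculation. The sketch below is purely illustrative: the function name, the error counts, and the per-error costs are hypothetical values chosen for the example, not figures from any real system.

```python
def expected_error_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total cost of a model's mistakes, given a price per error type."""
    return false_positives * cost_fp + false_negatives * cost_fn


# Hypothetical numbers: both models make the same number of mistakes,
# but a wrongly flagged loan applicant costs far more than a poor recommendation.
recommender_cost = expected_error_cost(
    false_positives=100, false_negatives=100, cost_fp=0.01, cost_fn=0.01
)
fraud_model_cost = expected_error_cost(
    false_positives=100, false_negatives=100, cost_fp=500.0, cost_fn=200.0
)

print(f"Recommender: {recommender_cost:.2f}")
print(f"Fraud model: {fraud_model_cost:.2f}")
```

With identical error counts, the second total is several orders of magnitude larger, which is why the acceptable error rate has to be set per use case rather than per model.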

Opportunities And Risks Of MLOps

According to Forbes, the market for MLOps solutions will reach $4 billion by 2025. Not surprisingly, data-driven analytics is changing the landscape of all market verticals. The value of AI in the US agricultural market, for example, is projected to reach $2,629 million in 2025, almost three times its 2020 value.

To illustrate this point, let’s recall two important rationales for implementing machine learning: its ability to handle many parameters and its ability to solve conceptual problems. Machine learning models can provide many capabilities, namely:

Recommendations
Classification
Forecasting
Content creation
Answers to the most pressing business questions
Automation
Fraud and anomaly detection
Extraction of information

MLOps is designed to manage all of these tasks.

However, it also has its limitations, which we recommend considering when putting ML models into production:

Data quality. The better your data, the better the model can help solve a business problem.
Model decay. Real-world data changes over time, so the model's behavior shifts and has to be managed on the fly.
Data locality. Models that are pre-trained on one set of user demographics may not perform appropriately when moved to other markets.
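
One way to notice that live data has moved away from the training data, whether over time or across markets, is to compare the distribution of a feature in both samples. The sketch below uses the population stability index (PSI) as one possible measure. It is a minimal illustration under our own assumptions: the function name, the bucket count, and the synthetic samples are not taken from any particular tool.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty buckets from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: a training sample and two hypothetical production samples.
training = [i / 100 for i in range(1000)]
same_market = [i / 100 + 0.05 for i in range(1000)]  # almost unchanged
new_market = [i / 100 + 4.0 for i in range(1000)]    # clearly shifted

print(population_stability_index(training, same_market))
print(population_stability_index(training, new_market))
```

The first value stays close to zero, while the second is much larger. Where to draw the alert threshold is a judgment call that depends on the feature and the business risk.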

Meanwhile, MLOps is especially useful when experimenting with models iteratively. Because machine learning is experimental by nature, MLOps is built to support as many iterations as needed. This helps to find the right set of parameters and to create reproducible models. Any change to the data version, the hyperparameters, or the code version results in a new version of the deployable model, which is what makes experimentation possible.
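
The idea that every change to data, hyperparameters, or code yields a new model version can be sketched by hashing those three inputs together. This is only an illustration of the principle: the function name and the identifiers such as `dataset-v3` and `a1b2c3d` are hypothetical, and real experiment trackers record far more metadata.

```python
import hashlib
import json


def model_version(data_version, hyperparameters, code_version):
    """Derive a deterministic version ID from everything that shapes the model."""
    payload = json.dumps(
        {
            "data": data_version,
            "hyperparameters": hyperparameters,
            "code": code_version,
        },
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical identifiers for three experiment runs.
run_a = model_version("dataset-v3", {"learning_rate": 0.01, "depth": 6}, "a1b2c3d")
run_b = model_version("dataset-v3", {"depth": 6, "learning_rate": 0.01}, "a1b2c3d")
run_c = model_version("dataset-v3", {"learning_rate": 0.05, "depth": 6}, "a1b2c3d")

print(run_a == run_b)  # same inputs, different key order
print(run_a == run_c)  # one hyperparameter changed
```

Identical inputs always map to the same ID, so a result can be traced back to the exact combination that produced it, and any change produces a new ID.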

Machine Learning Workflow Lifecycle

Each machine learning project aims to build a statistical model from data using a machine learning algorithm. Data and machine learning models are therefore two additional artifacts of software development, alongside the code itself. In general, the machine learning life cycle consists of three elements:

Data Engineering: Providing and preparing the datasets for machine learning algorithms. Includes data ingestion, exploration, validation, cleaning, labeling, and splitting (into training, validation, and test sets).
Model Design: Preparing the final model. Includes model training, evaluation, testing, and packaging.
Model Deployment: Integrating a trained model into a business application. Includes model maintenance, performance monitoring, and performance logging.
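
As an example of one data engineering step, the split into training, validation, and test sets can be made reproducible by fixing the random seed. The sketch below is a minimal version under our own assumptions: the function name, the 70/15/15 proportions, and the seed are illustrative choices, not a recommendation.

```python
import random


def split_dataset(records, train=0.7, validation=0.15, seed=42):
    """Shuffle reproducibly and cut into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder becomes the test set
    )


train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Because the shuffle uses its own seeded generator, running the split again on the same records returns the same three sets, which keeps later evaluation results comparable.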

MLOps: When Data And Model Meet Code

Because machine learning introduces two additional elements into the software development lifecycle, things get more complicated than when applying DevOps to conventional software development. While MLOps still strives for version control, unit and integration testing, and continuous package delivery, it differs from DevOps in several ways:

Continuous Integration (CI) applies to the test and validation of data, schemas, and models, not just code and components.
Continuous Deployment (CD) refers not to a single software package or service but to an entire system that is designed to deploy another service, the one powered by the machine learning model.
Continuous Learning (CL) is unique to machine learning systems and means maintaining and retraining models.
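
To show what testing data and schemas in CI can look like, here is a minimal sketch of a schema check that could run before a batch of data reaches training. The column names, the expected types, and the function name are hypothetical, and dedicated validation libraries offer much richer checks.

```python
EXPECTED_SCHEMA = {"age": int, "income": float, "country": str}  # hypothetical


def validate_rows(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema violations (empty means valid)."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                errors.append(
                    f"row {i}: {column} should be {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return errors


batch = [
    {"age": 34, "income": 52000.0, "country": "DE"},
    {"age": "unknown", "income": 48000.0},
]
for problem in validate_rows(batch):
    print(problem)
```

A CI job could fail the build whenever the returned list is not empty, in the same way that a failing unit test blocks a code change.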

The degree of automation of each stage (data engineering, model engineering, and deployment) determines the overall maturity of MLOps. In a mature MLOps system, the CI and CD pipelines are automated. There are three levels of MLOps, defined by the degree of process automation:

Level 0 MLOps: The process of building and deploying an ML model is completely manual. This is sufficient for models that are rarely changed or retrained.
Level 1 MLOps: Continuous model training through an automated machine learning pipeline. Good for retraining models on new data, but not for trying out new machine learning ideas.
Level 2 MLOps: CI/CD automation allows you to work with new ideas for feature design, model architecture, and hyperparameters.

Reuse is also a different story than in DevOps: reusing a model requires data manipulation and scripting, as opposed to plain software reuse. Since a model decays over time, it becomes necessary to retrain it. In general, data and model versioning is the MLOps counterpart of code versioning, and it requires more effort than in DevOps.
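
A simple way to act on model decay is to trigger retraining when recent accuracy stays below the level measured at deployment. The sketch below assumes that accuracy is evaluated periodically on fresh labeled data. The function name, the tolerance, the window size, and the sample history are hypothetical.

```python
def needs_retraining(recent_accuracy, baseline_accuracy, tolerance=0.05, window=3):
    """Flag retraining when the last `window` evaluations all fall below tolerance."""
    if len(recent_accuracy) < window:
        return False  # not enough evidence yet
    threshold = baseline_accuracy - tolerance
    return all(score < threshold for score in recent_accuracy[-window:])


# Hypothetical weekly accuracy measurements for a deployed model.
history = [0.91, 0.90, 0.88, 0.84, 0.83, 0.82]
print(needs_retraining(history, baseline_accuracy=0.90))
```

Requiring several consecutive low scores, rather than a single one, avoids retraining on a one-off dip. An automated pipeline could call such a check after every evaluation run.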

Benefits And Costs Of MLOps

Before a team adopts a hybrid MLOps approach, it is necessary to evaluate the possible outcomes.

Pros of MLOps:

Automatic updating of multiple pipelines, which matters because an ML system is far more than a single code file to maintain
Scalability and management of machine learning models – depending on the volume, thousands of models can be under control
CI and CD are organized to serve machine learning models (depending on the maturity level of MLOps)
ML model state and management – simplified model management after deployment
A shared method that brings together people, processes, and technology to streamline the development of machine learning products

It can take some time for any team to adapt to MLOps and develop their way of working. Here is a list of possible stumbling blocks to look out for:

MLOps costs:

Development: More frequent manipulation of parameters, functions, and models, and a non-linear, experimental approach compared to DevOps.
Testing: Includes data validation, model validation, and model quality testing.
Production and monitoring: MLOps needs constant monitoring and verification of accuracy.
Memory monitoring: Tracking memory usage while making predictions.
Model performance monitoring: Models are retrained over time, as the data may change and this may affect the results.
Infrastructure monitoring: Ongoing collection and analysis of relevant data.
Team: Time and effort spent getting data scientists and engineers to accept your system.
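
Memory usage during prediction can be measured with Python's built-in `tracemalloc` module. In the sketch below, `predict` is a stand-in that we invented for illustration, and `peak_memory_during` is a hypothetical helper. A real model would replace `predict`.

```python
import tracemalloc


def predict(batch):
    """Stand-in for a real model: allocates an intermediate list, returns scores."""
    features = [[float(x)] * 50 for x in batch]
    return [sum(row) / len(row) for row in features]


def peak_memory_during(func, *args):
    """Run func and report its result plus peak traced memory in bytes."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


scores, peak_bytes = peak_memory_during(predict, range(1000))
print(f"{len(scores)} predictions, peak memory {peak_bytes / 1024:.1f} KiB")
```

Note that `tracemalloc` only tracks memory allocated through Python's allocator. Memory held by native libraries may not appear in this figure, so process-level monitoring is still needed in production.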

Conclusion

As you can see, MLOps covers a wide range of technical capabilities. Each company must therefore develop its own set of practices to adapt MLOps to its AI development and automation. We hope that the guidelines above will help you smoothly implement this philosophy in your team.