When Deploying Your Machine Learning Model Isn’t Easy

Machine Learning Deployment for Complicated Environments — solutions for ML on the edge

8 min read · Oct 21, 2022

The “Iceberg of Concerns” for MLOps (image by Hypercube Consulting)

Introduction

I’ve worked with customers across the energy and financial services sectors for the best part of a decade now. They’re often blessed with talented data scientists who can crank out superb models. This is promising as there’s so much untapped potential in the data many of these organisations hold — I truly think we’ll see colossal shifts across these industries as organisations get more and more mature with their data science delivery processes.

In that time, and especially in recent years, the field of MLOps has boomed. If you’re not familiar with MLOps think of it like this — where data science proves that you can build a helpful model, MLOps is the process of getting that model into a state that can serve its predictions at the scale and speed required for it to be valuable. It’s been great to see this shift from pure model development to all the considerations surrounding putting machine learning models into production.

The thing is, this is all still forming. There’s no right or wrong answer. No best practices. No single vendor that’s solved all the problems.

In most cases, you’ll have to roll your own solution, composed of several tools. There’s lots of competition for your attention and plenty of conflicting advice out there.

I don’t have a silver bullet, but I’ve deployed everything from fully bespoke platforms to integrated open-source tooling all the way to comprehensive end-to-end MLOps tools that do it all for you.

Last Mile Problem

Going from that working model to something production-ready throws up many more challenges than you’d first expect. A lot of the early literature and advice focused on deployment — getting your models from a Jupyter notebook on one machine to something many people can use.

That makes sense. Without deployment, nothing else really matters.

But MLOps holds a lot of other gotchas that you might not be aware of until you’re trying to fix them. I recently threw up this slide at a local MLOps Community meetup highlighting the hidden challenges of MLOps.

The stuff under the water line is what keeps your data scientists and machine learning engineers up at night. Many promising approaches and solutions are forming for many of them, but each poses a long list of technical challenges that need to be overcome. Consumption architecture, in particular, is an issue that can stump even the most robust technical teams — with the various possible platforms and contexts you might need to support, it can quickly get out of hand.

If you aren’t eager to code your entire MLOps platform from scratch — I’d advise you don’t! — you generally have two paths for delivering production machine learning systems:

Compose your solution from existing open-source tools that tackle each part of the problem
Utilise an end-to-end solution with a unified product experience (and potentially vendor support)

The challenge with OSS

Building an MLOps platform from open-source tools might be the way to go for many organisations. It offers an outstanding balance between speed of development and the flexibility to get precisely the solution you need for your use case.

There are some significant challenges, though.

Firstly, every new tool you add increases the overall complexity of your system. You’re relying on your engineers to keep up to speed with another set of tools in an already busy ecosystem. This can lead to Frankenstein’s monster-like deployments that work but take significant overhead to support.

You’re also exposed to the latest trends and changes in a shifting market. There’s no clear winner yet, meaning a tool that’s popular and easy to hire for one day can become unpopular and niche the next.

Finally — and maybe the biggest concern for some users — there are no SLAs or vendor support to fall back on when things get complicated. Your team will have to either develop the expertise in-house to overcome tricky problems specific to your situation or hope the community can come to the rescue.

The net effect of all this tends to be a significant impact on headcount and role requirements for the team responsible for MLOps.

Wallaroo Introduction

So what about end-to-end products in this space? Well, you’ll be glad to hear there are a few. The space continues to grow, and the future looks quite promising. There are a lot of tools out there competing for attention, with plenty of overlap.

One that’s caught my eye recently is Wallaroo. If you haven’t heard of it, Wallaroo pitches itself as a “platform for the last mile of machine learning”.

What does this mean?

Wallaroo has spent a lot of time and effort thinking about optimising the deployment and management of machine learning (ML) solutions, the bit that often gets really difficult for data scientists. Knowing the ins and outs of a good ML model is hard enough; figuring out how to deploy it, even in simple situations, isn’t something many data scientists are comfortable with.

Wallaroo handles this for you. What’s really good about the tool is how heavily it prioritises high-speed inference. According to their documentation and blog posts, this is because they’ve written the compute engine in Rust (the C-like programming language that’s taking the programming community by storm for its lightning-fast performance).

Another tricky aspect of deploying ML models is ensuring you have proper observability. Wallaroo has taken care of this. Out of the box, it gives you an extensive stream of audit logs that allow you to interrogate precisely what the model is up to out in production. This is further enhanced by clever insights and analytics on the logs themselves, empowering the user to quickly analyse what’s going on and so manage more models. This becomes even more important when you’re deploying large numbers of models across complicated environments, which leads me to…

Complicated Deployment

I work a lot with companies in the energy sector. They often operate in very complex environments, both for collecting data and for making decisions. The usual practice of deploying machine learning models into optimised cloud computing environments is difficult enough, but what if you want to deploy a model to some random bit of hardware in your power plant or on your wind farm?

These are real issues that the energy industry faces. Take the wind farm example: wind farms are often in the most remote locations and may either get sporadic network connections or require some bespoke solution for transmitting and receiving data. Furthermore, the hardware you’re deploying these models on adds even more constraints.

There’s a huge rise in investment for battery storage to support the growing demands of an ever greener energy system in the UK. These batteries need to operate at very fast timescales. This means, for some cases, network round trips are prohibitively slow.

Looking for a solution to the combined challenges of fast inference and ease of deployment into less-than-ideal environments led me to Wallaroo.

How would you even begin to implement drift detection, continuous training, and automatic deployment for this situation?

The following figure is a quick sketch of how Wallaroo tackles this challenge.

Essentially, the approach is split into five main pieces. Let’s talk through them as though you want to deploy and monitor a new model.

On the left of Figure 1 you’ve got the developer — be they a data scientist or machine learning engineer. They can work on their local machine for exploration and development, as they see fit.

The MLOps API is the gateway to the Wallaroo Control Centre, a cloud-based, centralised development environment that’s the core of the system. It enables developers to pull data, code, logs, and other artefacts from this central control point. They’re also able to push changes and interact with the system in all the ways you’d expect for a modern MLOps platform.

Once the developer is ready to deploy a new model they can send an instruction, with all the right artefacts and config, to the Deployment Manager and have that handled for them. This is where Wallaroo does the clever stuff to line up deployments and build the containers that will host and run the models.

If your use case requires the model to be deployed on the edge, the required artefacts are then sent to the Edge Manager & Orchestrator. This can act as a stub for the end user’s code if required, meaning you can either push models to the edge deployment or let the software running on the edge node take responsibility for selecting which model to run. One example would be a piece of industrial equipment selecting between different models published by the Wallaroo tools based on the mode it is operating in. In this sort of scenario, you don’t want model selection controlled remotely; it needs to happen on the node, depending on local conditions. Pretty cool.
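
To make the local selection idea concrete, here is a minimal Python sketch. It is not Wallaroo’s API: ModelRegistry, select_model, predict, the mode names and the stand-in models are all hypothetical, and real edge software would load published model artefacts rather than lambdas. The only point is that the choice of model is made on the node itself, with a fallback when the current mode has no dedicated model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

# Assumption: a "model" is any callable from a feature vector to a prediction.
Model = Callable[[Sequence[float]], float]


@dataclass
class ModelRegistry:
    """Hypothetical store of models already pushed to this edge node."""
    default_mode: str
    models: Dict[str, Model] = field(default_factory=dict)

    def publish(self, mode: str, model: Model) -> None:
        # Called whenever a new model for an operating mode arrives on the node.
        self.models[mode] = model

    def select_model(self, mode: str) -> Model:
        # The decision is made locally, so no network round trip is needed.
        # Unknown modes fall back to the default model.
        return self.models.get(mode, self.models[self.default_mode])


# Stand-in models for illustration only.
registry = ModelRegistry(default_mode="normal")
registry.publish("normal", lambda x: sum(x) / len(x))
registry.publish("startup", lambda x: max(x))


def predict(mode: str, features: Sequence[float]) -> float:
    """Pick the model for the equipment's current mode and run it."""
    return registry.select_model(mode)(features)


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0]
    for mode in ("normal", "startup", "maintenance"):
        print(mode, predict(mode, readings))
```

The fallback matters on the edge: if the equipment enters a mode nobody has published a model for, the node still returns a prediction instead of waiting on a remote service.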

The final piece of the puzzle then is the deployed model itself. This is neatly wrapped up in a flavour of Wallaroo’s purpose-built inference engine, enabling the consumers to make use of the model however they see fit.

All the data and calls that go through the Wallaroo Engine are also sent back to the Control Centre and stored as logs. Wallaroo has done some interesting stuff with machine learning and optimised analytics to cut through a lot of the noise here. If you’re not familiar with the headaches associated with dealing with raw logs from a mix of systems, count yourself lucky.
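
One thing centrally stored inference logs make possible is a drift check. The sketch below uses the population stability index (PSI), a common generic drift statistic; it is not a description of how Wallaroo’s analytics work. The function names, the bin edges and the 0.2 alert threshold (a widely used rule of thumb) are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; out-of-range values land in the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # Walk right until v falls below the next edge or we hit the last bin.
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def psi(baseline: Sequence[float], recent: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two samples of one logged value."""
    expected = histogram(baseline, edges)
    actual = histogram(recent, edges)
    score = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins so the logarithm is always defined.
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * math.log(a / e)
    return score


if __name__ == "__main__":
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    baseline = [0.1, 0.3, 0.6, 0.9] * 25   # values logged at deployment time
    recent = [0.8, 0.9] * 50               # values logged recently
    score = psi(baseline, recent, edges)
    # Assumed rule of thumb: PSI above 0.2 suggests meaningful drift.
    print("drift suspected" if score > 0.2 else "stable")
```

A check like this would run against the logged inputs or predictions in the central store, not on the edge node, so it adds no load to the constrained hardware.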

Last thoughts

The thing I like most about all of this is the massive reduction in complexity for the developer. There’s a lot that’s been abstracted away and not at the cost of performance.

If you’re interested in trying Wallaroo, you can download the free Community Edition here, and there are a load of excellent tutorials here:

Wallaroo Tutorials

If you want to look at the ML edge tutorial specifically, that can be found here:

Simulated Edge Tutorial
