The internet is flooded with information on how to do machine learning, how to deploy machine learning, and so on — but how do we think about it? What is the process involved? I don’t mean which libraries are best for deep learning, or how to think about seasonal decomposition of a time series. I’m talking about, to use a very business term: setting ourselves up for success. In this post, I discuss the concept of Machine Learning (specifically) as an operation.
So many of the things we do are part of a process. The processes we undertake aim to achieve something: this could be as simple as putting on your clothes to clothe yourself, or as involved as manufacturing a product. Each of these processes has components; these components are called operations.
If we consider this in the context of ML, we can see that there are distinct operations that need to be undertaken.
- Getting the data in (upload, API, etc.)
- Pre-processing the data
- Model assessment
These are the major areas that need to be covered when building ML models.
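To make the separation concrete, here is a minimal sketch of those three operations as distinct, swappable functions wired into one pipeline. All names and the toy data are hypothetical, invented purely for illustration:

```python
def get_data(source):
    """Load raw records (a stub standing in for an upload or API call)."""
    return [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.1}]

def preprocess(records):
    """Split raw records into features and targets."""
    return [r["x"] for r in records], [r["y"] for r in records]

def assess(model, X, y):
    """Mean squared error of the model's predictions against targets."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

# Wire the operations together; each stage can be replaced independently.
X, y = preprocess(get_data("example_source"))
score = assess(lambda x: 2 * x, X, y)
```

Because each operation has a clear input and output, swapping in a real API client, a different transform, or another model doesn't disturb the rest of the pipeline.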
Who is this relevant for?
Those involved in Machine Learning.
When I say those involved in ML, I’m not talking about testing linear regression against a decision tree. The principles here still apply in that scenario, and following them is good practice, but such problems can be tackled easily with tools like notebooks — and if we don’t follow the principles, it won’t cause much of an issue.
This is relevant for people tackling out-of-the-box problems: those with an ML problem that requires substantial testing, where there is no obvious answer. The academic literature can point us in the right direction, but we need to walk the path ourselves. To take this path, you need the right tools — and what those tools are is the subject of this post.
This is so obvious: You’re wasting my time
Why have I decided to talk specifically about ML and not about Data Science etc?
Machine Learning is all about model building. Its whole aim is to ‘minimise the error’. Thus, to help achieve this, the process of ML needs to be robust.
When a lot of people start out doing Data Science and Machine Learning, it’s pretty scrappy. Notebooks are scrappy too: they allow us to explore ideas, test hypotheses, and write comments, but in all honesty, they’re not usually suited to heavy-duty use.
Like I said, Machine Learning at its core is about optimising some outcome. So how can we make doing ML a robust process?
Get everything ready!
This is the key to getting on with ML. Get everything ready.
That means building a robust method to:
- Get your data in
- Transform your data (for this, you will already have an idea of the data structure the model will take as input; e.g. for a CNN, you need a tensor of particular dimensions)
- Manage experiments (Comet ML, MLflow — check out this post for a more involved discussion)
Once you have these operations set up, all you need to focus on is building the ML models. With the ability to manage experiments, you can try models, optimise hyper-parameters, and have the logs stored.
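The experiment-management piece can be sketched very simply. This is not the Comet ML or MLflow API — just a hypothetical local logger showing the idea: every run records its model, hyper-parameters, and score, so a sweep leaves a complete trail:

```python
import itertools

# Minimal local experiment logger (a stand-in for Comet ML / MLflow).
experiment_log = []

def run_experiment(model_name, params):
    # In a real run you would train and evaluate here; we log a
    # placeholder score so the example stays self-contained.
    score = sum(params.values())
    experiment_log.append({"model": model_name, "params": params, "score": score})

# Hyper-parameter sweep: every combination is logged automatically.
for lr, depth in itertools.product([0.01, 0.1], [3, 5]):
    run_experiment("gbm", {"lr": lr, "depth": depth})
```

With the logging in place, comparing runs is just a query over `experiment_log` rather than a memory exercise.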
Kaggle is a very obvious example of the importance and value of robust ML operations.
The data is available, it needs to be transformed, and the aim of a competition is to optimise some evaluation metric. If we have the first two in place, combined with the ability to manage experiments, all we need to do is build the actual model so that it slots right into the framework — and voilà! We can test numerous models, with varying hyper-parameters, with ease.
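The “slots right in” idea boils down to every model exposing the same interface. A minimal sketch with two toy models (both invented for illustration) following a common fit/predict shape:

```python
# Two toy "models" sharing one interface, so each slots into the same loop.
class MeanModel:
    """Predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean for _ in X]

class LastValueModel:
    """Predicts the last training target."""
    def fit(self, X, y):
        self.last = y[-1]
        return self
    def predict(self, X):
        return [self.last for _ in X]

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Because the interface is fixed, trying another model is one extra list entry.
X, y = [1, 2, 3, 4], [2.0, 2.5, 3.0, 3.5]
results = {m.__class__.__name__: mse(m.fit(X, y).predict(X), y)
           for m in [MeanModel(), LastValueModel()]}
```

This is the same contract scikit-learn estimators follow, which is exactly what makes model-swapping there so cheap.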
An example of this modular framework can be seen in recent times with the development of OpenAI’s reinforcement learning tooling, which lets you run a variety of RL algorithms on environments of your choice. All you have to do is build the environment to follow a particular interface, slot it in, and off you go!