
How To Choose A Machine Learning Model

By Craig Risi



This article first appeared on Snapt.


In a previous article, we introduced machine learning and how it works. You can choose from a wide array of machine learning (ML) algorithms, but which model is best for solving your particular problem?


If you download a random ML model and hope it works for your needs, there is a good chance you will end up wasting a lot of effort working with an unsuitable model that leads you in the wrong direction.


Worse still, it can take a long time to realize that you have chosen the wrong ML model. Small differences in the learning approach can have big effects on outcomes, but because training can take days, weeks, or months, you might not see the limitations of your chosen model until you have wasted a lot of time.


This decision, therefore, has big ramifications for your development timeline and budget, which will determine the feasibility of your ML solution.


So, it’s important to choose wisely. While it’s difficult to recommend an exact model for your given situation, this guide explains the criteria you should use to identify the right machine learning model for your use case, priorities, and constraints.


1. What Level Of Quality Do You Need?

Consider whether your chosen ML model will produce results of sufficient quality. Some of the most popular quality metrics to consider include accuracy, precision, recall, and f1-score.


Not all of these metrics are relevant to every situation, but tracking several of them gives you a fuller picture of how your ML model behaves than any single number does.
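To make these four metrics concrete, here is a minimal sketch that computes them for a binary classifier using only the Python standard library. The function name `classification_metrics` and the fraud-detection labels are hypothetical and chosen purely for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = fraudulent transaction, 0 = legitimate.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this made-up example the model is 90% accurate and never raises a false alarm (precision 1.0), yet it misses half of the fraud cases (recall 0.5). That gap is why a single metric is rarely enough.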

Some ML models can produce unpredictable results, which some organizations or teams cannot tolerate. This is especially true when dealing with critical decisions concerning security, money, or people, where the consequences of poor analysis are too high.


Nevertheless, some models have proven effective in certain industries. Provided companies supply the correct datasets and set the right measurement metrics for the ML tooling, they can have greater confidence that these models will produce reliable results.


Trade-offs for quality

In general, increased quality has the following trade-offs:

  • Bigger data sets

  • Longer training time

  • Slower inference time

2. Do You Need To Explain The Results?

Consider how easily you can explain, interpret, and justify the results of your chosen ML model.


Often, a model can be used effectively only if it is well understood. Unfortunately, many algorithms work like black boxes, and their results are hard to explain regardless of how good they are because you may not be able to see how the model arrived at them.


In many situations, explaining the results of a model is paramount. The lack of explainability may be a dealbreaker in those situations, so you should err on the side of caution and use a model that is more easily understood.


Where explainability is important, use models such as linear regression and decision trees, and avoid neural networks, which are often quite difficult to understand.
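As a rough illustration of why linear regression is considered easy to explain, the sketch below fits a straight line by ordinary least squares using plain Python. The helper `fit_line` and the advertising figures are hypothetical. The point is that the fitted model boils down to two numbers that can be read out loud.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad spend (in $1,000s) vs. units sold (in 1,000s).
ad_spend = [1.0, 2.0, 3.0, 4.0]
units_sold = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, units_sold)
print(f"Each extra $1,000 of ad spend adds about {slope:.1f}k units;")
print(f"with no ad spend the model predicts {intercept:.1f}k units.")
```

A stakeholder can challenge or accept a statement like "each extra $1,000 adds about 2,000 units" directly. A neural network's thousands or millions of weights offer no comparable one-line summary.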


Trade-offs for explainability

In general, increased explainability has the following trade-offs:

  • Less complexity

  • Smaller data sets

  • Fewer features

3. How Much Complexity Can You Handle?

Consider whether your data set or goals require a complex ML model.


A complex model can find more interesting patterns in the data and often leads to more profound and accurate insights. However, it takes more expertise to interpret those results and put them to use.


Trade-offs for complexity

In general, increased complexity has the following trade-offs:

  • Better quality (sometimes)

  • Less explainability

  • Bigger data sets

  • Longer training time

  • Slower inference time

Putting explainability aside, the cost of building and maintaining a model is also a crucial factor in a successful project. A complex setup has a growing impact throughout the lifecycle of a model and often requires far more complex data models to work, something that not every company is in a position to provide.


4. What Is The Size Of Your Dataset?

Consider how much data you have and how much training data your chosen ML model needs to be effective.


When selecting ML models, it’s not just about the effectiveness of the model itself but also about the size of the datasets required for it to perform its role.


For example, a neural network is very good at processing and synthesizing large amounts of data, and usually needs them to perform well, whereas a K-nearest neighbors (KNN) model can work well with far fewer examples.
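To show how little data a KNN model can get by with, here is a minimal sketch of a K-nearest neighbors classifier in plain Python. The function `knn_predict` and the six training points are hypothetical. The sketch assumes numeric features and Euclidean distance, which is only one of several possible distance choices.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: just six labeled examples.
train_points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (4.0, 4.2), (4.1, 3.9), (3.8, 4.0),
]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (4.0, 4.0)))
```

There is no training step here at all: the "model" is simply the stored examples. This is why KNN can be useful with a handful of data points, and also why it becomes slow at prediction time as the dataset grows.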

A related consideration is how much data you truly need to achieve good results.


Sometimes, you can build a great solution with 100 training examples; other times, you need 100,000.


Therefore, be sure to consider the amount of data available to you and the amount of data you will need for a model to be effective in your specific case.


Trade-offs for data size

In general, increased data size has the following trade-offs:

  • Better quality

  • More complexity

  • Longer training time

5. What Features Will Help You Integrate Easily?

Consider the different features available in your chosen ML model and the amount of configuration it offers.


An ML model doesn’t work in isolation and needs to fit into a wider software ecosystem. The features and configuration options of the ML model will either help with integration or present an obstacle to it.


Additionally, more features will often lead your model to come up with better solutions (improved quality).


However, more features might also increase the complexity of your model. So be careful when evaluating features and make sure that you really need them.


Trade-offs for features

In general, more features and configuration options have the following trade-offs:

  • Better integration

  • Less explainability

  • More complexity

6. How Long Can You Afford To Train Your Model?

Consider how long it takes and how much it costs to train your chosen ML model to achieve the quality metrics you need.


For example, would you choose a 98%-accurate model that costs $10,000 to train or a 97%-accurate model that costs $2,000?


The answer to this question depends on your priorities and budgets. How important is the accuracy of your model to you? How much money and time are you willing to invest in a model for it to start producing a return on investment?
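One way to frame the 98% versus 97% question is to put numbers on what the extra money buys. The sketch below is only an illustration: `compare_models` is a hypothetical helper, and it assumes accuracy is given in whole percentage points and that model A is the more accurate one.

```python
def compare_models(acc_a_pct, cost_a, acc_b_pct, cost_b):
    """Compare a more accurate, more expensive model A against model B.

    Assumes acc_a_pct > acc_b_pct and acc_b_pct < 100.
    """
    extra_cost = cost_a - cost_b
    extra_points = acc_a_pct - acc_b_pct
    error_a_pct = 100 - acc_a_pct
    error_b_pct = 100 - acc_b_pct
    return {
        "extra_cost": extra_cost,
        # Dollars paid per additional percentage point of accuracy.
        "cost_per_point": extra_cost / extra_points,
        # Share of model B's errors that model A eliminates.
        "relative_error_reduction": (error_b_pct - error_a_pct) / error_b_pct,
    }


# The figures from the example above.
result = compare_models(98, 10_000, 97, 2_000)
print(result)
```

Seen one way, the better model costs $8,000 for a single percentage point. Seen another way, it removes a third of the remaining errors (from 3% down to 2%). Which framing matters more depends on what each error costs your business.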


Additionally, some models need to incorporate new knowledge in near real time and can’t afford long training cycles. A recommendation system that must provide output based on individual user preferences is one example.


Trade-offs for training time

In general, increased training time has the following trade-offs:

  • Better quality

  • Higher cost

  • Slower delivery

7. Do You Need A Fast Inference Time?

Consider how quickly you need your chosen ML model to process data and deliver results.


Quality and accuracy are important, and you want an ML model that can deliver the results you need. However, sometimes you also need a model that can deliver results quickly. Here, we are not talking about the time an ML model takes to train but rather its processing speed in delivering output, also known as inference time.
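Inference time is straightforward to measure before you commit to a model. The following is a minimal sketch using Python's `time.perf_counter`. The helper `measure_latency` and the stand-in `dummy_predict` function are hypothetical; in practice you would pass in your own model's prediction call.

```python
import statistics
import time


def measure_latency(predict, sample, runs=100):
    """Time repeated calls to `predict(sample)`; report latency in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    timings.sort()
    # Index of the (approximate) 95th percentile in the sorted list.
    p95_index = int(0.95 * (len(timings) - 1))
    return {
        "median_ms": statistics.median(timings) * 1000,
        "p95_ms": timings[p95_index] * 1000,
    }


def dummy_predict(features):
    """Stand-in for a real model: a trivial threshold rule."""
    return sum(features) > 1.0


latency = measure_latency(dummy_predict, [0.2, 0.5, 0.7])
print(latency)
```

Reporting a high percentile alongside the median is a deliberate choice: for a chatbot or similar interactive system, the slowest responses are often what users notice most.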


If you intend to use ML to power a chatbot, which needs to perform quickly and provide a rapid response to the user, then it’s important that you look for an ML model with a fast inference time. The same applies to self-driving cars, which need to evaluate information and respond to it quickly to avoid crashes.


Not all ML models are designed to produce quick results. Some are focused on deep analysis, which may require larger datasets and more time to produce the results you need.


Trade-offs for inference time

In general, faster inference time has the following trade-offs:

  • Better for rapid-response use cases

  • Lower quality

There are many different ML models out there, and each one might be excellent for particular use cases. Every organizational need is different, and it’s crucial that you evaluate the things that are important to you before determining which ML model is best suited for your needs.


Too many ML journeys end badly because the organization didn’t fully understand its needs and selected a model that wasn’t fit for its intended purpose. These mistakes can be expensive and can lead organizations to abandon ML technology altogether, despite the potential it offers.


So, if you want to succeed in your ML journey, make sure you start by selecting the appropriate model.
