We have used the term average on numerous occasions, and it is normally used to express an amount that is typical for a team or a player. It is very common, in an analysis of teams or players, to read about average goals scored, bookings, corners and other statistical attributes. Huddersfield and Stoke City both had an average of 0.736 goals for per game in the 2017/2018 Premier League season, yet Stoke City were relegated while the Terriers finished 16th and stayed in the top tier of English football. (Read about expected goals and how to calculate them.) These averages are an important part of understanding the dynamics of teams, players and leagues, but they should be treated with a certain degree of caution. The main reason is that averages can be misleading when the distribution is heavily skewed towards one end. In other words, outliers exert a strong influence and pull the average in one direction.
Averaging is not a complex method, and punters use it to estimate performance, patterns and trends. Understanding averaging and its limitations is crucial for successful betting.
We can estimate different average values based on the mean, median and mode of a distribution. Depending on how a dataset is distributed, some averaging methods are more appropriate than others. We will start with short explanations of these standard averaging methods and then discuss moving averages and their usefulness in sports betting.
The mean is the most common mathematical measure and is generally what people use to estimate the average of a dataset. It is calculated by totalling all the values in a dataset and then dividing that sum by the number of values in that same dataset. The mean works best when the values of a distribution are evenly spread and the number of outliers is low. In statistics, an outlier is a value that differs greatly from the other values in the dataset. It is very important to deal with these abnormal values because they can affect how the data is perceived. This is why exceptionally high or low values are often taken out of a dataset, so that the estimate does not give a misleading impression. Werder Bremen suffered a 6-2 defeat by visitors Bayer Leverkusen on Sunday (2018-10-28), and that defeat ended Werder’s 16-game unbeaten home run. Leverkusen scoring 6 goals away is an outlier that needs to be addressed when averaging the team’s away goals. Bayer Leverkusen had scored 3 goals in total in their previous 4 away games, so scoring 6 in the fifth is an anomaly. Bayer’s average (mean) goals over those 4 games (excluding the 6-2 result) is 3 / 4 = 0.75, and if we add the fifth game, the average becomes 9 / 5 = 1.8. This is a very good and recent example of how the mean can be misleading.
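The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the away-goal sequence 0, 1, 2, 0, 6 from this article's Leverkusen example, ordered from oldest to newest, and the variable names are our own.

```python
from statistics import mean

# Bayer Leverkusen away goals, oldest to newest (from the example in this article).
# The final value, 6, is the 6-2 win at Werder Bremen.
away_goals = [0, 1, 2, 0, 6]

# Mean of the four games before the outlier.
mean_without_outlier = mean(away_goals[:-1])

# Mean of all five games, outlier included.
mean_with_outlier = mean(away_goals)

print(mean_without_outlier)  # 0.75
print(mean_with_outlier)     # 1.8
```

A single result more than doubles the estimate, which is why it is worth looking at the raw values before trusting a mean.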
The median is the middle value in a dataset when it is arranged in order of magnitude from smallest to largest. The middle value in a dataset of 5 values is the third one, and when we have an even number of values, the median is the average of the two central values. In our Bayer Leverkusen example, the away goals scored are as follows:
[0, 1, 2, 0, 6]. The median in this example is 1 because the middle value in the sorted dataset [0, 0, 1, 2, 6] is one. We can see here that the median is a good method when a dataset contains exceptionally high or low values, because they have little influence on the outcome.
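As a quick illustration, the same result can be reproduced with Python's standard library. The sketch below reuses the away-goal list from the example; the even-length case simply drops the last match to show how the two central values are averaged.

```python
from statistics import median

# Away goals from the Leverkusen example, in match order.
away_goals = [0, 1, 2, 0, 6]

# Odd number of values: the middle of the sorted list [0, 0, 1, 2, 6].
median_all = median(away_goals)

# Even number of values: the average of the two central values of [0, 0, 1, 2].
median_first_four = median(away_goals[:-1])

print(median_all)         # 1
print(median_first_four)  # 0.5
```

Note that the 6-goal game moves the median only from 0.5 to 1, whereas it moves the mean from 0.75 to 1.8.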
The mode is the most common value in a dataset. A dataset can have more than one mode if several values share the same highest frequency, and it may have no mode at all if no value is repeated. The mode is especially useful for nominal data, where a mean or median cannot be calculated.
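A short sketch of the mode in Python follows. The away-goal list comes from the Leverkusen example; the second list and the match results ("W", "D", "L") are made up purely for illustration. It relies on `statistics.multimode`, available from Python 3.8.

```python
from statistics import multimode

# Away goals from the Leverkusen example: 0 appears twice, so it is the mode.
away_goals = [0, 1, 2, 0, 6]
goal_modes = multimode(away_goals)

# Hypothetical data with two modes: 1 and 2 both appear twice.
two_modes = multimode([1, 1, 2, 2, 3])

# Hypothetical nominal data (win/draw/loss): no mean or median exists,
# but the most common result can still be found.
result_modes = multimode(["W", "D", "L", "W"])

print(goal_modes)    # [0]
print(two_modes)     # [1, 2]
print(result_modes)  # ['W']
```

One caveat: when every value occurs exactly once, `multimode` returns all of them, which in practice should be read as "no mode".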
A moving average is another method, based on a specified number of recent observations. It can provide hints about the direction of the current trend. It is a customizable indicator, which means that the number of past time periods is decided by the user. Furthermore, the fewer time periods you use, the more sensitive the average will be to trends and momentum.
The simple moving average (SMA) is calculated over a fixed number of the most recent observations, and in general it assumes that the recent past is the better predictor of the future. It is common to use a simple moving average in estimates for teams or players because it can potentially reveal momentum of play, motivation in that period of time, or a change in tactics or in players’ positions that produced a result. To calculate a moving average on a distribution, you need to determine the appropriate number of time periods (past counts). For a simple moving average, you can take the values for the past 3 time periods, sum them up and divide them by the number of time periods. When a new value arrives, it is added and the value from the earliest time period is dropped.
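The rolling calculation described above can be sketched as follows. The function name `simple_moving_average` and the window of 3 are our own choices for illustration, and the data is again the away-goal list from the Leverkusen example, ordered oldest to newest.

```python
def simple_moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Away goals, oldest to newest.
away_goals = [0, 1, 2, 0, 6]

# Windows: [0, 1, 2], [1, 2, 0], [2, 0, 6]
sma3 = simple_moving_average(away_goals, 3)
print(sma3)
```

The first two windows both average 1.0, and the last one jumps to roughly 2.67 once the 6-goal game enters the window, while the oldest game drops out.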
The SMA model is not complicated and is easy to understand, but it gives the same weight to all past observations, and that might not be the best way to average values that arrive consecutively in time. If we need to estimate a statistical attribute, we should keep in mind that each past value is significant and relevant, but the newer ones are more relevant than the older ones. Knowing this, it makes more sense to gradually decrease the weights placed on older values.
The exponential moving average (EMA) gives more weight to recent values, which makes it more responsive to new information, trends and patterns. This is one of the main differences from the SMA: the EMA responds more quickly to changing values. The simple moving average, by contrast, represents a “true” average of the values for the entire time period. For averaging the attributes of teams and players, the exponential moving average is often the better choice for the simple reason that it identifies trends and changes more quickly than the simple moving average. Noting the outliers, removing them from a dataset and then applying an EMA can give a good indication of a player’s form, for example. The length of the time periods depends on the analysis of the dataset. The best way forward is to experiment with a number of different time periods until you find the one that best suits your strategy. The weights can be chosen based on analysis, and you need to decide how much more the last match weighs than the others. For example, we might conclude that in order to calculate the average away goals scored for Bayer Leverkusen, we will take the last 4 Leverkusen matches and apply weights of 40%, 30%, 20% and 10% to them. The most recent finished match matters most, and for this reason we apply the 40% weight to it. Strictly speaking, fixed percentages like these define a weighted moving average, while an EMA derives similarly decaying weights from a single smoothing factor. Most statistical packages have libraries that can easily calculate this for you.
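Both weighting ideas can be sketched in a few lines. The first function applies the 40/30/20/10 weights from the example to the last 4 away games. The second is a common recursive form of the EMA, where each new value is blended with the previous average. The smoothing factor `alpha = 0.5` is an arbitrary assumption for illustration, not a recommended setting, and the function names are our own.

```python
def weighted_average(values, weights):
    """Weighted mean of values; values and weights are aligned item by item."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def exponential_moving_average(values, alpha):
    """EMA over values ordered oldest to newest, seeded with the first value."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    ema = [float(values[0])]
    for x in values[1:]:
        # Blend the new value with the previous average.
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


# Away goals from the Leverkusen example, oldest to newest.
away_goals = [0, 1, 2, 0, 6]

# Last four matches, newest first, so the 40% weight lands on the latest game.
last_four_newest_first = away_goals[-4:][::-1]  # [6, 0, 2, 1]
weights = [0.4, 0.3, 0.2, 0.1]
weighted_goals = weighted_average(last_four_newest_first, weights)  # about 2.9

# EMA with an assumed smoothing factor of 0.5.
ema_goals = exponential_moving_average(away_goals, alpha=0.5)

print(round(weighted_goals, 2))
print(ema_goals)
```

Because the 6-goal game is the most recent one, both measures are pulled up sharply here, which illustrates why it is worth handling outliers before applying a recency-weighted average.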
SMA and EMA are generally used with time series data. In sports betting, an example of time series data is a sequence of values of any statistical attribute that describes the performance of a team or a player over time. For instance, a team's total shots in each of its past 20 matches form a time series. If we want to go a step further, we need to mention the autoregressive integrated moving average (ARIMA) model.
That model also uses time series data in order to predict future trends by examining the differences between values in the series. It is a form of regression analysis that estimates the strength of one dependent variable relative to other changing variables. The autoregression (AR) part of the ARIMA model refers to the situation where a value from a time series is regressed on previous values from that same time series. The integrated (I) part, in simple terms, refers to differencing: each value is replaced by the difference between it and the previous value, and this can be repeated as many times as needed. The moving average (MA) part models each value in terms of past forecast errors. The model relies on its own past data, and for this reason a longer series is preferable to get more accurate results. We also need to mention that ARIMA relies on stationarity, which means that statistical properties such as the mean and variance must remain the same over the course of the series; differencing is the step that helps to achieve this. It is a complicated model, but it can be easily calculated with the help of statistical software and coding libraries.
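To make the differencing and autoregression ideas a little more concrete, here is a deliberately simplified sketch in plain Python. It is not a full ARIMA implementation: it differences the series once, fits a single autoregressive coefficient by least squares with no constant term, and ignores the moving average part entirely (roughly an ARIMA(1,1,0) without a constant). The shot counts are invented for illustration and all function names are our own. For real work, a tested library such as statsmodels is a better choice.

```python
def difference(series):
    """First differences: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares slope of x_t on x_(t-1), with no intercept."""
    prev, curr = series[:-1], series[1:]
    denom = sum(p * p for p in prev)
    if denom == 0:
        return 0.0
    return sum(p * c for p, c in zip(prev, curr)) / denom


def forecast_next(series):
    """One-step forecast: last value plus the predicted next difference."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]


# Hypothetical total shots per match for one team, oldest to newest.
shots = [12, 9, 14, 11, 15, 10, 13, 16, 12, 14]

shot_diffs = difference(shots)
next_shots = forecast_next(shots)

print(shot_diffs)
print(round(next_shots, 2))
```

With only 10 matches the fitted coefficient is very noisy, which is one practical reason a longer series is preferable.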
Used cautiously, averages help to analyse patterns, hint at trends and even point towards value bets (The story of small NFL underdogs). If you have just started with sports analysis and estimates, understanding the above methods will help you to create insightful analysis more easily. It is important to distinguish between simple moving averages and weighted ones, and to decide whether weights for previous values or attributes are necessary for better estimates of performance.
Georgie has been in the betting industry for over 11 years, working as a trader and a broker for some of the largest syndicates in the world. Georgie has focused his model development on international soccer leagues.