Predicting good strategies for picking players in Fantasy Premier League

A project by A.M. Sahu and P. Nandan

We love football

Being ardent fans of the English Premier League, we find the Fantasy Premier League a great way to keep ourselves invested in the games during the lull between two gameweeks.

The Premiership

The top flight of English football has 20 teams. Every season, each team plays every other team twice, home and away, so each team plays 38 matches in about 38 gameweeks spread between August and May. There are about 10 matches per weekend. Each team fields a starting 11 plus 9 bench players, with 3 in-match substitutions allowed.

Fantasy Premier League

Fantasy Premier League (FPL) is a game played by fans of the Premiership in which they put their managerial hats on and choose a team of 15 players to compete against all other such teams. The goal is to place as high as possible on points scored, in order to win goodies and, most often, bragging rights among friends.

The aim of the game is to choose a well-balanced team of 15 players within a fixed budget, and then make one change weekly or two changes bi-weekly in a bid to maximise the points scored. Each week, 11 of the 15 players have to be fielded and a captain has to be chosen, as his points get doubled.
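The weekly scoring rule described above (field 11 of the 15, captain counted twice) can be sketched as follows. This is a minimal illustration, not the official scoring engine: the function name and the player names `p1`..`p15` are hypothetical, and bench rules, vice-captains and chips are ignored.

```python
def gameweek_points(points_by_player, starting_eleven, captain):
    """Sum the points of the 11 fielded players, counting the captain twice."""
    if len(set(starting_eleven)) != 11:
        raise ValueError("exactly 11 distinct players must be fielded")
    if captain not in starting_eleven:
        raise ValueError("the captain must be one of the fielded players")
    total = sum(points_by_player[name] for name in starting_eleven)
    # Adding the captain's points once more doubles them.
    return total + points_by_player[captain]


# Hypothetical squad of 15: player p1 scores 1 point, p2 scores 2, and so on.
squad_points = {f"p{i}": i for i in range(1, 16)}
eleven = [f"p{i}" for i in range(5, 16)]  # field the 11 highest scorers
print(gameweek_points(squad_points, eleven, captain="p15"))
```

The example makes it easy to see why the captain choice matters: the same eleven scores 15 points fewer if the lowest-scoring fielded player is captained instead.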

Proposal

Fantasy Premier League is more a game of skill, rewarding experienced players, than it would seem. This correlation has been found in previous works. Those in the top 0.5% of all FPL players have performed consistently well over the years.

We intend to study the habits of these top managers and find the strategies they seem to employ to get consistent results year after year. We will also take the betting odds from different betting sites and incorporate a return ratio for each player, to see whether and how the odds reflect performances on the pitch.

The end goal of this project is a program that takes your team data and the athlete data for the past weeks, on top of historical data (past season performances), and predicts the best possible transfer to make and the captain to choose. The more points the program's picks score relative to other managers, the more successful we consider the algorithm.

Machine Learning Aspect

Pre-processing data (clustering based on team composition) to find players who are common in the teams of current high-ranked players and high-ranked players of previous years, along with how similar the teams are.
Analysing output as binary (1 if the player is predicted to score > threshold points, otherwise 0).

Applying SVM
Using Neural Networks

We plan to do this analysis separately for goalkeepers, defenders, midfielders and forwards.
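The binary target and the per-position split described above can be sketched like this. It is only an illustration of the labelling step, assuming each record is a `(position, points)` pair; the function name and sample records are hypothetical, and the classifier itself is not shown.

```python
from collections import defaultdict


def label_by_position(records, threshold):
    """Group records by position and label each 1 if points > threshold, else 0."""
    labels = defaultdict(list)
    for position, points in records:
        labels[position].append(1 if points > threshold else 0)
    return dict(labels)


# Hypothetical gameweek records: (position, points scored).
records = [("GK", 6), ("GK", 2), ("FW", 5), ("MF", 9), ("DF", 1)]
print(label_by_position(records, threshold=5))
```

Keeping one label list per position means a separate model can be trained for goalkeepers, defenders, midfielders and forwards without the positions mixing.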

Additional work

If time permits we intend to go a step ahead and:

Try to find the difference makers, i.e. players not selected by many people that do end up performing well.
Modify the algorithm to build an initial team based on prices and the budget of 100 million at the start of the season, so that we can track the performance of the program over the whole season.
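One simple way to read "difference makers" is as a filter on ownership and points. The sketch below assumes each player is a dictionary with hypothetical keys `name`, `ownership_pct` and `points`; the cutoffs and the sample data are made up for illustration and are not taken from the project.

```python
def difference_makers(players, max_ownership_pct, min_points):
    """Return names of low-ownership players who still scored well, best first."""
    picks = [
        p for p in players
        if p["ownership_pct"] < max_ownership_pct and p["points"] >= min_points
    ]
    picks.sort(key=lambda p: p["points"], reverse=True)
    return [p["name"] for p in picks]


# Hypothetical players: ownership in percent of managers, points in one gameweek.
players = [
    {"name": "A", "ownership_pct": 45.0, "points": 12},  # popular and good
    {"name": "B", "ownership_pct": 3.5, "points": 9},    # rarely picked, good
    {"name": "C", "ownership_pct": 1.2, "points": 2},    # rarely picked, poor
    {"name": "D", "ownership_pct": 4.8, "points": 11},   # rarely picked, good
]
print(difference_makers(players, max_ownership_pct=5.0, min_points=8))
```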

Milestones

Half way targets achieved:

Scraping of data from the official FPL site for the current season (2021/22) and historical data for seasons back to 2016/17.
Scraping of betting odds for current and past seasons from the site "oddsportal.com".
Implementation of SVM, and generation of data and insights on player selection by the top managers in the current season (2021/22) and the previous season (2020/21).
Correlating the results obtained from SVM with the historical (test) data. Marked improvement observed in the predictions for the current season.
Analysis of the 1st and 3rd papers from the relevant paper section. We tried to implement learnings from these papers into our code.
Link to all the files and the SVM code -->CODE.

Midway Results

SVM performs well when we include the teams of the top players from last season and this season, along with the betting data.
For a high threshold, where we are trying to find the difference makers, the model achieves high accuracy largely because the number of 0 predictions is high. Still, half the predictions are players who actually perform well. Selecting captains would need a bit more certainty.
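The effect of a high threshold on accuracy can be seen in a small worked example. The numbers below are invented purely to illustrate the class imbalance (100 players, only 4 of whom beat the threshold); they are not results from our data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical gameweek: 4 of 100 players score above a high threshold.
y_true = [1] * 4 + [0] * 96
y_model = [1, 1, 0, 0] + [1, 1] + [0] * 94  # finds 2 of the 4, plus 2 false alarms
y_all_zero = [0] * 100                      # never predicts a high scorer

print(classification_metrics(y_true, y_model))
print(classification_metrics(y_true, y_all_zero))
```

Both predictors reach the same accuracy in this example, yet only the first one ever picks a high scorer, which is why precision and recall are the more informative numbers at high thresholds.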

Issues Identified

Problems that are obvious: the time-series nature of points is not being captured. Taking inspiration from the 3rd paper, we will try LSTM neural networks, as they have shown good results for FPL on data from different seasons. Another idea to integrate is windowed data, taking the average of a few gameweeks to capture the variation.
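The windowed-data idea can be sketched as a trailing average over a few gameweeks. This is a minimal illustration assuming a plain list of one player's points per gameweek; the window size of 3 and the sample values are arbitrary choices, not settings from the project.

```python
def rolling_mean(points, window):
    """Average each run of `window` consecutive gameweeks, oldest first."""
    if window < 1 or window > len(points):
        raise ValueError("window must be between 1 and len(points)")
    return [
        sum(points[i:i + window]) / window
        for i in range(len(points) - window + 1)
    ]


# Hypothetical points for one player over six gameweeks, with one spike (12).
weekly_points = [2, 1, 12, 2, 1, 3]
print(rolling_mean(weekly_points, window=3))
```

The single 12-point haul is spread across the windows that contain it, so the smoothed series changes more gradually than the raw one.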

A lot of players don't get picked and end up getting 0 points. While it is important to predict these players so we don't pick them, they also skew our models, so that predicting players who score very high becomes difficult. We need to find a modification that helps detect such extremes, or find a way to remove these 0-point players.

Final Results

To combat the problems identified earlier, we used an LSTM neural network and ARIMA (Autoregressive Integrated Moving Average), and took a hybrid model, with a certain ideal percentage of each system, as our input to the SVM model.
The LSTM-RNN doesn't perform well when asked to predict the future performance of players based on their past-season and previous game-week data.
Next we implemented ARIMA. Although it is not able to quantitatively predict the expected points for the coming game-weeks, it is able to qualitatively predict the players' form.
Hence we ran ARIMA to capture this form as 1 if the player does better than his average and 0 otherwise. Using this as a feature, we ran our SVM again.
This time we ran SVM for each position on the field (GK, FW, MF, DF) separately.
The code works reasonably well for GK; for MF and DF it works well when asked to predict for a threshold of 4 points per game-week. For FW it was unable to predict anything useful, which can be attributed to the highly random nature of their performance.

Position	Point Threshold	Accuracy	Precision	Recall
GK	3	0.9268	0.7143	0.8333
GK	5	0.9268	0.5000	0.6667
MF	3	0.9145	0.8333	0.4762
MF	5	0.9539	1.0000	0.4167
DF	3	0.8720	0.6000	0.3333
DF	5	0.9680	1.0000	0.5000
FW	3	0.8431	0.5385	0.7778
FW	5	0.9020	0.4000	0.5000
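The binary form feature that was fed to the SVM in the final run can be sketched as below. The sketch assumes the forecast for the coming game-week is already available as a number from whatever time-series model is used (ARIMA in our case); the function name and the sample values are hypothetical, and the forecasting step itself is not shown.

```python
from statistics import mean


def form_flag(past_points, forecast_points):
    """Return 1 if the forecast beats the player's own past average, else 0."""
    if not past_points:
        raise ValueError("at least one past game-week is needed")
    return 1 if forecast_points > mean(past_points) else 0


# Hypothetical history for one player: the average is 3 points per game-week.
history = [2, 6, 1, 3]
print(form_flag(history, forecast_points=4.2))  # forecast above the average
print(form_flag(history, forecast_points=2.5))  # forecast below the average
```

Reducing the forecast to a flag discards its magnitude, which matches the observation that the forecasts were useful qualitatively rather than quantitatively.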

Issues Identified

As sir rightly pointed out, a better way to check whether an LSTM-RNN could have been useful for us would have been to first remove the spikes in our data by averaging over different gameweeks. This would have helped in identifying any pattern, if one exists.