CS460 Machine Learning
We all know the value of money. Everyone wants to earn enough to live a stable life, but we also want to become rich with little effort and great advantage. We all have some kind of wish list in mind, and we need a lot of money to fulfil those desires. The stock market is one of the best platforms for this, but it comes with risks that nobody wants. Or do you want to take a risk and lose your money? No, right? Even with these risks, one can forecast stocks by visualising past stock values and some statistical factors.
In this project we plot financial data of a specific company using tabular data provided by the "yfinance" Python library. We then build a Support Vector Regression (SVR) model to predict upcoming stock prices, which should reduce the risk factor for investors.
We are going to use Yahoo Finance, Quandl and, if possible, other platforms from which we can get a sufficient dataset for stock prediction. We will create an SVR model, split the dataset into training and testing data, and after training check the performance of the model using different metrics (e.g. mean squared error, mean absolute error). We have already studied the Support Vector Machine, and SVR uses the same principles as SVM. A rough sketch of this pipeline is given below.
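The following is only an illustrative sketch of the planned pipeline, not our final code: the ticker symbol, the date range and the use of the day index as the single feature are placeholder choices, and the hyperparameters are example values.

```python
# Illustrative sketch of the planned pipeline; ticker, dates, feature choice and
# hyperparameters are placeholders, not final project decisions.
import yfinance as yf
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Download historical data for one company (ticker is just an example)
data = yf.download("AAPL", start="2020-01-01", end="2021-01-01")

# Use the day index as the single feature and the closing price as the target
X = np.arange(len(data)).reshape(-1, 1)
y = data["Close"].to_numpy().ravel()

# Hold out the last part of the series for testing (no shuffling for time series)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Fit an SVR model (kernel and hyperparameters to be tuned later)
model = SVR(kernel="rbf", C=100, gamma=0.1)
model.fit(X_train, y_train)

# Evaluate with the metrics mentioned above
pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
```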
No team is complete without teamwork and coordination, and we will discuss every part of our work together. Roughly speaking, Ajaya will give the first and second presentations and create the slides, while Prateek will give the final presentation and provide the summary of all the theory for the report. Creating the ML model in Python and maintaining the website for reports will be joint tasks.
Since we are taking the support vector machine as our baseline, our main focus is to understand how the SVR algorithm works theoretically. After that we will build the model, test the performance of the trained model, and forecast the stock for a certain date by plotting the exponential moving average against the date. At a later stage we plan to compare the result with results obtained by other algorithms, for example a Long Short-Term Memory (LSTM) model.
There are some difficulties we might face: the SVR model may not work well for large datasets, and choosing a good kernel and fine-tuning the hyperparameters are not easy tasks. So the final goal of the project is to build a model that overcomes some of these obstacles.
1. Theoretical Analysis (using different sources/papers)
Support Vector Regression (SVR):
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. The algorithm outputs an optimal hyperplane which categorizes new examples, and it is considered one of the most suitable algorithms available for time series prediction. It can be used for both classification and regression problems.
SVM involves plotting the data as points in a multidimensional space, where the dimensions represent the attributes (or parameters) of the given data. The algorithm places a boundary on the dataset, called a hyperplane, which separates the data points into separate classes. Our goal is to find the best hyperplane (or decision boundary), where "best" means the decision boundary with the maximum margin.
Let $\vec{u}$ be some unknown data point and $\vec{w}$ be the vector perpendicular to the hyperplane. Our decision rule will then be

$\vec{w} \cdot \vec{u} + b \geq 0 \;\Rightarrow\; \text{classify as positive}$ ………………………………….. (1)
Width of the margin of the hyperplane must be maximized to get a good hyperplane.
Width $= \dfrac{2}{\lVert \vec{w} \rVert}$ ................................................... (2)

$\max \dfrac{2}{\lVert \vec{w} \rVert}$, which is equivalent to $\min \dfrac{1}{2}\lVert \vec{w} \rVert^{2}$ ..........................................(3)
Applying Lagrange multipliers,

$L = \dfrac{1}{2}\lVert \vec{w} \rVert^{2} - \sum_{i} \alpha_{i}\left[ y_{i}\left( \vec{w} \cdot \vec{x}_{i} + b \right) - 1 \right]$ ……………………………… (4)

where $\alpha_{i}$ are the Lagrange multipliers and $y_{i} \in \{+1, -1\}$ are the class labels.
$L = \sum_{i} \alpha_{i} - \dfrac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \, \vec{x}_{i} \cdot \vec{x}_{j}$ ……………………………………….. (5)
By finding the extremum of the above Lagrangian L, we get our desired result.
Now our decision rule will be,
$\sum_{i} \alpha_{i} y_{i} \, \vec{x}_{i} \cdot \vec{u} + b \geq 0 \;\Rightarrow\; \text{classify as positive}$ ……………………………………… (6)
If we get a dataset which is non-linear in the current dimensional space, we can map them to a new space with greater dimension than before.
The hyperplane in our new dimension is given by the equation,
$\vec{w} \cdot \varphi(\vec{x}) + b = 0$ ………………………………. (7)
And the hyperplane must satisfy,
$\vec{w} \cdot \varphi(\vec{x}_{i}) + b \geq +1$ for positive samples, i.e. when $y_{i} = +1$ ………………………..(8)

$\vec{w} \cdot \varphi(\vec{x}_{i}) + b \leq -1$ for negative samples, i.e. when $y_{i} = -1$ ……………….(9)
Here $\varphi$ maps our independent values to the new space with greater dimension, in which our dataset becomes linearly separable.
To summarize the above two inequalities we can write,
$y_{i}\left( \vec{w} \cdot \varphi(\vec{x}_{i}) + b \right) \geq 1$ ……………………………………………………. (10)
In real-world problems it can happen that even in the new space the dataset is not linearly separable. To counter this we introduce a slack variable $\xi_{i}$ as a tolerance margin in the classification thresholds, making the classifier more flexible in accepting possible errors. The hyperplane condition in Eq. (10) then becomes Eq. (11), and the problem of finding the optimal hyperplane becomes the convex optimization problem given by Eq. (12). In this equation, $C$ is the adjustment parameter trading off the width of the margin against the smallest possible misclassification, under the conditions of Eq. (11).
$y_{i}\left( \vec{w} \cdot \varphi(\vec{x}_{i}) + b \right) \geq 1 - \xi_{i}, \qquad \xi_{i} \geq 0$ ………………………………. (11)

$\min_{\vec{w},\, b,\, \xi} \; \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \xi_{i}$ ………………………………….... (12)
Now we come to SVR. It uses principles similar to SVM, but the response variable $y$ is a continuous value. Instead of seeking the hyperplane of Eq. (11), SVR seeks the linear regression function given by Eq. (13). To achieve this, a threshold error $\varepsilon$ is defined for the expression in Eq. (14), called the $\varepsilon$-insensitive loss function. The SVR regression process therefore seeks to minimize the $\varepsilon$-insensitive errors in Eq. (14) through the expression $R$ defined in Eq. (15).
$f(\vec{x}) = \vec{w} \cdot \varphi(\vec{x}) + b$ ……………………………………………… (13)

$\lvert y - f(\vec{x}) \rvert_{\varepsilon} = \max\left( 0,\; \lvert y - f(\vec{x}) \rvert - \varepsilon \right)$ ……. (14)

$R = \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \lvert y_{i} - f(\vec{x}_{i}) \rvert_{\varepsilon}$ ……………………………………. (15)
Again we introduce tolerance (slack) variables here as well, defining $\xi_{i}$ as the value in excess of $\varepsilon$ above the regression function and $\xi_{i}^{*}$ as the value in excess of $\varepsilon$ below it, to limit the deviation from the regression target. Thus, the minimization of Eq. (15) becomes Eq. (16), under the conditions of Eqs. (17) and (18) for $\xi_{i}$ and $\xi_{i}^{*}$.

$\min_{\vec{w},\, b,\, \xi,\, \xi^{*}} \; \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \left( \xi_{i} + \xi_{i}^{*} \right)$ ........................................................... (16)

$y_{i} - \vec{w} \cdot \varphi(\vec{x}_{i}) - b \leq \varepsilon + \xi_{i}, \qquad \xi_{i} \geq 0$ …………………………………………… (17)

$\vec{w} \cdot \varphi(\vec{x}_{i}) + b - y_{i} \leq \varepsilon + \xi_{i}^{*}, \qquad \xi_{i}^{*} \geq 0$ ………………………………………….. (18)
Now let's focus on the kernel function. It is

$K(\vec{x}_{i}, \vec{x}_{j}) = \varphi(\vec{x}_{i}) \cdot \varphi(\vec{x}_{j})$ ………………………………….. (19)

So it is the dot product of the images of the vectors in the higher-dimensional space. The advantage of kernels is that we can obtain these dot products without knowing anything about the map $\varphi$ itself.
There are mainly four types of kernel function in the SVM algorithm, namely the linear, radial basis function (RBF) (Eq. (20)), polynomial (Eq. (21)) and sigmoid (Eq. (22)) kernels. In this project we have used the RBF, polynomial and sigmoid kernels and compared the results.
$K(\vec{x}_{i}, \vec{x}_{j}) = \exp\left( -\gamma \lVert \vec{x}_{i} - \vec{x}_{j} \rVert^{2} \right)$, where $\gamma$ is a parameter ……………………….. (20)

$K(\vec{x}_{i}, \vec{x}_{j}) = \left( \vec{x}_{i} \cdot \vec{x}_{j} + r \right)^{d}$, where $d$ and $r$ are parameters …………………….. (21)

$K(\vec{x}_{i}, \vec{x}_{j}) = \tanh\left( \gamma \, \vec{x}_{i} \cdot \vec{x}_{j} + r \right)$, where $\gamma$ and $r$ are parameters ……………... (22)
In the RBF kernel, $\lVert \vec{x}_{i} - \vec{x}_{j} \rVert^{2}$ is the squared Euclidean distance between two feature vectors and $\gamma$ is a parameter which determines how much influence a single training data point has. The RBF kernel is a function whose value depends on the distance from the origin or from some reference point.
In the polynomial kernel, $d$ is the degree of the kernel and $r$ is a constant term. Here we simply raise the (shifted) dot product to the power $d$.
An interesting fact to note is that the shape of the kernel function directly influences the values obtained by the SVR. Similarly, the constant $C$ in Eq. (16) and the parameters $\gamma$, $r$ and $d$ in Eqs. (20)–(22) should be optimized.
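As a small illustration of Eqs. (20)–(22), the three kernels can be written directly in a few lines of Python; the parameter values and input vectors below are arbitrary examples, not values used in our experiments.

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma=0.1):
    # Eq. (20): exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

def poly_kernel(x_i, x_j, d=2, r=1.0):
    # Eq. (21): (x_i . x_j + r)^d
    return (np.dot(x_i, x_j) + r) ** d

def sigmoid_kernel(x_i, x_j, gamma=0.1, r=0.0):
    # Eq. (22): tanh(gamma * x_i . x_j + r)
    return np.tanh(gamma * np.dot(x_i, x_j) + r)

# Example evaluation on two arbitrary feature vectors
x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 3.0])
print(rbf_kernel(x_i, x_j), poly_kernel(x_i, x_j), sigmoid_kernel(x_i, x_j))
```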
2. Experiment and Results:
Using plotly we obtained a figure of the opening and closing prices of the actual data against the date. After fitting the model we predict the close price using different kernels; here it is shown for the next 30 days.
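A minimal plotly sketch of such a figure is shown below; it assumes `data` is the DataFrame returned by yfinance in the earlier sketch, so the column names are an assumption tied to that example.

```python
# Minimal sketch of the open/close price plot; assumes `data` is the yfinance DataFrame.
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=data.index, y=data["Open"].to_numpy().ravel(), name="Open"))
fig.add_trace(go.Scatter(x=data.index, y=data["Close"].to_numpy().ravel(), name="Close"))
fig.update_layout(title="Opening and closing price vs date",
                  xaxis_title="Date", yaxis_title="Price")
fig.show()
```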
Now, the question is which kernel we should use for a better result. Let's look at a statistical measure called the R-squared ($R^2$) score. It represents the proportion of the variance of the dependent variable that is explained by the independent variable(s) in a regression model. We have calculated the $R^2$ value for each kernel. For $R^2 = 1$ the model is considered a perfect fit, while for $R^2 \le 0$ it is considered a poor model. Based on this measure we can clearly see that the sigmoid kernel is a very bad option for our dataset, which is why we have not considered it for the 30-day prediction. Comparing the scores for the RBF and polynomial kernels, we conclude that for our experiment the RBF kernel is the best model.
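A sketch of this comparison with scikit-learn is given below; it assumes the `X_train`, `X_test`, `y_train`, `y_test` split from the earlier pipeline sketch, and the hyperparameter values are example choices rather than our exact settings.

```python
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# One SVR model per kernel; parameter values are illustrative
kernels = {
    "rbf": SVR(kernel="rbf", C=100, gamma=0.1),
    "poly": SVR(kernel="poly", degree=2, C=100),
    "sigmoid": SVR(kernel="sigmoid", C=100, gamma=0.1),
}

# Fit each model and compare R^2 on the held-out test data
for name, model in kernels.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {score:.3f}")
```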
Final Report:

Review:

Let's first go through briefly what we have done so far. We described how the SVR algorithm works by giving a theoretical explanation. We then moved on to our model and created three different models using three different kernels: RBF (radial basis function), polynomial (degree = 2) and sigmoid. The graphical representation of close price vs. date was presented for these different models.

We used a specific set of values for the parameters of the RBF kernel: C = 100 and gamma = 0.1. For this model we got almost 93% accuracy, while the polynomial kernel model gave us almost 74% accuracy. [Here the term "accuracy" is used for the percentage value of the R-squared score. For example, if the R-squared value of model 1 is 0.78, we say the model is 78% accurate.]

We created another model for the RBF kernel with the default parameters, i.e. C = 1 and gamma = 1. For this new model we got almost 83% accuracy. Here we present the graphs for the two different RBF models.

Fig.: RBF model with default parameters
Fig.: RBF model with C = 100 and gamma = 0.1
Ideas to improve accuracy:

First, we tried to improve the accuracy of the RBF model with default parameters by tuning the hyperparameters. To find a good set of values for C and gamma, we used the grid search method: we took a number of candidate values of C and gamma as parameters with 5-fold cross-validation, and grid search returned the best pair of C and gamma from those candidates. For this set of values, the RBF model gave almost 96.5% accuracy.
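A sketch of this grid search with scikit-learn is shown below; the candidate grid values are examples and not necessarily the exact grid we used, and the earlier `X_train`, `y_train` split is assumed.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Candidate values for C and gamma; the exact grid used in the project may differ
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1]}

# 5-fold cross-validation over the grid, scored by R^2
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated R^2:", search.best_score_)
```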
Another idea is to create a hybrid model. Instead of searching for an optimal set of hyperparameters for the RBF kernel, we tried a hybrid model so that we can get better accuracy than the polynomial model. To create the hybrid model, we propose a mixed kernel function: let "A" be the RBF kernel function and "B" the polynomial kernel function, and define a kernel K = aA + (1 - a)B, where "a" is a positive constant with 0 < a < 1. In this way we can use the mixed kernel and benefit from both the RBF and the polynomial kernel. We propose that the value of "a" should be taken small so that the impact of the RBF part is less. Unfortunately, we couldn't provide an experimental proof or any result to validate this idea due to some issues while running the Python code; the last paper in the reference list gives some theoretical support for it. A sketch of how such a mixed kernel could be implemented is given below.
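One possible implementation route is scikit-learn's support for callable kernels. The sketch below only illustrates the idea; the values of `a`, `gamma`, `degree` and `coef0` are assumed example values, and this is not the code we ran in the project.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def mixed_kernel(X1, X2, a=0.2, gamma=0.1, degree=2, coef0=1.0):
    # K = a * RBF + (1 - a) * polynomial; a small 'a' gives the RBF part less weight
    A = rbf_kernel(X1, X2, gamma=gamma)
    B = polynomial_kernel(X1, X2, degree=degree, coef0=coef0)
    return a * A + (1 - a) * B

# SVR accepts a callable kernel that returns the Gram matrix between two sets of samples
model = SVR(kernel=mixed_kernel, C=100)
model.fit(X_train, y_train)
```

Since a convex combination of two valid (positive semi-definite) kernels is itself a valid kernel, this construction is mathematically well defined.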
So, we provide another idea. Before discussing this last idea, let's explain what the polynomial and RBF kernels do. Polynomial and radial basis function (RBF) kernels have complementary strengths: polynomial kernels perform better for extrapolation, while RBF kernels give a better fit in the region covered by the training data. The polynomial kernel has strong generalizability but weak learning capacity; by contrast, the Gaussian RBF kernel has strong learning capacity but weak generalizability. Based on these ideas, we propose a hybrid model in which we first train an RBF kernel model on the original training dataset and then predict the values for the training inputs. We then create a second, polynomial kernel model, trained on the original x values of the training data and the predicted y values from the RBF model. This is our hybrid model, and it gives us 82% accuracy, which is better than the polynomial kernel alone. Note: we are simply proposing this idea without a strong theoretical proof, but the 82% accuracy above is the result we got for this hybrid kernel; a sketch of the two-stage pipeline follows.
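The sketch below illustrates the two-stage idea, again assuming the earlier training/testing split; the hyperparameter values are example choices rather than the exact ones from our experiment.

```python
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Stage 1: fit an RBF model on the original training data
rbf_model = SVR(kernel="rbf", C=100, gamma=0.1)
rbf_model.fit(X_train, y_train)

# Stage 2: fit a polynomial model on the RBF model's predictions for the training inputs
y_train_rbf = rbf_model.predict(X_train)
hybrid_model = SVR(kernel="poly", degree=2, C=100)
hybrid_model.fit(X_train, y_train_rbf)

# Evaluate the hybrid model on the held-out data
print("Hybrid R^2:", r2_score(y_test, hybrid_model.predict(X_test)))
```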
Limitations and Conclusion:

· Finding optimal hyperparameters using grid search is computationally expensive.

· We still need a strong theoretical proof to verify the effectiveness of the hybrid model for different datasets. It also depends on the user how much of the RBF and polynomial parts are needed to create a good hybrid model for the data.

· Along with high prediction accuracy, SVR is easy to implement and robust to outliers, but it is not suitable for very large datasets. Since our dataset is small, the results might be a consequence of overfitting.

· Hybrid Model-2 gives a good accuracy of 82%.

Reference:

*Remark:
The proposed ideas for hybrid models 1 and 2 may not work for every dataset; we are simply proposing ideas that, in our view, might work, without any rigorous proof. Also, these kinds of simple models may not work for real-life stock market prediction, as the real stock market depends on a lot of factors. Thank you.