Machine Learning and its Scope:

Introduction:

Machine Learning, as the name suggests, is a branch of Computer Science that deals with ideas and algorithms on programming machines to be capable of learning out of their experience. The notion of ’Learning’ here is essentially that of machines making sense of the experience(data) fed to them to make smart decisions based on them. It is important to understand here that there exists a strong correlation between the data input to our algorithm (henceforth, referred to as model) and the accuracy of the predictions it has been asked to make. It is impractical to test the model for predictions that are beyond the scope of the data on which it has been trained. Machine Learning models work best and are most useful when there is a balance in generalization from the training data to the prediction it has been asked to make. The 3 scenarios below articulate the different degrees of generalization and how it impacts the capabilities of the model:

Scenario1:

The model is trained on data related to people’s medical history and is tested for data on their work efficiency

Here, we see that the training data is not representative of the testing data i.e they (in general) have a poor correlation. This corresponds to the case of too much generalization out of the training data. It is natural to obtain incorrect predictions from the model in such cases.

Scenario2:

The model is trained on data related to a people’s medical history and is tested for data on medical prescriptions

Here, we see that the training and testing datasets have too high a correlation. This corresponds to the case of memorization. It is expected to have a high degree of accuracy in such cases. However, such implementations do not capitalize on the actual power of Machine Learning as the probability of the need to make informed guesses in such cases is pretty low.

Scenario3:

The model is trained on data related to people’s medical history and is tested for data on possible future health risks

Here, the datasets have the perfect degree of correlation to implement Machine Learning. The desired output incorporates a probabilistic factor ruling out memorization and at the same time, the training data can offer some great insights for generalization. Such implementations strike the perfect balance between generalization and memorization and effectively utilize machine learning capabilities.

Key Takeaway:

Machine Learning works best and is most useful when:

i) No Deterministic Algorithm exists to the problem
ii) Enough data is available on the problem to be tackled
iii) There is a pattern in the data or a possibility to draw some conclusion out of the data

Behind the Scenes:

Machine Learning algorithms identify patterns in data that are relevant to the problem. The algorithms, in particular, structure (providing a framework to the data to easily answer the question at hand) the unstructured data. The structured data, thus obtained, abstracts a model which along with some statistics can be used to answer the question or make predictions on new but similar data.

Applications:

Based on the above criteria, some suitable applications of Machine Learning are:

i) Movie recommendations based on user’s past views or likings
ii) Prediction of future health risks based on past medical history
iii) Sales forecasting based on previous sales
iv) Spam filters