Who Borrows in India?
Analysis of loan transactions based on social and economic factors using Debt and Investment Survey 2019
Devansh Sharma SPS
Gaurav Kanu SPS
Supervisor: Dr. Amarendra Das
Github Repository
Motivation
Plan
- We will build two models: one that classifies whether a person is eligible for a loan, and a second that estimates how large a loan can be given as a safe investment.
- Logistic regression will be used for the eligibility classifier and linear regression for the loan-amount model.
- We will then try neural networks to see whether they improve on these models.
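The two-model plan above can be sketched as follows. This is only an illustration on synthetic data, assuming scikit-learn is available; the features (`income`, `existing_debt`), the way the labels are generated, and the choice to train the amount model on eligible applicants only are all assumptions made for the example, not part of the project's data or method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: both features are hypothetical.
n = 500
income = rng.normal(50, 15, n)         # arbitrary units
existing_debt = rng.normal(20, 8, n)   # arbitrary units
X = np.column_stack([income, existing_debt])

# Hypothetical labels: eligibility depends on income minus debt, plus noise.
eligible = (income - existing_debt + rng.normal(0, 5, n) > 25).astype(int)
# Hypothetical "safe" loan amount for each applicant.
amount = 2.0 * income - 1.5 * existing_debt + rng.normal(0, 5, n)

# Model 1: logistic regression classifies eligibility.
clf = LogisticRegression(max_iter=1000).fit(X, eligible)

# Model 2: linear regression estimates the amount, fitted on eligible rows only.
mask = eligible == 1
reg = LinearRegression().fit(X[mask], amount[mask])

# Score one hypothetical applicant with both models.
applicant = np.array([[60.0, 15.0]])
p_eligible = clf.predict_proba(applicant)[0, 1]
suggested_amount = reg.predict(applicant)[0]
```

In use, the second model would only be consulted when the first one predicts eligibility, for example when `p_eligible` exceeds a chosen threshold.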
Algorithms and Datasets
Algorithms
- Support Vector Machines
- Linear Regression
- Logistic Regression
- Decision Trees/Random Forests
- Neural Networks
Datasets
- The primary dataset is yet to be identified
- Backup data from a Kaggle link
- Data from Lending Club's website
Division of Labour
- We have planned our tasks up to the project midpoint, by which we aim to complete logistic regression for loan eligibility.
- Gaurav will look for new datasets and assess the applicability of neural networks to our current model.
- Devansh will implement logistic regression.
- Reports and website will be jointly handled.
Loan Prediction using ML
Midway
The primary data for the loan prediction model comes from the National Sample Survey (NSS) 77th round, collected by the Government of India.
The raw NSS data is not as tidy as the data frame shown in the project proposal.
- Within the NSS, this data was collected under the All India Debt & Investment Survey 2019.
- The NSS surveyed over 118,000 households and over 3 lakh (300,000) individuals. The survey contains 14 blocks covering basic information and debt information.
In India, loans are usually repaid by the household as a whole rather than by individuals. At the individual level we therefore look at access to credit cards and e-wallets, since these directly affect a person's chances of getting a loan from a bank.
Savings details for individuals are not available; the data records only whether a person has been issued a credit card or an e-wallet.
Plan
- There are two approaches: working with individual-level data and working with household-level data.
- For individual data, the variables include gender, education, age, land, credit card, and e-wallet. We take gender, education, and age as independent variables and ownership of a credit card or e-wallet as the dependent variable.
- For household data, we have debt information such as whether a loan was applied for and the loan amount. Using the same independent variables, we can model loan payment as the dependent variable.
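As a rough sketch of the individual-level setup described above, the snippet below builds a feature matrix from gender, education, and age, with one binary target per model. It assumes pandas; the column names, category labels, and rows are made up for illustration and are not the survey's actual field names or values.

```python
import pandas as pd

# Hypothetical individual-level records (not real survey data).
people = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "education": ["primary", "graduate", "secondary", "graduate"],
    "age": [34, 27, 45, 52],
    "has_credit_card": [0, 1, 0, 1],
    "has_ewallet": [1, 1, 0, 0],
})

# Independent variables: one-hot encode the categorical columns, keep age numeric.
X = pd.get_dummies(
    people[["gender", "education", "age"]],
    columns=["gender", "education"],
    dtype=int,
)

# Dependent variables: one binary target for each model.
y_card = people["has_credit_card"]
y_wallet = people["has_ewallet"]
```

One-hot encoding is only one option here; an ordinal encoding of education would also be reasonable if the levels are treated as ordered.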
Algorithms and Datasets
Algorithms
- Support Vector Machines
- Logistic Regression
- Decision Trees
- Naive Bayes
Dataset Description
- Descriptive identification of sample household
- Identification of sample household
- Particulars of field operations
- Demographic and other particulars of household members
- Household characteristics
- Rural land owned
- Urban land owned
- Buildings and other constructions owned
- Livestock and poultry owned
- Transport equipment owned
- Agricultural machinery
- Financial assets
- Particulars of cash loans
- Value of transactions by household
E-wallet prediction
Here we use two algorithms, a decision tree and a Naive Bayes classifier, with e-wallet ownership as the dependent variable. Such a model can serve as an indicator of socio-economic status in the country.
- Decision tree accuracy: 0.8108472028683712
- Naive Bayes accuracy: 0.8019226854968764
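A comparison of this kind might be set up as in the sketch below, assuming scikit-learn. The data is synthetic and the specifics (integer-coded features, `GaussianNB` as the Naive Bayes variant, a depth-limited tree, a 75/25 split) are assumptions for illustration; the accuracies it produces have no relation to the figures reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic features standing in for gender, education level, and age.
n = 1000
gender = rng.integers(0, 2, n)      # 0/1 coded
education = rng.integers(0, 5, n)   # 0 (lowest) to 4 (highest), hypothetical coding
age = rng.integers(18, 80, n)
X = np.column_stack([gender, education, age])

# Synthetic target: ownership made more likely for younger, more educated people.
p = 1.0 / (1.0 + np.exp(-(0.8 * education - 0.05 * age)))
y = (rng.random(n) < p).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```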
Credit-card prediction
Here we use two algorithms, a decision tree and a Naive Bayes classifier, with credit card ownership as the dependent variable. Such a model can serve as an indicator of socio-economic status in the country.
- Decision tree accuracy: 0.9750426654558549
- Naive Bayes accuracy: 0.9348353661400679
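An accuracy figure is easier to interpret next to the accuracy of always predicting the most common class. The class balance of the survey sample is not stated here, so the helper below (a hypothetical function, shown on made-up labels) only illustrates how such a baseline could be computed.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Made-up labels: 1 = holds a credit card, 0 = does not.
toy_labels = [0] * 95 + [1] * 5
baseline = majority_baseline_accuracy(toy_labels)
```

A classifier is informative only to the extent that it beats this baseline on the same data.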
Dataset analysis
Histograms showing the distribution of credit card and e-wallet usage by gender and education
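The shares behind plots like these can be tabulated before plotting. The sketch below assumes pandas and uses a handful of made-up records with hypothetical column names; it is not the survey data.

```python
import pandas as pd

# Made-up records for illustration only.
people = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female", "male"],
    "education": ["primary", "graduate", "secondary",
                  "graduate", "primary", "secondary"],
    "has_credit_card": [0, 1, 0, 1, 0, 0],
    "has_ewallet": [1, 1, 0, 1, 0, 1],
})

# Share of each gender with and without an e-wallet (each row sums to 1).
wallet_by_gender = pd.crosstab(
    people["gender"], people["has_ewallet"], normalize="index"
)

# Share of each education level with and without a credit card.
card_by_education = pd.crosstab(
    people["education"], people["has_credit_card"], normalize="index"
)
```

Either table can then be drawn as a bar chart, for example with `wallet_by_gender.plot(kind="bar")` when matplotlib is installed.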
Multiple Logistic Regression
ROC curves (with AUC) for credit card and e-wallet
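For reference, an ROC curve and its AUC for a multiple logistic regression can be computed as in the sketch below, assuming scikit-learn. The data is synthetic, and the feature coding and coefficients used to generate it are assumptions for the example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic predictors standing in for gender, education level, and age.
n = 1000
gender = rng.integers(0, 2, n)
education = rng.integers(0, 5, n)
age = rng.integers(18, 80, n)
X = np.column_stack([gender, education, age])

# Synthetic binary outcome (e.g. ownership of a credit card or e-wallet).
p = 1.0 / (1.0 + np.exp(-(0.9 * education - 0.04 * age - 0.5)))
y = (rng.random(n) < p).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Multiple logistic regression: several predictors, one binary outcome.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# Points on the ROC curve and the area under it.
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)
```

Plotting `tpr` against `fpr` gives the curve; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.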