Team members: Ashish Panigrahi and S. Gautameshwar.
For the midway progress of the project, see here. All the code used for this project is hosted on GitHub at this link. The slides for the presentation can be found here.
As covered previously in the midway presentation, we deal briefly with the one-dimensional Ising model; in the second half of the project, we present the two-dimensional Ising spin model, where each lattice site carries either spin-\(\uparrow\) or spin-\(\downarrow\).
The Hamiltonian of the Ising model is given by
\[ H(\sigma) = - \sum_{\langle i, j \rangle} J_{ij} \sigma_i \sigma_j \]
where the summation is carried over the first nearest neighbours, i.e. \(j = i+1\) or \(j = i-1\). Hence the equivalent total "eigenenergy" of the lattice is similarly given by
\[ E(\sigma) = - \sum_{\langle i, j \rangle} J_{ij} \sigma_i \sigma_j \]
where \(\sigma_i = \pm 1\), giving a scalar value for the energy.
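As a quick illustration of this definition, the energy of a single configuration can be computed directly. The sketch below assumes a 1D chain with uniform nearest-neighbour coupling \(J\) and open boundary conditions; the function and variable names are our own, illustrative choices.

```python
import numpy as np

def ising_energy_1d(spins, J=1.0):
    """Energy of a 1D Ising chain with uniform nearest-neighbour coupling J.

    spins: 1D array of +1/-1 values; open boundary conditions are assumed here.
    """
    # Sum of sigma_i * sigma_{i+1} over all nearest-neighbour pairs
    return -J * np.sum(spins[:-1] * spins[1:])

# Example: a random 50-spin chain, as used in our 1D study
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=50)
print(ising_energy_1d(spins, J=1.0))
```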
We investigated regression models for the one-dimensional Ising spin model consisting of 50 spin-1/2 particles. The idea was to find the parameter quantifying the interaction between the spin particles, i.e. the coupling constant \(J\).
The assumption made was that the interaction is limited to only the first nearest neighbours of a spin particle (depicted by the notation \(\langle i, j \rangle\) under the summation in the equation for the Hamiltonian).
We trained our model using linear regression and its variants (Ridge and Lasso regression). Comparing the models, plain linear regression gave a poor model due to overfitting, whereas Ridge and Lasso eliminated the overfitting and gave us a fairly performant model.
To extend our model, we also experimented with the case where the interaction was assumed to extend up to the second nearest neighbour rather than only the first, and trained a model on it using Ridge regression, which gave us good results.
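A rough sketch of this regression setup is given below: every product \(\sigma_i \sigma_{i+1}\) is treated as a feature and the energies are regressed onto them, so each fitted coefficient plays the role of \(-J_{ij}\). This is a simplified reconstruction rather than the exact code from our repository; the chain length, sample count and regularisation strength are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
L, n_samples, J_true = 50, 5000, 1.0   # chain length, sample count and coupling are illustrative

# Random spin chains and their nearest-neighbour energies (open boundaries)
states = rng.choice([-1, 1], size=(n_samples, L))
energies = -J_true * np.sum(states[:, :-1] * states[:, 1:], axis=1)

def pair_features(states, offset):
    """Products sigma_i * sigma_{i+offset}, used as regression features."""
    return states[:, :-offset] * states[:, offset:]

# First-neighbour features only; stack pair_features(states, 2) alongside these
# for the second-nearest-neighbour extension. Lasso can be swapped in for Ridge.
X = pair_features(states, 1)

model = Ridge(alpha=1.0).fit(X, energies)
print(model.coef_[:5])   # each coefficient estimates -J_ij, so values close to -1 are expected
```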
The post-midway work concerns the two-dimensional Ising model: we investigate 2D lattices of spin-1/2 particles whose orientation dynamics are governed by the temperature of the lattice. There exists a critical temperature \(T_c\) beyond which an originally ferromagnetic material becomes paramagnetic.
For the testing and training data for our model, we generate our data using Monte-Carlo methods, i.e. the Metropolis algorithm.
We took a temperature range of 0.25-4.0 units [1] at an interval of 0.25 units and generated 10,000 lattices for every temperature, giving a total of 160,000 lattices.
Corresponding plots of energy and total heat capacity as a function of temperature are made to verify that the Monte-Carlo data is consistent with statistical mechanics.
[1] The unit for temperature is natural units, where we assume \(h = 1\), \(c = 1\) and so on.
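A compressed sketch of this generation step is shown below, assuming a single-spin-flip Metropolis update with periodic boundary conditions and \(k_B = 1\); the lattice size and sweep counts are deliberately small so the sketch runs quickly and do not match our production values.

```python
import numpy as np

rng = np.random.default_rng(2)
J, L = 2.0, 20   # coupling strength, and an illustrative (small) linear lattice size

def metropolis_sweep(lattice, T):
    """One Metropolis sweep: L*L single-spin-flip attempts at temperature T (k_B = 1)."""
    n = lattice.shape[0]
    for _ in range(n * n):
        i, j = rng.integers(n, size=2)
        # Energy change from flipping spin (i, j), with periodic boundaries
        nb = (lattice[(i + 1) % n, j] + lattice[(i - 1) % n, j]
              + lattice[i, (j + 1) % n] + lattice[i, (j - 1) % n])
        dE = 2.0 * J * lattice[i, j] * nb
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            lattice[i, j] *= -1

def lattice_energy(lattice):
    """Total energy with periodic boundaries, each bond counted once."""
    return -J * np.sum(lattice * (np.roll(lattice, 1, axis=0) + np.roll(lattice, 1, axis=1)))

temperatures = np.arange(0.25, 4.01, 0.25)
samples = {T: [] for T in temperatures}
for T in temperatures:
    lattice = rng.choice([-1, 1], size=(L, L))
    for _ in range(500):                 # equilibration sweeps
        metropolis_sweep(lattice, T)
    for _ in range(100):                 # far fewer samples than 10,000, to keep the sketch fast
        metropolis_sweep(lattice, T)
        samples[T].append(lattice.copy())

# Consistency check: mean energy and heat capacity C = (<E^2> - <E>^2) / T^2 per temperature
for T in temperatures:
    E = np.array([lattice_energy(s) for s in samples[T]])
    print(f"T = {T:.2f}  <E> = {E.mean():8.1f}  C = {E.var() / T**2:8.1f}")
```

The printed mean energy and heat capacity, obtained from the fluctuation relation \(C = (\langle E^2 \rangle - \langle E \rangle^2)/T^2\), are the quantities plotted against temperature for the verification described above.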
While generating our data, we use a coupling strength of \(J = 2\). How does one proceed to determine the critical temperature? As shown theoretically by Onsager, \(T_c\) is given by
\[ T_c = J / \log (1 + \sqrt{2}) \approx 2.26 \]
Afterwards, we split this data into critical and ordered/disordered sets, and we expect the model to struggle to identify the type of ordering of lattices in the vicinity of the critical temperature.
In the left plot, the region marked red refers to the vicinity of the critical temperature, where we label our lattices to be in the critical phase. The whole premise is for the model, given a lattice configuration, to be able to determine whether the phase is ordered or disordered.
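In code, this split reduces to a temperature-based labelling rule. The sketch below uses random placeholder arrays in place of the actual Monte-Carlo lattices, and the width of the critical band is an illustrative choice rather than the exact cut-off we used.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder arrays standing in for the Monte-Carlo data:
# flattened lattices and the temperature each one was generated at
X_all = rng.choice([-1, 1], size=(1000, 20 * 20))
T_all = rng.choice(np.arange(0.25, 4.01, 0.25), size=1000)

T_c = 2.26        # critical temperature for our choice of J
window = 0.5      # half-width of the "critical" band around T_c (illustrative choice)

def phase_label(T):
    """Assign ordered / critical / disordered based on the generation temperature."""
    if T < T_c - window:
        return "ordered"
    if T > T_c + window:
        return "disordered"
    return "critical"

labels = np.array([phase_label(T) for T in T_all])
crit_mask = labels == "critical"

# Train only on ordered/disordered lattices; hold the critical ones out for evaluation
X_train = X_all[~crit_mask]
y_train = (labels[~crit_mask] == "ordered").astype(int)
X_crit = X_all[crit_mask]
y_crit = (T_all[crit_mask] < T_c).astype(int)    # nominal labels for the held-out critical set
```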
Since our data was Monte-Carlo generated, we expect the samples to be correlated. Weak classifiers such as decision trees are high-variance algorithms and are therefore not a good choice for eliminating the errors arising from such correlated data. However, introducing ensemble learning techniques, involving bagging and averaging over many decision trees, helps lower such errors along with the overall bias and variance; this is the algorithm commonly known as random forests [3].
Ensemble methods such as RF also reduce the risk of our model relying on a greedy assumption or local search that may get stuck in a local minimum (quite common with Monte-Carlo data such as ours), and generate a better-performing model.
We used scikit-learn to implement our algorithm. The training was done on our ordered and disordered lattice datasets using the RF classifier scheme. However, the critical lattice samples were not used for training, since we use them as a certificate for verifying the reliability of our model when classifying between the ordered and disordered phases, especially within the critical regime (the region we earlier marked red).
In the RF algorithm, two hyperparameters play a significant role in determining whether our model works well:

- n_estimators, i.e. the number of decision trees in our forest.
- min_samples_split, i.e. the minimum number of samples required to split a node, which effectively sets how coarse or fine each tree is.

During the implementation, we compared coarse trees (min_samples_split = 10000) with fine trees (min_samples_split = 2), as sketched below.
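A minimal sketch of this comparison with scikit-learn's RandomForestClassifier is given below, reusing the array names from the labelling sketch above (again with placeholder data standing in for the real lattices).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholders standing in for X_train, y_train, X_crit, y_crit from the labelling sketch
rng = np.random.default_rng(4)
X_train = rng.choice([-1, 1], size=(2000, 400))
y_train = rng.integers(0, 2, size=2000)
X_crit, y_crit = X_train[:200], y_train[:200]

for name, split in [("coarse", 10000), ("fine", 2)]:
    clf = RandomForestClassifier(
        n_estimators=100,          # number of trees in the forest
        min_samples_split=split,   # very large value -> coarse trees; 2 -> fully grown trees
        n_jobs=-1,
        random_state=0,
    )
    clf.fit(X_train, y_train)
    print(name,
          "train:", round(clf.score(X_train, y_train), 3),
          "critical:", round(clf.score(X_crit, y_crit), 3))
```

With the real Monte-Carlo lattices in place of the placeholders, this is the coarse-versus-fine comparison whose scores are discussed below.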
Mehta et al. [1] implemented the random forests algorithm on Monte-Carlo generated data, and as seen in the above plot, we obtain training and testing scores of almost 100%, which is expected since both the training and testing data were generated using the same Monte-Carlo approach.
However, the catch shows up when looking at the scores for the critical samples. Accuracy scores of 69.2% and 83.1% were seen for coarse and fine trees respectively, at 100 estimators each.
Is there a way we can improve on this score?
In our original model, implemented using the standard RF algorithm, all lattices, regardless of their state of order, are given equal importance (weightage), based on which the hierarchy of features in the decision trees is determined.
Since we know (and also saw in the results by Mehta et al.) that the model struggles in the critical regime, our approach involves giving more weightage to lattice datapoints closer to the critical temperature over the ones that are farther away from it. This allows the model to pick more prominent features in these lattices.
It also lowers the priority of features present at the extreme ends of the ordering, i.e. completely ordered/disordered lattices, since they don't play as important a role as the lattices in the neighbourhood of the critical point.
We simply cloned (duplicated) the Ising lattice data for points closer to the critical temperature \(T_c\) in order to increase the weights of these lattices relative to the farther ones, for which no cloning was done.
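A sketch of this cloning step is shown below; the schedule for how many copies each sample receives as a function of its distance from \(T_c\) is an illustrative choice on our part, not necessarily the exact one we used.

```python
import numpy as np

def clone_near_critical(X, y, T, T_c=2.26, max_copies=4):
    """Duplicate samples generated close to T_c so they carry more weight in training.

    The number of copies decays with the distance |T - T_c|; far-away samples are kept once.
    """
    copies = np.maximum(1, max_copies - np.floor(np.abs(T - T_c) / 0.5).astype(int))
    idx = np.repeat(np.arange(len(y)), copies)
    return X[idx], y[idx]

# Placeholder data standing in for the ordered/disordered training lattices
rng = np.random.default_rng(5)
X_train = rng.choice([-1, 1], size=(1000, 400))
y_train = rng.integers(0, 2, size=1000)
T_train = rng.choice(np.arange(0.25, 4.01, 0.25), size=1000)

X_weighted, y_weighted = clone_near_critical(X_train, y_train, T_train)
# The enlarged training set is then fed to the same RandomForestClassifier as before.
```

A roughly equivalent alternative would be to pass a sample_weight array to the classifier's fit method instead of physically duplicating rows; we describe the cloning variant here since that is what our approach uses.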
The results we obtained with our weighted approach show an increase in the accuracy of our model when classifying critical samples compared to vanilla RF. Interestingly, by emphasising lattice samples closer to the critical temperature \(T_c\), our approach works better for coarse trees, with an accuracy of 92.0%, compared to fine ones (88.75%).
Overall, we see a huge improvement over a vanilla RF critical score of 69.2%.
Our modified random forests algorithm is effective especially in classifying lattices in the critical region (along with other regions). Unfortunately, a downside to this algorithm is that it is computationally expensive with a runtime of ~90-170 seconds.
The accuracy of our model is comparable to that of other approaches such as convolutional neural networks (CNNs), which reach an accuracy of ~88-92% [1].
However, we need good analytical data for the material in order to implement this algorithm, since the supervised approach demands that we know its critical temperature beforehand, with labels for ordered/disordered lattices assigned manually, to obtain a well-trained model.
As a next step, we hope to move to an unsupervised approach, using a "generative" as opposed to a "discriminative" model. Such models are capable of generating new critical-phase lattices using information from the lattices in our training data. This is achieved using restricted Boltzmann machines and deep Boltzmann machines [2].
[1] Mehta, Pankaj, et al. "A high-bias, low-variance introduction to machine learning for physicists." Physics Reports 810 (2019): 1-124.
[2] Morningstar, Alan, and Roger G. Melko. "Deep learning the Ising model near criticality." arXiv preprint arXiv:1708.04622 (2017).
[3] Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.
[4] Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning. Cambridge University Press, 2020.