Project as a part of CS460 - Machine Learning (2021)
By Anshada P M and Varun Manilal
Project Instructor: Dr. Liton Majumdar
PROJECT PROPOSAL
- Motivation:
An exoplanet, or extrasolar planet, is any planet beyond our solar system. Many exoplanets are known, and there are several methods of detection, such as radial velocity, microlensing and direct imaging.
However, none of these methods can detect exoplanets that are younger than 1000 years old. Here we are trying to create a machine learning model that infers the existence and masses of such planets from the gaps observed in protoplanetary disks (rotating circumstellar discs of dense gas and dust surrounding young, newly formed stars).
Protoplanetary disk of HL Tauri
- Plan: We plan to use an indirect method to probe and detect young planets in planet-forming systems.
Millimeter-wave imaging with the Atacama Large Millimeter/submillimeter Array (ALMA) has provided many images of protoplanetary disks showing rings and distinct gaps in systems like HL Tau. Using this data, together with simulations that have already been performed, we can train our model to predict the existence of exoplanets. The paper we referred to considered only the case of ring-planet interaction, whereas a planet might not actually exist in that ring. That model is also limited to low-mass planets. We would like to improve the training data so that it accounts for larger planets as well.
- Dataset and algorithm: The dataset is provided by ALMA. We aim to build a deep neural network model using a multilayer perceptron.
Examples of image data by ALMA
- References:
PROJECT MIDWAY
The deep neural network is implemented using the Sequential model from tensorflow.keras. It has an input layer of six variables: gap width, aspect ratio, viscosity, dust-to-gas ratio, Stokes number and density profile. This is followed by two hidden layers of 256 and 128 nodes and a single output node, which gives the planet mass.
The base code is given as follows:
For this planet prediction model, the ReLU activation function was applied, L2 regularisation was used, and the RMSprop algorithm was used as the optimiser.
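A minimal sketch of the network described above, assuming TensorFlow 2 with its bundled Keras API. The L2 strength, the learning rate and the mean-squared-error loss are placeholder assumptions, and `build_mlp` is a hypothetical helper name; this is an illustration, not the original base code.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_mlp(l2_strength=1e-3, learning_rate=1e-3):
    """Six disk parameters in, one planet mass out (hyperparameters are assumed)."""
    reg = regularizers.l2(l2_strength)
    model = keras.Sequential([
        keras.Input(shape=(6,)),  # gap width, aspect ratio, viscosity, ...
        layers.Dense(256, activation="relu", kernel_regularizer=reg),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(1),  # linear output: predicted planet mass
    ])
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss="mse",  # assumed loss; the report does not state it
    )
    return model


model = build_mlp()
```

Replacing `keras.optimizers.RMSprop` with `keras.optimizers.Adam` or `keras.optimizers.SGD` in `compile` gives the other optimizer variants.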
We experimented with the optimizer to minimize the error. Keras provides a range of optimizers, of which we applied Adam, an algorithm based on stochastic gradient descent (SGD). With Adam a lower training error was obtained, but the cross-validation error increased, as given below:
When SGD is used as the optimizer:
RMSprop is the best-performing optimiser, since it has the lowest cross-validation error.
The primary part of the project involves designing a convolutional neural network (CNN) for image recognition on the data obtained from ALMA. For this we are currently trying to use the ResNet50 network, whose structure we have studied. We may use AlexNet instead if it gives better results. As the optimizer in Keras we are using Adam, which uses a stochastic gradient descent method. We might try Adadelta if time permits.
Isolation Forest
The Isolation Forest algorithm has been used to attempt to classify whether or not the disks host planets. This has been implemented using the sklearn library. Functions used include make_pipeline, StandardScaler, GradientBoostingRegressor and IsolationForest. Anomalies are labelled -1 and were compared for efficiency using the dust gap as the parameter. Initial efficiencies are low, and we plan to improve them by tuning the parameters.
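The anomaly-labelling step can be sketched as follows, assuming scikit-learn. The feature matrix is synthetic stand-in data, and the `contamination` value and number of trees are illustrative assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for disk features (e.g. dust gap width plus one other parameter).
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [0.0, 10.0]])
features = np.vstack([inliers, outliers])

# Scale the features, then isolate anomalies with a forest of random trees.
detector = make_pipeline(
    StandardScaler(),
    IsolationForest(n_estimators=100, contamination=0.05, random_state=0),
)
labels = detector.fit_predict(features)  # +1 = inlier, -1 = anomaly
n_anomalies = int((labels == -1).sum())
```

The `contamination` parameter sets the expected fraction of anomalies and is one of the parameters that can be tuned.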
Contribution From Papers
We found some useful papers, from which we draw the following contributions:
- The Atacama Large Millimeter/submillimeter Array (ALMA) has enabled us to observe protoplanetary disks (PPDs) with good resolution. However, this resolution has been found to be inadequate for our purpose. Hence, before running a CNN, we need to use a super-resolution imaging technique to be able to observe the disks.
- In one of the papers, sparse modelling (SpM) was used to enhance the resolution of the image by 30% compared to the conventional algorithm. Since developing a new super-resolution algorithm is out of scope, we will try to obtain a similar algorithm to increase the resolution so that we can run the CNN.
- Using SpM, a resolution of 5 x 4 au was obtained, with which they observed the disk and ring structures of the young triple system T Tau in ALMA archival data.
- Sparse modelling: We are using sparse modelling (SpM) to reconstruct images from data sets taken by ALMA, with the field of view, the image pixel size and a set of regularization parameters as inputs. Through this method we could reduce the data size and increase efficiency. We run 10-fold cross-validation with SpM and produce the image using the parameters that give the minimum cross-validation error. A flowchart of image processing with SpM is given below:
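The cross-validation step can be illustrated on a much simpler problem. The sketch below is not the SpM imaging code: it uses ridge regression on synthetic data as a stand-in, and only shows how 10-fold cross-validation selects the regularization parameter with the lowest validation error. All function names and candidate values are assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def cv_error(X, y, lam, n_folds=10, seed=0):
    """Mean squared validation error of ridge regression over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(errors))


# Synthetic data standing in for the measurements.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

# Evaluate each candidate regularization parameter and keep the best one.
candidates = [1e-3, 1e-1, 1e1, 1e3]
cv_errors = [cv_error(X, y, lam) for lam in candidates]
best_lam = candidates[int(np.argmin(cv_errors))]
```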
References:
- Yamaguchi, M., Tsukagoshi, T., Muto, T., Nomura, H., Nakazato, T., Ikeda, S., Tamura, M. and Kawabe, R. 2021. ALMA Super-resolution Imaging of T Tau: r = 12 au Gap in the Compact Dust Disk around T Tau N.
https://arxiv.org/pdf/2110.00974.pdf
- Yamaguchi, M., Akiyama, K., Tsukagoshi, T., Muto, T., Kataoka, A., Tazaki, F., Ikeda, S., Fukagawa, M., Honma, M. and Kawabe, R. 2020. Super-resolution Imaging of the Protoplanetary Disk HD 142527 Using Sparse Modeling.
https://iopscience.iop.org/article/10.3847/1538-4357/ab899f/pdf
PROJECT FINAL
Planet Mass Prediction using Gradient Boosting Regressor
Good results were achieved with the Gradient Boosting Regressor when the number of boosting stages was 100, the maximum depth was 8, the minimum number of samples required to split a node was 50, and the minimum number of samples at a leaf was 5. The base code is:
An example of the results is summarized in the table showing the actual and the predicted mass for the cross-validation data.
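A minimal sketch of a regressor with the hyperparameters listed above, assuming scikit-learn. The data here is synthetic stand-in data with six input columns; the train/validation split and random seeds are assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the six disk parameters and the planet mass.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 6))
y = X[:, 0] ** 3 + 0.1 * X[:, 1] + 0.01 * rng.normal(size=500)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

gbr = GradientBoostingRegressor(
    n_estimators=100,      # number of boosting stages
    max_depth=8,           # maximum depth of each tree
    min_samples_split=50,  # minimum samples required to split a node
    min_samples_leaf=5,    # minimum samples at a leaf
    random_state=0,
)
gbr.fit(X_train, y_train)
predicted = gbr.predict(X_val)    # compare against y_val
r2_val = gbr.score(X_val, y_val)  # coefficient of determination on held-out data
```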
Planet mass prediction using Multi-Layer Perceptron
Planet mass prediction was tried using a multilayer perceptron with various activation functions. The optimizers tried were RMSProp, Adam and SGD. The architecture has two hidden layers with 256 and 128 nodes. We used the ReLU activation function and L2 regularization.
With RMSProp, the training error was higher than the cross-validation error, so there is no indication that the model overfits the data. The model was run for 2000 epochs. The training results are:
The other optimizer that gave good results was SGD. This was run for 600 epochs. The results are:
The errors of other models were compared with that of our model. We obtained better results than the Lodato and Kanagawa models. This graph shows the variation of the error for these models:
And this graph shows the error in our model:
Moving on to the Image data
For image processing, most of the data was collected from ALMA (primarily DSHARP) and a few other sources. Some of these images were in the form of raw data and had to undergo processing to obtain the final image. The raw images were processed using software from NASA's HEASARC. Since the image data was unlabelled, we undertook a literature survey to collect data on the disks. We had a total of close to 120 images of protostellar disks, of which close to 60 were labelled with the planet mass.
Image before and after processing
The images obtained from ALMA are quite blurred, so the disk substructures are not clearly visible. This can hamper the results. To solve this problem we use super-resolution techniques to increase the resolution and clarity of the images. We had initially planned to implement sparse modelling but later used two other algorithms, BSRGAN and SwinIR. BSRGAN uses a degradation model to synthesize low-resolution images, and SwinIR performs image restoration using the Swin Transformer.
This is the architecture of SwinIR. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. The deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has 6 Swin Transformer layers along with a residual connection.
This is a schematic illustration of the BSRGAN degradation model. BSRGAN is a blind super-resolver trained with paired low-resolution/high-resolution images. The degradation sequence is randomly shuffled. Here Biso and Baniso refer to isotropic and anisotropic Gaussian blur kernels, Ds is the downsampling operation and N represents different types of noise.
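The idea of a randomly shuffled degradation sequence can be sketched in a few lines of NumPy. This is a heavily simplified illustration, not the BSRGAN implementation: it applies one isotropic blur, one downsampling step and one Gaussian noise step in random order, and leaves out anisotropic kernels and other noise types. All function names and parameter ranges are assumptions.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius=3):
    """Normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur_iso(img, rng):
    """Isotropic Gaussian blur, applied separably along rows and columns."""
    k = gaussian_kernel_1d(sigma=rng.uniform(0.5, 2.0))
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)


def downsample(img, rng, scale=2):
    """Keep every scale-th pixel in each direction."""
    return img[::scale, ::scale]


def add_noise(img, rng):
    """Additive Gaussian noise with a randomly drawn level."""
    return img + rng.normal(scale=rng.uniform(0.01, 0.05), size=img.shape)


def degrade(img, rng):
    """Apply blur, downsampling and noise once each, in a random order."""
    ops = [blur_iso, downsample, add_noise]
    for i in rng.permutation(len(ops)):
        img = ops[i](img, rng)
    return img


rng = np.random.default_rng(0)
high_res = rng.uniform(size=(64, 64))  # stand-in for a grayscale disk image
low_res = degrade(high_res, rng)       # paired low-resolution training input
```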
These algorithms were applied to all images. Some examples are:
CNN using ResNet-50
ResNet50 is a variant of the ResNet model that has 48 convolution layers along with 1 MaxPool and 1 average pool layer. Using ResNet50 we can train very deep models and still obtain good accuracy. As the number of layers increases, the accuracy of a deep neural network saturates and then starts degrading rapidly. This is not due to overfitting, as the training error also increases. ResNet rectifies this by building the deeper model from the layers of the shallower model and adding identity (shortcut) connections to it, so that the deeper model should perform no worse than the shallower one. This is the architecture of ResNet50:
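The role of the identity shortcut can be seen in a toy example. The sketch below uses small dense layers in NumPy instead of the convolutions found in ResNet50, so it is only an illustration of the residual idea; the function names and sizes are assumptions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = relu(x @ w1) @ w2 as the residual branch."""
    return relu(relu(x @ w1) @ w2 + x)


rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 8))  # non-negative activations from earlier layers
w_zero = np.zeros((8, 8))
w_rand = 0.1 * rng.normal(size=(8, 8))

# With a zero residual branch the block reduces to the identity mapping,
# so stacking such blocks cannot make the network worse than the shallow one.
y_identity = residual_block(x, w_zero, w_zero)

# With non-zero weights the block learns a correction on top of the identity.
y_learned = residual_block(x, w_rand, w_rand)
```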
Before running the CNN, all images were converted to a resolution of 128 x 128 in RGB format. The initial results from training for 200 epochs with the Adam optimizer were:
Prediction of Other Disk Parameters
Another part of the project involved predicting other disk parameters from the image of the disk. For this we used equations from Lodato et al. (2019) and Kanagawa et al. (2016). We mainly focused on predicting the viscosity of the disk, the dust gap width and the gas gap width.
The equation used in Lodato et al. is an empirical relation to infer planet masses from observed dust gap widths. The equation is:
Here wd is the dust gap width, M* is the mass of the parent star and MP is the planet mass.
The other equation is from Kanagawa et al. and is also known as the Kanagawa model. This is a more sophisticated equation and is given by:
where wd is the gas gap width, h0 is the disk’s local aspect ratio and α is the viscosity parameter.
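For reference, the two relations can be sketched in code. The forms and constants below are the ones commonly quoted from the cited papers and are assumptions here, since the equations themselves are not reproduced in this text: Lodato et al. (2019) take the dust gap width to be k = 5.5 Hill radii, and Kanagawa et al. (2016) give MP/M* = 2.1e-3 (w/R)^2 (h/0.05)^(3/2) (α/1e-3)^(1/2). Gap widths are assumed to be normalised by the planet's orbital radius R, and the function names are hypothetical.

```python
import math


def lodato_planet_mass(dust_gap_width, stellar_mass, k=5.5):
    """Planet mass assuming the dust gap is k Hill radii wide.

    dust_gap_width is in units of the planet's orbital radius (assumption).
    Inverts w = k * (M_P / (3 M_*)) ** (1/3).
    """
    return 3.0 * stellar_mass * (dust_gap_width / k) ** 3


def kanagawa_planet_mass(gas_gap_width, aspect_ratio, alpha, stellar_mass):
    """Planet mass from the gas gap width, aspect ratio and viscosity parameter."""
    return (
        2.1e-3
        * stellar_mass
        * gas_gap_width ** 2
        * (aspect_ratio / 0.05) ** 1.5
        * (alpha / 1e-3) ** 0.5
    )


# Example values chosen only to make the arithmetic easy to check.
m_lodato = lodato_planet_mass(dust_gap_width=0.55, stellar_mass=1.0)
m_kanagawa = kanagawa_planet_mass(
    gas_gap_width=1.0, aspect_ratio=0.05, alpha=1e-3, stellar_mass=1.0
)
```

Both functions return the planet mass in the same units as `stellar_mass`.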
To model the relationship between the planet mass and the disk parameters we used polynomial regression. The idea is that once we obtain the planet mass from the CNN (ResNet50), we can find the dust gap, gas width and viscosity from the predictions of these polynomial regressions. For this we use the pre-processed dataset we had worked on before, which was created from the numeric data used earlier. The general code for these polynomial regressions is:
For the gas width, the fit is second order in mass, and the fit obtained is:
The planet mass is proportional to the cube of the dust gap. The plot gives a good fit.
The viscosity is fitted against planet mass with degree 0.5, and the fit obtained is not very good.
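A fit of this kind can be sketched with an ordinary least-squares solve in NumPy. This is an illustrative stand-in, not the project's regression code: it fits a single power term plus an offset, uses synthetic noiseless data, and `fit_power_term` is a hypothetical helper name.

```python
import numpy as np


def fit_power_term(x, y, degree):
    """Least-squares fit of y = a * x**degree + b; returns (a, b)."""
    design = np.column_stack([x ** degree, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)


# Synthetic stand-in data: mass proportional to the cube of the dust gap.
dust_gap = np.linspace(0.1, 1.0, 50)
planet_mass = 2.0 * dust_gap ** 3 + 1.0

a, b = fit_power_term(dust_gap, planet_mass, degree=3)
```

Calling the same function with `degree=2` or `degree=0.5` gives the other two kinds of fit, provided the input values are positive.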
Now, to present an example case, we take the protostellar disk V883 Orionis. V883 Orionis is a protostar in the constellation of Orion. It is assumed to be a member of the Orion Nebula cluster at 414±7 pc. The true mass of the primary planet in the disk is 0.897 Mꙩ. The predicted values are:
Planet Mass Prediction = 1.0698 Mꙩ
Gas Gap = 0.1408
Dust Gap = 0.5715
Viscosity of disk = 0.00466
The predictions for disk parameters other than the planet mass cannot be confirmed yet, as research is ongoing.
RESULTS AND INFERENCES
We are able to predict the planet mass and disk properties. The error arises from our attempt to predict Earth-sized planets as well as hot Jupiters, which leads to a very large range of masses for a comparatively small dataset.
Proposed Implementations and Ideas
We plan to run the CNN for planet mass estimation after converting the images to grayscale. This can speed up the implementation, and we assume that patterns will be better recognised in grayscale (currently being implemented).
Another idea is to train on labelled simulation data along with labelled image data, to increase the amount of training data for predicting the properties of the disk (probable implementation using an image-generation GAN).
This model can be improved if mixed data (numerical/categorical and image data) is available for all protoplanetary disks, or if it can be simulated. We believe an MLP-CNN algorithm can predict other disk properties better than what we have presented here.