Research
These are the projects from my academic research career, i.e. my Ph.D. and my current postdoc.
MeLODy | Sept 2021 - Now
Investigating how we can use machine learning in observational and computational oceanography. Specifically, we work with geophysical observations obtained from satellite altimetry, and we would like to constrain the solution with ocean models, e.g. NEMO, MITgcm. We approach this problem from a data assimilation formulation but with a machine learning perspective. Currently, we have broken this problem into 3 parts:
- Using Neural Fields to interpolate sparse, noisy sea surface height data from altimetry satellites (nerf4ssh)
- Using conditional generative models to learn surrogate models (cflow4surrgate)
- Using neural networks to solve 4DVar problems (modern4dvar)
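For readers unfamiliar with the data assimilation formulation mentioned above, here is a minimal sketch of a strong-constraint 4DVar problem on a toy scalar model. It is only an illustration under stated assumptions: the dynamics `x_{t+1} = a * x_t`, the identity observation operator, the variances `b_var` and `r_var`, and every function name are hypothetical and are not taken from the project repositories. Plain gradient descent stands in for whatever solver is actually used.

```python
# Toy strong-constraint 4DVar: estimate the initial state x0 of a scalar
# linear model from observations. All names and values are hypothetical.

def forecast(x0, a, n_steps):
    """Roll the assumed dynamics x_{t+1} = a * x_t forward from x0."""
    states = [x0]
    for _ in range(n_steps - 1):
        states.append(a * states[-1])
    return states

def cost(x0, xb, obs, a, b_var, r_var):
    """Background misfit plus observation misfit along the trajectory."""
    states = forecast(x0, a, len(obs))
    j_b = 0.5 * (x0 - xb) ** 2 / b_var
    j_o = 0.5 * sum((x - y) ** 2 for x, y in zip(states, obs)) / r_var
    return j_b + j_o

def grad(x0, xb, obs, a, b_var, r_var):
    """Analytic gradient of the cost w.r.t. x0, using x_t = a**t * x0."""
    g = (x0 - xb) / b_var
    for t, y in enumerate(obs):
        g += (a ** t) * ((a ** t) * x0 - y) / r_var
    return g

def solve_4dvar(xb, obs, a, b_var, r_var, lr=0.02, n_iter=500):
    """Minimize the cost by plain gradient descent, starting at the background."""
    x0 = xb
    for _ in range(n_iter):
        x0 -= lr * grad(x0, xb, obs, a, b_var, r_var)
    return x0

true_x0, a = 2.0, 0.9
obs = forecast(true_x0, a, 5)  # noise-free observations, for the demo only
xb = 1.0                       # deliberately poor background guess
x0_hat = solve_4dvar(xb, obs, a, b_var=1.0, r_var=0.1)
```

The estimate `x0_hat` is pulled from the background `xb` toward the value implied by the observations, with the balance set by the two assumed variances.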
ERC | Mar 2017 - Sep 2020
In this project, I looked at various ways we could use machine learning to characterize uncertainty in geoscience applications. ML is useful, but its predictions are often poorly understood. We demonstrate that two tools help counter the perception of ML as a “black box”: 1) error propagation and 2) sensitivity analysis.
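As a rough illustration of these two ideas, the sketch below uses one common first-order reading of them: sensitivities are partial derivatives of the model output with respect to each input, and input variances are propagated through those derivatives assuming independent input errors. The `model` function is a hypothetical stand-in for a trained ML model, and the finite-difference approach is an assumption chosen for simplicity, not necessarily the method used in the project.

```python
import math

def model(x):
    """Hypothetical stand-in for a trained ML model with two inputs."""
    return math.sin(x[0]) + 0.5 * x[1] ** 2

def sensitivities(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx_i at the point x."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

def propagate_variance(f, x, input_vars):
    """First-order output variance, assuming independent input errors."""
    grads = sensitivities(f, x)
    return sum(g ** 2 * v for g, v in zip(grads, input_vars))

x = [0.0, 2.0]
s = sensitivities(model, x)                        # approx. [1.0, 2.0]
var_y = propagate_variance(model, x, [0.01, 0.04])  # approx. 0.17
```

Here the second input dominates the output variance both because its derivative is larger and because its assumed input variance is larger.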
USMILE | Sept 2020 - Aug 2021
In this project, I looked at how we could use different machine learning methods to extract information from Earth system data. We were given a massive amount of reanalysis data from various sources, and we were interested in how ML could help extract information from these data cubes. However, estimating information-theoretic quantities is difficult because of the curse of dimensionality, which hinders the accurate density estimation that such metrics require. We looked at Gaussianization as a density estimator whose formulation links directly to information theory.
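To make the idea concrete, here is a minimal sketch of a single marginal Gaussianization step: each sample is replaced by the standard-normal quantile of its empirical rank. This is only the one-dimensional building block; iterative Gaussianization schemes typically alternate such a step with a linear rotation across dimensions, which is not shown. The function name and the test data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def marginal_gaussianize(samples):
    """Map 1-D samples to approximately standard-normal values via ranks."""
    n = len(samples)
    order = sorted(range(n), key=lambda i: samples[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the empirical CDF strictly inside (0, 1)
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out

# Hypothetical, strongly skewed data: cubes of evenly spaced values in (0, 1]
skewed = [(i / 200.0) ** 3 for i in range(1, 201)]
z = marginal_gaussianize(skewed)
```

The transform is monotonic, so the ordering of the samples is preserved while their marginal distribution becomes close to a standard normal.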
Machine Learning “Sprints”
These are the projects I took part in as sprints, where I was part of a multidisciplinary team looking to solve an applied problem using machine learning.
FDL Sprint (GLM) | Jun 2021 - Aug 2021
In this project, we were concerned with extracting lightning events from the Geostationary Lightning Mapper (GLM). I was part of a team that built an end-to-end machine learning pipeline that was able to help filter the point clouds and extract features that were indicative of lightning. We tried standard techniques like PCA, some deep learning techniques like autoencoders, and finally some newer state-of-the-art methods like graph neural networks.
For more resources see:
ML4CC Sprint | Jan 2021 - Mar 2021
In this project, we were concerned with predicting flood extent after a large storm via an onboard model on a satellite. I was part of a team that built an end-to-end machine learning pipeline to help do just that. We implemented a pipeline that:
- downloads multimodal, remotely sensed multispectral images
- preprocesses the heterogeneous data accordingly
- augments the images to increase the sample size for training
- trains a neural network model for image segmentation
- downloads a trained model to perform inference
- visualizes the segmented image
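The augmentation step above can be sketched as follows. For segmentation, the key point is that every geometric transform must be applied identically to the image and to its label mask so that they stay aligned. The specific transforms (flips and a 90-degree rotation), the function names, and the tiny 2x2 tile are hypothetical choices for illustration; the source does not say which augmentations the actual pipeline uses.

```python
def hflip(grid):
    """Mirror a 2-D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]

def vflip(grid):
    """Mirror a 2-D grid top to bottom."""
    return [row[:] for row in grid[::-1]]

def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]

def identity(grid):
    """Return an unchanged copy of the grid."""
    return [row[:] for row in grid]

def augment(image, mask):
    """Return the original pair plus flipped and rotated copies.

    The same transform is applied to image and mask so labels stay aligned.
    """
    transforms = [identity, hflip, vflip, rot90]
    return [(t(image), t(mask)) for t in transforms]

image = [[1, 2], [3, 4]]  # hypothetical single-band 2x2 tile
mask = [[0, 1], [0, 0]]   # hypothetical flood mask (1 = water)
pairs = augment(image, mask)
```

In each returned pair, the pixel with value 2 still coincides with the single water label, which is the property the training data needs.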
For more resources see:
- Project Overview
- GitHub Repo - the repo with all of the code
- JupyterBook - A detailed tutorial document for the end-to-end pipeline.
- Video
FDL Sprint (StarSpots) | Jun 2020 - Aug 2020
In this project, we were concerned with predicting spots on stars (as an indicator of stellar activity) from telescope data. I was part of a team that built an end-to-end machine learning pipeline to help do just that. We were able to fit the data and obtain parameters using Bayesian inference. We were also able to use a neural network with transfer learning to predict star spot properties, resulting in a 10,000x speed-up!
For more resources see: