Skip to content

Intermediate Machine Learning in Python for Environmental Science Problems

License

Notifications You must be signed in to change notification settings

ekrell/ams_ai_shortcourse_2025

Repository files navigation

Intermediate Machine Learning in Python for Environmental Science Problems

A one-day short course presented at the American Meteorological Society (AMS) Annual Meeting 2025

  • New Orleans Ernest N. Morial Convention Center
  • January 12, 2025 at 8:00 AM - 3:30 PM Central Time (Hybrid)
  • Course Registration

Course Description

The major goal of this course is to help participants better apply ML for environmental science applications. Instead of focusing deeply on specific ML architectures, we are providing material that will be broadly useful across many environmental ML applications. With the rapid development of ML techniques, it is important for practitioners to be able to appropriately tailor these models for complex environmental applications which often use datasets that are high-dimensional and highly imbalanced.

ML model training tends to be very sensitive to the choice of hyperparameters. Beginner tutorials usually demonstrate tuning these with a simple grid search, if they demonstrate hyperparameter tuning at all. For complex environmental models, that grid search may be too computationally intensive. Here, we will demonstrate how to use packages for automatic hyperparameter tuning, and offer practical guidance on which tuning strategies are appropriate for different situations.

Another major concern is model evaluation. Using conventional metrics like accuracy and mean squared error, it is very easy to be misled regarding model performance. Here, we will demonstrate model evaluation with a variety of forecasting skill scores that are much better for capturing how the model performs for critical events instead of average performance. This is a major concern for meteorology where a model can have very high average performance, but fail to predict the extreme events (e.g. storms). In addition, we will demonstrate how to create and interpret evaluation graphics such as the receiver operating characteristic curve for a deeper understanding of model performance.

Imbalanced datasets are a major concern for ML modeling in environmental science. Very often, it is the rare events that we are most interested in predicting and where model performance is most critical. We will demonstrate several strategies for working with imbalanced datasets that can potentially improve model performance. These include sampling techniques as well as methods for generating synthetic examples of the minority class. We will also share some of the caveats and potential pitfalls associated with the methods.

Finally, we will discuss model interpretation through explainable AI (XAI) techniques. There are many reasons why users want to understand how their model works. XAI may reveal failure cases that could lead to ideas for improving the model. Or, the model may reveal that it has learned physically-realistic strategies which may help us trust the models more for use in critical situations. We will introduce XAI and demonstrate several commonly used methods. We will show how XAI can be used to investigate interesting cases in imbalanced datasets. We will also show examples of some of XAI pitfalls and how to avoid being misled by the explanations.

Authors (and their AMS talks)

Agenda

Topic Materials Instructor
Introduction slides Evan Krell
Imbalanced data slides, notebook Praveen Singh
Hyperparameter tuning notebook Christian Duff
Model evaluation notebook Evan Krell
Explainable AI notebook Evan Krell
Physics-informed ML notebook Kara D. Lamb
Small groups exercise notebook Everyone

About

Intermediate Machine Learning in Python for Environmental Science Problems

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •