# Neural ODEs as continuous network layers

In a previous article we talked about how to put neural networks inside ODEs to learn their dynamics from data. Armed with that knowledge we created a powerful weather forecasting model. But learning the dynamics of a process is only one side of the neural ODE story, they can also be used as very flexible function approximators much like regular neural network. In this article we are going to create continuous neural network layers using neural ODEs and see how they can be used to classify the Fashion MNIST dataset.

# Forecasting the weather with neural ODEs

Weather forecasting is a tricky problem. Traditionally, it has been done by manually modelling weather dynamics using differential equations, but this approach is highly dependent on us getting the equations right. To avoid this problem, we can use machine learning to directly predict the weather, which let’s us make predictions without modelling the dynamics. However, this approach requires huge amounts of data to reach good performance. Fortunately, there is a middle ground: What if we instead use machine learning to model the dynamics of the weather?

# Probabilistic modeling using normalizing flows pt.2

This is a follow-up post where we will see how to apply a normalizing flow model to learn the density of observed data. If you are not familiar with these models I recommend checking out the first part which explains how normalizing flows work and the math behind them. The promise of normalizing flows is that we can learn probability densities over our observations without having to model our entire domain by hand.

# Probabilistic modeling using normalizing flows pt.1

Probabilistic models give a rich representation of observed data and allow us to quantify uncertainty, detect outliers, and perform simulations. Classic probabilistic modeling require us to model our domain with conditional probabilities, which is not always feasible. This is particularly true for high-dimensional data such as images or audio. In these scenarios, we would like to learn the data distribution without all the modeling assumptions. normaizing flows is a powerful class of models that allow us to do just that, without resorting to approximations.

# Bayesian inference with Stochastic Gradient Langevin Dynamics

Modern machine learning algorithms can scale to enormous datasets and reach superhuman accuracy on specific tasks. Yet, they are largely incapable of answering “I don’t know” when queried with new data. Taking a Bayesian approach to learning lets models be uncertain about their predictions, but classical Bayesian methods do not scale to modern settings. In this post we are going to use Julia to explore Stochastic Gradient Langevin Dynamics (SGLD), an algorithm which makes it possible to apply Bayesian learning to deep learning models and still train them on a GPU with mini-batched data.