# The Fokker-Planck equation: entropy and convergence to equilibrium

(29 Apr 2023)

The Fokker-Planck equation typically refers to the equation

$\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\lambda x),\; f:\mathbb{R}^d\times \mathbb{R}_+ \to \mathbb{R}$

where $\lambda\geq 0$ is a given constant. Since the equation is meant to model mass distributions, the main interest is in non-negative solutions that represent probability densities. Accordingly, for each time $t>0$ we assume that $f(\cdot,t)$ is a probability density on $\mathbb{R}^d$ (if the initial data is a probability density, then so is $f(\cdot,t)$ for each $t>0$).

However, the following equation is also considered a Fokker-Planck equation:

$\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\nabla \phi(x))$

Here, $\phi:\mathbb{R}^d\to\mathbb{R}$ is a given function which (under some circumstances) drives the equation towards an equilibrium given by a multiple of $e^{-\phi(x)}$. The most meaningful cases are those where $\phi$ is convex, and $\phi(x) = \lambda|x|^2/2$ corresponds to the original Fokker-Planck equation.
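As a sanity check on the quadratic case $\phi(x) = \lambda|x|^2/2$, here is a minimal Python sketch in dimension $d=1$. It assumes the standard explicit Gaussian solution, with mean $m_0e^{-\lambda t}$ and variance $v(t) = 1/\lambda + (v_0 - 1/\lambda)e^{-2\lambda t}$, which is not derived in this post; the names `LAM`, `M0`, `V0` and their values are purely illustrative. The code estimates the residual $\partial_t f - (\partial_{xx} f + \lambda\,\partial_x(xf))$ by central differences, and it should come out close to zero.

```python
import math

LAM = 1.5          # the constant lambda (illustrative value)
M0, V0 = 2.0, 0.3  # initial mean and variance (illustrative values)

def mean(t):
    return M0 * math.exp(-LAM * t)

def var(t):
    # Relaxes from V0 to the equilibrium variance 1/LAM.
    return 1.0 / LAM + (V0 - 1.0 / LAM) * math.exp(-2.0 * LAM * t)

def f(x, t):
    """Gaussian density with mean(t) and var(t) (assumed explicit solution)."""
    v = var(t)
    return math.exp(-(x - mean(t)) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def residual(x, t, h=1e-4):
    """Central-difference estimate of f_t - (f_xx + LAM * (x f)_x)."""
    f_t = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    f_xx = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h ** 2
    xf_x = ((x + h) * f(x + h, t) - (x - h) * f(x - h, t)) / (2.0 * h)
    return f_t - (f_xx + LAM * xf_x)

# Largest residual over a few sample points; it should be tiny.
worst = max(abs(residual(x, t))
            for x in (-1.0, 0.0, 0.5, 2.0)
            for t in (0.1, 0.5, 2.0))
print(worst)
```

The residual is only zero up to finite-difference and rounding error, so the check is a plausibility test rather than a proof.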

## The entropy and the entropy production

If $f:\mathbb{R}^d\to\mathbb{R}$ is a probability density, one defines

$H(f) = -\int f(x)(\log f(x)+ \lambda |x|^2/2 )\;dx$

This is called the entropy of $f$. To every $f$ we associate a function $p = p_f$ defined by

$p(x) = \log f(x)+ \frac{\lambda}{2}|x|^2$

This function will be called the pressure of $f$.

In terms of the pressure, the Fokker-Planck equation takes the form

$\partial_tf = \text{div}(f \nabla p)$

and the entropy can be expressed as

$H(f) = -\int f(x) p(x)\;dx$

Lemma. If $f(x,t)$ solves the Fokker-Planck equation, then $H(f(t))$ is non-decreasing in time and $$\frac{d}{dt}H(f(t)) = \int f|\nabla p|^2\;dx$$

Proof.

This follows from an integration by parts. First, $$\frac{d}{dt}H(f) = -\int (\partial_t f) p\;dx - \int f \partial_t p\;dx$$ Since $\frac{\lambda}{2}|x|^2$ does not depend on $t$, we have $\partial_t p = \partial_t \log f = f^{-1}\partial_t f$, and so $$\int f \partial_t p\;dx = \int \partial_t f\;dx = \int \text{div}(f\nabla p)\;dx = 0$$ Therefore $$\frac{d}{dt}H(f(t)) = - \int \text{div}(f\nabla p) p\;dx = \int f |\nabla p|^2\;dx.$$

The integral on the right

$\int f|\nabla p|^2\;dx$

is called the entropy production, and it is denoted by $D(f)$. The lemma above states that the time derivative of $H(f(t))$ equals the entropy production, which is a non-negative quantity, and thus the entropy never decreases.
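The identity $\frac{d}{dt}H(f(t)) = D(f(t))$ can be tested numerically. The sketch below works in dimension $d=1$ and assumes the Gaussian solution with mean $m_0e^{-\lambda t}$ and variance $1/\lambda + (v_0-1/\lambda)e^{-2\lambda t}$ (an assumption not derived in this post). It evaluates $H$ and $D$ directly from their definitions with a trapezoid rule; all names and parameter values are illustrative.

```python
import math

LAM = 1.0            # lambda (illustrative value)
M0, V0 = 1.0, 0.25   # initial mean and variance (illustrative values)

def moments(t):
    """Mean and variance of the assumed Gaussian solution at time t."""
    m = M0 * math.exp(-LAM * t)
    v = 1.0 / LAM + (V0 - 1.0 / LAM) * math.exp(-2.0 * LAM * t)
    return m, v

def log_density(x, m, v):
    return -(x - m) ** 2 / (2.0 * v) - 0.5 * math.log(2.0 * math.pi * v)

def integrate(g, a=-12.0, b=12.0, n=4000):
    """Composite trapezoid rule for g on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return total * h

def entropy(t):
    """H(f) = -int f * p dx with p = log f + LAM * x^2 / 2."""
    m, v = moments(t)
    def integrand(x):
        log_f = log_density(x, m, v)
        return -math.exp(log_f) * (log_f + 0.5 * LAM * x * x)
    return integrate(integrand)

def production(t):
    """D(f) = int f * |p'|^2 dx, with p'(x) = -(x - m)/v + LAM * x."""
    m, v = moments(t)
    def integrand(x):
        grad_p = -(x - m) / v + LAM * x
        return math.exp(log_density(x, m, v)) * grad_p ** 2
    return integrate(integrand)

# Compare a central-difference estimate of dH/dt with D at one time.
t0, h = 0.3, 1e-4
dH_dt = (entropy(t0 + h) - entropy(t0 - h)) / (2.0 * h)
print(dH_dt, production(t0))
```

The two printed numbers should agree to several digits; the integration window $[-12, 12]$ is an ad hoc choice that is wide enough for these parameters.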

Another interesting and less obvious fact is that the second derivative of $H(f(t))$ is non-positive; in other words, $D(f(t))$ is non-increasing in time.

Lemma. The derivative of the entropy production is given by $$\frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f |\nabla p|^2\;dx$$

Proof.

From the definition of $p$ it follows that $\partial_t p = \partial_t \log f$, and $$\frac{d}{dt}D(f(t)) = \frac{d}{dt}\int f |\nabla p|^2\;dx = \int (\partial_tf) |\nabla p|^2\;dx + 2 \int f(\nabla p,\nabla \partial_t \log f)\;dx$$

Now, $\partial_t \log f = \Delta p + (\nabla \log f,\nabla p)$ and $\nabla \log f = \nabla p - \frac{\lambda}{2}\nabla |x|^2 = \nabla p - \lambda x$, so $$\nabla \partial_t\log f = \nabla \Delta p + \nabla |\nabla p|^2 -\lambda \nabla (x,\nabla p)$$

In particular, $$2 \int f(\nabla p,\nabla \partial_t \log f)\;dx = 2\int f(\nabla p,\nabla \Delta p)\;dx +2\int f(\nabla p,\nabla |\nabla p|^2)\;dx-2\lambda \int f(\nabla p,\nabla (x,\nabla p))\;dx$$

For the first term we use the identity $$2(\nabla p,\nabla \Delta p) = \Delta |\nabla p|^2-2\Gamma_2(p,p)$$ where $\Gamma_2(p,p) = |D^2p|^2$. Integrating by parts, the first two terms become $$\int f \Delta |\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx + 2 \int f(\nabla p, \nabla |\nabla p|^2)\;dx$$ $$= \int (\Delta f) |\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx - 2 \int \text{div}(f\nabla p) |\nabla p|^2\;dx$$

Now, $\Delta f = \text{div}(\nabla f) = \text{div}(f\nabla p-\lambda x f) = \partial_tf - \lambda \text{div}(fx)$ and $\text{div}(f\nabla p) = \partial_t f$, so the last expression equals $$-\int (\partial_tf) |\nabla p|^2\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx$$

Then, $$\frac{d}{dt}D(f(t)) = -2\int f\Gamma_2(p,p)\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx -2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx$$

Since $\nabla (\lambda x,\nabla p) = \lambda \nabla p + \lambda (D^2p) x$, $$-2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx = -2\lambda \int f(\nabla p,\nabla p)\;dx - 2\lambda \int f(\nabla p,D^2p\, x)\;dx$$ $$= -2\lambda \int f |\nabla p|^2\;dx - \lambda \int f(\nabla |\nabla p|^2,x)\;dx$$ $$= -2\lambda \int f |\nabla p|^2\;dx + \lambda \int \text{div}(fx) |\nabla p|^2\;dx$$

The two terms containing $\text{div}(fx)$ cancel. In conclusion, $$\frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f|\nabla p|^2\;dx$$

As a corollary, since $\int f|D^2p|^2\;dx \geq 0$, we have an exponential bound on the entropy production $D(f(t))$. Indeed,

$\frac{d}{dt}D(f(t)) \leq - 2\lambda D(f(t))$

from which it follows (by Grönwall's inequality) that

$D(f(t)) \leq e^{-2\lambda t} D(f(0))$
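Both the formula for $\frac{d}{dt}D(f(t))$ and the exponential bound can be checked on a concrete family. The sketch below is in dimension $d=1$ and assumes a Gaussian solution with mean $m(t) = m_0e^{-\lambda t}$ and variance $v(t) = 1/\lambda + (v_0-1/\lambda)e^{-2\lambda t}$ (not derived in this post). For such a Gaussian, $p'(x) = \lambda x - (x-m)/v$, which gives the closed forms $D = (\lambda - 1/v)^2 v + \lambda^2m^2$ and $\int f|D^2p|^2\;dx = (\lambda-1/v)^2$ used in the code. Names and values are illustrative.

```python
import math

LAM = 0.8            # lambda (illustrative value)
M0, V0 = -1.5, 3.0   # initial mean and variance (illustrative values)

def moments(t):
    """Mean and variance of the assumed Gaussian solution at time t."""
    m = M0 * math.exp(-LAM * t)
    v = 1.0 / LAM + (V0 - 1.0 / LAM) * math.exp(-2.0 * LAM * t)
    return m, v

def production(t):
    """D(f(t)) for the Gaussian family (closed form, d = 1)."""
    m, v = moments(t)
    return (LAM - 1.0 / v) ** 2 * v + (LAM * m) ** 2

def hessian_term(t):
    """int f |D^2 p|^2 dx for the Gaussian family (closed form, d = 1)."""
    _, v = moments(t)
    return (LAM - 1.0 / v) ** 2

def production_rate(t, h=1e-5):
    """Central-difference estimate of dD/dt."""
    return (production(t + h) - production(t - h)) / (2.0 * h)

TIMES = (0.0, 0.5, 1.0, 2.0)
for t in TIMES:
    lhs = production_rate(t)
    rhs = -2.0 * hessian_term(t) - 2.0 * LAM * production(t)
    bound = math.exp(-2.0 * LAM * t) * production(0.0)
    print(f"t={t:.1f}  dD/dt={lhs:.6f}  formula={rhs:.6f}  "
          f"D={production(t):.6f}  bound={bound:.6f}")
```

In each row the first two columns should agree, and `D` should not exceed `bound`. Replacing the factor $2\lambda$ by $\lambda$ in `rhs` breaks the agreement, which is a quick way to confirm the constant.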

## The equilibrium distribution and exponential decay

If $f$ is a probability distribution such that $\nabla p = 0$, then there is some $c \in \mathbb{R}$ such that

$\log f + \frac{\lambda}{2}|x|^2 = c$

In other words, $f$ must be given by

$f = e^{c-\frac{\lambda}{2}|x|^2} = \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}$

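where the constant $Z_\lambda = e^{-c}$ is fixed by the requirement $\int f\;dx = 1$; for $\lambda>0$ this gives $Z_\lambda = (2\pi/\lambda)^{d/2}$. We therefore define the equilibrium distribution function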

$f_\infty := \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}$

Such a function is a time-independent solution to the Fokker-Planck equation – indeed, it is the only stationary solution, since a stationary solution must necessarily have $\nabla p = 0$ in the set where $f>0$ thanks to the formula for $\frac{d}{dt}H(f(t))$.
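To make the equilibrium concrete, the following sketch (dimension $d=1$, with an illustrative value of $\lambda$) compares a numerical value of the normalizing constant with the Gaussian integral $\sqrt{2\pi/\lambda}$, and checks that the pressure of $f_\infty$ is constant, so that $\nabla p = 0$. The helper names are hypothetical.

```python
import math

LAM = 2.0  # lambda (illustrative value, must be positive here)

def integrate(g, a=-10.0, b=10.0, n=2000):
    """Composite trapezoid rule for g on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return total * h

# Normalizing constant: numerical value versus the 1D Gaussian integral.
Z_numeric = integrate(lambda x: math.exp(-0.5 * LAM * x * x))
Z_exact = math.sqrt(2.0 * math.pi / LAM)

def f_inf(x):
    """Equilibrium density exp(-LAM * x^2 / 2) / Z."""
    return math.exp(-0.5 * LAM * x * x) / Z_exact

def pressure(x):
    """p = log f + LAM * x^2 / 2, evaluated at the equilibrium density."""
    return math.log(f_inf(x)) + 0.5 * LAM * x * x

print(Z_numeric, Z_exact)
print([pressure(x) for x in (-2.0, 0.0, 1.0, 3.0)])
```

The pressure values should all equal $-\log Z_\lambda$ up to rounding, which is the constant $c$ from the computation above.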

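In the case $\lambda>0$, the bound $D(f(t)) \leq e^{-2\lambda t} D(f(0))$ shows that $D(f(t))$ decays exponentially fast as $t\to \infty$. This has an important consequence: assuming that $H(f(s)) \to H(f_\infty)$ as $s\to\infty$, and applying the same bound starting from time $t$, for every $t>0$ we have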

$H(f_\infty)-H(f(t)) = \int_t^\infty D(f(s))\;ds \leq \int_t^\infty e^{-2\lambda (s-t)}D(f(t))\;ds$

Since the last integral in $s$ is equal to $\frac{1}{2\lambda}D(f(t))$, we obtain the inequality

$H(f_\infty)-H(f) \leq \frac{1}{2\lambda}D(f)$

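valid for every probability density $f$ (apply the argument at the initial time to the solution starting from $f$). This inequality tells us that $D(f)$ bounds how far $f$ is from having the maximum possible entropy. Combined with the exponential decay of $D(f(t))$ when $\lambda>0$, it gives $H(f_\infty)-H(f(t)) \leq \frac{1}{2\lambda}e^{-2\lambda t}D(f(0))$, so the entropy of $f(t)$ converges exponentially fast to the maximum possible entropy.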
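The inequality $H(f_\infty)-H(f) \leq \frac{1}{2\lambda}D(f)$ can be spot-checked on Gaussians. The sketch below is restricted to dimension $d=1$ and to Gaussian densities with mean $m$ and variance $v$, for which $H = \frac{1}{2}\log(2\pi e v) - \frac{\lambda}{2}(v+m^2)$ and $D = (\lambda-1/v)^2v + \lambda^2m^2$; these closed forms are computed from the definitions of $H$ and $D$ and are assumptions of the example, as are the names and the sample values.

```python
import math

LAM = 0.7  # lambda (illustrative value, must be positive here)

def entropy(m, v):
    """H(f) for a 1D Gaussian with mean m and variance v (closed form)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * v) - 0.5 * LAM * (v + m * m)

def production(m, v):
    """D(f) for a 1D Gaussian with mean m and variance v (closed form)."""
    return (LAM - 1.0 / v) ** 2 * v + (LAM * m) ** 2

# The equilibrium is the centered Gaussian with variance 1/LAM.
H_INF = entropy(0.0, 1.0 / LAM)

def gap(m, v):
    """Entropy deficit H(f_inf) - H(f)."""
    return H_INF - entropy(m, v)

cases = [(m, v)
         for m in (-2.0, 0.0, 0.5, 3.0)
         for v in (0.05, 0.5, 1.0 / LAM, 2.0, 10.0)]

# Collect any sample where the deficit exceeds D / (2 * LAM).
violations = [(m, v) for m, v in cases
              if gap(m, v) > production(m, v) / (2.0 * LAM) + 1e-12]
print(violations)
```

An empty list means the inequality held on every sample. This only tests a two-parameter family, so it illustrates the statement rather than verifying it in general.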