The Fokker-Planck equation typically refers to the equation
\[\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\lambda x),\; f:\mathbb{R}^d\times \mathbb{R}_+ \to \mathbb{R}\]where $\lambda\geq 0$ is some given constant. As the equation is meant to model mass distributions, the chief interest is in non-negative solutions, which represent probability densities. Accordingly, for each time $t>0$ we will assume that $f(\cdot,t)$ is a probability density function on $\mathbb{R}^d$ (if the initial data is a probability density then $f(\cdot,t)$ remains a probability density for every $t>0$).
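Note in passing why mass is conserved: integrating the equation over $\mathbb{R}^d$, and assuming the flux $\nabla f + f\lambda x$ decays at infinity,
\[\frac{d}{dt}\int f(x,t)\;dx = \int \text{div}(\nabla f(x,t)+ f(x,t)\lambda x)\;dx = 0,\]which is why a probability density remains a probability density along the flow.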
However, the following equation is also considered a Fokker-Planck equation:
\[\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\nabla \phi(x))\]Here, $\phi:\mathbb{R}^d\to\mathbb{R}$ is a given function which (under some circumstances) drives the equation towards an equilibrium given by a multiple of $e^{-\phi(x)}$. The most meaningful cases are those where $\phi$ is convex; the choice $\phi(x) = \lambda|x|^2/2$ recovers the original Fokker-Planck equation above.
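As a quick check, any multiple of $e^{-\phi}$ makes the flux vanish pointwise, and is therefore a stationary solution:
\[\nabla\big(e^{-\phi(x)}\big) + e^{-\phi(x)}\nabla\phi(x) = -e^{-\phi(x)}\nabla\phi(x) + e^{-\phi(x)}\nabla\phi(x) = 0.\]Of course, for this to yield an equilibrium probability density one also needs $e^{-\phi}$ to be integrable.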
The entropy and the entropy production
If $f:\mathbb{R}^d\to\mathbb{R}$ is a probability density, one defines
\[H(f) = -\int f(x)(\log f(x)+ \lambda |x|^2/2 )\;dx\]This is called the entropy of $f$. To every $f$ we associate a function $p = p_f$ defined by
\[p = \log f(x)+ \frac{\lambda}{2}|x|^2\]This function will be called the pressure of $f$.
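The point of this definition is the identity
\[f\nabla p = f\nabla \log f + f\lambda x = \nabla f + f\lambda x,\]so the pressure gradient repackages exactly the flux appearing in the Fokker-Planck equation.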
In terms of the pressure, the Fokker-Planck equation takes the form
\[\partial_tf = \text{div}(f \nabla p)\]and the entropy can be expressed as
\[H(f) = -\int f(x) p(x)\;dx\]Lemma. If $f(x,t)$ solves the Fokker-Planck equation, then $H(f)$ is increasing in time and $$\frac{d}{dt}H(f(t)) = \int f|\nabla p|^2\;dx $$
Proof.
This follows by a basic integration by parts. First, $$ \frac{d}{dt}H(f) = -\int (\partial_t f)\, p\;dx - \int f \,\partial_t p\;dx $$ Since $\partial_t p = \partial_t \log f = f^{-1}\partial_t f$, the second term vanishes: $$ \int f \,\partial_t p\;dx = \int \partial_t f\;dx = \int \text{div}(f\nabla p)\;dx = 0$$ Therefore, using the equation and integrating by parts, $$ \frac{d}{dt}H(f(t)) = - \int \text{div}(f\nabla p)\, p\;dx = \int f |\nabla p|^2\;dx.$$
The integral on the right
\[\int f|\nabla p|^2\;dx\]is called the entropy production, and it is denoted by $D(f)$. This first lemma simply states that the derivative in time of $H(f(t))$ is equal to the entropy production, which is a non-negative quantity, and thus the entropy is always increasing.
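As a concrete one-dimensional example, let $f$ be a centered Gaussian with variance $\sigma^2$. Then $p = \log f + \frac{\lambda}{2}x^2 = \frac{1}{2}(\lambda - \sigma^{-2})x^2 + \text{const}$, so $\nabla p = (\lambda - \sigma^{-2})x$ and
\[D(f) = (\lambda - \sigma^{-2})^2\int x^2 f(x)\;dx = (\lambda - \sigma^{-2})^2\,\sigma^2,\]which vanishes precisely when $\sigma^2 = 1/\lambda$, that is, at the equilibrium Gaussian introduced below.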
Another interesting and less obvious fact is that the second derivative of $H(f(t))$ is non-positive; in other words, $D(f(t))$ is decreasing in time.
Lemma. The derivative of the entropy production is given by $$ \frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f |\nabla p|^2\;dx $$
Proof.
From the definition of $p$ it follows that $\partial_t p = \partial_t \log f$, and therefore $$ \frac{d}{dt}D(f(t)) = \frac{d}{dt}\int f |\nabla p|^2\;dx = \int (\partial_tf) |\nabla p|^2\;dx + 2 \int f(\nabla p,\nabla \partial_t \log f)\;dx$$ Now, $\partial_t \log f = f^{-1}\text{div}(f\nabla p) = \Delta p + (\nabla \log f,\nabla p)$, and since $\nabla \log f = \nabla p - \frac{\lambda}{2}\nabla |x|^2 = \nabla p - \lambda x$, $$\nabla \partial_t\log f = \nabla \Delta p + \nabla (\nabla p - \lambda x,\nabla p) = \nabla \Delta p + \nabla |\nabla p|^2 -\nabla (\lambda x,\nabla p) $$ In particular, $$ 2 \int f(\nabla p,\nabla \partial_t \log f)\;dx = 2\int f(\nabla p,\nabla \Delta p)\;dx +2\int f(\nabla p,\nabla |\nabla p|^2)\;dx-2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx$$ For the first two terms we use the Bochner identity $$ 2(\nabla p,\nabla \Delta p) = \Delta |\nabla p|^2-2|D^2p|^2,$$ which turns them into $$\int f \Delta |\nabla p|^2\;dx-2\int f|D^2p|^2\;dx + 2 \int f(\nabla p, \nabla |\nabla p|^2)\;dx,$$ and then integrate by parts in the first and third integrals: $$= \int (\Delta f) |\nabla p|^2\;dx-2\int f|D^2p|^2\;dx - 2 \int \text{div}(f\nabla p) |\nabla p|^2\;dx$$ Now, $\Delta f = \text{div}(\nabla f) = \text{div}(f\nabla p-\lambda x f) = \partial_tf - \lambda \text{div}(fx)$ and $\text{div}(f\nabla p) = \partial_t f$, so the last expression equals $$ -\int (\partial_tf) |\nabla p|^2\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx-2\int f|D^2p|^2\;dx $$ Plugging this back in, the terms involving $\partial_t f$ cancel and $$ \frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx -2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx$$ Since $\nabla (\lambda x,\nabla p) = \lambda \nabla p + \lambda (D^2p)\, x$, $$ -2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx = -2\lambda \int f(\nabla p,\nabla p)\;dx - 2\lambda \int f(\nabla p,(D^2p)\, x)\;dx$$ $$ = -2\lambda \int f |\nabla p|^2\;dx - \lambda \int f(\nabla |\nabla p|^2,x)\;dx = -2\lambda \int f |\nabla p|^2\;dx + \lambda \int \text{div}(fx) |\nabla p|^2\;dx $$ The two terms involving $\text{div}(fx)$ cancel, and in conclusion $$ \frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f|\nabla p|^2\;dx$$
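As a consistency check, let us continue the one-dimensional Gaussian example from above, taking $\lambda>0$: a centered Gaussian initial datum remains a centered Gaussian along the flow, with variance $\sigma^2(t)$ satisfying (integrate the equation against $x^2$) $\frac{d}{dt}\sigma^2 = 2 - 2\lambda\sigma^2$. Writing $u = \lambda - \sigma^{-2}$, the earlier computation gives $D(f(t)) = u^2\sigma^2$, and $\frac{d}{dt}u = \sigma^{-4}\frac{d}{dt}\sigma^2 = -2u/\sigma^2$, so $$\frac{d}{dt}D(f(t)) = 2u\Big(-\frac{2u}{\sigma^2}\Big)\sigma^2 + u^2\big(2-2\lambda\sigma^2\big) = -2u^2 - 2\lambda u^2\sigma^2,$$ which is exactly $-2\int f|D^2p|^2\;dx - 2\lambda\int f|\nabla p|^2\;dx$, since in this example $D^2p = u$ is constant.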
As a corollary, we have an exponential bound on the entropy production $D(f(t))$, since
\[\frac{d}{dt}D(f(t)) \leq - 2\lambda D(f(t))\]from which it follows (for instance by noting that $\frac{d}{dt}\big(e^{2\lambda t}D(f(t))\big)\leq 0$) that
\[D(f(t)) \leq e^{-2\lambda t} D(f(0))\]
The equilibrium distribution and exponential decay
If $f$ is a probability distribution such that $\nabla p = 0$, then there is some $c \in \mathbb{R}$ such that
\[\log f + \frac{\lambda}{2}|x|^2 = c\]In other words, $f$ must be given by
\[f = e^{c-\frac{\lambda}{2}|x|^2} = \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}\]where $Z_\lambda = \int_{\mathbb{R}^d} e^{-\frac{\lambda}{2}|x|^2}\;dx = (2\pi/\lambda)^{d/2}$ is the normalizing constant (this requires $\lambda>0$). Then, we define the equilibrium distribution function
\[f_\infty := \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}\]Such a function is a time-independent solution of the Fokker-Planck equation. In fact, it is the only stationary solution: by the formula for $\frac{d}{dt}H(f(t))$, a stationary solution must necessarily have $\nabla p = 0$ on the set where $f>0$.
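For later reference, note that the pressure of $f_\infty$ is constant: $p_{f_\infty} = \log f_\infty + \frac{\lambda}{2}|x|^2 = -\log Z_\lambda$. In particular $D(f_\infty) = 0$ and
\[H(f_\infty) = -\int f_\infty(x)\, p_{f_\infty}(x)\;dx = \log Z_\lambda \int f_\infty(x)\;dx = \log Z_\lambda.\]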
In the case $\lambda>0$ this shows $D(f(t))$ is decaying exponentially fast as $t\to \infty$. This has an important consequence: note that for every $t>0$ we have
\[H(f_\infty)-H(f(t)) = \int_t^\infty D(f(s))\;ds \leq \int_t^\infty e^{-2\lambda (s-t)}D(f(t))\;ds\]Since the last integral in $s$ equals $\frac{1}{2\lambda}D(f(t))$, we obtain the inequality
\[H(f_\infty)-H(f) \leq \frac{1}{2\lambda}D(f)\]valid for any probability density $f$ (simply take $f$ as initial data and evaluate at $t=0$). This inequality tells us that $D(f)$ controls how far $f$ is from having the maximum possible entropy. Since $D(f(t))$ decays exponentially when $\lambda>0$, we conclude that the entropy of $f(t)$ converges exponentially fast to the maximum possible entropy $H(f_\infty)$.
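To illustrate this decay concretely, here is a minimal numerical sketch in dimension one: a finite-difference discretization of $\partial_t f = \partial_x(\partial_x f + \lambda x f)$ in flux form, on a truncated interval with no-flux boundary conditions. The domain size, grid, time step and initial density below are arbitrary choices made only for illustration; the script tracks the entropy production $D(f(t))$ along the discrete flow and compares its observed decay rate with the predicted rate $2\lambda$.

```python
# Minimal sketch: 1-d Fokker-Planck  df/dt = d/dx( df/dx + lam*x*f )
# on a truncated interval with no-flux boundaries; check that the
# entropy production D(f(t)) decays at least as fast as exp(-2*lam*t).
import numpy as np

lam = 1.0                        # the constant lambda in the equation
L, N = 6.0, 241                  # truncated domain [-L, L] with N grid points
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
dt = 0.2 * dx**2                 # small explicit time step (diffusive CFL)

# initial data: an off-center Gaussian, normalized on the grid
f = np.exp(-(x - 1.5)**2 / (2 * 0.3))
f /= f.sum() * dx

def entropy_production(f):
    """D(f) = integral of f * |p'|^2 with p = log f + lam*x^2/2 (the pressure)."""
    mask = f > 1e-12                              # ignore the negligible tails
    p = np.log(f[mask]) + 0.5 * lam * x[mask]**2
    dp = np.gradient(p, dx)
    return np.sum(f[mask] * dp**2) * dx

def step(f):
    """One explicit Euler step in flux form: df_i/dt = (J_{i+1/2}-J_{i-1/2})/dx,
    with flux J = f' + lam*x*f and J = 0 at the two boundaries (no-flux)."""
    xm = 0.5 * (x[:-1] + x[1:])                   # interface midpoints
    J = (f[1:] - f[:-1]) / dx + lam * xm * 0.5 * (f[:-1] + f[1:])
    J = np.concatenate(([0.0], J, [0.0]))
    return f + dt * (J[1:] - J[:-1]) / dx

T = 2.0
times, Ds = [], []
for n in range(int(T / dt)):
    if n % 200 == 0:
        times.append(n * dt)
        Ds.append(entropy_production(f))
    f = step(f)

# Fit log D(f(t)) against t: the slope should be <= -2*lam (here lam = 1).
slope = np.polyfit(times, np.log(Ds), 1)[0]
print(f"observed decay rate of D(f(t)): {-slope:.2f}   (predicted at least {2*lam:.2f})")
```

The flux form of the scheme mirrors the structure $\partial_t f = \text{div}(\nabla f + f\lambda x)$, so the discrete total mass is conserved exactly, and up to discretization error the fitted slope of $\log D(f(t))$ should be at most $-2\lambda$.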