Birds and frogs

OT+MFG reading 1: Variational Mean Field Games

2023-05-29T00:00:00+00:00

This will be one of (hopefully) a complete sequence of posts I am writing as I read about Optimal Transport and/or Mean Field Games. I will not try to give anything like a complete picture or historical introduction here; instead I will discuss and summarize papers in the order that I read them, which will likely result in a very nonlinear chronology. For now I will say that I have fond memories of my time as a graduate student circa 2006-2010, when each year for a few weeks in February Pierre-Louis Lions would come to Austin and lecture on Mean Field Games. At the time I barely knew any PDE and I could only follow the first couple of minutes of each lecture, but those lectures were the first place where I heard about hydrodynamic limits, Nash equilibria, and infinite dimensional Hamilton-Jacobi equations. Maybe as I work my way back to the papers my memories will get refreshed and I might be able to add more to my recollections.

I am going to start my reading with the paper Variational Mean Field Games by Benamou, Carlier, and Santambrogio in Active Particles, 2017.

The basic problem

The motivating question behind mean field games is understanding $N$-player differential games when $N$ is large, i.e. in the limit $N\to \infty$.

Let us first describe the $N$-player game. For each $i$ ($i=1,\ldots,N$) player $i$ chooses their trajectory $x_i(t)$ by optimizing (minimizing) the objective functional

\[x_i \mapsto \int_0^T \frac{1}{2}|\dot x_i(t)|^2 + g_i(x_1(t),\ldots,x_N(t)) \;dt + \Psi_i(x_i(T))\]

This objective functional covers the time interval $[0,T]$ and it is made out of three parts. First there is a "kinetic energy" term. Second there is the integral over time of the quantity $g_i(x_1(t),\ldots,x_N(t))$, which encodes how the different players interact with each other. Lastly, there is a term $\Psi_i$ which is a contribution to the objective functional depending only on the position of player $i$ at the final time $T$.

The interaction terms $g_1,\ldots,g_N$ are chosen so as to model a key feature of the game, which is that for each player the other $N-1$ players are indistinguishable from each other. This means that each $g_i$ has the same value if one reshuffles all the players other than $i$, and that this relation is the same for each $i$. Indeed, in such a case we can express all the $g_i$ in the form

\[g_i(x_1,\ldots,x_N) = g(x_i, \frac{1}{N-1}\sum \limits_{j\neq i}\delta_{x_j})\]

where $g(x,\mu)$ is a real valued function that depends on a point $x \in \mathbb{R}^d$ and a probability distribution $\mu \in \mathcal{P}(\mathbb{R}^d)$, so $g:\mathbb{R}^d\times \mathcal{P}(\mathbb{R}^d)\to\mathbb{R}$.
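To make the symmetry concrete, here is a small Python sketch (not from the paper). The particular choice $g(x,\mu) = \int e^{-|x-y|^2}\,d\mu(y)$ and the names `g`, `g_i` are assumptions made only for illustration; the point is that a cost of this form does not change when the other players are reshuffled.

```python
import numpy as np

def g(x, others):
    """Hypothetical cost g(x, mu), with mu the empirical measure of `others`.

    Illustrative choice: g(x, mu) = integral of exp(-|x - y|^2) d mu(y).
    """
    diffs = others - x                               # shape (N-1, d)
    return np.mean(np.exp(-np.sum(diffs**2, axis=1)))

def g_i(i, positions):
    """Cost seen by player i: g at x_i and the empirical measure of the rest."""
    others = np.delete(positions, i, axis=0)
    return g(positions[i], others)

rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 2))                  # N = 6 players in R^2

# Reshuffle every player except player 0.
perm = np.concatenate(([0], 1 + rng.permutation(5)))
shuffled = positions[perm]

same_value = bool(np.isclose(g_i(0, positions), g_i(0, shuffled)))
print(same_value)
```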

With this setup for each $N$, one wants to understand Nash equilibria for the game for large $N$. As the players are indistinguishable from one another, one cares about the overall distribution of players in such equilibria as $N \to \infty$, and so one comes to the problem of analyzing the $N\to \infty$ limit of the time-dependent probability measures

\[\mu^{(N)}_t := \frac{1}{N}\sum \limits_{i=1}^N \delta_{x_i^{N,*}(t)}\]

where for each $N$, the $x_1^{N,*},\ldots,x_N^{N,*}$ form a Nash equilibrium for the corresponding game.

The continuum model, part 1

The question of the convergence of the Nash equilibria as $N\to \infty$ is an important and delicate one that will not be discussed here. Instead, let us describe the problem one expects to obtain in the limit. By this we mean one where (in an ideal world) the measures $\mu^{(N)}_t$ given by the Nash equilibria converge to a $\mu_t$ which is the unique solution to this putative problem.

Suppose we are given such an evolution of probability measures $\mu_t$. Given such a distribution, what does it mean for a single agent to have an optimal trajectory? It means that its trajectory $\gamma(t)$ minimizes the functional

\[J_{\mu}(\gamma) := \int_0^T\frac{1}{2}|\dot \gamma(t)|^2 + g(\gamma(t),\mu_t)\;dt + \Psi(\gamma(T))\]

over all $\gamma$’s with given initial value $\gamma(0)$. Any minimizer of this functional has a simple characterization in terms of the value function,

\[\dot \gamma(t) = -(\nabla \phi)(\gamma(t),t)\]

where the value function $\phi$ is defined as

\[\phi(x,t) = \inf \left \{ \int_{t}^T\frac{1}{2}|\dot\gamma(s)|^2+g(\gamma(s),\mu_s)\;ds + \Psi(\gamma(T)) \mid \gamma(t) = x \right \}\]

This characterization, at least as stated, works only as long as the value function is a differentiable function. For a differentiable value function $\phi$ it is a classical fact that it must solve the Hamilton-Jacobi equation

\[-\partial_t \phi + \frac{1}{2}|\nabla \phi|^2 = h(x,t)\]

where the function $h$ is defined by $h(x,t) := g(x,\mu_t)$ for every $(x,t)$. In general the value function might not be differentiable but it will solve the equation above in the viscosity sense.
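As a sanity check on the signs in the Hamilton-Jacobi equation, here is a minimal Python sketch for a case where everything is explicit. The setting is my own assumption, not from the paper: one space dimension, $g \equiv 0$, $\Psi(x) = x^2/2$ and $T=1$, for which minimizing over straight lines gives $\phi(x,t) = x^2/(2(1+T-t))$. The code checks by finite differences that $-\partial_t\phi + \frac{1}{2}|\nabla \phi|^2$ vanishes.

```python
import numpy as np

T = 1.0

def phi(x, t):
    """Value function for g = 0 and Psi(x) = x^2 / 2 in one dimension."""
    return x**2 / (2.0 * (1.0 + T - t))

def hj_residual(x, t, h=1e-4):
    """Central-difference approximation of -d/dt phi + |d/dx phi|^2 / 2."""
    dphi_dt = (phi(x, t + h) - phi(x, t - h)) / (2 * h)
    dphi_dx = (phi(x + h, t) - phi(x - h, t)) / (2 * h)
    return -dphi_dt + 0.5 * dphi_dx**2

def velocity(x, t):
    """Optimal velocity -d/dx phi: particles head toward the origin."""
    return -x / (1.0 + T - t)

xs = np.linspace(-2.0, 2.0, 9)
ts = np.linspace(0.1, 0.9, 5)
max_residual = max(abs(hj_residual(x, t)) for x in xs for t in ts)
print(max_residual)
```

In this example the value function is smooth, so the characterization $\dot\gamma = -\nabla\phi$ applies as stated.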

The continuum model, part 2

Whatever the limiting problem as $N\to \infty$, we expect the curve of empirical probability measures $\mu^{(N)}_t$ just constructed to converge to a curve of probability measures $\mu_t$. In an equilibrium situation, each of the particles that make up $\mu^{(N)}_t$ will be moving according to the Euler-Lagrange equation, that is, their dynamics are governed by a flow (from the previous discussion we expect this flow to be $v(x,t) = -\nabla\phi(x,t)$ for a $\phi$ solving a Hamilton-Jacobi equation).

Therefore, in the limit model one expects to have a curve of measures $\mu_t$ having the form

\[\mu_t = (\Phi_t)_{\#}\mu_0\]

where $\Phi_t$ is the evolution map $\Phi_t:\mathbb{R}^d \to \mathbb{R}^d$ given by some vector field $v(x,t)$. In such a case $\mu_t$ and $v$ will solve the continuity equation

\[\partial_t \mu_t + \text{div}(\mu_t v) = 0\]
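The pushforward description is easy to experiment with at the particle level. The following Python sketch is only an illustration under my own assumptions: the field $v(x,t) = -x$ (whose flow map is $\Phi_t(x) = e^{-t}x$), a forward Euler integrator, and the helper name `push_forward`. It moves samples of $\mu_0$ along the flow and tests the resulting measure against the function $|x|^2$.

```python
import numpy as np

def push_forward(points, velocity, t_final, n_steps=1000):
    """Move each particle along dx/dt = velocity(x, t) with forward Euler steps."""
    x = points.copy()
    dt = t_final / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

def velocity(x, t):
    # Illustrative field v(x, t) = -x, with flow map Phi_t(x) = exp(-t) * x.
    return -x

rng = np.random.default_rng(1)
mu0_points = rng.normal(size=(500, 2))            # samples representing mu_0
mut_points = push_forward(mu0_points, velocity, t_final=1.0)

# Integrate |x|^2 against mu_0 and against mu_t = (Phi_t)_# mu_0.
second_moment_0 = np.mean(np.sum(mu0_points**2, axis=1))
second_moment_t = np.mean(np.sum(mut_points**2, axis=1))
ratio = second_moment_t / second_moment_0         # should be close to exp(-2)
print(ratio, np.exp(-2.0))
```

No particle is created or destroyed along the way, which is the discrete counterpart of the mass conservation expressed by the continuity equation.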

Combining this equation with the condition $v= -\nabla \phi$ from earlier, we arrive at the following system of equations

\[\left \{ \begin{array}{rl} \partial_t \mu_t - \text{div}(\mu_t \nabla \phi) & = 0 \\ -\partial_t \phi + \frac{1}{2}|\nabla \phi|^2 & = g(x,\mu_t)\end{array} \right.\]

Together with the boundary conditions

\[\phi(x,T) = \Psi(x),\; (\mu_t)_{\mid t=0} = \mu_0\]

These equations and boundary conditions form what is known as a mean field game, and we call the above the Mean Field Game (MFG) equations. A solution gives a pair $(\mu_t,\phi)$ describing an equilibrium situation for a game. Ideally, one expects (and would like to show) that the limit of the measures $\mu^{(N)}_t$ gives a $\mu_t$ which together with some $\phi$ form a solution to the MFG equations.

A variational principle

Lasry and Lions showed that for the MFG system one obtains a remarkable simplification of the problem in comparison to the case of finite $N$, that is, one can characterize equilibria for the MFG as minimizers for a variational problem.

The problem is the following (here, one needs only consider measures of the form $\mu_t = \rho(x,t) \;dx$).

\[\begin{array}{rl} \text{Minimize } & (\rho,v)\mapsto \int_0^T\int \frac{1}{2}\rho(x,t) |v(x,t)|^2 + G(x,\rho(x,t))\;dxdt + \int_{\mathbb{R}^d}\Psi(x)\rho(x,T)\;dx \\ \text{subject to } & \partial_t\rho + \text{div}(\rho v) = 0 \text{ and } \rho(x,0) = \rho_0(x) \end{array}\]

Here, $G:\mathbb{R}^d\times \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$ is the function defined by $\partial_\beta G(\alpha,\beta) = g(\alpha,\beta)$ for $\beta>0$, $G(\alpha,0) = 0$ and $G(\alpha,\beta) = +\infty$ for $\beta<0$ (in this formulation $g$ is assumed to depend on $\mu$ only through the value of its density at the point $x$, so that $g(\alpha,\beta)$ makes sense for a number $\beta$).

For those familiar with optimal transport, this resembles the Benamou-Brenier formulation of the optimal transport problem with quadratic cost. Indeed, the difference here is that we have added additional terms (the integrals involving $G$ and $\Psi$). Accordingly, the fields of optimal transport and mean field games are closely related. We will be exploring this connection in future posts.

The Fokker-Planck equation: entropy and convergence to equilibrium

2023-04-29T00:00:00+00:00

The Fokker-Planck equation typically refers to the equation

\[\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\lambda x),\; f:\mathbb{R}^d\times \mathbb{R}_+ \to \mathbb{R}\]

where $\lambda\geq 0$ is some given constant. As the equation is meant to model mass distributions, the chief interest is in non-negative solutions which represent probability densities. Then, for each time $t>0$, we will assume the function $f(x,t)$ is a probability density function in $\mathbb{R}^d$ (if the initial data is a probability distribution then $f(x,t)$ will be a probability distribution for each $t>0$).

However, the following equation is also considered a Fokker-Planck equation:

\[\partial_tf(x,t)= \text{div}(\nabla f(x,t)+ f(x,t)\nabla \phi(x))\]

Here, $\phi:\mathbb{R}^d\to\mathbb{R}$ is a given function which (under some circumstances) drives the equation towards an equilibrium given by a multiple of $e^{-\phi(x)}$. The most meaningful cases are those where $\phi$ is convex, and $\phi(x) = \lambda|x|^2/2$ corresponds to the original Fokker-Planck equation.

The entropy and the entropy production

If $f:\mathbb{R}^d\to\mathbb{R}$ is a probability density, one defines

\[H(f) = -\int f(x)(\log f(x)+ \lambda |x|^2/2 )\;dx\]

This is called the entropy of $f$. To every $f$ we associate a function $p = p_f$ defined by

\[p = \log f(x)+ \frac{\lambda}{2}|x|^2\]

This function will be called the pressure of $f$.

In terms of the pressure, the Fokker-Planck equation takes the form

\[\partial_tf = \text{div}(f \nabla p)\]

and the entropy can be expressed as

\[H(f) = -\int f(x) p(x)\;dx\]

Lemma. If $f(x,t)$ solves the Fokker-Planck equation, then $H(f)$ is increasing in time and $$\frac{d}{dt}H(f(t)) = \int f|\nabla p|^2\;dx $$

Proof.

This follows by a basic integration by parts, noting that $$ \frac{d}{dt}H(f) = -\int (\partial_t f) p\;dx - \int f \partial_t p\;dx $$ Since $\partial_t p = \partial_t \log f = f^{-1}\partial_t f$, we have $$ \int f \partial_t p\;dx = \int \text{div}(f\nabla p)\;dx = 0$$ Therefore $$ \frac{d}{dt}H(f(t)) = - \int \text{div}(f\nabla p) p\;dx = \int f |\nabla p|^2\;dx.$$

The integral on the right

\[\int f|\nabla p|^2\;dx\]

is called the entropy production, and it is denoted by $D(f)$. This first lemma simply states that the derivative in time of $H(f(t))$ is equal to the entropy production, which is a non-negative quantity, and thus the entropy is always increasing.
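The lemma can be checked by hand on Gaussians, and the following Python sketch does the bookkeeping. The assumptions are mine: dimension one, the illustrative value $\lambda = 0.7$, and the fact that a centered Gaussian of variance $s(t)$ solves the Fokker-Planck equation when $\dot s = 2-2\lambda s$. For such $f$ one has $H(f) = \frac{1}{2}\log(2\pi e s) - \frac{\lambda}{2}s$ and $\nabla p = (\lambda - 1/s)x$.

```python
import numpy as np

lam = 0.7   # the constant lambda (illustrative value, dimension one)

def variance(t, s0):
    """Variance of the Gaussian solution: ds/dt = 2 - 2*lam*s, s(0) = s0."""
    return 1.0 / lam + (s0 - 1.0 / lam) * np.exp(-2.0 * lam * t)

def entropy(s):
    """H(f) = -int f (log f + lam x^2 / 2) dx for a centered Gaussian of variance s."""
    return 0.5 * np.log(2.0 * np.pi * np.e * s) - 0.5 * lam * s

def entropy_production(s):
    """D(f) = int f |p'|^2 dx, where p' = (lam - 1/s) x for the same Gaussian."""
    return s * (lam - 1.0 / s) ** 2

s0, t, h = 3.0, 0.4, 1e-5
# Time derivative of the entropy along the solution, by central differences.
dH_dt = (entropy(variance(t + h, s0)) - entropy(variance(t - h, s0))) / (2 * h)
D = entropy_production(variance(t, s0))
print(dH_dt, D)
```

The two printed numbers should agree up to the finite difference error, and both are positive since the initial variance is not the equilibrium one.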

Another interesting and less obvious fact is that the second derivative of $H(f(t))$ is non-positive, that is, that $D(f(t))$ is decreasing with time.

Lemma. The derivative of the entropy production is given by $$ \frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f |\nabla p|^2\;dx $$

Proof.

From the definition of $p$ it follows that $\partial_t p = \partial_t \log f$, and therefore $$ \frac{d}{dt}D(f(t)) = \frac{d}{dt}\int f |\nabla p|^2\;dx = \int (\partial_tf) |\nabla p|^2\;dx + 2 \int f(\nabla p,\nabla \partial_t \log f)\;dx.$$

Now, $\partial_t \log f = \Delta p + (\nabla \log f,\nabla p)$ and $\nabla \log f = \nabla p - \lambda x$, so $$\nabla \partial_t\log f = \nabla \Delta p + \nabla |\nabla p|^2 -\lambda \nabla (x,\nabla p). $$ In particular, $$ 2 \int f(\nabla p,\nabla \partial_t \log f)\;dx = 2\int f(\nabla p,\nabla \Delta p)\;dx +2\int f(\nabla p,\nabla |\nabla p|^2)\;dx-2\lambda \int f(\nabla p,\nabla (x,\nabla p))\;dx.$$

For the first two terms on the right we use the identity $$ 2(\nabla p,\nabla \Delta p) = \Delta |\nabla p|^2-2\Gamma_2(p,p),$$ where $\Gamma_2(p,p) = |D^2p|^2$, and integrate by parts: $$\int f \Delta |\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx + 2 \int f(\nabla p, \nabla |\nabla p|^2)\;dx$$ $$= \int (\Delta f) |\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx - 2 \int \text{div}(f\nabla p) |\nabla p|^2\;dx.$$ Now, $\Delta f = \text{div}(\nabla f) = \text{div}(f\nabla p-\lambda x f) = \partial_tf - \lambda \text{div}(fx)$ and $\text{div}(f\nabla p) = \partial_t f$, so the last expression is equal to $$ -\int \partial_tf |\nabla p|^2\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx-2\int f\Gamma_2(p,p)\;dx. $$

Then, $$ \frac{d}{dt}D(f(t)) = -2\int f\Gamma_2(p,p)\;dx-\lambda \int \text{div}(xf)|\nabla p|^2\;dx -2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx.$$ Since $\nabla (\lambda x,\nabla p) = \lambda \nabla p + \lambda (D^2p) x$, $$ -2\int f(\nabla p,\nabla (\lambda x,\nabla p))\;dx = -2\lambda \int f(\nabla p,\nabla p)\;dx - 2\lambda \int f(\nabla p,D^2p x)\;dx$$ $$ = -2\lambda \int f |\nabla p|^2\;dx - \lambda \int f(\nabla |\nabla p|^2,x)\;dx $$ $$ = -2\lambda \int f |\nabla p|^2\;dx + \lambda \int \text{div}(fx) |\nabla p|^2\;dx. $$

In conclusion, $$ \frac{d}{dt}D(f(t)) = -2\int f|D^2p|^2\;dx - 2\lambda \int f|\nabla p|^2\;dx.$$

As a corollary, we have an exponential bound on the entropy production $D(f(t))$, since

\[\frac{d}{dt}D(f(t)) \leq - 2\lambda D(f(t))\]

from which it follows that

\[D(f(t)) \leq e^{-2\lambda t} D(f(0))\]
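The formula for $\frac{d}{dt}D(f(t))$ and the exponential bound can both be tested on Gaussian solutions. The Python sketch below rests on my own assumptions: dimension one, the illustrative value $\lambda = 0.7$, and centered Gaussians whose variance solves $\dot s = 2-2\lambda s$. In that case $D^2p = \lambda - 1/s$ is constant in $x$, so every integral is explicit.

```python
import numpy as np

lam = 0.7   # the constant lambda (illustrative value, dimension one)

def variance(t, s0):
    """Variance of the Gaussian solution: ds/dt = 2 - 2*lam*s, s(0) = s0."""
    return 1.0 / lam + (s0 - 1.0 / lam) * np.exp(-2.0 * lam * t)

def entropy_production(s):
    """D(f) = int f |p'|^2 dx = (1 - lam*s)^2 / s for a Gaussian of variance s."""
    return (1.0 - lam * s) ** 2 / s

def hessian_term(s):
    """int f |p''|^2 dx, where p'' = lam - 1/s does not depend on x."""
    return (lam - 1.0 / s) ** 2

s0, t, h = 3.0, 0.4, 1e-5
dD_dt = (entropy_production(variance(t + h, s0))
         - entropy_production(variance(t - h, s0))) / (2 * h)
s = variance(t, s0)
rhs = -2.0 * hessian_term(s) - 2.0 * lam * entropy_production(s)

# Exponential bound D(f(t)) <= exp(-2*lam*t) * D(f(0)) on a grid of times.
times = np.linspace(0.0, 5.0, 51)
decay_bound_holds = bool(np.all(
    entropy_production(variance(times, s0))
    <= np.exp(-2.0 * lam * times) * entropy_production(s0) + 1e-12
))
print(dD_dt, rhs, decay_bound_holds)
```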

The equilibrium distribution and exponential decay

If $f$ is a probability distribution such that $\nabla p = 0$, then there is some $c \in \mathbb{R}$ such that

\[\log f + \frac{\lambda}{2}|x|^2 = c\]

In other words, $f$ must be given by

\[f = e^{c-\frac{\lambda}{2}|x|^2} = \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}\]

Then, we define the equilibrium distribution function

\[f_\infty := \frac{1}{Z_\lambda}e^{-\frac{\lambda}{2}|x|^2}\]

Such a function is a time-independent solution to the Fokker-Planck equation – indeed, it is the only stationary solution, since a stationary solution must necessarily have $\nabla p = 0$ in the set where $f>0$ thanks to the formula for $\frac{d}{dt}H(f(t))$.

In the case $\lambda>0$ this shows $D(f(t))$ is decaying exponentially fast as $t\to \infty$. This has an important consequence: note that for every $t>0$ we have

\[H(f_\infty)-H(f(t)) = \int_t^\infty D(f(s))\;ds \leq \int_t^\infty e^{-2\lambda (s-t)}D(f(t))\;ds\]

in which case, using that the last integral in $s$ is equal to $\frac{1}{2\lambda}D(f(t))$, we obtain the inequality

\[H(f_\infty)-H(f) \leq \frac{1}{2\lambda}D(f)\]

valid for all probability densities $f$ (any such $f$ can be taken as initial data, and then one lets $t\to 0$ in the previous inequality). This inequality tells us that $D(f)$ bounds how far $f$ is from having the maximum possible entropy. Since $D(f(t))$ is decaying exponentially when $\lambda>0$, we conclude the entropy of $f(t)$ is converging exponentially fast to the maximum possible entropy.
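For Gaussians the inequality between the entropy gap and the entropy production reduces to an elementary one, which the following Python sketch checks numerically. The assumptions are mine: dimension one, the illustrative value $\lambda = 0.7$, and centered Gaussians of variance $s$, for which the equilibrium corresponds to $s = 1/\lambda$.

```python
import numpy as np

lam = 0.7   # the constant lambda (illustrative value, dimension one)

def entropy(s):
    """H(f) for a centered Gaussian of variance s."""
    return 0.5 * np.log(2.0 * np.pi * np.e * s) - 0.5 * lam * s

def entropy_production(s):
    """D(f) for a centered Gaussian of variance s."""
    return (1.0 - lam * s) ** 2 / s

s_equilibrium = 1.0 / lam                       # variance of f_infinity
variances = np.linspace(0.05, 20.0, 400)

gap = entropy(s_equilibrium) - entropy(variances)
bound = entropy_production(variances) / (2.0 * lam)
inequality_holds = bool(np.all(gap <= bound + 1e-12))
print(inequality_holds)
```

Writing $u = \lambda s$, the check amounts to $\log(1/u) \leq 1/u - 1$, so for this family the inequality is just the concavity of the logarithm.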

Thoughts about nonlocal elliptic operators

2018-12-20T00:00:00+00:00

This is my first post in two years (some other day I can ponder again about why I keep failing at blogging regularly). Perhaps unsurprisingly, two years later I still find myself thinking about the structure of nonlocal operators! What I want to discuss here is a very basic observation that I came across while working on my latest paper with Russell Schwab (where we revisit our 2016 preprint on min-max formulas for nonlocal operators). As such, this post can be seen as a kind of follow up to this one, where the Global Comparison Property and other basic definitions are discussed.

The idea is how to describe elliptic integro-differential equations in a way that is entirely analogous to

\[F(D^2u(x),\nabla u(x),u(x),x) = 0,\]

which is how local non-divergence equations are often considered. That is, the operator in the equation depends on the Hessian, gradient, and value of the function at $x$, as well as the location $x$ itself. To put it differently, a local fully nonlinear equation is described by a real valued function

\[F: \text{Sym}(\mathbb{R}^d) \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}\]

which is monotone with respect to its $\text{Sym}(\mathbb{R}^d)$ and $\mathbb{R}$ arguments –here $\text{Sym}(\mathbb{R}^d)$ denotes the space of symmetric real matrices of dimension $d$.

The way to achieve a similar description for nonlocal operators involves the following function space, which arises naturally when dealing with elliptic integro-differential operators. The space is defined by

\[L^\infty_\beta = \{ h \in L^\infty(\mathbb{R}^d) \text{ such that } |h(y)| = O(|y|^\beta) \text{ as } |y| \to 0 \},\]

when $\beta \in (1,2)$ –the discussion below can be extended to all $\beta \in [0,2]$, but we focus on this range for the sake of a simplified presentation. The space $L^\infty_\beta$ has a topology given by the norm

\[\| h \|_{L^\infty_\beta} := \sup \limits_{y \in \mathbb{R}^d} |h(y)| (\min\{1,|y|^{\beta} \})^{-1}.\]

Now, suppose we are given a continuous, real valued function

\[F:L^\infty_\beta \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}\]

which is monotone increasing with respect to its $L^\infty_\beta$ and $\mathbb{R}$ arguments. To such a function $F$ we can associate an operator $I = I_F$, as follows

\[I(u,x) := F(\delta_x u,\nabla u(x),u(x),x),\]

here, $\delta_x u$ is the function in $L^\infty_\beta$ defined by

\[\delta_x u(y) := u(x+y)-u(x)-\chi_{B_1}(y)\nabla u(x)\cdot y.\]

It follows, thanks to the monotonicity assumed on $F$, that $I_F$ satisfies the Global Comparison Property (GCP).

All operators with the GCP arise in this fashion. This can be seen easily, in fact. First, note that given

\[(h,p,z,x_0) \in L^\infty_\beta \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d,\]

we can define a function $u_{h,p,z,x_0}:\mathbb{R}^d\to \mathbb{R}$ by

\[u_{h,p,z,x_0}(x) = z+\chi_{B_1}(x-x_0)p\cdot (x-x_0)+ h(x-x_0),\;\;\forall\;x\in\mathbb{R}^d.\]

This function may not be of class $C^\beta$ in all of $\mathbb{R}^d$, but it has enough regularity at $x_0$ for our purposes.

Now with this definition at hand, suppose $I:C^\beta(\mathbb{R}^d)\to C^0(\mathbb{R}^d)$ is an operator satisfying the GCP, then we may define $ F:L^\infty_\beta \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ via

\[F(h,p,z,x) := I( u_{h,p,z,x},x).\]

Since $u_{h,p,z,x}$ is sufficiently regular, the operator $I$ is classically defined for $u_{h,p,z,x}$ at the point $x$. Furthermore, by construction, we have

\[u_{h,p,z,x_0}(x_0) = z,\; \nabla u_{h,p,z,x_0}(x_0) = p,\; \delta_{x_0} u_{h,p,z,x_0} = h,\]

so it follows that for $F$ thus constructed we have $I = I_F$, and that $F$ is monotone increasing with respect to $h$ and $z$ as long as $I$ satisfies the GCP.
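The identity $\delta_{x_0}u_{h,p,z,x_0} = h$ is a one line computation, and the Python sketch below spells it out in dimension one. Everything specific here is an assumption for illustration: the value $\beta = 1.5$, the particular $h$, the numbers $p, z, x_0$, and the helper names `make_u` and `delta`. The gradient of $u$ at $x_0$ is passed in by hand (it equals $p$ because $h(y) = O(|y|^\beta)$ with $\beta>1$).

```python
import numpy as np

beta = 1.5

def indicator_B1(y):
    """Indicator function of the open unit ball B_1 (dimension one)."""
    return np.where(np.abs(y) < 1.0, 1.0, 0.0)

def h(y):
    """An illustrative element of L^infty_beta: bounded and O(|y|^beta) near 0."""
    return 0.3 * np.minimum(1.0, np.abs(y) ** beta) * np.cos(y)

def make_u(h, p, z, x0):
    """The function u_{h,p,z,x0}(x) = z + 1_{B_1}(x - x0) p (x - x0) + h(x - x0)."""
    def u(x):
        y = x - x0
        return z + indicator_B1(y) * p * y + h(y)
    return u

def delta(u, grad_u_x, x):
    """delta_x u(y) = u(x + y) - u(x) - 1_{B_1}(y) grad u(x) y."""
    def delta_x_u(y):
        return u(x + y) - u(x) - indicator_B1(y) * grad_u_x * y
    return delta_x_u

p, z, x0 = 2.0, 3.0, 0.5
u = make_u(h, p, z, x0)

# Dyadic sample points, so that (x0 + y) - x0 == y exactly in floating point.
ys = np.arange(-48, 49) / 16.0
recovered = delta(u, p, x0)(ys)
max_error = float(np.max(np.abs(recovered - h(ys))))
print(float(u(x0)), max_error)
```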

Therefore, when thinking of a nondivergence integro-differential equation, a good way to think about it is as a function

\[F:L^\infty_\beta \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}\]

As one last remark: note that the (positive) elements in the dual of $L^\infty_\beta$ are simply the Levy measures that are integrable against

\[\min\{1,|y|^\beta\}.\]

Min max formulas for nonlocal equations

2016-10-30T00:00:00+00:00

Recently, with Russell Schwab, we finished a project related to the question of which operators satisfy the global comparison principle. The preprint can be found here.

The manuscript turned out to be longer than we had anticipated. In part, this is due to our having to revisit several facts about Whitney extensions for functions in a Riemannian manifold, which took considerable extra space. We could not find a reference for Whitney extensions on manifolds that stated what we needed explicitly, but our proofs followed closely the very well known ideas for the case of $\mathbb{R}^d$. In any case, we intend to write a shorter paper reviewing our result in the case of $\mathbb{R}^d$ only, where several technical matters, including the Whitney extension, become much simpler.

This post will be an even shorter discussion of the ideas in the paper. I might do a later post discussing matters in greater generality (operators in a manifold or in a metric space). For now, this post will deal only with operators acting on $C^2$ functions in $\mathbb{R}^d$.

(Let me stress that the case of operators on a Riemannian manifold merits attention for several reasons, one is the study of Dirichlet to Neumann maps for elliptic equations, and another is that many free boundary problems can be posed as parabolic integro-differential equations in a manifold, both topics for another post).

(1) The global comparison property

We are considering scalar equations in $\mathbb{R}^d$, concretely, functions $u:\mathbb{R}^d\to\mathbb{R}$ which solve the equation

\[I(u,x) = 0 \;\;\text{ for } x\in \Omega,\]

where $I$ is some (possibly nonlinear) mapping between functions. The operators $I$ we are interested in are (heuristically) those for which one expects the comparison principle to hold, which roughly speaking says that

\[u \leq v \text{ in } \mathbb{R}^d\setminus \Omega \text{ and } I(u,x) \geq I(v,x) \text{ in } \Omega \Rightarrow u\leq v \text{ everywhere.}\]

This is the case, for instance, for the Laplace operator. If one looks at the proof of the comparison principle for harmonic functions, then one sees that the crucial fact is the following:

\[u\leq v \text{ everywhere, with } u=v \text{ at } x_0 \Rightarrow \Delta u \leq \Delta v \text{ at } x_0\]

This motivates the following definition.

Definition: An operator $I:C^2_b(\mathbb{R}^d)\to C^0_b(\mathbb{R}^d)$ is said to have the global comparison property (GCP) if whenever $u,v$ are such that $v$ touches $u$ from above at some $x_0\in\mathbb{R}^d$, we have $I(u,x_0)\leq I(v,x_0)$.

Recall that "$v$ touches $u$ from above at $x_0$" means that

\[u(x) \leq v(x) \text{ for all } x\in\mathbb{R}^d \text{ and } u(x_0) = v(x_0).\]

By its very definition, the class of equations having the GCP is the class of equations that are amenable to treatment by methods based on the comparison property (i.e. barrier arguments, viscosity solutions, Perron's method, etc).

Question: Is there a simple characterization for the class of operators which have the GCP?

(2) A few Examples

1) $I u= \Delta u(x) $.

2) $Iu = H(\nabla u(x)) $, for some differentiable function $H:\mathbb{R}^d\to\mathbb{R}$.

3) The operator known as "the fractional Laplacian", $Lu = -(-\Delta)^{\alpha/2}u$ with $\alpha \in (0,2)$, also given for $u\in C^2_b(\mathbb{R}^d)$ by the formula

\[Lu (x) = C(d,\alpha)\int_{\mathbb{R}^d} \frac{1}{2}(u(x+y)+u(x-y)-2u(x))|y|^{-d-\alpha}\;dy.\]

4) Any given Borel measure $\mu$ (nonnegative and finite, so that the integral converges for bounded $u$) defines such an operator, via

\[Lu(x) = \int_{\mathbb{R}^d} u(x+y)-u(x)\;d\mu(y).\]

5) If $L_1$ and $L_2$ are two operators having the GCP, then

\[\min\{ L_1(u,x), L_2(u,x) \} \text{ and } \max\{ L_1(u,x), L_2(u,x) \}\]

also have the GCP.

6) Given $u \in C^2_b(\mathbb{R}^{d-1})$, let $U_u$ denote the unique bounded solution to the elliptic Dirichlet problem

\[F(D^2U) = 0 \;\; \text{in } \{ (x,x_d) \in \mathbb{R}^d \mid \;0<x_d< 1 \},\; U = u \;\text{on } \{ x_d =0\},\; U= 0 \;\text{on } \{x_d=1\}\]

Then, $I(u,x):= \partial_{d} U_u(x,0)$ satisfies the global comparison property.

(3) A warm up exercise

Lemma: Suppose $L$ is a bounded linear map $L:C^2_b(\mathbb{R}^d)\to C_b^0(\mathbb{R}^d)$ such that $Lu(x_0)\leq 0$ for any $u\in C^2_b(\mathbb{R}^d)$ having a nonnegative local maximum at $x_0$. Then $$Lu(x) = \text{tr}(A(x)D^2u(x))+b(x)\cdot Du(x)+c(x)u(x)$$ where $A(x)\geq 0$ and $c(x)\leq 0$.

Proof: Let $P(x)$ denote the second order Taylor polynomial for $u$ at $x_0$.

For any $\delta>0$ one can construct a function $\eta \in C^2_b(\mathbb{R}^d)$ with $\| \eta\|_{C^2(\mathbb{R}^d)} \leq \delta$ such that $\eta(x_0)=0$ and

\[P(x)+\eta(x) \geq u(x) \geq P(x)-\eta(x) \text{ in some neighborhood of } x_0\]

In which case

\[L(P,x_0) +C_0 \delta \geq L(u,x_0) \geq L(P,x_0) -C_0\delta\]

Since $\delta>0$ was arbitrary, it follows that

\[L(u,x_0) = L(P,x_0),\]

in other words, for every $x \in \mathbb{R}^d$ we have that $L(u,x)$ is a (linear) function of the second order polynomial of $u$ at $x$. In particular, there must be a symmetric matrix $A(x)$, a vector $b(x)$ and a scalar $c(x)$ such that

\[L(u,x) = \text{tr}(A(x)D^2u(x))+b(x)\cdot D u(x) + c(x) u(x).\]

From here it is not difficult to see that $A(x)\geq 0$ and $c(x)\leq 0$. ∎

Keeping in mind the proof of the above lemma, think about the following:

-What can be said if instead of asking $L(u,x_0)\leq 0$ at every local maximum $x_0$, we only assume that this happens at global maxima?

-What if the operator is not linear?

The first question was answered by P. Courrège in the 60's, and the answer to this (seemingly) purely analytical question leads to an important class of operators from the theory of stochastic processes.

Definition: By a Levy measure we will refer to a Borel measure $\nu$ in $\mathbb{R}^d \setminus \{ 0 \}$ which may not have finite total mass, but is at least such that

\[\int_{\mathbb{R}^d\setminus\{0\}} \min\{ |y|^2,1\}\;\nu(dy)<\infty\]

Theorem (Courrège): A linear operator $L:C^2_b(\mathbb{R}^d)\to C_b^0(\mathbb{R}^d)$ has the global comparison property if and only if it is of the form $$L = L_{\text{loc}}+L_{\text{Levy}},$$ where the operators $L_{\text{loc}}$ and $L_{\text{Levy}}$ are given by $$ L_{\text{loc}}(u,x) = \text{tr}(A(x)D^2u(x))+b(x)\cdot D u(x) + c(x) u(x),$$ $$ L_{\text{Levy} }(u,x)= \int_{\mathbb{R}^d\setminus \{ 0\}} u(x+y)-u(x)-\chi_{B_1(0)}(y) \nabla u(x)\cdot y \;\nu(x,dy),$$ where $A(x)\geq 0$, $A,b,c\in L^\infty$ and $\{ \nu(x,dy) \}_{x\in\mathbb{R}^d}$ is a family of Levy measures.

(4) A new min-max formula

Theorem (joint with Russell Schwab): Let $I:C^2_b(\mathbb{R}^d)\to C_b^\gamma(\mathbb{R}^d)$ ($\gamma\in (0,1)$) be a Lipschitz continuous map which satisfies the GCP, and such that there exist a modulus of continuity $\omega(\cdot)$ and a constant $C$ for which $$\|I(u)-I(v)\|_{L^\infty(B_r)} \leq C\|u-v\|_{C^2(B_{2r})}+C\omega(r)\|u-v\|_{L^\infty(\mathbb{R}^d)}.$$ Then there exist i) a (uniformly continuous) family of linear operators $$ L_{ab}:C^2_b(\mathbb{R}^d) \to C^\gamma_b(\mathbb{R}^d),$$ each having the global comparison property, and ii) a (bounded) family of functions $f_{ab}\in C^\gamma_b(\mathbb{R}^d)$, such that for any function $u \in C^2_b(\mathbb{R}^d)$ we have the formula $$ I(u,x) = \min\limits_{a} \max \limits_{b} \{ f_{ab}(x)+ L_{ab}(u,x)\}.$$

Remark: If one asks that $I$ be Lipschitz as a map between the spaces $C^\beta_b(\mathbb{R}^d)$ and $C^\gamma_b(\mathbb{R}^d)$ (where now $\beta,\gamma\in(0,1)$), then one can say more about the terms appearing in the min-max formula, in fact, in that case the theorem says that

\[I(u,x) = \min\limits_{a} \max \limits_{b} \left \{ f_{ab}(x)+c_{ab}(x)u(x)+ \int_{\mathbb{R}^d\setminus \{0 \}}u(x+y)-u(x) \;\nu_{ab}(x,dy)\right \}.\]

(3) Elementary proof when $I$ is Fréchet differentiable

Let us suppose that $I:C^2_b(\mathbb{R}^d) \to C^0_b(\mathbb{R}^d)$ is Fréchet differentiable.

1. Fix $u,v\in C^2_b(\mathbb{R}^d)$ and let

\[u_t := v+t(u-v)\]

Then

\[I(u)-I(v) = \int_0^1 \frac{d}{dt} I(u_t)\;dt\]

Then, the chain-rule says that

\[I(u)-I(v) = \int_0^1 (DI(u_t))(u-v)\;dt\] \[\;\;\;\;\;= \left ( \int_0^1 DI(u_t)\;dt\right )(u-v)\]

That is, if we define an operator $L$ by $\int_0^1 DI(u_t)\;dt$ then

\[I(u)-I(v) = L(u-v)\]

2. For any $u\in C^2_b(\mathbb{R}^d)$, the linear operator $DI(u)$ is a continuous linear map from $C^2_b(\mathbb{R}^d)$ to $C^0_b(\mathbb{R}^d)$ which has the GCP.

3. The GCP is closed under convex combinations. Therefore, if we define

\[\mathcal{D}(I) := \text{hull} \{ DI(u) \mid u \in C^2_b(\mathbb{R}^d) \},\]

then every element of $\mathcal{D}(I)$ has the GCP.

4. Thanks to step 1), for any $u,v\in C^2_b(\mathbb{R}^d)$

\[I(u,x) \leq \max \limits_{L \in \mathcal{D}(I)} \{ I(v,x) +L(u-v,x) \}.\]

Since we have equality for $v=u$, it follows that

\[I(u,x) = \min \limits_{v \in C^2_b(\mathbb{R}^d)} \max \limits_{L \in \mathcal{D}(I)} \{ I(v,x) +L(u-v,x) \}.\]

(4) A finite dimensional version

Let $G$ be a finite set, and let $C(G)$ denote the space of real valued functions in $G$.

Lemma: Let $I:C(G)\to C(G)$ be a Lipschitz map satisfying the GCP. Then $$ I(u,x) = \min \limits_{v \in C(G)} \max \limits_{L \in \mathcal{D}(I)} \{ I(v,x) + L(u-v,x)\},$$ where each $L$ is a linear map from $C(G)$ to $C(G)$ having the form $$ L(u,x) = c(x)u(x) + \sum \limits_{y\in G} (u(y)-u(x))k(x,y),$$ for some $c\in C(G)$ and some $k:G\times G\to\mathbb{R}$ with $k(x,y)\geq 0$ for all $x$ and $y$ in $G$.
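It is worth seeing why linear maps of this form have the GCP: if $v$ touches $u$ from above at $x_0$, then $L(v,x_0)-L(u,x_0) = \sum_y (v(y)-u(y))k(x_0,y) \geq 0$. Here is a small Python sketch that tests this on random data; the set $G = \{0,\ldots,4\}$, the random $c$ and $k$, and the helper `touches_from_above` are all assumptions made for the experiment.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                                        # G = {0, 1, ..., n-1}
c = rng.normal(size=n)                       # an arbitrary c in C(G)
k = rng.uniform(0.0, 1.0, size=(n, n))       # a nonnegative kernel k(x, y)

def L(u):
    """L(u, x) = c(x) u(x) + sum_y (u(y) - u(x)) k(x, y), for every x in G."""
    differences = u[None, :] - u[:, None]    # differences[x, y] = u(y) - u(x)
    return c * u + np.sum(differences * k, axis=1)

def touches_from_above(u, x0):
    """Build some v >= u on G with v(x0) = u(x0)."""
    v = u + rng.uniform(0.0, 1.0, size=u.shape)
    v[x0] = u[x0]
    return v

gcp_holds = True
for _ in range(200):
    u = rng.normal(size=n)
    x0 = int(rng.integers(n))
    v = touches_from_above(u, x0)
    gcp_holds = gcp_holds and bool(L(u)[x0] <= L(v)[x0] + 1e-12)
print(gcp_holds)
```

Note that the sign of $c$ plays no role, since the term $c(x_0)u(x_0)$ is the same for $u$ and $v$ at the touching point.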

The key difference now is that in this case $I:C(G)\to C(G)$ amounts to a Lipschitz map between two finite dimensional vector spaces, and here we have Clarke’s non-smooth calculus at our disposal.

Definition (Clarke Jacobian): Let $I:C(G)\to C(G)$ be a Lipschitz continuous map and $f \in C(G)$. The Clarke Jacobian of $I$ at $f$ is defined as the set

\[\mathcal{D}_f(I) := \text{hull}\{ L = \lim\limits_n L_n \mid \exists f_n \to f \text{ s.t. } I \text{ differentiable at } f_n \text{ and } L_n = DI(f_n) \forall n \}.\]

Finally, we will consider the total Clarke Jacobian of $I$, denoted $\mathcal{D}(I)$, and defined by

\[\mathcal{D}(I) := \text{hull} ( \bigcup_f \mathcal{D}_f(I) ).\]

Lemma (mean value theorem): Let $I:C(G)\to C(G)$ be a Lipschitz mapping. For any $u,v\in C(G)$, there exists some $L\in \mathcal{D}(I)$ such that $$ I(u)-I(v) = L(u-v).$$

Corollary: Let $I$ be as before, then for any $u,v\in C(G)$ and any $x\in G$, we have $$ I(u,x) \leq I(v,x) + \max \limits_{L \in \mathcal{D}(I)} L(u-v,x).$$
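A toy case where the resulting min-max formula can be verified directly is the following Python sketch. All choices are mine and only illustrative: $G = \{0,1\}$ and $I(u,x) = \max\{u(y)-u(x),0\}$ with $y$ the other point of $G$, which is Lipschitz and has the GCP. At a fixed $x$, the elements of $\mathcal{D}(I)$ act as $w \mapsto k\,(w(y)-w(x))$ with $k\in[0,1]$, and since this is linear in $k$ the maximum is attained at $k=0$ or $k=1$.

```python
import numpy as np

def I(u):
    """I(u, x) = max(u(y) - u(x), 0), with y the other point of G = {0, 1}."""
    return np.maximum(u[::-1] - u, 0.0)

def L(k, w):
    """L_k(w, x) = k * (w(y) - w(x)); at each x, k ranges over [0, 1]."""
    return k * (w[::-1] - w)

def min_max(u, candidates):
    """min over v of max over k in {0, 1} of I(v, x) + L_k(u - v, x), for each x."""
    values = [
        np.maximum(I(v) + L(0.0, u - v), I(v) + L(1.0, u - v))
        for v in candidates
    ]
    return np.min(values, axis=0)

rng = np.random.default_rng(3)
u = rng.normal(size=2)
# The minimum over v is attained at v = u, so u is included among the candidates.
candidates = [u] + [rng.normal(size=2) for _ in range(1000)]
print(min_max(u, candidates), I(u))
```

With $a = u(y)-u(x)$ and $b = v(y)-v(x)$, the quantity inside the minimum is $b_+ + (a-b)_+ \geq a_+$, which is the corollary in this example.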

Proposition: If $I:C(G)\to C(G)$ is Lipschitz and has the GCP, then each $L \in \mathcal{D}(I)$ has the GCP.

The proof of this last proposition is quite simple. First, let $u\in C(G)$ be a point of differentiability for $I(\cdot)$, and let $L_u$ denote the Fréchet derivative of $I(\cdot)$ at $u$.

Then, let $v_1,v_2 \in C(G)$ be such that $v_1\leq v_2$ in $G$ with equality at some $x_0 \in G$. Then for every $t>0$ we have

\[u+tv_1 \leq u+tv_2 \text{ in } G,\;\; u+tv_1 = u+tv_2 \text{ at } x_0\]

Then

\[I(u+tv_1,x_0)\leq I(u+tv_2,x_0)\]

Using the fact that $I(\cdot)$ is Fréchet differentiable at $u$ (subtract $I(u,x_0)$ from both sides, divide by $t$, and let $t\to 0^+$), it follows that

\[L_u(v_1,x_0)\leq L_u(v_2,x_0)\]

This is the GCP for $L_u$. Since the GCP is preserved under limits and convex combinations, it passes to every element of $\mathcal{D}(I)$.

(5) The min-max formula via finite dimensional approximations.

With this finite dimensional result at hand, one can try to prove the result by approximating the space $C^2_b(\mathbb{R}^d)$ by finite dimensional subspaces, obtained roughly as follows: one constructs an increasing sequence of finite graphs $G_n$, which converge to $\mathbb{R}^d$.

Consider the following increasing sequence of discrete sets in $\mathbb{R}^d$:

\[\tilde G_n := 2^{-n}\mathbb{Z}^d,\;\; G_n := [-2^n,2^n]^d \cap (2^{-n}\mathbb{Z}^d),\]

in terms of these sets, we define projection operators

\[\pi_n: C^2_b(\mathbb{R}^d) \to X_n \subset C^2_b(\mathbb{R}^d),\;\;\;\pi_n^0: C^0_b(\mathbb{R}^d) \to X_n ^0\subset C^0_b(\mathbb{R}^d).\]

Then, for each $n$, we define a finite dimensional approximation to $I$, $ I_n :C^2_b(\mathbb{R}^d) \to C^0_b(\mathbb{R}^d)$, by

\[I_n : = \pi_n^0 \circ I \circ \pi_n.\]
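To get a feel for the sets $G_n$, here is a short Python sketch that builds them and checks that they are nested. The helper name `grid` and the choice $d=2$ are assumptions for illustration; the projections $\pi_n$ and $\pi_n^0$ are not specified in this post and are not modeled here.

```python
import numpy as np
from itertools import product

def grid(n, d=1):
    """Points of G_n = [-2^n, 2^n]^d intersected with 2^{-n} Z^d, as a set of tuples."""
    axis = np.arange(-4**n, 4**n + 1) / 2.0**n    # k * 2^{-n} with |k| <= 4^n
    return set(product(axis.tolist(), repeat=d))

# Each G_n has (2 * 4^n + 1)^d points.
sizes = [len(grid(n, d=2)) for n in range(3)]
nested = all(grid(n, d=2) <= grid(n + 1, d=2) for n in range(2))
print(sizes, nested)
```

The coordinates are dyadic numbers, so they are represented exactly in floating point and the set inclusion test is reliable.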

Now, we can think of $I_n$ also as a (Lipschitz) map $C(G_n) \to C(G_n)$, and apply the min-max formula for this case.

Lemma: For every $n \in \mathbb{N}$ and $x\in G_n$, we have, for any $u \in C^2_b(\mathbb{R}^d)$, $$ I_n(u,x) = \min\limits_{v \in C^2_b(\mathbb{R}^d)} \max \limits_{L \in \mathcal{D}(I_n)} \{ I_n(v,x) + L(u-v,x) \}.$$ Moreover, for each $L \in \mathcal{D}(I_n)$, there is some $c\in C(G_n)$ and measures $\{ m(x,dy) \}_{x\in G_n}$ such that $$L(v,x) = c(x)v(x) + \int_{\mathbb{R}^d} v(y)-v(x)\; m(x,dy),\;\;\;\forall\;x \in G_n.$$ Furthermore, there is some nonnegative function $k:G_n\times G_n \to \mathbb{R}$ such that for each $x\in G_n$ $$ m(x,dy) = \sum \limits_{z \in G_n\setminus \{x\}} k(x,z)\delta_z(dy).$$

Why write about mathematics?

2016-10-25T00:00:00+00:00

Over the years, I have tried again and again to maintain a math blog. In all those instances, I would ultimately plateau after a flurry of early posts, often overwhelmed by more urgent matters (teaching duties, research, job applications, fixing a fatal mistake in a paper). I would eventually abandon the blog –who knows, maybe the same will happen this time.

But now I believe I understand the main reason why these attempts failed: ultimately, I would not hit “publish” on a post until I felt it was significantly polished (which is not the same as actually polished). I mean, I already feel I am overanalyzing this very first post! Thinking too much before publishing defeats the purpose of blogging: a feature, not a bug, of blogging is the ability to write often, and post things without agonizing over whether the thing really is perfect (and yes, to any of my coauthors reading this: I appreciate the irony of me writing these words).

What I envision in the short term is to make this blog a universal placeholder for notes on topics I am currently learning or thinking about, and the occasional rant about topics related to mathematics. The exposition will therefore be mostly technical (often confusing or downright incorrect), and it will deal almost exclusively with analysis and partial differential equations.

Another aim of the blog is to experiment and simply see what happens. How can technical discussion of mathematics be done online, in blog format? Can short posts I write here be of use to others? Can I develop a blogging habit which may turn out to be an appropriate use of my time? Can it serve as a place to speculate about approaches to problems, and to gain useful feedback? Hopefully the answer to some of these questions will turn out to be positive.

In short, this blog’s mantra will be: write a lot of crappy posts about math, and hope for the best.

About the blog’s name: The name of the blog was inspired by a well known lecture by Freeman Dyson, where he uses birds and frogs as metaphor for different styles or approaches to mathematical research. A written version can be found here.

About the blog’s layout: For now I intend to keep the website layout minimal (this website was built from scratch on github pages using jekyll, for which this is a good tutorial). The only other feature besides the post feed is the RSS feed linked above; I will add features later as needed.