
Digest for "Contraction theory for nonlinear stability analysis and learning-based control: A tutorial overview"

Definition 1

A differential displacement, \( \delta{x}\) , is defined as an infinitesimal displacement at a fixed time as used in the calculus of variations [1, p. 107], and (1) yields the following differential dynamics:

\[ \begin{align} \delta\dot{x}(t)=\frac{\partial f}{\partial x}({x}(t),t)\delta x(t) \end{align} \]

(2)

where \( f({x}(t),t)\) is given in (1).

Lemma 1

Suppose that a continuously differentiable function \( v:\mathbb{R}_{\geq0}\mapsto\mathbb{R}\) satisfies the following differential inequality:

\[ \begin{align} \dot{v}(t) \leq -\gamma v(t)+c,~v(0)=v_0,~\forall t\in\mathbb{R}_{\geq 0} \end{align} \]

(3)

where \( \gamma \in \mathbb{R}_{>0}\) , \( c\in\mathbb{R}\) , and \( v_0\in\mathbb{R}\) . Then we have

\[ \begin{align} v(t) \leq v_0e^{-\gamma t}+\frac{c}{\gamma}(1-e^{-\gamma t}),~\forall t\in\mathbb{R}_{\geq 0}. \end{align} \]

(4)
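
As a quick numerical check of Lemma 1, the following sketch integrates the worst case of (3) (the inequality taken with equality) by forward Euler and compares it with the bound (4); the values of \( \gamma\) , \( c\) , and \( v_0\) are arbitrary samples, not from the source.

```python
import math

def simulate(gamma, c, v0, dt=1e-4, T=3.0):
    # Forward-Euler integration of the worst case v' = -gamma*v + c,
    # i.e., the differential inequality (3) taken with equality.
    v, out = v0, []
    for k in range(int(T / dt)):
        out.append((k * dt, v))
        v += dt * (-gamma * v + c)
    return out

def bound(gamma, c, v0, t):
    # Right-hand side of (4).
    return v0 * math.exp(-gamma * t) + (c / gamma) * (1.0 - math.exp(-gamma * t))

gamma, c, v0 = 2.0, 1.0, 5.0
traj = simulate(gamma, c, v0)
# The simulated v(t) matches the bound (tight in this worst case) up to O(dt) error,
# and settles toward c/gamma.
assert all(abs(v - bound(gamma, c, v0, t)) < 1e-3 for t, v in traj)
```
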

Theorem 1

If there exists a uniformly positive definite matrix \( {M}({x},t)={{\Theta}}({x},t)^{\top}{{\Theta}}({x},t) \succ 0, \forall x,t\) , where \( {\Theta(x,t)}\) defines a smooth coordinate transformation of \( \delta x\) , i.e., \( \delta{z}={\Theta}(x,t)\delta{x}\) , s.t. either of the following equivalent conditions holds for \( \exists \alpha \in \mathbb{R}_{>0}\) , \( \forall x,t\) :

\[ \begin{align} &\lambda_{\max}(\mathrm{sym}(F(x,t))) =\lambda_{\max}\left(\mathrm{sym}\left(\left({\dot{\Theta}}+{{\Theta}}\frac{\partial {f}}{\partial {x}}\right){{\Theta}}^{-1}\right)\right) \leq - \alpha \end{align} \]

(5)

\[ \begin{align} &{\dot{M}}+M\frac{\partial {f}}{\partial {x}}+\frac{\partial {f}}{\partial {x}}^{\top}M \preceq -2\alpha M \end{align} \]

(6)

where the arguments \( (x,t)\) of \( M(x,t)\) and \( \Theta(x,t)\) are omitted for notational simplicity, then all the solution trajectories of (1) converge to a single trajectory exponentially fast regardless of their initial conditions (i.e., contracting, see Definition 3), with an exponential convergence rate \( \alpha\) . The converse also holds.

Remark 1

Since \( M\) of Theorem 1 is positive definite, i.e., \( v^{\top}Mv=\|\Theta v\|^2\geq0, \forall v\in\mathbb{R}^n\) , and \( \|\Theta v\|^2=0\) if and only if \( v=0\) , the equation \( \Theta v=0\) only has a trivial solution \( v=0\) . This implies that \( \Theta\) is always non-singular (i.e., \( \Theta(x,t)^{-1}\) always exists).

Definition 2

Let \( \xi_0(t)\) and \( \xi_1(t)\) denote some solution trajectories of (1). We say that (1) is incrementally exponentially stable if \( \exists C,\alpha>0\) s.t. the following holds [2]:

\[ \begin{align} \|\xi_1(t)-\xi_0(t)\| \leq Ce^{-\alpha t}\|\xi_0(0)-\xi_1(0)\| \end{align} \]

(10)

for any \( \xi_0(t)\) and \( \xi_1(t)\) . Note that, since we have \( \|\xi_1(t)-\xi_0(t)\|=\|\int_{\xi_0}^{\xi_1}\delta x\|\) (see Theorem 3), Theorem 1 implies incremental stability of the system (1).

Definition 3

The system (1) satisfying the conditions in Theorem 1 is said to be contracting, and a uniformly positive definite matrix \( M\) that satisfies (6) defines a contraction metric. As to be discussed in Theorem 3 of Sec. 2.1.2, a contracting system is incrementally exponentially stable in the sense of Definition 2.

Example 1

One of the distinct features of contraction theory in Theorem 1 is incremental stability with exponential convergence. Consider the example given in [3]:

\[ \begin{align} \dfrac{d}{dt}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = \begin{bmatrix}-1 & x_1\\ -x_1 & -1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix}. \end{align} \]

(11)

A Lyapunov function \( V = \|x\|^2/2\) for (11), where \( x=[x_1,x_2]^{\top}\) , yields \( \dot{V} \leq -2V\) . Thus, (11) is exponentially stable with respect to \( x=0\) . The differential dynamics of (11) is given as

\[ \begin{align} \dfrac{d}{dt}\begin{bmatrix}\delta x_1\\ \delta x_2\end{bmatrix} = \begin{bmatrix}-1+x_2 & x_1\\ -2x_1 & -1\end{bmatrix}\begin{bmatrix}\delta x_1\\ \delta x_2\end{bmatrix} \end{align} \]

(12)

and the contraction condition (6) for (12) can no longer be proven by \( V=\|\delta x\|^2/2\) , due to the lack of the skew-symmetric property of (11) in (12). This difficulty illustrates the difference between Lyapunov theory and contraction theory, where the former considers stability of (11) with respect to the equilibrium point, while the latter analyzes exponential convergence of any pair of trajectories of (11) with respect to each other (i.e., incremental stability in Definition 2) [4, 3].
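
The two claims above can be checked numerically in a small sketch: the cross terms of \( \dot{V}\) cancel exactly for (11), while the symmetrized Jacobian of (12) has a positive eigenvalue at the sample point \( x=(0,2)\) , so the identity metric fails to certify contraction of the differential dynamics.

```python
import math, random

def f(x1, x2):
    # Right-hand side of (11): (-x1 + x1*x2, -x1^2 - x2).
    return (-x1 + x1 * x2, -x1 * x1 - x2)

def sym_jac_eigs(x1, x2):
    # Symmetric part of the Jacobian of (12), [[-1+x2, x1], [-2*x1, -1]],
    # with eigenvalues via the closed form for symmetric 2x2 matrices.
    a, b, d = -1.0 + x2, (x1 - 2 * x1) / 2.0, -1.0
    m, r = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return m - r, m + r

random.seed(0)
for _ in range(100):
    x1, x2 = random.uniform(-3, 3), random.uniform(-3, 3)
    fx1, fx2 = f(x1, x2)
    # V = ||x||^2/2 gives exactly Vdot = -2V: the x1^2*x2 cross terms cancel.
    assert abs(x1 * fx1 + x2 * fx2 + (x1**2 + x2**2)) < 1e-9

# The identity metric fails for (12): at x = (0, 2) the symmetrized
# differential Jacobian has eigenvalues (-1, +1), so (5) with Theta = I fails.
assert sym_jac_eigs(0.0, 2.0)[1] > 0
```
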

Example 2

Contraction defined by Theorem 1 guarantees incremental stability of their solution trajectories but does not require the existence of stable fixed points. Let us consider the following system for \( x:\mathbb{R}_{\geq0}\mapsto\mathbb{R}\) :

\[ \begin{align} \dot{x} = -x + e^t. \end{align} \]

(13)

Using the constraint (6) of Theorem 1, we can easily verify that (13) is contracting as \( M=\mathrm{I}\) defines its contraction metric with the contraction rate \( \alpha=1\) . However, since (13) has \( x(t) = e^t/2+(x(0)-1/2)e^{-t}\) as its unique solution, it is not stable with respect to any fixed point.
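
A short sketch with the closed-form solution confirms both observations: the trajectories of (13) grow without bound, yet any two of them approach each other at the contraction rate \( \alpha=1\) . The sample initial conditions are ours.

```python
import math

def solution(x0, t):
    # Closed-form unique solution of (13): x(t) = e^t/2 + (x(0) - 1/2) e^{-t}.
    return math.exp(t) / 2.0 + (x0 - 0.5) * math.exp(-t)

# Both trajectories diverge (there is no stable fixed point) ...
assert solution(0.0, 10.0) > 1e3 and solution(5.0, 10.0) > 1e3
# ... yet their distance contracts exponentially with rate alpha = 1:
for t in (0.0, 1.0, 2.0, 5.0):
    gap = abs(solution(5.0, t) - solution(0.0, t))
    assert abs(gap - 5.0 * math.exp(-t)) < 1e-9
```
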

Example 3

Consider a Linear Time-Invariant (LTI) system, \( \dot{x} = f(x) = Ax\) . Lyapunov theory states that the origin is globally exponentially stable if and only if there exists a constant positive-definite matrix \( P\in\mathbb{R}^{n\times n}\) s.t. [5, pp. 67-68]

\[ \begin{align}\exists \epsilon >0\text{ s.t. }PA+A^{\top}P \preceq -\epsilon \mathrm{I}.\end{align} \]

(14)

Now, let \( \overline{p} = \|P\|\) . Since \( -\mathrm{I} \preceq -P/\overline{p}\) , (14) implies that \( PA+A^{\top}P \preceq -(\epsilon/\overline{p})P\) , which shows that \( M=P\) with \( \alpha = \epsilon/(2\overline{p})\) satisfies (6) due to the relation \( \partial f/\partial x = A\) .

The contraction condition (6) can thus be viewed as a generalization of the Lyapunov stability condition (14) (see the generalized Krasovskii’s theorem [6, pp. 83-86]) in a nonlinear non-autonomous system setting, expressed in the differential formulation that permits a non-constant metric and pure differential coordinate change [4]. Furthermore, if \( f(x,t)=A(t)x\) and \( M=\mathrm{I}\) , (6) results in \( A(t)+A(t)^{\top} \preceq -2\alpha \mathrm{I}\) (i.e., all the eigenvalues of the symmetric matrix \( A(t)+A(t)^{\top}\) remain strictly in the left-half complex plane), which is a known sufficient condition for stability of Linear Time-Varying (LTV) systems [6, pp. 114-115].
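
The implication from (14) to (6) can be checked on a concrete example; the matrices \( A\) and \( P\) below are illustrative choices of ours, not from the source.

```python
import math

def matmul2(X, Y):
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def sym2_eigs(a, b, d):
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
    m, r = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return m - r, m + r

# An illustrative Hurwitz A and P > 0 satisfying (14); P symmetric, so A^T P = (PA)^T.
A = ((-1.0, 0.0), (1.0, -2.0))
P = ((2.0, 0.0), (0.0, 1.0))
PA = matmul2(P, A)
Q = tuple(tuple(PA[i][j] + PA[j][i] for j in range(2)) for i in range(2))
eps, p_bar = 3.0, 2.0   # Q = [[-4,1],[1,-4]] has eigenvalues -5, -3 <= -eps; p_bar = ||P||
assert sym2_eigs(Q[0][0], Q[0][1], Q[1][1])[1] <= -eps + 1e-12

# (14) then implies the contraction condition (6) with M = P, alpha = eps/(2*p_bar):
# check that Q + (eps/p_bar) P is negative semidefinite.
a = Q[0][0] + (eps / p_bar) * P[0][0]
d = Q[1][1] + (eps / p_bar) * P[1][1]
assert sym2_eigs(a, Q[0][1], d)[1] <= 1e-12
```
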

Example 4

Most of the learning-based techniques involving neural networks are based on optimizing their hyperparameters by gradient descent [7]. Contraction theory provides a generalized view on the analysis of such continuous-time gradient-based optimization algorithms [8].

Let us consider a twice differentiable scalar output function \( f:\mathbb{R}^n\times\mathbb{R}\mapsto\mathbb{R}\) , a matrix-valued function \( M:\mathbb{R}^n\mapsto\mathbb{R}^{n\times n}\) with \( M(x)\succ0, \forall x\in\mathbb{R}^n\) , and the following natural gradient system [9]:

\[ \begin{align} \dot{x} = h(x,t) = -M(x)^{-1}\nabla_x f(x,t). \end{align} \]

(15)

Then, \( f\) is geodesically \( \alpha\) -strongly convex for each \( t\) in the metric defined by \( M(x)\) (i.e., \( H(x)\succeq \alpha M(x)\) with \( H(x)\) being the Riemannian Hessian matrix of \( f\) with respect to \( M\)  [10]), if and only if (15) is contracting with rate \( \alpha\) in the metric defined by \( M\) as in (6) of Theorem 1, where \( A = {\partial h}/{\partial x}\) . More specifically, the Riemannian Hessian verifies \( H(x) = -(\dot{M}+MA+A^{\top}M)/2\) . See [8] for details.
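For the special case of a constant metric \( M=\mathrm{I}\) and a quadratic \( f\) with constant Hessian \( H\succeq\alpha\mathrm{I}\) (an illustrative choice of ours), the claimed contraction rate of the gradient system (15) can be observed numerically:

```python
import math

H = ((2.0, 0.5), (0.5, 1.0))          # illustrative constant Hessian, H >= alpha*I
alpha = 1.5 - math.hypot(0.5, 0.5)    # smallest eigenvalue of H (about 0.79)

def grad(x):
    return (H[0][0] * x[0] + H[0][1] * x[1], H[1][0] * x[0] + H[1][1] * x[1])

def flow(x, dt=1e-3, steps=3000):
    # Euler integration of (15) with M = I and f(x) = x^T H x / 2, so xdot = -H x.
    for _ in range(steps):
        g = grad(x)
        x = (x[0] - dt * g[0], x[1] - dt * g[1])
    return x

a, b = flow((3.0, -1.0)), flow((-2.0, 2.0))
d0 = math.hypot(3.0 - (-2.0), -1.0 - 2.0)
dT = math.hypot(a[0] - b[0], a[1] - b[1])
# Strong convexity (here H >= alpha*I) gives contraction with rate alpha:
# the distance between any two trajectories shrinks at least like e^{-alpha*t}.
assert dT <= d0 * math.exp(-alpha * 3.0) * 1.01
```
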

Remark 2

Theorem 1 can be applied to other vector norms \( \|\delta {z}\|_p\) with, e.g., \( p=1\) or \( p=\infty\)  [4]. It can also be shown that for a contracting autonomous system of the form \( \dot{x}={f}({x})\) , all trajectories converge to an equilibrium point exponentially fast.

Theorem 2

Consider the following nonlinear system with the state \( x\in\mathbb{R}_{\geq 0}\mapsto\mathbb{R}^n\) and the auxiliary or virtual system with the state \( q\in\mathbb{R}_{\geq 0}\mapsto\mathbb{R}^n\) :

\[ \begin{align} \dot{x}(t) &= g (x(t),x(t),t) \end{align} \]

(16)

\[ \begin{align} \dot{q}(t) &= g (q(t),x(t),t) \end{align} \]

(17)

where \( g : \mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}^n\) is a smooth function. Suppose that (17) is contracting with respect to \( q\) . If a particular solution of (17) verifies a smooth specific property, then all trajectories of (16) verify this property exponentially.

Example 5

Let us illustrate the role of partial contraction using the following nonlinear system [3]:

\[ \begin{align} \dot{x} &= -D(x)x+u \\ \dot{x}_d &= -D(x_d)x_d \end{align} \]

(18)

where \( x\) is the system state, \( x_d\) is the target state, \( u\) is the control input designed as \( u = -K(x)(x-x_d)+(D(x)-D(x_d))x_d\) , and \( D(x)+K(x) \succ 0\) . We could define a virtual system, which has \( q=x\) and \( q=x_d\) as its particular solutions, as follows:

\[ \begin{align} \dot{q} = -(D(x)+K(x))(q-x_d)-D(x_d)x_d. \end{align} \]

(19)

Since we have \( \delta\dot{q} = -(D(x)+K(x))\delta q\) and \( D(x)+K(x)\succ0\) , (19) is contracting with \( M=\mathrm{I}\) in (6). However, if we consider the following virtual system:

\[ \begin{align} \dot{q} = -(D(q)+K(q))(q-x_d)-D(x_d)x_d \end{align} \]

(20)

which also has \( q=x\) and \( q=x_d\) as its particular solutions, proving contraction is no longer straightforward because of the terms \( \partial D/\partial q_i\) and \( \partial K/\partial q_i\) that appear in the differential dynamics of (20). This is because, in contrast to (19), (20) admits particular solutions nonlinear in \( q\) in addition to \( q=x\) and \( q=x_d\) , and the condition (6) becomes more involved for (20), as it must guarantee exponential convergence of every pair of these particular solution trajectories to each other [3].
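
The contraction of (19) can be observed in simulation; the scalar choices \( D(x)=1+x^2\) and \( K(x)=1\) below are hypothetical examples satisfying \( D+K \succ 0\) , with initial conditions of our choosing.

```python
import math

D = lambda s: 1.0 + s * s      # illustrative scalar D(x); D(x) + K(x) >= 2 > 0
K = lambda s: 1.0

def step(x, xd, dt):
    # One Euler step of (18) with u = -K(x)(x - x_d) + (D(x) - D(x_d))x_d.
    u = -K(x) * (x - xd) + (D(x) - D(xd)) * xd
    return x + dt * (-D(x) * x + u), xd + dt * (-D(xd) * xd)

x, xd, dt = 2.0, -1.0, 1e-4
e0 = abs(x - xd)
for _ in range(10000):         # integrate to t = 1
    x, xd = step(x, xd, dt)
# x and x_d are both particular solutions of the contracting virtual system (19),
# so their distance decays at least at rate inf(D + K) = 2.
assert abs(x - xd) <= e0 * math.exp(-2.0) * 1.05
```
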

Example 6

As one of the key applications of partial contraction given in Theorem 2, let us consider the following closed-loop Lagrangian system [6, p. 392]:

\[ \begin{align} &\mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}+\mathcal{G}(\mathtt{q})=u(\mathtt{q},\dot{\mathtt{q}},t) \end{align} \]

(21)

\[ \begin{align} &u(\mathtt{q},\dot{\mathtt{q}},t) = \mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}_r+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}_r+\mathcal{G}(\mathtt{q})-\mathcal{K}(t)(\dot{\mathtt{q}}-\dot{\mathtt{q}}_r) \end{align} \]

(22)

where \( \mathtt{q},\dot{\mathtt{q}}\in\mathbb{R}^n\) , \( \dot{\mathtt{q}}_r=\dot{\mathtt{q}}_d(t)-\Lambda(t)(\mathtt{q}-\mathtt{q}_d(t))\) , \( \mathcal{H}: \mathbb{R}^n\mapsto\mathbb{R}^{n\times n}\) , \( \mathcal{C}: \mathbb{R}^n\times \mathbb{R}^n\mapsto\mathbb{R}^{n\times n}\) , \( \mathcal{G}: \mathbb{R}^n\mapsto\mathbb{R}^n\) , \( \mathcal{K}: \mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{n\times n}\) , \( \Lambda: \mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{n\times n}\) , and \( (\mathtt{q}_d,\dot{\mathtt{q}}_d)\) is the target trajectory of the state \( (\mathtt{q},\dot{\mathtt{q}})\) . Note that \( \mathcal{K},\Lambda \succ 0\) are control gain matrices (design parameters), and \( \dot{\mathcal{H}}-2\mathcal{C}\) is skew-symmetric with \( \mathcal{H} \succ 0\) by construction.

By comparing (21) and (22), we define the following virtual observer-like system of \( q\) (not \( \mathtt{q}\) ) that has \( q = \dot{\mathtt{q}}\) and \( q = \dot{\mathtt{q}}_r\) as its particular solutions:

\[ \begin{align} \mathcal{H}(\mathtt{q})\dot{q}+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}}){q} +\mathcal{K}(t)({q}-\dot{\mathtt{q}})+\mathcal{G}(\mathtt{q}) =u(\mathtt{q},\dot{\mathtt{q}},t) \end{align} \]

(23)

which gives \( \mathcal{H}(\mathtt{q})\delta\dot{q}+(\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})+\mathcal{K}(t))\delta q=0\) . We thus have that

\[ \begin{align} \dfrac{d}{dt} (\delta q^{\top} \mathcal{H}(\mathtt{q})\delta q) = \delta q^{\top} (\dot{\mathcal{H}}-2\mathcal{C}-2\mathcal{K})\delta q = -2\delta q^{\top}\mathcal{K}\delta q \end{align} \]

(24)

where the skew-symmetry of \( \dot{\mathcal{H}}-2\mathcal{C}\) is used to obtain the second equality. Since \( \mathcal{K}\succ0\) , (24) indicates that the virtual system (23) is partially contracting in \( q\) with \( \mathcal{H}\) defining its contraction metric. Contraction of the full state \( (\mathtt{q},\dot{\mathtt{q}})\) will be discussed in Example 7.

Note that if we treat the arguments \( (\mathtt{q},\dot{\mathtt{q}})\) of \( \mathcal{H}\) and \( \mathcal{C}\) also as the virtual state \( q\) , we end up having additional terms such as \( \partial\mathcal{H}/\partial{q_i}\) , which makes proving contraction analytically more involved as in Example 5.

Theorem 3

Let \( \xi_0\) and \( \xi_1\) be the two arbitrary solutions of (1), and define the transformed squared length with \( M(x,t)\) of Theorem 1 as follows:

\[ \begin{equation} V_{s\ell}\left(x,\delta x,t\right)=\int^{\xi_1}_{\xi_0} \|\delta z\|^2 =\int_{0}^{1}\frac{\partial x}{\partial\mu}^{\top}M\left(x,t\right)\frac{\partial x}{\partial\mu}d\mu \end{equation} \]

(25)

where \( x\) is a smooth path parameterized as \( x(\mu=0,t)=\xi_0(t)\) and \( x(\mu=1,t)=\xi_1(t)\) by \( \mu \in [0,1]\) . Also, define the path integral with the transformation \( \Theta(x,t)\) for \( M(x,t) = \Theta(x,t)^{\top}\Theta(x,t)\) as follows:

\[ \begin{equation} V_\ell(x,\delta x,t)=\int^{\xi_1}_{\xi_0} \|\delta {z}\|=\int^{\xi_1}_{\xi_0} \|\Theta(x,t)\delta {x}\|. \end{equation} \]

(26)

Then (25) and (26) are related as

\[ \begin{align} \|\xi_1 -\xi_0\|=\left\|\int^{\xi_1}_{\xi_0} \delta {x}\right\|\leq \frac{V_{\ell}}{\sqrt{\underline{m}}} \leq \sqrt{\frac{V_{s\ell}}{\underline{m}}} \end{align} \]

(27)

where \( M(x,t)\succeq\underline{m}\mathrm{I}, \forall x,t\) for \( \exists\underline{m}\in\mathbb{R}_{>0}\) , and Theorem 1 can also be proven by using (25) and (26) as a Lyapunov-like function, resulting in incremental exponential stability of the system (1) (see Definition 2). Note that the shortest path integral \( V_\ell\) of (26) with a parameterized state \( x\) (i.e., \( \inf V_\ell=\sqrt{\inf V_{s\ell}}\) ) defines the Riemannian distance and the path integral of a minimizing geodesic [11].
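
The chain of inequalities (27) can be verified numerically for a simple constant metric; the choice \( \Theta=\mathrm{diag}(2,3)\) (so \( \underline{m}=4\) ), the straight-line path, and the endpoints below are illustrative.

```python
import math

Theta = (2.0, 3.0)             # Theta = diag(2, 3), so M = diag(4, 9), m_lb = 4
m_lb = 4.0
xi0, xi1 = (1.0, -2.0), (-0.5, 3.0)

N = 1000
V_l, V_sl, dmu = 0.0, 0.0, 1.0 / N
dx = (xi1[0] - xi0[0], xi1[1] - xi0[1])   # dx/dmu for the straight-line path
for _ in range(N):
    speed = math.hypot(Theta[0] * dx[0], Theta[1] * dx[1])
    V_l += speed * dmu          # path integral in the metric, eq. (26)
    V_sl += speed**2 * dmu      # transformed squared length, eq. (25)

dist = math.hypot(xi1[0] - xi0[0], xi1[1] - xi0[1])
# The chain of inequalities (27):
assert dist <= V_l / math.sqrt(m_lb) + 1e-9
assert V_l / math.sqrt(m_lb) <= math.sqrt(V_sl / m_lb) + 1e-9
```

For the constant-speed straight-line path the last inequality holds with equality (Cauchy-Schwarz), matching the remark that \( \inf V_\ell=\sqrt{\inf V_{s\ell}}\) .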

Theorem 4

If the system (1) satisfies (5) and (6) of Theorem 1 (i.e., the system (1) is contracting), then the path integral \( V_\ell(q,\delta q,t)=\int^{\xi_1}_{\xi_0}\|\Theta(q,t)\delta q\|\) of (26), where \( \xi_0\) is a solution of the contracting system (1), \( \xi_1\) is a solution of the perturbed system (29), and \( q\) is the virtual state of (30), exponentially converges to a bounded error ball as long as \( {\Theta}{d}\in \mathcal{L}_\infty\) (i.e., \( \sup_{x,t}\|\Theta d\| < \infty\) ). Specifically, if \( \exists \underline{m},\overline{m} \in \mathbb{R}_{>0}\) and \( \exists \bar{d}\in\mathbb{R}_{\geq 0}\) s.t. \( \bar{d}=\sup_{{x},t}\| {d}(x,t)\|\) and

\[ \begin{align} \underline{m}\mathrm{I} \preceq M(x,t) \preceq \overline{m}\mathrm{I},~\forall x,t \end{align} \]

(31)

then we have the following relation:

\[ \begin{equation} \|\xi_1(t)-\xi_0(t)\| \leq \frac{V_\ell(0)}{\sqrt{\underline{m}}}e^{-\alpha t}+\frac{\bar{d}}{\alpha}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha t}) \end{equation} \]

(32)

where \( V_\ell(t) = V_\ell(q(t),\delta q(t),t)\) for notational simplicity.
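
A scalar sketch illustrates the bound (32); the contracting system, perturbation, and constants below are illustrative choices with \( M=\mathrm{I}\) (so \( \underline{m}=\overline{m}=1\) and \( V_\ell(0)\) is the initial error).

```python
import math

alpha, dbar, dt, T = 1.0, 0.5, 1e-4, 4.0
d = lambda t: dbar * math.sin(3.0 * t)      # bounded perturbation, |d| <= dbar

x0, x1 = 1.0, -1.0                          # nominal and perturbed states
err0 = abs(x1 - x0)
t = 0.0
while t < T:
    x0 += dt * (-alpha * x0)                # contracting system
    x1 += dt * (-alpha * x1 + d(t))         # perturbed system
    t += dt
    # Error bound (32) with M = I (m_lb = m_ub = 1, V_l(0) = err0):
    bound = err0 * math.exp(-alpha * t) + (dbar / alpha) * (1 - math.exp(-alpha * t))
    assert abs(x1 - x0) <= bound * 1.01
```
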

Lemma 2

Suppose that \( V_{s\ell}\) of (25) satisfies the following inequality:

\[ \begin{align} \mathscr{L}V_{s\ell} \leq -\gamma V_{s\ell}+c \end{align} \]

(40)

where \( \gamma \in \mathbb{R}_{>0}\) , \( c \in \mathbb{R}_{\geq0}\) , and \( \mathscr{L}\) denotes the infinitesimal differential generator of the Itô process given in [12, p. 15]. Then we have the following bound [13]:

\[ \begin{align} \mathbb{E} \left[\|\xi_1(t)-\xi_0(t)\|^2\right] \leq \frac{1}{\underline{m}}\left( \mathbb{E}\left[V_{s\ell}(0)\right] e^{-\gamma t}+\frac{c}{\gamma}\right) \end{align} \]

(41)

where \( V_{s\ell}(0) = V_{s\ell}(x(0),\delta x(0),0)\) for \( V_{s\ell}\) in (25), \( \underline{m}\) is given in (31), \( \xi_{0}\) and \( \xi_{1}\) are given in (39), and \( \mathbb{E} \) denotes the expected value operator. Furthermore, the probability that \( \|\xi_1-\xi_0\|\) is greater than or equal to \( \varepsilon\in\mathbb{R}_{> 0}\) is given as

\[ \begin{align} \mathbb{P} \left[\|\xi_1(t)-\xi_0(t)\| \geq \varepsilon\right] \leq \frac{1}{\varepsilon^2\underline{m}}\left( \mathbb{E}\left[V_{s\ell}(0)\right] e^{-\gamma t}+\frac{c}{\gamma}\right). \end{align} \]

(42)

Remark 3

Although Lemma 2 considers the second moment of \( \|\xi_1(t)-\xi_0(t)\|\) , i.e., \( \mathbb{E}[\|\xi_1(t)-\xi_0(t)\|^2]\) , it can be readily generalized to the \( p\) -th moment of \( \|\xi_1(t)-\xi_0(t)\|\) , i.e., \( \mathbb{E}[\|\xi_1(t)-\xi_0(t)\|^p]\) , applying the Lyapunov-based technique proposed in [14].

Theorem 5

Suppose that \( \exists \bar{g}_{0}\in\mathbb{R}_{\geq 0}\) and \( \exists \bar{g}_{1}\in\mathbb{R}_{\geq 0}\) s.t. \( \sup_{x,t}\|G_0(x,t)\|_F = \bar{g}_0\) and \( \sup_{x,t}\|G_1(x,t)\|_F = \bar{g}_1\) in (39). Suppose also that there exists \( M(x,t) \succ 0, \forall x,t\) , s.t. \( M_{x_i}=\partial M/\partial x_i\) is Lipschitz with respect to \( x\) for all \( i=1,…,n\) , i.e., \( \exists L_m \in\mathbb{R}_{\geq 0}\) s.t.

\[ \begin{align} \|M_{x_i}(x,t)-M_{x_i}(x',t)\|\leq L_m\|x-x'\|,~\forall x,x',t,i. \end{align} \]

(44)

Also, suppose that \( M\) of (44) satisfies (31) and (6) with its right-hand side replaced by \( -2\alpha M-\alpha_s \mathrm{I}\) for \( \alpha_s = L_m(\bar{g}_0^2+\bar{g}_1^2)(\alpha_G+{1}/{2})\) , i.e.,

\[ \begin{align} {\dot{M}}+M\frac{\partial {f}}{\partial {x}}+\frac{\partial {f}}{\partial {x}}^{\top}M \preceq - 2 \alpha M-\alpha_s\mathrm{I} \end{align} \]

(45)

where \( \alpha_G\in\mathbb{R}_{>0}\) is an arbitrary constant (see (49) for its details). Then, the following error bound of incremental stability holds:

\[ \begin{align} \mathbb{E} \left[\|\xi_1(t)-\xi_0(t)\|^2\right] \leq \frac{\mathbb{E}\left[V_{s\ell}(0)\right]}{\underline{m}}e^{-2\alpha t}+\frac{C}{2\alpha}\frac{\overline{m}}{\underline{m}} \end{align} \]

(46)

where \( \xi_0\) and \( \xi_1\) are the trajectories given in (39), \( V_{s\ell}(t)=V_{s\ell}(q(t),\delta q(t),t)=\int^{\xi_1}_{\xi_0}\delta q^{\top}M(q,t)\delta q\) is given in (25) with the virtual state \( q\) of (43), \( \underline{m}\) and \( \overline{m}\) are given in (31), \( C = (\bar{g}_0^2+\bar{g}_1^2)({2}{\alpha_G}^{-1}+1)\) , and \( \mathbb{E} \) denotes the expected value operator. Furthermore, the probability that \( \|\xi_1-\xi_0\|\) is greater than or equal to \( \varepsilon\in\mathbb{R}_{> 0}\) is given as

\[ \begin{align} \mathbb{P} \left[\|\xi_1(t)-\xi_0(t)\|\geq\varepsilon\right] \leq \frac{1}{\varepsilon^2}\left(\frac{\mathbb{E}\left[V_{s\ell}(0)\right]}{\underline{m}}e^{-2\alpha t}+\frac{C}{2\alpha}\frac{\overline{m}}{\underline{m}}\right). \end{align} \]

(47)

Remark 4

Although we consider the Gaussian white noise stochastic differential equation (39) when referring to stochastic systems in this paper, other types of stochastic noises, including compound Poisson shot noise and bounded-measure Lévy noise, could be considered as in Theorem 5 using contraction theory [15].

Theorem 6

If (29) is perturbed by \( {d}({x},t)\in \mathcal{L}_{pe}\) (i.e., \( \|(d)_{\tau}\|_{\mathcal{L}_p} < \infty\) for \( \tau\in\mathbb{R}_{\geq0}\) and \( p\in[1,\infty]\) , see Sec. 1.2.3) and Theorem 1 holds, then (29) is finite-gain \( \mathcal{L}_p\) stable with \( p\in [1,\infty]\) for an output \( {y}={h}({x},{d},t)\) with \( \int^{Y_1}_{Y_0}\|\delta {y}\|\leq \eta_0 \int^{\xi_1}_{\xi_0}\|\delta {x}\|+\eta_1\|{d}\|\) , \( \exists\eta_0,\eta_1\geq0\) , i.e., \( \forall \tau\in\mathbb{R}_{\geq 0}\)  [16]

\[ \begin{equation} \left\|{\left(\int^{Y_1}_{Y_0} \|\delta {y}\|\right)_\tau}\right\|_{\mathcal{L}_p} \leq \left(\frac{\eta_0}{\alpha}+{\eta_1}\right)\frac{\|({\Theta}{d})_\tau \|_{\mathcal{L}_p}}{\sqrt{\underline{m}}}+\frac{\eta_0\zeta V_\ell(0)}{\sqrt{\underline{m}}} \end{equation} \]

(50)

where \( Y_0\) and \( Y_1\) denote the output trajectories of the original contracting system (1) and its perturbed system (29), respectively, \( \underline{m}\) is defined as \( {M}({x},t)\succeq \underline{m}{\mathrm{I}}\) , \( \forall x,t\) , as in (31), and \( V_\ell(0) = V_\ell(x(0),\delta x(0),0)\) for \( V_{\ell}\) in (26). Also, \( \zeta=1\) if \( p=\infty\) and \( \zeta=1/(\alpha p)^{1/p}\) if \( p\in[1,\infty)\) . The perturbed system (29) also exhibits ISS.

Theorem 7

Consider the following hierarchically combined system of two contracting dynamics:

\[ \begin{align} \dfrac{d}{dt}\begin{bmatrix}\delta z_0\\ \delta z_1\end{bmatrix}=\begin{bmatrix}{F}_{00} & {0}\\ {F}_{10} & {F}_{11}\end{bmatrix}\begin{bmatrix}\delta z_0\\ \delta z_1\end{bmatrix} \end{align} \]

(52)

where \( F_{00}\) and \( F_{11}\) both satisfy the contraction condition (5), and suppose that it is subject to perturbation \( [{d}_0,{d}_1]^{\top}\) . Then the path length integral \( V_{\ell,i}(t)=\int^{\xi_1}_{\xi_0}\|\delta {z}_i\|\) with \( i=0,1\) between the original and perturbed dynamics trajectories, \( \xi_0\) and \( \xi_1\) , respectively, verifies [4]

\[ \begin{align} \dot{V}_{\ell,0}+\alpha_0V_{\ell,0}&\leq\|{\Theta}_0{d}_0\| \end{align} \]

(53)

\[ \begin{align} \dot{V}_{\ell,1}+\alpha_1V_{\ell,1}&\leq\|{\Theta}_1{d}_1\|+\int^{\xi_1}_{\xi_0}\|{F}_{10}\|\|\delta {z}_0\| \end{align} \]

(54)

where \( \alpha_i=\inf_{x,t}|\lambda_{\max}(\mathrm{sym}({F}_{ii}))|, i=0,1\) . Hence, the error bounds of \( V_{\ell,0}(t)\) and \( V_{\ell,1}(t)\) can be obtained using Theorem 4 if \( \|{F}_{10}\|\) is bounded. In particular, if \( \|{\Theta}_i d_i\|\leq \sqrt{\overline{m}_i}\bar{d}_i\) , \( i=0,1\) , the relations (53) and (54) yield \( V_{\ell,0}(t) \leq V_{\ell,0}(0)e^{-\alpha_0t}+(\sqrt{\overline{m}_0}\bar{d}_0/{\alpha_0})(1-e^{-\alpha_0t})\) as in (32), and

\[ \begin{align} V_{\ell,1}(t) &\leq V_{\ell,1}(0)e^{-\alpha_1t}+\frac{\sqrt{\overline{m}_1}\bar{d}_1+\bar{f}_{10}\bar{V}_{0}}{\alpha_1}(1-e^{-\alpha_1t})~~~~ \end{align} \]

(55)

where \( \bar{f}_{10}=\sup_{t\in\mathbb{R}_{\geq 0}}\|{F}_{10}\|\) and \( \bar{V}_{0}=V_{\ell,0}(0)+\sqrt{\overline{m}_0}\bar{d}_0/{\alpha_0}\) .

Also, similar to Theorem 6, we have \( \|(V_{\ell,0})_\tau\|_{\mathcal{L}_p} \leq V_{\ell,0}(0)\zeta_0+{\|({\Theta}_0{d}_0)_\tau \|_{\mathcal{L}_p}}/{\alpha_0}\) , and thus a hierarchical connection for finite-gain \( \mathcal{L}_p\) stability can be established as follows [16]:

\[ \begin{align} \|(V_{\ell,1})_\tau\|_{\mathcal{L}_p} &\leq V_{\ell,1}(0)\zeta_1 +\frac{\|({\Theta}_1{d}_1)_\tau \|_{\mathcal{L}_p}+\bar{f}_{10}\|(V_{\ell,0})_\tau\|_{\mathcal{L}_p}}{\alpha_1}~~~~ \end{align} \]

(56)

where \( \zeta_i=1\) if \( p=\infty\) and \( \zeta_i=1/(\alpha_i p)^{1/p}\) if \( p\in[1,\infty)\) for \( i=0,1\) . By recursion, this result can be extended to an arbitrary number of hierarchically combined groups.
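
The hierarchical bound (55) can be checked on a minimal scalar cascade; all rates, gains, and disturbance signals below are sample values of ours.

```python
import math

a0, a1, f10 = 2.0, 1.0, 1.0                # contraction rates and coupling gain
d0bar, d1bar = 0.3, 0.2
d0 = lambda t: d0bar * math.cos(2 * t)     # |d0| <= d0bar
d1 = lambda t: d1bar * math.sin(5 * t)     # |d1| <= d1bar

z0, z1, dt, t = 1.0, -1.0, 1e-4, 0.0       # error states of the cascade
V0bar = abs(z0) + d0bar / a0               # uniform bound on the z0 error, V_bar_0
while t < 3.0:
    z0 += dt * (-a0 * z0 + d0(t))
    z1 += dt * (-a1 * z1 + f10 * z0 + d1(t))
    t += dt
    # Bound in the form of (55), with the coupling treated as a disturbance:
    bound1 = abs(-1.0) * math.exp(-a1 * t) \
        + (d1bar + f10 * V0bar) / a1 * (1 - math.exp(-a1 * t))
    assert abs(z1) <= bound1 * 1.01
```
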

Example 7

As demonstrated in Example 6, the Lagrangian virtual system (23) is contracting with respect to \( q\) , having \( q=\dot{\mathtt{q}}\) and \( q = \dot{\mathtt{q}}_r = \dot{\mathtt{q}}_d-\Lambda (\mathtt{q}-\mathtt{q}_d)\) as its particular solutions. Let \( q_0=q\) for such \( q\) . The virtual system of \( q_1\) which has \( [q_0^{\top},q_1^{\top}]^{\top}=[\dot{\mathtt{q}}^{\top},\mathtt{q}^{\top}]^{\top}\) and \( [\dot{\mathtt{q}}_r^{\top},\mathtt{q}_d^{\top}]^{\top}\) as its particular solutions is given as

\[ \begin{align} \dot{q}_1 = q_0-\Lambda(q_1-\mathtt{q}) \end{align} \]

(57)

resulting in \( \delta \dot{q}_1=\mathrm{I}\delta q_0-\Lambda\delta q_1\) . Since the virtual system of \( q_0\) is contracting as in Example 6 and the virtual system \( \delta \dot{q}_1=-\Lambda\delta q_1\) is contracting in \( q_1\) due to \( \Lambda \succ 0\) , (55) of Theorem 7 implies that the whole system (21) for \( [q_0^{\top},q_1^{\top}]^{\top}\) is hierarchically contracting and robust against perturbation in the sense of Theorem 7 (\( F_{10}=\mathrm{I}\) in this case). Also, see [17, 16] for the hierarchical multi-timescale separation of tracking and synchronization control for multiple Lagrangian systems.

Theorem 8

Let \( {x}(k)=x_k\) and \( q(\mu,k)=q_k\) for any \( k\in\mathbb{N}\) for simplicity. If there exists a uniformly positive definite matrix \( {M}_k({x}_k,k)={{\Theta_k}}({x}_k,k)^{\top}{{\Theta_k}}({x}_k,k) \succ 0, \forall x_k,k\) , where \( {\Theta_k}\) defines a smooth coordinate transformation of \( \delta x_k\) , i.e., \( \delta{z}_k={\Theta_k}(x_k,k)\delta{x}_k\) , s.t. either of the following equivalent conditions holds for \( \exists \alpha \in (0,1)\) , \( \forall x_k,k\) :

\[ \begin{align} &\left\|{\Theta_{k+1}}({x}_{k+1},k+1)\frac{\partial {f}_k}{\partial {x}_k}{{\Theta_k}({x}_k,k)^{-1}}\right\| \leq \alpha \end{align} \]

(60)

\[ \begin{align} &\frac{\partial {f}_k}{\partial {x}_k}^{\top}{{M_{k+1}}({x}_{k+1},k+1)}\frac{\partial {f}_k}{\partial {x}_k} \preceq \alpha^2 {M_k}({x}_{k},k) \end{align} \]

(61)

then we have the following bound as long as we have \( \underline{m}\mathrm{I}\preceq M_k(x_k,k)\preceq \overline{m}\mathrm{I}, \forall x_k,k\) , as in (31):

\[ \begin{align} \|\xi_{1}(k)-\xi_{0}(k)\| \leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}\alpha^k+\frac{\bar{d}(1-\alpha^k)}{1-\alpha}\sqrt{\frac{\overline{m}}{\underline{m}}} \end{align} \]

(62)

where \( V_{\ell}(k)=\int_{\xi_{0}}^{\xi_{1}}\|\Theta_k(q_k,k)\delta q_k\|\) as in (26) for the unperturbed trajectory \( \xi_{0}\) , perturbed trajectory \( \xi_1\) , and virtual state \( q_k=q(k)\) given in (59).
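
A discrete-time sketch of the bound (62), using an illustrative scalar map with \( \Theta_k=\mathrm{I}\) (so \( \underline{m}=\overline{m}=1\) ) and \( |\partial f_k/\partial x_k|\leq\alpha\) ; the map and disturbance are our choices.

```python
import math

alpha, dbar = 0.6, 0.05
f = lambda x: 0.5 * x + 0.1 * math.sin(x)   # |f'| <= 0.6 = alpha (Theta_k = I)
d = lambda k: dbar * (-1) ** k              # bounded disturbance, |d| <= dbar

xi0, xi1 = 2.0, -1.0
e0 = abs(xi1 - xi0)
for k in range(1, 30):
    xi0 = f(xi0)                 # unperturbed trajectory
    xi1 = f(xi1) + d(k)          # perturbed trajectory
    # Discrete-time bound (62): geometric decay plus a geometric-series offset.
    bound = e0 * alpha**k + dbar * (1 - alpha**k) / (1 - alpha)
    assert abs(xi1 - xi0) <= bound + 1e-12
```
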

Remark 5

We consider control-affine nonlinear systems (63)–(65) in Sec. 2.2, 2.3, and 3.2–3.4. This is primarily because the controller design techniques for control-affine nonlinear systems are less complicated than those for control non-affine systems (which often result in \( u\) given implicitly by \( u = k(x,u,t)\)  [18, 19]), but are still applicable even to the latter, e.g., by treating \( \dot{u}\) as another control input (see Example 8), or by solving the implicit equation \( u = k(x,u,t)\) iteratively with a discrete-time controller (see Example 9 and Remark 7).

Example 8

By using \( \dot{u}\) instead of \( u\) in (63) and (64), a control non-affine system, \( \dot{x}=f(x,u,t)\) , can be rewritten as

\[ \begin{align} \dfrac{d}{dt}\begin{bmatrix}x\\ u\end{bmatrix}=\begin{bmatrix}f(x,u,t)\\ 0\end{bmatrix}+\begin{bmatrix}0\\ \mathrm{I}\end{bmatrix}\dot{u} \end{align} \]

(66)

which can be viewed as a control-affine nonlinear system with the state \( [x^{\top},u^{\top}]^{\top}\) and control \( \dot{u}\) .

Example 9

One drawback of the technique in Example 8 is that we have to control \( \dot{u}\) instead of \( u\) , which could be difficult in practice. In this case, we can utilize the following control non-affine nonlinear system decomposed into control-affine and non-affine parts:

\[ \begin{align} \dot{x}=f(x,u,t)=f_a(x,t)+B_a(x,t)u+r(x,u,t) \end{align} \]

(67)

where \( r(x,u,t)=f(x,u,t)-f_a(x,t)-B_a(x,t)u\) . The controller \( u\) can now be designed implicitly as

\[ \begin{align} B_a(x,t)u = B_a(x,t)u^*-r(x,u,t) \end{align} \]

(68)

where \( u^*\) is a stabilizing controller for the control-affine system \( \dot{x}=f_a(x,t)+B_a(x,t)u^*\) . Since solving such an implicit equation in (68) in real-time could be unrealistic in practice, we will derive a learning-based approach to solve it iteratively for unknown \( r(x,u,t)\) , without deteriorating its stability performance (see Lemma 9 and Theorem 31 of Sec. 3.5).
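
The iterative solution suggested above can be sketched for a hypothetical scalar decomposition of the form (67); \( B_a\) , \( r\) , \( u^*\) , and the state value below are placeholders, and convergence follows because the Lipschitz constant of \( r/B_a\) in \( u\) is below one.

```python
import math

# Hypothetical scalar instance of (67)-(68): B_a = 1, non-affine residual r.
Ba = 1.0
r = lambda x, u: 0.2 * math.sin(u) + 0.1 * x
u_star, x = 1.5, 0.8                      # placeholder affine control and state

u = u_star                                # initial guess
for _ in range(50):
    u = u_star - r(x, u) / Ba             # fixed-point iteration on (68)
# |dr/du| / Ba = 0.2 < 1, so the iteration contracts to the implicit solution:
assert abs(Ba * u - (Ba * u_star - r(x, u))) < 1e-9
```
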

Lemma 3

Let \( f:\mathbb{R}^n\times\mathbb{R}_{\geq0}\mapsto\mathbb{R}^n\) and \( B:\mathbb{R}^n\times\mathbb{R}_{\geq0}\mapsto\mathbb{R}^{n\times m}\) be piecewise continuously differentiable functions. Then there exists a matrix-valued function \( A:\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}_{\geq0}\mapsto\mathbb{R}^{n\times n}\) s.t., \( \forall {s}\in\mathbb{R}^n\) , \( \bar{s}\in\mathbb{R}^n\) , \( \bar{u}\in\mathbb{R}^m\) , and \( t\in\mathbb{R}_{\geq0}\) ,

\[ \begin{align} A({s},\bar{s},\bar{u},t)\mathtt{e}&=f({s},t)+B({s},t)\bar{u}-f(\bar{s},t)-B(\bar{s},t)\bar{u} \end{align} \]

(69)

where \( \mathtt{e} = {s}-\bar{s}\) , and one such \( A\) is given as follows:

\[ \begin{align} A({s},\bar{s},\bar{u},t) &= \int_0^1\frac{\partial \bar{f}}{\partial {s}}(c {s}+(1-c)\bar{s},\bar{u},t)dc \end{align} \]

(70)

where \( \bar{f}({s},\bar{u},t)=f({s},t)+B({s},t)\bar{u}\) . We call \( A\) an SDC matrix if it is constructed to satisfy the controllability (or observability for estimation) condition. Also, the choice of \( A\) is not unique for \( n \geq 2\) , where \( n\) is the number of states, and the convex combination of such non-unique SDC matrices also verifies extended linearization as follows:

\[ \begin{align} \begin{aligned} &f({s},t)+B({s},t)\bar{u}-f(\bar{s},t)-B(\bar{s},t)\bar{u} \\ &=A(\varrho,{s},\bar{s},\bar{u},t)({s}-\bar{s})=\sum_{i=1}^{s_A}\varrho_iA_i({s},\bar{s},\bar{u},t)({s}-\bar{s}) \end{aligned} \end{align} \]

(71)

where \( \varrho=(\varrho_1,…,\varrho_{s_A})\) , \( \sum_{i=1}^{s_A}\varrho_i=1, \varrho_i\geq 0\) , and each \( A_i\) satisfies the relation \( \bar{f}({s},\bar{u},t)-\bar{f}(\bar{s},\bar{u},t)=A_i({s},\bar{s},\bar{u},t)({s}-\bar{s})\) .
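
The construction (70) can be validated by quadrature: the integral of the Jacobian along the line from \( \bar{s}\) to \( s\) reproduces \( \bar{f}(s,\bar{u},t)-\bar{f}(\bar{s},\bar{u},t)\) . The system \( f\) , \( B\) , and sample points below are hypothetical.

```python
import math

ubar = 0.7
def jac(s0, s1):
    # Jacobian of fbar(s) = f(s) + B(s)*ubar for the hypothetical system
    # f(s) = (-s0^3 + s1, -s1 + sin s0), B(s) = (0, s0).
    return ((-3 * s0**2, 1.0), (math.cos(s0) + ubar, -1.0))

def fbar(s0, s1):
    return (-s0**3 + s1, -s1 + math.sin(s0) + s0 * ubar)

def sdc(s, sbar, N=2000):
    # Midpoint-rule evaluation of the line integral (70).
    A = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(N):
        c = (k + 0.5) / N
        J = jac(c * s[0] + (1 - c) * sbar[0], c * s[1] + (1 - c) * sbar[1])
        for i in range(2):
            for j in range(2):
                A[i][j] += J[i][j] / N
    return A

s, sbar = (1.2, -0.5), (-0.8, 0.9)
A = sdc(s, sbar)
e = (s[0] - sbar[0], s[1] - sbar[1])
lhs = (A[0][0] * e[0] + A[0][1] * e[1], A[1][0] * e[0] + A[1][1] * e[1])
rhs = tuple(a - b for a, b in zip(fbar(*s), fbar(*sbar)))
# A(s, sbar, ubar, t) e reproduces fbar(s) - fbar(sbar), verifying (69).
assert all(abs(l - r) < 1e-4 for l, r in zip(lhs, rhs))
```
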

Example 10

Let us illustrate how Lemma 3 can be used in practice taking the following nonlinear system as an example:

\[ \begin{align} \dot{x} = [x_2, -x_1x_2]^{\top}+[0,\cos x_1]^{\top}u \end{align} \]

(73)

where \( x = [x_1,x_2]^{\top}\) . If we use \( ({s},\bar{s},\bar{u})=(x,x_d,u_d)\) in Lemma 3 for a given target trajectory \( (x_d,u_d)\) that satisfies (73), evaluating the integral of (70) gives

\[ \begin{align} A_1(x,x_d,u_d,t) = \begin{bmatrix}0 & 1\\ -\frac{x_2+x_{2d}}{2}+\frac{u_d(\cos x_1-\cos x_{1d})}{x_1-x_{1d}} & -\frac{x_1+x_{1d}}{2}\end{bmatrix} \end{align} \]

(74)

due to the relation \( \partial \bar{f}/\partial {s} = \left[\begin{smallmatrix}0 & 1\\ -{s}_2 & -{s}_1\end{smallmatrix}\right]+\left[\begin{smallmatrix}0 & 0\\ -u_d\sin {s}_1 & 0\end{smallmatrix}\right]\) for \( \bar{f}({s},u_d,t)=f({s},t)+B({s},t)u_d\) , where \( x_d = [x_{1d},x_{2d}]^{\top}\) . Note that we have

\[ \begin{align} \frac{\cos x_1-\cos x_{1d}}{x_1-x_{1d}} = -\sin\left(\frac{x_1+x_{1d}}{2}\right)\mathrm{sinc}\left(\frac{x_1-x_{1d}}{2}\right) \end{align} \]

(75)

and thus \( A_1(x,x_d,u_d,t)\) is defined for all \( x\) , \( x_d\) , \( u_d\) , and \( t\) . The SDC matrix (74) indeed verifies \( A_1(x,x_d,u_d,t)(x-x_d)=\bar{f}(x,u_d,t)-\bar{f}(x_d,u_d,t)\) .

We can see that the following is also an SDC matrix of the nonlinear system (73):

\[ \begin{align} A_2(x,x_d,u_d,t) = \begin{bmatrix}0 & 1\\ -x_2+\frac{u_d(\cos x_1-\cos x_{1d})}{x_1-x_{1d}} & -x_{1d}\end{bmatrix}. \end{align} \]

(76)

Therefore, the convex combination of \( A_1\) in (74) and \( A_2\) in (76), \( A = \varrho_1A_1+\varrho_2A_2\) with \( \varrho_1+\varrho_2=1\) , \( \varrho_1,\varrho_2 \geq 0\) , is also an SDC matrix due to Lemma 3.
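
A numerical check of Lemma 3 for this example; the matrices below are entrywise forms re-derived from the line integral (70) for the system (73) (written here for a fixed sample \( u_d\) and evaluated at sample points of our choosing).

```python
import math

u_d = 0.5                      # fixed sample of the target input u_d
fb = lambda x1, x2: (x2, -x1 * x2 + u_d * math.cos(x1))  # fbar(s, u_d, t) for (73)

def dcos(x1, x1d):
    # The divided difference u_d (cos x1 - cos x1d) / (x1 - x1d), cf. (75).
    return u_d * (math.cos(x1) - math.cos(x1d)) / (x1 - x1d)

def A1(x1, x2, x1d, x2d):
    # Entrywise SDC matrix re-derived from (70), cf. (74).
    return ((0.0, 1.0),
            (-(x2 + x2d) / 2 + dcos(x1, x1d), -(x1 + x1d) / 2))

def A2(x1, x2, x1d, x2d):
    # A second valid SDC matrix, cf. (76).
    return ((0.0, 1.0), (-x2 + dcos(x1, x1d), -x1d))

def check(A, x, xd):
    # Does A (x - x_d) = fbar(x) - fbar(x_d) hold?
    e = (x[0] - xd[0], x[1] - xd[1])
    lhs = (A[0][0] * e[0] + A[0][1] * e[1], A[1][0] * e[0] + A[1][1] * e[1])
    rhs = tuple(a - b for a, b in zip(fb(*x), fb(*xd)))
    return all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))

x, xd = (1.0, -2.0), (0.3, 0.8)
M1, M2 = A1(*x, *xd), A2(*x, *xd)
assert check(M1, x, xd) and check(M2, x, xd)
# Any convex combination is again a valid SDC matrix, as Lemma 3 states:
rho = 0.3
Mc = tuple(tuple(rho * p + (1 - rho) * q for p, q in zip(r1, r2))
           for r1, r2 in zip(M1, M2))
assert check(Mc, x, xd)
```
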

Remark 6

This does not mean that contraction theory works only for the SDC parameterized nonlinear systems but implies that it can be used with the other techniques discussed in Sec. 2.2.1. For example, due to the extended linear form given in Table 3, the results to be presented in Sec. 2.2 and in Sec. 2.3 based on the SDC formulation are applicable to linearized dynamics that can be viewed as an LTV system with some modifications, regarding the dynamics modeling error term as an external disturbance as in Sec. 2.2.1.2. Also, the original SDC formulation with respect to a fixed point (e.g., \( (s,\bar{s},\bar{u})=(x,0,0)\) in Lemma 3) can still be used to obtain contraction conditions independent of a target trajectory \( (x_d,u_d)\) (see Theorem 10 for details).

Remark 7

For control non-affine nonlinear systems, we could find \( f(x,u,t)-f(x_d,u_d,t)=A(x,x_d,u,u_d,t)(x-x_d)+B(x,x_d,u,u_d,t)(u-u_d)\) by applying Lemma 3 to the SDC formulation and use it in Theorem 14, although (77) has to be solved implicitly since \( B\) depends on \( u\) in this case. A similar approach for the CCM formulation can be found in [18, 19]. As noted in Example 9, the design of such implicit control laws is discussed in Lemma 9 and Theorem 31 of Sec. 3.5.1.

Lemma 4

Consider a general feedback controller \( u\) defined as \( u = k(x,x_d,u_d,t)\) with \( k(x_d,x_d,u_d,t)=u_d\) , where \( k:\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{m}\) . If \( k\) is piecewise continuously differentiable, then \( \exists K:\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{m\times n}\) s.t. \( u = k(x,x_d,u_d,t) = u_d-K(x,x_d,u_d,t)(x-x_d)\) .

Remark 8

Lemma 4 implies that designing optimal \( k\) of \( u = k(x,x_d,u_d,t)\) reduces to designing the optimal gain \( K(x,x_d,u_d,t)\) of \( u=u_d-K(x,x_d,u_d,t)(x-x_d)\) . We could also generalize this idea further using the CCM-based differential feedback controller \( \delta u = k(x,\delta x,u,t)\)  [11, 20, 21, 19, 18, 22] (see Theorem 18).

Theorem 9

Let \( \beta\) be defined as \( \beta = 0\) for deterministic systems (63) and

\[ \begin{align} \beta = \alpha_s=L_m\bar{g}_c^2(\alpha_G+{1}/{2}) \\\end{align} \]

(83)

for stochastic systems (64), respectively, where \( \bar{g}_c\) is given in (64), \( L_m\) is the Lipschitz constant of \( \partial M/\partial x_i\) for \( M\) of (77), and \( \alpha_G\in\mathbb{R}_{>0}\) is an arbitrary constant as in Theorem 5. Also, let \( W = M(x,x_d,u_d,t)^{-1}\) (or \( W=M(x,t)^{-1}\) , see Theorem 10), \( \bar{W}=\nu W\) , and \( \nu=\overline{m}\) . Then the following three matrix inequalities are equivalent:

\[ \begin{align} \dot{M}+M\frac{\partial \zeta}{\partial q}+\frac{\partial \zeta}{\partial q}^{\top}M &\preceq -2 \alpha M-\beta \mathrm{I},~\forall \mu\in[0,1] \end{align} \]

(84)

\[ \begin{align} \dot{M}+2\rm{sym}(MA)-2MBR^{-1}B^{\top}M&\preceq -2 \alpha M-\beta \mathrm{I} \end{align} \]

(85)

\[ \begin{align} -\dot{\bar{W}}+2\rm{sym}(A\bar{W})-2\nu BR^{-1}B^{\top} &\preceq -2\alpha \bar{W}-\frac{\beta}{\nu}\bar{W}^2 \end{align} \]

(86)

where \( \zeta\) is as defined in (81). For stochastic systems with \( \beta=\alpha_s>0\) , these inequalities are also equivalent to

\[ \begin{equation} \begin{bmatrix} -\dot{\bar{W}}+2\rm{sym}{}(A\bar{W})-2\nu BR^{-1}B^{\top}+2\alpha \bar{W}& \bar{W} \\ \bar{W} & -\frac{\nu}{\beta}\mathrm{I} \end{bmatrix} \preceq 0. \end{equation} \]

(87)

Note that \( \nu\) and \( \bar{W}\) are required for (86) and (87) and the arguments \( (x,x_d,u_d,t)\) for each matrix are suppressed for notational simplicity.
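The equivalence of (85) and (86) can be checked numerically: congruence by \( W=M^{-1}\) and scaling by \( \nu>0\) preserve the sign of a matrix inequality. The following sketch freezes time (so \( \dot{M}=0\) ) and uses hypothetical numeric \( A\) , \( B\) , \( M\) , and \( R\) with \( \beta=0\) :

```python
import numpy as np

# Hypothetical numeric data at a frozen time instant (M-dot = 0, beta = 0)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
M = np.array([[2.0, 0.5], [0.5, 1.0]])   # M > 0
alpha, nu = 0.1, 3.0

W = np.linalg.inv(M)
Wbar = nu * W
Rinv = np.linalg.inv(R)

# (85) rearranged: 2 sym(M A) - 2 M B R^{-1} B^T M + 2 alpha M <= 0
lmi_85 = (M @ A + A.T @ M) - 2.0 * M @ B @ Rinv @ B.T @ M + 2.0 * alpha * M
# (86) rearranged: 2 sym(A Wbar) - 2 nu B R^{-1} B^T + 2 alpha Wbar <= 0
lmi_86 = (A @ Wbar + Wbar @ A.T) - 2.0 * nu * B @ Rinv @ B.T + 2.0 * alpha * Wbar

# Congruence by Wbar and scaling by 1/nu map one LMI into the other ...
assert np.allclose(Wbar @ lmi_85 @ Wbar / nu, lmi_86)
# ... so the two inequalities hold or fail together (Sylvester's law of inertia)
neg_85 = np.max(np.linalg.eigvalsh(lmi_85)) <= 0.0
neg_86 = np.max(np.linalg.eigvalsh(lmi_86)) <= 0.0
assert neg_85 == neg_86 and neg_85
```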

Furthermore, under these equivalent contraction conditions, Theorems 4 and 5 hold for the virtual systems (79) and (80), respectively. In particular, if \( \underline{m}\mathrm{I} \preceq M\preceq \overline{m}\mathrm{I}\) of (31) holds, or equivalently

\[ \begin{align} \mathrm{I} \preceq \bar{W} (x,x_d,u_d,t)\preceq \chi \mathrm{I} \end{align} \]

(88)

holds for \( \chi=\overline{m}/\underline{m}\) , then we have the following bounds:

\[ \begin{align} \|x(t)-x_d(t)\| &\leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha t}+\frac{\bar{d}_c}{\alpha}\sqrt{\chi}(1-e^{-\alpha t}) \end{align} \]

(89)

\[ \begin{align} \mathbb{E} \left[\|x(t)-x_d(t)\|^2\right] &\leq \frac{\mathbb{E}\left[V_{s\ell}(0)\right]}{\underline{m}}e^{-2\alpha t}+\frac{C_C}{2\alpha}\chi \end{align} \]

(90)

where \( V_{s\ell}=\int^x_{x_d}\delta q^{\top}M\delta q\) and \( V_{\ell}=\int^x_{x_d}\|\Theta\delta q\|\) are as given in Theorem 3 with \( M=\Theta^{\top}\Theta\) , the disturbance bounds \( \bar{d}_c\) and \( \bar{g}_c\) are given in (63) and (64), respectively, and \( C_C = \bar{g}_c^2({2}{\alpha_G}^{-1}+1)\) . Note that for stochastic systems, the probability that \( \|x-x_d\|\) is greater than or equal to \( \varepsilon\in\mathbb{R}_{> 0}\) is given as

\[ \begin{align} \mathbb{P} \left[\|x(t)-x_d(t)\|\geq\varepsilon\right] \leq \frac{1}{\varepsilon^2}\left(\frac{\mathbb{E}\left[V_{s\ell}(0)\right]}{\underline{m}}e^{-2\alpha t}+\frac{C_C}{2\alpha}\chi\right).~~~~~~ \end{align} \]

(91)
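To see the structure of the bound (89), the sketch below evaluates its right-hand side for hypothetical constants: the bound decays exponentially from \( V_{\ell}(0)/\sqrt{\underline{m}}\) toward the steady-state offset \( (\bar{d}_c/\alpha)\sqrt{\chi}\) :

```python
import math

# Hypothetical constants: V_l(0), m_underline, chi, alpha, d_bar_c
V0, m_lo, chi, alpha, d_c = 2.0, 1.0, 4.0, 0.5, 0.1

def bound(t):
    # right-hand side of (89)
    return (V0 / math.sqrt(m_lo)) * math.exp(-alpha * t) \
         + (d_c / alpha) * math.sqrt(chi) * (1.0 - math.exp(-alpha * t))

steady = (d_c / alpha) * math.sqrt(chi)   # steady-state tracking error bound
assert abs(bound(0.0) - V0 / math.sqrt(m_lo)) < 1e-12
assert abs(bound(50.0) - steady) < 1e-6
# monotone decay from the initial value toward the steady state
assert all(bound(0.1 * i) >= bound(0.1 * (i + 1)) - 1e-12 for i in range(200))
```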

Theorem 10

Let \( (\bar{x},\bar{u})\) be a fixed point selected arbitrarily in \( \mathbb{R}^n\times\mathbb{R}^m\) , e.g., \( (\bar{x},\bar{u})=(0,0)\) , and let \( A(x,t)\) be an SDC matrix constructed with \( ({s},\bar{s},\bar{u})=(x,\bar{x},\bar{u})\) in Lemma 3, i.e.,

\[ \begin{align} A(\varrho,x,t)(x-\bar{x})=f(x,t)+B(x,t)\bar{u}-f(\bar{x},t)-B(\bar{x},t)\bar{u}.~~~~ \end{align} \]

(92)

Suppose that the contraction metric of Theorem 9 is designed by \( M(x,t)\) with \( A\) of (92), independently of the target trajectory \( (x_d,u_d)\) , and that the systems (63) and (64) are controlled by

\[ \begin{align} u=u_d-R(x,t)^{-1}B(x,t)^{\top}M(x,t)(x-x_d) \end{align} \]

(93)

with such \( M(x,t)\) , where \( R(x,t)\succ0\) is a weight matrix on \( u\) . If the function \( \phi(x,x_d,u_d,t)=A(\varrho,x,t)(x_d-\bar{x})+B(x,t)(u_d-\bar{u})\) is Lipschitz in \( x\) with its Lipschitz constant \( \bar{L}\) , then Theorem 9 still holds with \( \alpha\) of the conditions (84)–(87) replaced by \( \alpha+\bar{L}\sqrt{\overline{m}/\underline{m}}\) . The same argument holds for state estimation of Theorem 15 to be discussed in Sec. 2.3.2.

Remark 9

As demonstrated in [23], we could directly use the extra term \( 2\delta q^{\top}M({\partial\phi}/{\partial q})\delta q\) of (95) in (84)–(87) without upper-bounding it, although now the constraints of Theorem 10 depend on \( (x,q,t)\) instead of \( (x,t)\) . Also, the following two inequalities given in [23] with \( \bar{\gamma}=\nu \gamma\) , \( \gamma \in \mathbb{R}_{\geq 0}\) :

\[ \begin{align} &-\dot{\bar{W}}+A\bar{W}+\bar{W}A^{\top}+\bar{\gamma} \mathrm{I}-\nu BR^{-1}B^{\top} \preceq 0 \end{align} \]

(96)

\[ \begin{align} &\begin{bmatrix} \bar{\gamma} \mathrm{I}+\nu BR^{-1}B^{\top}-\bar{W}\phi^{\top}-\phi \bar{W}-2\alpha \bar{W}& \bar{W} \\ \bar{W} & \frac{\nu}{2\alpha_s}\mathrm{I} \end{bmatrix} \succeq 0 \\\end{align} \]

are combined as one LMI (87) in Theorems 9 and 10.

Example 11

The inequalities in Theorem 9 can be interpreted in terms of the Riccati inequality of \( \mathcal{H}_{\infty}\) control. Consider the following system:

\[ \begin{align} \begin{aligned} \dot{x} &= Ax+B_uu+B_ww \\ z &= C_zx \end{aligned} \end{align} \]

(97)

where \( A\in\mathbb{R}^{n\times n}\) , \( B_u\in\mathbb{R}^{n\times m}\) , \( B_w\in\mathbb{R}^{n\times w}\) , and \( C_z\in\mathbb{R}^{o\times n}\) are constant matrices, \( w\in\mathbb{R}^w\) is an exogenous input, and \( z\in\mathbb{R}^o\) is a system output. As shown in [24] and [25, p. 109], there exists a state feedback gain \( K=R^{-1}B_u^{\top}P\) such that the \( \mathcal{L}_2\) gain of the closed-loop system (97), \( \sup_{\|w\|\neq 0}{\|z\|}/{\|w\|}\) , is less than or equal to \( \gamma\) if

\[ \begin{align} 2\rm{sym}{}(PA)-2PB_uR^{-1}B_u^{\top}P+\frac{PB_wB_w^{\top}P}{\gamma^2}+C_z^{\top}C_z \preceq 0~~~~~~ \end{align} \]

(98)

has a solution \( P \succ 0\) , where \( R\succ 0\) is a constant weight matrix on the input \( u\) . If we select \( B_w\) and \( C_z\) to have \( B_wB_w^{\top}\succeq (P^{-1})^2\) and \( C_z^{\top}C_z \succeq 2\alpha P\) for some \( \alpha >0\) , the contraction condition (85) in Theorem 9 can be satisfied with \( M=P\) , \( B = B_u\) , and \( \beta = 1/\gamma^2\) due to (98).
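A scalar instance makes this implication concrete. With \( n=m=1\) and the stated selections \( B_wB_w^{\top}=(P^{-1})^2\) and \( C_z^{\top}C_z=2\alpha P\) , the Riccati inequality (98) rearranges exactly into the contraction condition (85); all numbers below are hypothetical:

```python
# Hypothetical scalar choices (n = m = 1)
a, b, r = -1.0, 1.0, 1.0          # A, B_u, R
p, alpha, gamma = 1.0, 0.5, 2.0   # P, alpha, gamma
Bw = 1.0 / p                      # so that B_w B_w^T = (P^{-1})^2
Cz = (2.0 * alpha * p) ** 0.5     # so that C_z^T C_z = 2 alpha P
beta = 1.0 / gamma ** 2

# (98): 2 sym(P A) - 2 P B_u R^{-1} B_u^T P + P B_w B_w^T P / gamma^2 + C_z^T C_z <= 0
riccati = 2*p*a - 2*p**2*b**2/r + p**2*Bw**2/gamma**2 + Cz**2

# (85) with M = P, B = B_u, beta = 1/gamma^2 (and M-dot = 0):
# 2 sym(M A) - 2 M B R^{-1} B^T M <= -2 alpha M - beta
lhs_85 = 2*p*a - 2*p**2*b**2/r
rhs_85 = -2*alpha*p - beta

# With the selections above, (98) is exactly (85) rearranged:
assert abs((lhs_85 - rhs_85) - riccati) < 1e-12
assert riccati <= 0 and lhs_85 <= rhs_85
```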

Theorem 11

Let \( \mathcal{A}= A-BR^{-1}B^{\top}M\) , \( \mathcal{B}=M^{-1}\) , \( \mathcal{C}=\sqrt{2\alpha}\Theta\) , and \( \mathcal{D}=0\) in (99), where \( M=\Theta^{\top}\Theta\) , and the other variables are as defined in Theorem 9. Then (100) with \( \mathcal{P}=M\) and \( \gamma=1/\sqrt{\beta}\) is equivalent to (87), and thus (87) implies \( \mathcal{L}_2\) gain stability of (99) with its \( \mathcal{L}_2\) gain less than or equal to \( \gamma\) .

Theorem 12

If (104) holds, the LMI (100) for the bounded real lemma holds with \( \mathcal{P}=\gamma \mathcal{Q}\) , i.e., the system (99) is \( \mathcal{L}_2\) gain stable with its \( \mathcal{L}_2\) gain less than or equal to \( \gamma\) . Thus, for systems with \( \mathcal{A}\) , \( \mathcal{B}\) , \( \mathcal{C}\) , and \( \mathcal{D}\) defined in Theorem 11, the condition (104) guarantees the contraction condition (87) of Theorem 9.

Example 12

Theorems 11 and 12 imply the following statements.

  • If (100) holds, the system is finite-gain \( \mathcal{L}_2\) stable with its \( \mathcal{L}_2\) gain at most \( \gamma\) .
  • If (104) holds, the system is finite-gain \( \mathcal{L}_2\) stable with its \( \mathcal{L}_2\) gain at most \( \gamma\) .
  • (102) is equivalent to (104), and to the output strict passivity condition with dissipation \( 1/\gamma\)  [26, p. 231].
  • If (104) holds (KYP lemma), (100) holds (bounded real lemma). The converse is not necessarily true.

Theorem 13

If one of the matrix inequalities of Theorem 9 holds, we have the following bound for \( \chi={\overline{m}}/\underline{{m}}\) :

\[ \begin{align} &\lim_{t \to \infty}\sqrt{ \mathbb{E} \left[\|x-x_d\|^2\right]} \leq c_0(\alpha,\alpha_G)\sqrt{\chi} \leq c_0(\alpha,\alpha_G)\chi \end{align} \]

(105)

where \( c_0(\alpha,\alpha_G)=\bar{d}_c /\alpha\) for deterministic systems and \( c_0(\alpha,\alpha_G)=\bar{g}_c\sqrt{(2\alpha_G^{-1}+1)/(2\alpha)}\) for stochastic systems, with the variables given in Theorem 9.

Theorem 14

Suppose that \( \alpha\) , \( \alpha_G\) , \( \bar{d}_c\) , and \( \bar{g}_c\) in (105) and the Lipschitz constant \( L_m\) of \( \partial M/\partial x_i\) in Theorem 9 are given. If the pair \( (A,B)\) is uniformly controllable, the non-convex optimization problem of minimizing the upper bound (105) is equivalent to the following convex optimization problem, with the convex contraction constraint (86) or (87) of Theorem 9 and \( \mathrm{I}\preceq \bar{W}\preceq\chi\mathrm{I}\) of (88):

\[ \begin{align}\begin{aligned}{J}_{CV}^* &= \min_{\nu\in\mathbb{R}_{>0},\chi \in \mathbb{R},\bar{W} \succ 0} c_0\chi+c_1\nu+c_2 P(\chi,\nu,\bar{W}) \\ &\text{ s.t. (86) and (88) for deterministic systems} \\ & \text{ s.t. (87) and (88) for stochastic systems} \end{aligned}\end{align} \]

(106)

where \( W = M(x,x_d,u_d,t)^{-1}\) for Theorem 9 or \( W=M(x,t)^{-1}\) for Theorem 10, \( \bar{W}=\nu W\) , \( \nu=\overline{m}\) , \( \chi=\overline{m}/\underline{m}\) , \( c_0\) is as defined in (105) of Theorem 13, \( c_1,c_2\in\mathbb{R}_{\geq 0}\) , and \( P\) is a performance-based convex cost function.

The weight \( c_1\) for \( \nu\) can be viewed as a penalty on the \( 2\) -norm of the feedback control gain \( K\) of \( u=u_d-K(x-x_d)\) in (77). Using non-zero \( c_1\) and \( c_2\) thus enables finding contraction metrics optimal in a different sense. Furthermore, the coefficients of the SDC parameterizations \( \varrho\) in Lemma 3, i.e., \( A = \sum\varrho_iA_i\) in (71), can also be treated as decision variables by convex relaxation, thereby adding design flexibility to mitigate the effects of external disturbances while verifying the system controllability.

Example 13

The weights \( c_0\) and \( c_1\) of the CV-STEM control of Theorem 14 establish a trade-off analogous to that of the LQR with its cost weight matrices \( Q\) on the state and \( R\) on the control, since the upper bound (105) of the steady-state tracking error is proportional to \( \chi\) , and an upper bound of \( \|K\|\) for the control gain \( K\) in (77), \( u=u_d-K(x-x_d)\) , is proportional to \( \nu\) . In particular [27, 28],

  • if \( c_1\) is much greater than \( c_0\) in (106) of Theorem 14, we get smaller control effort but a larger steady-state tracking error, and
  • if \( c_1\) is much smaller than \( c_0\) in (106) of Theorem 14, we get a smaller steady-state tracking error but larger control effort.

Theorem 15

Suppose \( \exists \bar{\rho},\bar{c}\in\mathbb{R}_{\geq0}\) s.t. \( \|R^{-1}(\hat{x},t)\|\leq\bar{\rho}\) , \( \|C(\varrho_c,x,\hat{x},t)\|\leq\bar{c}, \forall x,\hat{x},t\) . Suppose also that \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m} \mathrm{I}\) of (31) holds, or equivalently, \( \mathrm{I}\preceq \bar{W} \preceq \chi \mathrm{I}\) of (88) holds with \( W=M(\hat{x},t)^{-1}\) , \( \bar{W}=\nu W\) , \( \nu=\overline{m}\) , and \( \chi=\overline{m}/\underline{m}\) . As in Theorem 9, let \( \beta\) be defined as \( \beta = 0\) for deterministic systems (107) and

\[ \begin{align} \beta &= \alpha_s = \alpha_{e0}+\nu^2\alpha_{e1} = \frac{L_m\bar{g}_{e0}^2}{2}\left(\alpha_G+\frac{1}{2}\right)+\frac{\nu^2L_m\bar{\rho}^2\bar{c}^2\bar{g}_{e1}^2}{2}\left(\alpha_G+\frac{1}{2}\right) \\\end{align} \]

(116)

for stochastic systems (108), where \( 2\alpha_{e0} = L_m\bar{g}_{e0}^2(\alpha_G+{1}/{2})\) , \( 2\alpha_{e1} = L_m\bar{\rho}^2\bar{c}^2\bar{g}_{e1}^2(\alpha_G+{1}/{2})\) , \( L_m\) is the Lipschitz constant of \( \partial W/\partial x_i\) , \( \bar{g}_{e0}\) and \( \bar{g}_{e1}\) are given in (108), and \( \alpha_G\in\mathbb{R}_{>0}\) is an arbitrary constant as in Theorem 5.

If \( M(\hat{x},t)\) in (111) is constructed to satisfy the following convex constraint for \( \exists \alpha \in \mathbb{R}_{>0}\) :

\[ \begin{equation} \dot{\bar{W}}+2\rm{sym}{}(\bar{W}A-\nu {C}^{\top}R^{-1}C)\preceq -2\alpha \bar{W}-\nu\beta \mathrm{I} \end{equation} \]

(117)

then Theorems 4 and 5 hold for the virtual systems (112) and (113), respectively, i.e., we have the following bounds for \( \mathtt{e}=\hat{x}-x\) with \( \nu=\overline{m}\) and \( \chi=\overline{m}/\underline{m}\) :

\[ \begin{align} \|\mathtt{e}(t)\| &\leq \sqrt{\overline{m}}\,V_{\ell}(0)e^{-\alpha t}+\frac{\bar{d}_{e0}\sqrt{\chi}+\bar{\rho}\bar{c}\bar{d}_{e1}\nu}{\alpha}(1-e^{-\alpha t}) \end{align} \]

(118)

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] &\leq \overline{m}\, \mathbb{E}\left[V_{s\ell}(0)\right] e^{-2\alpha t}+\frac{C_{e0}\chi+C_{e1}\chi\nu^2}{2\alpha} \end{align} \]

(119)

where \( V_{s\ell}=\int^{\hat{x}}_{x}\delta q^{\top}W\delta q\) and \( V_{\ell}=\int^{\hat{x}}_{x}\|\Theta\delta q\|\) are given in Theorem 3 with \( W=M^{-1}=\Theta^{\top}\Theta\) defining a contraction metric, the disturbance bounds \( \bar{d}_{e0}\) and \( \bar{d}_{e1}\) are given in (107), \( \bar{g}_{e0}\) and \( \bar{g}_{e1}\) are given in (108), \( C_{e0} = \bar{g}_{e0}^2({2}{\alpha_G}^{-1}+1)\) , and \( C_{e1} = \bar{\rho}^2\bar{c}^2\bar{g}_{e1}^2({2}{\alpha_G}^{-1}+1)\) . Note that for stochastic systems, the probability that \( \|\mathtt{e}\|\) is greater than or equal to \( \varepsilon\in\mathbb{R}_{> 0}\) is given as

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] \leq \frac{1}{\varepsilon^2}\left(\overline{m}\, \mathbb{E}\left[V_{s\ell}(0)\right] e^{-2\alpha t}+\frac{C_E}{2\alpha}\right) \end{align} \]

(120)

where \( C_E=C_{e0}\chi+C_{e1}\chi\nu^2\) .

Remark 10

Although (117) is not an LMI due to the nonlinear term \( -\nu\beta \mathrm{I}\) on its right-hand side for stochastic systems (108), it is a convex constraint as \( -\nu \beta = -\nu\alpha_s = -\nu\alpha_{e0}-\nu^3\alpha_{e1}\) is a concave function for \( \nu\in\mathbb{R}_{>0}\)  [29, 28].

Theorem 16

If (117) of Theorem 15 holds, then we have the following bound:

\[ \begin{align} &\lim_{t \to \infty}\sqrt{ \mathbb{E} \left[\|\hat{x}-x\|^2\right]} \leq c_0(\alpha,\alpha_G)\chi+c_1(\alpha,\alpha_G)\nu^s \end{align} \]

(123)

where \( c_0=\bar{d}_{e0}/\alpha\) , \( c_1=\bar{\rho}\bar{c}\bar{d}_{e1}/\alpha\) , and \( s=1\) for deterministic systems (112), and \( c_0=\sqrt{C_{e0}/(2\alpha)}\) , \( c_1=C_{e1}/(2\sqrt{2\alpha C_{e0}})\) , and \( s=2\) for stochastic systems (113), with \( C_{e0}\) and \( C_{e1}\) given as \( C_{e0} = \bar{g}_{e0}^2(2\alpha_G^{-1}+1)\) and \( C_{e1} = \bar{\rho}^2\bar{c}^2\bar{g}_{e1}^2(2\alpha_G^{-1}+1)\) .

Theorem 17

Suppose that \( \alpha\) , \( \alpha_G\) , \( \bar{d}_{e0}\) , \( \bar{d}_{e1}\) , \( \bar{g}_{e0}\) , \( \bar{g}_{e1}\) , and \( L_m\) in (117) and (123) are given. If the pair \( (A,C)\) is uniformly observable, the non-convex optimization problem of minimizing the upper bound (123) is equivalent to the following convex optimization problem with the contraction constraint (117) and \( \mathrm{I}\preceq \bar{W}\preceq\chi\mathrm{I}\) of (88):

\[ \begin{align}\begin{aligned}{J}_{CV}^* &= \min_{\nu\in\mathbb{R}_{>0},\chi \in \mathbb{R},\bar{W} \succ 0} c_0\chi+c_1\nu^s+c_2 P(\chi,\nu,\bar{W}) \\ &\text{ s.t. (117) and (88)}\end{aligned}\end{align} \]

(125)

where \( c_0\) , \( c_1\) , and \( s\) are as defined in (123) of Theorem 16, \( c_2\in\mathbb{R}_{\geq 0}\) , and \( P\) is some performance-based cost function as in Theorem 14.

The weight \( c_1\) for \( \nu^s\) indicates how much we trust the measurement \( y(t)\) . Using non-zero \( c_2\) enables finding contraction metrics optimal in a different sense in terms of \( P\) . Furthermore, the coefficients of the SDC parameterizations \( \varrho_a\) and \( \varrho_c\) in Lemma 3, i.e., \( A = \sum\varrho_{a,i}A_i\) and \( C = \sum\varrho_{c,i}C_i\) in (109) and (110), can also be treated as decision variables by convex relaxation [30], thereby adding design flexibility to mitigate the effects of external disturbances while verifying the system observability.

Example 14

The weights \( c_0\) and \( c_1\) of the CV-STEM estimation of Theorem 17 have a trade-off analogous to that of the Kalman filter with its process and sensor noise covariance matrices \( Q\) and \( R\) , respectively, since the term \( c_0\chi\) in the upper bound (123) of the steady-state estimation error becomes dominant if measurement noise is much smaller than process noise (\( \bar{d}_{e0}\gg \bar{d}_{e1}\) or \( \bar{g}_{e0}\gg \bar{g}_{e1}\) ), and the term \( c_1\nu^s\) becomes dominant if measurement noise is much greater than process noise (\( \bar{d}_{e0}\ll \bar{d}_{e1}\) or \( \bar{g}_{e0}\ll \bar{g}_{e1}\) ). In particular [27, 28],

  • if \( c_1\) is much greater than \( c_0\) , large measurement noise leads to state estimation that responds slowly to unexpected changes in the measurement \( y\) , i.e., small estimation gain due to \( \nu=\overline{m}\geq\|M\|\) , and
  • if \( c_1\) is much smaller than \( c_0\) , large process noise leads to state estimation that responds fast to changes in the measurement, i.e., large \( \nu=\overline{m}\geq\|M\|\) .

This is also because the solution \( Q = P^{-1}\succ0\) of the Kalman filter Riccati equation [31, p. 375], \( \dot{P}=AP+PA^{\top}-PC^{\top}R^{-1}CP+Q\) , can be viewed as a positive definite matrix that defines a contraction metric as discussed in Example 3.
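As an illustration of this connection, a forward-Euler sketch can integrate the filter Riccati equation \( \dot{P}=AP+PA^{\top}-PC^{\top}R^{-1}CP+Q\) to a positive definite steady state for hypothetical constant matrices; the steady-state \( P\) then plays the role of the metric-defining matrix discussed in Example 3:

```python
import numpy as np

# Hypothetical constant matrices with (A, C) observable
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2)          # process noise covariance
Rinv = np.array([[1.0]])  # inverse sensor noise covariance

# Forward-Euler integration of P-dot = A P + P A^T - P C^T R^{-1} C P + Q
P, dt = np.eye(2), 1e-3
for _ in range(50_000):
    Pdot = A @ P + P @ A.T - P @ C.T @ Rinv @ C @ P + Q
    P = P + dt * Pdot

assert np.allclose(P, P.T, atol=1e-8)              # symmetry is preserved
assert np.min(np.linalg.eigvalsh(P)) > 0.0         # P > 0 at steady state
assert np.max(np.abs(Pdot)) < 1e-6                 # Riccati residual ~ 0
```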

Theorem 18

Suppose the CV-STEM control of Theorem 14 is designed with its contraction condition replaced by the following set of convex constraints along with \( \mathrm{I}\preceq \bar{W}\preceq\chi\mathrm{I}\) of (88):

\[ \begin{align} &B_{\bot}^{\top} \left( -\frac{\partial \bar{W}}{\partial t}-\partial_f \bar{W} + 2\rm{sym}{}\left({\frac{\partial f}{\partial x} \bar{W}}\right) + 2 \alpha \bar{W} \right)B_{\bot} \prec 0~~~~ \end{align} \]

(126)

\[ \begin{align} &B_{\bot}^{\top} \left( \partial_{b_i} \bar{W} - 2\rm{sym}{}\left({\frac{\partial b_i}{\partial x} \bar{W}}\right) \right) B_{\bot} = 0,\forall x,t,i \end{align} \]

(127)

where \( B_{\bot}(x,t)\) is a matrix whose columns span the cokernel of \( B(x,t)\) defined as \( \mathrm{coker}(B)= \{a\in\mathbb{R}^n|B^{\top}a=0\}\) satisfying \( B^{\top}B_{\bot} = 0\) , \( b_i(x,t)\) is the \( i\) th column of \( B(x,t)\) , \( \bar{W}(x,t)=\nu W(x,t)\) , \( W(x,t) = M(x,t)^{-1}\) for \( M(x,t)\) that defines a contraction metric, \( \nu=\overline{m}\) , \( \chi=\overline{m}/\underline{m}\) , and \( \partial_{p} F = \sum_{k=1}^n(\partial F/\partial x_k)p_k\) for \( p(x,t)\in\mathbb{R}^n\) and \( F(x,t) \in \mathbb{R}^{n\times n}\) . Then the controlled system (63) with \( d_c=0\) , i.e., \( \dot{x}=f(x,t)+B(x,t)u\) , is

  1. universally exponentially open-loop controllable
  2. universally exponentially stabilizable via sampled-data feedback with arbitrary sample times
  3. universally exponentially stabilizable using continuous feedback defined almost everywhere, and everywhere in a neighborhood of the target trajectory

all with rate \( \alpha\) and overshoot \( R = \sqrt{\overline{m}/\underline{m}}\) for \( \underline{m}\) and \( \overline{m}\) given in \( \underline{m}\mathrm{I}\preceq M\preceq\overline{m}\mathrm{I}\) of (31). Such a positive definite matrix \( M(x,t)\) defines a CCM.

Given a CCM, there exists a differential feedback controller \( \delta u = k(x,\delta x,u,t)\) that stabilizes the following differential dynamics of (63) with \( d_c=0\) along all solutions (i.e., the closed-loop dynamics is contracting as in Definition 3):

\[ \begin{align} \delta\dot{x} = A(x,u,t)\delta x+B(x,t)\delta u \end{align} \]

(128)

where \( A=\partial f/\partial x+\sum_{i=1}^m(\partial b_i/\partial x)u_i\) and \( \delta u\) is a tangent vector to a smooth path of controls at \( u\) . Furthermore, it can be computed as follows [11, 20, 22]:

\[ \begin{align} u(x(t),t) = u_d+\int_{0}^1k(\gamma(\mu,t),\partial_{\mu}\gamma(\mu,t),u(\gamma(\mu,t),t),t)d\mu~~~~ \end{align} \]

(129)

where \( \partial_{\mu}\gamma=\partial \gamma/\partial \mu\) , \( \gamma\) is the minimizing geodesic with \( \gamma(0,t) = x_d\) and \( \gamma(1,t) = x\) for \( \mu\in[0,1]\) , \( (x_d,u_d)\) is a given target trajectory in \( \dot{x}_d=f(x_d,t)+B(x_d,t)u_d\) of (65), and the computation of \( k\) is given in [11, 20] (see Remark 11). Furthermore, Theorem 4 for deterministic disturbance rejection still holds, and the CCM controller (129) thus possesses the same sense of optimality as the CV-STEM control in Theorem 14.

Example 15

Let us consider a case where \( k\) of the differential feedback controller does not depend on \( u\) , and is defined explicitly by \( M(x,t) \succ 0\) as follows [32]:

\[ \begin{align} &u = u_d-\int_{x_d}^xR(q,t)^{-1}B(q,t)^{\top}M(q,t)\delta q \end{align} \]

(131)

where \( R(x,t) \succ 0\) is a given weight matrix and \( q\) is a smooth path that connects \( x\) to \( x_d\) . Since (131) yields \( \delta u=-R(x,t)^{-1}B(x,t)^{\top}M(x,t)\delta x\) , the contraction conditions (126) and (127) can be simplified to

\[ \begin{align} &-\frac{\partial \bar{W}}{\partial t}-\partial_f \bar{W}+2\rm{sym}{}\left(\frac{\partial f}{\partial x}\bar{W}\right)-\nu BR^{-1}B^{\top} \preceq -2\alpha \bar{W} \end{align} \]

(132)

\[ \begin{align} &-\partial_{b_i}\bar{W}+2\rm{sym}{}\left(\frac{\partial b_i}{\partial x}\bar{W}\right) = 0 \end{align} \]

(133)

yielding the convex optimization-based control synthesis algorithm of Theorem 18 independently of \( (x_d,u_d)\) similar to that of Theorem 14 (see [32] for details).

Example 16

If \( B\) is of the form \( [0,\mathrm{I}]^{\top}\) for the zero matrix \( 0\in\mathbb{R}^{n_1\times m}\) and identity matrix \( \mathrm{I}\in\mathbb{R}^{n_2\times m}\) with \( n=n_1+n_2\) , the condition (133) says that \( M\) should not depend on the last \( n_2\) state variables [11].

Remark 11

We could consider stochastic perturbation in Theorem 18 using Theorem 5, even with the differential control law of the form (129) or (131) as demonstrated in [32]. Also, although the relation (127) or (133) is not included as a constraint in Theorem 14 for simplicity of discussion, the dependence of \( \dot{\bar{W}}\) on \( u\) in Theorem 14 can be removed by using it in a similar way to [32].

As stated in (129), the computation of the differential feedback gain \( k(x,\delta x,u,t)\) and minimizing geodesics \( \gamma\) is elaborated in [11, 20]. For example, if \( M\) is state-independent, then geodesics are just straight lines.
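For instance, when \( M\) is constant the minimizing geodesic is the straight line \( q(\mu)=x_d+\mu(x-x_d)\) , and the path integral in (131) reduces to a one-dimensional quadrature. The sketch below (with hypothetical matrices and a hypothetical scalar state-dependent weight \( R(q)=1+q_1^2\) ) compares trapezoidal quadrature of (131) along the straight line with the analytic value of the integral:

```python
import math
import numpy as np

# Hypothetical constant metric M, input matrix B, and trajectory points
M = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[0.0], [1.0]])
x  = np.array([1.0, -0.5])
xd = np.array([0.2,  0.4])
u_d = np.array([0.1])
dx = x - xd

def Rinv(q):
    # hypothetical scalar state-dependent weight R(q) = 1 + q1^2
    return 1.0 / (1.0 + q[0] ** 2)

# Trapezoidal quadrature of (131) along the straight-line geodesic
g = B.T @ M @ dx                       # constant part of the integrand
N = 10_000
h = 1.0 / N
vals = np.array([Rinv(xd + (i * h) * dx) * g for i in range(N + 1)])
integral = h * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])
u_quad = u_d - integral

# Analytic value: int_0^1 dmu / (1 + (x_d1 + mu dx1)^2)
#               = (atan(x1) - atan(x_d1)) / dx1
coef = (math.atan(x[0]) - math.atan(xd[0])) / (x[0] - xd[0])
u_closed = u_d - coef * g
assert np.allclose(u_quad, u_closed, atol=1e-8)
```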

Example 17 (Tracking control)

Let us consider the problem of learning a computationally-expensive (or unknown) feedback control law \( u^*(x,x_d,u_d,t)\) that tracks a target trajectory \( (x_d,u_d)\) given by \( \dot{x}_d=f(x_d,t)+B(x_d,t)u_d\) as in (65), assuming that \( f\) and \( B\) are known. If the dynamics is perturbed by \( d_c(x,t)\) as in (63), we have that

\[ \begin{align} \dot{x} &= f(x,t)+B(x,t)u_L+d_c(x,t) = f(x,t)+B(x,t)u^*+B(x,t)(u_L-u^*)+d_c(x,t) \\\end{align} \]

(137)

where \( u_L(x,x_d,u_d,t)\) denotes a learning-based control law that models \( u^*(x,x_d,u_d,t)\) . As long as \( u^*\) renders the closed-loop dynamics \( \dot{x}=f(x,t)+B(x,t)u^*(x,x_d,u_d,t)\) contracting with respect to \( x\)  [23, 11, 27, 20], we can define the functions of (134) as follows:

\[ \begin{align} g (q,\varpi,t) &= f(q,t)+B(q,t)u^*(q,x_d,u_d,t) \end{align} \]

(138)

\[ \begin{align} \Delta_L(\varpi,t) &= B(x,t)(u_L(x,x_d,u_d,t)-u^*(x,x_d,u_d,t)) \end{align} \]

(139)

with \( d(\mu,\varpi,t)=\mu d_c(x,t)\) , where \( \varpi=[x^{\top},x_d^{\top},u_d^{\top}]^{\top}\) in this case. It can be easily verified that (134) indeed has \( q(\mu=0,t)=x_d\) and \( q(\mu=1,t)=x\) as particular solutions if \( u^*(x_d,x_d,u_d,t)=u_d\) . Similarly, we can use (135) with \( G=\mu G_c(x,t)\) if the dynamics is stochastically perturbed by \( G_c(x,t)d\mathscr{W}\) as in (64).

The learning objective here is to make \( \|u_L(x,x_d,u_d,t)-u^*(x,x_d,u_d,t)\|\) as small as possible, which aligns with the aforementioned objective in (136). Note that if \( \|B\|\) is bounded in the compact set \( \mathcal{S}\) of (136), we can bound \( \|\Delta_L\|\) of (139) with \( \epsilon_{\ell0}\neq 0\) and \( \epsilon_{\ell1}=0\) , as will be explained in Remark 12. See Theorems 22, 24, and 27 for details.

Example 18 (State estimation)

Next, let us consider the problem of learning a computationally-expensive (or unknown) state estimator, approximating its estimation gain \( L(\hat{x},t)\) by \( L_L(\hat{x},t)\) . If there is no disturbance in (107) and (108), we have that

\[ \begin{align} \dot{\hat{x}} &= f(\hat{x},t)+L_L(h(x,t)-h(\hat{x},t)) \\ &= f(\hat{x},t)+L(h(x,t)-h(\hat{x},t))+(L_L-L)(h(x,t)-h(\hat{x},t)) \\\end{align} \]

(140)

where \( \dot{x}=f(x,t)\) is the true system, \( y=h(x,t)\) is the measurement, and we assume that \( f\) and \( h\) are known. If \( L\) is designed to render \( \dot{q}=f(q,t)+L(\hat{x},t)(h(x,t)-h(q,t))\) contracting with respect to \( q\)  [27, 23, 30, 33], we could define the functions of (134) and (135) as follows:

\[ \begin{align} g (q,\varpi,t) &= f(q,t)+L(\hat{x},t)(h(x,t)-h(q,t)) \end{align} \]

(141)

\[ \begin{align} \Delta_L(q,\varpi,t) &= (L_L(\hat{x},t)-L(\hat{x},t))(h(x,t)-h(\hat{x},t)) \end{align} \]

(142)

where \( \varpi=[x^{\top},\hat{x}^{\top}]^{\top}\) . It can be seen that (134) and (135) with the relations (141) and (142) indeed have \( q(\mu=0,t)=x\) and \( q(\mu=1,t)=\hat{x}\) as their particular solutions when perturbed by deterministic and stochastic disturbances as in (107) and (108), respectively. We can bound \( \|\Delta_L\|\) of (142) in the compact set of (136) with \( \epsilon_{\ell0}=0\) and \( \epsilon_{\ell1}\neq0\) if \( h\) is Lipschitz, and with \( \epsilon_{\ell0}\neq0\) and \( \epsilon_{\ell1}=0\) if \( h\) is bounded in \( \mathcal{S}\) , using the techniques to be discussed in Remark 12. See Theorems 23 and 24 for details.

Example 19 (System identification)

We can use the problem (136) with the systems (134) and (135) also when \( f_{\mathrm{true}}\) of the underlying dynamics \( \dot{x}^* = f_{\mathrm{true}}(x^*,t)\) is unknown and learned by \( \dot{x} = f_L(x,t)\) . Since we have

\[ \begin{align} \dot{x}^* = f_{\mathrm{true}}(x^*,t) = f_L(x^*,t)+(f_{\mathrm{true}}(x^*,t)-f_L(x^*,t)) \\\end{align} \]

(143)

we could define \( g \) and \( \Delta_L\) of (134) and (135) as follows, to have \( q(\mu=0,t)=x\) and \( q(\mu=1,t)=x^*\) as particular solutions:

\[ \begin{align} \begin{aligned} g (q,\varpi,t) &= f_L(q,t) \\ \Delta_L(\varpi,t) &= f_{\mathrm{true}}(x^*,t)-f_L(x^*,t) \end{aligned} \end{align} \]

(144)

where \( \varpi=x^*\) , as long as \( \dot{x} = f_L(x,t)\) is contracting with respect to \( x\)  [21, 34]. Since \( \Delta_L\) of (144) is the learning error itself, \( \|\Delta_L\|\) can be bounded in \( \mathcal{S}\) using the techniques of Remark 12. See Theorems 30 and 31 for details.

Remark 12

As seen in Examples 17–19, \( \Delta_L\) of (136) is typically given by a learning error, \( \phi(z)-\varphi(z)\) , multiplied by a bounded or Lipschitz continuous function, where \( z=(\varpi,t)\in\mathcal{S}\) for a compact set \( \mathcal{S}\) , and \( \phi\) is the true computationally-expensive/unknown function to be learned by \( \varphi\) as in (139), (142), (144), etc.

Let \( \mathcal{T}\) be a set of training data \( \{(z_i,\phi(z_i))\}_{i=1}^N\) sampled in \( \mathcal{S}\) . For systems with the true function \( \phi\) and its approximation \( \varphi\) being Lipschitz (e.g., by spectral normalization [35] to be discussed in Definition 6 or by using contraction theory as in [36, 37, 38]), we can analytically find a bound for \( \|\phi(z)-\varphi(z)\|\) as follows if target data samples \( z\) are in \( B(r) = \{z\in\mathcal{S}|\sup_{z'\in\mathcal{T}}\|z-z'\|\leq r\}\) for some \( r \in\mathbb{R}_{\geq0}\) :

\[ \begin{align} \sup_{z\in B(r)}\|\phi(z)-\varphi(z)\| \leq \sup_{z'\in \mathcal{T}}\|\phi(z')-\varphi(z')\|+(L_{\phi}+L_{\varphi})r \end{align} \]

(145)

where \( L_{\phi}\) and \( L_{\varphi}\) are the Lipschitz constants of \( \phi\) and \( \varphi\) , respectively. The term \( \sup_{z'\in \mathcal{T}}\|\phi(z')-\varphi(z')\|\) can then be bounded by a constant, e.g., by using a deep robust regression model as proven in [39, 40] with spectral normalization, under standard training data distribution assumptions.
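A toy instance of (145): take \( \phi(z)=\sin z\) on \( \mathcal{S}=[-1,1]\) , learned by a hypothetical linear approximant \( \varphi(z)=0.9z\) (both Lipschitz with \( L_\phi=1\) and \( L_\varphi=0.9\) ); the training-set error plus \( (L_\phi+L_\varphi)r\) then bounds the error at every \( z\) within distance \( r\) of the training samples:

```python
import math

# Hypothetical toy functions: true phi and its learned approximant varphi
phi = math.sin
varphi = lambda z: 0.9 * z
L_phi, L_varphi = 1.0, 0.9

train = [-1.0 + 0.25 * i for i in range(9)]   # training samples in S = [-1, 1]
r = 0.125                                      # covering radius of the grid

# Right-hand side of (145)
eps_train = max(abs(phi(z) - varphi(z)) for z in train)
bound_145 = eps_train + (L_phi + L_varphi) * r

# Every z in [-1, 1] is within r of some training sample, so (145) applies:
test_grid = [-1.0 + 0.002 * i for i in range(1001)]
worst = max(abs(phi(z) - varphi(z)) for z in test_grid)
assert worst <= bound_145 + 1e-12
```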

Deep Neural Networks (DNNs) have been shown to generalize well to the set of unseen events that are from almost the same distribution as their training set [41, 42, 43, 44, 45], and consequently, obtaining a tighter and more general upper bound for the learning error as in (145) has been an active field of research [46, 47, 40, 39]. Thus, the condition (136) has become a common assumption in analyzing the performance of learning-based and data-driven control techniques [43, 48, 49, 50, 51, 52, 32, 53, 54].

Theorem 19

Suppose that \( \Delta_L\) of (134) is learned to satisfy (136), and that \( g (q,\varpi,t)\) of (134) is Lipschitz with respect to \( q\) with its 2-norm Lipschitz constant \( L_{ g }\in\mathbb{R}_{\geq 0}\) , i.e.,

\[ \begin{align} \| g (q,\varpi,t)- g (q',\varpi,t)\| \leq L_{ g }\|q-q'\|,~\forall (\varpi,t) \in \mathcal{S} \\\end{align} \]

(146)

where \( q,q' \in \mathbb{R}^n\) and \( \mathcal{S}\) is the compact set of (136). Then we have the following upper bound for all \( (\varpi,t) \in \mathcal{S}\) :

\[ \begin{align} \|\mathtt{e}(t)\| &\leq \|\mathtt{e}(0)\|e^{(L_{ g }+\epsilon_{\ell1})t}+\frac{\epsilon_{\ell0}+\bar{d}}{L_{ g }+\epsilon_{\ell1}}(e^{(L_{ g }+\epsilon_{\ell1})t}-1) \end{align} \]

(147)

where \( \mathtt{e}(t) = \xi_1(t)-\xi_0(t)\) , \( \xi_0(t)=q(0,t)\) and \( \xi_1(t)=q(1,t)\) for \( q\) of (134), and \( \bar{d}=\sup_{\mu,\varpi,t}\|\partial d/\partial \mu\|\) for \( d\) of (134).
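The bound (147) is a Gronwall-type comparison: the worst case obeys \( \dot{v}=(L_{g}+\epsilon_{\ell1})v+\epsilon_{\ell0}+\bar{d}\) , whose solution is exactly the right-hand side of (147). The scalar sketch below (with hypothetical constants) checks a forward-Euler simulation of this worst-case dynamics against the closed form:

```python
import math

# Hypothetical constants for (147)
L_g, eps1, eps0, d_bar = 0.5, 0.1, 0.2, 0.05
e0, T, dt = 1.0, 2.0, 1e-4
lam = L_g + eps1
c = eps0 + d_bar

def bound_147(t):
    # right-hand side of (147): exact solution of v-dot = lam v + c, v(0) = e0
    return e0 * math.exp(lam * t) + (c / lam) * (math.exp(lam * t) - 1.0)

# Forward Euler underestimates this growing linear ODE, so it stays below
# the closed-form bound while tracking it closely for small dt.
e = e0
for _ in range(int(T / dt)):
    e += dt * (lam * e + c)

assert e <= bound_147(T) + 1e-6
assert abs(e - bound_147(T)) < 1e-2
```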

Theorem 20

Let us consider the virtual system of a smooth path \( q(\mu,t)\) in (134) and suppose that \( \Delta_L\) of (134) is learned to satisfy (136). If the condition \( \underline{m}\mathrm{I}\preceq M\preceq\overline{m}\mathrm{I}\) of (31) holds and the system (134) with \( \Delta_L=0\) and \( d=0\) is contracting, i.e.,

\[ \begin{align} \dot{M}+M\frac{\partial g }{\partial q}+\frac{\partial g }{\partial q}^{\top}M \preceq -2\alpha M \\\end{align} \]

(148)

of Theorem 1 or 2 holds for \( M\succ 0\) that defines a contraction metric with the contraction rate \( \alpha\) , and if the learning error \( \epsilon_{\ell1}\) of (136) is sufficiently small to satisfy

\[ \begin{align}\exists \alpha_{\ell} \in\mathbb{R}_{>0}\text{ s.t. } \alpha_{\ell}=\alpha-\epsilon_{\ell1}\sqrt{\frac{\overline{m}}{\underline{m}}} > 0\end{align} \]

(149)

then we have the following bound for all \( (\varpi,t) \in \mathcal{S}\) :

\[ \begin{equation} \|\mathtt{e}(t)\| \leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha_{\ell} t}+\frac{\epsilon_{\ell0}+\bar{d}}{\alpha_{\ell}}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha_{\ell} t}) \end{equation} \]

(150)

where \( \mathtt{e}(t)=\xi_1(t)-\xi_0(t)\) with \( \xi_0(t)=q(0,t)\) and \( \xi_1(t)=q(1,t)\) for \( q\) of (134), \( \bar{d}=\sup_{\mu,\varpi,t}\|\partial d/\partial \mu\|\) as in (134), \( \mathcal{S}\) is the compact set of Theorem 19, \( \epsilon_{\ell0}\) and \( \epsilon_{\ell1}\) are the learning errors of (136), and \( V_{\ell}(t) = \int_{\xi_0}^{\xi_1} \|\Theta(q(t),t)\delta q(t)\|\) for \( M=\Theta^{\top}\Theta\) as in Theorem 3.
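In contrast to Theorem 19, the bound (150) defines a bounded error tube. The following minimal sketch, with a scalar contracting system \( g(q,t)=-\alpha q\) and the trivial metric \( M=\Theta=1\) (so \( \underline{m}=\overline{m}=1\), \( V_{\ell}(0)=\|\mathtt{e}(0)\|\), and \( \alpha_{\ell}=\alpha\) since \( \epsilon_{\ell1}=0\)), verifies it numerically; all constants are illustrative.

```python
import numpy as np

# Minimal scalar instance of (148)-(150), assuming M = Theta = 1 and eps_l1 = 0,
# so alpha_l = alpha and V_l(0) = |e(0)|.
alpha, eps0, d_bar = 2.0, 0.1, 0.3
dt, T = 1e-3, 5.0
t = np.arange(0.0, T, dt)
xi0, xi1 = 0.0, 2.0                 # V_l(0) = |e(0)| = 2 in this metric
errs = []
for tk in t:
    errs.append(abs(xi1 - xi0))
    d = d_bar * np.sin(tk)          # bounded deterministic disturbance
    Delta = eps0 * np.cos(2.0 * tk) # bounded learning error (eps_l1 = 0)
    xi0 += dt * (-alpha * xi0)
    xi1 += dt * (-alpha * xi1 + Delta + d)

bound = 2.0 * np.exp(-alpha * t) + (eps0 + d_bar) / alpha * (1.0 - np.exp(-alpha * t))
assert np.all(np.array(errs) <= bound + 1e-6)   # (150): tube of radius (eps0+d_bar)/alpha
```

The error converges exponentially to a ball of radius \( (\epsilon_{\ell0}+\bar{d})/\alpha_{\ell}\), rather than diverging as in (147).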

Theorem 21

Consider the virtual system of a smooth path \( q(\mu,t)\) in (135) and suppose \( \Delta_L\) of (135) is learned to satisfy (136). If \( \underline{m}\mathrm{I}\preceq M\preceq\overline{m}\mathrm{I}\) of (31) holds and the system (135) is contracting, i.e.,

\[ \begin{align} \dot{M}+M\frac{\partial g }{\partial q}+\frac{\partial g }{\partial q}^{\top}M \preceq -2\alpha M-\alpha_s \mathrm{I} \\\end{align} \]

(152)

of Theorem 5 holds, and if the learning error \( \epsilon_{\ell1}\) of (136) and an arbitrary constant \( \alpha_d \in \mathbb{R}_{>0}\) (see (156)) are selected to satisfy

\[ \begin{align}\exists \alpha_{\ell} \in\mathbb{R}_{>0}\text{ s.t. }\alpha_{\ell}=\alpha-\left(\frac{\alpha_d}{2}+\epsilon_{\ell1}\sqrt{\frac{\overline{m}}{\underline{m}}}\right) > 0\end{align} \]

(153)

then we have the following bound for all \( (\varpi,t) \in \mathcal{S}\) :

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] \leq \frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha_{\ell} t}+\frac{C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}} \end{align} \]

(154)

where \( \mathtt{e}(t)=\xi_1(t)-\xi_0(t)\) with \( \xi_0(t)=q(0,t)\) and \( \xi_1(t)=q(1,t)\) for \( q\) of (135), \( \mathcal{S}\) is the compact set given in Theorem 19, \( \epsilon_{\ell0}\) and \( \epsilon_{\ell1}\) are the learning errors of (136), \( V_{s\ell}(t) = \int_{\xi_0}^{\xi_1} \|\Theta(q(t),t)\delta q(t)\|^2\) for \( M=\Theta^{\top}\Theta\) as given in Theorem 3, and \( C = \bar{g}^2({2}{\alpha_G}^{-1}+1)+\epsilon_{\ell0}^2\alpha_d^{-1}\) for \( \bar{g}=\sup_{\mu,\varpi,t}\|\partial G/\partial \mu\|_F\) of (135) with an arbitrary constant \( \alpha_G \in \mathbb{R}_{>0}\) as in Theorem 5. Furthermore, the probability that \( \|\mathtt{e}\|\) is greater than or equal to \( \varepsilon\in\mathbb{R}_{> 0}\) is given as

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] \leq \frac{1}{\varepsilon^2}\left(\frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha_{\ell} t}+\frac{C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}}\right). \end{align} \]

(155)

Definition 4

A neural network is a nonlinear mathematical model that approximately represents training samples \( \{(x_i,y_i)\}_{i=1}^{N}\) of \( y = \phi(x)\) by optimally tuning its weights \( W_{\ell}\) , and is given as follows:

\[ \begin{align} y_i = \varphi(x_i;W_{\ell}) = T_{L+1}\circ\sigma\circ T_{L}\circ\cdots\circ\sigma\circ T_{1}(x_i) \end{align} \]

(162)

where \( T_{\ell}(x) = W_{\ell}x\) , \( \circ\) denotes the composition of functions, and \( \sigma\) is an activation function (e.g., \( \sigma(x) = \tanh(x)\) ; note that \( \varphi(x)\) is smooth in this case). We call a neural network that has more than two layers (i.e., \( L\geq 2\) ) a Deep Neural Network (DNN).
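A minimal NumPy rendering of (162) may clarify the composition structure; the layer widths and the choice \( L=2\) are illustrative, not prescribed by the definition.

```python
import numpy as np

# Forward pass of (162): y = T_{L+1} o sigma o T_L o ... o sigma o T_1(x),
# with T_l(x) = W_l x and sigma = tanh (smooth, so varphi is smooth in x).
rng = np.random.default_rng(0)
L, n_in, n_h, n_out = 2, 3, 16, 2          # L >= 2, i.e., a DNN in the sense above
Ws = [rng.standard_normal((n_h, n_in))]    # W_1
Ws += [rng.standard_normal((n_h, n_h)) for _ in range(L - 1)]
Ws += [rng.standard_normal((n_out, n_h))]  # W_{L+1}: final linear layer

def varphi(x, Ws):
    """Evaluate (162) at x for the given weight list Ws."""
    for W in Ws[:-1]:
        x = np.tanh(W @ x)                 # sigma o T_l for l = 1, ..., L
    return Ws[-1] @ x                      # T_{L+1}: no activation on the output

y = varphi(np.ones(n_in), Ws)
assert y.shape == (n_out,)
```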

Definition 5

A Neural Contraction Metric (NCM) is a DNN model of a contraction metric given in Theorem 1. Its training data could be sampled by solving the CV-STEM presented in Theorem 14 for control (or Theorem 18 for differential feedback control) and Theorem 17 for estimation. The NCM framework is summarized in Fig. 3.

Remark 13

Although a stochastic version of the NCM is called a Neural Stochastic Contraction Metric (NSCM) in [28], we also denote it as an NCM in this paper for simplicity of presentation.

Theorem 22

Let \( \mathcal{M}\) define the NCM in Definition 5, and let \( u^*=u_d-R^{-1}B^{\top}M(x-x_d)\) for \( M\) and \( R\) given in Theorem 14. Suppose that the systems (157) and (158) are controlled by \( u_L\) , which computes the CV-STEM controller \( u^*\) of Theorem 14 replacing \( M\) by \( \mathcal{M}\) , i.e.,

\[ \begin{align} u_L=u_d-R(x,x_d,u_d,t)^{-1}B(x,t)^{\top}\mathcal{M}(x,x_d,u_d,t)\mathtt{e} \end{align} \]

(163)

for \( \mathtt{e}=x-x_d\) . Note that we could use the differential feedback control \( u=u_d+\int^1_0kd\mu\) of Theorem 18 for \( u^*\) and \( u_L\) . Define \( \Delta_L\) of Theorems 20 and 21 as follows:

\[ \begin{align} \Delta_L(\varpi,t) = B(x,t)(u_L(x,x_d,u_d,t)-u^*(x,x_d,u_d,t)) \end{align} \]

(164)

where we use \( q(0,t)=\xi_0(t)=x_d(t)\) , \( q(1,t)=\xi_1(t)=x(t)\) , and \( \varpi=[x^{\top},x_d^{\top},u_d^{\top}]^{\top}\) in the learning-based control formulation (134) and (135). If the NCM is learned to satisfy the learning error assumptions of Theorems 20 and 21 for \( \Delta_L\) given by (164), then (150), (154), and (155) of Theorems 20 and 21 hold as follows:

\[ \begin{align} \|\mathtt{e}(t)\| &\leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha_{\ell} t}+\frac{\epsilon_{\ell0}+\bar{d}_c}{\alpha_{\ell}}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha_{\ell} t}) \end{align} \]

(165)

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] &\leq \frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha_{\ell} t}+\frac{C_C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}} \end{align} \]

(166)

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] &\leq \frac{1}{\varepsilon^2}\left(\frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha_{\ell} t}+\frac{C_C}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}}\right) \end{align} \]

(167)

where \( \mathtt{e}=x-x_d\) , \( C_C = \bar{g}_c^2({2}{\alpha_G}^{-1}+1)+\epsilon_{\ell0}^2\alpha_d^{-1}\) , the disturbance upper bounds \( \bar{d}_c\) and \( \bar{g}_c\) are given in (157) and (158), respectively, and the other variables are defined in Theorems 14, 20, and 21.

Theorem 23

Let \( \mathcal{M}\) define the NCM in Definition 5, and let \( L(\hat{x},t)=M\bar{C}^{\top}R^{-1}\) for \( M\) , \( \bar{C}\) and \( R\) given in Theorem 17. Suppose that the systems (160) and (161) are estimated with an estimator gain \( L_L\) , which computes \( L\) of the CV-STEM estimator \( \dot{\hat{x}} = f(\hat{x},t)+L(\hat{x},t)(y-h(\hat{x},t))\) in Theorem 17 replacing \( M\) by \( \mathcal{M}\) , i.e.,

\[ \begin{align} \dot{\hat{x}} &= f(\hat{x},t)+L_L(\hat{x},t)(y-h(\hat{x},t)) \end{align} \]

(169)

\[ \begin{align} &= f(\hat{x},t)+\mathcal{M}(\hat{x},t)\bar{C}(\varrho_c,\hat{x},t)^{\top}R(\hat{x},t)^{-1}(y-h(\hat{x},t)). \\\end{align} \]

(170)

Define \( \Delta_L\) of Theorems 20 and 21 as follows:

\[ \begin{align} \Delta_L(\varpi,t) = (L_L(\hat{x},t)-L(\hat{x},t))(h(x,t)-h(\hat{x},t)) \end{align} \]

(171)

where we use \( q(0,t)=\xi_0(t)=x(t)\) , \( q(1,t)=\xi_1(t)=\hat{x}(t)\) , and \( \varpi=[x^{\top},\hat{x}^{\top}]^{\top}\) in the learning-based control formulation (134) and (135). If the NCM is learned to satisfy the learning error assumptions of Theorems 20 and 21 for \( \Delta_L\) given by (171), then (150), (154), and (155) of Theorems 20 and 21 hold as follows:

\[ \begin{align} \|\mathtt{e}(t)\| &\leq \sqrt{\overline{m}}{V_{\ell}(0)}e^{-\alpha_{\ell} t}+\frac{\bar{d}_{a}\sqrt{\frac{\overline{m}}{\underline{m}}}+\bar{d}_{b}\overline{m}}{\alpha_{\ell}}(1-e^{-\alpha_{\ell} t})~~~~ \end{align} \]

(172)

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] &\leq \overline{m}\, \mathbb{E}\left[V_{s\ell}(0)\right] e^{-2\alpha_{\ell} t}+\frac{C_E}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}} \end{align} \]

(173)

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] &\leq \frac{1}{\varepsilon^2}\left(\overline{m}\, \mathbb{E}\left[V_{s\ell}(0)\right] e^{-2\alpha_{\ell} t}+\frac{C_E}{2\alpha_{\ell}}\frac{\overline{m}}{\underline{m}}\right) \end{align} \]

(174)

where \( \mathtt{e}=\hat{x}-x\) , \( \bar{d}_{a} = \epsilon_{\ell0}+\bar{d}_{e0}\) , \( \bar{d}_{b}=\bar{\rho}\bar{c}\bar{d}_{e1}\) , \( C_E = (\bar{g}_{e0}^2+\bar{\rho}^2\bar{c}^2\bar{g}_{e1}^2\overline{m}^2)({2}{\alpha_G}^{-1}+1)+\epsilon_{\ell0}^2\alpha_d^{-1}\) , the disturbance upper bounds \( \bar{d}_{e0}\) , \( \bar{d}_{e1}\) , \( \bar{g}_{e0}\) , and \( \bar{g}_{e1}\) are given in (160) and (161), and the other variables are defined in Theorems 17, 20, and 21.

Remark 14

As discussed in Examples 17 and 18 and in Remark 12, we have \( \epsilon_{\ell1}=0\) for the learning error \( \epsilon_{\ell1}\) of (136) as long as \( B\) and \( h\) are bounded in a compact set, by the definition of \( \Delta_L\) in (164) and (171). If \( h\) and \( u^*\) are Lipschitz with respect to \( x\) in the compact set, we have non-zero \( \epsilon_{\ell1}\) with \( \epsilon_{\ell0}=0\) in (136) (see Theorem 24).

Theorem 24

Let \( \mathcal{M}\) define the NCM in Definition 5, and let \( \mathcal{S}_s\subseteq\mathbb{R}^n\) , \( \mathcal{S}_u\subseteq\mathbb{R}^m\) , and \( \mathcal{S}_t\subseteq\mathbb{R}_{\geq 0}\) be some compact sets. Suppose that the systems (157)–(161) are controlled by (163) and estimated by (169), respectively, as in Theorems 22 and 23. Suppose also that \( \exists \bar{b},\bar{c},\bar{\rho}\in\mathbb{R}_{\geq 0}\) s.t. \( \|B(x,t)\|\leq\bar{b}\) , \( \|C(\varrho_c,x,\hat{x},t)\|\leq\bar{c}\) , and \( \|R^{-1}\|\leq\bar{\rho}, \forall x,\hat{x}\in\mathcal{S}_s\) and \( t\in\mathcal{S}_t\) , for \( B\) in (157), \( C\) in Theorem 17, and \( R\) in (163) and (169). If we have for all \( x,x_d,\hat{x}\in\mathcal{S}_s\) , \( u_d\in\mathcal{S}_u\) , and \( t\in\mathcal{S}_t\) that

\[ \begin{align} \|\mathcal{M}(x,x_d,u_d,t)-M(x,x_d,u_d,t)\| &\leq \epsilon_{\ell} \text{ for control} \end{align} \]

(177)

\[ \begin{align} \|\mathcal{M}(\hat{x},t)-M(\hat{x},t)\| &\leq \epsilon_{\ell} \text{ for estimation} \end{align} \]

(178)

where \( M(x,x_d,u_d,t)\) and \( M(\hat{x},t)\) (or \( M(x,t)\) and \( M(\hat{x},t)\) , see Theorem 10) are the CV-STEM solutions of Theorems 14 and 17 with \( \exists\epsilon_{\ell}\in\mathbb{R}_{\geq 0}\) being the learning error, then the bounds (165)–(167) of Theorem 22 and (172)–(174) of Theorem 23 hold with \( \epsilon_{\ell0} = 0\) and \( \epsilon_{\ell1} = \bar{\rho}\bar{b}^2\epsilon_{\ell}\) for control, and \( \epsilon_{\ell0} = 0\) and \( \epsilon_{\ell1} = \bar{\rho}\bar{c}^2\epsilon_{\ell}\) for estimation, as long as \( \epsilon_{\ell}\) is sufficiently small to satisfy the conditions (149) and (153) of Theorems 20 and 21 for deterministic and stochastic systems, respectively.

Example 20

The left-hand side of Fig. 4 shows the state estimation errors of the following Lorenz oscillator perturbed by process noise \( d_0\) and measurement noise \( d_1\)  [27]:

\[ \begin{align} \dot{x} &= [\sigma(x_2-x_1),x_1(\rho-x_3)-x_2,x_1 x_2-\beta x_3]^{\top}+d_0(x,t) \\ y &= [1,0,0]x+d_1(x,t) \\\end{align} \]

(180)

where \( x = [x_1,x_2,x_3]^{\top}\) , \( \sigma = 10\) , \( \beta = 8/3\) , and \( \rho = 28\) . It can be seen that the steady-state estimation errors of the NCM and CV-STEM are below the optimal steady-state upper bound of Theorem 17 (dotted line in the left-hand side of Fig. 4) while the EKF has a larger error compared to the other two. In this example, training data is sampled along \( 100\) trajectories with uniformly distributed initial conditions (\( -10 \leq x_i \leq 10, i = 1,2,3\) ) using Theorem 17, and then modeled by a DNN as in Definition 5. Additional implementation and network training details can be found in [27].

Example 21

Consider a robust feedback control problem of the planar spacecraft perturbed by deterministic disturbances, the unperturbed dynamics of which is given as follows [55]:

\[ \begin{align} \begin{bmatrix}m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & I\end{bmatrix}\ddot{x}=\begin{bmatrix}\cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}-1 & -1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & -1 & 0 & 0 & 1 & 1 \\ -\ell & \ell & -b & b & -\ell & \ell & -b & b\end{bmatrix}u~~~~~~~~~ \end{align} \]

(181)

where \( x = [p_x,p_y,\phi,\dot{p}_x,\dot{p}_y,\dot{\phi}]^T\) with \( p_x\) , \( p_y\) , and \( \phi\) being the horizontal coordinate, vertical coordinate, and yaw angle of the spacecraft, respectively, with the other variables defined as \( m = \text{mass of spacecraft}\) , \( I = \text{yaw moment of inertia}\) , \( u = \text{thruster force vector}\) , \( \ell=\text{half-depth of spacecraft}\) , and \( b=\text{half-width of spacecraft}\) (see Fig. 8 of [55] for details).

As shown in the right-hand side of Fig. 4, the NCM keeps the spacecraft trajectories within the bounded error tube of Theorem 14 (shaded region) around the target trajectory (dotted line), avoiding collision with the circular obstacles even under the presence of disturbances, whilst requiring a much smaller computational cost than the CV-STEM. Data sampling and the NCM training are performed as in Example 20 [27].

Remark 15

We could also simultaneously synthesize a controller \( u\) and the NCM, and Theorem 22 can provide formal robustness and stability guarantees even for such cases [56, 32]. In Sec. 3.3, we generalize the NCM to learn contraction theory-based robust control using a DNN that takes only \( x\) , \( t\) , and a vector containing local environment information as its inputs, avoiding the online computation of \( x_d\) and thereby enabling automatic guidance and control in real time.

Theorem 25

Suppose (157) and (158) are controlled by \( u = u_d-K^*(x,x_d,u_d,t)\mathtt{e}\) , where \( \mathtt{e}=x-x_d\) , and that \( K^*\) is designed by the following convex optimization for given \( (x,x_d,u_d,t)\) :

\[ \begin{align} &(K^*,p^*) = \mathrm{arg}\min_{K\in \mathcal{C}_K,p\in\mathbb{R}}\|K\|^2_F+p^2 \end{align} \]

(184)

\[ \begin{align} &\text{ s.t.} \int_{x_d}^x\delta q^{\top}(\dot{\mathcal{M}}+2\rm{sym}{}(\mathcal{M}A-\mathcal{M}BK))\delta q \leq -2\alpha\int_{x_d}^x\delta q^{\top}( \mathcal{M}+\beta \mathrm{I})\delta q+p\end{align} \]

(185)

where \( \mathcal{M}\) defines the NCM of Definition 5 that models \( M\) of the CV-STEM contraction metric in Theorem 14 as in (177), \( \mathcal{C}_K\) is a convex set containing admissible \( K\) , \( \alpha\) is the contraction rate, and \( \beta\) is as defined in Theorem 9 (i.e., \( \beta=0\) for deterministic systems and \( \beta = \alpha_s\) for stochastic systems). Then Theorems 4 and 5 still hold for the Lyapunov function defined in (183), yielding the following bounds:

\[ \begin{align} \|\mathtt{e}(t)\|^2 &\leq \frac{V_{\mathrm{NCM}}(0)}{\underline{m}}e^{-2\alpha_p t}+\frac{\frac{\bar{p}}{\underline{m}}+\frac{\bar{d}_c^2\overline{m}}{\alpha_d\underline{m}}}{2\alpha_p}(1-e^{-2\alpha_p t}) \end{align} \]

(186)

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] &\leq \frac{ \mathbb{E}\left[V_{\mathrm{NCM}}(0)\right] }{\underline{m}}e^{-2\alpha t}+\frac{C_C\overline{m}+\bar{p}}{2\alpha\underline{m}} \end{align} \]

(187)

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] &\leq \frac{1}{\varepsilon^2}\left(\frac{ \mathbb{E}\left[V_{\mathrm{NCM}}(0)\right] }{\underline{m}}e^{-2\alpha t}+\frac{C_C\overline{m}+\bar{p}}{2\alpha\underline{m}}\right) \end{align} \]

(188)

where \( \bar{p}=\sup_{x,x_d,u_d,t}p\) , \( \underline{m}\mathrm{I}\preceq\mathcal{M}\preceq\overline{m}\mathrm{I}\) as in (31), \( \alpha_d\in\mathbb{R}_{>0}\) is an arbitrary positive constant (see (189)) selected to satisfy \( \alpha_p=\alpha-\alpha_d/2>0\) , \( C_C = \bar{g}_c^2({2}{\alpha_G}^{-1}+1)\) , \( \alpha_G\in\mathbb{R}_{>0}\) is an arbitrary constant as in Theorem 5, \( \varepsilon\in\mathbb{R}_{\geq0}\) , and the disturbance terms \( \bar{d}_c\) and \( \bar{g}_c\) are given in (157) and (158), respectively. Note that the problem (184) is feasible due to the constraint relaxation variable \( p\) .

Furthermore, if \( \epsilon_{\ell} = 0\) in (177), (184) with \( p=0\) is always feasible, and the optimal feedback gain \( K^*\) minimizes its Frobenius norm under the contraction constraint (185).

Theorem 26

Suppose \( u=u^*\) of (157) is designed by the following convex optimization for given \( (x,x_d,u_d,t)\) :

\[ \begin{align} (u^*,p^*) &= \mathrm{arg}\min_{u\in \mathcal{C}_u,p\in\mathbb{R}}\|u-u_d\|^2+p^2 \end{align} \]

(192)

\[ \begin{align} &\begin{aligned}& \text{ s.t.} \frac{\partial E}{\partial t}+2\gamma_{\mu}(1,t)^{\top}\mathcal{M}(x,t)(f(x,t)+B(x,t)u) \\ & -2\gamma_{\mu}(0,t)^{\top}\mathcal{M}(x_d,t)(f(x_d,t)+B(x_d,t)u_d) \leq -2\alpha E+p\end{aligned}\end{align} \]

(193)

where \( \mathcal{M}\) defines the NCM of Definition 5 that models the CCM of Theorem 18, and \( \mathcal{C}_{u}\) is a convex set containing admissible \( u\) . Then the bound (186) of Theorem 25 holds.

Furthermore, if \( \epsilon_{\ell}=0\) for the NCM learning error \( \epsilon_{\ell}\) in (177), then the problem (192) with \( p=0\) is always feasible, and \( u^*\) minimizes the deviation of \( u\) from \( u_d\) under the contraction constraint (193).

Remark 16

As discussed in Remark 11, Theorem 26 can also be formulated in a stochastic setting as in Theorem 25 using the technique discussed in [32].

Example 22

If \( \mathcal{C}_K=\mathbb{R}^{m\times n}\) and \( \mathcal{C}_u=\mathbb{R}^{m}\) in (184) and (192), respectively, they can both be expressed as the following quadratic optimization problem:

\[ \begin{align}&v^* = \mathrm{arg}\min_{v\in\mathbb{R}^\mathtt{v}} \frac{1}{2}v^{\top}v+c(x,x_d,u_d,t)^{\top}v \text{ s.t.} \varphi_0(x,x_d,u_d,t)+\varphi_1(x,x_d,u_d,t)^{\top}v \leq 0 \end{align} \]

(194)


by defining \( {c}\in\mathbb{R}^{\mathtt{v}}\) , \( \varphi_0\in\mathbb{R}\) , \( \varphi_1\in\mathbb{R}^{\mathtt{v}}\) , and \( v\in\mathbb{R}^\mathtt{v}\) appropriately. Applying the KKT condition [29, pp. 243-244] to this problem, we can show that

\[ \begin{align}v^* =\left\{ \begin{array}{ll}-c-\frac{\varphi_0-\varphi_1^{\top}c}{\varphi_1^{\top}\varphi_1}\varphi_1 & \text{if }\varphi_0-\varphi_1^{\top}c > 0 \\ -c & \text{otherwise}.\end{array}\right. \end{align} \]

(196)


This implies that the controller utilizes \( u_d\) with the feedback term \( (({\varphi_0-\varphi_1^{\top}c})/{\varphi_1^{\top}\varphi_1})\varphi_1\) only if \( \dot{x}=f(x,t)+B(x,t)u\) with \( u=u_d\) is not contracting, i.e., \( \varphi_0-\varphi_1^{\top}c > 0\) .
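The closed-form solution (196) is easy to exercise directly; the following sketch uses illustrative values of \( c\), \( \varphi_0\), and \( \varphi_1\) (not taken from any particular system) and checks the two KKT cases.

```python
import numpy as np

# Analytic minimizer (196) of the QP (194):
#   min_v 0.5*v'v + c'v  s.t.  phi0 + phi1'v <= 0   (phi1 != 0 assumed)
def v_star(c, phi0, phi1):
    slack = phi0 - phi1 @ c
    if slack > 0:                       # unconstrained optimum -c is infeasible:
        return -c - (slack / (phi1 @ phi1)) * phi1   # project onto the constraint
    return -c                           # constraint inactive: u = u_d behavior

c = np.array([1.0, -2.0])
phi1 = np.array([1.0, 1.0])
v = v_star(c, 3.0, phi1)                # phi0 - phi1'c = 4 > 0: constraint active
assert np.isclose(3.0 + phi1 @ v, 0.0)  # active constraint holds with equality
assert np.allclose(v_star(c, -5.0, phi1), -c)  # inactive case returns -c
```

The two branches mirror the discussion above: feedback is added only when \( \varphi_0-\varphi_1^{\top}c > 0\), i.e., when the nominal \( u=u_d\) is not contracting.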

Example 23

When \( q\) is parameterized linearly by \( \mu\) in Theorem 25, i.e., \( q = \mu x+(1-\mu)x_d\) , we have \( V_{\mathrm{NCM}}=(x-x_d)^{\top}\mathcal{M}(x-x_d)\) as in the standard Lyapunov function formulation [57, 58, 26]. Also, a linear input constraint \( u_{\min} \leq u \leq u_{\max}\) can be implemented by using

\[ \begin{align} \mathcal{C}_K &= \{K\in\mathbb{R}^{m\times n}|u_d-u_{\max} \leq K\mathtt{e} \leq u_d-u_{\min}\} \\ \mathcal{C}_u &= \{u\in\mathbb{R}^{m}|u_{\min} \leq u \leq u_{\max}\} \\\end{align} \]

(198)

in (184) and (192), respectively.

Remark 17

For the CV-STEM and NCM state estimation in Theorems 17, 23, and 24, we could design a similar Lyapunov function-based estimation scheme if its contraction constraints of Theorem 17 do not explicitly depend on the actual state \( x\) . This can be achieved by using \( A(\varrho_A,\hat{x},t)\) and \( C(\varrho_C,\hat{x},t)\) instead of \( A(\varrho_A,x,\hat{x},t)\) and \( C(\varrho_C,x,\hat{x},t)\) in Theorem 17 as in Theorem 10 [30], leading to exponential boundedness of system trajectories as derived in [23, 30, 59].

Lemma 5

A symmetric matrix \( A \in \mathbb{R}^{n\times n}\) with \( A \succ 0\) has a unique Cholesky decomposition, i.e., there exists a unique upper triangular matrix \( U\in\mathbb{R}^{n\times n}\) with strictly positive diagonal entries s.t. \( A = U^{\top}U\) .
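A quick numerical sketch of Lemma 5, using a randomly generated positive definite matrix (the construction of \( A\) is an illustrative choice):

```python
import numpy as np

# Cholesky factorization A = U^T U with U upper triangular, per Lemma 5.
rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4))
A = S @ S.T + 4.0 * np.eye(4)    # symmetric positive definite by construction

Lo = np.linalg.cholesky(A)       # NumPy returns lower-triangular Lo with A = Lo Lo^T
U = Lo.T                         # so U = Lo^T is the unique upper-triangular factor
assert np.allclose(A, U.T @ U)
assert np.all(np.diag(U) > 0)    # strictly positive diagonal, matching uniqueness
```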

Definition 6

A spectrally-normalized DNN is a DNN of Definition 4 with its weights \( W_{\ell}\) normalized as \( W_{\ell} = (C_{nn}\Omega_{\ell})/\|\Omega_{\ell}\|\) , where \( C_{nn} \in\mathbb{R}_{\geq0}\) is a given constant.

Lemma 6

A spectrally-normalized DNN given in Definition 6 is Lipschitz continuous with its 2-norm Lipschitz constant \( C_{nn}^{L+1}L_{\sigma}^L\) , where \( L_{\sigma}\) is the Lipschitz constant of the activation function \( \sigma\) in (162). Also, it is robust to perturbation in its input.
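The Lipschitz claim of Lemma 6 can be checked empirically. The sketch below normalizes random weights as in Definition 6 and verifies that the finite-difference ratio of a \( \tanh\) network (for which \( L_{\sigma}=1\)) never exceeds \( C_{nn}^{L+1}L_{\sigma}^{L}\); the sizes and \( C_{nn}\) are illustrative.

```python
import numpy as np

# Spectral normalization per Definition 6: W_l = C_nn * Omega_l / ||Omega_l||_2,
# so each layer has spectral norm C_nn and the tanh network of (162) is
# (C_nn^{L+1} * L_sigma^L)-Lipschitz with L_sigma = 1 (Lemma 6).
rng = np.random.default_rng(2)
C_nn, L, n = 0.9, 2, 8
Omegas = [rng.standard_normal((n, n)) for _ in range(L + 1)]
Ws = [C_nn * Om / np.linalg.norm(Om, 2) for Om in Omegas]   # ||W_l||_2 = C_nn

def net(x):
    for W in Ws[:-1]:
        x = np.tanh(W @ x)
    return Ws[-1] @ x

Lip = C_nn ** (L + 1)                    # L_sigma = 1 for tanh
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
ratio = np.linalg.norm(net(x1) - net(x2)) / np.linalg.norm(x1 - x2)
assert ratio <= Lip + 1e-9               # empirical ratio respects the bound
```

Choosing \( C_{nn}<1\) additionally makes the network a contraction in its input, which is what keeps \( \partial M/\partial x_i\) bounded in the stochastic setting discussed below.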

Example 24

Let us demonstrate how imposing the Lipschitz condition on \( \partial M/\partial x_i\) in Theorem 5 affects the NCM-based control and estimation performance. We consider a rocket autopilot problem perturbed by stochastic disturbances as in [28], the unperturbed dynamics of which is depicted in Fig. 5 and given as follows, assuming that the pitch rate \( q\) and specific normal force are available as measurements via rate gyros and accelerometers [60, 61]:

\[ \begin{align} \dfrac{d}{dt}\begin{bmatrix}\varphi \\ q\end{bmatrix} &= \begin{bmatrix}\frac{gC(M)\cos(\varphi)\phi_z(\varphi,M)}{mV}+q \\ \frac{C(M)d\phi_m(\varphi,M)}{I_{y}}\end{bmatrix}+\begin{bmatrix}\frac{gC(M)\bar{d}_n\cos(\varphi)}{mV} \\ \frac{C(M)\bar{d}_m}{I_{y}}\end{bmatrix}u \end{align} \]

(201)

\[ \begin{align} y &= \begin{bmatrix}q\\ \frac{C(M)\phi_z(\varphi,M)}{m}\end{bmatrix}+\begin{bmatrix}0\\ \frac{C(M)\bar{d}_n}{m}\end{bmatrix}u \end{align} \]

(202)

where \( C(M) = p_0M^2S\) , \( \phi_z(\varphi,M)=0.7(a_n\varphi^3+b_n\varphi|\varphi|+c_n(2+M/3)\varphi)\) , \( \phi_m(\varphi,M)=0.7(a_m\varphi^3+b_m\varphi|\varphi|-c_m(7-8M/3)\varphi)\) , \( \bar{d}_n=0.7d_n\) , \( \bar{d}_m=0.7d_m\) , \( (a_n,b_n,c_n,d_n)\) and \( (a_m,b_m,c_m,d_m)\) are given in [61], and the notations are defined in Table 4. Since this example explicitly takes into account stochastic perturbation, the spectrally-normalized DNN of Definition 6 is used to guarantee the Lipschitz condition of Theorem 5 by Lemma 6.

Figure 6 shows the state estimation and tracking error performance of each estimator and controller, where the NSCM is the stochastic counterpart of the NCM mentioned in Remark 13. It is demonstrated that the steady-state errors of the NSCM and CV-STEM are indeed smaller than their steady-state upper bounds of Theorems 14 and 17 (solid black line), while the other controllers violate this condition. In particular, the optimal contraction rate of the deterministic NCM for state estimation turns out to be much larger than that of the NSCM, as it does not account for stochastic perturbation, which makes the NCM trajectory diverge around \( t=5.8\) in Fig. 6. The NSCM circumvents this difficulty by imposing the Lipschitz condition on \( \partial M/\partial x_i\) as in Theorem 5 using the spectral normalization of Lemma 6. See [28] for additional details.

Definition 7

Learning-based Autonomous Guidance with RObustness and Stability guarantees (LAG-ROS) is a DNN model for approximating a function which maps the state \( x\) , local environment information \( o_{\ell}\) , and time \( t\) to an optimal robust feedback control input \( u^*\) given by (77) of Theorem 14, i.e., \( u^*=u_d-K(x-x_d)\) , or (129) of Theorem 18, i.e., \( u=u_d+\int_{0}^1kd\mu\) , where its training data is sampled as explained in Sec. 3.3.3 (see Figures 11, 12, and 14).

Theorem 27

Let \( u_L=u_L(x,o_{\ell}(x,o_g),t)\) be the LAG-ROS in Definition 7, and let \( u^*\) be given by (77) of Theorem 14, i.e., \( u^*=u_d-K(x-x_d)\) , or (129) of Theorem 18, i.e., \( u^*=u_d+\int^1_{0}kd\mu\) . Define \( \Delta_L\) of Theorems 20 and 21 as follows:

\[ \begin{align} \Delta_L = B(u_L(x,o_{\ell}(x,o_g),t)-u^*(x,x_d(o_g,t),u_d(o_g,t),t)) \end{align} \]

(203)

where \( B=B(x,t)\) is the actuation matrix given in (157) and (158). Note that we use \( q(0,t)=\xi_0(t)=x_d(o_g,t)\) and \( q(1,t)=\xi_1(t)=x(t)\) in the learning-based control formulation of Theorems 20 and 21.

Table 5. Comparison of LAG-ROS [51] with the Learning-based and Robust Tube-based Planners.
Motion planning scheme | Control policy | State tracking error \( \|x-x_d\|\) | Computational load
(I) Learning-based | \( (o_{\ell},t) \mapsto u_d\) | Increases exponentially (Theorem 19) | One DNN evaluation
(II) Robust tube-based | \( (x,x_d,u_d,t) \mapsto u^*\) | Exponentially bounded (Theorems 14, 18, 25, and 26) | Computation of \( (x_d,u_d)\)
(III) LAG-ROS | \( (x,o_{\ell},t) \mapsto u^*\) | Exponentially bounded (Theorem 27) | One DNN evaluation

If the LAG-ROS \( u_L\) is learned to satisfy (136) with \( \epsilon_{\ell1}=0\) , i.e., \( \|\Delta_L\| \leq \epsilon_{\ell0}\) for all \( x\in\mathcal{S}_s\) , \( o_g\in\mathcal{S}_o\) , and \( t\in\mathcal{S}_t\) , where \( \mathcal{S}_s\subseteq\mathbb{R}^n\) , \( \mathcal{S}_o\subseteq\mathbb{R}^{g}\) , and \( \mathcal{S}_t\subseteq\mathbb{R}_{\geq0}\) are some compact sets, and if the control-affine nonlinear systems (157) and (158) are controlled by \( u=u_L\) , then (150) of Theorem 20 and (154) of Theorem 21 hold with \( \bar{d}\) and \( \bar{g}\) replaced as in (168), i.e., we have the following for \( \mathtt{e}(t)=x(t)-x_d(o_g,t)\) :

\[ \begin{align} \|\mathtt{e}(t)\| &\leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha t}+\frac{\bar{d}_{\epsilon_{\ell}}}{\alpha}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha t})=r_{\ell}(t)~~~~~~~~ \end{align} \]

(204)

\[ \begin{align} \mathbb{E} \left[\|\mathtt{e}(t)\|^2\right] &\leq \frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha t}+\frac{C_{\epsilon_{\ell}}}{2\alpha}\frac{\overline{m}}{\underline{m}}=r_{s\ell}(t) \end{align} \]

(205)

\[ \begin{align} \mathbb{P} \left[\|\mathtt{e}(t)\|\geq\varepsilon\right] &\leq \frac{1}{\varepsilon^2}\left(\frac{ \mathbb{E}\left[V_{s\ell}(0)\right] }{\underline{m}}e^{-2\alpha t}+\frac{C_{\epsilon_{\ell}}}{2\alpha}\frac{\overline{m}}{\underline{m}}\right) \end{align} \]

(206)

where \( x_d=x_d(o_g,t)\) and \( u_d=u_d(o_g,t)\) are as given in (159), \( \bar{d}_{\epsilon_{\ell}}=\epsilon_{\ell0}+\bar{d}_c\) , \( C_{\epsilon_{\ell}} = \bar{g}_c^2({2}{\alpha_G}^{-1}+1)+\epsilon_{\ell0}^2\alpha_d^{-1}\) , and the other variables are as defined in Theorems 20 and 21 with the disturbance terms in (168) of Theorem 22.

Example 25

Figure 9
Figure 10
Figure 8. Cart-pole balancing task: \( x=[p,\theta,\dot{p},\dot{\theta}]^{\top}\) , \( x\) and \( x_d\) are given in (157) and (159), (I)–(III) are given in Table 5, and \( r_{\ell}\) is given in (207). The shaded area denotes the standard deviation (\( +1\sigma\) and \( -0.5\sigma\) ).

Let us consider the cart-pole balancing task in Fig. 8, perturbed externally by deterministic perturbation, to demonstrate the differences among the planners (I)–(III) summarized in Table 5. Its dynamical system is given in [52, 62] with its values defined in [62].

The target trajectories \( (x_d,u_d)\) are sampled with the objective function \( \int_0^{T}\|u\|^2dt\) , and the LAG-ROS of Theorem 27 is modeled by a DNN independently of the target trajectory as detailed in [51]. The environment information \( o_g\) is selected as random initial and target terminal states, and we let \( V_{\ell}(0)=0\) and \( \epsilon_{\ell0}+\bar{d}_c = 0.75\) in (204) of Theorem 27, yielding the following exponential bound:

\[ \begin{align} r_{\ell}(t) = \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}+\frac{\bar{d}_{\epsilon_{\ell}}}{\alpha}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha t}) = 3.15(1-e^{-0.60 t})~~~~~~ \end{align} \]

(207)

where \( \bar{d}_{\epsilon_{\ell}}=\epsilon_{\ell0}+\bar{d}_c\) .

Figure 8 shows the tracking errors of each motion planner averaged over \( 50\) simulations at each time instant \( t\) . The LAG-ROS (III) and the robust tube-based planner (II) indeed satisfy the bound (207) (dotted black curve) for all \( t\) with a small standard deviation \( \sigma\) , unlike the learning-based motion planner (I), whose bound diverges as in Theorem 19 with an increasing deviation \( \sigma\) . This example demonstrates one of the major advantages of contraction theory, which enables such a quantitative analysis of the robustness and stability of learning-based planners.

Lemma 7

If the solution of (208) gives \( (x_d,u_d)\) , the LAG-ROS \( u_L\) of Theorem 27 ensures that the perturbed solution \( x(t)\) of \( \dot{x}=f(x,t)+B(x,t)u_L+d_c(x,t)\) in (157) stays in the admissible state space \( \mathcal{X}(o_g,t)\) , i.e., \( x(t) \in \mathcal{X}(o_g,t)\) , even with the learning error \( \epsilon_{\ell0}\) of \( u_L\) in Theorem 27.

Remark 18

For stochastic systems, we could sample \( x\) around \( x_d\) from a given probability distribution of the disturbance, using (205) and (206) of Theorem 27.

Example 26

Let us again consider the spacecraft dynamics of Example 21, but now for designing a real-time robust motion planner using the LAG-ROS in a multi-agent setting. In this example, each agent is assumed to have access only to the states of the other agents/obstacles located in a circle with a limited radius, \( 2.0\) m, centered around it. The performance of the LAG-ROS is compared with (I), (II), and the following centralized motion planner (IV), which is not computable in real-time:

  1. Centralized robust motion planner:
    \( (x,x_d,u_d,t) \mapsto u^*\) , offline centralized solution of (II).

The objective function for sampling target trajectories of (159) is selected as \( \int_0^{T}\|u\|^2dt\) with their target terminal positions \( x_f=[p_{xf},p_{yf},0,0]^{\top}\) , where \( (p_{xf},p_{yf})\) is a random position in \( (0,0) \leq (p_{xf},p_{yf}) \leq (5,5)\) . For the state constraint satisfaction in Lemma 7, the following error tube (204):

\[ \begin{align}r_{\ell}(t) = \frac{\bar{d}_{\epsilon_{\ell}}}{{\alpha}}\sqrt{\chi}(1-e^{-\alpha t}) = 0.125(1-e^{-0.30 t}) \end{align} \]

(211)

is used with an input constraint \( u_i\geq 0, \forall i\) , to avoid collisions with a random number of multiple circular obstacles and of other agents, even under the learning error and disturbances. See [51] for the LAG-ROS training details.

Figure 13 shows one example of the trajectories of the motion planners (I)–(IV) under deterministic external disturbances. It implies the following:

  • For (I), the tracking error accumulates exponentially with time due to the lack of robustness as proven in Theorem 19.
  • (II) only yields locally optimal \( (x_d,u_d)\) of (159), as its time horizon has to be small enough to make the problem solvable online with limited computational capacity, only with local environment information. This renders some agents stuck in local minima as depicted in Fig. 13.
  • LAG-ROS (III) tackles these two problems by providing the formal robustness and stability guarantees of Theorem 27, whilst implicitly knowing the global solution (only from the local information \( o_{\ell}\) as in [63]) without computing it online. It indeed satisfies the state constraints due to Lemma 7, as can be seen from Fig. 13.

See [51] for the in-depth discussion on this simulation result.

Figure 13. Trajectories for the learning-based planner (I), robust tube-based planner (II), LAG-ROS (III), and offline centralized solution (IV) (\( \circ\) : start, \( \bullet\) : goal).

Assumption 1

The matrix \( B\) in (212) does not depend on \( \theta\) , and \( \exists \Pi(x) \in \mathbb{R}^{{c}\times n}\) s.t. \( \Pi(x)^{\top}\vartheta = f(x,\theta_n)-f(x,\theta)\) , where \( \vartheta=\theta_n-\theta\) and \( \theta_n\) is a nominal guess of the uncertain parameter \( \theta\) .

Theorem 28

Suppose that Assumption 1 holds and let \( \mathcal{M}\) define the NCM of Definition 5, which models \( M\) of the CV-STEM contraction metric in Theorem 14 for the nominal system (212) with \( \theta = \theta_n\) , constructed with an additional convex constraint given as \( \partial_{b_i(x)}\bar{W}+\partial_{b_i(x_d)}\bar{W}=0\) , where \( \partial_{b_i(q)}\bar{W}=\sum_{i}(\partial \bar{W}/\partial q_i)b_i(q)\) for \( B=[b_1,…,b_m]\) (see [11, 64]). Suppose also that the matched uncertainty condition [64] holds, i.e., \( (\Pi(x)-\Pi(x_d))^{\top}\vartheta \in \textrm{span}(B(x))\) for \( \Pi(x)\) , and that (212) is controlled by the following adaptive control law:

\[ \begin{align} u &= u_L+\varphi(x,x_d)^{\top} \hat{\vartheta} \end{align} \]

(214)

\[ \begin{align} \dot{\hat{\vartheta}} &= -\Gamma (\varphi(x,x_d)B(x)^{\top}\mathcal{M}(x,x_d,u_d)(x-x_d)+\sigma \hat{\vartheta}) \end{align} \]

(215)

where \( u_L\) is given by (163) of Theorem 24, i.e., \( u_L=u_d-R^{-1}B^{\top}\mathcal{M}(x-x_d)\) , \( \Gamma \in \mathbb{R}^{{c}\times {c}}\) is a diagonal matrix with positive elements that governs the rate of adaptation, \( \sigma \in \mathbb{R}_{\geq 0}\) , and \( (\Pi(x)-\Pi(x_d))^{\top}\vartheta=B(x)\varphi(x,x_d)^{\top}\vartheta\) .

If \( \exists \underline{\gamma},\overline{\gamma},\bar{b},\bar{\rho},\bar{\phi},\bar{\vartheta} \in \mathbb{R}_{>0}\) s.t. \( \underline{\gamma} \mathrm{I} \preceq \Gamma \preceq \overline{\gamma}\mathrm{I}\) , \( \|B(x)\| \leq \bar{b}\) , \( \|R^{-1}(x,x_d,u_d)\| \leq \bar{\rho}\) , \( \|\varphi(x,x_d)\| \leq \bar{\phi}\) , and \( \|\vartheta\| \leq \bar{\vartheta}\) , and if \( \Gamma\) and \( \sigma\) of (215) are selected to satisfy the following relation for the learning error \( \|\mathcal{M}-M\| \leq \epsilon_{\ell}\) in some compact set \( \mathcal{S}\) as in (177) of Theorem 24:

\[ \begin{align} \begin{bmatrix} -2\alpha_{\ell}\underline{m} & \bar{\phi}\bar{b}\epsilon_{\ell} \\ \bar{\phi}\bar{b}\epsilon_{\ell} & -2\sigma \end{bmatrix} \preceq -2 \alpha_{a}\begin{bmatrix} \overline{m} & 0 \\ 0 & 1/\underline{\gamma} \end{bmatrix} \end{align} \]

(216)

for \( \exists \alpha_{a} \in \mathbb{R}_{>0}\) , \( \alpha_{\ell}\) given in Theorem 24, and \( \underline{m}\) and \( \overline{m}\) given in \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m}\mathrm{I}\) of (31), then the system (212) is robust against bounded deterministic and stochastic disturbances with \( \sigma\neq0\) , and we have the following bound in the compact set \( \mathcal{S}\) :

\[ \begin{align} \|\mathtt{e}(t)\| \leq \frac{V_{\ell}(0)e^{-\alpha_a t}+\alpha_a^{-1}\sigma\sqrt{\overline{\gamma}}\bar{\vartheta}(1-e^{-\alpha_a t})}{\sqrt{\underline{m}}} \end{align} \]

(217)

where \( \mathtt{e}=x-x_d\) , and \( V_{\ell}=\int^{\xi_1}_{\xi_0}\|\Theta\delta q\|\) is defined in Theorem 3 with \( M=\Theta^{\top}\Theta\) replaced by \( \mathrm{diag}(M,\Gamma^{-1})\) for \( \xi_0=[x_d^{\top},\vartheta^{\top}]^{\top}\) and \( \xi_1=[x^{\top},\hat{\vartheta}^{\top}]^{\top}\) . Furthermore, if the learning error \( \epsilon_{\ell}=0\) (CV-STEM control), (215) with \( \sigma = 0\) guarantees asymptotic stability of \( x\) to \( x_d\) in (212).
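To make the bound (217) concrete, the following sketch evaluates its right-hand side for illustrative constants (all numbers below are assumptions, not values from the tutorial) and confirms that it decays monotonically to the steady-state offset \( \alpha_a^{-1}\sigma\sqrt{\overline{\gamma}}\bar{\vartheta}/\sqrt{\underline{m}}\) :

```python
import numpy as np

# Illustrative constants (assumed, not from the tutorial).
alpha_a = 2.0        # rate from condition (216)
sigma = 0.1          # damping gain in the adaptation law (215)
gamma_bar = 5.0      # upper bound on Gamma
vartheta_bar = 1.0   # bound on the parameter estimation error
m_lower = 0.5        # lower bound of the metric: m I <= M
V0 = 3.0             # initial value of the functional V_ell

def tracking_bound(t):
    """Right-hand side of (217): exponential decay toward a steady-state offset."""
    decay = V0 * np.exp(-alpha_a * t)
    offset = (sigma * np.sqrt(gamma_bar) * vartheta_bar / alpha_a) * (1.0 - np.exp(-alpha_a * t))
    return (decay + offset) / np.sqrt(m_lower)

t = np.linspace(0.0, 10.0, 200)
bound = tracking_bound(t)
steady_state = sigma * np.sqrt(gamma_bar) * vartheta_bar / (alpha_a * np.sqrt(m_lower))
```

Since \( V_{\ell}(0)/\sqrt{\underline{m}}\) exceeds the steady-state offset here, the bound decreases monotonically from its initial value to the offset, mirroring the exponential convergence to a bounded error ball.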

Remark 19

Although Theorem 28 is for the case where \( f(x,\theta)\) is affine in its parameter, it is also useful for the following types of systems with an uncertain parameter \( \theta\in\mathbb{R}^{c}\) and a control input \( u\) (see [52]):

\[ \begin{align} H(x){p}^{(n)}+h(x)+\Pi (x)\theta=u \end{align} \]

(222)

where \( {p}\in \mathbb{R}^n\) , \( u\in \mathbb{R}^n\) , \( h:\mathbb{R}^{n}\mapsto \mathbb{R}^{n}\) , \( H:\mathbb{R}^{n}\mapsto \mathbb{R}^{n\times n}\) , \( \Pi:\mathbb{R}^{n}\mapsto \mathbb{R}^{n\times {c}}\) , \( x=[({p}^{(n-2)})^{\top},…,({p})^{\top}]^{\top}\) , and \( {p}^{(k)}\) denotes the \( k\) th time derivative of \( {p}\) . In particular, adaptive sliding control [65] designs \( u\) to render the system of the composite variable \( s\) given as \( s = {p}^{(n-1)}-{p}_r^{(n-1)}\) to be contracting, where \( {p}_r^{(n-1)} = p_d^{(n-1)}-\sum_{i=0}^{n-2}\lambda_{i}\mathtt{e}^{(i)}\) , \( \mathtt{e}=p-p_d\) , \( p_d\) is a target trajectory, and \( \kappa^{n-1}+\lambda_{n-2}\kappa^{n-2}+…+\lambda_0\) is a stable (Hurwitz) polynomial in the Laplace variable \( \kappa\) (see Example 6). Since we have \( \mathtt{e}^{(n-1)}=s-\sum_{i=0}^{n-2}\lambda_{i}\mathtt{e}^{(i)}\) and the system for \( [\mathtt{e}^{(0)},…,\mathtt{e}^{(n-2)}]\) is also contracting if \( s=0\) due to the Hurwitz property, the hierarchical combination property [66, 67] of contraction in Theorem 7 guarantees \( \lim_{t\to\infty}\|p-p_d\| = 0\)  [6, p. 352] (see Example 7).
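The Hurwitz requirement on the \( \lambda_i\) can be checked numerically. The sketch below uses illustrative (assumed) gains for \( n=4\) : the polynomial \( \kappa^3+\lambda_2\kappa^2+\lambda_1\kappa+\lambda_0\) must have all roots in the open left half-plane so that \( \mathtt{e}\to0\) once \( s=0\) .

```python
import numpy as np

# Illustrative gains: the polynomial kappa^3 + 6 kappa^2 + 11 kappa + 6
# factors as (kappa+1)(kappa+2)(kappa+3), hence is Hurwitz.
lambda0, lambda1, lambda2 = 6.0, 11.0, 6.0
coeffs = [1.0, lambda2, lambda1, lambda0]
roots = np.roots(coeffs)
is_hurwitz = bool(np.all(roots.real < 0.0))

# Equivalent state-space view: with s = 0, the error system
# e^{(n-1)} = -sum_i lambda_i e^{(i)} is the companion matrix below,
# which must be Hurwitz as well.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-lambda0, -lambda1, -lambda2]])
eigs = np.linalg.eigvals(A)
```

The companion-matrix eigenvalues coincide with the polynomial roots, which is exactly why the hierarchical combination argument of Theorem 7 applies once \( s\) is contracting.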

Example 27

Using the technique of Remark 19, we can construct adaptive control for the Lagrangian system in Example 6 as follows:

\[ \begin{align} &\mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}+\mathcal{G}(\mathtt{q})=u(\mathtt{q},\dot{\mathtt{q}},t) \end{align} \]

(223)

\[ \begin{align} &u(\mathtt{q},\dot{\mathtt{q}},t) = -\mathcal{K}(t)(\dot{\mathtt{q}}-\dot{\mathtt{q}}_r)+\hat{\mathcal{H}}(\mathtt{q})\ddot{\mathtt{q}}_r+\hat{\mathcal{C}}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}_r+\hat{\mathcal{G}}(\mathtt{q})~~~~~ \end{align} \]

(224)

where \( \hat{\mathcal{H}}\) , \( \hat{\mathcal{C}}\) , and \( \hat{\mathcal{G}}\) are the estimates of \( \mathcal{H}\) , \( \mathcal{C}\) , and \( \mathcal{G}\) , respectively, and the other variables are as defined in Example 6. Suppose that the terms \( \mathcal{H}\) , \( \mathcal{C}\) , and \( \mathcal{G}\) depend linearly on the unknown parameter vector \( \theta\) as follows [6, p. 405]:

\[ \begin{align} \mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}_r+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}_r+\mathcal{G}(\mathtt{q})=Y(\mathtt{q},\dot{\mathtt{q}},\dot{\mathtt{q}}_r,\ddot{\mathtt{q}}_r)\theta. \end{align} \]

(225)

Updating the parameter estimate \( \hat{\theta}\) using the adaptation law, \( \dot{\hat{\theta}} = -\Gamma Y(\mathtt{q},\dot{\mathtt{q}},\dot{\mathtt{q}}_r,\ddot{\mathtt{q}}_r)^{\top}(\dot{\mathtt{q}}-\dot{\mathtt{q}}_r)\) , as in (215) where \( \Gamma \succ 0\) , we can define the following virtual system which has \( q = \xi_0 = [\dot{\mathtt{q}}_r^{\top},{\theta}^{\top}]^{\top}\) and \( q = \xi_1 = [\dot{\mathtt{q}}^{\top},\hat{\theta}^{\top}]^{\top}\) as its particular solutions:

\[ \begin{align} \begin{bmatrix}\mathcal{H} & 0 \\ 0 & \Gamma^{-1}\end{bmatrix}(\dot{q}-\dot{\xi}_0)+\begin{bmatrix}\mathcal{C}+\mathcal{K} & -Y \\ Y^{\top} & 0\end{bmatrix}(q-\xi_0) =0 \end{align} \]

(226)

where the relation \( u=-\mathcal{K}(t)(\dot{\mathtt{q}}-\dot{\mathtt{q}}_r)+Y(\mathtt{q},\dot{\mathtt{q}},\dot{\mathtt{q}}_r,\ddot{\mathtt{q}}_r)\hat{\theta}\) is used. Thus, for a Lyapunov function \( V=\delta q^{\top}\left[\begin{smallmatrix}\mathcal{H} & 0\\ 0 & \Gamma^{-1}\end{smallmatrix}\right]\delta q\) , we have that

\[ \begin{align} \dot{V} &= -2\delta q^{\top}\begin{bmatrix}\mathcal{K} & -Y \\ Y^{\top} & 0\end{bmatrix}\delta q= -2\delta q^{\top}\begin{bmatrix}\mathcal{K} & 0 \\ 0 & 0\end{bmatrix}\delta q \\\end{align} \]

(227)

which results in asymptotic stability of the differential state \( \delta q\) (i.e., semi-contraction [68], see also Barbalat’s lemma [6, p. 405-406]).
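A minimal numerical sketch of this scheme for a hypothetical one-degree-of-freedom system \( m\ddot{\mathtt{q}}+c\dot{\mathtt{q}}+g=u\) with unknown constant \( \theta=(m,c,g)\) follows; the gains, true parameters, and target trajectory below are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

# One-DOF instance of (223)-(225): m*qdd + c*qd + g = u, with regressor
# Y = [qdd_r, qd_r, 1] so that Y @ theta = m*qdd_r + c*qd_r + g.
m_true, c_true, g_true = 2.0, 0.5, 1.0   # unknown to the controller
lam, k = 2.0, 5.0                        # Lambda and K gains (assumed)
Gamma = np.diag([2.0, 2.0, 2.0])         # adaptation gain

dt, T = 1e-3, 40.0
q, qd = 1.0, 0.0                         # initial state (off the target)
theta_hat = np.zeros(3)                  # initial parameter estimate

for i in range(int(T / dt)):
    t = i * dt
    q_des, qd_des, qdd_des = np.sin(t), np.cos(t), -np.sin(t)
    qd_r = qd_des - lam * (q - q_des)    # reference velocity
    qdd_r = qdd_des - lam * (qd - qd_des)
    s = qd - qd_r                        # composite variable
    Y = np.array([qdd_r, qd_r, 1.0])
    u = -k * s + Y @ theta_hat           # control law as in (224)
    theta_hat = theta_hat - dt * (Gamma @ Y) * s   # adaptation law as in (225)
    qdd = (u - c_true * qd - g_true) / m_true      # true plant
    q, qd = q + dt * qd, qd + dt * qdd

final_error = abs(q - np.sin(T))
```

Despite never knowing \( (m,c,g)\) , the tracking error decays toward zero, consistent with the semi-contraction argument above.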

Assumption 2

The dynamical system (212) is multiplicatively separable in terms of \( x\) and \( \theta\) , i.e., \( \exists\) \( Y_f:\mathbb{R}^n\mapsto\mathbb{R}^{n\times {c}_z}\) , \( Y_{b_i}:\mathbb{R}^n\mapsto\mathbb{R}^{n\times {c}_z}\) , and \( Z:\mathbb{R}^{c}\mapsto\mathbb{R}^{{c}_z}\) s.t.

\[ \begin{align} \begin{aligned} Y_f(x)Z(\theta) &= f(x,\theta) \\ Y_{b_i}(x)Z(\theta) &= b_i(x,\theta) \end{aligned} \end{align} \]

(228)

where \( B(x,\theta)=[b_1(x,\theta),…,b_m(x,\theta)]\) . Note that we could define \( \theta\) as \( [\theta^{\top},Z(\theta)^{\top}]^{\top}\) so that \( Y_f(q)\theta = f(q,\theta)\) and \( Y_{b_i}(q)\theta = b_i(q,\theta)\) . Such an over-parameterized system could be regularized using the Bregman divergence as in [69] (see Example 29), and thus we denote \( [\theta^{\top},Z(\theta)^{\top}]^{\top}\) as \( \theta\) in the subsequent discussion.
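Multiplicative separability is easy to verify for concrete dynamics. The following toy check uses hypothetical scalar dynamics \( f(x,\theta)=\theta_1 x+\theta_2\sin x\) (an assumption for illustration), which factors as \( Y_f(x)Z(\theta)\) with \( Y_f(x)=[x,\sin x]\) and \( Z(\theta)=\theta\) :

```python
import numpy as np

# Toy check of Assumption 2 for the hypothetical scalar dynamics
# f(x, theta) = theta_1 * x + theta_2 * sin(x).
def f(x, theta):
    return theta[0] * x + theta[1] * np.sin(x)

def Y_f(x):
    # Regressor: depends on the state only.
    return np.array([x, np.sin(x)])

def Z(theta):
    # Parameter part: depends on theta only (here, the identity).
    return np.asarray(theta, dtype=float)

theta = np.array([0.7, -1.3])
xs = np.linspace(-2.0, 2.0, 9)
max_gap = max(abs(Y_f(x) @ Z(theta) - f(x, theta)) for x in xs)
```

For dynamics that are polynomial in \( \theta\) , the same factorization works with \( Z(\theta)\) collecting the monomials, which is exactly the over-parameterization discussed above.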

Definition 8

The adaptive Neural Contraction Metric (aNCM) in Fig. 15 is a DNN model for the optimal parameter-dependent contraction metric, given by solving the adaptive CV-STEM, i.e., (106) of Theorem 14 (or Theorem 18 for differential feedback) with its contraction constraint replaced by the following convex constraint:

\[ \begin{align} &-\Xi+2\rm{sym}{\left(A\bar{W}\right)}-2\nu BR^{-1}B^{\top} \preceq -2\alpha \bar{W} \end{align} \]

(231)

for deterministic systems, and

\[ \begin{equation} \begin{bmatrix} -\Xi+2\rm{sym}{}(A\bar{W})-2\nu BR^{-1}B^{\top}+2\alpha \bar{W}& \bar{W} \\ \bar{W} & -\frac{\nu}{\alpha_s}\mathrm{I} \end{bmatrix} \preceq 0 \end{equation} \]

(232)

for stochastic systems, where \( W=M(x,x_d,u_d,\hat{\theta})^{-1}\succ0\) (or \( W=M(x,\hat{\theta})^{-1}\succ0\) , see Theorem 10), \( \bar{W} = \nu W\) , \( \nu = \overline{m}\) , \( R=R(x,x_d,u_d)\succ0\) is a weight matrix on \( u\) , \( \Xi=({d}/{dt})|_{\hat{\theta}}{\bar{W}}\) is the time derivative of \( \bar{W}\) computed along (212) with \( \theta = \hat{\theta}\) , \( A\) and \( B\) are given in (229), \( \alpha\) , \( \underline{m}\) , \( \overline{m}\) , and \( \alpha_s\) are as given in Theorem 14, and the arguments \( (x,x_d,u_d,\hat{\theta})\) are omitted for notational simplicity.

Theorem 29

Suppose that Assumption 2 holds and let \( \mathcal{M}\) define the aNCM of Definition 8. Suppose also that the true dynamics (212) is controlled by the following adaptive control law:

\[ \begin{align} u &= u_d-R(x,x_d,u_d)^{-1}B(x,\hat{\theta})^{\top}\mathcal{M}(x,x_d,u_d,\hat{\theta})\mathtt{e} \end{align} \]

(233)

\[ \begin{align} \dot{\hat{\theta}} &= \Gamma((Y^{\top}d\mathcal{M}_x+Y_d^{\top}d\mathcal{M}_{x_d}+\tilde{Y}^{\top}\mathcal{M}) \mathtt{e}-\sigma\hat{\theta}) \end{align} \]

(234)

where \( d\mathcal{M}_q = [({\partial \mathcal{M}}/{\partial q_1})\mathtt{e},…,({\partial \mathcal{M}}/{\partial q_n})\mathtt{e}]^{\top}/2\) , \( \mathtt{e}=x-x_d\) , \( \Gamma \succ 0\) , \( \sigma \in \mathbb{R}_{\geq 0}\) , and \( Y\) , \( Y_d\) , \( \tilde{Y}\) are given in (230). Suppose further that the learning error in \( \|\mathcal{M}-M\|\leq\epsilon_{\ell}\) of Theorem 24 additionally satisfies \( \|d\mathcal{M}_{x_d} -dM_{x_d} \| \leq \epsilon_{\ell}\) and \( \|d\mathcal{M}_x -dM_x \| \leq \epsilon_{\ell}\) in some compact set \( \mathcal{S}\) .

If \( \exists \bar{b},\bar{\rho},\bar{y} \in \mathbb{R}_{>0}\) s.t. \( \|B(x,\hat{\theta})\| \leq \bar{b}\) , \( \|R^{-1}(x,x_d,u_d)\| \leq \bar{\rho}\) , \( \|Y\|\leq \bar{y}\) , \( \|Y_d\|\leq \bar{y}\) , and \( \|\tilde{Y}\|\leq \bar{y}\) in (233) and (234), and if \( \Gamma\) and \( \sigma\) of (234) are selected to satisfy the following as in Theorem 28:

\[ \begin{align} \begin{bmatrix} -2\alpha_{\ell}\underline{m} & \bar{y}\epsilon_{\ell} \\ \bar{y}\epsilon_{\ell} & -2\sigma \end{bmatrix} \preceq -2 \alpha_{a}\begin{bmatrix} \overline{m} & 0 \\ 0 & 1/\underline{\gamma} \end{bmatrix} \end{align} \]

(235)

for \( \exists \alpha_a\in\mathbb{R}_{>0}\) , \( \alpha_{\ell}\) given in Theorem 24, and \( \underline{m}\) and \( \overline{m}\) given in \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m}\mathrm{I}\) of (31), then the system (212) is robust against bounded deterministic and stochastic disturbances, and we have the exponential bound (217) in the compact set \( \mathcal{S}\) . Furthermore, if \( \epsilon_{\ell}=0\) (adaptive CV-STEM control), (234) with \( \sigma = 0\) guarantees asymptotic stability of \( x\) to \( x_d\) in (212).

Lemma 8

If \( \hat{\theta}\) is given by (236) with \( \hat{\theta}(0) \in \Omega_{\theta} = \{\theta\in\mathbb{R}^c|p(\theta) \leq 1\}\) for a convex function \( p(\theta)\) , then \( \hat{\theta}(t) \in \Omega_{\theta}, \forall t\geq 0\) (e.g., \( \|\hat{\theta}\| \leq \theta_{\max}\sqrt{1+\epsilon_{\theta}}\) if \( p(\hat{\theta})=(\hat{\theta}^{\top}\hat{\theta}-\theta_{\max}^2)/(\epsilon_{\theta}\theta_{\max}^2)\) ).
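The quadratic choice of \( p\) in Lemma 8 describes a ball of radius \( \theta_{\max}\sqrt{1+\epsilon_{\theta}}\) . The sketch below (a simple radial projection, used here as an illustrative stand-in for the projection operator in (236)) shows how keeping \( p(\hat{\theta})\leq1\) bounds the estimate:

```python
import numpy as np

# p(th) = (th.th - theta_max^2) / (eps * theta_max^2) <= 1
# is equivalent to ||th|| <= theta_max * sqrt(1 + eps).
theta_max, eps = 2.0, 0.25
radius = theta_max * np.sqrt(1.0 + eps)

def p(th):
    return (th @ th - theta_max**2) / (eps * theta_max**2)

def project(th):
    """Radial projection onto Omega_theta (illustrative stand-in for (236))."""
    nrm = np.linalg.norm(th)
    return th if nrm <= radius else th * (radius / nrm)

th = project(np.array([3.0, 4.0]))   # ||[3, 4]|| = 5 > radius, so it is projected
```

Any unconstrained adaptation step can be post-composed with such a projection, so \( \hat{\theta}(t)\) never leaves \( \Omega_{\theta}\) once it starts there.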

Example 28

The dependence on \( u\) and \( \dot{\hat{\theta}}\) in \( (d/dt)|_{\hat{\theta}}M\) can be removed by using \( \partial_{b_i(x)}M+\partial_{b_i(x_d)}M=0\) as in Theorem 28, and by using adaptation rate scaling introduced in [70]. In essence, the latter multiplies the adaptation law (234) by any strictly-increasing and strictly-positive scalar function \( v(2\rho)\) , and updates \( \rho\) as

\[ \begin{align} \dot{\rho} = \frac{1}{2}\frac{v(2\rho)}{v_{\rho}(2\rho)}\sum_{i=1}^{c}\frac{1}{V_{\mathtt{e}}+\eta}\frac{\partial V_{\mathtt{e}}}{\partial \hat{\theta}_i}\dot{\hat{\theta}}_i \end{align} \]

(239)

where \( v_{\rho}=\partial v/\partial \rho\) , \( \eta \in\mathbb{R}_{>0}\) , and \( V_{\mathtt{e}} = \mathtt{e}^{\top}M(x,x_d,u_d,\hat{\theta})\mathtt{e}\) for \( \mathtt{e}=x-x_d\) and \( M\) given in Definition 8, so the additional term due to (239) cancels out the term involving \( \dot{\hat{\theta}}\) in \( (d/dt)|_{\hat{\theta}}\bar{W}\) of (231) (see [70] for details). Its robustness property follows from Theorem 29 also in this case.

Example 29

Using the Bregman divergence-based adaptation in [69], we could implicitly regularize the parameter estimate \( \hat{\theta}\) as follows:

\[ \begin{align} \lim_{t\to\infty}\hat{\theta}=\mathrm{arg}\min_{\vartheta\in \mathcal{A}}d_{\psi}(\vartheta\|\theta^*)=\mathrm{arg}\min_{\vartheta\in \mathcal{A}}d_{\psi}\left(\vartheta\left\|\hat{\theta}(0)\right.\right) \end{align} \]

(240)

where \( d_{\psi}\) is the Bregman divergence defined as \( d_{\psi}(x\|y)=\psi(x)-\psi(y)-(x-y)^{\top}\nabla\psi(y)\) for a convex function \( \psi\) , and \( \mathcal{A}\) is a set containing only parameters that interpolate the dynamics along the entire trajectory. If \( \hat{\theta}(0)=\mathrm{arg}\min_{b\in \mathbb{R}^p}\psi(b)\) , we have \( \lim_{t\to\infty}\hat{\theta}=\mathrm{arg}\min_{\vartheta\in \mathcal{A}}\psi(\vartheta)\) , which regularizes \( \hat{\theta}\) to converge to a parameter that minimizes \( \psi\) . For example, using the 1-norm for \( \psi\) would impose sparsity on the steady-state parameter estimate \( \hat{\theta}\)  [69].
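The Bregman divergence itself is straightforward to compute for any differentiable convex \( \psi\) . The sketch below (note the 1-norm is not differentiable, so here we use the smooth choice \( \psi(x)=\tfrac{1}{2}\|x\|^2\) , for which \( d_{\psi}\) reduces to half the squared Euclidean distance) illustrates the definition:

```python
import numpy as np

# Bregman divergence d_psi(x||y) = psi(x) - psi(y) - (x - y)^T grad psi(y),
# for a user-supplied convex psi and its gradient.
def bregman(psi, grad_psi, x, y):
    return psi(x) - psi(y) - (x - y) @ grad_psi(y)

# Smooth choice psi(x) = 0.5 * ||x||^2: the divergence equals 0.5 * ||x - y||^2.
psi = lambda x: 0.5 * (x @ x)
grad = lambda x: x

x, y = np.array([1.0, 2.0]), np.array([0.0, -1.0])
d = bregman(psi, grad, x, y)   # 0.5 * (1^2 + 3^2) = 5.0
```

Swapping in a sparsity-promoting (smoothed) \( \psi\) changes which interpolating parameter the estimate converges to, which is exactly the implicit regularization effect described above.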

Remark 20

Note that the results presented earlier in this section do not necessarily mean parameter convergence, \( \lim_{t\to\infty}\|\tilde{\theta}\|=0\) , as the adaptation objective is to drive the tracking error \( \|x-x_d\|\) to zero [6, p. 331], not to identify the true parameter \( \theta\) among the many parameter values that achieve perfect tracking.

Asymptotic parameter convergence, \( \lim_{t\to\infty}\|\tilde{\theta}\|=0\) for \( \theta\) of (212) under Assumption 2, could be guaranteed if there is no disturbance or learning error, \( \sigma=0\) , and \( \exists T,\alpha_{PE}\in\mathbb{R}_{>0}\text{ s.t. }\int_{t}^{t+T}\tilde{Y}^{\top}\tilde{Y}d\tau \succeq \alpha_{PE}\mathrm{I}, \forall{t}\) for \( \tilde{Y}\) given in (230) (the persistent excitation condition [6, p. 366]). We could also utilize the Bregman divergence-based adaptation to regularize the behavior of \( \lim_{t\to\infty}\hat{\theta}\) as in Example 29.

Theorem 30

Let \( q(0,t)=\xi_0(t)=x(t)\) , \( q(1,t)=\xi_1(t)=x^*(t)\) , and \( g =f_L\) in Theorems 20 and 21 as in Example 19, and define \( \Delta_L\) as

\[ \begin{align} \Delta_L = f_\mathrm{true}(x^*,u(x^*,t),t)-f_L(x^*,u(x^*,t),t) \end{align} \]

(243)

for the learning error condition \( \|\Delta_L\|\leq\epsilon_{\ell0}+\epsilon_{\ell1}\|\xi_1-\xi_0\|\) in (136). If the function \( f_L\) is learned to satisfy (136) with \( \epsilon_{\ell1}=0\) , i.e., \( \|\Delta_L\| \leq \epsilon_{\ell0}\) in some compact set \( \mathcal{S}\) , and if there exists a contraction metric defined by \( M\) bounded as \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m}\mathrm{I}\) as in (31), which renders (241) contracting as in (6) of Theorem 1 for deterministic systems, i.e.,

\[ \begin{align} \dot{M}+M\frac{\partial f_L}{\partial x}+\frac{\partial f_L}{\partial x}^{\top}M\preceq-2\alpha M, \end{align} \]

(244)

and (45) of Theorem 5 for stochastic systems, i.e.,

\[ \begin{align} \dot{M}+M\frac{\partial f_L}{\partial x}+\frac{\partial f_L}{\partial x}^{\top}M\preceq-2\alpha M-\alpha_s\mathrm{I}, \end{align} \]

(245)

then we have the following in the compact set \( \mathcal{S}\) :

\[ \begin{align} \|x(t)-x^*(t)\| \leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha t}+\frac{\epsilon_{\ell0}}{\alpha}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha t}) \end{align} \]

(246)

for \( x\) and \( x^*\) in (241) and (242), where \( V_{\ell}=\int^{x}_{x^*}\|\Theta\delta x\|\) as in Theorem 3 with \( M=\Theta^{\top}\Theta\) . Furthermore, the systems (241) and (242) are robust against bounded deterministic and stochastic disturbances.
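The contraction condition (244) can be verified numerically for a given metric. The sketch below assumes, purely for illustration, a linear learned model \( f_L(x)=Ax\) and a constant metric \( M\) (so \( \dot{M}=0\) ), in which case (244) reduces to the linear matrix inequality \( MA+A^{\top}M+2\alpha M \preceq 0\) :

```python
import numpy as np

# Hypothetical linear learned dynamics f_L(x) = A x and a constant metric M,
# assumed for illustration; then (244) reads M A + A^T M + 2 alpha M <= 0.
A = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
M = np.eye(2)     # identity metric (assumed)
alpha = 0.5       # candidate contraction rate

lmi = M @ A + A.T @ M + 2.0 * alpha * M
eigs = np.linalg.eigvalsh(lmi)       # lmi is symmetric
contracting = bool(np.all(eigs <= 1e-12))
```

If the check fails for a candidate \( (M,\alpha)\) , one either lowers \( \alpha\) or searches for a better metric, e.g., via the convex CV-STEM formulation discussed earlier.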

Lemma 9

Define a mapping \( \mathcal{F}\) as \( \mathcal{F}(u) = u^*(x,t)-r_L(x,u,t)\) , where \( u^*\) and \( r_L\) are given in (248). If \( r_L\) is Lipschitz in \( u\) with a 2-norm Lipschitz constant \( L_u < 1\) , i.e., \( \|r_L(x,u,t)-r_L(x,u',t)\| \leq L_u\|u-u'\|, \forall u,u'\) , then \( \mathcal{F}\) is a contraction mapping for fixed \( x,t\) . Therefore, if \( x,t\) are fixed, discrete-time nonlinear control \( u_k\) defined as

\[ \begin{align} u_k = \mathcal{F}(u_{k-1}) = u^*(x,t)-r_L(x,u_{k-1},t) \end{align} \]

(249)

converges to a unique solution \( u\) given by \( u=\mathcal{F}(u)\) .
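The iteration (249) is a direct instance of the Banach fixed-point theorem. The sketch below uses a hypothetical scalar learned residual \( r_L(u)=L_u\tanh(u)\) (an assumption for illustration, with Lipschitz constant \( L_u=0.5<1\) ) and shows the iteration converging to the unique fixed point from different initial guesses:

```python
import numpy as np

# Hypothetical learned residual, Lipschitz in u with constant L_u = 0.5 < 1,
# so F(u) = u_star - r_L(u) is a contraction mapping for fixed x, t.
u_star = 1.0
L_u = 0.5
r_L = lambda u: L_u * np.tanh(u)   # |d tanh/du| <= 1, hence Lipschitz with L_u

def F(u):
    return u_star - r_L(u)

u = 0.0
for _ in range(50):                # iteration (249)
    u = F(u)
```

Each iteration shrinks the distance to the fixed point by at least a factor \( L_u\) , so the geometric convergence is insensitive to the initial guess.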

Theorem 31

Let \( x\) be the trajectory of the following ideal system without the unknown part \( r\) of the dynamics (247):

\[ \begin{align} \dot{x} = f(x,t)+B(x,t)u^*(x,t) \end{align} \]

(251)

where \( u^*\) is a stabilizing controller that renders (251) contracting as in Theorem 30 for \( M\) which satisfies \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m}\mathrm{I}\) of (31). Note that such \( u^*\) can be designed by using, e.g., Theorems 14 and 18. Suppose that the true dynamics (247) is controlled by (249) and

\[ \begin{align}\exists \rho \in \mathbb{R}_{\geq 0}\text{ s.t. } \|u_k-u_{k-1}\| \leq \rho \|x-x^*\|\end{align} \]

(252)

for \( x^*\) in (247) [43]. If \( \exists\bar{b}\in\mathbb{R}_{\geq 0}\) s.t. \( \|B(x,t)\|\leq\bar{b}\) , and if \( r_L\) is modeled by a spectrally-normalized DNN of Definition 6 to have

\[ \begin{align} \|r_L(x,u,t)-r(x,u,t)\| \leq \epsilon_{\ell} \end{align} \]

(253)

for all \( x\in\mathcal{S}_s\) , \( u\in\mathcal{S}_u\) , and \( t\in\mathcal{S}_t\) , where \( \mathcal{S}_s\subseteq\mathbb{R}^n\) , \( \mathcal{S}_u\subseteq\mathbb{R}^m\) , and \( \mathcal{S}_t\subseteq\mathbb{R}_{\geq 0}\) are some compact sets, then \( r_L\) is Lipschitz continuous, and the controller (249) applied to (247) gives the following bound in the compact set:

\[ \begin{align} \|x(t)-x^*(t)\| \leq \frac{V_{\ell}(0)}{\sqrt{\underline{m}}}e^{-\alpha_{\ell}t}+\frac{\bar{b}\epsilon_{\ell}}{\alpha_{\ell}}\sqrt{\frac{\overline{m}}{\underline{m}}}(1-e^{-\alpha_{\ell}t}) \end{align} \]

(254)

as long as the Lipschitz constant of \( r_L\) is selected to have

\[ \begin{align} \exists \alpha_{\ell} \in\mathbb{R}_{>0} \text{ s.t. } \alpha_{\ell} = \alpha-\bar{b}L_u\rho\sqrt{\frac{ \overline{m} }{ \underline{m} }} > 0\end{align} \]

(255)

where \( V_{\ell}=\int^{x}_{x^*}\|\Theta\delta x\|\) as in Theorem 3 with \( M=\Theta^{\top}\Theta\) . Furthermore, the system (247) with the controller (249) is robust against deterministic and stochastic disturbances.

Example 30

Let us consider the following Lagrangian system of Example 6, perturbed externally by an unknown control non-affine residual disturbance:

\[ \begin{align} &\mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}+\mathcal{G}(\mathtt{q})=u+r(x,u) \end{align} \]

(258)

where \( x=[\mathtt{q}^{\top},\dot{\mathtt{q}}^{\top}]^{\top}\) and the other variables are as given in Example 6. Using the result of Theorem 31, we can design a discrete-time nonlinear controller by augmenting the exponentially stabilizing controller of Example 6 with a learned residual part \( r_L(x,u)\) as follows:

\[ \begin{align} u_k &= -\mathcal{K}(t)(\dot{\mathtt{q}}-\dot{\mathtt{q}}_r)+\mathcal{H}(\mathtt{q})\ddot{\mathtt{q}}_r+\mathcal{C}(\mathtt{q},\dot{\mathtt{q}})\dot{\mathtt{q}}_r+\mathcal{G}(\mathtt{q})-r_L(x,u_{k-1}) \end{align} \]

(259)

where \( \dot{\mathtt{q}}_r=\dot{\mathtt{q}}_d(t)-\Lambda(t)(\mathtt{q}-\mathtt{q}_d(t))\) , \( (\mathtt{q}_d,\dot{\mathtt{q}}_d)\) is the target trajectory of the state \( (\mathtt{q},\dot{\mathtt{q}})\) , and \( \mathcal{K}: \mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{n\times n}\) and \( \Lambda: \mathbb{R}_{\geq 0}\mapsto\mathbb{R}^{n\times n}\) with \( \mathcal{K},\Lambda \succ 0\) are control gain matrices (design parameters). Again, note that \( \dot{\mathcal{H}}-2\mathcal{C}\) is skew-symmetric with \( \mathcal{H} \succ 0\) by construction. This gives us the following closed-loop virtual system of a smooth path \( q(\mu,t)\) parameterized by \( \mu \in [0,1]\) , which has \( q(\mu=0,t)=\dot{\mathtt{q}}_r\) and \( q(\mu=1,t)=\dot{\mathtt{q}}\) as its particular solutions as in Example 6, but now with non-zero perturbation due to \( r(x,u)\) :

\[ \begin{align} &\mathcal{H}(\dot{q}-\ddot{\mathtt{q}}_r)+(\mathcal{C}+\mathcal{K})(q-\dot{\mathtt{q}}_r)=\mu(r(x,u_k)-r_L(x,u_{k-1})). \end{align} \]

(260)

After some algebra as in the proof of Theorem 31, we can show that

\[ \begin{align} \dfrac{d}{dt}\int_0^1\|\Theta\partial_{\mu} q\| &\leq -\left(\frac{k_{\ell}}{h_u}-\frac{L_u\rho}{h_{\ell}}\right)\int_0^1\|\Theta\partial_{\mu} q\|+\frac{\epsilon_{\ell}}{\sqrt{h_{\ell}}} \\\end{align} \]

(261)

where \( \mathcal{H}=\Theta^{\top}\Theta\) , \( h_{\ell}\mathrm{I}\preceq\mathcal{H}\preceq h_u\mathrm{I}\) , \( k_{\ell}\mathrm{I}\preceq\mathcal{K}\) , \( \|u_k-u_{k-1}\| \leq \rho\|\dot{\mathtt{q}}-\dot{\mathtt{q}}_r\|\) , \( \epsilon_{\ell}\) is the learning error of \( r_L\) , and \( L_u\) is the Lipschitz constant of \( r_L\) (assuming \( r_L\) is modeled by a spectrally-normalized DNN of Definition 6). This indeed indicates that the tracking error of the Lagrangian system (258) is exponentially bounded as proven in Theorem 31.

Theorem 32

Suppose that \( \mathcal{S}_s \subseteq \mathbb{R}^n\) is full-dimensional, \( n \geq 2\) , and \( \dot{x} = f(x)\) is contracting in a metric defined by a uniformly bounded \( M(x)\succ 0\) , i.e.,

\[ \begin{align} \dot{M}+M\frac{\partial f}{\partial x}+\frac{\partial f}{\partial x}^{\top}M\preceq-2\alpha M \\\end{align} \]

(262)

of (6) and \( \underline{m}\mathrm{I}\preceq M \preceq \overline{m}\mathrm{I}\) of (31) hold for \( M\) . Suppose also that a matrix-valued function \( \mathcal{M}(x)\) with \( \mathcal{M}(x)\succeq \underline{m}_L \mathrm{I}\) is learned to satisfy \( \nu(Z_b)\leq\epsilon_{\ell}\) , where the contraction violation set \( Z_b\) for \( p(x,\delta x) = (f(x),(\partial f/\partial x)\delta x)^{\top}\) is defined as

\[ \begin{align} &Z_b = \left\{(\xi,\delta\xi)\in \mathcal{S}_s\times\mathbb{S}^{n-1}:\max_{t\in \mathcal{S}_t}(\nabla \mathcal{V}(\psi_{t}(\xi,\delta \xi))^{\top}p(\psi_{t}(\xi,\delta \xi))-\alpha \mathcal{V}(\psi_t(\xi,\delta \xi)) > 0)\right\} \\\end{align} \]

(263)

with \( \mathcal{V}: \mathcal{T}S \mapsto \mathbb{R}\) being a learned differential Lyapunov function \( \mathcal{V}(x,\delta x) = \delta x^{\top}\mathcal{M}(x)\delta x\) for \( S = \cup_{t\in \mathcal{S}_t}\varphi_t(\mathcal{S}_s)\) . Define \( \bar{B}\) , \( B_H\) , \( B_{\nabla q}\) , and \( B_{\nabla \mathcal{V}}\) as

\[ \begin{align} \bar{B}&=B_H(B_{\nabla q}+\alpha B_{\nabla \mathcal{V}})\left(\frac{\overline{m}}{\underline{m}}\right)^{3/2} \\ B_H &= \sup_{x\in S}\left\|\frac{\partial^2f}{\partial x^2}\right\| \\ B_{\nabla q} &= \sup_{x\in S}\|\nabla q(x)\| \\ B_{\nabla \mathcal{V}} &= \sup_{x\in S}\|\nabla \mathcal{V}(x)\| \\\end{align} \]

where \( q = \nabla \mathcal{V}(x)^{\top} f(x)\) . Also, we define \( r_{\epsilon_{\ell}}\) and \( r_b\) as

\[ \begin{align} r_{\epsilon_{\ell}} &= \sup\left\{r\in\mathbb{R}_{>0}:r^n\zeta_n(r)\leq \frac{\epsilon_{\ell}\mu_{\mathrm{Leb}}(\mathcal{S}_s)}{\mu_{\mathrm{Leb}}(\mathbb{B}_2^{n}(1))}\right\} \\ r_b &= \sqrt{\frac{r_{\epsilon_{\ell}}\bar{B}}{\eta \alpha \underline{m}_L}} \\\end{align} \]

where \( \eta \in (0,1)\) . Finally, define \( \tilde{X}_t(r_b)\) as

\[ \begin{align} \tilde{X}_t(r_b) = \left\{\xi \in \tilde{X}:\inf_{\delta \xi \in \mathbb{S}^{n-1}} \|\theta_t(\delta \xi,\xi)\| \geq r_b\right\} \\\end{align} \]

(264)

for \( t \in \mathcal{S}_t\) , where \( \tilde{X} = \{\xi \in \mathcal{S}_s: \mathbb{B}^n_2(\xi,r_{\epsilon_{\ell}}) \subset \mathcal{S}_s\}\) . Then the system is contracting in the learned metric defined by \( \mathcal{M}(x)\) at the rate \( (1-\eta)\alpha\) with \( \eta \in (0,1)\) , for every \( x \in \tilde{S}(r_b) = \cup_{t\in \mathcal{S}_t}\varphi_t(\tilde{X}_t(r_b))\) .

References

[1] Donald E. Kirk Optimal Control Theory: An Introduction Dover Publications 2004 Apr

[2] D. Angeli A Lyapunov approach to incremental stability properties IEEE Trans. Autom. Control 2002 47 3 410-421

[3] J. Jouffroy, J.-J. E. Slotine, Methodological remarks on contraction theory, in: IEEE CDC, volume 3, 2004, pp. 2537–2543.

[4] W. Lohmiller, J.-J. E. Slotine, On contraction analysis for nonlinear systems, Automatica 34 (1998) 683 – 696.

[5] Wilson J. Rugh Linear Systems Theory Prentice-Hall, Inc. 1996 USA

[6] J.-J. E. Slotine, W. Li, Applied Nonlinear Control, Pearson, Upper Saddle River, NJ, 1991.

[7] Herbert Robbins and Sutton Monro A stochastic approximation method Ann. Math. Statist. 1951 22 3 400 – 407

[8] Patrick M. Wensing and Jean-Jacques E. Slotine Beyond convexity – Contraction and global convergence of gradient descent PLOS ONE 2020 15 1-29

[9] Shun-Ichi Amari Natural gradient works efficiently in learning Neural Computation 1998 10 2 251-276

[10] Constantin Udriste Convex functions and optimization methods on Riemannian manifolds Mathematics and Its Applications, vol. 297 Springer Science & Business Media 1994

[11] I. R. Manchester and Jean-Jacques E. Slotine Control contraction metrics: Convex and intrinsic criteria for nonlinear feedback design IEEE Trans. Autom. Control 2017 62 6 3046-3053

[12] Harold J. Kushner Stochastic Stability and Control Academic Press New York 1967

[13] Q. Pham, N. Tabareau, J.-J. E. Slotine, A contraction theory approach to stochastic incremental stability, IEEE Trans. Autom. Control 54 (2009) 816–820.

[14] Eric Mazumdar and Tyler Westenbroek and Michael I. Jordan and S. Shankar Sastry High confidence sets for trajectories of stochastic time-varying nonlinear systems IEEE CDC 2020 4275-4280

[15] SooJean Han and Soon-Jo Chung Incremental nonlinear stability analysis for stochastic systems perturbed by Lévy noise arXiv:2103.13338 Mar 2021

[16] S.-J. Chung, S. Bandyopadhyay, I. Chang, F. Y. Hadaegh, Phase synchronization control of complex networks of Lagrangian systems on adaptive digraphs, Automatica 49 (2013) 1148–1161.

[17] S.-J. Chung, J.-J. E. Slotine, Cooperative robot control and concurrent synchronization of Lagrangian systems, IEEE Trans. Robot. 25 (2009) 686–700.

[18] Ruigang Wang and Roland Tóth and Ian R. Manchester Virtual control contraction metrics: Convex nonlinear feedback design via behavioral embedding arXiv:2003.08513 Mar 2020

[19] R. Wang, R. Tóth, I. R. Manchester, A comparison of LPV gain scheduling and control contraction metrics for nonlinear control, in: 3rd IFAC Workshop on LPVS, volume 52, 2019, pp. 44–49.

[20] S. Singh and A. Majumdar and Jean-Jacques E. Slotine and M. Pavone Robust online motion planning via contraction theory and convex optimization IEEE ICRA 2017 5883-5890

[21] Sumeet Singh and Spencer M Richards and Vikas Sindhwani and Jean-Jacques E. Slotine and Marco Pavone Learning stabilizable nonlinear dynamics with contraction-based regularization Int. J. Robot. Res. 2020

[22] Ian R. Manchester and Jean-Jacques E. Slotine Robust control contraction metrics: A convex approach to nonlinear state-feedback \( \mathcal{H}^\infty\) control IEEE Control Systems Letters 2018 2 3 333-338

[23] H. Tsukamoto, S.-J. Chung, Robust controller design for stochastic nonlinear systems via convex optimization, IEEE Trans. Autom. Control 66 (2021) 4731–4746.

[25] S. Boyd and L. El Ghaoui and E. Feron and V. Balakrishnan Linear Matrix Inequalities in System and Control Theory SIAM 1994 15 Studies in Applied Mathematics Philadelphia, PA Jun

[26] H. K. Khalil, Nonlinear Systems, 3rd ed., Prentice-Hall, Upper Saddle River, NJ, 2002.

[27] H. Tsukamoto and S.-J. Chung Neural contraction metrics for robust estimation and control: A convex optimization approach IEEE Control Syst. Lett. 2021 5 1 211-216

[28] Hiroyasu Tsukamoto and Soon-Jo Chung and Jean-Jacques E. Slotine Neural stochastic contraction metrics for learning-based control and estimation IEEE Control Syst. Lett. 2021 5 5 1825-1830

[29] Stephen Boyd and Lieven Vandenberghe Convex Optimization Cambridge University Press Mar 2004

[30] A. P. Dani, S.-J. Chung, S. Hutchinson, Observer design for stochastic nonlinear systems via contraction-based incremental stability, IEEE Trans. Autom. Control 60 (2015) 700–714.

[31] R. G. Brown, P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering: with MATLAB Exercises and Solutions, 4th ed., Wiley, New York, NY, 1997.

[32] H. Tsukamoto and S.-J. Chung and Jean-Jacques E. Slotine and Chuchu Fan A theoretical overview of neural contraction metrics for learning-based control with guaranteed stability IEEE CDC 2021

[33] S. Bonnabel and Jean-Jacques E. Slotine A contraction theory-based analysis of the stability of the deterministic extended Kalman filter IEEE Trans. Autom. Control 2015 60 2 565-569

[34] Nicholas M. Boffi and Stephen Tu and Nikolai Matni and Jean-Jacques E. Slotine and Vikas Sindhwani Learning stability certificates from data arXiv:2008.05952 2020

[35] T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida, Spectral normalization for generative adversarial networks, in: Int. Conf. Learn. Representations, 2018.

[36] Max Revay and Ruigang Wang and Ian R. Manchester Lipschitz bounded equilibrium networks arXiv:2010.01732 Oct 2020

[37] Max Revay and Ruigang Wang and Ian R. Manchester Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness arXiv:2104.05942 Apr 2021

[38] Ian R. Manchester and Max Revay and Ruigang Wang Contraction-based methods for stable identification and robust machine learning: A tutorial IEEE CDC 2021

[39] Xiangli Chen and Mathew Monfort and Anqi Liu and Brian D. Ziebart Robust Covariate Shift Regression 19th Int. Conf. Artif. Intell. Statist. 2016 Gretton, Arthur and Robert, Christian C. 51 Proc. Mach. Learn. Res. 1270–1279

[40] Anqi Liu and Guanya Shi and Soon-Jo Chung and Anima Anandkumar and Yisong Yue Robust regression for safe exploration in control CoRL 2020

[41] Chiyuan Zhang and Samy Bengio and Moritz Hardt and Benjamin Recht and Oriol Vinyals Understanding deep learning requires rethinking generalization 5th Int. Conf. Learn. Representations 2017 OpenReview.net

[42] Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun Deep residual learning for image recognition IEEE Conf. Comput. Vision Pattern Recognit. 2016 770-778

[43] Guanya Shi and Xichen Shi and Michael O’Connell and Rose Yu and Kamyar Azizzadenesheli and Animashree Anandkumar and Yisong Yue and Soon-Jo Chung Neural lander: Stable drone landing control using learned dynamics IEEE ICRA 2019 9784-9790

[44] G. K. Dziugaite, D. M. Roy, Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data, in: 33rd Conf. Uncertainty Artif. Intell., AUAI Press, Sydney, Australia, 2017.

[45] B. Neyshabur, S. Bhojanapalli, D. Mcallester, N. Srebro, Exploring generalization in deep learning, in: Adv. Neural Inf. Process. Syst., volume 30, Curran Associates, Inc., 2017.

[46] Peter L Bartlett and Dylan J Foster and Matus J Telgarsky Spectrally-normalized margin bounds for neural networks Adv. Neural Inf. Process. Syst. 2017 30

[47] B. Neyshabur, S. Bhojanapalli, N. Srebro, A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks, in: Int. Conf. Learn. Representations, 2018.

[48] Andrew J. Taylor and Victor D. Dorobantu and Hoang M. Le and Yisong Yue and Aaron D. Ames Episodic learning with control Lyapunov functions for uncertain robotic systems IEEE/RSJ IROS 2019 6878-6884

[49] Richard Cheng and Abhinav Verma and Gabor Orosz and Swarat Chaudhuri and Yisong Yue and Joel Burdick Control regularization for reduced variance reinforcement learning 36th Int. Conf. Mach. Learn. 2019 Chaudhuri, Kamalika and Salakhutdinov, Ruslan 97 Proc. Mach. Learn. Res. 1141–1150 PMLR

[50] C. D. McKinnon, A. P. Schoellig, Learn fast, forget slow: Safe predictive learning control for systems with unknown and changing dynamics performing repetitive tasks, IEEE Robot. Automat. Lett. 4 (2) (2019) 2180–2187.

[51] H. Tsukamoto, S.-J. Chung, Learning-based robust motion planning with guaranteed stability: A contraction theory approach, IEEE Robot. Automat. Lett. 6 (4) (2021) 6164–6171. doi:10.1109/LRA.2021.3091019.

[52] H. Tsukamoto, S.-J. Chung, J.-J. E. Slotine, Learning-based adaptive control via contraction theory, in: IEEE CDC, 2021.

[53] G. Shi, W. Hönig, Y. Yue, S.-J. Chung, Neural-swarm: Decentralized close-proximity multirotor control using learned interactions, in: IEEE ICRA, 2020, pp. 3241–3247.

[54] G. Shi, W. Hönig, X. Shi, Y. Yue, S.-J. Chung, Neural-swarm2: Planning and control of heterogeneous multirotor swarms using learned interactions, IEEE Trans. Robot. (2021) 1–17.

[55] Y. Nakka, R. Foust, E. Lupu, D. Elliott, I. Crowell, S.-J. Chung, F. Hadaegh, A six degree-of-freedom spacecraft dynamics simulator for formation control research, in: Astrodyn. Special. Conf., 2018.

[56] D. Sun, S. Jha, C. Fan, Learning certified control using contraction metric, arXiv:2011.12569, 2020.

[57] E. Sontag, A Lyapunov-like characterization of asymptotic controllability, SIAM J. Control Optim. 21 (3) (1983) 462–471.

[58] E. D. Sontag, A ‘universal' construction of Artstein's theorem on nonlinear stabilization, Syst. Control Lett. 13 (1989) 117 – 123.

[59] H. Tsukamoto, S.-J. Chung, Convex optimization-based controller design for stochastic nonlinear systems using contraction analysis, in: IEEE CDC, 2019, pp. 8196–8203.

[60] J. S. Shamma, J. R. Cloutier, Gain-scheduled missile autopilot design using linear parameter varying transformations, J. Guid. Control Dyn. 16 (2) (1993) 256–263. doi:10.2514/3.20997.

[61] S. Bennani, D. Willemsen, C. Scherer, Robust LPV control with bounded parameter rates, in: Guid. Navigation Control Conf., 1997.

[62] A. G. Barto, R. S. Sutton, C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern. SMC-13 (5) (1983) 834–846.

[63] B. Rivière, W. Hönig, Y. Yue, S.-J. Chung, GLAS: Global-to-local safe autonomy synthesis for multi-robot motion planning with end-to-end learning, IEEE Robot. Automat. Lett. 5 (3) (2020) 4249–4256.

[64] B. T. Lopez, J.-J. E. Slotine, Adaptive nonlinear control with contraction metrics, IEEE Control Syst. Lett. 5 (1) (2021) 205–210.

[65] J.-J. E. Slotine, J. A. Coetsee, Adaptive sliding controller synthesis for non-linear systems, Int. J. Control 43 (6) (1986) 1631–1651.

[66] J.-J. E. Slotine, Modular stability tools for distributed computation and control, Int. J. Adaptive Control Signal Process. 17 (2003) 397–416.

[67] J.-J. E. Slotine, W. Lohmiller, Modularity, evolution, and the binding problem: a view from stability theory, Neural Networks 14 (2001) 137 – 145.

[68] W. Wang, J.-J. E. Slotine, On partial contraction analysis for coupled nonlinear oscillators, Biol. Cybern. 92 (2005) 38–53.

[69] N. M. Boffi, J.-J. E. Slotine, Implicit regularization and momentum algorithms in nonlinear adaptive control and prediction, arXiv:1912.13154, 2020.

[70] B. T. Lopez, J.-J. E. Slotine, Universal adaptive control of nonlinear systems, arXiv:2012.15815, 2021.