If you see this, something is wrong
To get acquainted with the document, the best thing to do is to select the "Collapse all sections" item from the "View" menu. This will leave visible only the titles of the top-level sections.
Clicking on a section title toggles the visibility of the section content. If you have collapsed all of the sections, this will let you discover the document progressively, from the top-level sections to the lower-level ones.
Generally speaking, anything that is blue is clickable.
Clicking on a reference link (like an equation number, for instance) will display the reference as close as possible, without breaking the layout. Clicking on the displayed content or on the reference link hides the content. This is recursive: if the content includes a reference, clicking on it will have the same effect. These "links" are not necessarily numbers, as it is possible in LaTeX2Web to use full text for a reference.
Clicking on a bibliographical reference (i.e., a number within brackets) will display the reference.
Speech bubbles indicate a footnote. Click on the bubble to reveal the footnote (there is no page in a web document, so footnotes are placed inside the text flow). Acronyms work the same way as footnotes, except that you have the acronym instead of the speech bubble.
By default, discussions are open in a document. Click on the discussion button below to reveal the discussion thread. However, you must be registered to participate in the discussion.
If a thread has been initialized, you can reply to it. Any modification to any comment, or a reply to it, in the discussion is signified by email to the owner of the document and to the author of the comment.
The blue button below that says "table of contents" is your tool to navigate in a publication.
The left arrow brings you to the previous document in the publication, and the right one brings you to the next. Both cycle over the publication list.
The middle button that says "table of contents" reveals the publication table of contents. This table is hierarchical structured. It has sections, and sections can be collapsed or expanded. If you are a registered user, you can save the layout of the table of contents.
First published on Monday, May 5, 2025 and last modified on Monday, May 5, 2025
Centre Automatique et Systèmes (CAS), Mines Paris-PSL.
This tutorial provides an overview of the current state-of-the-art in the sensitivity analysis for nonlinear programming. Building upon the fundamental work of Fiacco, it derives the sensitivity of primal-dual solutions for regular nonlinear programs and explores the extent to which Fiacco’s framework can be extended to degenerate nonlinear programs with non-unique dual solutions. The survey ends with a discussion on how to adapt the sensitivity analysis for conic programs and approximate solutions obtained from interior-point algorithms.
Sensitivity analysis has been a long standing research thread in nonlinear programming. It aims at evaluating the sensitivity of a solution \( x \in \mathbb{R}^n\) of a nonlinear problem with respect to parameter changes. For a parameter \( p\in\mathbb{R}^\ell\) , the solution of a generic optimization problem is given as solution of a (possibly nonsmooth) variational problem:
(1)
with \( F: \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}^n\) a function encoding the stationary conditions of the optimization problem (e.g. the Karush-Kuhn-Tucker stationary conditions, or the homogeneous self-dual embedding associated to a conic program). The solution of (1) depends on \( p\) and is noted \( x(p)\) . If the function \( F\) is regular enough, the Implicit Function Theorem gives an elegant way to evaluate the Jacobian \( J_p x(p) \in \mathbb{R}^{n \times \ell}\) of the solution \( x(p)\) of (1) as the solution of the linear system:
(2)
Most applications require only the forward and adjoint sensitivities, alleviating the need to compute the full Jacobian \( J_p x(p)\) . For vectors \( u \in \mathbb{R}^\ell\) and \( v \in \mathbb{R}^n\) , the forward and adjoint sensitivities are defined respectively as
(3)
Following [1], we call implicit differentiation the class of differentiation methods based on the relation (2). Implicit differentiation requires solving the nonlinear system (1) exactly and then relies on the Implicit Function Theorem: it is accurate, but often limited in scope. A practical alternative uses the fact that the system (1) is usually solved iteratively, e.g. using an optimization solver. If the iterative method is differentiable (in the sense each of its operation can be differentiated by automatic differentiation), the sensivitity \( J_p x(p)\) can be computed iteratively using forward accumulation (also known as piggy back method [2]). This second mode does not rely on the Implicit Function Theorem, as it propagates the derivatives directly through the algorithm. The main advantage is the derivatives are directly computed at the solution, meaning we do not have to solve an additional linear system as in (2). However, the method can be sensitive to numerical error and the derivatives \( J_p x(p)\) converge more slowly than the primal solution \( x(p)\) [3, 4].
Hence, implicit differentiation remains the favorite method to compute the sensitivities. Unfortunately, the hypothesis of the Implicit Function Theorem are often too stringent to hold for a generic nonlinear program, limiting the applicability of (2) unless practical workarounds are applied. For nonlinear programs, the problem’s regularity is related to its non-degeneracy: if we perturb slightly the parameters, we have to ensure that the primal solution exists and remains unique. Depending on the structure of the problem, the perturbed solution may have the following demonstrable properties [5], ordered by difficulty:
In this tutorial, we aim at summarizing the conditions under which the sensitivity analysis is well-posed.
We start by a brief review of literature to remind the main developments in the theory of sensitivity analysis for nonlinear programming.
The regularity of the KKT solution has been studied extensively in the 1970s by Robinson [6, 7], by interpreting the nonlinear program as a generalized nonlinear equation. It is now established that the regularity of a nonlinear program usually follows from the constraint qualification conditions satisfied at the KKT solution. Robinson proved in [7] that the primal-dual solution is strongly regular (hence, locally unique and Lipschitz continuous) if LICQ and SSOSC holds at the original solution. In addition, the (Fréchet) differentiability of the primal-dual solution is established if SCS hold, as established by Fiacco in his seminal result [8].
The previous results have been extended in the 1980s, with a primary focus on dropping the strong SCS condition required in [8]. Kojima [9] proved that MFCQ and GSSOSC are sufficient to guarantee that the primal-dual solution is strongly stable. Moreover, the primal-dual solution becomes non-differentiable as soon as we drop the SCS condition, but it remains directionally differentiable under certain assumptions. Jittorntrum [10] showed that if LICQ and SSOSC hold, then the directional derivative of the primal-dual solution exists and is given as the solution of a quadratic program (QP).
If we drop the LICQ condition, the dual solution becomes non-unique and the directional differentiability of the dual solution is lost. Shapiro [11] showed that the primal solution remains directionally differentiable if we replace LICQ by SMCFQ and SSOSC. This result was extended by Kyparisis in [12], who proved that we retain directionally differentiability for the primal solution if we assume MFCQ, CRCQ and GSSOSC, that is, without requiring unicity of the dual solution. This thread of research culminated in 1995 with the publication of the seminal result of Ralph and Dempe [13], giving a practical method to evaluate the directional derivative of the primal solution under MFCQ.
The theory of sensitivity analysis for nonlinear program has witnessed a significant consolidation after the 1990s, with the publication of three consequent books dedicated to the sensitivity analysis of optimization programs [14, 15, 16]. In particular, the application of sensitivity analysis has been extended to the more general setting offered by variational inequalities, which covers both nonlinear programs and complementarity problems. In parallel, numerous applications have flourished out of the sensitivity analysis theory, one of the most significant being bilevel programming [17].
Sensitivity analysis has proven to be a fertile ground for the analysis of stochastic optimization problems [18, 19], parametric decomposition [20, 21], parameter estimation problems [22] and last but not least, model predictive control [23, 24, 25]. Recently, the field has witnessed a renewed interest with the emergence of the differentiable programming paradigm in the machine learning community, where people look at embedding a (convex) optimization solver inside a neural network [26, 27] (whose training requires the evaluation of the sensitivity in the backward pass).
Section 2 is devoted to the theory of nonlinear programming, with an emphasis on the constraint qualifications used to define the regularity of the feasible set. Section 3 summarizes the main results about the sensitivity of the primal-dual solution in nonlinear programming. It extends Fiacco’s seminal work [28] and puts a special consideration on degenerate nonlinear programs, following the lead of Ralph and Dempe [13]. Section 4 focuses more particularly on the sensitivity of the objective function. Last, Section 5 details practical methods and specific software implementations to compute the sensitivity of any optimization program, including an extended discussion on the sensitivity of conic programs.
This section recalls classical results in the theory of nonlinear programming, with a long discussion on constraint qualifications §2.2. The scope of this tutorial is limited to finite-dimensional optimization problems. We refer to [14] for a broader description covering the infinite dimensional cases.
For a parameter \( p \in \mathbb{R}^{\ell}\) , we are interested in solving the parametric nonlinear optimization problem:
(4)
where \( f:\mathbb{R}^{n} \times \mathbb{R}^{\ell} \to \mathbb{R}\) an objective and \( g:\mathbb{R}^{n} \times \mathbb{R}^{\ell} \to \mathbb{R}^{m_e}\) and \( h:\mathbb{R}^{n} \times \mathbb{R}^{\ell} \to \mathbb{R}^{m_i}\) two functions encoding respectively the equality and inequality constraints.
All functions are twice-differentiable and depend jointly on the optimization variable \( x \in \mathbb{R}^{n}\) and a parameter \( p \in \mathbb{R}^{\ell}\) . The feasible set of the optimization problem (4) is encoded by the set-valued function
(5)
We define the value function \( \varphi:\mathbb{R}^{\ell} \to \mathbb{R} \cup \{+\infty\}\) as the solution of NLP(\( p\) for a given parameter \( p \in \mathbb{R}^{\ell}\) :
(6)
By convention \( \varphi(p) = +\infty\) if the problem is infeasible (\( X(p) = \varnothing\) ). The solution set associated to (6) is noted
(7)
We introduce the Lagrangian associated to (4) as
(8)
where \( y \in \mathbb{R}^{m_e}\) (resp. \( z \in \mathbb{R}^{m_i}\) ) the multiplier associated to the equality (resp. inequality) constraints. The Karush-Kuhn-Tucker (KKT) conditions of (4) state that under a certain constraint qualification, the primal vector \( x\) is a local optimal solution if there exists a dual multiplier \( (y, z)\) such that
(9)
The KKT conditions become necessary and sufficient in the convex case. We define a primal-dual stationary solution (or KKT stationary solution) as the vector
(10)
To alleviate the notations, we often denote by \( w^\star\) the vector \( w(p)\) .
The set of multipliers \( (y, z) \in \mathbb{R}^{m_e+m_i}\) satisfying the KKT conditions (9) is denoted by \( \mathcal{M}_p(x)\) . We notice that the set \( \mathcal{M}_p(x)\) is polyhedral (hence closed and convex). We note by \( \mathcal{E}_p(x)\) the set of extreme points of the polyhedron \( \mathcal{M}_p(x)\) . We say that the problem is degenerate if \( \mathcal{M}_p(x)\) has more than one element.
For a given \( x \in \mathbb{R}^{n}\) , we define the active set at \( x\) as
(11)
For given multipliers \( (y, z) \in \mathcal{M}_p(x)\) , we define the strongly and weakly active constraints as
(12)
The strongly and weakly active constraints form a partition of the active set: \( \mathcal{I}_p(x) = \mathcal{I}_p^+(x; z) \cup \mathcal{I}_p^0(x; z)\) and \( \mathcal{I}_p^+(x; z) \cap \mathcal{I}_p^0(x; z) = \varnothing\) .
Degeneracy arises from the undetermined weakly active constraints \( \mathcal{I}_p^0(x, z)\) : a small perturbation of the parameter \( p\) is likely to render them inactive. We say that the problem satisfies strict complementarity slackness (SCS) if there is no weakly active constraint: \( \mathcal{I}_p^0(x; z) = \varnothing\) .
The critical cone is a key notion to characterize the second-order optimality conditions of a nonlinear program.
Definition 1 (Critical cone)
For a feasible point \( x \in X(p)\) , the critical cone associated to the feasible set \( X(p)\) at \( x\) is defined as
(13)
We note that at a KKT solution \( (x^\star, y^\star, z^\star)\) , the critical cone simplifies as
(14)
Further, if we assume that strict complementarity slackness holds, the critical cone is exactly the null space of the active constraints’ Jacobian: \( \mathcal{C}_p(x^\star) = N(J_{act}(x^\star))\) :
(15)
In parametric optimization, it is common to introduce the critical set associated to the critical cone (13) at the solution [15, 13].
Definition 2 (Critical set)
Let \( (x^\star, y^\star, z^\star)\) be a KKT stationary solution. The critical set at the solution is the polyhedral cone defined as
(16)
Elements of \( \mathcal{K}_p(x^\star, z^\star)\) are called critical directions. The directional critical set in the direction \( h \in \mathbb{R}^{\ell}\) is defined as
(17)
Constraint qualifications ensure that we can capture locally the geometry of the feasible set \( X(p)\) by linearizing it near a feasible point \( x \in X(p)\) . The regularity of a nonlinear program is defined by the particular constraint qualifications that hold at a stationary point. We recall in this section the main constraint qualifications we can encounter in practice, and their respective impacts on the problem’s topology.
Definition 3 (Linear-independence constraint qualification)
LICQ holds if the gradients of active constraints
(18)
are linearly independent.
LICQ implies the following second-order necessary condition.
Theorem 1
Suppose \( (x^\star, y^\star, z^\star)\) is a primal-dual solution of (4) satisfying LICQ. Then,
(19)
where \( \mathcal{C}_p(x)\) is the critical cone of the feasible set \( X(p)\) at \( x\) .
Definition 4 (Mangasarian-Fromovitz constraint qualification)
MFCQ holds if the gradients of the equality constraints
(20)
are linearly independent and if there exists a vector \( d \in \mathbb{R}^{n_x}\) such that
(21)
It is well known that MFCQ generalizes the Slater’s condition of convex optimization to nonlinear programming. MFCQ also implies the boundedness of the optimal multipliers set.
Theorem 2 ([29])
The set \( \mathcal{M}_p(x)\) is bounded if and only if MFCQ holds.
Proposition 1 (Dual MFCQ)
Using the theorem of alternatives, MFCQ is equivalent to \( 0\) being the unique solution of the linear system
(22)
Definition 5 (Strict Mangasarian-Fromovitz constraint qualification)
SMFCQ holds if
(23)
are linearly independent and if there exists a vector \( d \in \mathbb{R}^{n_x}\) such that
(24)
A primal-dual solution \( (x^\star, y^\star, z^\star)\) satisfying SMCFQ implies a second-order necessary condition analogous to Theorem 1 [30, Theorem 2.1]. SMFCQ is the weakest condition under which the multiplier set \( \mathcal{M}_p(x)\) reduces to a singleton.
Proposition 2 (Proposition 1.1, [30])
Let \( w = (x, y, z)\) be a primal-dual solution. Then SMFCQ holds at \( x\) if and only if the multiplier vector \( (y, z)\) is unique.
Definition 6 (Constant-rank constraint qualification (CRCQ))
CRCQ holds if there exists a neighborhood \( W\) of \( x^\star\) such that for any subsets \( I\) of \( \mathcal{I}_p(x^\star)\) and \( J\) of \( [m_e]\) the family of gradient vectors
(25)
has the same rank for all vectors \( x \in W\) .
We now focus on second-order constraint qualifications, starting with the second-order sufficiency condition.
Definition 7 (Second-order sufficiency condition (SOSC))
SOSC holds at \( x\) if there exists \( (y, z) \in \mathcal{M}_p(x)\) such that
(26)
Proposition 3
Suppose SOSC holds at a solution \( x^\star\) . Then, there exists \( \sigma > 0\) and a neighborhood \( V\) of \( x^\star\) such that for all \( x \in X(p) \cap V\) (with \( x \neq x^\star\) ),
(27)
We define the set
(28)
Note that at a primal-dual solution \( (x, y, z)\) , \( \mathcal{C}_p(x; z) \subset \mathcal{D}_p(x; z)\) with both sets being equal if SCS holds.
Definition 8 (Strong second-order sufficiency condition (SSOSC))
SSOSC holds at \( x\) if there exists \( (y, z) \in \mathcal{M}_p(x)\) such that
(29)
If the condition (29) hold for all multipliers in \( \mathcal{M}_p(x)\) , then we say that the generalized second-order sufficiency condition (GSSOSC) hold.
If SCS hold, we note that SSOSC becomes equivalent to SOSC.
Let \( p^\star \in \mathbb{R}^{\ell}\) be a parameter, and \( w^\star = (x^\star, y^\star, z^\star)\) a KKT stationary solution of NLP(\( p\) ). We aim at studying the behavior of the primal-dual solution \( w(p) := (x(p), y(p), z(p))\) near \( p^\star\) . In particular, we are interested in characterizing the local continuity of the solution \( w(p)\) (i.e., is \( w(p)\) Lipschitz-continuous?) as well as its local differentiability (i.e., under which conditions is \( w(p)\) differentiable?).
We start by recalling Fiacco’s theorem in §3.1. This result is often restrictive, as it does not allow for active set changes in the solution. For that reason, the sensitivity are often derived using Robinson’s generalized equations, as described in §3.2. The generalized equation framework is a convenient way to encode the nonsmoothness in the problem’s solution. In §3.3, we show that it can also be interpreted using the notion of lexicographic derivative. The last section §3.4 is devoted to the differentiation of degenerate nonlinear program, where the dual solution is not unique. We show that in that case the notion of \( PC^1\) functions is the most adapted to characterize the nature of the solution.
If the problem is regular, the local continuity is usually given by the Implicit Function Theorem.
Theorem 3 (Implicit function theorem)
Let \( F: \mathbb{R}^{n} \times \mathbb{R}^{\ell} \to \mathbb{R}^{n}\) be a \( C^1\) function and \( (x^\star, p^\star) \in \mathbb{R}^{n} \times \mathbb{R}^{\ell}\) such that \( F(x^\star, p^\star) = 0\) . If the Jacobian \( J_x F(x^\star, p^\star)\) is invertible, then there exists a neighborhood \( U \subset \mathbb{R}^{\ell}\) of \( p^\star\) and a differentiable function \( x(p)\) such that \( F(x(p), p)= 0\) for all \( p \in U\) , and \( x(p)\) is the unique solution in a neighborhood of \( x^\star\) . In addition, \( x(\cdot)\) is \( C^1\) differentiable on \( U\) , with
(30)
In [8], Fiacco investigated under which conditions the KKT equations falls under the conditions of the implicit function theorem. To do so, we should ensure (i) the primal solution mapping \( x(p)\) is single-valued (i.e. the problem (4) has an unique solution) (ii) the dual solution \( (y(p), z(p))\) is unique and (iii) the active set is locally stable. Fiacco showed that these three conditions are satisfied if respectively (i) SOSC (ii) LICQ and (iii) SCS hold.
In that particular case, the KKT equations (9) rewrite as a smooth system of nonlinear equations depending continuously on \( w = (x, y, z)\) . For \( i \in \mathcal{I}_p(x)\) we have \( z_i > 0\) and \( h_i(x, p) = 0\) ; alternatively for \( i \in [m_i] \setminus \mathcal{I}_p(x)\) we have \( z_i = 0\) and \( h_i(x, p) < 0\) . By differentiating the complementarity condition \( z_i h_i(x, p) = 0\) , we get for all \( i = 1,…, m_i\) ,
(31)
Hence, if \( i \in [m_i] \setminus \mathcal{I}_p(x)\) we get by elimination \( \frac{\partial z_i}{\partial p} = 0\) : the multiplier \( z_i = 0\) remains constant after a small perturbation, meaning the constraints remain inactive. Alternatively, if \( i \in \mathcal{I}_p(x)\) , we deduce that \( \frac{\partial h_i}{\partial x}\frac{\partial x}{\partial p} + \frac{\partial h_i}{\partial p} = 0\) : the constraint remains locally active. As a consequence, we can discard the inactive constraints information from the KKT conditions and interpret the remaining active constraints as equality constraints. We note \( z_\mathcal{I} = \{z_i \}_{i \in \mathcal{I}_p}\) the vector storing the active multipliers and \( h_\mathcal{I}(x, p) = \{h_i(x, p) \}_{i \in \mathcal{I}_p}\) the vector storing the active inequality constraints.
Locally, the solution \( w := (x, y, z_\mathcal{I})\) satisfies the smooth nonlinear system:
(32)
We note the Jacobian of \( F(\cdot)\) :
(33)
The sensitivity of the solution is given by the following theorem, which applies the Implicit Function Theorem 3 to the system (32).
Theorem 4 (Theorem 3.2.2 [8])
Suppose that at a KKT stationary solution \( x^\star\) of NLP\( (p^\star)\) , SOSC, LICQ and SCS hold. Then,
there exists a unique continuously differentiable function \( w(\cdot)\) defined in a neighborhood \( U\) of \( p^\star\) such that \( w(p^\star) =(x^\star, y^\star, z^\star_\mathcal{I})\) , with
(34)
In detail, Equation (34) gives the sensitivity of the primal-dual solution as
(35)
The operation translates as the solution of a (sparse) linear system with \( \ell\) right-hand-sides. We note that the linear system (35) has a saddle point structure: the SOSC and LICQ conditions implies that it is invertible [31, Theorem 3.4]. If the original problem NLP(\( p\) is solved with a Newton algorithm, the factorization of the Jacobian matrix \( J_w F\) is already available, leaving only to compute \( \ell\) backsolves.
Remark 1 (Linear programming)
If the problem (4) is a linear program (LP), we have \( \nabla_{xx}^2 \mathcal{L} = 0\) . The Goldman-Tucker theorem states there always exists a LP solution satisfying strict-complementarity. If the LP is not degenerate, LICQ holds and the solution is a vertex of the polyhedral feasibility set \( X(p)\) . This automatically implies SOCP (the critical cone reduces to \( \mathcal{C}_p(x^\star) = \{0 \}\) ) and there are exactly \( n\) binding constraints: \( m_e + | \mathcal{I}_p(x^\star) | = n\) . Hence Theorem 4 applies to the linear programming case. We denote the active Jacobian at the solution \( B = \begin{bmatrix} J_x g^\top & J_x h_\mathcal{I}^\top \end{bmatrix}^\top\) . The matrix \( B\) has dimension \( n \times n\) and is non-singular (LICQ implies it is full row-rank). We note that \( B\) is exactly the basis matrix used in the simplex algorithm. In that case, the inverse of the matrix \( M = \begin{bmatrix} 0 & B^\top \\ B & 0 \end{bmatrix}\) in (34) satisfies:
(36)
Hence, the computation of the sensitivities simplifies considerably if the problem is regular linear program. If the linear program is solved using the simplex algorithm, the solver usually returns a LU factorization for the basis matrix \( B\) .
Remark 2 (Exponential decay)
Certain optimization problems have a graph structure: in that case the primal-dual variable can be reordered following a set of nodes \( \mathcal{V}\) such that \( w = \{w_i \}_{i \in \mathcal{V}}\) . As a consequence, the Jacobian matrix \( M\) appearing in (34) can also be reordered following the graph structure: in the language of linear algebra, there exists a permutation matrix \( P\) such that the matrix \( P^\top M P\) is banded. It is known that terms in the inverse of a banded matrix decay exponentially fast away from the diagonal [32]. Then, assuming SSOSC, SCS and LICQ the solution of (34) satisfies the exponential decay of sensitivity property [33], that is, for two parameters \( p, p'\) close enough,
(37)
with \( w_i(p)\) the primal-dual nodal solution at node \( i \in \mathcal{V}\) and \( C_{ij} = \Gamma \rho^{d_G(i, j)}\) (with \( \Gamma > 0\) and \( \rho > 0\) two positive constants) and \( d_G(i, j)\) the graph distance between node \( i\) and node \( j\) . The Equation (37) has important consequence for computing the sensitivities in a decentralized fashion [34] or building a preconditionner to solve (34) with an iterative method [35].
We emphasize that the strict complementarity condition is unlikely to hold on practical optimization problems, where constraints can change locally from active to inactive. For that reason, it has been investigated how to generalize Fiacco’s result without the strict complementarity assumption. In [36], Robinson proposed a powerful framework to characterize the solution of nonlinear program (4), by interpreting the KKT conditions (9) as a generalized equation \( F(w, p) + N_C(w) \ni 0\) (see Appendix C for a brief recall on generalized equations).
Proposition 4 (KKT system as generalized equation)
Let \( w = (x, y, z) \in \mathbb{R}^{n} \times \mathbb{R}^{m_e} \times \mathbb{R}^{m_i}\) be a primal-dual variable. A primal-dual solution \( w\) of the KKT system (9) is also solution of the generalized equation
(38)
with \( F(w, p) := \begin{bmatrix} \phantom{-}\nabla_x \mathcal{L}(x, y, z, p) \\ -\nabla_y \mathcal{L}(x, y, z, p) \\ -\nabla_z \mathcal{L}(x, y, z, p) \end{bmatrix}\) and \( C = \mathbb{R}^{n} \times \mathbb{R}^{m_e} \times \mathbb{R}^{m_i}_+\) .
When writing the KKT system as a generalized equation, the nonsmoothness arising from the active set changes is encoded implicitly by the normal cone \( N_C(x)\) , \( F\) being here a smooth functional. If the generalized equation (38) is strongly regular, then the sensitivities follow from Robinson’s Theorem (Theorem 13 in the appendix). Strong regularity is implied by SSOSC and LICQ, allowing to drop the SCS condition (however, SSOSC is necessary to guarantee that the perturbed solution \( x(p)\) remains unique without SCS).
Proposition 5 (Theorem 4.1, [6])
Let \( w^\star = (x^\star, y^\star, z^\star)\) a primal-dual solution of the generalized equation (38). If SSOSC and LICQ hold at \( w^\star\) , then (38) is strongly regular at \( w^\star\) .
The strong regularity given by Proposition 5 implies
Hence, strict complementarity slackness is not required to ensure the primal-dual solution \( w(p)\) is locally Lipschitz continuous. In addition, we can prove that the optimal solution is also directionally differentiable in a direction \( h \in \mathbb{R}^{\ell}\) .
Proposition 6 (Theorem 2, [10])
Suppose that at a KKT stationary solution \( x^\star\) of NLP\( (p^\star)\) , LICQ and SSOSC hold. Then
for a given direction \( h \in \mathbb{R}^{\ell}\) , the directional derivative \( w'(p; h)\) is the primal-dual solution of the QP problem defined as
(39)
with \( \nabla_{x x}^2 \mathcal{L} = \nabla_{x x}^2 \mathcal{L}(x, y, z, p)\) , \( \nabla_{x p}^2 \mathcal{L} = \nabla_{xp}^2 \mathcal{L}(x, y, z, p)\) .
The QP is parameterized implicitly by the dual multipliers \( (y, z)\) (appearing in the derivatives of the Lagrangian \( \nabla_{x x}^2 \mathcal{L}\) and \( \nabla_{x p}^2 \mathcal{L}\) ). LICQ guarantees that the multipliers are unique at the solution, leading to a non-ambiguous definition of the directional derivative (39). Under LICQ and SSOSC, the directional derivative exists for every direction \( h \in \mathbb{R}^{\ell}\) . However, we cannot guarantee that the directional derivative \( w'(p; \cdot)\) is continuous unless SCS hold. Note that under SCS, the QP (39) rewrites with only equality constraints and becomes equivalent to the linear system (34).
Robinson’s generalized equation is a powerful framework to encode nonsmooth equations. Several alternatives have been developed in the recent years, one of the most promising method being based on the notion of lexicographic derivative [37], which has been introduced has a practical differentiation method for nonsmooth functions, such as \( PC^1\) functions (see Appendix A.4). The KKT conditions (9) rewrites equivalently as a system of nonsmooth equations:
(40)
It is well known that the nonsmooth KKT system (40) can be solved using a semi-smooth Newton method. At the solution, the sensitivity analysis of the nonsmooth system can be carried out using lexicographic derivatives [38].
The function \( \min\) is \( PC^1\) in the sense of Scholtes [39], so we deduce that the mapping \( \Phi\) is itself \( PC^1\) on its domain. The notion of coherently oriented structure extends the Implicit Function Theorem 3 to \( PC^1\) functions.
Proposition 7 (Theorem 5.1 [40])
Let \( (x^\star, y^\star, z^\star)\) be a solution of KKT. If \( \Phi\) is coherently oriented with respect to \( (x, y, z)\) at \( (x^\star, y^\star, z^\star)\) then there exists a neighborhood \( U\) of \( p\) and \( V\) of \( x^\star\) and a \( PC^1\) mapping \( w: U \to V\) such that for each \( p \in U\) \( w(p)\) is the unique solution of (40). Moreover, for any \( k \in \mathbb{N}\) and any \( R \in \mathbb{R}^{p \times k}\) the LD-derivatives \( x'(p; R)\) , \( y'(p, R)\) and \( z'(p, R)\) are the unique solution \( X\) , \( Y\) and \( Z = [Z^+ Z^0 Z^-]\) of the following nonsmooth equation system
(41)
where \( \bf{LMmin}\) is the lexicographic matrix minimum function defined in (108), and
Without surprise, ensuring complete coherently orientation is equivalent to assuming LICQ and SSOSC hold.
Proposition 8
Let \( (x^\star, y^\star, z^\star)\) be a KKT stationary point. Suppose that LICQ and SSOSC holds at \( (x^\star, y^\star, z^\star)\) . Then \( \Phi\) is coherently oriented with respect to \( (x, y, z)\) .
Solving the nonsmooth system (41) is generally non trivial, and falls back to a nonsmooth Newton method or an active set algorithm (cycling through linear equation system solves parameterized by the current working active set). However, it has been proved in [41] that evaluating the LD derivatives can be rewritten as the solution a hierarchy of \( k\) QP problems \( QP_{(1)}, QP_{(2)}, …, QP_{(k)}\) , with \( QP_{(j)}(r_{(j)})\) defined for \( j=1, …, k\) as
We note \( \alpha_{(j)} \in \mathbb{R}^{m_e}\) , \( \beta_{(j)} \in \mathbb{R}^{m^+_i}\) , \( \gamma_{(j)} \in \mathbb{R}^{m^0_i}\) the unique multipliers associated resp. to the equality constraints, the strongly active constraints and the weakly active constraints. The sets of strongly and weakly active constraints change between two consecutive indexes \( j\) using the information brought by the latest multiplier \( \gamma_{(i)}\) :
(42)
with \( \mathcal{A}_{(0)}^0 = \mathcal{I}_p^0(x; z)\) and \( \mathcal{A}_{(0)}^+ = \mathcal{I}_p^+(x; z)\) .
Proposition 9 (Theorem 3.1 [41])
Let \( (x^\star, y^\star, z^\star)\) be a solution of KKT. Suppose that LICQ and SSOSC hold. Then, for any \( k \in \mathbb{N}\) and \( R \in \mathbb{R}^{\ell \times k}\) , the LD-derivatives \( x'(p, R)\) , \( y'(p, R)\) and \( z'(p, R)\) in the direction \( R\) are given as
(43)
with, for \( j = 1, …, k\) ,
(44)
We have shown that the sensitivities remain well-defined if we drop the SCS condition required in Theorem 4. What happens if we suppose in addition that the multipliers \( (y, z)\) are non unique? It turns out that in that case we retain the directional differentiability of the primal solution \( x(p)\) . However, the definition of QP (39) becomes ambiguous: which multipliers should we consider when evaluating the Hessian of the Lagrangian?
Kojima was the first to prove the continuity of the optimal solution assuming only MFCQ and GSSOSC, and proved in addition that they are the weakest conditions under which the perturbed solution is locally unique [9].
Theorem 5 (Theorem 7.2, [9])
Suppose that at a KKT stationary solution \( x^\star\) of NLP\( (p^\star)\) , MFCQ and GSSOSC hold. Then, there are open neighborhoods \( U\) of \( p^\star\) and \( V\) of \( x^\star\) and a function \( x: U \to V\) such that \( x(\cdot)\) is continuous. The function \( x(p)\) is the unique local solution of NLP(\( p\) ) in \( V\) , and MFCQ holds at \( x(p)\) .
The LICQ condition in Proposition 6 can be relaxed to SMFCQ, which also implies the multipliers \( (y^\star, z^\star)\) at the solution are unique (Proposition 2).
Proposition 10 (Theorem 4.2, [11])
Suppose that at a solution \( x^\star\) of NLP\( (p^\star)\) , SMFCQ and SSOSC hold. Then, \( x(\cdot)\) is directionally differentiable at \( p\) for every \( h \in \mathbb{R}^{\ell}\) , and the directional derivative \( x'(p^\star, h)\) is the unique solution of the QP problem
(45)
with \( \mathcal{K}_{p^\star}\) the critical cone defined in (13) and \( \nabla_{x x}^2 \mathcal{L} = \nabla_{x x}^2 \mathcal{L}(x^\star, y^\star, z^\star, p^\star)\) , \( \nabla_{x p}^2 \mathcal{L} = \nabla_{x p}^2 \mathcal{L}(x^\star, y^\star, z^\star, p^\star)\) .
Note that we are no longer able to evaluate the directional derivative of the dual solution in (45), in contrast with the QP (39). Kyparisis [12] proved that we can relax SMFCQ by CRCQ to get the existence of the directional derivative \( x'(p, h)\) in the degenerate case where the multipliers \( (y, z)\) are non unique: it suffices to look at the multipliers at the extreme points \( \mathcal{E}_{p^\star}(x^\star)\) .
Proposition 11 (Theorem 2.2, [12])
Suppose that at a KKT stationary solution \( x^\star\) of NLP\( (p^\star)\) , MFCQ, CRCQ and GSSOSC hold. Then, \( x(\cdot)\) is directionally differentiable at \( p^\star\) for every \( h \in \mathbb{R}^{\ell}\) , and the directional derivative \( x'(p^\star, h)\) is the unique solution of the QP problem, defined for a given \( (y, z) \in \mathcal{E}_{p^\star}(x^\star)\) ,
(46)
with \( \mathcal{K}_{p^\star}\) the critical cone defined in (13) and \( \nabla_{x x}^2 \mathcal{L} = \nabla_{x x}^2 \mathcal{L}(x^\star, y, z, p^\star)\) , \( \nabla_{x p}^2 \mathcal{L} = \nabla_{x p}^2 \mathcal{L}(x^\star, y, z, p^\star)\) .
Finally, Proposition 11 has been extended in [13] to derive a practical way to evaluate the sensitivities for a degenerate nonlinear program. The idea is to look only at multipliers in \( \mathcal{E}_{p^\star}(x^\star)\) that are also solution of the linear program
(47)
The solution \( (y, z)\) of (47) determines the sets of strongly active \( \mathcal{I}_p^+(x^\star; z)\) and weakly active constraints \( \mathcal{I}_p^0(x^\star; z)\) in the QP (46). As such, the set \( S_{p^\star}(x^\star; h)\) is intimately related to the critical set \( \mathcal{K}_p(x^\star, z; h)\) . Indeed, observe that by definition \( (y, z) \in \mathcal{M}_p(x)\) if
(48)
where we have removed the inactive inequality from (48). By noting \( d \in \mathbb{R}^n\) the multiplier associated to the stationary condition, and \( \lambda \in \mathbb{R}^{|\mathcal{I}_p(x^\star)|}\) the multiplier associated to the constraints \( z_\mathcal{I} \geq 0\) , the KKT conditions of (47) write
By removing the multiplier \( \lambda\) and taking into account the disjunction in the complementarity constraint \( 0 \leq \lambda \perp z_\mathcal{I} \geq 0\) , the two first equations rewrite equivalently as
(49)
which describes exactly the critical set \( \mathcal{K}_p(x^\star, z; h)\) . Hence, by theory of linear programming, the problem (47) has an optimal solution if and only if the previous KKT system is consistent, meaning that \( \mathcal{K}_p(x^\star,z;h)\) is nonempty. This is formalized in the following lemma.
Lemma 1 (Lemma 2.2, [42])
The critical cone \( \mathcal{K}_p(x^\star, z; h)\) is nonempty if and only if \( (y, z) \in S_p(x^\star; h)\) .
We can go one step further in the interpretation of the LP (47). Note that its dual writes
(50)
with the multiplier \( y \in \mathbb{R}^{m_e}\) being associated to the equality constraints and the multiplier \( z \in \mathbb{R}^{m_i}\) being associated to the inequality constraints. As a consequence, the multipliers \( (y, z)\) are the marginal costs encoding the objective’s change after a perturbation \( h\) , and satisfy
(51)
The nature of the LP (47) has been further studied in [43, Lemma 4.1]. Izmailov and Solodov noted that the solution of (47) is likely to be unique if the objective of (47) is non-constant. Furthermore, if SMFCQ is not satisfied the set of multipliers \( \mathcal{M}_p(x^\star)\) does not reduce to a singleton. Then, the LP (47) returns a solution \( (y, z) \in \mathcal{E}_p(x^\star)\) that does not satisfy strict-complementarity: \( \mathcal{I}_p^0(x^\star, z) \neq \emptyset\) .
Ralph and Dempe showed in [13] that the directional derivative is unique even when the solution of the LP (47) is not. They also prove that the local solution \( x(\cdot)\) is piecewise differential in the sense of Scholtes [39].
Theorem 6 (Theorem 2, [13])
Suppose that at a KKT stationary solution \( x^\star\) of NLP\( (p^\star)\) , MFCQ, CRCQ and GSSOSC hold. Then,
It turns out that the notion of \( PC^{1}\) functions is adapted to characterize the solution of a parametric nonlinear program.
We now focus on the derivative of the value function \( \varphi(p)\) . In §4.1, we recall classical results focusing on the directional differentiability of the value function \( \varphi\) . The differentiable case is studied in §4.2.
The sensitivity analysis of the optimal value function traces back to the seminal work of Danskin, who studied the sensitivity for a simplified problem where the admissible set \( X(p)\) is a fixed subset \( X \subset \mathbb{R}^{n}\) , that is,
(52)
Theorem 7 (Danskin’s Lemma [44])
Suppose \( X\) is nonempty and compact, and \( f(x, \cdot)\) is continuously differentiable at \( p \in \mathbb{R}^{\ell}\) . Then \( \varphi\) is locally Lipschitz near \( p\) , and directionally differentiable in every direction \( h \in \mathbb{R}^{\ell}\) , with
(53)
with \( \Sigma(p)\) the solution set defined in (7).
Danskin’s theorem has been extended by Hogan to the case where \( X(p)\) is defined as \( X(p) = \{ x \in \mathbb{R}^{n} \;:\; x \in M \;,\; h(x, p) \leq 0 \}\) , for a fixed set \( M\) .
Theorem 8 ([45])
Suppose \( M\) is a closed convex set, \( f(\cdot, p)\) and \( h_i(\cdot, p)\) are convex on \( M\) for all \( p\) , and continuously differentiable on \( M \times N\) , with \( N\) a neighborhood of \( p\) . If (i) the solution set \( \Sigma(p)\) is nonempty and bounded, (ii) \( \varphi(p)\) is finite, (iii) there is a point \( \hat{x} \in M\) such that \( h(\hat{x}, p) < 0\) (Slater), then \( \varphi\) is directionally differentiable, and for all \( h\in \mathbb{R}^{\ell}\) ,
(54)
with \( \mathcal{M}_p(x)\) the multiplier solution set.
The extension to the generic case associated to \( X(p) = \{x \in \mathbb{R}^{n} \; | \; g(x, p) = 0 \, , \, h(x, p) \leq 0 \}\) was proved by Gauvin and Dubeau [46] and Fiacco [47]. The following theorem bounds the lower and upper Dini derivatives (see Appendix A.2) of the value function: in general the bounds are sharp.
Theorem 9 (Theorem 2.3.4 [28])
Suppose that \( X(p)\) is nonempty and uniformly compact near \( p\) and MFCQ holds at each \( x \in \Sigma(p)\) . Then \( \varphi\) is locally Lipschitz near \( p\) , and for any \( h \in \mathbb{R}^{\ell}\) ,
(55)
In the convex case the multiplier set \( \mathcal{M}_p(x)\) is independent on the solution \( x \in \Sigma(p)\) .
Proposition 12 (Corollary 2.3.8 [28])
Let \( f\) and \( h\) be convex functions in \( x\) , \( g\) be affine in \( x\) , such that \( f,g, h\) are all jointly \( C^1\) in \( (x, p)\) . Then, if \( X(p)\) is nonempty and uniformly compact near \( p\) and MFCQ holds at each \( x \in \Sigma(p)\) , then \( \varphi\) is directionally differentiable at \( p\) and
(56)
Note that if in addition SMCFQ holds, then \( \mathcal{M}_p(x)\) reduces to a singleton \( (y^\star, z^\star)\) and the directional derivative simplifies to
(57)
If instead we suppose GSSOSC holds, the local solution \( x(p)\) is a singleton and the computation of the directional derivative also simplifies.
Theorem 10 (Theorem 7.5 [48])
Suppose GSSOSC and MFCQ hold at a stationary solution \( x^\star\) . Then near \( p\) the function \( \varphi(\cdot)\) has a finite one-sided directional derivative, and for every \( h \in \mathbb{R}^{\ell}\) ,
(58)
Now, we impose more stringent constraint qualifications to recover the setting of the Implicit Function Theorem 3: as a consequence the value function \( \varphi\) becomes \( C^2\) differentiable. For a given \( p \in \mathbb{R}^{\ell}\) , the following results hold not only for global solution \( x \in \Sigma(p)\) , but also locally for every KKT stationary point \( w(p) = (x(p), y(p), z(p))\) of NLP(\( p\) ). As such, we define the local value function as
(59)
Theorem 11 (Theorem 3.4.1 [28])
Suppose SOSC, SCS and LICQ hold at a stationary KKT solution \( (x^\star, y^\star, z^\star)\) . Then the local value function \( \varphi_\ell\) is \( C^2\) near \( p\) , and
and also
(60)
The previous theorem gives expressions both for the gradient and the Hessian of the optimal value function. It applies to generic nonlinear programs of the form NLP(\( p\) ). It yields interesting extensions when applied to problem with certain structures.
First, we can prove that if the constraints of the nonlinear problem are independent of the parameter \( p\) ,
(61)
then the gradient and the Hessian depend only on the derivatives of the objective function \( f\) .
Corollary 1
Suppose the conditions of Theorem 11 hold at a stationary solution \( (x^\star, y^\star, z^\star)\) of problem (61). Then, in a neighborhood of \( p\) the local value function \( \varphi_\ell\) is \( C^2\) , with
The other corollary draws a link between Theorem 11 and the shadow price of sensitivity: if the parameters appear only in the right-hand-side of the constraint, that is, for \( p \in \mathbb{R}^{m_e}\) and \( q \in \mathbb{R}^{m_i}\) ,
(62)
then the gradient of the value function is equal to the Lagrange multipliers found at the solution. This is well-known result in convex analysis, the problem (62) being subject to the so-called canonical perturbations.
Corollary 2 (Optimal value function derivatives for RHS perturbations)
Suppose the conditions of Theorem 11 hold at a stationary solution \( (x^\star, y^\star, z^\star)\) of problem (62). Then, in a neighborhood of \( (p, q)\) the local value function \( \varphi_\ell\) is \( C^2\) , with
(63)
Now the theory has been laid out, we give an overview of the existing numerical methods for the sensitivity analysis of a nonlinear optimization program (4). First, we put a special emphasis on the differentiation of conic programs in §5.1, before focusing on the differentiation of nonlinear programs in §5.2.
Conic programming has become an important subfield of optimization. In machine learning, a significant number of problems are conic, motivating a renewed interest for the sensitivity analysis of conic-structured problems. A conic program is a specialization of (4), with the following structure
(64)
where \( \mathcal{K}\) is a pointed convex cone. If the cone is the non-negative orthant \( \mathcal{K} = \mathbb{R}^n_+\) , the problem (64) becomes a LP. We suppose here that the problem’s data are the parameters: \( p = (c, A, b)\) .
The KKT conditions of (64) write out
(65)
As the problem (64) is convex, the KKT conditions (65) are necessary and sufficient.
On the contrary to what we exposed in Theorem 4, the sensitivity analysis of (64) does not differentiate through the KKT conditions (65): instead, it uses an equivalent form of (65) called the homogeneous self-dual embedding (HSD) [49, 50]. The HSD writes out as the problem
(66)
If \( (x, y, \tau, s, \kappa)\) is solution of (66) with \( \tau > 0\) and \( \kappa = 0\) , then \( (x/\tau, y/\tau, s/\tau)\) satisfies the KKT conditions (65) and is solution of the conic program (64). In general, the parameters \( \tau\) and \( \kappa\) characterize the primal and dual infeasibility of the conic problem (64).
We define the skew-symmetric matrix
(67)
Defining the new cone \( \mathcal{C} = K \times \mathbb{R}^m \times \mathbb{R}_+\) and its dual \( \mathcal{C}^\ast = K^\star \times \{0 \}^n \times \mathbb{R}_+\) , the problem (66) becomes equivalent to
(68)
Let \( z \in \mathbb{R}^{n+m+1}\) . Using Moreau’s decomposition theorem, we know that \( z = u - v\) with \( (u, v) \in \mathcal{C} \times \mathcal{C}^\star\) and \( u^\top v = 0\) if and only if \( u = P_{\mathcal{C}}(z)\) and \( v = -P_{\mathcal{C}^o}(z)\) .
As a consequence, the HSD (68) rewrites more compactly as finding \( z \in \mathbb{R}^{n+m+1}\) such that
(69)
The residual map \( R(z)\) returns the HSD residual:
(70)
If we decompose \( z^\star\) component by component as \( z^\star = (u^\star, v^\star,w^\star)\) , with \( (u^\star,v^\star,w^\star) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}_+^*\) , the KKT solutions \( (x^\star, y^\star, s^\star)\) of (64) is recovered as
(71)
We now have all the elements to evaluate the sensitivity at a given KKT primal-dual solution \( w^\star = (x^\star, s^\star, y^\star)\) of (65). By defining \( u^\star = (x^\star, y^\star, 1)\) , \( v^\star = (s^\star, 0, 0)\) and \( z^\star = u^\star - v^\star\) , the vector \( z^\star\) is solution of the residual map: \( R(z^\star) = 0\) . If the projection operator \( P_{\mathcal{C}}\) is differentiable, then \( R\) is itself differentiable with \( J_z R(z) = (Q+I)J_z P_{\mathcal{K}}(z) - I\) . In that case, by applying the implicit function theorem 3 to \( R(z) = 0\) , we get the Jacobian
(72)
We recover the sensitivity of the primal-dual solution by noting that \( w^\star = \phi(z^\star)\) : after applying the chain-rule, we get \( J_p w^\star = J_z \phi \cdot J_p z^\star\) .
Note that in general the projection operator \( P_{K}\) is not differentiable. However, as recalled in [51], if \( K\) encodes the nonnegative orthant, the second-order cone or the positive definite cone then \( P_{K}\) is strongly semismooth (see Appendix A.3), hence differentiable almost everywhere. As a result, the residual mapping (70) is a strongly semismooth function: the equation (72) is well-defined if we use a version of the implicit function theorem adapted for the strongly semismooth case (and involving Clarke generalized Jacobians). Another rigorous treatment has been made in [49], based on the newly introduced concept of conservative Jacobians.
With the emergence of the differentiable programming paradigm, numerous packages have been developed to differentiate through conic program (64). The approach has been implemented in cvxpylayer [27] and DiffOpt.jl [52]. We emphasize that factorizing the matrix \( \nabla_z R\) in (72) can be challenging, as on the contrary to the KKT matrix the Jacobian \( \nabla_z R\) is non symmetric. For that reason, the system (72) is solved either using a LU factorization, or using a Krylov method (the LSQR algorithm accommodates nicely the degenerate case where the Jacobian of the residual \( \nabla_z R\) is non-invertible).
The solution of NLP(\( p\) has been studied by Fiacco and McCormick in their seminal work [53]: the main idea is to reformulate the problem with penalties and solve the resulting unconstrained problem using Newton method. In that case, the sensitivity analysis uses the approximate solution returned by the algorithm to compute the directional derivative \( x'(p, h)\) .
We recall the penalty approach described originally in [54]. Although this approach is now slightly outdated, it introduces some fundamental ideas still being used today. We follow the approach described in [28, Chapter 6]. The idea is to reformulate NLP(\( p\) with inner and outer penalties. For a strictly feasible point \( x\) (in the sense \( h(x, p) < 0\) ), Fiacco and McCormick define the functional
(73)
where \( r > 0\) is a penalty parameter appearing in the barrier associated to the inequality constraints and the quadratic penalty associated to the equality equations. For a fixed penalty \( r\) , a stationary point \( \nabla_x W\big(x(p, r), r, p\big) = 0\) is found using the traditional Newton method. SENSUMT [54] was developed as an extension of SUMT to compute the sensitivity using the penalty function \( W(\cdot)\) .
Note that the gradient of the penalty \( W(\cdot)\) and the gradient of the Lagrangian (8) are defined respectively as
(74)
where we defined \( h_i := h_i(x, p)\) and \( g_j := g_j(x, p)\) . We observe that for any solution \( x^\star\) of (73), the multipliers \( z^\star = r / h(x^\star, p)\) and \( y^\star = g(x^\star, p) / r\) satisfy the stationary condition \( \nabla_x \mathcal{L}(x^\star, y^\star, z^\star, p) = 0\) : The variables \( (y^\star, z^\star)\) are approximations of the optimal Lagrange multipliers, both depending on the penalty \( r\) . The penalty \( r\) is here interpreted as a new parameter, and is valid candidate to apply Theorem 4.
Proposition 13 (Theorem 6.2.1 [28])
Let \( p^\star \in \mathbb{R}^{\ell}\) . Suppose that a stationary solution \( x^\star\) of NLP(\( p\) LICQ, SOSC and SCS hold. Then in a neighborhood of \( (0, p^\star)\) there exists a unique differentiable function \( w(r, p) = (x(r, p), y(r, p), z(r, p))\) satisfying
(75)
and such that \( w(0, p^\star) = (x^\star, y^\star, z^\star)\) . In addition, for any \( (r, p)\) near \( (0, p^\star)\) , \( x(r, p)\) is locally unique unconstrained local minimizing point of \( W(x, r, p)\) with \( h_i(x(r, p), p) < 0\) and \( \nabla^2_{x x} W(x(r, p), r, p)\) positive definite.
At the solution of (73), the sensitivity of the primal \( x(r, p)\) is given by the Implicit Function Theorem 3 as
(76)
involving only the inverse of the \( n \times n\) Hessian matrix \( \nabla_{x x}^2 W\) . Using Proposition 13, we get that \( \lim_{r \to 0} J_p x(r, p) = J_p x^\star(p)\) . The dual sensitivities are recovered with (75) as \( z_i(r, p) = r / h_i(x(r, p), p)\) and \( y_j = g_j(x(r, p), p) / r\) , giving the respective Jacobians
(77)
The SUMT algorithm described in the previous paragraph has fallen out of fashion, as the resulting problem (73) becomes highly ill-conditioned when we are approaching an optimal solution. However, SUMT has nurtured the development of interior-point methods in the 1980s: the primal-dual interior-point method (IPM) is now one of the most widely used algorithm to solve nonlinear program NLP(\( p\) ).
The solution sensitivity of primal-dual interior point follows from the same principle as in Theorem 13. As an example, we describe the approach uses in sIpopt [55], a package for the sensitivity analysis of NLP developed as an extension of the primal-dual interior-point solver Ipopt. Ipopt rewrites the inequalities in NLP(\( p\) as equality constraints by introducing additional slack variables, giving a reformulated NLP problem with only bound constraints \( x \geq 0\) :
(78)
For a given barrier parameter \( \mu > 0\) , the bound constraints \( x \geq 0\) are penalized using a barrier term in the objective function:
(79)
Ipopt solves a sequence of subproblems (79) for given barrier parameters \( \{\mu_k \}_k\) , with \( \mu_k \to 0\) . For a fixed \( \mu\) , solving (79) with primal-dual IPM amounts to solve the system of nonlinear equations \( F_\mu(x, y, z, p) = 0\) , with
(80)
where we have defined \( X = \text{diag}(x)\) .
The following proposition is a corollary of Proposition 13.
Proposition 14 (Convergence of interior-point)
Let \( w^\star = (x^\star, y^\star, z^\star)\) be a local primal-dual stationary point of (78). Suppose that LICQ, SCS and SOSC hold at \( w^\star\) . We denote by \( x(\mu_k, p)\) the solution of the barrier problem (79) at iteration \( k\) . Then
There exists a unique continuously differentiable function \( w(\cdot, p)\) existing in a neighborhood of \( 0\) for \( \mu > 0\) and solution of (79), and such that
(81)
For \( \mu\) small enough,
(82)
To solve the sparse linear system (82), sIpopt uses the factorization of the KKT system computed by Ipopt at the local solution \( w(\mu ,p)\) . To handle change in the active set, sIpopt uses a fix-and-relax strategy, without resorting to the resolution of the QP (46). The new constraints activated are incorporated in the linear system (82), and the system is solved using a Schur-complement approach.
CasADi [56] is a powerful modeler used to solve nonlinear programs (4). The sensitivity analysis implemented in CasADi is slightly different than sIpopt [57]. Once a solution returned by the optimization solver, CasADi applies a primal-dual active set method to determine exactly the active set at the solution (an information that interior-point does not return by default, unless a crossover is used). Then, CasADi uses a custom sparse QR solver to solve a non-symmetric formulation of the KKT condition. Such QR factorization allows to quickly detect and resolve the singularities associated to potential degeneracy.
sIpopt’s method has been recently specialized in [26] to the parallel extraction of the sensitivity for a quadratic program (QP), for a fixed number of parameters \( (p_1, …, p_N)\) . The solver QPTH solves \( N\) QP problems in parallel using a specialized interior-point method leveraging batched dense linear algebra on the GPU. Once a solution found, the sensitivity are evaluated using a formula analogous to (82). This has been proven relevant for machine learning applications, where neural network are trained with batched stochastic gradient descent. To the best of our knowledge, the method has not been extended to sparse problems, limiting its adoption beyond machine learning. Also, little emphasis is put on the degenerate case.
In nonlinear programming, one of the main application for sensitivity analysis is updating the solution of NLP(\( p\) after a small change in the parameter \( p\) : once the directional derivative is obtained, we can compute an approximation of a new primal solution \( x(p')\) by setting the direction \( h = p' - p\) and using the Taylor expansion [23]:
(83)
However, the method is valid only for small perturbations around \( p^\star\) , without any active set change. In case active set changes are occurring, they have to be accommodated locally using a path following method,
A path following method tracks the path of optimal solution between \( p\) and the new parameter \( p'\) . To do so, it defines the affine interpolation \( p(t) = p + t(p' - p)\) for \( t \in [0, 1]\) and tracks the solution of NLP(\( p\) between \( t=0\) and \( t=1\) . This approach returns more accurate approximations, notably when facing active set changes. For a sequence of scalars \( t_{(0)} < t_{(1)} < … < t_{(N)}\) such that \( t_{(0)} = 0\) and \( t_{(N)} = 1\) , the perturbation between \( t_{(m)}\) and \( t_{(m+1)}\) is defined as
(84)
Then, we apply \( N-1\) sensitivity updates with the direction \( h_{(m)}\) to follow the path of the primal-dual solution \( w(t)\) between \( t_{(0)}\) and \( t_{(N)}\) .
We refer to [23] for a description of the path following method. The method has been extended later on [24] for degenerate problems where the dual solution is non-unique, using the method of Ralph and Dempe described in §3.4. The algorithm proposed in [24] tracks the active set changes by updating iteratively the dual multipliers using the LP (47). As such, it can be interpreted as running \( N\) iterations of a sequential quadratic programming (SQP) algorithm. The original method [24] didn’t come with any convergence guarantee, but it has been extended later in [58] with a rigorous convergence proof. In [58], the path following method is interpreted as a predictor-corrector method, where the corrector solves a linear system to update the sensitivity and the predictor solves a QP to track the active set changes. The multipliers are again updated by solving a LP equivalent to (47). We note that the method requires the evaluation of the second-order derivatives of each constraint in order to solve the QP in the predictor step. To the best of our knowledge, we are not aware of a generic package that implements the method described in [58].
We thank Léonard Moracchini for his help in establishing the first version of this manuscript, and Andrew Rosemberg and Tristan Rigaut for their helpful feedbacks.
Let \( f: \Omega \subset E \to F\) a function, with \( E\) and \( F\) two normed spaces. We start to recall the usual derivatives, before introducing the Clarke Jacobian more adapted in the nonsmooth context.
Definition 9 (Directional derivatives)
Let \( x \in \Omega\) . The function \( f\) has a directional derivative at \( x\) in the direction \( h \in E\) if \( x + th \in \Omega\) for \( t\) small enough, and if the following limit exists
(85)
Definition 10 (Gâteaux-differentiability)
The function \( f\) is Gâteaux differentiable (or \( G\) -differentiable) in \( x \in \Omega\) if there exists a directional derivatives for every direction \( h \in E\) , and if the application
(86)
is continuous and linear. We note \( f'(x)\) the operator such that \( f'(x; h) = f'(x) \cdot h\) .
We note that a \( G\) -differentiable function is not necessarily continuous.
Definition 11 (Fréchet-differentiability)
The function \( f\) is Fréchet differentiable (\( F\) -differentiable, or simply differentiable) at \( x \in \Omega\) if there exists a bounded linear operator \( L\) such that
(87)
\( F\) -differentiability implies that \( f\) is continuous. The operator \( L\) is the derivative of \( f\) in \( x\) . The condition (87) rewrites equivalently
(88)
The Hadamard derivative gives the directional derivative along a curve tangential to \( h\) . It is required to study the sensitivity of infinite-dimensional problems, as often encountered in statistics or stochastic optimization.
Definition 12 (Hadamard-differentiability)
The function \( f\) is Hadamard differentiable at \( x\) if for any mapping \( \varphi: \mathbb{R}_+ \to X\) such that \( \varphi(0) = x\) and \( \frac{1}{t} (\varphi(t) - \varphi(0))\) converges to a vector \( h\) as \( t \downarrow 0\) , the limit
(89)
does exist.
For any sequences \( \{h_n\}_n\) and \( \{t_n\}_n\) such that \( h_n \to h\) and \( t_n \to 0^+\) , the Hadamard directional derivative can also be written in the form
(90)
The Hadamard derivative has the advantage of being compatible with the chain-rule with minimal assumptions.
Proposition 15 (Chain-rule)
Let \( f\) be Hadamard directionally differentiable at \( x\) , and \( g\) be Hadamard directionally differentiable at \( f(x)\) . Then the composite mapping \( g \circ f\) is Hadamard directionally differentiable at \( x\) , and satisfies the chain-rule \( d_H (g \circ f)(x; h) = d_H g (f(x); d_H f(x; y))\) .
Whereas the Hadamard derivative allows for different rates for every directions, the Fréchet derivative imposes the rate is the same for each direction \( h\) . For finite-dimensional space \( E\) , if the directional derivative is continuous then the Fréchet and the Hadamard derivatives coincide.
The notion of derivatives can be extended to nonsmooth functions. We recall the following important results.
Theorem 12 (Rademacher Theorem)
Let \( f: \Omega \to F\) be a locally Lipschitz continuous function. Then \( f\) is differentiable almost everywhere in the sense of the Lebesgue measure.
The concept of Dini-derivatives generalizes the notion of directional differentiability for non-smooth functions.
Definition 13 (Dini-derivatives)
Let \( x \in \Omega\) . The upper Dini derivative in the direction \( h\) is defined as
(91)
Accordingly, the lower Dini derivative is defined as
(92)
Definition 14 (Bouligand-differential)
Let \( \mathcal{D}_f\) be the set of points at which \( f\) is (Fréchet) differentiable. The \( B\) -differential of \( f\) at \( x \in \Omega\) is the set defined as
(93)
Definition 15 (Clarke-differential)
The \( C\) -differential of \( f\) at \( x \in \Omega\) is the convex hull of the Bouligand-differential, that is
(94)
Definition 16 (Clarke-generalized directional derivative)
The Clarke-generalized directional derivative of \( f\) in the direction \( h\) is defined by
(95)
Proposition 16
Suppose that \( f\) is \( L\) -Lipschitz near \( x \in \Omega\) . Then
\( d_C f(x; \cdot)\) is the support function of \( \partial_C f(x)\) , that is,
(96)
Proposition 17
If \( f\) is \( G\) -differentiable at \( x\) , then \( f'(x) \in \partial_C f(x)\) . If in addition \( f\) is \( C^1\) at \( x\) , then \( \partial_C f(x) = \{ f'(x) \}\) .
Proposition 18 (Chain-rule)
Suppose that \( f\) is \( L\) -Lipschitz continuous on \( \Omega\) . If the function \( g:F \to G\) is Lipschitz-continuous near \( f(x)\) , then
(97)
Definition 17 (Semi-smoothness)
The function \( f:\Omega \to F\) is semi-smooth in \( x \in \Omega\) if:
Proposition 19
Suppose \( f\) is \( L\) -Lipschitz near \( x\) , and admits directional derivatives at \( x\) in all directions. Then the following statements are equivalent:
The class of \( PC^1\) functions is a specific tool in nonsmooth analysis, often employed for the sensitivity analysis of nonlinear programs.
Definition 18
The function \( f:X \to F\) is piecewise-differentiable (PC\( ^1\) ) at \( x \in X\) if there exists a neighborhood \( N \subset X\) and a finite collection of \( C^1\) selection functions \( \mathcal{F}_f(x) = \{f_{(1)}, …, f_{(k)} \}\) defined on \( N\) such that \( f\) continuous on \( N\) and for all \( y \in N\) , \( f(y) \in \{f_{(i)}(y)\}_{i \in [k]}\) .
If \( f\) is PC\( ^1\) , we define the active set associated at \( x \in X\) as
(98)
Proposition 20
Let \( f:X \to F\) a PC\( ^1\) function at \( x \in X\) . Then
\( f\) is Lipschitz continuous near \( x\) and directionally differentiable at \( x\) , with
(99)
In addition, the directional derivative \( f'(x; \cdot)\) is piecewise linear.
The notion of coherent orientation plays a central role in obtaining the invertibility of a \( PC^1\) function in order to apply the implicit function theorem.
Definition 19 (Coherent orientation)
Let \( U \subset \mathbb{R}^n\) and \( V \subset \mathbb{R}^m\) be open. A \( PC^1\) function \( F:U \times W \to \mathbb{R}^n\) is said coherently oriented with respect to \( x\) at \( (x^\star, y^\star) \in U \times V\) if all matrices in \( \partial_{B,x} f(x^\star, y^\star)\) have the same non-vanishing determinant sign.
Lexicographic derivatives have been introduced by Nesterov in [37] as the derivatives of lexicographically smooth (L-smooth) functions.
Definition 20 (Lexicographically smooth function)
Let \( f:X \to \mathbb{R}^m\) a locally Lipschitz continuous function on \( X\) . \( f\) is lexicographically smooth at \( x\) if for any \( k \in \mathbb{N}\) and any \( M = [m_{(1)}, …, m_{(k)}] \in \mathbb{R}^{n \times k}\) the following higher-order derivatives are well defined
(100)
The class of L-smooth functions is closed under composition. \( C^1\) functions, convex functions and \( PC^{1}\) functions are all L-smooth functions.
Definition 21 (Lexicographic derivative)
Let \( f:X \to \mathbb{R}^m\) a L-smooth function at \( x\) , and \( M \in \mathbb{R}^{n \times n}\) a nonsingular matrix. The lexicographic derivative of \( f\) at \( x\) in the direction \( M\) is defined as
(101)
The lexicographic subdifferential of \( f\) at \( x\) is defined as
(102)
For any \( k \in \mathbb{N}\) , \( M = [m_{(1)}, …, m_{(k)}] \in \mathbb{R}^{n \times k}\) the LD-derivative of \( f\) in the directions \( M\) is defined as
(103)
The LD-derivative is uniquely defined for \( M \in \mathbb{R}^{n \times k}\) . If in addition \( M \in \mathbb{R}^{n \times n}\) is square and nonsingular, it satisfies
(104)
Interestingly, the LD-derivative obeys a sharp chain-rule.
Proposition 21 (Chain-rule)
Let \( h:X \to Y\) and \( g:Y \to \mathbb{R}^q\) be L-smooth at \( x \in X\) and \( h(x) \in Y\) , respectively. Then the function \( g\circ h\) is L-smooth at \( x \in X\) , and for any \( k \in \mathbb{N}\) and \( M \in \mathbb{R}^{n \times k}\) ,
(105)
LD-derivatives give convenient calculus rules for the derivatives of \( \min\) , \( \max\) , abs, and other nonsmooth functions widely encountered in practice. These rules are based on lexicographic ordering, explaining why LD-derivatives are called "lexicographic". For given vectors \( x, y \in \mathbb{R}^n\) , we say that \( x\) is lexicographically lower than \( y\) if
(106)
The lexicographic minimum function returns the lexicographically ordered minimum of two vectors:
(107)
Similarly, the lexicographic matrix minimum function compares two matrices row by row, and is defined as
(108)
A multivalued mapping (or point-to-set map) is a mapping \( \Gamma: E \rightrightarrows F\) where for each \( x \in E\) , \( \Gamma(x)\) is a subset of \( F\) . They have been used extensively in mathematical programming to study the solution maps of optimization problems.
Definition 22
Let \( \Gamma: E \rightrightarrows F\) a point-to-set map.
Definition 23
The point-to-set map \( \Gamma:E \rightrightarrows F\) is closed at \( x\) if there exists a sequence \( x_n \in E\) , \( x_n\to x\) such that \( y_n \in \Gamma(x_n)\) and \( y_n \to y\) imply \( y \in \Gamma(x)\) .
Definition 24
The point-to-set map \( \Gamma:E \rightrightarrows F\) is open at \( x\) if \( y \in \Gamma(x)\) implies there exists a sequence \( x_n \in E\) , \( x_n\to x\) such that there exists \( m\) and \( \{y_n\}\) such that \( y_n \in \Gamma(x_n)\) for all \( n \geq m\) and \( y_n \to y\) .
Definition 25
The point-to-set map \( \Gamma:E \rightrightarrows F\) is uniformly compact near \( \hat{x}\) if the set \( \cup_{x \in N(\hat{x})} \Gamma(x)\) is bounded for some neighborhood \( N(\hat{x})\) of \( \hat{x}\) .
If \( \Gamma\) is uniformly compact near \( \hat{x}\) , then \( \Gamma\) is closed if and onl if \( \Gamma(\hat{x})\) is a compact set and \( \Gamma\) is upper semicontinuous at \( \hat{x}\) .
Definition 26
The point-to-set map \( \Gamma: E \rightrightarrows F\) is upper Lipschitzian with modulus \( L\) at \( \hat{x} \in E\) if there is a neighborhood \( N\) of \( \hat{x}\) such that for each \( x \in N\) , \( \Gamma(x) \subset \Gamma(\hat{x}) + L \|x - \hat{x} \|B\) , with \( B\) the unit-ball in \( F\) .
Let \( X\) be a linear space, and \( C\) be a non-empty closed convex set in \( X\) .
Definition 27 (Normal cone)
We define the normal cone operator associated to the set \( C\) as, for \( x \in X\)
(109)
With the normal cone, we can define formally the notion of variational inequality, generalizing optimization problem.
Definition 28 (Variational inequality)
Let \( p \in P\) be a parameter. For \( f: X \times P \to X^\star\) , we define the paramaterized variational inequality (also called generalized equation) the problem of finding \( x \in X\) such that
(110)
The equation (110) is a compact formulation stating that \( -f(x, p) \in N_C(x)\) , or, equivalently
(111)
The solution mapping associated to (110) is defined as
(112)
Robinson extended the Implicit function theorem to the nonsmooth variational inequality (110), by extending the notion of regular solution to the variational inequality setting, where the solution mapping \( S(p)\) is not necessarily single-valued. Robinson introduced the notion of strong regularity, which extends the non-singularity condition we impose in the implicit function theorem. If we suppose that \( f\) is (Fréchet) differentiable, the idea is to look at the linearized generalized equation around a solution \( x_0\)
(113)
where we have defined the linearized operator at \( x_0\) :
(114)
Definition 29 (Strong regularity [6])
We say that (110) is strongly regular at a solution \( x_0\) with Lipschitz constant \( L\) if there exists a neighborhood \( U \subset X^\star\) of the origin and \( V \subset X\) of \( x_0\) such that the restriction to \( U\) of \( T^{-1} \cap V\) is a single-valued function from \( U\) to \( V\) and is \( L\) -Lipschitzian on \( U\) .
Putting it more simply, strong regularity implies that the inverse of the linearized operator \( T^{-1}\) has a Lipschitz continuous single-valued localization at \( 0\) for \( x_0\) . If \( C\) is the whole-space \( X\) , then strong regularity implies that the Jacobian \( \nabla_x f(x_0, p)\) is non-singular. The following theorem extends the implicit function theorem to generalized equations (110) satisfying the strong regularity condition 29.
Theorem 13 (Theorem 2.1, [6])
Let \( p \in P\) be a given parameter, \( x_0\) solution of the generalized equation (110). Suppose that \( f\) is Fréchet-differentiable w.r.t. \( x\) exists at \( p\) , and that both \( f(\cdot, \cdot)\) and \( \nabla_x f(\cdot, \cdot)\) are continuous at \( (x_0, p)\) . If (110) is strongly regular at \( x_0\) , then for any \( \varepsilon >0\) there exists neighborhood \( N\) of \( p\) and \( W\) of \( x_0\) as well as a single-valued function \( x:N \to W\) sucht that for any \( p \in N\) , \( x(p)\) is the unique solution of the inclusion
(115)
and, for all \( p, q \in N\) , one has
(116)
Theorem 13 states that the strong regularity of (110) implies that the solution mapping \( S(p)\) of (110) has a Lipschitz-continuous single-valued localization at \( p\) for \( x_0\) .
Corollary 3 (Locally-Lipschitz condition)
Suppose the hypotheses of Theorem 13 hold. Suppose in addition there exists a constant \( \kappa\) such that for each \( (p, q) \in N\) , for each \( x \in W\) , one has
(117)
Then \( x(\cdot)\) is Lipschitzian on \( N\) with modulus \( \kappa (L + \varepsilon)\) .
[1] M. Blondel, Q. Berthet, M. Cuturi, R. Frostig, S. Hoyer, F. Llinares-López, F. Pedregosa, and J.-P. Vert, Efficient and modular implicit differentiation, Advances in neural information processing systems, 35 (2022), pp. 5230–5242.
[2] A. Griewank and A. Walther, Evaluating derivatives: principles and techniques of algorithmic differentiation, SIAM, 2008.
[3] J. C. Gilbert, Automatic differentiation and iterative processes, Optimization methods and software, 1 (1992), pp. 13–21.
[4] A. Griewank and C. Faure, Reduced functions, gradients and hessians from fixed-point iterations for state equations, Numerical Algorithms, 30 (2002), pp. 113–139.
[5] A. V. Fiacco and J. Liu, Degeneracy in nlp and the development of results motivated by its presence, Annals of Operations Research, 46 (1993), pp. 61–80.
[6] Strongly regular generalized equations, Mathematics of Operations Research, 5 (1980), pp. 43–62.
[7] Generalized equations and their solutions, part II: applications to nonlinear programming, Springer, 1982.
[8] A. V. Fiacco, Sensitivity analysis for nonlinear programming using penalty methods, Mathematical programming, 10 (1976), pp. 287–311.
[9] M. Kojima, Strongly stable stationary solutions in nonlinear programs, in Analysis and computation of fixed points, Elsevier, 1980, pp. 93–138.
[10] K. Jittorntrum, Solution point differentiability without strict complementarity in nonlinear programming, Sensitivity, Stability and Parametric Analysis, (1984), pp. 127–138.
[11] A. Shapiro, Second order sensitivity analysis and asymptotic theory of parametrized nonlinear programs, Mathematical Programming, 33 (1985), pp. 280–299.
[12] Sensitivity analysis for nonlinear programs and variational inequalities with nonunique multipliers, Mathematics of operations research, 15 (1990), pp. 286–298.
[13] D. Ralph and S. Dempe, Directional derivatives of the solution of a parametric nonlinear program, Mathematical programming, 70 (1995), pp. 159–172.
[14] J. F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems, Springer Science & Business Media, 2000.
[15] F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer, 2003.
[16] A. L. Dontchev and R. T. Rockafellar, Implicit functions and solution mappings: A view from variational analysis, vol. 11, Springer, 2009.
[17] Foundations of bilevel programming, Springer Science & Business Media, 2002.
[18] On differential stability in stochastic programming, Mathematical Programming, 47 (1990), pp. 107–116.
[19] Asymptotic analysis of stochastic programs, Annals of Operations Research, 30 (1991), pp. 169–186.
[20] The method of parametric decomposition in mathematical programming: the nonconvex case, in IIASA PROCEEDINGS SERIES, 1978, p. 131.
[21] V. DeMiguel and F. J. Nogales, On decomposition methods for a class of partially separable nonlinear programs, Mathematics of Operations Research, 33 (2008), pp. 119–139.
[22] R. López-Negrete and L. T. Biegler, A moving horizon estimator for processes with multi-rate measurements: A nonlinear programming sensitivity approach, Journal of Process Control, 22 (2012), pp. 677–688.
[23] V. M. Zavala and L. T. Biegler, The advanced-step nmpc controller: Optimality, stability and robustness, Automatica, 45 (2009), pp. 86–93.
[24] J. Jäschke, X. Yang, and L. T. Biegler, Fast economic model predictive control based on NLP-sensitivities, Journal of Process Control, 24 (2014), pp. 1260–1272.
[25] V. M. Zavala and M. Anitescu, Real-time nonlinear optimization as a generalized equation, SIAM Journal on Control and Optimization, 48 (2010), pp. 5444–5467.
[26] B. Amos and J. Z. Kolter, Optnet: Differentiable optimization as a layer in neural networks, in International Conference on Machine Learning, PMLR, 2017, pp. 136–145.
[27] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, Differentiable convex optimization layers, Advances in neural information processing systems, 32 (2019).
[28] Introduction to sensitivity and stability analysis in non linear programming, New York: Academic Press, 1983.
[29] J. Gauvin, A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming, Mathematical Programming, 12 (1977), pp. 136–138.
[30] J. Kyparisis, On uniqueness of Kuhn-Tucker multipliers in nonlinear programming, Mathematical Programming, 32 (1985), pp. 242–246.
[31] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta numerica, 14 (2005), pp. 1–137.
[32] S. Demko, W. F. Moss, and P. W. Smith, Decay rates for inverses of band matrices, Mathematics of computation, 43 (1984), pp. 491–499.
[33] Exponential decay of sensitivity in graph-structured nonlinear programs, SIAM Journal on Optimization, 32 (2022), pp. 1156–1183.
[34] L. F. Valenzuela, R. Brown, and M. Pavone, Decentralized implicit differentiation, arXiv preprint arXiv:2403.01260, (2024).
[35] S. Shin, M. Anitescu, and V. M. Zavala, Overlapping schwarz decomposition for constrained quadratic programs, in 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 3004–3009.
[36] S. M. Robinson, Generalized equations and their solutions, Part I: Basic theory, Springer, 1979.
[37] Y. Nesterov, Lexicographic differentiation of nonsmooth functions, Mathematical programming, 104 (2005), pp. 669–700.
[38] P. I. Barton, K. A. Khan, P. Stechlinski, and H. A. Watson, Computationally relevant generalized derivatives: theory, evaluation and applications, Optimization Methods and Software, 33 (2018), pp. 1030–1072.
[39] S. Scholtes, Introduction to piecewise differentiable equations, Springer Science & Business Media, 2012.
[40] P. Stechlinski, K. A. Khan, and P. I. Barton, Generalized sensitivity analysis of nonlinear programs, SIAM Journal on Optimization, 28 (2018), pp. 272–301.
[41] P. Stechlinski, J. Jäschke, and P. I. Barton, Generalized sensitivity analysis of nonlinear programs using a sequence of quadratic programs, Optimization, 68 (2019), pp. 485–508.
[42] S. Dempe, Directional differentiability of optimal solutions under Slater's condition, Mathematical programming, 59 (1993), pp. 49–69.
[43] A. F. Izmailov and M. V. Solodov, A note on solution sensitivity for karush–kuhn–tucker systems, Mathematical Methods of Operations Research, 61 (2005), pp. 347–363.
[44] J. M. Danskin, The theory of max-min and its application to weapons allocation problems, vol. 5, Springer Science & Business Media, 1967.
[45] W. W. Hogan, Point-to-set maps in mathematical programming, SIAM review, 15 (1973), pp. 591–603.
[46] J. Gauvin and F. Dubeau, Differential properties of the marginal function in mathematical programming, Optimality and Stability in Mathematical Programming, (1982), pp. 101–119.
[47] Optimal value continuity and differential stability bounds under the mangasarian-fromovitz constraint qualification, Mathematical programming with data perturbations II, (1982), pp. 65–90.
[48] A. V. Fiacco and J. Kyparisis, Sensitivity analysis in nonlinear programming under second order assumptions, in Systems and Optimization: Proceedings of the Twente Workshop Enschede, The Netherlands, April 16–18, 1984, Springer, 1984, pp. 74–97.
[49] J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls, Nonsmooth implicit differentiation for machine-learning and optimization, Advances in neural information processing systems, 34 (2021), pp. 13537–13549.
[50] E. Busseti, W. M. Moursi, and S. Boyd, Solution refinement at regular points of conic problems, Computational Optimization and Applications, 74 (2019), pp. 627–643.
[51] A. Ali, E. Wong, and J. Z. Kolter, A semismooth newton method for fast, generic convex programming, in International Conference on Machine Learning, PMLR, 2017, pp. 70–79.
[52] A. Sharma, M. Besançon, J. D. Garcia, and B. Legat, Flexible differentiable optimization via model transformations, arXiv preprint arXiv:2206.06135, (2022).
[53] A. V. Fiacco and G. P. McCormick, Nonlinear programming: sequential unconstrained minimization techniques, SIAM, 1968.
[54] A. V. Fiacco and A. Ghaemi, A user's manual for SENSUMT: A penalty function computer program for solution, sensitivity analysis, and optimal value bound calculation in parametric nonlinear programs., tech. rep., 1980.
[55] H. Pirnay, R. López-Negrete, and L. T. Biegler, Optimal sensitivity based on IPOPT, Mathematical Programming Computation, 4 (2012), pp. 307–331.
[56] J. A. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, CasADi: a software framework for nonlinear optimization and optimal control, Mathematical Programming Computation, 11 (2019), pp. 1–36.
[57] J. A. Andersson and J. B. Rawlings, Sensitivity analysis for nonlinear programming in CasADi, IFAC-PapersOnLine, 51 (2018), pp. 331–336.
[58] V. Kungurtsev and J. Jaschke, A predictor-corrector path-following algorithm for dual-degenerate parametric optimization problems, SIAM Journal on Optimization, 27 (2017), pp. 538–564.
I am normally hidden by the status bar