Thursday, February 27, 2014

Some pitfalls of the sensitivity analysis for LPs

The good old sensitivity analysis for LPs... It tries to answer a crucial question:

How does the solution of our optimization model react to small perturbations of the input data?

Don't worry, we are not going to review the principles of sensitivity analysis (one can refer, for instance, to [1]); instead, we will show that the commonly used techniques can hide unexpected pitfalls.
In general, an LP sensitivity analyzer returns the shadow price of a given input parameter, i.e., broadly speaking, the rate of variation of the objective function with respect to that parameter, and the interval over which that price is constant.

Every OR practitioner should know this tool and its potential pitfalls. But that is often not the case...
Here goes the story: one of our customers was puzzled by the results of the sensitivity analysis he was carrying out on some LP problems (related to a kind of unit commitment problem): why were the shadow prices of the solutions sometimes different? And why did they not seem to make much sense?

He boiled it down to a VERY minimal example for us:

\begin{equation}
\begin{aligned}
& \min & 10^5 x_1 + x_2 \\
\\
&& x_1  + x_2  =100\\
\\
&& x_1,x_2 \in [0,100]
\end{aligned}
\end{equation}

The problem is so simple that it can be solved by hand: the optimal solution is $(x_1^\star, x_2^\star)=(0,100)$. It is also degenerate (either the upper or the lower bound constraint can be dropped), with multiple optimal bases: to see that, let's recast the problem in standard form:

\begin{equation}
\begin{aligned}
& \min & 10^5 x_1 + x_2 \\
\\
&& x_1  + x_2  =100\\
&& x_1  + s_1  =100\\
&& x_2  + s_2  =100\\
\\
&& x_1,x_2,s_1,s_2 \geq 0
\end{aligned}
\end{equation}

The optimal solution now reads $(x_1^\star, x_2^\star, s_1^\star, s_2^\star )=(0,100,100,0)$. It is straightforward to see that there are two optimal bases: $B_1=\{x_2,s_1,s_2\}$ and $B_2=\{x_2,s_1,x_1\}$.
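For the skeptical reader, a quick numerical check (a sketch in Python/numpy, not part of the original post) confirms that both candidate bases are invertible and yield the same degenerate basic solution:

```python
import numpy as np

# Standard-form data, variables ordered (x1, x2, s1, s2)
A = np.array([[1.0, 1.0, 0.0, 0.0],   # x1 + x2 = 100
              [1.0, 0.0, 1.0, 0.0],   # x1 + s1 = 100
              [0.0, 1.0, 0.0, 1.0]])  # x2 + s2 = 100
b = np.array([100.0, 100.0, 100.0])
c = np.array([1e5, 1.0, 0.0, 0.0])

def basic_solution(basis):
    """Solve B x_B = b for the given basic column indices; non-basic vars are 0."""
    x = np.zeros(4)
    x[basis] = np.linalg.solve(A[:, basis], b)
    return x

B1 = [1, 2, 3]  # {x2, s1, s2}
B2 = [1, 2, 0]  # {x2, s1, x1}
x_B1, x_B2 = basic_solution(B1), basic_solution(B2)

# Both bases yield the same (degenerate) optimal point (0, 100, 100, 0)
print(x_B1, c @ x_B1)
print(x_B2, c @ x_B2)
```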

What if we ask for a sensitivity analysis on the linear constraint in problem (1)? Using the command-line MOSEK sensitivity analyzer, the answer is:


BOUNDS CONSTRAINTS
INDEX    NAME  BOUND    LEFTRANGE        RIGHTRANGE       LEFTSHADOW       RIGHTSHADOW     
0        c0000 FX       -1.000000e+02    -0.000000e+00    1.000000e+00     1.000000e+00    


That is, the RHS can be reduced by up to $100$ (the LEFTRANGE) with a unit shadow price. But we have no information about what might happen if we increase the RHS. Let's visualize what we would like to investigate (the RHS is renamed $\beta$):

The gray area represents the region defined by the box constraints. The red line is the feasible region of problem (1) and the optimal solution is the red dot in the upper-left corner. Varying the RHS results in a translation of that feasible region and a different optimal value. The yellow lines/points correspond to the feasible region/optimal solution as the RHS decreases; the light-blue ones, corresponding to an increase of the RHS, are not reported by the analyzer. But it is clearly still possible to increase the RHS... so what's going on?

This problem is quite well known in the OR community. In [3], the main issues and pitfalls are clearly summarized: the standard analysis, named Type-I, depends on the optimal basis. If more than one basis corresponds to an optimal solution, the analysis will in general be incomplete, because it is limited by the change of basis.

In our case, the solutions obtained by decreasing the RHS share the same basis, while a different basis corresponds to the others. So the scope of our analysis is limited by the particular basis reported by the solver.

A more accurate analysis can be carried out using more computationally intensive techniques, sometimes referred to as Type-III sensitivity or Optimal Partition Sensitivity (OPS): simply speaking, a sequence of LP problems based on the current optimal solution is solved in order to assess the sensitivity, regardless of the particular basis that happens to correspond to the solution (see [2]).

Again, let's look at our example. We use the OPS tool available in MOSEK to analyze the same constraint again. The result follows:

BOUNDS CONSTRAINTS
INDEX    NAME  BOUND    LEFTRANGE        RIGHTRANGE       LEFTSHADOW       RIGHTSHADOW     
0        c0000 FX       -1.000000e+02    1.000000e+02     1.000000e+00     1.000000e+05    

Now we can also investigate increases of the RHS! As one might expect, a different shadow price corresponds to this side of the RHS interval.
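Both shadow prices can also be reproduced without any sensitivity analyzer at all, by brute force: re-solve the LP at perturbed RHS values and take finite differences. A minimal sketch, using scipy's linprog as a stand-in LP solver (not MOSEK):

```python
from scipy.optimize import linprog

def solve(beta):
    # min 1e5*x1 + x2  s.t.  x1 + x2 = beta,  0 <= x1, x2 <= 100
    res = linprog(c=[1e5, 1.0], A_eq=[[1.0, 1.0]], b_eq=[beta],
                  bounds=[(0, 100), (0, 100)], method="highs")
    return res.fun

f99, f100, f101 = solve(99), solve(100), solve(101)
left_price  = f100 - f99    # shadow price when decreasing the RHS: 1
right_price = f101 - f100   # shadow price when increasing the RHS: 1e5
print(left_price, right_price)
```

Decreasing the RHS only shrinks $x_2$, at unit cost; increasing it forces $x_1$ away from zero, at cost $10^5$ per unit, exactly the two prices reported by the OPS analysis.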

The lesson is that in most practical applications optimal solutions are degenerate, and thus the Type-I sensitivity analysis might be incomplete. In general, there is no way to predict which basis will be returned by a solver. The OPS analysis, despite being more computationally demanding, can deliver more accurate information, and it is therefore a tool that practitioners should consider and learn.

[1] Chvátal, V. (1983). Linear programming. New York: W. H. Freeman.
[2] Roos C., Terlaky T., Vial J. P. (1997). Theory and algorithms for linear optimization: an interior point approach. Chichester: Wiley.
[3] Koltai T.,  Terlaky T. (2000). The difference between the managerial and mathematical interpretation of sensitivity analysis results in linear programming. International Journal of Production Economics, 65(3), 257-274.



Thursday, February 20, 2014

Conic duality in practice: when dual beats the primal...


Despite its simplicity, the problem we have been discussing in the last two posts (see the first and second post) is a nice source of examples and insights about conic programming. The formulation we obtained is:

\begin{equation}
\begin{aligned}
\min r_0&\\
 s.t.&\\
 &\quad ( r_0\mathbf{e}, P-\mathbf{e}p_0) \in \Pi_{i=1}^k Q^{(n+1)}.
\end{aligned}
\end{equation}

Recall that $p_0,p_i$ are row vectors and $\mathbf{e}$ is a vector with all ones. We might wonder whether the dual problem of (1) is harder or easier to solve. Using the duality theory of conic programming, deriving the dual is pretty easy. First, we restate (1) in a slightly different way:

\begin{equation}
\begin{aligned}
\min r_0&\\
 s.t.&\\
 &\quad ( r_0, p_0^T - p_i^T) \in Q^{(n+1)} & i=1,\ldots,k.
\end{aligned}
\end{equation}

Following [1], this is our primal formulation, and the construction of the dual problem should be quite straightforward! Consider the standard primal form (see p. 69 in [1]):

\begin{equation}
\begin{aligned}
\min c^T x&\\
 s.t.&\\
 &\quad A_i x -b_i \in K_i & i=1,\ldots,k,
\end{aligned}
\end{equation}

where the $K_i$'s are cones and $x=(r_0 , p_0)^T$: problem (2) follows by setting $ c=( 1 , 0_{n})^T$ and

\[
A_i= I_{n+1},\quad b_i= \begin{pmatrix} 0 \\ p_i^T \end{pmatrix},\quad K_i=Q^{(n+1)} , \quad i=1,\ldots,k.
\]

To formulate the dual, for each cone $i$ we introduce a vector $y_i \in R^{n+1}$ of dual variables; $y_i$ must belong to the dual cone of $Q^{(n+1)}$, which, by self-duality, is $Q^{(n+1)}$ itself. The dual problem takes the form

\begin{equation}
\begin{aligned}
\max & \sum_{i=1}^k b_i^T y_i\\
 s.t.&\\
 &\sum_{i=1}^k y_i = c\\
 &y_i\in Q^{(n+1)} & i=1,\ldots,k.
\end{aligned}
\end{equation}

which is equivalent to

\begin{equation}
\begin{aligned}
\max & \mathrm{Tr}( B Y^T) \\
 s.t.&\\
 & Y^T e_k = c\\
 &Y \in \Pi_{i=1}^k Q^{(n+1)}
\end{aligned}
\end{equation}

where \[ Y=  \begin{pmatrix} y_1^T \\ \vdots \\ y_k^T \end{pmatrix},\quad B=  \begin{pmatrix} b_1^T \\ \vdots \\ b_k^T \end{pmatrix},\quad e_k= \begin{pmatrix} 1\\ \vdots \\ 1\end{pmatrix}\in R^k.\]
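Before any implementation, the dual (4) can be sanity-checked by hand on a tiny instance (two hypothetical points in $R^2$): the smallest sphere enclosing two points has radius $\|p_1-p_2\|/2$, and a dual-feasible point attaining exactly that value exists, confirming strong duality on this instance. A plain-Python sketch:

```python
import math

p1, p2 = (0.0, 0.0), (4.0, 3.0)              # two sample points in R^2
d = math.hypot(p1[0] - p2[0], p1[1] - p2[1])  # ||p1 - p2|| = 5
primal_opt = d / 2.0                          # radius of smallest enclosing sphere

# Dual candidate: y_i = (t_i, u_i) with u1 = (p1 - p2)/(2d), u2 = -u1,
# t1 = t2 = 1/2, so that sum_i y_i = c = (1, 0, 0) and ||u_i|| <= t_i.
u1 = ((p1[0] - p2[0]) / (2 * d), (p1[1] - p2[1]) / (2 * d))
t1 = t2 = 0.5
assert math.hypot(*u1) <= t1 + 1e-12          # conic feasibility of y1 (and y2)
assert abs(t1 + t2 - 1.0) < 1e-12             # first coordinate of sum y_i = c

# Dual objective: sum_i b_i^T y_i = p1 . u1 + p2 . (-u1) = (p1 - p2) . u1
dual_val = (p1[0] - p2[0]) * u1[0] + (p1[1] - p2[1]) * u1[1]
print(primal_opt, dual_val)                   # both equal d/2 = 2.5
```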

Let's implement the dual formulation (5) using the MOSEK Fusion API. The code looks like

The code is very simple, compact and self-documenting. Note how we also exploit the fact that the first column of $B$ consists of zeros. What about performance? The dual beats the primal in every sense! Look at the plot below (tests were run on my desktop machine, a Pentium(R) Dual-Core CPU E5700 @ 3.00GHz, under Linux), where we report the running time needed to build the optimization model and to solve it.



The dual is much faster, both in the time spent building the model, which is indeed much simpler, and in solver execution time! Model-building time scales almost linearly for both formulations, but so does the solution time. If the former is expected, the latter might come as a surprise! Note, however, that model-building time matters when tackling polynomially solvable problems...

Final take-away message:

- Solving the dual instead of the primal can be much more efficient!

- The conic formulation helps to derive the dual problem!


[1] Ben-Tal, A., & Nemirovski, A. (2001). Lectures on modern convex optimization: analysis, algorithms, and engineering applications (Vol. 2). Siam.

Monday, February 17, 2014

Smallest sphere containing points using Fusion

In the previous post we formulated the fairly intuitive problem
How to find the smallest sphere that encloses a set of $k$ points $p_i \in R^n$.
as a second order conic problem:

\begin{equation}
  \begin{aligned}
\min r_0&&&\\
s.t.&&&\\
& r_i = r_0,&\quad& i=1,\ldots,k\\
& w_i = p_i - p_0,&\quad& i=1,\ldots,k\\
& (r_i,w_i) \in Q^{(n+1)}, &\quad& i=1,\ldots,k.
\end{aligned}
\end{equation}


We will show in this post how the MOSEK Fusion API for conic optimization allows for a very compact prototyping. Let's start!

First, we assume that the points are represented as row vectors, i.e. $p_0=(p_0^1,\ldots,p_0^n)$. The given points are organized by row in a matrix $P\in R^{k\times n}$.

We reformulate the problem in a different, yet equivalent, way using the fact (see [1]) that an affine transformation preserves the conic representability of a set, and thus we can write:

\begin{equation}
r= r_0\mathbf{e}, \quad w = P - \mathbf{e}p_0.
\end{equation}

where $ \mathbf{e}\in R^k$ is a vector with all components equal to one. The formulation we are looking for is

\begin{equation}
\begin{aligned}
\min r_0&\\
 s.t.&\\
 &\quad ( r_0\mathbf{e}, P-\mathbf{e}p_0) \in \Pi_{i=1}^k Q^{(n+1)}.
\end{aligned}
\end{equation}

Now let's input the model using the Python Fusion API; for the other supported languages it would look much the same. Assuming the given points are stored in a matrix p, the following few steps suffice (click here for the whole code):

1- create an optimization model

2- declare the variables (note $p_0$ is a row vector)

3- define the constraints in one shot (here np is shorthand for numpy)


Note how we "stack" columns side by side, obtaining a $k\times (n+1) $ matrix of expressions, which will be interpreted row by row to obtain the $k$ ice-cream cones.

4- define the objective function


5- solve !


Does it work? Of course it does... here is a solution for $n=1000000, k=2$.
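A tiny instance can also be cross-checked with a general-purpose nonlinear solver, completely independently of Fusion. The following sketch (using scipy's SLSQP on the original min-radius formulation, with hypothetical data) solves the four corners of the unit square, whose smallest enclosing circle is the circumscribed one:

```python
import numpy as np
from scipy.optimize import minimize

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def slack(z):
    # inequality constraints r - ||p_i - p0|| >= 0 for all i, with z = [r, p0]
    return z[0] - np.linalg.norm(pts - z[1:], axis=1)

res = minimize(lambda z: z[0], x0=[2.0, 0.3, 0.3],
               constraints={"type": "ineq", "fun": slack}, method="SLSQP")
r_opt, center = res.x[0], res.x[1:]
print(r_opt, center)  # expected: radius sqrt(2)/2, center (0.5, 0.5)
```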


Pretty neat, right? Where is the magic, and what are the pros and cons?

The magic is the ability of Fusion to convert the expressions into standard conic constraints, i.e. the first model we formulated: all the additional variables are generated automatically by Fusion!

What does the same code look like with the standard Python API? Just take a look at the constraint declaration. First we need some offsets into the variable set:
Then a loop to create all the linear and conic constraints:



The code is still manageable, one might say, but that is partly because Python allows for very compact notation. The point is: could you easily deduce the model implemented by the code? And what if an offset goes wrong? The answers are up to you...

Fusion vs. Standard API

Pros:
- cleaner and shorter code
- easier for the solver to analyze the structure of the model
- avoids the use of indices and offsets to input the problem data
- closer to a modeling language

Cons:
- requires more modeling skills
- slower execution during model creation (not solver execution), at least for large models

We believe the pros usually outweigh the cons, especially in the long run, once users get used to the syntax and the modeling philosophy.

Give Fusion a try!

[1] Ben-Tal, A., & Nemirovski, A. (2001). Lectures on modern convex optimization: analysis, algorithms, and engineering applications (Vol. 2). Siam.

Friday, February 14, 2014

A nice example of conic optimization modeling

A few days ago, my boss Erling Andersen suggested that I look at this problem to learn the MOSEK Fusion API:
How to find the smallest sphere that encloses a set of $k$ points $p_i \in R^n$.
The intuitive formulation is to minimize the radius $r$ of the sphere centered at $p_0 \in R^n$, i.e.

\begin{align}
& \min r \\
& s.t. \\
& & || p_i - p_0||_2 \leq r, & i=1,\ldots,k
\end{align}

The inspiration came from a recent book¹ where the problem, probably taken from the GAMS repository², is presented as a difficult test for nonlinear local optimizers.

At the same time (what a coincidence!), O'Neil and Puget published interesting posts about the Chebyshev centers of polygons... which turns out to be equivalent to what I was looking at! As Puget pointed out, the problem is known to be simple: Megiddo³ showed back in 1982 that it can be solved in linear time by a specialized algorithm.

Anyway, we will use it as an example of conic programming modeling. First we introduce auxiliary variables $w_i$ such that $w_i = p_i - p_0$, yielding

\begin{align}
& \min r \\
& s.t.\\
& & || w_i ||_2 \leq r, & i=1,\ldots,k\\
& & w_i = p_i - p_0, & i=1,\ldots,k
\end{align}

Constraints (6) define quadratic (Lorentz) cones $Q^{(n+1)}$, i.e.

\begin{equation} || w_i ||_2   \leq r    \Rightarrow (r, w_i) \in  Q^{(n+1)}.\end{equation}

For the second-order cones to be disjoint, we introduce auxiliary variables $r_i, i=1,\ldots,k$, each required to be equal to $r$. The final formulation of our problem is

\begin{align}
& \min r\\
& s.t.\\
& & r_i = r,  & i=1,\ldots,k\\
& & w_i = p_i - p_0, & i=1,\ldots,k\\
& & (r_i,w_i) \in  Q^{(n+1)}, & i=1,\ldots,k
\end{align}

which is a fairly simple conic quadratic optimization problem!
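As a minimal sanity check of this formulation (plain Python, with hypothetical data): for $k=2$ the optimal sphere is centered at the midpoint of the two points with $r = \|p_1 - p_2\|/2$, and that point satisfies every constraint above:

```python
import math

p1, p2 = (1.0, 2.0), (5.0, 5.0)                   # two sample points
p0 = tuple((a + b) / 2 for a, b in zip(p1, p2))   # optimal center: the midpoint
r = math.dist(p1, p2) / 2                         # optimal radius

for p in (p1, p2):
    w = tuple(a - b for a, b in zip(p, p0))       # w_i = p_i - p_0
    r_i = r                                       # r_i = r
    assert math.hypot(*w) <= r_i + 1e-12          # (r_i, w_i) in Q^(n+1)
print(p0, r)
```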

In the next post we will show how to easily implement this problem using the Mosek Fusion API.

¹ Neculai, Andrei, "Nonlinear Optimization Application Using the GAMS Technology", Springer 2013
² http://www.gams.com/testlib/libhtml/circlen.htm
³ Megiddo, Nimrod. "Linear-time algorithms for linear programming in R3 and related problems." 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982). IEEE, 1982.

New repository for Rmosek interface available!

As some Rmosek users might know, we have been experiencing technical issues with the CRAN repository that prevent us from updating the Rmosek package.

We have decided to host the package repository at  mosek.com, which will make it easy for users to install the latest release simply from the R command line. Follow us on the dedicated website at



for more updates and details!

Monday, February 10, 2014

A New Product Manager at Mosek

Dear all,
my name is Andrea Cassioli, and I am glad to announce that, as of February 1st, I am the new Mosek Product Manager.

Let me introduce myself briefly: my background is a blend of computer science and operational research. I got a PhD in Computer Science at the University of Florence (IT), working on heuristic methods for difficult global optimization problems. I then worked as a Post-Doc researcher at the University of Florence (IT), the Massachusetts General Hospital (US) and lastly at the Ecole Polytechnique (FR). My research has always focused on solving optimization problems from real-life applications, such as machine learning, urban traffic equilibrium, radiotherapy planning and structural biology. I enjoy coding in several programming languages (C/C++, Python, Java).

I'd like to thank the Mosek CEO E. Andersen for choosing me for this position, and all the people at Mosek for the help they are providing me in this initial period.

I look forward to working with all of you and contributing to the growth of the company!

Best,
Andrea Cassioli







Wednesday, February 5, 2014

The MOSEK/Julia interface is now official

At MOSEK we have for some time been aware of the new language called "Julia" (http://julialang.org), but had not really had an excuse to play with it. Then, when we attended the INFORMS conference in Minneapolis last year, we met a couple of guys working on a mathematical modeling extension for Julia (https://github.com/JuliaOpt/JuMP.jl). Fast forward one day, and the first MOSEK/Julia interface prototype had been created (the first two prototypes, actually: one by me and one by the Julia guys), and from there it just rolled. Now, a couple of months on, we have an almost complete port of the MOSEK solver API and an implementation of the MathProgBase interface, allowing LP, QP, SOCP and mixed-integer problems to be solved from JuMP using MOSEK.

The JuMP guys from INFORMS 2013 have provided invaluable help and feedback, so many thanks to Miles Lubin and Iain Dunning!

While at first sight there is nothing special about the Julia language, a bit of further use quickly shows that it is not just another Matlab-but-done-right or LISP-with-a-new-syntax (I've seen those comments more than once). Or maybe it is: only actually done right, and with a new, pleasant syntax! I don't think I can point out any new and innovative features, only the tried and tested old ones combined in a sensible way... You could even get the impression that the language designers knew what they were doing! There are still some rough edges (start-up times, for example), though you can easily forgive that in a version 0.2.0 that is more usable than most version 1.0s.