Logics and integer-programming representations

YALMIP does a lot of modelling for you behind the scenes, but sometimes it is important to know how models are created and what the standard building blocks and tricks are. Creating integer programming representable models seem like magic to some, but there are really only a few standard tricks used leading to a family of models.

  1. Logical models involving binary variables
  2. Logical models involving implications with binary variables and constraints
  3. Multiplications of variables and functions
  4. Representations of functions

Some simple rules and strategies when deriving models for complex logic and combinatorials models:

  1. Try to represent things with disjoint events of the kind “exactly one of these things occur” with a binary variable associated to each event.
  2. Try to arrive at “binary variable implies set of constraints”
  3. Introduce intermediate auxilliary variables to keep things clean
  4. Decompose the logic using intermediate variables to connect parts

In other words, a large sparse simple model is much better than a compact complex model. What we try to emphasize in the examples is that most models can be derived using exactly the same strategy and a few very basic building blocks. Many of the models can be simplified and reduced, but those steps are typically done just as efficiently inside the solver during pre-solve.

Throughout the examples here, a and b represent scalar binary variables, z represents a vector of binary variables, and x is just some generic variable (could be anything).

In the models, we make use of big-M models with associated constant \(M\). We use a notation with the same value everywhere, but in practice you would spend effort on deriving as small constants as possible for every single constraint.

Logical models involving binary variables

s = NOT a

With binary \(a = 1\) representing true and \(a = 0\) representing false, logical negation turns into

s = a AND b

\(s\) has to be \(1\) if both \(a\) and \(b\) are 1. \(s\) has to be \(0\) if either of \(a\) and \(b\) are 0.

The idea generalizes to an arbitrary number of binary variables \(s = z_1 ~\&~ z_2~ \&~ \ldots ~\&~ z_n\) with

s = a OR b

\(s\) has to be \(1\) if either of \(a\) and \(b\) are 1, and \(0\) if none of them are 1.

Also this generalizes trivially to arbitrary length

s = a XOR b

\(s\) has to be \(1\) if exactly one of \(a\) and \(b\) are 1, and \(0\) otherwise

The idea generalizes to an arbitrary number of binary variables with

If a then b

Logical implication of binary variables

Logical models involving binary variables and constraints

If a then \(f(x)\leq 0\)

To ensure a constraint holds when a binary is true, we model implication using a big-M strategy.

This is the model construct which is used for almost all cases below. Make sure you understand why this works. In YALMIP, this is the core of the implies operator.

If logical(z) then \(f(x)= 0\)

Introduce a new binary variable \(s\) to represent the logical condition using the methods above, and then use the standard implication. Note that you do not have to model the logical condition exactly (both directions), all you need is \(\mathop{logical}(z) \rightarrow s\).

This is a general strategy throughout. If you enter complex expressions, introduce new variables for simple sub-parts and build the complete model by combining simple standard models.

If a then \(f(x) < 0\)

Strict inequalities are impossible in practice (unless \(f(x)\) is quantized such as only involving integer variables). Hence, you have to use a margin and hope that this in combination with solver tolerances leads to a solution which actually satisfies the constraints (solutions claimed feasible and optimal can easily be infeasible and are only guaranteed to satisfy solver tolerance and termination critera)

If a then \(f(x)\leq 0\) else \(f(x)\geq 0\)

The standard implication is simply repeated for the two cases.

To create more easily generalizable models and learn a common core strategy, it is adviced to think of this as two disjoint cases each associated with a set of constraints.

Once again ending up as standard models for implications

If a then \(f(x)\leq 0\) else \(g(x)\leq 0\)

We could of course create a model using \(a\) only, but using the same strategy the whole time trains us to realize that all model look the same in principle

Standard models for implications

If a then \(f(x)= 0\)

This is nothing but two inequalities in the implication, hence

If \( f(x) \leq 0\) then a

When \(f(x)\) becomes negative, the binary variable should be forced to be activated. This is accomplished by reversing the constraint and introducing an implication for the reverse model

Note that we use a non-strict inequality. If behaviour around \(f(x)\) is important, a margin will have to be used as discussed before.

If \( f(x) = 0\) then a

First, this is extremely ill-posed in practice as solvers work with floating-point numbers so it might consider, e.g., \(10^{-7}\) to be 0. It is also often ill-posed from a practical point of view. If your model says if waterlevel is 0 is it really absolutely crucial that it behaves differently compared to the case when the waterlevel is \(10^{-11}\), i.e. the thickness of one atom? If yes, how do you plan to implement that solution in practice?

The disjoint logical model is

This is interpreted as

A big-M representation of the implications, using a margin \(\epsilon\) around 0 if wanted leads to

If \( f(x) \leq 0\) then a else not a

Now it starts paying of thinking in terms of auxilliary variables and disjoint cases. This is

A big-M model is thus

If \( f(x) \leq 0\) then \( g(x) \leq 0\)

Glue the two conditions using an intermediate binary variable

These are standard models and we thus have

If (\( f(x) \leq 0\) and \( h(x) \leq 0\) ) then \( g(x) \leq 0\)

Not much variation and the pattern should be clear. Glue conditions using intermediate binary variables

These are standard models and we thus have

Multiplications of variables and functions

y = ab, a and b binary

Binary multiplication is nothing but logical and, hence

Generalization to more terms follows the same generalization as logical and.

y = ax, a binary x, x non-binary

Multiplication of binary and non-binary should be seen as a logical operation

This is implemented using our standard model for implications

y = abx, a and b binary, x non-binary

When there are repeated binary variables, the procedure cab simply be repeated with intermediate variables. Start by introducing a new variable and model for c = ab and then use model for y = cx.

The idea and model generalizes to arbitrary polynomials of binary variables, simply introduce intermediate variables to keep it simple.

y = ax, a binary, x integer

An alternative for multiplying a binary with an integer variable is to make a binary expansion of the integer \(x = z_1 + 2 z_2+ 4z_3 + \ldots +2^{n+1}z_n\) with binary variables \(z\), model products with new variables \(y_i = az_i\) using standard model for binary times binary, and then use \(x = y_1 + 2y_2 + 4y_3 \ldots\).

y = wx, w integer, x continuous

Make binary expansion of w and then create models for binary times continuous.

y = wx, w integer, x integer

Make binary expansions of both w and x and then create models for binary products.

Representations of functions

y = abs(x)

A logical representation of absolute value is

A disjoint representation of this using binary variables

Once again standard implications so the result is a standard big-M model

Remember that absolute value is convex, so you only use a MILP representation if absoutely needed.

y = max(x)

The maximum of a vector can also be thought of as a logical model

Introduce a binary variable for every case and derive disjoint representation

This is finalized with standard implication models

Remember that max is convex, so you only use a MILP representation if absolutely needed.

y = min(x)

Use \( \min(x) = -\max(-x)\).

y = sort(x)

Let \(P\) be a binary permutation matrix (rows and columns sum to 1), and write \(y\) as permutation of \(x\), \(y = Px\). For \(y\) to be sorted (increasing), we must have \(y_i \leq y_{i+1}\). Multiplications \(y = Px\) are implemented using the models above or more directly as, e.g., \(-M(1-P_{ij}) \leq y_i - x_j \leq M(1-P_{ij})\).

y = floor(x)

In theory introduce integer \(y\) and \(x-1 < y \leq x\). In practice strict inequality has to be approximated leading to \(x-1+\epsilon \leq y \leq x\)

y = ceil(x)

In theory introduce integer \(y\) and \(x \leq y < x+1\). In practice strict inequality has to be approximated leading to \(x \leq y \leq x+1-\epsilon\)

y = fix(x)

Round towards zero means we we want floor for positive arguements and ceil for negative arguments. We can write this as

Logical model

Implement the implications with integer \(y\) and a combined model for ceil and floor

y = rem(x,m)

\( y = \mathop{rem}(x,m)\) means \( y = x - nm, n = \mathop{fix}(x/m)\), meaning we have to implement \(\mathop{fix}(x/m)\) using the model above (we assume \(m\) is constant).

y = mod(x,m)

\( y = \mathop{mod}(x,m)\) means \( y = x - nm, n = \mathop{floor}(x/m)\), meaning we have to implement \(\mathop{floor}(x/m)\) using the model above (we assume \(m\) is constant).

y = sgn(x)

A logical model of \(s = \mathop{sgn}(x)\) is

This is interpreted as

A big-M representation of the implications, using a margin \(\epsilon\) around 0 if wanted leads to

y = nnz(x)

To count elements let \(y = \sum_{i=1}^n \z_i\). Introduce additional binary vectors \(v,u\) and the logical model

Once again standard implications

y = f(x), x scalar integer

For an arbitrary function defined over a bounded integer set (here for simple notation assumed to be \(1\leq x \ M\), we simply see it as as the disjoint logic model

This is compactly written as

y = piecewise affine function

A typical piecewise affine model is represented as if \(A_ix\leq b_i\) then \(y = c_i^Tx+d_i\) where \(i = 1,\ldots,N\). From above, this is

Standard implication…

y = piecewise quadratic function

Following the picewise affine model, it is trivial to extend to the quadratic case if \(A_ix\leq b_i\) then \(y = \frac{1}{2}x^TQ_ix + c_i^Tx+d_i\). However, one should be slightly more clever, as the standard approach would introduce equalities involving quadratic expressions, which is nonconvex.

Define \(N\) vectors \(p_i\) with length equal to \(x\), and define \(y\) as \( \sum_{i = 1}^N \frac{1}{2}p_i^TQ_ip_i + c_i^Tp_i+d_i z_i\) where

Standard implication…

Leave a Comment