Document Description

Optimal Solution Predictions for Mixed Integer Programs arxiv: v1 [cs.ai] 23 Jun 2019 Jian-Ya Ding Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd

Document Share

Document Transcript

Optimal Solution Predictions for Mixed Integer Programs arxiv: v1 [cs.ai] 23 Jun 2019 Jian-Ya Ding Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Lei Shen Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Bing Wang Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Le Song Computational Science and Engineering College of Computing Georgia Institute of Technology Abstract Chao Zhang Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Shengyin Li Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Yinghui Xu Artificial Intelligence Department, Zhejiang Cainiao Supply Chain Management Co., Ltd Mixed Integer Programming (MIP) is one of the most widely used modeling techniques to deal with combinatorial optimization problems. In many applications, a similar MIP model is solved on a regular basis, maintaining remarkable similarities in model structures and solution appearances but differing in formulation coefficients. This offers the opportunity for machine learning method to explore the correlations between model structures and the resulting solution values. To address this issue, we propose to represent an MIP instance using a tripartite graph, based on which a Graph Convolutional Network (GCN) is constructed to predict solution values for binary variables. The predicted solutions are used to generate a local branching cut to the model which accelerate the solution process for MIP. Computational evaluations on 8 distinct types of MIP problems show that the proposed framework improves the performance of a state-of-the-art open source MIP solver significantly in terms of running time and solution quality. 1 Introduction Mixed Integer Programming (MIP) is widely used to solve vast combinatorial optimizations (CO) in the field of Operations Research (OR). The existence of integer variables endows MIP formulations Preprint. Under review. with the ability to capture the discrete nature of many real world decisions. Applications of MIP includes production scheduling, vehicle routing, facility location, airline crew scheduling, to mention only a few. In many real-world settings, homogeneous MIP instances with similar scales and combinatorial structures are optimized repeatedly, but being treated as completely new tasks. These MIP models across days share remarkable similarities in model structures and solution appearances, which motivates us to integrate Machine Learning (ML) method to explore correlations between an MIP model s structure and its solution values to improve the solver s performance. Identifying correlations between problem structures and solution values is not new, and is widely used as guidelines for heuristics design for CO problems. These heuristic methods are usually human-designed priority rules to guide the search directions to more promising regions in solution space. For example, the nearest neighbor algorithm for traveling salesman problem (TSP) constructs a heuristic solution by choosing the nearest unvisited node as the salesman s next move, based on the observation that two distantly distributed nodes are unlikely to appear consecutively in the optimal route. Similar examples include the shortest processing time first heuristic for flow shop scheduling, the saving heuristic for vehicle routing, the first fit algorithm for bin packing, among many others. A major drawback of heuristics design using problem-specific knowledge is the lack of generalities to other problems, where new domain knowledge has to be re-identified. MIP models can describe CO problems of various types using a standard formulation strategy z = min Ax b,x X c T x, differing only in model coefficients A, b and c. This makes it possible to explore connections between problem structures and the resulting solution values without prior domain knowledge. Predicting solution values of general integer variables in MIP is a difficult task. Notice that most MIP models are binary variables intensive 1, a natural way to explore hidden information in the model is to treat solution value prediction of binary variables as binary classification tasks. Major challenges in solution prediction lies in the implicit correlations among decision variables, since a feasible solution x is restricted by constraints in MIP, i.e., Ax b. Rather than predicting each decision variable value isolatedly, we propose a tripartite graph representation of MIP and use graph embedding to capture connections among the variables. Note that none of the two variable nodes are directly linked in the trigraph, but can be neighbors of distance 2 if they appear in the same constraint in the MIP formulation. Correlations among variables are reflected in embeddings of the trigraph where each vertex maintains aggregate feature information from its neighbors. Incorporating solution prediction results in MIP solving process is not trivial. In fact, false prediction of a single decision variable can sometimes lead to infeasibility of the entire problem. Instead of utilizing the predicted solutions directly, we identify decision variables that are predictable and stable and use this information to guide the Branch and Bound (B&B) tree search to emphasis on unpredictable variables to accelerate solving convergence. This is achieved by a novel labelling mechanism on the training instances, where a sequence of feasible solutions are generated by an iterated proximity search method. Stable decision variables, of which the value remain unchanged across these solutions, are recorded. It is noticeable that although obtaining optimal solutions can be a difficult task, the stable variables can be viewed as an easy-to-predict part that reflects the MIP s local optimality structure. This labelling mechanism is inspiring especially for difficult MIP instances when solving them to optimality is almost impossible. The overall framework of solution prediction based MIP solving can be summarized as follows: Training data generation: For a certain type of CO problem, generate a set of p MIP instances I = {I 1,..., I p } of similar scale from the same distribution D. For each I i I, collect the corresponding variable features, constraint features and edge features, and use the iterated proximity search method to generate solution labels for each binary variable in I i. GCN model training: For each I i I, generate a trigraph from its MIP formulation and train a Graph Convolutional Network (GCN) for solution value prediction of each binary variable. Application of solution prediction: For a new MIP instance I from D, collect features, build the trigraph and use the GCN model to make solution value predictions, based on which an initial local branching cut heuristic is applied to solve I. 1 Take the benchmark set of MIPLIB 2017 [15] as an example, among all 240 MIP benchmark instances, 164 of them are Binary Integer Linear Programming (BILP) problems, and 44 out of the 76 remaining are imbalanced in the sense that binary variables account for more than 90% of all integer variables. 2 2 Related work With a similar motivation, there are some recent attempts that consider integration of ML and OR for solving CO problems. Dai et al. [5] combined reinforcement learning and graph embedding to learn greedy construction heuristics for several optimization problems over graphs. Li et al. [13] trained a graph convolutional network to estimate the likelihood that a vertex in a graph appears in the optimal solution. Selsam et al. [18] proposed a message passing neural network named NeuroSAT to solve SAT problems via a supervised learning framework. Vinyals et al. [20], Kool et al. [10, 11] and Nazari et al. [16] trained Pointer Networks (or its variants) with recurrent neural network (RNN) decoder to solve permutation related optimization problems over graphs, such as Traveling Salesman Problem (TSP) and Vehicle Routing Problem (VRP). Quite different from their settings, our solution prediction framework does not restrict to certain graph based problems, but can adapt to a variety of CO problems using a standard MIP formulation. Quite related to our work, there is an increasing concern on using ML techniques to enhance MIP solving performance. Alvarez et al. [2], Marcos et al. [14], and Khalil et al. [9] tried to use learningbased approach to imitate the behavior of the so-called strong branching method, a node-efficient but time-consuming branching variable selection method in the B&B search tree. He et al. [7] used imitation learning to train a node selection and a node pruning policy to speed up tree search in the B&B process. Khalil et al. [8] used binary classification to predict whether a primal heuristic will succeed at a given node and then decide whether to run a heuristic at that node. Kruber et al. [12] proposed a supervised learning method to decide whether a Danzig-Wolfe reformulation should be applied and which decomposition to choose among all possibles. Interested readers can refer to Bengio et al. [3] for a comprehensive survey on the use of machine learning methods in CO. The proposed MIP solving framework is different from previous work in two aspects: Generalization: Previous solution generation method for CO usually focus on problems with certain solution structures. For example, applications of Pointer Networks [20, 11] are only suited for sequence-based solution encoding, and reinforcement learning [5, 13] type decision making is based on the assumption that a feasible solution can be obtained by sequential decisions. In contrast, the proposed framework do not limit to problems of certain types, but is applicable to all CO problems that can be modeled as MIPs. This greatly enlarges the applicable area of the proposed framework. Representation: Previous applications of ML techniques to enhance MIP solving performance mainly use hand-crafted features and make predictions on each variable independently. Notice that the solution value of an variable is strongly correlated to the objective function and the constraints it participates in, we build a novel tripartite graph representation for MIP, based on which graph embedding technique is applied to extract correlations among variables, constraints and the objective function without human intervene. 3 The Solution Framework ider an MIP problem instance I 0 of the following general form: min c T x (1) s.t. Ax b, (2) x j {0, 1}, j B, (3) x j Z, j Q, x j 0, j P, (4) where the index set of decision variables U := {1,..., n} is partitioned into (B, Q, P), and B, I, P are the index set of binary variables, general integer variables and continuous variables, respectively. The main task here is to predict the probability that a binary variable x j, j B to take value 1 (or zero) in the optimal solution. Next we describe in detail the tripartite graph representation of MIP, the GCN model structure, and how solution prediction results are incorporated to accelerate MIP solving. 3.1 Graph Representation for MIP Our main idea is to use a tripartite graph G = {V, E} to represent an input MIP instance I 0. In particular, objective function coefficients c, constraints right-hand-side (RHS) coefficients b and 3 coefficients matrix A information is extracted from I 0 to build the graph. Vertices and edges in the graph are detailed as follows and graphically illustrated in Fig 1. min c 1 x 1 + c 2 x c n x n a 11 x 1 + a 12 x a 1n x n b 1... a m1 x 1 + a m2 x a mn x n b m Figure 1: Transformation of an MIP instance to a tripartiete graph Vertices: 1) the set of decision variable vertices V V, each of which corresponds to a binary variable in I 0. 2) the set of constraint vertices V C, each of which corresponds to a constraint in I 0. min c 1 x 1 + c 2 x c n x 3) an objective function vertex o. n Edges: a 11 x 1 + a 12 x a 1n x n b 1 has a non-zero coefficient in the corresponding... constraint of c in the MIP formulation. 2) v-o edge: for each v V V, there exists an edge between v and o. a m1 x 1 + a m2 x a mn x n b m 1) v-c edge: there exists an edge between v V V and c V C if the corresponding variable of v 3) c-o edge: there exists an edge between c V C and o if c isactive in the MIP s root LP relaxation. Remark. The presented trigraph representation not only captures connections among the variables, constraints and objective functions, but maintains the detailed coefficients numerics in its structure as well. In particular, non-zero entries in coefficient matrix A are included as features of v-c edges, entries in objective coefficients c as features of v-o edges, and entries in RHS coefficients b as features of c-o edges. Note that the constraint RHS coefficients b are correlated to the objective function by viewing LP relaxation of I 0 from a dual perspective. 3.2 Solution Prediction for MIP We describe in Algorithm Step 1 1 the overall Step forward 2 propagation Step prediction 3 procedure Step 4 based on the trigraph. The model has three stages: 1) a projection layer of same embedding size for each node so that node feature has the same dimension (line 1-3 in the algorithm). 2) a graph attention network Embedding to transform node information among connected nodes (line 4-10). 3) two fully-connected layers between variable nodes and the output layer after several information transitions (line 11). Embedding Algorithm 1 Graph Convolutional Network (forward propagation) Input: Graph G = {V, E}; Input features {x j, j V}; Number of transition iterationst ; Weight matrices W t, t {1,..., T } for graph embedding; Output layer weightmatrix W out ; Non-linearity σ; Embedding Neighborhood function N ; Attention coefficients α. Output: Predicted value of all variable: z v, v V V. 1: h 0 v EMBEDDING(x v), v V V 2: h 0 c EMBEDDING(x c), c V C 3: h 0 o EMBEDDING(x o) 4: for t = 1,..., T do Graph Convolution Layer ( ( 5: h t o σ W t V o CONCAT h t 1 o, )) v V V N (o) αvoht 1 v 6: for c in C do (: ( 7: h t c σ W V c CONCAT σ ( W t oc CONCAT ( )) )) h t o, h t 1 c, v V V N (c) αvcht 1 v ( ( 8: h t o σ W Co CONCAT h t o, )) c V C N (o) αcoht o 9: for v in V do (: ( 10: h t v σ W Cv CONCAT σ ( W ov CONCAT ( )) h t o, h t 1 )) v, c V C N (v) αcvht c Dense Layer 11: z v σ ( W out CONCAT ( h 0 v, h T v )), v VV Nodes representations in the tripartite graph are updated via a 4-step procedure. In the first step (line 5 in Algorithm 1), the objective node o aggregates the representations of all variable nodes 4 min c 1 x 1 + c 2 x c n x n {h v, v V V } to update its representation h o. In a the second step (line 7), {h v, v V V } and h o are 11 x 1 + a 12 x a 1n x n b 1 used to update representation of their neighboring constraint node c V C. In the third step (line 8),... representations of constraints {h c, c V C } are aggregated to update h o, while in the fourth step (line 10) {h c, c a m1 Vx C 1 + } and a m2 xh 2 + o are... + combined a mn x n bto m update {h v, v V V }. See Fig. 2 for an illustration of information transition flow in the trigraph. Step 1 Step 2 Step 3 Step 4 Figure 2: Information transition flow in the trigraph convolutional layer The information Embedding transitions run consecutively as follows: Step 1, transform variable nodes information to the objective node; Step 2, transform the objective and variable nodes information to constraint nodes; Step 3, transform constraint nodes information to the objective node; Step.4, transform the objective node and constraint nodes information to variable nodes; Embedding The intuition behind Algorithm 1 is that at each iteration, a variable node can incrementally gather more aggregation information from its neighboring nodes, which correspond to the related constraints and variables in the MIP formulation. It is worth mentioning that the above transitions only extract Embedding connection relations among the nodes, ignoring the detailed Graph Convolution coefficients Layer numerics A, b and c in MIP. To enhance the representation ability of our model, we include an attention mechanism to import information from the coefficients values. Dense Layer Outputs Attention Mechanism: A distinct feature in the tripartite graph structure is the heterogeneities in nodes and arcs. Rather than using a shared linear transformation (i.e., weight matrix) for all nodes [19], we consider different transformations in each step of graph embedding updates, reflecting the importance of feature of one type of node on the other. In particular, given node i of type T i and node j of type T j, the attention coefficient which indicates the importance of node i V from its neighbor j N (i) is computed as: ( α ij = σ W att T i,t j CONCAT ( ) ) h i, h eij, h j, (5) where h i, h j, h eij are embeddings of node i, j V and edge (i, j) E respectively, and W att T i,t j is the attention weight matrix between type T i and T j nodes. For each i V, the attention coefficient is normalized cross over all neighbor nodes j N (i) using a softmax function. With this attention mechanism, MIP coefficients information in A, b and c (all of which contained in the feature vector of the edges) are incorporated to reflect edge connection importance in the graph. 3.3 Prediction-Based MIP Solving Next, we introduce how the solution value prediction results are utilized to improve MIP solving performance. One approach is to add a local branching type initial cut to the MIP model to reduce the search space of feasible solutions This method aim to identify decision variables that are predictable and stable, and guide the B&B tree search to emphasize on unpredictable variables which accelerate solving convergence while maintaining near optimality. Let ˆx j denotes the predicted solution value of binary variable x j, j B, and let S B denotes an subset of indices of binary variables. A local branching initial cut to the model is defined as: x j + (1 x j ) φ, (6) j S:ˆx k j =0 j S:ˆx k j =1 where φ is a problem parameter that controls the maximum distance from a new solution x to the predicted solution ˆx. Adding cuts with respect to subset S rather than B is due to the unpredictable nature of unstable variables in MIP solutions. Therefore, only those variables with high probability to take value 0 or 1 are included in S. It is worth mentioning that for the extreme case of φ equals 0, the initial cut is equivalent to fixing variables with indices in S at their predicted values. 5 4 Data Collection Features: An ideal feature collection procedure should capture sufficient information to describe the solution process, and being of low computational complexity as well. A good trade off between these two concerns is to collect features at the root node of the B&B tree, where the problem has been presolved to eliminate redundant variables and constraints and the LP relaxation is solved. In particular, we collect for each instance 3 types of problem features, i.e., variable features, constraint features and edge features. Features descriptions are summarized in Table 1 in Appendix A. As presented in the feature table, features of variables and constraints can be divided into three categories: basic features, LP features and structure features. The structure features (most of which can be found in [2, 9]) are usually hand-crafted statistics to reflect correlations between variables and constraints. It is noticeable that our tripartiate graph neural network model can naturally capture these correlations without human expertise and can generate more advanced structure information to improve prediction accuracy. This will be verified in the computational evaluations section. Labels: To make predictions on solution values of binary variables, an intuitive labeling scheme for the variables is to label them with the optimal solution values. Note, however, obtaining optimal solutions for medium or large scale MIP inst

Similar documents

Search Related

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks