Optimal Control of Cooperative MultiRobot
Systems using MixedInteger Linear Programming
Christian Reinl,Oskar von Stryk
∗
Abstract
A new planning method for optimal control of multirobot systems is discussed which
accounts for the (continuous) physical locomotion dynamics of the robots and its tight
coupling to the distribution and allocation of (discrete) subtasks to the robots to fulﬁll a
joint mission.The point of departure is a nonlinear and nonconvex hybrid optimal control
problem (HOCP) formulation which incorporates a detailed hybrid automaton model.
Because of the many diﬃculties involved in solving this problem like large computational
times and the lack of good or global convergence properties it is transcribed into a mixed
integer linear program (MILP).This can be solved much more eﬃciently using existing
algorithms.The proposed approach is outlined for an example problem of cooperative
soccer robots.The MILP solution itself may serve either as a good initial solution estimate
for a method addressing the nonlinear HOCP or may later become the kernel of a model
predictive control method for cooperative multirobot systems.Despite the promising
results obtained so far a variety of open questions yet remains to be answered including the
”best” way of transcribing HOCP to MILP with respect to both computational eﬃciency
and good HOCP solution approximation.
1 Introduction
In this paper multirobot systems are considered where the individual nonlinear physical
motion dynamics is of fundamental importance for the mission success which depends on
optimizing physical values like the robots’ positions or energy consumption.
The problem’s possible combinatorial character complicates an analytical inspection and
hardly theoretically proved results are available for the most general problemformulation.The
key idea in this paper is to build up a centralized MILPbased (cf.[3]) controller for multi
vehicle systems,starting with a systematic HOCP description (Sect.2) and a consistent
transformation (Sect.3) towards optimization based control (Sect.4).Tight coupling of
discrete states (e.g.actions) an respective continuous state variables (positions,velocities
etc.) is a basic feature in there.Especially in an uncertain setting (failures,uncontrollable
objects),the robustness of MILP oﬀers an eﬃcient (cf.[2]) way to be used in receding horizon
controllers.To illustrate this approach we refer to a benchmark problemfromrobot soccer (cf.
∗
Technische Universit¨at Darmstadt,CS Dept.,Email:{reinl,stryk}@sim.tudarmstadt.de.
1
y
x
(x
D
,y
D
)
defender
x
field
(v
x,B
,v
y,B
)
(v
x,1
,v
y,1
)
y
goal
(x
1
,y
1
)
(x
B
,y
B
)
(x
2
,y
2
)
(v
x,2
,v
y,2
)
(0,0)
(v
x,D
,v
y,D
)
y
field
Figure 1:Setting of the soccer benchmark problem
(x
D
,y
D
)
(x
2
,y
2
)
(x
1
,y
1
)
(x
B
,y
B
)
x
y
Figure 2:Contributions to objective
Fig.1) with two strikers versus one (passive) defender,all modeled as moving point masses.
The intention is to ﬁnd the control which optimizes the attackers’ chances for a considered
time horizon [t
0
,t
f
].Results for this representative example will be given in Sect.4.
2 Modeling the cooperative multirobot system
We are considering (in)direct controllable and not controllable moving objects i in our system.
Each one is characterized by its continuous dynamic state x
i
(e.g.position,velocity,...) and a
discrete value q
i
that denotes a certain subtask or role.Together with the continuous control
variable u
i
,the continuous state evolves subject to
˙
x
i
= f
q
i
,i
(x
i
,u
i
).By deﬁning (usually
unknown) switching times t
s
and corresponding speciﬁcations,how to connect x
i
when q
i
switches at t
s
,the individual trajectory for an object is determined.
For the regarded soccer example,i ∈ {1,2,B,D} denotes two strikers,one ball and a
defender.As modes of motion q
i
for the strikers we distinguish free
moving and dribbling.
2.1 Modeling switched dynamics with hybrid automata
We are regarding multirobot systems consisting of moving objects with speciﬁc modes of
motion and rules that deﬁne feasible sequences for them.Thus we are using hybrid automata
to describe the cooperative system.They are well established in the context of robot control.
A hybrid automaton [5] H = (V,E,X,U,ini,f,j,i,e) consists of a ﬁnite directed multi
graph (V,E) with knots in V (called states) and edges in E (socalled switches),a set of
continuous state variables X,a set of continuous control variables U,a map ini which assigns
an initial condition to each edge,the invariants provided by the map i which assigns each
knot with a feasible region for the continuous states and controls using equality and inequality
constraints,a map f which assigns a ﬂow equation or state dynamics to each state,a map
j which assigns jump conditions to edges and a map e which assigns events to edges which
occur at switches.For the proposed soccer application (cf.Fig.3) we added another hierarchy
there that contains conditions that are similar in the covered knots.In this model we only
distinguish whether the ball is dribbled,rolls free or is inside the goal.The respective motion
2
i:dist
2,D
≥ γ
2,D
i:dist
1,D
≥ γ
1,D
i:dist
1,2
≥ γ
1,2
i:dist
D,B
≥ γ
D,B
i:x
B
∈ goal
i:g
1
(
˙
x
1
,u
1
) ≤ 0
i:g
2
(
˙
x
2
,u
2
) ≤ 0
Ball in goal
5
f:
˙
x
1
= 0
f:
˙
x
D
= 0
f:
˙
x
2
= 0
f:
˙
x
B
= 0
e:goal
Game is running
1
i:g
2
(
˙
x
2
,u
2
) ≤ 0
Ball free
3
j:dist
1,B
≤ ε
dribble
e:kick(1)
e:kick(2)
e:catch(2)
f:
¨
x
1
= f
1
(x
1
,
˙
x
1
,u
1
)
i:dist
1,B
> ε
dribble
i:dist
2,B
> ε
dribble
i:g
1
(
˙
x
1
,u
1
) ≤ 0
f:
¨
x
1
= f
1,B
(x
1
,
˙
x
1
,u
1
)
Player 1 dribbles ball
2
j:y
B
 ≤ y
goal
f:
˙
x
B
= f
B
(x
b
)
f:
¨
x
2
= f
2
(x
2
,
˙
x
2
,u
2
)
j:x
B
 ≥ x
field
Player 2 dribbles ball
4
i:g
2
(
˙
x
2
,u
2
) ≤ 0
i:g
1,B
(
˙
x
1
,u
1
) ≤ 0
i:dist
1,B
≤ ε
dribble
f:
¨
x
B
= f
1,B
(x
1
,
˙
x
1
,u
1
)
f:
¨
x
2
= f
2
(x
2
,
˙
x
2
,u
2
)
i:g
2,B
(
˙
x
2
,u
2
) ≤ 0
i:g
1
(
˙
x
1
,u
1
) ≤ 0
i:dist
2,B
≤ ε
dribble
f:
¨
x
B
= f
2,B
(x
2
,
˙
x
2
,u
2
)
f:
¨
x
1
= f
1
(x
1
,
˙
x
1
,u
1
)
f:
¨
x
2
= f
2,B
(x
2
,
˙
x
2
,u
2
)
j:dist
2,B
≤ ε
dribble
e:catch(1)
f:
˙
x
D
= f
D
(x
1
,x
2
,x
B
)
i:x
B
∈ field
Figure 3:Hierarchical hybrid automaton model of the switched motion dynamics
dynamics of a dribbling robot is indexed by “B”.The initial conditions ini are deﬁned with
the position x
i
(t
0
) of the objects i.Catching and kicking a ball are modeled by events
kick(i):
˙
x
B
(t
s
+0) = 3 ∙
˙
x
i
(t
s
−0),catch(i):x
B
(t
s
+0) = x
i
(t
s
−0),(1)
t
s
±0:= lim
→0,>0
t
s
±.All other state trajectories are required to be continuous at t
s
.
The auxiliary variable dist
i
1
,i
2
represents a distance measure between objects i
1
,i
2
and is
used to express collision avoidance (with a constant γ
i
1
,i
2
).Further constraints on state and
control variables according to the speciﬁc motion modes are modelled as invariants i.
The dynamic of the defender is not considered to be switched here.We tested our approach
with a simple model for the dynamic of robots and ball and set for all states except
5
f:
˙
x
B
(t) = v
B
(t),f:
¨
x
§
(t) =
Ã
¨x
§
(t)
¨y
§
(t)
!
=
Ã
˙v
x,§
(t)
˙v
y,§
(t)
!
= u
§
(t) =
Ã
u
x,§
(t)
u
y,§
(t)
!
(§ ∈ {1,2}) (2)
For a dribbling robot the upper bound on its velocity is reduced by a factor c
U
v,dr
.
The numerical method proposed in [4] for the general HOCP formulation uses piecewise
polynomials and binary variables to transfer the hybrid automaton and an objective function
into a ﬁnite dimensional sparse nonlinear mixedbinary optimization problem.It was solved
with a combination of sequential quadratic programming and BranchandBound techniques.
More details and results for this model are given there.Now we are interested in a MILP
formulation that provides a eﬃcient optimizationbased control considering the basic system
characteristics.
2.2 The linearized model
We are introducing a ﬁxed (not necessary) equidistant grid of time points with the sampling
time T
s
:= t
k
−t
k−1
.For the evolution of the continuous state and control variables we are
deﬁning x(k):= x(t
k+1
) and u(k):= u(t
k+1
).The knots in the describing automaton can
switch only at these time points and are not free any more.All (in)equalities that were used
to describe the diﬀerent states and the motion dynamics in the systems will be reformulated
3
or transformed into linear expressions now.Thus the diﬀerential ﬂow conditions f must be
reformulated as diﬀerence equations
˙
x = f
q,i
(x,u) Ã x(k +1) = A
q,i
∙ x(k) +B
q,i
∙ u(k).(3)
Additional state variables and binary variables may be necessary here for case diﬀerentiations
and combination of these cases with logical expressions.In the context of hybrid automata,
this case diﬀerentiations are treated as new subknots.This splitting up strongly depends on
the nonlinearity of the expression and the desired accuracy of the transformed model.The
regions deﬁned by the invariants i are approximated in a polygonal manner.Linear inequali
ties are combined logically therefore.If nonlinear expressions occur in the jump conditions j
of events e,they have to be treated respectively.Afterwards,the automaton is clocked and
covers only linear expressions and logical constraints.Application to the example results in
x
§
(k +1) = x
§
(k) +T
s
v
x,§
(k),v
x,§
(k +1) = v
x,§
(k) +T
s
u
x,§
(k),
x
B
(k +1) = x
B
(k) +T
s
v
x,B
(k),v
x,B
(k +1) =
(
v
x,§
(k) (dribbling)
c
trac
T
s
v
x,B
(k) (ball free)
(§ ∈ {1,2},c
trac
∙ T
s
< 1,y
i
,v
y,i
,analogously).A simple,reactive defender that is always
moving towards the current ball position is modeled by
x
D
(k +1) = x
D
(k) +Ts v
x,D
(k),v
x,D
=
v
U
x,D
D
max
(x
b
(k) −x
D
(k)),
(4)
(D
max
≥ max
x,y,k
{x
b
(k)−x
D
(k),y
b
(k)−y
D
(k)},y
D
,v
y,D
,analogously).The constant v
U
x,D
is the upper bound for v
D
.In the investigated example the controls and velocities are con
straint by quadratic expressions.Generally expressions of the form
±
p
(x
1
−x
2
)
2
+(y
1
−y
2
)
2
≤ ±r can be transformed by using n
γ
≥ 4 linear expressions
±sin(
i
3
π) (x
1
−x
2
) ±cos(
i
3
π) (y
1
−y
2
) ≤ ±r (i = 1,...,n
γ
).(5)
Thus the invariants i:g
q
i
,i
(
˙
x
i
) = (v
x,i
,v
y,i
)
T

2
−v
U
q
i
,i
≤ 0 were reformulated (v
U
q
i
,i
constant,
for u
q
i
,i
respectively).For the distance dist
i
1
,i
2
between two objects as an auxiliary state
variable the columnsum norm was used.
3 Transforming the model into a mixedinteger linear program
For each knot and each edge of the (hierarchical) automaton a timedependent binary variable
b(t) is introduced so that b = 1 iﬀ the state or edge is active.The structure then is transcribed
with simple linear inequalities,e.g.b
(2)
(k) +b
(4)
(k) +b
(3)
(k) ≤ 1,b
(2)
(k) +b
(4)
(k +1) ≤ 1,etc.
Logical relations combined with inequalities are translated using the ’BigM’technique (cf.
[1]).Thus ﬂows and invariants get connected with the respective binary variable,e.g.
IF b
(3)
= 1 THEN v
x,B
(k +1) = c
trac
v
x,B
(k) ⇔
(
(1 −b
(3)
)m≤ v
x,B
(k +1) −c
trac
v
x,B
(k)
v
x,B
(k +1) −c
trac
v
x,B
(k) ≤ (1 −b
(3)
)M
4
M ≥ max{v
x,B
(k +1) −c
trac
v
x,B
(k)},m≤ min{v
x,B
(k +1) −c
trac
v
x,B
(k)} constant.
To rate the quality of a computed attack,we mainly look at the situation at the ﬁnal
time t
N+1
and primarily regard the following components (cf.Fig.2).The positions of the
attacking robots and the ball (x
i
(k),y
i
(k))
T
,distances between the robots,defender and ball
dist
i
1
,i
2
(k) and also the events ”ball in goal ” and ”one robot dribbling” b
(5)
,b
(2)
,b
(4)
.With
carefully determined coeﬃcients then the objective function J is implemented as a weighted
sum.Due to remaining degrees of freedom,u
i
(k) is further added to it.The intention is to
minimize J where the tactical behavior of the team is varied with the coeﬃcients in J.
4 Optimization
Results for the linear implementation of the proposed benchmark problemwith the parameters
T
s
= 0.8,N = 10,x
field
= 270,y
field
= 180,y
goal
= 40,ε
dr
= 5,
γ
2,D
= 30,γ
B,D
= 30,D
max
= 700,c
trac
= 0.88,v
U
B,x
= 135,c
v,dr
= 60.7,
v
U
§,y
= 45,u
U
§,x
= 40,u
U
§,y
= 40,v
U
§,y
= 90,v
U
§,x
= 45,
and the objective function
J =
N
X
k=1
0
@
−0.2dist
B,D
−320b
(5)
+0.001
X
i∈{1,2}
( u
x,i
 +u
y,i
 )
1
A
¯
¯
¯
¯
¯
¯
t=k
+ (6)
0
@
−x
B
+0.6y
B
 −0.15 dist
B,D
−0.01dist
1,2
−160 (b
(2)
+b
(4)
) +0.02
X
i∈{1,2}
(−x
i
+y
i
)
1
A
¯
¯
¯
¯
¯
¯
t=N+1
.
are given.The MILP was solved with CPLEX 10.0 (from ILOG,Inc.) on a PC (Intel(R)
Pentium(R) M processor 1.86GHz;1024 MB RAM) in 30 sec (see Fig.4 for details).
5 Conclusion and outlook
A MILP formulation has been developed which accounts for the tight coupling of discrete
decisions and continuous ﬂowvariables in optimal control of cooperative mobile robot systems.
A consistent modeling towards a linearized formulation was shown.The numerical approach
is applicable to a wide range of scenarios.Ongoing work considers techniques to improve the
MILP by decoupling and additional constraints.Also various methods to linearize the given
nonlinear description more systematically are investigated.MILPmodels can cope well with
the nonconvexities,the combinatorial character and their eﬃciency hardly depends on initial
guesses.They are therefore well suited for repeated application to account for changes in
uncertain environments.
Acknowledgment.Parts of this research have been supported by the German Research
Foundation (DFG) within the Research Training Group 1362 “Cooperative,adaptive and
responsive monitoring in mixed mode environments”.
5
0
2
4
6
8
100
0
100
200
t
xi(t)
0
2
4
6
8
40
20
0
20
40
60
t
vi,x
(t)
0
2
4
6
8
40
20
0
20
40
t
ui,x(t)
0
2
4
6
8
100
0
100
200
t
yi(t)
0
2
4
6
8
40
20
0
20
40
60
t
vi,y
(t)
0
2
4
6
8
40
20
0
20
40
t
ui,y(t)
300
200
100
0
100
200
300
200
150
100
50
0
50
100
150
200
300
200
100
0
100
200
300
200
150
100
50
0
50
100
150
200
300
200
100
0
100
200
300
200
150
100
50
0
50
100
150
200
Figure 4:First two rows:Optimal positions,velocities and controls for the attackers (
)
and the ball (
).Third row:Computed optimal behavior shown at timesteps k = 1,7,11.
Attacker 1 goes to ball,dribbles and kicks it towards the penalty area.Attacker 2 catches it
there and dribbles to a promising position.The defender (
) follows the ball.
References
[1] A.Bemporad and M.Morari.Control of systems integrating logic,dynamics,and con
straints.Automatica,35(3):407–427,1999.
[2] F.Borrelli,D.Subramanian,A.U.Raghunathan,and L.T.Biegler.MILP and NLP
techniques for centralized trajectory planning of multiple unmanned air vehicles.American
Control Conference,June 2006.
[3] M.G.Earl and R.D’Andrea.Iterative MILP methods for vehicle control problems.In
IEEE Conference on Decision and Control,volume 4,pages 4369 – 4374,December 2004.
[4] M.Glocker,C.Reinl,and O.von Stryk.Optimal task allocation and dynamic trajectory
planning for multivehicle systems using nonlinear hybrid optimal control.In Proc.1st
IFACSymposium on Multivehicle Systems,pages 38–43,Salvador,Brazil,2006.
[5] T.A.Henzinger.The theory of hybrid automata.In M.K.Inan and R.P.Kurshan,editors,
Veriﬁcation of Digital and Hybrid Systems,NATO ASI Series F:Computer and Systems
Sciences 170,pages 265–292.Springer,2000.
6
Enter the password to open this PDF file:
File name:

File size:

Title:

Author:

Subject:

Keywords:

Creation Date:

Modification Date:

Creator:

PDF Producer:

PDF Version:

Page Count:

Preparing document for printing…
0%
Commentaires 0
Connectezvous pour poster un commentaire