# Uri Zwick Tel Aviv College - PowerPoint PPT Presentation

Uri Zwick Tel Aviv College

1 / 36

## Uri Zwick Tel Aviv College

##### Presentation Transcript

1. Uri ZwickTel Aviv University Simple Stochastic GamesMean Payoff GamesParity Games TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAA

2. Simple Stochastic Games Mean Payoff Games Parity Games Randomizedsubexponential algorithm for SSG Deterministicsubexponential algorithm for PG

3. Simple Stochastic Games Mean Payoff Games Parity Games

4. R R R R A simple Simple Stochastic Game

5. min-sink MAX-sink Simple Stochastic game (SSGs)Reachability version[Condon (1992)] R min MAX RAND Two Players: MAX and min Objective:MAX/min the probability of getting to the MAX-sink

6. Simple Stochastic games (SSGs)Strategies A generalstrategy may be randomized and history dependent A positional strategy is deterministicand history independent Positionalstrategy for MAX: choice of an outgoing edge from each MAX vertex

7. Simple Stochastic games (SSGs)Values Every vertex i in the game has a valuevi general positional general positional Both players have positionaloptimal strategies There are strategies that are optimal for every starting position

8. Simple Stochastic game (SSGs)[Condon (1992)] Terminating binary games The outdegrees of all non-sinks are 2 All probabilities are ½. The game terminates with prob. 1 Easy reduction from general gamesto terminating binary games

9. “Solving” terminating binary SSGs The values vi of the vertices of a game are the unique solution of the following equations: The values are rational numbersrequiring only a linear number of bits Corollary: Decision version in NP  co-NP

10. Value iteration (for binary SSGs) Iterate the operator: Converges to the unique solution But, may require an exponentialnumber of iterations to get close

11. Simple Stochastic game (SSGs)Payoff version[Shapley (1953)] R min MAX RAND Limiting average version Discounted version

12. Markov Decision Processes (MDPs) R min MAX RAND Theorem:[Epenoux (1964)] Values and optimal strategies of a MDP can be found by solving an LP

13. SSG  NP  co-NP – Another proof Deciding whether the value of a game isat least (at most) v is in NP  co-NP To show that value  v,guess an optimal strategy  for MAX Find an optimal counter-strategy  for min by solving the resulting MDP. Is the problem in P ?

14. Mean Payoff Games (MPGs)[Ehrenfeucht, Mycielski (1979)] R min MAX RAND Non-terminating version Discounted version ReachabilitySSGs PayoffSSGs MPGs Pseudo-polynomial algorithm (PZ’96)

15. Mean Payoff Games (MPGs)[Ehrenfeucht, Mycielski (1979)] Again, both players have optimal positional strategies. Value(σ,) – average of cycle formed

16. Selecting the second largest element with only four storage locations [PZ’96]

17. Parity Games (PGs) A simple example Priorities 2 3 2 1 4 1 EVEN wins if largest priorityseen infinitely often is even

18. 8 3 ODD EVEN Parity Games (PGs) EVEN wins if largest priorityseen infinitely often is even Equivalent to many interesting problemsin automata and verification: Non-emptyness of -tree automata modal -calculus model checking

19. 8 3 ODD EVEN Parity Games (PGs) Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] Replace priority k by payoff (n)k Move payoffs to outgoing edges

20. Strategy/Policy Iteration Start with some strategy σ (of MAX) While there are improving switches, perform some of them As each step is strictly improving and as there is a finite number of strategies, the algorithm must end with an optimal strategy SSG  PLS (Polynomial Local Search)

21. Strategy/Policy IterationComplexity? Performing only one switch at a time may lead to exponentially many improvements,even for MDPs [Condon (1992)] What happens if we perform all profitable switches[Hoffman-Karp (1966)] ??? Not known to be polynomialO(2n/n) [Mansour-Singh (1999)] No non-linear examples2n-O(1) [Madani (2002)]

22. Arandomizedsubexponentialalgorithm for binary SSGs[Ludwig (1995)][Kalai (1992)] [Matousek-Sharir-Welzl (1992)] Start with an arbitrary strategy  for MAX Choose a random vertex iVMAX Find the optimal strategy ’ for MAX in the gamein which the only outgoing edge of i is (i,(i)) If switching ’ at i is not profitable, then ’ is optimal Otherwise, let  (’)i and repeat

23. Arandomizedsubexponentialalgorithm for binary SSGs[Ludwig (1995)][Kalai (1992)] [Matousek-Sharir-Welzl (1992)] MAX vertices All correct ! Would never be switched ! There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i

24. The hidden order ui(σ)- the maximum sum of values of a strategy of MAX that agrees with σ on i

25. The hidden order Order the vertices such that Positions 1,..,iwere switchedand would neverbe switched again

26. SSGs are LP-type problems[Halman (2002)] General (non-binary) SSGs can be solved in time Independently observed by[Björklund-Sandberg-Vorobyov (2005)] AUSO – Acyclic Unique Sink Orientations

27. SSGs GPLCP[Gärtner-Rüst (2005)][Björklund-Svensson-Vorobyov (2005)] GPLCPGeneralized Linear ComplementaryProblem with a P-matrix

28. A deterministic subexponential algorithm for parity games Mike PatersonMarcin JurdzinskiUri Zwick

29. Parity Games (PGs) A simple example Priorities 2 3 2 1 4 1 EVEN wins if largest priorityseen infinitely often is even

30. 8 3 ODD EVEN Parity Games (PGs) Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] Replace priority k by payoff (n)k Move payoffs to outgoing edges

31. Exponential algorithm for PGs[McNaughton (1993)] [Zielonka (1998)] Vertices of highest priority(even) Firstrecursivecall Vertices from whichEVEN can force thegame to enter A Lemma: (i) (ii)

32. Exponential algorithm for PGs[McNaughton (1993)] [Zielonka (1998)] Second recursivecall In the worst case, both recursive calls are on games of size n1

33. Deterministic subexponential alg for PGsJurdzinski, Paterson, Z (2006) Idea:Look for small dominions! Second recursivecall Dominions of size s can be found in O(ns) time Dominion Dominion: A (small) set from which one of the players can win without the play ever leaving this set

34. Open problems • Polynomial algorithms? • Is the Policy Improvement algorithm polynomial? • Faster subexponential algorithmsfor parity games? • Deterministic subexponential algorithmsfor MPGs and SSGs? • Faster pseudo-polynomial algorithmsfor MPGs?