1 / 51

# Web Sampling for Markov Decision Processes .

78 views
Description
Sway Givan Joint work w/E. K. P. Chong, H. Chang, G. Wu. Internet Inspecting for Markov Choice Procedures. Electrical and PC Designing Purdue College. Markov Choice Procedure (MDP). Fixings: Framework state x in state space X Control activity an in A ( x ) Reward R ( x,a )
Transcripts
Slide 1

﻿Sway Givan Joint work w/E. K. P. Chong, H. Chang, G. Wu Online Sampling for Markov Decision Processes Electrical and Computer Engineering Purdue University

Slide 2

Markov Decision Process (MDP) Ingredients: System state x in state space X Control activity an in A ( x ) Reward R ( x,a ) State-move likelihood P ( x,y,a ) Find control approach to expand target fun Bob Givan Electrical and Computer Engineering Purdue University

Slide 3

Optimal Policies Policy – mapping from state and time to activities Stationary Policy – mapping from state to activities Goal – a strategy augmenting the target work V H * ( x 0 ) = max Obj [ R ( x 0 ,a 0 ) , … , R ( x H - 1 ,a H - 1 )] where the "maximum" is over all arrangements u = u 0 ,… ,u H-1 For vast H , a 0 free of H . (w/ergodicity assum.) Stationary ideal activity a 0 for H =  by means of retreating skyline control Bob Givan Electrical and Computer Engineering Purdue University

Slide 4

Q Values  Fix an expansive H , concentrate on limited skyline remunerate Define Q ( x,a ) = R ( x,a ) + E [ V H - 1 * ( y )] "Utility" of activity an at state x. Name: Q-estimation of activity an at state x. Key characters (Bellman\'s conditions): V H *(x) = max a Q(x,a)  0 * (x) = argmax a Q(x,a) Bob Givan Electrical and Computer Engineering Purdue University

Slide 5

Solution Methods Recall: u 0 * ( x ) = argmax a Q ( x,a ) Q ( x,a ) = R ( x,a ) + E [ V H - 1 * ( y )] Problem: Q-esteem relies on upon ideal strategy. State space is to a great degree vast (regularly constant) Two-pronged arrangement approach: Apply a retreating skyline strategy Estimate Q-values by means of reproduction/inspecting Bob Givan Electrical and Computer Engineering Purdue University

Slide 6

Methods for Q-esteem Estimation Previous work by different creators: Unbiased examining (correct Q esteem) [Kearns et al., IJCAI-99] Policy rollout (bring down bound) [Bertsekas & Castanon, 1999] Our strategies: Hindsight advancement (upper bound) Parallel rollout (bring down bound) Bob Givan Electrical and Computer Engineering Purdue University

Slide 7

Expectimax Tree for V * Bob Givan Electrical and Computer Engineering Purdue University

Slide 8

Unbiased Sampling Bob Givan Electrical and Computer Engineering Purdue University

Slide 9

Unbiased Sampling (Cont\'d) For a given sought precision, how extensive ought to testing width and profundity be? Replied: Kearns, Mansour, and Ng (1999) Requires restrictive inspecting width and profundity e.g. C  10 8 , H s > 60 to recognize "best" and "most exceedingly bad" strategies in our booking space We assess with littler width and profundity Bob Givan Electrical and Computer Engineering Purdue University

Slide 10

How to Look Deeper? Bounce Givan Electrical and Computer Engineering Purdue University

Slide 11

Policy Roll-out Bob Givan Electrical and Computer Engineering Purdue University

Slide 12

Policy Rollout in Equations Write V H u ( y ) for the benefit of taking after approach u Recall: Q ( x,a ) = R ( x,a ) + E [ V H-1 * ( y )] = R ( x,a ) + E [max u V H-1 u ( y )] Given a base arrangement u, utilize R ( x,a ) + E [ V H-1 u ( y )] as a lower bound gauge of Q-esteem. Coming about arrangement is PI( u ), given limitless inspecting Bob Givan Electrical and Computer Engineering Purdue University

Slide 13

Policy Roll-out (cont\'d) Bob Givan Electrical and Computer Engineering Purdue University

Slide 14

Parallel Policy Rollout Generalization of strategy rollout, due to [Chang, Givan, and Chong, 2000] Given a set U of base approaches , utilize R ( x,a ) + E [max u ∊ U V H - 1 u ( y )] as a gauge of Q-esteem More precise gauge than approach rollout Still gives a lower bound to genuine Q-esteem Still gives an approach no more terrible than any in U Bob Givan Electrical and Computer Engineering Purdue University

Slide 15

Hindsight Optimization – Tree View Bob Givan Electrical and Computer Engineering Purdue University

Slide 16

Hindsight Optimization – Equations Swap Max and Exp in expectimax tree. Take care of each disconnected advancement issue O (kC\' • f(H)) time where f(H) is the disconnected issue multifaceted nature Jensen\'s imbalance infers upper limits Bob Givan Electrical and Computer Engineering Purdue University

Slide 17

Hindsight Optimization (Cont\'d) Bob Givan Electrical and Computer Engineering Purdue University

Slide 18

Application to Example Problems Apply impartial examining, arrangement rollout, parallel rollout, and insight into the past streamlining to: Multi-class due date booking Random early dropping Congestion control Bob Givan Electrical and Computer Engineering Purdue University

Slide 19

Basic Approach Traffic display gives a stochastic portrayal of conceivable future results Method Formulate organize choice issues as POMDPs by fusing activity demonstrate Solve conviction state MDP web based utilizing testing (pick time-scale to take into account calculation time) Bob Givan Electrical and Computer Engineering Purdue University

Slide 20

Domain 1: Deadline Scheduling Objective: Minimize weighted misfortune Bob Givan Electrical and Computer Engineering Purdue University

Slide 21

Domain 2: Random Early Dropping Objective: Minimize delay without relinquishing throughput Bob Givan Electrical and Computer Engineering Purdue University

Slide 22

Domain 3: Congestion Control Bob Givan Electrical and Computer Engineering Purdue University

Slide 23

Traffic Modeling A Hidden Markov Model (HMM) for each source Note: state is concealed, model is in part watched Bob Givan Electrical and Computer Engineering Purdue University

Slide 24

Deadline Scheduling Results Non-inspecting Policies: EDF: most punctual due date first. Due date touchy, class uncaring. SP: static need. Due date obtuse, class delicate. CM: current minloss [Givan et al., 2000] Deadline and class delicate. Limits weighted misfortune for the present parcels. Sway Givan Electrical and Computer Engineering Purdue University

Slide 25

Deadline Scheduling Results Objective: limit weighted misfortune Comparison: Non-testing arrangements Unbiased inspecting (Kearns et al.) Hindsight advancement Rollout with CM as base approach Parallel rollout Results because of H. S. Chang Bob Givan Electrical and Computer Engineering Purdue University

Slide 26

Deadline Scheduling Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 27

Deadline Scheduling Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 28

Deadline Scheduling Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 29

Random Early Dropping Results Objective: limit postpone subject to throughput misfortune resilience Comparison: Candidate strategies: RED and "cradle k " KMN-examining Rollout of support k Parallel rollout Hindsight enhancement Results because of H. S. Chang. Weave Givan Electrical and Computer Engineering Purdue University

Slide 30

Random Early Dropping Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 31

Random Early Dropping Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 32

Congestion Control Results MDP Objective: limit weighted entirety of throughput, postponement, and misfortune rate Fairness is hard-wired Comparisons: PD-k (corresponding subordinate with k target line) Hindsight enhancement Rollout of PD-k == parallel rollout Results because of G. Wu, in advance Bob Givan Electrical and Computer Engineering Purdue University

Slide 33

Congestion Control Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 34

Congestion Control Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 35

Congestion Control Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 36

Congestion Control Results Bob Givan Electrical and Computer Engineering Purdue University

Slide 37

Results Summary Unbiased examining can\'t adapt Parallel rollout wins in 2 spaces Not generally equivalent to straightforward rollout of one base approach Hindsight enhancement wins in 1 area Simple strategy rollout – the least expensive technique Poor in area 1 Strong in space 2 with best base arrangement – yet how to discover this strategy? So-so in space 3 with any base arrangement Bob Givan Electrical and Computer Engineering Purdue University

Slide 38

Talk Summary Case investigation of MDP inspecting techniques New strategies offering functional upgrades Parallel strategy rollout Hindsight improvement Systematic techniques for utilizing movement models to help settle on system control choices Feasibility of continuous execution relies on upon issue timescale Bob Givan Electrical and Computer Engineering Purdue University

Slide 39

Ongoing Research Apply to other control issues (diverse timescales): Admission/get to control QoS steering Link transmission capacity designation Multiclass association administration Probl

Recommended
View more...