Description

Bob Givan. Joint work w/ E. K. P. Chong, H. Chang, G. Wu. Online Sampling for Markov Decision Processes. Electrical and Computer Engineering, Purdue University. Markov Decision Process (MDP). Ingredients: system state $x$ in state space $X$; control action $a$ in $A(x)$; reward $R(x,a)$.

Transcripts

Bob Givan. Joint work w/ E. K. P. Chong, H. Chang, G. Wu. Online Sampling for Markov Decision Processes. Electrical and Computer Engineering, Purdue University.

Markov Decision Process (MDP). Ingredients: system state $x$ in state space $X$; control action $a$ in $A(x)$; reward $R(x,a)$; state-transition probability $P(x,y,a)$. Find a control policy to maximize the objective function.

Optimal Policies. Policy: mapping from state and time to actions. Stationary policy: mapping from state to actions. Goal: a policy maximizing the objective function $V_H^*(x_0) = \max \mathrm{Obj}[R(x_0,a_0), \ldots, R(x_{H-1},a_{H-1})]$, where the "max" is over all policies $u = u_0, \ldots, u_{H-1}$. For large $H$, $a_0$ is independent of $H$ (w/ ergodicity assum.). Stationary optimal action $a_0$ for $H = \infty$ via receding horizon control.

Q Values. Fix a large $H$, focus on finite-horizon reward. Define $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)]$: the "utility" of action $a$ at state $x$. Name: Q-value of action $a$ at state $x$. Key identities (Bellman's equations): $V_H^*(x) = \max_a Q(x,a)$ and $u_0^*(x) = \arg\max_a Q(x,a)$.
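The two identities can be checked by hand on a very small problem. The following is a minimal sketch, assuming a hypothetical two-state, two-action MDP (the tables `R` and `P` are invented for illustration and are not from the talk) and assuming the objective is the plain sum of rewards over the horizon:

```python
from functools import lru_cache

# Hypothetical toy MDP, not from the talk: two states, two actions.
STATES = (0, 1)
ACTIONS = ("stay", "move")
# R[(x, a)] is the immediate reward; P[(x, a)] maps next state y to probability.
R = {(0, "stay"): 0.0, (0, "move"): -1.0,
     (1, "stay"): 2.0, (1, "move"): 0.0}
P = {(0, "stay"): {0: 1.0},
     (0, "move"): {1: 0.9, 0: 0.1},
     (1, "stay"): {1: 0.8, 0: 0.2},
     (1, "move"): {0: 1.0}}


@lru_cache(maxsize=None)
def optimal_value(x, horizon):
    """V_H^*(x) = max_a Q(x, a), with zero value when no steps remain."""
    if horizon <= 0:
        return 0.0
    return max(q_value(x, a, horizon) for a in ACTIONS)


def q_value(x, a, horizon):
    """Q(x, a) = R(x, a) + E[V_{H-1}^*(y)] for horizon >= 1."""
    expected_future = sum(
        p * optimal_value(y, horizon - 1) for y, p in P[(x, a)].items()
    )
    return R[(x, a)] + expected_future


def greedy_action(x, horizon):
    """u_0^*(x) = argmax_a Q(x, a)."""
    return max(ACTIONS, key=lambda a: q_value(x, a, horizon))
```

Exact evaluation like this needs the full transition table and a sweep over the state space, which is what becomes infeasible when the state space is very large.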

Solution Methods. Recall: $u_0^*(x) = \arg\max_a Q(x,a)$ and $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)]$. Problem: the Q-value depends on the optimal policy, and the state space is extremely large (often continuous). Two-pronged solution approach: apply a receding horizon method; estimate Q-values via simulation/sampling.
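A receding horizon controller re-estimates Q-values at every step, applies only the first action, and then repeats from the new state. The loop can be sketched as follows, assuming a hypothetical simulator `simulate` and a fixed lookup table `Q_TABLE` standing in for whatever sampled Q-estimate is used in practice; both are illustrative and not from the talk:

```python
import random

ACTIONS = ("stay", "move")


def simulate(x, a, rng):
    """Hypothetical generative model: returns (reward, next_state)."""
    if x == 0:
        if a == "stay":
            return 0.0, 0
        return -1.0, (1 if rng.random() < 0.9 else 0)
    if a == "stay":
        return 2.0, (1 if rng.random() < 0.8 else 0)
    return 0.0, 0


# Stand-in for a sampled Q-estimate; the numbers are assumptions for this toy.
Q_TABLE = {(0, "stay"): 0.0, (0, "move"): 0.8,
           (1, "stay"): 3.6, (1, "move"): 0.0}


def q_estimate(x, a, horizon):
    """Placeholder estimator; a real one would sample look-ahead trajectories."""
    return Q_TABLE[(x, a)]


def receding_horizon_run(x0, horizon, steps, rng):
    """At each step pick argmax_a of the estimated Q, apply it, move on."""
    x, total, trajectory = x0, 0.0, []
    for _ in range(steps):
        a = max(ACTIONS, key=lambda act: q_estimate(x, act, horizon))
        reward, x_next = simulate(x, a, rng)
        trajectory.append((x, a, reward))
        total += reward
        x = x_next
    return total, trajectory
```

The look-ahead horizon stays fixed while the decision point slides forward, so the same estimator is reused at every step.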

Methods for Q-value Estimation. Previous work by other authors: unbiased sampling (exact Q-value) [Kearns et al., IJCAI-99]; policy rollout (lower bound) [Bertsekas & Castanon, 1999]. Our techniques: hindsight optimization (upper bound); parallel rollout (lower bound).

Expectimax Tree for $V^*$

Unbiased Sampling

Unbiased Sampling (Cont'd). For a given desired accuracy, how large should the sampling width and depth be? Answered by Kearns, Mansour, and Ng (1999). Requires prohibitive sampling width and depth, e.g. $C$ on the order of $10^8$ and $H_s > 60$ to distinguish "best" and "worst" policies in our scheduling domain. We evaluate with smaller width and depth.
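The width/depth trade-off is easier to see in code. Below is a minimal sketch of a sampled expectimax recursion in the style of Kearns et al., assuming a hypothetical generative model `simulate` (not from the talk), a sum-of-rewards objective, and the parameter names `width` for the sampling width $C$ and `depth` for the sampling depth $H_s$:

```python
import random

ACTIONS = ("stay", "move")


def simulate(x, a, rng):
    """Hypothetical generative model: returns (reward, next_state)."""
    if x == 0:
        if a == "stay":
            return 0.0, 0
        return -1.0, (1 if rng.random() < 0.9 else 0)
    if a == "stay":
        return 2.0, (1 if rng.random() < 0.8 else 0)
    return 0.0, 0


def sampled_q(x, a, depth, width, rng):
    """Average over `width` sampled next states of reward plus sampled value."""
    if depth <= 0:
        return 0.0
    total = 0.0
    for _ in range(width):
        reward, y = simulate(x, a, rng)
        total += reward + sampled_v(y, depth - 1, width, rng)
    return total / width


def sampled_v(x, depth, width, rng):
    """Max over actions of the sampled Q-value; zero when no steps remain."""
    if depth <= 0:
        return 0.0
    return max(sampled_q(x, a, depth, width, rng) for a in ACTIONS)
```

Each level of the recursion multiplies the work by the number of actions times `width`, so the number of simulator calls grows exponentially in `depth`. That growth is why large widths and depths are prohibitive.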

How to Look Deeper?

Policy Roll-out

Policy Rollout in Equations. Write $V_H^u(y)$ for the value of following policy $u$. Recall: $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)] = R(x,a) + E[\max_u V_{H-1}^u(y)]$. Given a base policy $u$, use $R(x,a) + E[V_{H-1}^u(y)]$ as a lower bound estimate of the Q-value. The resulting policy is PI($u$), given infinite sampling.
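The rollout estimate replaces the inner maximization with simulation of one fixed policy. A minimal sketch, assuming a hypothetical simulator `simulate`, a hypothetical base policy `always_stay`, and a sum-of-rewards objective (none of these are from the talk):

```python
import random

ACTIONS = ("stay", "move")


def simulate(x, a, rng):
    """Hypothetical generative model: returns (reward, next_state)."""
    if x == 0:
        if a == "stay":
            return 0.0, 0
        return -1.0, (1 if rng.random() < 0.9 else 0)
    if a == "stay":
        return 2.0, (1 if rng.random() < 0.8 else 0)
    return 0.0, 0


def always_stay(x):
    """Hypothetical base policy u."""
    return "stay"


def rollout_return(x, policy, steps, rng):
    """Total reward of one simulated trajectory that follows `policy`."""
    total = 0.0
    for _ in range(steps):
        reward, x = simulate(x, policy(x), rng)
        total += reward
    return total


def rollout_q(x, a, policy, horizon, n_samples, rng):
    """Sample mean of R(x, a) plus the (H-1)-step return of the base policy."""
    total = 0.0
    for _ in range(n_samples):
        reward, y = simulate(x, a, rng)
        total += reward + rollout_return(y, policy, horizon - 1, rng)
    return total / n_samples


def rollout_action(x, policy, horizon, n_samples, rng):
    """Act greedily with respect to the rollout Q-estimates."""
    return max(ACTIONS,
               key=lambda a: rollout_q(x, a, policy, horizon, n_samples, rng))
```

The cost is linear in the horizon rather than exponential, which is what lets rollout look deeper than the sampled expectimax tree for the same budget.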

Policy Roll-out (cont'd)

Parallel Policy Rollout. Generalization of policy rollout, due to [Chang, Givan, and Chong, 2000]. Given a set $U$ of base policies, use $R(x,a) + E[\max_{u \in U} V_{H-1}^u(y)]$ as an estimate of the Q-value. More accurate estimate than policy rollout. Still gives a lower bound on the true Q-value. Still gives a policy no worse than any in $U$.
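One way to read the estimate is: for each sampled next state $y$, estimate $V_{H-1}^u(y)$ for every base policy in $U$ and keep the best. The sketch below follows that reading, assuming a hypothetical simulator and two hypothetical base policies; the split into `n_next` outer samples and `n_inner` rollouts per policy is an illustrative choice, not something stated in the talk:

```python
import random


def simulate(x, a, rng):
    """Hypothetical generative model: returns (reward, next_state)."""
    if x == 0:
        if a == "stay":
            return 0.0, 0
        return -1.0, (1 if rng.random() < 0.9 else 0)
    if a == "stay":
        return 2.0, (1 if rng.random() < 0.8 else 0)
    return 0.0, 0


def always_stay(x):
    return "stay"


def always_move(x):
    return "move"


def rollout_return(x, policy, steps, rng):
    """Total reward of one simulated trajectory that follows `policy`."""
    total = 0.0
    for _ in range(steps):
        reward, x = simulate(x, policy(x), rng)
        total += reward
    return total


def parallel_rollout_q(x, a, policies, horizon, n_next, n_inner, rng):
    """Estimate R(x, a) + E[max over u in U of V_{H-1}^u(y)] by sampling."""
    total = 0.0
    for _ in range(n_next):
        reward, y = simulate(x, a, rng)
        # Estimate each base policy's value from y, then keep the best one.
        best = max(
            sum(rollout_return(y, u, horizon - 1, rng)
                for _ in range(n_inner)) / n_inner
            for u in policies
        )
        total += reward + best
    return total / n_next
```

With a finite `n_inner`, taking the max of noisy averages can overestimate the inner term, so the lower-bound property should be understood as holding for the exact expectations.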

Hindsight Optimization – Tree View

Hindsight Optimization – Equations. Swap Max and Exp in the expectimax tree. Solve each off-line optimization problem: $O(kC' \cdot f(H))$ time, where $f(H)$ is the off-line problem complexity. Jensen's inequality implies upper bounds.
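Swapping the max and the expectation means: sample the future randomness first, then solve a deterministic planning problem for each sample, then average. A minimal sketch, assuming a hypothetical toy model in which all randomness is one pre-sampled uniform number per time step (`step`, `offline_value`, and `hindsight_q` are illustrative names, not from the talk):

```python
import random

STATES = (0, 1)
ACTIONS = ("stay", "move")


def step(x, a, w):
    """Deterministic transition once the noise w in [0, 1) is known."""
    if x == 0:
        if a == "stay":
            return 0.0, 0
        return -1.0, (1 if w < 0.9 else 0)
    if a == "stay":
        return 2.0, (1 if w < 0.8 else 0)
    return 0.0, 0


def offline_value(x, noise):
    """Best total reward from x when the whole noise sequence is known."""
    values = {s: 0.0 for s in STATES}
    for w in reversed(noise):  # backward induction over time
        values = {
            s: max(r + values[y] for r, y in (step(s, a, w) for a in ACTIONS))
            for s in STATES
        }
    return values[x]


def hindsight_q(x, a, horizon, n_samples, rng):
    """Average of R(x, a) plus the off-line optimum over sampled futures."""
    total = 0.0
    for _ in range(n_samples):
        noise = [rng.random() for _ in range(horizon)]
        reward, y = step(x, a, noise[0])
        total += reward + offline_value(y, noise[1:])
    return total / n_samples
```

For every sampled future the off-line optimum is at least the return of any non-clairvoyant policy on that same future, so the average is an upper bound on the true value. In this toy, a two-step look-ahead from state 0 averages about 0.9 in hindsight against an exact value of 0.8.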

Hindsight Optimization (Cont'd)

Application to Example Problems. Apply unbiased sampling, policy rollout, parallel rollout, and hindsight optimization to: multi-class deadline scheduling; random early dropping; congestion control.

Basic Approach. The traffic model provides a stochastic description of possible future outcomes. Method: formulate network decision problems as POMDPs by incorporating the traffic model; solve the belief-state MDP on-line using sampling (choose the time-scale to allow for computation time).

Domain 1: Deadline Scheduling. Objective: minimize weighted loss.

Domain 2: Random Early Dropping. Objective: minimize delay without sacrificing throughput.

Domain 3: Congestion Control

Traffic Modeling. A Hidden Markov Model (HMM) for each source. Note: the state is hidden, so the model is partially observed.
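To make the partially observed traffic model concrete, here is a minimal sketch assuming a hypothetical two-state on/off source; the `TRANSITION` and `ARRIVAL_PROB` numbers are invented for illustration and are not from the talk. It shows sampling future arrivals from the model and updating a belief over the hidden state from one observed time slot:

```python
import random

# Hypothetical on/off source; all probabilities are illustrative assumptions.
TRANSITION = {"off": {"off": 0.9, "on": 0.1},
              "on": {"off": 0.3, "on": 0.7}}
ARRIVAL_PROB = {"off": 0.05, "on": 0.8}  # chance of a packet in one slot


def sample_arrivals(start_state, steps, rng):
    """Sample a 0/1 arrival sequence; the hidden state is never returned."""
    state, arrivals = start_state, []
    for _ in range(steps):
        arrivals.append(1 if rng.random() < ARRIVAL_PROB[state] else 0)
        state = "on" if rng.random() < TRANSITION[state]["on"] else "off"
    return arrivals


def belief_update(belief, arrival):
    """One HMM filtering step: condition on the observation, then predict."""
    likelihood = {
        s: (ARRIVAL_PROB[s] if arrival else 1.0 - ARRIVAL_PROB[s])
        for s in belief
    }
    norm = sum(belief[s] * likelihood[s] for s in belief)
    posterior = {s: belief[s] * likelihood[s] / norm for s in belief}
    return {
        nxt: sum(posterior[s] * TRANSITION[s][nxt] for s in posterior)
        for nxt in TRANSITION
    }
```

The belief vector plays the role of the state in the belief-state MDP, and sampled arrival sequences supply the future outcomes that the sampling methods average over.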

Deadline Scheduling Results. Non-sampling policies. EDF: earliest deadline first; deadline sensitive, class insensitive. SP: static priority; deadline insensitive, class sensitive. CM: current minloss [Givan et al., 2000]; deadline and class sensitive; minimizes weighted loss for the current packets.
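The difference between the first two baselines comes down to which field of a packet they look at. A minimal sketch, assuming a hypothetical `Packet` record with a deadline and a per-class loss weight (the field names and the tie-breaking rule are assumptions; CM is not sketched because the talk does not spell out its rule):

```python
from typing import List, NamedTuple, Optional


class Packet(NamedTuple):
    deadline: int   # time slot by which the packet must be served
    weight: float   # hypothetical class weight: cost incurred if it is lost


def edf_choice(queue: List[Packet]) -> Optional[Packet]:
    """Earliest deadline first: ignores class, looks only at deadlines."""
    return min(queue, key=lambda p: p.deadline) if queue else None


def sp_choice(queue: List[Packet]) -> Optional[Packet]:
    """Static priority: ignores deadlines, serves the heaviest class first."""
    return max(queue, key=lambda p: p.weight) if queue else None
```

For a queue holding `Packet(3, 1.0)` and `Packet(5, 4.0)`, EDF serves the first and SP serves the second, which illustrates why neither is both deadline and class sensitive.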

Deadline Scheduling Results. Objective: minimize weighted loss. Comparison: non-sampling policies; unbiased sampling (Kearns et al.); hindsight optimization; rollout with CM as base policy; parallel rollout. Results due to H. S. Chang.

Deadline Scheduling Results (cont'd)

Random Early Dropping Results. Objective: minimize delay subject to throughput loss tolerance. Comparison: candidate policies RED and "buffer-k"; KMN-sampling; rollout of buffer-k; parallel rollout; hindsight optimization. Results due to H. S. Chang.

Random Early Dropping Results (cont'd)

Congestion Control Results. MDP objective: minimize weighted sum of throughput, delay, and loss rate. Fairness is hard-wired. Comparisons: PD-k (proportional-derivative with k target queue); hindsight optimization; rollout of PD-k == parallel rollout. Results due to G. Wu, in progress.

Congestion Control Results (cont'd)

Results Summary. Unbiased sampling cannot cope. Parallel rollout wins in 2 domains; not always equal to simple rollout of one base policy. Hindsight optimization wins in 1 domain. Simple policy rollout, the cheapest method: poor in domain 1; strong in domain 2 with the best base policy (but how to find this policy?); so-so in domain 3 with any base policy.

Talk Summary. Case study of MDP sampling methods. New methods offering practical improvements: parallel policy rollout; hindsight optimization. Systematic methods for using traffic models to help make network control decisions. Feasibility of real-time implementation depends on problem timescale.

Ongoing Research. Apply to other control problems (different timescales): admission/access control; QoS routing; link bandwidth allocation; multiclass connection management; Probl