Adaptive Intelligent Mobile Robotics. William D. Smart, Presenter; Leslie Pack Kaelbling, PI; Artificial Intelligence Laboratory, MIT.

Adaptive Intelligent Mobile Robotics. William D. Smart, Presenter; Leslie Pack Kaelbling, PI; Artificial Intelligence Laboratory, MIT. Progress to date: fast bootstrapped reinforcement learning (algorithmic techniques, demo on robot); optical-flow based navigation (flow algorithm implemented).
Slide 1

Adaptive Intelligent Mobile Robotics
William D. Smart, Presenter
Leslie Pack Kaelbling, PI
Artificial Intelligence Laboratory, MIT

Slide 2

Progress to Date
Fast bootstrapped reinforcement learning
  algorithmic techniques
  demo on robot
Optical-flow based navigation
  flow algorithm implemented
  pilot navigation experiments on robot
  pilot navigation experiments in simulation testbed

Slide 3

Making RL Really Work
Typical RL methods require far too much data to be practical in an online setting.
Address the problem by
  strong generalization techniques
  using human input to bootstrap
Let humans do what they're good at
Let learning algorithms do what they're good at

Slide 4

JAQL
Learning a value function in a continuous state and action space
  based on locally weighted regression (a fancy version of nearest neighbor)
  algorithm knows what it knows
  use meta-knowledge to be conservative about dynamic-programming updates

Slide 5

Problems with Q-Learning on Robots
Huge state spaces/sparse data
Continuous states and actions
Slow to propagate values
Safety during exploration
Lack of initial knowledge

Slide 6

Value Function Approximation
Use a function approximator instead of a table
  generalization
  deals with continuous spaces and actions
Q-learning with VFA has been shown to diverge, even in benign cases
Which function approximator should we use to minimize problems?
[Diagram: inputs s and a feed a function approximator F, which outputs Q(s,a)]

Slide 7

Locally Weighted Regression
Store all previous data points
Given a query point, find the k nearest points
Fit a locally linear model to these points, giving closer ones more weight
Use KD-trees to make lookups more efficient
Fast learning from a single data point
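The steps on this slide can be sketched in a few lines for a one-dimensional input. This is only an illustration under stated assumptions: the Gaussian kernel, the default `k`, and the name `lwr_predict` are choices made here and do not come from the slides, and a linear scan stands in for the KD-tree lookup.

```python
import math

def lwr_predict(points, query, k=4, bandwidth=0.1):
    """Predict y at `query` from stored (x, y) pairs using a local weighted line fit."""
    if not points:
        raise ValueError("no stored data points")
    # Find the k nearest stored points (linear scan; a KD-tree would speed this up).
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    # Assumed Gaussian kernel: closer points receive more weight.
    weights = [math.exp(-((x - query) / bandwidth) ** 2) for x, _ in nearest]
    sw = sum(weights)
    if sw == 0.0:
        # All weights underflowed; fall back to the plain mean of the neighbors.
        return sum(y for _, y in nearest) / len(nearest)
    sx = sum(w * x for w, (x, _) in zip(weights, nearest))
    sy = sum(w * y for w, (_, y) in zip(weights, nearest))
    sxx = sum(w * x * x for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * x * y for w, (x, y) in zip(weights, nearest))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        # Neighbors share one x value; a line is not identifiable, use the weighted mean.
        return sy / sw
    # Closed-form weighted least squares for y = intercept + slope * x.
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * query
```

Because every stored point is kept, adding one new `(x, y)` pair to `points` changes nearby predictions immediately, which is the "fast learning from a single data point" property the slide mentions.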

Slide 8

Locally Weighted Regression
[Plot: original function]

Slide 9

Locally Weighted Regression
[Plot: bandwidth = 0.1, 500 training points]

Slide 10

Problems with Approximate Q-Learning
Errors are amplified by backups
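For reference, the standard one-step Q-learning backup makes the amplification visible. The symbols below are notation introduced here, not taken from the slides: \alpha is the learning rate, \gamma the discount factor, and \epsilon the approximation error in the stored estimate.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} \bigl( Q(s',a') + \epsilon(s',a') \bigr) - Q(s,a) \right]
```

The max picks out whichever action has the largest estimate, so positive errors are preferentially selected, and the resulting target is then backed up into earlier states.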

Slide 11

One Source of Errors

Slide 12

Independent Variable Hull
Interpolation is safe; extrapolation is not, so
  construct a hull around known points
  do local regression if the query point is within the hull
  give a default prediction if not
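The slide does not give the hull construction. One common definition, assumed here, treats a query as inside the hull when its leverage v^T (X^T X)^{-1} v is no larger than the largest leverage among the stored points. The helper names (`solve`, `leverage`, `in_hull`, `safe_predict`) and the default value are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: stored points do not span the space")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverage(points, v):
    """Return v^T (X^T X)^{-1} v, where rows of X are the points with a leading 1."""
    rows = [[1.0] + list(p) for p in points]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    vec = [1.0] + list(v)
    z = solve(xtx, vec)  # z = (X^T X)^{-1} vec
    return sum(x * y for x, y in zip(vec, z))

def in_hull(points, query):
    """Assumed hull test: query leverage must not exceed the largest stored leverage."""
    threshold = max(leverage(points, p) for p in points)
    return leverage(points, query) <= threshold + 1e-12

def safe_predict(points, query, predict, default=0.0):
    """Use the regression only when interpolating; otherwise return a default."""
    return predict(query) if in_hull(points, query) else default
```

For example, with stored points at the corners of the unit square, a query at (0.5, 0.5) is answered by the regression, while a query at (3, 3) gets the default prediction.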

Slide 13

Recap
Use LWR to represent the value function
  generalization
  continuous spaces
Use IVH and "don't know"
  conservative predictions
  safer backups

Slide 14

Incorporating Human Input
Humans can help a lot, even if they can't perform the task very well.
Provide some initial successful trajectories through the space
Trajectories are not used for supervised learning, but to guide the reinforcement learning methods through useful parts of the space
Learn models of the dynamics of the world and of the reward structure
Once learned models are good, use them to update the value function and policy as well.

Slide 15

Give Some Trajectories
Supply an example policy
Need not be optimal and may be wrong
Code or human-controlled
Used to generate experience
Follow example policy and record experiences
Shows learner "interesting" parts of the space
"Bad" initial policies might be better

Slide 16

Two Learning Phases: Phase One
[Diagram: Environment, Supplied Control Policy, and Learning System, connected by signals labeled R, O, and A]

Slide 17

Two Learning Phases: Phase Two
[Diagram: Environment, Supplied Control Policy, and Learning System, connected by signals labeled R, O, and A]
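A toy sketch of the two phases, assuming that in phase one the supplied policy chooses the actions while the learner only updates its value estimates from the recorded experience, and that in phase two the learned policy takes control. Everything here is illustrative: a six-cell discrete corridor and a tabular Q-function stand in for the robot and the locally weighted regression, and the constants are arbitrary.

```python
GOAL = 5            # index of the last corridor cell (hypothetical toy world)
ACTIONS = (-1, +1)  # step left or step right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Move within the corridor; reward 10 on reaching the end, 0 elsewhere."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (10.0 if nxt == GOAL else 0.0)

def q_update(q, s, a, r, s2):
    """One-step Q-learning backup from a single recorded experience."""
    future = 0.0 if s2 == GOAL else GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (r + future - q[(s, a)])

def run_episode(q, choose_action, start=0, max_steps=100):
    """Act with `choose_action`, learn from every step, and return the step count."""
    s, steps = start, 0
    while s != GOAL and steps < max_steps:
        a = choose_action(q, s)
        s2, r = step(s, a)
        q_update(q, s, a, r, s2)
        s, steps = s2, steps + 1
    return steps

def supplied_policy(q, s):
    """Stand-in for the human- or code-supplied policy: always head right."""
    return +1

def greedy_policy(q, s):
    """The learner's own policy: pick the action with the highest estimate."""
    return max(ACTIONS, key=lambda a: q[(s, a)])

q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
for _ in range(20):                     # phase one: supplied policy in control
    run_episode(q, supplied_policy)
steps = run_episode(q, greedy_policy)   # phase two: learned policy in control
```

In this toy world the learner never has to explore from a blank value function: by the time it takes over, the supplied runs have already propagated the end-of-corridor reward back to the start.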

Slide 18

What does this Give Us?
Natural way to insert human knowledge
Keeps robot safe in early stages of learning
Bootstraps information into the Q-function

Slide 19

Experimental Results: Corridor-Following

Slide 20

Corridor-Following
3 continuous state dimensions
  corridor angle
  offset from middle
  distance to end of corridor
1 continuous action dimension
  rotation velocity
Supplied example policy
Average 110 steps to goal

Slide 21

Corridor-Following
Experimental setup
  Initial training runs start from roughly the middle of the corridor
  Translation speed has a fixed policy
  Evaluation on a number of set starting points
Reward
  10 at end of corridor
  0 everywhere else

Slide 22

Corridor-Following
[Plot: results over Phase 1 and Phase 2, with reference levels labeled "Average training" and "Best" possible]

Slide 23

Corridor Following: Initial Policy

Slide 24

Corridor Following: After Phase 1

Slide 25

Corridor Following: After Phase 1

Slide 26

Corridor Following: After Phase 2

Slide 27

Conclusions
VFA can be made more stable
  Locally weighted regression
  Independent variable hull
  Conservative backups
Bootstrapping the value function really helps
  Initial supplied trajectories
  Two learning phases

Slide 28

Optical Flow
Get range information visually by computing the optical flow field
  closer objects cause flow of higher magnitude
  expansion pattern means you're going to hit
  rate of expansion tells you when
  elegant control laws based on center and rate of expansion (derived from human and fly behavior)

Slide 29

Approaching a Wall

Slide 30

Balance Strategy
Simple obstacle avoidance strategy
  compute flow field
  compute average magnitude of flow in each hemi-field
  turn away from the side with higher magnitude (because it has closer objects)
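The three steps above can be sketched as follows, assuming the flow field has already been computed and is given as a grid of `(u, v)` vectors at least two columns wide. The function names, the normalization by total flow, and the sign convention (positive means turn left) are assumptions made for this illustration.

```python
import math

def hemifield_means(flow):
    """Average flow magnitude in the left and right halves of the image.

    `flow` is a list of rows, each a list of (u, v) flow vectors. With an odd
    number of columns the middle column is ignored.
    """
    width = len(flow[0])
    half = width // 2
    left = [math.hypot(u, v) for row in flow for (u, v) in row[:half]]
    right = [math.hypot(u, v) for row in flow for (u, v) in row[width - half:]]
    return sum(left) / len(left), sum(right) / len(right)

def balance_turn(flow, gain=1.0):
    """Turn command in [-gain, gain]; positive means turn left (assumed convention)."""
    left, right = hemifield_means(flow)
    total = left + right
    if total == 0.0:
        return 0.0  # no flow observed, so there is nothing to steer away from
    # More flow on the left means closer objects on the left: turn right (negative).
    return gain * (right - left) / total
```

With stronger flow on the left side of the image the command is negative, steering the robot to the right and away from the nearer objects; equal flow on both sides gives zero.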

Slide 31

Balance Strategy in Action

Slide 32

Crystal Space

Slide 33

Crystal Space

Slide 34

Crystal Space

Slide 35

Next Steps
Extend RL architecture to include model-learning and planning
Apply RL techniques to tune parameters in optical-flow
Build topological maps using visual information
Build highly complex simulated environment
Integrate planning and learning in multi-layer system
