Versatile Intelligent Mobile Robotics
  William D. Smart, Presenter
  Leslie Pack Kaelbling, PI
  Artificial Intelligence Laboratory, MIT
Progress to Date
  Fast bootstrapped reinforcement learning
    algorithmic techniques
    demo on robot
  Optical-flow based navigation
    flow calculation implemented
    pilot navigation experiments on robot
    pilot navigation experiments in simulation testbed
Making RL Really Work
  Typical RL methods require far too much data to be practical in an online setting.
  Address the problem by
    strong generalization techniques
    using human input to bootstrap
  Let humans do what they're good at.
  Let learning algorithms do what they're good at.
JAQL
  Learning a value function in a continuous state and action space
  Based on locally weighted regression (a fancy version of nearest neighbor)
  The algorithm knows what it knows
  Uses meta-knowledge to be conservative about dynamic programming updates
Problems with Q-Learning on Robots
  Huge state spaces / sparse data
  Continuous states and actions
  Slow to propagate values
  Safety during exploration
  Lack of initial knowledge
Value Function Approximation
  Use a function approximator instead of a table
    generalization
    deals with continuous spaces and actions
  Q-learning with VFA has been shown to diverge, even in benign cases
  Which function approximator should we use to minimize problems?
  [Diagram: a function approximator F takes a state s and an action a and outputs Q(s,a)]
Locally Weighted Regression
  Store all previous data points
  Given a query point, find the k nearest points
  Fit a locally linear model to these points, giving closer ones more weight
  Use KD-trees to make lookups more efficient
  Fast learning from a single data point
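The lookup-and-fit procedure above can be sketched for a one-dimensional input as follows. This is a minimal illustration of locally weighted linear regression under stated assumptions, not the JAQL code: it uses a brute-force neighbor search where the slides use KD-trees, a Gaussian kernel whose width is a hypothetical `bandwidth` parameter, and an illustrative default for `k`.

```python
import math
from typing import List, Tuple


def lwr_predict(
    data: List[Tuple[float, float]],
    query: float,
    k: int = 10,
    bandwidth: float = 0.1,
) -> float:
    """Predict y at `query` with locally weighted linear regression.

    `data` holds (x, y) training points. `k` and `bandwidth` are
    illustrative assumptions, not values taken from the slides.
    """
    if not data:
        raise ValueError("no training data")

    # Brute-force k-nearest-neighbor search (a KD-tree would replace this scan).
    neighbors = sorted(data, key=lambda p: abs(p[0] - query))[:k]

    # Gaussian kernel: closer points receive larger weights.
    raw = [math.exp(-((x - query) / bandwidth) ** 2) for x, _ in neighbors]
    total = sum(raw)
    if total == 0.0:
        # Every neighbor is far away relative to the bandwidth (weights underflow);
        # fall back to the plain mean of their targets.
        return sum(y for _, y in neighbors) / len(neighbors)
    weights = [r / total for r in raw]  # normalized to sum to 1

    # Weighted means of x and y.
    x_bar = sum(w * x for w, (x, _) in zip(weights, neighbors))
    y_bar = sum(w * y for w, (_, y) in zip(weights, neighbors))

    # Weighted least-squares slope of the local line y = y_bar + slope * (x - x_bar).
    sxx = sum(w * (x - x_bar) ** 2 for w, (x, _) in zip(weights, neighbors))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, (x, y) in zip(weights, neighbors))
    slope = sxy / sxx if sxx > 1e-12 else 0.0  # degenerate spread: constant model
    return y_bar + slope * (query - x_bar)
```

With a single stored point the local fit collapses to that point's value, which is one way to read "fast learning from a single data point".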
Locally Weighted Regression
  [Figure: original function]
Locally Weighted Regression
  [Figure: bandwidth = 0.1, 500 training points]
Problems with Approximate Q-Learning
  Errors are amplified by backups
One Source of Errors
Independent Variable Hull
  Interpolation is safe; extrapolation is not, so
    construct a hull around the known points
    do local regression if the query point is within the hull
    give a default prediction if not
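Testing membership in the exact convex hull is expensive in many dimensions, so a common approximation is the ellipsoid defined by the regression hat matrix: with the training inputs (plus a constant 1) stacked as the rows of X, a query x counts as inside when x^T (X^T X)^-1 x <= max_i h_ii, where h_ii is the leverage of the i-th training point. The sketch below assumes that ellipsoidal approximation, uses a hand-written Gauss-Jordan inverse to stay dependency-free, and uses `None` as a hypothetical "don't know" default.

```python
from typing import Callable, List, Optional, Sequence


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: not enough independent points")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * c for v, c in zip(a[r], a[col])]
    return [row[n:] for row in a]


def _leverage(x: Sequence[float], inv: List[List[float]]) -> float:
    """Quadratic form x^T inv x."""
    d = len(x)
    return sum(x[i] * inv[i][j] * x[j] for i in range(d) for j in range(d))


def inside_ivh(points: Sequence[Sequence[float]], query: Sequence[float]) -> bool:
    """Ellipsoidal approximation of the independent variable hull."""
    # Append a constant 1 so the ellipsoid is not forced to be centered at the origin.
    rows = [list(p) + [1.0] for p in points]
    q = list(query) + [1.0]
    d = len(q)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    inv = _invert(xtx)
    h_max = max(_leverage(r, inv) for r in rows)
    return _leverage(q, inv) <= h_max + 1e-9  # small tolerance for boundary points


def guarded_predict(
    points: Sequence[Sequence[float]],
    query: Sequence[float],
    regress: Callable[[Sequence[float]], float],
    default: Optional[float] = None,
) -> Optional[float]:
    """Run the local regression only inside the hull; otherwise return `default`."""
    return regress(query) if inside_ivh(points, query) else default
```

In one dimension with the constant column, this ellipsoid reduces exactly to the interval spanned by the data, so extrapolation past either end returns the default.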
Recap
  Use LWR to represent the value function
    generalization
    continuous spaces
  Use IVH and "don't know"
    conservative predictions
    safer backups
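A conservative backup can be sketched as a Q-learning target in which the approximator is allowed to answer "don't know". This assumes a hypothetical predictor that returns `None` for queries it cannot answer (for example, outside the hull), a finite set of candidate actions standing in for maximization over a continuous action, and a hypothetical cautious `default` for next states with no known value; none of these names come from the slides.

```python
from typing import Callable, Optional, Sequence

# A Q-function approximator that may decline to answer ("don't know").
QPredictor = Callable[[Sequence[float], float], Optional[float]]


def conservative_target(
    q: QPredictor,
    reward: float,
    next_state: Sequence[float],
    candidate_actions: Sequence[float],
    gamma: float = 0.9,
    default: float = 0.0,
    terminal: bool = False,
) -> float:
    """Compute r + gamma * max_a' Q(s', a'), skipping "don't know" answers.

    Only actions whose value the approximator can actually predict take part
    in the max; if none can, the cautious `default` is used instead of an
    extrapolated guess.
    """
    if terminal:
        return reward
    answers = (q(next_state, a) for a in candidate_actions)
    known = [v for v in answers if v is not None]
    best_next = max(known) if known else default
    return reward + gamma * best_next
```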
Incorporating Human Input
  Humans can help a lot, even if they can't perform the task very well.
  Provide some initial successful trajectories through the space.
  Trajectories are not used for supervised learning, but to guide the reinforcement learning methods through useful parts of the space.
  Learn models of the dynamics of the world and of the reward structure.
  Once the learned models are good, use them to update the value function and policy as well.
Give Some Trajectories
  Supply an example policy
    need not be optimal and may be wrong
    code or human-controlled
  Used to generate experience
    follow the example policy and record experiences
  Shows the learner "interesting" parts of the space
  "Bad" initial policies might be better
Two Learning Phases: Phase One
  [Diagram: Environment, Supplied Control Policy, and Learning System, connected by R (reward), O (observation), and A (action) signals]
Two Learning Phases: Phase Two
  [Diagram: Environment, Supplied Control Policy, and Learning System, connected by R (reward), O (observation), and A (action) signals]
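The two phases can be sketched as one control loop in which only the source of actions changes: in phase one the supplied policy acts and the learner updates from what it observes, and in phase two the learner's own greedy policy acts. The toy one-dimensional corridor, the tabular Q-learner, and every name below are hypothetical stand-ins chosen to keep the example self-contained; the system on these slides uses LWR-based value function approximation instead of a table.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Tuple

GOAL = 10            # hypothetical corridor: cells 0..GOAL, goal at GOAL
ACTIONS = (+1, -1)   # step forward / backward; greedy ties go to the first entry


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy corridor dynamics: reward 10 on reaching the end, 0 everywhere else."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (10.0 if done else 0.0), done


class TabularLearner:
    """Tabular Q-learning, standing in for the LWR-based learner."""

    def __init__(self, alpha: float = 0.5, gamma: float = 0.9) -> None:
        self.q: Dict[Tuple[int, int], float] = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def update(self, s: int, a: int, r: float, s2: int, done: bool) -> None:
        best_next = 0.0 if done else max(self.q[(s2, b)] for b in ACTIONS)
        target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def greedy(self, s: int) -> int:
        return max(ACTIONS, key=lambda a: self.q[(s, a)])


def run_episode(policy: Callable[[int], int], learner: TabularLearner,
                start: int = 0, max_steps: int = 200) -> int:
    """Run one episode under `policy`; the learner updates from every transition."""
    s = start
    for t in range(1, max_steps + 1):
        a = policy(s)
        s2, r, done = step(s, a)
        learner.update(s, a, r, s2, done)
        s = s2
        if done:
            return t
    return max_steps


rng = random.Random(0)


def supplied_policy(s: int) -> int:
    """A deliberately imperfect example policy: forward only 70% of the time."""
    return +1 if rng.random() < 0.7 else -1


learner = TabularLearner()
# Phase one: the supplied policy is in control; the learner only watches and updates.
phase_one_steps = [run_episode(supplied_policy, learner) for _ in range(20)]
# Phase two: the learner's own greedy policy takes over and keeps learning.
phase_two_steps = [run_episode(learner.greedy, learner) for _ in range(5)]
```

Because learning runs in both phases, the experience recorded while the supplied policy is driving is what gives the greedy policy nonzero values to follow when it takes over.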
What Does This Give Us?
  A natural way to insert human knowledge
  Keeps the robot safe in the early stages of learning
  Bootstraps information into the Q-function
Experimental Results: Corridor-Following
Corridor-Following
  3 continuous state dimensions
    corridor angle
    offset from middle
    distance to end of corridor
  1 continuous action dimension
    rotation velocity
  Supplied example policy
    average 110 steps to goal
Corridor-Following
  Experimental setup
    Initial training runs start from roughly the middle of the corridor
    Translation speed has a fixed policy
    Evaluation from a number of fixed starting points
    Reward: 10 at the end of the corridor, 0 everywhere else
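Because the only nonzero reward is the 10 at the end of the corridor, the discounted return of an episode depends only on its length: reaching the end on step n is worth 10 * gamma^(n - 1). A small sketch, assuming a discount factor gamma = 0.99 (the slides do not state one), a hypothetical `CorridorState` record for the three state dimensions, and a hypothetical `end_threshold` for what counts as "at the end". Under those assumptions a 110-step episode, the supplied policy's average, is worth roughly 3.3, so shorter routes to the goal earn strictly higher values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CorridorState:
    """The three continuous state dimensions listed on the slide (units assumed)."""
    angle: float            # corridor angle relative to the robot's heading
    offset: float           # offset from the middle of the corridor
    distance_to_end: float  # distance remaining to the end of the corridor


def reward(state: CorridorState, end_threshold: float = 0.1) -> float:
    """10 at the end of the corridor, 0 everywhere else.

    `end_threshold` is a hypothetical choice of how close counts as "at the end".
    """
    return 10.0 if state.distance_to_end <= end_threshold else 0.0


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of gamma**t * r_t over one episode, with t counted from 0."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```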
Corridor-Following
  [Plot legend: Phase 1; Phase 2; Average training; "Best" possible]
Corridor Following: Initial Policy
Corridor Following: After Phase 1
Corridor Following: After Phase 2
Conclusions
  VFA can be made more stable
    locally weighted regression
    independent variable hull
    conservative backups
  Bootstrapping the value function
    initial supplied trajectories
    two learning phases
Optical Flow
  Get range information visually by computing the optical flow field
    nearer objects cause flow of higher magnitude
    an expansion pattern means you're going to hit something
    the rate of expansion tells you when
  Rich control laws based on the focus of expansion and the rate of expansion (derived from human and fly behavior)
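"The rate of expansion tells you when" is usually made precise as time to contact: if an approaching object subtends an image size theta that grows at rate d(theta)/dt, then under constant approach speed the time remaining until contact is approximately tau = theta / (d(theta)/dt). The sketch below estimates tau from the image size measured in two consecutive frames; the frame interval and pixel sizes are hypothetical inputs, and this is the textbook tau estimate rather than the project's own flow-based implementation.

```python
import math


def time_to_contact(size_prev: float, size_now: float, dt: float) -> float:
    """Estimate tau = theta / (d theta / dt) from an object's image size in two frames.

    Returns math.inf when the image is not expanding (nothing is approaching).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    rate = (size_now - size_prev) / dt  # finite-difference rate of expansion
    if rate <= 0:
        return math.inf
    return size_now / rate
```

For example, an object whose image grows from 10 to 11 pixels over 0.1 s is expanding at 10 pixels per second, giving a time to contact of about 1.1 s.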
Approaching a Wall
Balance Strategy
  Simple obstacle avoidance strategy
    compute the flow field
    compute the average magnitude of flow in each hemi-field
    turn away from the side with the higher magnitude (since it has closer objects)
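The three steps of the balance strategy can be sketched as a single function. This assumes the flow field arrives as a list of samples (x, u, v), where x is the image column measured from the image center (negative means left) and (u, v) is the flow vector; the `gain` and the sign convention for the turn command (positive means turn left) are hypothetical choices.

```python
import math
from typing import Iterable, Tuple

FlowSample = Tuple[float, float, float]  # (x relative to image center, u, v)


def balance_turn(flow: Iterable[FlowSample], gain: float = 1.0) -> float:
    """Return a turn-rate command; positive means turn left, negative turn right."""
    left, right = [], []
    for x, u, v in flow:
        magnitude = math.hypot(u, v)
        # Split into hemi-fields; samples exactly at the center count as right.
        (left if x < 0 else right).append(magnitude)
    mean_left = sum(left) / len(left) if left else 0.0
    mean_right = sum(right) / len(right) if right else 0.0
    # Larger flow on one side means closer objects there, so steer the other way:
    # stronger right-side flow yields a positive (leftward) command.
    return gain * (mean_right - mean_left)
```

Using the plain difference of the two averages keeps the sketch short; a normalized ratio of the two sides would make the command less sensitive to overall speed.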
Balance Strategy in Real Life
Crystal Space
Next Steps
  Extend the RL architecture to include model learning and planning
  Apply RL techniques to tune parameters in optical flow
  Build topological maps using visual information
  Build a highly complex simulated environment
  Integrate planning and learning in a multi-layer system