How to Win a Chinese Chess Game


Transcripts
Slide 1

How to Win a Chinese Chess Game: Reinforcement Learning. Cheng, Wen Ju

Slide 2

Set Up RIVER

Slide 3

General

Slide 4

Guard

Slide 5

Minister

Slide 6

Rook

Slide 7

Knight

Slide 8

Cannon

Slide 9

Pawn

Slide 10

Training: How long does it take for a human? How long does it take for a computer? The chess program "KnightCap" used TD to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating (the level of US Master; the best players in the world are rated around 2900) in just 308 games and 3 days of play.

Slide 11

Training: play a series of games in a self-play learning mode using temporal difference learning. The goal is to learn some simple strategies: piece values, or weights.

Slide 12

Why Temporal Difference Learning: the average branching factor of the game tree is usually around 30; the average game lasts around 100 ply; the size of the game tree is therefore about $30^{100}$.
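
As a rough check on the scale quoted above, the following Python sketch computes the tree size exactly. The branching factor and game length are the approximate figures from the slide, and the function name is only illustrative.

```python
# Approximate figures from the slide: branching factor ~30, game length ~100 ply.
BRANCHING_FACTOR = 30
GAME_LENGTH_PLY = 100


def game_tree_size(branching: int, depth: int) -> int:
    """Number of leaf positions in a uniform game tree of the given depth."""
    return branching ** depth


size = game_tree_size(BRANCHING_FACTOR, GAME_LENGTH_PLY)
digits = len(str(size))  # number of decimal digits in 30^100
print(f"30^100 has {digits} decimal digits")
```

A number of this size rules out exhaustive search, which is the motivation for learning an evaluation function and searching only a few ply ahead.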

Slide 13

Searching: alpha-beta search; 3-ply search vs. 4-ply search; horizon effect; quiescence cutoff search.
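
A minimal sketch of depth-limited alpha-beta search, run on a toy tree rather than a Chinese chess board. The tree encoding (nested lists with numeric leaves) and the helper names `children` and `evaluate` are assumptions made for illustration; the slides do not describe the actual implementation.

```python
import math


def alpha_beta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Depth-limited minimax value of `node` with alpha-beta pruning."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, alpha_beta(child, depth - 1, alpha, beta,
                                        False, evaluate, children))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing player will avoid this branch
                break
        return best
    best = math.inf
    for child in kids:
        best = min(best, alpha_beta(child, depth - 1, alpha, beta,
                                    True, evaluate, children))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing player will avoid this branch
            break
    return best


# Toy tree (hypothetical): inner nodes are lists, leaves are numeric scores.
def children(node):
    return node if isinstance(node, list) else []


def evaluate(node):
    # Leaves carry their own score; cut-off inner nodes score 0 in this toy.
    return node if not isinstance(node, list) else 0


toy_tree = [[3, 5], [2, 9], [0, 1]]
value = alpha_beta(toy_tree, 2, -math.inf, math.inf, True, evaluate, children)
print(value)
```

The `depth` argument plays the role of the 3-ply or 4-ply limit mentioned on the slide. Quiescence search, which extends the search past the limit in unstable positions, is not shown here.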

Slide 14

Horizon Effect t t+1 t+2 t+3

Slide 15

Evaluation Function: a feature is a property of the game. Feature evaluators: Rook, Knight, Cannon, Minister, Guard, and Pawn. Weight: the value of a specific piece type. Feature function $f$: returns the current player's piece advantage on a scale from -1 to 1. Evaluation function: $Y = \sum_{k=1}^{7} w_k f_k$
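
The weighted sum above can be sketched in Python as follows. The slides do not say how the piece advantage is scaled to the range -1 to 1; dividing the difference in piece counts by the starting count of that piece type is an assumption made here, and the names `START_COUNTS`, `piece_advantage`, and `evaluate` are hypothetical.

```python
# Starting number of each piece type per side in Chinese chess.
START_COUNTS = {
    "General": 1, "Guard": 2, "Minister": 2,
    "Rook": 2, "Knight": 2, "Cannon": 2, "Pawn": 5,
}


def piece_advantage(own, opp, piece):
    """Feature f_k in [-1, 1]: assumed to be the count difference / starting count."""
    return (own.get(piece, 0) - opp.get(piece, 0)) / START_COUNTS[piece]


def evaluate(weights, own, opp):
    """Evaluation Y = sum over the 7 piece types of w_k * f_k."""
    return sum(weights[p] * piece_advantage(own, opp, p) for p in START_COUNTS)


# Example: all weights 1, and the opponent has lost one Rook.
weights = {p: 1.0 for p in START_COUNTS}
own = dict(START_COUNTS)
opp = dict(START_COUNTS, Rook=1)
y = evaluate(weights, own, opp)
print(y)
```

With equal material every feature is 0, so the evaluation is 0 regardless of the weights; only material imbalances contribute.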

Slide 16

TD(λ) and Updating the Weights

$$w_{i,t+1} = w_{i,t} + \alpha (Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_{w_i} Y_k$$

$$w_{i,t+1} = w_{i,t} + \alpha (Y_{t+1} - Y_t) \left( f_{i,t} + \lambda f_{i,t-1} + \lambda^2 f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1} \right)$$

$\alpha = 0.01$: learning rate, how quickly the weights can change

$\lambda = 0.01$: feedback coefficient, how much to discount past values
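
A minimal sketch of this update rule in Python, assuming the features of every earlier position in the game are kept in a list. The function name and the data layout are illustrative only; the defaults are the slide's values for the learning rate and the feedback coefficient.

```python
def td_lambda_update(weights, feature_history, y_t, y_next, alpha=0.01, lam=0.01):
    """Return the updated weights w_{i,t+1}.

    weights:          current weights w_{i,t}, one per feature
    feature_history:  feature vectors f_1 .. f_t, oldest first
    y_t, y_next:      evaluations Y_t and Y_{t+1}
    """
    t = len(feature_history)
    td_error = y_next - y_t
    new_weights = []
    for i, w in enumerate(weights):
        # f_{i,t} + lam * f_{i,t-1} + ... + lam^(t-1) * f_{i,1}
        trace = sum(lam ** (t - k) * feature_history[k - 1][i]
                    for k in range(1, t + 1))
        new_weights.append(w + alpha * td_error * trace)
    return new_weights


# Example with two features and two past positions.
updated = td_lambda_update(
    weights=[1.0, 1.0],
    feature_history=[[0.5, 0.0], [1.0, -0.5]],
    y_t=0.5,
    y_next=1.0,
)
print(updated)
```

Because the evaluation is linear in the weights, the gradient of $Y_k$ with respect to $w_i$ is simply the feature value $f_{i,k}$, which is why the second form of the equation contains only features.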

Slide 17

Features Table Array of Weights

Slide 18

Example t=5 t=6 t=7 t=8

Slide 19

Final Reward. Loser: if the game is a draw, the final reward is 0; if the board evaluation is negative, the final reward is 2 times the board evaluation; if the board evaluation is positive, the final reward is -2 times the board evaluation. Winner: if the game is a draw, the final reward is 0; if the board evaluation is negative, the final reward is -2 times the board evaluation; if the board evaluation is positive, the final reward is 2 times the board evaluation.
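
The six cases above reduce to one rule: the reward has magnitude twice the absolute board evaluation, positive for the winner and negative for the loser. A small Python sketch, with a hypothetical function name and string labels for the outcome:

```python
def final_reward(board_eval: float, outcome: str) -> float:
    """Final reward for one player; outcome is 'win', 'lose', or 'draw'."""
    if outcome == "draw":
        return 0.0
    magnitude = 2.0 * abs(board_eval)
    # The winner is always rewarded, the loser always penalized,
    # whatever the sign of that player's last board evaluation.
    return magnitude if outcome == "win" else -magnitude


print(final_reward(-0.25, "lose"))  # negative evaluation, loser: 2 * (-0.25)
print(final_reward(0.25, "lose"))   # positive evaluation, loser: -2 * 0.25
print(final_reward(-0.25, "win"))   # negative evaluation, winner: -2 * (-0.25)
```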

Slide 20

Final Reward: the weights are normalized by dividing by the greatest weight; any negative weights are set to zero; the most valuable piece has weight 1.
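
A minimal Python sketch of this normalization step. Storing the weights in a dictionary keyed by piece name is an assumption, and the sample values are invented for the example.

```python
def normalize_weights(weights):
    """Clamp negative weights to 0, then divide by the greatest weight."""
    clipped = {piece: max(w, 0.0) for piece, w in weights.items()}
    top = max(clipped.values())
    if top == 0.0:
        return clipped  # nothing positive to scale by
    return {piece: w / top for piece, w in clipped.items()}


# Invented sample values, not learned weights from the slides.
sample = {"Rook": 2.0, "Knight": 1.0, "Pawn": -0.5}
normalized = normalize_weights(sample)
print(normalized)
```

Clamping before or after the division gives the same result whenever at least one weight is positive, so the order chosen here is only a convenience.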

Slide 21

Summary of Main Events. (1) Red's turn: update weights for Red using TD(λ); Red does alpha-beta search; Red executes the best move found. (2) Blue's turn: update weights for Blue using TD(λ); Blue does alpha-beta search; Blue executes the best move found (go to 1).
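
The turn cycle above can be sketched as a loop that alternates sides. Every callback here (`is_over`, `update_weights`, `choose_move`, `apply_move`) is a hypothetical placeholder for the TD(λ) update, the alpha-beta search, and the game rules; the toy game at the end exists only to make the sketch runnable.

```python
def play_game(state, is_over, update_weights, choose_move, apply_move,
              max_plies=1000):
    """Alternate Red and Blue turns until the game ends or max_plies is hit."""
    side = "Red"
    plies = 0
    while not is_over(state) and plies < max_plies:
        update_weights(side, state)      # TD(lambda) update for the side to move
        move = choose_move(side, state)  # stand-in for alpha-beta search
        state = apply_move(state, move)  # execute the best move found
        side = "Blue" if side == "Red" else "Red"
        plies += 1
    return state, plies


# Toy game: the state is a counter and each move subtracts 1 until it hits 0.
turn_log = []
final_state, plies = play_game(
    state=5,
    is_over=lambda s: s == 0,
    update_weights=lambda side, s: turn_log.append(side),
    choose_move=lambda side, s: 1,
    apply_move=lambda s, move: s - move,
)
print(final_state, plies, turn_log)
```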

Slide 22

After the Game Ends: calculate and assign the final reward for the losing player; calculate and assign the final reward for the winning player; normalize the weights between 0 and 1.

Slide 23

Results: a 10-game series and a 100-game series. Learned weights are carried over into the next series. Training began with all weights initialized to 1. The goal is to learn piece values that are close to the default values defined by H.T. Lau, or even better.

Slide 24

Observed Behavior: in the early stages, the program played fairly randomly; after 20 games, it had identified the most valuable piece, the Rook; after 250 games, it played better, protecting its valuable pieces and trying to capture the opponent's valuable pieces.

Slide 25

Weights

Slide 26

Testing: self-play games. Red played using the weights learned after 250 games; Blue used H.T. Lau's equivalent of the weights. Over 5 games, Red won 3, Blue won 1, and 1 was a draw.

Slide 27

Future Works: 8 different types or "categories" of features: Piece Values, Comparative Piece Advantage, Mobility, Board Position, Piece Proximity, Time Value of Pieces, Piece Combinations, Piece Configurations.

Slide 28

Examples

Slide 29

Cannon behind Knight

Slide 30

Conclusion: Computer Chinese chess has been studied for over a quarter of a century. Recently, thanks to advances in AI research and improvements in computer hardware in both efficiency and capacity, some Chinese chess programs at a very strong expert level (around 6-dan in Taiwan) have been successfully developed. Professor Shun-Chin Hsu of Chang-Jung University (CJU), who has long been involved in the development of computer Chinese chess programs, points out that "the strength of Chinese chess programs increases 1-dan like clockwork." He also predicts that a computer program will beat the "world champion of Chinese chess" before 2012.

Slide 31

When and What: 2004 World Computer Chinese Chess Championship. Competition Dates: June 25-26, 2004. Prizes: (1) First Place: USD 1,500 and a gold medal; (2) Second Place: USD 900 and a silver medal; (3) Third Place: USD 600 and a bronze medal; (4) Fourth Place: USD 300.

Slide 32

References
C. Szeto. Chinese Chess and Temporal Difference Learning.
J. Baxter. KnightCap: A chess program that learns by combining TD(λ) with minimax search.
T. Trinh. Temporal Difference Learning in Chinese Chess.
http://chess.ncku.edu.tw/index.html
