K-nearest neighbor methods.

Slide 1

K-nearest neighbor methods. William Cohen, 10-601, April 2008

Slide 2

But first…

Slide 3

Onward: multivariate linear regression. Each column is a feature, each row is an example. (Multivariate vs. univariate.)
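A minimal sketch of multivariate linear regression with rows as examples and columns as features (not from the slides; the toy data and function names are made up for illustration):

```python
# Multivariate linear regression via least squares.
# Rows of X are examples; columns are features.
import numpy as np

def fit_linear_regression(X, y):
    """Return weights w (last entry is the intercept) minimizing ||Xw - y||^2."""
    # Append a column of ones so the model learns an intercept term.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# Toy data lying exactly on the plane y = 2*x0 - x1 + 3.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2 * X[:, 0] - X[:, 1] + 3
w = fit_linear_regression(X, y)
print(np.round(w, 6))  # recovers the generating coefficients
```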

Slide 4


Slide 7

ACM Computing Surveys 2002

Slide 9

Review of K-NN methods (so far)

Slide 10

Kernel regression, a.k.a. locally weighted regression, locally linear regression, LOESS, … What does making the kernel wider do to bias and variance?
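A sketch of kernel regression (Nadaraya-Watson form, with a Gaussian kernel assumed; the data is made up) that illustrates the bias/variance question on this slide: a wider kernel averages over more points, lowering variance but raising bias.

```python
# Kernel regression: predict y at a query point as a kernel-weighted
# average of the training targets. h is the kernel bandwidth ("width").
import math

def kernel_regress(x_query, xs, ys, h=1.0):
    weights = [math.exp(-((x_query - x) ** 2) / (2 * h * h)) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]            # y = x^2
print(kernel_regress(2.0, xs, ys, h=0.2))  # narrow kernel: close to the local value 4
print(kernel_regress(2.0, xs, ys, h=5.0))  # wide kernel: pulled toward the global mean
```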

Slide 11

BellCore's MovieRecommender. Participants sent email to videos@bellcore.com. The system replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset needed to be rated. A new participant P sends in rated movies via email. The system compares P's ratings to the ratings of (a random sample of) previous users. The most similar users are used to predict scores for unrated movies (more on this later). The system returns recommendations in an email message.

Slide 12

Suggested Videos for: John A. Jamus.
Your must-see list, with predicted ratings: 7.0 "Alien (1979)", 6.5 "Blade Runner", 6.2 "Close Encounters Of The Third Kind (1977)".
Your video categories, with average ratings: 6.7 "Action/Adventure", 6.5 "Science Fiction/Fantasy", 6.3 "Children/Family", 6.0 "Mystery/Suspense", 5.9 "Comedy", 5.8 "Drama".

Slide 13

The viewing patterns of 243 viewers were consulted. The patterns of 7 viewers were found to be most similar. Correlation with target viewer:
0.59 viewer-130 (unlisted@merl.com)
0.55 bullert, jane r (bullert@cc.bellcore.com)
0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
0.42 rskt (rskt@cc.bellcore.com)
0.41 kkgg (kkgg@Athena.MIT.EDU)
0.41 bnn (bnn@cc.bellcore.com)
By category, their joint ratings recommend:
Action/Adventure: "Excalibur" 8.0 (4 viewers), "Apocalypse Now" 7.2 (4 viewers), "Platoon" 8.3 (3 viewers)
Science Fiction/Fantasy: "Total Recall" 7.2 (5 viewers)
Children/Family: "Wizard Of Oz, The" 8.5 (4 viewers), "Mary Poppins" 7.7 (3 viewers)
Mystery/Suspense: "Silence Of The Lambs, The" 9.3 (3 viewers)
Comedy: "National Lampoon's Animal House" 7.5 (4 viewers), "Driving Miss Daisy" 7.5 (4 viewers), "Hannah and Her Sisters" 8.0 (3 viewers)
Drama: "It's A Wonderful Life" 8.0 (5 viewers), "Dead Poets Society" 7.0 (5 viewers), "Rain Man" 7.5 (4 viewers)
Correlation of predicted ratings with your actual ratings is: 0.64. This number measures the ability to evaluate movies accurately for you. 0.15 means low ability, 0.85 means good ability, 0.50 means fair ability.

Slide 14

Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al., UAI '98). Let v_{i,j} be the vote of user i on item j, and I_i the set of items for which user i has voted. The mean vote for user i is v̄_i = (1/|I_i|) Σ_{j ∈ I_i} v_{i,j}. The predicted vote for the "active user" a on item j is a weighted sum over the n most similar users, p_{a,j} = v̄_a + κ Σ_{i=1}^{n} w(a,i) (v_{i,j} − v̄_i), where the w(a,i) are the weights of the n similar users and κ is a normalizer (e.g., κ = 1 / Σ_i |w(a,i)|).
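A sketch of this memory-based prediction rule: the active user's predicted vote is their mean vote plus a normalized, similarity-weighted sum of other users' deviations from their own means. The weight here is Pearson correlation over co-rated items (one of the options in Breese et al.); the rating data is made up for illustration.

```python
# Memory-based collaborative filtering (Breese et al.-style prediction).
import math

votes = {  # user -> {item: vote}; hypothetical data
    "a":  {"m1": 5, "m2": 3, "m3": 4},
    "u1": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "u2": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}

def mean_vote(u):
    v = votes[u]
    return sum(v.values()) / len(v)

def pearson(a, i):
    """Similarity w(a, i): correlation over the items both users rated."""
    common = set(votes[a]) & set(votes[i])
    if len(common) < 2:
        return 0.0
    ma, mi = mean_vote(a), mean_vote(i)
    num = sum((votes[a][j] - ma) * (votes[i][j] - mi) for j in common)
    da = math.sqrt(sum((votes[a][j] - ma) ** 2 for j in common))
    di = math.sqrt(sum((votes[i][j] - mi) ** 2 for j in common))
    return num / (da * di) if da and di else 0.0

def predict_vote(a, item):
    """p_{a,item} = mean(a) + normalized weighted sum of deviations."""
    others = [u for u in votes if u != a and item in votes[u]]
    weights = {u: pearson(a, u) for u in others}
    norm = sum(abs(w) for w in weights.values())
    if norm == 0:
        return mean_vote(a)
    return mean_vote(a) + sum(
        w * (votes[u][item] - mean_vote(u)) for u, w in weights.items()
    ) / norm

print(round(predict_vote("a", "m4"), 2))  # above a's mean: u1 agrees with a, u2 anti-agrees
```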

Slide 15

Basic k-nearest neighbor classification. Training method: save the training examples. At prediction time: find the k training examples (x_1, y_1), …, (x_k, y_k) that are closest to the test example x; predict the most frequent class among those y_i's. Example: http://cgm.cs.mcgill.ca/~soss/cs644/ventures/simard/
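A minimal sketch of this procedure (Euclidean distance assumed, made-up data): store the training set, and at prediction time take a majority vote among the k nearest examples.

```python
# Basic k-NN classification: majority vote of the k closest training examples.
from collections import Counter
import math

def knn_predict(train, x, k=3):
    """train: list of (point, label) pairs; x: query point."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0, 0), "o"), ((1, 0), "o"), ((0, 1), "o"),
         ((5, 5), "+"), ((6, 5), "+"), ((5, 6), "+")]
print(knn_predict(train, (0.5, 0.5)))  # near the "o" cluster
print(knn_predict(train, (5.5, 5.5)))  # near the "+" cluster
```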

Slide 16

What is the decision boundary? Voronoi diagram

Slide 17

Convergence of 1-NN. [Figure: a test point x with label y drawn from P(Y|x), and its nearest neighbor x_1 with label y_1 drawn from P(Y|x_1).] Assume P(Y|x_1) ≈ P(Y|x) (the two become equal as the training set grows), and let y* = argmax_y Pr(y|x).
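The argument this slide sketches is the classical Cover-Hart bound; in the binary case it can be written out as follows (my reconstruction, using the slide's assumption that the neighbor's class distribution matches the query's):

```latex
% As the training set grows, the nearest neighbor x_1 of x converges to x,
% so assume P(Y|x_1) \approx P(Y|x). Let p = \Pr(y^* \mid x) with
% y^* = \arg\max_y \Pr(y \mid x). The 1-NN error at x is the chance that
% the query's label and its neighbor's label disagree:
\begin{align*}
\mathrm{err}_{\text{1-NN}}(x)
  &= \sum_{y} \Pr(y \mid x)\,\bigl(1 - \Pr(y \mid x_1)\bigr)
   \;\approx\; 2p(1-p) \\
  &\le 2(1-p) \;=\; 2\,\mathrm{err}_{\text{Bayes}}(x),
\end{align*}
% i.e., asymptotically 1-NN makes at most twice the Bayes-optimal error.
```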

Slide 18

Basic k-nearest neighbor classification. Training method: save the training examples. At prediction time: find the k training examples (x_1, y_1), …, (x_k, y_k) that are closest to the test example x; predict the most frequent class among those y_i's. Variations: weighting the examples in the neighborhood; measuring "closeness"; finding "near" examples in a large training set quickly.

Slide 19

K-NN and irrelevant features. [Figure: labeled examples + and o along one dimension, with a query point ?]

Slide 20

K-NN and irrelevant features. [Figure: the same examples with an irrelevant dimension added, scattering the + and o points around the query ?]

Slide 21

K-NN and irrelevant features. [Figure: + and o examples spread further by the irrelevant dimension, changing which points are nearest the query ?]

Slide 22

Ways of rescaling for KNN. Normalized L1 distance: d(x, x') = Σ_i |x_i − x'_i| / (max_i − min_i). Scale by IG: weight each feature's contribution by its information gain with the class. Modified value difference metric (for symbolic features): d(v, v') = Σ_c |P(c | x_i = v) − P(c | x_i = v')|.
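A small sketch of the first two ideas on this slide: an L1 distance normalized per feature by its range, optionally multiplied by per-feature weights such as information gains (the weights below are made up for illustration).

```python
# Range-normalized, optionally weighted L1 distance for k-NN.
def scaled_l1(x, y, ranges, weights=None):
    """ranges[i] = max_i - min_i for feature i; weights e.g. information gains."""
    weights = weights or [1.0] * len(x)
    return sum(
        w * abs(a - b) / r
        for a, b, r, w in zip(x, y, ranges, weights)
    )

# Feature 1 spans 0-100, feature 2 spans 0-1; after normalization a
# 10-unit gap and a 0.1-unit gap contribute equally.
ranges = [100.0, 1.0]
print(scaled_l1((50, 0.5), (60, 0.6), ranges))
print(scaled_l1((50, 0.5), (60, 0.6), ranges, weights=[0.9, 0.1]))
```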

Slide 23

Ways of rescaling for KNN. Dot product: x · x' = Σ_i x_i x'_i. Cosine distance: (x · x') / (||x|| ||x'||). TF-IDF weights for text: for doc j and term i, x_i = tf_{i,j} · idf_i, where tf_{i,j} = #occurrences of term i in doc j and idf_i = log(#docs in corpus / #docs in corpus that contain term i).
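A sketch of this TF-IDF weighting and cosine comparison on a tiny made-up corpus (the log base and the exact idf variant are assumptions; several conventions exist):

```python
# TF-IDF vectors (x_i = tf * log(N / df)) compared by cosine similarity.
import math
from collections import Counter

docs = [["kernel", "regression", "bias"],
        ["nearest", "neighbor", "bias"],
        ["nearest", "neighbor", "kernel"]]

def tfidf(doc, docs):
    n = len(docs)
    vec = {}
    for term, tf in Counter(doc).items():
        df = sum(term in d for d in docs)       # docs containing the term
        vec[term] = tf * math.log(n / df)
    return vec

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = [tfidf(d, docs) for d in docs]
print(round(cosine(vecs[1], vecs[2]), 3))  # docs 1 and 2 share two terms
print(round(cosine(vecs[0], vecs[1]), 3))  # docs 0 and 1 share only one
```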

Slide 24

Combining distances to neighbors. Standard KNN: each of the k nearest neighbors casts one vote, ŷ = argmax_c |{i : y_i = c}|. Distance-weighted KNN: each neighbor votes with a weight that decays with distance, e.g. w_i = 1 / d(x, x_i)^2, and ŷ = argmax_c Σ_{i : y_i = c} w_i.
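A sketch of the distance-weighted variant (1/d² weighting assumed, made-up data): a single very close neighbor can outvote two distant ones, which plain majority voting cannot do.

```python
# Distance-weighted k-NN: neighbors vote with weight 1/d^2.
import math
from collections import defaultdict

def weighted_knn(train, x, k=3):
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    scores = defaultdict(float)
    for pt, label in neighbors:
        d = math.dist(pt, x)
        scores[label] += 1.0 / (d * d + 1e-9)  # epsilon avoids division by zero
    return max(scores, key=scores.get)

train = [((0, 0), "o"), ((4, 4), "+"), ((4.2, 4.0), "+")]
# Plain 3-NN would vote "+" 2-1; under 1/d^2 weighting the nearby "o" wins.
print(weighted_knn(train, (0.5, 0.5)))
```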

Slide 27

William W. Cohen & Haym Hirsh (1998): Joins that Generalize: Text Classification Using WHIRL. In KDD 1998: 169-173.

Slide 30

Vitor Carvalho and William W. Cohen (2008): Ranking Users for Intelligent Message Addressing. In ECIR 2008; and current work with Vitor, me, and Ramnath Balasubramanyan.

Slide 31

Computing KNN: advantages and disadvantages. Storage: all training examples are kept in memory; a decision tree or linear classifier is much smaller. Time: to classify x, you need to loop over all training examples (x', y') to compute the distance between x and x'. On the other hand, you get predictions for every class y, so KNN is convenient when there are many classes. In fact, there are several tricks to speed this up… especially when the data is sparse (e.g., text).

Slide 32

Efficiently implementing KNN (for text). IDF is nice computationally.
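One reason sparse text makes KNN efficient (my sketch of the kind of trick the slide alludes to): with sparse TF-IDF vectors, a document's dot product with the query is nonzero only if they share a term, so an inverted index (term → postings) lets you score only those documents instead of looping over the whole training set.

```python
# Inverted index for sparse dot-product scoring of KNN candidates.
from collections import defaultdict

def build_index(doc_vectors):
    """doc_vectors: list of sparse {term: weight} vectors."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def dot_scores(index, query_vec):
    """Score only documents sharing at least one term with the query."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, w in index.get(term, []):
            scores[doc_id] += qw * w
    return scores

docs = [{"kernel": 0.4, "bias": 0.4},
        {"nearest": 0.4, "neighbor": 0.4},
        {"neighbor": 0.4, "kernel": 0.4}]
index = build_index(docs)
print(dict(dot_scores(index, {"kernel": 1.0})))  # doc 1 is never touched
```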

Slide 33

Tricks with fast KNN: k-means using r-NN. 1. Pick k points c_1 = x_1, …, c_k = x_k as centers. 2. For each c_i, find D_i = Neighborhood(c_i). 3. For each c_i, let c_i = mean(D_i). 4. Go to step 2…
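The loop on this slide is k-means; a minimal sketch (here the "neighborhood" of a center is taken to be the points nearest to it, an assumption, and the data is made up):

```python
# k-means: alternate nearest-center neighborhoods and mean updates.
import math

def kmeans(points, k, iters=10):
    centers = points[:k]                         # step 1: initial centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                         # step 2: neighborhoods
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        centers = [                              # step 3: centers <- means
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers                               # step 4: (loop repeats)

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(sorted(kmeans(pts, 2)))  # one center per cluster
```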

Slide 34

Efficiently implementing KNN. Selective classification: given a training set and a test set, find the N test cases that you can most confidently classify. [Figure: test documents d_j2, d_j3, d_j4 near labeled training points]

Slide 35

Train once, and select 100 test cases to classify.
