Windows Live Image Search


Transcripts
Slide 1

Windows Live Image Search
Hugh Williams
Senior Software Design Engineer, Windows Live Search
Microsoft Corporation

Slide 2

Overview
- Windows Live Image Search
- Problem Definition and Background
- User Interface
- Architecture
- Why is it a beta?
- Questions?

Slide 3

Introduction
- Windows Live Image Search is new:
  - Released in beta form on March 8, 2006
  - Architected, designed, and engineered in Redmond
  - Close cousin of MSN/Windows Live web search
- Microsoft's image search is available only at Windows Live
  - The MSN Image Search solution is provided by a third party
- Strong relationship between the Windows Live Search product team and:
  - Microsoft Research, Cambridge UK
  - Microsoft Research, Asia (Beijing, China)
  - Microsoft Research, Redmond

Slide 4

Problem Definition
- Find thumbnail images using a text query
  - There are no web-scale image search engines based on content-based image retrieval (CBIR)
  - All modern image search engines share fundamentals with AltaVista's original PhotoFinder (1998)
- The thumbnail images represent pages "containing" the original image
- We crawl web pages and images
  - More than a billion images
  - Pages and images continually refreshed
  - Large numbers of images enter and leave the collection daily
  - More later…

Slide 5

Queries
- From an MSN Search sample drawn from one month:
  - Most frequent query: 65,000+ occurrences
  - Median: 2 occurrences
  - Most queries are 1 to 3 words long
- Most popular queries: lindsay lohan, scarlett johansson, angelina jolie, sex, jessica simpson, kate beckinsale, paris hilton, britney spears, shakira, hot, jessica alba, jennifer lopez
- Random queries: link, rodolfo font, playboy, douwe egberts, jesus, tanning, beauty, oakenfold, priyanka chopra, actors
- Around 60 of the top 100 queries are adult or celebrity
- Other popular categories are places, animals, or objects

Slide 6

More On Queries…
- In the US, around 10% are spelling errors
  - Fewer in some languages, more in others
- Word-form variants are extremely common
  - Tom's Diner, Toms Diner, Tom Diner
- Lots of weirdness:
  - Math.abs
  - 3/4" Ply
  - 103,5 versus 103.5
  - www cnn.com
  - Every possible spelling of "Britney"
  - Navigational queries

Slide 7

Thumbnail Results

Slide 8

Thumbnail Clickthrough

Slide 9

How Users Click Through
- Around 75% of web search result click-throughs are on page one
- For image search it is 43%, and the 75% threshold in image search is reached around page eight

Slide 10

Searching And Ranking
- Our ranking process matches queries to documents
- So, what is a document?
  - We refer to our documents as nodules
  - A nodule is created for each relationship between an HTML document and an image (where we have retrieved both)
  - The alternative is a nodule per image, or a nodule per page
- A nodule typically contains:
  - The thumbnail of the image
  - Text and headers from the HTML page
  - Image metadata
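The nodule definition above can be sketched as a small data structure. This is only an illustration, not the product's schema: the class name, field names, and the `build_nodules` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Nodule:
    """One (HTML page, image) relationship. All field names are illustrative."""
    page_url: str
    image_url: str
    thumbnail: bytes = b""                               # thumbnail of the image
    page_text: str = ""                                  # text and headers from the page
    image_metadata: dict = field(default_factory=dict)   # e.g. width, height


def build_nodules(page_images, fetched_pages, fetched_images):
    """Create one nodule per page-image relationship.

    page_images maps a page URL to the image URLs it references.
    A nodule is created only when both the page and the image were retrieved.
    """
    nodules = []
    for page_url, image_urls in page_images.items():
        if page_url not in fetched_pages:
            continue
        for image_url in image_urls:
            if image_url in fetched_images:
                nodules.append(Nodule(page_url, image_url))
    return nodules
```

Note that the same image referenced from two pages yields two nodules, which is the difference from the "nodule per image" alternative.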

Slide 11

Background: Ranking
- So, how do we rank?
- We rank using:
  - Static Rank: query-independent value
    - Image and page properties, web link analysis, junk page probability, and so on
  - Dynamic Rank: query-dependent value
    - TF-IDF, BM25, and so on
  - The overall rank is a combination of Static and Dynamic Rank
- Broad answer: we compute the similarity between selected nodules and a query, and order the results by decreasing similarity
- The selected nodules are those that contain all query terms (Boolean AND to find a filter set, then similarity-based ordering of the filter set)
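A minimal sketch of the two-step process described above: a Boolean AND builds the filter set, then the filter set is ordered by a combined score. The slide does not say how Static and Dynamic Rank are combined, so the linear blend and its `weight` parameter are assumptions, as are the function names. BM25 here is the common textbook form with the usual `k1` and `b` defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """docs maps a document id to its list of tokens. Returns id -> BM25 score."""
    n = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def rank(query, docs, static_rank, weight=0.5):
    """Boolean AND filter, then order by an assumed linear blend of ranks."""
    terms = query.lower().split()
    filter_set = [d for d, tokens in docs.items() if all(t in tokens for t in terms)]
    if not filter_set:
        return []
    dynamic = bm25_scores(terms, docs)
    combined = {
        d: weight * static_rank.get(d, 0.0) + (1 - weight) * dynamic[d]
        for d in filter_set
    }
    return sorted(combined, key=combined.get, reverse=True)
```

For example, with three toy nodules, `rank("tom diner", docs, static_rank)` returns only the ones containing both terms, best first.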

Slide 12

Algorithmic Search
- Traditional Information Retrieval focuses on:
  - Intelligence
  - Recall
  - Long queries
  - Well-formed documents
  - Small (low millions) index
- Image search focuses on:
  - Precision
  - Short queries
  - Poor documents
  - Billions of nodules in the index

Slide 13

Nodule Text
- Nodules represent the relationship between an HTML page and an image
- Nodule text includes elements such as:
  - The HTML page <title>
  - Text from the HTML page
    - Text from near the image is a good start…
  - ALT or anchor text from the image
- Images can be embedded in a page using the <img> tag or linked to using the <a> tag
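As a rough illustration of where nodule text can come from, the sketch below uses Python's standard `html.parser` to collect the page title, ALT text for embedded images, and anchor text for linked images. The class name, the extension list, and the decision to treat an `<a>` link as an image link by file extension are assumptions for the example, not the crawler's actual rules.

```python
from html.parser import HTMLParser

# Assumed list of image file extensions, for illustration only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class NoduleTextParser(HTMLParser):
    """Collects the <title>, <img> ALT text, and anchor text of image links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.embedded = []   # (src, alt) pairs from <img> tags
        self.linked = []     # (href, anchor text) pairs from <a> tags
        self._in_title = False
        self._href = None
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            self.embedded.append((attrs["src"], attrs.get("alt") or ""))
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self._href, self._anchor = href, []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.linked.append((self._href, " ".join(self._anchor)))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None and data.strip():
            self._anchor.append(data.strip())
```

Feeding a page through `parser.feed(html)` fills `title`, `embedded`, and `linked`. Text from near the image would need positional information that this sketch does not track.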

Slide 14

Table Parsing

Slide 15

Image Metadata
- Ranking uses text and image properties (the latter are exclusive to image search)
- These include:
  - AspectRatio (the ratio of the X dimension to the Y dimension)
  - Pixels (the product of the X and Y dimensions)
  - PhotoGraphic (whether an image is a photo or a graphic)
  - …

Slide 16

Aspect Ratio Extremes

Slide 17

Throwing Out Junk
- The Web is full of bullets, lines, and Amazon logos
- Right now, we ignore small images
  - Some we don't fetch (HTML width and height attributes help us), many we drop after fetching
- Junk properties help us in ranking:
  - We lower the rank of images with extreme aspect ratios
  - We lower the rank of images with few pixels
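The two junk properties can be sketched as a multiplicative down-weighting factor. The slides give no thresholds or formula, so `min_pixels`, `max_extremeness`, and the proportional penalty below are purely illustrative assumptions.

```python
def aspect_ratio(width, height):
    """Ratio of the X dimension to the Y dimension."""
    return width / height


def pixel_count(width, height):
    """Product of the X and Y dimensions."""
    return width * height


def junk_factor(width, height, min_pixels=2500, max_extremeness=4.0):
    """Return a factor in [0, 1]; lower values mean a lower rank.

    Thresholds are hypothetical. An image is penalised in proportion to how
    far it exceeds the aspect-ratio limit or falls short of the pixel limit.
    """
    if width <= 0 or height <= 0:
        return 0.0
    ratio = aspect_ratio(width, height)
    extremeness = max(ratio, 1.0 / ratio)   # treat tall and wide images alike
    factor = 1.0
    if extremeness > max_extremeness:
        factor *= max_extremeness / extremeness
    count = pixel_count(width, height)
    if count < min_pixels:
        factor *= count / min_pixels
    return factor
```

Under these assumed thresholds a 400x300 photo is not penalised, while a 468x60 banner and a 10x10 bullet both receive a factor below 1.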

Slide 18

Duplicates And Near Duplicates
- Duplication is problematic, particularly for logos, products, and posters
- We compute a hash of all images
- All but the highest-ranked exact duplicate are removed from the filter set at query time
- We are working on techniques for removing near duplicates
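Exact-duplicate removal at query time can be sketched as follows. The slide does not name the hash function; SHA-256 over the image bytes is an assumption, and the function names are hypothetical.

```python
import hashlib


def image_hash(image_bytes):
    """Hash of the raw image bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(image_bytes).hexdigest()


def drop_exact_duplicates(ranked):
    """ranked is a list of (nodule_id, image_hash) pairs, best result first.

    Keeps only the highest-ranked nodule for each distinct hash.
    """
    seen = set()
    kept = []
    for nodule_id, digest in ranked:
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(nodule_id)
    return kept
```

A byte-level hash only catches exact copies; a resized or recompressed logo hashes differently, which is why near duplicates need separate techniques.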

Slide 19

User Interface
- The Windows Live image search UI has five new features:
  - "Infinite scroll" or "smart scroll"
  - Thumbnail size slider
  - Film strip results view
  - Show full image
  - Metadata grow experience

Slide 20

Windows Live Image Search

Slide 21

Infinite Or Smart Scroll
- Results are presented in a single page
  - Removes others' paging model
  - Smooths the click curve
  - Improves browsability
- Motivated by click data
  - As discussed previously, only 43% of users stay on page one
  - Many sessions show deep click behaviors
- Same motivation for the thumbnail size slider

Slide 22

Other Features…
- Motivated and reinforced by usability studies
- Film Strip Results View:
  - Improve results navigation
  - Remove unnecessary click actions
  - Make it easy to find a page or image
- Show full image feature:
  - Helps find the original image
  - Particularly useful for <a> links
- Metadata grow
  - Most users don't use metadata
  - Reduce clutter, improve the browse experience

Slide 23

Architecture And Design
- Crawl and index over a billion nodules at regular intervals
- Crawl 750 nodules per second
- Answer queries in under 250ms, with most answered in under 50ms
- Serve several million queries per day
- Peak load of 150+ queries per second
- Serve 10,000+ thumbnails per second at peak
- Manage several petabytes of raw storage
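A quick back-of-the-envelope check relates the figures above to each other. The crawl rate and nodule count come from the slide; the daily query volume of 3 million is an assumed stand-in for "several million".

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_crawl(nodule_count, nodules_per_second):
    """Days needed to crawl nodule_count at a sustained rate."""
    return nodule_count / nodules_per_second / SECONDS_PER_DAY


def average_qps(queries_per_day):
    """Average queries per second implied by a daily volume."""
    return queries_per_day / SECONDS_PER_DAY


# One billion nodules at 750 per second takes a little over 15 days.
crawl_days = days_to_crawl(1_000_000_000, 750)

# An assumed 3 million queries per day averages about 35 per second,
# several times below the stated 150+ peak.
mean_qps = average_qps(3_000_000)
```

The gap between the average and the 150+ peak is why serving capacity has to be planned for peak load rather than daily volume.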

Slide 24

Architecture: Serving Queries

Slide 25

Architecture: Index Building

Slide 26

Indexing: Selection And Crawl
- The only way into Search is via our crawler
  - We used to have "paid inclusion" but abandoned it
  - Google doesn't have it, Yahoo! does
- Crawl is somewhat prioritized by Static Rank
- We crawl the top few billion pages
- Biggest problem with crawling: politeness

Slide 27

Distributed Searching I: Single Box
- Monolithic model (AltaVista, WebCrawler): the index goes on a single (big) box
- Advantages:
  - Easy to scale query volume: just buy more web server front ends and Big Boxes
  - Full visibility of results while ranking
- Disadvantages:
  - Hard to scale index size: limited by CPU and memory
  - Reliability

Slide 28

Distributed Searching II: Word-Striping
- Stripe the index by term across index servers
- Have a central box send the query terms to the appropriate servers
- Merge the results
- Advantages:
  - Only boxes that have answers get used per query
  - Full visibility of results while ranking
- Disadvantages:
  - Some boxes are likely to be more loaded than others
  - It turns out this creates significant network traffic
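A toy, in-process sketch of word-striping: each term's posting list lives on exactly one index server, and a central function fetches the lists for the query terms and intersects them. Assigning terms to servers by CRC32 is an assumption for the example, and the "servers" are plain dictionaries rather than networked machines.

```python
import zlib


def server_for(term, num_servers):
    """Pick the server that owns a term (CRC32 modulo is an assumed scheme)."""
    return zlib.crc32(term.encode("utf-8")) % num_servers


def build_word_striped(docs, num_servers):
    """docs maps document id -> tokens. Returns one {term: doc ids} dict per server."""
    servers = [dict() for _ in range(num_servers)]
    for doc_id, tokens in docs.items():
        for term in set(tokens):
            owner = servers[server_for(term, num_servers)]
            owner.setdefault(term, set()).add(doc_id)
    return servers


def search_word_striped(query_terms, servers):
    """Central box: fetch each term's posting list, then intersect them.

    Returns the matching document ids and the set of servers contacted.
    """
    postings = []
    contacted = set()
    for term in query_terms:
        server_id = server_for(term, len(servers))
        contacted.add(server_id)
        postings.append(servers[server_id].get(term, set()))
    matches = set.intersection(*postings) if postings else set()
    return matches, contacted
```

Only the servers owning the query terms are contacted, but whole posting lists travel to the central box before they can be intersected, which illustrates the network-traffic disadvantage.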

Slide 29

Distributed Searching III: Document Striping
- Stripe documents randomly across boxes
- Send the query to all boxes
- Merge the results from all boxes
- Advantages:
  - Scales with both index size and query traffic volume
  - Minimal network traffic, aggregation is easy
- Disadvantage: no visibility of all results while ranking
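Document striping can be sketched in the same toy style: documents are assigned to boxes at random, every box answers every query with its local top results, and the central box merges them. The term-count score is a placeholder chosen for brevity, not the ranking used in the product, and all names are hypothetical.

```python
import heapq
import random


def stripe_documents(docs, num_boxes, seed=0):
    """Assign each document to a random box (seeded for repeatability)."""
    rng = random.Random(seed)
    boxes = [dict() for _ in range(num_boxes)]
    for doc_id, tokens in docs.items():
        boxes[rng.randrange(num_boxes)][doc_id] = tokens
    return boxes


def search_box(box, terms, k):
    """Local search: Boolean AND, then the box's own top k by a placeholder score."""
    hits = []
    for doc_id, tokens in box.items():
        if all(term in tokens for term in terms):
            score = sum(tokens.count(term) for term in terms)
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def search_document_striped(boxes, terms, k):
    """Send the query to every box and merge the per-box top-k lists."""
    partial = [hit for box in boxes for hit in search_box(box, terms, k)]
    return [doc_id for _, doc_id in heapq.nlargest(k, partial)]
```

Each box returns at most k small (score, id) pairs, so the merge is cheap. The placeholder score depends only on the document itself. A score that needs collection-wide statistics would be computed from each box's partial view, which is the visibility limitation noted above.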

Slide 30

Why Is It A Beta?
- We are working on many features
  - Continuous improvement of ranking and relevance
  - Internationalization and accessibility
  - Scaling and reliability
  - Adult filtering
  - New, thought-leading features
- Many of these involve colleagues in Microsoft Research

Slide 31

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
