CS435/535: Internet-Scale Applications

Slide 1

CS435/535: Internet-Scale Applications
http://zoo.cs.yale.edu/classes/cs435/
1/12/2009

Slide 2

Outline What are Internet-scale applications? Organization

Slide 3

Outline What are Internet-scale applications?

Slide 4

Internet-Scale: Large Network

Slide 5

Growth of the Internet in Terms of Number of Hosts

Number of hosts on the Internet:
  Date        Hosts
  Aug. 1981   213
  Oct. 1984   1,024
  Dec. 1987   28,174
  Oct. 1990   313,000
  Jul. 1993   1,776,000
  Jul. 1996   19,540,000
  Jul. 1999   56,218,000
  Jul. 2004   285,139,000
  Jul. 2005   353,284,000
  Jul. 2007   489,774,000
  Jul. 2008   570,937,000
  Jul. 2009   681,064,000

[Figure: CAIDA router-level view of the Internet]
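The host counts above can be summarized as an average yearly growth factor. The following is a minimal sketch, assuming constant exponential growth between the first and last rows and treating each month as 1/12 of a year; the function name `annual_growth_factor` is illustrative and not from the slides.

```python
import math

# First and last rows of the host table.
# Month/12 is a rough approximation of the position within the year.
first_year, first_hosts = 1981 + 8 / 12, 213          # Aug. 1981
last_year, last_hosts = 2009 + 7 / 12, 681_064_000    # Jul. 2009


def annual_growth_factor(h0, h1, years):
    """Constant yearly multiplier that turns h0 into h1 over `years` years."""
    return (h1 / h0) ** (1 / years)


years = last_year - first_year
factor = annual_growth_factor(first_hosts, last_hosts, years)
# Time for the host count to double at that constant rate.
doubling_time = math.log(2) / math.log(factor)

print(f"average growth: about {100 * (factor - 1):.0f}% per year")
print(f"doubling time: about {doubling_time:.1f} years")
```

The real curve is not a single exponential (growth was faster in the early years), so the result is only a coarse average.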

Slide 6

Internet Physical Infrastructure
  The Internet is a network of heterogeneous networks
  Each separately administered network is called an Autonomous System (AS)
  [Figure labels: ISP; Backbone ISP; Residential access (cable, fiber, DSL, wireless); Campus access (e.g., Ethernet, wireless)]

Slide 7

Abilene I2 Backbone http://weathermap.grnoc.iu.edu/abilene_jpg.html

Slide 8

Qwest Backbone Map http://www.qwest.com/largebusiness/enterprisesolutions/networkMaps/preloader.swf

Slide 9

ATT Global Backbone IP Network From http://www.business.att.com

Slide 10

AT&T USA Backbone Map From AT&T site.

Slide 11

Internet Diameter

Slide 12

Internet-Scale: Large User Base

Slide 13

User Base of Large Internet Applications in U.S. (June 2009)
Total U.S. - Home, Work and University Locations

Rank  Property                            Unique Visitors (000)
-     Total Internet : Total Audience     193,896
1     Google Sites                        156,871
2     Yahoo! Destinations                 154,097
3     Microsoft Sites                     127,454
4     AOL LLC                             106,467
5     Fox Interactive Media               84,567
6     FACEBOOK.COM                        77,031
7     Ask Network                         73,041
8     eBay                                71,020
9     Amazon Sites                        63,178
10    Wikimedia Foundation Sites          60,692
11    Apple Inc.                          56,554
12    Glam Media                          54,223
13    Viacom Digital                      51,575
14    Turner Network                      50,841
15    CBS Interactive                     50,341
16    craigslist, inc.                    46,832
17    New York Times Digital              44,789
18    Weather Channel, The                41,751
19    Adobe Sites                         38,120
20    Comcast Corporation                 34,865
21    Verizon Communications Corporation  33,436
22    Wal-Mart                            33,358
23    AT&T Interactive Network            31,582
24    Disney Online                       31,362
25    Demand Media                        28,938
26    Superpages.com Network              28,367
27    Expedia Inc                         27,058
28    The Mozilla Organization            26,964
29    Target Corporation                  26,284
30    WordPress                           26,245
31    Answers.com Sites                   26,163
32    Bank of America                     25,479
33    Photobucket.com LLC                 24,528
34    AT&T, Inc.                          24,032
35    Gorilla Nation                      24,022
36    United Online, Inc                  22,828
37    Everyday Health                     22,374
38    Break Media                         22,334
39    CareerBuilder LLC                   21,704
40    NBC Universal                       21,202
41    ESPN                                20,984
42    NetShelter Technology Media         20,635
43    iVillage.com: The Womens Network    20,594
44    Weatherbug Property                 20,465
45    JPMorgan Chase Property             20,211
46    TWITTER.COM*                        20,111
47    Real.com Network                    19,918
48    EA Online                           19,607
49    Gannett Sites                       19,298
50    Time Warner - Excluding AOL         19,293

Source: comScore Media Metrix (http://ir.comscore.com/releasedetail.cfm?releaseid=398136)

Slide 14

Internet-Scale: Can Be Data/Processing Intensive

Slide 15

How Much Data?
  1 PB = 1000 TB
  1 EB = 1000 PB

Slide 16

How Much Data?
  Wayback Machine has 2 PB + 20 TB/month (2006)
  NOAA has ~1 PB of climate data (2007)
  Google processes 20 PB a day (2008)
  Internet traffic: 5-8 EB (Dec. 2008)
  Size of the world's digital content: 500 EB (May 2009)
  "640K ought to be enough for anybody."
  1 PB = 1000 TB; 1 EB = 1000 PB
  http://en.wikipedia.org/wiki/Exabyte
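To get a feel for these magnitudes, the figures can be converted with the decimal units given on the slide. This is a small illustrative sketch; the helper names and the 3 PB target for the Wayback Machine are assumptions chosen for the example, not numbers from the slides.

```python
# Decimal units from the slide: 1 PB = 1000 TB, 1 EB = 1000 PB.
TB_PER_PB = 1000
PB_PER_EB = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def pb_per_day_to_tb_per_second(pb_per_day):
    """Convert a daily volume in PB into a sustained rate in TB/s."""
    return pb_per_day * TB_PER_PB / SECONDS_PER_DAY


def months_to_grow(start_pb, growth_tb_per_month, target_pb):
    """Months needed to grow from start_pb to target_pb at a fixed monthly rate."""
    return (target_pb - start_pb) * TB_PER_PB / growth_tb_per_month


# Google: 20 PB a day (2008), expressed as a sustained rate.
google_rate = pb_per_day_to_tb_per_second(20)

# Wayback Machine: 2 PB growing by 20 TB/month (2006); 3 PB is a hypothetical target.
wayback_months = months_to_grow(2, 20, 3)

print(f"Google: about {google_rate:.2f} TB/s sustained")
print(f"Wayback Machine: {wayback_months:.0f} months to add 1 PB")
```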

Slide 17

Processing Examples
  Crawling, indexing, searching, mining the Web
  E-commerce transactions
  Software as a service
  …

Slide 18

Internet-Scale: Large System Scale

Slide 19

Servers
  Internet-scale problem? Throw more machines at it!
  From small end users (called P2P)
  From giant data centers (called data center applications)

Slide 20

Large Data Centers
  A trend: centralization of computing resources in large data centers
  Necessary ingredients: fiber, juice (power), and space
  What do Oregon, Iceland, and abandoned mines have in common?
  Important design point: scale out, not scale up

Slide 21

Source: Harper's (Feb. 2008)

Slide 22

Maximilien Brice, © CERN

Slide 23

Internet-Scale: Evolving Computing Model

Slide 24

Evolving Computing Models
  Do it yourself (build your own data centers)
  Utility computing
    Why buy machines when you can rent cycles?
    Examples: Amazon's EC2, GoGrid, AppNexus
  Platform as a Service (PaaS)
    Give me a nice API and take care of the implementation
    Example: Google App Engine
  Software as a Service (SaaS)
    Just run it for me!
    Examples: Gmail; MS Exchange; MS Office Online

Slide 25

Internet-Scale: Likely Web-Based

Slide 26

Web-based Applications
  The Internet infrastructure has better support for HTTP than for other protocols
  A trend of software applications: from the desktop to the browser
    SaaS == Web-based applications
    Examples: Google Maps/Docs, Facebook
  How do we deliver highly interactive Web-based applications?
    AJAX (asynchronous JavaScript and XML)
    For better, or for worse…

Slide 27

Internet-Scale: Software/Platform Architecture Matters

Slide 28

Software Architecture Matters
  Performance versus software extensibility

Slide 29

Software Architecture Matters
  It all boils down to…
    Divide-and-conquer
    Throwing more hardware at the problem as the problem gets bigger

Slide 30

Divide and Conquer
  [Figure: the "Work" is partitioned into pieces w1, w2, w3; each piece goes to a "worker", which produces a partial result r1, r2, r3; the partial results are combined into the "Result".]
  It is easy to say, hard to master…
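The partition / worker / combine pattern in the figure can be sketched in a few lines. This is a minimal illustration, assuming the "work" is a list of numbers and each worker sums the squares of its piece; the function names and the task itself are hypothetical, chosen only to make the three stages visible.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_workers):
    """Split the work into at most n_workers contiguous pieces (w1, w2, ...)."""
    size = max(1, -(-len(work) // n_workers))  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(piece):
    """Hypothetical per-piece task: sum the squares of the numbers."""
    return sum(x * x for x in piece)


def combine(partial_results):
    """Merge the partial results (r1, r2, ...) into the final result."""
    return sum(partial_results)


def divide_and_conquer(work, n_workers=3):
    pieces = partition(work, n_workers)
    # Each piece is handed to a separate thread in the pool.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(worker, pieces))
    return combine(partial_results)


result = divide_and_conquer(list(range(10)))
print(result)
```

Threads are used here only to keep the sketch self-contained; the same three stages apply when the workers are separate processes or separate machines, which is where partitioning and combining become hard to get right.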

Slide 31

Different Workers
  Where are the workers?
    Different threads in the same core
    Different cores in the same CPU
    Different CPUs in a multi-processor system
    Different machines in a distributed system
  Many design issues
    Which worker does what?
    How do the workers communicate/coordinate?
    What if some workers die or are partitioned from others?
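Two of the design questions above, which worker gets which piece and what happens when a worker dies, can be illustrated with a deliberately simple policy. The sketch below assumes round-robin assignment and reassignment to the next worker on failure; a failure is modeled as a `RuntimeError`, and all names are hypothetical. Real systems also need failure detection, timeouts, and handling of network partitions, none of which is shown.

```python
def run_with_reassignment(pieces, workers, max_attempts=3):
    """Assign pieces round-robin; if a worker fails, try the next worker."""
    results = []
    for i, piece in enumerate(pieces):
        for attempt in range(max_attempts):
            chosen = workers[(i + attempt) % len(workers)]
            try:
                results.append(chosen(piece))
                break  # this piece is done
            except RuntimeError:
                continue  # worker "died": reassign the piece
        else:
            raise RuntimeError(f"piece {i} failed after {max_attempts} attempts")
    return results


def healthy_worker(piece):
    return sum(piece)


def dead_worker(piece):
    # Stands in for a crashed or unreachable worker.
    raise RuntimeError("worker died")


totals = run_with_reassignment(
    [[1, 2], [3, 4], [5, 6]],
    [healthy_worker, dead_worker],
)
print(totals)
```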

Slide 32

Example Architecture: Three-Tiered Architecture
  Stateless frontend
  Soft-state middle tier containing application logic and common services
  Backend persistent storage
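The division of state across the three tiers can be sketched as follows. This is a toy illustration under stated assumptions: a dictionary stands in for persistent storage, the soft state is a cache that may be discarded at any time, and the class and method names are hypothetical rather than taken from the course.

```python
class Backend:
    """Persistent storage tier (a dict stands in for a database)."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        return self._rows.get(key)

    def write(self, key, value):
        self._rows[key] = value


class MiddleTier:
    """Application logic plus a cache; the cache is soft state."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def set_name(self, user_id, name):
        self.backend.write(user_id, name)  # durable copy goes to the backend
        self.cache[user_id] = name

    def greeting(self, user_id):
        if user_id not in self.cache:  # cache miss: rebuild from the backend
            self.cache[user_id] = self.backend.read(user_id)
        name = self.cache[user_id]
        return f"Hello, {name}!" if name else "Hello, guest!"

    def restart(self):
        self.cache.clear()  # losing soft state must not lose data


def frontend(middle, request):
    """Stateless: everything it needs arrives in the request."""
    return middle.greeting(request["user_id"])


backend = Backend()
middle = MiddleTier(backend)
middle.set_name(7, "Ada")
before = frontend(middle, {"user_id": 7})
middle.restart()
after = frontend(middle, {"user_id": 7})
print(before, after)
```

Because the frontend keeps nothing between requests and the middle tier can rebuild its cache, either tier can be replicated or restarted without consulting anything but the backend.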

Slide 33

Platform Matters
"Developers who have worked at the small scale might be asking themselves why we need to bother with "platform design" when we could just use some sort of out-of-the-box solution. For small-scale applications, this can be a great idea. We save time and money up front and get a working and serviceable application. The problem comes at larger scales: there are no off-the-shelf kits that will allow you to build something like Amazon or Friendster. While building similar functionality might be fairly trivial, making that functionality work for a great many products, a huge number of users, and without spending far too much on hardware requires us to build something highly customized and optimized for our exact needs. There's a good reason why the largest applications on the Internet are all bespoke creations: no other approach can create massively scalable applications within a reasonable budget."
http://www.evontech.com/symbian/55.html

Slide 34

Outline What are Internet-scale applications? Course organization

Slide 35

Personnel
  Instructors
    Michael Fischer <fischer-michael@cs.yale.edu>, AKW 409
    Y. Richard Yang <yry@cs.yale.edu>, AKW 308A
    Office hours: TTh 4:00-5:00 or by appointment
  Teaching assistant (grader)
    Ye Wang

Slide 36

What are the Goals of this Course?
  Learn design principles and techniques of:
    Large-scale Internet applications
    Infrastructure supporting such applications
  See how the principles and techniques apply and adapt in the real world:
    Real examples from DNS/Email/Web, Akamai, Amazon (Dynamo, AWS), Google (Google cluster, GFS, BigTable, Chubby, AppEngine), Microsoft (Live Mesh, Azure), PPLive

Slide 37

What Will We Cover?
  Background on Internet/DNS/Email/Web
  Basic abstraction/design for high-performance client/server
    Multi-thread, async I/O, SEDA
    Adaptive applications (e.g., playout buffer)
  Web service-oriented architecture
  Interactivity (AJAX)
  Server scaling
    Load balancing (HTTP local/Akamai global example)
    Cloud coordination
    Over-provisioning and capacity planning
  Servent (end hosts contributed) design (P2P)
  Application/network infrastructure integration and interface
  Tiered architecture and middle tier
