Chameleon: A Resource Scheduler in a Data Grid Environment


Description
Chameleon: A Resource Scheduler in a Data Grid Environment ... A scheduler model, called Chameleon, is developed based on the presented scheduling models ...
Transcripts
Slide 1

Chameleon: A Resource Scheduler in a Data Grid Environment. Sang Min Park, Jai-Hoon Kim. Ajou University, South Korea

Slide 2

Contents: Introduction to Data Grid; Related Works; Scheduling Model; Scheduler Implementation; Testbed and Application; Results; Conclusions

Slide 3

Introduction to Data Grid. Data Grid motivations: petabyte-scale data generation; distributed data storage to store parts of the data; distributed computing resources which process the data. The two most important approaches for Data Grid are a secure, reliable, and efficient data transport protocol (e.g., GridFTP) and replication (e.g., Replica Catalog). Replication: large files are partially replicated among sites, which reduces data access time. Application scheduling and dynamic replication issues are emerging.

Slide 4

Related Works. Data Grid: Replica Catalog, a mapping from logical file name to physical instance; GridFTP, a secure, reliable, and efficient file transfer protocol. Job scheduling: various scheduling algorithms for the computational Grid; Application Level Scheduling (AppLeS); large data collections have not been considered. Job scheduling in Data Grid: only rough analytical and simulation studies have been presented. Our work defines a more in-depth scheduling model.

Slide 5

Scheduling Model. Assumptions: each site has both data storage and computing facilities; files are replicated at some of the Grid sites; each site has a different amount of computational capability; Grid users request job execution through job schedulers.

Slide 6

Scheduling Model - System Factors. Dynamic system factors (factors that change over time). Network bandwidth: data transfer time is inversely proportional to network bandwidth; NWS is a tool for measuring and forecasting network bandwidth. Available computing nodes: determine the execution time of jobs; decided by the load on a site. System attributes: machine architecture (clusters, MPPs, etc.); processor speed, available memory, I/O performance, etc.
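The two dynamic factors can be turned into rough time estimates. The Python below is a minimal sketch under stated assumptions, not Chameleon's code. Transfer time is size divided by bandwidth, with latency ignored. Execution time assumes the work splits evenly across the free nodes. The bandwidth forecast is a plain mean of recent measurements, used only as a stand-in for NWS, whose real predictors are more elaborate. All function names are hypothetical.

```python
# Hypothetical helpers for the dynamic system factors; names and models
# are illustrative assumptions, not Chameleon's implementation.


def forecast_bandwidth(recent_mb_s: list[float]) -> float:
    """Naive bandwidth forecast: mean of recent measurements (MB/s).

    Stand-in for an NWS forecast; NWS itself uses more elaborate predictors.
    """
    if not recent_mb_s:
        raise ValueError("need at least one measurement")
    return sum(recent_mb_s) / len(recent_mb_s)


def transfer_time(size_mb: float, bandwidth_mb_s: float) -> float:
    """Seconds to move size_mb; inversely proportional to bandwidth.

    Latency and protocol overhead are ignored in this sketch.
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb / bandwidth_mb_s


def execution_time(work: float, free_nodes: int, rate_per_node: float) -> float:
    """Seconds to run `work` units, assuming it splits evenly over free nodes."""
    if free_nodes <= 0 or rate_per_node <= 0:
        raise ValueError("need free nodes and a positive per-node rate")
    return work / (free_nodes * rate_per_node)


if __name__ == "__main__":
    bw = forecast_bandwidth([8.0, 10.0, 12.0])   # 10.0 MB/s
    print(transfer_time(500.0, bw))              # 50.0 s for a 500 MB replica
    print(execution_time(1000.0, 4, 25.0))       # 10.0 s on 4 free nodes
```

A heavier load on a site means fewer free nodes and, under this even-split assumption, a proportionally longer execution time.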

Slide 7

Scheduling Model - System Factors. Application-specific factors (factors unique to Data Grid applications). Size of input data (replica): if the data is not at the computing site, it must be fetched, and much time is consumed transferring large data. Size of application code: the application code must be moved to the sites which perform the computation; it is not critical to overall performance (small size). Size of produced output data: when the computing job runs at a remote site, the result data must be returned to the local site; its size is strongly related to the size of the input data.
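A quick worked example shows why input size dominates and code size does not. The numbers are illustrative assumptions, not measurements: a 500 MB input (the order of the protein databases described later), a 1 MB application code, and a WAN bandwidth of 10 MB/s.

```latex
\frac{S_{\mathrm{in}}}{B^{\mathrm{WAN}}} = \frac{500\ \mathrm{MB}}{10\ \mathrm{MB/s}} = 50\ \mathrm{s},
\qquad
\frac{S_{\mathrm{code}}}{B^{\mathrm{WAN}}} = \frac{1\ \mathrm{MB}}{10\ \mathrm{MB/s}} = 0.1\ \mathrm{s}
```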

Slide 8

Scheduling Model - Application Scenarios. The model consists of 5 distinct application scenarios: 1. Local Data and Local Execution; 2. Local Data and Remote Execution; 3. Remote Data and Local Execution; 4. Remote Data and Same Remote Execution; 5. Remote Data and Different Remote Execution.

Slide 9

Scheduling Model - Application Scenarios. Terms used in the scenarios

Slide 10

Scheduling Model - Application Scenarios. 1. Local Data and Local Execution. The input data (replica) is located at the local site, and processing is performed with locally available processors. Data in motion consists of the input data (replica), the application code, and the output data. Cost consists of: data transfer time between the master and computing nodes via LAN; job execution time using local processors.
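These cost items can be written as a single formula. The notation is our own, not the authors'. S_in, S_code, and S_out are the input, code, and output sizes. B^LAN_l is the LAN bandwidth in the local site l, and T^exec_l is the predicted execution time there. As an assumption, the items are summed sequentially, with no overlap between transfer and computation.

```latex
T_1 = \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{LAN}}_{l}} + T^{\mathrm{exec}}_{l}
```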

Slide 11

Scheduling Model - Application Scenarios. 2. Local Data and Remote Execution. The locally stored replica is transferred to the remote computation site. Cost consists of: data (input + code + output) movement time via WAN between the local and remote site; data movement time via LAN in the remote site; job execution time on the remote site.
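As a formula, in our own notation rather than the authors': S_in, S_code, and S_out are the input, code, and output sizes. B^WAN_{l,r} is the WAN bandwidth between the local site l and the remote site r. B^LAN_r and T^exec_r are the LAN bandwidth and predicted execution time at r. As assumptions, the items are summed sequentially, and the LAN step moves all three data items inside r.

```latex
T_2 = \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{WAN}}_{l,r}}
    + \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{LAN}}_{r}}
    + T^{\mathrm{exec}}_{r}
```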

Slide 12

Scheduling Model - Application Scenarios. 3. Remote Data and Local Execution. The remote replica is copied to the local site, and processing is performed locally. Cost consists of: input data movement time via WAN between the local and remote site; data movement time via LAN in the local site; job execution time on local processors.
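As a formula, in our own notation rather than the authors': S_in, S_code, and S_out are the input, code, and output sizes. B^WAN_{r,l} is the WAN bandwidth between the remote replica site r and the local site l. B^LAN_l and T^exec_l are the LAN bandwidth and predicted execution time at l. Only the input crosses the WAN here. As assumptions, the items are summed sequentially, and the LAN step moves all three data items inside l.

```latex
T_3 = \frac{S_{\mathrm{in}}}{B^{\mathrm{WAN}}_{r,l}}
    + \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{LAN}}_{l}}
    + T^{\mathrm{exec}}_{l}
```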

Slide 13

Scheduling Model - Application Scenarios. 4. Remote Data and Same Remote Execution. The remote site holding the replica performs the computation. Cost consists of: data (code + output) movement time via WAN between the local and remote site; data movement time via LAN in the remote site; job execution time on the remote site.
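As a formula, in our own notation rather than the authors': S_in, S_code, and S_out are the input, code, and output sizes. B^WAN_{l,r} is the WAN bandwidth between the local site l and the remote replica site r. B^LAN_r and T^exec_r are the LAN bandwidth and predicted execution time at r. Only the code and output cross the WAN. As assumptions, the items are summed sequentially, and the LAN step moves all three data items inside r.

```latex
T_4 = \frac{S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{WAN}}_{l,r}}
    + \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{LAN}}_{r}}
    + T^{\mathrm{exec}}_{r}
```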

Slide 14

Scheduling Model - Application Scenarios. 5. Remote Data and Different Remote Execution. Remote site j performs the computation with a replica copied from remote site i. Cost consists of: input replica movement time via WAN between remote sites i and j; data (code + output) movement time via WAN between the local site and remote site j; data movement time via LAN in remote site j; job execution time on remote site j.
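As a formula, in our own notation rather than the authors': S_in, S_code, and S_out are the input, code, and output sizes. B^WAN_{i,j} and B^WAN_{l,j} are the WAN bandwidths from replica site i and from the local site l to the executing site j. B^LAN_j and T^exec_j are the LAN bandwidth and predicted execution time at j. As assumptions, the items are summed sequentially, and the LAN step moves all three data items inside j.

```latex
T_5 = \frac{S_{\mathrm{in}}}{B^{\mathrm{WAN}}_{i,j}}
    + \frac{S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{WAN}}_{l,j}}
    + \frac{S_{\mathrm{in}} + S_{\mathrm{code}} + S_{\mathrm{out}}}{B^{\mathrm{LAN}}_{j}}
    + T^{\mathrm{exec}}_{j}
```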

Slide 15

Scheduling Model - Scheduler. Operations of the scheduler: predict the response time of each scenario; compare the response times of the scenarios; choose the best scenario, together with the site holding the data and the site performing the job execution; request data movement and job execution.
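The first three operations amount to picking the minimum predicted response time over all candidate placements. The Python below is a minimal sketch under our own assumptions, not Chameleon's implementation:

- Each scenario's time is the plain sum of its cost items from the scenario slides.
- The LAN step moves input, code, and output inside the executing site.
- Bandwidths and execution times are supplied as predictions. In Chameleon, bandwidth comes from NWS.
- The class and function names are hypothetical.
- The final step, issuing the actual transfer and job requests, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    lan_mb_s: float     # LAN bandwidth inside the site (MB/s)
    exec_s: float       # predicted job execution time at this site (s)
    has_replica: bool   # whether the site holds a replica of the input


@dataclass(frozen=True)
class Job:
    input_mb: float
    code_mb: float
    output_mb: float


def choose_scenario(job: Job, local: Site, remotes: list[Site],
                    wan: dict[tuple[str, str], float]) -> tuple[float, int, str, str]:
    """Predict every scenario's response time and return the cheapest.

    Returns (cost_s, scenario, data_site_name, exec_site_name).
    `wan` maps a sorted pair of site names to WAN bandwidth (MB/s).
    """
    def w(a: Site, b: Site) -> float:
        return wan[tuple(sorted((a.name, b.name)))]

    total = job.input_mb + job.code_mb + job.output_mb
    code_out = job.code_mb + job.output_mb
    options = []
    if local.has_replica:
        # 1. Local data, local execution: LAN moves plus local run time.
        options.append((total / local.lan_mb_s + local.exec_s, 1, local, local))
        for r in remotes:
            # 2. Local data, remote execution: input, code, output cross the WAN.
            cost = total / w(local, r) + total / r.lan_mb_s + r.exec_s
            options.append((cost, 2, local, r))
    for i in (s for s in remotes if s.has_replica):
        # 3. Remote data, local execution: only the input crosses the WAN.
        cost = job.input_mb / w(i, local) + total / local.lan_mb_s + local.exec_s
        options.append((cost, 3, i, local))
        # 4. Remote data, same remote execution: code and output cross the WAN.
        cost = code_out / w(local, i) + total / i.lan_mb_s + i.exec_s
        options.append((cost, 4, i, i))
        for j in remotes:
            if j.name == i.name:
                continue
            # 5. Replica at remote site i, execution at a different remote site j.
            cost = (job.input_mb / w(i, j) + code_out / w(local, j)
                    + total / j.lan_mb_s + j.exec_s)
            options.append((cost, 5, i, j))
    if not options:
        raise ValueError("no site holds a replica of the input data")
    cost, scenario, data_site, exec_site = min(options, key=lambda o: o[0])
    return cost, scenario, data_site.name, exec_site.name


if __name__ == "__main__":
    job = Job(input_mb=500.0, code_mb=1.0, output_mb=9.0)
    local = Site("local", 100.0, 400.0, has_replica=False)
    a = Site("A", 100.0, 100.0, has_replica=True)
    b = Site("B", 100.0, 50.0, has_replica=False)
    wan = {("A", "local"): 10.0, ("B", "local"): 10.0, ("A", "B"): 50.0}
    # Picks scenario 5: replica at A, execution at the faster site B.
    print(choose_scenario(job, local, [a, b], wan))
```

In this made-up setting, site A holds the only replica and B has the shortest predicted execution time. The fast A-B link makes copying the replica to B (about 66 s in total) cheaper than running at A (about 106 s) or fetching the data to the slow local site (about 455 s).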

Slide 16

Scheduler Implementation. We developed a scheduler model, called Chameleon, to evaluate the scheduling model. It is built on top of services provided by Globus: GRAM, MDS, GridFTP, and Replica Catalog. NWS is used for measuring and forecasting network bandwidth. The scheduling algorithms are based on the scheduling models presented.

Slide 17

Testbed for experiments

Slide 18

Applications. Gene sequence analysis applications (bioinformatics): computationally intensive analysis on large protein databases. Biologists predict the structure and function of a newly found protein by comparing it with well-known protein databases. A database reaches more than 500 MB in size. There are various versions of protein databases. Large databases are replicated in the Data Grid. Two well-known applications, BLAST and FASTA, are executed.

Slide 19

Applications - Parameters

Slide 20

Experimental Results (1). Results when executing PSI-BLAST. Replication scenario

Slide 21

Experimental Results (2). Results from the previous slide. Results when executing FASTA in the same replication scenario

Slide 22

Experimental Results (3). No replication occurs. Results when executing PSI-BLAST

Slide 23

Experimental Results (4). Increasing the number of replicas decreases the response time

Slide 24

Conclusions. Job scheduling models for Data Grid: the models consist of 5 distinct scenarios. A scheduler model, called Chameleon, is developed based on the presented scheduling models. Meaningful experiments are performed with Chameleon on a constructed Grid testbed. We achieve better performance by considering data locations as well as computational capabilities.

