Parallel Computing in Chemistry

Slide 1

Parallel Computing in Chemistry Brian W. Hopkins Mississippi Center for Supercomputing Research 4 September 2008

Slide 2

What We're Doing Here Define some basic terms of parallel and high-performance computing. Discuss HPC concepts and their significance to chemistry. Discuss the specific needs of various computational chemistry applications and methodologies. Briefly present the systems and applications in use at MCSR.

Slide 3

Why We're Doing It Increasing your research throughput More Data Less Time More Papers More Grants &c. Stop me if there are questions!

Slide 4

Some Common Terms Processors: x86, ia64, em64t, &c. Architectures: distributed vs. shared memory. Parallelism: message-passing vs. shared-memory parallelism.

Slide 5

Processor Types: Basics Integer bit length: the number of individual binary "bits" used to store an integer in memory. 32-bit systems: 32 bits define an integer (2^32 values) Signed: −2,147,483,648 to +2,147,483,647 Unsigned: 0 to +4,294,967,295 64-bit systems: 64 bits define an integer (2^64 values) Signed: −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 Unsigned: 0 to +18,446,744,073,709,551,615 Because integers are used for all kinds of things, integer bit length is a critical constraint on what a processor can do.
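The ranges quoted on this slide follow directly from the bit counts; a quick check in plain Python (which has arbitrary-precision integers, so it can compute both widths):

```python
# Verify the signed/unsigned ranges for n-bit integers: signed values use one
# bit for the sign, leaving 2^(n-1) magnitudes each way; unsigned use all n bits.
def int_ranges(bits):
    signed = (-2 ** (bits - 1), 2 ** (bits - 1) - 1)
    unsigned = (0, 2 ** bits - 1)
    return signed, unsigned

print(int_ranges(32))
# ((-2147483648, 2147483647), (0, 4294967295))
print(int_ranges(64))
# ((-9223372036854775808, 9223372036854775807), (0, 18446744073709551615))
```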

Slide 6

Processor Types: In Use Today x86: Pentium, &c. The most common chip family ever. Technically includes 16- and 64-bit processors, but "x86" is most commonly used to describe 32-bit systems. em64t: Athlon, Opteron, Xeon, Core 2 A 64-bit extension to the x86 family of processors. Sometimes referred to as x86-64 or amd64. ia64: Itanium, Itanium2 A distinct, non-x86-compatible 64-bit architecture from Intel. mips: R12000, R14000 64-bit RISC-type processors used in SGI supercomputers

Slide 7

SUPER-computing Modern supercomputers typically consist of a large number of off-the-shelf processors hooked together in one or more ways: clusters, symmetric multiprocessors. All commonly used processor types are in use in current "super"-computers.

Slide 8

Supercomputers: Clusters The easiest and cheapest way to build a supercomputer is to hook many individual computers together with network cables. These supercomputers tend to have "distributed memory", meaning each processor primarily works with a memory bank located in the same case with it.

Slide 9

Clusters: The Interconnect With a cluster-type supercomputer, the performance of the network connecting the nodes is critically important. Network fabrics can be optimized for either latency or bandwidth, but not both. Because of the special needs of high-performance clusters, a number of special networking technologies (hardware and software together) have been developed for these systems: Myrinet, Infiniband, &c.

Slide 10

Supercomputers: SMP Systems Alternatively, it is possible to custom-build a computer case and circuitry to hold a large number of processors. These systems are distinguished by having "shared" memory, meaning that all or many of the processors can access one large shared pool of memory.

Slide 11

Supercomputers: Hybrid Systems Many modern supercomputers consist of a high-speed cluster of SMP machines. These systems contain large numbers of distributed-memory nodes, each of which has a set of processors using shared memory.

Slide 12

Parallel Applications Each type of supercomputer architecture presents its own unique programming challenges. Cluster systems require parallel programs to call special message-passing routines to share data between nodes. Symmetric multiprocessing systems require special care to keep each spawned thread from modifying or deleting data that is needed by the other threads. Hybrid systems often require both.
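The shared-memory hazard named here is easy to demonstrate. A minimal sketch (not from the slides; the counter and its helper are our invention) of threads updating one shared value, with a lock serializing the update so no thread clobbers another's work:

```python
import threading

# Four threads increment one shared counter. The read-modify-write on
# `counter` is guarded by a lock, so no update is lost to a race.
counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # serialize access to the shared data
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every one of the 4 x 10,000 updates survives
```

Without the lock, increments from different threads can interleave and overwrite each other, which is exactly the corruption the slide warns about.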

Slide 13

Parallel Programming for Clusters The nodes in a cluster talk to each other over a network connection. Since the network connection is a bottleneck, it is important to minimize the internode communication required by the program. Most commonly, programs use a message-passing library for communication between nodes.

Slide 14

Message Passing Cluster computing is extremely popular. To streamline programming for these architectures, various libraries of standard message-passing functions have been developed: MPI, TCGMSG, Linda, &c. These vary in portability and flexibility.

Slide 15

The Ubiquitous MPI The most popular message-passing library is the Message Passing Interface (MPI). Most companies marketing supercomputers also market their own versions of MPI specially tuned to maximize performance of the systems they sell: SGI's MPT, IBM MPI, HP-MPI. In principle, MPI functions are standardized, so any program built to work with Intel MPI should also work with HP-MPI. In practice… not always. In addition, an open-source, portable MPI is available through the MPICH project.

Slide 16

TCGMSG The TCGMSG library is a stripped-down message-passing library designed specifically for the needs of computational chemistry applications: NWChem, Molpro. TCGMSG is generally more efficient than MPI for the particular internode communication operations most common in comp-chem programs. However, from a programmer's point of view TCGMSG is less flexible and capable than MPI. There is some talk of TCGMSG falling into disuse and becoming a legacy library.

Slide 17

A Word On Linda et al. Some software vendors want extra money when you run their software on a cluster instead of a single machine. The surest way to realize that money is to build your own special MP library, build the parallel version of your code to work only with it, and then sell the MP library for extra $$. Hence Linda and similar MP interfaces. The nature of these libraries is such that you're unlikely to need to code with them, and can expect that system administrators will build and link them as required.

Slide 18

Programming for SMP Systems There are distinct approaches to programming for these machines: Multithreaded programs, Message passing. The most common approach to SMP programming is the OpenMP interface. Watch out! "MP" here stands for "multiprocessing," not "message passing."
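OpenMP itself is a compiler-directive interface for C, C++, and Fortran (`#pragma omp parallel for` over a loop). As a rough Python analogue of the same idea, a thread pool splits one loop across threads that all read a single shared array, the shared-memory model this slide describes (the chunking scheme below is our invention):

```python
from concurrent.futures import ThreadPoolExecutor

# Analogue of an OpenMP parallel-for: split one loop's index range into
# chunks, let a pool of threads process chunks concurrently, then combine.
# All threads read the same shared array; no copies are sent anywhere.
data = list(range(1_000))

def partial_sum(bounds):
    lo, hi = bounds
    return sum(data[lo:hi])          # each thread reads the shared array

chunks = [(i, i + 250) for i in range(0, 1_000, 250)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)  # 499500, the same answer a serial sum gives
```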

Slide 19

Programming for Hybrid Systems Hybrid systems have attributes of both clusters and SMP systems. The most efficient programs for hybrid systems have a hybrid design: multithreaded or OpenMP parallelism within a node, message-passing parallelism between nodes. Though hybrid systems are rapidly increasing in popularity, true hybrid programming remains very rare. Thus it's common for coders working on hybrid systems to use a pure message-passing approach.

Slide 20

Parallel Efficiency Whenever you use multiple procs rather than one, you incur two problems: Parts of a calculation can't be parallelized. The processors must perform extra communication tasks. Therefore, most calculations will require a larger amount of resources to run in parallel than in serial.
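The first cost (the unparallelizable part) is captured by Amdahl's law. A short sketch; the 10% serial fraction is an invented illustration, not a figure from the talk:

```python
# Amdahl's law: with serial fraction s, the best-case speedup on p
# processors is 1 / (s + (1 - s) / p). The serial part never shrinks.
def speedup(serial_fraction, procs):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / procs)

for p in (1, 4, 16, 64):
    print(p, round(speedup(0.10, p), 2))
# Even with only 10% serial work, 64 processors give well under 10x
# speedup, so total processor-hours grow as the processor count grows.
```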

Slide 21

Practical HPC, or, HPC and You Computational chemistry applications fall into four general categories: Quantum Chemistry Molecular Simulation Data Analysis Visualization Each of these classes presents its own unique challenges.

Slide 22

Quantum Chemistry Lots of Q-Chem at UM and around the state UM: Tschumper, Davis, Doerksen, others MSU: Saebo, Gwaltney JSU: Leszczynski and many others. Quantum programs are the biggest consumer of resources at MCSR by a wide margin: Redwood: 99% (98 of 99 jobs) Mimosa: 100% (86 of 86 jobs) Sweetgum: 100% (24 of 24 jobs)

Slide 23

Features of QC Programs Very memory intensive Limits of 32-bit systems will show Limits on total memory in a box will also show Very cycle intensive Clock speed of procs is important Fastest procs at MCSR are in mimosa; use them! Ideally not very disk intensive Watch out! If there's a memory shortage, QC programs will start building read/write scratch files. I/O will absolutely kill you. To the extent that I/O can't be avoided, don't do it over a network.

Slide 24

Project-Level Parallelism Quantum projects typically comprise many dozens to thousands of individual calculations. Often, the most efficient way to complete 100 jobs on 100 available processors is to run 100 one-proc jobs at the same time. Total wall time required increases with each increase in individual job parallelism: 100x1 < 50x2 < 25x4 < 10x10 < 5x20 < 1x100
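That ordering can be reproduced with a toy model (our invention, assuming 100 identical jobs, 100 processors, and a 10% serial fraction per job): giving each job more processors shrinks its runtime sublinearly, but means fewer jobs fit on the machine at once.

```python
# Toy wall-time model for 100 identical jobs on 100 processors, where each
# job gets p procs. Per-job runtime follows Amdahl's law; only 100/p jobs
# run concurrently, so higher p means more sequential batches.
def wall_time(p, jobs=100, procs=100, serial=0.10):
    per_job = serial + (1.0 - serial) / p    # Amdahl-style job runtime
    batches = jobs * p // procs              # batches needed to run all jobs
    return batches * per_job

times = [wall_time(p) for p in (1, 2, 4, 10, 20, 100)]
print([round(t, 2) for t in times])
# The list is strictly increasing, matching 100x1 < 50x2 < ... < 1x100.
```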

Slide 25

When to Parallelize a Job Some jobs simply won't run in serial in any useful amount of time. These jobs should be parallelized as little as possible. In addition, jobs that require extensive parallelization should be run using special high-performance QC programs

Slide 26

Implications for Gaussian 03 Do not allow G03 to pick its own calculation algorithms. MP2(fulldirect), SCF(direct) Jobs that will run on mimosa should. Jobs that will run in serial should. Use local disk for unavoidable I/O. Copy files back all at once after a job. For truly intensive jobs, consider moving to another program suite. We're here to help!

Slide 27

Molecular Simulation is not as common at MCSR as quantum chemistry. Still, some simulation packages are available AMBER NWChem CPMD CP2K And a few researchers here do use them MedChem, Wadkins, &c.

Slide 28

Features of MD Programs At their core, MD programs perform a single, simple calculation a huge number of times. Accordingly, MD programs consume a LOT of clock cycles. These programs must constantly write large molecular configurations to output, and so are often I/O bound. Memory load from standard MD programs is minor. Communication load can be quite high: long-range ES. All MD programs are MPI-enabled.

Slide 29

Parallelizing Simulations MD simulations generally parallelize much more readily than QC jobs. Additionally, individual MD sims tend to take far longer than individual QC jobs.
