CS 240A : Monday, April 9, 2007.

Slide 1

CS 240A : Monday, April 9, 2007. Join the Google discussion group! (See course home page.) Accounts on DataStar at the San Diego Supercomputer Center should be ready this evening. DataStar logon & tools introduction at noon Tuesday (tomorrow) in ESB 1003. Homework 0 (describe a parallel application) due Wednesday. Homework 1 (first programming problem) due next Monday, April 16.

Slide 2

Hello, world in MPI

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Slide 3

MPI in nine routines (all you really need):
MPI_Init        Initialize
MPI_Finalize    Finalize
MPI_Comm_size   How many processes?
MPI_Comm_rank   Which process am I?
MPI_Wtime       Timer
MPI_Send        Send data to one proc
MPI_Recv        Receive data from one proc
MPI_Bcast       Broadcast data to all procs
MPI_Reduce      Combine data from all procs

Slide 4

Ten more MPI routines (sometimes useful):
More collective routines (like Bcast and Reduce): MPI_Alltoall, MPI_Alltoallv, MPI_Scatter, MPI_Gather
Non-blocking send and receive: MPI_Isend, MPI_Irecv, MPI_Wait, MPI_Test, MPI_Probe, MPI_Iprobe
Synchronization: MPI_Barrier

Slide 5

Some MPI Concepts. Communicator: a set of processes that are allowed to communicate among themselves. A library can use its own communicator, separate from that of a user program. Default communicator: MPI_COMM_WORLD.

Slide 6

Some MPI Concepts. Data type: what kind of data is being sent/received? Generally corresponds to a C data type: MPI_INT, MPI_CHAR, MPI_DOUBLE, etc.

Slide 7

Some MPI Concepts. Message tag: an arbitrary (integer) label for a message. The tag of the Send must match the tag of the Recv.

Slide 8

Parameters of blocking send:
MPI_Send(buf, count, datatype, dest, tag, comm)
buf       address of send buffer
count     number of items to send
datatype  datatype of each item
dest      rank of destination process
tag       message tag
comm      communicator

Slide 9

Parameters of blocking receive:
MPI_Recv(buf, count, datatype, src, tag, comm, status)
buf       address of receive buffer
count     maximum number of items to receive
datatype  datatype of each item
src       rank of source process
tag       message tag
comm      communicator
status    status after operation

Slide 10

Example: Send an integer x from proc 0 to proc 1

int x;
int msgtag = 0;      /* any tag value works, as long as both sides match */
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* get rank */
if (myrank == 0) {
    MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}

Slide 11

Partitioned Global Address Space Languages. Explicitly parallel programming model with SPMD parallelism: fixed at program start-up, typically one thread per processor. Global address space model of memory allows the programmer to directly represent distributed data structures. The address space is logically partitioned: local vs. remote memory (two-level hierarchy). Performance transparency and tunability are goals. An initial implementation can use fine-grained shared memory. Languages: UPC (C), CAF (Fortran), Titanium (Java).

Slide 12

Global Address Space Eases Programming. The languages share the global address space abstraction. Shared memory is partitioned by processors. Remote memory may stay remote: no automatic caching is implied. One-sided communication through reads/writes of shared variables. Both individual and bulk memory copies. The languages differ on details: some models have a separate private memory area; they also differ in distributed-array generality and how arrays are constructed. (Figure: threads 0..n each hold a slice X[0]..X[P] of a shared array in the global address space, plus a ptr in their private space.)

Slide 13

UPC Execution Model. A number of threads working independently: SPMD. The number of threads is specified at compile time or run time. The program variable THREADS gives the number of threads; MYTHREAD gives the thread index (0..THREADS-1). upc_barrier is a global synchronization; upc_forall is a parallel loop.

Slide 14

Hello World in UPC
Any legal C program is also a legal UPC program. If you compile and run it as UPC with P threads, it will run P copies of the program.

#include <upc.h>   /* needed for UPC extensions */
#include <stdio.h>

main() {
    printf("Thread %d of %d: hello UPC world\n", MYTHREAD, THREADS);
}

Slide 15

Example: Monte Carlo Pi Calculation
Estimate pi by throwing darts at a unit square (r = 1) and calculating the fraction that fall in the unit circle.
Area of square = r^2 = 1
Area of circle quadrant = (1/4) * pi * r^2 = pi/4
Randomly throw darts at (x, y) positions. If x^2 + y^2 < 1, the point is inside the circle.
Compute the ratio: # points inside / # points total. Then pi = 4 * ratio.
See example code.

Slide 16

Private vs. Shared Variables in UPC
Normal C variables and objects are allocated in the private memory space of each thread. Shared variables are allocated only once, with thread 0:
shared int ours;
int mine;
Simple shared variables of this kind may not occur inside a function definition.
(Figure: ours lives once in the shared global address space; each thread has its own private mine.)

Slide 17

Shared Arrays Are Cyclic By Default
Shared array elements are spread across the threads:
shared int x[THREADS]      /* 1 element per thread */
shared int y[3][THREADS]   /* 3 elements per thread */
shared int z[3*THREADS]    /* 3 elements per thread, cyclic */
In the pictures below, THREADS = 4. Elements with affinity to thread 0 are blue. As a 2D array, y is logically blocked by columns.
(Figure: layouts of x, y, and z across the four threads.)

Slide 18

Example: Vector Addition
Questions about parallel vector addition: How to lay out the data (here it is cyclic). Which processor does what (here it is "owner computes").

/* vadd.c */
#include <upc_relaxed.h>
#define N 100*THREADS
shared int v1[N], v2[N], sum[N];     /* cyclic layout */
void main() {
    int i;
    for (i = 0; i < N; i++)
        if (MYTHREAD == i % THREADS) /* owner computes */
            sum[i] = v1[i] + v2[i];
}

Slide 19

Vector Addition with upc_forall
The vadd example can be rewritten as follows. Equivalent code could use "&sum[i]" for the affinity expression. The code would be correct but slow if the affinity expression were i+1 instead of i.

/* vadd.c */
#include <upc_relaxed.h>
#define N 100*THREADS
shared int v1[N], v2[N], sum[N];
void main() {
    int i;
    upc_forall(i = 0; i < N; i++; i)
        sum[i] = v1[i] + v2[i];
}

The cyclic data distribution may perform poorly on a cache-based shared memory machine.

Slide 20

Pointers to Shared vs. Arrays
In the C tradition, arrays can be accessed through pointers. Here is the vector addition example using pointers:

#include <upc_relaxed.h>
#define N 100*THREADS
shared int v1[N], v2[N], sum[N];
void main() {
    int i;
    shared int *p1, *p2;
    p1 = v1; p2 = v2;
    for (i = 0; i < N; i++, p1++, p2++)
        if (i % THREADS == MYTHREAD)
            sum[i] = *p1 + *p2;
}

Slide 21

UPC Pointers
Where does the pointer live? Where does it point?
int *p1;               /* private pointer to local memory */
shared int *p2;        /* private pointer to shared space */
int *shared p3;        /* shared pointer to local memory  */
shared int *shared p4; /* shared pointer to shared space  */
Shared pointers to private memory are not recommended.

Slide 22

UPC Pointers
int *p1;               /* private pointer to local memory */
shared int *p2;        /* private pointer to shared space */
int *shared p3;        /* shared pointer to local memory  */
shared int *shared p4; /* shared pointer to shared space  */
Pointers to shared often require more storage and are more costly to dereference; they may refer to local or remote memory.
(Figure: p3 and p4 live in the shared space; each thread has its own private p1 and p2.)

Slide 23

Common Uses for UPC Pointer Types
int *p1; These pointers are fast. Use to access private data in a section of code performing local work. Often cast a pointer-to-shared to one of these to get faster access to shared data that is local.
shared int *p2; Use to refer to remote data. Larger and slower due to the test-for-local plus possible communication.
int *shared p3; Not recommended.
shared int *shared p4; Use to build shared linked structures, e.g., a linked list.

Slide 24

UPC Pointers
In UPC, pointers to shared objects have three fields:
thread number
local address of block
phase (specifies position within the block)

Slide 25

UPC Pointers
Pointer arithmetic supports blocked and non-blocked array distributions. Casting of shared to private pointers is allowed, but not vice versa! When casting a pointer-to-shared to a private pointer, the thread number of the pointer-to-shared may be lost. Casting of shared to private is well defined only if the object pointed to by the pointer-to-shared has affinity to the thread performing the cast.

Slide 26

Synchronization
No implicit synchronization among threads. Several explicit synchronization mechanisms:
Barriers (blocking): upc_barrier
Split-phase barriers (non-blocking): upc_notify and upc_wait
An optional label allows for debugging.
Locks

Slide 27

Bulk Copy Operations in UPC
UPC functions to move data to/from shared memory. Typically much faster than a loop & assignment statement! Can be used to move chunks within the shared space or between shared and private spaces.
Equivalents of memcpy:
upc_memcpy(dst, src, size): copy from shared to shared
upc_memput(dst, src, size): copy from private to shared
upc_memget(dst, src, size): copy from shared to private
Equivalent of memset:
upc_memset(dst, char, size): initialize shared memory

Slide 28

Dynamic Memory Allocation in UPC
Dynamic memory allocation of shared memory is available in UPC. Functions can be collective or not. A collective function must be called by every thread and returns the same value to all of them.
