Slide 1

BENCHMARKS Ramon Zatarain

Slide 2

INDEX Benchmarks and Benchmarking. Relation of Benchmarks with Empirical Methods. Benchmark definition. Types of benchmarks. Benchmark suites. Measuring performance (CPU, comparing performance, etc.). Common system benchmarks. Examples of software benchmarks. Benchmark pitfalls. Recommendations. Benchmarking rules. Bibliography

Slide 3

Benchmarks and Benchmarking Originally, a benchmark was a reference point for determining one's present position or altitude in geological surveys and tidal observations. A benchmark was a standard against which others could be measured.

Slide 4

Benchmarks and Benchmarking In the 1970s, the idea of a benchmark evolved beyond a technical term denoting a reference point. The word migrated into the vocabulary of business, where it came to denote the measurement process by which comparisons are conducted.

Slide 5

Benchmarks and Benchmarking In the early 1980s, the Xerox Corporation, a pioneer in benchmarking, defined it as the continuous process of measuring products, services, and practices against the toughest competitors.

Slide 6

Benchmarks and Benchmarking Benchmarks, as opposed to benchmarking, are measurements used to evaluate the performance of a function, operation, or business relative to others. In the electronics business, for example, a benchmark has long referred to an operating statistic that lets you compare your own performance with that of another.

Slide 7

RELATION OF BENCHMARKS WITH EMPIRICAL METHODS In many areas of computer science, experiments are the primary means of demonstrating the potential and value of systems and techniques. Empirical methods for evaluating and comparing systems and techniques are therefore of considerable interest to many CS researchers.

Slide 8

RELATION OF BENCHMARKS WITH EMPIRICAL METHODS The principal evaluation criterion that has been adopted in several fields, such as satisfiability testing (SAT), is empirical performance on shared benchmark problems. In the workshop "Future Directions in Software Engineering", many issues were addressed; some of them were:

Slide 9

RELATION OF BENCHMARKS WITH EMPIRICAL METHODS In the paper "Research Methodology in Software Engineering" four methods were identified: the scientific method, the engineering method, the empirical method, and the analytical method. In the paper "We Need To Measure The Quality Of Our Work" the author points out that "we as a community have no generally accepted methods or benchmarks for measuring and comparing the quality and utility of our research results".

Slide 10

RELATION OF BENCHMARKS WITH EMPIRICAL METHODS Examples: IEEE Computer Society Workshop on Empirical Evaluation of Computer Vision Algorithms. A benchmark for pattern recognition systems. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl.

Slide 11

BENCHMARK DEFINITION Some definitions are: It is a test that measures the performance of a system or subsystem on a well-defined task or set of tasks. A method for comparing the performance of different computer architectures. Alternatively, a method for comparing the performance of different software.

Slide 12

TYPES OF BENCHMARKS Real programs. They have input, output, and options that a user can select when running the program. Examples: compilers, text-processing software, etc. Kernels. Small, key pieces extracted from real programs. They are not used by end users. Examples: Livermore Loops and Linpack.

Slide 13

TYPES OF BENCHMARKS Toy benchmarks. Typically between 10 and 100 lines of code, producing a result the user already knows. Examples: Sieve of Eratosthenes, Puzzle, and Quicksort. Synthetic benchmarks: They try to match an average execution profile. Examples: Whetstone and Dhrystone.

Slide 14

BENCHMARK SUITES It is a collection of benchmarks that tries to measure the performance of processors across a variety of applications. The advantage is that the weakness of any one benchmark is lessened by the presence of the other benchmarks. A few benchmarks in the suite are kernels, but many are real programs.

Slide 15

BENCHMARK SUITES Example: SPEC92 benchmark suite (20 programs)

Benchmark   Source    Lines of code   Description
Espresso    C         13,500          Minimizes Boolean functions
Li          C         7,413           Lisp interpreter (9 queens problem)
Eqntott     C         3,376           Translates Boolean equations
Compress    C         1,503           Data compression
Sc          C         8,116           Computation in a spreadsheet
Gcc         C         83,589          GNU C compiler
Spice2g6    Fortran   18,476          Circuit simulation package
Doduc       Fortran   5,334           Simulation of a nuclear reactor
Mdljdp2     Fortran   4,458           Chemical application
Wave5       Fortran   7,628           Electromagnetic simulation
Tomcatv     Fortran   195             Mesh generation program
Ora         Fortran   535             Traces rays through an optical system
Alvinn      C         272             Neural network simulation
Ear         C         4,483           Inner ear model
…

Slide 16

MEASURING PERFORMANCE Wall-clock time (elapsed time). Latency to complete a task, including disk accesses, input/output activities, memory accesses, and OS overhead. CPU time. Does not include time spent waiting for I/O or running other programs. User CPU time. Time spent in the program. System CPU time. Time spent in the OS.

Slide 17

CPU Performance Measures MIPS (millions of instructions per second). How fast the machine can operate. MFLOPS (millions of floating-point operations per second). GFLOPS (gigaflops). Other measures are Whets (Whetstone benchmark), VUP (VAX unit of performance), and SPECmarks. Note: MIPS is sometimes said to stand for "meaningless indicators of performance for salesmen".

Slide 18

COMPARING PERFORMANCE

                    Computer A   Computer B   Computer C
Program P1 (secs)        1           10           20
Program P2 (secs)     1000          100           20
Total time (secs)     1001          110           40

Execution times of two programs on three machines

Slide 19

CPU Performance Measures TOTAL EXECUTION TIME: An average of the execution times that tracks total execution time is the arithmetic mean:

    Arithmetic mean = (1/n) * sum over i=1..n of Time_i

where Time_i is the execution time for the i-th program of a total of n in the workload.

When performance is expressed as a rate we use the harmonic mean:

    Harmonic mean = n / (sum over i=1..n of 1/Rate_i)

where Rate_i is a function of 1/Time_i, the execution time for the i-th of n programs in the workload. It is used when performance is measured in MIPS or MFLOPS.

Slide 20

CPU Performance Measures WEIGHTED EXECUTION TIME A question arises: what is the proper mixture of programs for the workload? In the arithmetic mean we assume programs P1 and P2 run equally often in the workload. A weighted arithmetic mean is given by

    Weighted mean = sum over i=1..n of Weight_i * Time_i

where Weight_i is the frequency of the i-th program in the workload and Time_i is the execution time of program i.

Slide 21

CPU Performance Measures

                       Comp A    Comp B    Comp C     W1      W2      W3
Program P1 (secs)         1        10        20      .50     .909    .999
Program P2 (secs)      1000       100        20      .50     .091    .001
Arithmetic mean: W1   500.50      55.0      20.0
Arithmetic mean: W2    91.91     18.19      20.0
Arithmetic mean: W3     2.0      10.09      20.0

Weighted arithmetic mean execution times using three weightings

Slide 22

COMMON SYSTEM BENCHMARKS 007 (OODBMS). Designed to simulate a CAD/CAM environment. Tests:         - Traversals: pointer traversals over cached data; disk-resident data;                sparse traversals; and dense traversals         - Updates: indexed and unindexed object fields; repeated                updates; sparse updates; updates of cached data; and creation                and deletion of objects         - Queries: exact match lookup; ranges; collection scan;                path lookup; ad-hoc join; and single-level make. Originator: University of Wisconsin Versions: Unknown Availability of Source: Free from Availability of Results: Free from Entry Last Updated: Thursday April 15 15:08:07 1993

Slide 23

AIM Technology, Palo Alto Two suites (III and V) Suite III: simulation of applications (task- or device-specific)    -Task-specific routines (word processing, database management, accounting)    -Device-specific routines (memory, disk, MFLOPs, I/Os)    -All measurements are expressed as a percentage of VAX 11/780 performance (100%). Overall, Suite III gives a general indication of performance. Suite V: measures throughput in a multitasking workstation environment by testing:    -Incremental system loading    -Multiple aspects of system performance The graphically displayed results plot the workload level versus time. Several different models portray different user environments (financial, publishing, software engineering). The published reports are copyrighted. An example of AIM benchmark results (in .pdf format)

Slide 24

Dhrystone Short synthetic benchmark program intended to be representative of system (integer) programming.  Based on published statistics on the use of programming language features; see the original publication in CACM 27,10 (Oct. 1984), 1013-1030. Originally published in Ada, now mostly used in C.  Version 2 (in C) published in SIGPLAN Notices 23,8 (Aug. 1988), 49-62, together with measurement rules.  Version 1 is no longer recommended since modern compilers can eliminate too much "dead code" from the benchmark (however, quoted MIPS numbers are often based on Version 1).  Problems: Due to its small size (100 HLL statements, 1-1.5 KB code), the memory system outside the cache is not tested; compilers can too easily optimize for Dhrystone; and string operations are somewhat over-represented. Recommendation: Use it for controlled experiments only; don't blindly trust single Dhrystone MIPS numbers quoted somewhere (generally speaking, don't do this for any benchmark). Originator: Reinhold Weicker, Sie
