Workload-Directed Adaptive SMP Multicores: Improving Performance and Efficiency

This article discusses the concept of workload-directed adaptive SMP multicores, which adapt to changing workloads to improve performance and efficiency. The article also defines the terms symmetric multiprocessing and multiprocessors.

Uploaded on May 27, 2023
wendybeier

Presentation Transcript

1. 4. Workload directed adaptive SMP multicores. Dezső Sima, December 2013 (v1.1, last updated 10/12/2013).

2. 4.1 Introduction to workload directed adaptive SMP multicores

3. 4.1 Introduction to workload directed adaptive SMP multicores (1). Interpretation of the terms symmetric multiprocessing/multiprocessors (SMP): in a symmetric multiprocessor all processors access main memory in the same way. Figure: Example of an SMP multiprocessor with Processor 1, Processor 2, ..., Processor n (based on [1]).

4. 4.1 Introduction to workload directed adaptive SMP multicores (2). Interpretation of the term symmetric multicore processor: in a symmetric multicore processor all cores access main memory in the same way. Figure: Example of a symmetric multicore processor with Core 1, Core 2, ..., Core n (based on [1]).

5. 4.1 Introduction to workload directed adaptive SMP multicores (3). Note: Unfortunately, the literature usually designates both symmetric multiprocessors and symmetric multicores in the same way, simply by the term SMP. Although this may cause confusion in the terminology, in the rest of this section we will designate symmetric multicore processors as SMP multicores or SMPs.

6. 4.1 Introduction to workload directed adaptive SMP multicores (4). Workload directed adaptive SMPs fall into two classes (based on [2]).
Asynchronous adaptive SMPs (aSMPs): aSMP enables each core to run at a different voltage and frequency. This results in lower power by scaling down the voltage and frequency of each core to its actual load. Example: Qualcomm Snapdragon S4 (2012).
Synchronous adaptive SMPs: every core is restricted to run at the same voltage and frequency. The voltage and frequency of the core cluster are determined by the highest load in the cluster, so cores running low intensity applications waste power by running at a higher voltage and frequency than needed. Examples: ARM's big.LITTLE technology (2011), Nvidia's vSMP technology (2011).
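The difference between the two classes can be illustrated with a minimal Python sketch. It is not taken from the slides: the frequency steps, the load values, and the cubic power proxy (voltage assumed to scale linearly with frequency) are all assumptions chosen only for illustration.

```python
# Hypothetical DVFS frequency steps in MHz (illustrative values, not from the slides).
FREQ_STEPS_MHZ = [300, 600, 900, 1200, 1500]

def lowest_sufficient_freq(load_mhz, steps=FREQ_STEPS_MHZ):
    """Return the lowest frequency step that covers the demanded load."""
    for f in steps:
        if f >= load_mhz:
            return f
    return steps[-1]  # saturate at the top step

def asynchronous_freqs(core_loads_mhz):
    """aSMP: every core gets its own operating point."""
    return [lowest_sufficient_freq(load) for load in core_loads_mhz]

def synchronous_freqs(core_loads_mhz):
    """Synchronous SMP: the highest load sets the frequency of the whole cluster."""
    f = lowest_sufficient_freq(max(core_loads_mhz))
    return [f] * len(core_loads_mhz)

def relative_dynamic_power(freqs_mhz):
    """Crude proxy (assumption): voltage scales with frequency, so power ~ f^3."""
    top = FREQ_STEPS_MHZ[-1]
    return sum((f / top) ** 3 for f in freqs_mhz)

loads = [1400, 200, 250, 500]  # one busy core, three lightly loaded cores
async_f = asynchronous_freqs(loads)
sync_f = synchronous_freqs(loads)
print(async_f, relative_dynamic_power(async_f))
print(sync_f, relative_dynamic_power(sync_f))
```

With one busy core and three lightly loaded ones, the synchronous cluster keeps all four cores at the top step, which is the wasted power the slide refers to.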

7. 4.1 Introduction to workload directed adaptive SMP multicores (5). Synchronous adaptive SMPs come in two configurations.
Synchronous adaptive SMP in the 1 + n configuration: uses two clusters of cores, a cluster of a single low power core and a cluster of high performance cores. It is Nvidia's solution, called Variable SMP. Figure, e.g.: a cluster of a single low power core (CPU0) and a cluster of high performance cores (CPU0 to CPU3), connected to memory through a cache coherent interconnect.
Synchronous adaptive SMP in the n + n configuration: uses two clusters of cores, a cluster of low power (LITTLE) cores and a cluster of high performance (big) cores. E.g. ARM's solution, called big.LITTLE processing. Figure, e.g.: a cluster of LITTLE cores (CPU0 to CPU3) and a cluster of big cores (CPU0 to CPU3), connected to memory through a cache coherent interconnect.

8. 4.1 Introduction to workload directed adaptive SMP multicores (6). Figure: Example for performance and energy efficiency of high performance (Cortex-A15) and low power (Cortex-A7) cores [3].

9. 4.2 Principle of Nvidia's variable SMP

10. 4.2 Principle of Nvidia's variable SMP (1). Nvidia's variable SMP is in fact a synchronous adaptive SMP in the 1 + n configuration. It includes two clusters of cores: a cluster of a single low power core and a cluster of high performance cores. Figure: Example layout of Nvidia's variable SMP, e.g. a cluster of a single low power core (CPU0) and a cluster of high performance cores (CPU0 to CPU3), joined by a cache coherent interconnect.

11. 4.2 Principle of Nvidia's variable SMP (2). Usage models of synchronous adaptive SMPs in the 1 + n configuration.
Exclusive use of the clusters: e.g. Nvidia's Variable SMP in Tegra 3 (2011) and Tegra 4 (2013).
Inclusive use of the clusters: not implemented yet.

12. 4.2 Principle of Nvidia's variable SMP (3). The low power core is optimized for low power consumption by using transistors that require low power to operate [4].

13. 4.2 Principle of Nvidia's variable SMP (4). Figure: Power-performance curve of Nvidia's vSMP [4].

14. 4.3 Principle of ARM's big.LITTLE technology

15. 4.3 Principle of ARM's big.LITTLE technology (1). ARM's big.LITTLE technology is in fact a synchronous adaptive SMP in the n + n configuration. It includes two clusters of cores: a cluster of low power cores, termed the LITTLE cores, and a cluster of high performance cores, termed the big cores. Figure: Example layout of ARM's big.LITTLE technology, with a cluster of LITTLE cores (CPU0 to CPU3) and a cluster of big cores (CPU0 to CPU3) connected to memory through a cache coherent interconnect.

16. 4.3 Principle of ARM's big.LITTLE technology (2). Usage models of synchronous adaptive SMPs in the n + n configuration: exclusive/inclusive use of the clusters; the cluster migration model.

17. 4.3 Principle of ARM's big.LITTLE technology (3). Exclusive/inclusive use of the clusters.
Exclusive use of the clusters: only one of the clusters is in use at a time, as in the cluster migration model (to be discussed later). Figure: the LITTLE cluster is active at low load and the big cluster at high load.
Inclusive use of the clusters: both clusters can be in use at the same time, partly or entirely. Figure: cluster usage at low load and at high load.

18. 4.3 Principle of ARM's big.LITTLE technology (2). Usage models of synchronous adaptive SMPs in the n + n configuration: exclusive/inclusive use of the clusters; the cluster migration model.

19. 4.3 Principle of ARM's big.LITTLE technology (4). The cluster migration model [5].
Exclusive use of the clusters, cluster migration: big.LITTLE processing with cluster migration.
Exclusive use of the clusters, core migration: big.LITTLE processing with core migration.
Inclusive use of the clusters, core migration: big.LITTLE MP.

20. 4.3 Principle of ARM's big.LITTLE technology (5). big.LITTLE processing with cluster migration [5]. There are two core clusters, the LITTLE core cluster and the big core cluster. Tasks run on either the LITTLE or the big core cluster, so only one core cluster is active at any time (except for a short interval during a cluster switch). Low workloads, such as background sync tasks or audio and video playback, typically run on the LITTLE core cluster. If the workload exceeds the maximum performance of the LITTLE core cluster, it is migrated to the big core cluster, and vice versa.

21. 4.3 Principle of ARM's big.LITTLE technology (6). Cluster switches [6]. Cluster selection is driven by OS power management. The OS (e.g. the Linux cpufreq routine) samples the load of all cores in the cluster and selects an operating point for the cluster. It switches clusters at the terminal points of the current cluster's DVFS curve, as illustrated in the next figure.

22. 4.3 Principle of ARM's big.LITTLE technology (7). Figure: Power/performance curve during cluster switching [7], showing the DVFS operating points of the low power core and of the high performance core. A switch from the low power cluster to the high performance cluster is an extension of the DVFS strategy. A cluster switch lasts about 30 kcycles.
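The cluster selection described on the preceding slides can be sketched as a single DVFS ladder in which the big cluster's operating points continue where the LITTLE cluster's curve ends. The following Python sketch is an illustration under assumed values, not the Linux cpufreq implementation: the operating points, the load units, and the function names are hypothetical, and the hysteresis that a real governor would need to avoid rapid back-and-forth switches is omitted.

```python
# Hypothetical operating points as (cluster, performance capacity in arbitrary units).
# The big cluster's points extend the ladder past the end of the LITTLE DVFS curve.
OPERATING_POINTS = [
    ("LITTLE", 200), ("LITTLE", 400), ("LITTLE", 600),
    ("big", 900), ("big", 1200), ("big", 1600),
]

def select_operating_point(core_loads):
    """Pick the lowest operating point that covers the highest sampled core load."""
    demand = max(core_loads)
    for cluster, capacity in OPERATING_POINTS:
        if capacity >= demand:
            return cluster, capacity
    return OPERATING_POINTS[-1]  # saturate at the top big operating point

def run_governor(load_samples, start_cluster="LITTLE"):
    """Return (cluster, capacity, switched) per sample; only one cluster is active."""
    active = start_cluster
    trace = []
    for loads in load_samples:
        cluster, capacity = select_operating_point(loads)
        switched = cluster != active  # a cluster switch happens only at the ladder boundary
        active = cluster
        trace.append((cluster, capacity, switched))
    return trace

samples = [
    [100, 50, 80, 20],     # light load: lowest LITTLE point
    [550, 300, 100, 90],   # top of the LITTLE curve
    [700, 200, 100, 100],  # exceeds LITTLE maximum: migrate to big
    [300, 100, 50, 50],    # load drops: migrate back to LITTLE
]
for entry in run_governor(samples):
    print(entry)
```

In this sketch a cluster switch is just one more step on the ladder, which mirrors the statement that a switch to the high performance cluster is an extension of the DVFS strategy.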

23. 4.3 Principle of ARM's big.LITTLE technology (8). big.LITTLE processing with core migration [5], [8]. There are two core clusters, the LITTLE core cluster and the big core cluster. Cores are grouped into pairs of one big core and one LITTLE core. The LITTLE and the big core of a pair are used exclusively. Each LITTLE core can switch to its big counterpart when its load exceeds its maximum performance, and vice versa. Each core switch is independent of the others.

24. 4.3 Principle of ARM's big.LITTLE technology (9). Core switches [6]. Core selection in any core pair is performed by OS power management. The DVFS algorithm monitors the core load. When a LITTLE core cannot service the actual load, a switch to its big counterpart is initiated and the LITTLE core is turned off, and vice versa.
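A minimal Python sketch of core migration follows, assuming each big/LITTLE pair is tracked independently. The threshold value, the load units, and the `CorePair` class are hypothetical names introduced for illustration; they do not come from the slides or from any OS implementation.

```python
from dataclasses import dataclass

# Hypothetical maximum performance of a LITTLE core, in arbitrary load units.
LITTLE_MAX = 600

@dataclass
class CorePair:
    """One big core paired with one LITTLE core; only one of the two is powered."""
    active: str = "LITTLE"

    def update(self, load):
        """Switch to the big core when the LITTLE core cannot service the load, and back."""
        if self.active == "LITTLE" and load > LITTLE_MAX:
            self.active = "big"      # the LITTLE core is turned off
        elif self.active == "big" and load <= LITTLE_MAX:
            self.active = "LITTLE"   # the big core is turned off
        return self.active

def step(pairs, loads):
    """Apply one load sample per pair; every pair decides independently."""
    return [pair.update(load) for pair, load in zip(pairs, loads)]

pairs = [CorePair() for _ in range(4)]
first = step(pairs, [100, 900, 300, 700])
second = step(pairs, [100, 200, 800, 700])
print(first)
print(second)
```

Unlike cluster migration, a single busy task here moves only its own pair to a big core while the other pairs stay on their LITTLE cores.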

25. 4.3 Principle of ARM's big.LITTLE technology (10). big.LITTLE MP processing with core migration [8], [5]. The OS scheduler has all cores of both clusters at its disposal and can activate all cores at any time. Tasks can run on, or be moved between, the LITTLE cores and the big cores as decided by the scheduler. big.LITTLE MP is also termed Heterogeneous Multiprocessing (HMP).
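The inclusive use of both clusters can be sketched as a placement function that sees all cores at once. This Python sketch is only an assumed, simplified policy (heavy tasks to the least loaded big core, light tasks to the least loaded LITTLE core); the threshold, the core names, and `place_tasks` are hypothetical and do not represent how any real HMP scheduler decides.

```python
# Hypothetical load threshold above which a task is considered heavy.
HEAVY_THRESHOLD = 600

def place_tasks(task_loads, n_big=4, n_little=4):
    """Return a dict mapping core name -> list of task loads placed on that core."""
    cores = {f"big{i}": [] for i in range(n_big)}
    cores.update({f"LITTLE{i}": [] for i in range(n_little)})
    # Place the heaviest tasks first so they get the emptiest cores.
    for load in sorted(task_loads, reverse=True):
        prefix = "big" if load > HEAVY_THRESHOLD else "LITTLE"
        candidates = [name for name in cores if name.startswith(prefix)]
        # Least loaded core of the chosen type; ties go to the lowest index.
        target = min(candidates, key=lambda name: sum(cores[name]))
        cores[target].append(load)
    return cores

placement = place_tasks([900, 100, 700, 50, 300])
busy = {name: tasks for name, tasks in placement.items() if tasks}
print(busy)
```

The point of the sketch is that big and LITTLE cores are busy at the same time, which neither of the exclusive migration models allows.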

26. 4.3 Principle of ARM's big.LITTLE technology (11). Use of the big.LITTLE technology in recent mobile processors.
Exclusive use of the clusters, cluster migration: big.LITTLE processing with cluster migration.
Exclusive use of the clusters, core migration: big.LITTLE processing with core migration.
Inclusive use of the clusters, core migration: big.LITTLE MP (Heterogeneous Multiprocessing).
Described first in ARM's White Papers of 2011 [3] and 2012 [9].
Used in: Samsung Exynos 5 Octa 5410 (2013) (4 + 4 cores), Samsung HMP on Exynos 5 Octa 5420 (2013) (4 + 4 cores), MediaTek MT 8135 (2013) (2 + 2 cores), Renesas MP 6530 (2013) (2 + 2 cores).

27. References (1)
[1]: Wikipedia, File:SMP - Symmetric Multiprocessor System.svg, http://en.wikipedia.org/wiki/File:SMP_-_Symmetric_Multiprocessor_System.svg
[2]: Sag A., Qualcomm Snapdragon S4 Benchmarking Day, BSN, July 25 2012, http://www.brightsideofnews.com/news/2012/7/25/qualcomm-snapdragon-s4-benchmarking-day.aspx
[3]: Greenhalgh P., Big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7, White Paper, Sept. 2011, http://www.arm.com/files/downloads/big.LITTLE_Final.pdf
[4]: Variable SMP - A Multi-Core CPU Architecture for Low Power and High Performance, Nvidia, Whitepaper, 2011, http://www.nvidia.com/content/PDF/tegra_white_papers/tegra-whitepaper-0911b.pdf
[5]: Klug B., Samsung Announces big.LITTLE MP Support in Exynos 5420, AnandTech, Sept. 11 2013, http://www.anandtech.com/show/7313/samsung-announces-biglittle-mp-support-in-exynos-5420
[6]: Gupta A., Implications of Per CPU switching in a big.LITTLE system, ARM
[7]: Gálffy Cs., Érkezik a valóban nyolcmagos Samsung Exynos 5, HWSW, Sept. 10 2013, http://www.hwsw.hu/hirek/50915/samsung-exynos-5-octa-arm-big-little-hmp-cortex.html
[8]: MediaTek Enables ARM big.LITTLE Heterogeneous Multi-Processing Technology in Mobile SoCs, http://www.mediatek.com/_en/Event/201307_TrueOctaCore/MediaTekEnablesARMbigLITTLEHMPTechnology.pdf
[9]: Jeff B., Advances in big.LITTLE Technology for Power and Energy Savings, White Paper, Sept. 2012, http://www.arm.com/files/pdf/Advances_in_big.LITTLE_Technology_for_Power_and_Energy_Savings.pdf