Analyzing Multi-Core Memory System Interference and Task Scheduling for Many-Core Architectures

This paper explores strategies for reducing memory system interference in multi-core and many-core architectures. It analyzes the impact of task scheduling on memory hierarchy performance and examines spatial and temporal considerations. Results from experiments with several types of applications are presented.

Presentation Transcript


1. Application-to-Core Mapping Policies to Reduce Memory System Interference
   Reetuparna Das (University of Michigan), Rachata Ausavarungnirun and Onur Mutlu (Carnegie Mellon University), Akhilesh Kumar and Mani Azimi (Intel)

2. Multi-Core to Many-Core
   [Figure: a multi-core chip next to a many-core chip]

3. Many-Core On-Chip Communication
   [Figure: light and heavy applications on a many-core chip with shared cache banks ($) and memory controllers]

4. Task Scheduling
   - Traditional: when to schedule a task? (temporal)
   - Many-core: when to schedule a task? (temporal) + where to schedule a task? (spatial)
   - Spatial scheduling impacts the performance of the memory hierarchy: latency and interference in the interconnect, memory, and caches

5. Problem: Spatial Task Scheduling
   How to map applications to cores?
   [Figure: applications waiting to be placed on cores]

6. Challenges in Spatial Task Scheduling
   - How to reduce destructive interference between applications?
   - How to reduce communication distance?
   - How to prioritize applications to improve throughput?
   [Figure: applications waiting to be placed on cores]

7. Application-to-Core Mapping
   - Clustering: improve locality, reduce interference
   - Balancing: improve bandwidth utilization
   - Isolation: reduce interference
   - Radial Mapping: improve bandwidth utilization

8. Step 1: Clustering
   Inefficient data mapping to memory and caches
   [Figure: memory accesses spread across all memory controllers]

9. Step 1: Clustering
   Improved locality, reduced interference
   [Figure: chip partitioned into Cluster 0, Cluster 1, Cluster 2, and Cluster 3]

10. Step 1: Clustering
   Clustering memory accesses:
   - Locality-aware page replacement policy (cluster-CLOCK)
   - When allocating a free page, give preference to pages belonging to the cluster's memory controllers (MCs)
   - Look ahead N pages beyond the default replacement candidate to find a page belonging to the cluster's MC
   Clustering cache accesses:
   - Private caches automatically enforce clustering
   - Shared caches can use the Dynamic Spill-Receive mechanism*
   *Qureshi et al., HPCA 2009
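
The look-ahead step of cluster-CLOCK can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simplified CLOCK list in which every frame carries a reference bit and the ID of the memory controller that serves it. The names `Frame`, `pick_victim`, `cluster_mcs`, and `lookahead` are hypothetical, and leaving the clock hand on the default candidate when a look-ahead victim is chosen is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int          # page currently held by this physical frame
    mc: int            # memory controller that serves this frame
    referenced: bool   # CLOCK reference bit


def pick_victim(frames, hand, cluster_mcs, lookahead):
    """Return (victim_index, new_hand) for a cluster-aware CLOCK sweep.

    frames      -- circular list of Frame objects (must be non-empty)
    hand        -- current clock-hand position
    cluster_mcs -- set of memory controller IDs owned by the requesting cluster
    lookahead   -- N, how many frames past the default candidate to inspect
    """
    n = len(frames)
    # Standard CLOCK sweep: clear reference bits until an unreferenced
    # frame is found. That frame is the default replacement candidate.
    while frames[hand].referenced:
        frames[hand].referenced = False
        hand = (hand + 1) % n
    default = hand

    # Cluster-aware step: starting at the default candidate, inspect up to
    # `lookahead` further frames and take the first unreferenced one that
    # is served by one of the cluster's memory controllers.
    for step in range(min(lookahead, n - 1) + 1):
        idx = (default + step) % n
        if not frames[idx].referenced and frames[idx].mc in cluster_mcs:
            # Assumption: if a later frame is taken, the hand stays on the
            # default candidate so it is considered first next time.
            new_hand = (default + 1) % n if idx == default else default
            return idx, new_hand

    # No suitable frame within the window: fall back to plain CLOCK.
    return default, (default + 1) % n
```

With `lookahead` set to 0 the function behaves like plain CLOCK, which makes the locality preference easy to switch off for comparison.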

11. Step 2: Balancing
   Too much load in clusters with heavy applications
   [Figure: heavy and light applications mapped to cores]

12. Step 2: Balancing
   Better bandwidth utilization
   [Figure: heavy and light applications spread evenly across clusters]
   Is this the best we can do? Let's take a look at application characteristics.
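
The slides do not state how the load is balanced. One simple way to picture it is a greedy heaviest-first placement that always puts the next application into the least-loaded cluster that still has a free core. The sketch below assumes that approach; the function name `balance` and the use of MPKI as the load estimate are illustrative choices, not details from the presentation.

```python
import heapq


def balance(app_load, num_clusters, cores_per_cluster):
    """Greedy heaviest-first placement of applications into clusters.

    app_load -- dict mapping application name to a load estimate
                (assumed here to be something like MPKI)
    Returns a list with one list of application names per cluster.
    """
    if len(app_load) > num_clusters * cores_per_cluster:
        raise ValueError("more applications than cores")

    clusters = [[] for _ in range(num_clusters)]
    # Min-heap of (accumulated load, cluster index) for clusters that
    # still have at least one free core.
    heap = [(0.0, c) for c in range(num_clusters)]
    heapq.heapify(heap)

    # Place the heaviest applications first; ties are broken by name so
    # the result is deterministic.
    for app, load in sorted(app_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, c = heapq.heappop(heap)
        clusters[c].append(app)
        if len(clusters[c]) < cores_per_cluster:
            heapq.heappush(heap, (total + load, c))
    return clusters
```

For two heavy and two light applications on two 2-core clusters, this places one heavy application in each cluster rather than both in the same one.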

13. Application Types
   [Figure: cartoon, © PHD Comics]

14. Application Types
   Applications compared to the members of a thesis committee (cartoon © PHD Comics):
   - Light: low miss rate (the assistant professor: nice guy, no opinions)
   - Medium: medium miss rate, high MLP (the guru: there for cookies)
   - Heavy: high miss rate, high MLP (the adversary: bitter rival)
   - Sensitive: high miss rate, low MLP (the advisor: sensitive)
   Identify and isolate sensitive applications while ensuring load balance

15. Step 3: Isolation
   - Isolate sensitive applications to a cluster
   - Balance load for the remaining applications across clusters
   [Figure: sensitive, heavy, medium, and light applications mapped to cores]

16. Step 3: Isolation
   How to estimate sensitivity?
   - High miss rate: high misses per kilo instruction (MPKI)
   - Low MLP: high relative stall cycles per miss (STPM)
   - Sensitive if MPKI > threshold and relative STPM is high
   Whether or not to allocate a cluster to sensitive applications?
   How to map sensitive applications to their own cluster?
   - Knapsack algorithm
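
The classification implied by the application types and the sensitivity test can be sketched as a small decision function. The threshold values below are placeholders chosen only for illustration, since the slides give no numbers. Using a single `mpki_high` cut-off for both the heavy and the sensitive classes is also an assumption, as is the function name `classify`.

```python
def classify(mpki, relative_stpm, mpki_low=1.0, mpki_high=25.0, stpm_high=1.5):
    """Label an application as light, medium, heavy, or sensitive.

    mpki          -- misses per kilo instruction (miss-rate proxy)
    relative_stpm -- relative stall cycles per miss; a high value is
                     read as low memory-level parallelism (MLP)
    The three thresholds are hypothetical placeholders.
    """
    if mpki <= mpki_low:
        return "light"        # low miss rate
    if mpki > mpki_high:
        # High miss rate: low MLP (high STPM) means sensitive,
        # high MLP (low STPM) means heavy.
        return "sensitive" if relative_stpm > stpm_high else "heavy"
    return "medium"           # medium miss rate
```

For example, with these placeholder thresholds an application with an MPKI of 30 is labeled sensitive when its relative STPM is 2.0 and heavy when it is 1.0.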

17. Step 4: Radial Mapping
   Map applications that benefit most from being close to memory controllers close to these resources
   [Figure: sensitive, heavy, medium, and light applications mapped to cores]

18. Step 4: Radial Mapping
   What applications benefit most from being close to the memory controller?
   - High memory bandwidth demand
   - Also affected by network performance
   - Metric: stall time per thousand instructions
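
A minimal sketch of radial mapping inside one cluster, assuming mesh coordinates for the cores and Manhattan hop count as the distance to the memory controller. The function name `radial_map` and its parameters are hypothetical; the slides only name the ranking metric (stall time per thousand instructions).

```python
def radial_map(app_stall, cores, mc):
    """Assign the applications with the highest stall time to the cores
    nearest the memory controller.

    app_stall -- dict mapping application name to stall time per
                 thousand instructions
    cores     -- list of (x, y) mesh coordinates available in the cluster
    mc        -- (x, y) coordinate of the cluster's memory controller
    Returns a dict mapping application name to core coordinate.
    """
    if len(app_stall) > len(cores):
        raise ValueError("more applications than cores in the cluster")

    def hops(core):
        # Manhattan distance, i.e. hop count on a mesh with XY routing.
        return abs(core[0] - mc[0]) + abs(core[1] - mc[1])

    # Nearest cores first; ties broken by coordinate for determinism.
    ranked_cores = sorted(cores, key=lambda c: (hops(c), c))
    # Highest stall time first; ties broken by name.
    ranked_apps = sorted(app_stall, key=lambda a: (-app_stall[a], a))
    return dict(zip(ranked_apps, ranked_cores))
```

Calling `radial_map({"a": 5, "b": 50, "c": 20}, [(0, 0), (0, 1), (1, 0), (1, 1)], (0, 0))` places `b`, the application with the highest stall time, on the core at the memory controller's position.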

19. Putting It All Together
   - Steps: Clustering, Balancing, Isolation, Radial Mapping
   - Levels: inter-cluster mapping, intra-cluster mapping
   - Goals: improve locality, reduce interference, improve shared resource utilization

20. Evaluation Methodology
   60-core system:
   - x86 processor model based on Intel Pentium M
   - 2 GHz processor, 128-entry instruction window
   - 32 KB private L1 and 256 KB per-core private L2 caches
   - 4 GB DRAM, 160-cycle access latency, 4 on-chip DRAM controllers
   - CLOCK page replacement algorithm
   Detailed Network-on-Chip model:
   - 2-stage routers (with speculation and look-ahead routing)
   - Wormhole switching (4-flit data packets)
   - Virtual channel flow control (4 VCs, 4-flit buffer depth)
   - 8x8 mesh (128-bit bidirectional channels)

21. Configurations
   Evaluated configurations:
   - BASE: random core mapping
   - BASE+CLS: baseline with clustering
   - A2C
   Benchmarks:
   - Scientific, server, and desktop benchmarks (35 applications)
   - 128 multi-programmed workloads
   - 4 categories based on aggregate workload MPKI: MPKI500, MPKI1000, MPKI1500, MPKI2000

22. System Performance
   System performance improves by 17%

23. Network Power
   Average network power consumption is reduced by 52%

24. Summary of Other Results
   - A2C can reduce the page fault rate

25. Summary of Other Results
   - A2C can reduce page faults
   - Dynamic A2C also improves system performance
     - Continuous profiling and enforcement intervals
     - Retains clustering benefits
     - Migration overheads are minimal
   - A2C complements application-aware packet prioritization* in NoCs
   - A2C is effective for a variety of system parameters
     - Number and placement of memory controllers
     - Size and organization of the last-level cache
   *Das et al., MICRO 2009

26. Conclusion
   - Problem: spatial scheduling for many-core processors
     - Develop fundamental insights for core mapping policies
   - Solution: Application-to-Core (A2C) mapping policies (Clustering, Balancing, Isolation, Radial)
   - A2C improves system performance, system fairness, and network power significantly
