Early Experiences with NFS over RDMA

Slide 1

Early Experiences with NFS over RDMA
OpenFabrics Workshop, San Francisco, September 25, 2006
Sandia National Laboratories, CA
Helen Y. Chen, Dov Cohen, Joe Kenny, Jeff Decker, and Noah Fischer
hycsw,idcoehn,jcdecke,nfische@sandia.gov
SAND 2006-4293C

Slide 2

Outline
Motivation
RDMA technologies
NFS over RDMA
Testbed hardware and software
Preliminary results and analysis
Conclusion
Ongoing work and future plans

Slide 3

What is NFS
A network-attached storage file-access protocol layered on RPC, typically carried over UDP/TCP over IP
Allows files to be shared among multiple clients across LANs and WANs
A standard, stable, and mature protocol adopted for cluster platforms

Slide 4

NFS Scalability Concerns in Large Clusters
[Diagram: Applications 1..N each issuing concurrent I/O to a single NFS server]
Large numbers of simultaneous requests from parallel applications
Parallel I/O requests are serialized by NFS to a large extent
Need RDMA and pNFS

Slide 5

How DMA Works

Slide 6

How RDMA Works

Slide 7

Why NFS over RDMA
NFS moves large chunks of data, incurring many copies with each RPC
Cluster computing needs high bandwidth and low latency
RDMA offloads protocol processing and offloads the host memory I/O bus
A must for 10/20 Gbps networks
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-nfs-rdma-problem-statement-04.txt

Slide 8

The NFS RDMA Architecture
[Protocol stack: NFSv2 / NFSv3 / NFSv4 / NLM / NFSACL over RPC and XDR, over UDP, TCP, or RDMA transports]
NFS is a family of protocols layered over RPC
XDR encodes RPC requests and results onto RPC transports
NFS RDMA is implemented as another RPC transport mechanism
Selection of transport is an NFS mount option
Brent Callaghan, Theresa Lingutla-Raj, Alex Chiu, Peter Staubach, Omer Asad, "NFS over RDMA", ACM SIGCOMM 2003 Workshops, August 25-27, 2003
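Because the transport is just a mount option, switching a client from TCP to RDMA is a one-line change. A sketch using the mount syntax of current mainline Linux (the 2006 release candidate used a different option spelling; the module name, server address, and export path are illustrative):

```shell
# Load the client-side RPC/RDMA transport module (name as in current Linux kernels)
modprobe xprtrdma

# Mount the export over RDMA instead of TCP; 20049 is the IANA-assigned NFS/RDMA port.
# 192.168.0.1:/export and /mnt/nfs-rdma are placeholders.
mount -t nfs -o proto=rdma,port=20049 192.168.0.1:/export /mnt/nfs-rdma
```

A plain `mount -t nfs -o proto=tcp ...` of the same export gives the NFS/TCP baseline used for comparison throughout these slides.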

Slide 9

This Study

Slide 10

OpenFabrics Software Stack
[Architecture diagram, summarized:]
Applications and access methods: IP-based apps, sockets-based access (via SDP), various MPIs, block storage access, clustered DB access, access to file systems
User level: user-level MAD API, uDAPL (User Direct Access Programming Library), user-space SDP library, InfiniBand and iWARP user-level verbs/API (kernel bypass), diagnostic tools, OpenSM
Kernel-level upper-layer protocols: IPoIB (IP over InfiniBand), SDP (Sockets Direct Protocol), SRP (SCSI RDMA Protocol initiator), iSER (iSCSI RDMA Protocol initiator), RDS (Reliable Datagram Service), NFS-RDMA RPC, cluster file systems
Mid-layer: Connection Manager Abstraction (CMA), connection managers, SA (Subnet Administrator) client, MAD (Management Datagram), SMA (Subnet Manager Agent), PMA (Performance Manager Agent)
Hardware: InfiniBand and iWARP kernel-level verbs/API over hardware-specific drivers, InfiniBand HCAs (Host Channel Adapters) and iWARP R-NICs (RDMA NICs)
Offers a common, open-source, openly developed RDMA application programming interface
http://openfabrics.org/

Slide 11

Testbed Key Hardware
Mainboard: Tyan Thunder K8WE (S2895) http://www.tyan.com/products/html/thunderk8we.html
CPU: dual 2.2 GHz AMD Opteron, Socket 940 http://www.amd.com/us-en/resources/content_type/white_papers_and_tech_docs
Memory: 8 GB ATP 1 GB PC3200 DDR SDRAM on the NFS server and 2 GB CORSAIR CM725D512RLP-3200/M on the clients
IB switch: Flextronics InfiniScale III 24-port switch http://mellanox.com/products/switch_silicon.php
IB HCA: Mellanox MT25208 InfiniHost III Ex http://www.mellanox.com/products/shared/Infinihostglossy.pdf

Slide 12

Testbed Key Software
Kernel: Linux with the deadline I/O scheduler
NFS/RDMA release candidate 4: http://sourceforge.net/projects/nfs-rdma/
oneSIS used to boot all of the nodes: http://www.oneSIS.org
OpenFabrics IB stack, svn revision 7442: http://openib.org
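On the server side, the NFS/RDMA listener is enabled through nfsd's portlist interface. A sketch of the mechanism as it appears in later mainline kernels (the module name and exact steps for the 2006 release candidate may differ):

```shell
# Load the server-side RPC/RDMA transport module and start the NFS server as usual
modprobe svcrdma
service nfs start            # distro-specific; e.g. systemctl start nfs-server on newer systems

# Tell nfsd to also listen for RPC/RDMA connections on port 20049
echo "rdma 20049" > /proc/fs/nfsd/portlist
```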

Slide 13

Testbed Configuration
[Diagram: one NFS server connected through an IB switch to the clients]
One NFS server and up to four clients
NFS/TCP vs. NFS/RDMA, i.e. IPoIB and IB RDMA, running SDR
Ext2 with a software RAID0 backend
Clients ran IOZONE, writing and reading 64 KB records with a 5 GB aggregate file size:
to eliminate cache effects on the clients,
to maintain constant disk I/O on the server,
allowing evaluation of the NFS/RDMA transport without being constrained by disk I/O
System resources were monitored using vmstat at 2 s intervals
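The benchmark run described above can be sketched as follows (the flags match current IOZONE documentation; the exact invocation used in the study is not given in the slides):

```shell
# Sequential write (-i 0) and read (-i 1) of a 5 GB file in 64 KB records
# on the NFS-mounted file system; the mount point is a placeholder
iozone -i 0 -i 1 -r 64k -s 5g -f /mnt/nfs-rdma/testfile

# In parallel, sample CPU, interrupt, and context-switch counters every 2 seconds
vmstat 2 > vmstat.log &
```

For the four-client scalability runs, each client would use a proportionally smaller `-s` value so the aggregate stays at 5 GB.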

Slide 14

Local, NFS, and NFS/RDMA Throughput
Reads are served from the server cache, reflecting transport performance:
the TCP RPC transport achieved ~180 MB/s (1.4 Gb/s) of throughput
the RDMA RPC transport was capable of delivering ~700 MB/s (5.6 Gb/s) of throughput
Tuning: RPCNFSDCOUNT=8; /proc/sys/sunrpc/svc_rdma/max_requests=16
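The two tuning knobs quoted on the slide can be set as follows (the sysconfig file is a Red Hat-style convention, and the sysctl path comes from the NFS/RDMA release-candidate code; both are assumptions about the exact setup used):

```shell
# Run 8 nfsd threads (read at NFS service start; other distros configure this differently)
echo "RPCNFSDCOUNT=8" >> /etc/sysconfig/nfs

# Allow 16 outstanding RPC/RDMA requests on the server's RDMA transport
echo 16 > /proc/sys/sunrpc/svc_rdma/max_requests
```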

Slide 15

NFS & NFS/RDMA Server Disk I/O
Writes incurred disk I/O, issued according to the deadline scheduler
The NFS/RDMA server has a higher incoming data rate, and therefore a higher block I/O output rate to disk
The NFS/RDMA data rate was bottlenecked by the storage I/O rate, as indicated by the higher IOWAIT time

Slide 16

NFS vs. NFS/RDMA Client Interrupts and Context Switches
NFS/RDMA incurred ~1/8 the interrupts and completed in slightly more than 1/2 the time
NFS/RDMA showed higher context-switch rates, indicating faster processing of application requests
Higher throughput compared to NFS!

Slide 17

Client CPU Efficiency
CPU per MB of transfer: (Δt × Σ %cpu / 100) / file size
Write: NFS = 0.00375, NFS/RDMA = 0.00144 — 61.86% more efficient!
Read: NFS = 0.00435, NFS/RDMA = 0.00107 — 75.47% more efficient!
Improved application performance

Slide 18

Server CPU Efficiency
CPU per MB of transfer: (Δt × Σ %cpu / 100) / file size
Write: NFS = 0.00564, NFS/RDMA = 0.00180 — 68.10% more efficient!
Read: NFS = 0.00362, NFS/RDMA = 0.00055 — 84.70% more efficient!
Improved system performance
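The efficiency gains quoted on the two CPU slides follow directly from the per-MB CPU figures; a small check (values taken from the slides, results agreeing with the quoted percentages to within rounding of the underlying raw data):

```python
def efficiency_gain(nfs_cpu_per_mb: float, rdma_cpu_per_mb: float) -> float:
    """Percent reduction in CPU cost per MB when moving from NFS/TCP to NFS/RDMA."""
    return (1.0 - rdma_cpu_per_mb / nfs_cpu_per_mb) * 100.0

# Client-side figures from the slides
print(round(efficiency_gain(0.00375, 0.00144), 1))  # ~61.6 (slide quotes 61.86%)
print(round(efficiency_gain(0.00435, 0.00107), 1))  # ~75.4 (slide quotes 75.47%)

# Server-side figures from the slides
print(round(efficiency_gain(0.00564, 0.00180), 1))  # ~68.1 (slide quotes 68.10%)
print(round(efficiency_gain(0.00362, 0.00055), 1))  # ~84.8 (slide quotes 84.70%)
```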

Slide 19

Scalability Test – Throughput
To minimize the effect of disk I/O: one 5 GB file, two 2.5 GB, three 1.67 GB, or four 1.25 GB files
Rewrite and reread results were ignored because of client-side cache effects

Slide 20

Scalability Test – Server I/O
The NFS RDMA transport demonstrated faster processing of concurrent RPC I/O requests and responses from and to the 4 clients than NFS
Concurrent NFS/RDMA writes were impacted more by our slow storage, as indicated by the nearly 80% CPU IOWAIT times

Slide 21

Scalability Test – Server CPU
NFS/RDMA incurred ~1/2 the CPU overhead for half the duration, yet delivered 4 times the aggregate throughput compared to NFS
NFS/RDMA write performance was impacted more by the backend storage than NFS, as indicated by the ~70% vs. ~30% idle CPU time spent waiting for I/O to complete

Slide 22

Preliminary Conclusion
Compared to NFS, NFS/RDMA demonstrated significant CPU efficiency and promising scalability
NFS/RDMA will improve application- and system-level performance!
NFS/RDMA can readily take advantage of the bandwidth of 10/20 Gigabit networks for large file accesses

Slide 23

Ongoing Work
SC06 support: HPC Storage Challenge finalist
Micro-benchmarks; MPI applications with POSIX and/or MPI I/O
Xnet NFS/RDMA demo over IB and iWARP

Slide 24

Future Plans
Initiate investigation of NFSv4 pNFS performance with RDMA storage:
blocks (SRP, iSER), file (NFSv4/RDMA), object (iSCSI-OSD)?

Slide 25

Why NFSv4
NFSv3: use of the ancillary Network Lock Manager (NLM) protocol adds complexity and limits scalability in parallel I/O; the no-attribute-caching requirement squelches performance
NFSv4: integrated lock management allows the byte-range locking required for parallel I/O; compound operations improve the efficiency of data movement, and ...

Slide 26

Why Parallel NFS (pNFS)
pNFS extends NFSv4: a minimal extension to permit out-of-band I/O
A standards-based scalable I/O solution
Asymmetric, out-of-band solutions offer scalability: the control path (open/close) is separate from the data path (read/write)
http://www3.ietf.org/proceedings/04nov/slides/nfsv4-8/pnfs-reqs-ietf61.ppt

Slide 27

Acknowledgment
The authors would like to thank the following for their technical input:
Tom Talpey and James Lentini from NetApp
Tom Tucker from Open Grid Computing
James Ting from Mellanox
Matt Leininger and Mitch Sukalski from Sandia
