Comparing Digital Library Service and P2P Information Retrieval Networks

paly

This article provides an overview of the architecture and functionalities of Digital Library Service and P2P Information Retrieval networks. It also discusses the three generations of P2P networks and their unique features, as well as experimental results.


Presentation Transcript


1. Digital Library Service: An Overview. Outline: Introduction; System Architecture; Components and their functionalities; Experimental Results

2. Introduction: Peer-to-Peer (P2P) Information Retrieval framework; peers that share information; cumulative bandwidth; high processing power and storage; absence of high-cost hardware; three generations of P2P networks

3. 1st Generation: centralized DB for coordinated lookup (Napster). 2nd Generation: flooding to search every node on the network (Gnutella). 3rd Generation: Distributed Hash Tables (Tapestry, Chord, Pastry, CAN, Kademlia); uses routing tables to maintain the addresses of a node's neighbours

4. In 3G P2P networks, between log N and N nodes have to be contacted to reach the destination. In the proposed method, the target peer can be contacted directly from the source peer. The search then occurs within the target peer, which retrieves the file reference using keyword indices in a B+ tree

5. System Architecture: P2P cluster and Hadoop cluster. Hadoop cluster: extracts keywords for efficient searching, using the MapReduce programming paradigm. P2P cluster: uploads files and services search requests

6. [Figure: System Architecture. The Hadoop cluster consists of a MapReduce master (JobTracker) with DFS master (NameNode) and two slaves, each running a MapReduce slave (TaskTracker) and DFS slave (DataNode); it performs keyword extraction for the P2P cluster.]

7. Hadoop: software platform to handle vast amounts of data; moves computation to the place of data rather than moving large data blocks to the place of computation. HDFS and MapReduce framework. HDFS: NameNode and DataNode. MapReduce computation: Map splits the input data set into fragments and assigns each fragment to a map task, producing (K, V) pairs; Reduce merges all intermediate values associated with a key
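The map, group and reduce steps named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the whitespace tokenizer are assumptions made for the example, and the slides do not say how keywords are actually extracted.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit one (keyword, doc_id) pair per word of an input fragment."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Merge all intermediate values for one key into a sorted, unique list."""
    return key, sorted(set(values))

def build_index(docs):
    """Run map over every document, then group and reduce the emitted pairs."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input: two tiny documents.
docs = {"D1": "peer network search", "D2": "peer hash"}
index = build_index(docs)
print(index["peer"])
```

In a real Hadoop job the grouping step is done by the framework and the map and reduce tasks run on different nodes; here everything runs in one process to show the data flow.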

8. [Figure: MapReduce data flow. Document blocks (D1,B1 through D3,B2) feed Map Tasks 1 to 3, which emit (K, C) pairs; the pairs are sorted and grouped per document (D1, D2) into (K, [C, ...]) lists; Reduce Tasks 1 and 2 emit one (K, I) pair per key.]

9. B+ Tree (IP and its hash): represents sorted data indexed by a key for efficient insertion, retrieval and removal of records. Inserting or searching a record requires O(log_B N) operations in the worst case (B: order, N: nodes)
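The interface a B+ tree offers (insert, search and remove over keys kept in sorted order) can be illustrated with a much simpler stand-in. The sketch below is not a B+ tree: it is a flat sorted list searched with the standard `bisect` module, so lookups are logarithmic but insertions shift list elements. The class name `SortedIndex` is hypothetical.

```python
import bisect

class SortedIndex:
    """Flat sorted-key index; a stand-in for the B+ tree interface only."""

    def __init__(self):
        self._keys = []    # kept in ascending order
        self._values = []  # _values[i] belongs to _keys[i]

    def _find(self, key):
        """Return (position, found) for key using binary search."""
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def insert(self, key, value):
        i, found = self._find(key)
        if found:
            self._values[i] = value      # overwrite an existing record
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def search(self, key):
        i, found = self._find(key)
        return self._values[i] if found else None

    def remove(self, key):
        i, found = self._find(key)
        if found:
            del self._keys[i]
            del self._values[i]
        return found

idx = SortedIndex()
for k, v in [(30, "c"), (10, "a"), (20, "b")]:
    idx.insert(k, v)
print(idx.search(20))
```

A real B+ tree splits the sorted keys across nodes of bounded size, which is what gives the O(log_B N) bound quoted on the slide for both searching and inserting.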

10. DLS Components. Start-up component: starting up the Hadoop cluster; identifying nodes to participate in the P2P cluster; determining the IP hash values for the peers using SHA-1 (160-bit to 40-bit); forming the B+ tree; uploading B+ trees to other peers; starting the web server
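A 40-bit value can be derived from a 160-bit SHA-1 digest in several ways, and the slides do not say which one is used. The sketch below assumes simple truncation to the first 5 bytes of the digest; the function name `hash40` is hypothetical.

```python
import hashlib

def hash40(value: str) -> int:
    """Return a 40-bit integer derived from the SHA-1 digest of value.

    Assumption: the 160-bit digest is truncated to its first 5 bytes.
    """
    digest = hashlib.sha1(value.encode("utf-8")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest[:5], "big")               # 5 bytes = 40 bits

# Hypothetical peer address used only for illustration.
ip_hash = hash40("192.168.1.10")
print(f"{ip_hash:010x}")
```

The same function can be applied to keywords, so that peers and keywords share one 40-bit key space.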

11. DB Distribution Component: keyword extraction using the Hadoop cluster; hashing keywords with SHA-1 (160-bit to 40-bit); finding the peer with a relatively close match; uploading to the target peer; updating the B+ tree (keyword to file-ref) in the target
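Finding the peer with a "relatively close match" can be sketched as a nearest-key lookup over the sorted peer hashes. This is an illustration under stated assumptions: closeness is taken to be the absolute difference between the keyword hash and the peer's IP hash, the 40-bit hash is a truncated SHA-1, and a sorted list with `bisect` stands in for the B+ tree. The names `hash40`, `build_peer_table` and `target_peer` are hypothetical.

```python
import bisect
import hashlib

def hash40(value: str) -> int:
    """40-bit hash: first 5 bytes of SHA-1 (truncation is an assumption)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:5], "big")

def build_peer_table(ips):
    """Return (ip_hash, ip) pairs sorted by hash, mimicking sorted tree keys."""
    return sorted((hash40(ip), ip) for ip in ips)

def target_peer(keyword, table):
    """Pick the peer whose IP hash is numerically closest to the keyword hash."""
    h = hash40(keyword)
    hashes = [peer_hash for peer_hash, _ in table]
    i = bisect.bisect_left(hashes, h)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = table[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - h))[1]

# Hypothetical peer addresses.
table = build_peer_table(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
print(target_peer("hadoop", table))
```

Because every peer holds the same table, any peer can compute the target locally and contact it directly, without routing through intermediate nodes.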

12. [Figure: DB distribution flow. The Hadoop cluster takes Doc 1 to Doc n and outputs the file name and list of keywords; the search keys are hashed; the target is identified; the document is uploaded to the target node among the peers in the P2P network.]

13. Search Component: process keywords; find the 40-bit hash value; search the B+ tree in the peer to identify the target node; search the B+ tree in the target node to retrieve the file reference
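The two-step search (identify the target node, then look up the file reference inside it) can be sketched with in-memory objects. Everything here is an assumption for illustration: the `Peer` class, the use of a dict in place of each peer's B+ tree, the truncated-SHA-1 `hash40`, and the absolute-difference rule for choosing the target.

```python
import hashlib

def hash40(value: str) -> int:
    """40-bit hash: first 5 bytes of SHA-1 (truncation is an assumption)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:5], "big")

class Peer:
    def __init__(self, ip):
        self.ip = ip
        self.ip_hash = hash40(ip)
        self.keyword_index = {}  # keyword hash -> set of file references

def nearest(peers, key_hash):
    """Step 1: the peer whose IP hash is closest to the key hash."""
    return min(peers, key=lambda p: abs(p.ip_hash - key_hash))

def upload(peers, keyword, file_ref):
    """Store a file reference under the keyword hash on the target peer."""
    h = hash40(keyword)
    nearest(peers, h).keyword_index.setdefault(h, set()).add(file_ref)

def search(peers, keyword):
    """Step 2: look the keyword hash up inside the target peer only."""
    h = hash40(keyword)
    return sorted(nearest(peers, h).keyword_index.get(h, set()))

# Hypothetical peers and documents.
peers = [Peer("10.0.0.1"), Peer("10.0.0.2"), Peer("10.0.0.3")]
upload(peers, "hadoop", "doc1.txt")
upload(peers, "hadoop", "doc2.txt")
print(search(peers, "hadoop"))
```

Since upload and search apply the same rule to the same hash, a search always lands on the peer that received the upload.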

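14. [Figure: search flow. A search request arrives at PEER1 in the P2P network with a list of keywords; the search keys are hashed; the target node is identified using the relative difference between the hash values of the keywords and of the IP addresses in the B+ tree; the search request is forwarded to PEER2 in the P2P network, where the document is searched in the target peer.]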

15. Add/Delete Peer: update the IP address table; compute the IP hash of the newly added peer; reconstruct the B+ tree and update it in the peers; relocate the appropriate files to the new peer; modify metadata in the peers
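Relocating files when a peer joins amounts to recomputing the owner of each keyword and moving only those whose owner changed. The sketch below assumes the same closest-hash ownership rule and truncated-SHA-1 hash as a plausible reading of the slides; `hash40`, `owner` and `add_peer` are hypothetical names, and a plain dict stands in for the peers' metadata.

```python
import hashlib

def hash40(value: str) -> int:
    """40-bit hash: first 5 bytes of SHA-1 (truncation is an assumption)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:5], "big")

def owner(peer_ips, key_hash):
    """Peer whose IP hash is closest to key_hash (earlier peers win ties)."""
    return min(peer_ips, key=lambda ip: abs(hash40(ip) - key_hash))

def add_peer(placement, peer_ips, new_ip):
    """Add new_ip and update placement (keyword -> owning IP) in place.

    Returns the new peer list and the keywords that had to be relocated.
    """
    updated = peer_ips + [new_ip]          # update the IP address table
    moved = []
    for keyword in placement:
        new_owner = owner(updated, hash40(keyword))
        if new_owner != placement[keyword]:
            placement[keyword] = new_owner  # relocate to the new peer
            moved.append(keyword)
    return updated, moved

# Hypothetical peers and keywords.
peer_ips = ["10.0.0.1", "10.0.0.2"]
keywords = ["hadoop", "peer", "library", "search", "index", "hash"]
placement = {kw: owner(peer_ips, hash40(kw)) for kw in keywords}
peer_ips, moved = add_peer(placement, peer_ips, "10.0.0.3")
print(len(moved), "keyword(s) relocated")
```

Only keywords that are strictly closer to the new peer move, which matches the slide's later observation that the cost of adding a peer depends on the number of keywords.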

16. Experimental Results: keyword extraction from multiple files (1 MB each). Observation: the time depends on the number of keywords

17. Cluster Set-up Time: depends on the number of nodes

18. Add a New Peer: the time depends on the number of keywords (for 1 peer)

19. Performance of Data Distribution Component: load time depends on the number of keywords

20. Performance of Search Component: search time remains constant (9 msec), owing to the B+ tree and search distribution

21. Conclusion: the P2P Information Retrieval Framework uses the 3G P2P DHT approach; B+ trees are maintained in the peers; Hadoop is used for keyword extraction from multiple files in parallel; efficient search on peers

22. THANK YOU