PrefixSpan Algorithm: Efficient Sequential Pattern Mining

This article discusses the PrefixSpan algorithm, a widely used approach for mining sequential patterns efficiently. The article introduces the problem statement and describes the existing solutions before presenting the proposed solution. The article concludes with references for further reading.

Uploaded on Jun 01, 2023 | 10 Views
lilyhaag

About PrefixSpan Algorithm: Efficient Sequential Pattern Mining

PowerPoint presentation about 'PrefixSpan Algorithm: Efficient Sequential Pattern Mining'. This presentation describes the topic on This article discusses the PrefixSpan algorithm, a widely used approach for mining sequential patterns efficiently. The article introduces the problem statement and describes the existing solutions before presenting the proposed solution. The article concludes with references for further reading.. The key topics included in this slideshow are PrefixSpan Algorithm, Mining Sequential Patterns, Prefix Projected Pattern Growth, Algorithm, Efficient, Sequence Mining,. Download this presentation absolutely free.

Presentation Transcript

1. 1/16 PREFIXSPAN ALGORITHM PREFIXSPAN ALGORITHM Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth al113301m@student.etf.rs

2. 2/16 2. Problem statement 5. A lgorithm 1. Introduction 3. Existing Solutions 4. Proposed Solution 6. Conclusion 7. References Contet Contet al113301m@student.etf.rs

3. 3/16 Introduction Introduction Given a set of sequences, where each sequence consists of a list of elements and each element consists of set of items. - 5 elements, 9 items - 9-sequence id Sequence 10 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 al113301m@student.etf.rs

4. 4/16 Subsequence vs. super sequence Subsequence vs. super sequence Given two sequences = and =. is called a subsequence of , denoted as , if there exist integers 1j 1 Correct : 1 = 2 = < ( ac)(ac)d(cf)> 3 = Not correct: 4 =< df(cf)> 5 =<( cf)d> 6 =<( abc)dcf al113301m@student.etf.rs

5. 5/16 Sequential Pattern Mining Sequential Pattern Mining Find all the frequent subsequences, i.e. the subsequences whose occurrence frequency in the set of sequences is no less than min_support (user-specified). id Sequence 10 20 <(ad)c(bc)(ae) > 30 <(ef)(ab)(df)cb > 40 Solution 53 frequent subsequences: <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <(bc)> <(bc)a> min_support = 2 al113301m@student.etf.rs

6. 6/16 Existing Solutions Existing Solutions Apriori-like approache s (AprioriSome (1995.), AprioriAll (1995.), DynamicSome (1995.), GSP (1996.)): Potentially huge set of candidate sequences, Multiple scans of databases, Difficulties at mining long sequential patterns. FreeSpan (2000.) - pattern groth method ( Frequent pattern-projected Sequential pattern mining) General idea is to use frequent items to recursively project sequence databases into a smaller projected databases , and grow subsequence fragments in each projected database. PrefixSpan (Prefix-projected Sequential pattern mining) Less projections and quickly shrinking sequence. al113301m@student.etf.rs

7. 7/16 Prefix Prefix Given two sequences = and =, mn. Sequence is called a prefix of if and only if: b i = a i for i m-1; b m a m ; Example : =< a(abc)(ac)d(cf)> =< a(abc)a> al113301m@student.etf.rs

8. 8/16 Projection Projection Given sequences and , such that is a subsequence of . A subsequence of sequence is called a projection of w.r.t. prefix if and only if: has prefix ; There exist no proper super-sequence of such that: is a subsequence of and also has prefix . Example: =< a(abc)(ac)d(cf)> =<( bc)a> =<( bc)(ac)d(cf)> al113301m@student.etf.rs

9. 9/16 Postfix Postfix Let =< a 1 ,a 2 a n > be the projection of w.r.t. prefix =< a 1 a 2 a m-1 a m > (m n) . Sequence =< a m a m+1 a n > is called the postfix of w.r.t. prefix , denoted as = / , where a m =(a m - a m ). We also denote = . Example: =< a(abc)(ac)d(cf)>, =< a(abc)a>, =<(_ c)d(cf)> . al113301m@student.etf.rs

10. 10/16 PrefixSpan Algorithm PrefixSpan Algorithm Input of the algorithm : A sequence database S, and the minimum support threshold min_support. Output of the algorithm: The complete set of sequential patterns. Subroutine: PrefixSpan( , L, S| ) . Parameters: : sequential pattern, L: the length of ; S| : : the -projected database, if <>; otherwise; the sequence database S. Call PrefixSpan(<>,0,S). id Sequence 10 20 <(ad)c(bc)(ae) > 30 <(ef)(ab)(df)cb > 40 al113301m@student.etf.rs

11. 11/16 PrefixSpan Algorithm (2) PrefixSpan Algorithm (2) Method: 1. Scan S | once, find the set of frequent items b such that: b can be assembled to the last element of to form a sequential pattern; or can be appended to to form a sequential pattern. 2. For each frequent item b: append it to to form a sequential pattern and output ; output ; 3. For each : construct -projected database S | and call PrefixSpan( , L+1, S | ). al113301m@student.etf.rs

12. 12/16 1. Find length1sequential patterns: 2. Divide search space Prefix <(abc)(ac)d(cf) > <(_d)c(bc)(ae) > <(_b)(df)cb > <(_f)cbc > <(_c)(ac)d(cf)> <(_c)(ae)> <(df)cb> <(ab)(df)cb> <(_f)(ab)(df)cb> <(af)cbc> <(cf)> <(_f)cb> <(ac)d(cf)> <(bc)(ae)> 4 4 4 3 3 3 1 id Sequence 10 20 <(ad)c(bc)(ae) > 30 <(ef)(ab)(df)cb > 40 PrefixSpan - Example PrefixSpan - Example al113301m@student.etf.rs

13. 13/16 PrefixSpan Example (2) PrefixSpan Example (2) Find subsets of sequential patterns: <(cf)> <(_f)cb> <(bc)(ae)> <(_c)(ae)> <> <_f> 1 2 3 0 1 1 1 2 1 1 1 al113301m@student.etf.rs

14. 14/16 Conclusions Conclusions PrefixSpan Efficient pattern growth method. Outperforms both GSP and FreeSpan. Explores prefix-projection in sequential pattern mining. Mines the complete set of patterns, but reduces the effort of candidate subsequence generation. Prefix-projection reduces the size of projected database and leads to efficient processing. Bi-level projection and pseudo-projection may improve mining efficiency. al113301m@student.etf.rs

15. 15/16 References References Pei J., Han J., Mortazavi-Asl J., Pinto H., Chen Q., Dayal U., Hsu M., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth, 17th International Conference on Data Engineering (ICDE), April 2001. Agrawal R., Srikant R., Mining sequential patterns, Proceedings 1995 Int. Conf. Very Large Data Bases (VLDB94), pp. 487-499, 1999. Han J., Dong G., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., Freespan: Frequent pattern-projected sequential pattern mining, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD00), pp. 355-359, 2000. Wojciech Stach, http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput695- 04/slides/PrefixSpan-Wojciech.pdf. al113301m@student.etf.rs

16. 16/16 Thank you for attention. Questions? Lazar Arsi 2011/3301 AL113301m@student.etf.rs Thank you for attention. Questions? Lazar Arsi 2011/3301 AL113301m@student.etf.rs al113301m@student.etf.rs

Related

Data Mining Techniques for Market Basket Analysis

Spatial Data Mining: Definition, Applications, and Algorithms

CS1104 Lecture 13: Sequential Logic, Counters and Registers

Introduction to Rietveld Refinement using HighScore Plus v3.0e

Link Reversal Algorithms: A Distributed Algorithm Design Technique for Routing and Other Problems

Introduction to Snapshot Tool in VARS Data Mining Project

Time Series II: Data Mining and Analysis Course

Dyn Sched CSE 471: Enhancements to Tomasulo's Algorithm

Round Robin Scheduling Algorithm in Operating System

Optimized Soft Shadow Volume Algorithm with Real-Time Performance

Optimal Coin Payment using Greedy Algorithm

Network Layer Design Issues: Routing Algorithm

Real Time Semi-supervised Source Separation for Online Speech Denoising and Video Chatting

Efficient Service Provision at Local Government Level in Zambia: Implications and Challenges for Decentralisation

Efficient Diversification: Understanding Portfolio Risk

Energy Efficient Chullah: SANTI SEBA

Understanding Efficient Market Hypothesis and its Implications for Business, Corporate Finance, and Investment

Graphing Notes Part 2: Patterns

Gold Crew Spill Cleanup Solution for Efficient Surface Washing

Innovative and Efficient Methods for Generating and Using Electric Energy on Cargo Ships

Minimum Work for Separation Process

Gas Turbine Overhaul Plan (GTOP) Software for Efficient Maintenance and Documentation

Improving Java User Interface Basics

The Evolution of the Periodic Table