Hippocratic Information Administration.


57 views
Uploaded on:
Category: Product / Service
Description
17% security fundamentalists, 56% down to business lion's share, 27% hardly concerned ... We can draftsman registering frameworks to ensure values we accept are ...
Transcripts
Slide 1

Hippocratic Data Management Rakesh Agrawal IBM Almaden Research Center

Slide 2

Thesis We require data frameworks that regard the security of information they oversee AND don\'t block the helpful stream of data. It is achievable to accommodate the obvious inconsistency

Slide 3

Outline Why Privacy in Data Systems Some Technology Directions Some Challenging Problems

Slide 4

Drivers for Privacy Surveys: 17% protection fundamentalists, 56% realistic greater part, 27% insignificantly concerned (Understanding net clients\' demeanor about online security, April 99) 83% would quit working with an organization in the event that it abused client data (Privacy on and off the Internet: What purchasers need, Nov. 2001) Govt. enactments & rules: Fair Information Practices Act (US, 1974) OECD Guidelines (Europe, 1980) Canadian Standards Association\'s Model Code (1995) Australian Privacy Amendment (2000) Japan: proposed enactment (2003) HIPAA, GLB, Recent U.S. Government & State Initiatives

Slide 6

Privacy Violations Accidents: Kaiser, GlobalHealthrax Lax security: Massachusetts govt. Morally faulty conduct: Lotus & Equifax, Lexis-Nexis, Medical Marketing Service, Boston University, CVS & Giant Food Illegal: Toysmart

Slide 7

Assertion Enterprises need instruments and advancements for overseeing private information and upholding protection approaches.

Slide 8

Founding Tenets of Current Database Systems Ullman, "Standards of Database and Knowledgebase Systems" Fundamental: Manage tenacious information. Access a lot of information proficiently. Attractive: Support for information model, abnormal state dialects, exchange administration, access control , and strength. Comparable rundown in other database course readings.

Slide 9

Statistical & Secure Databases Statistical Databases Provide measurable data (whole, number, and so on.) without bargaining delicate data about people, [AW89] Multilevel Secure Databases Multilevel relations, e.g., records labeled "mystery", "classified", or "unclassified", e.g. [JS91] Need to ensure protection in value-based databases that bolster every day operations. Can\'t limit questions to factual inquiries. Can\'t label every one of the records "top mystery".

Slide 10

Our Research Directions Privacy Preserving Data Mining Hippocratic Databases

Slide 11

Data Mining and Privacy The essential errand in information mining: improvement of models about totaled information. Can we create exact models without access to exact data in individual information records? R. Agrawal, R. Srikant. Security Preserving Data Mining . ACM Int\'l Conf. On Management of Data (SIGMOD), May 2000.

Slide 12

30 | 25K | … 50 | 40K | … Randomizer 65 | 50K | … 35 | 60K | … Reconstruct Age Distribution Reconstruct Salary Distribution Data Mining Algorithm Model Privacy Preserving Data Mining

Slide 13

Reconstruction Problem Original qualities x 1 , x 2 , ..., x n from likelihood circulation X To conceal these qualities, we utilize y 1 , y 2 , ..., y n from likelihood conveyance Y Given x 1 +y 1 , x 2 +y 2 , ..., x n +y n the likelihood dissemination of Y Estimate the likelihood dispersion of X.

Slide 14

Intuition (Reconstruct single point) Use Bayes\' principle for thickness capacities

Slide 15

Intuition (Reconstruct single point) Use Bayes\' tenet for thickness capacities

Slide 16

Reconstruction: Intuition Combine assessments of where a point originated from for every one of the focuses: yields evaluation of unique appropriation.

Slide 17

Reconstruction Algorithm f X 0 := Uniform dispersion j := 0 rehash f X j+1 (a) := Bayes\' Rule j := j+1 until (halting measure met) Converges to greatest probability gauge. D. Agrawal & C.C. Aggarwal, PODS 2001.

Slide 18

Works Well

Slide 19

Classification Naïve Bayes Assumes autonomy between qualities. Choice Tree Correlations are debilitated by randomization.

Slide 20

Experimental Methodology Compare precision against Original : unperturbed information without randomization. Randomized : annoyed information however without making any remedies for randomization. Test information not randomized. Manufactured information benchmark from [AGI+92]. Preparing set of 100,000 records, split similarly between the two classes.

Slide 21

Decision Tree Experiments

Slide 22

Accuracy versus Randomization

Slide 23

So far … Question: Can we create exact models without access to exact data in individual information records? Answer: yes, by randomization. for numerical traits, grouping How about Association Rules?

Slide 24

Associations Recap An exchange t is an arrangement of things (e.g. books) All exchanges shape a set T of exchanges Any itemset A has bolster s in T if Itemset An is successive if s  s min Task: Find all regular itemsets

Slide 25

The Problem How to randomize exchanges with the goal that we can discover incessant itemsets while safeguarding security at exchange level? Evfimievski, R. Srikant, R. Agrawal, J. Gehrke. Mining Association Rules Over Privacy Preserving Data . eighth Int\'l Conf. on Knowledge Discovery in Databases and Data Mining, July 2002 .

Slide 26

Randomization Overview Alice J.S. Bach, painting, nasa.gov, … J.S. Bach, painting, nasa.gov, … Recommendation Service B. Lances, baseball, cnn.com, … Bob B. Lances, baseball, cnn.com, … B. Marley, outdoors, linux.org, … Chris B. Marley, outdoors, linux.org, …

Slide 27

Randomization Overview Alice J.S. Bach, painting, nasa.gov, … J.S. Bach, painting, nasa.gov, … Recommendation Service B. Lances, baseball, cnn.com, … Bob Associations B. Lances, baseball, cnn.com, … B. Marley, outdoors, linux.org, … Chris Recommendations B. Marley, outdoors, linux.org, …

Slide 28

Randomization Overview Alice J.S. Bach, painting, nasa.gov, … Metallica, painting, nasa.gov, … Recommendation Service Support Recovery B. Lances, soccer, bbc.co.uk, … Bob Associations B. Lances, baseball, cnn.com, … B. Marley, outdoors, ibm.com … Chris Recommendations B. Marley, outdoors, linux.org, …

Slide 29

Uniform Randomization Given an exchange, keep thing with, say 20% likelihood, supplant with another arbitrary thing with 80% likelihood.

Slide 30

10 M exchanges of size 10 with 10 K things: 1% have { x , y , z } 5% have { x , y } , { x , z } , or { y , z } just 94% have one or zero things of { x , y , z } Example: { x , y , z } at most • 0.2 • (9/10,000) 2 • 0.2 3 • 0.2 2 • 8/10,000 0.008% 800 ts. 97.8% 0.00016% 16 trans. 1.9% under 0.00002% 2 exchanges 0.3% Privacy Breach : Given { x , y , z } in the randomized exchange, we have around 98% sureness of { x , y , z } in the first one

Slide 31

Privacy Breach Suppose: t is a unique exchange; t\' is the comparing randomized exchange; A will be a (continuous) itemset. Definition : Itemset A causes a protection break of level  if, for some thing z  A ,

Slide 32

Our Solution "Where does an astute man conceal a leaf? In the backwoods. Be that as it may, what does he do if there is no backwoods?" "He grows a woods to conceal it in." G.K. Chesterton Insert numerous false things into every exchange Hide genuine itemsets among false ones Can regardless we find successive itemsets while having adequate protection?

Slide 33

Cut and Paste Randomization Given exchange t of size m , build t\' : t = a , b , c , u , v , w , x , y , z t\' =

Slide 34

Cut and Paste Randomization Given exchange t of size m , develop t\' : Choose a number j somewhere around 0 and K m (cutoff); t = a , b , c , u , v , w , x , y , z t\' = j = 4

Slide 35

Cut and Paste Randomization Given exchange t of size m , build t\' : Choose a number j somewhere around 0 and K m (cutoff); Include j things of t into t\' ; t = a , b , c , u , v , w , x , y , z t\' = b , v , x , z j = 4

Slide 36

Cut and Paste Randomization Given exchange t of size m , develop t\' : Choose a number j somewhere around 0 and K m (cutoff); Include j things of t into t\' ; Each other thing is incorporated into t\' with likelihood p m . The decision of K m and p m depends on the wanted level of security. t = a , b , c , u , v , w , x , y , z t\' = b , v , x , z œ , å , ß , ξ , ψ , € , א , ъ , ђ , … j = 4

Slide 37

Partial Supports To recoup unique backing of an itemset, we require randomized backings of its subsets. Given an itemset An of size k and exchange size m , A vector of halfway backings of An is Here s k is the same as the backing of A . Randomized incomplete backings are signified by

Slide 38

Transition Matrix Let k = | A | , m = | t |. Move lattice P = P ( k, m ) associates randomized halfway backings with unique ones: Randomized backings are conveyed as an aggregate of multinomial dispersions.

Slide 39

The Unbiased Estimators Given randomized fractional backings, we can evaluate unique halfway backings: Covariance lattice for this estimator: To gauge it, substitute s l with ( s est ) l . Unique case: estimators for backing and its change

Slide 40

Privacy Breach Analysis what number added things are sufficient to ensure security? Need to fulfill Pr [ z  t | A  t\' ] <  ( no security ruptures) Select parameters with the goal that it holds for all itemsets. Use equation ( ): Parameters are to be chosen ahead of time! Build a security testing test: an itemset whose all subsets have most extreme conceivable backing. Enough to know maximal backing of an itemset for every size.

Slide 41

Lowest Discoverable Support LDS is s.t., when pre

Recommended
View more...