Project and Product Selection by He Jiang Department of Management University of Utah April 1st, 2003
Outline • On Integrating Catalogs • A Hierarchical Constraint Satisfaction Approach to Product Selection for Electronic Shopping Support • A Multiple Attribute Utility Theory Approach to Ranking and Selection
On Integrating Catalogs Rakesh Agrawal and Ramakrishnan Srikant IBM Almaden Research Center
Summary • Problem: integrating documents from different sources into a master catalog. • Gaps: many data sources have their own categorizations, and the implicit similarity information in these source catalogs may be ignored. • Approach: Naïve Bayes classification • Contribution: classification accuracy can be improved by incorporating the implicit similarity information present in the source categorizations
Problem—Why Integration? • B2C shops need to integrate catalogs from multiple vendors (e.g., Amazon); • B2B portals need a single catalog when their companies merge into one (Chipcenter & Questlink eChips); • Information portals categorize documents into categories (Google & Yahoo!); • Corporate portals merge intra-company and external information into a uniform categorization
Problem Identification—Model Building • Problem identification: classification problem. • Master catalog M with categories C1, C2, …, Cn; • Source catalog N with categories S1, S2, …, Sm; • Merge documents in N into M.
Question How to Integrate?
Straightforward Approach: • Completely ignore N’s categorization and put each of N’s products into one of M’s categories according to M’s classification rule.
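The straightforward approach is ordinary Naive Bayes classification trained on M alone. The sketch below assumes a multinomial model with add-one (Laplace) smoothing and documents represented as token lists; `train_nb`, `classify_basic` and the toy catalog are hypothetical names and data for illustration, not the paper's code.

```python
import math
from collections import Counter

def train_nb(master):
    """Train a multinomial Naive Bayes model on the master catalog M.

    master: dict mapping each category C_i of M to a list of documents,
    where a document is a list of tokens (a hypothetical representation).
    """
    vocab = {t for docs in master.values() for d in docs for t in d}
    model = {}
    for cat, docs in master.items():
        counts = Counter(t for d in docs for t in d)
        model[cat] = (len(docs), counts, sum(counts.values()))
    return model, vocab

def classify_basic(model, vocab, doc):
    """Put one of N's documents into the M category maximizing Pr(C) * Pr(d | C).

    N's own categorization is ignored entirely, as in the straightforward approach.
    """
    n_docs = sum(size for size, _, _ in model.values())

    def log_score(cat):
        size, counts, total = model[cat]
        denom = total + len(vocab)          # add-one smoothing over the vocabulary
        prior = math.log(size / n_docs)     # Pr(C) estimated from |C|
        return prior + sum(math.log((counts[t] + 1) / denom) for t in doc)

    return max(model, key=log_score)

# Toy usage (hypothetical data).
master = {
    "cameras": [["lens", "zoom", "pixel"], ["lens", "flash"]],
    "printers": [["ink", "paper", "toner"], ["paper", "tray"]],
}
model, vocab = train_nb(master)
label = classify_basic(model, vocab, ["zoom", "lens"])  # "cameras"
```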
Enhanced Approach • Incorporate the implicit categorization information present in N when classifying N’s documents into M.
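As we understand the enhanced approach, the prior of each master category C_i is adjusted for a document from source category S by how many of S's documents the basic classifier assigns to C_i, raised to a weight w, i.e. Pr(C_i | S) is taken proportional to |C_i| · (count_i)^w. A minimal sketch, assuming the log-likelihoods log Pr(d | C_i) are already computed; the smoothing constant `eps` and all function names are assumptions, not taken from the paper.

```python
import math
from collections import Counter

def classify_enhanced(sizes, loglik, first_pass, w, eps=1e-3):
    """Enhanced Naive Bayes label for one document d from source category S.

    sizes:      |C_i| for every master category C_i
    loglik:     log Pr(d | C_i) for d, from an ordinary Naive Bayes model of M
    first_pass: basic-classifier labels of all documents in S
    w:          weight on the evidence from S; w = 0 reduces to basic Naive Bayes
    eps:        hypothetical smoothing so a category with no votes is not ruled out
    """
    votes = Counter(first_pass)

    def log_score(c):
        # log of the unnormalized prior |C_i| * (votes_i + eps)^w, plus log Pr(d | C_i);
        # the normalizing constant is the same for every C_i, so argmax can skip it
        return math.log(sizes[c]) + w * math.log(votes[c] + eps) + loglik[c]

    return max(sizes, key=log_score)

# Toy usage (hypothetical numbers): the words alone slightly favor "printers",
# but 8 of the 10 documents in d's source category were classified as "cameras".
sizes = {"cameras": 50, "printers": 50}
loglik = {"cameras": -10.0, "printers": -9.5}
first_pass = ["cameras"] * 8 + ["printers"] * 2
basic_label = classify_enhanced(sizes, loglik, first_pass, w=0)     # "printers"
enhanced_label = classify_enhanced(sizes, loglik, first_pass, w=1)  # "cameras"
```

Setting w = 0 recovers the straightforward classifier, which is why the weight can only add information relative to it.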
Assumptions and Limitations • M and N are homogeneous and have significant overlap; • M and N use the same vocabularies (Larkey, 1999); • Catalog hierarchies are flattened and treated as sets of categories (Good, 1965 & Chakrabarti, 1997); • Different hierarchy levels (if M > N, this can help distinguish categories that M doesn’t have; if N > M, NBHC can be applied).
Related Works and Gaps • Naïve Bayes classifiers are accurate and fast (Chakrabarti et al., 1997, …), so a Bayesian model is chosen; • Folder systems have been built for email routing (Agrawal et al., 2000, …), action prediction (Maes, 1994 & Payne et al., 1997), query organization using text clustering (Sahami et al., 1998) and filing transfer (Dolin et al., 1999), but none of these systems addresses the task of merging hierarchies; • The Athena system can reorganize a folder hierarchy into a new hierarchy (Agrawal et al., 2000), but no information from the old hierarchy is used in either building the model or routing the documents.
Effect of Weight on Accuracy • The weight can make a difference for a given M and N; a tune set is used to select a good value for the weight. • For a given document, there is either a range of weights in which the document will be correctly classified, or it will never be correctly classified. • The highest possible accuracy achievable with the enhanced algorithm is no worse than what can be achieved with the basic algorithm.
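Choosing the weight with a tune set can be sketched as a search over candidate values, keeping the one with the best tune-set accuracy. This assumes the enhanced scoring rule log|C_i| + w·log(votes_i + eps) + log Pr(d | C_i); the candidate list, `eps` and the function names are hypothetical. Because w = 0 (basic Naive Bayes) is among the candidates, the selected weight is never worse than the basic classifier on the tune set itself.

```python
import math

def enhanced_label(sizes, loglik, votes, w, eps=1e-3):
    # argmax_i of log|C_i| + w * log(votes_i + eps) + log Pr(d | C_i); eps is a hypothetical smoothing term
    return max(sizes, key=lambda c: math.log(sizes[c]) + w * math.log(votes.get(c, 0) + eps) + loglik[c])

def choose_weight(sizes, tune_set, candidates=(0, 1, 3, 10, 30, 100)):
    """Pick the weight with the highest accuracy on a tune set.

    tune_set: list of (loglik, votes, true_label) triples, where votes counts the
    basic classifier's labels for the document's source category.
    """
    best_w, best_acc = None, -1.0
    for w in sorted(candidates):
        hits = sum(enhanced_label(sizes, ll, votes, w) == truth for ll, votes, truth in tune_set)
        acc = hits / len(tune_set)
        if acc > best_acc:  # strict '>' keeps the smallest weight among ties
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy tune set (hypothetical numbers).
sizes = {"cameras": 50, "printers": 50}
tune_set = [
    ({"cameras": -10.0, "printers": -9.5}, {"cameras": 8, "printers": 2}, "cameras"),
    ({"cameras": -12.0, "printers": -8.0}, {"cameras": 5, "printers": 5}, "printers"),
]
best = choose_weight(sizes, tune_set)  # (1, 1.0); w = 0 only reaches 0.5
```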
Experimental Results—Data Sets Used • Synthetic catalogs: the source catalog N is derived from M using different distributions (e.g., Gaussian). • Real catalogs: two real-world catalogs that share some documents; the first catalog minus the common documents is treated as M, and the remaining documents in the second catalog as N.
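The slide does not spell out how N is derived from M. One hypothetical way to do it with a Gaussian is to map each master category to a position among N's categories and scatter its documents around that position; `derive_source_catalog`, the position mapping and the clamping are all assumptions made for illustration.

```python
import random

def derive_source_catalog(master, n_source, sigma, seed=0):
    """Hypothetical synthetic source catalog derived from a master catalog.

    master:   dict mapping each master category to its list of document ids
              (the dict order defines each category's position)
    n_source: number of categories in the derived source catalog N
    sigma:    spread of the Gaussian; sigma = 0 gives a clean one-to-one mapping
    """
    rng = random.Random(seed)
    source = {j: [] for j in range(n_source)}
    for i, cat in enumerate(master):
        centre = i * n_source / len(master)  # position of this master category in N
        for doc in master[cat]:
            j = int(round(rng.gauss(centre, sigma)))
            j = min(max(j, 0), n_source - 1)  # clamp into N's category range
            source[j].append(doc)
    return source

# Toy usage: with sigma = 0 every category lands in exactly one source category.
master = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
clean = derive_source_catalog(master, n_source=4, sigma=0.0)
noisy = derive_source_catalog(master, n_source=4, sigma=1.0, seed=42)
```

Larger values of `sigma` make N's categories mix documents from more master categories, i.e. make the implicit similarity information weaker.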
Contributions and Future Research Directions • Contributions: enhancing standard Naïve Bayes classification by incorporating the category information of the source catalogs; the highest accuracy achievable with the enhanced technique is no worse than that achievable with standard Naïve Bayes classification. • Future research: incorporating the implicit information of N into other classifiers, such as SVMs, requires further work
A Hierarchical Constraint Satisfaction Approach to Product Selection for Electronic Shopping Support Young U. Ryu IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans Vol. 29, No. 6, November 1999
Summary • Problem: proposing a product selection mechanism for electronic shopping support; • Approach: hierarchical constraint satisfaction (HCS); • Gap: the simple taxonomy hierarchy (STH) approach is flawed in that the search is conducted on a single generic product hierarchy; • HCS is more powerful and flexible than STH.
Question • 1. How do we search for a sugar-free decaffeinated cola? • 2. If no cola satisfies all the requirements (i.e., cola, sugar-free and decaffeinated), what would you recommend?
Gaps • Search is conducted on a single generic product hierarchy; • There may be no product that satisfies all the constraints; • A product may be evaluated as better than another even when there is no significant difference between the two.
Hierarchical Constraint Satisfaction Approach • Constraint Satisfaction: a methodology for determining assignments of values to variables that are consistent with given constraints; • Hierarchical Constraint Satisfaction: an extension of STH that minimizes the satisfaction errors of hierarchically organized constraints according to their importance; • Value of HCS: it can be applied to cases in which no solution is consistent with all the given constraints because some of them conflict.
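A minimal sketch of hierarchical constraint satisfaction applied to the cola question: required constraints must hold exactly, and the remaining products are ranked by their per-level error totals compared lexicographically (strongest level first). This comparator is one common choice for constraint hierarchies and is assumed here, not taken from Ryu's paper; the catalog and the decision to rank "sugar-free" above "decaffeinated" are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    level: int                       # 0 = required; 1, 2, ... = progressively weaker preferences
    error: Callable[[dict], float]   # 0.0 when satisfied, larger when violated

def error_totals(product, constraints, n_levels):
    """Sum the satisfaction errors of a product at each hierarchy level."""
    totals = [0.0] * n_levels
    for c in constraints:
        totals[c.level] += c.error(product)
    return totals

def hcs_select(products, constraints):
    """Return the products that best satisfy the constraint hierarchy."""
    n_levels = 1 + max(c.level for c in constraints)
    scored = [(error_totals(p, constraints, n_levels), p) for p in products]
    feasible = [(e, p) for e, p in scored if e[0] == 0.0]  # required level must hold
    if not feasible:
        return []
    best = min(e for e, _ in feasible)  # Python compares lists lexicographically
    return [p for e, p in feasible if e == best]

# Hypothetical catalog: no product is a sugar-free, decaffeinated cola.
drinks = [
    {"name": "A", "kind": "cola", "sugar": 0, "caffeine": 1},
    {"name": "B", "kind": "cola", "sugar": 1, "caffeine": 0},
    {"name": "C", "kind": "lemon-lime", "sugar": 0, "caffeine": 0},
]
wants = [
    Constraint(0, lambda p: 0.0 if p["kind"] == "cola" else 1.0),  # must be a cola
    Constraint(1, lambda p: float(p["sugar"])),                     # strongly prefer sugar-free
    Constraint(2, lambda p: float(p["caffeine"])),                  # weakly prefer decaffeinated
]
picks = [p["name"] for p in hcs_select(drinks, wants)]  # ["A"]
```

Instead of returning nothing, as a strict search for "cola AND sugar-free AND decaffeinated" would, the hierarchy recommends the sugar-free cola that only violates the weakest preference.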
Concepts Introduced • Constraint domain transformation: transformation of a Boolean constraint into an arithmetic constraint; • Tree domain: a domain whose elements are structured as a tree, so that its values can be handled more flexibly; • Indifference interval: overcomes a shortcoming of hierarchical reasoning when the difference between two alternatives is small;
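The two ideas can be sketched as follows, under stated assumptions: the tree-domain error is taken to be the number of edges between the requested and offered values in a hypothetical scent taxonomy, and the indifference interval is a per-level tolerance under which two error totals count as a tie, so the decision falls through to the next, weaker level. Neither definition is claimed to be Ryu's exact formulation.

```python
# Hypothetical scent taxonomy for the wipes example, stored as child -> parent.
PARENT = {
    "unscented": "any scent",
    "scented": "any scent",
    "natural": "scented",
    "natural aloe": "natural",
    "natural jasmine": "natural",
    "chemical perfume": "scented",
}

def path_to_root(node):
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def tree_error(wanted, actual):
    """Tree-domain error: number of edges between the requested and the offered value."""
    up_wanted, up_actual = path_to_root(wanted), path_to_root(actual)
    lca = next(n for n in up_wanted if n in up_actual)  # lowest common ancestor
    return up_wanted.index(lca) + up_actual.index(lca)

def compare(errs1, errs2, indifference):
    """Compare two alternatives' per-level error totals, strongest level first.

    Differences no larger than the level's indifference interval count as ties.
    Returns -1 if the first alternative is better, 1 if the second is, else 0.
    """
    for e1, e2, tol in zip(errs1, errs2, indifference):
        if abs(e1 - e2) > tol:
            return -1 if e1 < e2 else 1
    return 0
```

For example, `tree_error("natural aloe", "natural jasmine")` is 2 while `tree_error("natural aloe", "chemical perfume")` is 3, so a jasmine-scented product is a closer match than a perfumed one. With `compare([0, 0.30, 0], [0, 0.28, 1], [0, 0.05, 0])` the 0.02 gap at level 1 falls inside the interval, and the first alternative wins at level 2 (result -1).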
Constraint Satisfaction Error • The degree to which an arithmetic constraint c is satisfied is measured by the constraint satisfaction error function; • Boolean constraints are first transformed into arithmetic constraints; • e.g.
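A sketch of error functions of the kind described, assuming one common choice: the error is 0 when the constraint holds and otherwise the amount of violation, and a Boolean constraint is rewritten as "a 0/1 indicator must equal 1". The wipe product and its requirements are hypothetical.

```python
def eq_error(x, target):
    """Error of the arithmetic constraint x = target."""
    return abs(x - target)

def le_error(x, bound):
    """Error of x <= bound: zero when satisfied, otherwise the amount of violation."""
    return max(0.0, x - bound)

def ge_error(x, bound):
    """Error of x >= bound."""
    return max(0.0, bound - x)

def bool_error(holds):
    """Boolean constraint rewritten arithmetically: a 0/1 indicator must equal 1."""
    return eq_error(1.0 if holds else 0.0, 1.0)

# Hypothetical wipe product checked against hypothetical requirements.
wipe = {"cost": 3.5, "strength": 18.0, "dispenser": "box"}
errors = {
    "cost <= 3 cents/sheet": le_error(wipe["cost"], 3.0),           # 0.5
    "strength >= 20 psi": ge_error(wipe["strength"], 20.0),         # 2.0
    "pop-up dispenser": bool_error(wipe["dispenser"] == "pop-up"),  # 1.0
}
```

Errors measured in different units (cents, psi, 0/1) would need some scaling before they are summed at the same hierarchy level.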
Example • Shopping for wipe products using the hierarchical constraint satisfaction approach. Each product is described by the following attributes: • Cost: cents per sheet; • Add-on materials: “baking soda”, “aloe vera”, …; • Strength: measured by the pressure (psi) that breaks a sheet; • Dispenser type: “box”, “pop-up”; • Added artificial scent: unscented, natural aloe scented, natural jasmine scented and chemical perfume scented; • Product purpose: “general purpose”, “diaper change”.
Contributions and Future Research Directions • Contribution: product search is viewed as a satisfaction problem of hierarchically organized constraints over product attributes, which makes it more powerful and flexible than product selection based on a single product taxonomy hierarchy. • Future research: purchasing requirement specification or constraint hierarchy elicitation; complete prototype implementation of the HCS approach; actual purchasing/sales transactions based on speech-act theory, illocutionary logic and inter-organizational activity coordination.
A Multiple Attribute Utility Theory Approach to Ranking and Selection John Butler, Douglas J. Morrice and Peter W. Mullarkey Management Science, Vol. 47, No. 6, June 2001
Summary • Problem: developing a ranking and selection procedure for comparing systems that have multiple performance measures; • Approach: combining Multiple Attribute Utility Theory (MAUT) and statistical ranking and selection (R&S) using an indifference zone; • Gaps: the costing approach is flawed in that accurate cost data may not be available, and it may be difficult to measure performance in terms of costs. • Advantages: rigorous; close to business practice; simpler to implement; can estimate the number of simulations required; can assess the relative importance of criteria
Gaps • Most of the R&S literature focuses on procedures that reduce multivariate performance measures to a scalar performance measure, but these procedures have disadvantages, e.g., accurate cost data may not be available, and it may be difficult to attach an accurate dollar value to intangible variables; • Current techniques may require a complicated step of estimating a covariance matrix (Gupta & Panchapakesan, 1979); • Previous work doesn’t provide an approach for estimating the number of simulations required to select the best configuration with a high probability (Andijani, 1998; Kim & Lin, 1999). • Previous work lacks a trade-off mechanism that allows the decision maker to combine disparate performance measures.
Assumptions • The decision maker’s preferences are accurately represented (Clemen, 1991; Keeney & Raiffa, 1976); • Performance measures that are converted to “utils” can be converted back to meaningful units by choosing an invertible utility function; • There is an indifference zone for the decision maker on all the performance measures;
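Additive MAU Model • If the attributes are mutually additive independent, then $U(x_1, \ldots, x_n) = \sum_{i=1}^{n} w_i\, u_i(x_i)$, where $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$ • Example for additive independence: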
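The additive model U(x) = Σ w_i·u_i(x_i) can be evaluated directly once the single-attribute utilities and weights are fixed. A minimal sketch; the linear utilities, their ranges and the weights below are hypothetical, chosen only to show the arithmetic.

```python
def additive_utility(weights, utilities, x):
    """U(x) = sum_i w_i * u_i(x_i) for mutually additive independent attributes."""
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(w * utilities[a](x[a]) for a, w in weights.items())

# Hypothetical single-attribute utilities scaled to [0, 1] over assumed ranges.
utilities = {
    "price": lambda p: (25000 - p) / 10000,  # $15,000 -> 1, $25,000 -> 0 (cheaper is better)
    "hp": lambda h: (h - 100) / 100,         # 100 hp -> 0, 200 hp -> 1
}
weights = {"price": 0.6, "hp": 0.4}
u_car = additive_utility(weights, utilities, {"price": 20000, "hp": 150})  # about 0.5
```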
Single Attribute Utility Function Used • Methods for assigning weights: the trade-off method; the analytic hierarchy process (AHP).
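For the AHP option, the standard way to turn a reciprocal pairwise-comparison matrix into weights is its principal eigenvector, which power iteration approximates. A sketch under the assumption that the weights are derived this way; the comparison matrix is hypothetical (and perfectly consistent, so the exact answer is 4/7, 2/7, 1/7).

```python
def ahp_weights(pairwise, iterations=100):
    """Principal-eigenvector weights of a positive reciprocal comparison matrix.

    pairwise[i][j] states how many times more important criterion i is than criterion j,
    with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # renormalize so the weights sum to 1
    return w

# Hypothetical comparisons of three criteria, e.g. price vs. horsepower vs. fuel economy.
matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(matrix)  # about [0.571, 0.286, 0.143]
```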
Question • What’s the benefit of using this function?
R&S Experimental Set-up • Correct Selection (CS): the R&S procedure correctly identifies the configuration with the largest expected utility. • A two-stage indifference-zone procedure is used for R&S.
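A two-stage indifference-zone procedure in the style of Rinott (1978) is sketched below: after n0 first-stage replications of each configuration's utility, configuration i receives N_i = max(n0, ⌈h²S_i²/δ²⌉) total replications, and the configuration with the largest overall mean is selected. Whether Butler et al. use exactly this variant is an assumption. The constant h depends on the number of configurations, n0 and the desired probability of correct selection and comes from published tables; the value 2.0 below is a placeholder, not a tabulated value.

```python
import math
import statistics

def second_stage_sizes(first_stage, h, delta):
    """Total replications per configuration after the first stage.

    first_stage: per-configuration lists of n0 first-stage utility observations
    h:           procedure constant (from tables; placeholder in the example)
    delta:       width of the indifference zone, in utility units
    """
    sizes = []
    for obs in first_stage:
        n0 = len(obs)
        s2 = statistics.variance(obs)  # first-stage sample variance S_i^2
        sizes.append(max(n0, math.ceil(h * h * s2 / (delta * delta))))
    return sizes

def select_best(all_obs):
    """Index of the configuration with the largest mean utility over both stages."""
    means = [statistics.mean(obs) for obs in all_obs]
    return max(range(len(means)), key=means.__getitem__)

# Hypothetical first-stage utilities for two configurations (n0 = 4).
sizes = second_stage_sizes([[1, 2, 3, 4], [2, 2, 2, 2]], h=2.0, delta=0.5)  # [27, 4]
```

The formula is what lets the approach estimate the number of simulations required in advance: noisier configurations and narrower indifference zones both call for more replications.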
Selection of • A Utility Exchange Approach • Table 1: Alternatives by Measures Matrix for Car Selection • Table 2: Equivalent Hypothetical Cars
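Because the utility functions are invertible, a utility difference on one attribute can be exchanged for an equivalent change on another, which is how an "equivalent hypothetical car" can be priced. A minimal sketch, assuming an additive model with linear utilities over hypothetical ranges and weights; `equivalent_price` is a hypothetical helper, not the paper's procedure.

```python
def u_price(p, lo=15000.0, hi=25000.0):
    """Hypothetical linear price utility: cheaper is better."""
    return (hi - p) / (hi - lo)

def u_price_inv(u, lo=15000.0, hi=25000.0):
    """Inverse of u_price: the price that yields utility u."""
    return hi - u * (hi - lo)

def u_hp(h, lo=100.0, hi=200.0):
    """Hypothetical linear horsepower utility."""
    return (h - lo) / (hi - lo)

def equivalent_price(price, hp, ref_hp, w_price, w_hp):
    """Price at which a hypothetical car with ref_hp is exactly as good as (price, hp).

    Solves w_price*u_price(p') + w_hp*u_hp(ref_hp) = w_price*u_price(price) + w_hp*u_hp(hp).
    Results outside [lo, hi] extrapolate the linear utility beyond its assumed range.
    """
    target = u_price(price) + (w_hp / w_price) * (u_hp(hp) - u_hp(ref_hp))
    return u_price_inv(target)

# A 140 hp car at $20,000 versus a hypothetical 120 hp version with equal weights.
p_equiv = equivalent_price(20000, 140, 120, w_price=0.5, w_hp=0.5)  # about 18000.0
```

Under these assumed ranges and weights, the extra 20 hp is worth about $2,000; different weights or ranges give a different exchange rate.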
Question Again • Does it mean that the 20 horsepower is worth $1,200?
Establishing the Indifference Zone • Curve dividing the indifference and preference zones: