# A “Common Risk Factor” Method to Estimate Correlations between Distributions

##### Presentation Transcript

1. NASA Cost Symposium (Ames Research Center), August 27, 2015, Moffett Field, CA. A “Common Risk Factor” Method to Estimate Correlations between Distributions. Presented by: Marc Greenberg, Cost Analysis Division (CAD), National Aeronautics and Space Administration

2. Outline
    • Correlation Overview
    • Why Propose Another Correlation Method?
    • Underlying Basis for “Common Risk Factor” Method
        – Concept of Mutual Information
        – Using the Unit Square to Estimate Mutual Information
    • “Common Risk Factor” Method (for pair of activities)
        – Apply 7 Steps to Estimate Correlation between 2 Distributions
    • Examples
        – Correlation of Durations for Two Morning Commutes
        – Correlation of Costs for Two WBS Elements of a Spacecraft
    • Conclusion, Other Potential Applications & Future Work

3. Correlation Overview (1 of 3) (a)
    • What is Correlation?
        – A statistical measure of association between two variables.
        – It measures how strongly the variables are related, or change, with each other.
    • If two variables tend to move up or down together, they are said to be positively correlated.
    • If they tend to move in opposite directions, they are said to be negatively correlated.
    • The most common statistic for measuring association is the Pearson (linear) correlation coefficient, $r_P$
    • Another is the Spearman (rank) correlation coefficient, $r_S$
        – Used in Crystal Ball and @Risk

    (a) Source: Correlations in Cost Risk Analysis, Ray Covert, MCR LLC, 2006 Annual SCEA Conference, June 2006
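The difference between the two coefficients can be illustrated with a minimal pure-Python sketch. The sample data and the function names (`pearson`, `ranks`, `spearman`) are illustrative assumptions, not part of the presentation, and the ranking helper assumes no tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Ranks 1..n of each value; assumes no ties (illustrative simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman (rank) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y rises monotonically with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]

r_p = pearson(x, y)    # below 1 because the relationship is not linear
r_s = spearman(x, y)   # 1 because the ordering of x and y is identical
```

In this toy case the rank coefficient reaches 1 while the linear coefficient does not, which is why the two statistics can disagree on the same data.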

4. Correlation Overview (2 of 3) (a)
    • Correlations (or dependencies) between the uncertainties of WBS CERs are generally determined subjectively
        – However, as we collect more data, more and more of these correlations are determined using historical data
    • Functional Correlation: Captured through mathematical relationships w/in cost model
    • Applied Correlation: Specified by the analyst and implemented w/in cost model
    • Whether functional, applied or both types of correlation, total variance ($\sigma_{Total}^2$) can be calculated using the following:

    $$\sigma_{Total}^2 = \sum_{k=1}^{n} \sigma_k^2 + 2 \sum_{k=2}^{n} \sum_{j=1}^{k-1} \rho_{jk}\, \sigma_j\, \sigma_k$$

    … where $\rho_{jk}$ is the correlation between uncertainties of WBS CERs j and k (denoted $\sigma_j$ and $\sigma_k$, respectively)

    The remainder of this presentation will focus on how to calculate $\rho_{jk}$

    (a) Source: Joint Agency Cost Schedule Risk and Uncertainty Handbook (Sec. 3.2 & Appendix A), 12 March 2014
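To show why the correlation term matters, here is a small sketch of the total-variance calculation: the sum of the individual variances plus twice the sum, over each distinct pair of WBS elements, of correlation times the two standard deviations. The function name `total_variance` and the two-element inputs are hypothetical values chosen only for illustration.

```python
def total_variance(sigmas, rho):
    """Total variance of a sum of uncertain elements.

    sigmas: standard deviations of each element's uncertainty.
    rho:    symmetric matrix of pairwise correlations (rho[j][k]).
    """
    n = len(sigmas)
    variance = sum(s ** 2 for s in sigmas)          # sum of individual variances
    for k in range(1, n):                           # each distinct pair j < k
        for j in range(k):
            variance += 2.0 * rho[j][k] * sigmas[j] * sigmas[k]
    return variance

# Hypothetical two-element example (units arbitrary).
sigmas = [3.0, 4.0]
rho_independent = [[1.0, 0.0], [0.0, 1.0]]
rho_correlated = [[1.0, 0.5], [0.5, 1.0]]

var_independent = total_variance(sigmas, rho_independent)
var_correlated = total_variance(sigmas, rho_correlated)
```

With these assumed inputs, moving the correlation from 0 to 0.5 raises the total variance, so an unjustified correlation value directly distorts the spread of the total estimate.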

5. Correlation Overview (3 of 3) (a)
    Currently, there are 2 general paths to obtain ρ:
    • Statistical: data available (CADRE, CERs)
    • Non-Statistical: no data, educated guess

    Methods named in the schematic: Residual Analysis, Retro-ICE, Effective ρ, Causal Guess, N-Effect Guess, and Knee in curve (Steve Book Method).

    | Strength | None | Weak | Medium | Strong | Perfect |
    |---|---|---|---|---|---|
    | Positive | 0 | 0.3 | 0.5 | 0.9 | 1 |
    | Negative | 0 | -0.3 | -0.5 | -0.9 | -1 |

    [Figure: scatter plot of regressed residuals for 2 CERs (X and Y) for 8 programs; Pearson's product moment correlation coefficient, r = 0.88]

    (a) Schematic from Correlations in Cost Risk Analysis, Ray Covert, MCR LLC, 2006 Annual SCEA Conference, June 2006

6. Why Propose Another Correlation Method?
    1. For statistical methods, lack of data makes it difficult to calculate a robust Pearson’s R or Spearman’s Rho
        – Example: Residuals from the previous slide produce a correlation of 0.88. However, the residuals exhibit an “influential observation.”
    2. For non-statistical methods, there can be many issues:
        – “N-Effect” and “Knee-in-the-Curve” methods are not inherently intuitive to the non-practitioner.
        – Although the “Causal Guess” method is simple and intuitive, the analyst and/or subject matter expert are still guessing.
        – Whenever parameters of 2 uncertainty distributions lack basis, the correlation between them is difficult to justify.

    Unlike these other methods, the Common Risk Factor Method provides correlation between 2 uncertainties based upon common root-causes. Applying this method may lessen the degree of subjectivity in the estimate.


9. Concept of Mutual Information
    • Whenever two objects share common features, these features can be perceived as “mutual information”
    • Example 1, binary strings:
        – Binary string x: 0 0 0 1 0 1 1 1
        – Binary string y: 1 0 1 1 1 0 0 0
        – 2 of the 8 pairs are the same. Mutual information = 2 / 8 or 0.25 or 25%
    • Example 2, 8 oz. of OJ versus 16 oz. of OJ:
        – The “least common denominator” is 8 oz. of OJ. Mutual information = 8 / 16 or 0.50 or 50%
    • Mutual information can also be applied to risk factors that are common among a pair of uncertainty distributions.
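Both illustrations on this slide reduce to simple ratios, sketched below. The helper names `matching_fraction` and `min_max_ratio` are assumptions made for this example; the slide itself only gives the arithmetic.

```python
def matching_fraction(x, y):
    """Share of positions at which two equal-length sequences agree."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(x, y)) / len(x)

def min_max_ratio(a, b):
    """Smaller quantity divided by larger quantity (0 if both are 0)."""
    largest = max(a, b)
    return min(a, b) / largest if largest > 0 else 0.0

# Binary strings from the slide.
string_x = [0, 0, 0, 1, 0, 1, 1, 1]
string_y = [1, 0, 1, 1, 1, 0, 0, 0]
binary_mutual_info = matching_fraction(string_x, string_y)   # 2 matches of 8

# 8 oz. versus 16 oz. of orange juice.
juice_mutual_info = min_max_ratio(8, 16)
```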

10. Mutual Information between 2 groupings
    Weighted Ave: Mutual Information = Σ Weight × (Minimum (X, Y) / Maximum (X, Y))

    | Maximum (X, Y) | Minimum (X, Y) | Mutual Information (Min / Max) | Weight (Max / Sum of Max) | Wtd Mutual Information |
    |---|---|---|---|---|
    | 16 oz. | 8 oz. | 8 / 16 = 0.50 | 16 / 32 = 0.50 | 0.50 x 0.50 = 0.25 |
    | 12 oz. | 4 oz. | 4 / 12 = 0.333 | 12 / 32 = 0.375 | 0.333 x 0.375 = 0.125 |
    | 4 oz. | 4 oz. | 4 / 4 = 1.00 | 4 / 32 = 0.125 | 1.00 x 0.125 = 0.125 |
    | Sum: 32 | | | | Mutual Information between Group X and Y = 0.50 |
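The weighted average on this slide can be checked with a short sketch. The function name `weighted_mutual_information` is an assumption; each pair holds the two amounts being compared, and their order within the pair does not affect the result.

```python
def weighted_mutual_information(pairs):
    """Weighted average of min/max ratios, weighted by each pair's maximum."""
    sum_of_max = sum(max(x, y) for x, y in pairs)
    total = 0.0
    for x, y in pairs:
        ratio = min(x, y) / max(x, y)        # mutual information of this pair
        weight = max(x, y) / sum_of_max      # pair's share of the summed maxima
        total += weight * ratio
    return total

# Amounts (oz.) from the slide's three rows.
pairs = [(16, 8), (12, 4), (4, 4)]
group_mutual_info = weighted_mutual_information(pairs)
```

Because each weight is the pair's maximum over the summed maxima, the weighted average simplifies to the sum of the minima over the sum of the maxima (16 / 32 here).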

11. The Unit Square: Meeting Times Example (a)
    Example Problem: A boy & girl plan to meet at the park between 9 & 10am (1.0 hour). Neither individual will wait more than 12 minutes (0.20 of an hour) for the other. If all times within the hour are equally likely for each person, and if their times of arrival are independent, find the probability that they will meet.

    Solution (Part 1 of 2): X and Y are uniform RV’s
    • The boy’s actions can be depicted as a single continuous random variable X that takes all values over an interval a to b with equal likelihood. This distribution, called a uniform distribution, has a density function of the form

    $$f(x) = \frac{1}{b - a} \text{ for } a \le x \le b, \quad f(x) = 0 \text{ otherwise}$$

    • Similarly, the girl’s actions can be depicted as a single continuous random variable Y that takes all values over an interval a to b with equal likelihood.
    • In this example, the interval is from 0.0 to 1.0 hour. Therefore a = 0.0 and b = 1.0. Notation for this uniform distribution is U[0, 1]

    (a) K. Van Steen, PhD, Probability and Statistics, Chapter 2: Random Variables and Associated Functions

12. The Unit Square: Meeting Times Example
    Solution (Part 2 of 2): Model Frequency when |X - Y| < 0.20
    Neither person will wait more than 0.20 of an hour. This can be modeled as a simulation where a “meeting” occurs only when |X - Y| < 0.20.

    | Iteration | rv (X) | rv (Y) | \|X - Y\| | \|X - Y\| < 0.2? |
    |---|---|---|---|---|
    | 1 | 0.142 | 0.318 | 0.176 | 1 |
    | 2 | 0.368 | 0.733 | 0.365 | 0 |
    | 3 | 0.786 | 0.647 | 0.138 | 1 |
    | 4 | 0.375 | 0.902 | 0.528 | 0 |
    | 5 | 0.549 | 0.935 | 0.386 | 0 |
    | 6 | 0.336 | 0.775 | 0.439 | 0 |
    | 7 | 0.613 | 0.726 | 0.113 | 1 |
    | : | : | : | : | : |
    | 9998 | 0.157 | 0.186 | 0.029 | 1 |
    | 9999 | 0.384 | 0.991 | 0.607 | 0 |
    | 10000 | 0.045 | 0.399 | 0.354 | 0 |
    | Total | | | | 3630 |

    This simulation indicates that out of 10,000 trials, the boy and girl meet 3,630 times. Probability they will meet = 0.363 or 36%

    [Figure: Simulation of Joint Density Function of Uniformly Distributed Random Variables, Probability of |X - Y| < 0.20 on Unit Square; axes: Random Variable X, Random Variable Y]
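A simulation of this kind can be sketched in a few lines of Python. The function name `simulate_meeting_probability` and the seed are assumptions for reproducibility; the presentation does not say which tool or random stream produced its 3,630 count, so this sketch will give a nearby but different figure.

```python
import random

def simulate_meeting_probability(limit, trials=10_000, seed=2015):
    """Estimate P(|X - Y| < limit) for independent X, Y ~ U[0, 1]."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    meetings = 0
    for _ in range(trials):
        x = rng.random()               # boy's arrival, as a fraction of the hour
        y = rng.random()               # girl's arrival, as a fraction of the hour
        if abs(x - y) < limit:         # they meet if within the waiting limit
            meetings += 1
    return meetings / trials

estimate = simulate_meeting_probability(0.20)
```

With 10,000 trials the sampling error is roughly half a percentage point, so estimates such as 0.363 and the exact 0.36 are consistent with each other.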

13. The Unit Square … Why do we Care?
    So what does modeling the frequency of |X - Y| < 0.20 have to do with “Common Risk Factors”?
    • Iteration 3 (X = 0.786, Y = 0.647, |X - Y| = 0.138):
        – X: 0.786 * 60 = 47 minutes. Arrives at 9:47am.
        – Y: 0.647 * 60 = 39 minutes. Arrives at 9:39am.
        – The girl arrives at 9:39am. The boy arrives at 9:47am. He arrived w/in the 12 minute (0.2 hr) time window. So they do meet.
    • Iteration 4 (X = 0.375, Y = 0.902, |X - Y| = 0.528):
        – X: 0.375 * 60 = 22 minutes. Arrives at 9:22am.
        – Y: 0.902 * 60 = 54 minutes. Arrives at 9:54am.
        – The boy arrives at 9:22am. The girl arrives at 9:54am. She arrived after the 12 minute (0.2 hr) time window. So they do not meet.
    • Using 10,000 trials, the boy & girl meet 3,630 times. Probability they will meet = 0.363

    Given that each person will “use up” 20% of their respective 1.0 hour time interval, we demonstrate the frequency (out of 10,000 trials) that the boy and girl are in “similar states” = Mutual Information

14. Unit Square: Geometric Estimate of Prob.
    The “area of intersection” can be calculated using geometry. Neither person will wait more than 0.20 of an hour. Let Limit, L = 0.20.

    The probability is determined by calculating the area of the shaded region (between the lines Y - X = 0.2 and X - Y = 0.2, on either side of X = Y):

    $$A_1 = A_2 = 0.5\,(L)(L) = 0.5\,L^2$$

    $$A_3 = \sqrt{2}\,L \cdot \sqrt{2}\,(1 - L) = 2L(1 - L)$$

    $$\text{Area} = A_1 + A_2 + A_3 = 0.5\,L^2 + 0.5\,L^2 + 2L(1 - L) = L^2 + 2L(1 - L)$$

    $$\text{Area} = 0.20^2 + 2\,(0.20)(1 - 0.20) = 0.360$$

    Note: This Probability is actually a Volume, not an Area …

    [Figure: Joint Density Function of Uniformly Distributed Random Variables, Probability of |X - Y| < 0.20 on Unit Square, with regions A1, A2 and A3 shaded; axes: Random Variable X, Random Variable Y]
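The geometric result is easy to wrap in a function and reuse for any limit L between 0 and 1. The name `overlap_volume` is an assumption introduced here. The sketch also checks the algebraically equivalent form 1 - (1 - L)², which is the unit square minus the two corner triangles outside the band.

```python
def overlap_volume(limit):
    """P(|X - Y| < limit) for independent X, Y ~ U[0, 1], with 0 <= limit <= 1."""
    if not 0.0 <= limit <= 1.0:
        raise ValueError("limit must lie in [0, 1]")
    end_pieces = limit ** 2                      # A1 + A2 = 2 * (0.5 * L^2)
    central_strip = 2.0 * limit * (1.0 - limit)  # A3 = 2 L (1 - L)
    return end_pieces + central_strip

def overlap_volume_complement(limit):
    """Same probability via the complement: 1 - (1 - L)^2."""
    return 1.0 - (1.0 - limit) ** 2

volume_020 = overlap_volume(0.20)   # the meeting-times example
volume_050 = overlap_volume(0.50)
```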

15. Unit Cube = Unit Square Area x 1.00
    Probability of 2 Independent Uniformly Distributed Random Variables [0, 1] Intersecting within a 0.20 Interval
    Example: Likelihood of boy (x) & girl (y) meeting at park between 9 & 10am, given neither will wait more than 12 minutes (0.20 hr)

    For random values of X and Y, when |Y - X| < 0.20, probability = 1.00. Otherwise probability = 0.00

    [Figure: 3-D plot of the probability that |Y - X| < 0.20 over the unit square, Height = 1.00; axes: Random Variable X, Random Variable Y]


17. Common Risk Factor Method (for 2 activities)
    Assuming 2 uncertainty distributions (e.g. triangular) are given (a), the Common Risk Factor Method requires 7 Steps:
    • Step 1: Create Risk Reference Table to determine Risk Factors (RFs)
        – Note: This can be the most time consuming step!
    • Step 2: Estimate RF % contributions to Duration or \$ Uncertainty
    • Step 3: Calculate Min & Max Volumes associated w/common RF pairs
    • Step 4: For RF pair i, Divide Min by Max Volumes to get Correlation
    • Step 5: For RF pair i, Calculate Weighting Factor
    • Step 6: Multiply Steps 4 & 5 Results = Wtd Correlation for RF pair i
        – Repeat Steps 3 through 6 for remaining common RF pairs
    • Step 7: Sum up Weighted Correlations to get total Correlation

    (a) For methods on developing uncertainty distributions using risk factors, refer to “Expert Elicitation of a Maximum Duration using Risk Scenarios,” 2014 NASA Cost Symposium presentation, M. Greenberg

18. Ground Rules and Assumptions (1 of 2)
    • Best to use when sufficient historical data is not available
        – If it is available, then this method can be used as a cross-check
    • At least one Subject Matter Expert (non-cost analyst) is participating by providing inputs / opinion / judgment
    • Method only presents steps to get positive correlation
        – Future work will include efforts on negative correlation
    • Recommend no more than 5 risk factors per distribution
        – With > 5 common risk factors, SME has difficulty “separating” salient risk factors from all possible risk factors.
            • Risk factor pairs tend to become alike, producing correlations > 0.30
        – As a general rule, risk factors contributing < 5% to overall uncertainty should be added into “Undefined” category

19. Ground Rules and Assumptions (2 of 2)
    • For distributions shown herein, % contribution of each risk is the average from Min to Max (simplifying assumption)
    • Each risk factor represents a uniformly distributed random variable (rv) that can have a value from 0 to 1.
        – Common risk factors are assumed to be correlated whenever the common risk factors are in a similar state. This occurs when each common risk factor has “overlapping” rv’s along U[0,1]
            • Trial 98: Weather is moderate for both rv’s X & Y => X & Y are correlated
            • Trial 99: Weather is moderate for rv X, severe for rv Y => X & Y are not correlated
        – The least common denominator (LCD) of relative contributions of each common risk pair models each common risk factor as continuous rv’s from 0 to the min value, not anywhere along U[0,1]
            • Result is that the LCD technique will produce lower correlation values
    • Correlations for non-uniformly distributed rv’s are not included in this presentation

20. Example 1: Correlation of 2 Commute Durations
    • A “Workforce Quality of Life” study is looking into ways to reduce employee commute times while maintaining employee productivity.
    • A schedule analyst creates a model to estimate total commute time. Part of her model includes these assumptions:
        – Commute is from Commuter’s Residence to anywhere in Washington, DC
        – Maximum Commuting Distance for Phase A of the Study = 8 miles
        – A person (X) commuting to work in DC from inside the beltway has a most-likely commute time of 20 minutes by car
        – A person (Y) commuting into DC from inside the beltway has a most-likely commute time of 40 minutes by bus & metro
        – To run the simulation for estimating total commute time, assume the commutes of persons X and Y have a medium correlation = 0.50.
    • Question: Is 0.50 a reasonable estimate of correlation?

    Examples and cases that follow are notional. They are provided to demonstrate the methodology.

21. Example 1: Commute Times
    [Figure, two panels: Commute Time Based Upon SME Opinion Using Scenario-Based Values (SBV) Method; axes: Time (minutes), f(x)]
    • Driving: Most-Likely Driving Time = 20 minutes; distribution spans 15 to 40 minutes. Potential 20 minute impact versus Most-Likely Driving Time.
    • Bus/Metro: Most-Likely Bus/Metro Time = 30 minutes; distribution spans 20 to 70 minutes. Potential 40 minute impact versus Most-Likely Bus/Metro Time.

    So what is the correlation between these two uncertainty distributions? If we know the relative contributions of underlying risk factors for each distribution, we can calculate the correlation between these two distributions.

22. Create Risk Reference Table (Step 1)
    Step 1a: SME & Interviewer Create an Objective Hierarchy
    • Q: To minimize commute time, what is your primary objective?
      A: Maximize average speed from Residence to Workplace
    • Q: What are primary factors that can impact “average speed”?
      A: Route Conditions, # of Vehicles, Mandatory Stops & Bus/Metro Efficiency
    • Q: Is it possible that other factors can impact “average speed”?
      A: Yes … (but SME cannot specify them at the moment)

    Objective Hierarchy:
    • Objective: Maximize Average Speed from Residence to Workplace
    • Means (these are Primary Factors that can impact Objective): Route Conditions; # of Vehicles on Roads; Mandatory Stops; Efficiency; Undefined

    The utility of this Objective Hierarchy is to aid the Expert in:
    (a) Establishing a Framework from which to elicit most risk factors,
    (b) Describing the relative importance of each risk factor with respect to means & objective, and
    (c) Creating specific risk scenarios

23. Create Risk Reference Table (Step 1 cont’d)
    Step 1b: SME & Interviewer Brainstorm Risk Factors
    Using the Objective Hierarchy as a guide, the SME answers the following:
    • Q: What are some factors that could degrade route conditions?
      A: Weather, Road Construction, and Accidents
    • Q: What influences the # of vehicles on the road in any given morning?
      A: Departure time, Day of the Work Week, and Time of Season (incl. Holiday Season)
    • Q: What is meant by Mandatory Stops?
      A: By law, need to stop for Red Lights, Emergency Vehicles and School Bus Signals
    • Q: What can reduce Efficiency?
      A: Bus or Metro Arriving Late, Bus Stopping at Most Stops, and Moving Below Optimal Speed (e.g. driving below speed limit).

    Objective Hierarchy:
    • Objective: Maximize Average Speed from Residence to Workplace
    • Means (these are Primary Factors that can impact Objective): Route Conditions; # of Vehicles on Roads; Mandatory Stops; Efficiency; Undefined

24. Create Risk Reference Table (Step 1 cont’d)
    Step 1c: SME & Interviewer Map Risk Factors to the Objective Hierarchy
    Step 1d: SME & Interviewer work together to Describe Risk Factors

    Objective: Maximize Average Speed from Residence to Workplace

    | Means (Primary Factors that can impact Objective) | Risk Factors (Causal Factors that can impact Means) | Description (SME's top-level description of each Barrier / Risk; can include examples) |
    |---|---|---|
    | Route Conditions | Weather | Rain, snow or icy conditions. Drive into direct sun. |
    | Route Conditions | Accidents | Vehicle accidents on either side of highway. |
    | Route Conditions | Road Construction | Lane closures, bridge work, etc. |
    | # of Vehicles on Roads | Departure Time | SME departure time varies from 6:00AM to 9:00AM |
    | # of Vehicles on Roads | Day of Work Week | Driving densities seem to vary with day of week |
    | # of Vehicles on Roads | Season & Holidays | Summer vs. Fall, Holiday weekends |
    | Mandatory Stops | Red Lights | Approx 8 traffic intersections; some with long lights |
    | Mandatory Stops | Emergency Vehicles | Incl. police, firetrucks, ambulances & secret service |
    | Mandatory Stops | School Bus Signals | School buses stopping to pick up / drop off |
    | Efficiency | Bus/Metro Arriving Late | Bus arriving late. Metro arriving late. |
    | Efficiency | Bus Stopping at Most Stops | On rare occasion, will call someone during commute |
    | Efficiency | Moving below Optimal Speed | Bus or Car Driver going well below speed limit |
    | Undefined | Undefined | It's possible for SME to exclude some risk factors |

    This is the most time-intensive part of the SME interview & serves as reference for the interview method being used.

25. Step 2. Estimate Risk Factor % Contributions
    For each type of commute, respective SMEs ascribe the following “max” time impacts to 4 risk factors:
    • Weather, Road Construction, Bus/Metro Arriving Late and Departure Time

    | Risk Factor | Max Impact vs Most Likely: Car | Max Impact vs Most Likely: Bus/Metro | Total | Contribution of Total: Car | Contribution of Total: Bus/Metro |
    |---|---|---|---|---|---|
    | Weather | 4.0 | 2.0 | 6.0 | 0.20 | 0.05 |
    | Road Construction | 10.0 | 8.0 | 18.0 | 0.50 | 0.20 |
    | Bus/Metro Arriving Late | 0.0 | 26.0 | 26.0 | 0.00 | 0.65 |
    | Departure Time | 6.0 | 4.0 | 10.0 | 0.30 | 0.10 |
    | Total Delay (minutes) | 20 | 40 | 60 | 1.00 | 1.00 |

    The contribution columns give the % impact due to realization of the given risk.
    • Car: “Road Construction” contributes most to dispersion (10 minute impact)
    • Bus/Metro: “Bus/Metro Arriving Late” contributes most to dispersion (26 minute impact)

    Note: These impacts can be elicited “ad-hoc” from the SME. Nevertheless, it is recommended to apply more structured methods during the SME interview for long-duration activities or ones with higher criticality indices. (a)

    (a) For methods on developing uncertainty distributions using risk factors, refer to “Expert Elicitation of a Maximum Duration using Risk Scenarios,” 2014 NASA Cost Symposium presentation, M. Greenberg
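Step 2 is a normalization: each risk factor's maximum impact is divided by the total impact for that commute. A minimal sketch follows, with the function name `contributions` and the dictionary layout as illustrative assumptions.

```python
def contributions(max_impacts):
    """Convert max impacts (minutes) into fractions of the total impact."""
    total = sum(max_impacts.values())
    return {name: impact / total for name, impact in max_impacts.items()}

# Max impact versus most-likely time, in minutes, from the Step 2 table.
car_impacts = {
    "Weather": 4.0,
    "Road Construction": 10.0,
    "Bus/Metro Arriving Late": 0.0,
    "Departure Time": 6.0,
}
bus_metro_impacts = {
    "Weather": 2.0,
    "Road Construction": 8.0,
    "Bus/Metro Arriving Late": 26.0,
    "Departure Time": 4.0,
}

car_share = contributions(car_impacts)
bus_metro_share = contributions(bus_metro_impacts)
```

These shares are the L values fed into the volume calculation in Steps 3 through 7.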

26. Correlation of a Risk Pair (Road Construction)
    Road Construction contributes 0.50 of the Car uncertainty and 0.20 of the Bus/Metro uncertainty.
    • The “least common denominator” of 0.20 is used to calculate a probability of 0.36 that rv’s X and Y are in a similar “state” (given 2 rv’s = 0.20, Volume = 0.36).
    • The “maximum possible” value of 0.50 is used to calculate a probability of 0.75 that rv’s X and Y are in a similar “state” (given 2 rv’s = 0.50, Volume = 0.75).
    • Obtain mutual information by calculating the volume ratio.

    Correlation of this Risk Pair Indicates a “Relative” Volume = 0.36 / 0.75 = 0.48

    [Figure, two panels: Joint Density Function of Uniformly Distributed Random Variables, Probability of |X - Y| < 0.20 and of |X - Y| < 0.50 on Unit Square; axes: Random Variable X, Random Variable Y]

27. Common Risk Factor Method: Steps 3 - 7
    • Step 3. Min & Max Volumes Associated with Common Risk Factors. Recall: Volume = $L^2 + 2L(1 - L)$ (e.g. L = 0.20 for Weather, Car)
    • Step 4. Correlation (per risk factor pair) = Min Volume / Max Volume
    • Step 5. Weighting Factor for Each Min/Max = Max Volume divided by Sum of Max Volumes
    • Step 6. Weight Correlation of Each Pair of Common Risk Factors
    • Step 7. Sum up Weighted Correlations to get total Correlation

    | Risk Factor | Contribution of Total: Car | Contribution of Total: Bus/Metro | Calculated Volume wrt Car | Calculated Volume wrt Bus/Metro | Min Volume | Max Volume | Min/Max Volume | Weighting Factor | Weighted Min/Max |
    |---|---|---|---|---|---|---|---|---|---|
    | Weather | 0.20 | 0.05 | 0.360 | 0.098 | 0.098 | 0.360 | 0.271 | 0.14 | 0.039 |
    | Road Construction | 0.50 | 0.20 | 0.750 | 0.360 | 0.360 | 0.750 | 0.480 | 0.30 | 0.144 |
    | Bus/Metro Arriving Late | 0.00 | 0.65 | 0.000 | 0.878 | 0.000 | 0.878 | 0.000 | 0.35 | 0.000 |
    | Departure Time | 0.30 | 0.10 | 0.510 | 0.190 | 0.190 | 0.510 | 0.373 | 0.20 | 0.076 |
    | Totals | 1.00 | 1.00 | 1.620 | 1.525 | 0.648 | 2.498 | 0.281 | 1.000 | 0.259 |

    The 0.26 correlation value reflects the mutual information (of common risks) between these 2 activities. The analyst’s “Causal Guess” of 0.50 was not a reasonable estimate of correlation.
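Steps 3 through 7 can be sketched as a single function, shown here for the commute example. The names `overlap_volume` and `common_risk_factor_correlation` are assumptions introduced for illustration; the arithmetic follows the step definitions on this slide.

```python
def overlap_volume(limit):
    """Volume = L^2 + 2 L (1 - L) for a contribution L in [0, 1]."""
    return limit ** 2 + 2.0 * limit * (1.0 - limit)

def common_risk_factor_correlation(contrib_x, contrib_y):
    """Apply Steps 3-7 to two equal-length lists of risk-factor contributions."""
    # Step 3: min and max volumes for each common risk-factor pair.
    rows = []
    for lx, ly in zip(contrib_x, contrib_y):
        vx, vy = overlap_volume(lx), overlap_volume(ly)
        rows.append((min(vx, vy), max(vx, vy)))
    sum_of_max = sum(v_max for _, v_max in rows)
    if sum_of_max == 0.0:
        raise ValueError("at least one contribution must be non-zero")

    total = 0.0
    for v_min, v_max in rows:
        pair_correlation = v_min / v_max if v_max > 0.0 else 0.0  # Step 4
        weight = v_max / sum_of_max                               # Step 5
        total += pair_correlation * weight                        # Step 6
    return total                                                  # Step 7

# Contributions in the order: Weather, Road Construction,
# Bus/Metro Arriving Late, Departure Time.
car = [0.20, 0.50, 0.00, 0.30]
bus_metro = [0.05, 0.20, 0.65, 0.10]

commute_correlation = common_risk_factor_correlation(car, bus_metro)
```

Since each pair's weight is its max volume over the summed max volumes, the result equals the sum of min volumes divided by the sum of max volumes (0.648 / 2.498 in the table).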

28. Space Flight Project WBS Standard Level 2 Elements (Ref: NPR 7120.5, Appendix G)
    Spacecraft (S/C) Lower Level WBS *
    • 06.04.01 Management
    • 06.04.02 Sys Engineering
    • 06.04.03 Prod Assurance
    • 06.04.04 Structure & Mech
    • 06.04.05 Thermal Control
    • 06.04.06 Elec Pwr & Dist
    • 06.04.07 GN&C
    • 06.04.08 Propulsion
    • 06.04.09 Communications
    • 06.04.10 C&DH
    • 06.04.11 Software
    • 06.04.12 I&T

    * Note: These numeric designations for S/C Level 4 WBS are shown for illustrative purposes only.

    The next notional example shows an estimate of correlation between pre-Phase A costs of S/C “Structure & Mech” and “Thermal Control”

29. Example 2: Spacecraft Cost Elements
    [Figure, two panels: “06.04 Structures Cost Uncertainty (\$M) Using Scenario-Based Values (SBV) Method” and “06.05 Thermal Control System Cost Uncertainty (\$M) Using Scenario-Based Values (SBV) Method”; axes: Cost (\$M), f(x)]
    • Structures & Mechanisms: Most-Likely Cost = \$10M; distribution spans \$8.00M to \$17.80M. Potential \$7.8M impact versus Most-Likely Cost.
    • Thermal Control Systems: Most-Likely Cost = \$3M; distribution spans \$2.28M to \$5.40M. Potential \$2.4M impact versus Most-Likely Cost.

    So what is the correlation between these two uncertainty distributions? If we know the relative contributions of underlying risk factors for each distribution, we can calculate the correlation between these two distributions.

30. Create Risk Reference Table (Step 1)
    Step 1a: SME & Interviewer Create an Objective Hierarchy
    • Q: To meet the project mission, what is your primary objective?
      A: Complete DDT&E for a Spacecraft that Meets Cost and Schedule Objectives
    • Q: What are primary means to accomplish this objective?
      A: Complete Tech Design; Provide Adequate Resources & Expertise for Program Execution
    • Q: Is it possible that other factors can impact DDT&E outcome?
      A: Yes … (but SME cannot specify them at the moment)

    Objective Hierarchy:
    • Objective: Complete DDT&E for a Spacecraft that Meets Cost & Schedule Objectives
    • Means (these are Primary Factors that can impact Objective): Complete Technical Design to Satisfy System (or Mission) Requirements; Provide for Adequate Resources & Expertise for Program Execution; N/A (Undefined)

    The utility of this Objective Hierarchy is to aid the Expert in:
    (a) Establishing a Framework from which to elicit most risk factors,
    (b) Describing the relative importance of each risk factor with respect to means & objective, and
    (c) Creating specific risk scenarios

31. Create Risk Reference Table (Step 1, cont’d)
    Step 1b: SME & Interviewer Brainstorm Risk Factors
    Using the Objective Hierarchy as a guide, the SME answers the following:
    • Q: What could influence the successful completion of your Technical Design?
        – Design Complexity
        – System Integration Complexity
        – 1 or more Immature Technologies
        – Requirements Creep
        – Skills Deficiency (Vendor)
    • Q: What are threats and barriers for you getting adequate resources & expertise for Program Execution?
        – Lack of Programmatic Experience (NASA)
        – Material Price Volatility
        – Organizational Complexity
        – Funding Instability
        – Insufficient Reserves (Sched and/or Cost)

    Objective Hierarchy:
    • Objective: Complete DDT&E for a Spacecraft that Meets Cost & Schedule Objectives
    • Means (these are Primary Factors that can impact Objective): Complete Technical Design to Satisfy System (or Mission) Requirements; Provide for Adequate Resources & Expertise for Program Execution; N/A (Undefined)

32. Create Risk Reference Table (Step 1, cont’d)
    Step 1c: SME & Interviewer Map Risk Factors to the Objective Hierarchy
    Step 1d: SME & Interviewer work together to Describe Risk Factors

    Objective: Complete DDT&E for a Spacecraft that Meets Cost & Schedule Objectives

    | Means (Primary Factors that can impact Objective) | Risk Factors (Causal Factors, aka "Threats" or "Barriers", that can impact Means) | Description (SME's top-level description of each Barrier / Risk) |
    |---|---|---|
    | Complete Technical Design to Satisfy System (or Mission) Requirements | Design Complexity | The complexity of designing certain aspects may be underestimated |
    | Complete Technical Design to Satisfy System (or Mission) Requirements | System Integration Complexity | We don't fully appreciate the challenges of system integration that will need to occur in 18 months |
    | Complete Technical Design to Satisfy System (or Mission) Requirements | 1 or more Immature Technologies | There is a likelihood that we may need to incorporate certain components that are currently at TRL 6 |
    | Complete Technical Design to Satisfy System (or Mission) Requirements | Requirements Creep | About 2/3 of these types of projects have experienced requirements creep in the past decade |
    | Complete Technical Design to Satisfy System (or Mission) Requirements | Skills Deficiency (Vendor) | The Vendor may lose some of its "graybeards" over the next year, leaving a dearth in Technical Expertise |
    | Provide for Adequate Resources & Expertise for Program Execution | Lack of Programmatic Experience (NASA) | The Program Office staff has experienced a higher-than-usual turnover rate in the past year |
    | Provide for Adequate Resources & Expertise for Program Execution | Material Price Volatility | The system includes exotic matls that, in the past, were subject to large price swings (largely due to low supply) |
    | Provide for Adequate Resources & Expertise for Program Execution | Organizational Complexity | As of right now, there are 2 vendors, 4 sub-contractors, 3 NASA Centers and 1 university working on this project |
    | Provide for Adequate Resources & Expertise for Program Execution | Funding Instability | Because this project is not an Agency priority, it is subject to funding cuts in any given year. |
    | Provide for Adequate Resources & Expertise for Program Execution | Insufficient Reserves (Sched and/or Cost) | Because of the above risks, it's likely that project will not have sufficient schedule margin and/or cost reserves |
    | N/A | Undefined | In most cases, the SME will not be able to specify ALL risk factors that contribute to schedule / cost uncertainty |

    This is the most time-intensive part of the SME interview & serves as reference for the interview method being used.

33. Step 2. Estimate Risk Factor % Contributions
    For each cost, the SME ascribes the following “max” cost impacts to 5 risk factors:
    • Systems Integration Complexity, Requirements Creep, Skills Deficiency (Vendor), Lack of Programmatic Experience (NASA) and Organizational Complexity

    | Risk Factor | Max Impact vs Most Likely (\$M): 06.04.04 | Max Impact vs Most Likely (\$M): 06.04.05 | Total (\$M) | Contribution of Total: 06.04.04 | Contribution of Total: 06.04.05 |
    |---|---|---|---|---|---|
    | System Integration Complexity | \$2.00 | \$0.45 | \$2.45 | 0.26 | 0.21 |
    | Requirements Creep | \$1.50 | \$0.75 | \$2.25 | 0.19 | 0.36 |
    | Skills Deficiency (Vendor) | \$0.80 | \$0.00 | \$0.80 | 0.10 | 0.00 |
    | Lack of Programmatic Experience (NASA) | \$1.00 | \$0.30 | \$1.30 | 0.13 | 0.14 |
    | Organizational Complexity | \$1.00 | \$0.00 | \$1.00 | 0.13 | 0.00 |
    | Undefined | \$1.50 | \$0.60 | \$2.10 | 0.19 | 0.29 |
    | Total Cost Impact (\$M) | \$7.80 | \$2.10 | \$9.90 | 1.00 | 1.00 |

    The contribution columns give the % impact due to realization of the given risk.
    • Structures & Mech: Sys. Integ. Complexity contributes most to dispersion (\$2M impact)
    • Thermal Control: Requirements Creep contributes most to dispersion (\$750K impact)

    Steps to calculate correlation between these 2 spacecraft WBS elements are the same as those used for Example 1.

34. Common Risk Factor Method: Steps 3 - 7
    • Step 3. Min & Max Volumes Associated with Common Risk Factors. Recall: Volume = $L^2 + 2L(1 - L)$ (e.g. L = 0.19 for Requirements Creep)
    • Step 4. Correlation (per risk factor pair) = Min Volume / Max Volume
    • Step 5. Weighting Factor for Each Min/Max = Max Volume divided by Sum of Max Volumes
    • Step 6. Weight Correlation of Each Pair of Common Risk Factors
    • Step 7. Sum up Weighted Correlations to get total Correlation

    | Risk Factor | Contribution of Total: 06.04.04 | Contribution of Total: 06.04.05 | Calculated Volume wrt 06.04.04 | Calculated Volume wrt 06.04.05 | Min Volume | Max Volume | Min/Max Volume | Weighting Factor | Weighted Min/Max |
    |---|---|---|---|---|---|---|---|---|---|
    | System Integration Complexity | 0.26 | 0.21 | 0.447 | 0.383 | 0.383 | 0.447 | 0.856 | 0.20 | 0.172 |
    | Requirements Creep | 0.19 | 0.36 | 0.348 | 0.587 | 0.348 | 0.587 | 0.592 | 0.26 | 0.156 |
    | Skills Deficiency (Vendor) | 0.10 | 0.00 | 0.195 | 0.000 | 0.000 | 0.195 | 0.000 | 0.09 | 0.000 |
    | Lack of Programmatic Experience (NASA) | 0.13 | 0.14 | 0.240 | 0.265 | 0.240 | 0.265 | 0.905 | 0.12 | 0.108 |
    | Organizational Complexity | 0.13 | 0.00 | 0.240 | 0.000 | 0.000 | 0.240 | 0.000 | 0.11 | 0.000 |
    | Undefined | 0.19 | 0.29 | 0.348 | 0.490 | 0.348 | 0.490 | 0.000 | 0.22 | 0.000 |
    | Totals | 1.00 | 1.00 | 1.817 | 1.724 | 1.318 | 2.223 | 0.392 | 1.000 | 0.436 |

    The 0.44 correlation value reflects the mutual information (of common risks) between Costs of WBS 06.04.04 and 06.04.05
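The spacecraft example adds one wrinkle: the “Undefined” row keeps its weighting factor but shows a Min/Max of 0.000, so it adds nothing to the total. The sketch below starts from the Step 2 dollar impacts and reads that row as "not a common risk factor"; this reading, and the names `shares`, `correlation_from_impacts` and `not_common`, are assumptions made for illustration.

```python
def overlap_volume(limit):
    """Volume = L^2 + 2 L (1 - L) for a contribution L in [0, 1]."""
    return limit ** 2 + 2.0 * limit * (1.0 - limit)

def shares(impacts):
    """Step 2: each risk factor's fraction of the total max impact."""
    total = sum(impacts.values())
    return {name: value / total for name, value in impacts.items()}

def correlation_from_impacts(impacts_x, impacts_y, not_common=()):
    """Steps 2-7; both dicts must use the same risk-factor names."""
    share_x, share_y = shares(impacts_x), shares(impacts_y)
    volumes = {
        name: (overlap_volume(share_x[name]), overlap_volume(share_y[name]))
        for name in impacts_x
    }
    # Every row, including non-common ones, counts toward the sum of max volumes.
    sum_of_max = sum(max(pair) for pair in volumes.values())
    total = 0.0
    for name, (vx, vy) in volumes.items():
        if name in not_common or max(vx, vy) == 0.0:
            continue                      # assumed: contributes zero correlation
        total += (min(vx, vy) / max(vx, vy)) * (max(vx, vy) / sum_of_max)
    return total

# Max impacts in $M from the Step 2 table.
structures = {"System Integration Complexity": 2.00, "Requirements Creep": 1.50,
              "Skills Deficiency (Vendor)": 0.80,
              "Lack of Programmatic Experience (NASA)": 1.00,
              "Organizational Complexity": 1.00, "Undefined": 1.50}
thermal = {"System Integration Complexity": 0.45, "Requirements Creep": 0.75,
           "Skills Deficiency (Vendor)": 0.00,
           "Lack of Programmatic Experience (NASA)": 0.30,
           "Organizational Complexity": 0.00, "Undefined": 0.60}

spacecraft_correlation = correlation_from_impacts(
    structures, thermal, not_common={"Undefined"})
```

Using unrounded shares rather than the two-decimal contributions reproduces the table's volumes, which suggests the slide computed them the same way.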

35. Recommended Applications
    Best for looking at Correlations for Distributions where Risk Impacts are of Most Concern …
    • Cost and Schedule Estimating
        – Estimates early-on in Acquisition Life Cycle
            • Pre-Phase A, pre-Milestone A, etc. where <5 “top-level” risks tend to dominate
        – Technology Cost Estimating (TRL < 6)
        – Cross-check on data-driven Correlations (“Statistical”)
        – Support Independent Estimates (and/or Assessments)
    • Technical Design and/or Assessment
        – Assess Early-stage Risks in System Design & Test
        – Assess threats / barriers to Systems’ Safety
        – Standing Review Board (SRB) Evaluations

36. Recap / Conclusion
    In summary, this presentation covered:
    • Current challenges that estimators have in specifying defensible correlations between uncertainty distributions
    • The concept of modeling correlation based upon mutual information
    • How the unit square can be used to estimate correlation
        – Depicted as an “intersection” in the unit square (of two uniformly distributed random variables).
    • A 7-step method on how to estimate correlation based upon knowledge of risk factors common among the pair of uncertainty distributions
    • Examples on how to apply the 7-step method

    Unlike other methods, the Common Risk Factor Method provides correlation between 2 uncertainties based upon common root-causes. Applying this method may lessen the degree of subjectivity in the estimate.

37. Contact Information

Presenter: Marc Greenberg
Title: Operations Research Analyst
Organization: NASA Cost Analysis Division (CAD)
Office Location: NASA HQ, Washington, DC
Email: marc.w.greenberg@nasa.gov
Phone: 202.358.1025

For more information on the Cost Analysis Division, go to our CAD webpage at: www.nasa.gov/offices/ooe/CAD/

38. Backup Slides

39. Depiction of 2 Uniformly Distributed RVs Intersecting …

Given: Weighting for each continuous random variable = 0.2, representing “Weather” for Distributions 1 and 2 (W1 and W2).

For the interval U[0,1], how often would W1 and W2 be in a “similar” state?

[Figure: f(x) over the interval 0 to 1 for W1 and W2. Each iteration is scored 1 if weather is similar between W1 and W2, and 0 if it is not.]

After 10,000 iterations, W1 and W2 will overlap approximately 3,600 times: 3,600 / 10,000 = 36%. In other words, W1 and W2 are expected to be in similar states about 36% of the time.

Another way of describing this is that, when given a common pair of risk factors (each with equal “weighting” of 0.20), they have a 36% chance of being in a similar “condition” or “state.”
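The 36% figure can be approximated with a small simulation. The slide does not spell out the exact counting rule, so the sketch below makes an assumption: an iteration counts as a "similar state" when at least one of two independent U[0,1] draws falls within [0, L]. That event has probability L² + 2L(1 − L), which equals 0.36 for L = 0.2 and so agrees with the slide. The function name and the fixed seed are illustrative choices.

```python
import random


def similar_state_fraction(weight, iterations=10_000, seed=0):
    """Estimate how often W1 and W2 are in a 'similar' state.

    Assumption (not stated on the slide): an iteration is a hit when at
    least one of the two independent uniform draws lands in [0, weight].
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(iterations):
        w1, w2 = rng.random(), rng.random()
        if w1 <= weight or w2 <= weight:
            hits += 1
    return hits / iterations


fraction = similar_state_fraction(0.2)
exact = 0.2**2 + 2 * 0.2 * (1 - 0.2)  # closed form: L^2 + 2L(1 - L)
print(f"simulated: {fraction:.3f}, closed form: {exact:.2f}")
```

With 10,000 iterations the simulated fraction should land within a percentage point or so of the closed-form value.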

40. Mutual Information of Risk Factors

Mutual information can also be applied to risk factors that are common among a pair of uncertainty distributions. The more “similar” the 2 weather contributions (to their respective task uncertainties), the higher the % of mutual information.

[Figure: two probability density plots, f(x) versus # of Days, titled “Duration of Task 1” and “Duration of Task 2”, annotated with “Weather during Task 1” and “Weather during Task 2”.]

Illustration showing Weather as a risk factor attributed to duration uncertainties for Tasks 1 and 2. (This common risk factor reflects mutual information between Tasks 1 & 2)

41. Results: Correlation of Commute Time Uncertainties

What if the SME added other important risk factors? What if she doesn’t know all of the risk factors?

The following 2 slides provide notional cases:

• A: Five risk factors affect durations of either or both commute types
 – Part 1: All risk factors contribute to > 98% of uncertainty
 – Part 2: Account for “Unexplained Uncertainty” in each Commute Time Uncertainty Distribution (Car and Bus/Metro)
• B: Measure effect of Risk Mitigation on Case A’s Correlation
 – Improve % on-time arrivals of busses and metro trains
 – Improve arrival frequency of busses and metro trains during holidays

42. Case A: Correlation of Commute Time Uncertainties

SME Provides another Common Risk Factor: Accidents

| Risk Factor | Contribution to Commute Time Uncertainty (Car) | Contribution to Commute Time Uncertainty (Bus/Metro) | Min Volume | Max Volume | Correlation due to Common Risk Factor | Weighting Factor | Weighted Correlation |
|---|---|---|---|---|---|---|---|
| Weather | 0.25 | 0.20 | 0.360 | 0.438 | 0.823 | 0.184 | 0.152 |
| Accidents | 0.34 | 0.18 | 0.328 | 0.564 | 0.580 | 0.238 | 0.138 |
| Road Construction | 0.26 | 0.12 | 0.226 | 0.452 | 0.499 | 0.191 | 0.095 |
| Departure Time | 0.15 | 0.10 | 0.190 | 0.278 | 0.685 | 0.117 | 0.080 |
| Bus/Metro Arriving Late | 0.00 | 0.40 | 0.000 | 0.640 | 0.000 | 0.270 | 0.000 |
| Total: | 1.00 | 1.00 | 1.103 | 2.372 | | 1.000 | 0.465 |

Adding content bumps up Correlation from 0.26 to 0.465.

SME provides content on “Undefined” (a catch-all for “Unexplained Variation”):

| Risk Factor | Contribution to Commute Time Uncertainty (Car) | Contribution to Commute Time Uncertainty (Bus/Metro) | Min Volume | Max Volume | Correlation due to Common Risk Factor | Weighting Factor | Weighted Correlation |
|---|---|---|---|---|---|---|---|
| Weather | 0.20 | 0.14 | 0.260 | 0.360 | 0.723 | 0.147 | 0.106 |
| Accidents | 0.28 | 0.13 | 0.243 | 0.482 | 0.505 | 0.197 | 0.099 |
| Road Construction | 0.22 | 0.08 | 0.154 | 0.392 | 0.392 | 0.160 | 0.063 |
| Departure Time | 0.12 | 0.07 | 0.135 | 0.226 | 0.599 | 0.092 | 0.055 |
| Bus/Metro Arriving Late | 0.00 | 0.28 | 0.000 | 0.482 | 0.000 | 0.197 | 0.000 |
| Undefined | 0.18 | 0.30 | 0.328 | 0.510 | 0.000 | 0.208 | 0.000 |
| Total: | 1.00 | 1.00 | 1.120 | 2.450 | | 1.000 | 0.323 |

Having undefined risk factors reduces Correlation from 0.465 to 0.32.
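Both Case A totals can be reproduced directly from the contribution columns. The following is a minimal sketch, assuming the volume formula L² + 2L(1 − L) from the method and assuming that the Undefined pair is assigned zero correlation (as its 0.000 table entry suggests). All names (`PART_1`, `PART_2`, `total_correlation`) are hypothetical.

```python
def volume(contribution):
    """Volume for a contribution L: L^2 + 2L(1 - L)."""
    return contribution**2 + 2 * contribution * (1 - contribution)


# (risk factor, contribution to Car, contribution to Bus/Metro)
PART_1 = [
    ("Weather", 0.25, 0.20),
    ("Accidents", 0.34, 0.18),
    ("Road Construction", 0.26, 0.12),
    ("Departure Time", 0.15, 0.10),
    ("Bus/Metro Arriving Late", 0.00, 0.40),
]
PART_2 = [
    ("Weather", 0.20, 0.14),
    ("Accidents", 0.28, 0.13),
    ("Road Construction", 0.22, 0.08),
    ("Departure Time", 0.12, 0.07),
    ("Bus/Metro Arriving Late", 0.00, 0.28),
    ("Undefined", 0.18, 0.30),
]


def total_correlation(rows, uncorrelated=("Undefined",)):
    """Weighted sum of per-pair Min/Max volume ratios."""
    vols = [(name, volume(a), volume(b)) for name, a, b in rows]
    sum_max = sum(max(va, vb) for _, va, vb in vols)
    total = 0.0
    for name, va, vb in vols:
        lo, hi = min(va, vb), max(va, vb)
        # Assumption: undefined factors carry no mutual information.
        pair = 0.0 if (name in uncorrelated or hi == 0) else lo / hi
        total += (hi / sum_max) * pair  # weighting factor times pair correlation
    return total


part_1 = total_correlation(PART_1)
part_2 = total_correlation(PART_2)
print(round(part_1, 3), round(part_2, 3))
```

The two results should match the table totals of 0.465 and 0.323 to three decimals.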

43. Case B: Correlation of Commute Time Uncertainties

Risk Mitigation: Improve % on-time arrivals of busses and metro trains

Input Change: “Bus/Metro Arriving Late” Contribution to Commute Time adjusted from 0.28 to 0.20

| Risk Factor | Contribution to Commute Time Uncertainty (Car) | Contribution to Commute Time Uncertainty (Bus/Metro) | Min Volume | Max Volume | Correlation due to Common Risk Factor | Weighting Factor | Weighted Correlation |
|---|---|---|---|---|---|---|---|
| Weather | 0.20 | 0.16 | 0.294 | 0.360 | 0.818 | 0.152 | 0.124 |
| Accidents | 0.28 | 0.15 | 0.278 | 0.482 | 0.576 | 0.203 | 0.117 |
| Road Construction | 0.22 | 0.09 | 0.172 | 0.392 | 0.439 | 0.165 | 0.073 |
| Departure Time | 0.12 | 0.07 | 0.135 | 0.226 | 0.599 | 0.095 | 0.057 |
| Bus/Metro Arriving Late | 0.00 | 0.20 | 0.000 | 0.360 | 0.000 | 0.152 | 0.000 |
| Undefined | 0.18 | 0.33 | 0.328 | 0.551 | 0.000 | 0.233 | 0.000 |
| Total: | 1.00 | 1.00 | 1.207 | 2.370 | | 1.000 | 0.371 |

The Risk Mitigation effort would slightly increase Correlation from 0.32 to 0.37.

This increase in Correlation (versus Case A) is due to an increase in Mutual Information between the common Risk Pairs (where BOTH values > 0).

By reducing Bus/Metro’s top “uncertainty driver,” the dispersion for the Bus/Metro commute went down (not shown here). At the same time, correlation between the distributions went up.
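One way to see why the correlation rises: since each pair's weight is Max Volume / Sum of Max Volumes and each pair's correlation is Min Volume / Max Volume, the weighted sum reduces algebraically to (sum of Min Volumes over the correlated pairs) / (sum of all Max Volumes). The sketch below uses that identity to compare the pre-mitigation inputs (Case A with Undefined) against the mitigated inputs. The variable names are illustrative, and excluding the Undefined row from the numerator is an assumption based on its 0.000 correlation in the table.

```python
def volume(contribution):
    """Volume for a contribution L: L^2 + 2L(1 - L)."""
    return contribution**2 + 2 * contribution * (1 - contribution)


# Row order: Weather, Accidents, Road Construction, Departure Time,
# Bus/Metro Arriving Late, Undefined
CAR = [0.20, 0.28, 0.22, 0.12, 0.00, 0.18]
BUS_BASELINE = [0.14, 0.13, 0.08, 0.07, 0.28, 0.30]   # before mitigation
BUS_MITIGATED = [0.16, 0.15, 0.09, 0.07, 0.20, 0.33]  # after mitigation
UNDEFINED_INDEX = 5  # assumption: Undefined contributes no common information


def min_max_sums(car, bus):
    """Return (sum of min volumes over correlated pairs, sum of all max volumes)."""
    pairs = [(volume(a), volume(b)) for a, b in zip(car, bus)]
    sum_min = sum(min(p) for i, p in enumerate(pairs) if i != UNDEFINED_INDEX)
    sum_max = sum(max(p) for p in pairs)
    return sum_min, sum_max


base_min, base_max = min_max_sums(CAR, BUS_BASELINE)
mit_min, mit_max = min_max_sums(CAR, BUS_MITIGATED)
baseline = base_min / base_max
mitigated = mit_min / mit_max
print(f"baseline {baseline:.3f} -> mitigated {mitigated:.3f}")
```

Under these inputs the numerator grows (the remaining Bus/Metro contributions move closer to the Car contributions) while the denominator shrinks (the largest Bus/Metro volume drops), so both effects push the correlation up.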

44. Space Vehicle Development Cost “Causal Process”

(1) P.S. Killingsworth, Pseudo‐Mathematics: A Critical Reconsideration of Parametric Cost Estimating in Defense Acquisition, Sep 2013

45. Case A: Correlation of Spacecraft Cost Uncertainties

Risk Mitigation:
(1) Redesign Thermal Ctrl System to reduce Sys Integ Complexity Uncertainty
(2) Hire Senior Level advisors to reduce Programmatic Uncertainty (for 06.04.05)

Input Changes:
(1) “Sys Integ Cmplx” Contribution to Cost Uncertainty adjusted from 0.21 to 0.15
(2) “Lack of Prog Exp” Contribution to Cost Uncertainty adjusted from 0.14 to 0.10

| Risk Factor | Contribution to WBS Cost Uncertainty (06.04.04) | Contribution to WBS Cost Uncertainty (06.04.05) | Min Volume | Max Volume | Correlation due to Common Risk Factor | Weighting Factor | Weighted Correlation |
|---|---|---|---|---|---|---|---|
| System Integration Complexity | 0.26 | 0.15 | 0.278 | 0.447 | 0.621 | 0.191 | 0.119 |
| Requirements Creep | 0.19 | 0.42 | 0.348 | 0.664 | 0.524 | 0.284 | 0.149 |
| Skills Deficiency (Vendor) | 0.10 | 0.00 | 0.000 | 0.195 | 0.000 | 0.083 | 0.000 |
| Lack of Programmatic Experience ( | 0.13 | 0.10 | 0.190 | 0.240 | 0.792 | 0.103 | 0.081 |
| Organizational Complexity | 0.13 | 0.00 | 0.000 | 0.240 | 0.000 | 0.103 | 0.000 |
| Undefined | 0.19 | 0.33 | 0.348 | 0.551 | 0.000 | 0.236 | 0.000 |
| Total: | 1.00 | 1.00 | 1.163 | 2.336 | | 1.000 | 0.349 |

The Risk Mitigation effort would decrease Correlation from 0.44 to 0.35.

This decrease in Correlation (versus Baseline) is due to a decrease in Mutual Information between the common Risk Pairs (where BOTH values > 0).

By reducing two “uncertainty drivers,” the dispersion for WBS 06.04.05 (Thermal Ctrl) went down (not shown here). Correlation between the distributions also went down slightly.

46. Mutual Information between 2 groupings

Method 1: $\text{Mutual Information} = \sum \text{Minimum}(X, Y) \, / \, \sum \text{Maximum}(X, Y)$

| Paired amounts (one from Group X, one from Group Y) | Minimum (X, Y) | Maximum (X, Y) | Mutual Information |
|---|---|---|---|
| 16 oz. and 8 oz. | 8 | 16 | 8 / 16 = 0.50 |
| 12 oz. and 4 oz. | 4 | 12 | 4 / 12 = 0.33 |
| 4 oz. and 4 oz. | 4 | 4 | 4 / 4 = 1.00 |
| Sum: | 16 | 32 | 16 / 32 = 0.50 |
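Method 1 is simple enough to check by hand, but a short sketch makes the arithmetic explicit. The pairing of amounts below follows the rows on the slide; which amount belongs to Group X and which to Group Y is an assumption, although it does not affect the result because minimum and maximum are symmetric. The variable names are illustrative.

```python
# Paired amounts in ounces; the X/Y order within each pair is assumed.
PAIRS_OZ = [(16, 8), (12, 4), (4, 4)]

# Per-pair mutual information: min / max
per_pair = [min(x, y) / max(x, y) for x, y in PAIRS_OZ]

# Method 1: sum of minimums divided by sum of maximums
sum_min = sum(min(x, y) for x, y in PAIRS_OZ)
sum_max = sum(max(x, y) for x, y in PAIRS_OZ)
mutual_information = sum_min / sum_max

print([round(v, 2) for v in per_pair])  # prints [0.5, 0.33, 1.0]
print(mutual_information)               # prints 0.5
```

Note that the overall value (0.50) is not the plain average of the per-pair ratios; larger pairs carry more weight because the sums are taken before dividing.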