Vista Grande Canal Water Diversion Project: Review of Analytic Techniques. John Plummer, February 2006
Goals we all agree upon: • Restore Lake Merced • Conserve fresh rainwater • Protect public health
Question: • Did additions of Vista Grande water into Lake Merced contribute to a noticeable increase in contaminants? • E. coli bacteria • Various metals • We will take a new look at the E. coli data.
EOA final report, Oct. 2005: “The primary goal was to determine whether the diversion of limited volumes of treated stormwater (about 0.1 to 3.6 million liters per storm event) increased concentrations of bacterial indicators of fecal contamination in South Lake Merced. Such increases would indicate the potential for increased human health risk (i.e., contracting gastrointestinal disease) during recreation in the lake.”
EOA’s conclusion: “Geometric mean E. coli concentrations at most lake sample stations were higher following diversion events than background storm events, but the differences were not statistically significant.”
Design of the study: • Water from Vista Grande Canal was cleaned using CDS • During six storms, water was released onto the riparian buffer; during three storms there was no release • Measurements were taken at six sample points
Analytic approach • The researchers chose to evaluate these data using a Student t-test. • Data for storms with diversions were compared with data for storms without diversions. • Each sample site was evaluated independently. • A probability > 0.05 was considered not statistically significant.
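A minimal sketch of the per-site comparison described above, assuming the t-test was run on log-transformed concentrations (consistent with the report's use of geometric means) and used a pooled variance. The readings are invented for illustration and are not the study data; `pooled_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df

# Hypothetical E. coli counts (MPN/100 mL) at one site; NOT the study data.
diversion = [120, 300, 85, 410, 150, 230]   # six diversion storms
background = [90, 140, 60]                  # three background storms

t_stat, df = pooled_t([math.log10(x) for x in diversion],
                      [math.log10(x) for x in background])
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

With only nine storms per site, the test has seven degrees of freedom, so a fairly large difference in geometric means is needed before it registers as significant.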
So, what is Plummer’s problem? • “Sometimes a novice confuses the role of the null hypotheses, thinking that failure to reject it is equivalent to proving it.” JMP Statistics and Graphics Guide, Version 3.1 SAS Institute, Inc., 1995, pg. 265
What is the ‘null hypothesis’? • We assume that no observable impact has occurred as a result of the diversion. • We reject that assumption if, and only if, data like ours would arise less than one time in twenty were the assumption correct. This is not a conservative assumption given our agreed-upon goals.
What should we do? • We should assume that there is an impact, and reject that assumption only if it is very unlikely. • There is, contrary to the advice of the EOA engineer, no direct test of this hypothesis. • Instead we consider the power of the original test.
What is the ‘power’ of a test? • The power equals one minus Beta, the probability of a ‘Type II’ error. • A Type II error occurs when we say there is no impact when in fact there is an impact. • For Site #5 the power of the test is 0.18, so the probability of a Type II error is 0.82!
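The relation between power and Beta can be sketched as follows. The first part uses only the Site #5 figure quoted above. The second part is a normal approximation to the power of a two-sided test, not the exact calculation a statistics package would report, and its effect size and standard error are hypothetical.

```python
from statistics import NormalDist

def approx_power(effect, std_err, alpha=0.05):
    """Normal-approximation power of a two-sided test for a true difference `effect`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / std_err
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Relation stated on the slide: beta = 1 - power.
site5_power = 0.18                 # value reported above for Site #5
site5_beta = 1 - site5_power       # probability of a Type II error
print(f"beta = {site5_beta:.2f}")

# Hypothetical effect size and standard error (log10 units), for illustration only.
print(f"approximate power = {approx_power(effect=0.3, std_err=0.25):.2f}")
```

When the true effect is zero, the function returns alpha itself, which is a quick sanity check on the approximation.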
Why evaluate each site separately? • This is one approach to removing variance due to location. • Although not a very good approach! • But, is there any site-to-site variance to consider? • No. Adjusted r² = -0.045.
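A negative adjusted r² is possible because the adjustment penalizes a model for each parameter it uses. The sketch below applies the standard adjustment formula; the observation count, parameter count, and raw r² are hypothetical values chosen only to show the effect.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R-squared for a model with `n_params` predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_params - 1)

# Hypothetical: 54 observations, six sites coded as 5 indicator variables,
# and a raw R-squared of 0.05 (illustrative value, not taken from the study).
print(round(adjusted_r2(0.05, n_obs=54, n_params=5), 3))
```

A value below zero means that site explains less variance than would be expected from five arbitrary predictors by chance alone.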
We see a clearer relationship by normalizing the data by storm • We normalize the data by setting the mean of each storm to 0, std dev to 1 • We also consider only data from diversion events • Site 4 appears to be an outlier • Adjusted r² = 0.281.
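The storm-by-storm normalization can be sketched as below. The sketch assumes the sample standard deviation is used and that the readings are already on a log scale; neither detail is stated on the slide. The readings themselves are invented.

```python
from statistics import mean, stdev

def normalize_by_storm(readings):
    """Rescale each storm's readings to mean 0 and sample standard deviation 1."""
    normalized = {}
    for storm, values in readings.items():
        center, spread = mean(values), stdev(values)
        normalized[storm] = [(v - center) / spread for v in values]
    return normalized

# Hypothetical log10 E. coli readings at six sites for two storms; NOT the study data.
readings = {
    "storm_A": [2.1, 2.4, 1.9, 3.0, 2.2, 2.0],
    "storm_B": [1.2, 1.5, 1.1, 2.0, 1.3, 1.0],
}
z_scores = normalize_by_storm(readings)
for storm, values in z_scores.items():
    print(storm, [round(v, 2) for v in values])
```

After this step, differences between storms are removed, so any pattern that remains across sites reflects location rather than storm intensity.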
Analyzing each site separately reduces the sensitivity of the t-test • The difference between the means is not reduced. • Also, the variances remain the same. • But the degrees of freedom are substantially reduced.
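The loss of degrees of freedom can be illustrated with the storm and site counts from the study design. The sketch assumes one observation per site per storm, which is an assumption made here for illustration.

```python
def pooled_df(n_diversion, n_background):
    """Degrees of freedom for a two-sample pooled-variance t-test."""
    return n_diversion + n_background - 2

storms_with_diversion, storms_without, sites = 6, 3, 6

# One observation per site per storm is an assumption made for illustration.
df_per_site = pooled_df(storms_with_diversion, storms_without)
df_all_sites = pooled_df(storms_with_diversion * sites, storms_without * sites)
print(df_per_site, df_all_sites)  # 7 52
```

Treating the sites as repeated samples is defensible only if site contributes no variance of its own, which is the argument made above.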
We want to evaluate the components of the variance. The total variance = • Variance due to the diversion, plus • Variance due to storm events, plus • Variance due to sampling site, plus • A random term.
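In symbols, the decomposition above can be written as follows. The subscripts and the additive model are notation chosen here for illustration, assuming the terms are independent; they do not come from the EOA report.

```latex
\sigma^2_{\text{total}}
  = \sigma^2_{\text{diversion}}
  + \sigma^2_{\text{storm}}
  + \sigma^2_{\text{site}}
  + \sigma^2_{\varepsilon}
\qquad
y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \gamma_k + \varepsilon_{ijk}
```

Here $\alpha_i$ is the effect of diversion group $i$, $\beta_{j(i)}$ is the effect of storm $j$ within that group, $\gamma_k$ is the effect of site $k$, and $\varepsilon_{ijk}$ is the random term.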
This is called a “nested ANOVA” • Site is nested within storm event, which in turn is nested within diversion group. • Since there is no significant contribution to variance due to site, we can use those data as multiple samples for each storm event.
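A minimal sketch of how the sums of squares partition when storms are nested within diversion groups and site readings are treated as repeated samples for each storm. The data are invented. The diversion F-ratio here uses the storm mean square as its denominator, the conventional choice for a nested design, which may differ from how the figures on the following slides were computed.

```python
from statistics import mean

def nested_sums_of_squares(data):
    """Partition the total sum of squares for storms nested within diversion groups.

    `data` maps group name -> {storm name -> list of site readings}.
    Returns (ss_group, ss_storm_within_group, ss_residual, ss_total).
    """
    all_values = [v for storms in data.values() for vals in storms.values() for v in vals]
    grand = mean(all_values)
    ss_group = ss_storm = ss_resid = 0.0
    for storms in data.values():
        group_values = [v for vals in storms.values() for v in vals]
        group_mean = mean(group_values)
        ss_group += len(group_values) * (group_mean - grand) ** 2
        for vals in storms.values():
            storm_mean = mean(vals)
            ss_storm += len(vals) * (storm_mean - group_mean) ** 2
            ss_resid += sum((v - storm_mean) ** 2 for v in vals)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    return ss_group, ss_storm, ss_resid, ss_total

# Hypothetical log10 readings (three sites per storm for brevity); NOT the study data.
data = {
    "diversion": {"s1": [2.4, 2.6, 2.5], "s2": [3.0, 3.1, 2.8], "s3": [2.2, 2.1, 2.4]},
    "background": {"s4": [1.9, 2.0, 2.1], "s5": [2.5, 2.3, 2.4]},
}
ss_group, ss_storm, ss_resid, ss_total = nested_sums_of_squares(data)

# Degrees of freedom: groups - 1, storms - groups, observations - storms.
n_groups = len(data)
n_storms = sum(len(s) for s in data.values())
n_obs = sum(len(v) for s in data.values() for v in s.values())
ms_storm = ss_storm / (n_storms - n_groups)
ms_resid = ss_resid / (n_obs - n_storms)
f_storm = ms_storm / ms_resid                         # storm effect vs. residual
f_diversion = (ss_group / (n_groups - 1)) / ms_storm  # diversion tested against storms
print(f"F(storm) = {f_storm:.2f}, F(diversion) = {f_diversion:.2f}")
```

The three components always add up to the total sum of squares, which is what allows each source of variance to be assessed in turn.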
We first account for the variance due to storm event. • Summary statistics: • F-ratio = 13.4, Prob. < 0.0001 • Power = 1.0
We then consider the effect of Diversion on the remaining variance. • Summary statistics • F-ratio = 5.45, Prob. = 0.024 • Power = 0.63.
The resulting model successfully explains 63.3% of the variance. • Summary statistics • F-ratio = 12.43, Prob. < 0.0001 • Power = 1.0 • R-square = 0.633
Several aspects of this work, in addition to the statistical analysis, are quite troubling. • There was a tendency to minimize the importance of health concerns. • There was a lack of familiarity with the uses of the lake. • Additional shortcomings in data handling prejudice the results. I will describe some, but not attempt to cover all, of these problems.
Researchers continually minimize the importance of health considerations. “Although the applicability of these water quality criteria to this study is highly questionable, the criteria are conservative in that full body water contact recreation is prohibited at Lake Merced (SFPUC Resolution No. 10,435)” In fact, that resolution prohibits swimming, not full body water contact, which does occur, albeit infrequently.
Potential uses of the lake water are misrepresented: “Lake Merced … consists of four inter-connected basins (referred to as South, East, North, and Impound) which serve as an emergency source of non-potable water” (Casteel et al., August 2005) The PUC has, as recently as 1985, designated Lake Merced as an emergency source of potable water, and the RWQCB has designated the lake as a potential emergency supply of domestic water.
The researchers do not give up easily. “Please note that swimming and full body water contact is prohibited at Lake Merced.” From proposed letter to be sent to boaters describing this program. This same advice is now posted near the Boathouse.
Researchers overstated the strength of their analysis. “Based on a “weight-of-evidence” approach, the study results suggested that the pilot diversions probably did not increase potential human health risk.” Repeating an erroneous analysis six times, once for each site, does not constitute “weight-of-evidence.”
In case of doubt, guess on the side of the researchers! “The attached document prepared by Michael J. Casteel, Ph.D. (SFPUC Research Microbiologist) provides documentation that the riparian buffer is likely to provide some level of treatment.” (Emphasis added) This report documents successful application of a riparian buffer in Louisiana. It’s a bit of a stretch to assume the same thing will happen at Lake Merced.
Metals simply disappeared: “A number of physical, biological and chemical processes potentially govern the fate of metals in the stormwater runoff diverted to the riparian buffer/lake. Such potential processes include accumulation in the riparian buffer soils, removal by biological uptake in the buffer or the lake, and adsorption to particles in the lake system. Transformations among species of individual metals are also likely.” No provision is made in the current proposal to discover which, if any, of these is correct.
The scope of the study precludes more detailed analysis: Question: “There is some delay between a storm event and the delivery of treated stormwater to the test site. No analysis is presented indicating the degree to which the ground has become saturated during this interval. How much water does the buffer absorb? How much runs off directly into the lake?” Answer: “Additional engineering analysis would be needed to address this issue. Such analysis was beyond the scope of the pilot study.”
We all agree, better monitoring is needed: “We do agree that any future increases in diversion volume would require vigilant monitoring. Before any additional diversions to the lake occur, we will carefully design and document additional monitoring activities.” Unfortunately, the current proposal does not include significant improvement in the monitoring program.
Convenience seems to be the leading factor in establishing the test design. “Samples will be collected from the lake approximately 1 to 3 days after a diversion event is initiated.” No rationale to support this timing is provided. It is likely that this is “outside the scope” of this study as well.
On the other hand, perhaps the researchers were not aware that St. Ignatius Rowing Club has their center at the lake.
The researchers assume away statistical difficulties: “It is reasonable to assume the underlying population distribution is lognormal.” Before testing for normality one needs to establish that the data are taken from a homogeneous population. Obviously that is not the case here, as the data set is stratified by event.
Data handling served the interests of the researchers, not good statistics. • Data for 1/27/05 fell outside the range of the monitor. • The engineers said the data were good and that the values were reported as less than 100. However, in the calculations they used 100.
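How much the substitution matters can be checked by recomputing the geometric mean under different fill values. The sketch below uses invented readings, with `None` standing for a value reported as less than 100. Substituting half the limit is shown only as a common alternative for comparison, not as the method the study should have used.

```python
import math

def geometric_mean(values):
    """Geometric mean via the mean of natural logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def substitute(readings, fill):
    """Replace censored readings (None, meaning below the limit) with `fill`."""
    return [fill if r is None else r for r in readings]

LIMIT = 100  # reporting limit from the slide above
# Hypothetical event: three quantified readings and three reported as "<100".
readings = [150, None, 220, None, None, 130]

for label, fill in [("at limit", LIMIT), ("half limit", LIMIT / 2)]:
    gm = geometric_mean(substitute(readings, fill))
    print(f"{label}: geometric mean = {gm:.0f}")
```

Substituting the limit itself gives the largest geometric mean consistent with the reported values, so the choice of fill value should be stated and its effect on the comparison reported.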
The researchers are completely satisfied with their techniques: “Response: Different statistical methods may yield different results. We have not attempted to evaluate the above methods, but we believe that our methods were appropriate.” Maybe there is some small chance, however remote, that someday there might actually be something to learn! But not like this.
The researchers demonstrate an amazing optimism: “What is really indicated is that there is an approximate 98.5% probability that a sample … will have a concentration lower than the single sample maximum criteria of 576 MPN/100mL for full body contact recreation.” After six events, one of which recorded levels of E. coli within measurement error of this maximum, the researchers are 98.5% confident that we won’t exceed that maximum!
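A probability of this kind is presumably read off a fitted lognormal distribution, in which case it depends entirely on the fitted parameters. The sketch below shows the calculation under that assumption; the log-mean and log-standard-deviation are hypothetical and are not the values the researchers used.

```python
import math
from statistics import NormalDist

def prob_below(threshold, log_mean, log_sd):
    """P(X < threshold) when ln(X) is normal with the given mean and std dev."""
    return NormalDist(log_mean, log_sd).cdf(math.log(threshold))

CRITERION = 576  # single sample maximum, MPN/100 mL, quoted above

# How many log-scale standard deviations above the log-mean must the
# criterion sit for the probability to reach 98.5%?
z_needed = NormalDist().inv_cdf(0.985)
print(f"z = {z_needed:.2f}")

# Hypothetical lognormal parameters (natural-log scale), for illustration only.
print(f"P(X < 576) = {prob_below(CRITERION, log_mean=4.5, log_sd=1.0):.3f}")
```

With parameters estimated from only six events, small changes in the fitted mean or standard deviation move this probability considerably, so a single figure overstates what the data can support.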
Another example: “CDS effluent concentrations of bacterial indicators and metals were generally several orders of magnitude greater than the concentrations found in South Lake Merced. This suggests that treatment by the riparian buffer effectively reduced bacterial concentrations.” Dilution and die-off effects were not estimated, and were considered “beyond the scope” of this study.
It is little wonder that the researchers would rather that the RWQCB stay away. “The call for intervention of the Regional Board is misplaced … but rather constructively moving ahead with the Year 3 Plan as presented so it can be implemented as set forth.” From an e-mail sent to RWQCB by Patrick Sweetland
In short, expediency has consistently trumped caution. • If the RWQCB is prepared to approve placing contaminated Vista Grande Canal water directly into Lake Merced, the project should go forward. • If not, under existing circumstances the project should be discontinued. That “the riparian buffer is likely to provide some level of treatment” is not good enough.
What is needed: Better public notice • The test area should be clearly indicated using buoys. • Notice should be posted at the Boathouse at the initiation of each test event, not a blanket notice for the season. • An “all clear” should be posted when conditions have returned to background levels.
What is needed: Better monitoring • Samples should be taken at the time of release and at two-day intervals until the impact has been eliminated. This will provide some idea as to the effect of die-off. • At least five samples should be taken at location 7 for each event to provide a statistically reliable measure of background.
What is needed: Better test design • Specific tests should be established to determine the effect of the various possible destinations of the metals. • Samples should be taken at various depths; the assumption of uniformity has not been supported. • Consideration should be given to placing Vista Grande water directly into Lake Merced; this would indicate the effect of the biofilter.
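Samples taken at fixed intervals would allow an apparent decay rate to be estimated. The sketch below assumes first-order decay, which is a modeling assumption introduced here, and fits it by least squares on the log of the counts. The sample values are invented, and `fit_die_off` is a hypothetical helper name.

```python
import math

def fit_die_off(days, concentrations):
    """Least-squares fit of ln(C) = ln(C0) - k * t; returns (C0, k)."""
    logs = [math.log(c) for c in concentrations]
    n = len(days)
    t_bar, y_bar = sum(days) / n, sum(logs) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(days, logs))
             / sum((t - t_bar) ** 2 for t in days))
    intercept = y_bar - slope * t_bar
    return math.exp(intercept), -slope

# Hypothetical samples at release and every two days after; NOT measured data.
days = [0, 2, 4, 6]
counts = [800, 420, 230, 110]   # MPN/100 mL
c0, k = fit_die_off(days, counts)
print(f"C0 = {c0:.0f} MPN/100 mL, k = {k:.2f} per day")
```

The fitted rate combines die-off with dilution and settling, so separating those effects would need additional measurements.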