Quantitative approaches: the Scientific Method
• Description and ‘reduction’ using standardised data sets
• Statistical inference I: testing propositions
• Statistical inference II: prediction
• All mediated through probability theory (pre-set significance levels)
Orientation: the core principles over three weeks, plus an SPSS workshop in week 10
Underlying propositions
• The fundamental premise of science is that there are truths which exist independently of human opinions about them and which can be discovered.
• Research approaches in this paradigm require empirical work: evidence gathered through observation and measurement that can be replicated by others.
• The accent is on objectivity (although scientists’ hopes about what they will discover may influence how they interpret their results!)
Model building
• Our interest is in discovering something that we assume is a ‘real-world’ phenomenon.
• Approaching this statistically means taking the data that is available and using it in a meaningful way.
• This often involves building statistical models of the phenomenon of interest, because real-world data may best be explained by analogy with a simplified representation.
• We collect ‘observed data’ and then try to ‘fit’ a model to it: our ability to infer from the model depends on the quality of the ‘fit’. Is the model reasonably like the ‘reality’ of interest?
Andy’s equation
Everything in statistics boils down to:
$$\text{outcome}_i = (\text{model})_i + \text{error}_i$$
Field, A. (2005) Discovering Statistics Using SPSS, 2nd edition, p. 7. London: Sage.
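To make Field’s equation concrete, the sketch below fits the simplest possible model, a single mean, to a handful of hypothetical outcomes. The error for each case is then whatever the model fails to capture, and the sum of squared errors is one common way of judging how well the model fits. The function names and data are illustrative assumptions, not taken from the source.

```python
from statistics import mean

def fit_mean_model(outcomes):
    """Fit a one-parameter model (the mean) and return (model_value, errors),
    where error_i = outcome_i - model_i for every case i."""
    model_value = mean(outcomes)
    errors = [y - model_value for y in outcomes]
    return model_value, errors

def sum_of_squared_errors(errors):
    """Total squared deviation: smaller values mean a closer 'fit'."""
    return sum(e * e for e in errors)

# Hypothetical outcomes, for illustration only
outcomes = [4.0, 6.0, 7.0, 9.0]
model_value, errors = fit_mean_model(outcomes)
print(model_value)                    # 6.5
print(errors)                         # [-2.5, -0.5, 0.5, 2.5]
print(sum_of_squared_errors(errors))  # 13.0
```

Each outcome is recovered exactly as model value plus error, which is all the equation claims; a better model is one that makes those errors smaller.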
Measurement
• Scientists search for appropriate scales that can be observed, e.g. mass, volume, height.
• Management and social science research seeks operational definitions for more abstract concepts, e.g. scales for measuring satisfaction, motivation, commitment, etc. These scales are contestable.
• It is not just a question of measurement scales: to what extent must something occur before its position on a particular scale constitutes a significant observation?
• Measurement error (the difference between the value recorded and the true value of what is being measured) is therefore an issue.
Cases and variables
• Scientists seek to identify cases (people, organisations, events) on which to assemble evidence in the form of variables.
• Anything that may change under observation, e.g. employee commitment to managerial goals at a time of organisational change, is a variable. Variables may be observed or surveyed, or manipulated (experimental research).
Populations and samples
• Ideally, scientists wish to capture and analyse data from the whole population of interest.
• Limited resources usually prevent this.
• Sampling (random sampling if results are to be generalised) is used to access part of a population for analysis. This introduces sampling error: a statistic calculated from a sample will generally differ from the corresponding population value.
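Sampling error can be seen directly by drawing several random samples from the same population and comparing their means. The sketch below assumes a hypothetical population of the numbers 1 to 1,000 and uses Python’s `random.Random.sample` for simple random sampling without replacement; the seed only makes the run repeatable.

```python
import random
from statistics import mean

def sample_means(population, sample_size, repetitions, seed=None):
    """Draw repeated simple random samples (without replacement)
    and return the mean of each one."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(repetitions)]

# Hypothetical population of 1,000 values (illustrative only)
population = list(range(1, 1001))
true_mean = mean(population)  # 500.5

means = sample_means(population, sample_size=50, repetitions=5, seed=1)
for m in means:
    # The gap between each sample mean and the population mean is sampling error
    print(round(m, 2), round(m - true_mean, 2))
```

Each sample gives a slightly different estimate even though the population never changes; larger samples would shrink those gaps.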
Preparation for quantitative analysis
• Decide how the data is to be displayed and evaluated/tested, and then work back to the basis on which it should be gathered.
• The types of data generated by answers to questions open up opportunities, and impose limitations, for data display, reduction and evaluation …
Types of data: a fundamental feature of quantitative analysis
• Recognising and understanding the characteristics of the different types of data that might be collected in survey-based research is key to selecting analytical tools.
• Data can be broadly classified into three main types: (1) ‘nominal’ and (2) ‘ordinal’ categories, and (3) ‘cardinal’ data (a term which includes ‘interval’ and ‘ratio’ data).
Characteristics of data types and the statistical techniques suited to various types of analysis. Developed in part from Siegel and Morgan (1996) Statistics and Data Analysis: An Introduction.
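As a rough stand-in for part of that table, the rule of thumb applied later in these slides (mode for nominal data, median for ordinal data, mean for cardinal data) can be written as a lookup in which each type also permits the averages of the types below it. This is a hypothetical sketch of that one rule, not a reproduction of the Siegel and Morgan table; the names `PERMISSIBLE_AVERAGES` and `permissible_averages` are invented for illustration.

```python
# Permissible 'averages' by data type; each type also allows the
# averages of the less informative types below it.
PERMISSIBLE_AVERAGES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "cardinal": ["mode", "median", "mean"],  # covers interval and ratio data
}

def permissible_averages(data_type):
    """Return the averages that are meaningful for a given data type."""
    key = data_type.lower()
    if key in ("interval", "ratio"):
        key = "cardinal"
    try:
        return PERMISSIBLE_AVERAGES[key]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None

print(permissible_averages("nominal"))  # ['mode']
print(permissible_averages("ratio"))    # ['mode', 'median', 'mean']
```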
Question A1: frequency table
Consider Question A1: ‘From which airport did you start your journey?’. Three alternative answers are available, coded 1, 2 and 3. This is an example of nominal data (‘words’ as answers, with no logical order). Such data can be summarised by a frequency table and a graph.
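A frequency table simply counts how often each code occurs and expresses each count as a percentage of the valid answers. The sketch below assumes hypothetical coded responses (not the survey data) and treats `None` as a missing answer, which is one possible convention rather than how SPSS stores missing values.

```python
from collections import Counter

def frequency_table(responses):
    """Return (code, count, percent) rows sorted by code,
    ignoring missing answers (None)."""
    valid = [r for r in responses if r is not None]
    if not valid:
        return []
    counts = Counter(valid)
    n = len(valid)
    return [(code, counts[code], 100 * counts[code] / n) for code in sorted(counts)]

# Hypothetical answers to A1, coded 1, 2, 3 (illustrative only)
responses = [1, 2, 2, 3, 1, 2, None, 1]
for code, count, pct in frequency_table(responses):
    print(f"{code}: {count} ({pct:.1f}%)")
# 1: 3 (42.9%)
# 2: 3 (42.9%)
# 3: 1 (14.3%)
```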
Consider Question A3: ‘Please rate the courtesy/helpfulness of the crew.’ The responses to this question are ordinal data. A bar chart or a pie chart would be useful here; a pie chart has been chosen. Graphs for other nominal and ordinal data sets can be produced in the same way as the two illustrations shown.
Consider Question A5a: ‘How much did you spend?’. The answers to this question are an example of a cardinal data set, which is best represented by a histogram. Dividing the data into classes reveals the general pattern of the amounts spent.
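Dividing cardinal data into classes is what a histogram does before drawing its bars. The sketch below assumes equal-width classes of the form [lower, lower + width) and uses hypothetical spending amounts; the class width of 2.0 is an arbitrary choice for illustration, and a crude text bar stands in for the chart.

```python
import math
from collections import Counter

def class_counts(values, width, start=0.0):
    """Group values into equal-width classes [lower, lower + width) and return
    (lower, upper, count) for every class from the first occupied to the last."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return []
    # Index of the class each value falls into
    index = Counter(math.floor((v - start) / width) for v in values)
    first, last = min(index), max(index)
    # Counter returns 0 for empty classes, so gaps in the histogram are kept
    return [(start + i * width, start + (i + 1) * width, index[i])
            for i in range(first, last + 1)]

# Hypothetical amounts spent in pounds (illustrative, not the GOEASY data)
spend = [2.5, 4.0, 4.8, 6.1, 6.9, 7.2, 9.5]
for lower, upper, count in class_counts(spend, width=2.0):
    print(f"{lower:5.2f} to under {upper:5.2f}  {'*' * count}")
```

Changing the class width changes the picture, so it is worth trying more than one width before describing the pattern.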
Summary statistics I: measures of ‘central tendency’ using categorical data
• Considering the responses to A1 again, the only valid statistical average for this nominal data is the mode. There are in fact two modes here: Luton and Stansted airports.
• For Question A4, ‘Please rate the comfort/cleanliness of the aircraft’, the responses do have a logical order, so the data is ordinal. Here we can use the median as a measure of ‘average’: the middle response gives a feel for a ‘typical’ response. A median of 2 indicates that the middle response was ‘good’, so at least half of the respondents rated the aircraft ‘good’ or better.
• However, a frequency table gives more useful information, as it shows the percentage of responses in each category.
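The two averages described above can be computed with Python’s standard `statistics` module: `multimode` returns every mode (so ties such as two equally popular airports are reported), and `median_low` always returns an observed category, which suits ordinal codes better than an averaged median such as 2.5 that matches no answer on the scale. The ratings below are hypothetical and assume codes run from most to least favourable.

```python
from statistics import multimode, median_low

def central_tendency(codes, ordinal):
    """Return the permissible 'averages' for coded categorical data:
    the mode(s) always, and the median only for ordinal data."""
    result = {"modes": multimode(codes)}
    if ordinal:
        # median_low picks an actual category rather than averaging two
        result["median"] = median_low(codes)
    return result

# Hypothetical nominal codes with a tie, as with the two airport modes
print(central_tendency([1, 2, 2, 1, 3], ordinal=False))  # {'modes': [1, 2]}
# Hypothetical ordinal ratings (assumed 1 = most favourable)
print(central_tendency([1, 2, 2, 3, 2, 1, 4, 2, 3], ordinal=True))
# {'modes': [2], 'median': 2}
```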
Summary statistics II: measures of ‘central tendency’ and ‘dispersion’ using cardinal data
• For Question A5a we can use the statistics SPSS produces to summarise cardinal data.
• In the example, it is meaningful to state: ‘The mean amount spent on drinks/snacks was £6.79, with a standard deviation of £2.54.’
• But is the sample an accurate representation of reality?
• To answer that question, we need the notion of ‘confidence intervals’.
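The mean and standard deviation quoted above can be reproduced for any list of amounts with the standard `statistics` module. The sketch uses `stdev`, the sample standard deviation with an n − 1 divisor, which is the usual choice when the data are a sample; the amounts are hypothetical, not the GOEASY responses.

```python
from statistics import mean, stdev

def describe(values):
    """Summarise cardinal data: size, mean and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}

# Hypothetical amounts in pounds (illustrative only)
spend = [4.0, 5.0, 7.0, 8.0, 9.0, 9.0]
summary = describe(spend)
print(f"n = {summary['n']}, mean = {summary['mean']:.2f}, sd = {summary['sd']:.2f}")
# n = 6, mean = 7.00, sd = 2.10
```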
Statistical inference: confidence intervals
Data sets usually represent a sample taken from a wider population. ‘Confidence intervals’ indicate how precisely estimates made from a sample (or from models built on it) can be generalised to the population that is the focus of interest. Generalising within a stated confidence interval relies on probability theory.
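The link to probability theory can be checked by simulation: draw many samples, build a 95% interval from each, and count how often the intervals contain the true mean. The sketch below assumes a normal population with a known standard deviation so that the simple z-interval applies; the parameter values are arbitrary, and `NormalDist().inv_cdf` supplies the z value (about 1.96 for 95%).

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(mu, sigma, n, repetitions, confidence=0.95, seed=None):
    """Estimate how often a z-interval (sigma known) contains the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        sample_mean = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / repetitions

print(ci_coverage(mu=0.0, sigma=1.0, n=30, repetitions=2000, seed=42))
```

The printed proportion should sit close to 0.95: the ‘95%’ describes how often the procedure captures the population value, not a property of any single interval.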
Applying this in practice, consider the responses to Question A3 in the ‘GOEASY’ survey: 17 respondents out of 100 rated the courtesy/helpfulness of the crew as ‘excellent’. We can determine a 95% confidence interval for this proportion using the formula
$$p \pm Z\sqrt{\frac{pq}{n}}$$
where p is the sample proportion, q = 1 − p and n is the sample size. For a 95% confidence interval Z = 1.96; here p = 0.17, so q = 0.83:
$$0.17 \pm 1.96\sqrt{\frac{0.17 \times 0.83}{100}} = 0.17 \pm 1.96\sqrt{\frac{0.1411}{100}} = 0.17 \pm 1.96\sqrt{0.001411} = 0.17 \pm 1.96 \times 0.03756 = 0.17 \pm 0.07362$$
giving the interval [0.09638, 0.24362].
Thus we can be 95% confident that the proportion of all GOEASY passengers (not just those surveyed) who rate the courtesy/helpfulness of the crew as ‘excellent’ lies between 9.64% and 24.36%. This wide confidence interval results from the small sample size (100 respondents), so results from small samples should be treated with caution!
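The same calculation can be scripted. The sketch below implements the normal-approximation formula p ± Z√(pq/n) for a proportion, with Z defaulting to 1.96 for 95% confidence; the function name `proportion_ci` is a hypothetical label, and the inputs are the 17 ‘excellent’ ratings out of 100 from the survey.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval p +/- z * sqrt(p * q / n)."""
    p = successes / n
    q = 1 - p
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

lo, hi = proportion_ci(17, 100)
print(f"[{lo:.5f}, {hi:.5f}]")  # [0.09638, 0.24362]
```

Because the margin shrinks with the square root of n, quadrupling the sample size would roughly halve the width of the interval.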
Confidence Interval for a Population Mean
If the population standard deviation is not known, we use the t distribution to calculate the confidence interval, taking the sample standard deviation s as the best estimate of the population standard deviation. The formula for a confidence interval of the mean is
$$\bar{x} \pm t\frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean and n is the sample size. The value of t is found from the t-tables included in many introductory statistics texts, using n − 1 ‘degrees of freedom’ (one less than the sample size). SPSS will perform these calculations: here we simply wish to open the ‘black box’ and show how the theory works.
Using the responses to Question A5a in the ‘GOEASY’ survey dataset (customer spending on drinks and snacks), the confidence interval may be calculated as follows [NB n = 78 below means that only 78 people answered this question on the survey document]. With a mean of £6.79, s = £2.54 and t ≈ 1.99 for 77 degrees of freedom:
$$6.79 \pm 1.99 \times \frac{2.54}{\sqrt{78}} = 6.79 \pm 1.99 \times 0.2876 = 6.79 \pm 0.572$$
Thus we can be 95% confident that the mean amount spent by all GOEASY passengers (i.e. not just those surveyed) lies in the range £6.22 to £7.36.
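The t-based interval x̄ ± t·s/√n can be scripted in the same spirit as reading t from tables: the critical value is passed in rather than computed, since Python’s standard library has no t distribution. The sketch uses the A5a summary figures quoted in these slides (mean £6.79, s = £2.54, n = 78); the value t_crit = 1.991 is an assumed approximate table value for 77 degrees of freedom at 95%, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def mean_ci(sample_mean, s, n, t_crit):
    """Confidence interval sample_mean +/- t_crit * s / sqrt(n).
    t_crit should come from t-tables with n - 1 degrees of freedom."""
    margin = t_crit * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

def mean_ci_from_data(values, t_crit):
    """Same interval, starting from raw (cardinal) data."""
    return mean_ci(mean(values), stdev(values), len(values), t_crit)

# A5a summary figures from the slides; t_crit is an assumed table value (77 df)
lo, hi = mean_ci(6.79, 2.54, 78, t_crit=1.991)
print(f"{lo:.2f} to {hi:.2f}")  # 6.22 to 7.36
```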
Confidence intervals with SPSS
This SPSS table enables us to infer that the 95% confidence interval for the mean amount spent on drinks/snacks is £6.22 to £7.36 (see the shaded area in the tabulation above). Thus we can be 95% confident that the true mean amount spent by all passengers (i.e. the population of ‘GOEASY’ travellers, not just those surveyed) lies between £6.22 and £7.36. Compare this with the manual calculation above.