Glossary of terms used in Survey Research
A
- Achieved Response
The group of people who actually replied to a survey. The size
and makeup of the group dictates the accuracy of any estimate we
can make of the view of the population.
See sampling.
- Arithmetic mean
A simple average calculated by adding up a group of values, and
dividing the sum by the number of values.
- Artwork
Typesetting-quality original from which
printers make plates for printing. Often now provided "on
disk", meaning as a computer file of one kind or another.
B
- Benchmark
A fixed point with which to compare.
Comparing your survey results with those from other organisations
In an effort to put their survey results in context, people sometimes
try to compare them with those obtained by other organisations using
similar questions. See Services: Benchmarking.
Choosing a subset from your own survey to use as a benchmark
You might choose a benchmark subset and
then compare all the other subsets with it.
When making comparisons between a current survey and previous ones,
you might take 2004 as the benchmark and show results for 2005 and
2006 with the improvement / decline since the benchmark year.
The term derives from the distinctive
marks (known as benchmarks) made by the U.K. Ordnance Survey on
public buildings, bridges etc., whose height above sea level is shown
on Ordnance Survey maps. These provide a fixed point from which
surveyors can derive levels for other places they are surveying.
C
- Census
A survey in which every member of a population
is invited to respond.
- Class
A category within a classification system, see below.
- Classification system
If we want to create subsets from our response
based on people's characteristics such as gender, age range, location
etc. we must ask informants to classify
themselves, usually by ticking a box within a classification system.
- Closing date
The official, or published, closing date is the date by which we
tell informants we want responses
back.
Usually, responses keep trickling in after that date, though. The
actual closing date is the date on which we decide that no more
responses will be accepted, and we go ahead to analyse the data
already received. We plan this date in advance, but usually agree
it finally at the time and in consultation with you, based on:
- the number of responses so far;
- where they have come from;
- the rate at which they continue to arrive;
- how urgently you need the results; and
- how important it is to you to include every last possible response.
- Cluster
See topic
- Commitment
Label for a group of employee satisfaction measures concerning
the level of allegiance employees have with their employer. The
individual items and their aggregated value correlate with certain
individual and corporate performance measures so commitment measures
have been regarded as important drivers of improved performance.
See also Engagement.
- Confidence interval
The range around a sample result, within which the actual value for the population is likely to lie, at a given level of probability.
If we obtain a sample result of
62%, this is only an estimate of the true value we would have obtained
if we had been able to ask every member of the population. We may
calculate that the confidence interval associated with this result is
plus or minus 3% at the 95% confidence level. This means we can be 95% confident that the population result would be somewhere between 59% and 65%.
See also Finite Population Correction.
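As a rough sketch of how such an interval might be worked out for a percentage result: the formula below is the usual normal approximation for a proportion, which this glossary does not itself specify, and the sample size of 1,000 and the population of 5,000 are invented for illustration. The optional adjustment is the standard finite population correction, sqrt((N - n) / (N - 1)).

```python
import math


def margin_of_error(p, n, std_errors=2.0, population=None):
    """Approximate margin of error for a sample proportion p (between 0 and 1).

    std_errors=2 corresponds roughly to the 95% confidence level.
    If population is given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return std_errors * standard_error


# Hypothetical survey: 62% agree, based on 1,000 responses.
moe = margin_of_error(0.62, 1000)
print(f"62% +/- {moe * 100:.1f}%")  # 62% +/- 3.1%

# The same result when the whole population is only 5,000 people (assumed).
moe_small = margin_of_error(0.62, 1000, population=5000)
print(f"62% +/- {moe_small * 100:.1f}%")
```

The correction narrows the interval, because a sample of 1,000 covers a sizeable share of a population of 5,000.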
- Confidence level
When we assess the significance of
any differences the survey appears to find, we do so at a given
confidence level, usually the 95% confidence level. This means that
we can be 95% confident that a difference which exceeds the sampling
error we calculate really is significant. The other 5% of the
seemingly significant findings will be due to exceptionally large
sampling errors.
The confidence levels usually offered are as follows:
| Confidence level | Sampling error based on |
|---|---|
| 68% | 1 standard error |
| 95% | 2 standard errors |
| 99% | 3 standard errors |
- Correlation
When two sets of data appear to vary in
the same way, they are said to be correlated. If you visit a school
and measure the height and weight of every pupil, those who are
taller will tend to be heavier too. There will be exceptions to
the rule, though, so although using the data collected you could
make a good guess at the weight of an unknown child based only on
knowing how tall they were, you would be caught out now and again
by a very fat short one, or a very thin tall one. In this case height
and weight would be positively correlated to a high degree.
We measure correlation using a correlation coefficient. A correlation
coefficient of 1 means that as one value increases so the other
will increase in a completely predictable way. A correlation coefficient
of -1 means that as one value increases so the other will decrease
in a completely predictable way. A correlation coefficient of 0
means that there is no connection between the two sets of data.
In the example above, height and weight might be correlated with
a coefficient of 0.7.
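A minimal sketch of one way to calculate a correlation coefficient. It assumes the Pearson coefficient, the most common choice; the glossary does not say which coefficient is meant, and the heights and weights are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented heights (cm) and weights (kg) for six pupils.
heights = [120, 125, 130, 135, 140, 145]
weights = [25, 24, 30, 29, 36, 35]
print(round(pearson(heights, weights), 2))
```

A perfectly predictable increase gives 1, a perfectly predictable decrease gives -1, and the invented pupils fall somewhere in between.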
D
- Demographics
A shorthand for the classifications
referred to above.
- Design factor
When we assess the significance of
any differences a survey appears to find, we compare the apparent
difference with the difference which might have arisen as a result
of the sampling process (sampling error).
Only when the difference is greater than might have arisen through
sampling error do we say the difference is significant.
The instrument itself introduces further
error, however, because different people interpret language differently,
so their understanding of an item we included
in the instrument may not be the meaning we intended. To provide
for this extra error, the sampling error may be increased by the
design factor. For example, if we decide that we should allow for
a further 20% margin of error then we multiply the sampling error
by a design factor of 1.2 and only regard as significant any difference
which exceeds the new, bigger range of error.
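The rule can be sketched as follows. The function name and the figures are hypothetical; only the factor of 1.2 comes from the example above.

```python
def is_significant(difference, sampling_error, design_factor=1.0):
    """True if the difference exceeds the sampling error widened by the design factor."""
    return abs(difference) > sampling_error * design_factor


# Hypothetical figures: a difference of 3.3 points, sampling error of 3 points.
print(is_significant(3.3, 3.0))                     # True
print(is_significant(3.3, 3.0, design_factor=1.2))  # False: the allowed error is now 3.6
```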
E
F
- Focus Group
A group of people brought together to
provide their input to a particular issue or problem. When
developing an instrument, we often
use focus groups drawn from the target population
to get their perspective on the issues to be measured. This ensures
we cover the relevant issues, and avoid producing an instrument
which asks everything except the one thing the target groups wish
to tell you.
G
- gsm
Grams per square metre - measure of the
weight of a paper. Standard copier paper is 75 - 80 gsm, meaning
that one square metre of the paper would weigh between 75 and 80
grams. More prestigious papers are usually about 100 gsm. At about
150 gsm, the material would begin to feel more like a card than paper.
I
- Index
A single figure representing a range of measures and comparing
one thing with another.
A stock market index is arrived at by calculating the value of
a given "basket" of shares and comparing the current value
with the value at an earlier benchmark date. The result is usually
presented as a current value compared with a base (the benchmark)
of 100. So if the market has been rising and the value of the basket
of shares has increased from £43,023 at the benchmark
date, say 1 April 1998 to £69,415 now the index would
be 69,415 divided by 43,023 and multiplied by 100 = 161. The index
should be quoted as 161 base 1 April 1998 and it tells us that the
particular basket of shares has increased in value by 61 percent
since the base date.
We sometimes use indices to summarise survey results and
compare performance in one area with another, or to compare with
an earlier measure, rather as a stock market index does.
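The stock market arithmetic above, as a short sketch (the function name is ours):

```python
def index_value(current, base_value, base=100):
    """Express the current value relative to the value at the base date."""
    return current / base_value * base


# Basket worth 43,023 at the base date and 69,415 now.
print(round(index_value(69415, 43023)))  # 161
```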
- Informant
One of the people completing and returning a questionnaire, or
otherwise providing information about their characteristics, attitudes
and opinions in a survey.
- Instrument
A survey questionnaire. The purpose of the survey is to measure
attitudes and opinions. A measuring tool (like a rule or a micrometer)
is known as an instrument, and so is the questionnaire which is
the tool for this kind of measuring.
- Item
Each separate question in an instrument
is called an item. It is called an item because it might not actually
be a question. Often it will be a statement such as "I like my job"
and informants will be asked to tick one
of a series of boxes to show how strongly they agree or disagree
with the statement. There are lots of other kinds of item which
might be used, many of which are also not questions. In practice,
the terms "item" and "question" tend to be used
interchangeably.
K
- Keying
Survey items may be positively or negatively
keyed. The distinction concerns the wording of the item and the
system adopted for converting responses on questionnaires into numerical
scores for analysis. A five point scale from Unacceptable to Excellent
might be represented by the numbers 1 to 5, so a respondent's tick
next to Unacceptable would be recorded as a score of 1, and an Excellent
response as a 5.
In this case, higher scores are a good thing, and we refer to the
item as positively keyed. If instead we had chosen to use 5 to represent
Unacceptable, and 1 to represent Excellent, lower scores would mean
more favourable responses, and the question would be a negatively
keyed one.
The distinction is equally relevant in a case where all responses
are expressed as a level of agreement say from Totally Disagree,
scored as 1, to Totally Agree, scored as 7.
If we then offer a positive statement such as "I like working here"
for the respondent to agree or disagree with, this is a positively
keyed item. But a statement such as "ABC Company staff are offhand
on the phone" would be a negatively keyed item, because a higher
level of agreement with it, represented by a higher score, would
be a bad result.
- Keystrokes
The number of keyboard key depressions needed to input the data
represented on one completed questionnaire. Most
items can be input with a single numerical keystroke. Multiple
choice items count as several keystrokes - as many as there are
options to choose from.
Any demographic
or classification items are
coded and count as many keystrokes as there are characters in their
codes. For example, a classification system with codes a, b, c ... z or
0, 1, 2 ... 9 would take 1 keystroke, and one with codes aa, ab, ac
... zz or 00, 01, 02 ... 99 would need 2 keystrokes.
L
M
- Management Services
Generic name for the application of a range of techniques for the
study of work and organisations with a view to bringing about improvement.
Defined in BS 3138: 1992 Glossary of terms used in management services
as
The provision of advisory and information services to assist
management in improving effective use of resources. This may embrace
the use of work study, O & M, operational research, data processing,
ergonomics, economic forecasting, and industrial engineering.
Usually practised by independent or internal consultants without executive
authority. Their conclusions are usually presented as recommendations
for line management to consider.
N
P
- Panel
A permanent representative sample maintained
by a market research agency from which information is obtained on
more than one occasion either for continuous research or for ad
hoc projects. (MRS Research Buyer's Guide)
- Percentage
An easy way to compare proportions by saying how many each represents
out of one hundred.
If we asked people in an office if they wanted a coffee machine
which made real coffee instead of instant and we found that 38 of
the 54 people in department A agreed, and 46 of the 63 in department
B, it is hard to know which department is more enthusiastic. But
if we say that those agreeing were 70.4% in department A and 73.0%
in department B we can easily see that department B is more in favour
than department A.
Working them out:
38 divided by 54, multiplied by 100 = 70.37037, which we round
off to 70.4 or 70
46 divided by 63, multiplied by 100 = 73.01587, which we round
off to 73.0 or 73
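The same working, as a small sketch (the function name is ours):

```python
def percentage(part, whole, places=1):
    """How many 'part' represents out of one hundred, rounded."""
    return round(part / whole * 100, places)


print(percentage(38, 54))  # 70.4  (department A)
print(percentage(46, 63))  # 73.0  (department B)
```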
- Percentile
A percentile (abbreviated to %ile) expresses the average
response to a scale item as if the scale had been from 0 to 100. So it
provides a satisfaction score or an agreement score, always out of 100.
It provides a way of converting
results measured using different scales (response
frames) to a common scale of 100 points. Even when different
scales have not been used, it can often be easier to understand
a result expressed as a percentile than a
raw score.
Imagine, for example, that we want to compare results from one survey,
or from two or more different surveys, and some results are on a scale
from 1-5 and others on a scale from 1-7. The same answer can mean
different things according to which scale applies. Say two questions
had the answer 3. On the first scale (1-5), this is exactly the
midpoint, but on the second (1-7) it is closer to the lowest possible
score (1) than to the highest (7).
To work out a percentile, the scale is divided into 100 so-called
percentile points. By working out how far along its possible scale
each average result lies, and expressing it as a percentage of the
way along, we can say at which percentile point the average lies,
and make the results comparable one with the other.
Some examples:
| Scale | Average Raw Score | Percentile |
|---|---|---|
| 1 - 5 | 3 | 50 |
| 1 - 7 | 3 | 33.33 |
| 1 - 5 | 2.3 | 32.5 |
| 1 - 7 | 2.3 | 21.67 |
| 1 - 7 | 4 | 50 |
| 0 - 1 | 0.45 | 45 |
| 1 - 5 | 1 | 0 |
To calculate a percentile from a raw score,
calculate
(Raw Score - Min) / Range * 100
where Raw Score is the average raw score; Min is the minimum of
the scale; Range is the maximum of the scale minus the minimum of
the scale.
Taking as an example the fourth line in the table above, a Raw
score of 2.3 on a scale from 1 to 7;
Raw Score = 2.3; Min = 1; Range = 7 - 1 = 6.
So %ile = (Raw Score - Min) / Range * 100
= (2.3 - 1 ) / 6 * 100
= 1.3 / 6 *100
= 21.67
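The formula can be sketched in a few lines; the function and parameter names are ours, and the scales and scores are taken from the examples in this entry.

```python
def to_percentile(raw_score, scale_min, scale_max):
    """(Raw Score - Min) / Range * 100, where Range = max - min of the scale."""
    return (raw_score - scale_min) / (scale_max - scale_min) * 100


for low, high, raw in [(1, 5, 3), (1, 7, 3), (1, 5, 2.3), (1, 7, 2.3)]:
    print(f"{low} - {high}: {raw} -> {to_percentile(raw, low, high):.2f}")
```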
- Population
Statistical term for the whole group about
whose characteristics or views we are trying to learn, when we study
only a sample chosen from within it.
R
- Random sample
A sample selected using a technique which ensures that every member of the population
has an equal chance of being selected. Choosing the first 1,000 names
from a telephone directory (sorted alphabetically) would produce a
sample but Mrs Aardvark and Mr Brown would have a better chance of being
included than Mr Zziwa, so it would not be a random sample. Problematic
non-random samples are most likely to arise from a sampling frame
which has been sorted on a relevant characteristic (say postcode, which
might correlate with household income) or has a pattern inherent in it
which might coincide with a sampling interval you might use to select a systematic sample.
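A minimal sketch of drawing a random sample with Python's standard library. The sampling frame of 20,000 numbered records is invented, and the seed is there only to make the illustration repeatable.

```python
import random


def simple_random_sample(sampling_frame, size, seed=None):
    """Pick 'size' records so that every record has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, size)


frame = [f"record-{i}" for i in range(1, 20001)]  # hypothetical sampling frame
sample = simple_random_sample(frame, 1000, seed=42)
print(len(sample), len(set(sample)))  # 1000 1000 (no record is chosen twice)
```

Unlike taking the first 1,000 names, the position of a record in the frame has no bearing on whether it is chosen.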
- Raw score
When we capture the data from an instrument,
we have to convert ticks in boxes to codes or numbers which the
computer can handle. If an item is a statement
with an agreement scale, there might be five boxes for the
informant to tick, labeled as shown below. We key the score
shown, according to the box ticked. This is known as a raw score,
because it hasn't yet been subjected to any processing.
| Box label | Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree |
|---|---|---|---|---|---|
| Score | 1 | 2 | 3 | 4 | 5 |
Having filtered out a subset, then for each
item in the survey we can add all the scores
we have recorded and divide by the number of them to arrive at an
average raw score for this item, within this subset.
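As an illustration, the keying and averaging might look like this. The four responses are invented; the scores follow the agreement scale described in this entry.

```python
SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def average_raw_score(ticked_boxes):
    """Convert each ticked box to its raw score and average the scores."""
    scores = [SCORES[label] for label in ticked_boxes]
    return sum(scores) / len(scores)


# Invented responses to one item from the members of one subset.
responses = ["Agree", "Strongly agree", "Disagree", "Agree"]
print(average_raw_score(responses))  # 3.75
```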
- Representative sample
A sample chosen so that it fairly represents the make-up of the population.
This means that the mix of relevant characteristics (Age, Gender,
Product used, Region etc.) is the same as in the population. If the
sample is a small one, it is very hard to choose a sample which
comprises matching percentages of informants taking account of many
different characteristics, say matching the percentage mix of the
population on Gender, Ethnic origin, Age groups, Income and Disability.
Even if it were possible to find a sample whose mix did mimic the
population on all these characteristics, the achieved response
might not. For this reason, we are usually working with samples which,
while reasonably representative, are not wholly so. If an estimate of the
population average view is required, this can be found by reweighting the results.
- Respondent
See informant
- Response frame
The mechanism through which informants
answer the item. It might be a range of tick-boxes
labeled to represent a scale; an agreement scale, say, or a scale
from Very dissatisfied to Very satisfied. In these
cases, the informant would be asked to tick one box. For a multiple
choice item it would be a series of options and the informant might
be asked to tick only one, or as many as apply. For a free text
comment, it is just an area in which the informant can write (or
on the web, type) their response.
- Response rate
The number of responses received, usually expressed as a
percentage of the total number of people invited to respond.
See Response rate enhancement.
- Responses
The questionnaires actually returned. See
Achieved response.
S
- Sampling
A technique by which we learn about the characteristics or views
of a whole group (population) by gathering
data about only some representative members of it. The result is
an estimate of the characteristics or views of the whole group.
The accuracy of the estimate depends on the size of the sample and
the popularity of the characteristic or view we are trying to estimate. See Sampling error.
See also Random sample; Representative sample; Sampling interval; Sequential sample; Stratified sample; Systematic sample.
- Sampling error
If the sample has produced the result 42% and we estimate the sampling
error (or confidence interval) at plus or minus 3% we might express the result as 42% ±3%.
This means that the population result would have been in the range
39% to 45%. Even this isn't quite specific enough, though, because
in an extreme case the population result might be outside even this
range. So we have to say how sure (how confident) we are that the
population result would have been in the range stated. There are
three commonly used confidence levels; roughly 68%, 95% and 99%
confident, corresponding to sampling errors of plus or minus one,
two and three standard errors respectively.
The most popular confidence level is 95% and this is the one our
reports use unless you ask us to do something different. This means
that when we say that a difference shown on a report is significant
there is only a one in twenty chance that it actually isn't (95%
= 19 out of 20).
Unfortunately, there are several ways these results can be expressed.
Taking the example already used, and assuming that we are 95% confident
of the result given, it might be expressed in any of the following
ways. They all represent exactly the same result:
- 42% ±3% at the 95% confidence level
- 42% ±1.5% at the 68% confidence level
- 42% ±4.5% at the 99% confidence level
- Sample mean: 42% Confidence level 95% Confidence interval ±3%
- 42% Limits of accuracy ±3% at the 95% confidence level
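The equivalence between the first three expressions can be sketched as follows, using the rule of thumb in this entry of one, two and three standard errors. The standard error of 1.5 percentage points is implied by the ±3% at 95% in the example; the function name is ours.

```python
# Standard errors per confidence level, as given in this glossary.
LEVELS = {"68%": 1, "95%": 2, "99%": 3}


def express(result, standard_error):
    """The same result stated at each of the three confidence levels."""
    return [
        f"{result:g}% ±{multiple * standard_error:g}% at the {level} confidence level"
        for level, multiple in LEVELS.items()
    ]


for line in express(42, 1.5):
    print(line)
```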
- Sampling Frame
A list comprising one record for each member of the population from which
a sample can be chosen.
- Sampling interval
The number (or average number) of steps through the sampling frame between records to be included in the sample. In a systematic sample, to take a 10% sample, you would select every 10th record, so the sampling interval is 10. For a truly random sample, the number of steps between selected records would be random numbers which average 10.
- Self-administered
A survey instrument designed for the
informant to complete unaided. The distinction
is between this and an instrument which is intended for completion
by a professional interviewer based on an interview with the informant.
- Sequential sample
A sampling technique used when you can’t predict the response rate. If you know you need an achieved response of 200 and you have a mailing list of 20,000 to use as the sampling frame,
how many will you mail? You might only get a 1% response rate, in which
case, you would need to mail all 20,000 but if the response rate was 2%
you would have spent twice as much as you needed to on the mailing.
The
trick is to send a small mailing first, to test the response rate, so
you mail 1,000 and count the responses you get. Now that you know what
response rate to expect, you can select a further sample big enough to
provide the achieved response you need.
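A sketch of the second step, assuming the test mailing of 1,000 brought in 15 responses (an invented figure) and that the rest of the list will respond at the same rate:

```python
import math


def further_mailing_size(target_responses, pilot_mailed, pilot_responses):
    """Extra records to mail, given the response rate seen in the test mailing."""
    if pilot_responses <= 0:
        raise ValueError("the test mailing produced no responses to estimate a rate from")
    still_needed = max(target_responses - pilot_responses, 0)
    # still_needed / rate, where rate = pilot_responses / pilot_mailed
    return math.ceil(still_needed * pilot_mailed / pilot_responses)


# Need 200 responses; 1,000 mailed so far, 15 replies (a 1.5% response rate).
print(further_mailing_size(200, 1000, 15))  # 12334
```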
- Significance
If we compare the results for the same question from two different
groups of informants, they might appear to show a difference between
the views of the two groups. Before drawing attention to it, and
proposing action based on it, we need to be sure that the difference
could not reasonably be explained simply as the result of
sampling error. If the difference is greater than the
sampling error we could reasonably expect, then we say the difference
is significant. Our standard
reports highlight significant differences at a given confidence
level between subsets or occasions of
running the survey.
Generally, the smaller the sample size,
the greater the sampling error.
- Stakeholder
A convenient jargon term which embraces an organisation's customers,
employees, shareholders, suppliers, neighbours etc; in fact anyone
who has any interest in what the organisation does. The term is
popular lately in government circles and in local government, where
"stakeholders" include Council tax payers; other residents; businesses
and their employees; users of services like leisure facilities and
libraries who may not be resident within the local authority area;
shoppers and mere passers through.
- Standard deviation
A statistical measure of the variation in a set of data. We often
use an average to summarise a number of data items, but an average
tells you nothing about the extent of the spread or "scatter"
of the individual values around it. That is the purpose of working
out the standard deviation.
These two lists of values both average 100 but their standard
deviations are very different.
| | First list | Second list |
|---|---|---|
| | 110 | 150 |
| | 98 | 90 |
| | 102 | 110 |
| | 90 | 50 |
| | 95 | 75 |
| | 105 | 125 |
| | 102 | 110 |
| | 99 | 95 |
| | 98 | 90 |
| | 100 | 100 |
| | 100 | 100 |
| | 101 | 105 |
| Average | 100 | 100 |
| Standard deviation | 4.7 | 23.6 |
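The figures can be checked with Python's statistics module. The results of 4.7 and 23.6 correspond to the population form of the standard deviation (dividing by the number of values), which is what pstdev calculates; the sample form (stdev, dividing by one fewer) would give slightly larger figures.

```python
import statistics

first = [110, 98, 102, 90, 95, 105, 102, 99, 98, 100, 100, 101]
second = [150, 90, 110, 50, 75, 125, 110, 95, 90, 100, 100, 105]

for values in (first, second):
    average = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    print(f"average {average:.1f}, standard deviation {spread:.1f}")
```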
- Standard error
A statistical measure of the extent to which the average of a
sample might differ from the
population average.
- Stratified sample
If you plan to break the survey results down into subsets, you need to ensure that the resulting subsets will provide sample sizes big enough to draw useful conclusions from. So you may need to stratify the sampling frame
by splitting it into the categories you will subsequently use to create
the subsets. Then you select a sample in each stratum big enough to
produce a useful sample in the achieved response. This probably means choosing a different percentage from each stratum, as below.
| | Clients | Achieved sample required | Predicted response rate (%) | Sample size | % sample |
|---|---|---|---|---|---|
| Product 1 | 2,545 | 40 | 50 | 80 | 3.1 |
| Product 2 | 1,866 | 40 | 50 | 80 | 4.3 |
| Product 3 | 752 | 40 | 50 | 80 | 10.6 |
| Product 4 | 120 | 40 | 50 | 80 | 66.7 |
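The table can be reproduced with a short sketch. The function name is ours, and capping the sample at the size of the stratum is our assumption for strata too small to supply the full number.

```python
import math


def stratum_sample(clients, achieved_required, response_rate_pct):
    """Sample size needed in one stratum, and that size as a % of the stratum."""
    size = math.ceil(achieved_required * 100 / response_rate_pct)
    size = min(size, clients)  # cannot sample more records than the stratum holds
    return size, round(size / clients * 100, 1)


strata = {"Product 1": 2545, "Product 2": 1866, "Product 3": 752, "Product 4": 120}
for name, clients in strata.items():
    size, pct = stratum_sample(clients, achieved_required=40, response_rate_pct=50)
    print(f"{name}: sample {size} = {pct}% of {clients} clients")
```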
- Subset
Any group of informants defined in terms
of their responses to questions in the instrument.
The responses to a survey may be summarised and reported as a whole,
but it is usually helpful to see separately the results obtained
from groups of informants who have some features in common.
A subset may include all female informants, say, or all clients
in the South of England. We can set rules to control whether respondents
are included in a subset via a class or a
range of classes in any of the classification systems by which respondents
have been classified, and / or by specifying responses to any question(s)
in the survey.
A subset definition may admit all members of a single class, (e.g.
a group which includes all females); or a range of classes (e.g.
those in departments c to e). Classification systems may be combined
so if your survey includes codes for department, job type, and length
of service we could create a subset which includes anyone who works
in departments coded a to c, in jobs coded d or f and who has length
of service coded c or higher.
We can also define a subset in terms of responses to the questions
in the body of the survey, so if there was a question about the
frequency of meetings with a five point scale for responses from
"never" through to "very frequent", we could create a subset comprising
people who said they had meetings never or only occasionally. This
would allow us to see how this group of people answer the other
questions in the survey. We can do the same sort of thing by comparing
one question with another, so if a survey asked people to rate various
sources of information we could create a subset of those who say
they get more information from the grapevine than from organised
meetings.
We can also construct weighted subsets
from a number of simple subsets, to produce results which estimate
the results we might have obtained from an overall response in which
the representation of classes was different from that which was
actually received. This is valuable when the distribution of responses
does not reflect the true mix of classes in the population whose
views the survey is intended to estimate.
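A sketch of the combined-classification subset described above. The respondents are invented, and we assume the single-letter codes are ordered alphabetically, so that "c or higher" means c, d, e and so on.

```python
# Invented respondents, each classified by department, job type and length of service.
respondents = [
    {"id": 1, "department": "a", "job": "d", "service": "c"},
    {"id": 2, "department": "b", "job": "e", "service": "d"},
    {"id": 3, "department": "c", "job": "f", "service": "e"},
    {"id": 4, "department": "d", "job": "d", "service": "c"},
    {"id": 5, "department": "a", "job": "f", "service": "b"},
]


def in_subset(person):
    """Departments a to c, jobs d or f, length of service c or higher."""
    return (
        "a" <= person["department"] <= "c"
        and person["job"] in {"d", "f"}
        and person["service"] >= "c"
    )


subset = [person["id"] for person in respondents if in_subset(person)]
print(subset)  # [1, 3]
```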
- Survey fatigue
The phenomenon whereby people get fed
up with filling in survey questionnaires. It becomes more acute
when surveys are repeated too often, or when they appear to be irrelevant,
or pointless. Surveys which ask informants what they want changed,
but after which no change occurs, will often lead through survey
fatigue to a lower response rate next
time the survey is run.
- Systematic sample
A sample selected by taking every Nth record in the sampling frame.
This is fine, provided that there is no pattern in the sampling frame
which would mean that you would be picking the same sort of informant every time.
Say every block of flats on an
estate has ten flats. There are three floors, each with three
one-bedroom flats, then a top floor with a three bedroom flat. You want a
10% sample, so you choose every tenth property. You will get either all
one bedroom, or all three bedroom properties and the sample will not be
representative. In this case, you need to use a truly random sampling technique instead.
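The problem is easy to demonstrate. The estate of 50 identical blocks is invented; the layout of each block follows the example above.

```python
def systematic_sample(frame, interval, start=0):
    """Take every 'interval'-th record, beginning at position 'start'."""
    return frame[start::interval]


# One block: three floors of three one-bedroom flats, then one three-bedroom flat.
block = ["1-bed"] * 9 + ["3-bed"]
estate = block * 50  # hypothetical estate of 50 identical blocks

for start in (0, 9):
    sample = systematic_sample(estate, 10, start)
    print(start, len(sample), set(sample))
```

Whatever the starting point, all 50 properties selected are of the same type, because the sampling interval coincides with the pattern in the frame.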
T
- Time used
A method of fixing consultancy fees. We perform whatever parts
of the project you have instructed us to do, we keep records of
the time we devote to your work, and bill you for the time spent.
Our daily rate for consulting work is currently £800 per
day plus expenses and VAT. For part days, we bill at £100
per hour plus expenses and VAT. Our minimum billing period is 5
minutes, so we don't charge for an hour if the job takes only ten
minutes.
- Topic
Items (questions) may be grouped into topics, either for reporting
purposes or to allow topic averages to be calculated. Topics
are often called clusters.
- Topic average
Topic (or cluster) averages may be just the arithmetic
mean of the results for the items which make up the topic. If
we are calculating topic averages for you, the items in the topic
must all use the same scale for responses but positively and negatively
keyed questions can be combined to produce
a measure of how favourably informants have responded to the topic.
They can be weighted if you wish, so that some
items are given more weight than others. Each item can be included
in as many topics as you wish, and may have a different weighting
assigned for use in each topic in which it features.
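A sketch of how such a topic average might be calculated. The glossary does not say how negatively keyed items are combined with positively keyed ones; we assume here that they are reverse-scored (a 2 on a 1 to 5 scale becomes a 4) so that higher always means more favourable. The item scores and weights are invented.

```python
def topic_average(items, scale_min=1, scale_max=5):
    """Weighted topic average.

    items: list of (average_raw_score, weight, positively_keyed) tuples,
    all measured on the same scale.
    """
    total = 0.0
    weight_sum = 0.0
    for score, weight, positively_keyed in items:
        if not positively_keyed:
            # Assumption: reverse-score negatively keyed items.
            score = scale_min + scale_max - score
        total += score * weight
        weight_sum += weight
    return total / weight_sum


# Invented topic: two positively keyed items and one negatively keyed item.
items = [(4.0, 1, True), (2.0, 1, False), (3.5, 2, True)]
print(topic_average(items))  # 3.75
```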
- Transfer of learning
Transfer of learning has occurred when knowledge and skills learned
show themselves in the behaviour of the learner. It is the difference
between knowing how a situation should be handled and actually doing
it that way when it arises.
Many drivers would be able to tell you
that the right way to deal with a rear wheel slide is to steer into
it. Not so many would actually do the right thing when the skid
happened. They are the ones for whom transfer of learning has occurred,
usually as a result of having had the motivation and the opportunity
to practise.
V
- Validity
A measure of the extent to which an instrument truly measures what
it claims to measure. For example, if we are trying to construct
a measure of customer loyalty, we might include an item which says
Next time you need a widget, will you choose an ABC widget?
The item is said to have face validity if, as in this case, it appears
on the face of it that it would measure customer loyalty. (It is
actually a measure of repurchase intention, which is one aspect
of customer loyalty.)
If we offer a scale of responses from certainly not to
certainly and administer the instrument to several different
groups of people, we will get a good measure of the relative loyalty
of the different groups. All we have measured so far, though, is
what people say they will do. If, as part of the instrument development
process, we can administer the instrument to a group of people whose
widget purchasing we can then monitor, so we know who bought a widget,
and whether the one they bought was indeed an ABC widget or some
other manufacturer's, we call this data the criterion. It is the
standard by which we are testing our instrument in the way that
an instrument for measuring distance might be checked against a
known official measure. We can then calculate the
correlation between the responses to the question in the instrument
and the criterion - people's real loyalty as demonstrated by their
buying behaviour. We may be able to show a link between the results
from the item in the instrument and people's future behaviour. The
strength of this link is a measure of the predictive validity of
the item in our instrument.
This is a costly and difficult process to go through and it is
often impossible or impractical to obtain criterion data. For this
reason, many employee and customer satisfaction measures depend
on face validity alone.
- VAT
Value Added Tax. The European Union sales tax. In the U.K. the
VAT rate is currently 20% of the cost of most goods or services,
including ours.
- Verify
Key data a second time, comparing the
first and second versions to see that they agree. Provides greater
confidence in the accuracy of the data to be analysed.
W
- Weightings
When we average several items to arrive at a measure of an overall concept, i.e. a topic
or cluster, sometimes it isn't appropriate to give every item in the
list equal importance. Or if we are trying to estimate the views of a population but the demographic
mix in the responses we have received doesn't reflect the equivalent
mix in the population, we need an average which gives the different
classes weights which reflect their representation among the population
rather than how many responded to our survey.
In either case, the solution is
to use weighted averages. In this example, we have decided that item 2
is twice as important as items 1 and 3, so its weighting is twice the
weighting assigned to the others, and its value influences the result
more than they do. A simple average of the three values is 35. The
weighted average, the total of the Weight x Value column divided by the
sum of the weights, is 180/4 = 45.
| Item | Value | Weight | Weight x Value |
|---|---|---|---|
| 1 | 20 | 1 | 20 |
| 2 | 75 | 2 | 150 |
| 3 | 10 | 1 | 10 |
| Totals | 105 | 4 | 180 |
| Averages | 35 | | 45 |
We can calculate weighted topic
average results for a topic, and we can calculate
weighted subset results.
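The weighted average calculation, sketched with the three items from the example (the function name is ours):

```python
def weighted_average(values, weights):
    """Total of weight x value, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [20, 75, 10]
weights = [1, 2, 1]  # item 2 counts twice as much as items 1 and 3

print(sum(values) / len(values))          # 35.0 (simple average)
print(weighted_average(values, weights))  # 45.0 (weighted average)
```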