Technical Supplement
Figure B1: Evaluation Designs
| Groups | Period of Measurement: Time Series | Period of Measurement: Pre & Post Policy | Period of Measurement: Post Policy Only |
|---|---|---|---|
| True control group | Design 1: Comparison of policy affected group and a randomly selected control group. Able to track policy impact over time. Able to check that the two groups initially have similar characteristics. | Design 2: Comparison of policy affected group and a randomly selected control group. Able to check that the two groups initially have similar characteristics. | Design 3: Comparison of policy affected group and a randomly selected control group. |
| Non-equivalent control group | Design 4: Comparison of policy affected group and a control group that is as similar as possible. Able to track policy impact over time. Able to check that the two groups initially have similar characteristics. | Design 5: Comparison of policy affected group and a control group that is as similar as possible. Able to check that the two groups initially have similar characteristics. | Design 6: Comparison of policy affected group and a control group that is as similar as possible. |
| Single group | Design 7: Comparison of extrapolated 'policy off' trend with actual 'policy on' trend. Able to track policy impact over time. | Design 8: 'Before and after' comparison of the policy affected group. | Design 9: Comparison of what would have happened to the policy affected group in the absence of the policy with what actually happened to the group. |
Description of Designs - True Control Group Designs |
|
| B3. | Two truly random groups are selected from the client population. Only one of these actually receives the programme, the other - the control group - does not. Comparisons of the selected output measure(s) can then be made between the two groups in order to determine if the policy or programme has made any significant difference. A comparison of the groups at the pre-policy period either as a time series (design 1) or at single points in time (design 2) can act as a check that the two groups initially do have identical characteristics. However, if there are statistically significant differences in the pre-programme period then it may be necessary to have, for example, a stratified sample based on the characteristics of the programme affected group. If pre-programme implementation measures for the two groups are not available, then the groups will have to be compared in the post-programme implementation period only (design 3). |
Non-equivalent control group designs |
|
| B4. | As with the true control group design, comparisons can be made between the two groups in the post-implementation period only (design 6). However, the designs will be stronger if there are pre-programme measures either as a time series (design 4) or at single points in time (design 5) so as to ensure that the two groups are initially alike in terms of the characteristic being measured. |
| B5. | In the situation where a time series is available for the non-equivalent group, it may be possible to use trend extrapolation (see paragraph B7) of the group. Here the trend of a non-equivalent control group is applied to the baseline position of the policy affected group to determine the base case situation. This design could be used in the absence of pre-programme time-series data for the policy affected group but selection of an appropriate comparison group is crucial. An example of this is where the number of small businesses in NI was not monitored prior to the programme's implementation, then the trend of small business growth in an area in Britain (the non-equivalent control group) could be applied to the baseline to determine what would have happened in the programme's absence. The use of this design assumes that the non-equivalent control group has not been affected by GB policies. |
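The transfer of a comparison group's trend to the baseline of the policy affected group can be sketched as follows. This is an illustration only: the comparison-area counts and the baseline figure are hypothetical, and the function name `base_case_from_comparison_trend` is an assumption introduced here, not part of any published method.

```python
def base_case_from_comparison_trend(baseline, comparison_series):
    """Apply the proportional growth of a comparison group to a baseline.

    baseline: indicator value for the policy affected group at the point
        the programme began.
    comparison_series: indicator values for the non-equivalent control
        group, starting at the same point in time.
    """
    start = comparison_series[0]
    if start == 0:
        raise ValueError("comparison series must start from a non-zero value")
    # Express each comparison value relative to its starting value, then
    # apply that growth index to the policy group's baseline.
    return [baseline * value / start for value in comparison_series]


# Hypothetical figures: comparison-area counts over four periods.
comparison_area = [2000, 2040, 2080, 2100]
base_case = base_case_from_comparison_trend(1000, comparison_area)
print(base_case)
```

The sketch scales by proportional growth rather than absolute change, which is one possible choice; it only makes sense if the comparison group is believed to follow the same underlying trend as the policy affected group.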
Single Group Designs |
|
| B6. | Single group designs involve the policy affected group only. Where time series information on the group is available (design 7) then it is possible to incorporate trend extrapolation into the designs. It is essential in this design to have pre and post policy time series data for the policy affected group. The pre-programme time series is projected forward into the operation period to act as a base case. The base case is then compared with the actual trend to estimate the impact of the programme. |
| B7. | Trend extrapolation designs involve trying to predict an alternative outcome by projecting the patterns (trends) identified before the programme began into a period when the programme is in operation. This involves the assumption that these patterns are an adequate representation of what would have happened in the absence of the policy. It should be recognised that events such as external shocks can make the past a poor predictor of what might have happened and so complicate constructing a base case. |
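A minimal sketch of trend extrapolation for design 7 is given below, assuming a straight-line trend fitted by ordinary least squares. The choice of a linear trend, the function names and all data values are assumptions made for illustration; a real series may call for a different functional form.

```python
def fit_linear_trend(values):
    """Least squares fit of values against the time index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pre-programme observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def project_base_case(pre_values, periods_ahead):
    """Project the fitted pre-programme trend into the policy-on period."""
    slope, intercept = fit_linear_trend(pre_values)
    n = len(pre_values)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Hypothetical indicator values: five pre-programme periods, three
# periods with the programme in operation.
pre_programme = [100, 104, 108, 112, 116]
actual_post = [125, 131, 138]

base_case = project_base_case(pre_programme, len(actual_post))
# Estimated impact per period: actual outcome minus projected base case.
impact = [actual - base for actual, base in zip(actual_post, base_case)]
print(base_case, impact)
```

The estimate is only as good as the assumption that the pre-programme pattern would have continued; an external shock in the policy-on period would be wrongly attributed to the programme.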
| B8. | The pre and post policy single group design (design 8) is a simple comparison of measures taken 'before and after' policy implementation. The assumption in this case is that the pre-policy position would have continued in the absence of the policy. The pre-policy situation therefore acts as the base case. However, as with the trend extrapolation approach, the 'before' or pre-policy position may not be a suitable projection of what would have happened in the absence of the policy. |
| B9. | With the single group post-policy only design (design 9), information is necessary on what would have happened without the policy for the design to be feasible. Usually this type of information can be obtained through a survey of the policy affected group and it may even be possible to derive retrospective data on the pre-policy position and so have a 'before and after' or more appropriately, a 'with and without policy' comparison. |
| 'What if' design | |
| B10. | A further type of design, 'what if', is also possible, but it is a design of last resort for cases where there is little or no information for the pre and post phases of the programme to enable proper comparisons to be made. In this case the key question is 'what would happen if the programme were terminated now?' The design then takes the form of comparing the consequences of stopping the programme with those of allowing it to continue. The evaluation then becomes like an appraisal (a look forward) rather than an evaluation (a look back). |
Selecting and Using designs |
|
| B11. | The derivation of the base case is crucial to the evaluation, as it is the comparison of the base case with the actual outcome which will determine whether the policy is achieving value for money. The evaluator should therefore take time to decide which design is best suited to the evaluation's requirements. The choice of base case design will often hinge on data availability and whether or not the programme is already in operation. |
| B12. | Ideally an evaluation should be planned and designed before the programme has been implemented. However, in practice the decision to evaluate may only be taken once the programme has been in existence for some time. In this case an evaluation plan will not have been incorporated into the programme's operation and use of certain evaluation designs will not be possible. |
| B13. | The ideal situation is for the evaluation to include a true control group and a policy affected group with measures taken for both groups before and after the policy has been implemented. However, much depends on identifying a suitable comparison group together with a reasonable time series of indicators associated with that group. The pre programme comparison of the two groups serves as a check on whether the groups are at least comparable with respect to whatever the programme intends to achieve or change. True control groups are the strongest type of evaluation design. Comparisons in the pre-programme period should show no significant differences between the two groups if they have been truly randomly selected. |
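One way to check whether two groups differ significantly on a pre-programme characteristic is a two-proportion z-test, sketched below. The test choice, the function name and the figures are assumptions for illustration; the normal approximation it relies on is rough when groups are small, and other tests may be more suitable.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical pre-programme check: share of each group holding a
# qualification (120 of 500 in the control group, 13 of 50 in the
# policy affected group).
z, p_value = two_proportion_z_test(120, 500, 13, 50)
print(round(z, 3), round(p_value, 3))
```

A large p-value here is consistent with the groups being comparable on that characteristic, although it does not prove that they are alike in every respect.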
| B14. | Comparison of the policy affected group and a non-equivalent control group in a single period (i.e. post programme) is less reliable, as the two groups may not have had identical characteristics in the period immediately prior to the programme's implementation. If, in practice, it proves problematic to find a suitable comparison group, then reliance has to be placed on the single policy affected group and on constructing a base case using trend projection and/or surveys. However, evaluations in which only the policy affected group is measured make interpretation of the results more difficult. Moreover, with respect to trend extrapolation, it should be recognised that, for example, external shocks can complicate the construction of a counterfactual position in the 'policy on' period. |
| Quantitative and Qualitative research methods | |
| B15. | The plan for the evaluation will consider the methods to employ and the measures to use to obtain the information needed for the analysis. Research methods are often divided into two broad categories, quantitative and qualitative. |
| B16. | A quantitative approach emphasises the measurement of outcomes and attributes causal effects by means of comparison. Where possible, an attempt should be made to quantify the outputs of a programme. Quantitative methods may involve measuring the levels of inputs and outputs using existing monitoring records or the collection of information by means of surveys or standardised tests. Comparisons of quantitative indicators generated from surveys, tests or monitoring data are frequently a feature of experimental designs (i.e. with control groups). Surveys may take the form of questionnaires, interviews and observation (questionnaires are the most commonly used). The issues to be considered in survey research include the size of the sample, method of sampling, the acceptable level of sampling error and whether the survey should be one-off or repeated over time. |
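The link between sample size and sampling error for a survey proportion can be sketched with the standard normal-approximation formulas. The sketch assumes simple random sampling from a large population and a 95 per cent confidence level (z = 1.96); these assumptions, and the function names, are illustrative and not taken from this supplement.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for an estimated proportion p from n responses."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest sample size giving at most the stated margin of error.

    p = 0.5 is the most cautious assumption, as it maximises p * (1 - p).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(required_sample_size(0.05))
print(round(margin_of_error(0.5, 400), 3))
```

Stratified or clustered sampling, or a small population, would change these figures, so the result is a starting point rather than a prescription.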
| B17. | Quantitative methods can provide reliable measurements and comparisons which can be summarised easily and accepted as representative of the population as a whole. They can also be straightforward to repeat, for example by re-running a survey or test, but have limitations in that they tend not to be able to study respondents in depth nor be adaptable to individual circumstances. |
| B18. | Qualitative approaches emphasise the description and understanding of a programme's operation and effects and are useful for exploring concepts, attitudes and behaviour. They are concerned more with the nature of the programme than with providing quantification. The main research methods used for qualitative work are unstructured in-depth interviews (characterised by open-ended questions), group discussions with operators, participants and decision makers, focus groups, participant observation and case studies. Qualitative work may use direct quotation, careful description and open-ended narrative. This can make analysis difficult, as responses are neither systematic nor standardised. However, insights may be gained through in-depth interviews which might not be revealed in a structured questionnaire. A limitation of the qualitative approach is that sample sizes tend to be small, which can prevent wider generalisation from the research. |
| B19. | Whilst qualitative methods permit the evaluation to explore selected issues in detail, quantitative methods fit diverse experiences into predetermined response categories. An advantage of the quantitative approach is that it measures the reactions of a great many people to a limited set of questions, thus facilitating comparisons and statistical aggregation of data. However, while it is tempting to place more importance on the perceived objectivity of statistics, the figures may have limitations. Qualitative methods, on the other hand, can indicate the complexities of the change process and help in understanding how programmes work and how those involved (target groups and providers) view their success and failure. However, care is needed because their selective nature may distort the findings. |
| B20. | The type of evaluation design chosen and data availability will influence whether the quantitative or qualitative approach is more appropriate, although most evaluations would benefit from a combination of the two. Conclusions which are supported by a range of methods and data sources should be the most reliable. Quantitative and qualitative approaches should therefore be seen as complementary rather than as alternatives, and can be used together in addressing different questions within each evaluation. The choice of methods and designs must, however, be made in the context of the questions that need to be answered and the timescale and resources for the evaluation. The selection of the most appropriate research methods can therefore be a difficult task, but the starting point is a recognition that there are options. |
Analysing Net Additionality |
|
| B21. | The aim of the analysis is to assess the programme's effectiveness and to give an indication of what the programme is buying. To achieve this a comparison should be made between the base case and the actual outcome using one of the designs outlined earlier. The difference between the two cases is a measure of the net additionality of the programme. |
| B22. | Different types of net additionality can be measured:
|
| B23. | The comparison between the base case and the programme's actual outcome, in terms of the output indicator(s) chosen, encapsulates activity that would have occurred in the programme's absence (deadweight) and also activity which has been displaced by the programme's existence (displacement). It also takes account of supplier multiplier and local multiplier effects. These impacts are described in more detail below. |
| Deadweight | |
| B24. | Deadweight is activity that would have occurred regardless of the policy. Deadweight is a difficult concept to measure as the beneficiaries of schemes may be reluctant to admit they would have produced the same outputs without the schemes. Given the difficulty of targeting expenditure, deadweight of 50 per cent or more may often be found. Attempts are often made to improve the targeting of programmes and this should reduce deadweight over time. |
| Displacement | |
| B25. | Displacement of activity within a local area can occur:
|
| B26. | Displacement varies with the programme supported and with the size of the area covered. For some local services e.g. food retailing, hairdressing, vehicle repairs, displacement within a travel-to-work area (TTWA) may be close to 100 per cent. Departments who find that their spatially targeted schemes support such activities need to consider displacement effects carefully. |
| B27. | In order to improve comparability, evaluation studies should provide information on displacement on a local, regional and national (UK) basis, not just the areas of particular interest in terms of the programme concerned. If the policy measure being evaluated applies to more than one area, credit should not be claimed for activities transferred between the areas. |
| Supplier Multiplier Effects | |
| B28. | The supplier multiplier effect results from one industry or sector making purchases from other sectors in the local economy and so boosting employment in these sectors. This process of sectoral interaction continues until the amount of money being re-spent during each round of activity becomes negligible. Evaluation estimates of supplier multipliers, in terms of effects on employment in local labour markets (TTWA or equivalent) have ranged from 1.05 (Enterprise Zones) to 1.11 (Regional Enterprise Grants). Estimates above that range should be supported by robust analysis and empirical evidence. |
| Income Multiplier Effects | |
| B29. | Additional local activity is likely to raise local income and this will generate additional expenditure in the area, adding to local employment. The wider an area is defined (provided it remains small relative to the total national economy), the higher will be the income multiplier. For small areas it is important to estimate how many of the additional employees are resident within the policy area, as those who come in from outside (e.g. for construction work) may spend little of the additional income in the area covered by the policy or programme. For most activities local income multiplier effects are fairly small: estimates are generally around 1.1. Estimates significantly above this will need strong analytical support. Regional multipliers, where relevant, may be larger: estimates have ranged from 1.2 to 1.5. If such an estimate is proposed, supporting analysis will be required for the particular region or wider area. |
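The four adjustments described above can be combined into a single net figure. The sketch below uses a multiplicative chain, which is one common convention and is not prescribed by this supplement; the function name, the gross jobs figure and the displacement rate are hypothetical, while the deadweight and multiplier values echo the magnitudes quoted in paragraphs B24, B28 and B29.

```python
def net_additional(gross, deadweight, displacement,
                   supplier_multiplier=1.0, income_multiplier=1.0):
    """Adjust a gross output for deadweight, displacement and multipliers.

    deadweight and displacement are shares between 0 and 1; the
    multipliers are ratios of 1.0 or more. Multiplying the adjustments
    together is an assumption of this sketch.
    """
    for name, rate in (("deadweight", deadweight),
                       ("displacement", displacement)):
        if not 0 <= rate <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    # Remove activity that would have happened anyway, then activity
    # that was merely displaced from elsewhere in the area.
    direct = gross * (1 - deadweight) * (1 - displacement)
    # Add the knock-on effects of supplier purchases and local spending.
    return direct * supplier_multiplier * income_multiplier


# Hypothetical programme: 200 gross jobs, 50% deadweight, 25% displacement.
jobs = net_additional(200, 0.5, 0.25,
                      supplier_multiplier=1.05, income_multiplier=1.1)
print(round(jobs, 1))
```

Where the separate components are estimated from surveys, each carries its own uncertainty, so the combined figure should be read as indicative.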
Use of Control Groups to measure net additionality |
|
| B30. | When using control groups (true or non-equivalent) to measure net additionality the output indicator is measured for both groups, preferably before the policy is implemented and during and after the policy implementation period. |
| Example: | |
| B31. | In February 1990, a one year training programme was introduced aimed at reducing the number of young long term unemployed in North Belfast. Prior to the programme's implementation, there were 550 long term unemployed under the age of 21 registered in the area. Of 50 trainees randomly selected to receive the programme 10 became employed on completion of the training programme. To determine the extent to which the programme contributed to the trainees finding employment, the evaluator must examine the performance of the 500 who did not receive the programme (the control group). It was found that 10% of this group found employment over the period of the programme. |
| B32. | The effect of the programme, in terms of the output indicator chosen i.e. employment, is the difference between the control group and the policy affected group at the time of the evaluation. Assuming that the performance of the trainees would have been the same as that of the control group, then 10% i.e. 5 of the trainees would have obtained employment regardless of the programme (the base case). Comparing the base case with the actual outcome it can be concluded that the programme resulted in an additional 5 of the target group gaining employment. |
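The arithmetic of the North Belfast example can be reproduced in a few lines. The figures are those given in paragraphs B31 and B32; the function name and the dictionary layout are assumptions made for illustration.

```python
def control_group_estimate(control_size, control_successes,
                           policy_size, policy_successes):
    """Estimate net additionality from a control group comparison."""
    control_rate = control_successes / control_size
    # Base case: what the policy group would have achieved at the
    # control group's success rate.
    base_case = control_rate * policy_size
    return {
        "control_rate": control_rate,
        "policy_rate": policy_successes / policy_size,
        "base_case": base_case,
        "net_additional": policy_successes - base_case,
    }


# 500 in the control group, of whom 10% (50) found work;
# 50 trainees, of whom 10 found work.
result = control_group_estimate(500, 50, 50, 10)
print(result)
```

The calculation rests on the assumption stated in B32, that the trainees would otherwise have performed like the control group.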
Figure B2: Control Group Analysis
| Time period | Output Indicator | Control Group | Policy Group |
|---|---|---|---|
| t0 | Long term unemployed under 21 | 500 | 50 |
| t1 | Long term unemployed under 21 finding work | 50 | 10 |
| | % finding work | 10% | 20% |

t0: Time period prior to policy/programme implementation
t1: Time period at time of evaluation
| B33. | The pre-policy or programme measures act as a check on the results and are essential when using a non-equivalent control group in order to ensure that the control group and policy affected group are initially comparable. |
| B34. | Using this approach will not allow separate identification of deadweight, displacement or the multiplier effects for the indicator chosen. Moreover, it should be recognised that the programme may impact on other factors not covered by the indicator(s) chosen, for example, it may impose external costs or benefits on third parties. |
Use of Trend Extrapolation to Measure Net Additionality |
|
| B35. | When a single group design is being used to measure net additionality, trend extrapolation may be possible if a time series of the output indicator is available. The effect of the programme is estimated, as described earlier, by extrapolating the policy off period trend of the indicator onto the policy on period and comparing this projected trend with the actual outcome. Again the difference encapsulates the elements of deadweight, displacement and multiplier effects. |
| B36. | A variation of the trend extrapolation is to base the projection of the policy-on period on a non-equivalent group. This method would overcome the problem of not having a complete time series in the policy-off period but is a weaker approach. |
| Example: | |
| B37. | In December 1985 a programme was introduced aimed at increasing the number of small businesses in NI. In 1995 it is decided to evaluate the programme. The number of small businesses immediately prior to the programme's implementation (the baseline) is known to have been approximately 1,280. The number of small businesses in 1995 is approximately 1,560, an increase of 280. The evaluator has to determine how much of this increase was due to the programme, i.e. how much was net additional. If, prior to the programme's implementation, the number of small businesses rose on average by 1% pa, then projecting the trend would result in approximately 1,420 small businesses in 1995 (the base case). Comparing the base case with the actual outcome, it can be concluded that over the operation period the programme has resulted in approximately 140 additional small businesses. |
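The projection in this example can be checked with a short calculation. The sketch assumes that the 1% annual growth compounds and that ten years elapse between the baseline and 1995; both are assumptions, as the text does not say how the trend was projected. On that basis the base case comes to roughly 1,414, which the example rounds to approximately 1,420, so the net additional figure differs slightly from the rounded 140 quoted.

```python
def project_compound(baseline, annual_rate, years):
    """Project a baseline forward at a constant compound annual growth rate."""
    return baseline * (1 + annual_rate) ** years


baseline = 1280      # small businesses immediately before the programme
actual_1995 = 1560   # small businesses at the time of evaluation
years = 10           # assumed: December 1985 to 1995

base_case = project_compound(baseline, 0.01, years)
net_additional = actual_1995 - base_case
print(round(base_case), round(net_additional))
```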
Figure B3: Single group trend extrapolation |
|
| B38. | As with the control group analysis, this approach will not allow separate identification of deadweight, displacement or the multiplier effects for the indicator chosen. Moreover, it should be recognised that the programme may impact on other factors not covered by the indicator(s) chosen, for example, small business growth might be obtained at the expense of a reduction in larger businesses. This will not be picked up by the estimate of the base case but it should be considered in the analysis. |
| Identifying deadweight, displacement and multiplier effects | |
| B39. | The use of surveys, or the use of results from previous relevant studies, can provide information about deadweight, displacement and multiplier effects. Surveying those in receipt of the programme (e.g. UDG recipients), interviewing local businesses not in receipt of the programme and interviewing the wider community may indicate the extent of deadweight, displacement and income and supplier multiplier effects. Qualitative aspects can also be gauged using surveys. Again, surveying true control groups will generate more robust results than surveying the policy affected group only. |
| Ex-Post Cost-Benefit Analysis | |
| B40. | So far the analyses have been couched in terms of one indicator (usually the key output indicator for the programme concerned). In reality there are likely to be a number of key indicators, and a useful way to present these is within an ex-post cost-benefit framework. With Cost-Benefit Analysis (CBA), all relevant costs and benefits over time associated with the programme can be compared with the base case (see Figure B4). This is particularly useful if the programme being evaluated has multiple objectives. Where quantification is difficult, items can be listed and considered within the CBA framework alongside the quantified items. The framework also allows for the analysis of efficiency and effectiveness measures. |
| B41. | The framework for an ex-post CBA will normally follow a sequence similar to that outlined for evaluation:
|
| B42. | Costs and benefits covered by an evaluation will often include:
|
| B43. | In wider evaluation, any significant costs and benefits which have affected other parts of the public sector or the private sector should be included and separately identified. Expenditure may have led to both gainers and losers, and information on how the costs and benefits were distributed among different organisations, sectors of the economy, or individuals can be an important part of the evaluation. |
| B44. | Where costs and benefits can be valued, the basis for valuation should be their economic cost, i.e. their 'opportunity cost', which is the value of the resource in the most valuable alternative use. This is usually given, near enough, by market values. |
| B45. | Economic costs do not necessarily involve spending or receiving cash. For example, an organisation may already own an asset which, if not employed in the policy or project, could have been used for other purposes or sold. Use of this asset therefore has an opportunity cost. |
| B46. | Any important costs and benefits which cannot be valued in monetary terms should at least be recorded and whenever possible quantified. The money values of costs and benefits should normally be expressed in 'real terms' at the general price level applying when the evaluation is carried out. In the absence of relative price changes, general inflation simply raises all cash values by a given percentage and thus it is convenient to express all costs and benefits at the same general price level. |
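Expressing cash values in real terms amounts to rescaling each year's figure by the ratio of the general price index in the evaluation year to the index in the year the cash was spent or received. The sketch below illustrates this; the index values, the cost figures and the function name are hypothetical.

```python
def to_real_terms(cash_values, price_index, base_year):
    """Convert cash values by year to the price level of base_year."""
    base = price_index[base_year]
    return {
        year: value * base / price_index[year]
        for year, value in cash_values.items()
    }


# Hypothetical general price index (evaluation year 1995 = 100) and
# hypothetical programme costs in cash terms.
price_index = {1993: 95.0, 1994: 97.5, 1995: 100.0}
cash_costs = {1993: 190.0, 1994: 195.0, 1995: 200.0}

real_costs = to_real_terms(cash_costs, price_index, base_year=1995)
print(real_costs)
```

In this made-up case the cash costs rise each year but are constant in real terms, which is the pattern general inflation alone would produce.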
Figure B4: EX-POST CBA
Example: Enhanced Skills Training Programme
| | Base case* | Programme outturn | Comparison |
|---|---|---|---|
| | (What would have happened without the programme) | (Actual outcome of the programme) | (Difference between base case and outturn) |
| Direct costs | Capital (£) | Capital (£) | Extra capital costs |
| | Current (£) | Current (£) | Extra running costs |
| Direct benefits/outputs | | | |
| Intermediate output | No. persons trained | No. persons trained | Extra people trained |
| Final output | | No. trainees finding work | Additional no. of trainees who find work as a result of enhanced training (this is net of deadweight, i.e. those who would have found work regardless of training) |
| Final output in money terms | Value of jobs (£) | Value of jobs (£) | Value of additional jobs (£) |
| Indirect effects (+/- spin-offs) | | | |
| Displacement | Displacement of jobs as skilled workers enter the labour force (-) | Displacement of jobs as extra skilled workers enter the labour force (-) | Displacement (-) |
| Supplier multiplier | Supplier multiplier (+) | Supplier multiplier (+) | Net supplier multiplier (+) |
| Local income multiplier | Local income multiplier (+) | Local income multiplier (+) | Net local income multiplier (+) |
| Net Total (£) | Net value of base case output | Net value of programme output | Net value of additional output |
| Outputs which cannot be valued in money terms | List significant items; possibly compile an impact statement of the effect of each item**. It may be feasible to use weighting and scoring to combine a number of outputs into a single overall measure even though they cannot be valued in money terms. | | |
*Assume programme runs at former level of provision.
** See HMT, Appraisal and Evaluation in Central Government, "The Green Book".

