OFWAT COST ASSESSMENT – ADVANCED ECONOMETRIC MODELS
20 March 2014
FINAL REPORT
Submitted by: Cambridge Economic Policy Associates Ltd.

CONTENTS

Glossary
Executive summary
1. Introduction
  1.1. Objective
  1.2. Changes since the January 2013 ‘CEPA Cost Assessment Report’
  1.3. Process
  1.4. Structure of the report
2. Approach to modelling
  2.1. Explanatory variables
  2.2. Economies of scale (Cobb-Douglas versus translog)
  2.3. Estimation methods and efficiency specifications
  2.4. Panel length
  2.5. Smoothed versus unsmoothed capex
3. Model selection criteria
  3.1. Theoretical correctness
  3.2. Statistical performance
  3.3. Robustness testing
  3.4. Practical implementation issues
  3.5. Regulatory best practice
  3.6. Results coding
4. Model selection
  4.1. Introduction
  4.2. Water
  4.3. Sewerage
  4.4. Other considerations
5. Triangulation
  5.1. Triangulation options
  5.2. Efficiency adjustments
Annex 1: Explanatory variables
  A1.1 Water
  A1.2 Sewerage
Annex 2: Alternative variables
  A2.1 Water
  A2.2 Sewerage
Annex 3: Regional wages
  A3.1 Constructing the regional wages variable
  A3.2 Alternative regional wage variables
Annex 4: Water templates
Annex 5: Sewerage templates
Annex 6: Efficiency calculations and challenges
  A6.1 Calculating efficiency
  A6.2 Applying efficiency challenges
  A6.3 Summary of efficiency adjustments
Annex 7: Logarithmic transformation of predicted values
Annex 8: Non-normalised coefficients of final models
  A8.1 Water
  A8.2 Sewerage
Annex 9: Recommendations for PR19
  A9.1 Capacity measures
  A9.2 Usage measure

IMPORTANT NOTICE

This report has been commissioned by Ofwat. However, the views expressed are those of CEPA alone. CEPA accepts no liability for use of this report or for any information contained therein by any third party.

© All rights reserved by CEPA Ltd.

GLOSSARY

ASHE
Annual Survey of Hours and Earnings

Baseline
A cost value, derived from the model forecasts/company business plan forecasts, which is used in a menu or price control.

BCIS
Building Cost Information Service

Between estimator
Refers to the variation across comparators’ explanatory variables in a data set. It is used in conjunction with the within estimator (variation in the company’s explanatory variables over time) in panel or pooled regressions to estimate the coefficients on explanatory variables.

Capex
Capital expenditure

Cobb-Douglas model
The Cobb-Douglas (or log-linear) model transforms the variables into logarithms prior to estimation. This model is deemed superior to a linear model in the cost modelling literature as it does not require marginal costs to be constant as in the linear model. Even so, the Cobb-Douglas model is in itself restrictive because, inter alia, it assumes that the extent of returns to scale is the same irrespective of firm size. Compare with translog model.

Corrected OLS (COLS)
See ordinary least squares (OLS) defined below. COLS follows the same statistical technique as OLS (i.e. estimating a line of best fit by minimising the sum of squared errors); however, the ‘average’ line is shifted towards a ‘frontier’ point, for example an upper quartile (best) performing company in terms of relatively low costs for its level of outputs. The average line is shifted by changing the intercept point, but no change is made to the slope of the line.

Correlation (coefficient)
A correlation coefficient is the measure of linear interdependence between two variables. The value ranges from -1 to 1, with -1 indicating a perfect negative correlation and 1 indicating a perfect positive correlation. Zero indicates the absence of correlation between the variables.

Corridor
The range calculated by using the model parameters, against which company cost forecasts are evaluated.

Data envelopment analysis (DEA)
A quantitative non-parametric technique that optimises the number of inputs required for a particular output and vice versa.
It does not require assumptions on the functional form, but it also does not allow statistical testing on the significance of explanatory variables.

FPL
Future Price Limits

Generalised least squares (GLS)
GLS is a technique for estimating the unknown parameters in a linear regression model. It is applied, for example, when some of the assumptions of the classical regression model break down – such as when the variance of the disturbances is assumed to be non-constant across observations (heteroskedasticity) or when there may be correlation between the disturbances (autocorrelation). The technique is used to estimate the random effects panel model (where there is dependence between observations of the same firm over time).

Hausman test
This test provides information on whether the fixed or random effects treatment is most appropriate. A high value of the statistic (which represents a rejection of the null hypothesis) indicates that the fixed effects model is preferred to the random effects model. Otherwise the random effects treatment is preferred.

Heteroskedasticity
One of the assumptions underpinning the classical linear regression model is that the disturbances are homoskedastic (that is, have a constant variance). When the disturbances are heteroskedastic, the variance of the disturbances is not constant across firms (an example is where the disturbances increase as firm size increases).

I&C
Industrial and commercial customers

IRC
Infrastructure renewals charge (annual allowance)

IRE
Infrastructure renewal expenditure (actual)

Maximum likelihood estimation (MLE)
This is a method of estimating the parameters of a statistical model. Under the standard assumptions underpinning the classical linear regression model, MLE produces identical estimates to those produced by OLS.
However, MLE has been shown to have desirable (large sample) properties under a wide range of assumptions (unlike OLS) and this method is therefore used in a wide range of contexts, including stochastic frontier analysis. Information is needed concerning the distribution of the errors to implement MLE.

Menu regulation
Menu regulation is a form of regulation where regulated companies are no longer presented with a ‘take it or appeal it’ regulatory offer regarding the allowed level of expenditure, but are instead given a range of options from which to choose.

MNI
Maintenance of non-infrastructure expenditure (actual)

Multicollinearity
An exact linear relationship between two or more explanatory variables characterises the extreme case of perfect collinearity (approximate linear relationships between variables are more common in practice). In the former case (perfect collinearity) the OLS procedure cannot be implemented. The latter case (approximate linear relationships) results in high standard errors. Whilst the parameter estimates and estimates of the standard errors are not biased as such, the problem is that it will be hard to draw conclusions on the impact of individual variables on the dependent variable. The overall predictive power of the model is not reduced (only the ability to use the coefficients individually).

Opex
Operating expenditure

Ordinary Least Squares (OLS)
OLS is a method by which linear regression analysis seeks to derive a relationship between company performance and characteristics of the production process. This method is used when companies have relatively similar inputs and outputs. Using available information to estimate a line of best fit (by minimising the sum of squared errors) the average cost or production function is calculated.

Pooled OLS
The pooled OLS model treats the data as if it was a cross-section – that is, e.g. 90 firms, rather than a panel of 10 firms over nine years.
This approach does not therefore recognise the panel structure of the data, and can be tested against the panel model variants. It is however a simple model that is used by economic regulators in particular.

Pooled Stochastic Frontier Analysis (SFA) model
This is a maximum likelihood estimation model that is the same as COLS except that a one-sided error term is included to permit the existence of inefficiency (with the error term decomposed into its noise and inefficiency components). This approach requires distributional assumptions on the error components.

PR14
Price Review 2014

Real price effects (RPEs)
The amount by which certain input prices are expected to move relative to RPI (either increasing or decreasing at a faster rate).

Regional BCIS index
A proxy for regional differences in construction prices, based on tender prices from the BCIS.

Time invariant efficiency model: Fixed Effects (FE)
This is the standard fixed effects model used in the panel data literature, except that in this case the fixed effects terms are given an inefficiency interpretation. In the fixed effects model, firm-specific effects (unobserved differences between firms) are treated as fixed parameters to be estimated, by including firm-specific dummy variables in the regression. However, the true distinction between fixed and random effects is whether the effects are correlated with the other regressors or not (in the case of random effects the effects are assumed to be uncorrelated with the regressors, whereas in fixed effects the effects are permitted to be correlated with the regressors). It is sometimes said that this approach is concerned only with the particular firms in the sample (i.e. that the sample contains all relevant firms and there are therefore no additional firms outside the sample of interest).
The random effects model treats the unobserved firm effects as randomly distributed across firms (so here we see the current sample as being drawn from a wider sample or population). It has been pointed out in the literature that in fact the fixed effects model can be reformulated and estimated as a random effects model, so the distinction concerning whether the effects are stochastic or not is erroneous (see, for example, Greene, Econometric Analysis, 5th Edition, page 285).

Time invariant efficiency model: Random Effects (RE)
This is the standard random effects model used in the panel data literature, except that in this case the random effects terms are given an inefficiency interpretation. The random effects specification imposes the assumption that the unobserved individual effects are uncorrelated with the regressors.

Time-invariant SFA model
This is a maximum likelihood model and an extension of the random effects model but now with distributional assumptions imposed and with estimation proceeding via MLE, not generalised least squares (GLS), as in the standard panel data random effects model. See Pitt and Lee (1981).

Time varying SFA model
This is a maximum likelihood model that extends the model above to permit efficiency to vary over time but in a restricted way, since the direction of efficiency change over time must be the same for all firms (and thus rankings cannot change). See Battese and Coelli (1992).

Skewness
Skewness is a term used to describe a non-symmetric distribution (a right skewed distribution has a longer “tail” to the right and vice versa for a left skewed distribution).

STW
Sewage treatment works

Total factor productivity (TFP)
A measure of the economy’s long-term technological change.

Totex
Total expenditure (opex + capex)

Translog model
The translog model is one of the so-called flexible functional forms and is used routinely in the academic literature.
In the current context one of its particular advantages is that it allows the degree of returns to scale to vary with firm size. The Cobb-Douglas is nested within the translog so it is possible to test the Cobb-Douglas restriction.

Triangulation
The use of multiple methodologies and the numbers from them (averages, max, min etc.) to come up with a single value for cost assessment.

UKWIR
UK Water Industry Research

WaSC
Water and sewerage company

Within estimator
Refers to the variation in the company’s explanatory variables over time in a data set. It is used in conjunction with the between estimator (variation across companies’ explanatory variables) in panel or pooled regressions to estimate the coefficients on explanatory variables.

WoC
Water only company

WTW
Water treatment works

EXECUTIVE SUMMARY

Introduction

Since August 2012 CEPA, in conjunction with Dr Andrew Smith of the University of Leeds, has been assisting Ofwat in developing water and sewerage econometric cost models. In January 2013 Ofwat published CEPA’s Cost Assessment Report[1] as part of their methodology consultation, which discussed the viability of totex modelling in water and sewerage. Since the January report we have received new data from Ofwat, the August 2013 data, and have used this to retest and refine a broad range of models.

The models presented in this report use the most recent data, spanning up to 2012-13. They cover total expenditure (totex) in wholesale water and base expenditure (operating and base service capital maintenance expenditure) in wholesale sewerage. Ofwat has modelled sewerage enhancement separately, mainly using unit cost models.

In agreement with Ofwat, we excluded several types of costs from the econometric modelling – such as third party costs – as those are beyond the companies’ control. Ofwat are addressing these costs separately in the risk-based review.

A report prepared by Jacobs on behalf of Ofwat will be published alongside this report.
The Jacobs report sets out forecasts for the explanatory variables used with the recommended models to help Ofwat set the cost benchmarks for the companies.

Table E.1 below provides a summary of the cost areas included in the advanced econometric models. We model different expenditure breakdowns in water and sewerage.

Table E.1: Expenditure modelled

Type of expenditure | Water: Wholesale | Sewerage: Wholesale | Sewerage: Network | Sewerage: Treatment & sludge
Opex + base capex | Yes | Yes | Yes | Yes
Totex | Yes | No | No | No

In water, we have some models that cover all of totex, while others only cover base expenditure, i.e. excluding enhancement capex.[2] In sewerage, we approached modelling in a slightly different way. We attempted to model totex but it did not prove viable. Therefore, all the sewerage models presented in this report exclude enhancement capex. The data allowed us to split costs between network and treatment/sludge, and to model these areas separately as well as modelling them together as wholesale base sewerage expenditure.

[1] CEPA. Ofwat: cost assessment. January 2013.
[2] In these cases unit costs are added to determine totex.

We worked with Dr Andrew Smith and Ofwat to develop the models to use for calculating cost allowances at PR14 and then to test their robustness. This process began in August 2012 and our development has included an initial consultation with UKWIR and specific inputs on technical issues from several academic advisors. We recognise that given the data constraints and a range of estimation techniques, no econometric model will perfectly reflect all of the companies’ characteristics.[3] As such, our proposed approach is for Ofwat to use a number of models with different variables and/or estimation techniques, and to triangulate between these models to determine robust cost benchmarks for the companies.
We have tested the modelling and undertaken external Quality Assurance (QA) – as a result, we consider that our analysis and recommendations are in line with regulatory best practice.

Model selection

Our model selection process began with viability testing of totex and opex plus base capex models. When we established that modelling was viable we received additional and revised data from Ofwat covering the years up until 2012/13 – the August 2013 data-set. In order to choose between models, five standard and commonly implemented criteria were used to assess a long list of models:

• theoretical correctness;
• statistical performance;
• practical implementation issues;
• robustness testing; and
• regulatory best practice.

We used these criteria to first reduce our long list of models and then refined this list further by focusing on the statistical performance and robustness testing criteria. We found it difficult to identify suitable metrics to help choose between models in a mechanistic way, so we have adopted a ‘traffic-light’ system to indicate how well each model performs against a given criterion: a ‘green light’ corresponds to ‘good’, an ‘amber light’ corresponds to ‘acceptable but with a few issues’, and a ‘red light’ means that the model is flawed.

We did not assign a red light to any model for theoretical correctness, as the models had already been narrowed down to a theoretically robust set through discussions with Ofwat and UKWIR and by implementing established econometric approaches to modelling. The other categories – statistical performance and robustness testing – do allow for a red traffic light, in which case the model would no longer be considered a candidate. For the former, a red light indicates that several of the core parameter estimates are substantially outside our expectations.
For robustness testing it means that either the efficiency scores resulting from the model or the predictions are implausible, or that there is significant evidence for having different coefficients in different time periods.

Our final selection process is summarised in Figure E.1.

[3] We note that we would not expect any of the models to perfectly predict companies’ expenditure due to inefficiencies.

Figure E.1: Model Selection Process
[Flow diagram. Boxes, in order: Identify Theoretical Cost Drivers; Functional Form (translog or Cobb-Douglas; interaction between scale and density); Logical Criteria (sensibility of coefficients and elasticities); Statistical Tests (statistical significance; Hausman / Mundlak testing; goodness of fit; robust standard errors); Robustness Testing and Model Refinement (dropping observations/refinement; dropping variables/using alternative variables; time-pooling test); Final Model Selection. Stage labels: Theoretical Correctness; Model Performance; Robustness and Selection.]

We believe the preferred models provide a range of efficiency specification methods (time-invariant efficiency and time-varying), estimation techniques (GLS [RE] and OLS), and full and refined models where available. All our preferred models are in log form (which means the coefficients can be interpreted as elasticities) and allow for different economies of scale for different size companies (referred to as translog models).
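
As a rough illustration of the translog form described above, and not the specification actually estimated in this report, a cost function with a single scale variable y and a single density variable d can be written as follows. The symbols y and d and all coefficient names are assumptions introduced here for illustration only.

```latex
\begin{aligned}
\ln C_{it} &= \alpha + \beta_y \ln y_{it} + \beta_d \ln d_{it}
  + \tfrac{1}{2}\gamma_{yy}\,(\ln y_{it})^2
  + \tfrac{1}{2}\gamma_{dd}\,(\ln d_{it})^2
  + \gamma_{yd}\,\ln y_{it}\,\ln d_{it} + \varepsilon_{it}, \\[4pt]
\frac{\partial \ln C_{it}}{\partial \ln y_{it}} &= \beta_y + \gamma_{yy}\,\ln y_{it} + \gamma_{yd}\,\ln d_{it}.
\end{aligned}
```

Setting \gamma_{yy} = \gamma_{dd} = \gamma_{yd} = 0 recovers the Cobb-Douglas case, in which the cost elasticity with respect to scale is the constant \beta_y for every company. With non-zero second-order terms the elasticity varies with company size and density, which is the sense in which the translog allows economies of scale to differ across companies.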
Our testing and other studies in this sector supported this choice.[4] While these types of models are less transparent than standard non-varying economies of scale specifications (which we refer to as Cobb-Douglas [CD]), they better reflect the reality of the economies of scale present in the water and sewerage industry.[5] Our use of a log specification does mean that the cost predictions generated may be biased, either over- or under-estimated depending on the shape of the production function, and an adjustment factor is required to ensure that the cost predictions, once transformed back from logs into levels, are not biased.[6] We have proposed that Ofwat use an adjustment factor in line with that used by Ofgem for DPCR5 and RIIO-GD1. Ofgem referred to this as the ‘alpha correction factor’.[7]

[4] For example see Stone & Webster, Investigation into evidence for economies of scale in the water and sewerage industry in England and Wales: Final Report, prepared for and published by Ofwat, 2004, and Saal et al, Scale and scope economies and the efficient configuration of the water industry: a survey of the literature, Aston Centre for Critical Infrastructure and Services Working Paper, Aston University, UK, 2011.

All the models selected excluded regional BCIS as there is a high correlation between this variable and the regional wage variable. We found that models that included BCIS resulted in unexpected coefficients. We believe that the regional wage variable explains more of the regional price variations than the BCIS.

We recommended that Ofwat use five water models and five sewerage models. For water we proposed three model specifications, run using GLS (RE) and/or OLS. Our recommended water model specifications were:

• A full model specification including all explanatory variables provided to us by Ofwat including our estimation of the regional wage variable, but excluding BCIS.
Model WM3.[8]
• A refined model specification including only variables which we found to be statistically significant or were important cost drivers from a theoretical perspective. Models WM5 and WM6.
• An opex plus base capex model using similar explanatory variables to the refined model above, but excluding enhancement expenditure. Models WM9 and WM10. Ofwat modelled the enhancement expenditure separately.

Table E.2 below lists the preferred water models’ performance against the selection criteria. Note that when comparing the models we used the COLS and GLS (RE) efficiency scores; however, when the models are triangulated (discussed later) the efficiency target relies only on a correction factor. Based on this application of the modelling results, the only difference between the COLS and GLS (RE) is the weight given to the within (variation over time for a company) and between (variation across companies) estimators, with GLS placing more weight on the within estimators than OLS.

[5] Cobb-Douglas is a production function rather than a cost function. We are modelling the latter, but we have nevertheless used the term CD as the concept is similar.
[6] This is explained in statistics as Jensen’s inequality.
[7] Ofgem, RIIO-GD1: Initial proposals – Step-by-step guide for the cost efficiency assessment methodology, August 2012, page 12 and Ofgem, Electricity distribution price control review; Final proposals – allowed revenue – cost assessment appendix, December 2009, page 87.
[8] We tested a GLS (RE) fully specified model; however, as the number of explanatory variables exceeded the number of companies, the between estimator could not be computed. The programme we used, LIMDEP, still estimates the full model, but we do not have confidence in the results produced.
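
The need for an adjustment when converting log-cost predictions back into levels (the Jensen's inequality point in the footnotes above) can be sketched in a few lines. This section does not give the formula for the correction factor, so the sketch below uses Duan's smearing estimator, a standard retransformation correction, as a stand-in; it should not be read as the adjustment Ofgem or Ofwat applied. The function names and the four data points are hypothetical and purely illustrative.

```python
import numpy as np


def smearing_factor(log_actual, log_fitted):
    """Duan-style smearing factor: the mean of exp(residual) from a log-cost model.

    Assumption: used here as a generic retransformation correction, not as the
    specific 'alpha correction factor' referred to in the report.
    """
    residuals = np.asarray(log_actual, dtype=float) - np.asarray(log_fitted, dtype=float)
    return float(np.mean(np.exp(residuals)))


def predict_cost_levels(log_predicted, factor):
    """Convert log-cost predictions into levels and apply the correction factor."""
    return factor * np.exp(np.asarray(log_predicted, dtype=float))


# Hypothetical actual and fitted costs (in levels) for four companies.
log_actual = np.log([100.0, 150.0, 220.0, 330.0])
log_fitted = np.log([110.0, 140.0, 230.0, 310.0])

alpha = smearing_factor(log_actual, log_fitted)
naive = np.exp(log_fitted)                          # simple exponentiation, uncorrected
adjusted = predict_cost_levels(log_fitted, alpha)   # corrected level predictions

print("correction factor:", alpha)
print("naive predictions:", naive)
print("adjusted predictions:", adjusted)
```

The factor is estimated once from the model's in-sample residuals and then applied multiplicatively to every level prediction, so it changes the overall level of predicted costs without changing the ranking of companies.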

Table E.2: Final water models
(G = green, A = amber)

Model | Theoretical correctness | Statistical performance | Robustness check
Totex
WM3 – full translog COLS without BCIS | G | A | A
WM5 – refined translog COLS without BCIS | G | G | G
WM6 – refined translog GLS (RE) without BCIS | G | G | G
Opex + base capex
WM9 – refined translog COLS without BCIS | G | A | G
WM10 – refined translog GLS (RE) without BCIS | G | G | G

The sewerage models we selected were all opex plus base capex models. We could not establish a viable sewerage totex model which produced consistent and robust results. Our recommended sewerage model specifications were:

• A sewage treatment model specification run with both GLS (RE) and OLS. The explanatory variables were ‘refined’, as a ‘fully’ specified model did not produce significantly different results from the refined model. Given that there are only 10 comparators in sewerage we considered the greater number of degrees of freedom gained outweighed any potential small loss of explanatory power. Models SM5 and SM6.
• A sewer network model specification run using only GLS (RE). Again we used a refined model as we did not find any advantages from using a ‘fully’ specified model. We did not use an OLS model as the coefficients were not in line with our expectations and their interpretation was not consistent with those of the cost drivers. Model SM1.
• A sewerage opex plus base capex model specification run with both GLS (RE) and OLS. This model specification used similar explanatory variables to the treatment and network models, however as treatment makes up a greater proportion of expenditure the load explanatory variable was preferred to the length variable. Models SM9 and SM10.
• In all cases Ofwat modelled the enhancement expenditure separately.

Table E.3 below lists the preferred sewerage models’ performance against the selection criteria.

Table E.3: Final sewerage models
(G = green, A = amber)

Model | Theoretical correctness | Statistical performance | Robustness check
Network opex + base capex
SM1 – refined translog GLS (RE) | G | G | G
Treatment & sludge opex + base capex
SM5 – refined translog GLS (RE) | G | G | G
SM6 – refined translog COLS | G | G | G
Wholesale opex + base capex
SM9 – refined translog GLS (RE) | G | G | A
SM10 – refined translog COLS | G | G | A

Triangulation and efficiency estimation

As we had recommended the use of multiple models to Ofwat, an approach to establish a single estimate across these models was required, for water and sewerage in turn, i.e. a triangulation method. Our proposed triangulation method was based around the following criteria:

• maximising the intermediate information each option offers, i.e. estimates from ‘bottom-up’ models capturing different parts of the value chain and estimates from ‘top-down’ models capturing the whole value chain;
• transparency;
• logical flow, i.e. do the weights placed on each model make intuitive sense; and
• ease of implementation/replicability.

Our recommended approach follows a logical process of estimating separate elements of the value chain or cost categories (we term these bottom-up models) and top-down models (capturing the whole value chain/more aggregated costs) before triangulating these together to get a single prediction. Based on this approach and given the need to avoid ‘cherry-picking’ results (i.e. selecting the upper quartile in all models),[9] we recommended that the calculation of the cost benchmarks be done based on the final single prediction.

We recommended that the simple ratio approach to estimating efficiency should be used.[10] This is a transparent approach which avoids cherry-picking, is replicable and has regulatory precedent (this and alternative approaches are discussed in section 5.2).[11]

[9] When we refer to the upper quartile we are referring to the upper quartile efficiency performance, which is equivalent to a lower quartile cost.
[10] Rather than using both forms of efficiency estimation (e.g. based on residuals from the econometric modelling and ratio).
[11] Ofwat used ratios in PR09 and Ofgem has used ratios for RIIO-GD1 and RIIO-ED1 fast track decisions.

We note that we found small differences between the alternative options of triangulating at different stages of the modelling, or using a mix of residual and ratio efficiency estimation.

In addition, we recommended to Ofwat that the efficiency adjustment be calculated on historical data rather than forecast expenditure. Using historical data means that the companies are compared against their relative past performance rather than their future estimated performance. In the former case, there would be no limit on the number of companies which could be determined as ‘upper quartile’ performers against the benchmark. If forecast expenditure were used, there would be a limited number of ‘good’ performers as there are a fixed number of companies in each quartile.

We did not provide a recommendation to Ofwat on how far from the average industry performance they should set the cost benchmark, e.g. upper quartile/upper third. We do however consider that this should be based on the level of confidence Ofwat has in the predictions from the modelling and how challenging they wish to make the targets for the companies. This will be a matter of regulatory judgement by Ofwat.

1. INTRODUCTION

Since August 2012 CEPA, in conjunction with Dr Andrew Smith of the University of Leeds, has been assisting Ofwat in developing water and sewerage econometric cost models. In January 2013 Ofwat published CEPA’s Cost Assessment Report as part of their methodology consultation, which discussed the viability of totex modelling in water and sewerage. Since the January report we have received new data from Ofwat, the August 2013 data, and have used this to retest and refine a broad range of models.
The models presented in this report used the most recent data, spanning up to 2012-13. They cover total expenditure (totex) in wholesale water and base expenditure (operating [opex] and base service capital maintenance expenditure [capex]) in wholesale sewerage. Ofwat has modelled sewerage enhancement separately, mainly using unit cost models. These Ofwat models are discussed in a separate report published alongside this one. A report prepared by Jacobs on behalf of Ofwat sets out forecasts for the explanatory variables used with the recommended models to help Ofwat set the cost benchmarks for the companies.

1.1. Objective

This report sets out the testing that we undertook to get to a set of robust models for water and sewerage. It also sets out our recommendations for assessing costs for these services in PR14. We worked alongside Ofwat to ensure the modelling is consistent with the rest of the PR14 framework. We also shared initial results of our totex models with the UKWIR steering group in September 2012 to better understand what the industry viewed as its main cost drivers. This also allowed us to understand and build on the total expenditure benchmarking work undertaken by Reckon on behalf of UKWIR.¹²

Dr Andrew Smith, of the University of Leeds, took a leading role in the initial development of the approach and definition of possible model structures. He then continued to provide support and guidance to the CEPA team during the testing of various models and the determination of preferred options. This included the provision of expert advice and guidance during the robustness testing phase of the project.

In addition, Dr Michael Pollitt, of the Judge Business School at the University of Cambridge, and Jon Stern, of the Centre for Competition and Regulatory Policy at City University, have provided independent external review of the approach we have adopted.
We also sought technical advice from Professor William Greene, of the NYU Stern Business School, on the principles of random effects versus corrected ordinary least squares and separating unobserved heterogeneity from inefficiency. They have not reviewed the final models we assess in this report but we have taken their comments into account when selecting the preferred set of models.

1.2. Changes since the January 2013 ‘CEPA Cost Assessment Report’

Since the publication of the CEPA Cost Assessment Report, we have conducted additional modelling and updated our analysis using the latest dataset, which included the companies’ August 2013 submissions. There are several significant changes to the results presented in the CEPA Cost Assessment Report, namely:

• We are no longer modelling sewerage opex at a sub-company level. Instead, we prefer to use models that combine opex and capex to avoid capex bias.
• We were also able to model treatment base capex and sludge base expenditure (opex and maintenance capex) due to revisited data splits. This led to an increase in the coverage of the econometric modelling, which in turn has reduced the use of unit cost models.

¹² UKWIR, A total expenditure approach to cost assessment, 2012, http://www.ukwir.org/web/ukwirlibrary/95954.

In agreement with Ofwat, we excluded several types of costs from the econometric modelling, such as third party costs, as those are materially uncertain. Ofwat are addressing these costs separately in the risk-based review.

Table 1.1 below provides a summary of the cost areas included in the advanced econometric models. We model different expenditure breakdowns in water and sewerage.

Table 1.1: Expenditure modelled

Type of expenditure | Water: Wholesale | Sewerage: Wholesale | Sewerage: Network | Sewerage: Treatment & sludge
Opex + base capex |  |  |  | 
Totex |  |  |  | 

In water, we have some models that cover all totex, while others only cover base expenditure, i.e.
excluding enhancement capex.¹³ In sewerage, we approached modelling in a slightly different way. We attempted to model totex but it did not prove viable, as indicated in the CEPA Cost Assessment Report. Therefore, all the sewerage models presented in this report exclude enhancement capex. The data allowed us to split costs between network and treatment/sludge, and to model these areas separately as well as together as wholesale base sewerage expenditure.

1.3. Process

The process we have followed in developing the econometric cost assessment models is set out in Figure 1.1 overleaf. This process included quality assurance via ongoing discussions with Ofwat, as well as input from UKWIR and technical advice from academic experts. As discussed above, the introduction of the August 2013 data meant that we had to revisit the viability of the models before we decided on a long list to assess.

¹³ In these cases unit costs are used to determine totex.

Figure 1.1: Model development process
[Flowchart with two columns, ‘Activities’ and ‘Interaction with stakeholders/experts’. Phases shown: Phase 1: Scoping; Phase 2: Model specification; Phase 3a: Viability/long list of models; Cost assessment report (2012); Phase 3b: Viability/long list of models; Phase 4: Short list of models; Phase 5: Final model selection. Activities shown: segmentation of costs (base/enhancements/totex); identification of cost drivers; review of data for errors and inconsistencies; selection of cost drivers; testing of different specifications; testing different estimation techniques; selection of estimation methods; selection of specifications; review of 2013 data for errors and inconsistencies; selection of preferred models. Interactions shown: Ofwat review of the data; expert review; consultation with Ofwat and UKWIR; CEPA academic advisor input; Ofwat econometric & engineering input; joint meetings between CEPA and Ofwat; Ofwat academic advisor input.]

It should be noted that the CEPA Academic Advisor, Dr Andrew Smith, was appointed as the Ofwat Academic Advisor mentioned in the figure above during the model development process.

1.4. Structure of the report

The report continues as follows:

• Section 2 describes our approach to modelling and the main issues we have looked at while testing, such as explanatory variables, economies of scale, efficiency assumptions, capex smoothing and panel length;
• Section 3 sets out the criteria we have used to assess each viable model, including our scoring system;
• Section 4 presents the preferred water and sewerage models; and
• Section 5 discusses triangulation options and efficiency adjustments.

The report also includes a number of annexes which give more detail on the testing we have done and alternatives considered:

• Annex 1 sets out the variables used in water and sewerage;
• Annex 2 discusses alternative variables that we have considered or tested;
• Annex 3 describes how we constructed the regional wage variable used in the final models;
• Annex 4 presents the detailed results for a selection of the water models;
• Annex 5 presents the detailed results for a selection of the sewerage models;
• Annex 6 details the efficiency calculations and adjustments associated with different types of estimators;
• Annex 7 details the options for transforming logarithmic values into level values;
• Annex 8 provides the non-normalised coefficients for the final models recommended; and
• Annex 9 provides recommendations for cost modelling in PR19.

2. APPROACH TO MODELLING

As part of our analysis we have tested a wide range of models using the latest dataset, updated after the August submission, consistent with the cost drivers, methods and functional forms that we used during the previous stages of the analysis.
These included translog versus Cobb-Douglas (CD) functional forms (discussed in Section 2.2); ordinary least squares (OLS), generalised least squares (GLS) random effects (RE), fixed effects (FE), stochastic frontier analysis (SFA), and true random effects estimations (discussed in Section 2.3); the choice of panel length (discussed in Section 2.4); and smoothed versus unsmoothed capex (discussed in Section 2.5).

As we mentioned in the introduction, the data used in our modelling had changed since the publication of our Cost Assessment Report. We were also able to add two years of data to the dataset that we started with in August 2012, which meant that the dataset used for the modelling in this report covered the period up to 2012-13. We note that the final dataset that we used had undergone significant changes, even in the historical costs, as some companies resubmitted their figures. The revisions to the historical data were not consistent across companies in terms of magnitude and direction. This led to changes in the models’ coefficients from our earlier cost modelling.

We used the companies’ expenditure data submitted as part of the June Returns and August submissions as the dependent variable. As noted earlier, Ofwat adjusted the historical expenditure to exclude certain wholesale costs that are materially uncertain (e.g. costs associated with third party services). We discuss the explanatory variables (cost drivers) and then the assumptions, and associated implications, in turn below.

2.1. Explanatory variables

The majority of the variables that we included in our final models are defined in the same way as those we presented in the CEPA Cost Assessment Report. However, we tested a number of new variables and redefined a few of the existing variables used previously. In Annex 1 we provide detail on the specification for each explanatory variable and the rationale behind their use. Annex 2 discusses alternative variables we considered and our rationale for not using them.
Table 2.1 below presents all the explanatory variables we have tested in the various water models.

Table 2.1: Range of explanatory variables in water models

Type | Variable
Core | Length of mains; Property density; Usage; Time trend
Input prices | Average regional wage; Regional BCIS index
Network characteristics | Population density (occupancy); Proportion of metered properties; Proportion of usage by metered household properties; Proportion of usage by metered non-household properties
Treatment and sources characteristics | Sources; Pumping head; Proportion of water input from river abstractions; Proportion of water input from reservoirs
Activity | Proportion of new meters; Proportion of new mains; Proportion of mains relined and renewed
Quality | Properties below reference pressure level; Leakage; Properties affected by unplanned interruptions > 3 hrs; Properties affected by planned interruptions > 3 hrs

While we discuss the variables in more detail in Annex 1, it should be noted that the average wage variable we used is different from that constructed by Ofwat for PR09 and from that used in the January 2013 Cost Assessment Report. A brief description of the new variable is set out in Text Box 2.1 below.

Text Box 2.1: Average regional wage

The wage variable has been constructed by CEPA, supported by Ofwat, and is different from the way Ofwat constructed regional wages in PR09. It is based on regional rather than local area wage differences as we consider companies are not restricted to sourcing workforce from the county/area of operation. The variable excludes overtime pay and focuses on hourly rather than weekly pay to eliminate any differences that could be attributed to inefficiency or company policy. In this way, the wage variable is exogenous to the particular company and captures the ability of companies to source labour from areas with different wage profiles. We discuss the construction of this variable in more detail in Annex 3.
As we decided to no longer conduct sewerage modelling at the sub-company level we did not include any drivers at the sub-company level. We did however include additional drivers for treatment and sludge. Table 2.2 below presents all the explanatory variables we tested in the various sewerage models.

Table 2.2: Range of explanatory variables in sewerage models

Type | Variable
Core | Length of sewers; Density; Usage; Time trend
Input prices | Average regional wage; Regional BCIS index
Network activity | Proportion of sewers replaced and renewed
Treatment and sludge | Load; Sludge disposed; Proportion of load in treatment works size bands 1-3; Proportion of load in treatment works size bands 4 and 5; Proportion of load treated by activated sludge treatment; Number of large works with the tight consents dummy

We note that across both water and sewerage models a number of variables were highly correlated with each other (either negatively or positively). We have set out the correlation matrices for the water and sewerage explanatory variables in Annex 1. We discuss the implications of multicollinearity in Section 3.2.1.

2.2. Economies of scale (Cobb-Douglas versus translog)

The CD is a production function (which, by duality, can be expressed as a cost function) that places weights on the input factors. The CD is a standard functional form used in the cost assessment literature. When in a log-linear form the CD allows for the marginal costs to vary and coefficients to be interpreted as the elasticity of cost with respect to the corresponding driver. A translog introduces further flexibility by allowing the economies of scale to vary as well.¹⁴

We tested both functional forms in our modelling as previous literature indicated that there is evidence of varying economies of scale in the water and sewerage industry.
For example, work commissioned and published by Ofwat (Stone and Webster 2004)¹⁵ suggested the presence of variable returns in the water industry, with evidence of diseconomies of scale for water and sewerage companies (WaSCs), but possible economies of scale for water-only companies (WoCs), although Stone and Webster could not reject the presence of constant returns to scale for WoCs. In addition, Saal et al (2011)¹⁶ found that, for WoCs, the average sample firm was subject to diseconomies of scale. However, it concluded that vertically integrated firms gained significant benefits from economies of scope and scale.

We discussed the theoretical implications of the translog with Ofwat staff and we agreed with them that a translog form was viable. The results of our testing, using joint statistical significance of the translog terms, consistently showed that translog models were statistically preferred for both water and sewerage.

¹⁴ In practice this is achieved by adding the square and cross terms of the main scale variables to the equation.
¹⁵ Supra N4.
¹⁶ Supra N4.

2.3. Estimation methods and efficiency specifications

2.3.1. Range of estimation techniques tested

There are numerous econometric estimation approaches that can be used with panel or pooled data. (The main difference between panel and pooled datasets is that pooled treats all observations as independent while panel data treats companies’ observations as being related over time.)¹⁷ As part of our earlier report and this subsequent refinement, we tested a number of approaches. These are set out in Table 2.3 below.

Table 2.3: Estimation methods

Estimation method: Pooled Ordinary Least Squares (OLS)
Description: The pooled OLS model treats the data as if it were a cross-section, e.g. 90 firms rather than a panel of 10 water and sewerage firms over nine years. Not recognizing the structure of the data causes the OLS estimator to place equal weight on the between variation (i.e.
differences between companies) and within variation (i.e. differences between years for the same company) when calculating the estimate. OLS does not distinguish between white noise, heterogeneity and inefficiency, unlike the rest of the methods which make some assumptions about the decomposition of residuals into noise and other components such as inefficiency. Efficiency is calculated in each year using the difference between each firm’s residual and the minimum residual for that year (note, different companies may be at the frontier in each year). These efficiencies are then averaged over time (e.g. five years). Although efficiency is allowed to vary over time, we note that there is no structure to this variation. We do not use these efficiency scores in making the efficiency adjustments, however, so these differences are not crucial to the modelling.

Estimation method: Pooled Stochastic Frontier Analysis (SFA)
Description: This is a maximum likelihood estimation (MLE) model requiring distributional assumptions on the error term and is the same as OLS except that a one-sided error term is included to permit the existence of inefficiency (with the error term decomposed into its noise and inefficiency components). This model attempts to distinguish between white noise and inefficiency, but does not try to control for company heterogeneity. The pooled element of this technique means that the data is (like Pooled OLS above) treated as a cross-section, thus the structure of the data is ignored and the same implications follow.

Estimation method: Time invariant panel method - Random Effects (RE)
Description: Panel methods in general have the advantage that estimation takes into account the structure of the data. That is, it recognizes that we have 18 water companies over time, rather than different companies each year. In our case, it uses generalised least squares (GLS), which places more weight on the within variation than OLS when calculating parameter estimates. There are two broad categories of panel methods, RE and FE.
RE requires that firm-specific effects be uncorrelated with cost drivers. The error term thus captures the company effect and white noise. The company effect is assumed to be randomly distributed across firms (within and out of sample). The noise is assumed to have an expected value of zero, which allows us to estimate the average company effect, which is interpreted as inefficiency. Efficiency is thus assumed to be constant over time. The model does not distinguish between unobserved heterogeneity and inefficiency. RE models are perceived to yield more precise coefficients than FE and OLS models but have unclear properties in small samples.

¹⁷ See Section 3.2.3 of the January 2013 CEPA Cost Assessment Report.

Estimation method: Time invariant panel method - Fixed Effects (FE)
Description: FE is estimated via OLS. It allows for company specific effects to be correlated with cost drivers by estimating the company effect as a parameter in estimation (this can then be recast and interpreted as inefficiency). Efficiency is assumed to be constant over time. The advantage of the FE model is that it produces unbiased and consistent parameter estimates in the presence of correlation between company effects and cost drivers. However, these estimates may be less precise than RE estimates. That is, although FE may be unbiased, the point estimates in a particular sample may be less accurate than RE estimates. Other disadvantages of this model are that it cannot deal with time invariant regressors and that the inclusion of company effects means that the number of parameters estimated grows with the number of companies.

Estimation method: Time varying true RE
Description: This is a maximum likelihood variant of the above RE model that attempts to decompose the company effect into inefficiency and unobserved heterogeneity. This model assumes that heterogeneity is constant over time while inefficiency can vary. It also requires distributional assumptions about the error and heterogeneity terms.
However, this model can have difficulties separating persistent inefficiency from time invariant heterogeneity.

Estimation method: Time invariant panel SFA (Pitt and Lee)¹⁸
Description: This is an MLE model requiring distributional assumptions on both the error and inefficiency terms. It takes the data structure into account. It is an extension of the RE model but with distributional assumptions imposed on the error and company effects (but it does not attempt to control for heterogeneity). Estimation proceeds via MLE. For this model, inefficiency is assumed to be constant over time.

Estimation method: Time varying SFA (BC92)¹⁹
Description: This is an MLE model requiring distributional assumptions on both the error term and on efficiency. It extends the model above (Pitt and Lee) to permit efficiency to vary over time but in a restricted way, since the direction of efficiency change over time must be the same for all firms (and thus rankings cannot change).

Estimation method: Time varying SFA (Cuesta 2000)²⁰
Description: This is a flexible version of BC92 (also using an MLE estimator) that allows for firm-specific paths of inefficiency. That is, some companies can be catching up or falling away from the frontier in any given year. This model was used by ORR for PR08.

Estimation method: Time varying pooled OLS (CSS)²¹
Description: This model permits firm specific time paths for inefficiency and tries to differentiate between statistical noise and inefficiency (as opposed to pooled OLS that does not differentiate), but without the need to impose distributional assumptions. One disadvantage of the Cornwell, Schmidt and Sickles (CSS) model is that it does not allow us to test the statistical significance of the time variation in inefficiency.

¹⁸ See Pitt and Lee, The Measurement and Sources of Technical Inefficiency in the Indonesian Weaving Industry, Journal of Development Economics, 9, 43-64. (1981).
¹⁹ See Battese and Coelli, Frontier Production Functions, Technical Efficiency and Panel Data: With Application to Paddy Farmers in India, Journal of Productivity Analysis, 3, 153-169. (1992).
²⁰ See Cuesta R.A., A Production Model With Firm Specific Temporal Variation in Technical Inefficiency: With Application to Spanish Dairy Farms, Journal of Productivity Analysis 13 (2): 139-158. (2000).
²¹ See Cornwell, Christopher & Schmidt, Peter & Sickles, Robin C., Production Frontiers With Cross-Sectional And Time-Series Variation In Efficiency Levels, Journal of Econometrics, 46, 185-200. (1990).

In general, we found that GLS (RE) models were preferred to FE, and that GLS (RE) and pooled OLS models provided more stable and robust results than SFA models. There are two key differences between a COLS approach using pooled data and a panel RE approach:

• Panel RE models use GLS, which calculates a weighted average of the ‘between’ (differences between the companies’ cost drivers) and ‘within’ (changes in the company’s cost drivers over time) estimators. While OLS uses both estimators as well, it places a much greater weight on the between estimator than GLS, which leads to different results.
• RE models require an assumption of time invariant inefficiency when decomposing the errors.

How inefficiency is estimated across the models is an important consideration, which we discuss further in Section 2.3.2. Depending on how the companies’ inefficiency is calculated this may however be a moot point. In RE the inefficiency is calculated from the error term as a secondary step; a ratio-based approach, which does not assume time invariant inefficiency, can be used instead (this is discussed further in Section 5).

2.3.2. Efficiency estimation

The different methods used to estimate the coefficients make different assumptions about how efficiency varies (or does not vary) over time, which we explained in more detail in our earlier report. Here, the most robust models tended to be the GLS (RE) models, which assume that efficiency does not vary over the time covered, i.e.
five years for water and seven years for sewerage. Although this may seem a rather bold assumption, it is supported by the SFA testing,²² which allows for efficiency to vary in some systematic way (unlike OLS, which assumes that companies’ efficiencies are not related over time but rather vary in a random manner).

In many cases the GLS (RE) models were preferred over OLS in terms of the signs, magnitudes and statistical significance of the parameter estimates. However, the assumption in the RE model of time invariant inefficiency, particularly when viewed over a seven year period, may appear rather restrictive. We therefore tested three additional, time varying panel models. The advantage over RE is that these models permit time varying inefficiency. The advantage over OLS in this respect is that the variation is structured over time, not time independent as in OLS.

The first two models are the BC92 and Cuesta (2000) models, which are both maximum likelihood stochastic frontier models. The first is commonly used in the literature, partly because it is easier to implement in standard software. The disadvantage of BC92 models is that they require all firms to have the same direction of efficiency change over time (that is, all firms see increasing or decreasing efficiency over time). The Cuesta (2000) model is more difficult to implement, and the Institute for Transport Studies (ITS), University of Leeds has developed LIMDEP (a statistical software package) code for this purpose. It has appealing properties in a regulatory context as it allows each firm to have its own time path for inefficiency, so some firms can be catching up to the frontier, whilst others may fall away.

²² This refers to the BC92 and Cuesta testing further below in this section.

The third model, CSS (1990), likewise permits firm specific time paths for inefficiency, but without the need to impose distributional assumptions (unlike the BC92 and Cuesta).
One disadvantage of the CSS model is that it does not allow us to test the statistical significance of the time variation in inefficiency.

In general we found that the BC92 and Cuesta 2000 models were not robust. In many cases the models did not converge.²³ Where the BC92 models did converge, they tended to show that inefficiency was not varying over time. Finally, with both the BC92 and Cuesta models that did converge, there was some ambiguity concerning the estimation of the standard errors. This led us to conclude that these models should not be included in our suite of models (though we would suggest keeping them as possible approaches for PR19).

We also tested ‘true random effect’ models, which attempt to disentangle unobserved heterogeneity between companies and inefficiency by assuming that the unobserved heterogeneity is constant over time, while inefficiency is allowed to vary. However, as noted previously, this model can have difficulties distinguishing between persistent inefficiency and time invariant heterogeneity. We did not find these models to be viable as they yielded errors.

As a result, the final selection includes models using GLS (RE) and COLS respectively.

Box 2.2: Small sample performance of GLS (RE) and COLS

While GLS (RE) and OLS are similar approaches, as discussed above, they place different weight on the within estimator. There are numerous discussions around the merits of each of the approaches, but one area that can be an issue in a regulatory context is small samples. We discuss this further below.

While in small samples there is uncertainty about the performance of GLS (RE) estimators, the academic literature indicates that GLS (RE) is no worse than FE and OLS.²⁴ In fact, GLS (RE) has been shown to outperform OLS and FE estimators in small samples (even in the presence of correlation between firm effects and regressors) due to its superior efficiency, i.e.
precision of parameter estimates.²⁵ The benefit of having more precise coefficient estimates with GLS (RE) therefore may well outweigh the cost of having some correlation between regressors and firm effects (part of the residual). Any problems such as correlation between regressors and company effects would cause bias in OLS as well. Our extensive testing has suggested that the non-correlation assumption is reasonable.

However, the academic literature has shown in some cases that the superior efficiency becomes less favourable in samples where N-K<5, where N is the number of observations (in this case the number of companies, as the variables have small within variation) and K is the number of variables, excluding translog terms. Therefore, in cases where N-K<5, the GLS (RE) estimators may not perform as well as expected.

²³ Failure to converge in this case means that one of the criteria for exiting the iterative process of calculation within the statistical software was not met and the software could thus not generate model coefficients.
²⁴ See for example Taylor, W.E., Small Sample Considerations in Estimation from Panel Data, Journal of Econometrics 13, 1980, pages 203-223.
²⁵ See, for example, ibid; and Baltagi, B. H., Econometric Analysis of Panel Data, 2005.

Additionally, the way we understand Ofwat intends to use the models mitigates concerns about unobserved heterogeneity, ‘within’ variation, or correlation between drivers influencing the benchmarks. The calculation of average and/or upper quartile efficiencies in effect controls for the difficulty in distinguishing between unobserved heterogeneity and inefficiency (and noise in the case of OLS) by not using the frontier.

In practice, although GLS (RE) and OLS use different methods to estimate coefficients, their parameter estimates generally converge in our final set of models.
Where they do not, the OLS estimates are within the confidence interval of the GLS (RE) estimates.

2.4. Panel length

The August submissions allowed us to extend our datasets for both water and sewerage by two years, thus allowing for a seven-year panel for water and an eleven-year panel for sewerage. However, Ofwat advised us that in the first two years of the sewerage dataset the costs were unusual because of a serious outbreak of foot and mouth in the preceding year. This meant that the costs and driver information during these two years were not consistent with the rest of the dataset because of the additional cost of disposing of the sludge or storing it for a longer period. Therefore, we reduced the length of the panel set to exclude the first two years in order to avoid this data consistency issue.

Because of the constraints of RE we were reluctant to fully rely on the longer panel as it would mean that companies’ relative efficiencies would stay constant over seven years for water and nine years for sewerage. We therefore tested shorter panel lengths: five years for water and seven years for sewerage. However, as we discuss further in Section 5, the constant efficiency is not an issue when using an alternative method to estimate frontier or upper quartile efficiency challenges.

In general, the long panel estimates were very similar to the short panel estimates. Where the model parameters were dissimilar, the long-panel estimates were within the short-panel confidence intervals. We considered that the five-year panels for water were preferable given that there are 18 companies. However, as there are fewer sewerage companies (10 companies), we chose a seven year panel to allow for additional observations.

2.5. Smoothed versus unsmoothed capex

Capex in network companies is generally ‘lumpy’ over time. This is either due to the need to replace existing assets as and when needed or because expansion of a network is on a stepped basis rather than continuously.
This means that capex does not generally move ‘smoothly’ in line with the cost drivers, which causes difficulties with the modelling estimation. We believe that a partial solution to the problem is to use the smoothed capex, which would be interpreted as annual capex on average over a given period.²⁶ We note that Ofgem used a smoothed capex approach for RIIO-GD1.

The lumpiness of capex for water and sewerage is illustrated at the industry level in Figures 2.1 and 2.2 below. These figures show that unsmoothed capex is lumpy and could possibly result in less robust results (and we note that at the company level capex is even lumpier). The figures also show capex smoothed over a five-year period. Given the length of the dataset available to us, we considered that smoothing over five years (which is also consistent with the price control length) was appropriate.

Figure 2.1: Water capex profile (£m real)
[Chart: water industry capex (£m, 2012-13 prices), 2006-07 to 2012-13. Series: unsmoothed capex; unsmoothed base capex; smoothed capex; smoothed base capex.]

In sewerage, the average effect of smoothing base capex is even more pronounced (see Figure 2.2).

²⁶ We note that there is regulatory precedent for using smoothed capex, for example Ofgem used seven-year smoothed capex for RIIO-GD1.

Figure 2.2: Sewerage base capex profile (£m real)
[Chart: sewerage industry capex (£m, 2012-13 prices), 2004-05 to 2012-13. Series: network unsmoothed base capex; treatment unsmoothed base capex; network smoothed base capex; treatment smoothed base capex.]

We tested the use of the unsmoothed capex measure as the dependent variable and found these models to perform less well than their smoothed capex counterparts. We used smoothed capex in all the models presented in this report.

3. MODEL SELECTION CRITERIA

We developed multiple models at different levels of the water and sewerage value chains. We set out the initial viability testing of these models in our earlier report. As the model development set out in the earlier report dealt only with the specific question of whether totex or total cost models were viable, we did not focus on a relative assessment of the different models. This meant that we had a range of models which varied by functional form, estimation method, variables included and transformations. In order to assess these models five standard criteria were used:

• theoretical correctness;
• statistical performance;
• practical implementation issues;
• robustness testing; and
• regulatory best practice.

Figure 3.1 briefly introduces our general logic in applying the model selection criteria. The following sub-sections discuss these criteria in more detail. While we have tried to keep the criteria as objective as practicable, given the nature of cost assessment modelling some element of subjectivity is required. We also considered that there is a trade-off between the models, e.g. one model may have a more theoretically correct cost function while another may be more parsimonious and have more intuitively appealing coefficients. This may result in us recommending more than one model for use in setting the cost benchmarks and/or baseline.

The flowchart below (Figure 3.1) does not include practical implementation and regulatory best practice as, at this stage, we consider all our models to be relatively easy to implement and in line with regulatory best practice. However, we discuss these two criteria later.

Note, as set out in Section 2 of the CEPA Cost Assessment Report, the initial development of the models was undertaken with due consideration to Ofwat’s Future Price Limits principles.
Given that the models assessed in this report build on those initial models, we believe that each of the models assessed in this report is consistent with these principles.

Figure 3.1: Model Selection Process
[Flowchart in three stages.
Theoretical Correctness: Identify Theoretical Cost Drivers; Functional Form (translog or Cobb-Douglas; interaction between scale and density).
Model Performance: Logical Criteria (sensibility of coefficients and elasticities); Statistical Tests (statistical significance; Hausman / Mundlak testing; goodness of fit; robust standard errors).
Robustness and Selection: Robustness Testing and Model Refinement (dropping observations/refinement; dropping variables/using alternative variables; time-pooling test); Final Model Selection.]

3.1. Theoretical correctness

3.1.1. Cost drivers

Theoretical correctness underlies all the modelling we have undertaken. In discussion with Ofwat,27 we developed the models to reflect how companies’ costs are driven. Therefore, theoretical correctness of the functional form (cost function) should ensure that the models reflect the underlying characteristics of the industry. However, it is important to bear in mind that models are always, to some extent, an abstraction from reality. The model estimation software provides statistical evidence as to whether the models fit the theoretical expectations. The main items considered in terms of theoretical correctness are CD versus translog and the efficiency assumptions.

27 At the beginning of the project discussion also took place with UKWIR.

3.1.2. Functional form

Adopting a translog model (which allows for varying economies of scale across companies) allows for the changing nature of the economies of scale for the vertically integrated water and sewerage companies. As discussed earlier in Section 2.2, this theoretical assumption is consistent with earlier studies of the economies of scale in the industry.
Translog models are, however, less transparent than other model forms (we discuss the transparency issue under the ‘practical implementation issues’ criterion in Section 3.4). CD linear models are easier to replicate, but impose a single degree of economies of scale across the industry, i.e. all companies are assumed to face one of increasing, constant or decreasing returns to scale.

3.1.3. Time varying inefficiency

We also looked at whether a time-varying or a time-invariant efficiency is theoretically more suitable for the length of panel modelled. For longer periods, we would prefer to have time-varying efficiency models (COLS or SFA) as constant efficiency over a longer period of time could be a strong assumption (under RE). We note that this is only a concern if the model residuals are used to make efficiency adjustments.

Functional form cannot be considered independently from statistical performance of the variables in the models, which is discussed in the next criterion.

3.2. Statistical performance

3.2.1. Variables

The theoretical correctness should ensure that the variables included in the models can be justified as driving or affecting the level of costs and that they reflect the underlying characteristics of the industry. We reduced the range of variables included in the models by considering the following factors:

• Statistical significance: is the variable statistically significant? (To be weighed against the other factors below.)
• Sector significance: is the variable one that a priori is expected to be an important explanatory variable?
• Appropriateness of the result: is the sign and impact of the variable what would a priori be expected?

With respect to the last criterion, considering the robustness of the explanation for any variable included was important.
The latter two criteria are particularly important as focusing only on the statistical significance of variables may result in a mis-specified model due to multicollinearity, measurement error in the regressor, etc.

An important aspect affecting the statistical significance of the variables is the correlation between the explanatory variables. The higher the correlation between variables, the less reliable the coefficients for these variables will be, and therefore they will also be less significant. However, the overall predictive power of the model will be unaffected. We can choose between a parsimonious specification, which has the advantage of fewer variables that are more precisely estimated, and a fuller specification, which guards against omitted variable bias and unobserved heterogeneity, but results in coefficients being imprecisely estimated. If the focus is on efficiency measures (derived from the residuals between the estimated and the observed values), the latter may be preferable as it would take into account the full range of factors that affect costs and thus reduce the size of the residuals. On the other hand, this may then impede efforts to judge whether the shape of the frontier (determined by the parameter estimates) is plausible. We provide more detail on these matters in Section 3.3.1.

Furthermore, careful judgement must be exercised when considering the implications of leaving in a variable with an unexpected coefficient. We encountered a few model specifications, particularly in sewerage, in which a few variables fell into this category. In general, we would be less concerned about a variable with an unexpected sign/size that is not statistically significant. However, we still had concerns about using a specification where a coefficient had a large unexpected value, even if it were not statistically significantly different from zero, given the implications for predicting future expenditure.
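The point made in this sub-section about correlation between explanatory variables can be illustrated with the variance inflation factor (VIF), a standard diagnostic that the report itself does not mention. The sketch below is illustrative only: it assumes a model with exactly two regressors, the function names are hypothetical, and the numbers are made up rather than taken from the report's dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """Variance inflation factor when a model has exactly two regressors.

    With two regressors, the R-squared from regressing one on the other
    equals their squared correlation, so VIF = 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical (made-up) log regional wage and log BCIS index values.
log_wage = [0.00, 0.02, 0.05, 0.07, 0.10]
log_bcis = [0.01, 0.02, 0.06, 0.06, 0.11]
print(vif_two_regressors(log_wage, log_bcis))
```

A VIF of 1 means the two regressors are uncorrelated. The larger the value, the more the sampling variance of each coefficient is inflated relative to that case, which is why highly correlated drivers can produce imprecise or oddly signed coefficients without harming the fit of the model as a whole.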
In all the models we have taken the log of the explanatory variables (except for the dummy variables). Log-linear models reduce the risk of heteroskedasticity and allow for easier interpretation of the coefficients. The coefficients on the variables reflect cost elasticities; in other words, if the coefficient on an explanatory variable is 1.0 then a 1% increase in the explanatory variable will lead to a 1% increase in costs.28 Log-linear models are the most common approach in academic and regulatory literature.

In Tables 3.1 and 3.2 below we set out our expectations for plausible ranges of the coefficients on explanatory variables for the water and sewerage models respectively (a more detailed description of the specification of variables is provided in Annex 1). The expectations are based on our in-team knowledge combined with input from engineers at Ofwat, initial UKWIR meetings with the industry cost assessment steering group and review of the academic evidence.29 We set out these expectations on the basis of ignoring the effects of all other variables. We note that the ranges below may not apply in models with high multicollinearity between variables.

In translog models, expectations of the magnitude of the translog variables (i.e. squared and cross-terms) are less clear than for coefficients on first order terms. There are a few reasons for this. First of all, when evaluating at the industry sample mean, the squared and cross-terms drop out such that elasticities at the sample mean are given by the first order term only. When examining elasticities away from the sample mean, these terms inform us of the curvature of the cost function. Therefore, although one may be able to have expectations on the magnitude of cost elasticities and whether these elasticities should be increasing or decreasing with a relevant variable, the speed at which the cost elasticities are changing (controlled by higher order terms) is not clear.
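The statement that elasticities at the sample mean are given by the first order term only can be made explicit with a generic translog cost function. This is a sketch under stated assumptions: the symbols are illustrative, and each driver $X_i$ is assumed to be divided by its sample mean $\bar{X}_i$ before logs are taken, which is one common way of normalising and may differ in detail from the normalisation used in the report.

```latex
% Generic translog cost function in normalised logs (illustrative notation)
\ln C = \alpha + \sum_i \beta_i x_i
        + \tfrac{1}{2} \sum_i \sum_j \gamma_{ij}\, x_i x_j + \varepsilon,
\qquad x_i = \ln\!\left( X_i / \bar{X}_i \right), \quad \gamma_{ij} = \gamma_{ji}

% Cost elasticity with respect to driver i
\frac{\partial \ln C}{\partial \ln X_i} = \beta_i + \sum_j \gamma_{ij}\, x_j

% At the sample mean every x_j = 0, so the elasticity reduces to
\left. \frac{\partial \ln C}{\partial \ln X_i} \right|_{x = 0} = \beta_i
```

Away from the sample mean the $\gamma_{ij}$ terms determine how quickly each elasticity changes, which is why their plausible magnitudes are harder to state in advance. A Cobb-Douglas model is the special case in which all $\gamma_{ij} = 0$.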
Lastly, we note that in the past Ofwat has not used such translog variables in cost assessment, and thus it is harder to appeal to historical precedent to formulate expectations of higher order terms in UK water and sewerage industries. Nonetheless, we did look at cost elasticities associated with these variables (away from the sample mean) but we refrain from including any expectations on magnitude or sign in the following table.

28 Because we normalise all the translog variables to the sample mean the coefficient on the first order term can be interpreted as the elasticity at the sample mean. We note that when the models are used to forecast expenditure, we use the coefficients that have not been normalised. This does not affect the predictive power of the model.
29 For example see Stone and Webster 2004a and Saal et al 2011.

Table 3.1: Range of explanatory variables in water models

Type | Variable | Cost elasticity expectation
Core | Length of mains; Property density; Usage | These scale variables should be the main drivers of costs. Across these variables we would expect a value of above 0.7 and lower than 1.1.30 A value above 1.0 could indicate diseconomies of scale/density. In the models using a translog form, interpretations of the normalised coefficients are at the sample mean.
Core | Time trend | The time trend captures a combination of real price effects (RPE), changes in efficiency and changes in quality not explained by other explanatory variables. We would expect the coefficient to be relatively low, between -0.05 (~-5% per annum) and 0.05 (~5% per annum), as it is only picking up input price inflation above RPI.31
Input prices | Average regional wage | As labour costs make up a relatively high proportion of totex, we would expect the regional wage coefficient to be relatively high and positive, circa 0.6-0.7, but below 1.0, i.e. if wages were 1% higher in a company’s region then we would expect overall costs to be higher but not by more than 1%.
Input prices | Regional BCIS index | The BCIS index effectively acts as a relative (regional) construction price indicator. We would expect the coefficient to follow the same logic as for regional wages but to influence the remaining proportion of totex (that is not labour-related or determined at the national level), i.e. <0.4. This variable should not capture changes over time.
Network characteristics | Population density (occupancy) | As with the core scale variables, we would expect a coefficient of around 0.7 to 1.1.
Network characteristics | Proportion of metered properties | We would expect a relatively small negative coefficient, between -0.1 and 0.0, as metered properties are expected to have lower water consumption than non-metered and hence lower costs. If usage is included in the model it is not clear what the effect will be as the cost difference effect could be picked up in either or both variables. We have excluded this variable in the further model refinement because of the uncertainty of its effect on costs.
Network characteristics | Proportion of usage by metered household properties | We would expect a coefficient of around 0.4 to 0.9 (depending on the proportion of metered properties). (If usage is included in the model it is not clear what the effect will be as the cost difference effect could be picked up in either or both variables.)
Network characteristics | Proportion of usage by metered non-household properties | We would expect a coefficient of around 0.4 to 0.9 (depending on the proportion of non-metered properties). (If usage is included in the model it is not clear what the effect will be as the cost difference effect could be picked up in either or both variables.)
Treatment and sources characteristics | Sources (number of) | We would expect a low positive number as taking water from more sources drives up costs.
Treatment and sources characteristics | Pumping head (x distribution input) | This is used as an energy proxy. As energy is a significant driver of costs we would expect this to be relatively high, say 0.4 to 0.6.
Treatment and sources characteristics | Proportion of water input from river abstractions | We would expect a low positive figure as water from abstractions is expected to lead to higher costs than water from boreholes (our excluded variable). However, this is not always clear because of bankside storage limitations.
Treatment and sources characteristics | Proportion of water input from reservoirs | We would expect a low positive figure as water from reservoirs is expected to lead to higher costs than water from boreholes (our excluded variable).
Activity | Proportion of new meters | We would expect a low positive number as the installation of new meters should drive up capital costs.
Activity | Proportion of new mains | We would expect a low positive number as the installation of new mains could drive up costs.
Activity | Proportion of mains relined or renewed | We would expect a low positive number as the renewal/relining of mains could drive up costs.
Quality | Properties below reference pressure level | We would expect a low negative coefficient as the lower the proportion of properties with inadequate water pressure the higher the capex costs would have been to reach that improvement in quality.
Quality | Leakage | We would expect a low negative number as greater costs may be required to achieve a lower leakage level should leakage behave as a quality variable.
Quality | Properties affected by unplanned interruptions > 3 hrs | We would expect a low negative number as greater costs may be required to achieve a lower level of properties affected by unplanned interruptions should this variable behave as a quality measure.
Quality | Properties affected by planned interruptions > 3 hrs | We would expect a low negative number as greater costs may be required to achieve a lower level of properties affected by planned interruptions should this variable behave as a quality measure. This is an ambiguous driver as planned interruptions could also be a sign of quality improvement or scheduled maintenance.

30 Competition Commission (2000), Mid Kent Water plc: A Report on the References under Section 12 and 14 of the Water Industry Act 1991, p. 267. Professor Stewart, Ofwat’s then academic advisor, estimated a cost elasticity of scale of 0.96.
31 As this is a dummy variable, the coefficient needs to be adjusted using the formula exp(X)-1 to establish the percentage change in costs.

Table 3.2: Range of explanatory variables in sewerage models

Type | Variable | Cost elasticity expectation
Core | Length of sewers; Usage | These scale variables should be the main drivers of costs. Across these variables we would expect a value of above 0.7 and lower than 1.1. A value above 1.0 could indicate diseconomies of scale. In the models using a translog form, interpretations of the normalised coefficients are at the sample mean.
Core | Property density | We expect this to be a main cost driver. However, the sign of the density coefficient is expected to vary between network and treatment/sludge models. In network models, we expect it to carry a positive coefficient due to increased costs associated with operating in urbanised areas. In treatment/sludge models we expect a negative coefficient due to the ability to have larger, more efficient treatment plants serving densely populated areas. For these reasons, the expected sign of the density coefficient in combined models (capturing both network and treatment & sludge) is ambiguous.
Core | Time trend | The time trend captures a combination of real price effects (RPEs), changes in efficiency and changes in quality not explained by other explanatory variables. We would expect the coefficient to be relatively low, <0.05, as it is only picking up input price inflation above RPI.32
Input prices | Average regional wage | As labour costs make up a relatively high proportion of totex, we would expect the regional wage coefficient to be relatively high and positive, circa 0.6-0.7, but below 1.0, i.e. if wages were 1% higher in a company’s region then we would expect overall costs to be higher but not by more than 1%.
Input prices | Regional BCIS index | The BCIS index effectively acts as a relative (regional) construction price indicator. We would expect the coefficient to follow the same logic as for regional wages but to influence the remaining proportion of totex (that is not labour-related or determined at the national level), i.e. <0.4. This variable should not capture changes over time.
Network activity | Proportion of sewers replaced and renovated | We would expect a low positive number as the refurbishment of sewers should drive up costs.
Treatment | Load | This scale variable for sewage treatment should be the main driver of costs. We would expect a value of above 0.7 and lower than 1.1. A value above 1.0 could be taken to indicate diseconomies of scale.
Treatment | Sludge disposed | As a possible substitute for the load variable we would expect similar values, i.e. a value of above 0.7 and lower than 1.1. A value above 1.0 could be taken to indicate diseconomies of scale. Could also be considered a core variable as highly correlated with length.
Treatment | Proportion of load in treatment works size bands 1-3 | We expect a positive coefficient on this variable as works in bands 1-3 tend to be more expensive than band 6 (the omitted proportion) in terms of unit costs due to economies of scale.
Treatment | Proportion of load in treatment works size band 4 | We expect a positive coefficient on this variable as works in band 4 tend to be more expensive than band 6 (the omitted proportion) in terms of unit costs due to economies of scale.
Treatment | Proportion of works load in treatment works size band 5 | We expect a small positive coefficient on this works density variable (if higher size bands are omitted in the model) to take into account the diseconomies of scale of band 5 works relative to band 6.
Treatment | Proportion of works load in treatment works size band 6 | We expect a small negative coefficient on this works density variable if included in a model as it would take into account the economies of scale of band 6 works compared to the lower omitted band(s).
Treatment | Proportion of load undergoing activated sludge treatment | We expect a positive coefficient as this treatment is considered the most expensive treatment type.
Treatment | Number of large works with the tight consent dummy | Based on prior Ofwat large works models, this variable should have a coefficient around 0.1 to indicate higher costs associated with tight consents on ammonia, BOD5, and suspended solids.

32 As this is a dummy variable, the coefficient needs to be adjusted using the formula exp(X)-1 to establish the percentage change in costs.

3.2.2. Hausman test

We used the Hausman test to choose between GLS (RE) and FE models. The test, a standard econometric test for model specification, indicates whether a GLS (RE) functional form is similar to FE. Similarity between GLS (RE) and FE indicated by the Hausman test suggests the assumption of non-correlation between company effects and regressors in GLS (RE) is reasonable (as FE will always be consistent even when the non-correlation assumption breaks down).

In some cases LIMDEP cannot invert the variance-covariance matrix.33 The LIMDEP manual indicates that the best interpretation of this leads to a conclusion that favours the GLS (RE) estimator (this was also supplemented by additional testing described below).

We also applied an alternative method for computing the Hausman test, known as the Mundlak approach. This approach is more general in its testing of correlation between company effects and regressors. The results of the Mundlak test broadly supported our findings from the earlier Hausman tests, reaffirming the preference for GLS (RE) models over FE.
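For reference, the textbook forms of the two tests described above can be written as follows. The notation is illustrative and is not taken from the report or from LIMDEP output: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the fixed and random effects estimates of the $k$ time-varying coefficients, and $\bar{x}_i$ is the vector of company-level means of the regressors.

```latex
% Hausman statistic: distance between FE and RE estimates
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE})
         - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
\;\sim\; \chi^2_k \quad \text{under } H_0

% Mundlak approach: add company means to the RE model and test delta = 0
y_{it} = x_{it}'\beta + \bar{x}_i'\delta + u_i + \varepsilon_{it},
\qquad H_0:\ \delta = 0
```

Under the null hypothesis the company effects are uncorrelated with the regressors, so a small $H$ (or a failure to reject $\delta = 0$) supports GLS (RE). In finite samples the bracketed difference of variance matrices need not be positive definite, in which case the statistic cannot be computed; this is the situation described above in which the matrix cannot be inverted.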
Where there were discrepancies between the findings of the Hausman and Mundlak tests we carried out further testing to isolate correlated variables (i.e. the variables causing the discrepancy between the two testing methods). Once isolated, we assessed the impact of controlling for correlation via the Mundlak approach. We note that controlling for correlation using the Mundlak approach makes the interpretation of coefficients more cumbersome and less transparent. We also found the impact of controlling for correlated variables to be small in the sewerage models and to produce unreasonable results in the water models.

Therefore, the general support of both the Hausman test and Mundlak approach for GLS (RE), the small differences when controlling for correlation when there were discrepancies between testing methods, and considerations of additional issues (e.g. transparency and interpretation of coefficients) led us to conclude that GLS (RE) is the preferred estimation method for our models.

3.2.3. Goodness-of-fit

Ideally we would have liked to assess the ‘goodness-of-fit’ of the models. Unfortunately, in GLS models there is no robust statistical measure of goodness-of-fit (see Greene (2008)).34 As the majority of the models run for the water industry are based on a generalised least squares (GLS) estimator, the R-squared is not applicable. Furthermore, the R-squared tends to be high in log-linear models in general, which adds another layer of uncertainty to this statistic.

An alternative statistical measure of the goodness of fit is the square of the correlation between the observed and the predicted values of the models.35 We note that this measure yields relatively high statistics, and small differences in the statistics should not be used as indicating that a model is more robust.
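The alternative goodness-of-fit measure described above, the squared correlation between observed and predicted values, can be sketched as follows. The function name and the numbers are illustrative assumptions, not values from the report.

```python
import math

def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sop = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    r = sop / math.sqrt(soo * spp)
    return r * r

# Hypothetical (made-up) observed and predicted log costs for five companies.
observed_log_cost = [4.10, 4.35, 4.80, 5.05, 5.60]
predicted_log_cost = [4.15, 4.30, 4.85, 5.00, 5.55]
fit = squared_correlation(observed_log_cost, predicted_log_cost)
print(round(fit, 3))
```

The statistic is unchanged if every prediction is shifted by a constant or multiplied by a positive factor, so it measures how closely predictions move with outcomes rather than whether they are unbiased. This is one reason to treat small differences between models with caution.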
For example, a model with a 0.98 statistic should not be considered more robust than a model with a 0.97 statistic. We have also relied on the stability of the scores to robustness testing. While we have provided standard R-squared statistics for GLS (RE) models in Annex 4 and Annex 5, we warn against their use to avoid misinterpretation.

33 This means that the difference of the two matrices is not positive definite and the Hausman statistic can thus not be generated. Greene provides more detail on this in the LIMDEP manual.
34 Greene, W. H., Econometric Analysis, Sixth Edition, Pearson Prentice Hall, 2008, page 156.
35 We have consulted William Greene on the most appropriate goodness of fit measure for GLS models.

3.2.4. Robust standard errors

Robust standard errors refer to alternative ways of computing standard errors that try to take into account more complex structures within the data. In regular OLS estimation, variances of error terms are assumed to be constant. However, it may be desirable to impose a covariance structure upon the error terms to account more specifically for certain effects. White’s robust standard errors take into account heteroskedasticity, that is, different variances across different companies. Calculating robust standard errors has no impact on the parameter estimates themselves, only on the estimated standard errors and significance of parameter estimates.

White’s standard errors were used consistently in OLS estimation as the assumption of a constant variance is unreasonable. White’s errors were also tested in place of the standard errors calculated via GLS for the random effects models. It was found that these robust standard errors were similar to the GLS standard errors in terms of precision in most cases and would have led to equivalent choices of model selection. Greene also warns against using robust standard errors for GLS as their interpretation is not necessarily straightforward.

3.3. Robustness testing

We carried out several robustness tests, which included removing variables, dropping observations, statistical testing, changes in predictions, and rank correlations with other CEPA models.

3.3.1. Refinement

To get to the selected set of models, we refined them down from the full model specification by removing variables one at a time. We started by removing the non-core variables with the highest p-value (lowest level of significance) until we got to a stable model. This robustness check resulted in the refined models. We also checked the impact of dropping variables on coefficient estimates. We tried to include as much of the value chain as possible, which led to leaving in some variables even if they were not statistically significant.

Further refinement was necessary when, despite being statistically significant, the magnitude and/or sign of a variable was highly different from our a priori expectations (for example BCIS, discussed below). In those cases, besides looking at the coefficients, we also assessed the rank correlations and compared predictions of models covering the same cost area.

We found that the inclusion of two variables which we considered important cost drivers during the earlier phases of this project had unexpected results. These variables were:

• BCIS (in both water and sewerage); and
• Usage (sewerage only).

We discuss our findings with respect to these variables in Text Box 3.1 below.

Text Box 3.1: BCIS and usage

BCIS

All models explicitly take into account regional price differences based on the average regional wage variable and/or the BCIS variable, included on the right-hand side of the equations. This differs from Ofwat’s approach in PR09 in which it made an ex-ante adjustment to modelled opex using regional wages and to modelled capex using BCIS.
We found, unsurprisingly, that the regional wage variable and the BCIS are highly correlated, and that including both in the modelling resulted in odd coefficients (e.g. large and/or negative) and did not improve the predictive power of the models. We found that dropping the BCIS variable brought the coefficients on the other variables more in line with our expectations. We therefore dropped it in a number of models and relied on the average regional wage variable.

Usage

A similar case was made for the usage variable in the sewerage network model. Both OLS and GLS (RE) returned negative coefficients, statistically significant in the case of OLS. This implies that higher levels of usage decrease costs, opposite to what is expected. The result was robust to model refinement as well; dropping BCIS increased the magnitude of this effect. Excluding usage from the network model had little effect on the models’ predictive power, and brought other point estimates more in line with expectations. For these reasons, usage was also dropped from the network model.

Although both BCIS and usage are theoretically important a priori, it is clear from our estimations that there were significant problems with the variables. It is important to note that these variables are imperfect proxies and they may in fact be picking up undesirable effects of other included (or excluded) variables. In the case of BCIS, the data is not comparable year on year and only serves to proxy regional differences in construction prices within the year. It is also highly correlated with wages, the other regional price variable. In the case of usage, the variable tested is defined as load entering system/property. Since load is a measure that captures both the strength of the effluent and its volume, it is impossible to separate the effect attributed only to volume, which is the driver that applies to network activities.
Usage performs better in the full wholesale base model, which is less susceptible to outlier observations and includes treatment costs, driven by the strength as well as the volume of sewage.

While recognising the importance of scale and regional price variables in the models, because of the above reasoning it seems reasonable to drop both the BCIS and usage variables.

In terms of rank correlations, we checked if the efficiency rankings of a model were consistent with those of the other models that covered the same part of the value chain or have the same type of expenditure (e.g. base expenditure, or base plus enhancements). This meant comparing:

• totex model results;
• sewerage network model results;
• treatment & sludge model results; and
• opex plus base capex model results separately.

Rankings and scores that were consistent with other models supported the robustness of our analysis for that particular part of the value chain/expenditure level. However, we note that different estimation methods may make different assumptions about efficiency, which may lead to diverging results. Consequently, it was important that these results were discussed with Ofwat and that robust judgements were formed based on sector knowledge as well as modelling tests; these discussions were an important part of the development and testing process.

3.3.2. Dropping observations

We tested the sensitivity of the models’ outputs by dropping observations. This tested the stability of our coefficients and efficiency scores, and the presence of outliers. We used rank correlations and predictions to compare our models. We preferred models that are less sensitive to outlier observations.

3.3.3. Pooling test

A structural break occurs if the effect of a cost driver changes from one period to the next. We therefore investigated two different scenarios where we thought a structural break was most likely to occur: the onset of the financial crisis and the beginning of the current price control (AMP5).
It is important to note that any variable may be tested for a structural break whether it is justified or not. Therefore, we limited our analysis to variables we thought could display a break from a theoretical/logical standpoint. We chose to investigate the BCIS index, regional wages, usage (sewerage only), and number of sources for water (only tested against AMP5). The first two were directly impacted by the financial crisis through pressure on input prices as demand slowed. The latter two are related to differences in regulatory reporting requirements between AMP4 and AMP5.

We concluded from our testing that there was no evidence of AMP5 affecting the parameters associated with our chosen cost drivers. In general, the onset of the financial crisis did not result in significant sensitivity of the coefficients of our chosen variables (i.e. the interaction term was not statistically significant). There was, however, evidence that the onset of the financial crisis did change the way in which regional wages drove costs in one of our models. Where this was the case, the effect had a negligible impact on forecasts, parameter estimates, and efficiency scores. Furthermore, we note that due to choosing a shorter panel length aimed at alleviating concerns of constant efficiency assumptions in the RE model, the ‘pre-crisis’ coefficients in the water models were based on a single year of data. This reduces the robustness of the ‘pre-crisis’ result. It is for this reason and the negligible impact on results that we concluded that the models were not sensitive to time-pooling.

3.4. Practical implementation issues

We considered that any proposed cost models should be transparent, replicable and stable. This includes ensuring that the models are not too complex (although this potentially involves a trade-off with accuracy and theoretical correctness), that the implications of the results are clear and that the results of the models are objectively reproducible where applicable.
We believe all the models we included in the final round of testing are not unduly complex and can be implemented using standard econometric methods and software.

3.5. Regulatory best practice

When developing new cost assessment models it is appropriate to review how other regulatory agencies carry out similar analyses. While we considered that checking the modelling methodology against that used by other regulators is useful, a different approach may not necessarily be a cause for concern as the data availability and context in which the analysis is undertaken may vary.

We believe the modelling we carried out offers benefits over Ofwat’s previous cost modelling and is more in line with regulatory practice seen at other regulators, e.g. Ofgem and ORR. In particular, the approach utilises panel data, which is advantageous for a number of reasons (inter alia, it increases the sample size, enables variation in efficiency and technical change over time to be studied, and enables efficiency estimates to be derived without recourse to distributional assumptions).36 We also note that the use of a panel data set is in line with the CC recommendations in the Bristol Water case.37 ORR and Ofgem have both developed panel data models for use in their efficiency determinations, for example, Ofgem’s RIIO-GD1 and RIIO-ED1.

The approach is also in line with that of other regulators in seeking to benchmark total costs (or totex), or at least substantial parts of total costs together, rather than separately. Whilst this could potentially have some disadvantages compared to the more disaggregated approach taken by Ofwat in previous price reviews, in that more tailored models could be developed for different cost categories, it has major advantages in terms of addressing potential incentives for capital bias and ensuring that substitution between different categories of expenditure is taken into account.
We note that Ofgem used (and is using) totex benchmarking, in combination with bottom-up benchmarking, for RIIO-GD1 and RIIO-ED1, and in PR08 ORR benchmarked maintenance and renewals together (although it carried out separate assessments for enhancements and operating costs).

Finally, we have used the same data (June Returns) as Ofwat has used for its previous cost modelling, plus the data submitted by companies in August 2013. With respect to our models, we have tested a wider range of variables than covered in Ofwat’s previous work, including quality measures, and our final models may be favourably compared with previous Ofwat models in terms of the number of variables included and the extent to which the coefficients accord with engineering understanding while also being statistically significant.

3.6. Results coding

There is no single method or metric for identifying suitable models mechanistically; rather, a judgement is required in model selection. To facilitate this process, we have adopted an approach based on a ‘traffic-light’ system to indicate how well the model performs against a given criterion, i.e., a ‘green light’ corresponds to ‘good’, an ‘amber light’ corresponds to ‘acceptable but with a few issues’, and a ‘red light’ means that the model is flawed.

In this sub-section we describe the method of assigning traffic lights to a short-list of models. The selection of traffic lights is based on the conclusions for each model summarised in the templates set out in Annex 4 for water and Annex 5 for sewerage. We note that we ran a much more exhaustive range of models than those presented in these annexes, but we pre-selected these as the most viable models.

36 CEPA and Mott MacDonald. Cost assessment – use of panel and sub-company data. May 2011.
37 Competition Commission. Bristol Water Plc Price Determination. 2010.
As we mentioned earlier in the report, all the models presented here are in line with regulatory best practice and there are no obvious concerns about their practical implementation. We therefore only assigned traffic lights for the remaining three categories, i.e. theoretical correctness, statistical performance, and robustness checks.

We considered whether the model meets a set of criteria for each category, listed by priority in the table below. The boundary between Amber and Green depends on whether the model satisfies the top criteria. At this stage, we did not assign a red light to any model for theoretical correctness, as the models had already been narrowed down to include a set of theoretical drivers following discussions with Ofwat and UKWIR and by implementing standard econometric approaches.

The other categories – statistical performance and robustness testing – do allow for a red traffic light, in which case the model would no longer be considered a candidate. For the former, a red light indicates that several of the core parameter estimates are substantially outside the expectations in Tables 3.1 and 3.2 and are statistically significant. For robustness testing it means that either the efficiency scores resulting from the model or the predictions are implausible, or that there is significant evidence for having different coefficients in different time periods.

We considered that any model that received a red light (in any category) should not be used to set cost benchmarks/ baselines.

Table 3.3: Traffic light criteria in order of priority

Red (R)
Theoretical correctness: N/A
Statistical performance: The core parameter estimates are substantially outside the expectations in Tables 3.1 and 3.2.
Robustness check: Overall range of efficiency scores and predictions is not plausible. Pooling tests suggest significant and material differences in coefficients for key variables in different time periods.

Green / Amber (G / A)
Theoretical correctness:
1. Prefer translog over CD functional form, particularly for water where the models are not disaggregated by value chain and there is greater size variation between companies. Preference is based on theoretical reasoning and statistical significance tests of the translog terms. Translog models given Green and CD given Amber, if translog is significant.
2. Are all core theoretical drivers included? If not, given Amber.
Statistical performance:
1. Coefficient estimates largely in line with expectations (based on Tables 3.1 and 3.2) and elasticities relatively sensible. If not, given Amber.
2. How refined is the model? (Statistically significant parameter estimates while including as much of the value chain drivers as possible.) Is N-K >5 for RE?38 The most refined models given Green.
3. Statistical results: goodness of fit/ statistical preference for GLS (RE) over FE. If FE preferred, given Amber.
Robustness check:
1. Sensitivity to dropping observations/ variables. If efficiency scores or predictions are sensitive, given Amber.
2. Are model rankings outliers with respect to other CEPA models at same level of expenditure and value chain disaggregation (see Annex 4 for details)? If so, given Amber.

38 Used as a rule of thumb rather than a hard and fast rule, as we recognise there is no definitive threshold for reduced reliability of GLS (RE) estimates.

4. MODEL SELECTION

4.1. Introduction

In this section we focus on the models we determined to be the most viable, namely those using GLS (RE) or OLS only, and then assess these models against the criteria set out in the preceding section. We do this in turn for water and then sewerage.

4.2. Water

4.2.1. Short list of viable water models

We narrowed down our preferred range of viable water models to 10. Seven of these models are at the totex level, while three use opex plus base capex. We summarise all these models in templates in Annex 4. The templates provide the results from our testing, coefficients and confidence intervals.
A brief description of these models and our assessment of them against our criteria is set out in Table 4.1 overleaf.

Table 4.1: Select water models assessed

Model reference | Description | Theoretical correctness | Statistical performance | Robustness check

Totex
WM1* | Fully specified totex GLS (RE) (translog); includes all theoretical water drivers. | G | R | A
WM2* | Fully specified totex GLS (RE) (translog), but excluding regional BCIS. | G | R | A
WM3 | A COLS version of WM2. | G | A | A
WM4 | Refined totex GLS (RE) (CD); variables included are length of mains, property density, time trend, regional wage costs, population density, proportion of input from river abstractions, and from reservoirs. | A | A | R
WM5 | Refined totex OLS (translog); variables included are length of mains, property density, time trend, regional wage costs, population density, proportion of input from river abstractions, and from reservoirs. | G | G | G
WM6 | GLS (RE) version of WM5. | G | G | G
WM7 | GLS (RE) version of WM5 with BCIS included. | G | R | G

Opex + base capex
WM8 | Refined opex plus base capex GLS (RE) (translog); variables included are length of mains, property density, and their corresponding translog terms, time trend, average regional wage, regional BCIS index, population density, leakage, planned interruptions, proportion of input from river abstractions, and from reservoirs. | G | R | G
WM9 | OLS version of WM8, excluding BCIS. | G | A | G
WM10 | GLS (RE) version of WM9. | G | G | G

* Note: while the GLS (RE) fully specified models ran in our statistical programme (LIMDEP), because the number of explanatory variables exceeded the number of companies it was not clear how the between estimator was calculated. Consequently, we considered that the models failed the ‘Statistical performance’ criteria.

4.2.2.
Water models recommended for triangulation

After giving due consideration to each of the models in Table 4.1, and in discussion with Ofwat, we recommend using a range of specifications (i.e., full and refined, and totex and opex plus base capex). We found that the full and refined models tended to give slightly different results, but given the trade-offs between a richer model (full) and a parsimonious model (refined) discussed earlier, there was no overwhelming reason for preferring one over the other. While the totex model offers the benefit of not requiring unit cost models for the enhancement capex, the opex plus base capex model appeared robust and offered an alternative view on the companies’ efficiency.

In a similar vein, other than the GLS (RE) models being slightly more robust than the COLS models, in most cases there was no clear evidence why one should be preferred over the other. As the models provide different predictions, we believe that using both estimation techniques is appropriate.

We recommend using the following five models, which are based on GLS (RE) and OLS versions of three basic model specifications:
• Full totex (WM3): As it included all the variables we considered to be theoretical drivers, this model is less likely to suffer from omitted variable bias than the refined models. The unexpected results for statistical significance and size/signs of the parameters may be due to multicollinearity, which would not pose issues for the overall predictive power of the model. The Amber in the robustness check category refers to the model’s sensitivity to dropping variables, which we do not consider to be a drawback for a fully-specified model. As explained earlier, models excluding BCIS are more appropriate given the correlation between this variable and regional wage.
• Refined totex (WM5 and WM6): The coefficients are generally as expected and the models have a high rank correlation, despite using different estimation methods.
These models have advantages over the full model in that they are more parsimonious and the coefficients should be more precise.
• Refined base expenditure (WM9 and WM10): Although we prefer totex to avoid capex bias, these opex plus base models are sufficiently robust and in line with expectations to be used in triangulation, along with a unit cost estimate of enhancement. The amber in WM9 reflects the unexpected coefficient on population density, which could be due to multicollinearity. We consider that the models can be used directly in triangulation or as a cross-check for the other totex models.

Comparing the efficiencies of the two refined water models, one can draw conclusions about the difference between base (WM10) and enhancement expenditure (included in WM6). In the base model, companies seem to be slightly closer to the average industry efficiency than in the totex model. This suggests that companies may differ more in the efficiency of their enhancement activities compared to base activities, though this could also be explained by greater heterogeneity in enhancements. We can see this in Figure 4.1 below; it illustrates the range of efficiencies for the 10 companies that are closest to the industry average.

Figure 4.1: Water efficiency ranges [chart: efficiency score (%), on a scale from 60% to 100%, for totex and base expenditure]

4.3. Sewerage

4.3.1. Short list of viable sewerage models

The models we tested in sewerage ranged from sewerage totex models to size-band sub-company models for sewage treatment opex only. As noted in the introduction to this paper, we dropped the sub-company models because, while viable, they failed to capture the linkages across the treatment activity achieved by a more comprehensive model.39

We narrowed our preferred range of models to 10. Two of these models were for network opex plus base capex, four for treatment and sludge opex plus base capex, and four for sewerage wholesale opex plus base capex.
We did not identify any viable models which included enhancement capex. A brief description of these models and our assessment of them against our criteria is set out in Table 4.2 overleaf. We summarise all these models in templates in Annex 5.

39 We considered this as a solution only when more encompassing models did not appear viable.

Table 4.2: Select sewerage models assessed

Model reference | Description | Theoretical correctness | Statistical performance | Robustness check

Network opex + base capex
SM1 | A refined translog GLS (RE) model that covers network base expenditure (opex and base capex); variables included are length of sewers, property density, and the corresponding translog terms, time trend and regional wages. | G | G | G
SM2 | The OLS version of SM1. | G | R | G

Treatment & sludge opex + base capex
SM3 | Fully specified treatment & sludge translog GLS (RE). | A | R | R
SM4 | Slightly refined treatment & sludge CD model (GLS [RE]); variables included are load treated, time trend, regional wages, proportion of load treated by activated sludge, proportion of load treated in size bands 1-3, sludge disposed. | A | G | R
SM5 | A refined treatment & sludge GLS (RE) model that also uses a translog form; variables included are load treated, property density, and the corresponding translog terms, time trend and regional wages. | G | G | G
SM6 | The OLS version of SM5. | G | G | G

Wholesale opex + base capex
SM7 | Fully specified translog GLS (RE) that covers both network and treatment & sludge. | G | A | A
SM8 | The OLS version of SM7. | G | A | A
SM9 | A refined version of SM7; variables included are load treated, property density, and the corresponding translog terms, time trend, regional wages and proportion of load treated in size bands 1-3. | G | G | A
SM10 | A refined version of SM8 (this is also the OLS version of SM9). | G | G | A

4.3.2.
Sewerage models recommended for triangulation

As with the water models, after giving due consideration to each of the models in Table 4.2, and in discussion with Ofwat, we recommend using a range of specifications (i.e., network, treatment and sludge, and sewerage wholesale).

Aside from the network models, other than the GLS (RE) models being slightly more robust than the OLS models, in most cases there was no clear evidence why one should be preferred over the other. As the models provide different predictions, we believe that using both estimation techniques is appropriate. For the network models, the OLS based model contained an unexpected coefficient on the wage variable. This coefficient was highly negative and as such we had concerns about its interpretation and impact on the forecast predictions.

We recommend using five final models in sewerage. We note that none of these models cover enhancement, unlike water. The final cost benchmarks/ baseline estimates will need to be based on these models triangulated with the unit cost models to account for enhancement. The majority of the expenditure in sewerage is treatment and sludge related. The suite that we recommend is thus more treatment and sludge oriented in terms of explanatory variables.

The final selection for triangulation covers the following models:
• Network (SM1): This is a refined network only model. It includes purely network related variables (e.g. length of sewers). The model uses GLS (RE). We believe it is a useful addition to the suite of final models along with the separate treatment models as it offers a bottom-up approach. We did not include an OLS model here as it included an unexpected coefficient on wages.
• Treatment and sludge (SM5 and SM6): These are two treatment and sludge only models, both of which are refined.
These models include key treatment variables, some of which also relate to network to account for possible trade-offs in expenditure between the two business lines.40 Full models did not add much to the predictive power in this part of the value chain (e.g. sludge disposed is highly correlated with load treated). As treatment comprises a significant portion of expenditure, we selected two models here, which provide a range of approaches (GLS [RE] and OLS). These models need to be combined with the network model (SM1) before they can be compared to the wholesale base sewerage models.
• Wholesale base sewerage (SM9 and SM10): These are models that cover the entire sewerage value chain (network and treatment and sludge). They cover the same range of drivers as the network and treatment and sludge models, but the range of variables is more treatment-oriented to account for the higher proportion of expenditure in treatment (therefore load is preferred to length as the key cost driver). These models are refined and did not appear to suffer from multicollinearity. Their predictive power is not very different from that of the full models. The key advantage of these models is combining network and treatment and sludge, which picks up any trade-offs between these two parts of the business. The only difference between these two models is the estimation method, which leads to the low rank correlation between the two models, marked with amber in the robustness check category.

40 For example, there may be a trade-off between having a longer network with one large treatment plant or shorter networks with many small treatment plants (larger treatment plants are usually seen as more efficient).

We believe this range of sewerage models accounts for several issues in sewerage: trade-offs between network and treatment and estimation method differences between GLS and OLS.
However, we note that the trade-offs between these two areas are likely to be less ‘dynamic’ in nature, as the coefficients reflect the historical structure of the sewerage system, and if the models at the disaggregated level of expenditure contain the appropriate cost drivers the trade-off issue should be relatively minor.

In terms of average efficiency, the models demonstrate that most companies perform differently in network and treatment and sludge. The dispersion in treatment and sludge (SM5) is much higher than in network (SM1). In the combined wholesale base sewerage model (SM9), those differences diminish (in particular the spread between upper and lower quartile) because companies that were less efficient in one service often compensate by being more efficient in the other. We also note that in terms of the average efficiency level in the industry, the wholesale base sewerage model is more in line with the treatment model than with the network one, as treatment accounts for the larger proportion of expenditure.

Figure 4.2: Sewerage efficiency ranges [chart: efficiency score (%), on a scale from 60% to 100%, for treatment & sludge, network and wholesale base]

4.4. Other considerations

4.4.1. Time trend

The time trend variable in all the econometric models accounts for the frontier shift, RPEs and changes in quality not captured via the other variables in the model. A positive time trend indicates that the improvement in technology which would lead to savings has been outweighed by RPEs or increases in quality that the industry has paid for. A negative time trend indicates that gains in ongoing efficiency outweigh the other two factors put together. In previous price controls Ofwat has applied RPEs net of ongoing efficiency of between 0.25% (for base opex) and 0.4% (for base capex). In our preferred water models, time trends in totex are not statistically different from 0%, while at the base expenditure level they are around 1%.
This could indicate a range of things, including that ongoing efficiency gains in enhancement have been greater than in maintenance and opex, or that expenditure related to improving quality is contained in maintenance and opex.

In sewerage, we only modelled base expenditure. We see a time trend of around 2% in both network and treatment and sludge. A possible explanation, as we understand it, is that over AMP5 quality in sewerage has been improving and this would likely lead to higher costs in opex and base capex.

4.4.2. Economies of scale

Our modelling results show that there are varying returns to scale/density in both water and sewerage. This is allowed for by the translog specification, whose terms were jointly significant in all models. In water, elasticities with respect to length of mains (size) range between 0.9 and 1.1, suggesting economies of scale for some companies and diseconomies for others. The range is, however, tight, with the average showing relatively constant returns to scale. In sewerage, all companies have elasticities with respect to size less than one, suggesting economies of scale.

It is also interesting that in terms of density, water and sewerage show different shapes of the elasticity curve. We find the extent of returns to density increasing in sewerage and decreasing in water. These results can be interpreted as follows: in sewerage, a denser network facilitates treatment in large works; in water, the density effect seems to be related to higher costs of maintenance work in urban areas.

5. TRIANGULATION

We understand that Ofwat will use the econometric models to forecast the cost benchmarks for the risk-based review in PR14. Given that we were unable to narrow our preferred range of models to a single model for either water or sewerage, we recommend that the results from the preferred list of models be weighted together. We refer to this approach as ‘triangulation’ and we briefly discussed it in the CEPA Cost Assessment Report.
We note that, where the models do not use totex, the results from the unit cost models and any non-modelled costs must be added to achieve a view of the companies’ totex. The raw model estimates may also require an adjustment to avoid log-transformation bias.41 We discuss this in more detail in Annex 7 and we recommend the use of either the ‘alpha factor’ or ‘conditional mean’, but we consider that the final choice of adjustment is up to Ofwat. We note that these adjustments should be applied before triangulating.

We also note that while Annex 4 and Annex 5 show the models’ coefficients at the sample mean for comparison purposes in model selection, the non-normalised coefficients should be used in forecasting AMP6 expenditure. Non-normalised coefficients are the ones resulting from modelling that uses data in which the three translog variables have not been divided by the average of the sample. Annex 8 provides the coefficients that are ready for use in Ofwat’s feeder models and provides further explanation of how those are reconciled with the normalised coefficients.

5.1. Triangulation options

There are a number of ways in which one could triangulate the models’ predictions to yield a final cost benchmark or baseline value. We therefore focused on methods based on the following logical flows:
1. Triangulating based on estimation method to arrive at GLS (RE) water (sewerage) and COLS water (sewerage) estimates that are then combined.
2. Triangulating across disaggregated models to reach a single bottom-up ‘totex’ value and then combining this with a single value from top-down ‘totex’ models.
3. A combination of Options 1 and 2. Triangulate based on estimation method, then bottom-up vs. top-down. This gives us bottom-up (top-down) GLS (RE) and COLS estimates that are then combined.
4. Similar to Option 2, we build ‘bottom-up’ and ‘top-down’ estimates first, but keep a distinction between refined and full models.
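As a rough illustration of the Option 4 flow for water, the sketch below averages model predictions in stages. The equal weights at each stage are our reading of the percentages printed in Figure 5.1 and should be treated as an assumption, as should the function name `triangulate_water`, its parameter names and the figures used. The inputs are assumed to be predictions that have already been adjusted for log-transformation bias.

```python
from statistics import mean


def triangulate_water(full_totex_cols, refined_totex_re, refined_totex_cols,
                      base_re, base_cols, enhancement_unit_costs, unmodelled_costs):
    """Equal-weight triangulation, following our reading of Figure 5.1 (an assumption)."""
    # Step 1: average the two refined totex models into one refined top-down view.
    refined_top_down = mean([refined_totex_re, refined_totex_cols])
    # Step 2: average the two base models, then add enhancement unit costs
    # to build a bottom-up totex view.
    bottom_up = mean([base_re, base_cols]) + enhancement_unit_costs
    # Step 3: weight the full, refined top-down and bottom-up views equally.
    modelled_totex = mean([full_totex_cols, refined_top_down, bottom_up])
    # Step 4: add costs that none of the models cover.
    return modelled_totex + unmodelled_costs


# Made-up predictions for one company (illustrative units only).
water_totex = triangulate_water(
    full_totex_cols=100.0,
    refined_totex_re=90.0,
    refined_totex_cols=110.0,
    base_re=60.0,
    base_cols=70.0,
    enhancement_unit_costs=30.0,
    unmodelled_costs=5.0,
)
print(round(water_totex, 2))
```

Keeping each stage as a named intermediate value mirrors the point that the intermediate bottom-up and top-down estimates are themselves informative, not just the final figure.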
While in practice there is little difference between the results of the triangulation process, we considered additional criteria that led to a single recommendation. These criteria are:
• The intermediate information each option offers, i.e. the usefulness or intuition of information contained in each step.
• Transparency.
• Logical flow, i.e., do the weights make intuitive sense.
• Ease of implementation/ replicability.

Following discussions with Ofwat we concluded that Option 4 best met the criteria set out above. We considered that the preservation of a bottom-up estimate provides useful information from a business plan perspective while being weighted with the encompassing view of a top-down totex model. Furthermore, it maintains a logical split between full and refined top-down models for water only companies. In addition, we believe that the implicit weights applied to each model in this triangulation method are intuitive and logical.42 Option 4 is illustrated in Figure 5.1 and Figure 5.2 for water and sewerage respectively.

41 This is due to Jensen’s inequality.
42 Though the option to set explicit weights remains available, we considered that this would only be required if new information became available suggesting a preference between the aggregate/ disaggregated models.

Figure 5.1: Water triangulation [flow diagram with ‘Triangulate’ and ‘Add’ steps. Elements shown: Full totex COLS; Refined totex RE; Refined totex COLS; Refined totex top-down; Refined base RE; Refined base COLS; Base; Enhancements unit costs; Totex bottom-up; Enhancements unmodelled costs; Water totex. Weights shown: 50% and 33%.]

Figure 5.2: Sewerage triangulation [flow diagram with ‘Triangulate’ and ‘Add’ steps. Elements shown: Network; Treatment RE; Treatment COLS; Treatment; Wholesale base bottom-up; Wholesale base RE; Wholesale base COLS; Wholesale base top-down; Wastewater base; Enhancements unit costs; Enhancements unmodelled costs; Wastewater totex. Weights shown: 50%.]

5.2.
Efficiency adjustments

Model cost estimates are all calculated at the average industry efficiency, and there are several ways of making adjustments to these projections when setting efficiency targets. In essence, they are all ways of shifting the prediction line to the upper quartile (UQ), lower quartile (LQ), or frontier.43 Here we give an example with the upper quartile but the same logic applies to the other adjustments.

5.2.1. Method: ratio- or residual-based

We consider two different methods of calculating the adjustment for upper quartile efficiency:
• based on the residuals from each model; or
• based on the ratio of actual expenditure to predicted expenditure.

We discuss this in more detail in Annex 6, but we provide an overview of their differences below.

Adjusting the predictions based on the regression residuals can only happen at the specific model level. For example, in sewerage this would mean adjusting each of the treatment models, the network model, and each wholesale model by a different percentage based on the upper quartile in each model. However, doing this separately for network and treatment may lead to cherry-picking as there may be trade-offs between network and treatment costs. In other words, a company which is very low cost in terms of treatment may have less scope to be low cost in relation to its network. This becomes a more significant issue in relation to combining the advanced regression results with the unit cost models. We therefore do not recommend applying the residual-based method at the disaggregated level, even though this might be considered a more theoretically correct approach.44

43 These are the three additional values that Ofwat’s RBR benchmarks are based on, though other adjustments are possible using the same method.

The alternative approach is to calculate the lower quartile of the companies’ ratios of actual to predicted costs (corresponding to upper quartile efficiency), as in Equation 5.1.
$$\text{UQ adjustment} = \mathrm{LQ}\left(\frac{\text{actual costs}}{\text{estimated costs}}\right) \tag{5.1}$$

The upper quartile adjustment is then used as a ‘scaling factor’ to shift the companies’ predicted totex.45 The advantage of this approach is that it avoids cherry-picking, as this adjustment can be made after the predictions from all models have been aggregated. We believe that this approach is also more replicable and transparent than the residual-based approach. It does, however, assume time invariant inefficiency across all models as the average is taken across all years.46

5.2.2. Ratio approach: historical or forecast efficiencies

A caveat of the ratio-based approach is that efficiencies can be calculated using either the actual (historical) costs or the companies’ own future forecasted expenditure in the numerator of the equation above. The former compares companies’ performance to their historical benchmark performance, while the latter provides a relative comparison at a point in the future (i.e. over AMP6).

The implication of this is that, by using efficiencies based only on future forecasted expenditure over AMP6, there will be a certain number of companies (at least a quarter) whose cost assessment will result in them meeting their upper quartile target. On the other hand, by using only historical data it is theoretically possible to have any number of companies meet (or fail to meet) their upper quartile target.

Figure 5.3: Historical vs Forecast UQ efficiencies
UQ Based on Historical Costs
• Accounts for historical performance/ trends
• Any number of companies may pass or fail their UQ target
UQ Based on Company Forecasts
• Contingent on companies’ own forecasted costs (forward looking)
• Guaranteed to have 25% pass.

We consider that using the actual expenditure is more consistent with the modelling approach we have adopted and is more independent of the business plan submissions.
It is also likely to set a more challenging target as it does not ‘guarantee’ a certain number of companies will perform better than the upper quartile.

We therefore recommend using the ratio-based efficiency adjustment with historical costs.

44 In particular, the residual approach would hold the RE models to having time invariant inefficiency.
45 If we only had one totex model, the two approaches would be the same.
46 In both the OLS and GLS (RE) models, no decomposition between noise and efficiency is undertaken directly. This adjustment is applied through the use of the upper quartile adjustment.

5.2.3. Where to make the adjustment

In the case of the ratio-based approach, the efficiency adjustment can be made at a number of different points in the triangulation diagram without resulting in cherry-picking. The two options we consider most plausible are:
A. Calculate the upper quartile at the final step of triangulation. That is, triangulate all models and then apply the adjustment.
B. Calculate the upper quartile at the intermediate stage (i.e. bottom-up and top-down estimates adjusted separately) and then triangulate these intermediate UQ estimates to reach a final UQ estimate.

We illustrate these options in Figure 5.4 below. We applied the same criteria as for selecting the triangulation option. We consider that Option A best meets the criteria as it is transparent and logical, and is relatively simple to implement. Moreover, the two options in practice had negligible differences for both water and sewerage.
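The ratio-based adjustment applied at the final step (Option A) can be sketched as follows. This is a minimal illustration under stated assumptions: the company labels and cost figures are made up, each company's ratio is assumed to have been computed already from its historical actual and triangulated predicted costs, and the quartile interpolation rule (`method="inclusive"`) is our choice because the report does not specify one.

```python
from statistics import quantiles


def uq_adjustment(actual, predicted):
    """Lower quartile of actual/predicted cost ratios across companies (Equation 5.1)."""
    ratios = [actual[c] / predicted[c] for c in actual]
    # quantiles(..., n=4) returns the three quartile cut points; the first is the LQ.
    # The 'inclusive' interpolation rule is an assumption, not taken from the report.
    return quantiles(ratios, n=4, method="inclusive")[0]


# Made-up figures: triangulated predictions and historical actual costs.
predicted = {"A": 100.0, "B": 100.0, "C": 100.0, "D": 100.0, "E": 100.0}
actual = {"A": 90.0, "B": 95.0, "C": 100.0, "D": 105.0, "E": 110.0}

factor = uq_adjustment(actual, predicted)
# Option A: scale each company's triangulated prediction by the single factor.
uq_benchmark = {c: p * factor for c, p in predicted.items()}
print(factor, uq_benchmark)
```

Because a single factor is computed after all model predictions have been combined, no company's benchmark depends on picking its best result from individual models.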
Figure 5.4: Options for making the UQ adjustment
[Diagram: triangulation of the water totex models (full totex COLS; refined totex RE and COLS; refined base RE and COLS; enhancements unit costs and unmodelled costs) into top-down and bottom-up estimates and then into water totex, marking where the UQ adjustment is applied under Option A and Option B.]

ANNEX 1: EXPLANATORY VARIABLES

A1.1 Water

Most of the variables that we include in our final models are defined in the same way as those we presented in the CEPA Cost Assessment Report. However, there are a few additional variables[47] and a few variables that we have defined in a different way[48] in water. Not all of the variables in Table A1.1 are used in every model; the table presents a range and the rationale behind the inclusion of each variable.

Table A1.1: Range of explanatory variables in water models

Type: Core
Variable: Length of mains
Definition: Total length of mains at year end
Rationale: Network scale variable and overall business size proxy

Variable: Property density
Definition: Number of connected properties/ length of main
Rationale: Rural vs. urban divide and economies of density indicator

Variable: Usage*
Definition: Potable water/ connected property
Rationale: Network and resource usage and possible proxy for domestic vs. I&C[49] usage; results similar when normalized by population. The definition of this variable has changed and it now excludes non-potable water as it is a third party service, for which costs have been excluded.

Variable: Time trend
Definition: Year dummy
Rationale: Takes into account that the data is for 18 companies over five years and shows the change in costs over the years, including changes in efficiency over time, all other things being equal.

Type: Input prices
Variable: Average regional wage*
Definition: The data is based on the ONS ASHE SOC surveys by region and allocates companies’ service areas to the regions based on Ofwat’s updated county allocation. The wages figure is the average hourly salary excluding overtime based on the number of jobs in the company area.
The data is transformed to real terms using RPI. Please refer to Annex 3 for more information.
Rationale: Input price, one of the main cost drivers; the use of these regional indices does not easily deal with the fact that where companies use contractors they may be brought in from other regions and thus have different underlying input prices.

[47] Highlighted in blue in the table below.
[48] Marked with an asterisk in the table below.
[49] Industrial and commercial.

Variable: Regional BCIS index
Definition: Provided by Ofwat. The variable uses the construction price index from BCIS, which is based on tender rather than output prices, and allocates the BCIS areas to the companies based on population numbers from the 2001 census. The index was adjusted by the population proportion served within each area. We have used a rolling average in the models where capex is smoothed.
Rationale: Input price, one of the main capex drivers.[50]

Type: Network characteristics
Variable: Population density (occupancy)
Definition: Population connected/ number of properties connected at year end
Rationale: Approximates average consumer size (domestic vs. I&C) and can be used to take some of the variation away from usage.

Variable: Proportion of metered properties
Definition: (Metered billed households with external meters + metered billed households without external meters + metered billed non-households)/ number of properties connected at year end
Rationale: Metered customers are assumed to have lower per capita consumption than non-metered customers, thus leading to lower pumping and volume related costs; this variable also captures the wholesale costs related to metering such as installation and replacement.
During the period covered, some companies entered the replacement cycle and others had significant increases in meter penetration, which would lead to a positive correlation between proportion of metered properties and totex; it is not clear which factor would be stronger.

Variable: Proportion of usage by metered household properties*
Definition: Water delivered to billed metered households/ (potable water delivered)
Rationale: In order to estimate the model, one proportion has to be omitted. The omitted variable is non-metered properties and the coefficients on the included variables should be interpreted relative to the one excluded. If the coefficient sign is positive, then metered household properties have higher costs than non-metered properties. We have updated this variable to reflect the exclusion of non-potable water delivered.

Variable: Proportion of usage by metered non-household properties*
Definition: Water delivered to billed metered non-households/ (potable water delivered)
Rationale: The omitted variable is non-metered properties and the coefficients on the included variables should be interpreted relative to that. Proxy for proportion of I&C customers. We have updated this variable to reflect the exclusion of non-potable water delivered.

[50] We understand from Ofwat that the regional BCIS index captures the differences across companies within a year; however, it is not comparable across years as the sample within regions is changing.

Type: Treatment and sources characteristics
Variable: Sources
Definition: Total number of sources/ distribution input
Rationale: It is a safe assumption that there are economies of scale in the resource and raw water distribution part of the business.

Variable: Pumping head
Definition: Pumping head x distribution input
Rationale: Energy proxy: the higher the pumping head and the lift over which water needs to be pumped, the higher the energy usage; used in old Ofwat opex power model.
Variable: Proportion of water input from river abstractions
Definition: Proportion of water input from river abstractions
Rationale: Proxy for water treatment works (WTW) complexity; boreholes are omitted and, considering that borehole water is generally the cheapest type of source to treat, signs are expected to be positive.

Variable: Proportion of water input from reservoirs
Definition: Proportion of water input from reservoirs
Rationale: Same as above

Type: Activity
Variable: Proportion of new meters
Definition: (selective + optant meters installed)/ (Metered billed households with external meters + metered billed households without external meters + metered billed non-households)
Rationale: Enhancement activity

Variable: Proportion of new mains
Definition: New mains/ Total length of mains at year end
Rationale: Enhancement activity

Variable: Proportion of mains relined and renewed
Definition: (mains relined + mains renewed)/ Total length of mains at year end
Rationale: Maintenance activity

Type: Quality
Variable: Properties below reference pressure level
Definition: Properties below reference pressure level/ total properties connected
Rationale: Quality measure: the lower the proportion of properties with inadequate water pressure, the higher the costs because companies have spent or are spending money to improve quality, but the relationship is unclear in the models.

Variable: Leakage
Definition: Leakage volume/ distribution input
Rationale: Quality measure: the lower the leakage, the higher the costs because companies have spent money to reduce it; however, companies with a lot of leakage will have to spend more to deal with it, so it does not always work as a quality variable.

Variable: Properties affected by unplanned interruptions > 3 hrs
Definition: Properties affected by unplanned interruptions > 3 hrs/ total properties connected
Rationale: Service quality measure: the more interruptions, the lower the quality; thus if interruptions decrease, this might be associated with service enhancement and thus higher costs, particularly because these interruptions are unplanned.
Variable: Properties affected by planned interruptions > 3 hrs
Definition: Properties affected by planned interruptions > 3 hrs/ total properties connected
Rationale: Service quality measure: the more interruptions, the lower the quality; thus if interruptions decrease, this might be associated with service enhancement and thus higher costs; planned interruptions, however, may be correlated with maintenance works and may result in a positive sign.

We note that because we have to take the logarithm of the variables and the logarithm of zero is undefined, we substituted the 0s with 0.001 or 0.00001 depending on whether the variable was a proportion (between 0 and 1) or not.

The correlation coefficients between selected variables listed above are shown in Table A1.2 overleaf. To be clear, these are correlation coefficients rather than R-squared values; squaring a coefficient gives the R-squared of the corresponding pairwise relationship. Highly positive correlations (> 0.5) are highlighted in green, while highly negative correlations (< -0.5) are in orange.

Table A1.2: Correlation between selected water variables

Variable A B C D E F G H I J K L M N O P Q R S T U V
Length of mains (A) 1 -0.02 -0.35 -0.03 -0.18 -0.30 -0.03 -0.10 -0.30 0.89 0.02 -0.15 -0.07 0.48 0.62 0.82 -0.09 -0.05 0.70 0.25 0.00 0.87
Property density (B) -0.02 1 0.30 0.76 0.63 0.57 0.71 -0.60 -0.42 0.26 -0.47 -0.26 0.23 -0.51 0.00 -0.15 0.16 -0.34 0.22 -0.33 -0.38 0.40
Usage (C) -0.35 0.30 1 0.46 0.45 0.56 0.25 0.07 -0.12 -0.13 -0.12 0.47 0.32 -0.45 -0.32 -0.39 -0.01 -0.32 -0.34 -0.36 -0.19 -0.08
Average regional wage - entire economy (D) -0.03 0.76 0.46 1 0.85 0.71 0.73 -0.49 -0.06 0.28 -0.38 -0.32 -0.01 -0.52 -0.06 -0.09 0.13 -0.26 0.15 -0.24 -0.22 0.35
Average regional wage (E) -0.18 0.63 0.45 0.85 1.00 0.85 0.61 -0.27 0.19 0.08 -0.16 -0.33 -0.11 -0.68 -0.05 -0.16 -0.02 -0.17 -0.12 -0.15 -0.19 0.14
Regional BCIS index (F) -0.30 0.57 0.56 0.71 0.85 1 0.68 -0.11 0.17 0.01 -0.05 -0.29 -0.02 -0.63 -0.10 -0.34 0.10 -0.28 -0.22 -0.08 -0.13 0.06
Population density (occupancy) (G) -0.03
0.71 0.25 0.73 0.61 0.68 1 -0.40 -0.03 0.29 -0.18 -0.56 -0.04 -0.47 -0.01 -0.16 0.17 -0.36 0.15 -0.07 -0.17 0.31
Proportion of metered properties (H) -0.10 -0.60 0.07 -0.49 -0.27 -0.11 -0.40 1 0.31 -0.18 0.92 0.40 0.19 -0.02 0.11 0.03 -0.16 0.21 -0.45 0.22 0.19 -0.28
Number of sources (I) -0.30 -0.42 -0.12 -0.06 0.19 0.17 -0.03 0.31 1 -0.32 0.39 -0.27 -0.60 -0.27 0.00 -0.07 -0.21 0.25 -0.37 0.12 0.22 -0.42
Pumping head (J) 0.89 0.26 -0.13 0.28 0.08 0.01 0.29 -0.18 -0.32 1 -0.03 -0.22 0.00 0.23 0.52 0.66 0.01 -0.15 0.70 0.22 -0.08 0.94
Proportion of usage by metered household properties (K) 0.02 -0.47 -0.12 -0.38 -0.16 -0.05 -0.18 0.92 0.39 -0.03 1 0.09 0.04 -0.07 0.29 0.16 -0.15 0.20 -0.37 0.28 0.13 -0.15
Proportion of usage by metered non-household properties (L) -0.15 -0.26 0.47 -0.32 -0.33 -0.29 -0.56 0.40 -0.27 -0.22 0.09 1 0.49 0.00 -0.20 -0.08 -0.15 0.01 -0.23 -0.27 -0.01 -0.19
Proportion of water input from river abstractions (M) -0.07 0.23 0.32 -0.01 -0.11 -0.02 -0.04 0.19 -0.60 0.00 0.04 0.49 1 -0.12 -0.13 -0.19 0.14 -0.09 -0.04 -0.07 -0.12 0.15
Proportion of water input from reservoirs (N) 0.48 -0.51 -0.45 -0.52 -0.68 -0.63 -0.47 -0.02 -0.27 0.23 -0.07 0.00 -0.12 1 0.20 0.37 0.02 0.04 0.31 0.28 0.11 0.19
Proportion of new meters (O) 0.62 0.00 -0.32 -0.06 -0.05 -0.10 -0.01 0.11 0.00 0.52 0.29 -0.20 -0.13 0.20 1 0.53 -0.09 0.01 0.23 0.23 -0.06 0.51
Proportion of new mains (P) 0.82 -0.15 -0.39 -0.09 -0.16 -0.34 -0.16 0.03 -0.07 0.66 0.16 -0.08 -0.19 0.37 0.53 1 -0.17 0.03 0.50 0.14 0.04 0.62
Proportion of mains renewed or relined (Q) -0.09 0.16 -0.01 0.13 -0.02 0.10 0.17 -0.16 -0.21 0.01 -0.15 -0.15 0.14 0.02 -0.09 -0.17 1 -0.06 0.14 -0.01 0.47 0.02
Properties below reference pressure level (R) -0.05 -0.34 -0.32 -0.26 -0.17 -0.28 -0.36 0.21 0.25 -0.15 0.20 0.01 -0.09 0.04 0.01 0.03 -0.06 1 -0.17 0.11 0.34 -0.19
Leakage (S) 0.70 0.22 -0.34 0.15 -0.12 -0.22 0.15 -0.45 -0.37 0.70 -0.37 -0.23 -0.04 0.31 0.23
0.50 0.14 -0.17 1 0.16 0.14 0.74
Properties affected by unplanned interruptions > 3 hrs (T) 0.25 -0.33 -0.36 -0.24 -0.15 -0.08 -0.07 0.22 0.12 0.22 0.28 -0.27 -0.07 0.28 0.23 0.14 -0.01 0.11 0.16 1 0.19 0.13
Properties affected by planned interruptions > 3 hrs (U) 0.00 -0.38 -0.19 -0.22 -0.19 -0.13 -0.17 0.19 0.22 -0.08 0.13 -0.01 -0.12 0.11 -0.06 0.04 0.47 0.34 0.14 0.19 1 -0.10
Distribution input (V) 0.87 0.40 -0.08 0.35 0.14 0.06 0.31 -0.28 -0.42 0.94 -0.15 -0.19 0.15 0.19 0.51 0.62 0.02 -0.19 0.74 0.13 -0.10 1

A1.2 Sewerage

The set of variables that we include in sewerage no longer includes any drivers at the sub-company level. It also includes additional drivers for treatment and sludge. We have used the same notation to indicate if a variable has been updated or added since the CEPA Cost Assessment Report.

Table A1.3: Range of explanatory variables in sewerage models

Type: Core
Variable: Length of sewers
Definition: Total length of sewers at year end
Rationale: Network scale variable

Variable: Density*
Definition: (water and sewerage properties connected + sewerage only properties connected)/ length of sewers
Rationale: Rural versus urban divide and another economies of density indicator

Variable: Usage*
Definition: Total load entering system/ properties connected[51]
Rationale: Network usage and possible proxy for domestic versus industrial and commercial (I&C) usage. Since load measures both strength and volume of the sewage that goes into the system and only the volume affects the network costs, it may not be a perfect proxy.

Variable: Time trend
Definition: Year dummy
Rationale: Takes into account that the data is for 10 companies over nine years and shows the change in costs over the years, all other things being equal.

Type: Input prices
Variable: Average regional wage*
Definition: The data is based on the ONS ASHE SOC surveys by region and allocates companies’ service areas to the regions based on Ofwat’s updated county allocation. The wages figure is the average hourly salary excluding overtime based on the number of jobs in the company area.
The data is transformed to real terms using RPI. Please refer to Annex 3 for more information.
Rationale: Input price is one of the main cost drivers; assumption is that there is little outsourced outside the region of the company’s operation.

Variable: Regional BCIS index
Definition: Provided by Ofwat. The variable uses the construction price index from BCIS, which is based on tender rather than output prices, and allocates the BCIS areas to the companies based on population numbers from the 2001 census. The index was adjusted by the population proportion served within each area. The index is originally reported in real terms.
Rationale: Input price is one of the main capex drivers

[51] Properties connected include both household and non-households.

Type: Network activity
Variable: Proportion of sewers replaced and renewed
Definition: (Critical sewers replaced + non-critical sewers replaced + critical sewers renewed + non-critical sewers renewed)/ Total length of sewers at year end
Rationale: Maintenance activity

Type: Treatment and sludge
Variable: Load
Definition: Total load in kg BOD5[52]/day
Rationale: Size/scale variable and a main cost driver

Variable: Sludge disposed
Definition: Total volume (‘000 tonnes) of dry solids (ttds)
Rationale: Size/scale variable and a main cost driver

Variable: Proportion of load in treatment works size bands 1-3
Definition: (Load in band 1 + Load in band 2 + Load in band 3)/ total load
Rationale: This variable should be interpreted in reference to proportion of load in the omitted size band, usually band 6. Since Bands 1-3 tend to be more expensive than higher bands in terms of unit costs due to diseconomies of scale, it is expected that a higher proportion of 1-3 load would lead to higher costs.

Variable: Proportion of activated sludge treatment
Definition: Load subject to secondary and tertiary activated sludge treatment/ Total load
Rationale: As this is considered the most expensive type of treatment from the ones reported, coefficient sign is expected to be positive. Interpreted against all other treatment type proportion.
Variable: Number of large works with the tight consents dummy
Definition: Count of all dummy variables for works with tight consent on suspended solids (SS), Biological Oxygen Demand (BOD5) and ammonia;[53] 1 if tight consent exists on both SS and BOD5 or ammonia
Rationale: As tight consent requires companies to meet certain discharge quality, this will lead to higher opex. It also partially picks up economies of scale.

The correlation coefficients in sewerage are shown in Table A1.4 below.

[52] Biological Oxygen Demand.
[53] Thresholds for the determination of consent: 30 mg/l for suspended solids, 20 mg/l for BOD, 5 mg/l for ammonia; if the level of each of these items is below the threshold, tight consent is equal to 1 as it is more expensive to achieve a lower or tighter concentration in the consent.

Table A1.4: Correlation between selected sewerage variables

Variable A B C D E F G H I J K L M N O
Length of sewers (A) 1 -0.09 -0.11 0.71 0.49 0.35 0.96 0.98 -0.31 -0.61 -0.48 -0.53 0.56 0.51 0.95
Property density (B) -0.09 1 -0.09 0.24 0.39 0.49 0.05 0.07 -0.02 -0.26 -0.42 -0.60 0.48 0.45 0.01
Usage (C) -0.11 -0.09 1 0.01 -0.28 -0.33 -0.07 -0.04 0.17 -0.33 -0.48 -0.26 0.37 0.22 -0.23
Average regional wage full economy (D) 0.71 0.24 0.01 1 0.82 0.72 0.80 0.77 -0.31 -0.59 -0.49 -0.59 0.58 0.59 0.63
Average regional wage (E) 0.49 0.39 -0.28 0.82 1 0.83 0.59 0.54 -0.35 -0.39 -0.20 -0.49 0.38 0.39 0.45
Regional BCIS index (F) 0.35 0.49 -0.33 0.72 0.83 1 0.49 0.43 -0.19 -0.25 -0.16 -0.39 0.29 0.47 0.32
Sludge disposed (G) 0.96 0.05 -0.07 0.80 0.59 0.49 1 0.98 -0.31 -0.59 -0.52 -0.58 0.59 0.61 0.90
Total load (H) 0.98 0.07 -0.04 0.77 0.54 0.43 0.98 1 -0.32 -0.67 -0.60 -0.64 0.67 0.61 0.93
Sewers replaced and renovated (I) -0.31 -0.02 0.17 -0.31 -0.35 -0.19 -0.31 -0.32 1 0.13 0.06 0.05 -0.08 -0.04 -0.30
Proportion of load in treatment works size bands 1-3 (J) -0.61 -0.26 -0.33 -0.59 -0.39 -0.25 -0.59 -0.67 0.13 1 0.87 0.83 -0.93 -0.49 -0.56
Proportion of load in treatment works size band 4 (K) -0.48 -0.42
-0.48 -0.49 -0.20 -0.16 -0.52 -0.60 0.06 0.87 1 0.84 -0.95 -0.64 -0.47
Proportion of load in treatment works size band 5 (L) -0.53 -0.60 -0.26 -0.59 -0.49 -0.39 -0.58 -0.64 0.05 0.83 0.84 1 -0.95 -0.73 -0.53
Proportion of load in treatment works size band 6 (M) 0.56 0.48 0.37 0.58 0.38 0.29 0.59 0.67 -0.08 -0.93 -0.95 -0.95 1 0.68 0.55
Proportion of activated sludge treatment (N) 0.51 0.45 0.22 0.59 0.39 0.47 0.61 0.61 -0.04 -0.49 -0.64 -0.73 0.68 1 0.48
Number of tight consent large works (O) 0.95 0.01 -0.23 0.63 0.45 0.32 0.90 0.93 -0.30 -0.56 -0.47 -0.53 0.55 0.48 1

ANNEX 2: ALTERNATIVE VARIABLES

Besides the variables included in Tables 2.1 and 2.2, we considered and tested a range of alternative or additional variables. For the variables we considered but could not test, data was either not available or was not sufficiently reliable for the time period modelled. Here we mainly discuss the variables that we tried as alternative measures and briefly touch on the ones we would have liked to test.

A2.1 Water

In water, we considered a few variables in addition to those discussed in Table 2.1.

Table A2.1: Alternative variables explored in water

Alternative variable: Unsmoothed costs
Original variable: Smoothed costs
Use: Capex profile
Reason for rejection: Unsmoothed capex is relatively volatile over the years. Models with smoothed capex are more robust.

Alternative variable: Quality deltas (change in quality)
Original variable: Quality variables defined in Table A1.1
Use: Measure quality
Reason for rejection: Not significant and not acting like a quality variable.

Alternative variable: Quality lags
Original variable: Quality variables defined in Table A1.1
Use: Measure quality
Reason for rejection: Not significant and not acting like a quality variable.

Alternative variable: Pumping head
Original variable: Pumping head x distribution input
Use: Energy proxy
Reason for rejection: Identical results

Alternative variable: Distribution input
Original variable: Water delivered
Use: In usage variable
Reason for rejection: Models with alternative variable did not perform better than original variable.
Alternative variable: Non-normalised quality and activity variables
Original variable: Normalised
Use: Measure quality and activity
Reason for rejection: Takes away from the core coefficients

Alternative variable: Gross weekly average regional wage including all occupations
Original variable: Hourly average regional wage, weighting two SOC options
Use: Regional wage proxy
Reason for rejection: Includes a range of occupations not applicable to the water and sewerage industry. We tried several alternatives, which are discussed in Annex 3.

Alternative variable: Serviceability dummy
Original variable: N/A
Use: Quality measure
Reason for rejection: This is not feasible to collect going forward as objectivity is compromised when companies self-assess serviceability.

In addition, we tested various other independent variables in the modelling, but ruled them out because of low data quality and/or low variance during the given period (even between companies). These variables included:

• refurbished water treatment works (poor data quality);
• number of water treatment works (poor data quality);
• new/replaced water treatment works (poor data quality);
• capacity of water treatment works for maintenance (poor data quality);
• Security Of Supply Index (no variation in data);
• internal floods (sets adverse cost incentives);
• external floods (no data for the entire period); and
• population equivalent (p.e.) of refurbished sewage treatment works (poor data quality).

A2.2 Sewerage

We also considered alternative measures to those set out in Table 2.2, but decided against incorporating them in the preferred sewerage models for various reasons.

Table A2.2: Alternative variables explored in sewerage

Alternative variable: Unsmoothed costs
Final variable: Smoothed costs
Use: Capex profile
Reason for rejection: Unsmoothed capex is relatively volatile over the years. Models with smoothed capex are more robust.

Alternative variable: Pumping station capacity
Final variable: N/A
Use: Proxy for energy use
Reason for rejection: Only one year of data available. Should be already captured by density and load as it would be correlated with size. No benefit of including it as a constant.
Alternative variable: Load entering system/ length of sewerage
Final variable: Load entering system/ properties connected
Use: Usage measure
Reason for rejection: Highly collinear with density and thus makes the interpretation of coefficients less transparent.

Alternative variable: Load entering system
Final variable: Load treated
Use: Load measure in wholesale model
Reason for rejection: Treatment comprises a larger proportion of the wholesale base costs and thus the model should be more ‘treatment’-oriented. Some companies do not have all of their load entering system being treated under the economically regulated entity, so using load entering system (a network variable) for treatment is not appropriate.

Alternative variable: Load treated/ load entering system

Alternative variable: Smoothed wage
Final variable: Unsmoothed wage
Use: Wages in the water industry are not as volatile year on year as the proxy used (based on household survey in the regions).
Reason for rejection: Models with alternative variable did not perform better than original variable

Alternative variable: Wage index (100 + Ofwat wage differential %)
Final variable: Unsmoothed wage (level)
Use: To capture regional differences, not over time
Reason for rejection: Models with alternative variable did not perform better than original variable

Alternative variable: Sludge treated
Final variable: Sludge disposed
Use: Not all sludge treated is disposed in that year
Reason for rejection: Insufficient data

Alternative variable: Proportion of load subject to three tight consents
Final variable: Number of large works with the tight consents dummy
Use: Sludge quality and treatment requirements
Reason for rejection: Models with alternative variable did not perform better than original variable.
Alternative variable: Flooding incidents (overloaded sewers)
Final variable: N/A
Use: To capture quality of network services, specifically internal flooding
Reason for rejection: Quality measure: the higher the number of floods, the lower the quality, the lower the cost; as flood incidents are also a source of opex, the variable may work not as a quality measure but as a maintenance driver

Alternative variable: Flooding incidents (overloaded + equipment failure + blockages)
Final variable: N/A

Alternative variable: Proportion of load in band 4 & Proportion of load in band 5 & Proportion of load in band 6
Final variable: Proportion of load in bands 1-3
Use: To capture economies of scale in the size of treatment facility
Reason for rejection: Models with a combination of these were tested. The coefficients were most reasonable when only controlling for bands 1-3. This only changes the interpretation of the coefficients, since by controlling for 1-3 we interpret the coefficients vis-à-vis bands 4-6.

Alternative variable: Serviceability
Final variable: N/A
Use: Quality measure
Reason for rejection: This is not feasible to collect going forward as objectivity is compromised when companies self-assess serviceability.

ANNEX 3: REGIONAL WAGES

We have tested several options for the wage variable to take into account regional differences in labour costs. All of them are based on data collected from the ONS ASHE survey, allocated to the territory of operation of each company. In our final models, we use a variable that is based on Table 15.6a of the ASHE series, which provides regional hourly earnings by occupation category, excluding overtime pay. This annex describes in detail how we constructed the variable and what alternatives we have tested.

A3.1 Constructing the regional wages variable

ASHE reports the mean and median earnings in its data series. In our analysis, we have decided to use the mean as it better captures the distribution of earnings within the occupation category. In its RIIO-ED1 modelling for Ofgem, Frontier Economics tested both mean and median estimates and concluded that the mean was statistically more robust than the median.
We also considered using weekly instead of hourly pay to proxy differences in regional wages. Weekly pay may be capturing differences in company policies and in efficiency. For example, if employees in one company work 40 hours a week while employees in another company work 35 hours a week, doing the same job, this would mean that the weekly wages would allow for that inefficiency. We therefore consider hourly wages to be a better proxy for regional discrepancies outside company control. We have excluded overtime pay for similar reasons: it may be a better proxy for differences in company policy (in any industry) than for regional differences.

Ideally we would like to use a proxy for regional wage differences in water and sewerage, but narrowing it down to the industry we are modelling would lead to an endogeneity problem. If we use the industry-specific wage reported in the ASHE (SIC series), the companies have the ability to directly influence that data in the future, and the driver would no longer be outside of their control.

On the other hand, we would prefer to capture wages in occupations that are more comparable to the water and sewerage sector, rather than using the overall economy differences. Occupations which substantially drive the overall wage differences, particularly in London, such as banking and law, would not be representative of water and sewerage and would thus reduce the proxy power of our variable. We therefore weight together two types of occupations, which we consider are predominant and best capture the regional differences between water and sewerage companies. The categories we exclude, and our reasons for excluding them, are summarised in Table A3.1 below.
Table A3.1: Excluded occupation categories

| Category excluded | Reason |
|---|---|
| Managers and senior officials | We assume at this managerial level there should be a national market |
| Associate professional and technical occupations | Relevant ones are already covered in the professional occupations |
| Agriculture, textile, etc | Not relevant |
| Sales and customer service occupations | Call centres should be retail |
| Personal service occupations | Not relevant |
| Transport and mobile machine drivers and operatives | Not relevant |
| Elementary occupations | Not relevant |

One of the occupations that we include in our wage variable proxies specialist labour, such as engineers, while the other one proxies skilled construction labour. We have used the 2-digit SOC level for each of those:

• Specialist: 21 - Science, research, engineering and technology professionals; and
• Skilled: 53 - Skilled Construction And Building Trades.

We also considered using the 1-digit SOC occupations but that includes occupations that are not applicable to the water and sewerage industry. For example, professional occupations (1-digit alternative for the specialist proxy) includes health workers, teaching professionals, social workers, legal advisers, etc. Although companies may have a few internal lawyers, we assume they do not represent a large proportion even if you look at the wage bill rather than wage level. Data in 3- and more-digit occupation categories are less robust because they rely on smaller sample sizes and may also create industry bias.

In terms of the skilled proxy, we also tested a combination of the occupation above (53) and Process, Plant And Machine Operatives (81). The movements across regions are very close to those of only using 53 and we thus think it would not add value to the analysis.
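The construction of a company-specific wage from the two occupation categories can be sketched as follows. This is an illustration only: the function name, the regional shares and the hourly wages are hypothetical, and the 60:40 weighting in favour of skilled labour is the ratio the report goes on to adopt from Ofgem's DPCR5 approach.

```python
def company_wage(regional_wages, regional_shares, w_skilled=0.6):
    """Blend specialist (SOC 21) and skilled (SOC 53) hourly wages by region,
    then average across regions using the company's regional weights.

    regional_wages: {region: (specialist_hourly, skilled_hourly)}
    regional_shares: {region: share of the company's area in that region}
    """
    total_share = sum(regional_shares.values())
    wage = 0.0
    for region, share in regional_shares.items():
        specialist, skilled = regional_wages[region]
        # Occupation weighting: 60% skilled, 40% specialist by default.
        blended = w_skilled * skilled + (1 - w_skilled) * specialist
        # Regional weighting: shares are normalised so they sum to one.
        wage += (share / total_share) * blended
    return wage


# Hypothetical real hourly wages (specialist, skilled) and regional weights.
wages = {"London": (22.0, 14.0), "South East": (20.0, 12.0)}
shares = {"London": 0.5, "South East": 0.5}

print(round(company_wage(wages, shares), 2))
```

Changing `w_skilled` reproduces the kind of sensitivity check on the occupation weights that the text describes.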
Ofgem used a similar approach in DPCR5 and consulted companies on the relative weighting of these two categories of occupation (specialist and skilled) in the different parts of the distribution value chain. On average they assigned a 60:40 ratio in favour of skilled labour. We checked the implicit weight of these two types of occupations in the ASHE sample and compared it to Ofgem’s assumption. We also tested the sensitivity of the overall regional differences to assigning widely different weights to the two occupations. The impact was minor and we have thus kept the 60:40 weights in both water and sewerage.

The ASHE SOC data in Table 15.6 provides data at the national and regional level. Unlike the Table 7 series (which is not broken down by occupation), it does not have local area breakdowns. However, Ofwat weighted the local area allocations that it had done internally up to the regional level and we were able to use those regional weights to construct the company-specific wage variable. The weights used vary between water and sewerage because the companies often cover different territories for each service.

We note that the specialist proxy occupation category that we use (21) was reported in the ASHE as Science And Technology Professionals prior to 2010-11. This change would be consistent across regions and companies and we therefore expect that the variable would still be picking up regional differentials. This structural break, however, may result in changes in the interpretation of the time trend and in the significance of the regional wage variable. The pooling test that we conducted does not indicate that the elasticity of cost with respect to wage is significantly different in the two periods (pre and post 2010-11). Since the last year of actual ASHE SOC data available is 2011-12, we have extrapolated the variable for 2012-13 so that it can be used in our analysis.
We have assumed that the regional differences have remained the same and that the level of real wages has not changed.

A3.2 Alternative regional wage variables

We tested a few alternative wage variables. All of them were based on the ONS ASHE data.

A3.2.1 Whole economy, local area level

One of them used the gross weekly pay (including overtime) reported in Table 7, which provides a breakdown by local area. The theoretical disadvantages of that variable were:

• The allocation of local areas to companies’ territory makes the implicit and bold assumption that if a company requires work to be done, say in Islington, it would hire someone from Islington to do it, rather than someone from its wider region of operation; and
• It encompasses all occupations (including bankers, lawyers, agricultural workers, etc), which would overestimate the differential between rural and urban areas.

This variable resulted in a much wider range of wage levels across companies (around 30%), mainly driven by outlier observations. Such differences are highly unrealistic, considering that water and sewerage workers carry out similar activities. This variable is highly correlated with the selected regional wage proxy. Moreover, using this wage variable produced a much higher time trend. This could be because the time trend is offsetting the downward trend in the real wage variable constructed this way. Using this variable (and the corresponding high time trend) would then mean assuming that in the future real wages will go down in the same way, which is a bold assumption.

A3.2.2 BCIS-style relative level

We also tested using a wage variable that is constructed similarly to the BCIS variable, i.e. it does not take account of changes in wages over time. Because it measures only the relative level of wages in each year, changes in wages over time would be picked up in the time trend and values would not be comparable year on year.
The use of this index-like variable (between 0 and 2) resulted in the BCIS and the time trend picking up the wage effect. These models did not have superior predictive power to the ones using the real wage variable described in A3.1. Because of the implications for forecasts and the benefits of capturing the relationship between wages and costs over time, we therefore preferred using the regional real wage variable based on a few occupations, weighted together.

ANNEX 4: WATER TEMPLATES

WM1: Totex full translog RE

Basic description
Value chain element: Water
Expenditure level: Totex
Capex smoothing: 5 years
Functional form: Translog
Estimator: GLS
Data structure: Panel, random effects
Number of observations: 90
Number of independent variables: 28

Econometric results

| Variable | Coefficient |
|---|---|
| Constant | -0.29124 |
| Length of mains | 0.90529*** |
| Density | -0.1413 |
| Usage | -0.14018 |
| Length^2 | -0.02157 |
| Density^2 | 1.18367** |
| Usage^2 | 0.51774 |
| Length x Density | 0.66907*** |
| Length x Usage | 0.05205 |
| Density x Usage | -0.93588 |
| Time trend | 0.00188 |
| Average regional wage | 1.23852*** |
| Population density | -0.68133 |
| Proportion of metered properties | -0.41302 |
| Sources | -0.25322*** |
| Pumping head | 0.14322* |
| Proportion of water input from river abstractions | 0.00164 |
| Proportion of water input from reservoirs | -0.01667 |
| Proportion of new meters | 0.01179 |
| Proportion of new mains | -0.02076 |
| Proportion of mains restored/renovated | 0.04406*** |
| Properties below reference pressure level | 0.00097 |
| Leakage volume | -0.15273 |
| Properties affected by unplanned interruptions > 3 hrs | 0.01949 |
| Properties affected by planned interruptions > 3 hrs | 0.01111 |
| Proportion of usage by metered household properties | 0.24588 |
| Proportion of usage by metered non-household properties | -0.28900** |
| Regional BCIS | 0.1277 |
Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Many variables are insignificant, and many have an unexpected sign/magnitude. This is not unexpected given the number of explanatory variables and the sample size, and is also likely due to multicollinearity. For these reasons we give this model Amber A in statistical performance.
Goodness of fit: 0.996435 (Adjusted R-squared: 0.994798607)
Hausman test: supports the selection of random effects

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Rankings and scores very similar to other full totex models but slightly different from refined models. Scores also differ from the base models which exclude enhancement.
Robustness to specification: The refinement of this model showed sensitivity of coefficients to dropping variables (likely in part due to multicollinearity). Therefore given Amber A.
Pooling test: Pooling of variables across time was tested. Evidence only of very small and immaterial wage pooling pre/post financial crisis (taken as 2008/09). This finding was seen as less robust because the pre-crisis coefficient was based on a single year of data.
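As a cross-check on the goodness-of-fit figures reported for WM1, the adjusted R-squared can be recomputed from the reported R-squared, the number of observations and the number of independent variables. The sketch below assumes the textbook adjustment 1 - (1 - R²)(n - 1)/(n - k - 1), with k taken as the reported number of independent variables (28). The function name is illustrative and not taken from the report. Under that assumption the reported adjusted value is reproduced to the precision shown.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_vars: int) -> float:
    """Textbook adjusted R-squared.

    Assumption: n_vars is the reported number of independent variables
    and the intercept is accounted for by the extra -1 in the denominator.
    """
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_vars - 1)


# WM1 figures as reported in the template: R-squared 0.996435, 90 observations,
# 28 independent variables.
wm1_adjusted = adjusted_r_squared(0.996435, n_obs=90, n_vars=28)
print(f"WM1 adjusted R-squared: {wm1_adjusted:.9f}")
```

The same function can be applied to the other templates by substituting their reported R-squared and variable counts.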
Companies                         Cost efficiency    Rank
Anglian Water Services            89.2%              14
Dwr Cymru Cyfyngedig (Welsh)      90.5%              12
Northumbrian Water Ltd            95.3%              3
Severn Trent Water Ltd            100.0%             1
South West Water Ltd              93.8%              6
Southern Water Services Ltd       93.5%              8
Thames Water Utilities Ltd        91.7%              11
United Utilities Water Plc        80.9%              18
Wessex Water Services Ltd         89.5%              13
Yorkshire Water Services Ltd      93.7%              7
Affinity Water                    86.6%              16
Bristol Water plc                 84.0%              17
Dee Valley Water Plc              92.4%              10
Portsmouth Water Ltd              94.1%              5
Sembcorp Bournemouth Water        92.9%              9
South East Water Ltd              94.9%              4
South Staffordshire Cambridge     96.6%              2
Sutton & East Surrey Water Ltd    86.9%              15

WM2: Totex translog RE without BCIS

Basic description
Value chain element:               Water
Expenditure level:                 Totex
Capex smoothing:                   5 years
Functional form:                   Translog
Estimator:                         GLS
Data structure:                    Panel, random effects
Number of observations:            90
Number of independent variables:   27

Econometric results

Variable                                                   Coefficient    Statistically significant; Expected sign/magnitude of coefficient (confidence interval)
Constant                                                   -0.406
Length of mains                                            0.89700***     ✓ ✓
Density                                                    -0.13344
Usage                                                      -0.14851
Length^2                                                   -0.02536
Density^2                                                  1.14365**      ✓
Usage^2                                                    0.56372
Length x Density                                           0.68183***     ✓
Length x Usage                                             0.07234
Density x Usage                                            -0.9283
Time trend                                                 0.00176        ✓
Average regional wage                                      1.23810***     ✓ ✓
Population density                                         -0.68345
Proportion of metered properties                           -0.36818
Sources                                                    -0.24802***    ✓
Pumping head                                               0.15226**      ✓ ✓
Proportion of water input from river abstractions          0.00167        ✓
Proportion of water input from reservoirs                  -0.01871*      ✓
Proportion of new meters                                   0.01122        ✓
Proportion of new mains                                    -0.02081
Proportion of mains restored/renovated                     0.04454***     ✓ ✓
Properties below reference pressure level                  0.00102
Leakage volume                                             -0.1549        ✓
Properties affected by unplanned interruptions > 3 hrs     0.01959
Properties affected by planned interruptions > 3 hrs       0.01074
Proportion of usage by metered household properties        0.21248        ✓
Proportion of usage by metered non-household properties    -0.29520**     ✓

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Many variables insignificant as well as unexpected sign/magnitude. This is not unexpected given the number of explanatory variables and sample size and is also likely due to multicollinearity. Therefore given Amber A.
Goodness of fit: 0.996309 (Adjusted R-squared: 0.994701629)
Hausman test: supports the selection of random effects

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Rankings and scores very similar to other full totex models, including Model 1, but slightly different from refined models. Scores also differ from the base models which exclude enhancement.
Robustness to specification: Refinement of this model showed sensitivity of coefficients to specification. Therefore given Amber A.
Pooling test: Pooling of variables across time was tested. Similar to Model 1, no substantial difference in coefficients over time.
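The statement that rankings are "very similar" to Model 1 can be illustrated with a rank correlation between the WM1 and WM2 company ranks. The sketch below assumes a Spearman coefficient without tie correction; the report does not state here which rank correlation measure was used, and the function name is illustrative. The ranks are copied from the WM1 and WM2 company tables, in table order.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Ranks in table order (Anglian ... Sutton & East Surrey), as reported.
wm1_ranks = [14, 12, 3, 1, 6, 8, 11, 18, 13, 7, 16, 17, 10, 5, 9, 4, 2, 15]
wm2_ranks = [14, 12, 3, 1, 7, 8, 11, 18, 13, 5, 16, 17, 10, 6, 9, 4, 2, 15]

rho = spearman_rho(wm1_ranks, wm2_ranks)
print(f"Spearman rank correlation, WM1 vs WM2: {rho:.4f}")
```

Only three companies change rank between the two models (South West, Yorkshire and Portsmouth), so the coefficient is very close to one.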
Companies Cost efficiency Rank Anglian Water Services 89.0% 14 Dwr Cymru Cyfyngedig (Welsh) 90.5% 12 Northumbrian Water Ltd 94.9% 3 Severn Trent Water Ltd 100.0% 1 South West Water Ltd 93.8% 7 Southern Water Services Ltd 93.6% 8 Thames Water Utilities Ltd 91.6% 11 United Utilities Water Plc 80.6% 18 Wessex Water Services Ltd 89.4% 13 64 Yorkshire Water Services Ltd 94.1% 5 Affinity Water 86.5% 16 Bristol Water plc 83.5% 17 Dee Valley Water Plc 92.4% 10 Portsmouth Water Ltd 94.0% 6 Sembcorp Bournemouth Water 93.0% 9 South East Water Ltd 94.9% 4 South Staffordshire Cambridge 97.0% 2 Sutton & East Surrey Water Ltd 86.6% 15 65 WM3: Totex full translog COLS without BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Totex 5 years Translog OLS Pooled cross- section 90 27 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant -0.96128 Length of mains .90456***   Density -0.27601 Usage -0.03222 Length^2 -0.03077 Density^2 1.15405***  Usage^2 -0.24695 Length x Density .64729***  Length x Usage -0.00603 Density x Usage -0.06318 Time trend 0.01193  Average regional wage 1.49168***  Population density -0.56056 Proportion of metered properties -0.77579  Sources -.29272***  Pumping head 0.12203  Proportion of water input from river abstractions 0.00224  Proportion of water input from reservoirs -0.01501 Proportion of new meters 0.02846  Proportion of new mains -.03075**  Proportion of mains restored/renovated .02901**   Properties below reference pressure level 0.00295 Leakage volume -0.20009  66 Properties affected by unplanned interruptions > 3 hrs 0.00779 Properties affected by planned interruptions > 3 hrs 0.02661 Proportion of usage by metered household properties 0.5006  Proportion of usage by metered non-household properties -0.17073 Criteria for 
choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-varying efficiency based on a pooled structure. Statistical performance Sign, size, significance of variables: Many variables insignificant as well as unexpected sign/magnitude. Density and usage are unexpected signs but insignificant. Therefore given Amber A Goodness of fit: 0.996785 (Adjusted R-squared: .99546) Practical Implementation Replicability/ transparency: Average. Robustness testing Rank correlations: Rankings and scores very similar to other full totex models, including Models 1 and 2, but slightly different from refined models. Scores also differ from the base models which exclude enhancement. Robustness to specification: Refinement of this model showed sensitivity of coefficients to specification. Therefore given Amber A Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time. 
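The "rebased" efficiency scores reported for the COLS models can be illustrated with the standard corrected-OLS adjustment: the pooled OLS residuals (in logs) are shifted so that the best performer defines the frontier, and each company's score is the exponential of its distance from that frontier. The sketch below assumes that residuals are averaged by company over the panel years before rebasing; the report does not spell out this step in the template, and the residual values and company labels are hypothetical placeholders rather than results from the report.

```python
import math
from collections import defaultdict


def cols_efficiency(residuals):
    """Rebase pooled OLS log-cost residuals into efficiency scores.

    residuals: iterable of (company, residual) pairs, one per company-year.
    Returns {company: score}, where the best performer scores 1.0.
    """
    by_company = defaultdict(list)
    for company, resid in residuals:
        by_company[company].append(resid)
    # Assumption: average each company's residuals across years.
    average = {c: sum(v) / len(v) for c, v in by_company.items()}
    frontier = min(average.values())  # lowest average residual = lowest relative cost
    return {c: math.exp(frontier - u) for c, u in average.items()}


# Hypothetical residuals for three companies over two years (not report data).
example = [("A", -0.05), ("A", -0.03), ("B", 0.02), ("B", 0.04), ("C", 0.10), ("C", 0.06)]
scores = cols_efficiency(example)
for company, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{company}: {score:.1%}")
```

By construction one company always scores 100%, which is why each COLS table has a single company at 100.0%.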
Companies Rebased cost efficiency Ranks Anglian Water Services 90.1% 13 Dwr Cymru Cyfyngedig (Welsh) 89.5% 14 Northumbrian Water Ltd 97.7% 2 Severn Trent Water Ltd 100.0% 1 South West Water Ltd 94.9% 3 Southern Water Services Ltd 92.1% 9 Thames Water Utilities Ltd 91.9% 10 United Utilities Water Plc 83.0% 18 Wessex Water Services Ltd 90.9% 12 Yorkshire Water Services Ltd 91.0% 11 67 Affinity Water 87.1% 17 Bristol Water plc 87.3% 16 Dee Valley Water Plc 92.3% 7 Portsmouth Water Ltd 94.7% 4 Sembcorp Bournemouth Water 92.2% 8 South East Water Ltd 94.5% 5 South Staffordshire Cambridge 94.4% 6 Sutton & East Surrey Water Ltd 88.9% 15 68 WM4: Totex refined CD RE without BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Totex 5 years Cobb- Douglas GLS Panel, random effects 90 9 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant -8.75224*** Length of mains 1.11822***   Density 0.09766  Time trend -0.00329  Average regional wage 1.03174***   Population density 0.90024  Proportion of mains restored/renovated .05567***   Proportion of water input from river abstractions 0.00892  Proportion of water input from reservoirs -0.01441 Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Test carried out to see effect of CobbDouglas formulation. In general statistical testing favours the translog. Therefore given Amber A Efficiency specification Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust. Statistical performance Sign, size, significance of variables: Generally as expected. Density is lower than expected. Proportion of water from reservoirs is negative. 
Therefore given Amber A Goodness of fit: 0.978893(Adjusted R-squared: 0.976515125) Hausman test: supports the selection of random effects Practical Implementation Replicability/ transparency: Average. Robustness Rank correlations: Rankings and scores very different from translog models. Range of 69 testing efficiency scores also relatively high and looks implausible. Therefore given Red R Robustness to specification: This is a refined model, so relatively stable. Companies Cost efficiency Rank Anglian Water Services 91.9% 2 Dwr Cymru Cyfyngedig (Welsh) 69.5% 13 Northumbrian Water Ltd 77.8% 9 Severn Trent Water Ltd 85.7% 5 South West Water Ltd 67.4% 15 Southern Water Services Ltd 79.5% 8 Thames Water Utilities Ltd 54.1% 18 United Utilities Water Plc 69.9% 12 Wessex Water Services Ltd 68.2% 14 Yorkshire Water Services Ltd 83.0% 7 Affinity Water 83.1% 6 Bristol Water plc 60.5% 17 Dee Valley Water Plc 66.7% 16 Portsmouth Water Ltd 100.0% 1 Sembcorp Bournemouth Water 77.4% 10 South East Water Ltd 86.8% 3 South Staffordshire Cambridge 86.1% 4 Sutton & East Surrey Water Ltd 73.2% 11 70 WM5: Totex refined translog COLS without BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Totex 5 years Translog OLS Pooled cross- section 90 12 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 2.88752*  Length of mains 1.07182***   Density 0.21036  Length^2 -0.02259 Density^2 1.06674**  Length x Density .51222***  Time trend -0.00675  Average regional wage 0.71957  Population density 0.98924  Proportion of mains relined and renovated .06502***   Proportion of water input from reservoirs -0.01397 Proportion of water input from river abstractions .02014***   Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? 
Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-varying efficiency based on a pooled structure. Statistical performance Sign, size, significance of variables: Generally as expected. Density is slightly lower than expected. Proportion of water from reservoirs is negative but could be due to multicollinearity. Therefore given Green G Goodness of fit: 0.990676 (Adjusted R-squared: .98936) 71 Practical Implementation Replicability/ transparency: Average. Robustness testing Rank correlations: Scores and rankings in line with other refined models but different from full models. Robustness to specification: This model is relatively refined and stable. Therefore given Green G Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time. Companies Rebased cost efficiency Ranks Anglian Water Services 93.5% 7 Dwr Cymru Cyfyngedig (Welsh) 82.9% 15 Northumbrian Water Ltd 89.9% 11 Severn Trent Water Ltd 94.0% 6 South West Water Ltd 90.2% 10 Southern Water Services Ltd 89.1% 12 Thames Water Utilities Ltd 86.8% 14 United Utilities Water Plc 78.0% 16 Wessex Water Services Ltd 87.9% 13 Yorkshire Water Services Ltd 93.1% 8 Affinity Water 98.3% 3 Bristol Water plc 71.1% 18 Dee Valley Water Plc 95.1% 5 Portsmouth Water Ltd 98.1% 4 Sembcorp Bournemouth Water 90.8% 9 South East Water Ltd 99.2% 2 South Staffordshire Cambridge 100.0% 1 Sutton & East Surrey Water Ltd 75.6% 17 72 WM6: Totex refined translog RE without BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Totex 5 years Translog GLS Panel, random effects 90 12 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of 
coefficient (confidence interval) Constant 2.51229**  Length of mains 1.07838***   Density 0.28066  Length^2 -0.01917 Density^2 .94174*  Length x Density .55717***  Time trend -0.00319  Average regional wage .95771***   Population density 0.49497  Proportion of mains restored/renovated .05565***   Proportion of water input from reservoirs -0.01229 Proportion of water input from river abstractions 0.01182  Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust. Statistical performance Sign, size, significance of variables: Generally as expected. Density is slightly lower than expected but still with expected sign. Proportion of water from reservoirs is negative but insignificant. Therefore given Green G 73 Goodness of fit: 0.990126 (Adjusted R-squared: 0.988587195) Hausman test: supports the selection of random effects Practical Implementation Replicability/ transparency: Average. Robustness testing Rank correlations: Scores and rankings in line with other refined totex models, though different from the full models. Robustness to specification: Relatively refined and stable. Therefore given Green G Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time. 
Companies Cost efficiency Rank Anglian Water Services 95.1% 5 Dwr Cymru Cyfyngedig (Welsh) 79.2% 16 Northumbrian Water Ltd 89.7% 12 Severn Trent Water Ltd 93.3% 7 South West Water Ltd 87.2% 13 Southern Water Services Ltd 92.2% 9 Thames Water Utilities Ltd 85.6% 14 United Utilities Water Plc 79.6% 15 Wessex Water Services Ltd 92.0% 10 Yorkshire Water Services Ltd 92.8% 8 Affinity Water 95.5% 4 Bristol Water plc 69.5% 18 Dee Valley Water Plc 94.8% 6 Portsmouth Water Ltd 98.8% 2 Sembcorp Bournemouth Water 90.6% 11 South East Water Ltd 100.0% 1 South Staffordshire Cambridge 97.7% 3 Sutton & East Surrey Water Ltd 75.0% 17 74 WM7: Totex refined translog RE with BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Totex 5 years Translog GLS Panel, random effects 90 13 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 2.33549**  Length of mains 1.06992***   Density 0.27595  Length^2 -0.02905 Density^2 .93629**  Length x Density .57342***  Time trend -0.00198  Average regional wage 1.00698***   Population density 0.52478  Proportion of mains restored/renovated .05677***   Proportion of water input from reservoirs -0.01535 Proportion of water input from river abstractions 0.01197  Regional BCIS -0.27099 Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust. 
Statistical performance
Sign, size, significance of variables: Density is lower than expected and proportion of water from reservoirs is negative. In this refined model, BCIS has a highly unexpected sign and since we consider it to be a core variable, we have given Red R
Goodness of fit: 0.989852 (Adjusted R-squared: 0.988116158)
Hausman test: supports the selection of random effects

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings in line with other refined models but different from full models.
Robustness to specification: Robust with regards to dropping BCIS but otherwise relatively refined. Therefore given Green G
Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time.

Companies                         Cost efficiency    Rank
Anglian Water Services            94.8%              5
Dwr Cymru Cyfyngedig (Welsh)      79.3%              16
Northumbrian Water Ltd            89.8%              12
Severn Trent Water Ltd            93.9%              7
South West Water Ltd              87.9%              13
Southern Water Services Ltd       92.1%              10
Thames Water Utilities Ltd        85.7%              14
United Utilities Water Plc        80.1%              15
Wessex Water Services Ltd         92.5%              9
Yorkshire Water Services Ltd      93.3%              8
Affinity Water                    95.5%              4
Bristol Water plc                 69.3%              18
Dee Valley Water Plc              94.6%              6
Portsmouth Water Ltd              99.5%              2
Sembcorp Bournemouth Water        91.4%              11
South East Water Ltd              100.0%             1
South Staffordshire Cambridge     99.4%              3
Sutton & East Surrey Water Ltd    74.8%              17

WM8: Base refined translog RE with BCIS

Basic description
Value chain element:               Water
Expenditure level:                 Opex + base capex
Capex smoothing:                   5 years
Functional form:                   Translog
Estimator:                         GLS
Data structure:                    Panel, random effects
Number of observations:            90
Number of independent variables:   13

Econometric results

Variable                                             Coefficient    Statistically significant; Expected sign/magnitude of coefficient (confidence interval)
Constant                                             1.62291*
Length of mains                                      1.02650***     ✓ ✓
Density                                              0.39851**      ✓ ✓
Length^2                                             0.01183
Density^2                                            0.33649
Length x Density                                     0.45041***     ✓
Time trend                                           0.00993*       ✓ ✓
Average regional wage                                0.91856***     ✓ ✓
Population density                                   1.09772**      ✓ ✓
Proportion of mains restored/renovated               0.03850***     ✓ ✓
Proportion of water input from reservoirs            -0.00035       ✓
Proportion of water input from river abstractions    0.00361        ✓
Regional BCIS                                        -0.19048

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: BCIS is negative. We consider BCIS to be a core variable, therefore given Red R
Goodness of fit: 0.987432 (Adjusted R-squared: 0.985282211)
Hausman test: supports the selection of random effects

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings in line with other base models.
Robustness to specification: Robust with regards to dropping BCIS, otherwise relatively stable. Therefore given Green G
Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time.
Companies Cost efficiency Rank Anglian Water Services 87.0% 4 Dwr Cymru Cyfyngedig (Welsh) 65.3% 18 Northumbrian Water Ltd 82.1% 8 Severn Trent Water Ltd 81.9% 10 South West Water Ltd 89.3% 3 Southern Water Services Ltd 79.6% 14 Thames Water Utilities Ltd 81.9% 9 United Utilities Water Plc 80.5% 12 Wessex Water Services Ltd 80.6% 11 Yorkshire Water Services Ltd 83.6% 7 Affinity Water 76.9% 15 Bristol Water plc 69.2% 16 Dee Valley Water Plc 86.2% 6 Portsmouth Water Ltd 94.3% 2 Sembcorp Bournemouth Water 80.1% 13 South East Water Ltd 100.0% 1 South Staffordshire Cambridge 86.3% 5 Sutton & East Surrey Water Ltd 67.3% 17 78 WM9: Base refined translog COLS without BCIS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Water Opex+base Capex 5 years Translog OLS Pooled cross- section 90 12 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 2.9165 Length of mains 1.03714***   Density 0.27499  Length^2 0.01439 Density^2 0.23994 Length x Density .35875*  Time trend -0.00077  Average regional wage 0.28008  Population density 2.03158**  Proportion of mains restored/renovated .05994**   Proportion of water input from reservoirs -0.00654 Proportion of water input from river abstractions 0.00477  Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-varying efficiency based on a pooled structure. Statistical performance Sign, size, significance of variables: Generally as expected. Population density slightly high, possibly due to multicollinearity. 
Therefore given Amber A
Goodness of fit: 0.989328 (Adjusted R-squared: 0.98782)

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings in line with other base models.
Robustness to specification: Relatively refined and stable. Therefore given Green G
Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time.

Companies                         Rebased cost efficiency    Rank
Anglian Water Services            86.6%                      7
Dwr Cymru Cyfyngedig (Welsh)      71.1%                      18
Northumbrian Water Ltd            84.8%                      9
Severn Trent Water Ltd            87.2%                      6
South West Water Ltd              91.8%                      3
Southern Water Services Ltd       75.5%                      15
Thames Water Utilities Ltd        82.7%                      12
United Utilities Water Plc        79.6%                      14
Wessex Water Services Ltd         82.1%                      13
Yorkshire Water Services Ltd      87.3%                      5
Affinity Water                    83.5%                      10
Bristol Water plc                 73.3%                      16
Dee Valley Water Plc              86.2%                      8
Portsmouth Water Ltd              96.1%                      2
Sembcorp Bournemouth Water        83.0%                      11
South East Water Ltd              100.0%                     1
South Staffordshire Cambridge     90.7%                      4
Sutton & East Surrey Water Ltd    71.6%                      17

WM10: Base refined translog RE without BCIS

Basic description
Value chain element:               Water
Expenditure level:                 Opex + base capex
Capex smoothing:                   5 years
Functional form:                   Translog
Estimator:                         GLS
Data structure:                    Panel, random effects
Number of observations:            90
Number of independent variables:   12

Econometric results

Variable                                             Coefficient    Statistically significant; Expected sign/magnitude of coefficient (confidence interval)
Constant                                             1.71338*
Length of mains                                      1.03225***     ✓ ✓
Density                                              0.40509**      ✓ ✓
Length^2                                             0.01912
Density^2                                            0.35379
Length x Density                                     0.44863***     ✓
Time trend                                           0.00941*       ✓ ✓
Average regional wage                                0.90116***     ✓ ✓
Population density                                   1.05336**      ✓ ✓
Proportion of mains restored/renovated               0.03764***     ✓ ✓
Proportion of water input from reservoirs            0.00214        ✓
Proportion of water input from river abstractions    0.00388        ✓

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct?
Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust. Statistical performance Sign, size, significance of variables: Generally as expected. Therefore given Green G Goodness of fit: 0.987553 (Adjusted R-squared: 0.985613208) 81 Hausman test: supports the selection of random effects. Practical Implementation Replicability/ transparency: Average. Robustness testing Rank correlations: Scores and rankings in line with other refined totex models, though different from the full models. Robustness to specification: Relatively refined and stable. Therefore given Green G Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time. 
Companies Cost efficiency Rank Anglian Water Services 87.0% 4 Dwr Cymru Cyfyngedig (Welsh) 65.1% 18 Northumbrian Water Ltd 82.0% 9 Severn Trent Water Ltd 81.4% 10 South West Water Ltd 89.0% 3 Southern Water Services Ltd 79.8% 13 Thames Water Utilities Ltd 82.2% 8 United Utilities Water Plc 80.1% 12 Wessex Water Services Ltd 80.2% 11 Yorkshire Water Services Ltd 83.2% 7 Affinity Water 76.8% 15 Bristol Water plc 69.3% 16 Dee Valley Water Plc 86.8% 5 Portsmouth Water Ltd 93.6% 2 Sembcorp Bournemouth Water 79.5% 14 South East Water Ltd 100.0% 1 South Staffordshire Cambridge 85.1% 6 Sutton & East Surrey Water Ltd 67.3% 17 82 ANNEX 5: SEWERAGE TEMPLATES SW1: Base sewerage network refined translog RE Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Sewage network Opex + base capex 7 years Translog GLS Panel, random effects 70 = 10 companies x 7 years 7 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 2.38617* Length of sewers .81503***   Density 0.57753  Length^2 0.07573 Density^2 -2.41709 Length x Density -2.80243***  Time trend .01923***   Regional wage 0.65243  Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust. Statistical performance Sign, size, significance of variables: Generally as expected. 
Therefore given green G Goodness of fit: 0.9188448 (Adjusted R-squared: .918614) Hausman test: supports the selection of random effects Practical Implementation Replicability/ transparency: Average. 83 Robustness testing Rank correlations: Scores and rankings in line with the other network model, though different from the full models. Robustness to specification: Refined and stable. Not substantially sensitive to adding marginal variables. Therefore given Green G Pooling test: Pooling of variables across time was tested. No evidence of pooling. Company Cost efficiency Rank Anglian Water Services 84.9% 5 Dŵr Cymru Cyfyngedig (Welsh) 64.8% 10 Northumbrian Water Ltd 93.5% 2 Severn Trent Water Ltd 78.9% 7 South West Water Ltd 82.3% 6 Southern Water Services Ltd 75.4% 8 Thames Water Utilities Ltd 87.1% 4 United Utilities Water Plc 69.2% 9 Wessex Water Services Ltd 92.2% 3 Yorkshire Water Services Ltd 100.0% 1 84 SW2: Base sewerage network refined translog COLS Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Sewage network Opex + base capex 5 years Translog OLS Pooled cross- section 70 = 10 companies x 7 years 8 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 7.54856 Length of sewers .93319***   Density 1.68459**   Length^2 .16529*  Density^2 3.87864 Length x Density -2.80880**  Time trend -0.00286  Regional wage -1.11998 Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green G Efficiency specification Time-varying efficiency based on a pooled structure. 
Statistical performance Sign, size, significance of variables: Regional wages are highly negative. Therefore given Red R Goodness of fit: 0.9334214 (Adjusted R-squared: .92590) Practical Implementation Replicability/ transparency: Average. Robustness testing Rank correlations: Scores and rankings in line with other network models, though different from the full models. Robustness to specification: Robust with regards to addition of marginal variables. 85 Therefore given Green G Pooling test: Pooling of variables across time was tested. Evidence only of very small and immaterial wage pooling pre/post financial crisis (taken as 2008/09). Company Rebased average cost efficiency Rank Anglian Water Services 85.4% 4 Dŵr Cymru Cyfyngedig (Welsh) 71.9% 9 Northumbrian Water Ltd 98.2% 2 Severn Trent Water Ltd 84.1% 5 South West Water Ltd 83.1% 6 Southern Water Services Ltd 82.2% 7 Thames Water Utilities Ltd 86.9% 3 United Utilities Water Plc 71.3% 10 Wessex Water Services Ltd 82.1% 8 Yorkshire Water Services Ltd 100.0% 1 86 SW3: Base sewage treatment and sludge full translog RE Basic description Value chain element Expenditure level Capex smoothing Functional form Estimator Data structure Number of observations Number of indepen- dent variables Sewage treatment + sludge Opex + base capex 7 years Translog GLS Panel, random effects 70 = 10 companies x 7 years 8 Econometric results Variable Coefficient Statistically significant Expected sign/ magnitude of coefficient (confidence interval) Constant 2.00731 Load .83981***   Load^2 0.01338 Time trend .02182***   Regional wage 1.21993***   Proportion of load treated by activated sludge 0.06375  Proportion of load treated in bands 1-3 0.15658  Proportion of load treated in bands 4 and 5 -0.01552 Regional BCIS -0.33458 Criteria for choosing the best model(s) Theoretical correctness Functional form seems correct? 
Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). However, one of the key cost drivers (density) is not included. Therefore given Amber A
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Regional BCIS is negative, the proportion in bands 4 and 5 is negative, and none of the treatment-specific variables (proportion of activated sludge, load treated in bands 1-3, load treated in bands 4 and 5) is statistically significant. We consider BCIS to be a core variable with a very unexpected coefficient, therefore given Red R
Hausman test: supports the selection of random effects
Goodness of fit: 0.9157674 (Adjusted R-squared: 0.9070181)

Practical Implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings show little correlation with the final chosen models. Very high detected inefficiency (around 40%), which seems implausible. Therefore given Red R
Robustness to specification: Not very robust to refinement or adding density.
Pooling test: Pooling tests indicate evidence of regional BCIS pooling for the global financial crisis (taken as 2008/09). This caused large movements in the BCIS coefficient and also contributes to the Red traffic light.
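One plausible reading of the "around 40%" inefficiency figure is the gap between the frontier company and the lowest-scoring company in the SW3 efficiency table. The short check below uses the scores from that table in table order; treating the quoted figure as this gap is an assumption, since the template does not define it.

```python
# SW3 cost efficiency scores (%), in table order (Anglian ... Yorkshire), as reported.
sw3_scores = [79.0, 79.3, 73.7, 72.4, 73.1, 71.5, 90.5, 63.4, 100.0, 76.1]

# Assumption: "detected inefficiency" is read as the gap to the frontier company.
max_gap = max(sw3_scores) - min(sw3_scores)
print(f"Largest gap to the frontier: {max_gap:.1f} percentage points")
```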
| Company | Cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 79.0% | 4 |
| Dŵr Cymru Cyfyngedig (Welsh) | 79.3% | 3 |
| Northumbrian Water Ltd | 73.7% | 6 |
| Severn Trent Water Ltd | 72.4% | 8 |
| South West Water Ltd | 73.1% | 7 |
| Southern Water Services Ltd | 71.5% | 9 |
| Thames Water Utilities Ltd | 90.5% | 2 |
| United Utilities Water Plc | 63.4% | 10 |
| Wessex Water Services Ltd | 100.0% | 1 |
| Yorkshire Water Services Ltd | 76.1% | 5 |

SW4: Base sewage treatment and sludge CD RE

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewage treatment + sludge | Opex + base capex | 7 years | Cobb-Douglas | GLS | Panel, random effects | 70 = 10 companies x 7 years | 6 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 2.041 | |
| Load | 0.79982*** | ✓ ✓ |
| Time trend | 0.02145*** | ✓ ✓ |
| Regional wage | 1.18614*** | ✓ ✓ |
| Proportion of load treated by activated sludge | 0.08802 | ✓ |
| Proportion of load treated in bands 1-3 | 0.16168* | ✓ ✓ |
| Sludge disposed | 0.02572 | ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Test carried out to see the effect of the Cobb-Douglas formulation. In general, statistical testing favours the translog. Also, the preferred scale variable (density) is not included. Therefore given Amber (A).
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Generally as expected. Therefore given Green (G).
Hausman test: supports the selection of random effects.
Goodness of fit: 0.9066471 (Adjusted R-squared: 0.8990643)

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings show little correlation with the final chosen models.
Very high detected inefficiency, therefore given Red (R).
Robustness to specification: Not robust to including translog terms.
Pooling test: Pooling of variables across time was tested. Only small evidence of wage pooling for the start of PR09.

| Company | Cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 78.8% | 3 |
| Dŵr Cymru Cyfyngedig (Welsh) | 78.5% | 4 |
| Northumbrian Water Ltd | 72.6% | 7 |
| Severn Trent Water Ltd | 69.6% | 9 |
| South West Water Ltd | 73.6% | 6 |
| Southern Water Services Ltd | 72.1% | 8 |
| Thames Water Utilities Ltd | 90.3% | 2 |
| United Utilities Water Plc | 60.4% | 10 |
| Wessex Water Services Ltd | 100.0% | 1 |
| Yorkshire Water Services Ltd | 74.6% | 5 |

SW5: Base sewage treatment and sludge refined translog RE

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewage treatment + sludge | Opex + base capex | 7 years | Translog | GLS | Panel, random effects | 70 = 10 companies x 7 years | 7 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 1.27055 | |
| Load | 0.82780*** | ✓ ✓ |
| Density | -0.58885* | ✓ ✓ |
| Load^2 | 0.0846 | |
| Density^2 | -2.87877 | |
| Load x Density | -3.59445*** | ✓ |
| Time trend | 0.02331*** | ✓ ✓ |
| Regional wage | 1.28032*** | ✓ ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green (G).
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Generally as expected. The coefficient on density suggests that denser areas can take advantage of treatment economies of scale.
Therefore given Green (G).
Goodness of fit: 0.9676082 (Adjusted R-squared: 0.964362)
Hausman test: supports the selection of random effects.

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings in line with the other preferred model for treatment. Also similar to wholesale opex+base RE models. Therefore given Green (G).
Robustness to specification: Not very sensitive to adding non-core variables (such as sludge disposed, proportion treated in small works, etc.).
Pooling test: Pooling of variables across time was tested. Similar to Model 2, no substantial difference in coefficients over time.

| Company | Cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 91.6% | 8 |
| Dŵr Cymru Cyfyngedig (Welsh) | 88.2% | 9 |
| Northumbrian Water Ltd | 98.1% | 3 |
| Severn Trent Water Ltd | 94.4% | 5 |
| South West Water Ltd | 93.9% | 6 |
| Southern Water Services Ltd | 93.2% | 7 |
| Thames Water Utilities Ltd | 99.1% | 2 |
| United Utilities Water Plc | 84.4% | 10 |
| Wessex Water Services Ltd | 100.0% | 1 |
| Yorkshire Water Services Ltd | 95.1% | 4 |

SW6: Base sewage treatment and sludge refined translog COLS

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewage treatment + sludge | Opex + base capex | 7 years | Translog | OLS | Pooled cross-section | 70 = 10 companies x 7 years | 7 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 1.70913 | |
| Load | 0.88110*** | ✓ ✓ |
| Density | -0.60886*** | ✓ ✓ |
| Load^2 | 0.12666*** | ✓ |
| Density^2 | -2.47179 | |
| Load x Density | -4.51344*** | ✓ |
| Time trend | 0.02146** | ✓ ✓ |
| Regional wage | 1.12747*** | ✓ ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred).
Therefore given Green (G).
Efficiency specification: Time-varying efficiency based on a pooled structure.

Statistical performance
Sign, size, significance of variables: Generally as expected, though the time trend is a bit high. All variables (save one translog term) are individually statistically significant. Therefore given Green (G).
Goodness of fit: 0.9712557 (Adjusted R-squared: 0.96801)

Practical implementation
Replicability/transparency: Good.

Robustness testing
Rank correlations: Scores and rankings in line with the other preferred model for treatment. Therefore given Green (G).
Robustness to specification: Not very sensitive to adding non-core variables (such as sludge disposed, proportion treated in small works, etc.).
Pooling test: Pooling of variables across time was tested. No substantial difference in coefficients over time.

| Company | Rebased average cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 95.3% | 7 |
| Dŵr Cymru Cyfyngedig (Welsh) | 89.7% | 9 |
| Northumbrian Water Ltd | 99.7% | 2 |
| Severn Trent Water Ltd | 100.0% | 1 |
| South West Water Ltd | 97.7% | 3 |
| Southern Water Services Ltd | 96.2% | 5 |
| Thames Water Utilities Ltd | 97.2% | 4 |
| United Utilities Water Plc | 87.1% | 10 |
| Wessex Water Services Ltd | 93.9% | 8 |
| Yorkshire Water Services Ltd | 95.9% | 6 |

SW7: Base wholesale sewerage full translog RE

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewerage | Opex + base capex | 7 years | Translog | GLS | Panel, random effects | 70 = 10 companies x 7 years | 16 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 4.32722*** | |
| Length of sewers | 0.83460*** | ✓ ✓ |
| Density | 1.57994*** | ✓ |
| Usage | 0.37907 | ✓ |
| Length^2 | 0.00474 | |
| Density^2 | -3.53657* | ✓ |
| Usage^2 | -7.53608*** | ✓ |
| Length x Density | -2.99588*** | ✓ |
| Density x Usage | 8.37491** | ✓ |
| Length x Usage | 0.4672 | |
| Time trend | 0.0052 | ✓ |
| Regional wage | 0.4889 | ✓ |
| Proportion of sewers relined and renewed | -0.00531 | |
| Sludge disposed | 0.00802 | ✓ |
| Proportion of load treated by activated sludge | -0.07747 | |
| Proportion of load treated in bands 1-3 | 0.10844* | ✓ ✓ |
| Number of works with tight consents | 0.05407 | ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green (G).
Efficiency specification: Time-invariant efficiency based on a panel structure. Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: The coefficients on proportion of sewers relined and renewed and on activated sludge are negative, but this could be due to multicollinearity. Density is high. Therefore given Amber (A).
Goodness of fit: 0.9815776 (Adjusted R-squared: 0.9774757)
Hausman test: supports the selection of random effects.

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores in line with other preferred models for opex+base capex. Rankings similar to Model 9 (RE wholesale) but differ from Model 8 (COLS version of this one). Therefore given Amber (A).
Robustness to specification: Relatively robust to dropping non-core variables.
Pooling test: Not conducted as model was further refined.
| Company | Cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 92.1% | 5 |
| Dŵr Cymru Cyfyngedig (Welsh) | 88.8% | 9 |
| Northumbrian Water Ltd | 91.8% | 7 |
| Severn Trent Water Ltd | 94.6% | 3 |
| South West Water Ltd | 91.5% | 8 |
| Southern Water Services Ltd | 91.9% | 6 |
| Thames Water Utilities Ltd | 94.2% | 4 |
| United Utilities Water Plc | 80.9% | 10 |
| Wessex Water Services Ltd | 96.3% | 2 |
| Yorkshire Water Services Ltd | 100.0% | 1 |

SW8: Base wholesale sewerage full translog COLS

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewerage | Opex + base capex | 7 years | Translog | OLS | Pooled cross-section | 70 = 10 companies x 7 years | 16 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 5.43947*** | |
| Length of sewers | 0.82366*** | ✓ ✓ |
| Density | 1.84637*** | ✓ |
| Usage | 1.04331*** | ✓ ✓ |
| Length^2 | -0.01814 | |
| Density^2 | -2.81372 | |
| Usage^2 | -9.30732*** | ✓ |
| Length x Density | -3.65895*** | ✓ |
| Density x Usage | 13.0344** | ✓ |
| Length x Usage | 1.44909*** | ✓ |
| Time trend | -0.00113 | ✓ |
| Regional wage | 0.27161 | ✓ |
| Proportion of sewers relined and renewed | 0.01601 | ✓ |
| Sludge disposed | -0.06083 | |
| Proportion of load treated by activated sludge | 0.13232 | ✓ |
| Proportion of load treated in bands 1-3 | 0.12416*** | ✓ ✓ |
| Number of works with tight consents | 0.13118* | ✓ ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero can be rejected, i.e. translog is statistically preferred). Therefore given Green (G).
Efficiency specification: Time-varying efficiency based on a pooled structure.

Statistical performance
Sign, size, significance of variables: Generally as expected. Density is high, sludge disposed is negative (could be due to multicollinearity).
Therefore given Amber (A).
Goodness of fit: 0.9875257 (Adjusted R-squared: 0.98376)

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings in line with preferred OLS model for wholesale opex+base capex (Model 10). Apart from this, scores and rankings show little correlation with Model 7 (full version of this one). Therefore given Amber (A).
Robustness to specification: Relatively robust to dropping non-core variables.
Pooling test: Not conducted as model was further refined.

| Company | Rebased average cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 92.1% | 6 |
| Dŵr Cymru Cyfyngedig (Welsh) | 90.2% | 9 |
| Northumbrian Water Ltd | 92.8% | 3 |
| Severn Trent Water Ltd | 92.7% | 4 |
| South West Water Ltd | 94.0% | 2 |
| Southern Water Services Ltd | 92.3% | 5 |
| Thames Water Utilities Ltd | 90.8% | 8 |
| United Utilities Water Plc | 86.4% | 10 |
| Wessex Water Services Ltd | 91.5% | 7 |
| Yorkshire Water Services Ltd | 100.0% | 1 |

SW9: Base wholesale sewerage refined translog RE

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewerage | Opex + base capex | 7 years | Translog | GLS | Panel, random effects | 70 = 10 companies x 7 years | 8 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 2.48948*** | ✓ |
| Density | 0.04286 | |
| Load | 0.88260*** | ✓ ✓ |
| Density^2 | -2.64727 | |
| Load^2 | 0.00753 | |
| Load x Density | -2.06762*** | ✓ |
| Time trend | 0.02429*** | ✓ ✓ |
| Regional wage | 1.19874*** | ✓ ✓ |
| Proportion treated in bands 1-3 | 0.15554** | ✓ ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green (G).
Efficiency specification: Time-invariant efficiency based on a panel structure.
Not a concern because of exhaustive testing of time-varying models that have not proven robust.

Statistical performance
Sign, size, significance of variables: Generally as expected. Time trend is a bit high but more in line with treatment than network. Most variables statistically significant. Therefore given Green (G).
Goodness of fit: 0.9643004 (Adjusted R-squared: 0.9578228)
Hausman test: supports the selection of random effects.

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores similar to Model 7 (other RE wholesale model) but rankings change with respect to preferred OLS model for opex+base capex (Model 10). Therefore given Amber (A).
Robustness to specification: Relatively refined and stable.
Pooling test: Pooling tests carried out on wages. Evidence of inconsequential (very little effect on coefficients) wage pooling for the start of PR09.

| Company | Cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 83.6% | 9 |
| Dŵr Cymru Cyfyngedig (Welsh) | 85.4% | 6 |
| Northumbrian Water Ltd | 86.6% | 4 |
| Severn Trent Water Ltd | 84.9% | 7 |
| South West Water Ltd | 85.6% | 5 |
| Southern Water Services Ltd | 83.8% | 8 |
| Thames Water Utilities Ltd | 90.6% | 3 |
| United Utilities Water Plc | 72.1% | 10 |
| Wessex Water Services Ltd | 100.0% | 1 |
| Yorkshire Water Services Ltd | 93.7% | 2 |

SW10: Base wholesale sewerage refined translog COLS

Basic description

| Value chain element | Expenditure level | Capex smoothing | Functional form | Estimator | Data structure | Number of observations | Number of independent variables |
|---|---|---|---|---|---|---|---|
| Sewerage | Opex + base capex | 7 years | Translog | OLS | Pooled cross-section | 70 = 10 companies x 7 years | 8 |

Econometric results

| Variable | Coefficient | Ticks (statistically significant / expected sign or magnitude) |
|---|---|---|
| Constant | 3.39158*** | ✓ |
| Density | 0.05006 | |
| Load | 0.97713*** | ✓ ✓ |
| Density^2 | -1.21131 | |
| Load^2 | 0.10208*** | |
| Load x Density | -3.78995*** | ✓ |
| Time trend | 0.02006** | ✓ ✓ |
| Regional wage | 0.84660** | ✓ ✓ |
| Proportion treated in bands 1-3 | 0.12711** | ✓ ✓ |

Criteria for choosing the best model(s)

Theoretical correctness
Functional form seems correct? Statistical testing favours the translog (the null hypothesis that the coefficients on the second order terms in the translog are zero is rejected, i.e. translog is statistically preferred). Therefore given Green (G).
Efficiency specification: Time-varying efficiency based on a pooled structure.

Statistical performance
Sign, size, significance of variables: Generally as expected. Most variables have statistical significance. Therefore given Green (G).
Goodness of fit: 0.9578228 (Adjusted R-squared: 0.97334)

Practical implementation
Replicability/transparency: Average.

Robustness testing
Rank correlations: Scores and rankings similar to other wholesale COLS model (Model 8) but rank correlations with respect to preferred RE model (Model 9) rather low. Therefore given Amber (A).
Robustness to specification: Relatively refined and stable.
Pooling test: Pooling tests carried out on wages. No substantial difference in coefficients over time.

| Company | Rebased average cost efficiency | Rank |
|---|---|---|
| Anglian Water Services | 93.8% | 7 |
| Dŵr Cymru Cyfyngedig (Welsh) | 90.5% | 9 |
| Northumbrian Water Ltd | 94.8% | 5 |
| Severn Trent Water Ltd | 98.9% | 2 |
| South West Water Ltd | 96.0% | 3 |
| Southern Water Services Ltd | 94.8% | 6 |
| Thames Water Utilities Ltd | 95.8% | 4 |
| United Utilities Water Plc | 81.4% | 10 |
| Wessex Water Services Ltd | 92.7% | 8 |
| Yorkshire Water Services Ltd | 100.0% | 1 |

ANNEX 6: EFFICIENCY CALCULATIONS AND CHALLENGES

In this section we discuss efficiency calculations and adjustments in more detail. The notions of efficiency and inefficiency are well known in the academic and regulatory literature. Underpinning these concepts is the idea that there exists an efficiency frontier representing best practice, against which all firms may be judged. Inefficiency and in turn efficiency scores are then computed relative to this frontier, with frontier firms obtaining a score of unity.
Whilst the definition of efficiency relative to a frontier is clear, in economic regulation a number of different methodologies have been adopted, each with different assumptions. In turn economic regulators have applied regulatory judgement to the “raw outputs” of cost efficiency models. Below we therefore set out the assumptions of the models we have adopted, how inefficiency is calculated in those models to generate the “raw output”, and how those raw outputs might be used to arrive at an appropriate efficiency challenge for the companies for forecasting purposes.

Although in the end we are interested in the efficiency challenge to forecast companies’ expenditure, we first explain the methodology for calculating the company efficiency scores in the historical models under the different estimation methods in our final models: GLS (RE) and OLS. This is mostly done for completeness, although we also use those efficiency scores to evaluate the robustness of different models under one of our selection criteria.

A6.1 Calculating efficiency

The rationale for calculating efficiencies stems from the assumption about the model error term, namely that the residuals can be decomposed into random noise and inefficiency. In some cases, most notably standard panel model applications (i.e. fixed and random effects models), by making certain assumptions it is possible to obtain efficiency scores without making assumptions regarding the distributions of the noise and inefficiency components. In other cases, in what are described in the literature as stochastic frontier models, it is necessary to make assumptions about the distributions of the two terms in order to decompose the residual (typically it is assumed that the noise term is normally distributed and the inefficiency term takes a “one-sided” half normal distribution). Since these distributional assumptions may be considered arbitrary, other methods which do not rely on those assumptions may be preferred.
As indicated, GLS (RE) and COLS are implemented in different ways and make different assumptions. As such, we calculate the comparative efficiencies for these methods using different methodologies. These are discussed in turn below.

A6.1.1 Random effect models

The RE regression is given by the following equation:

$$y_{it} = \alpha + \sum_{p} \beta_p x_{p,it} + \mu_i + \varepsilon_{it}$$

where $y_{it}$ and $x_{p,it}$ are the logged cost and logged explanatory variables for company $i$ in year $t$, $\alpha$ is the constant, $\beta_p$ are the parameter coefficients of the variables included, $\mu_i$ is the time-invariant company effect, and $\varepsilon_{it}$ is the error term, which varies across company and time. The residual is then equal to $\mu_i + \varepsilon_{it}$. In the standard panel data literature, this residual is deemed to capture noise and (time invariant) unobserved heterogeneity between companies (in the random effects model the latter is assumed to be uncorrelated with the variables included in the model). This model has however also been applied to give an efficiency interpretation, such that the company effect terms estimated, after a suitable transformation, are interpreted as efficiency scores. As indicated by $\mu_i$, RE assumes that efficiency for a company stays constant over the period modelled.

The standard panel data literature sets out the method for computing the company effects. First, average each company’s residuals over time to get to a single average residual value for each company.54 This can be thought of as giving us the time invariant company effect (which will be given an efficiency interpretation for our purposes), leaving the time varying part of the residual to represent random noise. Here the assumptions that efficiency remains constant over time and that the expected value of the error term is zero are crucial in obtaining efficiency scores. In the standard panel literature the analysis would stop at that point, with the average residuals having identified the company effects and these effects would not normally be of much interest.
However, to go further and obtain efficiency scores from these average residuals, the literature specifies that the company with the minimum average residual is identified, and this corresponds to the most efficient company during the period, i.e. this is the frontier company. This company’s efficiency is thus 100% (score of one) and the rest of the companies are benchmarked against it. The efficiencies of the other companies are calculated by subtracting the frontier company’s average residual from their individual average residuals and calculating the exponent of the negative of that value. This indicates their position (rank) with respect to the frontier line, i.e. the most efficient company. The average efficiencies calculated in this way are simply indicative of company rankings and relative positions. The average and other efficiency adjustments that need to be made to the predicted values are discussed in Section A6.2.

54 Note that in our models the dependent variable and explanatory variables are specified as natural logarithms of the underlying variables. This has implications for the arithmetic discussed in this section. For example, subtracting logged values actually reflects division of absolute values.

A6.1.2 Pooled OLS models (COLS efficiency)

We compute the COLS efficiency scores in a different way due to the assumptions that the pooled OLS models make about the error term. The regression equation that corresponds to this type of model is the following:

$$y_{it} = \alpha + \sum_{p} \beta_p x_{p,it} + \varepsilon_{it}$$

where $y_{it}$ and $x_{p,it}$ are the logged cost and logged explanatory variables for company $i$ in year $t$, $\alpha$ is the constant, $\beta_p$ are the parameter coefficients of the variables included, and $\varepsilon_{it}$ is the error term, which varies across company and time. The standard OLS model as used in cost function (as opposed to cost frontier) modelling assumes that there is no inefficiency in the model, with the error term comprising entirely noise. Such models are sometimes referred to in the literature as average response functions.
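As an illustration of the random effects calculation in Section A6.1.1, the sketch below averages each company's logged residuals over time, takes the minimum average as the frontier, and computes each score as the exponent of the negative of the gap to the frontier. The company names and residual values are hypothetical, and the function name `re_efficiency_scores` is an assumption introduced only for this example; it is not taken from the spreadsheets referred to in this annex.

```python
import math

def re_efficiency_scores(residuals):
    """Efficiency scores from RE residuals (in logs).

    residuals: dict mapping company name -> list of residuals, one per year.
    Returns a dict mapping company name -> efficiency score in (0, 1].
    """
    # Step 1: average each company's residuals over time (company effect).
    avg = {c: sum(r) / len(r) for c, r in residuals.items()}
    # Step 2: the minimum average residual identifies the frontier company.
    frontier = min(avg.values())
    # Step 3: score = exp(-(own average residual - frontier average residual)).
    return {c: math.exp(-(a - frontier)) for c, a in avg.items()}

# Hypothetical residuals for three made-up companies over four years.
example = {
    "Company A": [0.10, 0.12, 0.08, 0.10],
    "Company B": [-0.05, -0.03, -0.07, -0.05],
    "Company C": [0.02, 0.00, 0.04, 0.02],
}
scores = re_efficiency_scores(example)
ranking = sorted(scores, key=scores.get, reverse=True)
for company in ranking:
    print(f"{company}: {scores[company]:.1%}")
```

Because the variables are in logs, subtracting the frontier residual corresponds to dividing costs in absolute terms, so the frontier company scores exactly 100% and every other company scores below it.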
In the corrected ordinary least squares (COLS) approach, OLS is used to estimate the parameters of the model, but inefficiency is incorporated into the model by adjusting the estimated constant term by shifting the OLS line down so that it passes through the maximum negative residual. With this interpretation, all deviations from the frontier are assumed to be inefficiency (there is no noise). Since this is a strong assumption we explain below how appropriate efficiency targets may be obtained from this model. In this model, efficiency is permitted to vary across firms and over time. The efficiency is allowed to vary over time in a very flexible way, though the assumption that inefficiency varies independently over time could also be questioned. We would normally assume some structure to changes in firm performance over time, i.e. that there is a noise component in the calculated inefficiency. We explain how the derivation addresses these two strong assumptions below, i.e. efficiency varying across time and the error term being interpreted as inefficiency. To compute the raw efficiency scores from this model, we identify the minimum residual in each individual year and benchmark the companies within each year against these. This allows for different companies to be at the frontier in each year. Under this interpretation the movement in the OLS line (the time trend as calculated in the model) is a change in the average cost of all firms, not a frontier shift, and the frontier shift then is computed using the time trend plus the difference in the minimum residual from year to year. This has the advantage that in the last year the frontier goes through the firm with the lowest cost in that year. To get the average score over the last five years (comparable with RE), we average the efficiency scores of each company across the period. However, as noted earlier the assumption that all deviations from the frontier represent inefficiency is a very strong assumption. 
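The COLS steps described above can be sketched as follows: benchmark each company against the minimum residual within each year, average the resulting scores over the period, and then rebase so that the company with the highest average score is 100% efficient (the rebasing step is the one reported as "rebased average cost efficiency" in the model sheets). The data are hypothetical and the function name `cols_efficiency_scores` is an assumption for illustration only.

```python
import math

def cols_efficiency_scores(residuals):
    """COLS efficiency scores from pooled OLS residuals (in logs).

    residuals: dict mapping company name -> list of residuals, one per year
               (all lists must have the same length).
    Returns (raw, average, rebased) dicts keyed by company name.
    """
    companies = list(residuals)
    n_years = len(residuals[companies[0]])
    # Minimum residual in each year: a different company may set it each year.
    yearly_min = [min(residuals[c][t] for c in companies) for t in range(n_years)]
    # Raw score for each company-year relative to that year's frontier.
    raw = {
        c: [math.exp(-(residuals[c][t] - yearly_min[t])) for t in range(n_years)]
        for c in companies
    }
    # Average each company's scores across the period.
    average = {c: sum(s) / n_years for c, s in raw.items()}
    # Rebase so the company with the highest average score is 100% efficient.
    best = max(average.values())
    rebased = {c: a / best for c, a in average.items()}
    return raw, average, rebased

# Hypothetical residuals for three made-up companies over two years.
example = {
    "Company A": [0.00, 0.10],
    "Company B": [0.05, 0.00],
    "Company C": [0.20, 0.15],
}
raw, average, rebased = cols_efficiency_scores(example)
for company, score in rebased.items():
    print(f"{company}: average {average[company]:.1%}, rebased {score:.1%}")
```

In this made-up example Company A is at the frontier in the first year and Company B in the second, so no company has an average score of 100% before rebasing.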
The averaged efficiency scores can then be rebased so that the company with the highest average score becomes 100% efficient. This adjusts the position of the frontier, which would have otherwise been calculated without taking account of noise (with reference to other approaches in the regulatory literature, this approach is similar in nature to the use of, for example, an upper quartile adjustment).

A6.2 Applying efficiency challenges

All the econometric models run for water and sewerage allow for different efficiency challenges to be applied to the predicted expenditures resulting from inserting business plan values for the explanatory variables into the estimated regression equations. The most common options are adjusting companies’ costs to the average industry efficiency line, to the frontier company or to the upper quartile industry line.

As these lines are identified based on models using historical data, the efficiency challenges applied to the forecasts will reflect the average efficiency level in the previous price control. The estimated regression line includes a time trend and therefore if we apply the coefficient on the time trend (estimated for example based on the last five years of data) we are implicitly assuming that the frontier shift (or ongoing efficiency changes) over the next five years will be the same as in the period used for estimation. It is possible of course for the regulator to impose a different assumption if needed.

The predicted line, $\hat{y}$, is calculated in different ways under OLS (method used for pooled OLS models) and GLS (used for RE). Therefore each of these models requires a different method for adjusting the predicted line to yield the same type of efficiency challenged forecast using the approach based on residuals. Another alternative would be to calculate the efficiency adjustment based on the ratio between actual costs and predicted values, which is our preferred option as discussed in Section 5.2.
The latter is not based on the model residuals and can be done at a more aggregated level. Residual based approaches would be preferable, but have important drawbacks in terms of feasibility and appropriateness when combining multiple models. As discussed in section 5.2, ratio approaches can only be applied in a consistent manner at the individual model level. This raises issues of potential cherry-picking, replicability, and transparency (discussed in section 5.2).

We discuss the adjustments that would need to be made to the forecasts under the three types of efficiency challenges below. They all essentially result in shifting the average prediction line down by a certain percentage as illustrated in Figure A6.1 below.

[Figure A6.1: Illustrating efficiency adjustments. Lines shown: LQ, Average, UQ, Frontier.]

A6.2.1 Average industry line

The term average efficiency refers to applying a challenge to the value predicted by the model that is consistent with a notional company that exhibits the average efficiency of the industry. That is why, in some cases, such as a pooled OLS model, average efficiency forecasts do not require any adjustments to be made to the predicted values as they are based on the average industry line. This is also the case for RE, for which no adjustments need to be made to the predicted values if one wants to predict the average industry efficiency.

If, however, Ofwat would like to have a regression line that the average firm in the sample (not the population) lies on, a small adjustment needs to be made to the values predicted. This is needed because although the average firm in the population is expected to have a residual of zero, this is not the case in the sample and no firm will actually lie on the regression line. We calculate this adjustment for RE models as the difference between the negative of the logged average of the company efficiency scores and the minimum average residual out of all the companies.
This can also be expressed in terms of a percentage adjustment to the absolute rather than the logged value. Although we have explained how Ofwat could make this adjustment should they think it appropriate, we recommend that no adjustment is made as the equation will yield the average industry line for a notional average firm. No adjustment is needed when it comes to the ratio-based approach either.

A6.2.2 Frontier company line

Frontier efficiency adjustments reflect the rationale that all companies in an industry are expected to catch up with the most efficient company and should thus be challenged to do so by applying stricter adjustments to the predicted values.

The predicted line in the RE model excludes the noise and firm specific effects. Therefore, to get to the frontier, as defined by the best firm in the sample, we need to shift the predicted line down by the frontier company’s average residual.

We do not recommend using the frontier efficiency for pooled OLS models because of the strong no-noise assumption. However, if Ofwat should decide to use frontier efficiency to challenge companies’ costs, the predicted values of the pooled OLS model should be adjusted by the average of the minimum residuals in each of the last five years as different companies are allowed to be at the frontier in different years.

The ratio-based calculation is done in the following way. We first calculate the efficiency scores of each company by dividing the company’s actuals by the estimated value (A). Take the minimum of those efficiencies and apply it to the estimated value (A) to shift the line down to that company.

A6.2.3 Upper quartile industry line

Regulators often use an upper quartile efficiency challenge instead of a frontier challenge as it mitigates the risk of identifying a frontier based on misinterpreting residuals as inefficiency instead of noise. It also sets a more achievable target for companies considering the five-year timeline.
To make an upper quartile adjustment to the predicted values of both RE and pooled OLS models, we use the upper quartile residual instead of the minimum residual.

As noted in Section A6.1.2, the averaging and rebasing process implied in computing what we referred to as frontier efficiency scores for the pooled OLS models does in fact shift the frontier to some extent. If this method is used, it needs to be recognised that application of an upper quartile adjustment in addition to the above approach would result in further deviation from the frontier (though a regulator may consider that such an approach is appropriate based on applying its regulatory judgement).

The ratio-based calculation is done in the following way for the UQ. We first calculate the efficiency scores of each company by dividing the company’s actuals by the estimated value (A). Take the lower quartile of those ratios (this corresponds to the upper quartile for the industry). Multiply the estimated value (A) by the quartile value calculated in the previous step.

A6.3 Summary of efficiency adjustments

The table below summarises the approaches taken to adjust the different models for each type of efficiency challenge. They are worked out in a spreadsheet that CEPA has provided to Ofwat separately. In the table, the adjustments described are expressed in terms of changes that need to be made to the logged predicted values, but the spreadsheets also provide the % adjustments to the absolute values.
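The ratio-based frontier and upper quartile calculations described in Sections A6.2.2 and A6.2.3 can be sketched as below. The cost figures are hypothetical, the function name `ratio_challenge_factors` is introduced only for this example, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which convention the spreadsheet uses.

```python
import statistics

def ratio_challenge_factors(actual, predicted):
    """Ratio-based efficiency challenge factors.

    actual, predicted: lists of company costs in absolute terms (e.g. £m),
                       in the same company order.
    Returns (ratios, frontier_factor, upper_quartile_factor).
    """
    # Efficiency score of each company: actual cost divided by estimated value.
    ratios = [a / p for a, p in zip(actual, predicted)]
    # Frontier challenge: the minimum ratio.
    frontier_factor = min(ratios)
    # Upper quartile challenge: the lower quartile of the ratios.
    # Assumption: "inclusive" quartile convention; the source does not specify one.
    upper_quartile_factor = statistics.quantiles(ratios, n=4, method="inclusive")[0]
    return ratios, frontier_factor, upper_quartile_factor

# Hypothetical actual and predicted costs for five made-up companies.
actual = [95.0, 100.0, 110.0, 120.0, 90.0]
predicted = [100.0, 100.0, 100.0, 100.0, 100.0]
ratios, frontier_factor, uq_factor = ratio_challenge_factors(actual, predicted)

# Apply each factor to a hypothetical forecast of predicted expenditure.
forecast = 105.0
frontier_forecast = forecast * frontier_factor
uq_forecast = forecast * uq_factor
print(f"frontier factor {frontier_factor:.3f}, upper quartile factor {uq_factor:.3f}")
```

Both factors multiply the estimated value, so in this example the frontier challenge gives the lowest allowed cost, the upper quartile challenge sits between the frontier and the unadjusted forecast, and the average challenge leaves the forecast unchanged.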
Table A6.1: Efficiency challenge adjustments to the predicted values

| | Average | Frontier | Upper quartile |
|---|---|---|---|
| RE | None | Predicted – min (average residuals) | Predicted – upper quartile (average residuals) |
| Pooled OLS | None | Predicted – average (min residuals over last five years) | Predicted – average (upper quartile residuals over last five years) |
| Ratio-based | None | Min (Actual / Predicted) | Lower quartile (Actual / Predicted) |

ANNEX 7: LOGARITHMIC TRANSFORMATION OF PREDICTED VALUES

There are several ways to transform values predicted by a log-linear equation into absolute values. Some of the transformation methods require an adjustment to the exponent of the predicted value, while others do not. For the preferred models in this analysis, all transformations have a minor impact on the predicted values because of the sample size. Here we explain the different approaches to log transformation.

The rationale behind making an adjustment to the exponent of the log value is that the expected value of the error is zero in logarithmic terms, which does not imply that the expected value of the exponentiated error is one. However, there is no consensus in the academic literature that an adjustment needs to be made, particularly for large samples and financial variables, as argued by William Greene in his textbook Econometric Analysis.

The literature on the topic has explored the following approaches to transforming logarithmic data back to costs.

• Naive estimator: makes no adjustment to account for the expected value of the logged error being equal to 0.
• Conditional mean estimator: makes an adjustment and assumes normal distribution of the errors.
• Smearing estimator: does not need to assume normal distribution of the errors. In practice, it yields very similar results to the conditional mean estimator for the sample size that we have.
• The “alpha factor” (Ofgem): an adjustment factor that Ofgem used for electricity in 2009, but we have not been able to find supporting literature for it.
This is the coefficient of the regression when running the actual cost (£m) on the predicted costs (£m transformed from logs) without a constant. Ofgem state that this should only be used when the errors are homoscedastic; otherwise the correction factor is not constant.55 For the models in this report, this does not yield results much different from the other two. This factor also assumes normal distribution of the residuals.

The table below summarises the formulae for the different estimators listed above. In these equations, e is the exponent, εi is the ith residual, and N is the sample size.

Table A7.1: Log transformation adjustments formulae

| Estimator | Adjustment formula |
|---|---|
| Naive estimator | No adjustment |
| Conditional mean estimator | |
| Smearing estimator | |
| Alpha factor (Ofgem) | Coefficient of the regression when running the actual cost (£m) on the predicted costs (£m transformed from logs) without a constant |

55 Ofgem, RIIO-GD1: Initial proposals – Step-by-step guide for the cost efficiency assessment methodology, August 2012, page 12.

The size of the adjustment decreases as the sample size increases. Across all our models, it is very close to 100%. The adjustment factors for the individual water models are presented in Table A7.2 below. It is up to Ofwat to decide which estimator to use.

Table A7.2: Water models adjustment factors

| Model | Alpha factor | Smearing estimator | Conditional mean adjustment | Naïve estimator |
|---|---|---|---|---|
| WM3 | 99.7% | 100.4% | 100.2% | 100.0% |
| WM5 | 101.7% | 101.0% | 100.5% | 100.0% |
| WM6 | 101.7% | 100.6% | 100.6% | 100.0% |
| WM9 | 100.4% | 101.1% | 100.4% | 100.0% |
| WM10 | 99.4% | 100.6% | 100.6% | 100.0% |

Table A7.3 below provides the same information for the final sewerage models.
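The four estimators listed in this annex can be sketched in a few lines. The sketch assumes the standard textbook forms of the adjustments: exp(σ²/2) for the conditional mean estimator, where σ² is the residual variance, and the sample average of exp(εi) for the smearing estimator. The function names, the degrees-of-freedom argument and the example residuals are illustrative assumptions and are not taken from CEPA's spreadsheets.

```python
import math


def naive_factor():
    """Naive estimator: no adjustment to the exponentiated prediction."""
    return 1.0


def conditional_mean_factor(residuals, n_params=0):
    """exp(sigma^2 / 2); relies on normally distributed log residuals.

    n_params is an assumed degrees-of-freedom correction (0 divides by N).
    """
    sigma2 = sum(e * e for e in residuals) / (len(residuals) - n_params)
    return math.exp(sigma2 / 2.0)


def smearing_factor(residuals):
    """Average of exp(residual); needs no normality assumption."""
    return sum(math.exp(e) for e in residuals) / len(residuals)


def alpha_factor(actual, predicted):
    """Slope of a regression of actual on predicted costs without a constant."""
    numerator = sum(a * p for a, p in zip(actual, predicted))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator


# Hypothetical log residuals, for illustration only.
residuals = [0.10, -0.10, 0.05, -0.05]
factors = {
    "naive": naive_factor(),
    "conditional mean": conditional_mean_factor(residuals),
    "smearing": smearing_factor(residuals),
}
```

Each factor multiplies the exponentiated log prediction, so a value of 1.0 corresponds to the 100.0% shown for the naïve estimator in the tables.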
Table A7.3: Sewerage models adjustment factors

| Model | Alpha factor | Smearing estimator | Conditional mean adjustment | Naïve estimator |
|---|---|---|---|---|
| SM1 | 101.1% | 101.0% | 101.0% | 100.0% |
| SM5 | 100.7% | 100.4% | 100.4% | 100.0% |
| SM6 | 100.1% | 100.4% | 100.4% | 100.0% |
| SM9 | 101.9% | 100.5% | 100.5% | 100.0% |
| SM10 | 100.1% | 100.3% | 100.3% | 100.0% |

ANNEX 8: NON-NORMALISED COEFFICIENTS OF FINAL MODELS

A8.1 Water

The coefficients presented in Annex 4 are not the ones directly used in Ofwat’s feeder models. They are normalised to the sample mean, and we have presented them in this way for ease of interpretation and to facilitate model comparison during the model selection process. The non-normalised versions of those model results are presented in Table A8.1 below. These are the ones used in Ofwat’s feeder models, along with Jacobs’s exogenous variables, to estimate the AMP6 initial threshold.

We note that the normalised and the non-normalised models are identical; the only difference is in the presentation of the coefficients of the translog variables (length, density, usage). The rest of the variables should have identical coefficients in both versions. We have presented these results only for the five final models used in triangulation.
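The equivalence of the normalised and non-normalised presentations can be illustrated with a small sketch. It assumes a two-variable translog in which normalisation means subtracting a constant (the logged sample mean) from each logged variable, and in which the squared and cross terms enter without a factor of one half. The exact specification used in the feeder models may differ, and all names and numbers below are hypothetical.

```python
import math


def denormalise(const, b1, b2, b11, b22, b12, m1, m2):
    """Expand a translog in centred logs (x - m) into coefficients on raw logs."""
    return {
        "const": const - b1 * m1 - b2 * m2
                 + b11 * m1 ** 2 + b22 * m2 ** 2 + b12 * m1 * m2,
        "x1": b1 - 2 * b11 * m1 - b12 * m2,
        "x2": b2 - 2 * b22 * m2 - b12 * m1,
        # Second-order coefficients are unchanged by the expansion.
        "x1^2": b11,
        "x2^2": b22,
        "x1*x2": b12,
    }


def predict_normalised(x1, x2, const, b1, b2, b11, b22, b12, m1, m2):
    """Prediction from the normalised form, using centred logged variables."""
    d1, d2 = x1 - m1, x2 - m2
    return const + b1 * d1 + b2 * d2 + b11 * d1 ** 2 + b22 * d2 ** 2 + b12 * d1 * d2


def predict_raw(x1, x2, c):
    """Prediction from the non-normalised form, using raw logged variables."""
    return (c["const"] + c["x1"] * x1 + c["x2"] * x2
            + c["x1^2"] * x1 ** 2 + c["x2^2"] * x2 ** 2 + c["x1*x2"] * x1 * x2)


# Hypothetical coefficients and logged sample means, for illustration only.
norm = dict(const=4.0, b1=0.9, b2=0.3, b11=0.02, b22=0.5, b12=-0.1, m1=9.5, m2=-3.0)
raw = denormalise(**norm)

# Both presentations give the same prediction at any point.
same = math.isclose(predict_normalised(10.0, -2.0, **norm), predict_raw(10.0, -2.0, raw))
```

Under these assumptions only the first-order coefficients and the constant change between the two presentations, which is consistent with the much larger constants and first-order terms seen in non-normalised results.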
Table A8.1: Final water models non-normalised coefficients

Variable  WM3  WM5  WM6  WM9  WM10
Length  3.19829  2.85571  2.91246  1.69157  1.82854
Density  -0.659653  0.74489  -0.280996  -2.0025  -2.16202
Usage  -0.488767
Length^2  -0.0307684  -0.0225912  -0.0191715  0.0143949  0.0191233
Density^2  1.15405  1.06674  0.94174  0.239944  0.353792
Usage^2  -0.246949
Length x Density  0.647287  0.512217  0.557174  0.358754  0.448631
Length x Usage  -0.00603146
Density x Usage  -0.0631846
Time trend  0.0119295  -0.00674629  -0.0031923  -0.000768856  0.00941448
Regional wage  1.49168  0.719568  0.957711  0.280084  0.901165
Population density  -0.560555  0.989236  0.494968  2.03158  1.05336
Proportion of metered properties  -0.775792
Sources  -0.292716
Pumping head  0.122031
Proportion of water input from river abstractions  0.00224101  0.0201406  0.0118232  0.00477246  0.00387934
Proportion of water input from reservoirs  -0.015007  -0.0139714  -0.0122937  -0.006542  0.00213703
Proportion of new meters  0.0284604
Proportion of new mains  -0.030748
Proportion of mains relined and renewed  0.0290132  0.0650153  0.0556453  0.0599445  0.0376357
Properties below reference pressure level  0.0029499
Leakage volume  -0.200091
Properties affected by unplanned interruptions > 3 hrs  0.00778847
Properties affected by planned interruptions > 3 hrs  0.0266115
Proportion of usage by metered household properties  0.5006
Proportion of usage by metered non-household properties  -0.170731
Constant  -22.5662  -15.1977  -17.1336  -12.774  -14.6658

A8.2 Sewerage

Table A8.2 provides the same for the final sewerage models used in triangulation. They are the non-normalised versions of the models shown in Annex 5.
Table A8.2: Final sewerage models non-normalised coefficients

Variable  SM1  SM5  SM6  SM9  SM10
Length  11.2973
Density  50.4841  70.3423  78.6241  49.3735  59.1464
Load  14.1175  17.0439  9.58371  14.659
Length^2  0.07573
Density^2  -2.41709  -2.87877  -2.47179  -2.64728  -1.21131
Load^2  0.0846  0.12666  0.00753  0.10208
Length x Density  -2.80243
Load x Density  -3.59439  -4.51344  -2.0676  -3.78995
Time Trend  0.01923  0.02331  0.02146  0.02429  0.02006
Regional Wage  0.65243  1.28032  1.12747  1.19874  0.8466
Proportion of load treated in Bands 1-3  0.15554  0.12711
Constant  -170.353  -244.736  -281.202  -171.011  -224.343

ANNEX 9: RECOMMENDATIONS FOR PR19

Following the model testing, we consider that there are a few areas where Ofwat could collect more data so that the models can account for elements that we were not able to capture this time round.

A9.1 Capacity measures

In the dataset currently available to Ofwat, there is no reliable measure of network or treatment capacity (be it in water or in sewerage). For networks, this would mean taking account of the diameter as well as the length of the sewers/mains, while for treatment it would reflect the spare capacity of treatment works that could take on additional input/load. The few variables related to capacity that are available from the June Returns seem to be of low data quality. Since a proportion of companies’ costs depends on equipment capacity, we believe that including such variables in the models could further improve the results.

A9.2 Usage measure

In sewerage, the load variable used to measure usage takes into account both the volume and the strength of the sewage. Since only the volume drives network costs, a better measure of usage would be one that takes only volume into account.