Document 3C

SWEEPING METHODS FOR ANALYSIS OF VARIANCE

The conventional approach to analysis of variance is to calculate sums of squares for recognizable components of the total variation and to estimate the random variance from the remaining, "error", sum of squares. In a simple design, such as the Randomized Complete Block Design, we can recognize that, because of the orthogonality of blocks and treatments (each treatment occurs once in each block), the sums of squares for blocks and for treatments can be calculated quite independently. In more complicated, but still orthogonal, designs, such as two replicates of a factorial structure with four two-level factors arranged in four blocks of eight plots per block, we have to identify from the properties of the design which interaction sum of squares cannot be calculated (because of the confounding system).

In designs where each block contains a (different) subset of the set of treatments, blocks and treatments are not orthogonal. The sums of squares for blocks and for treatments can no longer be calculated independently, and we have to think about the order in which we fit the terms "Blocks" and "Treatments", in the same way as for fitting terms in multiple regression. That is, we calculate the sum of squares for Blocks (ignoring treatments) and then the sum of squares for Treatments (after allowing for block differences).

For some particular designs, such as lattices or Balanced Incomplete Block Designs, standard methods for calculating the analysis of variance are given in textbooks (from Cochran and Cox onwards) and are available in some statistical computing packages. For other designs, with less regular patterns of treatment subsets in blocks, the analysis of variance can be calculated using powerful packages such as GENSTAT or (somewhat tediously) through a multiple regression package (an example of the multiple regression approach is given in Document 3A, section 7).
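The dependence on fitting order in a non-orthogonal design can be illustrated with a small sketch of the multiple regression approach. This is not the example of Document 3A: the yields are invented purely for illustration, the design (three treatments in three blocks of two plots) is an assumed one, and the helper name rss is hypothetical. Each sum of squares is obtained as the drop in residual sum of squares when a term is added to the model.

```python
import numpy as np

def rss(y, *factors):
    """Residual sum of squares after a least-squares fit of an overall
    mean plus one indicator column per level of each given factor."""
    columns = [np.ones(len(y))]
    for labels in factors:
        for level in np.unique(labels):
            columns.append((labels == level).astype(float))
    X = np.column_stack(columns)
    # X is rank deficient, but the fitted values are still the projection of y.
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - fitted) ** 2))

# Assumed incomplete block design with hypothetical yields:
# each block holds a different pair of the three treatments.
blocks = np.array([0, 0, 1, 1, 2, 2])
treatments = np.array([0, 1, 0, 2, 1, 2])
y = np.array([10.0, 14.0, 9.0, 15.0, 16.0, 18.0])

# Order 1: Blocks (ignoring treatments), then Treatments (allowing for blocks).
ss_blocks_ignoring = rss(y) - rss(y, blocks)
ss_treat_adjusted = rss(y, blocks) - rss(y, blocks, treatments)

# Order 2: Treatments (ignoring blocks), then Blocks (allowing for treatments).
ss_treat_ignoring = rss(y) - rss(y, treatments)
ss_blocks_adjusted = rss(y, treatments) - rss(y, treatments, blocks)

print("Blocks ignoring treatments:   ", ss_blocks_ignoring)
print("Treatments adjusted for blocks:", ss_treat_adjusted)
print("Treatments ignoring blocks:   ", ss_treat_ignoring)
print("Blocks adjusted for treatments:", ss_blocks_adjusted)
```

Both orders account for the same total variation, but the sum of squares attributed to Treatments differs according to whether blocks have already been fitted, which is why the order must be chosen deliberately.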
Whether our experiment is simple or more complex, there is an element of the "sausage machine" about the calculations for the analysis of variance. This is particularly true of the calculation of the error sum of squares. There is an alternative approach to the analysis of variance and the estimation of treatment and block effects which, I believe, provides more insight into the concepts of the analysis, and particularly into the error sum of squares. It is not new, and is in fact the basis of some of the better (and more flexible) statistical packages, but it does not appear to be widely known. Using this method we can analyse any design structure with no more than a pocket calculator (though for really complex structures the calculations may be rather tedious).

The method is that of "Sweeping" the data. In sweeping we identify the sets of effects (Blocks, Treatments, Main Effects, Main Plot Effects, Rows, Columns) which we wish to allow for in our analysis. Each yield will be labelled by one effect from each set: that is, each yield is in one block, has one treatment, etc. Essentially we define a model expressing the yield for each plot as a sum of several components. For each set of effects in turn we estimate the effects and then subtract from each yield the value of the appropriate effect. After adjusting the yields to allow for all the relevant sets of effects we are left with the residuals, which represent the random variation not explicable by the sets of effects which we have considered, and the error sum of squares is simply the sum of the squared residuals. At any intermediate stage of the analysis the sum of squares of the currently adjusted yields provides a measure of the variation not yet accounted for.
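The sweeping procedure just described can be sketched for the simplest case, a Randomized Complete Block Design. The yields below are invented for illustration (three blocks of four treatments), the function name sweep is hypothetical, and each effect is assumed to be estimated as the mean of the currently adjusted yields carrying that label. Because blocks and treatments are orthogonal here, a single sweep for each set is enough.

```python
import numpy as np

def sweep(values, labels):
    """Estimate one set of effects as the mean of the current values for
    each label, then subtract the appropriate effect from each value."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    effects = {}
    adjusted = values.copy()
    for label in np.unique(labels):
        members = labels == label
        effects[label] = values[members].mean()
        adjusted[members] -= effects[label]
    return effects, adjusted

# Hypothetical yields, listed block by block (3 blocks x 4 treatments).
yields = np.array([10, 12, 15, 11,
                   12, 15, 17, 12,
                   11, 15, 19, 13], dtype=float)
blocks = np.repeat(np.arange(3), 4)      # block label of each plot
treatments = np.tile(np.arange(4), 3)    # treatment label of each plot
overall = np.zeros(len(yields), dtype=int)  # one label for all: the overall mean

# Sweep for the overall mean, then for Blocks, then for Treatments.
mean_effect, adjusted = sweep(yields, overall)
ss_about_mean = np.sum(adjusted ** 2)

block_effects, adjusted = sweep(adjusted, blocks)
ss_after_blocks = np.sum(adjusted ** 2)

treatment_effects, residuals = sweep(adjusted, treatments)
ss_error = np.sum(residuals ** 2)

print("SS about the mean:       ", ss_about_mean)
print("SS for Blocks:           ", ss_about_mean - ss_after_blocks)
print("SS for Treatments:       ", ss_after_blocks - ss_error)
print("Error SS (residuals):    ", ss_error)
```

The sum of squares for each set of effects appears as the reduction in the sum of squares of the adjusted yields produced by that sweep, and whatever remains after the last sweep is the error sum of squares.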