Feasible GLS estimation requires estimates of the variance components σu2 and σe2. This chapter discusses key features of ANOVA, ML, REML, and GEE estimation methods that can be used to estimate these variance components and that are used in the empirical analysis. To keep the presentation manageable, the discussion does not focus on other methods, such as bootstrap, jackknife, and other resampling methods. De Leeuw and Meijer (2008) provide an excellent, more detailed discussion of GLS estimators for multilevel models.
For simplicity of exposition, in what follows, let the symbol Ωi represent a generic covariance matrix for school i, Xi a generic covariate matrix for a school, and δij = ui + eij a generic normally distributed error for a student.
Balanced Design Estimator
When mi = m for all schools—that is, for balanced designs—a consistent variance estimator for the simple differences-in-means estimator has the following simple form:
is the variance of the mean outcome between schools (see, for example, Cochran 1963). This estimator is consistent because E(SB2) = σu2 + (σe2/m) (see (19)).
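The moment identity E(SB2) = σu2 + (σe2/m) behind this estimator can be illustrated with a short simulation. The sketch below is standalone and assumes balanced, normally distributed data; the parameter values and variable names are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
M, m = 200, 30                      # schools, students per school (balanced)
sigma_u2, sigma_e2 = 0.25, 1.0      # illustrative true variance components

u = rng.normal(0, np.sqrt(sigma_u2), M)                     # school effects
y = u[:, None] + rng.normal(0, np.sqrt(sigma_e2), (M, m))   # student outcomes

ybar = y.mean(axis=1)               # school means
SB2 = ybar.var(ddof=1)              # between-school variance of the means

# pooled within-school variance estimates sigma_e2
sigma_e2_hat = y.var(axis=1, ddof=1).mean()

# moment identity: E(SB2) = sigma_u2 + sigma_e2 / m
sigma_u2_hat = SB2 - sigma_e2_hat / m
```

With M = 200 schools, both estimates land close to the true components, and the between-school variance of means tracks σu2 + σe2/m as the identity predicts.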
If covariates are included in the model, (20) can be generalized using the following between-school regression model:
where yi is the school-level mean outcome, qi is a k1 × 1 vector of school-level covariates (which could include student-level covariates averaged to the school level), and δi = (ui + ēi) is the school-level error term (with ēi the school mean of the student-level errors). Estimating (21) by OLS yields the following variance estimator for α1:
where X =[K T Q] is the covariate matrix and RSSB is the regression residual sum of squares.
For balanced designs, α̂1,Balanced is an ANOVA or REML estimator and is minimum variance unbiased under normality of the error terms (Searle 1971). This estimator, however, has no optimal properties for unbalanced designs. Nonetheless, it is appealing because it is simple, it is based entirely on the between-school OLS regression, and it produces serviceable estimates for designs that are not too unbalanced (which is typically the case in practice). As discussed next, estimating variance components that account for unbalanced designs becomes considerably more complex.
ANOVA Estimator
The ANOVA estimator is a method-of-moments estimator that equates regression residual sums of squares to their expectations and solves these equations to obtain estimators for the variance components. ANOVA methods have the advantage that the variance components can be obtained in one step using easily understood OLS regression residuals, rather than iteratively, as is the case for the ML, REML, and GEE methods. The disadvantage of ANOVA methods is that for unbalanced designs, they have no optimal properties beyond asymptotic unbiasedness.
This section discusses the Swamy and Arora (SA; 1972) ANOVA method that was adapted for unbalanced designs by Baltagi and Chang (1994). De Leeuw and Meijer (2008) and Baltagi and Chang (1994) discuss alternative ANOVA estimators that are similar to the SA method.
Under the SA method, an estimator for the student-level variance, σe2, is obtained by first estimating a within-school OLS regression:
This yields the following consistent variance estimator for σe2:
where RSSW is the regression residual sum of squares from (23) and k2 is the number of student-level (within school) covariates.
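Equation (24) is not reproduced above; the sketch below assumes the common form σ̂e2 = RSSW/(n - M - k2), with the within-school regression run on school-demeaned data (demeaning removes the school effects). The function name and data layout are illustrative:

```python
import numpy as np

def within_sigma_e2(y_groups, X_groups):
    """sigma_e2 from the within-school regression: demean y and the
    student-level covariates by school, run OLS, and divide RSS_W by
    n - M - k2 (the degrees-of-freedom correction assumed here)."""
    yw = np.concatenate([yi - yi.mean() for yi in y_groups])
    Xw = np.vstack([Xi - Xi.mean(axis=0) for Xi in X_groups])
    beta = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    rss_w = np.sum((yw - Xw @ beta) ** 2)
    n, M, k2 = len(yw), len(y_groups), Xw.shape[1]
    return rss_w / (n - M - k2)
```

Because the school means absorb ui, this estimator is consistent for σe2 regardless of how unbalanced the school sample sizes are.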
To obtain an estimator for σu2, the SA method uses the residual sum of squares RSSB from the between-school regression in (21), where schools are weighted by their sample sizes. In this case, RSSB = δ̂′Wδ̂ = δ′B′WBδ, where W is an n × n diagonal weight matrix with the weights mi along the diagonal, and B = [IM - X(X′WX)-1X′W]. Using matrix algebra, it can be shown that:
where tr is the matrix trace operator. Thus, a consistent estimator for σu2 is:
which could be negative (leading to a negative ICC estimate).
The estimators for σe2 and σu2 in (24) and (25) can be inserted into (16) and (17) to obtain feasible GLS estimates. Note that for balanced designs, this approach yields (22). The ANOVA approach can be implemented using SAS (see Table 3.1).
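The assembly of the feasible GLS estimates can be sketched as follows. This assumes the random-intercept covariance Ωi = σe2 Imi + σu2 Jmi and uses its closed-form (Woodbury) inverse; the function name and toy data are illustrative, and the returned matrix stands in for the model-based variance of (17):

```python
import numpy as np

def feasible_gls(y_groups, X_groups, sigma_u2, sigma_e2):
    """GLS point estimate and model-based variance for a random-intercept
    covariance Omega_i = sigma_e2*I + sigma_u2*J."""
    XtOX = 0.0
    XtOy = 0.0
    for yi, Xi in zip(y_groups, X_groups):
        mi = len(yi)
        # Woodbury form of Omega_i^{-1}: no explicit matrix inversion needed
        Oinv = (np.eye(mi) - np.ones((mi, mi)) * sigma_u2
                / (sigma_e2 + mi * sigma_u2)) / sigma_e2
        XtOX = XtOX + Xi.T @ Oinv @ Xi
        XtOy = XtOy + Xi.T @ Oinv @ yi
    alpha_hat = np.linalg.solve(XtOX, XtOy)
    return alpha_hat, np.linalg.inv(XtOX)

# toy check: treatment assigned at the school level, unbalanced sizes
rng = np.random.default_rng(1)
ys, Xs = [], []
for i in range(80):
    mi = int(rng.integers(5, 15))
    Xi = np.column_stack([np.ones(mi), np.full(mi, float(i % 2))])
    u = rng.normal(0, np.sqrt(0.3))
    ys.append(Xi @ np.array([1.0, 0.5]) + u + rng.normal(0, 1, mi))
    Xs.append(Xi)
alpha_hat, cov_alpha = feasible_gls(ys, Xs, 0.3, 1.0)
```

In practice the σu2 and σe2 arguments would be the ANOVA (or ML/REML/GEE) estimates rather than known values.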
Maximum Likelihood Estimator
ML methods simultaneously estimate the ATE parameters and the variance components, and are often used to estimate linear mixed models (such as HLM models) that are popular in the education field (see, for example, Raudenbush and Bryk 2002). ML estimators are consistent and asymptotically efficient, but they do not account for the degrees of freedom that are lost to estimating the regression coefficients when estimating the variance components.
To demonstrate the ML method, it is convenient to express Ωi as σe2Λi, where
where Imi is the mi × mi identity matrix, Jmi is an mi × mi matrix of 1s, and λ = σu2/σe2. Note that Λi-1 = Imi - (mi + λ-1)-1 Jmi. Because of the normality assumption, the log likelihood is:
where |Λi| denotes the determinant of Λi.
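The closed form for Λi-1 is easy to verify numerically; a standalone check with illustrative values of mi and λ (the determinant |Λi| = 1 + λmi is also checked, since it appears when evaluating the likelihood):

```python
import numpy as np

mi, lam = 6, 0.4                                     # illustrative values
Lambda = np.eye(mi) + lam * np.ones((mi, mi))        # Lambda_i = I + lambda*J
# closed form: Lambda_i^{-1} = I - (mi + lambda^{-1})^{-1} J
Lambda_inv = np.eye(mi) - np.ones((mi, mi)) / (mi + 1.0 / lam)
print(np.allclose(Lambda @ Lambda_inv, np.eye(mi)))  # True
```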
Taking derivatives in (27) with respect to the parameters and setting them equal to zero yields the following closed-form solutions for α̂ and σ̂e2 (for a given λ̂):
Equation (28) is the feasible GLS estimator in (16).
The first-order condition for λ is a nonlinear equation that must be solved numerically:
One common iterative method is the Newton-Raphson method where λ̂(iter +1) is updated from λ̂(iter) as follows:
is the Hessian matrix.5 Other iterative methods use -E(H) = .5 Σi tr[Λi-1 Jmi Λi-1 Jmi] in (31) rather than H (Fisher scoring), or use the expectation-maximization (EM) algorithm (see Little and Rubin 2002).
The model parameters can then be estimated using the following steps: (1) obtain an initial value for λ̂ (for example, using the ANOVA method), (2) calculate α̂MLE and σ̂e,MLE2 using (28) and (29), (3) update λ̂ using (30)-(32), and (4) return to Step (2) until convergence is achieved. Final feasible GLS estimators can then be obtained using (16) and (17). Table 3.1 displays statistical package routines that use the ML method.
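These steps can be sketched end to end. The code below concentrates α and σe2 out of the likelihood using their closed forms for a given λ, then runs Newton-Raphson on λ; for brevity it uses numerical first and second derivatives of the profile likelihood in place of the analytic score and Hessian, so it is a sketch of the iteration rather than the chapter's exact algorithm (names, starting values, and safeguards are illustrative):

```python
import numpy as np

def lam_inv(mi, lam):
    # Lambda_i^{-1} = I - (mi + 1/lam)^{-1} J for the random-intercept model
    return np.eye(mi) - np.ones((mi, mi)) / (mi + 1.0 / lam)

def profile_loglik(lam, y_groups, X_groups):
    """Log likelihood with alpha and sigma_e2 concentrated out via their
    closed forms; returns (loglik up to constants, alpha, sigma_e2)."""
    XtLX = 0.0
    XtLy = 0.0
    n = 0
    logdet = 0.0
    for yi, Xi in zip(y_groups, X_groups):
        Li = lam_inv(len(yi), lam)
        XtLX = XtLX + Xi.T @ Li @ Xi
        XtLy = XtLy + Xi.T @ Li @ yi
        n += len(yi)
        logdet += np.log(1.0 + lam * len(yi))     # log|Lambda_i|
    alpha = np.linalg.solve(XtLX, XtLy)
    quad = sum((yi - Xi @ alpha) @ lam_inv(len(yi), lam) @ (yi - Xi @ alpha)
               for yi, Xi in zip(y_groups, X_groups))
    sigma_e2 = quad / n
    return -0.5 * (n * np.log(sigma_e2) + logdet), alpha, sigma_e2

def ml_fit(y_groups, X_groups, lam0=0.1, h=1e-4, tol=1e-8):
    """Steps (1)-(4): initial lambda, closed-form alpha and sigma_e2,
    Newton-Raphson updates of lambda until convergence."""
    lam = lam0
    f = lambda l: profile_loglik(l, y_groups, X_groups)[0]
    for _ in range(200):
        g = (f(lam + h) - f(lam - h)) / (2 * h)            # numerical score
        H = (f(lam + h) - 2 * f(lam) + f(lam - h)) / h**2  # numerical Hessian
        step = g / H if H < 0 else -0.1 * g   # gradient-ascent fallback
        lam_new = max(lam - step, 1e-8)       # positivity kept for the sketch
        if abs(lam_new - lam) < tol:
            lam = lam_new
            break
        lam = lam_new
    _, alpha, sigma_e2 = profile_loglik(lam, y_groups, X_groups)
    return alpha, sigma_e2, lam * sigma_e2    # sigma_u2 = lambda * sigma_e2
```

Note the positivity clamp on λ at each iteration; as discussed below, production routines typically offer an option to relax this constraint.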
Note that most statistical packages impose a non-negativity constraint on σ̂u2 at each iteration. Murray (1998) and Stroup and Littell (2002) demonstrate through simulations, however, that this constraint can deflate the Type I error rate and reduce statistical power. Thus, these authors recommend using options in statistical packages that allow for negative variance component estimates. A similar issue applies to the REML estimator discussed next.
Restricted Maximum Likelihood Estimator
Unlike the ML approach, the REML approach for the variance components adjusts for the degrees of freedom lost in estimating the regression parameters (Patterson and Thompson 1971). The REML approach separates the likelihood into two independent parts, one of which depends only on the variance components (the part of interest). The approach profiles out the covariates by finding a linear combination of the outcomes, y* = Ly, whose distribution does not depend on α, where L is an (M - k) × M matrix and k is the rank of the covariate matrix X.6
To find L, consider first the OLS regression residuals r = Py, where P = (IM - X(X′X)-1X′) is an M × M idempotent matrix. Note that because y is assumed to have a multivariate normal distribution, r ~ N(0, PΩP′), which is independent of α. Thus, a solution for L (which has (M - k) rows) can be obtained from P by satisfying the relation P = L′L subject to the normalizing condition IM-k = LL′. Such a solution can be found using the eigenvectors (E) and eigenvalues of P. Because P is idempotent, (M - k) eigenvalues will be one (say, the first ones) and the remaining k will be zero. Thus, L can be calculated as the first (M - k) rows of E′.
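This construction can be sketched directly (sizes are illustrative). One practical detail: np.linalg.eigh returns eigenvalues in ascending order, so the unit-eigenvalue eigenvectors are selected by thresholding rather than by taking the first rows:

```python
import numpy as np

rng = np.random.default_rng(0)
M, k = 12, 3                                        # illustrative sizes
X = rng.normal(size=(M, k))
P = np.eye(M) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix

vals, E = np.linalg.eigh(P)        # eigendecomposition of symmetric P
L = E[:, vals > 0.5].T             # rows = eigenvectors with eigenvalue 1
```

By construction L′L = P, LL′ = I, and LX = 0, so y* = Ly has a distribution free of α, as required.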
Using this L, it follows that y* =Ly ~N (0,LΩL ′), whose distribution is independent of α. The REML log likelihood can be obtained using this distribution. Harville (1974) shows, however, that an equivalent log likelihood that shows more clearly the way that the regression parameters are profiled out of the likelihood can be expressed as follows:
where α̂GLS is given in (16). This likelihood can be maximized with respect to σe2 and λ using the methods discussed above for the ML estimator (not shown). REML estimates of the variance components can then be used in (16) and (17) to obtain feasible GLS estimators. REML estimates are asymptotically equivalent to ML estimates, but the REML approach tends to produce larger standard errors because of the degrees of freedom adjustments. Table 3.1 displays statistical package routines that use the REML method.
GEE Estimator
The GEE estimator (discussed above) is also a feasible GLS estimator for the SP model. The ATE parameter estimate obtained from (7) yields the feasible GLS estimator in (16), and the model-based GEE variance estimator using (8) yields the feasible GLS variance estimator in (17). The GEE empirical sandwich variance estimator is I0-1I1I0-1, where
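A sketch of the sandwich calculation follows. The exact definitions of I0 and I1 are given in the chapter's display; the code assumes the standard linear-GEE forms, I0 = Σ Xi′Ωi-1Xi (bread) and I1 = Σ (Xi′Ωi-1ri)(Xi′Ωi-1ri)′ (meat, built from cluster score contributions), which should be treated as an assumption rather than the chapter's exact expressions:

```python
import numpy as np

def omega_inv(mi, sigma_u2, sigma_e2):
    # (sigma_e2*I + sigma_u2*J)^{-1} via the random-intercept closed form
    return (np.eye(mi) - np.ones((mi, mi)) * sigma_u2
            / (sigma_e2 + mi * sigma_u2)) / sigma_e2

def gee_sandwich(y_groups, X_groups, sigma_u2, sigma_e2):
    """Point estimate plus empirical sandwich variance I0^{-1} I1 I0^{-1},
    using standard linear-GEE bread and meat (an assumed form)."""
    I0 = 0.0
    XtOy = 0.0
    for yi, Xi in zip(y_groups, X_groups):
        Oinv = omega_inv(len(yi), sigma_u2, sigma_e2)
        I0 = I0 + Xi.T @ Oinv @ Xi
        XtOy = XtOy + Xi.T @ Oinv @ yi
    alpha = np.linalg.solve(I0, XtOy)
    I1 = 0.0
    for yi, Xi in zip(y_groups, X_groups):
        Oinv = omega_inv(len(yi), sigma_u2, sigma_e2)
        s = Xi.T @ Oinv @ (yi - Xi @ alpha)     # cluster score contribution
        I1 = I1 + np.outer(s, s)
    I0_inv = np.linalg.inv(I0)
    return alpha, I0_inv @ I1 @ I0_inv
```

Unlike the model-based variance I0-1, the sandwich form remains consistent even when the working covariance Ωi is misspecified.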
Under the GEE method, the variance components are estimated iteratively by updating regression residuals rij. In Step (1), OLS residuals are used to obtain the following consistent estimates for σu2 and σe2:
These estimates are then used to calculate α̂GLS in (16). In Steps (2) to (4), new residuals are calculated, (34) and (35) are updated, and new estimates of α̂GLS are obtained. Steps (2) to (4) are repeated until convergence is achieved. Table 3.1 displays statistical package routines that implement this method; these routines require the specification of an exchangeable working correlation matrix.
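The iterative scheme can be sketched as follows. Because (34) and (35) are not reproduced above, the moment estimators below are the standard ones for an exchangeable working correlation (within-school cross products estimate σu2; squared residuals estimate σu2 + σe2) and may differ from the chapter's expressions in degrees-of-freedom details:

```python
import numpy as np

def exchangeable_moments(residual_groups):
    """Moment estimates of (sigma_u2, sigma_e2) from residuals under an
    exchangeable working correlation (a standard form, assumed here)."""
    sq = n = cp = pairs = 0.0
    for r in residual_groups:
        mi = len(r)
        sq += np.sum(r ** 2)
        n += mi
        cp += (np.sum(r) ** 2 - np.sum(r ** 2)) / 2.0   # sum over j < k
        pairs += mi * (mi - 1) / 2.0
    sigma_u2 = cp / pairs
    return sigma_u2, sq / n - sigma_u2

def gee_fit(y_groups, X_groups, n_iter=10):
    # Step (1): moments from OLS residuals, then alternate moment and
    # GLS updates until (approximately) converged
    Xall, yall = np.vstack(X_groups), np.concatenate(y_groups)
    alpha = np.linalg.lstsq(Xall, yall, rcond=None)[0]
    for _ in range(n_iter):
        res = [yi - Xi @ alpha for yi, Xi in zip(y_groups, X_groups)]
        su2, se2 = exchangeable_moments(res)
        A = 0.0
        b = 0.0
        for yi, Xi in zip(y_groups, X_groups):
            mi = len(yi)
            Oinv = (np.eye(mi) - np.ones((mi, mi)) * su2
                    / (se2 + mi * su2)) / se2
            A = A + Xi.T @ Oinv @ Xi
            b = b + Xi.T @ Oinv @ yi
        alpha = np.linalg.solve(A, b)
    return alpha, su2, se2
```

A fixed iteration count stands in for a proper convergence check; the updates typically stabilize after a handful of passes.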