Generalized Linear Models
GENERALIZED LINEAR MODELS
Joseph M. Hilbe
Arizona State University
1. HISTORY

Generalized Linear Models (GLM) is a covering algorithm allowing for the estimation of a number of otherwise distinct statistical regression models within a single framework. First developed by John Nelder and R.W.M. Wedderburn in 1972, the algorithm and overall GLM methodology have proved to be of substantial value to statisticians in terms of the scope of models under its domain as well as the number of accompanying model statistics facilitating an analysis of fit. In the early days of statistical computing, from 1972 to 1990, the GLM estimation algorithm also provided a substantial savings of computing memory compared to what was required using standard maximum likelihood techniques. Prior to Nelder and Wedderburn's efforts, GLM models were typically estimated using a Newton-Raphson type full maximum likelihood method, with the exception of the Gaussian model. Commonly known as normal or linear regression, the Gaussian model is usually estimated using a least squares algorithm. GLM, as we shall observe, is a generalization of ordinary least squares regression, employing a weighted least squares algorithm that iteratively solves for parameter estimates and standard errors.

In 1974, Nelder coordinated a project to develop a specialized statistical application called GLIM, an acronym for Generalized Linear Interactive Modeling. Sponsored by the Royal Statistical Society and Rothamsted Experimental Station, GLIM provided the means for statisticians to easily estimate GLM models, as well as other more complicated models which could be constructed using the GLM framework. GLIM soon became one of the most used statistical applications worldwide, and in 1981 was the first major statistical application to fully exploit the PC environment. However, it was discontinued in 1994. Presently, nearly all leading general purpose statistical packages offer GLM modeling capabilities; e.g. SAS, R, Stata, S-Plus, Genstat, and SPSS.

2. THEORY

Generalized linear models software, as we shall see, allows the user to estimate a variety of models from within a single framework, and to change models with minimal effort. GLM software also comes with a host of standard residual and fit statistics, which greatly assist researchers in assessing the comparative worth of models. Key features of a generalized linear model include 1) having a response, or dependent variable, selected from the single parameter exponential family of probability distributions, 2) having a link function that linearizes the relationship between the fitted value and explanatory predictors, and 3) having the ability to be estimated using an Iteratively Re-weighted Least Squares (IRLS) algorithm.

The exponential family probability function upon which GLMs are based can be expressed as

$$ f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{\alpha(\phi)} + c(y_i; \phi) \right\} \qquad (1) $$

where the distribution is a function of the unknown data, $y$, for given parameters $\theta$ and $\phi$, and where $\theta_i$ is the canonical parameter, or link function; $b(\theta_i)$ is the cumulant; $\alpha(\phi)$ is the scale parameter; and $c(y_i; \phi)$ is the normalization term.
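As a worked illustration (not part of the original text), the Poisson probability function can be rearranged into the exponential family form $\exp\{(y\theta - b(\theta))/\alpha(\phi) + c(y;\phi)\}$. The sketch below assumes the scale $\alpha(\phi)$ is 1, as it is for count models, and writes $\mu$ for the Poisson mean.

```latex
\begin{align*}
f(y;\mu) &= \frac{e^{-\mu}\,\mu^{y}}{y!}
          = \exp\{\, y\ln\mu \;-\; \mu \;-\; \ln y! \,\} \\[4pt]
\theta   &= \ln\mu                      && \text{(canonical parameter: the log link)} \\
b(\theta) &= e^{\theta} = \mu           && \text{(cumulant)} \\
c(y;\phi) &= -\ln y!                    && \text{(normalization term)} \\[4pt]
b'(\theta) &= e^{\theta} = \mu          && \text{(mean)} \\
b''(\theta) &= e^{\theta} = \mu         && \text{(variance function } V(\mu)\text{)}
\end{align*}
```

The first and second derivatives of the cumulant with respect to $\theta$ give the mean and the variance function, which is why the Poisson variance equals its mean.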
Table 1 presents the standard probability distribution functions (PDF) belonging to the GLM family.

Table 1. GLM families: canonical
Each of the distributions in Table 1 is a member of the exponential family. It should be noted, however, that the three continuous GLM distributions (Gaussian, gamma, and inverse Gaussian) are usually parameterized with two parameters rather than one. Within the GLM framework, though, the scale parameter is not estimated, although it is possible to point-estimate the scale value from the dispersion statistic, which is typically displayed in GLM model output. Binomial and count models have the scale value set at 1.0.

Table 2 provides the formulae for the deviance and log-likelihoods of each GLM family. Also provided is the variance for each family function. The first line of each GLM distribution or family shows the deviance, with the next two providing the log-likelihood functions parameterized in terms of $\mu$ and $x'\beta$, respectively.
Table 2. GLM variance, deviance, and log-likelihood functions
Note that the link and cumulant functions for each of the above GLM log-likelihood functions can easily be abstracted from the equations, which are formatted in terms of the exponential family form as defined in Equation 1. For example, the link and cumulant of the Bernoulli distribution, upon which logistic regression is based, are respectively $\ln(\mu/(1-\mu))$ and $-\ln(1-\mu)$. In GLM terminology, the linear predictor $x'\beta$ is referred to as $\eta$ and the link as $g(\mu)$, so that for the logit link $\eta = g(\mu) = \ln(\mu/(1-\mu))$. The link function may be inverted such that

$$ \mu = \frac{1}{1 + \exp(-\eta)} $$

or

$$ \mu = \frac{\exp(\eta)}{1 + \exp(\eta)} $$

Another key feature of generalized linear models is the ability to use the GLM algorithm to estimate non-canonical models; that is, models in which the link function is not directly derived from the underlying PDF, so that $\eta = g(\mu)$ differs from the canonical parameter $\theta$. The probit and log-linked negative binomial (NB-2) models are two commonly used non-canonical linked regression models. The probit link is often used with the binomial distribution for probit models. Although the probit link is not directly derived from the binomial PDF, the estimates of the GLM-based probit model are identical to those produced using full maximum likelihood methods.

The canonical negative binomial (NB-C) is not the traditional negative binomial used to model overdispersed Poisson data. Rather, the use of the log link with the negative binomial (LNB) family duplicates estimates produced by full maximum likelihood NB-2 commands. However, as with all non-canonical models, the standard errors of the LNB are slightly different from those of a full maximum likelihood NB-2, unless the traditional GLM algorithm in Table 5 is amended to produce the observed information matrix that is characteristic of full maximum likelihood estimation. The algorithm given in Table 5 uses an expected information matrix, upon which the standard errors are based. Applications such as Stata's [...]

The negative binomial family was not added to commercial GLM software until 1993 (Stata), and is in fact a member of the GLM family only if its ancillary, or heterogeneity, parameter is entered into the algorithm as a constant.
Setting the ancillary parameter, [...]

The ability to incorporate non-canonical links into GLM models greatly extends the scope of models which may be estimated using its algorithm. Commonly used non-canonical models are shown in Table 3.

Table 3. Foremost Non-Canonical Models
The link, inverse link, and first derivative of the link for the canonical functions of the standard GLM families, as well as the most used non-canonical functions, are given in Table 4.

Table 4. GLM link functions (* canonical)
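The link, inverse link, and link derivative described above can be sketched in code. The following is a minimal illustration, not a reproduction of Table 4: the selection of links (identity, log, logit, probit), the `LINKS` dictionary, and the `round_trip` helper are all assumptions made for this example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, needed only for the probit link

# Each entry holds (link g(mu), inverse link g^{-1}(eta), derivative g'(mu)).
# identity, log and logit are the canonical links of the Gaussian, Poisson
# and binomial families; probit is a non-canonical binomial link.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta, lambda mu: 1.0),
    "log": (math.log, math.exp, lambda mu: 1.0 / mu),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        lambda mu: 1.0 / (mu * (1.0 - mu)),
    ),
    "probit": (
        _STD_NORMAL.inv_cdf,
        _STD_NORMAL.cdf,
        lambda mu: 1.0 / _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(mu)),
    ),
}


def round_trip(name, mu):
    """Apply the named link and then its inverse; the result should be mu."""
    link, inverse, _ = LINKS[name]
    return inverse(link(mu))


if __name__ == "__main__":
    for name in LINKS:
        print(name, round_trip(name, 0.3))
```

Keeping the three functions together per link mirrors how an IRLS routine uses them: the link sets the initial linear predictor, the inverse link recovers the fitted value, and the derivative enters the weights and working response.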
3. IRLS ALGORITHM

Generalized linear models have traditionally been estimated using an Iteratively Re-weighted Least Squares (IRLS) algorithm. IRLS is a form of maximum likelihood estimation known as Fisher scoring, and can take a variety of forms. A standard IRLS schematic algorithm is given in Table 5. Note that the text to the right of each // in Table 5 is a comment, not part of the programming code; the comments describe the operation performed.

Table 5. Generic GLM Estimating Algorithm (Expected Information Matrix)
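To make the IRLS idea concrete, here is a minimal sketch for the Bernoulli family with the logit link, written in plain Python. It is an illustration under stated assumptions, not the article's Table 5: the function names `solve` and `irls_logit`, the starting values, and the deviance-based convergence rule are choices made for this example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def irls_logit(X, y, tol=1e-10, max_iter=50):
    """Fit a Bernoulli/logit GLM by IRLS. Rows of X must include the constant."""
    n, p = len(X), len(X[0])
    mu = [(yi + 0.5) / 2.0 for yi in y]            # starting fitted values
    eta = [math.log(m / (1.0 - m)) for m in mu]    # eta = g(mu)
    beta, dev, old_dev = [0.0] * p, float("inf"), float("inf")
    for _ in range(max_iter):
        # Weight 1 / (V(mu) * g'(mu)^2), which reduces to mu(1 - mu) for the logit.
        w = [m * (1.0 - m) for m in mu]
        # Working response z = eta + (y - mu) * g'(mu).
        z = [e + (yi - m) / (m * (1.0 - m)) for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwz)                   # weighted least squares step
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        dev = -2.0 * sum(math.log(m if yi == 1 else 1.0 - m)
                         for yi, m in zip(y, mu))  # Bernoulli deviance
        if abs(dev - old_dev) < tol:
            break
        old_dev = dev
    return beta, dev


if __name__ == "__main__":
    # Toy data: 1 of 4 successes when x = 0, 3 of 4 successes when x = 1.
    X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
    y = [0, 0, 0, 1, 1, 1, 0, 1]
    print(irls_logit(X, y))
```

For the toy data the fitted constant should equal the log odds in the x = 0 group and the slope the difference in log odds between groups, which gives a simple check on the routine. Standard errors, had they been computed here from the inverse of the final X'WX matrix, would be based on the expected information.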
4. GOODNESS-OF-FIT

GLM models are traditionally evaluated as to their fit based on the deviance and Pearson Chi2, or $\chi^2$, statistics.

The Pearson dispersion statistic (the Pearson Chi2 divided by the residual degrees of freedom) is used with Poisson, negative binomial, and binomial models as an indicator of excessive correlation in the data. Likelihood based models, being derived from a PDF, assume that observations are independent. When they are not, correlation is observed in the data. Values of the Pearson dispersion greater than 1.0 indicate more correlation in the data than is warranted by the assumptions of the underlying distribution. Some statisticians have based the dispersion on the deviance statistic, but simulation studies have demonstrated that Pearson is the correct statistic. See Modeling count data in this Encyclopedia for additional information.

From the outset, generalized linear models software has offered users a number of useful residuals which can be used to assess the internal structure of the modeled data. Pearson and deviance residuals are the two most recognized GLM residuals associated with GLM software. Both are observation-based statistics, providing the proportionate contribution of an observation to the overall Pearson Chi2 and deviance fit statistics. The two residuals are given, for each observation, as:

$$ r^{P} = \frac{y - \mu}{\sqrt{V(\mu)}} \qquad\qquad r^{d} = \operatorname{sgn}(y - \mu)\sqrt{d} $$

where $V(\mu)$ is the variance function and $d$ is the observation's contribution to the deviance. The Pearson Chi2 and deviance fit statistics can also be calculated from their residuals by squaring each residual and summing over all observations in the model. However, they are seldom calculated in such a manner.

Both the Pearson and deviance residuals are usually employed in standardized form. The standardized versions of the Pearson and deviance residuals are given by dividing the respective statistic by $\sqrt{1 - h}$, where $h$ is the hat matrix diagonal.

Another residual now finding widespread use is the Anscombe residual. First implemented into GLM software in 1993, it now enjoys use in many major software applications.
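The Pearson and deviance quantities described above can be sketched for the Poisson family, where the variance function is V(mu) = mu. This is a minimal illustration, assuming fitted values are already available; the function name `poisson_fit_stats` and the toy numbers are hypothetical, and the unit deviance used is the standard Poisson form 2[y ln(y/mu) - (y - mu)].

```python
import math


def poisson_fit_stats(y, mu, n_params):
    """Return Pearson and deviance residuals and summary fit statistics.

    y        : observed counts
    mu       : fitted means from some Poisson model (assumed given)
    n_params : number of estimated coefficients, including the constant
    """
    # Pearson residual: (y - mu) / sqrt(V(mu)), with V(mu) = mu for Poisson.
    pearson_res = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

    def unit_deviance(yi, mi):
        # y * ln(y / mu) is taken as 0 when y = 0.
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        return 2.0 * (term - (yi - mi))

    # Deviance residual: sign(y - mu) * sqrt(unit deviance).
    deviance_res = [
        math.copysign(math.sqrt(max(unit_deviance(yi, mi), 0.0)), yi - mi)
        for yi, mi in zip(y, mu)
    ]

    resid_df = len(y) - n_params
    pearson_chi2 = sum(r * r for r in pearson_res)
    deviance = sum(r * r for r in deviance_res)
    return {
        "pearson_residuals": pearson_res,
        "deviance_residuals": deviance_res,
        "pearson_chi2": pearson_chi2,
        "deviance": deviance,
        "pearson_dispersion": pearson_chi2 / resid_df,
    }


if __name__ == "__main__":
    stats = poisson_fit_stats([2, 0, 5, 3], [2.5, 1.0, 4.0, 2.5], n_params=1)
    print(stats["pearson_chi2"], stats["deviance"], stats["pearson_dispersion"])
```

Summing the squared residuals recovers the Pearson Chi2 and deviance, and dividing the Pearson Chi2 by the residual degrees of freedom gives the dispersion statistic that is compared against 1.0.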
The Anscombe residuals are defined specifically for each family, with the intent of normalizing the residuals as much as possible. The general formula for Anscombe residuals is given as

$$ r^{A} = \frac{A(y) - A(\mu)}{A'(\mu)\sqrt{V(\mu)}} $$

with

$$ A(\cdot) = \int \frac{d\mu}{V^{1/3}(\mu)} $$

5. APPLICATION

Consider data from the 1912 Titanic disaster. Information was collected on the survival status, gender, age, and ticket class of the various passengers. With [...]
Generalized linear models              No. of obs      =        1316
Optimization     : ML                  Residual df     =        1311
                                       Scale parameter =           1
Deviance         =  1276.200769        (1/df) Deviance =     .973456
Pearson          =  1356.674662        (1/df) Pearson  =     1.03484

Variance function: V(u) = u*(1 - u)    (Bernoulli)
Link function    : g(u) = ln(u/(1 - u))    (Logit)

                                       AIC             =    .9773562
Log likelihood   = -638.1003845        BIC             =   -8139.863

=================================================================================
    survived | Odds ratio   OIM Std. Err.       z    P>|z|    [95% Conf. Interval]
=================================================================================
         age |   .3479809     .0844397      -4.35    0.000    .2162749    .5598924
         sex |   .0935308     .0135855     -16.31    0.000    .0703585    .1243347
      class1 |    5.84959     .9986265      10.35    0.000    4.186109    8.174107
      class2 |   2.129343     .3731801       4.31    0.000    1.510315    3.002091
=================================================================================

Using [...]
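The summary statistics in the output above can be cross-checked by hand. The sketch below is an illustration rather than part of the original article, and it rests on three assumptions: the model has five estimated parameters (the four displayed predictors plus an undisplayed constant), the displayed AIC and BIC follow the forms (-2LL + 2k)/n and deviance - df*ln(n), and the odds-ratio standard error is the delta-method value OR * SE(coefficient). The helper name `odds_ratio_summary` is hypothetical.

```python
import math

# Values copied from the displayed output.
n_obs = 1316
n_params = 5                  # assumed: 4 predictors + constant (not displayed)
deviance = 1276.200769
pearson = 1356.674662
log_lik = -638.1003845

resid_df = n_obs - n_params                        # 1311
dev_dispersion = deviance / resid_df               # compare with (1/df) Deviance
pearson_dispersion = pearson / resid_df            # compare with (1/df) Pearson
aic = (-2.0 * log_lik + 2.0 * n_params) / n_obs    # assumed AIC form
bic = deviance - resid_df * math.log(n_obs)        # assumed BIC form


def odds_ratio_summary(odds_ratio, se_or, z_crit=1.959964):
    """Recover z and the 95% interval from an odds ratio and its std. error."""
    coef = math.log(odds_ratio)        # coefficient on the logit scale
    se_coef = se_or / odds_ratio       # delta method: SE(OR) = OR * SE(coef)
    z = coef / se_coef
    lower = math.exp(coef - z_crit * se_coef)
    upper = math.exp(coef + z_crit * se_coef)
    return z, lower, upper


if __name__ == "__main__":
    print(resid_df, dev_dispersion, pearson_dispersion, aic, bic)
    print(odds_ratio_summary(0.3479809, 0.0844397))   # the age row
```

Under these assumptions, a residual df of 1316 - 5 = 1311 reproduces the displayed (1/df) Deviance, (1/df) Pearson, AIC, and BIC values, and the age row's z statistic and confidence limits follow from its odds ratio and standard error.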
Reprinted with permission from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science+Business Media, LLC