Model-based Geostatistics
Hannes Kazianka1
Vienna University of Technology
Jürgen Pilz2
Alpen-Adria University of Klagenfurt
Keywords and Phrases: Spatial Statistics, Geostatistics, Bayesian Kriging, Transformed-Gaussian Kriging, Correlation Function, Geometric Anisotropy, , Generalized Linear Geostatistical Model
Stochastic models for spatial data
Diggle and Ribeiro (2007) and Mase (2011) describe geostatistics as a branch of spatial statistics that deals with statistical methods for the analysis of spatially referenced data with the following properties. Firstly, values , , are observed at a discrete set of sampling locations within some spatial region , . Secondly, each observed value is either a measurement of, or is statistically related to, the value of an underlying continuous spatial phenomenon, , at the corresponding sampling location . The term model-based geostatistics refers to geostatistical methods that rely on a stochastic model. The observed phenomenon is viewed as a realization of a with continuous spatial index, a so-called random field.
Such a random field is fully determined by specifying all multivariate distributions, i.e. for arbitrary and . Since a full characterization of a random field is usually impossible, the mean function and the covariance function play a prominent role. Thereby, represents the trend while defines the dependence structure of the random field. It is typical that the assumption of weak (second-order) isotropy is made about the random field, i.e. its mean function is constant and its covariance function depends on and only through , where denotes the Euclidean distance. In this case is called an isotropic autocovariance function. The covariance function is directly related to smoothness properties of the random field such as mean square continuity and differentiability. A widely used parametric family of isotropic autocovariance functions is the Matèrn family
where denotes the modified Bessel function of order , is a called the ``range parameter'' controlling how fast the covariance decays as the distance gets large, is called the ``nugget parameter'' and is the proportion of the variance attributed to measurement error, controls the variance and denotes the vector of correlation parameters. The parameter controls the smoothness of the corresponding process. A thorough mathematical introduction to the theory of random fields is given in Stein (1999) and Yaglom (1987).
The most important geostatistical model is the linear Gaussian model
 |
(1) |
where is an isotropic zero-mean Gaussian random field with autocovariance function , is a vector of location-dependent explanatory variables and is the vector of regression parameters. The likelihood function for the linear Gaussian model is
where denotes the correlation matrix, is the design matrix and is the vector of observations. The maximum likelihood estimates for and in the linear Gaussian model are
Plugging these estimates into the log-likelihood, we arrive at the so-called profiled log-likelihood, which just contains the parameters
To obtain we have to maximize the latter equation for numerically. Note that this maximization problem is a lot simpler than the maximization of the complete likelihood where and are additional unknowns, especially when is large. Spatial prediction, which is often the goal in geostatistics, is performed based on the estimated parameters. The plug-in predictive distribution for the value of the random field at an unobserved location is Gaussian
N
|
(4) |
where , , , .
Weak isotropy is a rather strong assumption and environmental processes are typically not direction independent but show an anisotropic behavior. A popular extension to isotropic random fields is to consider random fields that become isotropic after a linear transformation of the coordinates (Schabenberger and Gotway (2005)). This special variant of anisotropy is called geometric anisotropy. Let be an isotropic random field on with autocovariance function and mean . For the random field , where , we get that and the corresponding autocovariance function is . When correcting for geometric anisotropy we need to revert the coordinate transformation. has the same mean as but isotropic autocovariance function . When correcting for stretching and rotation of the coordinates we have
Here, and are called the anisotropy ratio and anisotropy angle, respectively. All the models that we consider in this chapter can be extended to account for geometric anisotropy by introducing these two parameters.
The first steps towards Bayesian modeling and prediction in geostatistics were made by Kitanidis (1986) and Omre (1987) who developed a Bayesian version of universal kriging. One of the advantages of the Bayesian approach, besides its ability to deal with the uncertainty about the model parameters, is the possibility to work with only a few measurements. A disadvantage of Bayesian kriging over the non-Bayesian methodology is certainly the increased computational complexity. Assume a Gaussian random field model in the form of the form Eq. (1) with known covariance matrix but unknown parameter vector . From Bayesian analysis we know that it is natural to assume a prior of the form N for , where is a positive semidefinite matrix. It can be shown that the posterior distribution for is
 N
where and . The predictive distribution of is also Gaussian and given by
 N
where , and are defined as in Section 1. From the above representation of the Bayesian kriging predictor it becomes clear that Bayesian kriging bridges the gap between simple and universal kriging. We get simple kriging in case of complete knowledge of the trend, which corresponds to , whereas we get the universal kriging predictor if we have no knowledge of ( in the sense that the smallest eigenvalue of converges to infinity). Interestingly, the Bayesian universal kriging predictor has a smaller or equal variance than the classical universal kriging predictor (see Eq. (4)) since , where denotes the Loewner partial ordering.
Bayesian universal kriging is not fully Bayesian because is assumed known. Diggle and Ribeiro (2007) summarize the results for a fully Bayesian analysis of Gaussian random field models of the form Eq. (1), where and is the range parameter of an isotropic autocorrelation function model.
Probably the simplest way to extend the Gaussian random field model is to assume that a differentiable transformation of the original random field, , is Gaussian. The mean of the transformed field is unknown and parameterized by , . If we assume that the transformation function and the covariance function of are known, the optimal predictor for can be derived using the results from Section 1. However, in practice neither nor is known and we have to estimate them from the data.
A family of one-parameter transformation functions that is widely used in statistics is the so-called Box-Cox family
The Box-Cox transformation is valid for positive-valued random fields and is able to model moderately skewed, unimodal data.
The likelihood of the data in the transformed Gaussian model can be written as
where, , is the determinant of the Jacobian of the transformation, and is the transformation parameter. De Oliveira et al. (1997) point out that the interpretation of changes with the value of , and the same is true for the covariance parameters and , to a lesser extent though. To estimate the parameters and , we make use of the profile likelihood approach that we have already encountered in Section 1. For fixed values of and , the maximum likelihood estimates for and are given by Eqs. (2)-(3) with replaced by . Again, the estimates for and cannot be written in closed form and must be found numerically by plugging and in the likelihood for numerical maximization.
The estimated parameters are subsequently used for spatial prediction. To perform a plug-in prediction we make use of the conditional distribution of the Gaussian variable and back-transform it to the original scale by . A Bayesian approach to spatial prediction in the transformed Gaussian model is proposed in De Oliveira et al. (1997).
The copula-based geostatistical model (Kazianka and Pilz (2010)) also works with transformations of the marginal distributions of the random field and is a generalization of transformed Gaussian kriging. In this approach all multivariate distributions of the random field are described by a copula (Genest (2011)) and a family of univariate marginal distributions. Due to the additional flexibility introduced by the choice of the copula and of the marginal distribution, these models are able to deal with extreme observations and multi-modal data.
(McCullagh and Nelder (1989)) provide a unifying framework for regression modeling of both continuous and discrete data. Diggle and Ribeiro (2007) extend the classical generalized linear model to what they call the generalized linear geostatistical model (GLGM). The responses , , corresponding to location are assumed to follow a family of univariate distributions indexed by their expectation, , and to be conditionally independent given . The are specified through
where is a Gaussian random field with autocovariance function and is a pre-defined link function. The two most frequently applied GLGMs are the Poisson log-linear model, where is assumed to follow a Poisson distribution and the link function is the logarithm, and the binomial logistic-linear model, where is assumed to follow a Binomial distribution, , with and . These models are suitable for representing spatially referenced count data and binomial trials.
Since maximum likelihood estimation of the parameters is difficult, a Monte Carlo (Robert and Casella (2004)) approach is proposed to sample from the posteriors of the model parameters as well as from the predictive distributions at unobserved locations . The algorithm proceeds by sampling from , from and from with the help of Metropolis-Hastings updates. At iteration and actual sample , perform the following steps:
- (a).
- Update
. For , sample a new proposal from the conditional Gaussian distribution , where denotes with its th element removed. Accept with probability .
- (b).
- Update
. Sample a new proposal from a proposal distribution . Accept the new proposal with probability .
- (c).
- Update
. Sample a new proposal from a proposal distribution . Accept the new proposal with probability
- (d).
- Draw a sample
from the conditional Gaussian distribution .
If point predictions for are needed, the Monte Carlo approximation to the expected value of can be used, i.e. , where is the number of .
- De Oliveira, V., Kedem, B. & Short, D. (1997) Bayesian prediction of transformed Gaussian fields. Journal of the American Statistical Association 92, 1422-1433.
- Diggle, P. & Ribeiro, P. (2007) Model-based geostatistics. New York: Springer
- Genest, C. (2011) Copulas in inference. In: Lovric, M. International Encyclopedia of Statistical Science. Heidelberg: Springer Science+Business Media, LLC.
- Kazianka, H. & Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stochastic Environmental Research and Risk Assessment 24, 661-673.
- Kitanidis, P. (1986) Parameter uncertainty in estimation of spatial function: Bayesian analysis. Water Resources Research 22, 499-507.
- Mase, S. (2011) Geostatistics and kriging predictors. In: Lovric, M. International Encyclopedia of Statistical Science. Heidelberg: Springer Science+Business Media, LLC.
- McCullagh, P. & Nelder, J. (1989) Generalized linear models. Boca Raton: Chapman & Hall/CRC.
- Omre, H. (1987) Bayesian kriging - merging observations and qualified guesses in kriging. Mathematical Geology 19, 25-39.
- Robert, C. & Casella, G. (2004) Monte Carlo statistical methods. New York: Springer.
- Schabenberger, O. & Gotway, C. (2005) Statistical methods for spatial data analysis. Boca Raton: Chapman & Hall/CRC.
- Stein, M. (1999) Interpolation of spatial data. New York: Springer.
- Yaglom, A. (1987) Correlation theory of stationary and related random functions. New York: Springer.
Reprinted with permission from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Sicnece+Business Media, LCC
Footnotes
-
1
- Vienna University of Technology, Wiedner Hauptstrasse 8 /105-1, 1040 Vienna, Austria. Email: hannes.kazianka@fam.tuwien.ac.at
-
2
- Alpen-Adria University of Klagenfurt, Universitaetsstrasse 65-67, 9020 Klagenfurt, Austria. Email: juergen.pilz@uni-klu.ac.at
|