Title: | Mapping Smoothed Effect Estimates from Individual-Level Data |
---|---|
Description: | Contains functions for mapping odds ratios, hazard ratios, or other effect estimates using individual-level data such as case-control study data, using generalized additive models (GAMs) or Cox models for smoothing with a two-dimensional predictor (e.g., geolocation or exposure to chemical mixtures) while adjusting linearly for confounding variables, using methods described by Kelsall and Diggle (1998), Webster at al. (2006), and Bai et al. (2020). Includes convenient functions for mapping point estimates and confidence intervals, efficient control sampling, and permutation tests for the null hypothesis that the two-dimensional predictor is not associated with the outcome variable (adjusting for confounders). |
Authors: | Lu Bai, Scott Bartell, Robin Bliss, and Veronica Vieira |
Maintainer: | Scott Bartell <[email protected]> |
License: | GPL-3 |
Version: | 1.3 |
Built: | 2025-03-06 03:32:48 UTC |
Source: | https://github.com/cran/MapGAM |
Contains functions for estimating and mapping hazard ratios, odds ratios, rate ratios, and continuous outcomes from individual-level data such as case-control study data, using generalized additive models (GAMs) or extended Cox models. This allows for smoothing with a two-dimensional predictor such as geolocation or exposure to chemical mixtures, while adjusting for confounding variables using methods described by Kelsall and Diggle (1998), Webster at al. (2006), Bai et al. (2020), and Tang et al. (2023). Includes convenient functions for mapping, efficient control sampling, confidence intervals, and permutation tests for the null hypothesis that the two-dimensional predictor is not associated with the outcome variable, adjusting for confounders.
Package: | MapGAM |
Type: | Package |
Version: | 1.3 |
Date: | 2023-07-12 |
License: | GPL-3 |
Typical spatial applications will start with the predgrid function to create a regular grid of points within the study area, with optional map boundaries (e.g., a country, state, or regional map). Crude or adjusted odds ratios, hazard ratios, or continuous outcomes are then estimated at each grid point using the modgam function to smooth by geolocation. Finally, the predicted values (and optionally, "clusters"–areas of signficantly increased or decreased values determined via permutation tests or confidence intervals) are plotted using the plot or colormap functions. Most users will use the plot function to create maps from modgam objects, but colormap is provided for backwards compatibility as well as for creating maps from models fit using functions other than modgam. The trimdata and sampcont functions can be used to restrict data to those within map boundaries and to conduct simple or spatiotemporal stratified sampling from eligible controls. The optspan function can be used to find an optimal span size for the LOESS smoother used by the modgam function; it is automatically used within the modgam function when the span size is not provided by the user. These functions can also be applied to non-spatial data when two-dimensional smoothing is of interest, such as investigation of the effects of a mixture of two chemicals.
Lu Bai, Scott Bartell, Robin Bliss, and Veronica Vieira
Send bug reports to [email protected].
Bai L, Gillen DL, Bartell SM, Vieira VM. doi:10.32614/RJ-2020-001Mapping smoothed spatial effect estimates from individual-level data: MapGAM. The R Journal 2020, 12:32-48.
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
Kelsall J, Diggle P. doi:10.1111/1467-9876.00128Spatial variation in risk of disease: a nonparametric binary regression approach. J Roy Stat Soc C-App 1998, 47:559-573.
Tang IW, Bartell SM, Vieira VM. doi:10.1016/j.sste.2023.100584Unmatched Spatially Stratified Controls: A simulation study examining efficiency and precision using spatially-diverse controls and generalized additive models. Spatial and Spatio-temporal Epidemiology 2023, 45:100584.
Vieira V, Webster T, Weinberg J, Aschengrau A, Ozonoff D. doi:10.1186/1476-069X-4-11Spatial analysis of lung, colorectal, and breast cancer on Cape Cod: An application of generalized additive models to case-control data. Environmental Health 2005, 4:11.
Webster T, Vieira V, Weinberg J, Aschengrau A. doi:10.1186/1476-072X-5-26Method for mapping population-based case-control studies using Generalized Additive Models. International Journal of Health Geographics 2006, 5:26.
Young RL, Weinberg J, Vieira V, Ozonoff A, Webster TF. doi:10.1186/1476-072X-9-37A power comparison of generalized additive models and the spatial scan statistic in a case-control setting. International Journal of Health Geographics 2010, 9:37.
trimdata
,
sampcont
,
predgrid
,
optspan
,
modgam
,
colormap
.
plot.modgam
.
# Load synthetic data and a preformatted base map data(MAmap) data(MAdata) # Create a grid on the base map gamgrid <- predgrid(MAdata, map=MAmap) # Fit a GAM with a smooth term for spatial location fit1 <- modgam(data=MAdata, rgrid=gamgrid, m="crude", sp=0.5) # Display odds ratio estimates on the base map plot(fit1, MAmap, exp=TRUE) #### See colormap and modgam help files for more examples
# Load synthetic data and a preformatted base map data(MAmap) data(MAdata) # Create a grid on the base map gamgrid <- predgrid(MAdata, map=MAmap) # Fit a GAM with a smooth term for spatial location fit1 <- modgam(data=MAdata, rgrid=gamgrid, m="crude", sp=0.5) # Display odds ratio estimates on the base map plot(fit1, MAmap, exp=TRUE) #### See colormap and modgam help files for more examples
modgam
ObjectThis function summarizes the AIC of a modgam object produced by modgam
.
## S3 method for class 'modgam' AIC(object, ...)
## S3 method for class 'modgam' AIC(object, ...)
object |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
Geocoded tweets from Twitter, with an indicator variable for any mention of beer, time stamp, and state of origin.
data(MAmap)
data(MAmap)
A data frame with 10000 observations on 5 variables:
beer
1 for tweets about beer, 0 for other tweets.
longitude
geocoded longitude.
latitude
geocoded latitude.
state
a factor with the name of the state.
time
a list of POSIXlt format dates and times for the tweets.
A sample of geocoded tweets from within in the contiguous US, from June to October of 2012. Tweets mentioning beer (cases) are oversampled by a factor of 100. Geocoding is typically at the level of city or town; tweets that could not be geocoded were excluded from this data set.
Dr. Matthew Zook, University of Kentucky, floatingsheep.org
data(beertweets) attach(beertweets) plot(longitude,latitude,col=beer+1) # beer in red, non-beer in black
data(beertweets) attach(beertweets) plot(longitude,latitude,col=beer+1) # beer in red, non-beer in black
Survival time, censoring status, and geolocations (jittered to preserve anonymity) for 5000 ovarian cancer patients in California, using a Lambert projection (in meters). CAmap
is a map of California using the same projection.
data(CAdata)
data(CAdata)
A data frame with 5000 observations on the following 6 variables.
time
survival time.
event
censoring status.
X
projected X coordinate.
Y
projected Y coordinate.
AGE
patient age.
INS
insurance type: Mng
(Managed Care: managed care, HMO, PPO and other private insurance), Mcr
(Medicare), Mcd
(Medicaid), Oth
: (Other Insurance: military, county-funded), Uni
(Not Insured, self-pay) and Unk
(Unknown).
data(CAdata) summary(CAdata) plot(CAdata$X,CAdata$Y)
data(CAdata) summary(CAdata) plot(CAdata$X,CAdata$Y)
An evenly-spaced grid of prediction points approximately 5km apart that extended across the latitude and longitude coordinates of participants' locations throughout California with over 7,500 points. Areas with sparse population data are clipped.
data(CAgrid)
data(CAgrid)
A data frame with 7579 geolocations.
X
projected X coordinate.
Y
projected Y coordinate.
Dr. Veronica Vieira, University of California, Irvine
data(CAgrid) data(CAmap) plot(CAmap) points(CAgrid)
data(CAgrid) data(CAmap) plot(CAmap) points(CAgrid)
A map of the outline of CA in SpatialPolygonsDataFrame format, converted from an ESRI shapefile using the readShapePoly
function in the maptools package.
data(CAmap)
data(CAmap)
The format is class SpatialPolygonsDataFrame (package "sp").
State Plane Projected coordinate system, North America Datum 1983 (NAD_1983_StatePlane_California_I_FIPS_0401) for California, using standard parallels 40.00000000 and 41.66666667. The latitude of origin is 39.3, the central meridian is -122.0 and the projection units are meters (False Easting: 200000 m; False Northing: 500000 m).
Dr. Veronica Vieira, University of California, Irvine
data(CAmap) plot(CAmap)
data(CAmap) plot(CAmap)
gamcox
Object
This function provides coefficients of a gamcox object produced by gamcox
.
## S3 method for class 'gamcox' coef(object,...)
## S3 method for class 'gamcox' coef(object,...)
object |
a |
... |
. |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) coef(fit)
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) coef(fit)
modgam
ObjectThis function provides coefficients of a modgam object produced by modgam
.
## S3 method for class 'modgam' coef(object, ...)
## S3 method for class 'modgam' coef(object, ...)
object |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
Displays a color image map, including a legend, scale bar, and optional North arrow, showing values for a grid of points. Irregular grids are allowed. Also draws contour lines for regions of signficantly increased or decreased predicted values ("clusters").
colormap(obj, map=NULL, exp=FALSE, add=FALSE, mapmin=NULL, mapmax=NULL, col.seq=rev(rainbow(201,start=0,end=0.66)), anchor=FALSE, border.gray = 0.3, contours="none", contours.drawlabels=FALSE, contours.lty=1, contours.lwd=1, contours.levels, contours.labcex=0.7, interval.exclude=0, arrow=TRUE, axes=FALSE, ptsize=0.9, mai, legend.name = "predicted values",legend.cex=1, legend.add.line,alpha,...)
colormap(obj, map=NULL, exp=FALSE, add=FALSE, mapmin=NULL, mapmax=NULL, col.seq=rev(rainbow(201,start=0,end=0.66)), anchor=FALSE, border.gray = 0.3, contours="none", contours.drawlabels=FALSE, contours.lty=1, contours.lwd=1, contours.levels, contours.labcex=0.7, interval.exclude=0, arrow=TRUE, axes=FALSE, ptsize=0.9, mai, legend.name = "predicted values",legend.cex=1, legend.add.line,alpha,...)
obj |
(Required) A list containing at least these two elements: "grid" (a 2 column data frame with X and Y coordinates) and "fit" (the value to displayed on the map). |
map |
Can be used to map |
exp |
If |
add |
Use |
mapmin |
The minimum value for the color scale legend. When not specified, it will be set to the minimum predicted value. |
mapmax |
The maximum value for the color scale legend. When not specified, it will be set to the maximum predicted value. |
col.seq |
The color sequence (palette) used to display the predicted values on the map. The default is a rainbow palette, but divergent or sequential palettes may be better for some applications. See the colorspace library and the example below for other options. |
anchor |
Use |
border.gray |
Gray scale for the border of |
contours |
When |
contours.drawlabels |
Use |
contours.lty |
The line type for contour lines |
contours.lwd |
The width for contour lines. |
contours.levels |
The levels for contour lines. When |
contours.labcex |
cex for contour labelling. This is an absolute size, not a multiple of |
interval.exclude |
If |
arrow |
Use arrow=T to add a North arrow to the map. |
axes |
Use axes=T to add axes to the map (useful for chemical mixture isoboles). |
ptsize |
The size of the points used to fill in colors on the map. Increase to remove white space inside the map or decrease to remove color outside the map boundaries. NOTE: white space can also be eliminated by increasing the grid size in |
mai |
The margins of the plot. Details see par, |
legend.name |
The name of displayed for the legend bar. |
legend.cex |
A numerical value giving the amount by which legend text should be magnified relative to the default. |
legend.add.line |
A numerical value at which a line will be added on the legend bar to indicate the color for the value. |
alpha |
Levels |
... |
Other auguments past to plot function |
Produces an image map. If the base map is in readOGR
format a scale bar is included, showing whatever measurement units are recorded in the shapefile. If the
units are missing from the shapefile conversion they are assumed to be meters.
Scott Bartell, Lu Bai, Robin Bliss, and Veronica Vieira
Send bug reports to [email protected].
trimdata
,
predgrid
,
plot.modgam
.
data(MAdata) data(MAmap) obj <- list(grid=data.frame(MAdata$Xcoord,MAdata$Ycoord),fit=MAdata$Mercury) colormap(obj, MAmap, legend.name = "mercury") # map the same data using a divergent color palette anchored to the median if (require(colorspace)) { newpal <- diverge_hsv(201) # from the colorspace library colormap(obj, MAmap, legend.name = "mercury", col.seq = newpal, legend.add.line=round(median(obj$fit),2), anchor = TRUE) }
data(MAdata) data(MAmap) obj <- list(grid=data.frame(MAdata$Xcoord,MAdata$Ycoord),fit=MAdata$Mercury) colormap(obj, MAmap, legend.name = "mercury") # map the same data using a divergent color palette anchored to the median if (require(colorspace)) { newpal <- diverge_hsv(201) # from the colorspace library colormap(obj, MAmap, legend.name = "mercury", col.seq = newpal, legend.add.line=round(median(obj$fit),2), anchor = TRUE) }
Calculate the log partial likelihood and derivatives with respect to the subject log hazard ratio (compared to the baseline) for Cox proportional hazard additive model described in gamcox
. Results are used to update estimates in gamcox
function.
dls(Y,X,which,eta,span=0.5,adjust=TRUE)
dls(Y,X,which,eta,span=0.5,adjust=TRUE)
Y |
a list including two elements: |
X |
a data frame containing the variables in the model. The data must be structured so that the X and Y coordinates for two-dimensional predictor (e.g., geolocation) are in the 1st and 2nd columns, respectively. |
which |
matrix index for smooth term. |
eta |
current estimated subject log hazard ratio compared to the baseline. |
span |
smoothing parameter that been used to smoothing the second derivative of the log partial likelihood. |
adjust |
|
For data that having tied failure times, Efron's approximation method is used to calculate the log partial likelihood and correspongding derivatives. Let denote the log hazard ratio, and l denote the partial likelihood. When fitting a Cox proportional hazard additive model,
is updated by
deltaeta |
difference between the input |
w |
inverse of smoothed second derivatives. |
l |
partial likelihood baed on input |
Lu Bai and Scott Bartell
Send bug reports to [email protected].
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
Bristow RE, Chang J, Ziogas A, Gillen DL, Bai L, Vieira VM. Spatial Analysis of Advanced-stage Ovarian Cancer Mortality in California. American Journal of Obstetrics and Gynecology 2015, 213(1), e1-43).
data(CAdata) Y = CAdata[,c("time","event")] X = CAdata[,c(3:5)] eta = coxph(Surv(time,event)~AGE,data=CAdata)$linear.predictors result = dls(Y,X,1:2,eta,span=0.2) plot(eta,result$deltaeta)
data(CAdata) Y = CAdata[,c("time","event")] X = CAdata[,c(3:5)] eta = coxph(Surv(time,event)~AGE,data=CAdata)$linear.predictors result = dls(Y,X,1:2,eta,span=0.2) plot(eta,result$deltaeta)
gamcox
Object
This function provides the formula of a gamcox object produced by gamcox
.
## S3 method for class 'gamcox' formula(x,...)
## S3 method for class 'gamcox' formula(x,...)
x |
a |
... |
. |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) formula(fit)
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) formula(fit)
modgam
ObjectThis function provide the formula of a modgam object produced by modgam
.
## S3 method for class 'modgam' formula(x, ...)
## S3 method for class 'modgam' formula(x, ...)
x |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
Fits a Cox proportional hazard additive model with a two-dimensional LOESS smooth for geolocation (or any other two-dimensional predictor). gamcox
uses the backfitting algorithm to combine the smoothing and fitting methods. The smoothing method currently supported is local regression (LOESS).
gamcox(formula, data, subset, weights, span=0.5, I.span=0.2, degree = 1, loess.trace = "exact", Maxiter = 40, tol = 1e-07) gamcox.fit(Y, X, smooth.frame, weights, span=0.5, I.span=0.2, degree = 1, loess.trace = "exact", Maxiter = 40, tol = 1e-07)
gamcox(formula, data, subset, weights, span=0.5, I.span=0.2, degree = 1, loess.trace = "exact", Maxiter = 40, tol = 1e-07) gamcox.fit(Y, X, smooth.frame, weights, span=0.5, I.span=0.2, degree = 1, loess.trace = "exact", Maxiter = 40, tol = 1e-07)
formula |
a formula expression (required), with the response on the left of a ~ oprator, and the predictors on the right. The response must be a survival object as returned by the Surv function. A built-in nonparametric smoothing term is indicated by |
data |
a data frame containing the variables in the model.If not found in |
Y , X
|
for |
smooth.frame |
the model matrix for the smooth term. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used in the fitting process. |
span |
the span size for the LOESS smooth of the two-dimensional predictor, which controls the degree of smoothing. |
I.span |
the span size for the LOESS smooth of the Fisher information. |
degree |
the degree of the polynomials to be used for LOESS smooth, normally 1 or 2. |
loess.trace |
whether the trace of the smoother matrix be computed exactly ( |
Maxiter |
the maximum number of iterations in backfitting algorithm. |
tol |
the tolerence threshold for convergence. |
The model used to fit the data is a Cox proportional hazard additive model with a LOESS smooth for a two-dimensional predictor such as geolocation (Hastie and Tibshirani, 1990):
where is the hazard at time t for participant i,
and
are predictor coordinates for participant i (i.e., projected distance east and north, respectively, from an arbitrarily defined origin location), S(.,.) is a 2-dimensional smoothing function (currently LOESS),
is a row vector of covariate values for participant i, and
is a vector of unknown regression parameters. See the references for more details.
gamcox
returns an object of class gamcox
. It can be examined by print
and predict
.
coefficients |
a named vector of coefficients for the parametric part of the additive predictors, which multiply the columns of the model matrix. The names of the coefficients are the names of the single-degree-of-freedom effects (the columns of the model matrix). If the model is overdetermined there will be missing values in the coefficients corresponding to inestimable coefficients. |
additive.predictors |
the additive fit, given by the product of the model matrix and the coefficients, plus the columns of the |
smooth |
estimated smoothing term. Nonlinear part of the spatial effect on survival rates. |
var |
the approximate pointwise variances for the columns of smooth. |
residuals |
the residuals from the final weighted additive fit; also known as residuals, these are typically not interpretable without rescaling by the weights. |
deviance |
up to a constant, minus twice the maximized log-likelihood. Similar to the residual sum of squares. Where sensible, the constant is chosen so that a saturated model has deviance zero. |
df.residual |
the residual degrees of freedom. |
aic |
AIC of the fitted model. |
Lu Bai
Send bug reports to [email protected].
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
Bristow RE, Chang J, Ziogas A, Gillen DL, Bai L, Vieira VM. Spatial Analysis of Advanced-stage Ovarian Cancer Mortality in California. American Journal of Obstetrics and Gynecology 2015, 213(1), e1-43)
data(CAdata) data(CAmap) fit <- gamcox(Surv(time,event)~AGE + factor(INS) + lo(X,Y),data=CAdata, span=0.2,loess.trace="approximate") fit pred = predict(fit) colormap(list(fit=pred$pred,grid=data.frame(X=CAdata$X,Y=CAdata$Y)),map=CAmap, border.gray=0.5)
data(CAdata) data(CAmap) fit <- gamcox(Surv(time,event)~AGE + factor(INS) + lo(X,Y),data=CAdata, span=0.2,loess.trace="approximate") fit pred = predict(fit) colormap(list(fit=pred$pred,grid=data.frame(X=CAdata$X,Y=CAdata$Y)),map=CAmap, border.gray=0.5)
90 cases and 910 controls with random smoking covarate values and random geolocations within Massachusetts, geocoded on a Lambert projection (in meters). MAmap
is a map of Massachusetts using the same projection.
data(MAdata)
data(MAdata)
A data frame with 1000 observations on the following 6 variables.
Case
0 for controls, 1 for cases.
Xcoord
projected X coordinate.
Ycoord
projected Y coordinate.
Smoking
0 for nonsmokers, 1 for smokers.
Mercury
continuous variable for mercury exposure.
Selenium
continuous variable for selenium exposure.
Lambert conformal conic projection for the State of Massachusetts, using standard parallels 41.71666667 and 42.68333333. The latitude of origin is 41.0, the central meridian is -71.5, and the projection units are meters (False Easting: 200000 m; False Northing: 750000 m).
Simulated data provided by package authors
data(MAdata) summary(MAdata) attach(MAdata) # map participants, cases in red and controls in black plot(Xcoord,Ycoord,col=Case+1)
data(MAdata) summary(MAdata) attach(MAdata) # map participants, cases in red and controls in black plot(Xcoord,Ycoord,col=Case+1)
A map of the outline of MA in SpatialPolygonsDataFrame format, converted from an ESRI shapefile using the readShapePoly
function in the maptools package.
data(MAmap)
data(MAmap)
The format is class SpatialPolygonsDataFrame (package "sp").
Lambert conformal conic projection for the State of Massachusetts, using standard parallels 41.71666667 and 42.68333333. The latitude of origin is 41.0, the central meridian is -71.5, and the projection units are meters (False Easting: 200000 m; False Northing: 750000 m).
Dr. Veronica Vieira, University of California, Irvine
data(MAmap) plot(MAmap)
data(MAmap) plot(MAmap)
Print version and dates for MapGAM package and functions.
MapGAM.version()
MapGAM.version()
Scott Bartell Send bug reports to [email protected].
Fits a crude or adjusted regression on a user-supplied grid for spatial analysis using a generalized additive model with a two-dimensional LOESS smooth for geolocation (or any other two-dimensional predictor). Includes optional pointwise confidence intervals and permutation tests for global and local tests of the null hypothesis that the two-dimensional predictor (e.g., geolocation) is not associated with the outcome. Most applications will pass the output of this function to plot.modgam
to map the resulting odds ratios or other effect estimates.
modgam(formula, rgrid, data, subset, offset, family = binomial(), permute = 0, conditional = TRUE, pointwise = FALSE, m = "adjusted", sp = seq(0.05,0.95,0.05), degree=1, keep=FALSE, type=c("spatial","all"), reference = "median", se.fit = FALSE, alpha = 0.05, verbose = TRUE, ...)
modgam(formula, rgrid, data, subset, offset, family = binomial(), permute = 0, conditional = TRUE, pointwise = FALSE, m = "adjusted", sp = seq(0.05,0.95,0.05), degree=1, keep=FALSE, type=c("spatial","all"), reference = "median", se.fit = FALSE, alpha = 0.05, verbose = TRUE, ...)
formula |
a formula expression, with the response on the left of a ~ oprator, and the predictors on the right. If fitting a Cox additive model, the response must be a survival object as returned by the Surv function. A built-in nonparametric smoothing term is indicated by |
rgrid |
a data frame containing the values of X and Y coordinates at which to generate predictions. If missing, the predictions will be generated for the two-dimentional predictor in |
data |
a data frame containing the variables in the model. If |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
offset |
name in data and rgrid that will be used as offset in the model. It will be ignored if offset is specified in |
permute |
the number of permutations of the data set for testing the significance of the two-dimensional predictor. |
conditional |
a logical value indicating whether to run a conditional or unconditional permutation test; this argument is used only when |
pointwise |
a logical value indicated whether to include pointwise permutation tests for every grid point. This argument is only used
when |
m |
model type for the GAM. Options are "crude" or "adjusted" (default). If "crude", only the smooth for the two-dimensional predictor is included in the model (i.e., "crude" is a synonym for "unadjusted"). If "adjusted", all covariates in the data set (columns >= 4) are included in the model. |
sp |
the span size for the LOESS smooth of the two-dimensional predictor. If |
degree |
the degree of the polynomials to be used for LOESS smooth, normally 1 or 2. It will be ignored if the degree is indicated in |
keep |
a logical value indicating whether to store and return the pointwise odds ratios, and if conditional = FALSE the span size, for every permuted data set. These values aren't necessary for mapping or cluster identification, and storing them slows the permutation tests, so the default is |
type |
use |
reference |
referent for the predicted values or effects. If |
se.fit |
if |
alpha |
the significace level used for confidence intervals when |
verbose |
a logical value indicating whether to print the GAM model statement, the percentile rank for the global deviance statistic in the permutation test, and the progress of the permutation test (report completion of every 10 permutations). The default is |
... |
further arguments to be passed to the |
The model used to fit the data is a generalized additive model with a LOESS smooth for a two-dimensional predictor such as geolocation (Hastie and Tibshirani, 1990; Kelsall and Diggle, 1998; Webster et al., 2006). Although any family and link function supported by the gam
function is supported, the default binomial family with logit link yields the following model:
where is the probability that the outcome is 1 for participant i,
and
are predictor coordinates for participant i (i.e., projected distance east and north, respectively, from an arbitrarily defined origin location),
is a 2-dimensional smoothing function (currently LOESS),
is a row vector of covariate values for participant i, and
is a vector of unknown regression parameters (including an intercept).
When a permutation test is requested, for each permutation the paired X and Y coordinates in the data set are randomly reassigned to participants, consistent with the null hypothesis that the geolocation (or another two-dimensional predictor entered in place of X and Y) is not associated with the outcome. Note that this procedure intentionally preserves associations between other covariates and the outcome variable so the permutation test reflects the significance of geolocation. See the references for more details.
modgam
returns an object of class modgam
. It can be examined by print
and plot
.
grid |
a data frame with X and Y coordinates from rgrid. |
span |
the span size used for the LOESS smooth. If |
gamobj |
|
predobj |
the |
family |
the family and link function used for the model. |
fit |
the predicted values for each point on the grid. For Gaussian family models, predicted values are for the outcome variable at each grid point. For non-Gaussian families, predicted values are for the spatial effect alone on the scale of the linear predictor, compared to a referent specified through augument |
exp.fit |
the exponential of |
global.pvalue |
the p-value based on the likelihood ratio test comparing models with and without geolocation (or other two-dimensional predictor). |
If se.fit = TRUE
then the following values are provided:
se |
an estimated standard error for each predicted value. |
conf.low |
the lower bounds for pointwise (1- |
conf.high |
the higher bounds for pointwise (1- |
If permute > 0
then the following values are also provided:
global.permt |
the p-value based on the deviance statistic comparing models with and without geolocation (or other two-dimensional predictor), using the distribution of deviance statistics from the permuted data sets to represent the null distribution. For a test of H0: geolocation is unassociated with the outcome variable (adjusting for any covariates if |
pointwise.permt |
Only provided when |
If permute
> 0 and keep = T
then the following values are also provided:
permutations |
a matrix containing null permuted values for each point on the grid, with the results for each permutation in a separate column. For the binomial family and logit link these are provided as odds ratios, otherwise they are reported as linear predictors. |
deviances.permt |
a vector containing deviances for each permutation. |
Permutation tests are computationally intensive, often requiring several hours or more.
Lu Bai, Scott Bartell, Robin Bliss, and Veronica Vieira
Send bug reports to [email protected].
Bai L, Gillen DL, Bartell SM, Vieira VM. doi:10.32614/RJ-2020-001Mapping smoothed spatial effect estimates from individual-level data: MapGAM. The R Journal 2020, 12:32-48.
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
Kelsall J, Diggle P. doi:10.1111/1467-9876.00128Spatial variation in risk of disease: a nonparametric binary regression approach. J Roy Stat Soc C-App 1998, 47:559-573.
Vieira V, Webster T, Weinberg J, Aschengrau A, Ozonoff D. doi:10.1186/1476-069X-4-11Spatial analysis of lung, colorectal, and breast cancer on Cape Cod: An application of generalized additive models to case-control data. Environmental Health 2005, 4:11.
Webster T, Vieira V, Weinberg J, Aschengrau A. doi:10.1186/1476-072X-5-26Method for mapping population-based case-control studies using Generalized Additive Models. International Journal of Health Geographics 2006, 5:26.
Young RL, Weinberg J, Vieira V, Ozonoff A, Webster TF. doi:10.1186/1476-072X-9-37A power comparison of generalized additive models and the spatial scan statistic in a case-control setting. International Journal of Health Geographics 2010, 9:37.
predgrid
,
optspan
,
plot.modgam
colormap
.
# Load base map in SpatialPolygonsDataFrame format # This map was read from ESRI shapefiles using the readShapePoly function (soon to be deprecated) data(MAmap) # Load data and create grid on base map data(MAdata) gamgrid <- predgrid(MAdata, map=MAmap) # Fit crude logistic GAM to the MA data using span size of 50% # and predict odds ratios for every point on gamgrid fit1 <- modgam(data=MAdata, rgrid=gamgrid, m="crude", sp=0.5) # Summary statistics for pointwise crude odds ratios summary(fit1$exp.fit) # Summary stats for pointwise crude log odds (linear predictor) summary(fit1$fit) # fit adjusted GAM using span size of 50%, # including a (too small) conditional permutation test fit2 <- modgam(data=MAdata, rgrid=gamgrid, permute=25, m="adjusted", sp=0.5) fit2 # fit adjusted GAM by specifiying formula fit2.f <- modgam(Case~lo(Xcoord,Ycoord)+Smoking + Mercury + Selenium,data=MAdata, rgrid=gamgrid, sp=0.5) fit2.f # Detailed example with a continuous outcome variable, map projections, # and data trimming: investigating tweet times by geolocation # NOTE: this example requires the maps and mapproj packages # Convert base map and beer tweet data locations to US Albers projection # for better distance estimates than using (lat,long) as (X,Y) coords if(require(maps) & require(mapproj)) { USmap <- map("state",projection="albers",parameters=c(29.5,45.5), plot=FALSE,fill=TRUE,col="transparent") data(beertweets) # Reuse last map projection to convert data coordinates XY <- mapproject(beertweets$longitude,beertweets$latitude)[1:2] jtime <- julian(beertweets$time) # Convert tweet dates and times to time of day (24-hour) tweettime <- as.numeric(jtime-trunc(jtime))*24 beerproj <- data.frame(tweettime, XY[1], XY[2], beertweets$beer) # Generate grid on the US map, trimmed to range of beer data USgrid <- predgrid(beerproj, map=USmap) # Fit adjusted model--adjusting for beer indicator variable fit3 <- modgam(data=beerproj, rgrid=USgrid, family=gaussian, reference="none", m="adjusted", sp=0.05) # Get summary statistics for predicted tweet times across grid points summary(fit3$fit) } # Smoothing survival rates over geolocations with adjustement of Age and Insurance # Including generating pointwise standard errors and confidence intervals data(CAdata) data(CAgrid) data = CAdata[1:1000,] # use a subset of the California data # manually set the categorical variables as factors data$INS = factor(data$INS) # no formula needed if data are arranged in a specific order (see \code{data} argument) fit4 <- modgam(data=data, rgrid=CAgrid, family="survival", sp=0.2) fit4 # fit the same model using the formula argument, with data columns in any order fit4.f <- modgam(Surv(time,event)~AGE+factor(INS)+lo(X,Y), data=data, rgrid = CAgrid, family="survival", sp=0.2) # Smoothing for two-dimensional chemical exposure instead of geolocation # case status in 1st column, mercury and selenium in 2nd and 3rd columns ma2 <- MAdata[,c(1,5:6)] expgrid <- predgrid(ma2) fit5 <- modgam(data=ma2,rgrid=expgrid,sp=.5,m="crude") summary(fit5$exp.fit) # plot the results, with mercury on the X axis and selenium on the Y axis plot(fit5, contours="response", arrow=FALSE, axes=TRUE)
# Load base map in SpatialPolygonsDataFrame format # This map was read from ESRI shapefiles using the readShapePoly function (soon to be deprecated) data(MAmap) # Load data and create grid on base map data(MAdata) gamgrid <- predgrid(MAdata, map=MAmap) # Fit crude logistic GAM to the MA data using span size of 50% # and predict odds ratios for every point on gamgrid fit1 <- modgam(data=MAdata, rgrid=gamgrid, m="crude", sp=0.5) # Summary statistics for pointwise crude odds ratios summary(fit1$exp.fit) # Summary stats for pointwise crude log odds (linear predictor) summary(fit1$fit) # fit adjusted GAM using span size of 50%, # including a (too small) conditional permutation test fit2 <- modgam(data=MAdata, rgrid=gamgrid, permute=25, m="adjusted", sp=0.5) fit2 # fit adjusted GAM by specifiying formula fit2.f <- modgam(Case~lo(Xcoord,Ycoord)+Smoking + Mercury + Selenium,data=MAdata, rgrid=gamgrid, sp=0.5) fit2.f # Detailed example with a continuous outcome variable, map projections, # and data trimming: investigating tweet times by geolocation # NOTE: this example requires the maps and mapproj packages # Convert base map and beer tweet data locations to US Albers projection # for better distance estimates than using (lat,long) as (X,Y) coords if(require(maps) & require(mapproj)) { USmap <- map("state",projection="albers",parameters=c(29.5,45.5), plot=FALSE,fill=TRUE,col="transparent") data(beertweets) # Reuse last map projection to convert data coordinates XY <- mapproject(beertweets$longitude,beertweets$latitude)[1:2] jtime <- julian(beertweets$time) # Convert tweet dates and times to time of day (24-hour) tweettime <- as.numeric(jtime-trunc(jtime))*24 beerproj <- data.frame(tweettime, XY[1], XY[2], beertweets$beer) # Generate grid on the US map, trimmed to range of beer data USgrid <- predgrid(beerproj, map=USmap) # Fit adjusted model--adjusting for beer indicator variable fit3 <- modgam(data=beerproj, rgrid=USgrid, family=gaussian, reference="none", m="adjusted", sp=0.05) # Get summary statistics for predicted tweet times across grid points summary(fit3$fit) } # Smoothing survival rates over geolocations with adjustement of Age and Insurance # Including generating pointwise standard errors and confidence intervals data(CAdata) data(CAgrid) data = CAdata[1:1000,] # use a subset of the California data # manually set the categorical variables as factors data$INS = factor(data$INS) # no formula needed if data are arranged in a specific order (see \code{data} argument) fit4 <- modgam(data=data, rgrid=CAgrid, family="survival", sp=0.2) fit4 # fit the same model using the formula argument, with data columns in any order fit4.f <- modgam(Surv(time,event)~AGE+factor(INS)+lo(X,Y), data=data, rgrid = CAgrid, family="survival", sp=0.2) # Smoothing for two-dimensional chemical exposure instead of geolocation # case status in 1st column, mercury and selenium in 2nd and 3rd columns ma2 <- MAdata[,c(1,5:6)] expgrid <- predgrid(ma2) fit5 <- modgam(data=ma2,rgrid=expgrid,sp=.5,m="crude") summary(fit5$exp.fit) # plot the results, with mercury on the X axis and selenium on the Y axis plot(fit5, contours="response", arrow=FALSE, axes=TRUE)
Obtains spatial effects predictions and optionally estimates standard errors and confidence intervals of those predictions from a fitted generalized additive model object.
mypredict.gam(object, newdata, se.fit = FALSE, type=c("all","spatial"), reference = "median", level = 0.05,verbose=FALSE)
mypredict.gam(object, newdata, se.fit = FALSE, type=c("all","spatial"), reference = "median", level = 0.05,verbose=FALSE)
object |
a fitted |
newdata |
a data frame containing the values at which predictions are required. This argument can be missing, in which case predictions are made at the same values used to compute the object. Only two-dimentional predictor need be present by name in newdata. |
se.fit |
if TRUE, pointwise standard errors and confidence intervals are computed along with the predictions. |
type |
use |
reference |
the type of reference for the estimated effect. If |
level |
the siginificance level used when |
verbose |
a logical value indicating whether to print filling values for newdata. The default is |
pred |
the estimated effect difference or (log ratio) compare to the effect specified by |
se |
the standard errors along with the predictions. |
conf.low |
the lower bounds for pointwise (1- |
conf.high |
the higher bounds for pointwise (1- |
Lu Bai
Send bug reports to [email protected].
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
data(MAdata) data(MAmap) gamgrid <- predgrid(MAdata, map=MAmap) fit <- gam(Case~lo(Xcoord,Ycoord,span=0.2)+Smoking,data=MAdata,family=binomial()) pred1 = mypredict.gam(fit) colormap(list(fit=pred1$pred,grid=data.frame(X=MAdata$X,Y=MAdata$Y)),map=MAmap) pred2 = mypredict.gam(fit,gamgrid) colormap(list(fit=pred2$pred,grid=data.frame(X=gamgrid$X,Y=gamgrid$Y)),map=MAmap) pred3 = mypredict.gam(fit,gamgrid,se.fit=TRUE) colormap(list(fit=pred3$pred,conf.low = pred3$conf.low, conf.high = pred3$conf.high, grid=data.frame(X=gamgrid$X,Y=gamgrid$Y)),map=MAmap,contours = "interval")
data(MAdata) data(MAmap) gamgrid <- predgrid(MAdata, map=MAmap) fit <- gam(Case~lo(Xcoord,Ycoord,span=0.2)+Smoking,data=MAdata,family=binomial()) pred1 = mypredict.gam(fit) colormap(list(fit=pred1$pred,grid=data.frame(X=MAdata$X,Y=MAdata$Y)),map=MAmap) pred2 = mypredict.gam(fit,gamgrid) colormap(list(fit=pred2$pred,grid=data.frame(X=gamgrid$X,Y=gamgrid$Y)),map=MAmap) pred3 = mypredict.gam(fit,gamgrid,se.fit=TRUE) colormap(list(fit=pred3$pred,conf.low = pred3$conf.low, conf.high = pred3$conf.high, grid=data.frame(X=gamgrid$X,Y=gamgrid$Y)),map=MAmap,contours = "interval")
modgam
Determines the optimal span size for modgam
, a spatial generalized additive model (GAM) with a two-dimensional LOESS smooth for location, by minimizing the AIC.
optspan(formula, data, offset,spans = seq(0.05, 0.95, by = 0.05), m ="adjusted", family = binomial(), verbose =TRUE, degree = 1, ...)
optspan(formula, data, offset,spans = seq(0.05, 0.95, by = 0.05), m ="adjusted", family = binomial(), verbose =TRUE, degree = 1, ...)
formula |
a formula expression for the model, with the response on the left of a ~ oprator, and the predictors on the right. If fitting a Cox additive model, the response must be a survival object as returned by the Surv function. A built-in nonparametric smoothing term is indicated by |
data |
a data frame containing the variables in the model. |
offset |
the name in data that will be used as offset in the model. It will be ignored if offset is specified in |
spans |
a vector of candidate span sizes that the optimal span is chosen from. |
m |
the model type. Options are "crude" or "adjusted" (default). If "crude", only the spatial smooth term for location is included in the model. If "adjusted", all covariates in the data set (columns >= 4) are included in the model. |
family |
a family function or the result of a call to a family function. (See |
verbose |
a logical argument; if TRUE shows the AIC for each candidate span size. |
degree |
a smoothing parameter used in |
... |
any additional arguments to pass to the gam function. |
The optimal span size, determined by minimizing the AIC
This function does not return model predictions–only the optimal span size. To obtain model predictions use the modgam function.
Lu Bai, Scott Bartell, Robin Bliss, and Veronica Vieira
Send bug reports to [email protected].
data(MAdata) optspan(data=MAdata, m="crude") data(CAdata) optspan(Surv(time,event)~AGE+factor(INS)+lo(X,Y), data=CAdata, spans=c(0.1,0.2,0.3),family="survival")
data(MAdata) optspan(data=MAdata, m="crude") data(CAdata) optspan(Surv(time,event)~AGE+factor(INS)+lo(X,Y), data=CAdata, spans=c(0.1,0.2,0.3),family="survival")
modgam
Objects
Displays a color image map, including a legend, scale bar, and optional North arrow, showing crude or adjusted odds ratios (or linear predictors) for a grid of points. Irregular grids are allowed. Also draws contour lines for regions of signficantly increased or decreased values of the outcome variable ("clusters"), if permutation ranks are provided. Designed to display modgam
objects but can be used with other model results if the modgamobj list contains the correct elements.
## S3 method for class 'modgam' plot(x, map = NULL, exp = FALSE, add = FALSE, intervals=TRUE, mapmin = NULL, mapmax = NULL, col.seq = diverge_hsv(201), anchor=FALSE, border.gray = 0.3, contours=c("none","response","permrank","interval"), contours.drawlabels=FALSE, contours.lty=1, contours.lwd=1, contours.levels, contours.labcex=0.7, arrow=TRUE, axes=FALSE, ptsize=0.9, alpha=0.05, mai, legend.name = "predicted values", legend.cex=1, ...)
## S3 method for class 'modgam' plot(x, map = NULL, exp = FALSE, add = FALSE, intervals=TRUE, mapmin = NULL, mapmax = NULL, col.seq = diverge_hsv(201), anchor=FALSE, border.gray = 0.3, contours=c("none","response","permrank","interval"), contours.drawlabels=FALSE, contours.lty=1, contours.lwd=1, contours.levels, contours.labcex=0.7, arrow=TRUE, axes=FALSE, ptsize=0.9, alpha=0.05, mai, legend.name = "predicted values", legend.cex=1, ...)
x |
(required) an object of class |
map |
can be used to map predicted values on a base map from the |
exp |
if |
add |
use |
intervals |
if the |
mapmin |
the minimum value for the color scale legend. |
mapmax |
the maximum value for the color scale legend. |
col.seq |
The color sequence (palette) used to display the predicted values on the map. |
anchor |
Use |
border.gray |
gray scale for the border of |
contours |
use |
contours.drawlabels |
use |
contours.lty |
the line type for contour lines. |
contours.lwd |
the width for contour lines. |
contours.levels |
the levels for contour lines. When |
contours.labcex |
cex for contour labelling. This is an absolute size, not a multiple of |
arrow |
use |
axes |
use |
ptsize |
the size of the points used to fill in colors on the map. Increase to remove white space inside the map or decrease to remove color outside the map boundaries. NOTE: white space can also be eliminated by increasing the grid size in |
alpha |
the nominal pointwise type I error rate; only used when |
mai |
the margins of the plot. Details see par. |
legend.name |
the name of displayed for the legend bar. |
legend.cex |
a numerical value giving the amount by which legend text should be magnified relative to the default. |
... |
other arguments to be passed to colormap function. |
Produces an image map showing crude or adjusted linear predictors (or odds ratios). If the base map is in readShapePoly
format a scale bar is included. The scale bar assumes that the X and Y coordinates are provided in meters.
Note that the contour lines use a pointwise nominal type I error rate of alpha; the chance of a type I error occurring for at least one point on the map is much higher, typically approaching 100% at alpha=0.05, because a spatial prediction grid generally contains many points.
Scott Bartell, Lu Bai, Robin Bliss, and Veronica Vieira
Send bug reports to [email protected].
trimdata
,
predgrid
,
optspan
,
modgam
,
colormap
.
data(MAmap) data(MAdata) # Create a grid on the base map MAgrid <- predgrid(MAdata, map=MAmap) # fit crude GAM model to the MA data using span size of 50% fit1 <- modgam(data=MAdata, rgrid=MAgrid, m="crude", sp=0.5) # Plot a map showing crude odds ratios plot(fit1, MAmap) #### A detailed example including map projections and data trimming # NOTE: this example requires the maps and mapproj packages # Convert base map and beer tweet data locations to US Albers projection # projected coords yield better distance estimates than (lat,long) if(require(maps) & require(mapproj)) { USmap <- map("state",projection="albers",parameters=c(29.5,45.5), plot=FALSE,fill=TRUE,col="transparent") data(beertweets) case <- beertweets$beer # Reuse last map projection to convert data coordinates XY <- mapproject(beertweets$longitude,beertweets$latitude)[1:2] beerproj <- data.frame(case, XY[1], XY[2]) # Generate grid on the US map, trimmed to range of beer data USgrid <- predgrid(beerproj, map=USmap) # Fit unadjusted model--geolocation only fit2 <- modgam(data=beerproj, rgrid=USgrid, m="unadjusted", sp=0.05) dev.new(width=7,height=5) plot(fit2, USmap) title(main="Beer Tweet Odds Ratios") }
data(MAmap) data(MAdata) # Create a grid on the base map MAgrid <- predgrid(MAdata, map=MAmap) # fit crude GAM model to the MA data using span size of 50% fit1 <- modgam(data=MAdata, rgrid=MAgrid, m="crude", sp=0.5) # Plot a map showing crude odds ratios plot(fit1, MAmap) #### A detailed example including map projections and data trimming # NOTE: this example requires the maps and mapproj packages # Convert base map and beer tweet data locations to US Albers projection # projected coords yield better distance estimates than (lat,long) if(require(maps) & require(mapproj)) { USmap <- map("state",projection="albers",parameters=c(29.5,45.5), plot=FALSE,fill=TRUE,col="transparent") data(beertweets) case <- beertweets$beer # Reuse last map projection to convert data coordinates XY <- mapproject(beertweets$longitude,beertweets$latitude)[1:2] beerproj <- data.frame(case, XY[1], XY[2]) # Generate grid on the US map, trimmed to range of beer data USgrid <- predgrid(beerproj, map=USmap) # Fit unadjusted model--geolocation only fit2 <- modgam(data=beerproj, rgrid=USgrid, m="unadjusted", sp=0.05) dev.new(width=7,height=5) plot(fit2, USmap) title(main="Beer Tweet Odds Ratios") }
Creates a data frame containing a rectangular grid of points to cover the range of X and Y coordinates provided in a data set, and trims the grid so that the points do not extend beyond the boundaries shown on a map. Users can omit a data set to create a grid covering the whole map, or omit a map to create a grid covering the whole data set.
predgrid(dataXY = NULL, map = NULL, nrow = 100, ncol = 100, X = NULL, Y = NULL)
predgrid(dataXY = NULL, map = NULL, nrow = 100, ncol = 100, X = NULL, Y = NULL)
dataXY |
A data frame with X and Y coordinates. If the data frame has more than 2 columns, the X and Y coordinates must be in the 2nd and 3rd columns, respectively (the format for the data argument in the |
map |
a map for clipping the grid, provided in a recognized format. Supported map classes include "map" (the format produced by the maps package), "sf" (the ISO standard format supported by the sf package), "Raster" (used by the raster package), or "Spatial" (a legacy format used by the maptools package). To load a map from an ESRI shapefile into R, use the st_read function in the sf library, the readOGR function in the rgdal package, or the readShapePoly function in the maptools package. If |
nrow |
the number of rows in the grid (default=100). |
ncol |
the number of columns in the grid (default=100). |
X |
a vector of X coordinates, supplied instead of (or in addition to) dataXY. If both X and dataXY are provided, X will be used instead of the corresponding column of dataXY. |
Y |
a vector of Y coordinates, supplied instead of (or in addition to) dataXY. If both Y and dataXY are provided, Y will be used instead of the corresponding column of dataXY. |
predgrid
creates a grid of dimensions nrow*ncol using the range of X and Y coordinates (e.g., longitude and latitude) in the data frame supplied as dataXY
. If the map argument is used, the function trimdata
is automatically used to clip the grid. Users should be sure to use the same projection for the map and the data; putting both on the same plot can help reveal differing projections. If the map centroid is not in the range of the data a warning message is printed; this might indicate differing projections but can occur naturally when the data were not sampled from the entire extent of the map, or when map boundaries are concave.
A data frame with X and Y coordinates in the first two columns. The column names from dataXY are used for the output. If the columns of dataXY are unnamed then the names "X" and "Y" are assigned to the data frame.
Veronica Vieira, Scott Bartell, and Robin Bliss
Send bug reports to [email protected].
trimdata
,
optspan
,
modgam
,
colormap
.
# define a rectangular 100x100 grid covering the MA data data(MAdata) gamgrid <- predgrid(MAdata) # plot the grid points plot(gamgrid$Xcoord, gamgrid$Ycoord, cex=0.1, col="red") # and the data locations points(MAdata$Xcoord,MAdata$Ycoord) # But that grid extends beyond the state boundaries and into the ocean! # Better to also clip the grid to a map of MA using the following code: # Clip a 50x50 grid covering the MA data to a map of MA data(MAmap) gamgrid2 <- predgrid(MAdata, map=MAmap, nrow=50, ncol=50) # plot the MA map and grid points plot(MAmap) points(gamgrid2, cex=0.1, col="red")
# define a rectangular 100x100 grid covering the MA data data(MAdata) gamgrid <- predgrid(MAdata) # plot the grid points plot(gamgrid$Xcoord, gamgrid$Ycoord, cex=0.1, col="red") # and the data locations points(MAdata$Xcoord,MAdata$Ycoord) # But that grid extends beyond the state boundaries and into the ocean! # Better to also clip the grid to a map of MA using the following code: # Clip a 50x50 grid covering the MA data to a map of MA data(MAmap) gamgrid2 <- predgrid(MAdata, map=MAmap, nrow=50, ncol=50) # plot the MA map and grid points plot(MAmap) points(gamgrid2, cex=0.1, col="red")
gamcox
FitsObtains spatial effects predictions and optionally estimates standard errors and confidence intervals of those predictions from a fitted Cox proportional hazard additive model object.
## S3 method for class "gamcox" ## S3 method for class 'gamcox' predict(object, newdata = object$data, se.fit = FALSE, type=c("spatial","all"), reference = "median", level = 0.05, verbose=FALSE,...)
## S3 method for class "gamcox" ## S3 method for class 'gamcox' predict(object, newdata = object$data, se.fit = FALSE, type=c("spatial","all"), reference = "median", level = 0.05, verbose=FALSE,...)
object |
a fitted |
newdata |
a data frame containing the values at which predictions are required. This argument can be missing, in which case predictions are made at the same values used to compute the object. A comprehensive effect (hazard ratio) of the covariates included in |
se.fit |
if |
type |
use |
reference |
the type of reference for the estimated effect. If |
level |
the confidence level for condifence bands. |
verbose |
a logical value indicating whether to print filling values for newdata. The default is |
... |
extra arguments for S3 generic, ignored by |
pred |
the estimated log hazards ratio compared to the effect specified by |
se |
the standard errors along with the predictions. |
conf.low |
the lower bounds for pointwise (1- |
conf.high |
the higher bounds for pointwise (1- |
Lu Bai
Send bug reports to [email protected].
Hastie TJ, Tibshirani RJ. Generalized Additive Models. (Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Boca Raton, Florida, 1990).
data(CAdata) data(CAmap) fit <- gamcox(Surv(time,event)~AGE + factor(INS) + lo(X,Y),data=CAdata, span=0.2,loess.trace="approximate") fit pred1 = predict(fit) colormap(list(fit=pred1$pred,grid=data.frame(X=CAdata$X,Y=CAdata$Y)),map=CAmap, border.gray=0.5) data(CAgrid) pred2 = predict(fit,CAgrid[,c("X","Y")]) colormap(list(fit=pred2$pred,grid=data.frame(X=CAgrid$X,Y=CAgrid$Y)),map=CAmap, border.gray=0.5, legend.name="log hazard ratio") ## Circle significant areas based on the confidence intervals specified by conf.low and conf.high pred3 = predict(fit,CAgrid[,c("X","Y")],se.fit=TRUE) colormap(list(fit=pred3$pred,conf.low = pred3$conf.low, conf.high = pred3$conf.high, grid=data.frame(X=CAgrid$X,Y=CAgrid$Y)),map=CAmap,border.gray = 0.7, contours = "interval",legend.name="log hazard ratio")
data(CAdata) data(CAmap) fit <- gamcox(Surv(time,event)~AGE + factor(INS) + lo(X,Y),data=CAdata, span=0.2,loess.trace="approximate") fit pred1 = predict(fit) colormap(list(fit=pred1$pred,grid=data.frame(X=CAdata$X,Y=CAdata$Y)),map=CAmap, border.gray=0.5) data(CAgrid) pred2 = predict(fit,CAgrid[,c("X","Y")]) colormap(list(fit=pred2$pred,grid=data.frame(X=CAgrid$X,Y=CAgrid$Y)),map=CAmap, border.gray=0.5, legend.name="log hazard ratio") ## Circle significant areas based on the confidence intervals specified by conf.low and conf.high pred3 = predict(fit,CAgrid[,c("X","Y")],se.fit=TRUE) colormap(list(fit=pred3$pred,conf.low = pred3$conf.low, conf.high = pred3$conf.high, grid=data.frame(X=CAgrid$X,Y=CAgrid$Y)),map=CAmap,border.gray = 0.7, contours = "interval",legend.name="log hazard ratio")
gamcox
Object
This function displays information about a gamcox object produced by gamcox
.
## S3 method for class 'gamcox' print(x,...)
## S3 method for class 'gamcox' print(x,...)
x |
a |
... |
. |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) print(fit)
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) print(fit)
modgam
ObjectThis function displays information about a modgam object produced by modgam
.
## S3 method for class 'modgam' print(x, ...)
## S3 method for class 'modgam' print(x, ...)
x |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
gamcox
Object
This function provides residuals of a gamcox object produced by gamcox
.
## S3 method for class 'gamcox' residuals(object,...)
## S3 method for class 'gamcox' residuals(object,...)
object |
a |
... |
. |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) residuals(fit)
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) residuals(fit)
modgam
ObjectThis function provides residuals of a modgam object produced by modgam
.
## S3 method for class 'modgam' residuals(object, ...)
## S3 method for class 'modgam' residuals(object, ...)
object |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
Take all cases and a random sample of controls from a data frame. Simple random sampling and spatially stratified random sampling are available. For spatially statified random sampling, strata can be defined by region, or by region and additional stratification variables (see Tang et al., 2023 for examples and simulation comparisons). If no specific regions are specified with stratified sampling, the function will create a regular grid for spactially stratified sampling.
sampcont(rdata, type = "stratified", casecol=1, Xcol=2, Ycol=3, regions = NULL, addstrat = NULL, times = NULL, n = 1, nrow = 100, ncol = 100)
sampcont(rdata, type = "stratified", casecol=1, Xcol=2, Ycol=3, regions = NULL, addstrat = NULL, times = NULL, n = 1, nrow = 100, ncol = 100)
rdata |
a data frame with case status in the |
casecol |
the column number in |
Xcol |
the column number in |
Ycol |
the column number in |
type |
|
regions |
a vector of length equal to the number of rows in |
addstrat |
a vector of length equal to the number of rows in |
times |
included for backward compatibility; now replaced by the |
n |
the number of controls to sample from the eligible controls in each stratum. All available controls will be taken for strata with fewer than |
nrow |
the number of rows used to create a regular grid for sampling regions. Only used when |
ncol |
the number of columns used to create a regular grid for sampling regions. Only used when |
rdata |
a data frame with all cases and a random sample of controls. |
w |
the inverse probability weights for the rows in |
ncont |
the total number of controls in the sample. |
type |
statified or simple sampling, as specified by the same argument described above. |
gridsize |
a vector with the numbers of rows and columns for the stratified sampling grid. |
grid |
the stratified sampling grid in PolySet format. |
Scott M. Bartell and Ian W. Tang [email protected].
Tang IW, Bartell SM, Vieira VM. doi:10.1016/j.sste.2023.100584Unmatched Spatially Stratified Controls: A simulation study examining efficiency and precision using spatially-diverse controls and generalized additive models. Spatial and Spatio-temporal Epidemiology 2023, 45:100584.
#### load beertweets data, which has 719 cases and 9281 controls data(beertweets) # take a simple random sample of 1000 controls samp1 <- sampcont(beertweets, type="simple", n=1000) # take a stratified random sample of controls on a 80x50 grid samp2 <- NULL samp2 <- sampcont(beertweets, nrow=80, ncol=50) # Compare locations for the two sampling designs (cases in red) par(mfrow=c(2,1), mar=c(0,3,4,3)) plot(samp1$rdata$longitude, samp1$rdata$latitude, col=3-samp1$rdata$beer, cex=0.5, type="p", axes=FALSE, ann=FALSE) # Show US base map if maps package is available mapUS <- require(maps) if (mapUS) map("state", add=TRUE) title("Simple Random Sample, 1000 Controls") if (!is.null(samp2)) { plot(samp2$rdata$longitude, samp2$rdata$latitude, col=3-samp2$rdata$beer, cex=0.5, type="p", axes=FALSE, ann=FALSE) if (mapUS) map("state", add=TRUE) title(paste("Spatially Stratified Sample,",samp2$ncont,"Controls")) } par(mfrow=c(1,1)) ## Note that weights are needed in statistical analyses # Prevalence of cases in sample--not in source data mean(samp1$rdata$beer) # Estimated prevalence of cases in source data weighted.mean(samp1$rdata$beer, w=samp1$w) ## Do beer tweet odds differ below the 36.5 degree parallel? # Using full data glm(beer~I(latitude<36.5), family=binomial, data=beertweets) # Stratified sample requires sampling weights if (!is.null(samp2)) glm(beer~I(latitude<36.5), family=binomial, data=samp2$rdata, weights=samp2$w)
#### load beertweets data, which has 719 cases and 9281 controls data(beertweets) # take a simple random sample of 1000 controls samp1 <- sampcont(beertweets, type="simple", n=1000) # take a stratified random sample of controls on a 80x50 grid samp2 <- NULL samp2 <- sampcont(beertweets, nrow=80, ncol=50) # Compare locations for the two sampling designs (cases in red) par(mfrow=c(2,1), mar=c(0,3,4,3)) plot(samp1$rdata$longitude, samp1$rdata$latitude, col=3-samp1$rdata$beer, cex=0.5, type="p", axes=FALSE, ann=FALSE) # Show US base map if maps package is available mapUS <- require(maps) if (mapUS) map("state", add=TRUE) title("Simple Random Sample, 1000 Controls") if (!is.null(samp2)) { plot(samp2$rdata$longitude, samp2$rdata$latitude, col=3-samp2$rdata$beer, cex=0.5, type="p", axes=FALSE, ann=FALSE) if (mapUS) map("state", add=TRUE) title(paste("Spatially Stratified Sample,",samp2$ncont,"Controls")) } par(mfrow=c(1,1)) ## Note that weights are needed in statistical analyses # Prevalence of cases in sample--not in source data mean(samp1$rdata$beer) # Estimated prevalence of cases in source data weighted.mean(samp1$rdata$beer, w=samp1$w) ## Do beer tweet odds differ below the 36.5 degree parallel? # Using full data glm(beer~I(latitude<36.5), family=binomial, data=beertweets) # Stratified sample requires sampling weights if (!is.null(samp2)) glm(beer~I(latitude<36.5), family=binomial, data=samp2$rdata, weights=samp2$w)
gamcox
Object
This function summarizes information about a gamcox object produced by gamcox
.
## S3 method for class 'gamcox' summary(object,...)
## S3 method for class 'gamcox' summary(object,...)
object |
a |
... |
. |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) summary(fit)
data(CAdata) fit <- gamcox(Surv(time,event)~lo(X,Y)+AGE+INS,data=CAdata, span = 0.2) summary(fit)
modgam
ObjectThis function summarizes information about a modgam object produced by modgam
.
## S3 method for class 'modgam' summary(object, ...)
## S3 method for class 'modgam' summary(object, ...)
object |
a modgam object. |
... |
extra arguments for S3 generic, ignored by |
Lu Bai
Send bug reports to [email protected].
modgam
gamcox
,
predict.gamcox
.
modgam
FunctionBased on the arguments in modgam
, build a formula used for model fitting.
toformula(formula, data,m="adjusted",surv = FALSE, span=0.5,degree=2, smooth = TRUE,offset='')
toformula(formula, data,m="adjusted",surv = FALSE, span=0.5,degree=2, smooth = TRUE,offset='')
formula |
a user specified formula expression (for |
data |
a well structured data frame with the two geolocation parameters in the first and second columns. |
m |
use |
surv |
|
span , degree
|
smoothing parameters used for smoothing functions (see details in |
smooth |
|
offset |
the name for offset. |
The function returns the formula for the model.
Lu Bai
Send bug reports to [email protected].
modgam
, .
data(CAdata) toformula(data=CAdata, surv=TRUE)
data(CAdata) toformula(data=CAdata, surv=TRUE)
Takes a subset of a data frame: returns rows with geolocations inside the boundaries determined by a map. If rectangle=FALSE
, strict map boundaries are used. If rectangle=TRUE
, a rectangular boundary is determined based on the range of X and Y coordinates for the map, along with an optional buffer.
trimdata(rdata, map, Xcol=2, Ycol=3, rectangle=F, buffer=0.05)
trimdata(rdata, map, Xcol=2, Ycol=3, rectangle=F, buffer=0.05)
rdata |
the data set (required). If |
map |
a map for clipping the grid, provided in a recognized format. Supported map classes include "map" (the format produced by the maps package), "sf" (the ISO standard format for maps, supported by the sf package), "Raster" (used by the raster package), or "Spatial" (a legacy format used by the maptools package). To load a map from an ESRI shapefile into R, use the |
Xcol |
the column number of rdata for the X coordinates of geolocation. Only used if rdata has >2 columns. |
Ycol |
the column number of rdata for the Y coordinates of geolocation. Only used if rdata has >2 columns. |
rectangle |
if |
buffer |
a fraction of the map range to add to either side of the rectangular boundary before trimming the data (default 5%). The buffer argument is ignored if |
Functions from the sf package are used to convert map formats and clip the grid to points inside the map boundaries. Be sure to use fill=T
when using the map
function to produce maps for trimdata
: maps produced using the default fill=F
argument do not work properly with this function! If the map bounding box centroid is not in the range of the data, a warning message is printed; this might indicate differing projections, but can occur naturally when the data were not sampled from the entire extent of the map, or when map boundaries are concave.
A subset of the rdata data frame containing only those rows with geolocations within the specified boundaries.
Veronica Vieira, Scott Bartell, and Robin Bliss
Send bug reports to [email protected].
predgrid
,
optspan
,
modgam
,
colormap
.
# This example uses the "sf" package to read in an external ESRI shapefile for Maine if (require(sf)) { # download example shapefile zip from github, and unzip zippath <- paste(tempdir(),"Income_schooling.zip",sep="/") download.file("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling.zip", destfile = zippath, mode='wb') unzip(zippath, exdir = tempdir()) # read shapefile into sf format shppath <- paste(tempdir(),"Income_schooling.shp",sep="/") basemap0 <- st_read(shppath) # Create example data by randomly sampling within bounding box rs <- st_bbox(basemap0) # get ranges of X and Y MEdata <- data.frame(X=runif(300,rs[1],rs[3]), Y=runif(300,rs[2],rs[4])) plot(basemap0["NAME"], reset=FALSE) plot(st_as_sf(MEdata,coords=1:2), add=TRUE) # plot data in black # trim data to basemap, and plot trimmed data with red X's dME <- trimdata(MEdata, basemap0) plot(st_as_sf(dME,coords=1:2), col="red", pch="X", add=TRUE) dev.off() # clear map settings } # The following examples use the "maps" package, which includes its own maps if (require(maps)) { data(beertweets) ### Trim beer tweet data to US base map, and plot them basemap1 <- map("usa", fill=TRUE, col="transparent") dUS <- trimdata(beertweets, basemap1) # Plot tweet locations (beer tweets in red) points(dUS$longitude, dUS$latitude, col=dUS$beer+1, cex=0.5) dev.off() # clear map settings ### Trim beer tweet data to Texas base map, and plot them basemap2 <- map("state", regions="texas", fill=TRUE, col="transparent") dTX <- trimdata(beertweets, basemap2) # Plot tweet locations (beer tweets in red) points(dTX$longitude, dTX$latitude, col=dTX$beer+1, cex=0.5) }
# This example uses the "sf" package to read in an external ESRI shapefile for Maine if (require(sf)) { # download example shapefile zip from github, and unzip zippath <- paste(tempdir(),"Income_schooling.zip",sep="/") download.file("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling.zip", destfile = zippath, mode='wb') unzip(zippath, exdir = tempdir()) # read shapefile into sf format shppath <- paste(tempdir(),"Income_schooling.shp",sep="/") basemap0 <- st_read(shppath) # Create example data by randomly sampling within bounding box rs <- st_bbox(basemap0) # get ranges of X and Y MEdata <- data.frame(X=runif(300,rs[1],rs[3]), Y=runif(300,rs[2],rs[4])) plot(basemap0["NAME"], reset=FALSE) plot(st_as_sf(MEdata,coords=1:2), add=TRUE) # plot data in black # trim data to basemap, and plot trimmed data with red X's dME <- trimdata(MEdata, basemap0) plot(st_as_sf(dME,coords=1:2), col="red", pch="X", add=TRUE) dev.off() # clear map settings } # The following examples use the "maps" package, which includes its own maps if (require(maps)) { data(beertweets) ### Trim beer tweet data to US base map, and plot them basemap1 <- map("usa", fill=TRUE, col="transparent") dUS <- trimdata(beertweets, basemap1) # Plot tweet locations (beer tweets in red) points(dUS$longitude, dUS$latitude, col=dUS$beer+1, cex=0.5) dev.off() # clear map settings ### Trim beer tweet data to Texas base map, and plot them basemap2 <- map("state", regions="texas", fill=TRUE, col="transparent") dTX <- trimdata(beertweets, basemap2) # Plot tweet locations (beer tweets in red) points(dTX$longitude, dTX$latitude, col=dTX$beer+1, cex=0.5) }