rGLM {rGLM}    R Documentation

EM algorithm to fit penalized maximum likelihood estimates of trait associations with SNP haplotypes

Description

This function takes a dataset of haplotypes in which rows for individuals of uncertain phase have been augmented by "pseudo-individuals" who carry the possible multilocus genotypes consistent with the single-locus phenotypes. The EM algorithm is used to find penalized MLEs for trait associations with covariates in L1-regularized generalized linear models. Parts of the code, as well as the arguments and return values, are borrowed from the hapassoc function in the hapassoc package.

Usage

rGLM(form, haplos.list, baseline = "missing", family = binomial, freq = NULL, maxit = 50, tol = 0.001, start = NULL, lambda = 0, trace = FALSE)

Arguments

form model equation in usual R format
haplos.list list of haplotype data from pre.hapassoc, which is a function borrowed from the hapassoc package
baseline optional, haplotype to be used for baseline coding. Default is the most frequent haplotype according to the initial haplotype frequency estimates returned by pre.hapassoc.
family binomial, poisson, gaussian or gamma are supported; default=binomial
freq initial estimates of haplotype frequencies, default values are calculated in pre.hapassoc using standard haplotype-counting (i.e. EM algorithm without adjustment for non-haplotype covariates)
maxit maximum number of iterations of the EM algorithm; default=50
tol convergence tolerance in terms of either the maximum difference in parameter estimates between iterations or the maximum relative difference in parameter estimates between iterations, whichever is larger; default=0.001
start starting values for parameter estimates in the risk model
lambda tuning parameter lambda that controls the strength of the L1 penalty; default=0
trace indicates whether or not a list of the genotype variables used to form haplotypes and a list of other non-genetic variables should be printed; default is FALSE.
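
As a rough illustration of the tol rule, the sketch below computes both the largest absolute change and the largest relative change in the parameter estimates and compares the larger of the two with tol. The helper name em_converged and its handling of zero-valued estimates are assumptions made for illustration; the criterion actually coded in rGLM may differ in detail.

```r
# Hypothetical helper, not part of rGLM: one reading of the `tol` rule.
em_converged <- function(old, new, tol = 0.001) {
  change <- abs(new - old)

  # Largest absolute change between successive iterations.
  abs_diff <- max(change)

  # Largest relative change; estimates that were exactly zero are skipped
  # (an assumption) to avoid dividing by zero.
  nonzero <- old != 0
  rel_diff <- if (any(nonzero)) max(change[nonzero] / abs(old[nonzero])) else 0

  # Converged only when the larger of the two measures is below tol.
  max(abs_diff, rel_diff) < tol
}

# Small coefficients can pass the absolute check but fail the relative one.
em_converged(old = c(0.50, -1.20), new = c(0.5002, -1.2001))
em_converged(old = c(0.01, 2.00), new = c(0.0105, 2.00))
```

Taking the larger of the two measures makes the stopping rule stricter than either check alone, which matters when some coefficients are close to zero.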

Details

haplos.list is the list returned by the pre.hapassoc function. See the pre.hapassoc help file for details.
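
To make the idea of an augmented dataset more concrete, the following toy sketch shows how EM weights for pseudo-individuals could be formed. It is not the layout of haplos.list and not rGLM code: the data frame, the frequency values, the trait likelihoods, and the use of Hardy-Weinberg haplotype-pair probabilities are all assumptions chosen for illustration.

```r
# Toy augmented data set (hypothetical values, not produced by pre.hapassoc).
# Individual 1 has known phase; individual 2 is consistent with two haplotype
# pairs, so it contributes two pseudo-individual rows.
aug <- data.frame(
  id   = c(1, 2, 2),
  hap1 = c("AA", "AA", "AC"),
  hap2 = c("AC", "CC", "CA"),
  stringsAsFactors = FALSE
)

# Current haplotype frequency estimates (assumed values).
freq <- c(AA = 0.4, AC = 0.3, CA = 0.2, CC = 0.1)

# Assumed trait likelihood for each row under the current coefficients.
trait_lik <- c(0.6, 0.5, 0.3)

# Probability of each haplotype pair, assuming Hardy-Weinberg equilibrium:
# f1 * f2 for identical haplotypes, 2 * f1 * f2 otherwise.
pair_prob <- ifelse(aug$hap1 == aug$hap2, 1, 2) *
  unname(freq[aug$hap1]) * unname(freq[aug$hap2])

# E-step: weights are proportional to trait likelihood times pair probability,
# normalized so that they sum to 1 within each individual.
unnorm <- trait_lik * pair_prob
wts <- unnorm / ave(unnorm, aug$id, FUN = sum)

print(cbind(aug, wts))
```

In this sketch an individual of known phase always receives weight 1, while the rows of a phase-ambiguous individual share a total weight of 1.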

Value

it number of iterations of the EM algorithm
beta estimated regression coefficients
freq estimated haplotype frequencies
fits fitted values of the trait
wts final weights calculated in last iteration of the EM algorithm. These are estimates of the conditional probabilities of each multilocus genotype given the observed single-locus genotypes.
response trait value
converged TRUE/FALSE indicator of convergence. If the algorithm fails to converge, only the converged indicator is returned.
model model equation
loglik the log-likelihood evaluated at the maximum likelihood estimates of all parameters
call the function call
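
The relationship between a log-likelihood and the lambda argument can be sketched as a penalized objective. The example below covers only the regression part of a binomial model with EM weights; the function name penalized_loglik, the choice to leave the intercept unpenalized, and all numeric values are assumptions, and the objective used inside rGLM may be defined differently.

```r
# Hypothetical helper, not rGLM code: weighted binomial log-likelihood
# minus an L1 penalty on the coefficients.
penalized_loglik <- function(beta, X, y, wts, lambda) {
  eta <- drop(X %*% beta)                       # linear predictor
  loglik <- sum(wts * (y * eta - log1p(exp(eta))))
  # Assumption: the intercept (first column of X) is not penalized.
  loglik - lambda * sum(abs(beta[-1]))
}

# Toy design matrix (intercept plus one covariate), trait, and EM weights.
X    <- cbind(1, c(0, 1, 1, 0))
y    <- c(0, 1, 1, 0)
wts  <- c(1, 1, 0.5, 0.5)
beta <- c(-0.5, 1.2)

unpenalized <- penalized_loglik(beta, X, y, wts, lambda = 0)
penalized   <- penalized_loglik(beta, X, y, wts, lambda = 2)

# The two values differ by lambda * sum(abs(beta[-1])).
unpenalized - penalized
```

With lambda = 0 the penalty term vanishes in this sketch, so the objective reduces to the ordinary weighted log-likelihood.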

References

Burkett K, Graham J, McNeney B. 2006. hapassoc: Software for likelihood inference of trait associations with SNP haplotypes and other attributes. Journal of Statistical Software 16:1-19.

Guo W, Lin S. 2009. Generalized linear modeling with regularization for detecting common disease rare haplotype association. Genetic Epidemiology. DOI: 10.1002/gepi.20382.


[Package rGLM version 2.0 Index]