Title:
Finding starting points for Markov chain Monte Carlo analysis of
genetic data from large and complex pedigrees
Abstract:
Genetic data from founder populations are advantageous for studies of
complex traits, which are often plagued by genetic heterogeneity.
However, the desire to analyze large and complex pedigrees
that often arise from such populations, coupled with the need to handle
many linked and highly polymorphic loci simultaneously, poses challenges
to current standard approaches. A viable alternative is to use Markov
chain Monte Carlo (MCMC) procedures, in which a Markov chain is
constructed on the state space of a latent variable (e.g., genotypic
configuration or inheritance vector). However, finding
starting points for the Markov chains is a difficult problem when the
pedigree is not single-locus peelable; methods proposed in the literature
have not yielded completely satisfactory solutions. We propose a
generalization of the heated Gibbs sampler with relaxed penetrances (HGRP)
of Lin et al. ([1993] IMA J. Math. Appl. Med. Biol. 10:1-17) to search
for starting points. HGRP guarantees that a starting point will be found
if there is no error in the data, but the chain usually needs to be run
for a long time if the pedigree is extremely large and complex. By
introducing a forcing step, the current algorithm substantially reduces
the state space, and hence effectively speeds up the process of finding a
starting point. Our algorithm also has a built-in preprocessing procedure
for Mendelian error detection. The algorithm has been applied to both
simulated and real data on two large and complex Hutterite pedigrees under
many settings, with good results. The algorithm has been
implemented in a user-friendly package called START.
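
To make the idea of Mendelian error detection concrete, the sketch below checks the simplest possible case: a fully genotyped father-mother-child trio at a single locus. It is an illustration only and is not the preprocessing procedure implemented in START; the function names, the dictionary-based pedigree layout, and the example individuals are all assumptions made for this example. In large and complex pedigrees with untyped individuals, inconsistencies may only become visible when information is combined across several generations, which this trio-level check does not attempt.

```python
from itertools import product


def trio_is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )


def find_trio_errors(pedigree, genotypes):
    """List children whose genotype is inconsistent with both parents.

    pedigree  : dict mapping child id -> (father id, mother id)
                (hypothetical layout, assumed for this sketch)
    genotypes : dict mapping individual id -> (allele1, allele2) for
                typed individuals only; trios with any untyped member
                are skipped because they cannot be checked this way.
    """
    errors = []
    for child, (father, mother) in pedigree.items():
        if all(person in genotypes for person in (child, father, mother)):
            if not trio_is_mendelian_consistent(
                genotypes[father], genotypes[mother], genotypes[child]
            ):
                errors.append(child)
    return errors


# Hypothetical single-locus data: "c2" carries two paternal alleles and
# no maternal allele, so it is flagged; "c3" is skipped because its
# mother "m2" is untyped.
example_pedigree = {"c1": ("f", "m"), "c2": ("f", "m"), "c3": ("f", "m2")}
example_genotypes = {
    "f": (1, 2),
    "m": (3, 4),
    "c1": (1, 3),
    "c2": (1, 2),
    "c3": (2, 5),
}
flagged = find_trio_errors(example_pedigree, example_genotypes)
```

A genotype configuration that passes checks of this kind at every locus is the sort of legal state a Markov chain needs as a starting point; the difficulty addressed above is that, with missing data and many loops, such a state cannot in general be found by local checks alone.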