CE: Cross-Entropy Monte Carlo Methods for SNP Selection

This MATLAB toolbox performs SNP selection (tagging) from haplotypes with thousands of SNPs. The idea is to maximize criteria related to the generalized entropy ratio (ER) using the cross-entropy Monte Carlo (CEMC) method.

Requirements:
* A MATLAB environment.
* Octave is not supported.

Install:
Simply add the directory where you unpacked the *.m files to the MATLAB path. For example:

  % cd ~/work
  % tar xvfz CE.tar.gz
  % cd CE/
  % matlab
  >> addpath ~/work/CE

Download:
CEMC.tar.gz

Performance:
The MATLAB version requires several hours for over 1,000 SNPs and hundreds of distinct haplotypes. A C version should be much faster, since C can handle strings more efficiently. There is no C version available currently. I might write a C version in the near future.

Format:
Each row lists the alleles of one haplotype, followed by that haplotype's frequency.

  Haplotype    Frequency
  1 1 1 1 2    0.2
  ......

Functions:
[finalp, curbests, Zg, ps, gammas] = hapcemc2(haplist, shap, Fhap, N, rho, maxIters, epn, lamda, pr)

This is the main function for CEMC.

Input:
  haplist:  the haplotype list
  shap:     the string version of the haplotypes (strings are required for large data sets)
  Fhap:     the haplotype frequencies
  N:        the sample size
  rho:      the proportion used in importance sampling
  maxIters: the maximum number of iterations
  epn:      the stopping criterion
  lamda:    the weight
  pr:       the initial probability

Output:
  finalp:   the final probability that each SNP is chosen
  curbests: the best result for each iteration

Demos:
  %load hap51;
  % S = strread(X);
  % for lam = 0.3:0.1:0.7
  %   [fp, bst] = hapcemc2(X, S, F, 1000, 0.15, 100, 0.00001, lam);
  % end
  % c = find(fp > 0.5);
  % H=11;
  %vb = perform(X, F, c, H, 0)

Reference:
1. Liu, Z., and Lin, S. 2005. Multilocus LD measure and tagging SNP selection with generalized mutual information. Genet Epidemiol 29, 353-364.
2. Liu, Z., Lin, S., and Tan, M. 2006. Genome-wide tagging SNPs with entropy-based Monte Carlo methods. Journal of Computational Biology, accepted.
3. Weale, M.E., Depondt, C., Macdonald, S.J., Smith, A., Lai, P.S., Shorvon, S.D., Wood, N.W., and Goldstein, D.B. 2003. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet 73, 551-565. (The perform.m and performl.m functions are revised from Weale's TAGIT3.0 package.)
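
Illustrative sketch:
The cross-entropy idea described at the top of this README can be sketched on a toy data set as follows. This is not the code of hapcemc2. It is a minimal, generic cross-entropy loop written under several assumptions: the score is a plain Shannon entropy ratio minus a size penalty, standing in for the generalized ER criterion; the names penalty and alpha and the toy X and F are hypothetical; N, rho, maxIters and epn are only named after the hapcemc2 inputs and may be used differently there.

```matlab
% Toy input: rows are distinct haplotypes, columns are SNPs (alleles coded 1/2).
X = [1 1 1 1 2;
     1 2 1 2 2;
     2 1 2 1 1;
     2 2 2 2 1];
F = [0.4; 0.3; 0.2; 0.1];          % haplotype frequencies (sum to 1)

rng(0);                            % fixed seed so the run is repeatable
nSnp     = size(X, 2);
N        = 200;                    % candidate SNP subsets drawn per iteration
rho      = 0.15;                   % fraction of candidates kept as the elite set
maxIters = 50;                     % upper bound on iterations
epn      = 1e-5;                   % stop when p changes by less than this
penalty  = 0.3;                    % hypothetical cost on the fraction of SNPs selected
alpha    = 0.7;                    % hypothetical smoothing factor for the update
p        = 0.5 * ones(1, nSnp);    % initial inclusion probability of each SNP

Hfull     = -sum(F .* log(F));     % entropy of the full haplotype distribution
nElite    = ceil(rho * N);
bestScore = -Inf;
bestMask  = false(1, nSnp);

for it = 1:maxIters
    % Draw N candidate subsets; Z(n, k) is true if SNP k is in candidate n.
    Z = rand(N, nSnp) < repmat(p, N, 1);

    score = zeros(N, 1);
    for n = 1:N
        mask = Z(n, :);
        H = 0;                     % an empty subset carries no information
        if any(mask)
            % Merge haplotypes that look identical on the selected SNPs.
            [~, ~, g] = unique(X(:, mask), 'rows');
            q = accumarray(g, F);  % frequency of each merged group
            H = -sum(q .* log(q));
        end
        % Assumed criterion: entropy ratio minus a penalty on subset size.
        score(n) = H / Hfull - penalty * nnz(mask) / nSnp;
    end

    % Keep the best candidates and remember the best subset seen so far.
    [sorted, order] = sort(score, 'descend');
    if sorted(1) > bestScore
        bestScore = sorted(1);
        bestMask  = Z(order(1), :);
    end

    % Move p toward the inclusion frequencies observed in the elite set.
    pNew = alpha * mean(double(Z(order(1:nElite), :)), 1) + (1 - alpha) * p;
    converged = max(abs(pNew - p)) < epn;
    p = pNew;
    if converged
        break;
    end
end

selected = find(bestMask);         % indices of the SNPs in the best subset found
```

In this toy data SNPs 1, 3 and 5 carry the same information, as do SNPs 2 and 4, so any subset pairing one SNP from each group separates all four haplotypes. The best score under the assumed criterion is therefore 1 - 0.3 * 2/5 = 0.88, reached by a two-SNP subset. The vector p plays the role that finalp plays in the demo above, where SNPs with probability above 0.5 are taken as selected.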