This file contains instructions for running the R function TopKCEMC.

****************************************************
***********Download and Check the files*************
****************************************************

Download and unzip the files into your working directory. For example, if your working directory is /home/TopKCEMC/, after the download it should contain the following files:

TopKCEMC.R              (source R code)
TopK.c                  (source C code)
TopK.dll                (DLL file for Windows)
para.txt                (parameter file)
All_in_one_example.txt  (example of input data file)
inputlist1.txt          (example of input data file)
inputlist2.txt          (example of input data file)
inputlist3.txt          (example of input data file)
out.txt                 (example of output)

****************************************************
*******************Input Files**********************
****************************************************

There are two types of input files. Note that in the input files, anything after # denotes a comment.

1. Data files, which contain the lists. There are two formats for data files; you can choose either one, as detailed below. We recommend the second one.
2. A parameter file, which has to be prepared before running this program.

++++++++++++++++++++++++++++++++++++++++++++++++++++
1. Data File

This program supports two input data formats, (i) and (ii). In the file layouts below, text after # is a comment.

(i) All lists in one file. In this format, you need to provide the total number of lists and the total number of items in each list. An example of this input is in "All_in_one_example.txt".

n       ###### Total number of lists to be aggregated
k1      ###### Total number of elements in list1
G11     ###### k1 identifiers for the k1 elements in list1
G12
....
G1k1
k2      ###### Total number of elements in list2
G21     ###### k2 identifiers for the k2 elements in list2
G22
....
G2k2
....
....
kn      ###### Total number of elements in list_n
Gn1     ###### kn identifiers for the kn elements in list_n
Gn2
....
Gnkn

(ii) One file per list. In each file, the format is as follows. An example of this input is in "inputlist1.txt", "inputlist2.txt" and "inputlist3.txt".

G1      ###### k_i identifiers for the k_i elements in list_i
G2
...

+++++++++++++++++++++++++
2. Parameter File

The parameter file needs to be saved as "para.txt". See "para.txt" in your working directory as an example. For the running parameters, we suggest using the defaults set in this example file.

############################################################
InputStyle=a   # a=0 if the lists are in input format (i) (a single file)
               # a=1 if the lists are in input format (ii) (one file per list)
FileNames=b    # b="file1,file2,...,filen"; if InputStyle=0, you only need to specify one input file
OutputType=c   # c=0: screen output
               # c=1: output to a text file; the output file will be "tempout.txt"
               # c=2: both screen output and text file output
prob_out=d     # d=0: no output of the probability matrix
               # d=1: output the probability matrix
dm=e           # e='s': distance measure is Spearman's footrule
               # e="k": distance measure is Kendall's tau
k=f            # f: size of the ordered list to be output. k can't exceed the maximum length of the input lists.
# The following tuning parameter settings are our recommendation, but they can be changed by the user.
N=n            # n: number of random samples in each iteration
               #    (default value: 10 * total number of distinct items in the input lists)
N1=n1          # n1: number of random samples retained from the previous iteration each time
               #     (default value: 0.1 * N)
rho=r          # r: proportion of top random samples used to estimate the new probability matrix
               #    (default value: 0.1)
w=w            # w: weight of the estimated probability matrix.
               #    The new probability matrix after each iteration is:
               #    w * estimated matrix + (1-w) * matrix after the previous iteration
               #    (default value: 0.5)
d.w=dw         # dw: weights used to average the distances between the aggregated list and the input lists
               #     (default value: 1 for all input lists)

****************************************************
**********Run TopKCEMC******************************
****************************************************

1. On Linux or Mac OS X, compile the C code with the following command:

   R CMD SHLIB TopK.c

   Windows users can use the included TopK.dll directly.

2. Start R.

3. Change R's working directory to the directory that contains all the files of this program, the data files and the parameter file.
   If you are under Unix or Linux and you saved all these files in "/home/TopKCEMC/", you can type
   >setwd("/home/TopKCEMC/")
   If you are under Windows and you saved all these files in "C:\TopKCEMC", you can type
   >setwd("C:/TopKCEMC/")

4. Load TopKCEMC:
   >source("TopKCEMC.R")
   If you are under Unix or Linux, you can type
   >dyn.load("TopK.so")
   # Note: if TopK.so does not work on your machine (Linux or Unix), exit R and recompile with
   #   R CMD SHLIB TopK.c
   # (step 1), then start R again, repeat steps 2 and 3, and load the files again.
   If you are under Windows, you can type
   >dyn.load("TopK.dll")
   # Note: if TopK.dll does not work on your machine (Windows), you need to compile TopK.c into
   # TopK.dll again using a Windows C compiler, such as Visual C or Borland C.

5. Call the program with
   >ExcCEMC(para="para.txt")

We recommend that you run the program using the sample input data and parameter files and make sure your output matches the sample output "out.txt". See "para.txt" for an example of the parameter file.
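Before running the program, it can help to check that a format (i) file ("all lists in one file") parses the way you expect. The following is a minimal R sketch, assuming each count and identifier sits on its own line and that everything after # is a comment, as described in the Input Files section. The function read_all_in_one is hypothetical and is not part of TopKCEMC.R.

```r
# Hypothetical helper, not part of TopKCEMC.R: read a format (i) file
# ("all lists in one file") into a list of character vectors.
read_all_in_one <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- trimws(sub("#.*$", "", lines))  # drop everything after '#'
  tokens <- lines[nzchar(lines)]           # keep non-empty lines only
  n <- as.integer(tokens[1])               # total number of lists
  lists <- vector("list", n)
  pos <- 2                                 # index of the next k_i line
  for (i in seq_len(n)) {
    k <- as.integer(tokens[pos])           # number of elements in list i
    if (is.na(k) || pos + k > length(tokens)) {
      stop("malformed input: list ", i, " is incomplete")
    }
    lists[[i]] <- tokens[seq_len(k) + pos] # the k identifiers after k_i
    pos <- pos + k + 1                     # move past k_i and its items
  }
  lists
}
```

For example, lengths(read_all_in_one("All_in_one_example.txt")) should return the list sizes k1, ..., kn declared in that file.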
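If your lists are already loaded in R, you can write them out in format (ii), the recommended one-file-per-list format, directly. This is a minimal sketch; write_one_file_per_list and file_names_value are hypothetical helpers, and the default file-name prefix "inputlist" is only an assumption modeled on the example files.

```r
# Hypothetical helper, not part of TopKCEMC.R: write each list to its own
# file in format (ii) (one identifier per line) and return the file paths.
write_one_file_per_list <- function(lists, dir = ".", prefix = "inputlist") {
  files <- file.path(dir, paste0(prefix, seq_along(lists), ".txt"))
  for (i in seq_along(lists)) {
    writeLines(as.character(lists[[i]]), files[i])
  }
  files
}

# Hypothetical helper: build the comma-separated list of file names used in
# the FileNames entry of para.txt (quoting is not handled here).
file_names_value <- function(files) {
  paste(basename(files), collapse = ",")
}
```

The string returned by file_names_value can be used as the FileNames value; compare it with the shipped "para.txt" to see whether the program expects the value in quotes.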
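To get a feel for the dm='s' option, the sketch below computes Spearman's footrule between two top-k lists and a weighted sum of distances with one weight per input list, mirroring the role of d.w. It assumes that an item missing from a list is given rank length(list)+1, which is one common convention for partial lists; the convention used inside TopK.c may differ, so treat the numbers as illustrative only. Both function names are hypothetical.

```r
# Hypothetical sketch, not taken from TopK.c.
# Spearman's footrule between two top-k lists (character vectors ordered
# from rank 1 downward). ASSUMPTION: an item absent from a list is given
# rank length(list) + 1.
footrule_topk <- function(a, b) {
  items <- union(a, b)
  rank_in <- function(lst) {
    r <- match(items, lst)          # NA where the item is absent
    r[is.na(r)] <- length(lst) + 1  # assumed rank for missing items
    r
  }
  as.numeric(sum(abs(rank_in(a) - rank_in(b))))
}

# Weighted sum of footrule distances from a candidate aggregate list to
# each input list, with one weight per input list (default weight 1).
weighted_distance <- function(candidate, lists, weights = rep(1, length(lists))) {
  stopifnot(length(weights) == length(lists))
  d <- vapply(lists, function(l) footrule_topk(candidate, l), numeric(1))
  sum(weights * d)
}
```

Dividing the weighted sum by sum(weights) gives the weighted average; because the weights are fixed, this does not change which candidate list has the smallest distance.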
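The default tuning values quoted in the parameter-file comments, and the update rule controlled by w, can be written out as follows. This is an illustrative R sketch, not the code in TopKCEMC.R; the function names and the rounding of N1 to a whole number are assumptions.

```r
# Illustrative sketch (hypothetical names, not taken from TopKCEMC.R).

# Default tuning values quoted in the para.txt comments.
default_tuning <- function(lists) {
  n_items <- length(unique(unlist(lists)))  # distinct items across all lists
  N <- 10 * n_items                         # random samples per iteration
  list(N = N,
       N1 = round(0.1 * N),                 # assumption: rounded to a whole number
       rho = 0.1,                           # proportion of top samples kept
       w = 0.5)                             # weight of the estimated matrix
}

# Update: w * estimated matrix + (1 - w) * matrix from the previous iteration.
update_prob_matrix <- function(estimated, previous, w = 0.5) {
  stopifnot(identical(dim(estimated), dim(previous)), w >= 0, w <= 1)
  w * estimated + (1 - w) * previous
}
```

Because the update is a convex combination, if the rows (or columns) of both matrices sum to 1, the rows (or columns) of the result sum to 1 as well.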