This file contains instructions for running an R function (in an R
package), miRComp, which combines the prediction results of three miR
target prediction algorithms: miRanda, TargetScan and PicTar.


****************************************************
***********Download and Run miRComp******************
****************************************************

Download and unzip the files into your working directory, for example
C:\miRComp. This directory will contain miRComp.R (R source code),
mR-miR-375.txt, TS-miR-375.txt, PT-miR-375.txt (sample data files)
and EnsemblHugoRefSeq-Sep08.txt (default input dataset). Then start
R and, at the R prompt (">"), type the following (without the ">"):

>setwd("C:/miRComp") 
>source("miRComp.R")

This causes R to read and parse the source file. The function miRComp
can be called with its input files in several ways. For example,
you can type

>miRComp("mR-miR-375.txt","TS-miR-375.txt","PT-miR-375.txt")

to combine three input files

or you can do a two-file combination

>miRComp("mR-miR-375.txt","TS-miR-375.txt") 
>miRComp("mR-miR-375.txt", ,"PT-miR-375.txt")

or you can name the arguments, in which case their order does not matter

>miRComp(TargetScanfile="TS-miR-375.txt",PicTarfile="PT-miR-375.txt",miRandafile="mR-miR-375.txt")

An error is reported if only one input file is supplied

>miRComp("mR-miR-375.txt") 
Error in miRComp("mR-miR-375.txt") : There must be at least two input files

The above commands call the function miRComp to combine the input
files and write the results to the same directory.
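
The calling conventions above (skipped positional arguments, named
arguments in any order, and the "at least two input files" check) can
be illustrated with a small R sketch. The function countInputs below is
a hypothetical stand-in written for this README; it is not taken from
miRComp.R and only assumes that the argument names are those shown in
the named-argument example.

```r
# Hypothetical stand-in for the argument check; NOT the miRComp source.
countInputs <- function(miRandafile, TargetScanfile, PicTarfile) {
  # missing() is TRUE for an argument that was left out of the call
  supplied <- c(miRanda    = !missing(miRandafile),
                TargetScan = !missing(TargetScanfile),
                PicTar     = !missing(PicTarfile))
  if (sum(supplied) < 2) {
    stop("There must be at least two input files")
  }
  # Return the names of the algorithms whose files were supplied
  names(supplied)[supplied]
}

# Skipping the second positional argument, as in the two-file example
countInputs("mR-miR-375.txt", , "PT-miR-375.txt")
# Named arguments may be given in any order
countInputs(PicTarfile = "PT-miR-375.txt", miRandafile = "mR-miR-375.txt")
```

Calling countInputs with a single file name stops with the error
message quoted above.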


****************************************************
*******************Input Files**********************
****************************************************


The miRComp function takes the output from three miR target prediction
algorithms as its input. In the above examples, mR-miR-375.txt is
the output of miRanda, TS-miR-375.txt is the output of TargetScan and
PT-miR-375.txt is the output of PicTar. EnsemblHugoRefSeq-Sep08.txt
was downloaded from Ensembl BioMart (www.ensembl.org/Multi/martview)
and is used as a default input. It provides a mapping (or conversion)
between different gene identifiers (Ensembl gene ids, Ensembl transcript
ids, HUGO ids and RefSeq ids). However, not every gene identifier in
this dataset has a corresponding entry under the other identifier
types. In such cases, information from the target prediction files is
used to fill the gaps.

Each of these three miR target prediction
algorithms has a user-friendly web interface (miRanda:
http://cbio.mskcc.org/cgi-bin/mirnaviewer/mirnaviewer.pl,
TargetScan: http://genes.mit.edu/targetscan.test/ucsc.html, PicTar:
http://pictar.bio.nyu.edu/cgi-bin/PicTar_vertebrate.cgi).  To obtain
target prediction output, you need to enter the miR name (miRanda) or
select it from the drop-down menu (TargetScan and PicTar). The output
is given in Excel format (miRanda) or HTML format (TargetScan
and PicTar). It is recommended that you save the Excel file as a
tab-delimited text file (miRanda), or copy and paste the HTML content
into Excel and save it as a tab-delimited text file (TargetScan and
PicTar). No column names are needed. For TargetScan output, the last
column contains long text, so it is recommended that you delete this
column to avoid possible reading errors.
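
As an illustration of the file format described above, the following R
sketch reads a tab-delimited file without column names and drops its
last column. It is only a sketch: the two rows are made up, the
temporary file stands in for a real TargetScan export, and miRComp
itself may read its inputs differently.

```r
# Write a tiny stand-in file so the example is self-contained (made-up rows).
tmp <- tempfile(fileext = ".txt")
writeLines(c("GENE1\t0.12\tlong free-text annotation",
             "GENE2\t0.30\tanother long annotation"), tmp)

# header = FALSE because the files carry no column names.
# quote = "" treats quote characters as ordinary text;
# fill = TRUE (the read.delim default) pads rows that have fewer fields.
tab <- read.delim(tmp, header = FALSE, stringsAsFactors = FALSE,
                  quote = "", fill = TRUE)

# Drop the last (long text) column, as recommended for TargetScan output.
tab <- tab[, -ncol(tab), drop = FALSE]
```

Removing the column in R is an alternative to deleting it in Excel
before saving; either way the file passed on contains only the short
columns.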



****************************************************
**********************Output Files*******************
*****************************************************


------------------------------ 
result-statistics
------------------------------

result-statistics records summary statistics of the
combination results.  A typical result-statistics file is as follows
-----------------------------------------------------------------------------------
34 lines read from file 'mR-miR-375.txt' ( 34 Ensembl Transcript ids )

161 lines read from file 'TS-miR-375.txt' ( 161 HUGO ids )

miRanda predicted 23 genes ( HUGO ids, including 4 genes without HUGO ids )

TargetScan predicted 161 genes ( HUGO ids )

miRanda and TargetScan predicted 3 common genes ( HUGO ids )

miRanda and TargetScan predicted a combination of 181 genes ( HUGO ids )
-------------------------------------------------------------------------------------
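
The counts in the sample above are related by simple set arithmetic:
23 + 161 - 3 = 181 (genes from each algorithm, minus the common genes
counted twice). A minimal R sketch of the same bookkeeping, using
made-up identifiers rather than real predictions:

```r
# Made-up HUGO-style identifiers; not real miR-375 targets.
mR <- c("A", "B", "C", "D")   # genes predicted by the first algorithm
TS <- c("C", "D", "E")        # genes predicted by the second algorithm

common   <- intersect(mR, TS) # genes predicted by both
combined <- union(mR, TS)     # genes predicted by at least one

# The combined count equals the two individual counts minus the overlap.
length(combined) == length(mR) + length(TS) - length(common)
```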


------------------------------ 
combinedtargets
------------------------------

combinedtargets is the main output file. The column names and their
meanings are

1. microRNA: the name of microRNA of interest 
2. HUGO: HUGO ids used as matching identifiers for different gene names. 
   As the matching identifier, it is unique, but it could be blank since 
   some Ensembl ids don't correspond to any HUGO ids.  
3. miRanda: Ensembl gene ids (ENSG) denoting targets predicted by miRanda 
   algorithm. NA means not predicted by miRanda.  
4. ENST.id: Ensembl transcript ids (ENST) denoting targets predicted by 
   miRanda algorithm. Several Ensembl transcript ids may correspond to 
   one Ensembl gene id. NA means not predicted by miRanda.  
5. rank.mr: ranks of target genes predicted by miRanda. A higher rank 
   (for example, 1) means the gene is more likely to be a target of the 
   microRNA of interest. An unpredicted gene is assigned the lowest 
   rank of all predicted genes plus 1.  
6. pvalue.mr: a "pvalue" computed from rank.mr. The pvalue of an 
   unpredicted gene is assigned to be 1.
7. TargetScan: HUGO ids denoting targets predicted by TargetScan
   algorithm. NA means not predicted by TargetScan.  
8. rank.ts: ranks of target genes predicted by TargetScan.  
9. pvalue.ts: "Estimated false discovery rate" according to TargetScan.  
10. PicTar: RefSeq ids denoting targets predicted by PicTar algorithm. 
   NA means not predicted by PicTar.  
11. rank.pt: ranks of target genes predicted by PicTar.  
12. pvalue.pt: a "pvalue" computed from rank.pt.
13. mR: Is this gene predicted by miRanda? (1: Yes; 0: No) 
14. TS: Is this gene predicted by TargetScan? (1: Yes; 0: No) 
15. PT: Is this gene predicted by PicTar? (1: Yes; 0: No) 
16. mRmTS: Is this gene predicted by both miRanda and TargetScan? 
    (1: Yes; 0: No) 
17. mRmPT: Is this gene predicted by both miRanda and PicTar? (1: Yes; 0: No)
18. TSmPT: Is this gene predicted by both TargetScan and PicTar? 
    (1: Yes; 0: No) 
19. match: how many algorithms predict this target? (1, 2, or 3)  
20. avepvalue: This gene's "average" pvalue, combining the pvalues from 
    the three algorithms.  
21. averank: This gene's "average" rank, averaged over the ranks from 
    the three algorithms.
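
To make the rank, indicator and match columns concrete, here is a
minimal R sketch on a made-up table of four genes. It is an
illustration under stated assumptions, not the miRComp source: the
gene names and ranks are invented, and averank is computed as a plain
arithmetic mean, which may differ from the "average" miRComp uses. The
pvalue columns are left out because their formula is not given here.

```r
# Made-up ranks; NA means the algorithm did not predict the gene.
targets <- data.frame(
  HUGO    = c("G1", "G2", "G3", "G4"),
  rank.mr = c(1, 2, NA, NA),
  rank.ts = c(NA, 1, 2, 3),
  rank.pt = c(2, 1, NA, NA),
  stringsAsFactors = FALSE
)
rank_cols <- c("rank.mr", "rank.ts", "rank.pt")

# Indicator columns (1: predicted; 0: not predicted) and their sum.
targets$mR    <- as.integer(!is.na(targets$rank.mr))
targets$TS    <- as.integer(!is.na(targets$rank.ts))
targets$PT    <- as.integer(!is.na(targets$rank.pt))
targets$match <- targets$mR + targets$TS + targets$PT

# An unpredicted gene gets the lowest (numerically largest) rank plus 1.
for (col in rank_cols) {
  worst <- max(targets[[col]], na.rm = TRUE)
  targets[[col]][is.na(targets[[col]])] <- worst + 1
}

# ASSUMPTION: "average" rank taken as the arithmetic mean of the three ranks.
targets$averank <- unname(rowMeans(as.matrix(targets[, rank_cols])))
```

In this toy table G2 is predicted by all three algorithms (match = 3)
and ends up with the best average rank.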

------------------------------- 
plot.png
-------------------------------

plot.png contains three plots: a ratio plot, a common targets plot and
a correlation plot. These plots provide a measure of the consistency
of different algorithms. If we take the top n targets of each of two
algorithms according to their respective ranks, how many are common
targets? What is the ratio of the number of common targets to n? What
is the correlation of the ranks? Checking how these numbers vary with
n could give us a way to "shortlist" the targets.

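The three questions above can be sketched in R for two made-up ranked
lists. This is an assumption-laden illustration, not the code that
produces plot.png: the function topN_overlap is hypothetical, and the
choice of a Spearman correlation over the common targets is one
plausible reading of "correlation of ranks".

```r
# Made-up ranked lists (best target first); not real predictions.
listA <- c("G1", "G2", "G3", "G4", "G5", "G6")
listB <- c("G2", "G1", "G7", "G3", "G8", "G4")

# Hypothetical helper: compare the top n targets of two ranked lists.
topN_overlap <- function(a, b, n) {
  common <- intersect(head(a, n), head(b, n))
  # ASSUMPTION: Spearman correlation of the two rank positions over the
  # common targets; reported as NA when fewer than three are shared.
  rho <- if (length(common) >= 3) {
    cor(match(common, a), match(common, b), method = "spearman")
  } else {
    NA_real_
  }
  data.frame(n = n, common = length(common),
             ratio = length(common) / n, rho = rho)
}

# One row per cutoff n, ready to be plotted against n.
consistency <- do.call(rbind,
                       lapply(2:6, function(n) topN_overlap(listA, listB, n)))
```

Plotting the common, ratio and rho columns against n gives toy
versions of the three panels described above.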

------------------------------------------------------------------
mRHugo TSHugo PTHugo mRmTSHugo mRmPTHugo TSmPTHugo AllthreeHugo
------------------------------------------------------------------
These files contain the HUGO ids predicted by each algorithm (or
combination of algorithms) and can be used as input files for further
analysis.


****************************************************
*********************References**********************
*****************************************************
Jin Zhou, Shili Lin, Vince Melfi, Joe Verducci (2006). 
Composite MicroRNA Target Predictions and Comparisons of Several 
Prediction Algorithms. 
MBI Technical Report No. 
(http://mbi.osu.edu/publications/pub2006.html)






