Preparing your Script for CRdata
- Remove all code relating to control of graphics devices (e.g. dev.new, dev.off, windows, X11).
- If your code refers to other R scripts (e.g. functions), include the relevant code within the submitted script.
- Do not attempt to install packages within your code. CRdata automatically loads and updates all R and Bioconductor packages. However, some package installations or updates may fail due to dependencies etc. If you find that a package you wish to use is not already installed on CRdata, you can install the package by following Adding new R packages to CRdata.
- Ensure that all functions called within your script are either part of an installed package, or included within the script.
- If you intend to make your script available to Guest Users, make sure your script does not attempt to save any internal variables.
- Files in the Guest User directory may be deleted without notice.
- Registered users can save internal variables (R objects) for later use (e.g. in a data processing pipeline).
- CRdata's free processor queue currently only runs Unix processing nodes. Windows-specific R commands (e.g. to read Excel worksheets) will not work on CRdata.
- You can purchase EC2 Windows Instances, configure them as CRdata R worker nodes, and attach them to your private CRdata queues.
- If there is enough demand (and funding!) we will be happy to add Windows nodes to our free service.
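Because a script should not install packages itself, one option is to have it check its dependencies up front and stop with a readable message if something is missing. The following is a minimal sketch using base R only; the package names "stats" and "utils" are placeholders for whatever your script actually needs, and nothing here is a CRdata-specific feature.

```r
# Packages this script depends on ("stats" and "utils" are placeholders).
required_pkgs <- c("stats", "utils")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed.
is_available <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_available]

# Stop early with a clear message instead of failing halfway through the job.
if (length(missing_pkgs) > 0) {
  stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If the check fails on CRdata, the missing package is a candidate for the installation procedure described under Adding new R packages to CRdata.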
Inputting data in CRdata
Input to a CRdata script can come in two ways:
- Upload data files to be used by the script
- Specify parameters whose values can be changed at run time
The script developer must specify the input variables. Do not assign hard-coded values to the variables that are expected to be changed dynamically by the user of the script. The dynamic values of variables in the script can be one of the following data types:
- String
- Dataset (a file)
- Integer
- Float (floating point or real value)
- Enumeration (user of the script picks one of the integer values in a range)
- Boolean (TRUE or FALSE)
- List (user of the script picks one of the values in a comma-separated list of possible strings, integers or floating-point values)
The names of variables in the script MUST match corresponding names you provide in the CRdata graphical interface when uploading the script.
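One way to run such a script on your desktop without hard-coding values in the script itself is to keep a small local driver that sets the input variables and then sources the script. The sketch below is only an illustration: the temporary file stands in for your uploaded script, and the driver is a hypothetical local helper that is never uploaded to CRdata.

```r
# Stand-in for the script you would upload to CRdata (illustration only).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "#<crdata_text>Random draws</crdata_text>",
  "z <- rnorm(aNumber)",
  "#<crdata_object>z</crdata_object>"
), script_path)

# Local driver: supply the value that the CRdata user would enter in the form.
aNumber <- 5L

# Evaluate the script in the current environment so it can see aNumber.
source(script_path, local = TRUE)

print(length(z))
```

The uploaded script stays free of hard-coded inputs, and the driver plays the role of the CRdata input form.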
Outputting results in CRdata
Standard R scripts that one would run on a desktop computer must be adapted by embedding a series of CRdata tags in the R script. These tags direct CRdata to produce web-based HTML output results. This page provides guidance about the available CRdata tags, and how to instrument R scripts with the tags to display the desired output. Internally CRdata tags are used to generate HTML output using the R2HTML package.
The CRdata tag library
All CRdata script output commands must be enclosed within special tag pairs. Each tag must start on a new line. The goal of the CRdata tagging system is to be non-intrusive and to let you run the same R script both in your desktop environment and in CRdata. To achieve this, all CRdata tags are prefixed with # (the comment character in R), so they are ignored when you run the script on your desktop. As a general rule, CRdata tags are only used to specify and format HTML output for your R scripts.
crdata_title - can be used in your script to display a header in your HTML output. The syntax is:
#<crdata_title>your_text</crdata_title>
crdata_text - can be used to display some arbitrary text in your HTML output. The syntax is:
#<crdata_text>some_text</crdata_text>
crdata_object - can be used to output the values of R variables such as tables (or anything else whose value you would like to list in the HTML report). The elements of variables output in this way can be integer, string, floating-point/real, boolean, etc. If 'some_object' is a variable in your code, the syntax to output its value is:
#<crdata_object>some_object</crdata_object>
crdata_section - can be used to draw a horizontal separator line in the HTML output. This is similar in function to <hr> in standard HTML. The syntax is:
#<crdata_section/>
crdata_empty_line - can be used to put empty space or a line break in the HTML output. This is similar in function to <br> in standard HTML. The syntax is:
#<crdata_empty_line/>
crdata_image - can be used to enclose any drawing or graphical outputs. You must precede each new graphical output with the #<crdata_image> tag and end it with #</crdata_image>. In addition, you need to put each graphics command inside a print() statement. For example, the commands "plot(x)" followed by "lines(x)" become:
#<crdata_image>
print(plot(x))
print(lines(x))
#</crdata_image>
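Because the tags are ordinary comments, R will not warn you about a misspelled tag name when you test locally. The following sketch is a hypothetical local helper, not part of CRdata: it scans a script's comment lines for crdata_ tag names and reports any that are not among the six tags described above. The script_lines vector is made-up sample input containing a deliberate typo.

```r
# The six tag names described in this section.
known_tags <- c("crdata_title", "crdata_text", "crdata_object",
                "crdata_section", "crdata_empty_line", "crdata_image")

# Sample script content; in practice you might use readLines("your_script.R").
script_lines <- c(
  "#<crdata_title>Report</crdata_title>",
  "x <- c(1, 4, 9)",
  "#<crdata_image>",
  "print(plot(x))",
  "#</crdata_imaget>"   # deliberate typo for the demonstration
)

# Keep only comment lines that start a tag, then extract every tag name.
tag_lines <- grep("^#\\s*<", script_lines, value = TRUE)
matches <- regmatches(tag_lines, gregexpr("</?crdata_[a-z_]+", tag_lines))
found <- unique(sub("^</?", "", unlist(matches)))

# Anything not in the known list is probably a typo.
unknown <- setdiff(found, known_tags)
print(unknown)
```

This only checks tag names; it does not verify that opening and closing tags are paired.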
Saving R script output files for pipeline processing
CRdata allows scripts to save output files so that output from one script can be input to another script as a dataset. This involves two main steps:
- (A) A script can write intermediate results files to CRdata
- (B) Another script can read the output generated by step (A) as an input dataset file
For instance, let's say you have a script like the following:
TFs <- read.table(coTFs, sep="\t", header=TRUE)
#<crdata_object>TFs</crdata_object>
write.table(TFs,file="List_Data.txt")
When run as a job, the above script outputs the file List_Data.txt. Importantly, note how the object "TFs" is declared using CRdata tags before the write.table() command. CRdata automatically uploads all the files output by your script to your private data folder. Before uploading, the files are renamed with a prefix that identifies the job that output the data. For instance, if the job ID was 587, then List_Data.txt is renamed as job-0000000587-List-Data.txt.
Note the filename is prefixed with "job-
- jpg, png, gif, html, htm, js, css, pdf, log, r, rb, java, php, py, pyc
After running the above script, if you go to your Private Files in the Data tab, you see the output file listed as above. Now the file can be loaded into another script as a 'normal' dataset input file (the user will choose this file at run time, so the name change shouldn't matter).
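Step (B) can be sketched as follows. This is a minimal illustration meant for a local run: inputData is a hypothetical variable name that you would declare as a Dataset input when uploading the second script, and the first few lines only create a stand-in for the file written in step (A).

```r
# Stand-in for step (A): write a small table the way the first script does.
step_a_file <- file.path(tempdir(), "List_Data.txt")
TFs <- data.frame(gene = c("g1", "g2"), score = c(0.5, 1.2))
write.table(TFs, file = step_a_file)

# Step (B): on CRdata the user would pick the renamed job output for
# inputData (hypothetical Dataset parameter); locally we point at the file.
inputData <- step_a_file

# write.table() used its defaults (space-separated, with row names), so
# read.table() with header = TRUE recovers the same two columns.
TFs2 <- read.table(inputData, header = TRUE)
#<crdata_object>TFs2</crdata_object>
print(TFs2)
```

Since the second script only ever sees the value of inputData, the job prefix added to the file name does not affect it.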
Adding new R packages to CRdata
CRdata automatically updates its R and Bioconductor libraries regularly. The only packages not installed and updated are those that have complex dependencies on non-R libraries and cannot be auto installed (or packages that perform interactive graphics and other functions not compatible with our web-service). So you should find all packages compatible with CRdata already pre-installed and up-to-date. In rare cases, the author of a new package not yet publicly available may wish to use the package with CRdata.
CRdata enables users to upload new R packages. You have to create an R script and, at the time of submitting the script to CRdata, tag it with the reserved tag 'INSTALL SCRIPT'. Once you submit a job with that script, you'll notice that the job status is set to PENDING. The site admin will receive an email notification about this job, test it, and either approve or disapprove it. If approved, the job becomes DONE (with a successful flag); if not, it becomes FAILED. In either case, the script owner will receive an email notification about it. See below for an illustration of a sample installation script that assumes CRdata doesn't have the R package "packageName" pre-installed, and attempts to install the library and test it.
# Not Run
install.packages("packageName") # replace "packageName" with the name of the package being installed
library(packageName)
# call the package and test that it installed correctly
aPackageCommand() # sample command from the package to check it executes correctly
Illustrated examples for CRdata instrumented code
Note that all of the following scripts are runnable code, provided to illustrate how to use CRdata and to give guidance through working examples. A developer can copy the code given in each example and try running it in the CRdata environment. These scripts can also be run in your local R environment without any side effects, as long as there is a mechanism in place to supply any required input data. Hopefully these examples will provide a better understanding of how to embed such tags in your own scripts for CRdata.
Example: Output title and some arbitrary text in HTML output, no input parameters
#<crdata_title>CRdata Output Title 1</crdata_title>
#<crdata_text>Arbitrary Text 1</crdata_text>
#<crdata_title>CRdata Output Title 2</crdata_title>
#<crdata_text>Arbitrary Text 2</crdata_text>
Example: Output HTML title, some arbitrary text with some formatting such as empty space and horizontal line separator, no input parameters
#<crdata_title>Title 01</crdata_title>
#<crdata_text>Arbitrary Text 01</crdata_text>
#<crdata_title>Title 02</crdata_title>
#<crdata_text>An empty line should follow</crdata_text>
#<crdata_empty_line/>
#<crdata_text>Arbitrary Text 03</crdata_text>
#<crdata_section/>
#<crdata_text>Arbitrary Text 04</crdata_text>
Example: Output HTML title, arbitrary text with formatting, and object values such as arrays using crdata_object tag. Note that when run on your local desktop all lines prefixed by # are ignored and treated as comments by your R interpreter. In the example below values of d00 and d01 (which are arrays) are output in HTML format. This script requires no input parameters.
#<crdata_title>CRdata Output Example 1</crdata_title>
#<crdata_text>Numbers 0 thru 5</crdata_text>
d00 = c(0,1,2,3,4,5)
#<crdata_object>d00</crdata_object>
#<crdata_title>CRdata Output example 2</crdata_title>
#<crdata_text>An empty line should follow</crdata_text>
#<crdata_empty_line/>
#<crdata_text>Numbers 6 thru 10</crdata_text>
d01 = c(6,7,8,9,10)
#<crdata_object>d01</crdata_object>
#<crdata_section/>
#<crdata_text>Arbitrary Text 04</crdata_text>
Example: Outputs some graphical plots and a summary of data frames and matrices. Note that when run on your local desktop all code prefixed by # is ignored and treated as comments by your R interpreter. The example below, when run on a local desktop, would show the graphical output in a console, whereas when run in the CRdata environment it would generate images as files that are then placed into an HTML/web page. This is a self-contained example that doesn't require any input parameters.
#<crdata_title>This is a CRdata run report</crdata_title>
tmp <- as.data.frame(matrix(rnorm(100),ncol=10))
#<crdata_object>tmp</crdata_object>
#<crdata_section/>
#<crdata_image caption="First Graph">
print(plot(tmp))
#</crdata_image>
#<crdata_section/>
#<crdata_image caption="Second Graph">print(plot(tmp))
#</crdata_image>
#<crdata_section/>
#<crdata_image caption="Third Graph">print(plot(tmp))</crdata_image>
#next: same plot without a title
#<crdata_image>print(plot(tmp))</crdata_image>
Example: Another graphical plot output example. This is a self contained example that doesn't require any input parameters.
#<crdata_text>Print of an R vector</crdata_text>
x = c(1,4,9,15,27)
#<crdata_object>x</crdata_object>
#<crdata_text>Plot of an R vector</crdata_text>
#<crdata_image>
print(plot(x))
print(lines(x))
#</crdata_image>
Example: Integer input parameter. This example assumes that the developer specifies the variable aNumber as INTEGER/NUMBER in the input form when uploading the script, and that the user of the script supplies a value for aNumber when running the script in CRdata. To run the script locally, you must supply a value for aNumber yourself; otherwise the R interpreter will throw an error:
#<crdata_text>Arbitrary Text 00</crdata_text>
z <- rnorm(aNumber)
#<crdata_object>z</crdata_object>
#<crdata_text>Arbitrary Text 01</crdata_text>
Example: String input parameter. This example assumes that the developer specifies the variable aLabel as an input when uploading the script, and that the user of the script supplies a value for aLabel when running the script in CRdata. To run the script locally, you must supply a value for aLabel yourself; otherwise the R interpreter will throw an error:
#<crdata_text>Arbitrary Text 00</crdata_text>
x = c(0,1,2,3,4,5)
#<crdata_object>x</crdata_object>
#<crdata_image>
plot(x, ylab=aLabel)
#</crdata_image>
#<crdata_text>Arbitrary Text 01</crdata_text>
Example: Multiple string input parameters. This example assumes that the developer specifies the variables aLabel and anAnimal in the input form when uploading the script, and that the user of the script supplies values for both when running the script in CRdata. To run the script locally, you must supply values for aLabel and anAnimal yourself; otherwise the R interpreter will throw an error:
#<crdata_text>Arbitrary Text 00</crdata_text>
x = c(0,1,2,3,4,5)
#<crdata_object>x</crdata_object>
z1 = c("The",anAnimal,"runs.")
#<crdata_object>z1</crdata_object>
#<crdata_text>Arbitrary Text 10</crdata_text>
#<crdata_image>
plot(x, ylab=aLabel)
#</crdata_image>
#<crdata_text>Arbitrary Text 01</crdata_text>
Example: Switch input parameter. This example assumes that variable inputName is specified as a LIST type with the comma separated values: "Hamid", "Michael", and "Rajiv". The user of the script will have to choose one of the possible values for inputName parameter when executing the script. In order to run the script locally, you must somehow specify the desired value of inputName parameter.
x = switch(inputName,
Hamid = "Hello Hamid",
Michael = "Hello Michael",
Rajiv = "Hello Rajiv"
)
#<crdata_object>x</crdata_object>
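On CRdata the LIST type restricts inputName to the listed values, but a local run has no such check, and switch() quietly returns NULL when nothing matches. A minimal sketch, assuming you set inputName yourself for a desktop run, adds an unnamed final argument that acts as a default and stops with a clear message:

```r
# Set by hand for a local run; on CRdata the user picks this from the list.
inputName <- "Rajiv"

x <- switch(inputName,
  Hamid   = "Hello Hamid",
  Michael = "Hello Michael",
  Rajiv   = "Hello Rajiv",
  # Unnamed last argument: evaluated only when no name above matches.
  stop("Unexpected value for inputName: ", inputName)
)
#<crdata_object>x</crdata_object>
print(x)
```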
Example: Floating point input parameter. This example assumes that the variable realInput is specified as a real or floating point type by the developer when uploading the script. It also assumes that the user of the script specifies the runtime value for realInput. In order to run the script locally, you must somehow specify the desired value of realInput parameter.
FloatOut <- realInput*1
#<crdata_object>FloatOut</crdata_object>
Example: Reading input from a remote URL rather than a data file. Adapted example from http://www.mayin.org/ajayshah/KB/R/html/g1.html
datafilename="http://personality-project.org/R/datasets/psychometrics.prob2.txt"
dataset =read.table(datafilename,header=TRUE) #read the data file
names(dataset) #what are the variables?
dataset=dataset[,-1] #get rid of the ID
n = names(dataset) #check the names again
#<crdata_object>n</crdata_object>
r = round(cor(dataset),2) #find the correlation matrix
#<crdata_object>r</crdata_object>
dataset=scale(dataset) #convert to standardized scores
#<crdata_object>dataset</crdata_object>
dataset=data.frame(dataset)
#<crdata_object>dataset</crdata_object>
Example: Error handling in CRdata. In this example, the variable zzz isn't initialized, so printing it even in your local R environment would produce errors. Likewise in CRdata, except the error message is output to the HTML log file.
#<crdata_title>This is brand new CRDATA run report</crdata_title>
tmp <- as.data.frame(matrix(rnorm(100),ncol=10))
summary(tmp)
#<crdata_object>tmp</crdata_object>
#<crdata_section/>
#<crdata_image caption="First Graph">
plot(tmp)
#</crdata_image>
#<crdata_image caption="Second Graph">plot(tmp)
#</crdata_image>
print(zzz)
#<crdata_image caption="Third Graph">plot(tmp)</crdata_image>
#<crdata_image>plot(tmp)</crdata_image>
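If you would rather turn a failing statement into a value that can be reported, base R's tryCatch() can do that. The following is a minimal sketch, not a CRdata feature; how CRdata renders the resulting string in the HTML report is an assumption.

```r
# zzz is never defined, so print(zzz) raises an error; tryCatch() turns
# that error into a character string instead of letting it propagate.
result <- tryCatch(
  print(zzz),
  error = function(e) paste("Caught error:", conditionMessage(e))
)
#<crdata_object>result</crdata_object>
print(result)
```

Catching errors this way hides them from the log, so it is best reserved for failures you expect and know how to report.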