About
DRETools is a command-line software package for studying differential RNA editing.
DRETools allows users to:
- Calculate units to reduce sample-bias (similar to FPKM for RNA expression).
- Find differential editing among editing sites and editing islands.
- Find editing islands (i.e., clusters of editing sites).
- Merge editing sites from multiple samples.
- Calculate useful editing statistics sample-wise, gene-wise, and site-wise.
Install
DRETools has two main dependencies: Python 3.5+ and R 3.4+. Once these are satisfied, DRETools can be installed from the source code available here or, on Unix-based systems, with the following command:
pip3 install dretools
For more information on using pip, see the PyPI website or the DRETools PyPI page.
Note: There currently seems to be an issue when installing pysam on macOS. If you encounter an error while installing it, try installing pysam 0.13 with the following command.
pip3 install pysam==0.13
Examples
This section contains examples of DRETools usage. Full documentation can be found on the DRETools wiki. The example files used here are included in the site repository and can be tested locally by cloning this repository. Note that, due to size constraints, the example files cover only a very small region; they are nevertheless handled in the same way as full RNA-seq samples.
Cloning the repository
To clone the repo, open a command-line terminal and run the following commands.
git init
git clone email@example.com:dretools/dretools.bitbucket.io.git
This will download a local copy of the repository. Once the download has finished and DRETools is installed, run the example commands below from the repository folder.
Viewing help menus
An overview of the operations offered by DRETools can be found by typing dretools, dretools --help, or dretools -h. Similarly, a help menu is available for each operation by typing dretools operation --help or dretools operation -h.
dretools --help
usage: dretools [-h] operation

Operations Available:

Units
    sample-epk - Calculate the editing-per-kilobase (EPK) for a sample.
    region-epk - Calculate the editing-per-kilobase (EPK) for transcriptomic regions.
    edsite-epk - Calculate the editing-per-kilobase (EPK) for editing sites.

Differential Editing
    region-diff - Test for differentially edited transcriptomic regions.
    edsite-diff - Test for differentially edited editing sites.

Stats
    sample-stats - Calculate sample-wise information about the editing in a set of samples.
    region-stats - Calculate region-wise information about the editing within a sample.
    edsite-stats - Calculate site-wise information about the editing within a sample.

Merge
    find-islands - Find editing islands within one or more files containing editing sites.
    edsite-merge - Merge two or more files containing editing sites.

optional arguments:
    -h, --help show this help message and exit
Merge editing sites from multiple samples
Consensus sets of editing sites are important for calculating units that reduce sample bias and for finding differentially edited sites. DRETools offers the "edsite-merge" function to allow users to create their own consensus sets of editing sites. The "edsite-merge" function offers a variety of options that let users adjust the stringency required for bases to be included in the consensus, such as the minimum coverage, the minimum number of edited bases per site, and the minimum number of samples in which a site must be detected. Together these can help to weed out false positives.
Note: When calculating EPK we recommend using a large consensus set, such as the references available from RADAR, REDIportal, and DARNED. Additionally, the consensus set used in the DRETools manuscript can be found here. Unlike the other resources, it uses GRCh38 coordinates.
dretools edsite-merge \
    --min-editing 3 \
    --min-coverage 5 \
    --min-samples 3 \
    --vcf VCF/*.vcf > OUT/consensus_sites.vcf

head -n 5 OUT/consensus_sites.vcf
#chromosome position id ref alt qual fil sample_cnt
7 38723601 . T C . . 3
19 3648300 . T C . . 4
2 128192922 . A G . . 6
19 3647846 . T C . . 3
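The thresholds described above can be pictured as a simple counting rule. The following is a minimal sketch of that idea, not the DRETools implementation: the input structure (a dictionary of per-sample observations), the function name `consensus_sites`, and the use of inclusive (`>=`) comparisons are all assumptions made for illustration.

```python
from collections import Counter


def consensus_sites(samples, min_editing=3, min_coverage=5, min_samples=3):
    """Return {(chrom, pos): sample_count} for sites passing all thresholds.

    `samples` maps a sample name to {(chrom, pos): (coverage, edited_bases)}.
    This layout is hypothetical and only serves the illustration.
    """
    counts = Counter()
    for sites in samples.values():
        for site, (coverage, edited) in sites.items():
            # A site counts for a sample only if it is covered and edited enough.
            if coverage >= min_coverage and edited >= min_editing:
                counts[site] += 1
    # Keep sites that were detected in enough samples.
    return {site: n for site, n in counts.items() if n >= min_samples}


# Hypothetical per-sample observations: site -> (coverage, edited bases).
samples = {
    "s1": {("7", 38723601): (12, 4), ("19", 3648300): (9, 3)},
    "s2": {("7", 38723601): (8, 3), ("19", 3648300): (4, 3)},
    "s3": {("7", 38723601): (10, 5), ("19", 3648300): (7, 2)},
}
print(consensus_sites(samples))  # {('7', 38723601): 3}
```

In this toy input the second site is dropped because it passes the coverage and editing thresholds in only one of the three samples.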
Calculate sample-wise EPK
When searching for differentially edited sites or transcriptomic regions (i.e., editing islands), we first need to find the rate of editing within the sample. EPK is an effective unit for describing the rate of editing within a sample while reducing sample biases such as library size. Note that it is important to use the consensus set of editing sites in this calculation, as we want to consider sites with coverage but no editing in addition to edited sites.
dretools sample-epk \
    --name SRR3091828 \
    --vcf OUT/consensus_sites.vcf \
    --alignment BAM/SRR3091828.ex.bam \
    > SRR3091828.sample_epk.tsv

cat SRR3091828.sample_epk.tsv
#Sample_Name Editable_Area Average_Depth Total_Ref_Bases Total_Alt_Bases EPK
SRR3091828 34 8 140 120 857.1428571
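The EPK value in the example output above is consistent with 1000 × Total_Alt_Bases / Total_Ref_Bases. The sketch below reproduces that arithmetic; the formula is inferred from the example rows only and should not be read as the documented definition, and the helper name `epk` and its zero-handling are assumptions.

```python
def epk(ref_bases, alt_bases):
    """Editing per kilobase, inferred from the example output: 1000 * alt / ref.

    Returning 0.0 when there are no reference bases is an assumption made so
    that uncovered records do not raise a division error.
    """
    if ref_bases == 0:
        return 0.0
    return 1000.0 * alt_bases / ref_bases


# Values taken from the SRR3091828 row shown above.
print(round(epk(140, 120), 7))  # 857.1428571
```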
Calculate site-wise EPK
The next step in finding differentially edited sites is calculating the EPK of all sites within a sample. Note that it is important to use the consensus set of editing sites in this calculation.
dretools edsite-epk \
    --vcf OUT/consensus_sites.vcf \
    --alignment BAM/SRR3091828.ex.bam \
    > SRR3091828.edsite_epk.tsv

head -n 5 SRR3091828.edsite_epk.tsv
#Name Area Depth Ref_Bases Alt_Bases EPK
19:56379264 1 9 3 6 2000.0
6:149725422 1 8 2 6 3000.0
3:40537447 0 0 0 0 0
19:3648221 0 0 0 0 0
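Because the consensus set includes sites that have no coverage in a given sample, the site-wise table mixes covered and uncovered sites. A minimal sketch of separating the two with Python's standard `csv` module follows. It assumes the file is tab-separated (as the `.tsv` extension suggests) and uses the rows from the example output above; the function name `read_site_epk` is hypothetical.

```python
import csv
import io

# Example rows copied from the output above; tabs are assumed as separators.
EDSITE_EPK = (
    "#Name\tArea\tDepth\tRef_Bases\tAlt_Bases\tEPK\n"
    "19:56379264\t1\t9\t3\t6\t2000.0\n"
    "6:149725422\t1\t8\t2\t6\t3000.0\n"
    "3:40537447\t0\t0\t0\t0\t0\n"
    "19:3648221\t0\t0\t0\t0\t0\n"
)


def read_site_epk(handle):
    """Yield (site_name, depth, epk) tuples from an edsite-epk style table."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the '#Name ...' header line
    for name, _area, depth, _ref, _alt, epk_value in reader:
        yield name, int(depth), float(epk_value)


rows = list(read_site_epk(io.StringIO(EDSITE_EPK)))
covered = [name for name, depth, _ in rows if depth > 0]
uncovered = [name for name, depth, _ in rows if depth == 0]
print(covered)    # ['19:56379264', '6:149725422']
print(uncovered)  # ['3:40537447', '19:3648221']
```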
Find differentially edited editing sites
Once the global editing rate and the site-wise editing intensity are determined, differential editing detection can proceed. This can be done by passing the sample-wise and site-wise EPKs to the edsite-diff function. Note: we recommend a COV of less than 0.5 and coverage above 5 for normal differential editing detection.
dretools edsite-diff \
    --max-depth-cov 5.0 \
    --min-depth 2 \
    --names scrRNA,siRNA \
    --sites OUT/consensus_sites.vcf \
    --sample-epk \
    EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
    EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
    --site-epk \
    EPK/SRR3091828.edsite_epk.tsv,EPK/SRR3091829.edsite_epk.tsv,EPK/SRR3091830.edsite_epk.tsv \
    EPK/SRR3091831.edsite_epk.tsv,EPK/SRR3091832.edsite_epk.tsv,EPK/SRR3091833.edsite_epk.tsv \
    > diff_sites.tsv

cat diff_sites.tsv
#Group_1_Name Group_2_Name Record_Name Group_1_Mean Group_2_Mean LM_pvalue ttest_pvalue
scrRNA siRNA 19:56379264 2055.56 469.44 0.4762386 0.0017602
scrRNA siRNA 6:149725422 2585.86 609.26 0.2510187 0.0659989
scrRNA siRNA 6:149725408 555.56 185.61 0.901597 0.0050499
scrRNA siRNA 2:128192967 1654.76 191.9 0.1133313 0.0891759
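A common follow-up is to pull out the records whose p-value falls below a chosen cutoff. The sketch below does this for the example output above. It assumes a tab-separated file; the cutoff of 0.05 and the choice of the ttest_pvalue column are illustrative rather than a DRETools recommendation, and no multiple-testing correction is applied. The function name `significant_records` is hypothetical.

```python
import csv

# Rows copied from the example output above; tabs are assumed as separators.
DIFF_SITES = """\
#Group_1_Name\tGroup_2_Name\tRecord_Name\tGroup_1_Mean\tGroup_2_Mean\tLM_pvalue\tttest_pvalue
scrRNA\tsiRNA\t19:56379264\t2055.56\t469.44\t0.4762386\t0.0017602
scrRNA\tsiRNA\t6:149725422\t2585.86\t609.26\t0.2510187\t0.0659989
scrRNA\tsiRNA\t6:149725408\t555.56\t185.61\t0.901597\t0.0050499
scrRNA\tsiRNA\t2:128192967\t1654.76\t191.9\t0.1133313\t0.0891759
"""


def significant_records(text, alpha=0.05, column="ttest_pvalue"):
    """Return (record_name, p_value) pairs with p_value below alpha."""
    # Drop the leading '#' so the header names can be used as dictionary keys.
    lines = text.lstrip("#").splitlines()
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        (row["Record_Name"], float(row[column]))
        for row in reader
        if float(row[column]) < alpha
    ]


print(significant_records(DIFF_SITES))
# [('19:56379264', 0.0017602), ('6:149725408', 0.0050499)]
```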
Detect editing islands
In our previously published tool RNAEditor (10.1093/bib/bbw087), we noticed clusters of editing sites (editing islands) and included an algorithm to detect these clusters. DRETools also includes an implementation of this algorithm, allowing users to find editing islands.
dretools find-islands \
    --min-editing 3 \
    --min-coverage 5 \
    --min-length 20 \
    --min-points 5 \
    --epsilon 50 \
    --vcf VCF/*.vcf > OUT/islands.bed

head -n 5 OUT/islands.bed
#Chromosome Start End ID Score Strand Length Number_of_Sites Density
2 128192920 128193033 ub5Kpx5pky615pmsdq4PiQ . + 113 7 0.06195
7 38723547 38723651 kluKgmv4-KKrQsRMYX_iCA . - 104 6 0.05769
6 149725405 149725553 kt31w8egfUS3NGIlr1dTeg . - 148 10 0.06757
19 3648149 3648239 SUz1q7g3dUYHVjlhFK4ZAg . - 90 8 0.08889
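To give an intuition for how parameters such as epsilon, minimum points, and minimum length can interact, here is a deliberately simplified one-dimensional grouping: neighbouring sites are chained together while the gap between them is at most epsilon, and chains that are too small or too short are discarded. This is an illustrative stand-in, not the algorithm used by RNAEditor or DRETools. The function name and the positions are hypothetical. The density is computed as sites divided by length, which matches the Density column in the example output (for instance, 7 / 113 ≈ 0.06195).

```python
def find_islands(positions, epsilon=50, min_points=5, min_length=20):
    """Group positions into (start, end, n_sites, density) islands.

    Simplified gap-based clustering for illustration only.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        # Start a new cluster when the gap to the previous site exceeds epsilon.
        if current and pos - current[-1] > epsilon:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    islands = []
    for cluster in clusters:
        length = cluster[-1] - cluster[0]
        # length > 0 guards the density division for single-site clusters.
        if len(cluster) >= min_points and length >= min_length and length > 0:
            density = round(len(cluster) / length, 5)
            islands.append((cluster[0], cluster[-1], len(cluster), density))
    return islands


# Hypothetical editing-site positions on a single chromosome.
positions = [100, 120, 135, 160, 190, 213, 1000, 1010, 5000]
print(find_islands(positions))  # [(100, 213, 6, 0.0531)]
```

Here the two sites near position 1000 and the isolated site at 5000 are dropped because they form clusters with fewer than five points.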
Calculate Island-EPK
Once editing islands are detected, the editing intensity within each island can be measured in EPK with the region-epk function.
dretools region-epk \
    --vcf OUT/consensus_sites.vcf \
    --regions OUT/islands.bed \
    --alignment BAM/SRR3091828.ex.bam > SRR3091828.island_epk.tsv

head -n 5 SRR3091828.island_epk.tsv
#Sample_Name Editable_Area Average_Depth Total_Ref_Bases Total_Alt_Bases EPK
k6PgarZdBaNmffVr3vuaTw 4 7 20 9 450.0
ThhHLUHdPtIw-S_A5Jlgog 3 5 8 8 1000.0
vUGirWK-net3lUdIqm6Mtw 4 10 27 14 518.5185185
TSPIk46dhUGs3ALkJDPBXw 3 8 11 12 1090.9090909
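One way to think about a region-level EPK is as the pooled value over all consensus sites that fall inside the region. The sketch below illustrates this under stated assumptions: the pooling rule and the 1000 × alt / ref formula are inferred from the example rows above (for instance, 1000 × 9 / 20 = 450.0), the inclusive interval convention is a guess, and the site tuples and the name `region_epk` are hypothetical.

```python
def region_epk(sites, start, end):
    """Pool (position, ref_bases, alt_bases) sites with start <= position <= end.

    Returns 1000 * pooled_alt / pooled_ref, or 0.0 if the region has no
    reference bases. Both choices are assumptions for illustration.
    """
    in_region = [(ref, alt) for pos, ref, alt in sites if start <= pos <= end]
    total_ref = sum(ref for ref, _ in in_region)
    total_alt = sum(alt for _, alt in in_region)
    return 1000.0 * total_alt / total_ref if total_ref else 0.0


# Hypothetical sites: (position, ref_bases, alt_bases).
sites = [(105, 6, 3), (110, 5, 2), (130, 4, 1), (140, 5, 3), (400, 9, 9)]
print(region_epk(sites, 100, 150))  # 450.0
```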
Find differentially edited islands
Finally, once we have calculated the region EPK, we can find differentially edited editing islands.
dretools region-diff \
    --regions OUT/islands.bed \
    --min-area 1 \
    --min-depth 1 \
    --names scrRNA,siRNA \
    --sample-epk \
    EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
    EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
    --region-epk \
    EPK/SRR3091828.island_epk.tsv,EPK/SRR3091829.island_epk.tsv,EPK/SRR3091830.island_epk.tsv \
    EPK/SRR3091831.island_epk.tsv,EPK/SRR3091832.island_epk.tsv,EPK/SRR3091833.island_epk.tsv \
    > huvec_diff_islands.tsv

head -n 5 huvec_diff_islands.tsv
#Group_1 Group_2 Record_Name Group_1_Mean Group_2_Mean LM_pvalue ttest_pvalue
scrRNA siRNA k6PgarZdBaNmffVr3vuaTw 2:128192921-128193032 982.72 219.28 0.9763355 0.0474272
scrRNA siRNA ThhHLUHdPtIw-S_A5Jlgog 19:3648150-3648238 1579.41 913.73 0.7009463 0.3207373
scrRNA siRNA vUGirWK-net3lUdIqm6Mtw 7:38723548-38723650 1029.32 180.94 0.330851 0.0479502
scrRNA siRNA TSPIk46dhUGs3ALkJDPBXw 6:149725406-149725552 997.62 281.94 0.121698 0.0067977