About
DRETools is a command-line software package for studying differential RNA editing. DRETools allows users to:
- Calculate units to reduce sample-bias (similar to FPKM for RNA expression).
- Find differential editing among editing sites and editing islands.
- Find editing islands (i.e., clusters of editing sites).
- Merge editing sites from multiple samples.
- Calculate useful sample-wise, gene-wise, and site-wise editing statistics.
Install
DRETools has two main dependencies: Python 3.5+ and R 3.4+. Once these are satisfied, DRETools can be installed from the source code available here or, on Unix-based systems, with the following command:
pip3 install dretools
For more information on using pip, see the PyPI website or the DRETools PyPI page.
Note: There currently seems to be an issue when installing pysam on macOS. If you encounter an error while installing pysam, try installing pysam 0.13 with the following command:
pip3 install pysam==0.13
Examples
This section contains examples of DRETools usage. Full documentation can be found on the DRETools wiki. The example files used here are included in the site repository and can be tested locally by cloning this repository. Note that, due to size constraints, the example files only contain a very small region; therefore, they will not behave in the same way as full RNA-seq samples.
Cloning the repository
To clone the repository, open a command-line terminal and run the following command:
git clone git@bitbucket.org:dretools/dretools.bitbucket.io.git
This downloads a local copy of the repository. Once the download has finished and DRETools is installed, run the commands below from the repository folder.
Viewing help menus
An overview of the operations offered by DRETools can be displayed by typing dretools, dretools --help, or dretools -h. Similarly, a help menu is available for each operation by typing dretools operation --help or dretools operation -h, where operation is one of the operations listed below.
dretools --help
usage: dretools [-h] operation
Operations Available:
Units
sample-epk - Calculate the editing-per-kilobase (EPK) for a sample.
region-epk - Calculate the editing-per-kilobase (EPK) for transcriptomic regions.
edsite-epk - Calculate the editing-per-kilobase (EPK) for editing sites.
Differential Editing
region-diff - Test for differentially edited transcriptomic regions.
edsite-diff - Test for differentially edited editing sites.
Stats
sample-stats - Calculate sample-wise information about the editing in a set of samples.
region-stats - Calculate region-wise information about the editing within a sample.
edsite-stats - Calculate site-wise information about the editing within a sample.
Merge
find-islands - Find editing islands within one or more files containing editing sites.
edsite-merge - Merge two or more files containing editing sites.
optional arguments:
-h, --help show this help message and exit
Merge editing sites from multiple samples
Consensus sets of editing sites are important for calculating units that reduce sample bias and for finding differentially edited sites. DRETools offers the "edsite-merge" function so that users can create their own consensus sets of editing sites. The "edsite-merge" function offers a variety of options for adjusting the stringency required for a site to be included in the consensus, such as the minimum coverage (--min-coverage), the minimum number of edited bases per site (--min-editing), and the minimum number of samples in which a site must be detected (--min-samples). Together these can help to weed out false positives.
Note: When calculating EPK we recommend using a large consensus set, such as the references available from RADAR, REDIportal, and DARNED. Additionally, the consensus set used in the DRETools manuscript can be found here. Unlike the other resources, it uses GRCh38 coordinates.
dretools edsite-merge \
--min-editing 3 \
--min-coverage 5 \
--min-samples 3 \
--vcf VCF/*.vcf > OUT/consensus_sites.vcf
head -n 5 OUT/consensus_sites.vcf
#chromosome position id ref alt qual fil sample_cnt
7 38723601 . T C . . 3
19 3648300 . T C . . 4
2 128192922 . A G . . 6
19 3647846 . T C . . 3
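The filtering that edsite-merge applies can be pictured with a minimal Python sketch. It assumes that a sample supports a site only when that site meets both the coverage and the edited-base thresholds in that sample, that all thresholds are inclusive, and that the sample_cnt column is the number of supporting samples. These are assumptions about the semantics, not the DRETools implementation; the consensus function and the input dictionary are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-sample calls: (chromosome, position) -> (coverage, edited_bases).
# Illustrative numbers only; they are not read from the example VCF files.
SAMPLES = {
    "sample_a": {("1", 1000): (12, 4), ("1", 2000): (3, 3)},
    "sample_b": {("1", 1000): (9, 5), ("1", 2000): (8, 2)},
    "sample_c": {("1", 1000): (20, 3), ("1", 2000): (10, 6)},
}


def consensus(samples, min_editing=3, min_coverage=5, min_samples=3):
    """Return {site: sample_count} for sites supported by enough samples.

    Assumption: a sample supports a site only if it passes both the coverage
    and edited-base thresholds, and all thresholds are inclusive (>=).
    """
    counts = defaultdict(int)
    for calls in samples.values():
        for site, (coverage, edited) in calls.items():
            if coverage >= min_coverage and edited >= min_editing:
                counts[site] += 1
    # Keep only sites detected in at least min_samples samples.
    return {site: n for site, n in counts.items() if n >= min_samples}


for (chrom, pos), n in sorted(consensus(SAMPLES).items()):
    print(chrom, pos, n, sep="\t")  # chromosome, position, sample count
```

With the default thresholds only position 1000 survives; position 2000 passes the per-sample filters in a single sample, so it is kept only if min_samples is lowered to 1.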
Calculate sample-wise EPK
When searching for differentially edited sites or transcriptomic regions (i.e., editing islands), we first need to find the rate of editing within each sample. EPK is an effective unit for describing the rate of editing within a sample while reducing sample biases such as library size. Note that it is important to use the consensus set of editing sites in this calculation, because we want to consider sites that have coverage but no editing in addition to edited sites.
dretools sample-epk \
--name SRR3091828 \
--vcf OUT/consensus_sites.vcf \
--alignment BAM/SRR3091828.ex.bam \
> SRR3091828.sample_epk.tsv
cat SRR3091828.sample_epk.tsv
#Sample_Name Editable_Area Average_Depth Total_Ref_Bases Total_Alt_Bases EPK
SRR3091828 34 8 140 120 857.1428571
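In the output above, the columns are related in a simple way: EPK equals 1000 × Total_Alt_Bases / Total_Ref_Bases (120 / 140 × 1000 ≈ 857.14), and Average_Depth equals (Total_Ref_Bases + Total_Alt_Bases) / Editable_Area rounded to the nearest integer ((140 + 120) / 34 ≈ 7.6, reported as 8). The Python sketch below encodes these relationships. They are inferred from the example numbers rather than taken from the DRETools source, so consult the DRETools wiki for the authoritative definition; the function names are hypothetical.

```python
def epk(ref_bases: int, alt_bases: int) -> float:
    """EPK as it appears in the example output: edited (alt) bases
    per 1,000 unedited (ref) bases."""
    if ref_bases == 0:
        # Guard against division by zero. The site-wise example reports 0 for
        # sites without coverage; how DRETools handles alt > 0 with ref == 0
        # is not shown, so returning 0 here is an assumption.
        return 0.0
    return 1000.0 * alt_bases / ref_bases


def average_depth(ref_bases: int, alt_bases: int, editable_area: int) -> int:
    """Mean number of reads per covered site, rounded to an integer."""
    if editable_area == 0:
        return 0
    return round((ref_bases + alt_bases) / editable_area)


print(round(epk(140, 120), 7))      # 857.1428571
print(average_depth(140, 120, 34))  # 8
```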
Calculate site-wise EPK
The next step in finding differentially edited sites is to calculate the EPK of every site within a sample. Again, it is important to use the consensus set of editing sites in this calculation.
dretools edsite-epk \
--vcf OUT/consensus_sites.vcf \
--alignment BAM/SRR3091828.ex.bam \
> SRR3091828.edsite_epk.tsv
head -n 5 SRR3091828.edsite_epk.tsv
#Name Area Depth Ref_Bases Alt_Bases EPK
19:56379264 1 9 3 6 2000.0
6:149725422 1 8 2 6 3000.0
3:40537447 0 0 0 0 0
19:3648221 0 0 0 0 0
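Rows with an Area of 0 (such as 3:40537447) are consensus sites with no reads in this sample. A minimal Python sketch for loading this table and keeping only covered sites follows. It assumes the six columns shown above and splits each line on whitespace, so it works whether the file is tab- or space-delimited; read_site_epk is a hypothetical helper, not part of DRETools. In the rows shown, Depth also equals Ref_Bases + Alt_Bases, which the sketch checks.

```python
import io

# The rows shown above; in practice, open SRR3091828.edsite_epk.tsv instead.
EXAMPLE = """\
#Name Area Depth Ref_Bases Alt_Bases EPK
19:56379264 1 9 3 6 2000.0
6:149725422 1 8 2 6 3000.0
3:40537447 0 0 0 0 0
19:3648221 0 0 0 0 0
"""


def read_site_epk(handle):
    """Yield (name, area, depth, ref, alt, epk) for every data row."""
    for line in handle:
        fields = line.split()  # works for tab- or space-separated columns
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        name, area, depth, ref, alt, epk = fields
        yield name, int(area), int(depth), int(ref), int(alt), float(epk)


covered = {}
for name, area, depth, ref, alt, epk in read_site_epk(io.StringIO(EXAMPLE)):
    # Relationship observed in the example rows: Depth = Ref_Bases + Alt_Bases.
    if depth != ref + alt:
        raise ValueError(f"unexpected depth for {name}")
    if area > 0:
        covered[name] = epk

print(covered)  # {'19:56379264': 2000.0, '6:149725422': 3000.0}
```

For a real file, pass an open file handle instead, for example `with open("SRR3091828.edsite_epk.tsv") as handle: rows = list(read_site_epk(handle))`.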
Find differentially edited editing sites
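Once the global editing rate and the site-wise editing intensity have been determined, differential editing detection can proceed. This is done by passing the sample-wise and site-wise EPK files to the edsite-diff function; in the example below, the two groups are separated by a space and the files within each group are separated by commas, in the same order as the group names given to --names. Note that we recommend a depth COV (--max-depth-cov) below 0.5 and a minimum depth (--min-depth) above 5 for typical differential editing detection; the example below uses more permissive values.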
dretools edsite-diff \
--max-depth-cov 5.0 \
--min-depth 2 \
--names scrRNA,siRNA \
--sites OUT/consensus_sites.vcf \
--sample-epk \
EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
--site-epk \
EPK/SRR3091828.edsite_epk.tsv,EPK/SRR3091829.edsite_epk.tsv,EPK/SRR3091830.edsite_epk.tsv \
EPK/SRR3091831.edsite_epk.tsv,EPK/SRR3091832.edsite_epk.tsv,EPK/SRR3091833.edsite_epk.tsv \
> diff_sites.tsv
cat diff_sites.tsv
#Group_1_Name Group_2_Name Record_Name Group_1_Mean Group_2_Mean LM_pvalue ttest_pvalue
scrRNA siRNA 19:56379264 2055.56 469.44 0.4762386 0.0017602
scrRNA siRNA 6:149725422 2585.86 609.26 0.2510187 0.0659989
scrRNA siRNA 6:149725408 555.56 185.61 0.901597 0.0050499
scrRNA siRNA 2:128192967 1654.76 191.9 0.1133313 0.0891759
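The edsite-diff output reports two p-values per site. The statistics behind them are not described here; the ttest_pvalue column presumably comes from a two-sample t-test on per-sample EPK values and LM_pvalue from a linear model. As a hedged illustration of the t-test idea only, the Python sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom with the standard library. Whether DRETools uses Welch's or Student's test, and whether site EPKs are normalized by sample EPK first, are open assumptions; the welch_t name and the EPK values are made up. To obtain a p-value, scipy.stats.ttest_ind(group_1, group_2, equal_var=False) performs the same Welch test.

```python
import math
import statistics


def welch_t(group_1, group_2):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    se1 = statistics.variance(group_1) / n1  # squared standard error, group 1
    se2 = statistics.variance(group_2) / n2  # squared standard error, group 2
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Made-up EPK values for one site in three scrRNA and three siRNA samples.
scr = [2000.0, 2200.0, 1900.0]
si = [500.0, 400.0, 450.0]
t, df = welch_t(scr, si)
print(f"t = {t:.2f}, df = {df:.2f}")
```

Welch's variant does not assume equal variances in the two groups, which is a safer default when replicate counts are as small as three per group.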
Detect editing islands
In our previously published tool RNAEditor (doi:10.1093/bib/bbw087), we noticed clusters of editing sites (editing islands) and included an algorithm to detect these clusters. DRETools also includes an implementation of this algorithm, allowing users to find editing islands.
dretools find-islands \
--min-editing 3 \
--min-coverage 5 \
--min-length 20 \
--min-points 5 \
--epsilon 50 \
--vcf VCF/*.vcf > OUT/islands.bed
head -n 5 OUT/islands.bed
#Chromosome Start End ID Score Strand Length Number_of_Sites Density
2 128192920 128193033 ub5Kpx5pky615pmsdq4PiQ . + 113 7 0.06195
7 38723547 38723651 kluKgmv4-KKrQsRMYX_iCA . - 104 6 0.05769
6 149725405 149725553 kt31w8egfUS3NGIlr1dTeg . - 148 10 0.06757
19 3648149 3648239 SUz1q7g3dUYHVjlhFK4ZAg . - 90 8 0.08889
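The --epsilon and --min-points options correspond to the two parameters of DBSCAN-style density clustering, and in the output above Length equals End - Start while Density equals Number_of_Sites / Length (for example, 7 / 113 ≈ 0.06195). The Python sketch below clusters editing-site positions on one chromosome and strand in that style. It is an illustration under assumptions, not the RNAEditor or DRETools implementation: here an island spans from its first to its last site, neighbours are sites within epsilon bases (inclusive), and min_length is applied after clustering. find_islands is a hypothetical name.

```python
import bisect


def find_islands(positions, epsilon=50, min_points=5, min_length=20):
    """Cluster editing-site positions (one chromosome and strand) with a
    1-D DBSCAN-style rule; return (start, end, n_sites, density) tuples."""
    positions = sorted(positions)
    labels = [None] * len(positions)  # None = unvisited, -1 = noise, >= 0 = cluster id

    def neighbours(i):
        # Indices of all sites within epsilon bases of site i (including i).
        lo = bisect.bisect_left(positions, positions[i] - epsilon)
        hi = bisect.bisect_right(positions, positions[i] + epsilon)
        return list(range(lo, hi))

    cluster = -1
    for i in range(len(positions)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1  # site i is a core point: start a new cluster
        labels[i] = cluster
        queue = seeds
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_points:
                queue.extend(more)  # j is also a core point: keep expanding

    islands = []
    for c in range(cluster + 1):
        members = [p for p, label in zip(positions, labels) if label == c]
        start, end = members[0], members[-1]
        length = end - start
        if length >= min_length and length > 0:
            islands.append((start, end, len(members), round(len(members) / length, 5)))
    return islands


# Hypothetical positions: five tightly spaced sites, one isolated site,
# and three sites that are too few to form an island with min_points=5.
sites = [100, 110, 120, 130, 140, 1000, 2000, 2005, 2010]
print(find_islands(sites))  # [(100, 140, 5, 0.125)]
```

Because the positions are sorted, each neighbourhood query is a pair of binary searches rather than a scan over every site.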
Calculate Island-EPK
Once editing islands are detected, the editing intensity within each island can be measured in EPK with the region-epk function.
dretools region-epk \
--vcf OUT/consensus_sites.vcf \
--regions OUT/islands.bed \
--alignment BAM/SRR3091828.ex.bam > SRR3091828.island_epk.tsv
head -n 5 SRR3091828.island_epk.tsv
#Sample_Name Editable_Area Average_Depth Total_Ref_Bases Total_Alt_Bases EPK
k6PgarZdBaNmffVr3vuaTw 4 7 20 9 450.0
ThhHLUHdPtIw-S_A5Jlgog 3 5 8 8 1000.0
vUGirWK-net3lUdIqm6Mtw 4 10 27 14 518.5185185
TSPIk46dhUGs3ALkJDPBXw 3 8 11 12 1090.9090909
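The region-epk columns mirror those of sample-epk, and in the rows shown EPK again equals 1000 × Total_Alt_Bases / Total_Ref_Bases (9 / 20 × 1000 = 450), with Average_Depth equal to (Total_Ref_Bases + Total_Alt_Bases) / Editable_Area rounded ((20 + 9) / 4 ≈ 7). The Python sketch below aggregates per-site counts inside one island in that way. It assumes that Editable_Area counts consensus sites with at least one read inside the island (as in the site-wise output, where covered sites have an Area of 1) and treats start and end as inclusive positions; region_epk and the site counts are hypothetical.

```python
# Hypothetical per-site counts on one chromosome: position -> (ref_bases, alt_bases).
SITES = {100: (4, 1), 130: (3, 3), 150: (0, 0), 170: (5, 0), 400: (7, 7)}


def region_epk(sites, start, end):
    """Aggregate the sites inside [start, end] (inclusive) into
    (editable_area, average_depth, total_ref, total_alt, epk)."""
    area = total_ref = total_alt = 0
    for pos, (ref, alt) in sites.items():
        if start <= pos <= end and ref + alt > 0:
            area += 1  # assumption: only covered sites count as editable area
            total_ref += ref
            total_alt += alt
    average_depth = round((total_ref + total_alt) / area) if area else 0
    epk = 1000.0 * total_alt / total_ref if total_ref else 0.0
    return area, average_depth, total_ref, total_alt, epk


print(region_epk(SITES, 100, 200))
```

Pooling the counts before dividing, rather than averaging per-site EPKs, weights each site by its coverage, which is consistent with how the rows above relate to one another.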
Find differentially edited islands
Finally, once we have calculated the region EPK, we can find differentially edited editing islands.
dretools region-diff \
--regions OUT/islands.bed \
--min-area 1 \
--min-depth 1 \
--names scrRNA,siRNA \
--sample-epk \
EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
--region-epk \
EPK/SRR3091828.island_epk.tsv,EPK/SRR3091829.island_epk.tsv,EPK/SRR3091830.island_epk.tsv \
EPK/SRR3091831.island_epk.tsv,EPK/SRR3091832.island_epk.tsv,EPK/SRR3091833.island_epk.tsv \
> huvec_diff_islands.tsv
head -n 5 huvec_diff_islands.tsv
#Group_1 Group_2 Record_Name Group_1_Mean Group_2_Mean LM_pvalue ttest_pvalue
scrRNA siRNA k6PgarZdBaNmffVr3vuaTw 2:128192921-128193032 982.72 219.28 0.9763355 0.0474272
scrRNA siRNA ThhHLUHdPtIw-S_A5Jlgog 19:3648150-3648238 1579.41 913.73 0.7009463 0.3207373
scrRNA siRNA vUGirWK-net3lUdIqm6Mtw 7:38723548-38723650 1029.32 180.94 0.330851 0.0479502
scrRNA siRNA TSPIk46dhUGs3ALkJDPBXw 6:149725406-149725552 997.62 281.94 0.121698 0.0067977
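When many islands are tested, it helps to filter this table by p-value. Note that the data rows above carry one more field than the header (the island coordinates after Record_Name), so the Python sketch below reads the island ID and coordinates from the front of each row and the numeric columns from the end. The 0.05 cut-off and the default choice of the ttest_pvalue column are illustrative assumptions, not recommendations from DRETools; significant_islands is a hypothetical helper.

```python
import io

# The rows shown above; in practice, open huvec_diff_islands.tsv instead.
OUTPUT = """\
#Group_1 Group_2 Record_Name Group_1_Mean Group_2_Mean LM_pvalue ttest_pvalue
scrRNA siRNA k6PgarZdBaNmffVr3vuaTw 2:128192921-128193032 982.72 219.28 0.9763355 0.0474272
scrRNA siRNA ThhHLUHdPtIw-S_A5Jlgog 19:3648150-3648238 1579.41 913.73 0.7009463 0.3207373
scrRNA siRNA vUGirWK-net3lUdIqm6Mtw 7:38723548-38723650 1029.32 180.94 0.330851 0.0479502
scrRNA siRNA TSPIk46dhUGs3ALkJDPBXw 6:149725406-149725552 997.62 281.94 0.121698 0.0067977
"""


def significant_islands(handle, alpha=0.05, use_lm=False):
    """Return (island_id, coordinates, mean_1, mean_2, p) rows with p < alpha,
    sorted from the smallest p-value upwards."""
    hits = []
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        island_id, coords = fields[2], fields[3]
        mean_1, mean_2, lm_p, t_p = map(float, fields[-4:])
        p = lm_p if use_lm else t_p
        if p < alpha:
            hits.append((island_id, coords, mean_1, mean_2, p))
    return sorted(hits, key=lambda row: row[4])


for row in significant_islands(io.StringIO(OUTPUT)):
    print(*row, sep="\t")
```

With these example rows, three islands pass on ttest_pvalue and none pass on LM_pvalue.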