About

DRETools is a command-line software package for studying differential RNA editing.

DRETools allows users to:
  • Calculate units to reduce sample-bias (similar to FPKM for RNA expression).
  • Find differential editing among editing sites and editing islands.
  • Find editing islands (i.e., clusters of editing sites).
  • Merge editing sites from multiple samples.
  • Calculate useful sample-wise, gene-wise, and site-wise editing statistics.

Install

DRETools has two main dependencies: Python 3.5+ and R 3.4+. Once these are satisfied, DRETools can be installed from the source code available here or, on Unix-based systems, using the following command:

pip3 install dretools
For more information on using pip, see the PyPI website or the DRETools PyPI page.

Note: Currently there seems to be an issue when installing pysam on macOS. If you encounter an error, try installing pysam 0.13 using the following command.
pip3 install pysam==0.13

Examples

This section contains examples of DRETools usage. Full documentation can be found on the DRETools wiki. The example files used here are included in the site repository and can be tested locally by cloning this repository. Finally, note that the example files contain only a very small region due to size constraints; otherwise, they behave in the same way as full RNA-seq samples.

Cloning the repository

To clone the repo, open a command-line terminal and run the following command.
git clone git@bitbucket.org:dretools/dretools.bitbucket.io.git
This will download a local copy of the repository. Once the download has finished and DRETools is installed, run the following commands in the repository folder.

Viewing help menus

An overview of the operations offered by DRETools can be found by typing dretools, dretools --help, or dretools -h. Similarly, help menus are available for each operation by typing dretools operation --help or dretools operation -h.

dretools --help

usage: dretools [-h] operation

Operations Available:
    Units
        sample-epk   - Calculate the editing-per-kilobase (EPK) for a sample.
        region-epk   - Calculate the editing-per-kilobase (EPK) for transcriptomic regions.
        edsite-epk   - Calculate the editing-per-kilobase (EPK) for editing sites.
    Differential Editing
        region-diff  - Test for differentially edited transcriptomic regions.
        edsite-diff  - Test for differentially edited editing sites.
    Stats
        sample-stats - Calculate sample-wise information about the editing in a set of samples.
        region-stats - Calculate region-wise information about the editing within a sample.
        edsite-stats - Calculate site-wise information about the editing within a sample.
    Merge
        find-islands - Find editing islands within one or more files containing editing sites.
        edsite-merge - Merge two or more files containing editing sites.

optional arguments:
  -h, --help  show this help message and exit

Merge editing sites from multiple samples

Consensus sets of editing sites are important for calculating units that reduce sample bias and for finding differentially edited sites. DRETools offers the "edsite-merge" function to allow users to create their own consensus sets of editing sites. The "edsite-merge" function offers a variety of options that allow users to adjust the stringency required for sites to be included in the consensus, such as the minimum coverage, the minimum number of edited bases per site, and the minimum number of samples in which a site must be detected. Together these can help to weed out false positives.

Note: When calculating EPK, we recommend using a large consensus set such as the references available from RADAR, REDIportal, and DARNED. Additionally, the consensus set used in the DRETools manuscript can be found here. Unlike the other resources, it uses GRCh38 coordinates.

dretools edsite-merge \
    --min-editing  3  \
    --min-coverage 5  \
    --min-samples  3  \
    --vcf VCF/*.vcf > OUT/consensus_sites.vcf

head -n 5 OUT/consensus_sites.vcf
#chromosome	position	id	ref	alt	qual	fil	sample_cnt
7	38723601	.	T	C	.	.	3
19	3648300	.	T	C	.	.	4
2	128192922	.	A	G	.	.	6
19	3647846	.	T	C	.	.	3
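
The filtering idea behind these three thresholds can be illustrated with a minimal Python sketch. It is not the DRETools implementation: the function name `consensus_sites`, the input layout, the inclusive `>=` comparisons, and the coverage and editing counts in the toy data are all assumptions made for illustration.

```python
from collections import Counter

def consensus_sites(samples, min_editing=3, min_coverage=5, min_samples=3):
    """Return sites that pass the per-sample thresholds in enough samples.

    `samples` is a list with one dict per sample, each mapping
    (chromosome, position) -> (coverage, edited_bases).
    Hypothetical helper; whether DRETools uses inclusive thresholds is assumed.
    """
    passing = Counter()
    for sites in samples:
        for key, (coverage, edited) in sites.items():
            # A site counts for this sample only if it meets both thresholds.
            if coverage >= min_coverage and edited >= min_editing:
                passing[key] += 1
    # Keep sites that passed in at least `min_samples` samples.
    return {key: n for key, n in passing.items() if n >= min_samples}

# Toy data: positions are taken from the example output, counts are invented.
toy_samples = [
    {("7", 38723601): (10, 4), ("19", 3648300): (6, 3)},
    {("7", 38723601): (8, 5), ("19", 3648300): (4, 3)},   # second site: low coverage
    {("7", 38723601): (12, 3), ("19", 3648300): (9, 2)},  # second site: low editing
]
consensus = consensus_sites(toy_samples)
print(consensus)
```

In this toy input only the site on chromosome 7 passes both thresholds in all three samples, so it is the only one kept.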

Calculate sample-wise EPK

When searching for differentially edited sites or transcriptomic regions (i.e., editing islands), we first need to find the rate of editing within the sample. EPK is an effective unit for describing the rate of editing within a sample while reducing sample biases such as library size. Note that it is important to use the consensus set of editing sites in this calculation, as we want to consider sites with coverage but no editing in addition to edited sites.

dretools sample-epk                   \
    --name SRR3091828                 \
    --vcf OUT/consensus_sites.vcf         \
    --alignment BAM/SRR3091828.ex.bam \
    > SRR3091828.sample_epk.tsv

cat SRR3091828.sample_epk.tsv
#Sample_Name	Editable_Area	Average_Depth	Total_Ref_Bases	Total_Alt_Bases	EPK
SRR3091828	34	8	140	120	857.1428571
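
The EPK values in the example outputs on this page are numerically consistent with 1000 × alternative bases / reference bases. The short Python sketch below reproduces them under that assumption. The relation is inferred from the printed numbers only, not taken from the DRETools source, and the function name `epk` and its handling of zero reference bases are hypothetical.

```python
def epk(ref_bases, alt_bases):
    """Editing per kilobase, assuming EPK = 1000 * alt / ref.

    The formula is inferred from the example outputs; returning 0.0
    when there are no reference bases is an assumption.
    """
    if ref_bases == 0:
        return 0.0
    return 1000.0 * alt_bases / ref_bases

# (Total_Ref_Bases, Total_Alt_Bases) from the sample-wise example row.
sample_epk = epk(140, 120)
print(round(sample_epk, 7))
```

The same relation also matches the site-wise example rows, for instance 3 reference and 6 alternative bases giving an EPK of 2000.0.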

Calculate site-wise EPK

The next step in finding differentially edited sites is calculating the EPK of all sites within a sample. Note that it is important to use the consensus set of editing sites in this calculation.

dretools edsite-epk                    \
    --vcf OUT/consensus_sites.vcf      \
    --alignment BAM/SRR3091828.ex.bam  \
    > SRR3091828.edsite_epk.tsv

head -n 5 SRR3091828.edsite_epk.tsv
#Name	Area	Depth	Ref_Bases	Alt_Bases	EPK
19:56379264	1	9	3	6	2000.0
6:149725422	1	8	2	6	3000.0
3:40537447	0	0	0	0	0
19:3648221	0	0	0	0	0
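
Because the consensus set is used, the table also lists sites without any coverage in this sample (Depth 0). The sketch below shows one possible way to load such a table in Python and separate covered from uncovered sites. It assumes the output is tab-separated with the header shown above; the function name `read_site_epk` is hypothetical and not part of DRETools.

```python
import csv
import io

# The example rows from above, embedded so the sketch is self-contained.
EXAMPLE = (
    "#Name\tArea\tDepth\tRef_Bases\tAlt_Bases\tEPK\n"
    "19:56379264\t1\t9\t3\t6\t2000.0\n"
    "6:149725422\t1\t8\t2\t6\t3000.0\n"
    "3:40537447\t0\t0\t0\t0\t0\n"
    "19:3648221\t0\t0\t0\t0\t0\n"
)

def read_site_epk(handle):
    """Parse a site-wise EPK table into {site_name: {"depth", "epk"}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    sites = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, row))
        sites[record["Name"]] = {
            "depth": int(record["Depth"]),
            "epk": float(record["EPK"]),
        }
    return sites

sites = read_site_epk(io.StringIO(EXAMPLE))
covered = sorted(name for name, site in sites.items() if site["depth"] > 0)
print(covered)
```

To read a real output file, pass an open file handle instead of the `io.StringIO` wrapper.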

Find differentially edited editing sites

Once the global editing rate and the site-wise editing intensity are determined, differential editing detection can proceed. This can be done by passing sample-wise and site-wise EPKs to the edsite-diff function. Note: we recommend a COV of less than 0.5 and coverage above 5 for normal differential editing detection.

dretools edsite-diff                \
    --max-depth-cov 5.0             \
    --min-depth 2                   \
    --names scrRNA,siRNA            \
    --sites OUT/consensus_sites.vcf \
    --sample-epk                    \
    EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
    EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
    --site-epk                                                                    \
    EPK/SRR3091828.edsite_epk.tsv,EPK/SRR3091829.edsite_epk.tsv,EPK/SRR3091830.edsite_epk.tsv \
    EPK/SRR3091831.edsite_epk.tsv,EPK/SRR3091832.edsite_epk.tsv,EPK/SRR3091833.edsite_epk.tsv \
    > diff_sites.tsv

cat diff_sites.tsv
#Group_1_Name	Group_2_Name	Record_Name	Group_1_Mean	Group_2_Mean	LM_pvalue	ttest_pvalue
scrRNA	siRNA	19:56379264	2055.56	469.44	0.4762386	0.0017602
scrRNA	siRNA	6:149725422	2585.86	609.26	0.2510187	0.0659989
scrRNA	siRNA	6:149725408	555.56	185.61	0.901597	0.0050499
scrRNA	siRNA	2:128192967	1654.76	191.9	0.1133313	0.0891759
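
As a follow-up, the result table can be screened for candidate sites. The Python sketch below keeps rows whose t-test p-value falls below a cutoff and reports the log2 ratio of the two group means. The cutoff of 0.05 is a conventional choice assumed here, not a DRETools recommendation, and the function name `significant_sites` is hypothetical. With many sites, a multiple-testing correction would normally be applied as well.

```python
import math

# (Record_Name, Group_1_Mean, Group_2_Mean, LM_pvalue, ttest_pvalue)
# copied from the example output above.
rows = [
    ("19:56379264", 2055.56, 469.44, 0.4762386, 0.0017602),
    ("6:149725422", 2585.86, 609.26, 0.2510187, 0.0659989),
    ("6:149725408", 555.56, 185.61, 0.901597, 0.0050499),
    ("2:128192967", 1654.76, 191.9, 0.1133313, 0.0891759),
]

def significant_sites(rows, alpha=0.05):
    """Return (name, log2 fold change) for rows with ttest_pvalue < alpha."""
    hits = []
    for name, mean_1, mean_2, _lm_p, ttest_p in rows:
        if ttest_p < alpha:
            # Positive values mean higher editing in group 1 (scrRNA).
            hits.append((name, math.log2(mean_1 / mean_2)))
    return hits

hits = significant_sites(rows)
for name, log2_fc in hits:
    print(name, round(log2_fc, 2))
```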

Detect editing islands

In our previously published tool RNAEditor (10.1093/bib/bbw087), we noticed clusters of editing sites (editing islands) and included an algorithm to detect these clusters. DRETools also includes an implementation of this algorithm, allowing users to find editing islands.

dretools find-islands \
    --min-editing 3   \
    --min-coverage 5  \
    --min-length 20   \
    --min-points 5    \
    --epsilon 50      \
    --vcf VCF/*.vcf > OUT/islands.bed

head -n 5 OUT/islands.bed
#Chromosome Start   End ID  Score   Strand  Length  Number_of_Sites Density
2  128192920 128193033 ub5Kpx5pky615pmsdq4PiQ . + 113 7  0.06195
7  38723547  38723651  kluKgmv4-KKrQsRMYX_iCA . - 104 6  0.05769
6  149725405 149725553 kt31w8egfUS3NGIlr1dTeg . - 148 10 0.06757
19 3648149   3648239   SUz1q7g3dUYHVjlhFK4ZAg . - 90  8  0.08889
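
In the example rows above, Length equals End minus Start and Density equals Number_of_Sites divided by Length. The Python sketch below recomputes both columns to make the relationship explicit. This is inferred from the printed values, not from the DRETools source, and the function name `island_density` is hypothetical.

```python
# (Chromosome, Start, End, Number_of_Sites, reported Density)
# copied from the example output above.
islands = [
    ("2", 128192920, 128193033, 7, 0.06195),
    ("7", 38723547, 38723651, 6, 0.05769),
    ("6", 149725405, 149725553, 10, 0.06757),
    ("19", 3648149, 3648239, 8, 0.08889),
]

def island_density(start, end, n_sites):
    """Return (length, density), assuming length = end - start."""
    length = end - start
    return length, round(n_sites / length, 5)

recomputed = [island_density(start, end, n) for _, start, end, n, _ in islands]
for (chrom, start, end, _, reported), (length, density) in zip(islands, recomputed):
    print(chrom, length, density, reported)
```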

Calculate Island-EPK

Once editing islands are detected, the editing intensity within each island can be measured in EPK with the function region-epk.

dretools region-epk                \
    --vcf OUT/consensus_sites.vcf  \
    --regions OUT/islands.bed      \
    --alignment BAM/SRR3091828.ex.bam  > SRR3091828.island_epk.tsv

head -n 5 SRR3091828.island_epk.tsv
#Sample_Name	Editable_Area	Average_Depth	Total_Ref_Bases	Total_Alt_Bases	EPK
k6PgarZdBaNmffVr3vuaTw	4	7	20	9	450.0
ThhHLUHdPtIw-S_A5Jlgog	3	5	8	8	1000.0
vUGirWK-net3lUdIqm6Mtw	4	10	27	14	518.5185185
TSPIk46dhUGs3ALkJDPBXw	3	8	11	12	1090.9090909

Find differentially edited islands

Finally, once we have calculated the region EPK, we can find differentially edited editing islands.

dretools region-diff           \
    --regions  OUT/islands.bed \
    --min-area  1              \
    --min-depth 1              \
    --names scrRNA,siRNA                                                                      \
    --sample-epk                                                                              \
    EPK/SRR3091828.sample_epk.tsv,EPK/SRR3091829.sample_epk.tsv,EPK/SRR3091830.sample_epk.tsv \
    EPK/SRR3091831.sample_epk.tsv,EPK/SRR3091832.sample_epk.tsv,EPK/SRR3091833.sample_epk.tsv \
    --region-epk                                                                              \
    EPK/SRR3091828.island_epk.tsv,EPK/SRR3091829.island_epk.tsv,EPK/SRR3091830.island_epk.tsv \
    EPK/SRR3091831.island_epk.tsv,EPK/SRR3091832.island_epk.tsv,EPK/SRR3091833.island_epk.tsv \
    > huvec_diff_islands.tsv

head -n 5 huvec_diff_islands.tsv
#Group_1	Group_2	Record_Name	Group_1_Mean	Group_2_Mean	LM_pvalue  ttest_pvalue
scrRNA siRNA k6PgarZdBaNmffVr3vuaTw 2:128192921-128193032 982.72  219.28 0.9763355 0.0474272
scrRNA siRNA ThhHLUHdPtIw-S_A5Jlgog 19:3648150-3648238    1579.41 913.73 0.7009463 0.3207373
scrRNA siRNA vUGirWK-net3lUdIqm6Mtw 7:38723548-38723650   1029.32 180.94 0.330851  0.0479502
scrRNA siRNA TSPIk46dhUGs3ALkJDPBXw 6:149725406-149725552 997.62  281.94 0.121698  0.0067977