results

This module handles the final analysis and the output of the results.

results.generateGraphs(sras)

Generates every graph that is configured to run in the main.config file.

It reads all the h5 files and produces the required graphs.

Args:
sras (list): List of tuples with the following data:
  • A list of the paths for the input run (one file if single-end, two if paired-end)

  • Run type. “single” if single-end, “paired” if paired-end

  • Run ID
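As an illustration, a sras argument holding one single-end run and one paired-end run might look like the sketch below. The file paths and run IDs are invented for the example; only the (paths, run type, run ID) layout comes from the description above.

```python
# Hypothetical paths and run IDs; only the tuple layout
# (list of paths, run type, run ID) follows the argument description.
sras = [
    (["data/runA.fastq"], "single", "runA"),
    (["data/runB_1.fastq", "data/runB_2.fastq"], "paired", "runB"),
]

for paths, run_type, run_id in sras:
    # Single-end runs carry one input file, paired-end runs carry two.
    expected = 1 if run_type == "single" else 2
    assert len(paths) == expected, f"{run_id}: unexpected number of input files"
```

A list shaped like this is what each function in this module receives as sras.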

results.mergeCalling(sras)

Merges the results from REDItools2, JACUSA, SnpEff and bcftools mpileup, and generates the final CSV files and the graphs.

If configured to execute REDItools2, it reads the REDItools2 VCF and merges it with the SnpEff REDItools2 VCF file by SNV position, so the SNVs found by REDItools2 but not by SnpEff are discarded. Then, it filters the SNVs by the minimum SNV coverage and the minimum read support and stores the filtered results in a separate dataframe, which is saved to a CSV file named reditools.csv.

If configured to execute JACUSA2, it reads the JACUSA VCF and merges it with the SnpEff JACUSA VCF file by SNV position, so the SNVs found by JACUSA but not by SnpEff are discarded. Then, it filters the SNVs by the minimum SNV coverage and the minimum read support and stores the filtered results in a separate dataframe, which is saved to a CSV file named jacusa.csv.
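The merge-then-filter step described for both tools can be sketched roughly as follows. This is not the module's code: it uses plain dictionaries instead of dataframes so it runs with the standard library alone, and the field names (chrom, pos, coverage, support, effect), the reference name and the two thresholds are all assumptions made for the example.

```python
# Hypothetical thresholds; in the pipeline they would come from its configuration.
MIN_COVERAGE = 10
MIN_SUPPORT = 2

def merge_by_position(tool_rows, snpeff_rows):
    """Inner join on (chrom, pos): keep only tool SNVs that SnpEff also reports."""
    annotations = {(r["chrom"], r["pos"]): r for r in snpeff_rows}
    merged = []
    for row in tool_rows:
        key = (row["chrom"], row["pos"])
        if key in annotations:
            merged.append({**row, "effect": annotations[key]["effect"]})
    return merged

def filter_snvs(rows, min_coverage=MIN_COVERAGE, min_support=MIN_SUPPORT):
    """Keep SNVs that meet both the coverage and the read-support minimum."""
    return [
        r for r in rows
        if r["coverage"] >= min_coverage and r["support"] >= min_support
    ]

# Made-up SNVs from a calling tool and from its SnpEff-annotated VCF.
tool_rows = [
    {"chrom": "refA", "pos": 100, "coverage": 50, "support": 5},
    {"chrom": "refA", "pos": 200, "coverage": 4, "support": 1},
    {"chrom": "refA", "pos": 300, "coverage": 80, "support": 9},
]
snpeff_rows = [
    {"chrom": "refA", "pos": 100, "effect": "missense_variant"},
    {"chrom": "refA", "pos": 200, "effect": "synonymous_variant"},
]

merged = merge_by_position(tool_rows, snpeff_rows)  # position 300 has no SnpEff record
filtered = filter_snvs(merged)                      # position 200 fails both thresholds
```

With dataframes, the same idea would typically be an inner merge on the position columns followed by a boolean filter.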

Finally, if both tools were executed, both filtered dataframes are merged by SNV position, so that only the SNVs common to both outputs remain. These are saved to a CSV file named runCommon.csv. If only one tool was executed, its filtered dataframe is saved as runCommon.csv instead.
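A minimal sketch of that last step, again with plain dictionaries rather than dataframes. The function name common_snvs, the use of None to mean "tool not executed" and the field names are assumptions of the example, not part of the module.

```python
def common_snvs(reditools_rows=None, jacusa_rows=None):
    """Return the SNVs shared by both tools, or the single tool's SNVs.

    None means the corresponding tool was not executed (an assumption
    of this sketch).
    """
    if reditools_rows is None:
        return list(jacusa_rows or [])
    if jacusa_rows is None:
        return list(reditools_rows)
    jacusa_keys = {(r["chrom"], r["pos"]) for r in jacusa_rows}
    return [r for r in reditools_rows if (r["chrom"], r["pos"]) in jacusa_keys]

# Made-up filtered SNVs for one run.
reditools_rows = [{"chrom": "refA", "pos": 100}, {"chrom": "refA", "pos": 250}]
jacusa_rows = [{"chrom": "refA", "pos": 100}, {"chrom": "refA", "pos": 300}]

both = common_snvs(reditools_rows, jacusa_rows)        # only position 100 is shared
only_jacusa = common_snvs(jacusa_rows=jacusa_rows)     # JACUSA output is kept as is
```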

These dataframes are also serialized as jacusa.h5, reditools.h5 and common.h5.

The JACUSA, REDItools2 and common dataframes of each run are concatenated, by dataframe type, with the dataframes of the other runs. In the end, there is a global JACUSA dataframe, a global REDItools2 dataframe and a global common dataframe.
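The per-type concatenation can be pictured as one growing table per dataframe type. The sketch below uses lists of dictionaries in place of dataframes; tagging each row with its run ID is an assumption of the example, since the text does not say how runs are told apart in the global tables.

```python
from collections import defaultdict

# One growing table per dataframe type, shared by all runs.
global_tables = defaultdict(list)

def add_run(run_id, run_tables):
    """Append one run's rows to the global table of the same type."""
    for table_type, rows in run_tables.items():
        for row in rows:
            # Tagging rows with the run ID is an assumption of this sketch.
            global_tables[table_type].append({"run": run_id, **row})

# Made-up per-run results.
add_run("run1", {
    "jacusa": [{"pos": 100}],
    "reditools": [{"pos": 100}, {"pos": 250}],
    "common": [{"pos": 100}],
})
add_run("run2", {"jacusa": [{"pos": 300}], "reditools": [], "common": []})
```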

The global JACUSA dataframe is saved to a CSV file named globalJacusa.csv and to jacusa.h5. The same is done for the global REDItools2 dataframe (globalReditools.csv and reditools.h5) and the global common dataframe (globalCommon.csv and common.h5).

This whole process is repeated for every reference viral genome. The output files are found in the 6-visualization directory.

Args:
sras (list): List of tuples with the following data:
  • A list of the paths for the input run (one file if single-end, two if paired-end)

  • Run type. “single” if single-end, “paired” if paired-end

  • Run ID

results.runSnpEff(sras)

Runs SnpEff on the VCF files resulting from JACUSA and REDItools2.

First, it creates a SnpEff configuration file and adds a database name for every reference genome. Then, it builds those databases. Finally, it analyzes each VCF file against each database, producing one output VCF file per run per tool. The names of these files end with .snpeff.jacusa.vcf and .snpeff.reditools.vcf, and the files are found in the 5-snpeff directory.
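The three steps could be sketched as below. The helper names, file paths and genome name are hypothetical, and the sketch only assembles the configuration text and the command lines without running anything. It assumes the databases are built from GenBank files (SnpEff's -genbank build option); the module may well use a different annotation format.

```python
def config_entries(genomes):
    """One '<name>.genome : <name>' configuration line per reference genome."""
    return "".join(f"{g}.genome : {g}\n" for g in genomes)

def build_command(genome, config_path, jar="snpEff.jar"):
    """Command that builds the SnpEff database for one reference genome."""
    # -genbank is an assumption; SnpEff also builds from other formats.
    return ["java", "-jar", jar, "build", "-genbank", "-c", config_path, genome]

def annotate_command(genome, vcf_path, config_path, jar="snpEff.jar"):
    """Command that annotates one VCF; SnpEff writes the result to stdout."""
    return ["java", "-jar", jar, "ann", "-c", config_path, genome, vcf_path]

# Hypothetical genome name, config path and input VCF.
genomes = ["refA"]
config_text = config_entries(genomes)
build = build_command("refA", "snpEff.config")
annotate = annotate_command("refA", "run1.jacusa.vcf", "snpEff.config")
```

Each annotate command would be run with its standard output redirected to the per-run, per-tool output file.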

Args:
sras (list): List of tuples with the following data:
  • A list of the paths for the input run (one file if single-end, two if paired-end)

  • Run type. “single” if single-end, “paired” if paired-end

  • Run ID