Introduction

@author: naveen duhan

Read pyseqrna configuration

Quality Submodules

pyseqrna.quality_check module

Title:This module contains the read quality check function
Created:May 19, 2021
Author:Naveen Duhan
pyseqrna.quality_check.fastqcRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, pairedEND=False, afterTrim=False, outDir=None, dep='')

This function performs read quality checks using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Parameters:
  • sampleDict – Samples dictionary generated by pyseqrna_utils.read_input_file function.
  • configFile – Parameters file for FastQC tool. Default from pyseqrna params
  • slurm – True to enable slurm job scheduling on HPC
  • mem – Memory if slurm True
  • cpu – Number of threads to use if slurm True.
  • task – Number of tasks per cpu if slurm True.
  • pairedEND – True if samples are paired
  • afterTrim – True if checking quality after trimming.
  • outDir – Output directory for results. Default is current working directory.
  • dep – Slurm job dependency.
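
As a rough illustration of what a wrapper of this kind does, the sketch below assembles a FastQC command line for one sample. The helper name build_fastqc_command and the way reads are passed in are assumptions made for this example and are not taken from pySeqRNA; only the FastQC options -o (output directory) and -t (threads) come from the FastQC command-line interface.

```python
import shlex

def build_fastqc_command(reads, out_dir=".", cpu=8):
    """Return a FastQC command string for one sample (hypothetical helper).

    `reads` holds one FASTQ path (single-end) or two (paired-end).
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two FASTQ files per sample")
    args = ["fastqc", "-o", out_dir, "-t", str(cpu), *reads]
    # Quote every argument so paths with spaces survive a shell
    return " ".join(shlex.quote(a) for a in args)

single = build_fastqc_command(["S1.fastq.gz"], out_dir="qc", cpu=4)
paired = build_fastqc_command(["S1_R1.fastq.gz", "S1_R2.fastq.gz"], out_dir="qc", cpu=4)
print(single)
print(paired)
```

With slurm=True, a string like this would typically be wrapped in a batch script rather than executed directly.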

pyseqrna.quality_trimming module

Title:This module contains read quality trimming functions for pySeqRNA
Created:May 20, 2021
Author:Naveen Duhan
pyseqrna.quality_trimming.flexbarRun(sampleDict, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')

This function performs adapter and quality-based trimming of reads using the Flexbar trimming tool (https://github.com/seqan/flexbar).

Parameters:
  • sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
  • configFile – A config file for flexbar parameters. Default is flexbar.ini from param.
  • slurm – True if using slurm to schedule jobs.
  • mem – Provide memory in GB to use. Default 10 GB.
  • task – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • paired – True if samples are paired.
  • outDir – Output directory for results. Default is current working directory.
  • dep – slurm job id on which this job depends. Defaults to ‘’.
pyseqrna.quality_trimming.trim_galoreRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')

This function performs adapter and quality-based trimming of reads using the Trim Galore trimming tool.

Parameters:
  • sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
  • configFile – A config file for Trim Galore parameters. Default is taken from param.
  • slurm – True if using slurm to schedule jobs.
  • mem – Provide memory in GB to use. Default 10 GB.
  • task – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • paired – True if samples are paired.
  • outDir – Output directory for results. Default is current working directory.
  • dep – slurm job id on which this job depends. Defaults to ‘’.
pyseqrna.quality_trimming.trimmomaticRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')

This function performs adapter and quality-based trimming of reads using the Trimmomatic trimming tool.

Parameters:
  • sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
  • configFile – A config file for Trimmomatic parameters. Default is taken from param.
  • slurm – True if using slurm to schedule jobs.
  • mem – Provide memory in GB to use. Default 10 GB.
  • task – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • paired – True if samples are paired.
  • outDir – Output directory for results. Default is current working directory.
  • dep – slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.ribosomal module

Title:This module contains the SortMeRNA function for removing ribosomal RNA from reads.
Created:July 31, 2021
Author:Naveen Duhan
pyseqrna.ribosomal.sortmernaRun(sampleDict=None, outDir='.', rnaDatabases=None, pairedEND=False, slurm=False, mem=10, cpu=8, task=1, dep='')

This function executes SortMeRNA to remove ribosomal RNA from FASTQ reads.

Parameters:
  • sampleDict – Sample dictionary containing sample information
  • outDir – Output directory. Defaults to present working directory.
  • pairedEND – True if samples are paired-end. Defaults to False
  • slurm – True if SLURM scheduling is available. Defaults to False
  • mem – Memory in GB. Defaults to 10
  • cpu – Total number of CPU to use per task. Defaults to 8
  • task – Number of tasks per job. Defaults to 1
  • dep – Slurm job id if depends on other job. Defaults to ‘’

Alignment Submodules

pyseqrna.aligners module

Title:This module contains read alignment classes for pySeqRNA
Created:July 21, 2021
Author:Naveen Duhan

class pyseqrna.aligners.STAR_Aligner(genome=None, configFile=None, outDir=None, slurm=False)

Bases: object

Class for STAR alignment program

Parameters:
  • configFile – Path to STAR config file. This file will be used to get the parameters for the STAR alignment program.
  • slurm – To run commands with slurm task-scheduler.
build_index(mem=20, tasks=1, cpu=8, gff=None, dep='')

This function builds the genome index for read alignment.

Parameters:
  • mem – Provide memory in GB to use. Defaults to 20.
  • tasks – Number of cpu-tasks to run. Defaults to 1.
  • gff – Gene feature file to index with genome. Defaults to None.
  • cpu – Total number of threads to use. Default 8.
  • dep – slurm job id on which this job depends. Defaults to ‘’.
check_index()

Function to check if the STAR index is valid and exists.

Returns:True if the genome index is valid.
run_Alignment(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')

This function aligns reads against the indexed reference genome.

Parameters:
  • target – target dictionary containing sample information.
  • pairedEND – True if samples are paired.
  • mem – Provide memory in GB to use. Default 20 GB.
  • tasks – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • dep – slurm job id on which this job depends. Defaults to ‘’.
class pyseqrna.aligners.hisat2_Aligner(genome=None, configFile=None, outDir='pySeqRNA_results', slurm=False)

Bases: object

Class for HISAT2 alignment program

Parameters:
  • configFile – Path to HISAT2 config file. This file will be used to get the parameters for the HISAT2 alignment program.
  • slurm – To run commands with slurm task-scheduler.
build_index(mem=8, tasks=1, cpu=8, dep='')

This function builds the genome index for read alignment.

Parameters:
  • mem – Provide memory in GB to use. Defaults to 8.
  • tasks – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • dep – slurm job id on which this job depends. Defaults to ‘’.
check_index(largeIndex=False)

Function to check if the HISAT2 index is valid and exists.

Parameters:largeIndex – True if the genome was indexed with a large index.
Returns:True if the genome index is valid.
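
A minimal sketch of such a check follows, assuming it only needs to confirm that the index files are present. hisat2-build writes eight files named <base>.1.ht2 to <base>.8.ht2, or with the .ht2l suffix for a large index. The function name hisat2_index_exists is hypothetical, and the actual check_index method may perform additional validation.

```python
import os
import tempfile

def hisat2_index_exists(index_base, large_index=False):
    """Return True if all eight HISAT2 index files exist for `index_base`."""
    suffix = "ht2l" if large_index else "ht2"
    expected = [f"{index_base}.{i}.{suffix}" for i in range(1, 9)]
    return all(os.path.isfile(path) for path in expected)

# Demonstrate with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "genome")
    before = hisat2_index_exists(base)          # no files yet
    for i in range(1, 9):
        open(f"{base}.{i}.ht2", "w").close()
    after = hisat2_index_exists(base)           # small index is present
    large = hisat2_index_exists(base, large_index=True)  # .ht2l files are absent
```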
run_Alignment(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')

This function aligns reads against the indexed reference genome.

Parameters:
  • target – target dictionary containing sample information.
  • pairedEND – True if samples are paired.
  • mem – Provide memory in GB to use. Default 20 GB.
  • tasks – Number of cpu-tasks to run. Defaults to 1.
  • cpu – Total number of threads to use. Default 8.
  • dep – slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.pyseqrna_stats module

Title:This module generates read alignment statistics
Created:September 10, 2021
Author:Naveen Duhan
pyseqrna.pyseqrna_stats.align_stats(sampleDict=None, trimDict=None, bamDict=None, riboDict=None, pairedEND=False)

This function calculates the alignment statistics

Parameters:
  • sampleDict – Raw Reads sample dictionary containing all samples
  • trimDict – Dictionary containing trimmed samples
  • bamDict – Dictionary containing all samples bam files.
  • riboDict – Dictionary containing filtered reads.
Returns:A DataFrame containing alignment statistics.
Return type:DataFrame

Quantification Submodules

pyseqrna.quantification module

Title:This module contains functions for counting features in aligned reads for pySeqRNA
Created:July 29, 2021
Author:Naveen Duhan
pyseqrna.quantification.featureCount(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')

This function counts features in the aligned BAM files using the featureCounts tool.

Parameters:
  • configFile – Parameters file for featureCounts.
  • bamDict – A dictionary containing all the aligned BAM files.
  • gff – Gene feature file GFF/GTF
  • slurm – True if SLURM scheduling is available. Defaults to False
  • mem – Memory in GB. Defaults to 8
  • cpu – Total number of CPU to use per task. Defaults to 8
  • tasks – Number of tasks per job. Defaults to 1
  • outDir – Output directory. Defaults to present working directory.
  • dep – Slurm job id if depends on other job. Defaults to ‘’
Returns:A DataFrame containing read counts per feature per sample.
Return type:DataFrame

pyseqrna.quantification.htseqCount(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')

This function counts features in the aligned BAM files using HTSeq.

Parameters:
  • configFile – Parameters file for HTSeq.
  • bamDict – A dictionary containing all the aligned BAM files.
  • gff – Gene feature file GFF/GTF
  • slurm – True if SLURM scheduling is available. Defaults to False
  • mem – Memory in GB. Defaults to 8
  • cpu – Total number of CPU to use per task. Defaults to 8
  • tasks – Number of tasks per job. Defaults to 1
  • outDir – Output directory. Defaults to present working directory.
  • dep – Slurm job id if depends on other job. Defaults to ‘’
Returns:A DataFrame containing read counts per feature per sample.
Return type:DataFrame

pyseqrna.multimapped_groups module

Title:This module counts multimapped read groups in aligned files.
Created:June 5, 2022
Author:Naveen Duhan
pyseqrna.multimapped_groups.countMMG(sampleDict=None, bamDict=None, gff=None, feature='gene', minCount=100, percentSample=0.5)

This function calculates multimapped gene groups.

Parameters:
  • sampleDict – a dictionary containing samples information.
  • bamDict – a dictionary containing BAM files.
  • gff – gene feature file.
  • feature – feature type.
  • minCount – minimum number of reads per sample.
  • percentSample – minimum number of reads in percent sample.
Returns:A DataFrame containing multimapped read group counts.
Return type:DataFrame

pyseqrna.normalize_counts module

Title:This module converts raw read counts to normalized counts
Created:August 3, 2021
Author:Naveen Duhan
class pyseqrna.normalize_counts.Normalization(countFile=None, featureFile=None, typeFile='GFF', keyType='ncbi', attribute='ID', feature='gene', geneColumn='Gene')

Bases: object

This class is for calculation of normalized counts from raw counts

CPM(plot=True, figsize=(20, 10))

This function converts counts to counts per million (CPM).

Parameters:
  • plot – True if to plot log raw counts and log CPM counts on boxplot.
  • figsize – Figure size.
RPKM(plot=True, figsize=(20, 10))

This function converts counts to reads per kilobase per million (RPKM).

Parameters:
  • plot – True if to plot log raw counts and log RPKM counts on boxplot.
  • figsize – Figure size.
TPM(plot=True, figsize=(20, 10))

This function converts counts to transcripts per million (TPM).

Parameters:
  • plot – True if to plot log raw counts and log TPM counts on boxplot.
  • figsize – Figure size.
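
The three normalizations above follow standard definitions, which the sketch below writes out for a single sample in plain Python. It is illustrative only: the function names and the toy counts and gene lengths are assumptions for this example, and the Normalization class itself works on a count file and a feature file rather than on lists.

```python
def cpm(counts):
    """Counts per million: count / library size * 1e6."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: count / length in kb / library size in millions."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then rescale to sum to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [100, 300, 600]      # toy raw counts for three genes
lengths = [1000, 2000, 3000]  # toy gene lengths in base pairs

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

TPM values always sum to one million within a sample, whereas RPKM totals vary from sample to sample.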
meanRatioCount(plot=True, figsize=(20, 10))

This function converts counts to median-ratio normalized counts.

Parameters:
  • plot – True to plot log raw counts and log median-ratio normalized counts on a boxplot.
  • figsize – Figure size.
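
For orientation, the sketch below implements the median-of-ratios size-factor calculation popularized by DESeq2. Whether meanRatioCount uses exactly this procedure is an assumption; the function names and the toy count matrix are made up for this example.

```python
import math
from statistics import median

def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix is a list of genes, each a list of per-sample counts."""
    n_samples = len(count_matrix[0])
    # Only genes with nonzero counts in every sample are usable (log of zero is undefined)
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(count_matrix):
    """Divide each count by the size factor of its sample."""
    factors = size_factors(count_matrix)
    return [[c / f for c, f in zip(row, factors)] for row in count_matrix]

# Toy data: sample 2 is sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
factors = size_factors(counts)
normalized = normalize(counts)
```

After normalization the first three genes have equal values in both samples, which is the expected outcome when the only difference between samples is sequencing depth.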
pyseqrna.normalize_counts.boxplot(data=None, countType=None, figsize=(20, 10), **kwargs)

This function makes a boxplot with boxes colored according to the countType they belong to.

Parameters:
  • data – List containing log data of raw and normalized read counts.
  • countType – Columns data of raw and normalize read counts.
  • figsize – Figure size.
  • kwargs – Other optional arguments for boxplot.

pyseqrna.clustering module

Title:This module generates a similarity dendrogram between samples
Created:August 2, 2021
Author:Naveen Duhan

pyseqrna.clustering.clusterSample(countDF=None)

Function to cluster samples based on similarity

Parameters:countDF – A dataframe of read counts or normalized read counts.
Returns:A clustered dendrogram of samples
pyseqrna.clustering.leaf_label(temp)

Function to generate leaf labels for the dendrogram

Parameters:temp – Temporary leaf arrangement for the dendrogram
Returns:Leaf labels for the dendrogram.

Differential Genes Submodules

pyseqrna.differential_expression module

Title:This module finds differentially expressed genes from raw read counts
Created:October 22, 2021
Author:Naveen Duhan
class pyseqrna.differential_expression.Gene_Description(species, type, combinations=None, degFile=None, filtered=True)

Bases: object

This class fetches gene names and descriptions for genes.

add_names()

This function adds gene names and descriptions to DEGs.

Returns:DEGs file with gene names and descriptions.
Return type:DataFrame
add_names_annotation(file)

This function adds gene names and descriptions to a functional annotation.

Returns:Functional annotation file with gene names and descriptions.
Return type:DataFrame
pyseqrna.differential_expression.degFilter(degDF=None, CompareList=None, FDR=0.05, FOLD=2, plot=True, figsize=(10, 6), replicate=True, mmg=False, extraColumns=False)

This function filters the all-gene expression file by the given FOLD and FDR thresholds.

Parameters:
  • degDF – A DataFrame containing differential expression of all genes in all combinations.
  • CompareList – A list of all the sample comparison.
  • FDR – False Discovery Rate for filtering DEGs. Defaults to 0.05.
  • FOLD – Fold change value. The log2 of the value will be calculated. Defaults to 2.
  • plot – True if want to plot DEGs per sample on barplot. Defaults to True.
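
The filtering rule described by FOLD and FDR can be sketched as below. This is an illustration under assumptions: the column names logFC and FDR, the inclusive comparisons, and the helper name filter_degs are chosen for this example and are not confirmed by the degFilter documentation.

```python
import math

def filter_degs(rows, fold=2, fdr=0.05):
    """Split rows into up- and down-regulated genes using |log2FC| >= log2(fold) and FDR <= fdr."""
    cutoff = math.log2(fold)  # FOLD=2 corresponds to a log2 cutoff of 1
    up = [r for r in rows if r["FDR"] <= fdr and r["logFC"] >= cutoff]
    down = [r for r in rows if r["FDR"] <= fdr and r["logFC"] <= -cutoff]
    return up, down

# Toy results for one comparison (hypothetical column names)
rows = [
    {"Gene": "g1", "logFC": 2.5, "FDR": 0.001},   # up
    {"Gene": "g2", "logFC": -1.2, "FDR": 0.01},   # down
    {"Gene": "g3", "logFC": 0.4, "FDR": 0.0001},  # significant but small change
    {"Gene": "g4", "logFC": 3.0, "FDR": 0.2},     # large change but not significant
]
up, down = filter_degs(rows)
```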
pyseqrna.differential_expression.runDESeq2(countDF=None, targetFile=None, design='sample', combination=None, gene_column='Gene', mmg=False, subset=False, lib=None)

This function is a wrapper around the DESeq2 package in R for differential expression analysis from raw read counts.

Parameters:
  • countDF – Raw read counts.
  • targetFile – Tab-delimited target file with replication and sample name
  • design – Design for the DESeq2 analysis. Defaults to ‘sample’.
  • combination – Comparison list containing samples to compare.
  • gene_column – First column in raw read count file. Defaults to ‘Gene’.
  • mmg – True if raw read counts are from multimapped gene groups.
  • subset – True if runDESeq2 should subset raw read counts according to the comparison.
  • lib – library path of DESeq2 to use.
Returns:A DataFrame containing differential expression of all genes in all combinations.
Return type:DataFrame

pyseqrna.differential_expression.run_edgeR(countDF=None, targetFile=None, combination=None, gene_column='Gene', mmg=False, subset=False, replicate=True, bcv=0.4, lib=None)

This function is a wrapper around the edgeR package in R for differential expression analysis from raw read counts.

Parameters:
  • countDF – Raw read counts.
  • targetFile – Tab-delimited target file with replication and sample name
  • combination – Comparison list containing samples to compare.
  • gene_column – First column in raw read count file. Defaults to ‘Gene’.
  • mmg – True if raw read counts are from multimapped gene groups.
  • subset – True if run_edgeR should subset raw read counts according to the comparison.
  • replicate – False if there are no replicates.
  • bcv – Biological coefficient of variation to use if there are no replicates.
  • lib – library path of edgeR to use.
Returns:A DataFrame containing differential expression of all genes in all combinations.
Return type:DataFrame

Functional Annotation Submodules

pyseqrna.gene_ontology module

Title:gene_ontology module is for performing gene ontology enrichment analysis of differentially expressed genes
Created:January 5, 2022
Author:Naveen Duhan
class pyseqrna.gene_ontology.GeneOntology(species=None, type=None, keyType='ensembl', taxid=None, gff=None)

Bases: object

This class is for Gene Ontology enrichment

Parameters:
  • species – Species name. Ex. for Arabidopsis thaliana it is athaliana
  • type – Species is from plants or animals.
  • keyType – Genes are from NCBI or ENSEMBL. Default is ENSEMBL.
  • taxid – Taxonomy ID if keyType is NCBI.
  • gff – Gene feature file.
barplotGO(df=None, nrows=20, colorBy='logPvalues')

This function creates a barplot for Gene Ontology enrichment.

Parameters:
  • df – Gene Ontology enrichment file from enrichGO function.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color bar on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:

a barplot

dotplotGO(df=None, nrows=20, colorBy='logPvalues')

This function creates a dotplot for Gene Ontology enrichment.

Parameters:
  • df – Gene Ontology enrichment file from enrichGO function.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:

a dotplot

enrichGO(file=None, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')

This function performs Gene Ontology enrichment of DEGs.

Parameters:
  • file – Differentially expressed genes in a sample.
  • pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
  • plot – True if a plot is needed. Default is True.
  • plotType – Gene Ontology enrichment visualization on dotplot/barplot. Default is dotplot.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:a dictionary
Rtype results:Gene Ontology enrichment results.
Rtype plot:a dotplot/barplot
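
Enrichment analyses of this kind are commonly based on a one-sided hypergeometric test. The sketch below shows that test in plain Python; it is an assumption that enrichGO uses this statistic, and the function name and toy numbers are invented for this example.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy example: 20 background genes, 5 annotated with the term,
# 4 DEGs of which 2 are annotated
p = hypergeom_pvalue(k=2, n=4, K=5, N=20)
print(round(p, 4))
```

A term would then be reported as enriched when its p-value falls below pvalueCutoff, usually after multiple-testing correction across all tested terms.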

pyseqrna.pathway module

Title:This module is for performing KEGG pathway enrichment analysis of differentially expressed genes
Created:January 10, 2022
Author:Naveen Duhan
class pyseqrna.pathway.Pathway(species=None, keyType=None, gff=None)

Bases: object

This class is for KEGG Pathway enrichment

Parameters:
  • species – Species name. Ex. for Arabidopsis thaliana it is athaliana
  • keyType – Genes are from NCBI or ENSEMBL. Default is ENSEMBL.
  • gff – Gene feature file.
barplotKEGG(df=None, nrows=20, colorBy='logPvalues')

This function creates a barplot for KEGG pathway enrichment.

Parameters:
  • df – KEGG pathway enrichment file from enrichKEGG function.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color bar on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:

a barplot

dotplotKEGG(df=None, nrows=20, colorBy='logPvalues')

This function creates a dotplot for KEGG pathway enrichment.

Parameters:
  • df – KEGG pathway enrichment file from enrichKEGG function.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:

a dotplot

enrichKEGG(file, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')

This function performs KEGG pathway enrichment of DEGs.

Parameters:
  • file – Differentially expressed genes in a sample.
  • pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
  • plot – True if a plot is needed. Default is True.
  • plotType – KEGG pathway enrichment visualization on dotplot/barplot. Default is dotplot.
  • nrows – Number of rows to plot. Default to 20 rows.
  • colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns:a dictionary
Rtype results:KEGG pathway enrichment results.
Rtype plot:a dotplot/barplot

Utility Submodules

pyseqrna.arg_parser module

Title:Argument parser module for pySeqRNA
Created:October 11, 2021
Author:Naveen Duhan

pyseqrna.pyseqrna_utils module

Title:This module contains utility functions for pySeqRNA
Created:July 11, 2021
Author:Naveen Duhan

pyseqrna.pyseqrna_utils.PyseqrnaLogger(mode, log)

This function initializes the logger in the pySeqRNA modules

Parameters:
  • mode – Logger name for the module
  • log – File name for logging
pyseqrna.pyseqrna_utils.add_MMG(degDF=None, anotDF=None, combination=None)
pyseqrna.pyseqrna_utils.change_attribute(args)

This function changes the attribute for feature counts based on GFF or GTF file.

pyseqrna.pyseqrna_utils.change_ids(df, file)

This function changes ids to other ids.

Parameters:
  • df – A DataFrame containing IDs and synonym IDs.
  • file – A DataFrame in which IDs needs to be replaced.
pyseqrna.pyseqrna_utils.check_files(*args)

This function checks if files exist.

Parameters:args – List of files to check.
Returns:True only if all files in the list exist.
Return type:bool
pyseqrna.pyseqrna_utils.check_path(*args)

This function checks if directories exist.

Parameters:args – List of directories to check.
Returns:True only if all directories in the list exist.
Return type:bool
pyseqrna.pyseqrna_utils.check_status(job_id)

This function checks the status of a Slurm job.

Parameters:job_id – slurm job id
Returns:True if the job has completed. Default False.
Return type:bool
pyseqrna.pyseqrna_utils.clusterRun(job_name='pyseqRNA', sout=' pyseqrna', serror='pyseqrna', command='command', time=4, mem=10, cpu=8, tasks=1, dep='')

This function is for submitting job on cluster with SLURM job Scheduling

Parameters:
  • job_name – Slurm job name on HPC. Defaults to ‘pyseqRNA’.
  • command – Command to execute on HPC.
  • time – Slurm Job time allotment.
  • mem – Memory to use in GB.
  • cpu – Number of CPUs to use for the job.
  • tasks – Number of tasks to execute.
  • dep – Slurm Job dependency. Defaults to ‘’.
Returns:Slurm sbatch job ID
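
To make the parameters concrete, the sketch below builds the text of a Slurm batch script from them. It is not the clusterRun implementation: the helper name build_sbatch_script is hypothetical, and treating time as hours and mem as gigabytes is an assumption. The #SBATCH options themselves (--job-name, --time, --mem, --cpus-per-task, --ntasks, --dependency) are standard sbatch options.

```python
def build_sbatch_script(command, job_name="pyseqRNA", time=4, mem=10, cpu=8, tasks=1, dep=""):
    """Return the text of a Slurm batch script (time in hours, mem in GB: assumed units)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={int(time):02d}:00:00",
        f"#SBATCH --mem={mem}G",
        f"#SBATCH --cpus-per-task={cpu}",
        f"#SBATCH --ntasks={tasks}",
    ]
    if dep:
        # Start only after the job with this ID has finished successfully
        lines.append(f"#SBATCH --dependency=afterok:{dep}")
    lines.append(command)
    return "\n".join(lines) + "\n"

script = build_sbatch_script("fastqc -o qc S1.fastq.gz", dep="12345")
print(script)
```

Submitting such a script with sbatch --parsable prints the new job ID, which can then be passed as dep to a later step to chain jobs.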

pyseqrna.pyseqrna_utils.findFiles(searchPATH=None, searchPattern=None, recursive=False, verbose=False)

This function searches for files.

Parameters:
  • searchPATH – Search directory.
  • searchPattern – Pattern to search.
  • recursive – True if want to search recursively. Defaults to False.
  • verbose – True if want to print output. Defaults to False.
pyseqrna.pyseqrna_utils.getFiles(pattern, path)

This function searches all files containing patterns in a directory.

Parameters:
  • pattern – Pattern to search.
  • path – A directory path.
Returns:

All files containing a pattern.

pyseqrna.pyseqrna_utils.getGenes(file=None, combinations=None, multisheet=True, geneType='all', outDir='.', mmg=False)

This function extracts genes from filtered differentially expressed genes.

Parameters:
  • file – Filtered DEGs file.
  • combinations – Comparison list containing samples.
  • multisheet – True if file is multisheet.
  • outDir – Output directory. Default is current working directory.
  • geneType – Genes to extract: all, up, or down. Default is all.

pyseqrna.pyseqrna_utils.get_basename(filePATH)

This function gets the base name of the file from the full path

Parameters:filePATH – Path to file.
pyseqrna.pyseqrna_utils.get_cpu()

This function gets the CPU count of the system.

Returns:80% of the system CPU count.
Return type:int
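
One way to compute such a value with the standard library is sketched below. Rounding down and never returning less than 1 are assumptions made for this example, as is the function name.

```python
import os

def eighty_percent_cpus():
    """Return 80% of the logical CPU count, rounded down, but at least 1."""
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, int(total * 0.8))

n_threads = eighty_percent_cpus()
print(n_threads)
```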
pyseqrna.pyseqrna_utils.get_directory(filePATH)

This function returns the directory of a file

Parameters:filePATH – Path to file.
pyseqrna.pyseqrna_utils.get_file_extension(filePATH)

This function returns the extension of a file

Parameters:filePATH – Path to file.
pyseqrna.pyseqrna_utils.get_parent(filePATH)

This function returns the file name without extension

Parameters:filePATH – Path to file.
pyseqrna.pyseqrna_utils.make_directory(dir)

This function creates a directory

Parameters:dir – Directory name.
Returns:Name of created directory.
pyseqrna.pyseqrna_utils.parse_config_file(infile)

This function parses the config file for all the programs used in pySeqRNA

Parameters:infile – <program>.ini config file containing arguments.
Returns:Program-specific arguments
Return type:a dictionary
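
Reading an .ini file into a dictionary can be done with the standard configparser module, as in the sketch below. The section name, the option names and the helper read_ini are hypothetical; the layout of the real pySeqRNA config files is not shown in this reference.

```python
import configparser

def read_ini(text):
    """Parse INI-formatted text into a nested dict of {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

# Hypothetical config content for illustration only
example = """
[fastqc]
threads = 8
format = fastq
"""
config = read_ini(example)
print(config)
```

All values come back as strings, so numeric options need an explicit conversion before use.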
pyseqrna.pyseqrna_utils.parse_gff(file)

This function parses a gene feature file into a DataFrame of gene IDs

Parameters:file – A gene feature file.
pyseqrna.pyseqrna_utils.read_input_file(infile, inpath, paired=False)

This function reads the input sample file and converts it into a dictionary. It also makes all possible combinations for DEG analysis and a target dataframe for differential analysis.

Parameters:
  • infile – Input sample file containing information about the project.
  • inpath – Path to the input FASTQ files.
  • paired – True if reads are paired-end. Defaults to False
Returns:Samples, combinations and targets for differential expression.
Return type:A dictionary
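
Generating all possible comparisons from a list of sample groups can be sketched with itertools.combinations. The group names and the A-B string format are assumptions for this example; the exact format produced by read_input_file is not specified here.

```python
from itertools import combinations

def pairwise_comparisons(samples):
    """Return every unordered pair of distinct sample groups as 'A-B' strings."""
    unique = list(dict.fromkeys(samples))  # drop replicate duplicates, keep order
    return [f"{a}-{b}" for a, b in combinations(unique, 2)]

# Hypothetical groups with two replicates each
groups = ["M1", "M1", "A1", "A1", "V1", "V1"]
pairs = pairwise_comparisons(groups)
print(pairs)
```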

pyseqrna.pyseqrna_utils.replace_cpu(args, args2)

This function replaces the CPU count in the config file.

Returns:CPU count changed to 80% of available CPUs

pyseqrna.version module

Title:This is version of pySeqRNA
Created:September 15, 2021
Author:naveen duhan

pyseqrna.pyseqrna_plots module

Title:This module generates visualizations
Created:November 2, 2021
Author:Naveen Duhan

pyseqrna.pyseqrna_plots.plotHeatmap(degDF=None, combinations=None, num=50, figdim=(12, 10), extraColumns=False, type='counts')

This function plots a heatmap based on FOLD change or counts.

Parameters:
  • degDF – All gene expression file or Counts file.
  • combinations – All sample combinations.
  • num – Total number of genes to plot. Default is 50 (25 up and 25 down)
  • figdim – Figure dimensions.
  • type – Heatmap to create on counts/degs.
pyseqrna.pyseqrna_plots.plotMA(degDF=None, countDF=None, comp=None, FOLD=2, FDR=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)

This function plots a MA plot.

Parameters:
  • degDF – All gene expression file.
  • comp – Sample comparison.
  • FOLD – FOLD change. Defaults to 2.
  • FDR – FDR value. Defaults to 0.05.
  • color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
  • dim – Dimensions of the plot. Defaults to (8,5).
  • dotsize – Dotsize on plot. Defaults to 8.
  • markerType – Shape to use. Defaults to ‘o’.
  • alpha – Transparency of plot. Defaults to 0.5.
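
For reference, the two axes of an MA plot are conventionally M, the log2 ratio between two conditions, and A, the mean log2 expression. The sketch below computes them for one gene; the pseudocount and the function name are assumptions for this example, and plotMA may derive these values differently from degDF and countDF.

```python
import math

def ma_values(x, y, pseudocount=1.0):
    """Return (M, A) for expression x in the reference and y in the treatment."""
    log_x = math.log2(x + pseudocount)
    log_y = math.log2(y + pseudocount)
    m = log_y - log_x          # log2 fold change
    a = 0.5 * (log_x + log_y)  # mean log2 expression
    return m, a

m, a = ma_values(7, 31)  # log2(8) = 3 and log2(32) = 5
print(m, a)
```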
pyseqrna.pyseqrna_plots.plotVenn(DEGFile=None, FOLD=2, comparisons=None, degLabel='', fontsize=14, figsize=(12, 12), dpi=300)

This function plots a Venn diagram for filtered degs in samples.

Parameters:
  • DEGFile – Filtered DEG Excel file containing samples sheet-wise.
  • FOLD – FOLD change. Defaults to 2.
  • comparisons – Comparison list. Defaults to None.
  • degLabel – How to label DEGs: total or up-down. Defaults to “”, i.e. up-down.
  • fontsize – Font size. Defaults to 14.
  • figsize – Figure size. Defaults to (12,12).
  • dpi – Figure DPI resolution. Defaults to 300.
pyseqrna.pyseqrna_plots.plotVolcano(degDF=None, comp=None, FOLD=2, pValue=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)

This function plots a Volcano plot.

Parameters:
  • degDF – All gene expression file.
  • comp – Sample comparison.
  • FOLD – FOLD change. Defaults to 2.
  • pValue – Pvalues. Defaults to 0.05.
  • color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
  • dim – Dimensions of the plot. Defaults to (8,5).
  • dotsize – Dotsize on plot. Defaults to 8.
  • markerType – Shape to use. Defaults to ‘o’.
  • alpha – Transparency of plot. Defaults to 0.5.
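
A volcano plot places the log2 fold change on the x-axis and -log10 of the p-value on the y-axis, and colors points by whether they pass both thresholds. The sketch below computes the coordinates and a label for one gene. The labels, the inclusive comparisons and the function name are assumptions for this example; how plotVolcano maps the three default colors to the categories is not stated in this reference.

```python
import math

def volcano_point(log2fc, pvalue, fold=2, p_cutoff=0.05):
    """Return (x, y, label) for one gene on a volcano plot."""
    y = -math.log10(pvalue)
    cutoff = math.log2(fold)
    if pvalue <= p_cutoff and log2fc >= cutoff:
        label = "up"
    elif pvalue <= p_cutoff and log2fc <= -cutoff:
        label = "down"
    else:
        label = "not significant"
    return log2fc, y, label

points = [
    volcano_point(2.0, 0.001),
    volcano_point(-1.5, 0.01),
    volcano_point(0.2, 0.5),
]
labels = [label for _, _, label in points]
```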