Introduction¶

@author: naveen duhan

Read pyseqrna configuration

Quality Submodules¶

pyseqrna.quality_check module¶

Title:	This module contains the read quality check function
Created:	May 19, 2021
Author:	Naveen Duhan

pyseqrna.quality_check.fastqcRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, pairedEND=False, afterTrim=False, outDir=None, dep='')¶

This function performs read quality checks using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

Parameters:

sampleDict – Samples dictionary generated by the pyseqrna_utils.read_input_file function.
configFile – Parameters file for the FastQC tool. Defaults to the one in the pySeqRNA params.
slurm – True to enable slurm job scheduling on HPC.
mem – Memory to use if slurm is True.
cpu – Number of threads to use if slurm is True.
task – Number of tasks per CPU if slurm is True.
pairedEND – True if samples are paired.
afterTrim – True if checking quality after trimming.
outDir – Output directory for results. Default is the current working directory.
dep – Slurm job dependency.
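
To make the role of the sample dictionary, output directory and thread count more concrete, the sketch below builds one FastQC command line per sample. It is only an illustration and not pySeqRNA's own code: the helper name build_fastqc_commands and the layout of the dictionary (sample name mapped to a list of FASTQ paths) are assumptions. The --outdir and --threads options are standard FastQC command-line options.

```python
import shlex


def build_fastqc_commands(sample_files, out_dir=".", threads=8):
    """Return one FastQC shell command per sample.

    sample_files is a hypothetical mapping of sample name -> list of FASTQ
    paths (one path for single-end samples, two for paired-end samples).
    """
    commands = {}
    for sample, fastqs in sample_files.items():
        args = ["fastqc", "--outdir", out_dir, "--threads", str(threads), *fastqs]
        # Quote every argument so paths with spaces stay intact in a shell.
        commands[sample] = " ".join(shlex.quote(a) for a in args)
    return commands


samples = {"ctrl_1": ["ctrl_1_R1.fastq.gz", "ctrl_1_R2.fastq.gz"]}
for name, cmd in build_fastqc_commands(samples, out_dir="fastqc_results").items():
    print(name, "->", cmd)
```

The resulting strings could be run directly or handed to a scheduler wrapper when slurm is True.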

pyseqrna.quality_trimming module¶

Title:	This module contains read quality trimming functions for pySeqRNA
Created:	May 20, 2021
Author:	Naveen Duhan

pyseqrna.quality_trimming.flexbarRun(sampleDict, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶

This function performs adapter- and quality-based trimming of reads using the flexbar trimming tool (https://github.com/seqan/flexbar)

Parameters:

sampleDict – A dictionary containing sample information generated by the pyseqrna_utils.read_input_file function.
configFile – A config file for flexbar parameters. Default is flexbar.ini from param.
slurm – True if using slurm to schedule jobs.
mem – Memory in GB to use. Defaults to 10.
task – Number of cpu-tasks to run. Defaults to 1.
cpu – Total number of threads to use. Defaults to 8.
paired – True if samples are paired.
outDir – Output directory for results. Default is the current working directory.
dep – Slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.quality_trimming.trim_galoreRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶

This function performs adapter- and quality-based trimming of reads using the Trim Galore trimming tool

Parameters:

sampleDict – A dictionary containing sample information generated by the pyseqrna_utils.read_input_file function.
configFile – A config file for Trim Galore parameters. Defaults to the one from param.
slurm – True if using slurm to schedule jobs.
mem – Memory in GB to use. Defaults to 10.
task – Number of cpu-tasks to run. Defaults to 1.
cpu – Total number of threads to use. Defaults to 8.
paired – True if samples are paired.
outDir – Output directory for results. Default is the current working directory.
dep – Slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.quality_trimming.trimmomaticRun(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶

This function performs adapter- and quality-based trimming of reads using the Trimmomatic trimming tool

Parameters:

sampleDict – A dictionary containing sample information generated by the pyseqrna_utils.read_input_file function.
configFile – A config file for Trimmomatic parameters. Defaults to the one from param.
slurm – True if using slurm to schedule jobs.
mem – Memory in GB to use. Defaults to 10.
task – Number of cpu-tasks to run. Defaults to 1.
cpu – Total number of threads to use. Defaults to 8.
paired – True if samples are paired.
outDir – Output directory for results. Default is the current working directory.
dep – Slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.ribosomal module¶

Title:	This script contains sortMeRNA function for removing ribosomal RNA from reads.
Created:	July 31, 2021
Author:	Naveen Duhan

pyseqrna.ribosomal.sortmernaRun(sampleDict=None, outDir='.', rnaDatabases=None, pairedEND=False, slurm=False, mem=10, cpu=8, task=1, dep='')¶

This function executes SortMeRNA to remove ribosomal RNA from fastq reads.

Parameters:

sampleDict – Sample dictionary containing sample information
outDir – Output directory. Defaults to present working directory.
pairedEND – True if samples are paired-end. Defaults to False
slurm – True if SLURM scheduling is available. Defaults to False
mem – Memory in GB. Defaults to 10
cpu – Total number of CPU to use per task. Defaults to 8
task – Number of tasks per job. Defaults to 1
dep – Slurm job id if depends on other job. Defaults to ‘’

Alignment Submodules¶

pyseqrna.aligners module¶

Title:	This module contains read alignment classes for pySeqRNA
Created:	July 21, 2021
Author:	Naveen Duhan

class pyseqrna.aligners.STAR_Aligner(genome=None, configFile=None, outDir=None, slurm=False)¶

Bases: object

Class for STAR alignment program

Parameters:

configFile – Path to the STAR config file. This file is used to get the parameters for the STAR alignment program.
slurm – True to run commands with the slurm task scheduler.

build_index(mem=20, tasks=1, cpu=8, gff=None, dep='')¶

This function builds the genome index for read alignment.

Parameters:

mem – Memory in GB to use. Defaults to 20.
tasks – Number of cpu-tasks to run. Defaults to 1.
cpu – Total number of threads to use. Defaults to 8.
gff – Gene feature file to index with the genome. Defaults to None.
dep – Slurm job id on which this job depends. Defaults to ‘’.

check_index()¶

Function to check if the STAR index exists and is valid.

Returns:	True if the genome index is valid.

run_Alignment(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')¶

This function aligns reads against the indexed reference genome.

Parameters:

target – Target dictionary containing sample information.
pairedEND – True if samples are paired.
mem – Memory in GB to use. Defaults to 20.
cpu – Total number of threads to use. Defaults to 8.
tasks – Number of cpu-tasks to run. Defaults to 1.
dep – Slurm job id on which this job depends. Defaults to ‘’.

class pyseqrna.aligners.hisat2_Aligner(genome=None, configFile=None, outDir='pySeqRNA_results', slurm=False)¶

Bases: object

Class for HISAT2 alignment program

Parameters:

configFile – Path to the HISAT2 config file. This file is used to get the parameters for the HISAT2 alignment program.
slurm – True to run commands with the slurm task scheduler.

build_index(mem=8, tasks=1, cpu=8, dep='')¶

This function builds the genome index for read alignment.

Parameters:

mem – Memory in GB to use. Defaults to 8.
tasks – Number of cpu-tasks to run. Defaults to 1.
cpu – Total number of threads to use. Defaults to 8.
dep – Slurm job id on which this job depends. Defaults to ‘’.

check_index(largeIndex=False)¶

Function to check if the HISAT2 index exists and is valid.

Parameters:	largeIndex – True if the genome was indexed with a large index.
Returns:	True if the genome index is valid.

run_Alignment(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')¶

This function aligns reads against the indexed reference genome.

Parameters:

target – Target dictionary containing sample information.
pairedEND – True if samples are paired.
mem – Memory in GB to use. Defaults to 20.
cpu – Total number of threads to use. Defaults to 8.
tasks – Number of cpu-tasks to run. Defaults to 1.
dep – Slurm job id on which this job depends. Defaults to ‘’.

pyseqrna.pyseqrna_stats module¶

Title:	This module generates read alignment statistics
Created:	September 10, 2021
Author:	Naveen Duhan

pyseqrna.pyseqrna_stats.align_stats(sampleDict=None, trimDict=None, bamDict=None, riboDict=None, pairedEND=False)¶

This function calculates the alignment statistics

Parameters:

sampleDict – Raw reads sample dictionary containing all samples.
trimDict – Dictionary containing trimmed samples.
bamDict – Dictionary containing the BAM files of all samples.
riboDict – Dictionary containing filtered reads.

Returns:	A DataFrame containing alignment statistics.
Return type:	DataFrame

Quantification Submodules¶

pyseqrna.quantification module¶

Title:	This module contains functions for counting features in aligned reads for pySeqRNA
Created:	July 29, 2021
Author:	Naveen Duhan

pyseqrna.quantification.featureCount(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')¶

This function counts features in the aligned BAM files using the featureCounts tool.

Parameters:

configFile – Parameters file for featureCounts.
bamDict – A dictionary containing all the aligned BAM files.
gff – Gene feature file (GFF/GTF).
slurm – True if SLURM scheduling is available. Defaults to False.
mem – Memory in GB. Defaults to 8.
cpu – Total number of CPUs to use per task. Defaults to 8.
tasks – Number of tasks per job. Defaults to 1.
outDir – Output directory. Defaults to the present working directory.
dep – Slurm job id if this job depends on another job. Defaults to ‘’.

Returns:	A DataFrame containing read counts per feature per sample.
Return type:	DataFrame

pyseqrna.quantification.htseqCount(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')¶

This function counts features in the aligned BAM files using HTSeq.

Parameters:

configFile – Parameters file for HTSeq.
bamDict – A dictionary containing all the aligned BAM files.
gff – Gene feature file (GFF/GTF).
slurm – True if SLURM scheduling is available. Defaults to False.
mem – Memory in GB. Defaults to 8.
cpu – Total number of CPUs to use per task. Defaults to 8.
tasks – Number of tasks per job. Defaults to 1.
outDir – Output directory. Defaults to the present working directory.
dep – Slurm job id if this job depends on another job. Defaults to ‘’.

Returns:	A DataFrame containing read counts per feature per sample.
Return type:	DataFrame

pyseqrna.multimapped_groups module¶

Title:	This module counts multimapped read groups in aligned files.
Created:	June 5, 2022
Author:	Naveen Duhan

pyseqrna.multimapped_groups.countMMG(sampleDict=None, bamDict=None, gff=None, feature='gene', minCount=100, percentSample=0.5)¶

This function calculates multimapped gene groups.

Parameters:

sampleDict – A dictionary containing sample information.
bamDict – A dictionary containing BAM files.
gff – Gene feature file.
feature – Feature type.
minCount – Minimum number of reads per sample.
percentSample – Minimum number of reads in percent sample.

Returns:	A DataFrame containing multimapped read group counts.
Return type:	DataFrame

pyseqrna.normalize_counts module¶

Title:	This module converts raw read counts to normalized counts
Created:	August 3, 2021
Author:	Naveen Duhan

class pyseqrna.normalize_counts.Normalization(countFile=None, featureFile=None, typeFile='GFF', keyType='ncbi', attribute='ID', feature='gene', geneColumn='Gene')¶

Bases: object

This class is for calculation of normalized counts from raw counts

CPM(plot=True, figsize=(20, 10))¶

This function converts counts to counts per million (CPM).

Parameters:

plot – True to plot log raw counts and log CPM counts on a boxplot.
figsize – Figure size.
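
As a rough illustration of the CPM scaling (not pySeqRNA's own implementation), each raw count is multiplied by one million and divided by the total number of reads in its sample. The function name and the nested-dictionary layout below are assumptions made for this sketch.

```python
def counts_per_million(counts):
    """Scale raw counts so that every sample sums to one million.

    counts maps sample name -> {gene: raw read count}.
    """
    cpm = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())  # total reads in this sample
        cpm[sample] = {
            gene: count * 1e6 / library_size
            for gene, count in gene_counts.items()
        }
    return cpm


raw = {"s1": {"geneA": 10, "geneB": 30}, "s2": {"geneA": 5, "geneB": 5}}
for sample, values in counts_per_million(raw).items():
    print(sample, values)
```

A sample with no reads at all would raise a ZeroDivisionError here; real data would need that case handled.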

RPKM(plot=True, figsize=(20, 10))¶

This function converts counts to reads per kilobase per million (RPKM).

Parameters:

plot – True to plot log raw counts and log RPKM counts on a boxplot.
figsize – Figure size.
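
The usual RPKM definition divides each count by both the gene length in kilobases and the library size in millions. The sketch below applies that textbook formula to one sample; it is an illustration only, and the function name and the gene_lengths mapping (lengths in base pairs) are assumptions.

```python
def rpkm(gene_counts, gene_lengths):
    """Reads per kilobase per million for a single sample.

    gene_counts maps gene -> raw count, gene_lengths maps gene -> length in bp.
    """
    total_reads = sum(gene_counts.values())
    # 1e9 = 1e3 (bp -> kilobase) * 1e6 (reads -> million reads)
    return {
        gene: count * 1e9 / (total_reads * gene_lengths[gene])
        for gene, count in gene_counts.items()
    }


print(rpkm({"geneA": 10, "geneB": 30}, {"geneA": 1000, "geneB": 2000}))
```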

TPM(plot=True, figsize=(20, 10))¶

This function converts counts to transcripts per million (TPM).

Parameters:

plot – True to plot log raw counts and log TPM counts on a boxplot.
figsize – Figure size.
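
TPM is commonly computed by first dividing each count by the gene length and then rescaling the resulting rates so that they sum to one million within the sample. The following sketch shows that standard calculation for one sample; the function name and inputs are assumptions, and it is not taken from pySeqRNA's source.

```python
def tpm(gene_counts, gene_lengths):
    """Transcripts per million for a single sample.

    gene_counts maps gene -> raw count, gene_lengths maps gene -> length in bp.
    """
    # Length-normalized rate: reads per kilobase of gene.
    rates = {g: c / (gene_lengths[g] / 1000) for g, c in gene_counts.items()}
    total_rate = sum(rates.values())
    # Rescale so the rates of one sample sum to one million.
    return {g: rate * 1e6 / total_rate for g, rate in rates.items()}


print(tpm({"geneA": 10, "geneB": 30}, {"geneA": 1000, "geneB": 2000}))
```

Because the values of every sample sum to the same total, TPM values are easier to compare across samples than RPKM values.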

meanRatioCount(plot=True, figsize=(20, 10))¶

This function converts counts to median ratio counts.

Parameters:

plot – True to plot log raw counts and log median ratio counts on a boxplot.
figsize – Figure size.
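
Assuming that "median ratio" refers to the median-of-ratios normalization popularized by DESeq (the source does not say so explicitly), the idea can be sketched as follows: compute a geometric mean per gene, divide each sample's counts by it, and use the median of those ratios as the sample's size factor. The function name and data layout are illustrative only.

```python
import math
from statistics import median


def median_ratio_normalize(counts):
    """counts maps gene -> list of raw counts, one entry per sample."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with a zero count are skipped.
    log_geo_means = {
        gene: sum(math.log(v) for v in values) / n_samples
        for gene, values in counts.items()
        if all(v > 0 for v in values)
    }
    # Size factor of a sample: median ratio of its counts to the geometric means.
    # (Raises StatisticsError if every gene has a zero count somewhere.)
    size_factors = [
        math.exp(median(math.log(counts[g][j]) - lgm
                        for g, lgm in log_geo_means.items()))
        for j in range(n_samples)
    ]
    normalized = {
        gene: [v / sf for v, sf in zip(values, size_factors)]
        for gene, values in counts.items()
    }
    return normalized, size_factors


norm, factors = median_ratio_normalize(
    {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10]}
)
print(factors)
print(norm)
```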

pyseqrna.normalize_counts.boxplot(data=None, countType=None, figsize=(20, 10), **kwargs)¶

This function makes a boxplot with boxes colored according to the countType they belong to.

Parameters:

data – List containing log data of raw and normalized read counts.
countType – Column data of raw and normalized read counts.
figsize – Figure size.
kwargs – Other optional arguments for boxplot.

pyseqrna.clustering module¶

Title:	This module generates a similarity dendrogram between samples
Created:	August 2, 2021
Author:	Naveen Duhan

pyseqrna.clustering.clusterSample(countDF=None)¶

Function to cluster samples based on similarity

Parameters:	countDF – A dataframe of read counts or normalized read counts.
Returns:	A clustered dendrogram of samples

pyseqrna.clustering.leaf_label(temp)¶

Function to generate leaf labels for the dendrogram

Parameters:	temp – Temporary leaf arrangement for the dendrogram.
Returns:	Leaf labels for the dendrogram.

Differential Genes Submodules¶

pyseqrna.differential_expression module¶

Title:	This module finds differentially expressed genes from raw read counts
Created:	October 22, 2021
Author:	Naveen Duhan

class pyseqrna.differential_expression.Gene_Description(species, type, combinations=None, degFile=None, filtered=True)¶

Bases: object

This class fetches gene names and descriptions for genes.

add_names()¶

This function adds gene names and descriptions to DEGs.

Returns:	DEGs file with gene names and descriptions.
Return type:	DataFrame

add_names_annotation(file)¶

This function adds gene names and descriptions to a functional annotation.

Returns:	Functional annotation file with gene names and descriptions.
Return type:	DataFrame

pyseqrna.differential_expression.degFilter(degDF=None, CompareList=None, FDR=0.05, FOLD=2, plot=True, figsize=(10, 6), replicate=True, mmg=False, extraColumns=False)¶

This function filters the all-gene expression file based on the given FOLD and FDR.

Parameters:

degDF – A DataFrame containing differential expression of all genes in all combinations.
CompareList – A list of all the sample comparisons.
FDR – False Discovery Rate for filtering DEGs. Defaults to 0.05.
FOLD – Fold change value. The log2 of the value will be calculated. Defaults to 2.
plot – True to plot DEGs per sample on a barplot. Defaults to True.
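
The filtering rule can be pictured with a small sketch: a gene is kept when its FDR is at or below the cutoff and its absolute log2 fold change is at least log2(FOLD). The row layout, the column names "logFC" and "FDR", and the use of inclusive comparisons are assumptions for illustration; they are not taken from pySeqRNA.

```python
import math


def filter_degs(rows, fdr=0.05, fold=2):
    """Split rows into up- and down-regulated genes.

    rows is a list of dicts with hypothetical keys "Gene", "logFC" and "FDR".
    """
    threshold = math.log2(fold)  # FOLD=2 corresponds to |log2FC| >= 1
    significant = [r for r in rows if r["FDR"] <= fdr]
    up = [r for r in significant if r["logFC"] >= threshold]
    down = [r for r in significant if r["logFC"] <= -threshold]
    return up, down


table = [
    {"Gene": "g1", "logFC": 2.3, "FDR": 0.001},
    {"Gene": "g2", "logFC": -1.5, "FDR": 0.01},
    {"Gene": "g3", "logFC": 3.0, "FDR": 0.20},   # fails the FDR cutoff
    {"Gene": "g4", "logFC": 0.4, "FDR": 0.001},  # fails the fold cutoff
]
up, down = filter_degs(table)
print([r["Gene"] for r in up], [r["Gene"] for r in down])
```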

pyseqrna.differential_expression.runDESeq2(countDF=None, targetFile=None, design='sample', combination=None, gene_column='Gene', mmg=False, subset=False, lib=None)¶

This function is a wrapper around the DESeq2 package in R for differential expression analysis from raw read counts.

Parameters:

countDF – Raw read counts.
targetFile – Tab-delimited target file with replication and sample name.
design – Design for the DESeq2 analysis. Defaults to ‘sample’.
combination – Comparison list containing samples to compare.
gene_column – First column in the raw read count file. Defaults to ‘Gene’.
mmg – True if raw read counts are from multimapped gene groups.
subset – True if runDESeq2 should subset raw read counts according to the comparison.
lib – Library path of DESeq2 to use.

Returns:	A DataFrame containing differential expression of all genes in all combinations.
Return type:	DataFrame

pyseqrna.differential_expression.run_edgeR(countDF=None, targetFile=None, combination=None, gene_column='Gene', mmg=False, subset=False, replicate=True, bcv=0.4, lib=None)¶

This function is a wrapper around the edgeR package in R for differential expression analysis from raw read counts.

Parameters:

countDF – Raw read counts.
targetFile – Tab-delimited target file with replication and sample name.
combination – Comparison list containing samples to compare.
gene_column – First column in the raw read count file. Defaults to ‘Gene’.
mmg – True if raw read counts are from multimapped gene groups.
subset – True if run_edgeR should subset raw read counts according to the comparison.
replicate – False if there are no replicates.
bcv – Biological coefficient of variation to use if there are no replicates.
lib – Library path of edgeR to use.

Returns:	A DataFrame containing differential expression of all genes in all combinations.
Return type:	DataFrame

Functional Annotation Submodules¶

pyseqrna.gene_ontology module¶

Title:	gene_ontology module is for performing gene ontology enrichment analysis of differentially expressed genes
Created:	January 5, 2022
Author:	Naveen Duhan

class pyseqrna.gene_ontology.GeneOntology(species=None, type=None, keyType='ensembl', taxid=None, gff=None)¶

Bases: object

This class is for Gene Ontology enrichment

Parameters:

species – Species name, e.g. athaliana for Arabidopsis thaliana.
type – Whether the species is from plants or animals.
keyType – Whether genes are from NCBI or ENSEMBL. Default is ENSEMBL.
taxid – Taxonomy ID if keyType is NCBI.
gff – Gene feature file.

barplotGO(df=None, nrows=20, colorBy='logPvalues')¶

This function creates a barplot for Gene Ontology enrichment.

Parameters:

df – Gene Ontology enrichment file from the enrichGO function.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color bars on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a barplot

dotplotGO(df=None, nrows=20, colorBy='logPvalues')¶

This function creates a dotplot for Gene Ontology enrichment.

Parameters:

df – Gene Ontology enrichment file from the enrichGO function.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color dots on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a dotplot

enrichGO(file=None, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')¶

This function performs Gene Ontology enrichment of DEGs.

Parameters:

file – Differentially expressed genes in a sample.
pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
plot – True if a plot is needed. Default is True.
plotType – Gene Ontology enrichment visualization as dotplot/barplot. Default is dotplot.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color dots on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a dictionary
Rtype results:	Gene Ontology enrichment results.
Rtype plot:	a dotplot/barplot

pyseqrna.pathway module¶

Title:	This module is for performing KEGG pathway enrichment analysis of differentially expressed genes
Created:	January 10, 2022
Author:	Naveen Duhan

class pyseqrna.pathway.Pathway(species=None, keyType=None, gff=None)¶

Bases: object

This class is for KEGG Pathway enrichment

Parameters:

species – Species name, e.g. athaliana for Arabidopsis thaliana.
keyType – Whether genes are from NCBI or ENSEMBL. Default is ENSEMBL.
gff – Gene feature file.

barplotKEGG(df=None, nrows=20, colorBy='logPvalues')¶

This function creates a barplot for KEGG pathway enrichment.

Parameters:

df – KEGG pathway enrichment file from the enrichKEGG function.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color bars on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a barplot

dotplotKEGG(df=None, nrows=20, colorBy='logPvalues')¶

This function creates a dotplot for KEGG pathway enrichment.

Parameters:

df – KEGG pathway enrichment file from the enrichKEGG function.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color dots on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a dotplot

enrichKEGG(file, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')¶

This function performs KEGG pathway enrichment of DEGs.

Parameters:

file – Differentially expressed genes in a sample.
pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
plot – True if a plot is needed. Default is True.
plotType – KEGG pathway enrichment visualization as dotplot/barplot. Default is dotplot.
nrows – Number of rows to plot. Defaults to 20 rows.
colorBy – Color dots on plots by logPvalues / FDR. Defaults to ‘logPvalues’.

Returns:	a dictionary
Rtype results:	KEGG pathway enrichment results.
Rtype plot:	a dotplot/barplot

Utility Submodules¶

pyseqrna.arg_parser module¶

Title:	Argument parser module for pySeqRNA
Created:	October 11, 2021
Author:	Naveen Duhan

pyseqrna.pyseqrna_utils module¶

Title:	This module contains utility functions for pySeqRNA
Created:	July 11, 2021
Author:	Naveen Duhan

pyseqrna.pyseqrna_utils.PyseqrnaLogger(mode, log)¶

This function initializes the logger in the pySeqRNA modules.

Parameters:

mode – Logger name for the module.
log – File name for logging.

pyseqrna.pyseqrna_utils.add_MMG(degDF=None, anotDF=None, combination=None)¶

pyseqrna.pyseqrna_utils.change_attribute(args)¶

This function changes the attribute for feature counts based on a GFF or GTF file.

pyseqrna.pyseqrna_utils.change_ids(df, file)¶

This function changes ids to other ids.

Parameters:

df – A DataFrame containing IDs and synonym IDs.
file – A DataFrame in which IDs need to be replaced.

pyseqrna.pyseqrna_utils.check_files(*args)¶

This function checks if files exist.

Parameters:	args – List of files to check.
Returns:	True only if all files in the list exist.
Return type:	bool
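
The "all files must exist" behavior can be expressed compactly with os.path.isfile. The helper below is a stand-alone sketch with an assumed name, not the pySeqRNA function itself.

```python
import os
import tempfile


def all_files_exist(*paths):
    """Return True only if every given path is an existing regular file."""
    return all(os.path.isfile(p) for p in paths)


with tempfile.NamedTemporaryFile() as handle:
    # The temporary file exists for as long as the handle is open.
    print(all_files_exist(handle.name))
    print(all_files_exist(handle.name, handle.name + ".missing"))
```

Note that all() returns True for an empty argument list, so a caller may want to reject that case explicitly.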

pyseqrna.pyseqrna_utils.check_path(*args)¶

This function checks if directories exist.

Parameters:	args – List of directories to check.
Returns:	True only if all directories in the list exist.
Return type:	bool

pyseqrna.pyseqrna_utils.check_status(job_id)¶

This function checks the status of a slurm job.

Parameters:	job_id – Slurm job id.
Returns:	True if the job has completed. Default is False.
Return type:	bool

pyseqrna.pyseqrna_utils.clusterRun(job_name='pyseqRNA', sout=' pyseqrna', serror='pyseqrna', command='command', time=4, mem=10, cpu=8, tasks=1, dep='')¶

This function is for submitting job on cluster with SLURM job Scheduling

Parameters:

job_name – Slurm job name on HPC. Defaults to ‘pyseqRNA’.
command – Command to execute on HPC.
time – Slurm Job time allotment.
mem – Memory to use in GB.
cpu – Number of CPUs to use for the job.
tasks – Number of tasks to execute.
dep – Slurm Job dependency. Defaults to ‘’.

Returns:	Slurm sbatch ID
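
To show how the parameters above could map onto a SLURM submission, here is a hedged sketch that assembles an sbatch argument list and extracts the job ID from sbatch's "Submitted batch job <id>" message. The helper names are hypothetical, and treating time as hours and mem as gigabytes is an assumption; the sbatch options used (--job-name, --time, --mem, --ntasks, --cpus-per-task, --dependency, --wrap) are standard.

```python
def build_sbatch_command(command, job_name="pyseqRNA", time=4, mem=10,
                         cpu=8, tasks=1, dep=""):
    """Return an sbatch argument list suitable for subprocess.run()."""
    args = [
        "sbatch",
        f"--job-name={job_name}",
        f"--time={time}:00:00",       # assumption: time is given in hours
        f"--mem={mem}G",              # assumption: mem is given in GB
        f"--ntasks={tasks}",
        f"--cpus-per-task={cpu}",
    ]
    if dep:
        # Start only after the job this one depends on finished successfully.
        args.append(f"--dependency=afterok:{dep}")
    args.append(f"--wrap={command}")
    return args


def parse_job_id(sbatch_output):
    """Extract the numeric ID from 'Submitted batch job <id>'."""
    return int(sbatch_output.strip().split()[-1])


print(build_sbatch_command("fastqc sample.fastq.gz", dep="12345"))
print(parse_job_id("Submitted batch job 67890\n"))
```

Returning the job ID is what allows a later step to pass it as dep and chain jobs together.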

pyseqrna.pyseqrna_utils.findFiles(searchPATH=None, searchPattern=None, recursive=False, verbose=False)¶

This function searches for files.

Parameters:

searchPATH – Search directory.
searchPattern – Pattern to search.
recursive – True to search recursively. Defaults to False.
verbose – True to print output. Defaults to False.

pyseqrna.pyseqrna_utils.getFiles(pattern, path)¶

This function searches all files containing patterns in a directory.

Parameters:

pattern – Pattern to search.
path – A directory path.

Returns:	All files containing a pattern.

pyseqrna.pyseqrna_utils.getGenes(file=None, combinations=None, multisheet=True, geneType='all', outDir='.', mmg=False)¶

This function extracts genes from filtered differentially expressed genes.

Parameters:

file – Filtered DEGs file.
combinations – Comparison list containing samples.
multisheet – True if file is multisheet.
geneType – Genes to extract: all, up, or down. Default is all.
outDir – Output directory. Default is the current working directory.

pyseqrna.pyseqrna_utils.get_basename(filePATH)¶

This function gets the base name of the file from its full path.

Parameters:	filePATH – Path to file.

pyseqrna.pyseqrna_utils.get_cpu()¶

This function gets the CPU count of the system.

Returns:	80% of the system CPU count.
Return type:	int
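
One plausible way to obtain 80% of the available CPUs with the standard library is shown below. The function name and the choice to round down and never return less than one are assumptions for this sketch.

```python
import os


def usable_cpus(fraction=0.8):
    """Return a share of the system's CPUs, but at least one."""
    total = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, int(total * fraction))


print(usable_cpus())
```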

pyseqrna.pyseqrna_utils.get_directory(filePATH)¶

This function returns the directory of a file.

Parameters:	filePATH – Path to file.

pyseqrna.pyseqrna_utils.get_file_extension(filePATH)¶

This function returns the extension of a file.

Parameters:	filePATH – Path to file.

pyseqrna.pyseqrna_utils.get_parent(filePATH)¶

This function returns the file name without its extension.

Parameters:	filePATH – Path to file.
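
The four path helpers above (base name, directory, extension, name without extension) correspond closely to functions in os.path. The sketch below uses assumed helper names and shows the standard-library behavior; how pySeqRNA treats double extensions such as .fastq.gz is not stated in the source.

```python
import os


def base_name(path):
    return os.path.basename(path)


def directory(path):
    return os.path.dirname(path)


def file_extension(path):
    # splitext only splits off the last extension.
    return os.path.splitext(path)[1]


def name_without_extension(path):
    return os.path.splitext(os.path.basename(path))[0]


path = "/data/reads/sample1.fastq.gz"
print(base_name(path))
print(directory(path))
print(file_extension(path))
print(name_without_extension(path))
```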

pyseqrna.pyseqrna_utils.make_directory(dir)¶

This function creates a directory.

Parameters:	dir – Directory name.
Returns:	Name of created directory.

pyseqrna.pyseqrna_utils.parse_config_file(infile)¶

This function parses the config file for all the programs used in pySeqRNA.

Parameters:	infile – <program>.ini config file containing arguments.
Returns:	Program-specific arguments.
Return type:	a dictionary
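
Reading an .ini file into a dictionary can be done with the standard configparser module, as in the sketch below. The section and option names in the example text are invented for illustration; the actual layout of pySeqRNA's <program>.ini files is not shown in this document.

```python
import configparser


def read_ini(text):
    """Parse INI-formatted text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


example = """
[fastqc]
threads = 8
format = fastq
"""
print(read_ini(example))
```

All values come back as strings, so numeric options need an explicit conversion before use.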

pyseqrna.pyseqrna_utils.parse_gff(file)¶

This function parses a gene feature file into a DataFrame of gene IDs.

Parameters:	file – A gene feature file.

pyseqrna.pyseqrna_utils.read_input_file(infile, inpath, paired=False)¶

This function reads the input sample file and converts it into a dictionary. It also makes all possible combinations for DEG analysis and a target dataframe for differential analysis.

Parameters:

infile – Input sample file containing information about the project.
inpath – Path to the input fastq files.
paired – True if reads are paired-end. Defaults to False.

Returns:	Samples, combinations and targets for differential expression.
Return type:	A dictionary
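
"All possible combinations" of sample groups can be generated with itertools.combinations. The sketch below is illustrative only: the function name and the "A-B" label format for a comparison are assumptions, not the format pySeqRNA necessarily uses.

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """Return every unordered pair of distinct sample groups as 'A-B'."""
    unique = list(dict.fromkeys(groups))  # drop repeats, keep first-seen order
    return [f"{a}-{b}" for a, b in combinations(unique, 2)]


# One entry per replicate, as it might appear in a sample sheet.
sample_groups = ["ctrl", "ctrl", "heat", "heat", "cold", "cold"]
print(pairwise_comparisons(sample_groups))
```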

pyseqrna.pyseqrna_utils.replace_cpu(args, args2)¶

This function replaces the CPU count in the config file.

Returns:	The CPU count changed to 80% of the available CPUs.

pyseqrna.version module¶

Title:	This is the version of pySeqRNA
Created:	September 15, 2021
Author:	Naveen Duhan

pyseqrna.pyseqrna_plots module¶

Title:	This module generates visualizations
Created:	November 2, 2021
Author:	Naveen Duhan

pyseqrna.pyseqrna_plots.plotHeatmap(degDF=None, combinations=None, num=50, figdim=(12, 10), extraColumns=False, type='counts')¶

This function plots a heatmap based on FOLD change or counts.

Parameters:

degDF – All gene expression file or counts file.
combinations – All sample combinations.
num – Total number of genes to plot. Default is 50 (25 up and 25 down).
figdim – Figure dimensions.
type – Heatmap to create on counts/degs.

pyseqrna.pyseqrna_plots.plotMA(degDF=None, countDF=None, comp=None, FOLD=2, FDR=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)¶

This function plots an MA plot.

Parameters:

degDF – All gene expression file.
comp – Sample comparison.
FOLD – FOLD change. Defaults to 2.
FDR – FDR value. Defaults to 0.05.
color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
dim – Dimensions of the plot. Defaults to (8,5).
dotsize – Dotsize on plot. Defaults to 8.
markerType – Shape to use. Defaults to ‘o’.
alpha – Transparency of plot. Defaults to 0.5.

pyseqrna.pyseqrna_plots.plotVenn(DEGFile=None, FOLD=2, comparisons=None, degLabel='', fontsize=14, figsize=(12, 12), dpi=300)¶

This function plots a Venn diagram for filtered degs in samples.

Parameters:

DEGFile – Filtered DEG Excel file containing samples sheet-wise.
FOLD – FOLD change. Defaults to 2.
comparisons – Comparison list. Defaults to None.
degLabel – How to place labels, either total or up-down. Defaults to “” i.e. up-down.
fontsize – Font size. Defaults to 14.
figsize – Figure size. Defaults to (12,12).
dpi – Figure DPI resolution. Defaults to 300.

pyseqrna.pyseqrna_plots.plotVolcano(degDF=None, comp=None, FOLD=2, pValue=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)¶

This function plots a Volcano plot.

Parameters:

degDF – All gene expression file.
comp – Sample comparison.
FOLD – FOLD change. Defaults to 2.
pValue – Pvalues. Defaults to 0.05.
color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
dim – Dimensions of the plot. Defaults to (8,5).
dotsize – Dotsize on plot. Defaults to 8.
markerType – Shape to use. Defaults to ‘o’.
alpha – Transparency of plot. Defaults to 0.5.
