Quality Submodules¶
pyseqrna.quality_check module¶
Title: | This module contains the read quality check function |
---|---|
Created: | May 19, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.quality_check.
fastqcRun
(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, pairedEND=False, afterTrim=False, outDir=None, dep='')¶ This function performs read quality checks using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
Parameters: - sampleDict – Samples dictionary generated by pyseqrna_utils.read_input_file function.
- configFile – Parameters file for FastQC tool. Default from pyseqrna params
- slurm – True to enable slurm job scheduling on HPC
- mem – Memory to use if slurm is True.
- cpu – Number of threads to use if slurm is True.
- task – Number of tasks per cpu if slurm is True.
- pairedEND – True if samples are paired
- afterTrim – True if checking quality after trimming.
- outDir – Output directory for results. Default is current working directory.
- dep – Slurm job dependency.
pyseqrna.quality_trimming module¶
Title: | This module contains read quality trimming functions for pySeqRNA |
---|---|
Created: | May 20, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.quality_trimming.
flexbarRun
(sampleDict, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶ This function performs adapter- and quality-based trimming of reads using the flexbar trimming tool (https://github.com/seqan/flexbar)
Parameters: - sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
- configFile – A config file for flexbar parameters. Default is flexbar.ini from param.
- slurm – True if using slurm to schedule jobs.
- mem – Provide memory in GB to use. Default 10 GB.
- task – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- paired – True if samples are paired.
- outDir – Output directory for results. Default is current working directory.
- dep – slurm job id on which this job depends. Defaults to ‘’.
-
pyseqrna.quality_trimming.
trim_galoreRun
(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶ This function performs adapter- and quality-based trimming of reads using the Trim Galore trimming tool
Parameters: - sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
- configFile – A config file for Trim Galore parameters. If None, a default from param is used.
- slurm – True if using slurm to schedule jobs.
- mem – Provide memory in GB to use. Default 10 GB.
- task – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- paired – True if samples are paired.
- outDir – Output directory for results. Default is current working directory.
- dep – slurm job id on which this job depends. Defaults to ‘’.
-
pyseqrna.quality_trimming.
trimmomaticRun
(sampleDict=None, configFile=None, slurm=False, mem=10, cpu=8, task=1, paired=False, outDir=None, dep='')¶ This function performs adapter- and quality-based trimming of reads using the Trimmomatic trimming tool
Parameters: - sampleDict – A dictionary containing sample information generated by pyseqrna_utils.read_input_file function.
- configFile – A config file for Trimmomatic parameters. If None, a default from param is used.
- slurm – True if using slurm to schedule jobs.
- mem – Provide memory in GB to use. Default 10 GB.
- task – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- paired – True if samples are paired.
- outDir – Output directory for results. Default is current working directory.
- dep – slurm job id on which this job depends. Defaults to ‘’.
pyseqrna.ribosomal module¶
Title: | This module contains the SortMeRNA function for removing ribosomal RNA from reads. |
---|---|
Created: | July 31, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.ribosomal.
sortmernaRun
(sampleDict=None, outDir='.', rnaDatabases=None, pairedEND=False, slurm=False, mem=10, cpu=8, task=1, dep='')¶ This function executes SortMeRNA to remove ribosomal RNA from fastq reads.
Parameters: - sampleDict – Sample dictionary containing sample information
- outDir – Output directory. Defaults to present working directory.
- pairedEND – True if samples are paired-end. Defaults to False
- slurm – True if SLURM scheduling is available. Defaults to False
- mem – Memory in GB. Defaults to 10
- cpu – Total number of CPU to use per task. Defaults to 8
- task – Number of tasks per job. Defaults to 1
- dep – Slurm job id if depends on other job. Defaults to ‘’
Alignment Submodules¶
pyseqrna.aligners module¶
Title: | This module contains read alignment classes for pySeqRNA |
---|---|
Created: | July 21, 2021 |
Author: | Naveen Duhan |
-
class
pyseqrna.aligners.
STAR_Aligner
(genome=None, configFile=None, outDir=None, slurm=False)¶ Bases:
object
Class for STAR alignment program
Parameters: - configFile – Path to STAR config file. This file will be used to get the parameters for the STAR alignment program.
- slurm – To run commands with slurm task-scheduler.
-
build_index
(mem=20, tasks=1, cpu=8, gff=None, dep='')¶ This function builds the genome index for read alignment
Parameters: - mem – Provide memory in GB to use. Defaults to 20.
- tasks – Number of cpu-tasks to run. Defaults to 1.
- gff – Gene feature file to index with genome. Defaults to None.
- cpu – Total number of threads to use. Default 8.
- dep – slurm job id on which this job depends. Defaults to ‘’.
-
check_index
()¶ Function to check if the STAR index exists and is valid.
Returns: True if the genome index is valid.
-
run_Alignment
(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')¶ This function aligns reads against the indexed reference genome.
Parameters: - target – target dictionary containing sample information.
- pairedEND – True if samples are paired.
- mem – Provide memory in GB to use. Default 20 GB.
- tasks – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- dep – slurm job id on which this job depends. Defaults to ‘’.
-
class
pyseqrna.aligners.
hisat2_Aligner
(genome=None, configFile=None, outDir='pySeqRNA_results', slurm=False)¶ Bases:
object
Class for HISAT2 alignment program
Parameters: - configFile – Path to HISAT2 config file. This file will be used to get the parameters for the HISAT2 alignment program.
- slurm – To run commands with slurm task-scheduler.
-
build_index
(mem=8, tasks=1, cpu=8, dep='')¶ This function builds the genome index for read alignment
Parameters: - mem – Provide memory in GB to use. Defaults to 8.
- tasks – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- dep – slurm job id on which this job depends. Defaults to ‘’.
-
check_index
(largeIndex=False)¶ Function to check if the HISAT2 index exists and is valid.
Parameters: largeIndex – True if the genome is indexed with a large index. Returns: True if the genome index is valid.
-
run_Alignment
(target=None, pairedEND=False, mem=20, cpu=8, tasks=1, dep='')¶ This function aligns reads against the indexed reference genome.
Parameters: - target – target dictionary containing sample information.
- pairedEND – True if samples are paired.
- mem – Provide memory in GB to use. Default 20 GB.
- tasks – Number of cpu-tasks to run. Defaults to 1.
- cpu – Total number of threads to use. Default 8.
- dep – slurm job id on which this job depends. Defaults to ‘’.
pyseqrna.pyseqrna_stats module¶
Title: | This module generates read alignment statistics |
---|---|
Created: | September 10, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.pyseqrna_stats.
align_stats
(sampleDict=None, trimDict=None, bamDict=None, riboDict=None, pairedEND=False)¶ This function calculates the alignment statistics
Parameters: - sampleDict – Raw Reads sample dictionary containing all samples
- trimDict – Dictionary containing trimmed samples
- bamDict – Dictionary containing all samples bam files.
- riboDict – Dictionary containing filtered reads.
Returns: A DataFrame containing alignment statistics
Return type: DataFrame
Quantification Submodules¶
pyseqrna.quantification module¶
Title: | This module contains feature counting functions for aligned reads in pySeqRNA |
---|---|
Created: | July 29, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.quantification.
featureCount
(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')¶ This function counts features in the aligned BAM files using the featureCounts tool.
Parameters: - configFile – Parameters file for featureCounts.
- bamDict – A dictionary containing all the aligned BAM files.
- gff – Gene feature file GFF/GTF
- slurm – True if SLURM scheduling is available. Defaults to False
- mem – Memory in GB. Defaults to 8
- cpu – Total number of CPU to use per task. Defaults to 8
- tasks – Number of tasks per job. Defaults to 1
- outDir – Output directory. Defaults to present working directory.
- dep – Slurm job id if depends on other job. Defaults to ‘’
Returns: A DataFrame containing read counts per feature per sample.
Return type: DataFrame
-
pyseqrna.quantification.
htseqCount
(configFile=None, bamDict=None, gff=None, slurm=False, mem=8, cpu=8, tasks=1, outDir='.', dep='')¶ This function counts features in the aligned BAM files using HTSeq.
Parameters: - configFile – Parameters file for HTSeq.
- bamDict – A dictionary containing all the aligned BAM files.
- gff – Gene feature file GFF/GTF
- slurm – True if SLURM scheduling is available. Defaults to False
- mem – Memory in GB. Defaults to 8
- cpu – Total number of CPU to use per task. Defaults to 8
- tasks – Number of tasks per job. Defaults to 1
- outDir – Output directory. Defaults to present working directory.
- dep – Slurm job id if depends on other job. Defaults to ‘’
Returns: A DataFrame containing read counts per feature per sample.
Return type: DataFrame
pyseqrna.multimapped_groups module¶
Title: | This module counts multimapped read groups in aligned files. |
---|---|
Created: | June 5, 2022 |
Author: | Naveen Duhan |
-
pyseqrna.multimapped_groups.
countMMG
(sampleDict=None, bamDict=None, gff=None, feature='gene', minCount=100, percentSample=0.5)¶ This function calculates multimapped gene groups.
Parameters: - sampleDict – a dictionary containing samples information.
- bamDict – a dictionary containing BAM files.
- gff – gene feature file.
- feature – feature type.
- minCount – minimum number of reads per sample.
- percentSample – minimum number of reads in percent sample.
Returns: A DataFrame containing multimapped read group counts.
Return type: DataFrame
pyseqrna.normalize_counts module¶
Title: | This module converts raw read counts to normalized counts |
---|---|
Created: | August 3, 2021 |
Author: | Naveen Duhan |
-
class
pyseqrna.normalize_counts.
Normalization
(countFile=None, featureFile=None, typeFile='GFF', keyType='ncbi', attribute='ID', feature='gene', geneColumn='Gene')¶ Bases:
object
This class is for calculation of normalized counts from raw counts
-
CPM
(plot=True, figsize=(20, 10))¶ This function converts counts to counts per million (CPM)
Parameters: - plot – True if to plot log raw counts and log CPM counts on boxplot.
- figsize – Figure size.
-
RPKM
(plot=True, figsize=(20, 10))¶ This function converts counts to reads per kilobase per million (RPKM)
Parameters: - plot – True if to plot log raw counts and log RPKM counts on boxplot.
- figsize – Figure size.
-
TPM
(plot=True, figsize=(20, 10))¶ This function converts counts to transcripts per million (TPM)
Parameters: - plot – True if to plot log raw counts and log TPM counts on boxplot.
- figsize – Figure size.
-
meanRatioCount
(plot=True, figsize=(20, 10))¶ This function converts counts to medianRatio counts
Parameters: - plot – True if to plot log raw counts and log medianRatio counts on boxplot.
- figsize – Figure size.
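As a rough illustration of what the CPM, RPKM and TPM conversions above compute, the following is a minimal sketch using the standard textbook definitions. It is not taken from pySeqRNA: the helper names, gene names, counts and gene lengths are all made up for the example, and the library's own implementation may differ in details such as how gene lengths are derived from the feature file.

```python
# Illustrative only: not pySeqRNA's code. Gene names, counts and lengths are made up.

def cpm(counts):
    """Counts per million for one sample: count / library size * 1e6."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e9 / (total * lengths_bp[gene]) for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    scale = sum(rate.values())
    return {gene: r / scale * 1e6 for gene, r in rate.items()}


counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

TPM values always sum to one million within a sample, which is the main practical difference from RPKM.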
-
-
pyseqrna.normalize_counts.
boxplot
(data=None, countType=None, figsize=(20, 10), **kwargs)¶ This function makes a boxplot with boxes colored according to the countType they belong to
Parameters: - data – List containing log data of raw and normalized read counts.
- countType – Columns data of raw and normalized read counts.
- figsize – Figure size.
- kwargs – Other optional arguments for boxplot.
pyseqrna.clustering module¶
Title: | This module generates a similarity dendrogram between samples |
---|---|
Created: | August 2, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.clustering.
clusterSample
(countDF=None)¶ Function to cluster samples based on similarity
Parameters: countDF – A dataframe of read counts or normalized read counts. Returns: A clustered dendrogram of samples
-
pyseqrna.clustering.
leaf_label
(temp)¶ Function to generate leaf labels for the dendrogram
Parameters: temp – Temp leaves arrangement for dendrogram Returns: Leaf labels for dendrogram.
Differential Genes Submodules¶
pyseqrna.differential_expression module¶
Title: | This module finds differentially expressed genes from raw read counts |
---|---|
Created: | October 22, 2021 |
Author: | Naveen Duhan |
-
class
pyseqrna.differential_expression.
Gene_Description
(species, type, combinations=None, degFile=None, filtered=True)¶ Bases:
object
This class fetches gene names and descriptions for genes.
-
add_names
()¶ This function adds gene name and description in DEGs.
Returns: DEGs file with gene name and description. Return type: DataFrame
-
add_names_annotation
(file)¶ This function adds gene name and description in functional annotation.
Returns: Functional annotation file with gene name and description. Return type: DataFrame
-
-
pyseqrna.differential_expression.
degFilter
(degDF=None, CompareList=None, FDR=0.05, FOLD=2, plot=True, figsize=(10, 6), replicate=True, mmg=False, extraColumns=False)¶ This function filters the all-gene expression file based on the given FOLD and FDR
Parameters: - degDF – A DataFrame containing differential expression of all genes in all combinations.
- CompareList – A list of all the sample comparison.
- FDR – False Discovery Rate for filtering DEGs. Defaults to 0.05.
- FOLD – Fold change value. The log2 of the value will be calculated. Defaults to 2.
- plot – True if want to plot DEGs per sample on barplot. Defaults to True.
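The FOLD and FDR thresholds described above can be sketched as follows. This is an assumption-laden illustration rather than the degFilter implementation: the column names (`Gene`, `logFC`, `FDR`), the helper name `filter_degs` and the use of inclusive comparisons are all hypothetical. Only the idea of comparing the log2 fold change against log2(FOLD) comes from the parameter description.

```python
import math

# Hypothetical rows; "Gene", "logFC" and "FDR" are assumed column names,
# not necessarily those used in pySeqRNA output files.
rows = [
    {"Gene": "g1", "logFC": 2.5, "FDR": 0.001},
    {"Gene": "g2", "logFC": -1.2, "FDR": 0.010},
    {"Gene": "g3", "logFC": 0.4, "FDR": 0.001},   # fold change too small
    {"Gene": "g4", "logFC": 3.0, "FDR": 0.200},   # not significant
]


def filter_degs(rows, fdr=0.05, fold=2):
    """Split rows into up- and down-regulated genes by FDR and fold change."""
    cutoff = math.log2(fold)  # FOLD=2 corresponds to |log2FC| >= 1
    up, down = [], []
    for row in rows:
        if row["FDR"] > fdr:
            continue
        if row["logFC"] >= cutoff:
            up.append(row["Gene"])
        elif row["logFC"] <= -cutoff:
            down.append(row["Gene"])
    return up, down


up_genes, down_genes = filter_degs(rows)
```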
-
pyseqrna.differential_expression.
runDESeq2
(countDF=None, targetFile=None, design='sample', combination=None, gene_column='Gene', mmg=False, subset=False, lib=None)¶ This function is a wrapper for the DESeq2 package in R for differential expression analysis from raw read counts.
Parameters: - countDF – Raw read counts.
- targetFile – Tab-delimited target file with replication and sample name
- design – [description]. Defaults to ‘sample’.
- combination – Comparison list containing samples to compare.
- gene_column – First column in raw read count file. Defaults to ‘Gene’.
- mmg – True if raw read counts are from multimapped gene groups.
- subset – True to subset raw read counts according to comparison.
- lib – library path of DESeq2 to use.
Returns: A DataFrame containing differential expression of all genes in all combinations.
Return type: DataFrame
-
pyseqrna.differential_expression.
run_edgeR
(countDF=None, targetFile=None, combination=None, gene_column='Gene', mmg=False, subset=False, replicate=True, bcv=0.4, lib=None)¶ This function is a wrapper for the edgeR package in R for differential expression analysis from raw read counts.
Parameters: - countDF – Raw read counts.
- targetFile – Tab-delimited target file with replication and sample name
- combination – Comparison list containing samples to compare.
- gene_column – First column in raw read count file. Defaults to ‘Gene’.
- mmg – True if raw read counts are from multimapped gene groups.
- subset – True to subset raw read counts according to comparison.
- replicate – False if there are no replicates.
- bcv – Biological coefficient of variation to use if there are no replicates.
- lib – library path of edgeR to use.
Returns: A DataFrame containing differential expression of all genes in all combinations.
Return type: DataFrame
Functional Annotation Submodules¶
pyseqrna.gene_ontology module¶
Title: | gene_ontology module is for performing gene ontology enrichment analysis of differentially expressed genes |
---|---|
Created: | January 5, 2022 |
Author: | Naveen Duhan |
-
class
pyseqrna.gene_ontology.
GeneOntology
(species=None, type=None, keyType='ensembl', taxid=None, gff=None)¶ Bases:
object
This class is for Gene Ontology enrichment
Parameters: - species – Species name. Ex. for Arabidopsis thaliana it is athaliana
- type – Species is from plants or animals.
- keyType – Genes are from NCBI or ENSEMBL. Default is ENSEMBL.
- taxid – Taxonomy ID if keyType is NCBI.
- gff – Gene feature file.
-
barplotGO
(df=None, nrows=20, colorBy='logPvalues')¶ This function creates a barplot for Gene Ontology enrichment.
Parameters: - df – Gene Ontology enrichment file from enrichGO function.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color bar on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a barplot
-
dotplotGO
(df=None, nrows=20, colorBy='logPvalues')¶ This function creates a dotplot for Gene Ontology enrichment.
Parameters: - df – Gene Ontology enrichment file from enrichGO function.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a dotplot
-
enrichGO
(file=None, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')¶ This function performs Gene Ontology enrichment of DEGs.
Parameters: - file – Differentially expressed genes in a sample.
- pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
- plot – True if a plot is needed. Default is True.
- plotType – Gene Ontology enrichment visualization on dotplot/barplot. Default is dotplot.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a dictionary
Rtype results: Gene Ontology enrichment results.
Rtype plot: a dotplot/barplot
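The source does not say which statistical test enrichGO applies. One common approach to over-representation analysis is a one-sided hypergeometric test, sketched below purely as background. The function name and the numbers are hypothetical and this should not be read as the test pySeqRNA actually uses.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background
    K: background genes carrying the term
    n: differentially expressed genes tested
    k: tested genes carrying the term
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Made-up numbers: 20 background genes, 5 carry the term,
# 4 DEGs were tested and 2 of them carry the term.
p_value = hypergeom_pvalue(k=2, n=4, K=5, N=20)
enriched = p_value <= 0.05  # compare against a cutoff such as pvalueCutoff
```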
pyseqrna.pathway module¶
Title: | This module is for performing KEGG pathway enrichment analysis of differentially expressed genes |
---|---|
Created: | January 10, 2022 |
Author: | Naveen Duhan |
-
class
pyseqrna.pathway.
Pathway
(species=None, keyType=None, gff=None)¶ Bases:
object
This class is for KEGG Pathway enrichment
Parameters: - species – Species name. Ex. for Arabidopsis thaliana it is athaliana
- keyType – Genes are from NCBI or ENSEMBL. Defaults to None.
- gff – Gene feature file.
-
barplotKEGG
(df=None, nrows=20, colorBy='logPvalues')¶ This function creates a barplot for KEGG pathway enrichment.
Parameters: - df – KEGG pathway enrichment file from enrichKEGG function.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color bar on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a barplot
-
dotplotKEGG
(df=None, nrows=20, colorBy='logPvalues')¶ This function creates a dotplot for KEGG pathway enrichment.
Parameters: - df – KEGG pathway enrichment file from enrichKEGG function.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a dotplot
-
enrichKEGG
(file, pvalueCutoff=0.05, plot=True, plotType='dotplot', nrows=20, colorBy='logPvalues')¶ This function performs KEGG pathway enrichment of DEGs.
Parameters: - file – Differentially expressed genes in a sample.
- pvalueCutoff – P-value cutoff for enrichment. Default is 0.05.
- plot – True if a plot is needed. Default is True.
- plotType – KEGG pathway enrichment visualization on dotplot/barplot. Default is dotplot.
- nrows – Number of rows to plot. Default to 20 rows.
- colorBy – Color dot on plots with logPvalues / FDR. Defaults to ‘logPvalues’.
Returns: a dictionary
Rtype results: KEGG pathway enrichment results.
Rtype plot: a dotplot/barplot
Utility Submodules¶
pyseqrna.arg_parser module¶
Title: | Argument parser module for pySeqRNA |
---|---|
Created: | October 11, 2021 |
Author: | Naveen Duhan |
pyseqrna.pyseqrna_utils module¶
Title: | This module contains utility functions for pySeqRNA |
---|---|
Created: | July 11, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.pyseqrna_utils.
PyseqrnaLogger
(mode, log)¶ This function initializes the logger in the pySeqRNA modules
Parameters: - mode – Logger name for the module
- log – File name for logging
-
pyseqrna.pyseqrna_utils.
add_MMG
(degDF=None, anotDF=None, combination=None)¶
-
pyseqrna.pyseqrna_utils.
change_attribute
(args)¶ This function changes the attribute for feature counts based on GFF or GTF file.
-
pyseqrna.pyseqrna_utils.
change_ids
(df, file)¶ This function changes ids to other ids.
Parameters: - df – A DataFrame containing IDs and synonym IDs.
- file – A DataFrame in which IDs needs to be replaced.
-
pyseqrna.pyseqrna_utils.
check_files
(*args)¶ This function checks if files exist
Parameters: args – List of files to check. Returns: True or False. Return type: True only if all files in the list exist.
-
pyseqrna.pyseqrna_utils.
check_path
(*args)¶ This function checks if directories exist
Parameters: args – List of directories to check. Returns: True or False. Return type: True only if all directories in the list exist.
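The all-or-nothing behaviour described for check_files and check_path can be pictured with the standard library as below. This is a sketch under assumptions, not the pySeqRNA source: the helper names are hypothetical and the temporary files exist only for the demonstration.

```python
import os
import tempfile


def all_files_exist(*paths):
    """True only if every path is an existing regular file."""
    return all(os.path.isfile(p) for p in paths)


def all_dirs_exist(*paths):
    """True only if every path is an existing directory."""
    return all(os.path.isdir(p) for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "reads.fastq")
    open(existing, "w").close()  # create an empty placeholder file
    missing = os.path.join(tmp, "missing.fastq")

    files_ok = all_files_exist(existing)
    files_bad = all_files_exist(existing, missing)  # one missing file fails the check
    dirs_ok = all_dirs_exist(tmp)
```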
-
pyseqrna.pyseqrna_utils.
check_status
(job_id)¶ This function checks the status of a slurm job
Parameters: job_id – slurm job id Returns: True/False Return type: True if the job completed. Defaults to False.
-
pyseqrna.pyseqrna_utils.
clusterRun
(job_name='pyseqRNA', sout=' pyseqrna', serror='pyseqrna', command='command', time=4, mem=10, cpu=8, tasks=1, dep='')¶ This function is for submitting job on cluster with SLURM job Scheduling
Parameters: - job_name – Slurm job name on HPC. Defaults to ‘pyseqRNA’.
- command – Command to execute on HPC.
- time – Slurm Job time allotment.
- mem – Memory to use in GB.
- cpu – Number of CPUs to use for the job.
- tasks – Number of tasks to execute.
- dep – Slurm Job dependency. Defaults to ‘’.
Returns: Slurm sbatch job ID
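To show how parameters like these usually map onto a SLURM submission, here is a minimal sketch that only builds an `sbatch` argument list and does not submit anything. The flags are standard `sbatch` options, but the helper name, the choice of `afterok` for the dependency type and the interpretation of the time value as hours are assumptions, not documented pySeqRNA behaviour.

```python
import shlex


def build_sbatch(command, job_name="pyseqRNA", hours=4, mem_gb=10,
                 cpu=8, tasks=1, dep=""):
    """Build (but do not run) an sbatch argument list for a wrapped command."""
    args = [
        "sbatch",
        f"--job-name={job_name}",
        f"--time={hours}:00:00",     # assumes the time value is given in hours
        f"--mem={mem_gb}G",
        f"--cpus-per-task={cpu}",
        f"--ntasks={tasks}",
    ]
    if dep:
        # assumes the job should start only after the dependency succeeded
        args.append(f"--dependency=afterok:{dep}")
    args.append(f"--wrap={command}")
    return args


args = build_sbatch("fastqc sample_1.fastq.gz", dep="12345")
printable = shlex.join(args)  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run` on a machine with SLURM installed would submit the job; `sbatch` then prints the new job ID, which can be fed to the next step as `dep`.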
-
pyseqrna.pyseqrna_utils.
findFiles
(searchPATH=None, searchPattern=None, recursive=False, verbose=False)¶ This function searches for files.
Parameters: - searchPATH – Search directory.
- searchPattern – Pattern to search.
- recursive – True if want to search recursively. Defaults to False.
- verbose – True if want to print output. Defaults to False.
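The difference between a flat and a recursive search can be sketched with `glob` as below. This is a guess at the general technique, not the findFiles implementation: the helper name and the assumption that the pattern is a shell-style glob are both hypothetical.

```python
import glob
import os
import tempfile


def find_files(search_path, pattern, recursive=False):
    """Return sorted paths under search_path whose names match a glob pattern."""
    if recursive:
        # "**" matches zero or more directory levels when recursive=True
        query = os.path.join(search_path, "**", pattern)
    else:
        query = os.path.join(search_path, pattern)
    return sorted(glob.glob(query, recursive=recursive))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "batch2"))
    for name in ("a.fastq", os.path.join("batch2", "b.fastq"), "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    top_level = [os.path.basename(p) for p in find_files(tmp, "*.fastq")]
    all_levels = [os.path.basename(p)
                  for p in find_files(tmp, "*.fastq", recursive=True)]
```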
-
pyseqrna.pyseqrna_utils.
getFiles
(pattern, path)¶ This function searches all files containing patterns in a directory.
Parameters: - pattern – Pattern to search.
- path – A directory path.
Returns: All files containing a pattern.
-
pyseqrna.pyseqrna_utils.
getGenes
(file=None, combinations=None, multisheet=True, geneType='all', outDir='.', mmg=False)¶ This function extracts genes from filtered differentially expressed genes.
Parameters: - file – Filtered DEGs file.
- combinations – Comparison list containing samples.
- multisheet – True if file is multisheet.
- outDir – Output directory. Default is current working directory.
- geneType – Genes to extract: all, up, or down. Default is all.
-
pyseqrna.pyseqrna_utils.
get_basename
(filePATH)¶ This function gets the base name of the file from the full path
Parameters: filePATH – Path to file.
-
pyseqrna.pyseqrna_utils.
get_cpu
()¶ This function gets the actual CPU count of the system
Returns: 80% of the CPU count Return type: int
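A value of this kind can be derived with `os.cpu_count`, as in the sketch below. The helper name, the rounding down and the floor of one thread are assumptions made for the example; the source only states that 80% of the CPU count is returned.

```python
import os


def eighty_percent_cpu():
    """Return roughly 80% of the available CPUs, never less than 1."""
    total = os.cpu_count() or 1      # cpu_count() may return None
    return max(1, int(total * 0.8))  # rounding down is an assumption


threads = eighty_percent_cpu()
```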
-
pyseqrna.pyseqrna_utils.
get_directory
(filePATH)¶ This function returns the directory of a file
Parameters: filePATH – Path to file.
-
pyseqrna.pyseqrna_utils.
get_file_extension
(filePATH)¶ This function returns the extension of a file
Parameters: filePATH – Path to file.
-
pyseqrna.pyseqrna_utils.
get_parent
(filePATH)¶ This function returns the file name without extension
Parameters: filePATH – Path to file.
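The four path helpers above (get_basename, get_directory, get_file_extension, get_parent) correspond to operations that `os.path` already provides. The sketch below shows those standard-library equivalents on a made-up path. How pySeqRNA itself treats double extensions such as `.fastq.gz` is not stated in the source, so the split shown here is only the `os.path.splitext` behaviour.

```python
import os

path = "/data/run1/sample_1.fastq.gz"  # hypothetical input path

basename = os.path.basename(path)             # file name with extension
directory = os.path.dirname(path)             # containing directory
stem, extension = os.path.splitext(basename)  # splits off only the last suffix
```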
-
pyseqrna.pyseqrna_utils.
make_directory
(dir)¶ This function creates a directory
Parameters: dir – Directory name. Returns: Name of created directory.
-
pyseqrna.pyseqrna_utils.
parse_config_file
(infile)¶ This function parses the config file for all the programs used in pySeqRNA
Parameters: infile – <program>.ini config file containing arguments. Returns: Program-specific arguments Return type: a dictionary
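Reading an `.ini` file into a dictionary can be done with `configparser`, as in this minimal sketch. The section name, option names and helper name are invented for illustration, and the real layout of pySeqRNA's `<program>.ini` files may differ.

```python
import configparser


def read_ini(text):
    """Parse INI-formatted text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical config content; not an actual pySeqRNA parameter file.
example = """
[fastqc]
threads = 8
format = fastq
"""

config = read_ini(example)
```

All values come back as strings, so numeric options need an explicit conversion before use.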
-
pyseqrna.pyseqrna_utils.
parse_gff
(file)¶ This function parses a gene feature file into a dataframe for gene IDs
Parameters: file – A gene feature file.
-
pyseqrna.pyseqrna_utils.
read_input_file
(infile, inpath, paired=False)¶ This function reads the input sample file and converts it into a dictionary. It also makes all possible combinations for DEG analysis and a target dataframe for differential analysis.
Parameters: - infile – Input sample file containing the information about the project
- inpath – Path for input fastq files
- paired – True if reads are paired-end. Defaults to False
Returns: samples, combinations and targets for differential expression
Return type: A dictionary
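The "all possible combinations" step can be pictured with `itertools.combinations`. The sketch below is an assumption about the general idea only: the group names, the helper name and the `A-B` label format are hypothetical, and the actual sample file layout is not described in this section.

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """Return every unordered pair of distinct groups as 'A-B' labels."""
    unique = list(dict.fromkeys(groups))  # drop replicates, keep first-seen order
    return [f"{a}-{b}" for a, b in combinations(unique, 2)]


# One entry per sample; replicates share a group name (made-up data).
groups = ["control", "control", "drought", "drought", "heat", "heat"]
comparisons = pairwise_comparisons(groups)
```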
-
pyseqrna.pyseqrna_utils.
replace_cpu
(args, args2)¶ This function replaces the CPU count in the config file.
Returns: CPU count changed to 80% of available CPUs
pyseqrna.version module¶
Title: | This is version of pySeqRNA |
---|---|
Created: | September 15, 2021 |
Author: | Naveen Duhan |
pyseqrna.pyseqrna_plots module¶
Title: | This module generates visualizations |
---|---|
Created: | November 2, 2021 |
Author: | Naveen Duhan |
-
pyseqrna.pyseqrna_plots.
plotHeatmap
(degDF=None, combinations=None, num=50, figdim=(12, 10), extraColumns=False, type='counts')¶ This function plots a heatmap based on FOLD change or counts.
Parameters: - degDF – All gene expression file or Counts file.
- combinations – All sample combinations.
- num – Total number of genes to plot. Default is 50 (25 up and 25 down)
- figdim – Figure dimensions.
- type – Heatmap to create on counts/degs.
-
pyseqrna.pyseqrna_plots.
plotMA
(degDF=None, countDF=None, comp=None, FOLD=2, FDR=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)¶ This function plots a MA plot.
Parameters: - degDF – All gene expression file.
- comp – Sample comparison.
- FOLD – FOLD change. Defaults to 2.
- FDR – FDR value. Defaults to 0.05.
- color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
- dim – Dimensions of the plot. Defaults to (8,5).
- dotsize – Dotsize on plot. Defaults to 8.
- markerType – Shape to use. Defaults to ‘o’.
- alpha – Transparency of plot. Defaults to 0.5.
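For readers unfamiliar with MA plots, the two axes are conventionally M (the log2 ratio between conditions) and A (the average log2 expression). The sketch below computes both for a single gene using the usual definitions. The helper name, the pseudocount and the input values are assumptions, and plotMA may derive its axes differently from degDF and countDF.

```python
import math


def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (M, A) for one gene from mean counts in two conditions."""
    log_a = math.log2(mean_a + pseudocount)  # pseudocount avoids log2(0)
    log_b = math.log2(mean_b + pseudocount)
    m = log_b - log_a        # log2 fold change (y-axis)
    a = (log_a + log_b) / 2  # average log2 expression (x-axis)
    return m, a


# Made-up mean counts: 15 in the reference condition, 63 in the treatment.
m_value, a_value = ma_values(15, 63)
```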
-
pyseqrna.pyseqrna_plots.
plotVenn
(DEGFile=None, FOLD=2, comparisons=None, degLabel='', fontsize=14, figsize=(12, 12), dpi=300)¶ This function plots a Venn diagram for filtered degs in samples.
Parameters: - DEGFile – Filtered DEG Excel file containing samples sheet-wise.
- FOLD – FOLD change. Defaults to 2.
- comparisons – Comparison list. Defaults to None.
- degLabel – How to put labels, either total or up-down. Defaults to “” i.e. up-down.
- fontsize – Font size. Defaults to 14.
- figsize – Figure size. Defaults to (12,12).
- dpi – Figure DPI resolution. Defaults to 300.
-
pyseqrna.pyseqrna_plots.
plotVolcano
(degDF=None, comp=None, FOLD=2, pValue=0.05, color=('red', 'grey', 'green'), dim=(8, 5), dotsize=8, markerType='o', alpha=0.5)¶ This function plots a Volcano plot.
Parameters: - degDF – All gene expression file.
- comp – Sample comparison.
- FOLD – FOLD change. Defaults to 2.
- pValue – Pvalues. Defaults to 0.05.
- color – Colors to be used in plot. Defaults to (‘red’,’grey’,’green’).
- dim – Dimensions of the plot. Defaults to (8,5).
- dotsize – Dotsize on plot. Defaults to 8.
- markerType – Shape to use. Defaults to ‘o’.
- alpha – Transparency of plot. Defaults to 0.5.