Nitto BioPharma develops and delivers innovative, life-transforming therapies for patients’ unmet medical needs, and accelerates the ability to bring these products to market.
AI Dynamics worked with Nitto BioPharma to white label a BLAST integration solution utilizing the company’s NeoPulse® end-to-end enterprise AI platform for experts and non-experts alike. Nitto BioPharma came to AI Dynamics requesting a tool to extract 19 base pair length potential siRNA sequences from a provided target protein sequence and rank them in order of inhibition value. We’ve developed a case study to document this project.
To learn more about how AI Dynamics’ NeoPulse platform can unlock the undisputed power of AI for your company, contact us today!
siRNA – Small interfering RNA (siRNA), sometimes known as short interfering RNA or silencing RNA, is a class of double-stranded RNA, typically 20-24 base pairs in length, that can regulate gene expression. It is an attractive new class of therapeutics because it can inhibit specific genes involved in many genetic diseases.
Inhibition value – The degree to which one substance reduces the activity of another. The IC50 is the concentration of a substance needed to reduce the activity of the second substance by 50%.
Nucleic acid sequence – A nucleic acid sequence is a succession of bases, written as a series of letters drawn from a set of five, that indicates the order of nucleotides forming alleles within a DNA (using GACT) or RNA (GACU) molecule. DNA is responsible for storing and transferring genetic information, while RNA acts as a messenger between DNA and ribosomes to make proteins. By convention, sequences are usually presented from the 5′ end to the 3′ end. For DNA, the sense strand is used. Because nucleic acids are normally linear (unbranched) polymers, specifying the sequence is equivalent to defining the covalent structure of the entire molecule. For this reason, the nucleic acid sequence is also termed the primary structure.
Call query – Represents specific operations that can be invoked to perform tasks, such as adding, updating or deleting data.
FASTA format – In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a near universal standard in the field of bioinformatics.
Entrez – Entrez is a molecular biology database system that provides integrated access to nucleotide and protein sequence data, gene-centered and genomic mapping information, 3D structure data, PubMed MEDLINE, and more. The system is produced by the National Center for Biotechnology Information (NCBI) and is available via the Internet.
HUGO Gene Nomenclature Committee – The HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication. For each known human gene HGNC approves a gene name and symbol (short-form abbreviation). All approved symbols are stored in the HGNC database, www.genenames.org, a curated online repository of HGNC-approved gene nomenclature, gene groups and associated resources including links to genomic, proteomic and phenotypic information.
ROC Curve – An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. The curve plots the true positive rate against the false positive rate.
R – R is the correlation coefficient, a statistical measure of how closely the values predicted by a regression track the observed values. It ranges from -1 to 1; its square (R²) ranges from 0 to 1 and indicates how much variation of a dependent variable is explained by the independent variable(s). What qualifies as a “good” R value will depend on the context.
AUC – Area under the ROC curve. AUC measures how well predictions are ranked rather than their absolute values, so it reflects the quality of the model’s predictions irrespective of what classification threshold is chosen. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
Nitto BioPharma, Inc., based in San Diego, develops and delivers innovative, life-transforming therapies for patients’ unmet medical needs, and accelerates the ability to bring these products to market. Nitto BioPharma is a subsidiary of Nitto Denko Corporation, based in Osaka, Japan.
siRNA silencing is considered one of the most promising techniques for future therapy of viral-mediated and gene-mediated diseases, such as HIV, HBV and cancer. The key to this technique is predicting siRNA inhibition efficiency and selecting the proper siRNA. AI Dynamics worked with Nitto BioPharma to white-label a BLAST integration solution built on the company’s NeoPulse® end-to-end enterprise AI platform for experts and non-experts alike.
Nitto BioPharma came to AI Dynamics requesting a tool to extract potential siRNA sequences 19 base pairs in length from a provided target protein sequence and rank them in order of inhibition value. After receiving the inhibition values, Nitto BioPharma wanted to submit the siRNA sequences with the highest inhibition values to BLAST (Basic Local Alignment Search Tool), which finds regions of similarity between biological sequences. BLAST compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
The NeoPulse platform takes a FASTA sequence (a text-based format for representing nucleotide sequences) or target name as input and returns the potential siRNAs ordered from highest predicted inhibition downward. A deep learning regression model is applied to predict the inhibition of a 19-nucleotide siRNA from its nucleic acid sequence. The model is built on the Transformer, a state-of-the-art NLP architecture, which significantly improves model performance. Transfer learning is applied in the training process, utilizing features extracted from more than 2 million human RNA sequences.
AI Dynamics created a model to predict the inhibition of a 19-nucleotide siRNA from its nucleic acid sequence. The project issued API call queries to analyze the imported nitto_reg and obtain inhibition values. To obtain the full FASTA sequences (the text-based format for representing nucleotide sequences using single-letter codes) and their annotations, AI Dynamics used the Entrez API (for access to the Entrez molecular biology database system, which provides integrated access to nucleotide sequence data) and the GeneNames API (for access to the database of the HUGO Gene Nomenclature Committee (HGNC), which is responsible for approving gene names and symbols for every known human gene).
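As a rough illustration of the sequence retrieval step, the sketch below builds a request URL for NCBI's public E-utilities `efetch` endpoint and parses FASTA text into records. It is not the NeoPulse implementation: the function names `efetch_fasta_url` and `parse_fasta` are hypothetical, the accession string is an arbitrary example, and the sample FASTA text is synthetic. No network request is made here.

```python
from typing import Dict
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL that asks for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    print(efetch_fasta_url("NM_000546"))  # example accession only
    sample = ">demo_record synthetic example\nACGTACGTAC\nGTACGTACGT\n"
    print(parse_fasta(sample))
```

In practice the URL would be fetched with an HTTP client and the response body passed to `parse_fasta`; NCBI also asks automated clients to respect its usage guidelines.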
To obtain BLAST results, Nitto BioPharma used the NCBI BLAST API to access the suite of programs that generate alignments between a nucleotide or protein sequence, referred to as the “query”, and nucleotide or protein sequences within a database, referred to as “subject” sequences.
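For readers unfamiliar with the NCBI BLAST URL API, a minimal sketch of its submit-then-poll pattern follows. It only prepares the request parameters and parses a response; it sends nothing over the network. The helper names are hypothetical, the choice of `blastn` against the `nt` database is an assumption rather than the configuration used in this project, and the sample response text is synthetic.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_params(query: str, program: str = "blastn",
                     database: str = "nt") -> Dict[str, str]:
    """Form fields for submitting a search (CMD=Put)."""
    return {"CMD": "Put", "PROGRAM": program,
            "DATABASE": database, "QUERY": query}


def parse_put_response(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract the request ID (RID) and estimated wait in seconds (RTOE)."""
    rid = re.search(r"RID = (\S+)", text)
    rtoe = re.search(r"RTOE = (\d+)", text)
    return (rid.group(1) if rid else None,
            int(rtoe.group(1)) if rtoe else None)


def build_get_url(rid: str, format_type: str = "XML") -> str:
    """URL for retrieving results of a submitted search (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Synthetic response fragment in the shape the Put step is expected to return.
    sample = "<!--QBlastInfoBegin\n    RID = ABC123XYZ\n    RTOE = 25\nQBlastInfoEnd\n-->"
    rid, wait_seconds = parse_put_response(sample)
    print(build_put_params("ACGUACGUACGUACGUACG"))
    print(rid, wait_seconds, build_get_url(rid))
```

A client would POST the `Put` fields, wait roughly the estimated time, then request the `Get` URL until results are ready.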
NeoPulse then took the full FASTA sequence and broke it into 19-nucleotide candidate siRNA sequences. The result table includes sequence and value columns and can be exported to a .csv file. NeoPulse then sent a BLAST API request with the sequences whose inhibition values are above the “Minimum Predicted Inhibition Value” input. In the BLAST result table, the user can see the inhibition value of the sequences that have a partial match to the original sequence.
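The windowing, ranking and thresholding steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the NeoPulse code: every function name is hypothetical, and `placeholder_score` is a stand-in (the fraction of A/T/U bases) that has nothing to do with the Transformer regression model, which is not reproduced here.

```python
import csv
import io
from typing import Callable, List, Tuple

SIRNA_LENGTH = 19  # candidate length stated in the case study

Scored = Tuple[str, float]


def candidate_windows(sequence: str, length: int = SIRNA_LENGTH) -> List[str]:
    """Return every overlapping window of `length` bases, left to right."""
    seq = "".join(sequence.split()).upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def placeholder_score(candidate: str) -> float:
    """Stand-in scorer (assumption): NOT the trained inhibition model."""
    return sum(base in "ATU" for base in candidate) / len(candidate)


def rank_candidates(sequence: str,
                    score: Callable[[str], float]) -> List[Scored]:
    """Score each window and sort from highest to lowest predicted value."""
    scored = [(c, score(c)) for c in candidate_windows(sequence)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_for_blast(ranked: List[Scored], min_value: float) -> List[Scored]:
    """Keep candidates strictly above the minimum predicted inhibition value."""
    return [pair for pair in ranked if pair[1] > min_value]


def to_csv(rows: List[Scored]) -> str:
    """Render the result table with `sequence` and `value` columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sequence", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    demo = "AUGGCUAUAUUAGCGAUAUAUCGGAUAUUA"  # synthetic sequence
    ranked = rank_candidates(demo, placeholder_score)
    print(to_csv(select_for_blast(ranked, min_value=0.6)))
```

A sequence of N bases yields N - 18 overlapping candidates, so the threshold keeps the number of BLAST submissions manageable for long targets.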
Our solution achieved R=0.85 and AUC=0.93. It outperformed other published siRNA prediction models such as BIOPREDsi (Novartis model; Huesken et al., 2005), MysiRNA (Mysara et al., 2012) and SMEpred (Dar et al., 2016), which reported R=0.66, 0.70 and 0.72, respectively. The higher accuracy helps in designing siRNA sequences more efficiently, reducing the time and cost of screening in the lab.
The effective application of AI has the potential to dramatically improve the ability to develop new, life-transforming therapies and to shorten the time needed to bring them to market. AI Dynamics’ NeoPulse® enterprise AI platform enables AI experts and non-experts to create and roll out powerful new solutions quickly.
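To make the two reported metrics concrete, here is a small sketch of how a correlation coefficient and an AUC can be computed from predicted and observed values. It assumes R is the Pearson correlation coefficient and that, for AUC, observed inhibition is turned into effective/ineffective labels at some cutoff; the case study does not state either detail, and the numbers below are synthetic.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    observed = [0.90, 0.75, 0.40, 0.20]   # synthetic lab inhibition values
    predicted = [0.85, 0.60, 0.55, 0.10]  # synthetic model outputs
    cutoff = 0.7                          # hypothetical "effective" cutoff
    labels = [1 if value >= cutoff else 0 for value in observed]
    print(pearson_r(predicted, observed))
    print(roc_auc(labels, predicted))
```

The pairwise formulation of AUC is quadratic in the number of samples, which is fine for illustration; larger evaluations would normally use a rank-based implementation from a statistics library.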
Huesken et al. (2005) Design of a genome-wide siRNA library using an artificial neural network. Nat. Biotechnol. 23(8):995-1001.
Mysara et al. (2012) MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (ΔG). J. Biomed. Inform. 45(3):528-534.
Dar et al. (2016) SMEpred workbench: A web server for predicting efficacy of chemically modified siRNAs. RNA Biol. 13(11):1144-1151.