The Customer
Nitto BioPharma, Inc., based in San Diego, develops and delivers innovative, life-transforming therapies for patients’ unmet medical needs and accelerates the ability to bring these products to market. Nitto BioPharma is a division of Nitto Denko Corporation, based in Osaka, Japan.
Customer Challenge
siRNA silencing is considered one of the most promising techniques for future therapies against virus-mediated and gene-mediated diseases, such as HIV, HBV, and cancer. The key to this technique is predicting siRNA inhibition efficiency and selecting the proper siRNA. AI Dynamics worked with Nitto BioPharma to white-label a BLAST integration solution built on AI Dynamics’ NeoPulse® end-to-end enterprise AI platform for experts and non-experts alike.
Nitto BioPharma came to AI Dynamics requesting a tool to extract potential siRNA sequences, each 19 base pairs long, from a provided target sequence and rank them by predicted inhibition value. After receiving the inhibition values, Nitto BioPharma wanted to submit the siRNA sequences with the highest values to BLAST (Basic Local Alignment Search Tool), which finds regions of similarity between biological sequences. BLAST compares nucleotide sequences to sequence databases and calculates the statistical significance of the matches.
Solution
The NeoPulse platform takes a FASTA sequence (a text-based format for representing nucleotide sequences) or a target name as input and returns the potential siRNAs ranked from highest to lowest predicted inhibition. A deep learning regression model predicts the inhibition of a 19-nucleotide siRNA from its nucleic acid sequence. The model is built on the Transformer, a state-of-the-art NLP architecture, which significantly improves model performance. Transfer learning is applied during training, using features extracted from more than 2 million human RNA sequences.
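As an illustration of the input format only, the following minimal sketch parses FASTA text into records. The function name `parse_fasta` and the example record are hypothetical and are not part of NeoPulse; the example uses the DNA alphabet, as nucleotide databases typically do.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]       # a new record starts at each '>' line
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()  # sequence may span several lines
    return records


# Hypothetical two-line record, made up for illustration.
example = ">demo_target hypothetical record\nATGGCTAGCTAGGATCC\nGATTACAGGC\n"
records = parse_fasta(example)
```

A sequence wrapped over several lines is joined into one string, which is the form a downstream windowing step would need.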
To obtain inhibition values, the project called a query API that analyzed each candidate with the imported nitto_reg. To obtain the full FASTA sequences and their annotations, AI Dynamics used the Entrez API (for access to the Entrez molecular biology database system, which provides integrated access to nucleotide sequence data) and the GeneNames API (for access to the database of the HUGO Gene Nomenclature Committee (HGNC), which is responsible for approving gene names and symbols for every known human gene).
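A request for a FASTA record through the public NCBI E-utilities `efetch` endpoint can be sketched as below. This only builds the request URL and does not reproduce how the project itself called Entrez; the helper name `efetch_fasta_url` is hypothetical, and the accession is an arbitrary public RefSeq identifier chosen for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL that returns the record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


# Arbitrary example accession; not taken from the project.
url = efetch_fasta_url("NM_000546")
```

Fetching the URL with any HTTP client would return FASTA text; NCBI asks clients to respect its published rate limits.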
To obtain BLAST results, Nitto BioPharma used the NCBI BLAST API, which gives access to the suite of programs that generate alignments between a nucleotide or protein sequence, referred to as the “query,” and nucleotide or protein sequences within a database, referred to as “subject” sequences.
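The NCBI BLAST URL API works in two steps: a `CMD=Put` request submits the query and returns a request ID (RID), and a later `CMD=Get` request retrieves the results for that RID. The sketch below only builds the two request payloads; the helper names, the choice of `blastn` against the `nt` database, and the RID value are assumptions for illustration, not the settings used in the project.

```python
from urllib.parse import urlencode

# Public NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_body(sequence: str, program: str = "blastn", database: str = "nt") -> str:
    """Form-encoded body for submitting one query sequence (CMD=Put)."""
    return urlencode(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    )


def blast_get_query(rid: str, format_type: str = "XML") -> str:
    """Query string for retrieving the results of a submitted search (CMD=Get)."""
    return urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})


# Hypothetical 19-nucleotide candidate and placeholder RID.
put_body = blast_put_body("GATTACAGGCATGGCTAGC")
get_query = blast_get_query("EXAMPLERID")
```

In practice the Put body is sent as an HTTP POST to `BLAST_URL`, the RID is read from the response, and the Get request is polled until the search finishes.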
NeoPulse then took the full FASTA sequence and split it into 19-nucleotide candidate siRNA sequences. The result table includes sequence and value columns and can be exported to a .csv file. NeoPulse then sent a BLAST API request for the sequences whose inhibition values were above the “Minimum Predicted Inhibition Value” input. In the BLAST result table, the user can see the inhibition value of each sequence that has a partial match to the original sequence.
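The windowing, ranking, filtering, and export steps can be sketched as follows. This is a minimal illustration under stated assumptions: the windows are taken as overlapping, shifted one nucleotide at a time (the step size is not specified here), and `gc_fraction` is a stand-in scorer used only to make the example runnable. It is not the trained regression model, and all function names are hypothetical.

```python
import csv
import io
from typing import Callable

SIRNA_LENGTH = 19


def candidate_windows(sequence: str, length: int = SIRNA_LENGTH) -> list[str]:
    """All overlapping windows of `length` nucleotides (assumed step of 1)."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def gc_fraction(seq: str) -> float:
    """Placeholder scorer, NOT the inhibition model: fraction of G and C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_candidates(sequence: str, predict: Callable[[str], float]) -> list[tuple[str, float]]:
    """Score every window and sort from highest to lowest predicted value."""
    scored = [(w, predict(w)) for w in candidate_windows(sequence)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def to_csv(rows: list[tuple[str, float]]) -> str:
    """Render the result table with 'sequence' and 'value' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sequence", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


target = "ATGGCTAGCTAGGATCCGATTACAGGC"  # hypothetical 27-nucleotide target
ranked = rank_candidates(target, gc_fraction)
minimum_predicted_inhibition = 0.5  # hypothetical threshold input
for_blast = [seq for seq, value in ranked if value > minimum_predicted_inhibition]
csv_text = to_csv(ranked)
```

Only the sequences in `for_blast` would go on to the BLAST request, which keeps the number of remote searches proportional to the number of promising candidates rather than to the length of the target.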
Results
Our solution achieved R=0.85 and AUC=0.93. It outperformed other published siRNA prediction models, such as BIOPREDsi (Novartis model; Huesken et al., 2005), MysiRNA (Mysara et al., 2012), and SMEpred (Dar et al., 2016), which reported R=0.66, 0.70, and 0.72, respectively. The higher accuracy helps in designing siRNA sequences more efficiently, reducing the time and cost of screening in the lab.
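For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It assumes that R is the Pearson correlation between measured and predicted inhibition and that AUC is the ROC AUC obtained after labeling siRNAs as effective when measured inhibition reaches a cutoff; the evaluation protocol actually used is not described here. The cutoff and the five data points are made up purely for illustration and are not project results.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def roc_auc(labels: list[bool], scores: list[float]) -> float:
    """Probability that a positive outscores a negative (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy values, not measurements from the project.
measured = [0.9, 0.8, 0.4, 0.3, 0.7]
predicted = [0.85, 0.6, 0.5, 0.2, 0.75]
effective = [m >= 0.7 for m in measured]  # hypothetical cutoff

r = pearson_r(measured, predicted)
auc = roc_auc(effective, predicted)
```

R rewards predictions that track the measured values linearly, while AUC only asks whether effective siRNAs are ranked above ineffective ones, so the two can differ on the same predictions.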
White-labeled solution (BLAST integration) built on top of NeoPulse.
The customer can integrate new data into the model through automated retraining.
References
Huesken et al. (2005) Design of a genome-wide siRNA library using an artificial neural network. Nat. Biotechnol. 23(8):995-1001.
Mysara et al. (2012) MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (ΔG). J. Biomed. Inform. 45(3):528-534.
Dar et al. (2016) SMEpred workbench: A web server for predicting efficacy of chemically modified siRNAs. RNA Biol. 13(11):1144-1151.