## Nussinov algorithm to predict secondary RNA fold structures

The prediction of secondary RNA structures is an interesting and important problem in bioinformatics. It is by no means at the forefront of the area, but still cool to look at as an introduction to bio(informatics/computation). This post will introduce the Nussinov-Jacobson algorithm for predicting and building secondary fold structures of RNA sequences. It is a dynamic programming algorithm and was one of the first developed for the prediction of RNA structures as RNA combinatorics can be quite involved and expensive. For a recap on dynamic programming, check out this post!

## RNA & Nussinov

RNA, DNAs less famous cousin, is in charge of carrying out and transferring the genetic messages set out by DNA to your body for tasks such as protein synthesis. DNA codes and acts as a blueprint for your genetic traits, while the various types of RNA convert that genetic information into useful proteins by transferring the DNA blueprints out of the nucleus and into the ribosome. Both DNA and RNA are pretty cool.

Formally now, we can define RNA as a sequence $$S$$, made up of four (nucleotide) bases; $$S \in \{G, C, A, U\}$$. These 4 bases come in pairs, known as Watson-Crick pairs, and couple as follow: $$G \leftrightarrow C, \; A \leftrightarrow U$$. These pairs are the “strong/stable” base pairs, however, in reality, we also have the coupling of less stable pairs, known as canonical pairs, namely $$G \leftrightarrow U$$.

## 0/1 Knapsack Discrete Optimization w/ Dynamic Programming

The Knapsack problem is one I’ve encountered a handful of times, both in my studies (courses, homework, whatever…), and in real life. Unfortunately, the times I encountered it in a real situation I never had to actually fully solve it – it was more along the lines of, “hey, this is just like the Knapsack problem, but we can work around it”.

I’m currently taking a course on discrete optimization, a field I’ve always thought was cool but never been great at and one which also has a reputation for being particularly hard as you progress deeper. As a way for me to brush up on some basics I thought I’d revisit dynamic programming, an often used technique in solving discrete optimization problems, with the Knapsack problem. The course is very adamant in trying all the different covered algorithms on all problems; for example, if solving Knapsack starts to be too time consuming with DP, use Linear Programming or Branch & Bound with some type of constraint relaxation. To that I say, I will… but I also won’t! I’ll of course try all the algorithms, but since this is a discrete optimization course, I also thought “let me make this even more painful” and try to optimize each algorithm as best I can 🙂

## Introduction

The Knapsack problem is actually a combinatorial optimization problem where we need to find an optimal number of objects from a finite set of objects usually based on their properties. It’s an interesting problem as it is both an NP-hard (optimization) and NP-complete (decision) problem that has a pseudo-polynomial algorithm, meaning its worst-case time complexity is polynomial in numeric values of the input. We’ll go through the problem both informally and formally.

Continue reading “0/1 Knapsack Discrete Optimization w/ Dynamic Programming”