The Knapsack problem is one I’ve encountered a handful of times, both in my studies (courses, homework, whatever…), and in real life. Unfortunately, the times I encountered it in a real situation I never had to actually *fully* solve *it* – it was more along the lines of, “hey, this is just like the Knapsack problem, but we can work around it”.

I’m currently taking a course on discrete optimization, a field I’ve always thought was cool but never been great at and one which also has a reputation for being particularly hard as you progress deeper. As a way for me to brush up on some basics I thought I’d revisit dynamic programming, an often used technique in solving discrete optimization problems, with the Knapsack problem. The course is very adamant in trying all the different covered algorithms on all problems; for example, if solving Knapsack starts to be too time consuming with DP, use Linear Programming or Branch & Bound with some type of constraint relaxation. To that I say, I will… but I also won’t! I’ll of course try all the algorithms, but since this *is* a discrete *optimization* course, I also thought “let me make this *even more painful*” and try to optimize each algorithm as best I can ðŸ™‚

Contents

## Introduction

The Knapsack problem is actually a combinatorial optimization problem where we need to find an optimal number of objects from a finite set of objects usually based on their properties. It’s an interesting problem as it is both an NP-hard (optimization) and NP-complete (decision) problem that has a pseudo-polynomial algorithm, meaning its worst-case time complexity is polynomial in numeric values of the input. We’ll go through the problem both informally and formally.

### Informal

Imagine you’re at the farmers market with a single bag to carry home your produce. Because there are so many nice things at the market, and your bag only has capacity \(K\), you’ll need to choose very carefully what you want to buy from items in the set \(\mathcal{I}\). So, how do you choose the items? The first approach may be to simply maximize the value in your knapsack by either picking or leaving an item based on their value (i.e, how much they’re worth or how tasty they are). Another approach may be to maximize the value density per item in your bag, since we know how much all the items weigh individually and their value.

As this is post is about solving the 0/1 Knapsack problem, we’ll only be focused on the first possible approach. We want to maximize the total value in our knapsack by either picking or leaving an item.

### Formal

0/1 Knapsack means that we solve the problem by restricting an item to either 0 or 1; left or picked, in or out. Given a set of items \(\mathcal{I}\) with weight and value \(w_i, v_i\), and a knapsack with capacity \(K\), we want to maximize the sum of our items value subject to our knapsacks capacity. More formally we can model the problem with ab objective function to maximize, subject to constraints and a decision variable:

Maximize

\(

\large\sum_{i = 1}^\mathcal{I} v_i x_i \\

\)

subject to

\(

\large\sum_{i = 1}^\mathcal{I} w_i x_i \leq K

\)

where \(x_i \in \{0, 1\}\), is the binary decision for each item.

If one ignores the computational cost and inefficiency, the easiest and simplest approach to solving this is by brute force. Following this is a trivial greedy approach, where we can improve it by maybe using a simple heuristic during search, or beforehand. For example, sorting the items by value and choosing one-by-one until we reach capacity \(K\). Here’s a greedy example:

value = 0 weight = 0 knapsack = [0] * len(items) for i in items: if weight + item.weight <= capacity: knapsack[item.index] = 1 value += item.value weight += item.weight

The issue with a greedy algorithm is that, even after you leave it running for *months*, the solution it gives has no guarantee of being the best or most optimal one. Proving the solution is correct becomes tricky and you’ll have to spend further time on a proof. Remember, Knapsack is NP-Complete. A better and *smarter* approach (psst, the hint is in the title) is to use Dynamic Programming!

## Dynamic Programming

Dynamic programming (DP) is a technique used when the solution to a problem has an optimal substructure and overlapping sub-problems. At its *most basic*, it’s a “better version of divide and conquer” – a description which is wrong but gives a very general “layman’s” overview. DP consists of programming in such a way where one uses past knowledge of a problem to solve it. This essentially means that if you take a problem and divide it into smaller and smaller sub-problems, at some point, you’ll be solving the same subproblem *again*. Because there’s a high likelihood of re-seeing the same subproblem, the results of these sub-problems are **memoized**; stored for later, ideally in cache. Because we have these sub-problems and their solutions stored, we can now also use them to help us solve bigger problems until we’ve solved our original problem. An example would be calculating the \(n^{\text{th}}\) fibonacci number. If \(n\) is *huge*, it may take some time… unless you already have the solution to \(f(n – 1, n – 2)\), where \(f\) is a function for calculating a fibonacci number.

Solutions to previous sub-problems in the fibonacci example can be stored in a lookup table, making access time quick and efficient. This fast/low time complexity comes at a cost of having high space complexity from storing all previous results.

It’s a great technique for powerful algorithm design, path-finding problems, and reducing time complexity. It’s also usually one of the first and “easiest” methods of solving discrete optimization problems, given the problem has the appropriate properties. One must also take into account the exact problems constraints and other details of course. My professor once called it “careful bruteforce”. As a side-note, I hate the name.

I’ll briefly cover the Fibonacci sequence with DP as an example before jumping back into Knapsack. Let’s say I want to compute fibonacci to the 10 millionth integer, a computation which is sure to take some time.

A common way of computing fibonacci is by implementing it recursively:

def fib(n): if n <= 2: return 1 return fib(n - 1) + fib(n - 2)

Even though the above implementation works, it’s not at all efficient as `n`

gets bigger. Imagine having to recurse \(1000000\) times! Its time complexity is also less than desirable since it is **exponential**. We can represent `fib`

as an equation, \(T(n-1) + T(n – 2) + \mathcal{O}(1)\), meaning the time to compute for `fib(n)`

is equal to the sum of time taken to calculate `fib(n - 1)`

and `fib(n - 2)`

plus a time constant. This produces an upper bound of \(\mathcal{O}(2^n)\), or more specifically \(\mathcal{O}(2^\phi)\), where \(\phi\) is the golden ratio.

We can now compute the fibonacci series, even though it’s very slow. To make this faster we can implement the same function above using a technique called **memoization**. The idea of memoization is a simple one as stated earlier, especially for our `fib`

use case. Whenever we compute a number in the sequence, we store it in a “cache type variable” – here, in Python, we use a dictionary. For now, ignore any language or other type optimizations we can do. So when we *are* computing our `fib`

sequence, we can check our cache variable for numbers that have already been computed removing the need to constantly re-compute them. Let’s rewrite our `fib`

function with this in mind:

mem = {} def fib(n): if n <= 2: return 1 if n in mem: return mem[n] mem[n] = fib(n - 1) + fib(n - 2) return mem[n]

Let’s do a quick time complexity analysis. Any non-memoized calls, i.e, when we call `fib(n)`

, costs us \(n\), and any memoized calls costs us \(\mathcal{O}(1)\). This leaves us with \(\mathcal{O}(n)\), which is a huge improvement! Take note how we’re also breaking down our task into smaller little sub-problems by introducing more variables and processes, rather than just one recursive function.

We’ve now got the building blocks to transform this solution into a DP one. DP algorithms are often times referred to, and implemented as **bottom-up**. When we program recursively, we usually start thinking about how to code it from the “top”, and work our way down. With a bottom-up approach we… well, we do the opposite: we work our way to the top from the bottom. For example, we usually learn maths in a bottom-up approach, but learn sports top-down – play first then get used to the rules. In our code, instead of having the recursive call to `fib`

, we’ll implement a `for`

loop.

def fib(n): f2 = 0 f1 = 1 if n <= 2: return 1 for i in range(1, n): f = f1 + f2 f1, f2 = f, f1 return f

There are more LOC, but, the code is now a more optimized, faster, and efficient solution to our simple fibonacci problem! Note, it’s still \(\mathcal{O}(n)\). If it’s not clear why this is better, we now end up needing only the last two numbers to calculate the next number, as opposed to having to recalculate the entire fibonacci “tree” for each \(n\) (1^{st} solution).

For empiricism, here are run times with all three methods of computing Fibonacci:

- Recursion:
`fib(40)`

– \(\sim\) 21.5s - Memoziation:
`fib(250)`

– 258 Âµs \(\pm\) 4.88 Âµs per loop (mean \(\pm\) std. dev. of 7 runs, 10000 loops each) - Bottom-up (DP):
`fib(250)`

– 137 Âµs \(\pm\) 11.2 Âµs per loop (mean \(\pm\) std. dev. of 7 runs, 10000 loops each)

## 0/1 Knapsack

Having gone over the formal formulation of the problem we can now design a solution. The first step is to identify or break the problem down into sub-problems as required for DP algorithms. Let’s assume we have a knapsack with capacity \(K = 8\) and \(4\) items, \(\mathcal{I} = {1, 2, 3, 4}\), each with a weight and value, \(w_i, v_i\). Now, while this is still a very easy problem that you could solve in your head, it may take a minute or two to go through all the combinations of items. What if you \(0\) items? \(1\) item? \(2\) items? You can see that as the items decrease it becomes easier to solve. These are our sub-problems!

If you imagine we have a 2-dimensional matrix or tensor, of size \(O(\mathcal{I} + 1, K + 1, )\) where each row x column pair is a sub-problem solution, we can use this to solve a complete Knapsack problem. Take as an example item 1, \(i_1\), with properties \(w_1 = 8. v_1 = 80\). With capacity \(K\), our matrix would start to look like this:

\(

\begin{array}{c|lcr}

i & \text{0} & \text{1} & \text{2} & \text{3} & \text{4} & \text{5} & \text{6} & \text{7} & \text{8} \\

\hline

0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\

1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 80

\end{array}

\)

Remember the first row is for item \(0\) (no items), and the first column is with capacity \(0\). So, with this format of DP set in place, we’re now looking for the optimal value solution \(O(i, K)\).

To continue filling up the DP matrix, we need to define a **recurrence relation**. This is an equation that recursively defines a sequence of values in an array, which our matrix rows and columns are. Specifically, we will be using the **Bellman equation**. Completing the 3rd row of our matrix with \(i_2\) having \(w_2 = 5, v_2 = 40\) we get:

\(

\begin{array}{c|lcr}

i & \text{0} & \text{1} & \text{2} & \text{3} & \text{4} & \text{5} & \text{6} & \text{7} & \text{8} \\

\hline

2 & 0 & 0 & 0 & 0 & 0 & 40 & 40 & 40 & ?

\end{array}

\)

What happens with \(O(2, 8)\)? We can see that the clearly optimal solution would be to take \(i_1\) as its value is double the current items but what if we didn’t? How is this decision made? This is where the recurrence relation comes in and defines the rules for “traversing” and filling the table until the problem is complete. At each \(O(i, K)\) we check the item’s weight against the knapsack capacity. If \(w_i \leq K\) then we either choose \(O(i – 1, K)\) *or* \(O(i – 1, K – w_i) + v_i\). Since we’re optimizing our carrying capacity in the knapsack, we choose the maximum of those two values. Let’s use this to fill the rest of our matrix where \(w_3 = 3, v_3 = 20\) and \(w_4 = 1, v_4 = 50\).

\(

\begin{array}{c|lcr}

i & \text{0} & \text{1} & \text{2} & \text{3} & \text{4} & \text{5} & \text{6} & \text{7} & \text{8} \\

\hline

0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\

1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 80 \\

2 & 0 & 0 & 0 & 0 & 0 & 40 & 40 & 40 & 80 \\

3 & 0 & 0 & 0 & 20 & 20 & 40 & 40 & 40 & 80 \\

4 & 0 & 50 & 50 & 50 & 70 & 70 & 90 & 90 & 90

\end{array}

\)

Our optimal value is \(90\) by choosing items \(i_3, i_4\).

To formalize this, we end up with:

\(

O(i, K) = \max{(O(i – 1, K), O(i – 1, K – w_i) + v_i)} \\ \quad \text{if } w_i \leq K \\ \quad \text{otherwise } O(i – 1, K)

\)

which is a slight variation of the Bellman equation.

One last thing to touch upon is how we actually find out what items we’ve chosen. In order to traceback through the matrix and find which items have been selected, we end up doing a similar reverse operation starting from \(O(i, K)\):

- If \(O(i, K) != O(i – 1, K)\)
- \(i\) is a chosen item and is marked as taken, \(1\)
- Move to \(O(i – 1, K – w_i)\)

- If \(O(i, K) = O(i – 1, K)\)
- \(i\) is not chosen and is marked as such, \(0\)
- Move to \(O(i – 1, K)\)

Once this is over, we’ll have a) an optimal value and, b) array of values to take and leave. Following this algorithm for our matrix gives us an array \([0, 1, 0, 1]\)

## Implementation

Implementing this in Python is very straight forward if the steps highlighted above are followed. Since we’re filling out a matrix we need two `for`

loops. One for the rows and one for the columns. Inside these loops we use our recurrence relation.

def dp(item_count: int, capacity: int, items: list): table = np.zeros((item_count + 1, capacity + 1), dtype = np.int32) for i in range(item_count): for w in range(capacity + 1): if items[i].weight <= w: p1 = table[i][w] p2 = items[i].value + table[i][w - items[i].weight] table[i + 1][w] = max(p1, p2) else: table[i + 1][w] = table[i][w] ... Items = namedtuple("Item", ["value", "weight"]) items = [Item(80, 8)]

Some things to note:

`items`

is a list of`namedtuples`

so that it contains value and weight attributes for each item- Since the first row of the matrix is going to be empty and contain only \(0\), I use
`i + 1`

as`i`

and`i`

as`i - 1`

. This is important to remember when reading the code. Of course, it can be done in other ways but Â¯\_(ãƒ„)_/Â¯ - This is
*sloooooow*.

We now need to implement the traceback discussed above.

knapsack = [] knapsack_opt_val = table[item_count][capacity] # [-1][-1] w = capacity for i in range(item_count, 0, -1): if table[i][w] != table[i - 1][w]: knapsack.append(1) w -= items[i - 1].weight i -= 1 else: knapsack.append(0) return(knapsack_opt_val, knapsack)

More things to note:

- Again, there are different ways to implement this loop, like pre-allocating an array
`knapsack`

needs to be reversed at the end since our first item is \(O(i, K)\) ->`knapsack.reverse()`

We can now produce an optimal knapsack solution *and* retrieve the chosen items!

## Optimization

This to me was the most fun part of the problem. Technically we have a working solution. It takes *roughly* 15 seconds to compute a solution for 40 items, and 1 minute 30 for 50 items. Clearly this isn’t going to work when we have 200 items.

Taking a step back, our current algorithm has time complexity \(\mathcal{O}(Kn)\), where \(K\) is the size of the knapsack and \(n\) is the number of items – our two `for`

loops. However, one needs to take into account multiple things:

- Python can be slow (fight me)
- \(K\) needs \(\log{K}\) bits to be represented
- This actually makes our \(\mathcal{O}(Kn)\) exponential in the size of the input, hence being pseudo-polynomial, something I mentioned earlier

To optimize this you might say, “just rewrite it using C++” or something similar right? Right, well, I’ve done that, but, there are also a lot of optimizations that can be done directly *in* Python to both the algorithm and Python itself. I actually ended up spending about 15 days optimizing everything (way too much), but it was very fun and I learnt a lot, especially when vectorizing the code, something I’ve done maybe only once as an exercise.

### Space optimization

The first optimization we can do actually involves approaching the DP problem differently. In the specifics of Knapsack, if you pay attention to the problem formulation and the code (in other words, what we’re actually doing), you can see that we’re only *ever* using rows `i`

and `i + 1`

. This means we don’t need \(i\) rows in our matrix. We only need 2! This will reduce our matrix space from \(O(i, K)\) to \(O(2, K)\).

Doing this however means we cant use `i`

inside our loops when accessing the matrix as we’ll be out of bounds. We only need to loop between `0`

and `1`

. One way to achieve this is by using some simple bitwise manipulation. If we `AND (&)`

the value of `i`

with 1 we’ll always get back either `0`

or `1`

, allowing us to access both rows of the matrix.

... for i in range(item_count): c, p = (i & 1), (~i & 1) for w in range(capacity + 1): if items[i].weight <= w: p1 = table_small[c, w] p2 = items[i].value + table_small[c, w - items[i].weight] table_small[p, w] = max(p1, p2) else: table_small[p, w] = table_small[c, w] ...

`c`

and `p`

are the equivalent of `i`

and `i + 1`

. `p1`

here is the current row and `p2`

is the current row minus the item’s weight but with the addition of `item[i].value`

as per the Knapsack formulation (\(O(i – 1, K – w_i) + v_i\)). If the condtional statement is `False`

, we simply assign the current row to the previous row. It’s important to remember that we’re not really incrementing and iterating over the matrix with `c`

and `p`

as we would with `i`

, they’ll always be alternating between `0`

and `1`

respectively.

We’ve now improved space complexity and got a *slight* decrease in run time. Howver, the downside of this is that can no longer traceback through the matrix and find the knapsacks items. It’s worthwhile noting that you can also reduce using 2 rows to just using 1 but I didn’t bother with this.

### Binary matrix

Another space saving optimization is to convert the matrix from `np.int32`

to `np.bool`

. We essentially only store *decisions* at each \(O(i, K)\) hugely decreasing the size of the matrix in RAM. To do this we’ll need to compute the DP matrix as normal, only with 2 rows however, but we also compute a binary matrix and store \(O(i, \mathcal{K}) != O(i – 1, K)\) as a decision variable. \(O(i, K)\) will be \(1\) if its value isn’t the same as the one above it, which will either be the current item or the \(O(i – 1, K – w_i)\), otherwise it will be \(0\).

def dp_opt_v2(item_count: int, capacity: int, items: list): table_small = np.zeros((2, capacity + 1), dtype = np.int32) table = np.zeros((item_count + 1, capacity + 1), dtype = np.bool) for i in range(item_count): c, p = (i & 1), (~i & 1) for w in range(capacity + 1): if items[i].weight <= w: p1 = table_small[c, w] p2 = items[i].value + table_small[c, w - items[i].weight] table_small[p, w] = max(p1, p2) table[i + 1, w] = int(table_small[c, w] != table_small[p, w]) else: table_small[p, w] = table_small[c, w] table[i + 1, w] = 0

Note I’m also using the 2 row matrix optimization alongside this so it is still possible to retrieve the optimal solution, otherwise we would be able to traceback but not have an actual solution… although there are other ways to derive it from this, such as using the `knapsack`

array, but alas, tomato potato.

This optimization also requires a slight modification of the traceback algorithm. We now need to look at if \(O(i, K) = 1\). If it does, this item was chosen, otherwise it wasn’t.

knapsack = [] knapsack_opt_val = table_small[-1][-1] w = capacity for i in range(item_count, 0, -1): if table[i, w] == 1: knapsack.append(1) w -= items[i - 1].weight i -= 1 else: knapsack.append(0) return (knapsack_opt_val, knapsack)

We’ve now got a solution which encompasses a binary matrix to traceback and find the knapsack’s items *and *we only use a \(O(2, K)\) matrix to store the optimal solution instead \(O(\mathcal{I}, K)\). This has saved us some signigicant space complexity, however, the fact is we still have 2 slow`for`

loops. If only there was a way to speed up even just *1 *loop while also using the above optimizations…

### Vectorization

This final optimization is the one which took me the most time… many days of trial and error and re-reading `numpy`

documentation and vectorization tutorials. Turns out it was an excellent exercise in doing vectorization (to me anyway, especially since this is my 2nd time?). Fun fact, once I’d finished this optimization using my test data, I tried using it with a real problem, one in the 10s of thousands of items and capacity 1 million. I forgot to change `table`

‘s `dtype`

to `np.int32`

rather than `np.int8`

so all my calculations were overflowing. Took me longer than I’d like to admit to figure out my “typo”.

Anyhow, because we’re doing operations on a matrix, and because `numpy`

is being used, one very rewarding and powerful way to speed this up is to use *vectorization*. Vectorization is the process of rewriting a `for`

loop so that instead of iterating \(K\) times to process our row, in our case, we process the entire row simultaneously. We can do this thanks to vector and SIMD instruction sets in CPUs which can apply the same operation to some data.

`numpy`

has a lot of functions that are vectorized, and so in an ideal world, rewriting code in Python that makes use of these vectorized functions is a matter of finding the right one replacing it with your `for`

loop. Of course, there may be constraints to your particular problem which make vectorization harder or not possible, so it’s not a “magic” spell that works everywhere.

Vectorization tends to work best on problems that require the same operation over and over. Likewise, it’s also simpler to implement on straightforward and simple `for`

loops and ones without too many dependencies such as read-after-write, indirect memory acces or complicated indexing.

As a slightly-oversimplified example, take this loop in C++ and Python:

for (int i = 0; i <= 20; i++) row[i] += column[i];

for i in range(20): row[i] += column[i]

A vectorized solution would look like this:

for (int i = 0; i <= 20; i += 4) row[i:i+3] += column[i:i+3]

row += column

A few things to note besides the simplifications:

- We’re assuming that the Python arrays are actually
`numpy.ndarray`

- Vectorizing in Python with
`numpy`

is different than in C++ where you tell the compiler to vectorize, plus some extra magic `numpy`

also handles**broadcasting**- In the C++ vectorized version we’re using 4 because we’re processing 4 elements at the same time

I’ve added some much more detailed information on vectorization in the “Resources & References” section at the bottom.

With that brief in mind, let’s move on to vectorizing our Python code with `numpy`

.

The code we want to vectorize is the 2nd `for`

loop where we iterate over \(K\). Let’s go through *exactly* what we’re doing in the loop and make a note of all the operations that are actually going on.

`if items[i].weight <= w`

- Assign \(O(i – 1, K)\) and \(O(i – 1, K – w_i)\) to some variables
- Compute \(O(i, K)\)

`numpy`

has a a great function called

numpy.where which returns elements based on a logical condition and is a vectorized function which can also broadcast. Essentially, you give it a condition along with two return values for `True`

and `False`

. As a quick example, if we have an array and want to add \(10\) to all the values greater than some \(n\), `np.where`

would look like this:

array = np.arange(5) for i in range(len(array)): if array[i] > 2: array[i] += 10

array = np.arange(5) array = np.where(array > 2, array + 10, array)

`np.where`

goes through all the elements in `array`

for us without the need for the explicit `for`

loop. Try this exercise also:

- Create two random
`numpy`

arrays of equal length and size - Use
`np.where`

to check if`array1`

\(>\)`array2`

with the outputs`1, 0`

for`True`

and`False`

I actually got stuck here for a long time trying to feed it `np.where(items[i].weight <= w, ...)`

or some variation of. Needless to say I couldn’t get this to work as I kept getting tripped up by the `prev`

variable, and more specifically, `table[i][w - items[i].weight]`

. I figured a rewrite of the loop and conditional logic would be required to get this to work… here’s where the pain started! ðŸ™‚

I’ll save you the time I spent figuring this out, but we can actually get rid of the `if`

statement and use the current item’s weight from the first `for`

loop to achieve the same thing. We can just do operations on or with `table[i, :w_i]`

, `table[i, :-w_i]`

, and `table[i, w_i:]`

.

table_bool = np.zeros((item_count + 1, capacity + 1), dtype = np.bool) # binary matrix table_small = np.zeros((2, capacity + 1), dtype = np.int32) # 2 rows to store optimal solution for i in range(item_count): w = item[i].weight c, p = (i & 1), (~i & 1) # go between row 0 & 1 always table_small[c, :w] = table_small[p, :w] row1 = table_small[p, w:] row2 = table_small[c, :-w] + item[i].value table_small[c, w:] = ...

We’re doing a few crucial and possibly “hard-to-wrap-your-head-around”â„¢ things here with respect to logic, so it’s best to explain with an example during the first iteration of the loop:

- Make row
`c`

(`0`

) equal to row`p`

(`1`

) up until the`w`

^{th}index (essentially being the`if`

statement checking for the weight condition, as it let’s us “skip” up until that point)- We’re doing this to essentially “reset” the values that don’t fit in the knapsack during the current iteration
- Notice how we then assign values to
`table[c, w:]`

- Assign
`row1`

the value of the alternate row (row`1`

) in the iteration from the`w`

^{th}element onwards, where`w`

is the current item’s weight- We’re only interested in filling in the part of the matrix where the item can actually fit into the knapsack

- Assign
`row2`

the value of the*this*row (row`0`

) in the current iteration*without*the last \(w_i\) elements and add \(v_i\) to all of them- This is the equivalent of \(O(i – 1, K – w_i) + v_i\) but in vectorized form

`row1`

and `row2`

are the equivalent of our `p1`

and `p2`

from other solutions. The majority of the logic comes from understanding the advanced indexing we’re using and the fact that we can operate on all elements of a row in one swift line without the need of a for loop. These ideas, and how arrays are broadcast, are key for understanding this vectorized solution. If you’re struggling to get your head around this, like I did the first too many times, try playing with this in a notebook.

Now we need to compute the current row’s values from the \(i\). To do this we’ll use `np.where`

and compare `row1`

and `row2`

. This is actually the easiest part. We check if elements in `row2`

are greater than elements in `row1`

. If an element returns True, we assign `row2`

, \(O(i – 1, K – w_i) + v_i\), else `row1`

, \(O(i – 1, K)\) just as always. Of course this comparison could also be done the other way round.

table_small[c, w:] = np.where(row2 > row1, row2, row1) table_bool[i + 1, w] = np.where(table[c, w:] != table[p, w:], 1, 0)

I’ve also gone ahead and implemented `np.where`

for the boolean/binary matrix to store item decisions and traceback through the knapsack once we’re finished. We already know how to traceback through a binary matrix since we implemented it during the second optimization.

## Results

Now that this has been optimized a few times over, we can look at some quick results:

*4 items, capacity 5*(mean Â± std. dev. of 10 runs, 100 loops each):- Default implementation: 93.1 Âµs Â± 9.58 Âµs per loop
- Fully optimized: 87.5 Âµs Â± 8.65 Âµs per loop

*20 items, capacity 30,000*- Default implementation: 3.47 s Â± 257 ms per loop (mean Â± std. dev. of 10 runs,
**10 loops each**)- Overall 5m 47s

- Fully optimized: 2.26 ms Â± 137 Âµs per loop (mean Â± std. dev. of 10 runs,
**100 loops each**)- Overall 2.27s

- Default implementation: 3.47 s Â± 257 ms per loop (mean Â± std. dev. of 10 runs,

Taking those averages and having a rough estimate, running the 2nd test with 100 loops on the default implementation would’ve taken roughly an hour to complete. That’s almost a speed up of *x1500*!

I would’ve tested more, but I *realllly*, couldn’t be bothered to wait so long for things to finish running. Another important note is that while the first two optimizations have very little effect on time, if any, it did greatly help with RAM. I was unable to compute a solution to 10,000 items with 1,000,000 capacity due to the overhead of having a 10,000 x 1,000,000 `np.int32`

table allocated otherwise.

Full code can be found here: Python, C++.

**Resources & References:**

What is vectorization?

Why is vectorization faster in general than for loops?

SIMD examples

Programming guidelines for vectorizing C++

Fundamentals of Parallelism on Intel Architecture

Vectorizing a loop with G++

MathJax reference