Learning DSA in Python: Two Sum

Learning DSA in Python: Two Sum

You have the below dataset:

Orange: #10

Apple: #30 

Pawpaw: #40 

Lemon: #50 

Guava: #60 

Pineapple: #70

This dataset is a hypothetical dataset of an e-commerce business.

You want to implement the feature that a user will be able to search for pairs of products based on how much money they have at their disposal.

For example, I have #100 at my disposal, what combination of products in the store will amount to that?

How do you solve this problem using the dataset provided?

Let's try and understand the problem. If you don't try to understand the problem, you won't know how to approach solving it.

"...user will be able to search for pairs of products based on how much money they have at their disposal"

What is a pair?

A pair is a set of two things used together or regarded as a unit.

Based on this, we can rephrase the objective to: "...user will be able to search for a set of two products whose price total adds up to the money they have at their disposal"

Next, let's convert the objective to programming speak. "...find two prices whose total add up to a certain amount", "...find two numbers whose total add up to a certain number"

You as the programmer know what feature you are trying to achieve, you will only check off the feature if you've successfully implemented it.

If you fail to correctly rephrase the problem, it just means you will spend more time solving the problem. That is, the problem will take more of your time until you successfully solve it by getting the rephrasing right and writing the programming solution for it.

You can't rely on large language models to help you in getting the phrasing right or help you in breaking down the problem. Rely on large language models for code snippets for each micro part of the solution. But always try to code the solution yourself first.

Now that the problem has been rephrased to finding two numbers whose total adds up to a certain number, we can say we have a classic two-sum problem. We may phrase this same problem in another way and classify the problem as another classic DSA problem, but as I've mentioned, we can only say we've solved the problem if we've implemented the feature we need to implement.

Introducing Two Sum

The two-sum problem is all about finding two numbers whose sum adds up to another number. Given a list of numbers, how do you find two numbers in the list whose sum adds up to a specified number? That's what the two-sum problem is all about.

10, 30, 40, 60, 70

The above list contains two numbers whose sum adds up to 100. They are 30 and 70.

How do you find the two numbers using code?

Well, let's not dwell too much on the problem.

First of all, if you've exposed yourself to DSA problems, you would have learnt one or two approaches to solving the problem. But rather than approaching the problem using a memorized approach, let's try and reason out the approach using our current level of programming knowledge.

At this point in your Python programming journey, I will assume you are already familiar with such things as lists, loops, tuples, dictionaries, variables, built-in functions, if statements, etc. These are the fundamentals. We will use our current knowledge of the fundamentals to try out different ways to solve the problem.

We want to find two numbers in a list. At this point, we should know if we are to find a value in a list, we will be iterating through that list and the traditional way of iterating through a list is to use a loop. So, we will be looping through the list to find a value, in fact, two values.

When we use a loop to iterate through a list, we get the values, but we want to find two values whose sum adds up to another number. In the loop, we will therefore be checking if two values in the list add up to another number. We will be using an if statement to accomplish this. If they add up, we will be returning the two numbers. In the classic two-sum problem, the index of the two numbers is what is returned. Since the goal is to learn DSA, we will solve for both.

So given the below code which gives us access to the values of a list in Python, how do we check if two values in the list add up to a certain number? For example 100?

prices = [10, 30, 40, 60, 70]

for price in prices:

Let's see, can we check for the first and second values in the list in each iteration of the loop? Given the code, No. We can only have access to the first and second values in the list in each iteration of the loop if we have access to the index in each iteration of the loop. Can we have access to the index of the values in a list in each iteration of a loop? Yes. How? By defining the loop as the below:

for i in range(len(prices)):

All I did here is to create a range of numbers whose value starts from 0 and ends with the value derived when you subtract 1 from the length of the list. By creating such a range of numbers, I get access to the index of each value in the list.

What I did was use my existing knowledge to generate indexes that can be used to access values in the list in a correct manner. If you didn't have this knowledge before, you now have it - that we can get the indexes of a list by defining the for loop in a certain way.

Now that I have access to the indexes of the list which gives me access to the first and second values in each iteration of the loop, will this solve the two-sum problem? Let's construct some codes to see.

Based on my current hunch, the below code is constructed:

prices = [10, 30, 40, 60, 70]

for i in range(len(prices)):
    if ((prices[i] + prices[i + 1]) == 100): 
        print("equal")

When I ran the code, it outputted an index out-of-range exception. How do I eliminate the index out-of-range problem? Well, I can add a second condition that states that the print statement should not be executed if i is not the last index. How do I define such a condition?

It can be defined thus:

for i in range(len(prices)):
    if (i == (len(prices) - 1)): 
        break
    if ((prices[i] + prices[i + 1]) == 100): 
        print("equal")

While this approach seems to solve the problem, it has become obvious that the approach won't work when the two numbers don't follow each other.

The objective is to find two numbers, not two consecutive numbers.

Run the below, you won't get an output despite the fact two different groups of numbers in the list add up to 100

prices = [30, 10, 40, 70, 60]

for i in range(len(prices)):
    if (i == (len(prices) - 1)): 
        break
    if ((prices[i] + prices[i + 1]) == 100): 
        print("equal")

So, this approach doesn't solve the problem in all situations.

The approach of using the first and second values in each iteration of the loop to check for the two numbers worked in one situation but didn't work in another situation, what other approach can I try?

First of all, let's understand why I used the approach of checking the first and second values in each iteration of the loop.

I did that because I need a way to have the two numbers at the same time so I can perform an addition operation on them and see if their sum adds up to 100.

So basically, what I need now is a way to have two numbers in a list in each iteration of the loop.

I've tried one approach, it gave me the two numbers, but it didn't work in all situations. What other way can I use to access two numbers in a list in a particular iteration of a loop? Well, one approach is to define an inner loop. Why an inner loop? Well, the idea is that I want to have two numbers in each iteration of the list, an inner loop makes it possible for me to traverse the entire list in each iteration of the loop. This means I can have access to a value and any other value in the list in each iteration of the loop just by defining an inner loop.

prices = [30, 10, 40, 70, 60]

for i in range(len(prices)): 
    for j in range(len(prices)): 
        print(f"{i} and {j}")

The above gives me access to two numbers in each iteration of the outer loop. The below output is produced:

0 and 0 
0 and 1 
0 and 2 
0 and 3 
0 and 4 
1 and 0 
1 and 1 
1 and 2 
1 and 3 
1 and 4 
2 and 0 
2 and 1 
2 and 2 
2 and 3 
2 and 4 
3 and 0 
3 and 1 
3 and 2 
3 and 3 
3 and 4 
4 and 0 
4 and 1 
4 and 2 
4 and 3 
4 and 4

In the first iteration of the outer loop, the index 0 and 0 are printed, the index 0 and 1 are printed next, the index 0 and 2 are printed next, the index 0 and 3 are printed next, and the index 0 and 4 are printed last.

In the second iteration of the outer loop, the index 1 and 0 are printed, the index 1 and 1 are printed next, and so on.

You can see I have access to two numbers in each iteration of the loop. I can now use these two indexes to do the summing and get the two numbers.

The below code demonstrates that:

prices = [30, 10, 40, 70, 60]

for i in range(len(prices)):
    for j in range(len(prices)):
        if (prices[i] + prices[j] == 100): 
            print(f"{prices[i]} and {prices[j]}")

You should get the below output:

30 and 70 
40 and 60 
70 and 30 
60 and 40

When I ran the code, I got the correct output, but there was an issue, the output was duplicated. How do I solve this?

First of all, let's try to understand why this is happening. So why is this happening?

Well, let's postulate. 30 is associated with index 0, 70 is associated with index 3, 40 is associated with index 2, and 60 is associated with index 4.

The output 30 and 70 were produced in the first iteration of the outer loop; the output 70 and 30 were produced in the third iteration of the outer loop. While the output 30 and 70 were produced when the outer loop was in its first iteration, the inner loop was in its fourth iteration. While the output 70 and 30 were produced when the outer loop was in its fourth iteration, the inner loop was in its first iteration.

If you look at the below, a 0 and 1 will be printed, a 1 and 0 will also be printed, a 0 and 2 will be printed, a 2 and 0 will also be printed, a 0 and 3 will be printed, a 3 and 0 will also be printed, a 0 and 4 is printed, a 4 and 0 is also printed.

0 and 0 
0 and 1 
0 and 2 
0 and 3 
0 and 4 
1 and 0 
1 and 1 
1 and 2 
1 and 3 
1 and 4 
2 and 0 
2 and 1 
2 and 2 
2 and 3 
2 and 4 
3 and 0 
3 and 1 
3 and 2 
3 and 3 
3 and 4 
4 and 0 
4 and 1 
4 and 2 
4 and 3 
4 and 4

If you want to skip the printing of 1 and 0, 2 and 0, 3 and 0 and 4 and 0, all you need to do is configure the loop as below which effectively solves the problem of repeating outputs:

prices = [30, 10, 40, 70, 60]

for i in range(len(prices)): 
    for j in range(i + 1, len(prices)):
        if (prices[i] + prices[j] == 100): 
            print(f"{prices[i]} and {prices[j]}")

Part 2

While we've seen a solution that works, the solution is not efficient. At least, there are more efficient solutions. Below is one of them:

prices = [30, 10, 40, 70, 60]

for price in prices:
    complement = 100 - price 
    if complement in prices: 
        print(f"{price} and {complement}")

This solution is based on the insight that you can just in the for loop get the complement of each number in the list and check if the complement is inside the list. This is more efficient than running an inner loop.

You will still experience the problem of duplicate output.

Todo: Explore how it can be solved.

Now, the requirement of the classic two-sum problem is for us to get the index of the two numbers.

While the first approach can easily be tweaked to get the index of the two numbers, the second approach will need more work. A dictionary will need to be introduced in the second approach. The logic is that since we want to match a value in the list with its complement while returning their index if they match, we will introduce a dictionary that will only hold one value at a time. The interesting thing about this dictionary is that the value in the dictionary will be the index of the value currently being iterated on in the list, while the index of the dictionary will be the value currently being iterated on. If you try to reverse things, you will see why it needs to be that way.

prices = [30, 10, 40, 70, 60] 
prices_dict = {}
for i in range(len(prices)): 
    complement = 100 - prices[i] 
    prices_dict[prices[i]] = i 
    if complement in prices_dict: 
        print(f"{i} and {prices_dict[complement]}")

Challenge: You're creating a fintech app for investments. You want to make it possible for an investor to optimize their investment portfolio by finding pairs of investments with specific characteristics. For example, to find two stocks whose combined prices match a target value, can you see the problem being a two-sum problem?

Conclusion

You've journeyed with me to see how the two-sum problem can be solved in Python. You also saw what features could lead to the two-sum problem. Please if you find issues with the codes, don't hesitate to let me know in the comment box.