Code Comparison of Different Solutions for PSet 1

batmanbury
New Member

Posts: 13

Code Comparison of Different Solutions for PSet 1 Mar 13, 2014 9:29:36 GMT

Quote

Post by batmanbury on Mar 13, 2014 9:29:36 GMT

I noticed the activity on this board declined pretty quickly, so I hope everyone is at least using the discussion forums on the edX.org website.

I wanted to show something I found interesting about my take(s) on Problem Set 1, Excercise 3: Alphabetical Substrings. I took this course last semester, so I mainly wanted to take it again to see if and how my skills improved, and how my thoughts on problem solving changed. Since the due date for this PSet is past, I hope nobody minds my posting the code in its entirety.

For this problem, we're supposed to write a program that prints the longest substring in a string in which the letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then the program should print:

Longest substring in alphabetical order is: beggh

In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print:

Longest substring in alphabetical order is: abc

Here are my two solutions, you can compare them yourself, then I'll talk about what I did below.

Oct. 2013 Solution:

strings = []

alphas = []



for j in range(len(s)+1):

    for i in range(len(s)):

        if s[i:j] != '':

            strings.append(s[i:j])



for substring in strings:

    if substring == ''.join(sorted(substring)):

        alphas.append(substring)



print max(alphas, key=len)

Mar. 2014 Solution:

def subcount(s):

    longest = ''

    current = ''

    for i in range(len(s)):

        for c in s[i:]:

            if current + c == ''.join(sorted(current + c)):

                current += c

            else:

                if len(current) > len(longest):

                    longest = current

                    current = ''

                    break

                else:

                    current = ''

                    break

    return longest



print 'Longest substring in alphabetical order is: ' + subcount(s)

You can see the 2013 solution is clearly 'shorter' but that really has nothing to do with how efficient it is, or rather, isn't. The main loop is a nested for loop which begins by iterating over each letter of the initial string, s, then does so again (hence the nested loop) in order to take 'slices' of increasing size starting at each letter. Basically, it takes every possible substring, of every possible length and stores them in the 'strings' list. This is terribly inefficient and unnecessary, but it worked at the time, and it was all I could think to do. The second loop goes through each of the stored substrings, then checks them against a sorted version of the same substring, which means if the substring is 'qwerty,' it will sort it (alphabetize it) and check if they are both the same. 'Qwerty' is not the same as 'eqrtwy' so that string would NOT get added to the 'alphas' list. If they are the same, for example 'eggs' == 'eggs' so that would go into the 'alphas' list. By the time this finishes, I have two lists and haven't accounted for ties. On the last line, I print the longest substring with max(alphas, key=len), which just luckily happens to result in the correct substring even if there's a tie (by virtue of how Python iterates through lists). Like I said, not great, but it worked at the time.

For the 2014 solution, I started without referencing my old code. I created a function which initializes two empty strings, 'longest' and 'current.' Then I enter the main loop iterating over the length of the input string. The inner loop goes over each character in the string, checking to see if adding one additional character will result in an alphabetical string. If it does, then 'current' will continue to grow. For example, if the main string is 'alphabet' then 'current' will go from 'a' to 'al' to 'alp' but once it hits the 'h' it doesn't get added. Instead, if the length of 'longest' is shorter than 'current' (which it is) it takes that value. At this point, 'longest' = 'alp' then current gets reset to an empty string, and the inner loop breaks to go to the next letter. To continue the example, we'll now check substrings that begin with 'l' then 'p' then 'h' which offer nothing alphabetically longer than 'alp.' But, at the end, 'current' will build up to 'abet' which then replaces 'longest' and the we exit the loop before returning 'abet' as the longest substring.

So in the new code, I managed to solve it without storing anything in memory larger than a string, whereas I was creating these potentially longer and longer lists with my old code. I was also storing non-alphabetical substrings that had no reason to be stored. As far as ties, the new code will naturally choose the first appearing substring because of that greater-than sign (>). Only longer substrings will replace 'longest.'

So, maybe there's not many people here to see it, but I hope someone thinks it's as interesting as I do. I was surprised to see how much my code changed after just a few months. If you want to grow and develop your skills, just keep taking classes, program as often as you can. I think the important thing for me here is that BOTH times around this problem I thought I did the best I could do at the time. Clearly the second solution is better, but there's obviously an even better solution out there than mine. There are endless ways to create any given program, and only the time you put into programming will help you imagine the cleaner or more efficient or elegant implementation.

Last Edit: Mar 13, 2014 9:36:28 GMT by batmanbury

antisense
New Member

Posts: 3

Code Comparison of Different Solutions for PSet 1 Mar 13, 2014 13:37:47 GMT

Quote

Post by antisense on Mar 13, 2014 13:37:47 GMT

As someone who is just starting out (I have so far finished Codecademy, worked through most of The Hard Way, and I just completed problem set 3 for 0.1x yesterday, yay!), this is definitely interesting and motivating. I know when I am building code for the problem sets, I am including redundancy and inefficient looping. I can see some issues in my code, but I have a hard time figuring out a different way to do it. As I work a full time job, with many out of work hours, I figure ensuring I finish everything by the deadlines is my first priority.

You're showing me that taking the course on its next iteration is a chance to fix these errors and see improvement.

Thanks for the post, and congratulations on sticking with it and improving!

andreareina
New Member

Posts: 8

Code Comparison of Different Solutions for PSet 1 Mar 18, 2014 9:00:10 GMT

Quote

Post by andreareina on Mar 18, 2014 9:00:10 GMT

s = "abcbcd"

longest_cur = s[0]
longest_sofar = s[0]

for i in range(1, len(s)):
    if s[i] >= s[i-1]:
        longest_cur += s[i]
        if len(longest_cur) > len(longest_sofar):
            longest_sofar = longest_cur
    else: longest_cur = s[i]

print "Longest substring in alphabetical order is: %s" % longest_sofar

So, a couple of comments on your code. You have duplication that should be factored out:

if len(current) > len(longest):
    longest = current
    current = ''
    break
else:
    current = ''
    break

You see how both the if and the else reset current and then break? Put them outside the if statement and save 5 lines:

if len(current) > len(longest):
    longest = current
current = ''
break

Your for c in s[i:] loop also threw me for a loop (groan). You concatenate the strings, then sort them, which produces a list so you have to use the join method on an empty string to turn the list back into a string.... Also, it's inefficient to sort something to find out if it's sorted. Since you're adding things onto the end of a string, you can simply compare the potential addition to the last character of the string:

for c in s[i:]:
    if c >= current[:-1]: # we use >= because "bb" is acceptable
        ...

This is of course how I think (in fact that's what I use in my solution), you may very well find this to be less clear than your solution -- I notice that you used the ''.join() idiom in both solutions. However this two-at-a-time comparison of elements is part of how sorting algorithms are implemented under the hood, so it's worth getting used to. Plus, it'll save you execution time when the length of the list gets really big.

georgemacky
New Member

Posts: 1

Code Comparison of Different Solutions for PSet 1 Aug 18, 2022 12:11:38 GMT

Quote

Post by georgemacky on Aug 18, 2022 12:11:38 GMT

Having a robust project controls solution to help projects throughout all life cycle stages gives organizations an internal advantage over the competition and keeps them organized.

Project management tools give companies sufficient leverage to maximize profits and time for expansion.

primavera p6

Python CS

Code Comparison of Different Solutions for PSet 1

Post by batmanbury on Mar 13, 2014 9:29:36 GMT

Post by antisense on Mar 13, 2014 13:37:47 GMT

Post by andreareina on Mar 18, 2014 9:00:10 GMT

Post by georgemacky on Aug 18, 2022 12:11:38 GMT

Quick Reply