Friday 6 December 2013

Sorts and Big O

If you had a deck of 20 cards numbered from 1-20 laid out in front of you, and you were asked to sort them in increasing order in the most efficient way possible, how would you do it? Surprisingly, there are many acceptable approaches to this problem, which leads to the topic of sorting algorithms. For my last post, I'll be discussing three basic sorts: selection sort, quick sort, and merge sort, as well as their efficiency. If any of my explanations are unclear, I really recommend checking out the Wikipedia articles on each of the sorts, since they explain things in more detail and have GIFs which I found really helpful.

Selection

Selection sort is probably one of the simplest and most intuitive approaches to sorting - you look through the list (or in this case the deck of cards) to find the smallest element, and put that aside. Then you go through the rest of the deck and repeat the process, once for each card you started with, until you end up with a sorted deck. In terms of code, a possible implementation starts at the first (0th) element of an array and swaps it with the smallest value in the rest of the array. The second time through you start at the second (1st) element, and so on until you have gone through the whole array.
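To make this concrete, here's a minimal Python sketch of the swap-based version described above (the function and variable names are my own):

    def selection_sort(lst):
        """Sort lst in place by repeatedly swapping the smallest
        remaining element into the front of the unsorted portion."""
        for i in range(len(lst)):
            smallest = i  # index of the smallest element in lst[i:]
            for j in range(i + 1, len(lst)):
                if lst[j] < lst[smallest]:
                    smallest = j
            lst[i], lst[smallest] = lst[smallest], lst[i]

    cards = [12, 3, 20, 1, 7]
    selection_sort(cards)
    print(cards)  # [1, 3, 7, 12, 20]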

If the size of the input (the number of elements) is n, then the first time through the list you have to check n - 1 elements. The second time, you ignore one element, so you only need to check n - 2 elements, and so on until you have 0 elements left to check. So in relation to n, the number of operations you have to do is (n - 1) + (n - 2) + ... + 1 = (n^2 - n) / 2, which means selection sort is O(n^2).

Quick
Quick sort is a naturally recursive algorithm that goes as follows: choose an easily accessible element of the list (for example, the first, last, or middle one) - this element is called the pivot. Then, move all of the elements that are smaller than the pivot to its left (or right if you want decreasing order) and all of the elements that are bigger than the pivot to its right. Then, you perform this procedure again on the left half and the right half of the array (excluding your original pivot, since it is already in the correct position) and repeat until you reach the base case - a slice with 1 or 0 elements, which doesn't need to be sorted. The recursive nature of quick sort lends itself to a very simple implementation in code, which would look like: return ([quick sort of all the elements less than the pivot] + [the pivot] + [quick sort of all of the elements greater than the pivot]) if the array has more than 1 element, otherwise just return the array.
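That description translates almost directly into Python. Here's a minimal sketch, assuming we always pick the first element as the pivot (duplicates of the pivot go in the "greater or equal" half so they aren't lost):

    def quick_sort(lst):
        if len(lst) <= 1:  # base case: 0 or 1 elements are already sorted
            return lst
        pivot = lst[0]  # an easily accessible element
        smaller = [x for x in lst[1:] if x < pivot]
        greater = [x for x in lst[1:] if x >= pivot]
        return quick_sort(smaller) + [pivot] + quick_sort(greater)

    print(quick_sort([12, 3, 20, 1, 7]))  # [1, 3, 7, 12, 20]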

Assuming that for each pivot you pick, there are roughly the same number of elements less than the pivot as there are elements greater than it, you'll be splitting the list in half each time, which gives logn (base 2) levels of splits. At each level, you'll be checking (at most) every element against a pivot. Since there are (almost) n elements checked logn times, you get O(nlogn).

However, if by chance you pick a pivot that is either the smallest or the biggest element of the list you are examining every single time, you'll have to split the list n - 1 times. And since you still have to check at most every element against a pivot at each level, the number of operations is roughly n(n - 1), which is O(n^2).

Merge
Merge sort is another recursive algorithm which first splits the array until each slice has either 1 or 0 elements - in other words, until each slice is sorted. Then, it merges pairs of sub-arrays into sorted lists by comparing the first element of each sub-array and appending the smaller (or bigger if you want decreasing order) one to the merged array; once one sub-array runs out, the rest of the other (which is already sorted) is appended, so the merged array is also sorted. This process is continued until only one array remains.
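Here's a minimal sketch of that idea in Python (the helper merge() is my own naming; it combines two already-sorted halves):

    def merge(left, right):
        """Merge two sorted lists into one sorted list."""
        merged = []
        while left and right:
            # take the smaller of the two first elements
            if left[0] <= right[0]:
                merged.append(left.pop(0))
            else:
                merged.append(right.pop(0))
        # one half is empty; the other is already sorted
        return merged + left + right

    def merge_sort(lst):
        if len(lst) <= 1:  # base case: already sorted
            return lst
        mid = len(lst) // 2
        return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))

    print(merge_sort([12, 3, 20, 1, 7]))  # [1, 3, 7, 12, 20]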

Similar to the reasoning used for quick sort: since you are performing about n operations at each of the logn levels of splitting, you end up with O(nlogn).


In the end, it is clear that merge sort will always be better than selection sort, and quick sort will almost always be better than selection sort. But which of merge and quick is faster? In merge sort you must check every element at every level, whereas in quick sort you check every element except for the pivots that you already know are in the correct location from previous levels. In other words, in the best case, quick sort performs fewer operations at every split. Since a balanced quick sort splits its input the same number of times as merge sort, it will be slightly faster. Put another way, both of these sorts are composed of two parts: the splitting (logn levels) and the operations at every level (n). Unless the quick sort is perfectly balanced (split exactly in half each time), its number of levels will be somewhere between logn and n, but the operations at every level will be fewer for quick sort. So the faster algorithm depends on the relationship between those two parts, and whether the time lost in quick sort by splitting more times is offset by the time gained by doing fewer operations per split. In practice, as we've seen in class, quick sort ends up being slightly faster.
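If you want to see this for yourself, here's a rough (not at all rigorous) timing sketch using the implementations above - the exact numbers will of course vary from machine to machine:

    import random
    import timeit

    data = random.sample(range(100000), 2000)  # 2000 distinct values

    for sort in (selection_sort, quick_sort, merge_sort):
        # sort a fresh copy each run so earlier runs don't help later ones
        t = timeit.timeit(lambda: sort(list(data)), number=10)
        print(sort.__name__, round(t, 3), 'seconds')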

Thursday 5 December 2013

Weeks 10 - 12

Looks like we're already at the end of this course. This semester has really flown by! Because of the reading week (or should I say reading days), week 10 just consisted of the second term test. I thought it was pretty straightforward - three recursive functions which test your knowledge of binary trees and linked lists. The one mistake I made was assuming that a default value of None was assigned to the next attribute of a new linked node instance. Had I run my code, I would have realized that the initializer took 3 arguments (self, value, next) instead of just two. I'll have to make sure to pay more attention to the code given in the question next time!

For the past two weeks (week 11 and 12), we touched briefly on a number of different concepts:

  • information hiding: 
Here we discussed Python's view on the accessibility of attributes in a class and introduced the built-in property(), which takes a get, a set, and a delete function, as well as an optional docstring, and works by setting a class attribute equal to the resulting property object. This lets us control how users access an attribute - for example, we can write the set function passed to property() to accept only certain values, restricting what a user can change the attribute to, or pass in only a get function to make the attribute read-only (see the first sketch after this list).
  • MapReduce:
MapReduce is composed of two parts: it maps a function over an iterable, which sorts or filters the values, and then it reduces them to one value. This allows more efficient processing of large data sets by letting operations run in parallel. To reinforce our understanding, Danny introduced the reduce() function from the functools module, which runs a function through an iterable pairwise to reduce it to one value (see the second sketch after this list).
  • scopes and namespaces
I thought this part was self-explanatory, but we went through an example involving local, nonlocal, and global namespaces (using nested functions) to see where Python checks first when a variable is referenced - first local, then nonlocal, then global (see the third sketch after this list).
  • tracing recursive function
Here the main point that Danny made was that when tracing through recursive functions, it isn't necessary to go through exactly what the computer is doing. You can first trace the simplest non-recursive case; afterwards, instead of repeating the process, you can just plug in the known results (being a human, you don't have to recurse all over again to know what the result is) and keep going until you reach an output.
  • memoization
Danny touched briefly on how memoization works and what its benefits are with the classic Fibonacci example, and then showed us an example of a memoization decorator (a sketch of one appears after this list).
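Since a few of those topics are easier to see in code, here are some minimal sketches (all names and values are made up for illustration, not the exact examples from class). First, property() with a set function that restricts an attribute's values:

    class Card:
        def __init__(self, rank):
            self.rank = rank  # goes through the setter below

        def _get_rank(self):
            return self._rank

        def _set_rank(self, value):
            if not 1 <= value <= 20:
                raise ValueError('rank must be between 1 and 20')
            self._rank = value

        # a read-only attribute would pass only _get_rank
        rank = property(_get_rank, _set_rank, None, 'rank of the card')

    c = Card(5)
    c.rank = 15    # fine
    # c.rank = 99  # would raise ValueError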
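Second, functools.reduce() applying a function pairwise - reduce(f, [a, b, c, d]) computes f(f(f(a, b), c), d):

    from functools import reduce

    print(reduce(lambda x, y: x + y, [1, 2, 3, 4]))  # 10
    print(reduce(max, [3, 20, 7]))                   # 20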
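Third, local, nonlocal, and global lookup with nested functions:

    x = 'global'

    def outer():
        x = 'nonlocal'
        def inner():
            x = 'local'
            print(x)  # the local x is found first
        inner()
        print(x)      # the enclosing (nonlocal) x is untouched

    outer()
    print(x)          # the global x is untouched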
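And finally, a memoization decorator with the classic Fibonacci example (not necessarily the exact decorator Danny showed us):

    def memoize(f):
        cache = {}  # remembers results of previous calls
        def wrapper(n):
            if n not in cache:
                cache[n] = f(n)
            return cache[n]
        return wrapper

    @memoize
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(30))  # fast: each value is computed only once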

Overall, I've really enjoyed the course, and I wish everyone best of luck on their exams!