My conversations with gullible machines...: April 2009

I wrote up a small program to find prime-numbers between x and y ; implemented the Sieve of Eratosthenes.
The SOE is basically where you take all numbers (say 1-100) into an array. Cross out 1. Loop from 2-Sqrt(N) i.e. 2-10. Now cross out every multiple of 2 in the array. Next cross out every multiple of 3.. and so on. Finally all the numbers that aren't crossed out are prime.

However I did a very simplistic implementation (test-driven) to boot. All the tests did pass.. however running it to get all primes between 10K and 100K took 34 secs to execute. And I was on the new n improved ruby 1.9 with a better VM. I switched to the stable ruby 1.8.6 and it took ~60 secs. So now I turned back to my primary Weapon C#. The same steps implemented in C# took 722 msecs. So now there were 2 possibilities
1. Ruby is sloooow (the usual gripe)
2. Something in my algorithm is messed up.

So I posted my source onto StackOverflow - (my current fave time sink :) to get some eyeballs to look at the problem.
So here's my naive implementation

class PrimeGenerator
  def self.get_primes_between( x, y)
    sieve_array = Array.new(y) {|index|
      (index == 0 ? 0 : index+1)
    }

    position_when_we_can_stop_checking = Math.sqrt(y).to_i
    (2..position_when_we_can_stop_checking).each{|factor|
      sieve_array[(factor).. (y-1)].each{|number|
        sieve_array[number-1] = 0 if isMultipleOf(number, factor)
      }
    }
    
    sieve_array.select{|element| 
      ( (element != 0) && ( (x..y).include? element) )
    }
  end
  def self.isMultipleOf(x, y)
    return (x % y) == 0
  end
end

# Benchmarking code


require 'benchmark'
Benchmark.bm(30) do |r|
  r.report("Gishu") { a = PrimeGenerator.get_primes_between( 1_000, 10_000) }
end

A few people got back with what might be the problem.
One of the first and most voted issue was the slicing the array to get a new subset array. That's got to be expensive inside a loop for an array of this size. Obviously I was using for loops in C# (since C# doesn't have Ruby's each)

for index in factor..y-1 do
  sieve_array[index] = 0 if isMultipleOf(index+1, factor)
end

That shaved off 8 secs - 24%. But still 26 secs. Another optimization was to reduce iterations - instead of checking each number with IsMultiple(number, _loopVar), set the 'step' for the loop to equal _loopVar (so from 2 you go 4,6,8,10...)

And finally Mike Woodhouse dispatched a homer. He posted a tiny snippet that did the same thing in under 1 sec. No that's not a typo.

def sieve_to(n)
  s = (0..n).to_a
  s[0]=s[1]=nil
  s.each do |p|
    next unless p
    break if p * p > n
    (p*p).step(n, p) { |m| s[m] = nil }
  end
  s.compact
end

# Usage:
# puts sieve_to(11).select{|x| x>=3}.inspect


        user     system      total        real
Gishu  27.219000   0.000000  27.219000 ( 27.375000)
Mike    0.141000   0.000000   0.141000 (  0.140625)

I then had to know how he did it..

Smart#1: He skips if array[_loopvar] is already 0. whereas if array[6] is crossed out due to 3, I did a scanning run to set all multiples of 6 to 0 which were already 0.(now down to 6 secs)

Smart#2: Use the Numeric#step trick to jump to the next multiple vs traversing entire list with an IsMultiple check (now down to 0.25 secs)

Smart#3: He begins the inner loop with Square of p. e.g if _loopvar is 7, I used to check from 8->end. Mike checks from 49->end. This one was a little tricky to figure out.. the reason is 7x2, 7x3, 7x4, 7x5, 7x6 have already been crossed out when _loopVar was 2,3,4,5,6 respectively. (now down to 0.21 secs)

Smart#4: Smart use of Ruby's nil for crossed out vs my choice of 0. That followed by Array#compact reduces the number of elements for the final select operation to get primes between x and y (now down to 0.18 secs.)

So after all that it's Ruby 0.15 secs and C# (with same optimizations) 0.028 secs, much more easy to take in.

An interesting thought to end this post is that even with the naive first implementation C# performed reasonably well... under a sec. I'd have moved on to the next task. Ruby on the other hand slowed to a crawl forcing me to take another look for optimizations. In a indirect sort of way, C#'s superior performance could be hiding bad/substandard solutions and letting programmers get away with it...

require 'test/unit'
require 'merge_sort'
class TestMergeSort < Test::Unit::TestCase
def test_partition_array_with_even_number_of_elements
left, right = MergeSort.partition_array [11, 5, 9, 15]
assert_equal [11, 5], left
assert_equal [9, 15], right
end
def test_partition_array_with_odd_number_of_elements
left, right = MergeSort.partition_array [11, 5, 45, 9, 15]
assert_equal [11, 5], left
assert_equal [45, 9, 15], right
end

def test_merge_arrays
assert_equal( [10,20,30,40],
MergeSort.merge_arrays( [10,30], [20,40] ) )

assert_equal( [1, 3, 5, 9],
MergeSort.merge_arrays( [1, 5, 9], [3] ) )
assert_equal( [1, 3, 5, 9],
MergeSort.merge_arrays( [3], [1,5,9] ) )
end

def test_sort
assert_equal [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
MergeSort.sort([ 8, 3, 4, 0, 9, 1, 2, 7, 6, 5])
end
end

class MergeSort
def self.partition_array( input_array)
mid_index = (input_array.length / 2) - 1
[input_array[0..mid_index], input_array[(mid_index+1)..(input_array.length-1)]]
end

def self.merge_arrays(array1, array2)
merged_array = []

until (array1.empty? || array2.empty?) do
if (array1[0] < array2[0])
merged_array << array1.shift
else
merged_array << array2.shift
end
end

merged_array + array1 + array2
end

def self.sort( input_array )
return input_array if (input_array.length < 2)

left, right = self.partition_array input_array
sortedLeft = self.sort left
sortedRight = self.sort right
y = self.merge_arrays( sortedLeft, sortedRight )
end
end

My conversations with gullible machines...

Ruby is slow... until you learn to write good code.

MergeSort in Ruby