Time complexity different between operating systems. Why?

Postby Mekire » Sat May 18, 2013 5:02 am

OK, so first off, I have a dual-boot setup with Windows 7 and Linux Mint 13. I generally work in Windows because I'm honestly not that comfortable in Linux, but I like to be able to test compatibility and such when necessary.

Brought on by a recent thread, I realized that certain operations were not performing the same on the two OSes.

Specifically, string concatenation and the str.join method. My original assertion was that string concatenation should be avoided entirely because it goes quadratic.

Take the following code (still using time instead of timeit; still lazy):
Code: Select all
import time

def make_string_concat(num):
    # Build the string with repeated += (in-place concatenation).
    result = ''
    for i in xrange(num):  # Python 2; use range() on Python 3
        result += '{}\n'.format(i)
    return result

def make_string_join(num):
    # Build the same string with str.join over a generator expression.
    return "\n".join(str(i) for i in xrange(num)) + "\n"

###
if __name__ == "__main__":
    NUMBERS = [1000000, 2000000, 3000000]
    for number in NUMBERS:
        print("n = {}".format(number))
       
        start = time.time()
        DATA1 = make_string_join(number)
        print("Time to generate data (str.join): {}".format(time.time()-start))

        start = time.time()
        DATA2 = make_string_concat(number)
        print("Time to generate data (concatenation): {}\n".format(time.time()-start))

So on Windows the result I get is:
Code: Select all
>>>
n = 1000000
Time to generate data (str.join): 0.389999866486
Time to generate data (concatenation): 2.88599991798

n = 2000000
Time to generate data (str.join): 0.655000209808
Time to generate data (concatenation): 12.2779998779

n = 3000000
Time to generate data (str.join): 0.99799990654
Time to generate data (concatenation): 28.2520000935

>>>
It would appear that str.join stays linear while concatenation blows up quadratically.

Then Metulburr showed me his results for the same code, which completely contradicted mine. I rebooted to Linux and tested the same code.

Result with Linux on the same computer:
Code: Select all
>>>
n = 1000000
Time to generate data (str.join): 0.224174022675
Time to generate data (concatenation): 0.316344022751

n = 2000000
Time to generate data (str.join): 0.449282884598
Time to generate data (concatenation): 0.658910989761

n = 3000000
Time to generate data (str.join): 0.677663803101
Time to generate data (concatenation): 0.943947792053

>>>
Here, while the join method is still superior, they both appear to have linear time complexity.

Aside from the standard "lol, Windows" line of reasoning, what would cause the time complexity of concatenation to be O(n^2) on one operating system and O(n) on the other?

-Mek
Mekire
 
Posts: 817
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: Time complexity different between operating systems. Why

Postby metulburr » Sat May 18, 2013 12:37 pm

I sometimes have anything from a dual boot to a triple or quad boot, but for compatibility testing I normally use VirtualBox with Windows 7 set up in it, which is a lot easier and quicker than rebooting into another system. Of course, I hadn't really looked at the timing differences between Windows and Linux before. Not sure why concatenation on Windows is so slow; it would be interesting to know.

I guess I should make sure to use str.join in any code of mine that is going to be running on Windows, huh? lol

No wonder someone said that str.join is faster: if they ran it on Windows and got those results, and I ran it on Linux, I'm thinking "you're complaining about 0.2 seconds?", when in reality they were talking about many seconds.
New Users, Read This
version Python 3.3.2 and 2.7.5, tkinter 8.5, pyqt 4.8.4, pygame 1.9.2 pre
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
metulburr
 
Posts: 1107
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: Time complexity different between operating systems. Why

Postby hrs » Sat May 18, 2013 1:33 pm

I got this from cProfile:
Code: Select all
         12000036 function calls in 6.926 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.926    6.926 <string>:1(<module>)
        3    0.000    0.000    3.288    1.096 con.py:10(make_string_join)
  6000003    2.052    0.000    2.052    0.000 con.py:11(<genexpr>)
        1    0.001    0.001    6.926    6.926 con.py:14(main)
        3    1.843    0.614    3.636    1.212 con.py:4(make_string_concat)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  6000009    1.794    0.000    1.794    0.000 {method 'format' of 'str' objects}
        3    1.237    0.412    3.288    1.096 {method 'join' of 'str' objects}
       12    0.000    0.000    0.000    0.000 {time.time}

So it looks like we're really benchmarking a generator expression against string concatenation.

Edit: this is on Linux, btw.
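
For reference, output like the above can be produced with something along these lines. This is just a minimal sketch; it assumes the benchmark functions plus a main() wrapper live in a module called con, as the filenames in the listing suggest:
Code: Select all
import cProfile

# 'con' is the module name taken from the profile listing above; adjust it
# to whatever your benchmark file is actually called.
import con

# Profile a single call to the benchmark's main() function.
cProfile.run('con.main()')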
hrs
 
Posts: 86
Joined: Thu Feb 07, 2013 9:26 pm

Re: Time complexity different between operating systems. Why

Postby casevh » Sat May 18, 2013 3:45 pm

I've done a little research and this is a summary of what I've found so far. It may not be 100% accurate.

In CPython 2.4, += was optimized to try to resize a string object in place. Previous versions of CPython would always create a new string object and copy the data. Beginning with 2.4, CPython will try (when it can) to just resize the memory block without copying the data. It does this by calling the realloc() function provided by the platform's C library. It appears that realloc() on Windows copies the data to a new memory block, while realloc() on Linux just extends the existing block without copying the data.

realloc() doesn't guarantee either type of behavior. It just guarantees that it will return a pointer to a block of memory of the new size, and that any data from the old block will be copied over if needed. Different allocators optimize for different goals: speed vs. minimizing memory fragmentation vs. minimizing total footprint. Your test happened to highlight one of those differences.
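
One rough way to watch this from Python itself is to check the string's id() after each +=. In CPython, id() is the object's address, so if the in-place resize succeeds and realloc() doesn't move the block, the id stays the same. This is only a sketch that leans on CPython 2 implementation details, and the counts will vary by platform and allocator:
Code: Select all
def count_relocations(num, chunk='x' * 10):
    """Extend a string with += and count how often the data ends up at a
    new address (i.e. realloc() moved it or a new object was created)."""
    s = ''
    moves = 0
    last = id(s)  # CPython detail: id() is the object's memory address
    for _ in xrange(num):
        s += chunk
        if id(s) != last:
            moves += 1
            last = id(s)
    return moves

if __name__ == '__main__':
    print("relocations: {}".format(count_relocations(100000)))

If this summary is right, you would expect a much higher relocation count on Windows than on Linux for the same run.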

When concatenating strings, join() is guaranteed to run in linear time complexity. The change in CPython 2.4 improves the performance in a common environment. Because the improved += behavior is not part of the language specification, it shouldn't be relied on. join() is the recommended way to concatenate a large number of strings.

Side note: when running benchmarks, you should usually disable CPython's garbage collector. It triggers automatically when a large number of objects are created and deleted, and its job is to look for, and try to delete, isolated reference cycles: object A refers to B and B refers back to A, but no other references to A or B exist. Since your benchmark isn't creating cycles, the collector isn't needed here; GC is useful in more complex programs.
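
For what it's worth, the standard library's timeit module already disables garbage collection around the timed statement, so a less lazy version of the original benchmark could look roughly like this (it assumes the two functions from the first post are saved in a file called bench.py, which is a name I'm making up):
Code: Select all
import timeit

# 'bench' is a made-up module name; point it at wherever the functions live.
SETUP = "from bench import make_string_concat, make_string_join"

for n in (1000000, 2000000, 3000000):
    # number=1 times a single call, which is enough for inputs this large.
    t_join = timeit.timeit("make_string_join({})".format(n), setup=SETUP, number=1)
    t_concat = timeit.timeit("make_string_concat({})".format(n), setup=SETUP, number=1)
    print("n = {}: join {:.3f}s  concat {:.3f}s".format(n, t_join, t_concat))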
casevh
 
Posts: 56
Joined: Sat Feb 09, 2013 7:35 am

Re: Time complexity different between operating systems. Why

Postby micseydel » Sun May 19, 2013 2:40 am

Thanks as always for the information casevh! I was really curious about this, I had a lot of the information you provided here but was missing some important stuff!
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 929
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Time complexity different between operating systems. Why

Postby snippsat » Sun May 19, 2013 3:21 am

Thanks for the good explanation, casevh.

Mekire wrote:
It would appear that str.join stays linear while concatenation blows up quadratically.

It's more that memory use blows up with the use of +=.

Some more info about the implementation in the Python source: string concatenation vs. list join.
+= is fastest if the string length is under about 40 characters.
http://stackoverflow.com/questions/3055 ... s-str-join
Code: Select all
Iterations: 1,000,000       
String Length:  Time +=     Time ''.join()
1                0.953990        1.3280
4                1.233990        1.8140
6                1.516000        2.2810
12               2.250000        3.2500
80              15.530900       12.3750
222            101.797000       30.5160
443            238.063990       57.2030
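
I don't have the exact benchmark behind that table, but a rough sketch of the same kind of comparison with timeit might look like this (one += of two equal-length strings per iteration versus ''.join() of the same pair; the absolute numbers will of course differ from the table above):
Code: Select all
import timeit

for length in (1, 4, 6, 12, 80, 222, 443):
    # Two strings of the given length to combine on each iteration.
    setup = "a = b = {!r}".format('x' * length)
    t_plus = timeit.timeit("s = a; s += b", setup=setup, number=1000000)
    t_join = timeit.timeit("s = ''.join([a, b])", setup=setup, number=1000000)
    print("length {:>3}: += {:8.4f}s   ''.join() {:8.4f}s".format(length, t_plus, t_join))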
snippsat
 
Posts: 89
Joined: Thu Feb 21, 2013 12:04 am

Re: Time complexity different between operating systems. Why

Postby micseydel » Sun May 19, 2013 4:41 am

Yeah, but when it's that fast, the speed difference between the two likely doesn't matter. I'd say stylistically it's preferable to use the more robust version.
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 929
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

