Range() too large in Problem 3

For questions about problems on the Project Euler web site. No spoilers. Please include the question number in the subject line of your post.

Range() too large in Problem 3

Postby Explorador » Tue Aug 26, 2014 12:53 pm

Well, I am trying the 3rd problem, and I solved it properly for the example given:
The prime factors of 13195 are 5, 7, 13 and 29.
. But when I try it for the problem:
What is the largest prime factor of the number 600851475143 ?
I get the problem that 600851475143 is too large to apply this code:

Code: Select all
primos = [x for x in range(2,600851475143)if all(x%i for i in range(2,x))]


(I do not put the whole code for avoiding spoiler)

So it tells me:

Traceback (most recent call last):
File "<pyshell#70>", line 1, in <module>
primos = [x for x in range(2,600851475143)if all(x%i for i in range(2,x))]
OverflowError: range() result has too many items


Is there any way to run this code (or something similar)?

Thank you very much.
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain

Re: Range() too large in Problem 3

Postby Explorador » Tue Aug 26, 2014 2:11 pm

I solved the Problem 3 just changing the huge number of the code for a shorter one (which I supossed that would be inside the range of useful prime numbers) and I tried the solution at projecteuler and it worked.

But it would be great if someone could help with the situation described in the first post, in order to find a greater solution for future problems.
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain

Re: Range() too large in Problem 3

Postby micseydel » Tue Aug 26, 2014 5:36 pm

The problem is that your code is too inefficient. Your program doesn't scale because its time complexity is inefficient. Let's expand your code out for easier analysis.

Code: Select all
N = 600851475143

primos = []
for x in range(2, N):
    if all(x % i for i in range(2, x)):
        primos.append(x)

You have two loops here. How many total loop iterations will there be here with respect to N? Clearly the first will be N, and for each of those there will be up to N more. This comes out to approximately N*N/2, for which I'll use Big O notation and just say O(N**2). This isn't good for the general case, and in this case N is so large that there's no way a modern computer is going to be able to calculate it. So what you want is to improve the Big O time complexity.

I'm now going to factor out part of your code for easier analysis.
Code: Select all
N = 600851475143

def isprime(x):
    return all(x % i for i in range(2, x))

primos = []
for x in range(2, N):
    if isprime(x):
        primos.append(x)

We can see here that your isprime() function is O(x). Do you think you can improve upon that?
Also, you check all the values up to N; is it at all possible that any of the values greater than N/2 are what you're looking for? Can you further decrease your lower bound?
Try one of these at a time on the smaller N value to check for correctness. They can be completed separately.

When I improved the Big O of each of these two parts, I had a solution that ran in about 0.08s on my computer and used very little memory since I used xrange() instead of range() (making just this change would make your code not error out, but it would never finish running).

By the way, when I learned Big O in school, I didn't really get it. It was through doing Project Euler problems that I really began to understand proper optimization. If you continue with PE and this post doesn't make sense yet, try coming back on occasion until it clicks ;)
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
User avatar
micseydel
 
Posts: 1507
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Range() too large in Problem 3

Postby Explorador » Wed Aug 27, 2014 3:10 pm

I have been reading your tips. This is what I have:

Due to odd = odd * odd, and we are considering only prime numbers, we have that not only N/2, but we could even do N/15, as 15 = 3*5 (odd*odd).

So one of your tips is a bit improved, as you asked me to do: instead of N/2, we have N/15. But I am sure that we could do something even smarter.

I also tried xrange() instead of range(), but it seems that unless I improve your last tip (about Big O function), it will still not working.

I did not really pay attention to Big O before (as you also said hahaha), so now I am a kind of newbie in this. I researched a bit about it, and about how to improve it. I asked a mathematician friend of mine, and he told me that using a http://en.wikipedia.org/wiki/Logarithmic_integral_function it could work. I have never studied these kind of integral functions before, so I am a newbie here again hahaha. My python knowledge is quite elemental, so it is difficult for me to conceptualize the Big O and the Logarithmic integral function into a python code. (for the Big O it could be ok, but not for the LIF).

If you can add more information, I will be pleased of listening to you.

Thank you very much for your time and explanations.
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain

Re: Range() too large in Problem 3

Postby micseydel » Wed Aug 27, 2014 5:20 pm

I've never used the logarithmic integral function, and although I'm sure it's useful, I think it's overkill (or possibly irrelevant) here. Big O is something that fresh CS students sometimes don't understand the first time, but it's not too hard and once it clicks it's pretty easy. I wouldn't worry about the formal definition, just the idea of the number of loops, multiplied by the operation in those loops.

Reducing it by a factor of 15 is good, but doesn't change the Big O (which ignores constant multipliers). I will say that you have two components which are O(N), one inside the other making it O(N**2) and that you can optimize separately each of those two "linear" running times. I'll give you some examples of faster running times than O(N): O(log(n)) and O(sqrt(n)). With those in as a basis, does anything come to mind?
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
User avatar
micseydel
 
Posts: 1507
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Range() too large in Problem 3

Postby Explorador » Thu Aug 28, 2014 12:27 pm

After reading more, I think I fully got the Big O concept. And your tips brought me to this (I must admit you were quite clear, so it was not difficult to follow you):

Code: Select all
def isprime(x):
   return all(x % i for i in xrange(2, int(math.sqrt(x))))
primos = []
N = 600851475143
for x in xrange(2, int(math.sqrt(N))):
   if isprime(x):
      primos.append(x)


But what I get is this:

>>> primos[:30]
[2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, 47, 49, 53, 59, 61, 67, 71, 73, 79]


So it is obvious it does not work properly. I have been trying few changes, but some of them completely spoil the function, and others just do not reach where I want (-> getting only prime numbers). So I am a bit lost right now due to I think I got the Big O concept (maybe not), but for some reason it does not work as I supposed it had to. Maybe it is because I am not writing the code properly.

I think that the problem is in the "def isprime(x)" part and not in the "for x in range..." part, because I have the feeling that the problem is in the second line of the "def", exactly in the xrange function. Maybe instead of (2, int(math.sqrt(x))) it should be another thing.

As usual, I will be pleased of listening to your tips. And thanks again :)
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain

Re: Range() too large in Problem 3

Postby micseydel » Thu Aug 28, 2014 5:23 pm

You're much closer now, you've greatly reduced the work your code has to do. As you've noticed, you're getting false positives for many primes. Let's consider the input value 9 for your isprime() function, which doesn't quite work yet. It will iterate from 2 up to the square root of 9, non-inclusive. Is that what you want? Or is that a bit off?

For further development on this problem, what would you say your Big O is now? And although this solution is good, and what you have here is similar to the first isprime() function I wrote, you'll eventually need something better for future PE problems. We can point you in the right direction, but I'll avoid the S word if you don't know it already to prevent spoilers ;)
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
User avatar
micseydel
 
Posts: 1507
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Range() too large in Problem 3

Postby Explorador » Thu Aug 28, 2014 9:55 pm

Here are the news:

I could fix the false positives primes just adding +1 as follows:

Code: Select all
def isprime(x):
   return all(x % i for i in xrange(2, int(math.sqrt(x)) + 1))


Then, I was thinking about the Big O, and I noticed that the growth rate of a logarithm is lower than the square root, so I tried these things:

Code: Select all
def isprime(x):
   return all(x % i for i in range(2, int(math.log(x, 2)) + 2))
for x in xrange(2, int(math.sqrt(N))):
   if isprime(x):
      primos.append(x)


(N and primos are declared before)

And we have:

Code: Select all
len(primos)
135888


And it works properly unless the fact that it does not start at 2, and the fact that number 49 is there, but maybe it could be fixed with something as we did last time(in addition to 49, it is very probable there are more "intruders"):

Code: Select all
primos[:30]
[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]


Then, I tried another option:

Code: Select all
def isprime(x):
   return all(x % i for i in range(2, int(math.log(x, 2)) + 2))
for x in xrange(2, int(math.log(N, 2))):
   if isprime(x):
      primos.append(x)


But this time, the result it:

Code: Select all
len(primos)
11


so just:
Code: Select all
primos[:30]
[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


Finally, I also tried:

Code: Select all
def isprime(x):
   return all(x % i for i in xrange(2, int(math.sqrt(x)) + 1))
for x in xrange(2, int(math.log(N, 2))):
   if isprime(x):
      primos.append(x)


with:

Code: Select all
len(primos)
12


and:

Code: Select all
primos
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


So, after these 3 options, I reached a conclusion with 2 possible points of view:

1) Only the first case is the efficent one, so using log at both places is not a good idea because it makes it so tiny.
2) Maybe second or third case could be ok if the base of the log is a different one, or we do any other kind of improvement.

In any case, I still having the problem of not using the log properly (remember the little mistake we noticed in the first option).

Right now, I would use sqrt because it works fine, but it is true that log could be even better, so it could be great if I can improve its use.

I would be pleased to listen to your opinion as usual.
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain

Re: Range() too large in Problem 3

Postby micseydel » Fri Aug 29, 2014 12:41 am

Are you saying the Project Euler website indicated that you got it correct?

Regarding log() vs sqrt(), while it's true that log grows more slowly and so for larger inputs will be faster, and that your O(n) algorithm was probably as correct as your O(sqrt(n)), just too inefficient, you can only choose the faster one because it is also correct. Not every linear algorithm can be done in square root or logarithmic time. For example, binary search takes advantage of certain properties of some collections (which are sorted and have random access) to perform a search in logarithmic time. If you take an unsorted list and apply the same very efficient algorithm, it will not be correct.

With this in mind, can it be done correctly in logarithmic time like it can square root time? Or is the square root version not correct either, and just happens to work in this case?
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
User avatar
micseydel
 
Posts: 1507
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Range() too large in Problem 3

Postby Explorador » Fri Aug 29, 2014 9:39 am

Yep, Project Euler website gave it to me as a good answer (well.. not only with what we wrote, but also with more code that I will not put in order to avoid spoiler). In fact, now I am about to start the 8th problem :)

About what you said... I think that we have to consider each problem separately. For example, in this case we found that square root works fine, but maybe it will not in another case. So we cannot say that square root will be always better than logarithmic one. I think that maybe the logarithmic one is not good in this case, but it could in another one, as the binary case you linked me. Maybe I am wrong, but this is the conclusion I reached after trying to apply them to this Problem 3 of PE.
Explorador
 
Posts: 14
Joined: Mon Aug 25, 2014 9:32 am
Location: Spain


Return to Project Euler

Who is online

Users browsing this forum: No registered users and 1 guest

cron