Forcing resumption of execution

Postby comcomtech » Wed Aug 28, 2013 2:16 pm

My son wrote the following script to help me verify the uniformity of "randomly generated" directory output.

The directory stops responding after about 100 queries and starts responding again after a delay of an hour or two.

I am not a programmer. I need to modify the script to:
1. Force resumption of execution (after an hour, say) if the targeted site fails to respond.
2. Write the results to a file (file.txt) if the targeted site doesn't resume responding within an hour, so I can capture the information already in memory.
3. Allow a later run of the script to add to the data already collected in the first run, appending to the existing file ("file.txt") rather than creating a new one.
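
Points 1 and 2 could be approached with a small retry helper. This is only a sketch under assumptions, not a tested fix for this particular site: the function name `fetch_with_retry` and its parameters (`wait_seconds`, `max_attempts`, `timeout`, `opener`, `sleep`) are hypothetical, and it assumes the site's refusal shows up as a network error or a timeout rather than as an ordinary page.

```python
import time
import urllib.request


def fetch_with_retry(url, data, wait_seconds=3600, max_attempts=3,
                     timeout=30, opener=urllib.request.urlopen,
                     sleep=time.sleep):
    """Return the page body as bytes, or None if every attempt fails.

    Hypothetical helper: all names and defaults are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            request = urllib.request.Request(url, data)
            # timeout stops urlopen from hanging forever on a silent server
            with opener(request, timeout=timeout) as response:
                return response.read()
        except OSError as exc:
            # URLError, socket timeouts and connection resets are all OSError
            print('Attempt %d failed: %s' % (attempt, exc))
            if attempt < max_attempts:
                sleep(wait_seconds)  # wait (an hour by default), then retry
    return None
```

Inside the main loop, the three lines that build the request and read the response would become `html = fetch_with_retry(url, data)`, followed by `if html is None: break`. Breaking out of the loop lets the script fall through to the part that writes the collected data to the file, so nothing held in memory is lost.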

Alternatively, I need to modify the script to:
1. Pause for one hour after every 50 searches, then resume searching.
2. Run a total of about 1,500 searches.
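
The alternative plan (pause after every 50 searches) could look roughly like the sketch below. The helper name `run_in_batches` and the `do_search` callback are hypothetical; `do_search` stands in for whatever one pass of the existing loop body does.

```python
import time


def run_in_batches(total, batch_size, pause_seconds, do_search,
                   sleep=time.sleep):
    """Call do_search(i) for i in range(total), pausing between batches.

    Hypothetical helper: the names are illustrative only.
    """
    for i in range(total):
        # Pause before the first search of every batch except the first one.
        if i > 0 and i % batch_size == 0:
            sleep(pause_seconds)
        do_search(i)
```

With `run_in_batches(1500, 50, 3600, do_search)` there would be 29 one-hour pauses, so a full run takes more than a day. Whether 50 searches per hour stays under the site's limit is a guess based on the behaviour described above.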

Thanks very much in advance for your help!

Josh Wallace

P.S. My son just started his doctorate in math; he's too busy now to fiddle with this anymore.

Code: Select all
import re
import time
import urllib.parse
import urllib.request

# post data
url = 'http://ottiaq.org/en/directory/results/'
values = { 'langue_depart' : 'EN', 'langue_arrivee' : 'FR', 'cat' : 'ALL', 'profession[0]' : 'on', 'profession[1]' : 'off', 'profession[2]' : 'off', 'profession[3]' : 'off' }
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')

# open existing file (maybe remove this since multiple executions doesn't really work)
# f = open('test.txt', 'r+')
# tempfile = f.read().rsplit('\n')
# file = [[tempfile[i], tempfile[i+1]] for i in range(0, len(tempfile)-2, 2)]

filename = input('Log file: ')
f = open(filename, 'w+')
N = input('Number of samples: ')
file = []

# patterns
p = re.compile(r'"nom"><h4>\s*[^\s]*\s*,\s*[^\s]*')
q1 = re.compile(r'"nom"><h4>\s*')
q2 = re.compile(r'\s*,\s*')

# false webpage (for testing)
for i in range(int(N)):
   # retrieve data and extract relevant section
   
   # for false page, use:
   # html = open('test.html','r').read()
   # m = p.findall(html)
   # time.sleep(0.5)
   
   # for real page, use:
   req = urllib.request.Request(url, data)
   response = urllib.request.urlopen(req)
   html = response.read()
   m = p.findall(html.decode('utf-8'))
   time.sleep(0.5)

   # format names
   for index, item in enumerate(m):
      item = q1.sub('', item)
      item = q2.sub(', ', item)
      m[index] = item
   
   # update count
Last edited by micseydel on Wed Aug 28, 2013 4:47 pm, edited 1 time in total.
Reason: Locked OP, added code tags.

Re: Forcing resumption of execution

Postby micseydel » Thu Aug 29, 2013 1:50 am

comcomtech wrote:I am not a programmer. I need to modify the script to:

What is it that you want from us exactly? Are you looking to hire someone to accomplish this for you? Are you looking for help to become a programmer so that you can make these modifications? Or are you looking for someone to do it for you?

Re: Forcing resumption of execution

Postby comcomtech » Thu Aug 29, 2013 5:02 am

I'm looking for someone to do it for me, assuming it's fairly easy to do.

I already made some modifications to the original code (the delay in each execution, the print to screen) that enabled me to look for a workaround and ultimately diagnose the problem.

I think there are a couple of lines of code I can add that would provide the desired result. If one of you could tell me what they are, I can insert them myself.

Re: Forcing resumption of execution

Postby micseydel » Thu Aug 29, 2013 6:24 am

I should have said this before, but it doesn't look like you included the whole code. I noticed that when I added the code tags as well.

Re: Forcing resumption of execution

Postby comcomtech » Thu Aug 29, 2013 10:03 am

Code: Select all
# You're right. Sorry.


import re
import time
import urllib.parse
import urllib.request

# post data
url = 'http://ottiaq.org/en/directory/results/'
values = { 'langue_depart' : 'EN', 'langue_arrivee' : 'FR', 'cat' : 'ALL', 'profession[0]' : 'on', 'profession[1]' : 'off', 'profession[2]' : 'off', 'profession[3]' : 'off' }
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')

# open existing file (maybe remove this since multiple executions doesn't really work)
# f = open('test.txt', 'r+')
# tempfile = f.read().rsplit('\n')
# file = [[tempfile[i], tempfile[i+1]] for i in range(0, len(tempfile)-2, 2)]

filename = input('Log file: ')
f = open(filename, 'w+')
N = input('Number of samples: ')
file = []

# patterns
p = re.compile(r'"nom"><h4>\s*[^\s]*\s*,\s*[^\s]*')
q1 = re.compile(r'"nom"><h4>\s*')
q2 = re.compile(r'\s*,\s*')

# false webpage (for testing)
for i in range(int(N)):
   # retrieve data and extract relevant section
   
   # for false page, use:
   # html = open('test.html','r').read()
   # m = p.findall(html)
   # time.sleep(0.5)
   
   # for real page, use:
   req = urllib.request.Request(url, data)
   response = urllib.request.urlopen(req)
   html = response.read()
   m = p.findall(html.decode('utf-8'))
   time.sleep(0.5)

   # format names
   for index, item in enumerate(m):
      item = q1.sub('', item)
      item = q2.sub(', ', item)
      m[index] = item
   
   # update count
   for name in m:
      matched = 0
      # check if name is already in file and increase corresponding count if it is
      for index, entry in enumerate(file):
         # compare as plain strings: re.match would treat the name as a regex and also accept prefixes
         if name == entry[0]:
            matched = 1
            file[index][1] = str(1 + int(file[index][1]))
      # if name was not in file, append it with count 1
      if matched == 0:
         file.append([name, str(1)])
         print(name)

f.truncate(0)
# write data to file
for item in file:
   f.write('%s\n%s\n' % (item[0], item[1]))

f.close()
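
The script above opens the log file with 'w+', which empties it at the start of every run, so a second run cannot add to the first. One possible way to handle point 3 of the original post is to read the existing file into a dictionary first and write everything back at the end. This is a sketch only: `load_counts` and `save_counts` are hypothetical names, the two-lines-per-entry layout is taken from the write loop above, and UTF-8 encoding is an assumption.

```python
import os


def load_counts(path):
    """Read name/count line pairs from path into a dict (empty if no file)."""
    counts = {}
    if not os.path.exists(path):
        return counts
    with open(path, 'r', encoding='utf-8') as f:
        lines = f.read().splitlines()
    # Lines alternate: name, count, name, count, ...
    for name, count in zip(lines[0::2], lines[1::2]):
        counts[name] = counts.get(name, 0) + int(count)
    return counts


def save_counts(path, counts):
    """Overwrite path with the counts, in the same two-line format."""
    with open(path, 'w', encoding='utf-8') as f:
        for name, count in counts.items():
            f.write('%s\n%s\n' % (name, count))
```

In the script, `counts = load_counts(filename)` would replace the `open(filename, 'w+')` line and the `file = []` list, each found name would be tallied with `counts[name] = counts.get(name, 0) + 1`, and `save_counts(filename, counts)` would replace the final write loop.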

Mekire: Again. PLEASE use code tags. Indentation is lost without them.
Last edited by Mekire on Thu Aug 29, 2013 10:15 am, edited 1 time in total.
Reason: Code tags added.