Comparing dict

Postby Claire5555 » Tue Jun 04, 2013 11:14 am

I am in need of advice. I have two text files, each with a list of transcript names and their corresponding lengths. They look like this:

File1:
A 256
B 456
File2:
A 245
B 435
I want to compare the lengths of the transcripts and see whether the length in file 2 is at least 90% of the length in file 1 for the corresponding transcript name (I hope I am clear!)

I wrote the following script, but the output file only gives me one transcript instead of 100.

Code: Select all
from collections import defaultdict
import numpy as np

ercctranscript_size = {}
for line in open('ERCC.txt'):
    columns = line.strip().split()
    transcript = columns[0]
    size = columns[1]
    ercctranscript_size[transcript] = int(size)
    size = ercctranscript_size[transcript]

unknown_transcript = open('Not_sequenced_ERCC_transcript.txt', 'w')
blast_file = open('blast.txt')
out_file = open ('out.txt', 'w')

blast_transcript = {}
for line in blast_file:
    columns = line.strip().split()
    blasttranscript = columns[0].strip()
    blastsize = columns[1].strip()
    blast_transcript[blasttranscript] = int(blastsize)
    blastsize = blast_transcript[blasttranscript]

    if transcript not in blast_transcript:
        unknown_transcript.write('{0}\n'.format(transcript))
    else:
        if blastsize >= 0.9*size:
            print >> out_file, transcript, True
        else:
            print >> out_file, transcript, False


Does anyone see why I am not getting the entire list as an output with false or true?


Thanks a lot for your help
Last edited by Claire5555 on Tue Jun 04, 2013 12:15 pm, edited 2 times in total.

Re: Comparing dict

Postby setrofim » Tue Jun 04, 2013 12:13 pm

Please read this; specifically, the "how to post code" section.

Re: Comparing dict

Postby Claire5555 » Tue Jun 04, 2013 12:14 pm

Thanks for showing me!!!

Re: Comparing dict

Postby MichelFJM » Tue Jun 04, 2013 1:24 pm

Hello

The most obvious mistake is the position of this line:
Code: Select all
size = ercctranscript_size[transcript]

In your code, size only ever holds one value (the one assigned on the last pass through the first loop).

This line should be in the second loop (before the test that uses size).

If you still have problems, please post your code again, along with a longer file1/file2 example and the expected result.
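
To illustrate the point about size: once a for loop has finished, a variable assigned inside it keeps only the value from the last iteration. Here is a minimal sketch in Python 3 syntax. The rows list is a hypothetical in-memory stand-in for the file, not the real ERCC.txt.

```python
# Hypothetical sample rows standing in for the reference file.
rows = ["A 256", "B 456", "C 300"]

sizes = {}
for line in rows:
    name, size = line.split()
    sizes[name] = int(size)
    size = sizes[name]  # overwritten on every pass through the loop

# After the loop, name and size only remember the last row.
print(name, size)    # C 300
print(sizes["A"])    # 256 (the dict still holds every value)
```

So any comparison made after this loop that uses size directly is always comparing against the last row; the per-transcript value has to be looked up in the dict.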

Re: Comparing dict

Postby Claire5555 » Tue Jun 04, 2013 2:08 pm

Thanks for the reply. Unfortunately I get the same output again, so I'll try to clarify things as you asked:

ERCC.txt
ERCC-00002 1061
ERCC-00003 1023
ERCC-00004 523
ERCC-00009 984
ERCC-00012 994
ERCC-00013 808
ERCC-00014 1957
ERCC-00016 844
ERCC-00017 1136
ERCC-00019 644

blast_file:
ERCC-00002 1058
ERCC-00003 1017
ERCC-00004 519
ERCC-00009 977
ERCC-00019 638
ERCC-00022 746
ERCC-00024 134
ERCC-00024 126
ERCC-00024 98
ERCC-00025 445

Code: Select all
ercctranscript_size = {}
for line in open('ERCC.txt'):
    columns = line.strip().split()
    transcript = columns[0]
    size = columns[1]
    ercctranscript_size[transcript] = int(size)

unknown_transcript = open('Not_sequenced_ERCC_transcript.txt', 'w')
blast_file = open('blast.txt')
out_file = open ('out.txt', 'w')

blast_transcript = {}
for line in blast_file:
    columns = line.strip().split()
    blasttranscript = columns[0].strip()
    blastsize = columns[1].strip()
    blast_transcript[blasttranscript] = int(blastsize)
    blastsize = blast_transcript[blasttranscript]
    size = ercctranscript_size[transcript]
   
    if transcript not in blast_transcript:
        unknown_transcript.write('{0}\n'.format(transcript))
    else:
        if blastsize >= 0.9*size:
            print >> out_file, transcript, True
        else:
            print >> out_file, transcript, False


I am expecting this output:
ERCC-00002 TRUE
ERCC-00003 TRUE
ERCC-00004 TRUE
ERCC-00009 TRUE
ERCC-00019 FALSE

I understand your reply about size having one value, and as my output only contains one value I guess this is the problem. I changed the position of the size variable but it did not make any difference (I guess I did not get it).

Thanks so much for your help
Claire

Re: Comparing dict

Postby ochichinyezaboombwa » Tue Jun 04, 2013 8:01 pm

This is not what you wanted to do.
To see what's going on,
after (or before) the lines
Code: Select all
    blastsize = blast_transcript[blasttranscript]
    size = ercctranscript_size[transcript]

add one more:
Code: Select all
    print("Comparing", blasttranscript, "with", transcript)
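
As a rough sketch of what that extra line reveals (written in Python 3 syntax, with a few of the rows posted above held in lists instead of being read from files, which is an assumption made only to keep the example short): transcript never changes inside the second loop. The pairs list is a hypothetical helper added to record what gets compared.

```python
# A few of the posted rows, kept in memory for illustration.
ercc_rows = ["ERCC-00002 1061", "ERCC-00003 1023", "ERCC-00019 644"]
blast_rows = ["ERCC-00002 1058", "ERCC-00003 1017", "ERCC-00019 638"]

ercctranscript_size = {}
for line in ercc_rows:
    transcript, size = line.split()
    ercctranscript_size[transcript] = int(size)

pairs = []  # hypothetical helper: records each pair that would be compared
for line in blast_rows:
    blasttranscript, blastsize = line.split()
    # transcript is left over from the first loop and is never updated here
    pairs.append((blasttranscript, transcript))
    print("Comparing", blasttranscript, "with", transcript)
```

Every line of output ends with the same name, the last one read from the first file, which is why only one transcript ever shows up in out.txt.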

Re: Comparing dict

Postby MichelFJM » Wed Jun 05, 2013 1:51 pm

Hello

There is some confusion between your variable names where the values are compared. With this code:
Code: Select all
ercctranscript_size = {}
for line in open('ERCC.txt'):
    columns = line.strip().split()
    transcript = columns[0]
    size = columns[1]
    ercctranscript_size[transcript] = int(size)

unknown_transcript = open('Not_sequenced_ERCC_transcript.txt', 'w')
blast_file = open('blast.txt')
out_file = open ('out.txt', 'w')

blast_transcript = {}
for line in blast_file:
    columns = line.strip().split()
    blasttranscript = columns[0]
    # Convert to int: comparing a string with a number does not compare values
    blastsize = int(columns[1])
    blast_transcript[blasttranscript] = blastsize
    if blasttranscript not in ercctranscript_size:
        print blasttranscript, "not found in", ercctranscript_size.keys()
    else:
        # Look up the reference size for the transcript on this line
        size = ercctranscript_size[blasttranscript]
        if blastsize >= 0.9 * size:
            print "WRITE", blasttranscript, True, "because", blastsize, ">= 0.9 *", size
        else:
            print "WRITE", blasttranscript, False

You get almost what you want.
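
For completeness, here is one possible way to put the whole comparison together. It is only a sketch, in Python 3 syntax rather than the Python 2 used above, and parse_sizes, compare and the threshold parameter are names made up for this example. It assumes each line is "name length" separated by whitespace, and that when a name is repeated (as ERCC-00024 is in the blast file) the last length wins, which may or may not be what you want.

```python
def parse_sizes(lines):
    """Map each transcript name to its length; a repeated name keeps the last value."""
    sizes = {}
    for line in lines:
        columns = line.split()
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        sizes[columns[0]] = int(columns[1])
    return sizes


def compare(ercc_sizes, blast_sizes, threshold=0.9):
    """Return (results, missing) for every transcript in the reference dict."""
    results = {}   # name -> True if the blast length reaches the threshold
    missing = []   # reference names that never appear in the blast data
    for name, size in ercc_sizes.items():
        if name not in blast_sizes:
            missing.append(name)
        else:
            results[name] = blast_sizes[name] >= threshold * size
    return results, missing


# A few of the rows posted earlier in the thread, held in memory.
ercc = parse_sizes(["ERCC-00002 1061", "ERCC-00004 523",
                    "ERCC-00012 994", "ERCC-00019 644"])
blast = parse_sizes(["ERCC-00002 1058", "ERCC-00004 519",
                     "ERCC-00019 638", "ERCC-00022 746"])

results, missing = compare(ercc, blast)
print(results)  # {'ERCC-00002': True, 'ERCC-00004': True, 'ERCC-00019': True}
print(missing)  # ['ERCC-00012']
```

With the real files, an open file object can be passed straight in, for example parse_sizes(open('ERCC.txt')), since iterating over a file yields its lines. Note that with the posted numbers ERCC-00019 (638 against 644) passes the 90% test.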

