## How to split up a string into a list, 5 characters per chunk

### How to split up a string into a list, 5 characters per chunk

Hi, I'm doing an exercise for bioinformatics. In the exercise I have to split a gene sequence, which is in the form of a string, into base groups of 5. So for example:
`s='GTAGTACGAATTTGAGCAAA'`

and then I want my output to be in the form of a list:
`l=['GTAGT','ACGAA','TTTGA','GCAAA']`

But I have absolutely no idea how to do this. Please help!
Last edited by Yoriz on Thu Feb 28, 2013 7:03 pm, edited 2 times in total.
Reason: Added code tags, Changed title
johnick013

Posts: 1
Joined: Mon Feb 25, 2013 3:36 pm

### Re: How to split up a string

Admins will probably tell you to read this:
http://www.python-forum.org/viewtopic.php?f=6&t=145
You should use code tags, and most important, show your attempts to solve the problem.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: How to split up a string

Here is a recursive solution. It will take a string of any length; when fewer than 5 characters are left for a group, it uses whatever is left as the last list item, which may or may not be what you want to happen.
```python
string = 'GTAGTACGAATTTGAGCAAA'

def chunk_five(string):
    return [string[:5]] + chunk_five(string[5:]) if string else []

print chunk_five(string)
# Output: ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
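One caveat with the recursive approach, sketched here in Python 3 syntax (an assumption; the thread itself uses Python 2): every call consumes five characters, so a long enough sequence runs into the interpreter's recursion limit. The function is a direct port of the one above, and the long test sequence is made up for illustration and sized from `sys.getrecursionlimit()`.

```python
import sys

def chunk_five(string):
    # Direct Python 3 port of the recursive version above.
    return [string[:5]] + chunk_five(string[5:]) if string else []

print(chunk_five('GTAGTACGAATTTGAGCAAA'))

# Hypothetical long input: more groups of five than the recursion limit allows.
long_sequence = 'GTAGT' * (sys.getrecursionlimit() + 10)
try:
    chunk_five(long_sequence)
    hit_limit = False
except RecursionError:
    # Raised once the nested calls exceed the interpreter's recursion limit.
    hit_limit = True
print(hit_limit)
```

With CPython's default limit of 1000, that corresponds to a sequence of roughly 5000 characters.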
Due to the reasons discussed here we will be moving to python-forum.io/ on October 1 2016
This forum will be locked down and no one will be able to post/edit/create threads, etc. here from thereafter. Please create an account at the new site to continue discussion.

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

While recursion is neat, it's not efficient, and I'm not sure that list concatenation is either. Below I have an iterator solution which will work for strings longer than 5000 characters (about where the recursive version hits Python's default recursion limit of 1000 calls), and which is significantly less likely to get you a MemoryError too.
```python
>>> from itertools import izip
>>> def chunk_five(iterable):
    my_it = iter(iterable)
    return izip(*[my_it]*5)

>>> chunk_five('GTAGTACGAATTTGAGCAAA')
<itertools.izip object at 0x7f4390034248>
>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
[('G', 'T', 'A', 'G', 'T'), ('A', 'C', 'G', 'A', 'A'), ('T', 'T', 'T', 'G', 'A'), ('G', 'C', 'A', 'A', 'A')]
>>> def chunk_five(iterable):
    my_it = iter(iterable)
    # if getting back strings instead of tuples is important
    return (''.join(five) for five in izip(*[my_it]*5))

>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
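Two notes on this idiom, sketched in Python 3 (an assumption; `izip` exists only in Python 2, and Python 3's built-in `zip` is already lazy). Zipping the same iterator five times stops as soon as a group cannot be completed, so leftover bases at the end are silently dropped. If the remainder matters, `itertools.zip_longest` with an empty-string fill value is one way to keep it. The name `chunk_five_keep_tail` is invented for this example.

```python
from itertools import zip_longest

def chunk_five(iterable):
    # Python 3 port: zip() is lazy, like Python 2's izip().
    my_it = iter(iterable)
    return (''.join(five) for five in zip(*[my_it] * 5))

def chunk_five_keep_tail(iterable):
    # zip_longest pads the last group with '' instead of discarding it.
    my_it = iter(iterable)
    return (''.join(five) for five in zip_longest(*[my_it] * 5, fillvalue=''))

seq = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases: four full groups plus 'GT'
print(list(chunk_five(seq)))            # the trailing 'GT' is lost
print(list(chunk_five_keep_tail(seq)))  # the trailing 'GT' is kept
```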

micseydel

Posts: 3000
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

### Re: How to split up a string

While recursion and iterators are nice, aren't they a bit high level? Why not just use slicing?

```python
genes = 'GTAGTACGAATTTGAGCAAA'
fives = [genes[start:(start + 5)] for start in range(0, len(genes), 5)]
```

Even list comprehensions might be above beginner level, so I might even put it in a loop:

```python
genes = 'GTAGTACGAATTTGAGCAAA'
fives = []
for start in range(0, len(genes), 5):
    fives.append(genes[start:(start + 5)])
```
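The same slicing loop generalizes to any group size. A minimal sketch in Python 3 (the function name `split_every` and the `size` parameter are invented here, not taken from the thread). Note that slicing keeps a short final group rather than dropping it.

```python
def split_every(sequence, size):
    """Return consecutive slices of `sequence`, each `size` items long.

    The last slice is shorter when len(sequence) is not a multiple of size.
    """
    if size < 1:
        raise ValueError('size must be at least 1')
    pieces = []
    for start in range(0, len(sequence), size):
        pieces.append(sequence[start:start + size])
    return pieces

print(split_every('GTAGTACGAATTTGAGCAAA', 5))    # four full groups
print(split_every('GTAGTACGAATTTGAGCAAAGT', 5))  # ends with a short 'GT' group
print(split_every('GTAGTACGAATTTGAGCAAA', 3))    # codon-sized groups
```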
ichabod801

Posts: 688
Joined: Sat Feb 09, 2013 12:54 pm
Location: Outside Washington DC

### Re: How to split up a string

Here's another go.
```python
string = 'GTAGTACGAATTTGAGCAAA'

def yield_chunk_five(string):
    while string:
        yield string[:5]
        string = string[5:]

print list(yield_chunk_five(string))
# Output: ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

Yoriz: that solution makes new, potentially big strings every iteration of the loop.
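To make that concrete: `string = string[5:]` copies the whole remaining tail on every pass, so the total copying grows roughly quadratically with the sequence length. A sketch of a generator that walks offsets instead and copies only five characters per step (Python 3; the name `iter_chunks` and its `size` parameter are hypothetical):

```python
def iter_chunks(sequence, size=5):
    # Walk start offsets; the original string is never shortened or recopied.
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]

# Chunks are produced lazily, one at a time.
chunks = iter_chunks('GTAGTACGAATTTGAGCAAA')
first = next(chunks)
rest = list(chunks)
print(first, rest)
```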

micseydel

Posts: 3000
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

### Re: How to split up a string

Is this just going to turn into how many ways can we split the string into lengths of five?

`[''.join(word) for word in zip(*[genes[start::5] for start in range(5)])]`
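For readers puzzling over this one-liner, a step-by-step sketch (Python 3; `columns` is a name introduced here). `genes[start::5]` collects every fifth base beginning at `start`, giving five "columns", and `zip` then reads across them to rebuild each group of five. Because `zip` stops at the shortest column, an incomplete final group would be dropped.

```python
genes = 'GTAGTACGAATTTGAGCAAA'

# Column k holds the k-th base of every group of five.
columns = [genes[start::5] for start in range(5)]
print(columns)

# Reading across the columns reassembles the groups.
fives = [''.join(word) for word in zip(*columns)]
print(fives)
```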
ichabod801

Posts: 688
Joined: Sat Feb 09, 2013 12:54 pm
Location: Outside Washington DC

### Re: How to split up a string

Oh bugger, I thought it was just chopping 5 off the string each time, but I think I see now that it's creating a new string that's 5 shorter than the last. Back to the drawing board.

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

micseydel wrote: What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

When teaching, I stick to simple. I don't know who this guy is or what the context of his bioinformatics exercise is, so I would aim for a simple solution that he is more likely to understand.
ichabod801

Posts: 688
Joined: Sat Feb 09, 2013 12:54 pm
Location: Outside Washington DC

### Re: How to split up a string

And I'm just a hobbyist Python coder who makes up crappy solutions that might help for the time being, until someone who knows what they're doing comes along.

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

Why not
```python
>>> import re
>>> s = 'GTAGTACGAATTTGAGCAAA'
>>> re.findall(r'.'*5, s)
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
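A caveat on the regex version, sketched in Python 3: the pattern `r'.'*5` (equivalent to `r'.{5}'`) matches only complete runs of five, so leftover bases are skipped, and `.` does not match a newline by default. If a short final group should be kept, a bounded quantifier such as `.{1,5}` is one option.

```python
import re

s = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases: the last two do not fill a group

# Exactly five characters per match; the trailing 'GT' is skipped.
exact = re.findall(r'.{5}', s)
print(exact)

# One to five characters per match (greedy), so the remainder is kept.
with_tail = re.findall(r'.{1,5}', s)
print(with_tail)
```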

```python
>>> map(None, *([iter(s)] * 5))
[('G', 'T', 'A', 'G', 'T'), ('A', 'C', 'G', 'A', 'A'), ('T', 'T', 'T', 'G', 'A'), ('G', 'C', 'A', 'A', 'A')]
```

snippsat

Posts: 1251
Joined: Thu Feb 21, 2013 12:04 am

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

It's already been done to death.
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

If so, let me drop a few lines:

```python
>>> import textwrap
>>> split_seq=textwrap.TextWrapper(width=5).wrap
>>> split_seq('GTAGTACGAATTTGAGCAAA')
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
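A quick Python 3 check of how this behaves when the length is not a multiple of five: `textwrap.wrap` breaks an over-long "word" at the given width and returns the leftover as a short final item. Since `textwrap` is meant for wrapping prose and gives whitespace special treatment, it is probably safest on sequences that contain no spaces or newlines.

```python
import textwrap

seq = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases, no whitespace

# wrap() is the module-level shortcut for TextWrapper(width=5).wrap
groups = textwrap.wrap(seq, width=5)
print(groups)
```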

Jaro

Posts: 8
Joined: Sat Feb 23, 2013 6:16 pm