## How to split up a string into a list, 5 characters per chunk

### How to split up a string into a list, 5 characters per chunk

Hi, I'm doing an exercise for bioinformatics. In the exercise I have to split a gene sequence, which is in the form of a string, into base groups of 5. So for example:
`s='GTAGTACGAATTTGAGCAAA'`

and then I want my output to be in a form of a list:
`l=['GTAGT','ACGAA','TTTGA','GCAAA']`

Last edited by Yoriz on Thu Feb 28, 2013 7:03 pm, edited 2 times in total.
Reason: Added code tags, Changed title
johnick013

Posts: 1
Joined: Mon Feb 25, 2013 3:36 pm

### Re: How to split up a string

http://www.python-forum.org/viewtopic.php?f=6&t=145
You should use code tags and, most importantly, show your attempts to solve the problem.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: How to split up a string

Here is a recursive solution. It will take a string of any length; when fewer than 5 characters are left for a group, it uses whatever is left as the last list item, which may or may not be what you want.

```python
string = 'GTAGTACGAATTTGAGCAAA'

def chunk_five(string):
    return [string[:5]] + chunk_five(string[5:]) if string else []

print chunk_five(string)
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
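One thing worth checking with the recursive version is how it behaves on long inputs. The sketch below assumes Python 3 (where the exception is called `RecursionError`) and CPython's usual recursion limit; `long_sequence` and `hit_limit` are made-up names for illustration only.

```python
import sys

def chunk_five(string):
    # Same recursive idea as above: peel off five characters, recurse on the rest.
    return [string[:5]] + chunk_five(string[5:]) if string else []

# Hypothetical long input: more 5-character groups than the recursion limit allows.
long_sequence = 'GTAGT' * (sys.getrecursionlimit() + 10)

try:
    chunk_five(long_sequence)
    hit_limit = False
except RecursionError:
    # One call per group, so the call stack overflows before the string runs out.
    hit_limit = True

print(hit_limit)
```

Short sequences like the one in the question are fine; it is only genome-sized input where this becomes a problem.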
Join the #python-forum IRC channel on irc.freenode.net!

Yoriz

Posts: 1465
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: How to split up a string

While recursion is neat, it's not efficient, and I'm not sure that list concatenation is either. Below is an iterator solution that works for strings longer than 5000 characters (roughly where the recursive version runs into Python's default recursion limit of 1000 calls) and is significantly less likely to get you a MemoryError too.
```python
>>> from itertools import izip
>>> def chunk_five(iterable):
...     my_it = iter(iterable)
...     return izip(*[my_it]*5)
...
>>> chunk_five('GTAGTACGAATTTGAGCAAA')
<itertools.izip object at 0x7f4390034248>
>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
[('G', 'T', 'A', 'G', 'T'), ('A', 'C', 'G', 'A', 'A'), ('T', 'T', 'T', 'G', 'A'), ('G', 'C', 'A', 'A', 'A')]
>>> def chunk_five(iterable):
...     my_it = iter(iterable)
...     # if getting back strings instead of tuples is important
...     return (''.join(five) for five in izip(*[my_it]*5))
...
>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
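A caveat with the zip-based trick: zipping stops as soon as one of the iterators runs dry, so a sequence whose length is not a multiple of 5 silently loses its last few bases. If the short tail matters, something along these lines might do. This is only a sketch, assuming Python 3 (where the built-in `zip` is lazy like `izip`); the name `chunk` and the `size` parameter are made up for illustration.

```python
from itertools import islice

def chunk(iterable, size=5):
    """Yield strings of up to `size` characters, including a short final chunk."""
    it = iter(iterable)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        piece = ''.join(islice(it, size))
        if not piece:
            return
        yield piece

# The zip trick drops the trailing 'ACG' here:
print([''.join(group) for group in zip(*[iter('GTAGTACG')] * 5)])  # ['GTAGT']

# The islice version keeps it:
print(list(chunk('GTAGTACG')))  # ['GTAGT', 'ACG']
```

Like the original, this never builds the whole result in memory unless you wrap it in `list()`.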
Join the #python-forum IRC channel on irc.freenode.net for off-topic chat!

Please prefer not to PM members. The point of the forum is so that anyone can benefit. We don't want to help you over PMs/emails/Skype chats that others can't benefit from

micseydel

Posts: 2252
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

### Re: How to split up a string

While recursion and iterators are nice, aren't they a bit high level? Why not just use slicing?

```python
genes = 'GTAGTACGAATTTGAGCAAA'
fives = [genes[start:(start + 5)] for start in range(0, len(genes), 5)]
```

Even list comprehensions might be above beginner level, so I might even put it in a loop:

```python
genes = 'GTAGTACGAATTTGAGCAAA'
fives = []
for start in range(0, len(genes), 5):
    fives.append(genes[start:(start + 5)])
```
Craig "Ichabod" O'Brien
Minimalist, buddhist, theist, and programmer
Current languages: Python, SAS, C++
Previous serious languages: Erlang, R, Java, VBA, Lisp, HyperTalk, BASIC
ichabod801

Posts: 381
Joined: Sat Feb 09, 2013 12:54 pm
Location: Outside Washington DC

### Re: How to split up a string

Here's another go.

```python
string = 'GTAGTACGAATTTGAGCAAA'

def yield_chunk_five(string):
    while string:
        yield string[:5]
        string = string[5:]

print list(yield_chunk_five(string))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
Yoriz

### Re: How to split up a string

What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

Yoriz: that solution makes new, potentially big strings every iteration of the loop.
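To make the point about copying concrete: `string = string[5:]` builds a fresh copy of the whole remainder on every pass. A generator that walks start indices instead only ever copies 5 characters at a time. A rough sketch, written for Python 3; `yield_chunks` and `size` are made-up names, not anything from the posts above.

```python
def yield_chunks(sequence, size=5):
    """Lazily yield slices of `sequence`, each at most `size` characters long."""
    for start in range(0, len(sequence), size):
        # Each slice copies only `size` characters; the remainder is never copied.
        yield sequence[start:start + size]

genes = 'GTAGTACGAATTTGAGCAAA'
print(list(yield_chunks(genes)))  # ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```

It does need `len()` and slicing, so it works on strings and lists but not on arbitrary iterators.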
micseydel

### Re: How to split up a string

Is this just going to turn into how many ways can we split the string into lengths of five?

`[''.join(word) for word in zip(*[genes[start::5] for start in range(5)])]`
ichabod801

### Re: How to split up a string

Oh bugger, I thought it was just chopping 5 off the string each time, but I think I see now that it's creating a new string that's 5 shorter than the last. Back to the drawing board.
Yoriz

### Re: How to split up a string

micseydel wrote: What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

When teaching, I stick to simple. I don't know who this guy is or what the context of his bioinformatics exercise is, so I would aim for something simple that he is more likely to understand.
ichabod801

### Re: How to split up a string

And I'm just a hobbyist Python coder who makes up crappy solutions that might help for the time being, until someone who knows what they're doing comes along.
Yoriz

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

Why not
```python
>>> import re
>>> s = 'GTAGTACGAATTTGAGCAAA'
>>> re.findall(r'.'*5, s)
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```

```python
>>> map(None, *([iter(s)] * 5))
[('G', 'T', 'A', 'G', 'T'), ('A', 'C', 'G', 'A', 'A'), ('T', 'T', 'T', 'G', 'A'), ('G', 'C', 'A', 'A', 'A')]
```
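Two things to be aware of with these: the pattern `r'.'*5` only matches complete groups of five, so a short tail is dropped, and `map(None, ...)` with several iterables only works in Python 2. A possible Python 3 take on both is sketched below; the 22-base test string is made up to show the leftover handling.

```python
import re
from itertools import zip_longest

s = 'GTAGTACGAATTTGAGCAAAGT'  # hypothetical 22-base input, so the last group is short

# '.{1,5}' matches up to five characters, so the trailing 'GT' is kept.
print(re.findall(r'.{1,5}', s))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']

# zip_longest pads the last tuple with fillvalue, much as map(None, ...) padded with None.
groups = zip_longest(*[iter(s)] * 5, fillvalue='')
print([''.join(group) for group in groups])
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```

Note that `.` does not match newlines by default, so the regex version assumes the sequence is a single line.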

snippsat

Posts: 798
Joined: Thu Feb 21, 2013 12:04 am

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

It's already been done to death.
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
Yoriz

### Re: How to split up a string

ichabod801 wrote: Is this just going to turn into how many ways can we split the string into lengths of five?

If so, let me drop a few lines:

```python
>>> import textwrap
>>> split_seq = textwrap.TextWrapper(width=5).wrap
>>> split_seq('GTAGTACGAATTTGAGCAAA')
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```
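Bear in mind that `textwrap` is built for prose: it treats whitespace as word boundaries, so a sequence containing spaces or line breaks may get split at those points rather than strictly every 5 characters. If the input might contain whitespace (the `raw` string here is a made-up example), stripping it first is probably the safer route. A small sketch for Python 3:

```python
import textwrap

raw = 'GTAGTACGAA\nTTTGAGCAAA'  # hypothetical input with a line break in the middle

# Drop all whitespace first so the wrapper sees one unbroken run of bases.
cleaned = ''.join(raw.split())

print(textwrap.wrap(cleaned, width=5))  # ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
```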
`<function signature at 0xb73f910c>`

Jaro

Posts: 8
Joined: Sat Feb 23, 2013 6:16 pm