DNA and RNA sequences¶
Creating a DNA sequence from a string¶
All sequence and alignment objects have a molecular type, or MolType
which provides key properties for validating sequence characters. Here we use the DNA
MolType
to create a DNA sequence.
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence("AGTACACTGGT")
>>> my_seq
DnaSequence(AGTACAC... 11)
>>> print my_seq
AGTACACTGGT
>>> str(my_seq)
'AGTACACTGGT'
Creating a RNA sequence from a string¶
>>> from cogent import RNA
>>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU')
Converting to FASTA format¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence('AGTACACTGGT')
>>> print my_seq.toFasta()
>0
AGTACACTGGT
Convert a RNA sequence to FASTA format¶
>>> from cogent import RNA
>>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU')
>>> rnaseq.toFasta()
'>0\nACGUACGUACGUACGU'
Creating a named sequence¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence('AGTACACTGGT','my_gene')
>>> my_seq
DnaSequence(AGTACAC... 11)
>>> type(my_seq)
<class 'cogent.core.sequence.DnaSequence'>
Setting or changing the name of a sequence¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence('AGTACACTGGT')
>>> my_seq.Name = 'my_gene'
>>> print my_seq.toFasta()
>my_gene
AGTACACTGGT
Complementing a DNA sequence¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence("AGTACACTGGT")
>>> print my_seq.complement()
TCATGTGACCA
Reverse complementing a DNA sequence¶
>>> print my_seq.reversecomplement()
ACCAGTGTACT
The rc
method name is easier to type
>>> print my_seq.rc()
ACCAGTGTACT
Translate a DnaSequence
to protein¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence('GCTTGGGAAAGTCAAATGGAA','protein-X')
>>> pep = my_seq.getTranslation()
>>> type(pep)
<class 'cogent.core.sequence.ProteinSequence'>
>>> print pep.toFasta()
>protein-X
AWESQME
Converting a DNA sequence to RNA¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence('ACGTACGTACGTACGT')
>>> print my_seq.toRna()
ACGUACGUACGUACGU
Convert an RNA sequence to DNA¶
>>> from cogent import RNA
>>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU')
>>> print rnaseq.toDna()
ACGTACGTACGTACGT
Testing complementarity¶
>>> from cogent import DNA
>>> a = DNA.makeSequence("AGTACACTGGT")
>>> a.canPair(a.complement())
False
>>> a.canPair(a.reversecomplement())
True
Joining two DNA sequences¶
>>> from cogent import DNA
>>> my_seq = DNA.makeSequence("AGTACACTGGT")
>>> extra_seq = DNA.makeSequence("CTGAC")
>>> long_seq = my_seq + extra_seq
>>> long_seq
DnaSequence(AGTACAC... 16)
>>> str(long_seq)
'AGTACACTGGTCTGAC'
Slicing DNA sequences¶
>>> my_seq[1:6]
DnaSequence(GTACA)
Getting 3rd positions from codons¶
We’ll do this by specifying the position indices of interest, creating a sequence Feature
and using that to extract the positions.
>>> from cogent import DNA
>>> seq = DNA.makeSequence('ATGATGATGATG')
Creating the position indices, note that we start at the 2nd index (the ‘first’ codon’s 3rd position) indicate each position as a span (i -- i+1
).
>>> indices = [(i, i+1) for i in range(len(seq))[2::3]]
Create the sequence feature and use it to slice the sequence.
>>> pos3 = seq.addFeature('pos3', 'pos3', indices)
>>> pos3 = pos3.getSlice()
>>> assert str(pos3) == 'GGGG'
Getting 1st and 2nd positions from codons¶
The only difference here to above is that our spans cover 2 positions.
>>> from cogent import DNA
>>> seq = DNA.makeSequence('ATGATGATGATG')
>>> indices = [(i, i+2) for i in range(len(seq))[::3]]
>>> pos12 = seq.addFeature('pos12', 'pos12', indices)
>>> pos12 = pos12.getSlice()
>>> assert str(pos12) == 'ATATATAT'
Return a randomized version of the sequence¶
print rnaseq.shuffle()
ACAACUGGCUCUGAUG
Remove gaps from a sequence¶
>>> from cogent import RNA
>>> s = RNA.makeSequence('--AUUAUGCUAU-UAu--')
>>> print s.degap()
AUUAUGCUAUUAU