@claraj
Last active March 31, 2021 13:47
Python bioinformatics 101
# Example of real-world use of Python string manipulation - DNA analysis
#
# DNA is made of four bases: A, T, G and C
#
# A-T are always paired
# G-C are always paired
#
# So if you have a sequence of one side of a DNA molecule, can you use Python to generate the other side?
#
dna1 = 'accagtaccagtgt'
dna2 = 'gtacaccaggtcta'
# Remember
# a pairs with t
# c pairs with g
# so for dna1, it begins acca... so output string will start tggt...
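#
# A minimal sketch of one way to generate the other side, assuming lower-case
# input like dna1. The names PAIRS and complement are made up for this example.
# It returns the paired bases in the same left-to-right order as the input.

```python
# Translation table: each base maps to the base it pairs with (a<->t, c<->g).
PAIRS = str.maketrans('atgc', 'tacg')


def complement(strand):
    """Return the paired strand, base by base (hypothetical helper)."""
    # lower() lets the function also accept upper-case input such as 'ACCA'.
    return strand.lower().translate(PAIRS)


print(complement('accagtaccagtgt'))  # starts with tggt, as described above
```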
# In DNA, sequences of A, T, C and G are codes for generating proteins (you are made of thousands of different kinds of proteins).
# Most of your DNA doesn't appear to make proteins - only about 1% of it encodes protein.
# A part of DNA that encodes a protein is called a gene. So how do you find which parts do,
# or where your genes are in your DNA?
#
# Biologists are interested in where certain codes are. One code,
# ATG, is called a 'start codon' and it means 'start making a protein here'.
#
# Does string 1 have any genes in it?
# Does string 2 have any genes in it?
#
# If so, what's the index of where that gene is?
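#
# A possible sketch for the questions above, assuming a plain substring search
# for 'atg' is enough. find_start_codons is a hypothetical helper name.
# A start codon only marks where a protein could start, so a match is a hint
# that a gene may be there, not proof of one.

```python
def find_start_codons(dna, codon='atg'):
    """Return a list of every index where codon appears in dna."""
    dna = dna.lower()
    positions = []
    index = dna.find(codon)  # str.find returns -1 when there is no match
    while index != -1:
        positions.append(index)
        # Continue searching one character past the previous match.
        index = dna.find(codon, index + 1)
    return positions


print(find_start_codons('accagtaccagtgt'))  # an empty list means no start codon
print(find_start_codons('acgatggat'))
```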
dna3 = 'acgatggatacgcgggagctattcatctgtgttgagaaacaccggagaacttattggtctgtcaagattgcgactgtggtatagctcacccggtcgcggctttctagt' \
'tagtggccagctcccgtgtatttggaagctgagagaaggacccctgtggttcgaatcagctcacgagcgctggcacaccgcaatcagccggctaataaaattcgtatg' \
'gactgccccacacaagaagacggtaaatttatcaacactatagttgctatacaccaggagcgagcgtaaatttgtagcggtcagattaacttgctgggaatgaaccat' \
'tgtcgccctctgcagcaagttagatggcatgattggtactgcccttcactggtagcagctccccctgtaatatatccgtggccactattcaagggctcaaataggcga' \
'ccatgagagaccattataggcggtacagcgatggtaggtttgcctgggcagatatcgttagccccttctgcgcgctataagatagcgaaggataattctgcgggacca' \
'tggtcgtctcctaacctcagggtgggattcctggcaggtggaccgggcgcgcatcgagagcattcggggttcctaccagccagggaaatcgggtcgaccactaggcaa' \
'tgagcggctcacaccgattttcttaagagacgtaacaaagcccgcatgaacggctggagtgaatcaccgtacgactacctaagcctcattgggatccactgtaaaccc' \
'cttcgccggtgttgggtgtccgcaacgcctctgctttttgcgtacagtcggcgtggtggagtccgcggccatactggcggatggtttgtagaacagtgtaacgatgtg' \
'tgtcactgccccccgtagcttctattgccatgtttgggaggttctataggggttacagagtagttttaagttttagcacgacagcaccagtattgccagtgatgccgt' \
'tgaggccgcaaaagtgattaacccccgtgggaccggatacgttcccagcggcaatccttgtcttaccgccggactgcggagcgaagggagaagtaaccgtggtaatta'
dna4 = 'cagagcaatgtctgttagataatctctcgtctggatagcgagaagtttccggaagacgattgtttccaacgaaagggctgataactacactctgtcgcgcttctttcg' \
'tgttcgccatgggcacattggtttaaaagtgatctcgagagacgttttcatgacttgttgtgttatatcaacgtaacttttaagtcatattttctccctaccccagac' \
'tagatgggttcctttcatcgtccaccgagttgcttacgagcaugacacttagccggggaaaatgttcgcaatgttccgcgacagcgtcaggtgtcaaacagaaagcga' \
'aggccgccgtgtaacggagaattgtgggcgcagtcaaatagctaattattgggaaaggccatgtggagtccgtcagcggaacagcctgggcggacgcgctgccgctcg' \
'ttcacctcgcctgccttcgtgttggggaccggatacgttcccagcggcaatccttgtcttaccgccggactgcggagcgaagggagaagtaaccgtggtaattagcga' \
'gagaccgttgaggcgcggggcgatccgcccttgagtggactccaaacacattcgacgaaggggtgggaacataagttaattggagggtcggggaagtcccacgcccgg' \
'tccctacatgattgcacatagttcgttcaccaacgggcgatcttcctcacactagaggaacgagtagtactccagacattgagtcagttgcagaccaagtggagggaa' \
'cgatttttaugggccgctcaggtactagtgctagaatgcctacaaacggcactggtgacccgctcccgagtttgcgctgttacgtgtcccttaaagtatacttcgatc' \
'aacatggcggccatacgacgcttaaatatttcaccagttgtgtttcgcgcauggagttgttctgtgttatcggcgagtctccattgcacgtcatcaactaaaaaccac' \
'ggccacacagacatgccttgattcttcccgcgatggtaggtttgcctgggcagatatcgttagccccttctgcgcgctataagatagcgatggtaggtttaactatca'