Warm up: The code below creates a dictionary containing the factorial of every integer from 1 to 50. Each key is a number, and its value is that number's factorial.

What is the factorial of 10?

What is the factorial of 29?

In [5]:
# factorial generator
import math
factorial_values = {}
for i in range(1,51):
    dict_value = math.factorial(i)
    factorial_values[i] = dict_value

# your code here
print(factorial_values[10])
print(factorial_values[29])
3628800
8841761993739701954543616000000
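The same dictionary can also be built in one expression with a dictionary comprehension. This is only an equivalent sketch of the loop above, shown for comparison; the notebook itself uses the explicit loop.

```python
import math

# a dictionary comprehension builds the same {number: factorial} mapping
# as the for loop above, in a single expression
factorial_values = {i: math.factorial(i) for i in range(1, 51)}

print(factorial_values[10])   # 3628800
print(len(factorial_values))  # 50
```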

Warm up part 2: How do you add elements to a dictionary?
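A minimal sketch of three common ways to add entries to a dictionary. The dictionary name codon_names is made up for illustration.

```python
# start with an empty dictionary (codon_names is a hypothetical example)
codon_names = {}

# 1. assign a value to a new key with square brackets
codon_names['AUG'] = 'start'

# 2. update() adds several key-value pairs at once
codon_names.update({'UGA': 'stop', 'UAA': 'stop'})

# 3. setdefault() adds a key only if it is not already present
codon_names.setdefault('UAG', 'stop')
codon_names.setdefault('AUG', 'something else')  # AUG already exists, so nothing changes

print(codon_names)
# {'AUG': 'start', 'UGA': 'stop', 'UAA': 'stop', 'UAG': 'stop'}
```

Note that assigning with square brackets to a key that already exists overwrites its old value, which is why setdefault() is useful when you want to keep the first value.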

In [15]:
# codon finder functions
def find_start_codons(sequence):
    """Return the 0-based positions of every AUG start codon in an RNA sequence."""
    start_codon_list = []
    sequence_length = len(sequence)
    # stop at sequence_length - 2 so the last slice is still a full 3-letter codon
    for position in range(0, sequence_length - 2):
        # the codon is the 3 letters starting at "position" in the sequence
        codon = sequence[position:position + 3]
        if codon == 'AUG':
            start_codon_list.append(position)
    return start_codon_list

def find_stop_codons(sequence):
    """Return the 0-based positions of every stop codon (UGA, UAA, UAG)."""
    stop_codon_list = []
    stop_codons = ['UGA', 'UAA', 'UAG']
    sequence_length = len(sequence)
    for position in range(0, sequence_length - 2):
        # the codon is the 3 letters starting at "position" in the sequence
        codon = sequence[position:position + 3]
        # "in" checks the codon against all three stop codons at once
        if codon in stop_codons:
            stop_codon_list.append(position)
    return stop_codon_list
In [2]:
rna_sequence = 'AUGUUUUUGAUACUUUUAAUUUCCUUACCAACGGCUUUUGCUGUUAUAGGAGAUUUAAAGUGUACUACAGUUUCCAUUAAUGAU'
print(find_start_codons(rna_sequence))
print(find_stop_codons(rna_sequence))
[0, 79]
[7, 16, 46, 55, 77, 80]
In [11]:
# read a FASTA file into a dictionary: header -> sequence
#new_fasta = read_fasta('/home/jovyan/python_notebooks/test.fasta')
def read_fasta(path):
    fasta = dict()
    # "with" closes the file automatically when the block ends
    with open(path, "r") as file:
        for line in file:
            if line.startswith('>'):
                # header line: drop the '>' and whitespace, start an empty sequence
                entry = line.replace('>', '').strip()
                fasta[entry] = ''
            else:
                # sequence line: append it to the current entry
                fasta[entry] += line.strip()
    return fasta

new_fasta = read_fasta('/Users/vikas/Downloads/sbcc_slides/intro_python/vikas/teacher_slides/test.fasta')

# add a new element to a dictionary by assigning a value to a new key
new_sequence = 'AUGUUAUUCUAUCUAGUUUCGGCUACUAGUUCAUGGUGUGUAACUAGUAUCA'
new_header = 'test_header'
new_fasta[new_header] = new_sequence
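Because read_fasta needs a file at a hard-coded path, here is a sketch of the same parsing logic that reads lines from any iterable, so it can be tried on an in-memory string via io.StringIO. The helper name parse_fasta_lines and the example records are hypothetical; it also skips blank lines and raises an error if a sequence appears before any header, which read_fasta does not do.

```python
import io

def parse_fasta_lines(lines):
    """Hypothetical helper: same idea as read_fasta, but takes any iterable of lines."""
    fasta = {}
    entry = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            entry = line[1:].strip()  # header without the leading '>'
            fasta[entry] = ''
        elif entry is None:
            raise ValueError('sequence line found before any ">" header')
        else:
            fasta[entry] += line  # sequences may span several lines
    return fasta

# a small made-up FASTA text, held in memory instead of on disk
example_text = """>gene_a
AUGUUU
UGA
>gene_b
AUGCCCUAA
"""
example_fasta = parse_fasta_lines(io.StringIO(example_text))
print(example_fasta)
# {'gene_a': 'AUGUUUUGA', 'gene_b': 'AUGCCCUAA'}
```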

Run your start codon finding function on gene_1 from new_fasta

In [7]:
# your code here
find_start_codons(new_fasta['gene_1'])
Out[7]:
[31, 34, 82, 168, 182, 213, 316, 483, 501, 521, 611, 616, 631, 787, 824, 833, 903, 959, 984, 1038, 1086, 1134, 1245, 1273, 1381, 1489, 1826]

The .keys() method returns all of the keys in a dictionary (as a dict_keys view).

When would this be helpful?

In [6]:
print(factorial_values.keys())
dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
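One situation where the keys are helpful: checking whether a key exists before looking it up, which avoids a KeyError. A minimal sketch with a made-up sequences dictionary:

```python
# a small hypothetical dictionary of header -> sequence
sequences = {'gene_1': 'AUGUAA', 'gene_2': 'AUGCCCUGA'}

# check whether a key exists before looking it up, to avoid a KeyError
for name in ['gene_1', 'gene_3']:
    if name in sequences.keys():  # "name in sequences" works the same way
        print(name, 'found:', sequences[name])
    else:
        print(name, 'is not in the dictionary')

# the keys can also be sorted, e.g. to list gene names in order
print(sorted(sequences.keys()))  # ['gene_1', 'gene_2']
```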

What are all of the keys in new_fasta?

In [12]:
# your code here
print(new_fasta.keys())
dict_keys(['gene_1', 'gene_2', 'test_header'])

Run your stop codon finder on every gene in new_fasta.

Hint: you can iterate over the dictionary's keys in a for loop!

In [17]:
# your code here 
for key in new_fasta.keys():
    print(find_stop_codons(new_fasta[key]))
    
[1, 32, 53, 68, 79, 92, 97, 157, 169, 179, 190, 221, 239, 265, 275, 292, 317, 325, 337, 350, 359, 377, 390, 414, 431, 453, 484, 499, 502, 513, 542, 559, 584, 632, 637, 698, 701, 751, 779, 825, 839, 868, 882, 899, 956, 966, 994, 1015, 1048, 1056, 1083, 1091, 1117, 1135, 1140, 1219, 1243, 1252, 1259, 1289, 1310, 1330, 1339, 1360, 1409, 1435, 1454, 1472, 1543, 1554, 1562, 1575, 1613, 1616, 1632, 1664, 1681, 1695, 1700, 1704, 1711, 1716, 1766, 1774, 1782, 1800, 1813, 1847, 1864, 1911]
[4, 19, 22, 33, 62, 79, 104, 152, 157, 218, 221, 271, 299, 345, 359, 388, 402, 419, 476, 486, 514, 535, 568, 576, 603, 611, 637, 655, 660, 739, 763, 772, 779, 809, 830, 850, 859, 880, 929, 955, 974, 992, 1063, 1074, 1082, 1095, 1133, 1136, 1152, 1184, 1201, 1215, 1220, 1224, 1231, 1236, 1286, 1294, 1302, 1320, 1333, 1367, 1384, 1431]
[13, 26, 40, 44]
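The output above does not say which list belongs to which gene. A sketch using .items(), which yields (key, value) pairs, so each result can be printed next to its header. The stop codon finder here is a compact rewrite of the one defined earlier, and the one-entry fasta dictionary is a stand-in for new_fasta (only the test_header sequence comes from this notebook).

```python
def find_stop_codons(sequence):
    # compact version of the stop codon finder defined earlier
    stop_codons = {'UGA', 'UAA', 'UAG'}
    return [pos for pos in range(len(sequence) - 2)
            if sequence[pos:pos + 3] in stop_codons]

# stand-in for new_fasta containing only the test_header entry
fasta = {'test_header': 'AUGUUAUUCUAUCUAGUUUCGGCUACUAGUUCAUGGUGUGUAACUAGUAUCA'}

# .items() gives (key, value) pairs, so the header and sequence arrive together
for header, sequence in fasta.items():
    print(header, find_stop_codons(sequence))
# test_header [13, 26, 40, 44]
```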

Challenge: We now know how to find all of the start and stop codons for every gene in our FASTA file.

Write a function that calculates, for each gene, the distance between the first start codon and the last stop codon.

Check whether this distance is divisible by three using the function I wrote below.

The function returns True if the number is divisible by three and False if it isn't. Its only argument is the number to check.

In [23]:
def check_if_divisible(number):
    # % gives the remainder after division; a remainder of 0 means divisible by 3
    if number % 3 == 0:
        return True
    else:
        return False

# your code here 
In [ ]:
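One possible approach to the challenge, as a sketch rather than the intended solution. It assumes "distance" means the difference between the starting positions of the first start codon and the last stop codon, and it returns None when there is no start codon, no stop codon, or the last stop comes before the first start. The helper start_to_stop_distance is hypothetical; the codon finders and check_if_divisible are compact rewrites of the notebook's functions so the block runs on its own.

```python
def find_start_codons(sequence):
    # positions of every AUG (compact version of the finder defined earlier)
    return [pos for pos in range(len(sequence) - 2) if sequence[pos:pos + 3] == 'AUG']

def find_stop_codons(sequence):
    # positions of every UGA, UAA or UAG
    stop_codons = {'UGA', 'UAA', 'UAG'}
    return [pos for pos in range(len(sequence) - 2) if sequence[pos:pos + 3] in stop_codons]

def check_if_divisible(number):
    # same logic as the function above: True when the remainder is 0
    return number % 3 == 0

def start_to_stop_distance(sequence):
    """Hypothetical helper: distance from the first start codon to the last stop codon."""
    starts = find_start_codons(sequence)
    stops = find_stop_codons(sequence)
    if not starts or not stops:
        return None  # nothing to measure
    distance = stops[-1] - starts[0]
    if distance < 0:
        return None  # the last stop codon comes before the first start codon
    return distance

# stand-in for new_fasta containing only the test_header entry
fasta = {'test_header': 'AUGUUAUUCUAUCUAGUUUCGGCUACUAGUUCAUGGUGUGUAACUAGUAUCA'}
for header, sequence in fasta.items():
    distance = start_to_stop_distance(sequence)
    if distance is None:
        print(header, 'has no start codon followed by a stop codon')
    else:
        print(header, distance, check_if_divisible(distance))
# test_header 44 False
```

If the distance is divisible by three, the last stop codon is in the same reading frame as the first start codon.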