Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Python In the following code cell, write a function named get_ec() that takes xm

ID: 3601780 • Letter: P

Question

Python

In the following code cell, write a function named get_ec() that takes xml, a string in xml format containing protein entries, and returns a list of EC numbers (text of the "ecNumber" element) for each of these entries.

Example

For the string in the code cell below:

ec_string1 = '''<?xml version="1.0" encoding="UTF-8"?>
<entries>
<protein>
<recommendedName>
<fullName>Nucleoside diphosphate kinase</fullName>
<shortName>NDK</shortName>
<shortName>NDP kinase</shortName>
<ecNumber>2.7.4.6</ecNumber>
</recommendedName>
</protein>
<protein>
<recommendedName>
<fullName>Tyrosine--tRNA ligase</fullName>
<ecNumber>6.1.1.1</ecNumber>
</recommendedName>
<alternativeName>
<fullName>Tyrosyl-tRNA synthetase</fullName>
<shortName>TyrRS</shortName>
</alternativeName>
</protein>
</entries>
'''.strip()

def get_ec(xml):
'''
Takes entries of proteins and returns the text from the element "ecNumber" as a list.
  
Paramters
---------
xml: String. An XML format string.
  
Returns
-------
ec_list: List. Contains text of the element "ecNumber" for each protein entry.
'''
ec_list = []
# YOUR CODE HERE
return ec_list

Explanation / Answer

We will 2 list of index of position <ecNumber> and </ecNumber>

We will put the substring between those indexes.

def get_ec(xml):

ec_starting=[n for n in xrange(len(xml)) if xml.find('<ecNumber>', n) == n]
ec_end=[n for n in xrange(len(xml)) if xml.find('</ecNumber>', n) == n]

ec_list=[]

for f,b in zip(ec_starting,ec_end):
ec_list.append(xml[f+10:b]) #adding 10,for moving index upto end of <ecNumber>
return ec_list