{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTTP/HTML and Web Services\n", "## 10/24/2023\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import re\n", "m = re.search(r'([^:]):(\\d+)([^:]?):([^:]+)(:([^:]+))?',\"A:44B:CYS:SG\")" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "('A', '44', 'B', 'CYS', ':SG', 'SG')" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m.groups()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTTP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hypertext Transfer Protocol\n", "\n", "A request-response protocol in a client-server framework." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTTP Requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The request consists of the following:\n", "\n", "* A request line with desired method (action)\n", "* Request Headers\n", "* An empty line.\n", "* An optional message body.\n", "\n", "Example\n", "
GET / HTTP/1.1\n",
    "Host: cnn.com\n",
    "Connection: keep-alive\n",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\n",
    "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36\n",
    "Accept-Encoding: gzip,deflate,sdch\n",
    "Accept-Language: en-US,en;q=0.8\n",
    "Cookie: SelectedEdition=www; optimizelyEndUserId=oeu1364949768474r0.014468349516391754; optimizelySegments=%7B%22170962340%22%3A%22false%22%2C%22171657961%22%3A%22gc%22%2C%22172148679%22%3A%22none%22%2C%22172265329%22%3A%22search%22%7D; optimizelyBuckets=%7B%7D; s_vi=[CS]v1|25F54343051D3284-6000013900246AA6[CE]
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTTP Requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **GET**\n", " \n", " Requests a representation of the specified resource. Requests using GET should only retrieve data and should have no other effect.\n", " \n", "**POST**\n", "\n", "Submits data to the server in the request body. Can have side-effects.\n", "\n", "**HEAD**\n", "\n", "Asks for the response identical to the one that would correspond to a GET request, but without the response body (just the headers). \n", "\n", "**OPTIONS, PUT, DELETE, TRACE and CONNECT**\n", "\n", "HTTP 1.1 methods that are less commonly used." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Sending Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**GET**\n", "\n", "Data must be in the query string of the URL. This is the part of the URL after a question mark:\n", "\n", " http://server/program/path/?query_string\n", "\n", "The query_string is made up of name=value pairs separated by &:\n", "\n", " http://server/program/path/?field1=value1&field2=value2&field3=value3\n", " \n", "URLs have length limits (differs by browser, but generally needs to be <2000 characters)\n", " \n", "**POST**\n", "\n", "The data is sent in the request body. There is no length limit (just patience limit)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "http://www.rcsb.org/pdb/ngl/ngl.do?pdbid=3ERK&bionumber=1" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# HTTP Response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A response consists of the following:\n", "\n", "* A Status-Line (includes [response code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes))\n", "* Response Headers\n", "* An empty line\n", "* An optional message body\n", "\n", "
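As a rough sketch of how the parts listed above are laid out, the following splits a raw response into its status line, headers, and body using plain string methods. The response text is a shortened, made-up variant of the example on this slide, and the variable names are only illustrative; in practice `requests` does this parsing for you.

```python
# A shortened, made-up response used only for illustration.
raw_response = '''HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html
Connection: keep-alive

<html><head><title>Example page</title></head></html>'''

lines = raw_response.splitlines()
blank = lines.index('')                          # the empty line ends the headers

version, code, reason = lines[0].split(' ', 2)   # status line
headers = dict(line.split(': ', 1) for line in lines[1:blank])
body_lines = lines[blank + 1:]                   # everything after the empty line

print(version, int(code), reason)
print(headers['Content-Type'])
print(body_lines)
```

The status code comes out as a string, so it is converted with `int` before any numeric comparison.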
\n", "\n", "Example:\n", "\n", "```\n", "HTTP/1.1 200 OK\n", "Server: nginx\n", "Date: Thu, 10 Oct 2013 14:23:59 GMT\n", "Content-Type: text/html\n", "Transfer-Encoding: chunked\n", "Connection: keep-alive\n", "Set-Cookie: CG=US:PA:Pittsburgh; path=/\n", "Last-Modified: Thu, 10 Oct 2013 14:23:14 GMT\n", "Vary: Accept-Encoding\n", "Cache-Control: max-age=60, private\n", "Expires: Thu, 10 Oct 2013 14:24:58 GMT\n", "Content-Encoding: gzip\n", "\n", " CNN.com - Breaking News, U.S., World, Weather, Entertainment & Video News \n", "``` \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# requests " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`requests` is a simple, high-level interface for making HTTP requests.\n", "\n", "Also:\n", " * `urllib.request` - the standard-library interface (called `urllib2` in Python 2)\n", " * `urllib.parse` - standard-library URL handling, has the `urlencode` function\n", " * `urllib3` - third-party library with connection pooling; `requests` is built on top of it\n", " * `http.client` - low-level interface to HTTP requests (called `httplib` in Python 2)\n", " * `mechanize` - much higher level interface for scripting web interactions - use this if you need to submit form data, passwords, etc.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `get`\n", "\n", "`requests.get` takes a URL and returns a response object that contains the message body.\n", "\n", "**Note:** `urllib.request` and `mechanize` have a `urlopen` function that returns a file-like object." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", "\n" ] } ], "source": [ "import requests\n", "response = requests.get('http://mscbio2025.csb.pitt.edu')\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Must include protocol in URL**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "MissingSchema", "evalue": "Invalid URL 'www.cnn.com': No scheme supplied. Perhaps you meant https://www.cnn.com?", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mMissingSchema\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[4], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m fail \u001b[38;5;241m=\u001b[39m \u001b[43mrequests\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mwww.cnn.com\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/api.py:73\u001b[0m, in \u001b[0;36mget\u001b[0;34m(url, params, **kwargs)\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mget\u001b[39m(url, params\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m 63\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124mr\u001b[39m\u001b[38;5;124;03m\"\"\"Sends a GET request.\u001b[39;00m\n\u001b[1;32m 64\u001b[0m \n\u001b[1;32m 65\u001b[0m \u001b[38;5;124;03m :param url: URL for the new :class:`Request` object.\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 70\u001b[0m \u001b[38;5;124;03m :rtype: requests.Response\u001b[39;00m\n\u001b[1;32m 
71\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 73\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mrequest\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mget\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/api.py:59\u001b[0m, in \u001b[0;36mrequest\u001b[0;34m(method, url, **kwargs)\u001b[0m\n\u001b[1;32m 55\u001b[0m \u001b[38;5;66;03m# By using the 'with' statement we are sure the session is closed, thus we\u001b[39;00m\n\u001b[1;32m 56\u001b[0m \u001b[38;5;66;03m# avoid leaving sockets open which can trigger a ResourceWarning in some\u001b[39;00m\n\u001b[1;32m 57\u001b[0m \u001b[38;5;66;03m# cases, and look like a memory leak in others.\u001b[39;00m\n\u001b[1;32m 58\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m sessions\u001b[38;5;241m.\u001b[39mSession() \u001b[38;5;28;01mas\u001b[39;00m session:\n\u001b[0;32m---> 59\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43msession\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmethod\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43murl\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/sessions.py:575\u001b[0m, in \u001b[0;36mSession.request\u001b[0;34m(self, method, url, params, 
data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)\u001b[0m\n\u001b[1;32m 562\u001b[0m \u001b[38;5;66;03m# Create the Request.\u001b[39;00m\n\u001b[1;32m 563\u001b[0m req \u001b[38;5;241m=\u001b[39m Request(\n\u001b[1;32m 564\u001b[0m method\u001b[38;5;241m=\u001b[39mmethod\u001b[38;5;241m.\u001b[39mupper(),\n\u001b[1;32m 565\u001b[0m url\u001b[38;5;241m=\u001b[39murl,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 573\u001b[0m hooks\u001b[38;5;241m=\u001b[39mhooks,\n\u001b[1;32m 574\u001b[0m )\n\u001b[0;32m--> 575\u001b[0m prep \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mprepare_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43mreq\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 577\u001b[0m proxies \u001b[38;5;241m=\u001b[39m proxies \u001b[38;5;129;01mor\u001b[39;00m {}\n\u001b[1;32m 579\u001b[0m settings \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmerge_environment_settings(\n\u001b[1;32m 580\u001b[0m prep\u001b[38;5;241m.\u001b[39murl, proxies, stream, verify, cert\n\u001b[1;32m 581\u001b[0m )\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/sessions.py:486\u001b[0m, in \u001b[0;36mSession.prepare_request\u001b[0;34m(self, request)\u001b[0m\n\u001b[1;32m 483\u001b[0m auth \u001b[38;5;241m=\u001b[39m get_netrc_auth(request\u001b[38;5;241m.\u001b[39murl)\n\u001b[1;32m 485\u001b[0m p \u001b[38;5;241m=\u001b[39m PreparedRequest()\n\u001b[0;32m--> 486\u001b[0m \u001b[43mp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mprepare\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 487\u001b[0m \u001b[43m \u001b[49m\u001b[43mmethod\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmethod\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mupper\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 488\u001b[0m 
\u001b[43m \u001b[49m\u001b[43murl\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 489\u001b[0m \u001b[43m \u001b[49m\u001b[43mfiles\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfiles\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 490\u001b[0m \u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 491\u001b[0m \u001b[43m \u001b[49m\u001b[43mjson\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mjson\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 492\u001b[0m \u001b[43m \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmerge_setting\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 493\u001b[0m \u001b[43m \u001b[49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdict_class\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mCaseInsensitiveDict\u001b[49m\n\u001b[1;32m 494\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 495\u001b[0m \u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmerge_setting\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 496\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mauth\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmerge_setting\u001b[49m\u001b[43m(\u001b[49m\u001b[43mauth\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mauth\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 497\u001b[0m \u001b[43m \u001b[49m\u001b[43mcookies\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmerged_cookies\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 498\u001b[0m \u001b[43m \u001b[49m\u001b[43mhooks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmerge_hooks\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequest\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhooks\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhooks\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 499\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 500\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m p\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/models.py:368\u001b[0m, in \u001b[0;36mPreparedRequest.prepare\u001b[0;34m(self, method, url, headers, files, data, params, auth, cookies, hooks, json)\u001b[0m\n\u001b[1;32m 365\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Prepares the entire request with the given parameters.\"\"\"\u001b[39;00m\n\u001b[1;32m 367\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprepare_method(method)\n\u001b[0;32m--> 368\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mprepare_url\u001b[49m\u001b[43m(\u001b[49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 369\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprepare_headers(headers)\n\u001b[1;32m 370\u001b[0m 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprepare_cookies(cookies)\n", "File \u001b[0;32m~/.local/lib/python3.10/site-packages/requests/models.py:439\u001b[0m, in \u001b[0;36mPreparedRequest.prepare_url\u001b[0;34m(self, url, params)\u001b[0m\n\u001b[1;32m 436\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidURL(\u001b[38;5;241m*\u001b[39me\u001b[38;5;241m.\u001b[39margs)\n\u001b[1;32m 438\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m scheme:\n\u001b[0;32m--> 439\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m MissingSchema(\n\u001b[1;32m 440\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalid URL \u001b[39m\u001b[38;5;132;01m{\u001b[39;00murl\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[38;5;124m: No scheme supplied. \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 441\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mPerhaps you meant https://\u001b[39m\u001b[38;5;132;01m{\u001b[39;00murl\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m?\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 442\u001b[0m )\n\u001b[1;32m 444\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m host:\n\u001b[1;32m 445\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidURL(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalid URL \u001b[39m\u001b[38;5;132;01m{\u001b[39;00murl\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[38;5;124m: No host supplied\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", "\u001b[0;31mMissingSchema\u001b[0m: Invalid URL 'www.cnn.com': No scheme supplied. Perhaps you meant https://www.cnn.com?" 
] } ], "source": [ "fail = requests.get('www.cnn.com')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Requesting with Data (POST)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "url = 'http://pocketquery.csb.pitt.edu/pocket.cgi'\n", "values = {'json' : '{\"pdbid_text\":[\"1ycr\"],\"start\":0,\"num\":24,\"sort\":\"score\",\"dir\":\"desc\",\"cmd\":\"cluster\"}'}\n", "\n", "data = requests.post(url,data=values)\n", "print(data)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "the_page = data.text" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['1YCR,B,178,2,5.856,19;23,PHE;TRP,-6.31,-12.62,-6.61,-6.01,5.20125,10.4025,4.7898,5.6127,133.9,267.8,131.1,136.7,72.3,144.6,64.7,79.9,0,0,0,0,0,0,0,0,0.972816',\n", " '1YCR,B,129,1,0,19,PHE,-6.61,-6.61,-6.61,-6.61,5.6127,5.6127,5.6127,5.6127,131.1,131.1,131.1,131.1,79.9,79.9,79.9,79.9,0,0,0,0,0,0,0,0,0.96009']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ " the_page.split('\\n')[16:18]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Requesting with Data (GET)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RCSB PDB - 1YCR: MDM2 BOUND TO THE TRANSACTIVATION DOMAIN OF P53
\"RCSB
210,836 Structures from the PDB
1,068,577 Computed Structure Models (CSM)
  • \"PDB-101\"
  • \"wwPDB\"
  • \"EMDataResource\"
  • \"NAKB:
  • \"wwPDB
  • \"PDB-Dev

 1YCR

MDM2 BOUND TO THE TRANSACTIVATION DOMAIN OF P53


Experimental Data Snapshot

  • Method: X-RAY DIFFRACTION
  • Resolution: 2.60 Å
  • R-Value Free: 0.276 
  • R-Value Work: 0.200 
  • R-Value Observed: 0.200 

wwPDB Validation   3D Report Full Report


This is version 1.4 of the entry. See complete history


Literature

Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain.

Kussie, P.H.Gorina, S.Marechal, V.Elenbaas, B.Moreau, J.Levine, A.J.Pavletich, N.P.

(1996) Science 274: 948-953

  • PubMed8875929 Search on PubMed
  • DOI: https://doi.org/10.1126/science.274.5289.948
  • Primary Citation of Related Structures:  
    1YCQ, 1YCR

  • PubMed Abstract: 

    The MDM2 oncoprotein is a cellular inhibitor of the p53 tumor suppressor in that it can bind the transactivation domain of p53 and downregulate its ability to activate transcription. In certain cancers, MDM2 amplification is a common event and contributes to the inactivation of p53. The crystal structure of the 109-residue amino-terminal domain of MDM2 bound to a 15-residue transactivation domain peptide of p53 revealed that MDM2 has a deep hydrophobic cleft on which the p53 peptide binds as an amphipathic alpha helix. The interface relies on the steric complementarity between the MDM2 cleft and the hydrophobic face of the p53 alpha helix and, in particular, on a triad of p53 amino acids-Phe19, Trp23, and Leu26-which insert deep into the MDM2 cleft. These same p53 residues are also involved in transactivation, supporting the hypothesis that MDM2 inactivates p53 by concealing its transactivation domain. The structure also suggests that the amphipathic alpha helix may be a common structural motif in the binding of a diverse family of transactivation factors to the TATA-binding protein-associated factors.


  • Organizational Affiliation

    Cellular Biochemistry and Biophysics Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA. nikola@xray2.mskcc.org

Asymmetric Unit
Biological Assembly 1  

 3D ViewStructure | 1D-3D View | Validation Report


Global Symmetry: Asymmetric - C1 
Global Stoichiometry: Hetero 2-mer - A1B1 


Find Similar Assemblies

Biological assembly 1 assigned by authors and generated by PISA (software)

PreviousNext

Macromolecule Content

  • Total Structure Weight: 14.34 kDa 
  • Atom Count: 818 
  • Modelled Residue Count: 98 
  • Deposited Residue Count: 124 
  • Unique protein chains: 2

Macromolecules
Find similar proteins by:  (by identity cutoff)  |  3D Structure
Entity ID: 1
MoleculeChainsSequence LengthOrganismDetailsImage
MDM2109Homo sapiensMutation(s): 0 
Gene Names: MDM2
EC: 2.3.2.27
UniProt & NIH Common Fund Data Resources
Find proteins for Q00987 (Homo sapiens)
Explore Q00987 
Go to UniProtKB:  Q00987
PHAROS:  Q00987
Entity Groups\n", " 
Sequence Clusters30% Identity50% Identity70% Identity90% Identity95% Identity100% Identity
UniProt GroupQ00987
Protein Feature View
Expand
  • Reference Sequence

Find similar proteins by:  Sequence   |   3D Structure  

Entity ID: 2
MoleculeChainsSequence LengthOrganismDetailsImage
P5315N/AMutation(s): 0 
UniProt & NIH Common Fund Data Resources
Find proteins for P04637 (Homo sapiens)
Explore P04637 
Go to UniProtKB:  P04637
PHAROS:  P04637
Entity Groups\n", " 
Sequence Clusters30% Identity50% Identity70% Identity90% Identity95% Identity100% Identity
UniProt GroupP04637
Protein Feature View
Expand
  • Reference Sequence
Experimental Data & Validation

Experimental Data

  • Method: X-RAY DIFFRACTION
  • Resolution: 2.60 Å
  • R-Value Free: 0.276 
  • R-Value Work: 0.200 
  • R-Value Observed: 0.200 
  • Space Group: C 2 2 21
Unit Cell:
Length ( Å )Angle ( ˚ )
a = 43.414α = 90
b = 100.546β = 90
c = 54.853γ = 90
Software Package:
Software NamePurpose
X-PLORrefinement
TNTrefinement
X-PLORmodel building
HKLdata reduction
X-PLORphasing

Structure Validation

View Full Validation Report



Entry History 

Deposition Data

Revision History  (Full details and data files)

  • Version 1.0: 1997-11-19
    Type: Initial release
  • Version 1.1: 2008-03-24
    Changes: Version format compliance
  • Version 1.2: 2011-07-13
    Changes: Version format compliance
  • Version 1.3: 2019-07-17
    Changes: Data collection, Other, Refinement description
  • Version 1.4: 2019-08-14
    Changes: Data collection
\n" ] } ], "source": [ "values = {'structureId' : '1ycr'}\n", "data = requests.get('http://www.pdb.org/pdb/explore/explore.do',values)\n", "\n", "print(data.text)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Encoding Data\n", "\n", "URLs are only allowed to have alphanumeric characters (and a handful of punctuation marks). This means data needs to be encoded when passing it as a value. requests will do this for you if you pass values as a dictionary." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "values = {'q':\"What's the meaning of life?\"}\n", "data = requests.get('http://www.google.com/search',values)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'http://www.google.com/search?q=What%27s+the+meaning+of+life%3F'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.url" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Faking It\n", "\n", "\n", "\n", "...or a python script" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "['python-requests 2', 'python-requests 2']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import requests, re\n", "page = requests.get('https://www.whatsmybrowser.org/').text\n", "re.findall(r'You’re using (.*?)\\.',page)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Chrome 61', 'Chrome 61']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "page = requests.get('https://www.whatsmybrowser.org/',headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 
Safari/537.36'}).text\n", "re.findall(r'You’re using (.*?)\\.',page)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Use `mechanize` if a site is giving you trouble**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Getting Data From the Web" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If it's on the web, you can get it into python.\n", "\n", "* *screen scraping* - getting data from your computer screen; often to the extreme of using screen shots and OCR\n", "* *web scraping* - downloading HTML content and parsing out what you need\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Web Scraping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The advantage of web scraping is that it always works - if you can see the data in your browser, you can see it in python.\n", "\n", "Disadvantages:\n", "\n", "* HTMLParser isn't very sophisticated and requires you to manage context\n", " * see [pyquery](https://pypi.python.org/pypi/pyquery) for a more powerful HTML parsing package inspired by JQuery\n", " * [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is another popular library for extracting information from HTML\n", " \n", "* Scraping is slow and inefficient\n", " * may need to make several requests to get your data (think pagination)\n", " * downloads a lot more than just your data (html)\n", " * your code may be easily broken by changes in the website design\n", " \n", "**Only parse raw HTML if there is no better option**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# ReSTful Web Services" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "REpresentational State Transfer\n", "\n", "* Client–server\n", "* Stateless\n", "* Cacheable\n", "* Layered system\n", "* Uniform interface\n", "\n", " * basically, resources are identified 
by their url\n", " * ReST does *not* specify the format of the resources, but they are often provided in **XML**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# XML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Extensible Markup Language*. A very general way to express structured data; a generalization of HTML.\n", "\n", "* **Tag** Key building block of XML - starts and ends with < >\n", " * `
` start tag\n", " * `
` end tag\n", " * `
` empty element tag (no matching end tag)\n", " \n", "* **Element** A component of the document; everything between a start and end tag. May contain child elements.\n", "\n", "* **Attribute** A name-value pair within a start or empty element tag\n", "
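To make the three terms concrete, here is a small made-up document parsed with the standard-library `xml.etree.ElementTree` module. The element and attribute names (`structure`, `chain`, `title`) are invented for illustration and do not come from any real service.

```python
import xml.etree.ElementTree as ET

# Made-up document: one root element, two empty element tags, one child with text.
doc = '''<structure id='1ycr'>
  <chain name='A' length='109'/>
  <chain name='B' length='15'/>
  <title>MDM2 bound to p53</title>
</structure>'''

root = ET.fromstring(doc)
print(root.tag, root.attrib)            # tag name and attributes of the root element

for child in root:                      # iterate over the direct child elements
    print(child.tag, child.attrib, child.text)
```

Attributes come back as a dictionary of strings, and the text of an empty element tag is `None`.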
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# XML Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider accessing NCBI's Entrez service to get data from the Gene Expression Omnibus (note we could use BioPython and avoid doing this directly).\n", "\n", "Let's look for data from humans with between 100 and 500 samples:\n", "\n", "* (human[Organism]) AND 100:500[Number of Samples]\n", "\n", "We construct a URL to represent this query.\n", "\n", "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term=human[Organism]+AND+100:500[Number+of+Samples]\n", "\n", "db=gds - GEO datasets \n", "\n", "* GEO DataSets is a study-level database which users can search for studies relevant to their interests. The database stores descriptions of all original submitter-supplied records, as well as curated DataSets.\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "6651200\n", "200245735\n", "200241425\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `xml.etree.ElementTree`" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "eSearchResult {}\n" ] } ], "source": [ "import xml.etree.ElementTree as ET\n", "root = ET.fromstring(result)\n", "print(root.tag,root.attrib)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Count\n", "RetMax\n", "RetStart\n", "IdList\n", "TranslationSet\n", "TranslationStack\n", "QueryTranslation\n" ] } ], "source": [ "for child in root:\n", " print(child.tag)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "root.find('IdList')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['200245735', '200241425', '200181711', '200181709', '200245630', '200241428', '200232216', '200231719', '200245626', '200240155', '200222110', '200185796', '200233715', '200227832', '200226400', '200211631', '200211158', '200131747', '200237999', '200225817']\n" ] } ], "source": [ "ids = [child.text for child in root.find('IdList')]\n", "print(ids)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "root.findall('Id') #find only considers children" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['200245735', '200241425', '200181711', '200181709', '200245630', '200241428', 
'200232216', '200231719', '200245626', '200240155', '200222110', '200185796', '200233715', '200227832', '200226400', '200211631', '200211158', '200131747', '200237999', '200225817']\n" ] } ], "source": [ "ids = [elem.text for elem in root.iter('Id')]  # iter considers the full tree\n", "print(ids)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# XML Parsing Alternative: Regular Expressions" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "6651200\n", "200245735\n", "200241425\n", "\n" ] } ], "source": [ "import re\n", "regex = re.compile(r'<Id>(\\d+)</Id>')\n", "print(regex.search(result))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['200245735', '200241425', '200181711', '200181709', '200245630', '200241428', '200232216', '200231719', '200245626', '200240155', '200222110', '200185796', '200233715', '200227832', '200226400', '200211631', '200211158', '200131747', '200237999', '200225817']\n" ] } ], "source": [ "ids = [m.group(1) for m in regex.finditer(result)]\n", "print(ids)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# JSON: JavaScript Object Notation\n", "\n", "A lightweight data-interchange format. \n", "\n", "Essentially, data is represented as JavaScript literals, which look very similar to Python dictionaries and lists." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import json" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'{\"a\": 1, \"b\": [1.2, 1.3, 1.4]}'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json.dumps({'a':1, 'b':[1.2,1.3,1.4]})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# PDB Restful\n", "\n", "https://data.rcsb.org/redoc/index.html\n", "\n", "Access PDB data through endpoints (URLs). The path of the endpoints starts with https://data.rcsb.org/rest/v1/core, followed by the type of the resource, e.g. entry, polymer_entity, and the identifier. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"audit_author\":[{\"name\":\"Fermi, G.\",\"pdbx_ordinal\":1},{\"name\":\"Perutz, M.F.\",\"pdbx_ordinal\":2}],\"cell\":{\"angle_alpha\":90.0,\"angle_beta\":99.34,\"angle_gamma\":90.0,\"length_a\":63.15,\"length_b\":83.59,\"length_c\":53.8,\"zpdb\":4},\"citation\":[{\"country\":\"UK\",\"id\":\"primary\",\"journal_abbrev\":\"J.Mol.Biol.\",\"journal_id_astm\":\"JMOBAK\",\"journal_id_csd\":\"0070\",\"journal_id_issn\":\"0022-2836\",\"journal_volume\":\"175\",\"page_first\":\"159\",\"page_last\":\"174\",\"pdbx_database_id_doi\":\"10.1016/0022-2836(84)90472-8\",\"pdbx_database_id_pub_med\":6726807,\"rcsb_authors\":[\"Fermi, G.\",\"Perutz, M.F.\",\"Shaanan, B.\",\"Fourme, R.\"],\"rcsb_is_primary\":\"Y\",\"rcsb_journal_abbrev\":\"J Mol Biol\",\"title\":\"The crystal structure of human deoxyhaemoglobin at 1.74 A resolution\",\"year\":1984},{\"country\":\"UK\",\"id\":\"1\",\"journal_abbrev\":\"Nature\",\"journal_id_astm\":\"NATUAS\",\"journal_id_csd\":\"0006\",\"journal_id_issn\":\"0028-0836\",\"journal_volume\":\"295\",\"page_first\":\"535\",\"rcsb_authors\":[\"Perutz, 
M.F.\",\"Hasnain, S.S.\",\"Duke, P.J.\",\"Sessler, J.L.\",\"Hahn, J.E.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Nature\",\"title\":\"Stereochemistry of Iron in Deoxyhaemoglobin\",\"year\":1982},{\"country\":\"US\",\"id\":\"3\",\"journal_abbrev\":\"Annu.Rev.Biochem.\",\"journal_id_astm\":\"ARBOAW\",\"journal_id_csd\":\"0413\",\"journal_id_issn\":\"0066-4154\",\"journal_volume\":\"48\",\"page_first\":\"327\",\"rcsb_authors\":[\"Perutz, M.F.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Annu Rev Biochem\",\"title\":\"Regulation of Oxygen Affinity of Hemoglobin. Influence of Structure of the Globin on the Heme Iron\",\"year\":1979},{\"country\":\"UK\",\"id\":\"4\",\"journal_abbrev\":\"J.Mol.Biol.\",\"journal_id_astm\":\"JMOBAK\",\"journal_id_csd\":\"0070\",\"journal_id_issn\":\"0022-2836\",\"journal_volume\":\"100\",\"page_first\":\"3\",\"rcsb_authors\":[\"Teneyck, L.F.\",\"Arnone, A.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"J Mol Biol\",\"title\":\"Three-Dimensional Fourier Synthesis of Human Deoxyhemoglobin at 2.5 Angstroms Resolution, I.X-Ray Analysis\",\"year\":1976},{\"country\":\"UK\",\"id\":\"5\",\"journal_abbrev\":\"J.Mol.Biol.\",\"journal_id_astm\":\"JMOBAK\",\"journal_id_csd\":\"0070\",\"journal_id_issn\":\"0022-2836\",\"journal_volume\":\"97\",\"page_first\":\"237\",\"rcsb_authors\":[\"Fermi, G.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"J Mol Biol\",\"title\":\"Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 2.5 Angstroms Resolution, Refinement of the Atomic Model\",\"year\":1975},{\"country\":\"UK\",\"id\":\"6\",\"journal_abbrev\":\"Nature\",\"journal_id_astm\":\"NATUAS\",\"journal_id_csd\":\"0006\",\"journal_id_issn\":\"0028-0836\",\"journal_volume\":\"228\",\"page_first\":\"516\",\"rcsb_authors\":[\"Muirhead, H.\",\"Greer, J.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Nature\",\"title\":\"Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 3.5 Angstroms 
Resolution\",\"year\":1970},{\"book_publisher\":\"Oxford University Press\",\"id\":\"2\",\"journal_abbrev\":\"Haemoglobin and Myoglobin. Atlas of Molecular Structures in Biology\",\"journal_id_csd\":\"0986\",\"journal_id_issn\":\"0-19-854706-4\",\"journal_volume\":\"2\",\"rcsb_authors\":[\"Fermi, G.\",\"Perutz, M.F.\"],\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Haemoglobin And Myoglobin Atlas Of Molecular Structures In Biology\",\"year\":1981},{\"book_publisher\":\"National Biomedical Research Foundation, Silver Spring,Md.\",\"id\":\"7\",\"journal_abbrev\":\"Atlas of Protein Sequence and Structure (Data Section)\",\"journal_id_csd\":\"0435\",\"journal_id_issn\":\"0-912466-02-2\",\"journal_volume\":\"5\",\"page_first\":\"56\",\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Atlas Of Protein Sequence And Structure (data Section)\",\"year\":1972},{\"book_publisher\":\"National Biomedical Research Foundation, Silver Spring,Md.\",\"id\":\"8\",\"journal_abbrev\":\"Atlas of Protein Sequence and Structure (Data Section)\",\"journal_id_csd\":\"0435\",\"journal_id_issn\":\"0-912466-02-2\",\"journal_volume\":\"5\",\"page_first\":\"64\",\"rcsb_is_primary\":\"N\",\"rcsb_journal_abbrev\":\"Atlas Of Protein Sequence And Structure (data Section)\",\"year\":1972}],\"diffrn\":[{\"crystal_id\":\"1\",\"id\":\"1\"}],\"entry\":{\"id\":\"4HHB\"},\"exptl\":[{\"method\":\"X-RAY DIFFRACTION\"}],\"exptl_crystal\":[{\"density_matthews\":2.26,\"density_percent_sol\":45.48,\"id\":\"1\"}],\"pdbx_audit_revision_category\":[{\"category\":\"atom_site\",\"data_content_type\":\"Structure model\",\"ordinal\":1,\"revision_ordinal\":4},{\"category\":\"database_PDB_caveat\",\"data_content_type\":\"Structure model\",\"ordinal\":2,\"revision_ordinal\":4},{\"category\":\"entity\",\"data_content_type\":\"Structure model\",\"ordinal\":3,\"revision_ordinal\":4},{\"category\":\"entity_name_com\",\"data_content_type\":\"Structure 
model\",\"ordinal\":4,\"revision_ordinal\":4},{\"category\":\"entity_src_gen\",\"data_content_type\":\"Structure model\",\"ordinal\":5,\"revision_ordinal\":4},{\"category\":\"pdbx_database_status\",\"data_content_type\":\"Structure model\",\"ordinal\":6,\"revision_ordinal\":4},{\"category\":\"pdbx_validate_rmsd_angle\",\"data_content_type\":\"Structure model\",\"ordinal\":7,\"revision_ordinal\":4},{\"category\":\"pdbx_validate_rmsd_bond\",\"data_content_type\":\"Structure model\",\"ordinal\":8,\"revision_ordinal\":4},{\"category\":\"struct_ref\",\"data_content_type\":\"Structure model\",\"ordinal\":9,\"revision_ordinal\":4},{\"category\":\"struct_ref_seq\",\"data_content_type\":\"Structure model\",\"ordinal\":10,\"revision_ordinal\":4},{\"category\":\"atom_site\",\"data_content_type\":\"Structure model\",\"ordinal\":11,\"revision_ordinal\":5},{\"category\":\"pdbx_validate_rmsd_angle\",\"data_content_type\":\"Structure model\",\"ordinal\":12,\"revision_ordinal\":5},{\"category\":\"pdbx_validate_rmsd_bond\",\"data_content_type\":\"Structure model\",\"ordinal\":13,\"revision_ordinal\":5},{\"category\":\"struct_site\",\"data_content_type\":\"Structure model\",\"ordinal\":14,\"revision_ordinal\":5},{\"category\":\"atom_site\",\"data_content_type\":\"Structure model\",\"ordinal\":15,\"revision_ordinal\":6},{\"category\":\"atom_sites\",\"data_content_type\":\"Structure model\",\"ordinal\":16,\"revision_ordinal\":6},{\"category\":\"database_2\",\"data_content_type\":\"Structure model\",\"ordinal\":17,\"revision_ordinal\":6},{\"category\":\"database_PDB_matrix\",\"data_content_type\":\"Structure model\",\"ordinal\":18,\"revision_ordinal\":6},{\"category\":\"pdbx_struct_conn_angle\",\"data_content_type\":\"Structure model\",\"ordinal\":19,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_close_contact\",\"data_content_type\":\"Structure model\",\"ordinal\":20,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_main_chain_plane\",\"data_content_type\":\"Structure 
model\",\"ordinal\":21,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_peptide_omega\",\"data_content_type\":\"Structure model\",\"ordinal\":22,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_planes\",\"data_content_type\":\"Structure model\",\"ordinal\":23,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_polymer_linkage\",\"data_content_type\":\"Structure model\",\"ordinal\":24,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_rmsd_angle\",\"data_content_type\":\"Structure model\",\"ordinal\":25,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_rmsd_bond\",\"data_content_type\":\"Structure model\",\"ordinal\":26,\"revision_ordinal\":6},{\"category\":\"pdbx_validate_torsion\",\"data_content_type\":\"Structure model\",\"ordinal\":27,\"revision_ordinal\":6},{\"category\":\"struct_ncs_oper\",\"data_content_type\":\"Structure model\",\"ordinal\":28,\"revision_ordinal\":6},{\"category\":\"pdbx_database_remark\",\"data_content_type\":\"Structure model\",\"ordinal\":29,\"revision_ordinal\":7}],\"pdbx_audit_revision_details\":[{\"data_content_type\":\"Structure model\",\"ordinal\":1,\"provider\":\"repository\",\"revision_ordinal\":1,\"type\":\"Initial release\"},{\"data_content_type\":\"Structure model\",\"details\":\"Coordinates and associated ncs operations (if present) transformed into standard crystal frame\",\"ordinal\":2,\"provider\":\"repository\",\"revision_ordinal\":6,\"type\":\"Remediation\"}],\"pdbx_audit_revision_group\":[{\"data_content_type\":\"Structure model\",\"group\":\"Version format compliance\",\"ordinal\":1,\"revision_ordinal\":2},{\"data_content_type\":\"Structure model\",\"group\":\"Advisory\",\"ordinal\":2,\"revision_ordinal\":3},{\"data_content_type\":\"Structure model\",\"group\":\"Version format compliance\",\"ordinal\":3,\"revision_ordinal\":3},{\"data_content_type\":\"Structure model\",\"group\":\"Advisory\",\"ordinal\":4,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Atomic 
model\",\"ordinal\":5,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Data collection\",\"ordinal\":6,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Database references\",\"ordinal\":7,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Other\",\"ordinal\":8,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Source and taxonomy\",\"ordinal\":9,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Structure summary\",\"ordinal\":10,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"group\":\"Atomic model\",\"ordinal\":11,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"group\":\"Data collection\",\"ordinal\":12,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"group\":\"Derived calculations\",\"ordinal\":13,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"group\":\"Advisory\",\"ordinal\":14,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Atomic model\",\"ordinal\":15,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Data collection\",\"ordinal\":16,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Database references\",\"ordinal\":17,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Derived calculations\",\"ordinal\":18,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Other\",\"ordinal\":19,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Refinement description\",\"ordinal\":20,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"group\":\"Advisory\",\"ordinal\":21,\"revision_ordinal\":7}],\"pdbx_audit_revision_history\":[{\"data_content_type\":\"Structure 
model\",\"major_revision\":1,\"minor_revision\":0,\"ordinal\":1,\"revision_date\":\"1984-07-17T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":1,\"minor_revision\":1,\"ordinal\":2,\"revision_date\":\"2008-03-03T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":1,\"minor_revision\":2,\"ordinal\":3,\"revision_date\":\"2011-07-13T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":2,\"minor_revision\":0,\"ordinal\":4,\"revision_date\":\"2020-06-17T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":3,\"minor_revision\":0,\"ordinal\":5,\"revision_date\":\"2021-03-31T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":4,\"minor_revision\":0,\"ordinal\":6,\"revision_date\":\"2023-02-08T00:00:00+0000\"},{\"data_content_type\":\"Structure model\",\"major_revision\":4,\"minor_revision\":1,\"ordinal\":7,\"revision_date\":\"2023-03-15T00:00:00+0000\"}],\"pdbx_audit_revision_item\":[{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.B_iso_or_equiv\",\"ordinal\":1,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_x\",\"ordinal\":2,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_y\",\"ordinal\":3,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_z\",\"ordinal\":4,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_entity.pdbx_description\",\"ordinal\":5,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_entity_src_gen.gene_src_common_name\",\"ordinal\":6,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_entity_src_gen.pdbx_beg_seq_num\",\"ordinal\":7,\"revision_ordinal\":4},{\"data_content_type\":\"Structure 
model\",\"item\":\"_entity_src_gen.pdbx_end_seq_num\",\"ordinal\":8,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_entity_src_gen.pdbx_gene_src_gene\",\"ordinal\":9,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_entity_src_gen.pdbx_seq_type\",\"ordinal\":10,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_database_status.process_site\",\"ordinal\":11,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_rmsd_angle.angle_deviation\",\"ordinal\":12,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_rmsd_angle.angle_value\",\"ordinal\":13,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_rmsd_bond.bond_deviation\",\"ordinal\":14,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_rmsd_bond.bond_value\",\"ordinal\":15,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ref.pdbx_align_begin\",\"ordinal\":16,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ref_seq.db_align_beg\",\"ordinal\":17,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ref_seq.db_align_end\",\"ordinal\":18,\"revision_ordinal\":4},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.B_iso_or_equiv\",\"ordinal\":19,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_x\",\"ordinal\":20,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_y\",\"ordinal\":21,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_z\",\"ordinal\":22,\"revision_ordinal\":5},{\"data_content_type\":\"Structure 
model\",\"item\":\"_pdbx_validate_rmsd_bond.bond_deviation\",\"ordinal\":23,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_rmsd_bond.bond_value\",\"ordinal\":24,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_site.pdbx_auth_asym_id\",\"ordinal\":25,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_site.pdbx_auth_comp_id\",\"ordinal\":26,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_site.pdbx_auth_seq_id\",\"ordinal\":27,\"revision_ordinal\":5},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_x\",\"ordinal\":28,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_y\",\"ordinal\":29,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_site.Cartn_z\",\"ordinal\":30,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[1][1]\",\"ordinal\":31,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[1][2]\",\"ordinal\":32,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[1][3]\",\"ordinal\":33,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[2][1]\",\"ordinal\":34,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[2][2]\",\"ordinal\":35,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[2][3]\",\"ordinal\":36,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[3][1]\",\"ordinal\":37,\"revision_ordinal\":6},{\"data_content_type\":\"Structure 
model\",\"item\":\"_atom_sites.fract_transf_matrix[3][2]\",\"ordinal\":38,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_matrix[3][3]\",\"ordinal\":39,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_vector[1]\",\"ordinal\":40,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_vector[2]\",\"ordinal\":41,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_atom_sites.fract_transf_vector[3]\",\"ordinal\":42,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_2.pdbx_DOI\",\"ordinal\":43,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_2.pdbx_database_accession\",\"ordinal\":44,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[1][1]\",\"ordinal\":45,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[1][2]\",\"ordinal\":46,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[1][3]\",\"ordinal\":47,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[2][1]\",\"ordinal\":48,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[2][2]\",\"ordinal\":49,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[2][3]\",\"ordinal\":50,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[3][1]\",\"ordinal\":51,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx[3][2]\",\"ordinal\":52,\"revision_ordinal\":6},{\"data_content_type\":\"Structure 
model\",\"item\":\"_database_PDB_matrix.origx[3][3]\",\"ordinal\":53,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx_vector[1]\",\"ordinal\":54,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx_vector[2]\",\"ordinal\":55,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_database_PDB_matrix.origx_vector[3]\",\"ordinal\":56,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_struct_conn_angle.value\",\"ordinal\":57,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_close_contact.dist\",\"ordinal\":58,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_peptide_omega.omega\",\"ordinal\":59,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_planes.rmsd\",\"ordinal\":60,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_polymer_linkage.dist\",\"ordinal\":61,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_torsion.phi\",\"ordinal\":62,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_pdbx_validate_torsion.psi\",\"ordinal\":63,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[1][1]\",\"ordinal\":64,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[1][2]\",\"ordinal\":65,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[1][3]\",\"ordinal\":66,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[2][1]\",\"ordinal\":67,\"revision_ordinal\":6},{\"data_content_type\":\"Structure 
model\",\"item\":\"_struct_ncs_oper.matrix[2][2]\",\"ordinal\":68,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[2][3]\",\"ordinal\":69,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[3][1]\",\"ordinal\":70,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[3][2]\",\"ordinal\":71,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.matrix[3][3]\",\"ordinal\":72,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.vector[1]\",\"ordinal\":73,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.vector[2]\",\"ordinal\":74,\"revision_ordinal\":6},{\"data_content_type\":\"Structure model\",\"item\":\"_struct_ncs_oper.vector[3]\",\"ordinal\":75,\"revision_ordinal\":6}],\"pdbx_database_pdbobs_spr\":[{\"date\":\"1984-07-17T00:00:00+0000\",\"id\":\"SPRSDE\",\"pdb_id\":\"4HHB\",\"replace_pdb_id\":\"1HHB\"}],\"pdbx_database_related\":[{\"content_type\":\"unspecified\",\"db_id\":\"2HHB\",\"db_name\":\"PDB\",\"details\":\"REFINED BY THE METHOD OF JACK AND LEVITT. THIS\\n ENTRY PRESENTS THE BEST ESTIMATE OF THE\\n COORDINATES.\"},{\"content_type\":\"unspecified\",\"db_id\":\"3HHB\",\"db_name\":\"PDB\",\"details\":\"SYMMETRY AVERAGED ABOUT THE (NON-CRYSTALLOGRAPHIC)\\n MOLECULAR AXIS AND THEN RE-REGULARIZED BY THE\\n ENERGY REFINEMENT METHOD OF LEVITT. 
THIS ENTRY\\n PRESENTS COORDINATES THAT ARE ADEQUATE FOR MOST\\n PURPOSES, SUCH AS COMPARISON WITH OTHER STRUCTURES.\"},{\"content_type\":\"unspecified\",\"db_id\":\"1GLI\",\"db_name\":\"PDB\"}],\"pdbx_database_status\":{\"pdb_format_compatible\":\"Y\",\"process_site\":\"BNL\",\"recvd_initial_deposition_date\":\"1984-03-07T00:00:00+0000\",\"status_code\":\"REL\"},\"rcsb_accession_info\":{\"deposit_date\":\"1984-03-07T00:00:00+0000\",\"has_released_experimental_data\":\"N\",\"initial_release_date\":\"1984-07-17T00:00:00+0000\",\"major_revision\":4,\"minor_revision\":1,\"revision_date\":\"2023-03-15T00:00:00+0000\",\"status_code\":\"REL\"},\"rcsb_entry_container_identifiers\":{\"assembly_ids\":[\"1\"],\"entity_ids\":[\"1\",\"2\",\"3\",\"4\",\"5\"],\"entry_id\":\"4HHB\",\"model_ids\":[1],\"non_polymer_entity_ids\":[\"3\",\"4\"],\"polymer_entity_ids\":[\"1\",\"2\"],\"rcsb_id\":\"4HHB\",\"pubmed_id\":6726807},\"rcsb_entry_info\":{\"assembly_count\":1,\"branched_entity_count\":0,\"cis_peptide_count\":0,\"deposited_atom_count\":4779,\"deposited_hydrogen_atom_count\":0,\"deposited_model_count\":1,\"deposited_modeled_polymer_monomer_count\":574,\"deposited_nonpolymer_entity_instance_count\":6,\"deposited_polymer_entity_instance_count\":4,\"deposited_polymer_monomer_count\":574,\"deposited_solvent_atom_count\":221,\"deposited_unmodeled_polymer_monomer_count\":0,\"disulfide_bond_count\":0,\"entity_count\":5,\"experimental_method\":\"X-ray\",\"experimental_method_count\":1,\"inter_mol_covalent_bond_count\":0,\"inter_mol_metalic_bond_count\":4,\"molecular_weight\":64.74,\"na_polymer_entity_types\":\"Other\",\"nonpolymer_bound_components\":[\"HEM\"],\"nonpolymer_entity_count\":2,\"nonpolymer_molecular_weight_maximum\":0.62,\"nonpolymer_molecular_weight_minimum\":0.09,\"polymer_composition\":\"heteromeric 
protein\",\"polymer_entity_count\":2,\"polymer_entity_count_dna\":0,\"polymer_entity_count_rna\":0,\"polymer_entity_count_nucleic_acid\":0,\"polymer_entity_count_nucleic_acid_hybrid\":0,\"polymer_entity_count_protein\":2,\"polymer_entity_taxonomy_count\":2,\"polymer_molecular_weight_maximum\":15.89,\"polymer_molecular_weight_minimum\":15.15,\"polymer_monomer_count_maximum\":146,\"polymer_monomer_count_minimum\":141,\"resolution_combined\":[1.74],\"selected_polymer_entity_types\":\"Protein (only)\",\"solvent_entity_count\":1,\"structure_determination_methodology\":\"experimental\",\"structure_determination_methodology_priority\":10,\"diffrn_resolution_high\":{\"provenance_source\":\"From refinement resolution cutoff\",\"value\":1.74}},\"rcsb_primary_citation\":{\"country\":\"UK\",\"id\":\"primary\",\"journal_abbrev\":\"J.Mol.Biol.\",\"journal_id_astm\":\"JMOBAK\",\"journal_id_csd\":\"0070\",\"journal_id_issn\":\"0022-2836\",\"journal_volume\":\"175\",\"page_first\":\"159\",\"page_last\":\"174\",\"pdbx_database_id_doi\":\"10.1016/0022-2836(84)90472-8\",\"pdbx_database_id_pub_med\":6726807,\"rcsb_orcididentifiers\":[\"?\",\"?\",\"?\",\"?\"],\"rcsb_authors\":[\"Fermi, G.\",\"Perutz, M.F.\",\"Shaanan, B.\",\"Fourme, R.\"],\"rcsb_journal_abbrev\":\"J Mol Biol\",\"title\":\"The crystal structure of human deoxyhaemoglobin at 1.74 A resolution\",\"year\":1984},\"refine\":[{\"details\":\"THE COORDINATES GIVEN HERE ARE IN THE ORTHOGONAL ANGSTROM\\nSYSTEM STANDARD FOR HEMOGLOBINS. 
THE Y AXIS IS THE\\n(NON CRYSTALLOGRAPHIC) MOLECULAR DIAD AND THE X AXIS IS THE\\nPSEUDO DIAD WHICH RELATES THE ALPHA-1 AND BETA-1 CHAINS.\\nTHE TRANSFORMATION GIVEN IN THE *MTRIX* RECORDS BELOW\\nWILL GENERATE COORDINATES FOR THE *C* AND *D* CHAINS FROM\\nTHE *A* AND *B* CHAINS RESPECTIVELY.\",\"ls_rfactor_rwork\":0.135,\"ls_dres_high\":1.74,\"pdbx_diffrn_id\":[\"1\"],\"pdbx_refine_id\":\"X-RAY DIFFRACTION\"}],\"refine_hist\":[{\"cycle_id\":\"LAST\",\"d_res_high\":1.74,\"number_atoms_solvent\":221,\"number_atoms_total\":4779,\"pdbx_number_atoms_ligand\":174,\"pdbx_number_atoms_nucleic_acid\":0,\"pdbx_number_atoms_protein\":4384,\"pdbx_refine_id\":\"X-RAY DIFFRACTION\"}],\"struct\":{\"title\":\"THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION\"},\"struct_keywords\":{\"pdbx_keywords\":\"OXYGEN TRANSPORT\",\"text\":\"OXYGEN TRANSPORT\"},\"symmetry\":{\"int_tables_number\":4,\"space_group_name_hm\":\"P 1 21 1\"},\"rcsb_id\":\"4HHB\"}\n" ] } ], "source": [ "response = requests.get('https://data.rcsb.org/rest/v1/core/entry/4hhb')\n", "print(response.text)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "{'audit_author': [{'name': 'Fermi, G.', 'pdbx_ordinal': 1},\n", " {'name': 'Perutz, M.F.', 'pdbx_ordinal': 2}],\n", " 'cell': {'angle_alpha': 90.0,\n", " 'angle_beta': 99.34,\n", " 'angle_gamma': 90.0,\n", " 'length_a': 63.15,\n", " 'length_b': 83.59,\n", " 'length_c': 53.8,\n", " 'zpdb': 4},\n", " 'citation': [{'country': 'UK',\n", " 'id': 'primary',\n", " 'journal_abbrev': 'J.Mol.Biol.',\n", " 'journal_id_astm': 'JMOBAK',\n", " 'journal_id_csd': '0070',\n", " 'journal_id_issn': '0022-2836',\n", " 'journal_volume': '175',\n", " 'page_first': '159',\n", " 'page_last': '174',\n", " 'pdbx_database_id_doi': '10.1016/0022-2836(84)90472-8',\n", " 'pdbx_database_id_pub_med': 6726807,\n", " 'rcsb_authors': ['Fermi, G.', 'Perutz, M.F.', 'Shaanan, B.', 
'Fourme, R.'],\n", " 'rcsb_is_primary': 'Y',\n", " 'rcsb_journal_abbrev': 'J Mol Biol',\n", " 'title': 'The crystal structure of human deoxyhaemoglobin at 1.74 A resolution',\n", " 'year': 1984},\n", " {'country': 'UK',\n", " 'id': '1',\n", " 'journal_abbrev': 'Nature',\n", " 'journal_id_astm': 'NATUAS',\n", " 'journal_id_csd': '0006',\n", " 'journal_id_issn': '0028-0836',\n", " 'journal_volume': '295',\n", " 'page_first': '535',\n", " 'rcsb_authors': ['Perutz, M.F.',\n", " 'Hasnain, S.S.',\n", " 'Duke, P.J.',\n", " 'Sessler, J.L.',\n", " 'Hahn, J.E.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Nature',\n", " 'title': 'Stereochemistry of Iron in Deoxyhaemoglobin',\n", " 'year': 1982},\n", " {'country': 'US',\n", " 'id': '3',\n", " 'journal_abbrev': 'Annu.Rev.Biochem.',\n", " 'journal_id_astm': 'ARBOAW',\n", " 'journal_id_csd': '0413',\n", " 'journal_id_issn': '0066-4154',\n", " 'journal_volume': '48',\n", " 'page_first': '327',\n", " 'rcsb_authors': ['Perutz, M.F.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Annu Rev Biochem',\n", " 'title': 'Regulation of Oxygen Affinity of Hemoglobin. 
Influence of Structure of the Globin on the Heme Iron',\n", " 'year': 1979},\n", " {'country': 'UK',\n", " 'id': '4',\n", " 'journal_abbrev': 'J.Mol.Biol.',\n", " 'journal_id_astm': 'JMOBAK',\n", " 'journal_id_csd': '0070',\n", " 'journal_id_issn': '0022-2836',\n", " 'journal_volume': '100',\n", " 'page_first': '3',\n", " 'rcsb_authors': ['Teneyck, L.F.', 'Arnone, A.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'J Mol Biol',\n", " 'title': 'Three-Dimensional Fourier Synthesis of Human Deoxyhemoglobin at 2.5 Angstroms Resolution, I.X-Ray Analysis',\n", " 'year': 1976},\n", " {'country': 'UK',\n", " 'id': '5',\n", " 'journal_abbrev': 'J.Mol.Biol.',\n", " 'journal_id_astm': 'JMOBAK',\n", " 'journal_id_csd': '0070',\n", " 'journal_id_issn': '0022-2836',\n", " 'journal_volume': '97',\n", " 'page_first': '237',\n", " 'rcsb_authors': ['Fermi, G.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'J Mol Biol',\n", " 'title': 'Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 2.5 Angstroms Resolution, Refinement of the Atomic Model',\n", " 'year': 1975},\n", " {'country': 'UK',\n", " 'id': '6',\n", " 'journal_abbrev': 'Nature',\n", " 'journal_id_astm': 'NATUAS',\n", " 'journal_id_csd': '0006',\n", " 'journal_id_issn': '0028-0836',\n", " 'journal_volume': '228',\n", " 'page_first': '516',\n", " 'rcsb_authors': ['Muirhead, H.', 'Greer, J.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Nature',\n", " 'title': 'Three-Dimensional Fourier Synthesis of Human Deoxyhaemoglobin at 3.5 Angstroms Resolution',\n", " 'year': 1970},\n", " {'book_publisher': 'Oxford University Press',\n", " 'id': '2',\n", " 'journal_abbrev': 'Haemoglobin and Myoglobin. 
Atlas of Molecular Structures in Biology',\n", " 'journal_id_csd': '0986',\n", " 'journal_id_issn': '0-19-854706-4',\n", " 'journal_volume': '2',\n", " 'rcsb_authors': ['Fermi, G.', 'Perutz, M.F.'],\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Haemoglobin And Myoglobin Atlas Of Molecular Structures In Biology',\n", " 'year': 1981},\n", " {'book_publisher': 'National Biomedical Research Foundation, Silver Spring,Md.',\n", " 'id': '7',\n", " 'journal_abbrev': 'Atlas of Protein Sequence and Structure (Data Section)',\n", " 'journal_id_csd': '0435',\n", " 'journal_id_issn': '0-912466-02-2',\n", " 'journal_volume': '5',\n", " 'page_first': '56',\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Atlas Of Protein Sequence And Structure (data Section)',\n", " 'year': 1972},\n", " {'book_publisher': 'National Biomedical Research Foundation, Silver Spring,Md.',\n", " 'id': '8',\n", " 'journal_abbrev': 'Atlas of Protein Sequence and Structure (Data Section)',\n", " 'journal_id_csd': '0435',\n", " 'journal_id_issn': '0-912466-02-2',\n", " 'journal_volume': '5',\n", " 'page_first': '64',\n", " 'rcsb_is_primary': 'N',\n", " 'rcsb_journal_abbrev': 'Atlas Of Protein Sequence And Structure (data Section)',\n", " 'year': 1972}],\n", " 'diffrn': [{'crystal_id': '1', 'id': '1'}],\n", " 'entry': {'id': '4HHB'},\n", " 'exptl': [{'method': 'X-RAY DIFFRACTION'}],\n", " 'exptl_crystal': [{'density_matthews': 2.26,\n", " 'density_percent_sol': 45.48,\n", " 'id': '1'}],\n", " 'pdbx_audit_revision_category': [{'category': 'atom_site',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 1,\n", " 'revision_ordinal': 4},\n", " {'category': 'database_PDB_caveat',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 2,\n", " 'revision_ordinal': 4},\n", " {'category': 'entity',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 3,\n", " 'revision_ordinal': 4},\n", " {'category': 'entity_name_com',\n", " 'data_content_type': 'Structure 
model',\n", " 'ordinal': 4,\n", " 'revision_ordinal': 4},\n", " {'category': 'entity_src_gen',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 5,\n", " 'revision_ordinal': 4},\n", " {'category': 'pdbx_database_status',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 6,\n", " 'revision_ordinal': 4},\n", " {'category': 'pdbx_validate_rmsd_angle',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 7,\n", " 'revision_ordinal': 4},\n", " {'category': 'pdbx_validate_rmsd_bond',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 8,\n", " 'revision_ordinal': 4},\n", " {'category': 'struct_ref',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 9,\n", " 'revision_ordinal': 4},\n", " {'category': 'struct_ref_seq',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 10,\n", " 'revision_ordinal': 4},\n", " {'category': 'atom_site',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 11,\n", " 'revision_ordinal': 5},\n", " {'category': 'pdbx_validate_rmsd_angle',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 12,\n", " 'revision_ordinal': 5},\n", " {'category': 'pdbx_validate_rmsd_bond',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 13,\n", " 'revision_ordinal': 5},\n", " {'category': 'struct_site',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 14,\n", " 'revision_ordinal': 5},\n", " {'category': 'atom_site',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 15,\n", " 'revision_ordinal': 6},\n", " {'category': 'atom_sites',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 16,\n", " 'revision_ordinal': 6},\n", " {'category': 'database_2',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 17,\n", " 'revision_ordinal': 6},\n", " {'category': 'database_PDB_matrix',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 18,\n", " 'revision_ordinal': 6},\n", " {'category': 
'pdbx_struct_conn_angle',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 19,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_close_contact',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 20,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_main_chain_plane',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 21,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_peptide_omega',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 22,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_planes',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 23,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_polymer_linkage',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 24,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_rmsd_angle',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 25,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_rmsd_bond',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 26,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_validate_torsion',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 27,\n", " 'revision_ordinal': 6},\n", " {'category': 'struct_ncs_oper',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 28,\n", " 'revision_ordinal': 6},\n", " {'category': 'pdbx_database_remark',\n", " 'data_content_type': 'Structure model',\n", " 'ordinal': 29,\n", " 'revision_ordinal': 7}],\n", " 'pdbx_audit_revision_details': [{'data_content_type': 'Structure model',\n", " 'ordinal': 1,\n", " 'provider': 'repository',\n", " 'revision_ordinal': 1,\n", " 'type': 'Initial release'},\n", " {'data_content_type': 'Structure model',\n", " 'details': 'Coordinates and associated ncs operations (if present) transformed into standard crystal frame',\n", " 'ordinal': 2,\n", " 'provider': 'repository',\n", " 'revision_ordinal': 
6,\n", " 'type': 'Remediation'}],\n", " 'pdbx_audit_revision_group': [{'data_content_type': 'Structure model',\n", " 'group': 'Version format compliance',\n", " 'ordinal': 1,\n", " 'revision_ordinal': 2},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Advisory',\n", " 'ordinal': 2,\n", " 'revision_ordinal': 3},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Version format compliance',\n", " 'ordinal': 3,\n", " 'revision_ordinal': 3},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Advisory',\n", " 'ordinal': 4,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Atomic model',\n", " 'ordinal': 5,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Data collection',\n", " 'ordinal': 6,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Database references',\n", " 'ordinal': 7,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Other',\n", " 'ordinal': 8,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Source and taxonomy',\n", " 'ordinal': 9,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Structure summary',\n", " 'ordinal': 10,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Atomic model',\n", " 'ordinal': 11,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Data collection',\n", " 'ordinal': 12,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Derived calculations',\n", " 'ordinal': 13,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Advisory',\n", " 'ordinal': 14,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Atomic model',\n", " 'ordinal': 
15,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Data collection',\n", " 'ordinal': 16,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Database references',\n", " 'ordinal': 17,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Derived calculations',\n", " 'ordinal': 18,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Other',\n", " 'ordinal': 19,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Refinement description',\n", " 'ordinal': 20,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'group': 'Advisory',\n", " 'ordinal': 21,\n", " 'revision_ordinal': 7}],\n", " 'pdbx_audit_revision_history': [{'data_content_type': 'Structure model',\n", " 'major_revision': 1,\n", " 'minor_revision': 0,\n", " 'ordinal': 1,\n", " 'revision_date': '1984-07-17T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 1,\n", " 'minor_revision': 1,\n", " 'ordinal': 2,\n", " 'revision_date': '2008-03-03T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 1,\n", " 'minor_revision': 2,\n", " 'ordinal': 3,\n", " 'revision_date': '2011-07-13T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 2,\n", " 'minor_revision': 0,\n", " 'ordinal': 4,\n", " 'revision_date': '2020-06-17T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 3,\n", " 'minor_revision': 0,\n", " 'ordinal': 5,\n", " 'revision_date': '2021-03-31T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 4,\n", " 'minor_revision': 0,\n", " 'ordinal': 6,\n", " 'revision_date': '2023-02-08T00:00:00+0000'},\n", " {'data_content_type': 'Structure model',\n", " 'major_revision': 4,\n", " 'minor_revision': 
1,\n", " 'ordinal': 7,\n", " 'revision_date': '2023-03-15T00:00:00+0000'}],\n", " 'pdbx_audit_revision_item': [{'data_content_type': 'Structure model',\n", " 'item': '_atom_site.B_iso_or_equiv',\n", " 'ordinal': 1,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_x',\n", " 'ordinal': 2,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_y',\n", " 'ordinal': 3,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_z',\n", " 'ordinal': 4,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity.pdbx_description',\n", " 'ordinal': 5,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity_src_gen.gene_src_common_name',\n", " 'ordinal': 6,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity_src_gen.pdbx_beg_seq_num',\n", " 'ordinal': 7,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity_src_gen.pdbx_end_seq_num',\n", " 'ordinal': 8,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity_src_gen.pdbx_gene_src_gene',\n", " 'ordinal': 9,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_entity_src_gen.pdbx_seq_type',\n", " 'ordinal': 10,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_database_status.process_site',\n", " 'ordinal': 11,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_angle.angle_deviation',\n", " 'ordinal': 12,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_angle.angle_value',\n", " 'ordinal': 13,\n", " 'revision_ordinal': 4},\n", " 
{'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_bond.bond_deviation',\n", " 'ordinal': 14,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_bond.bond_value',\n", " 'ordinal': 15,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ref.pdbx_align_begin',\n", " 'ordinal': 16,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ref_seq.db_align_beg',\n", " 'ordinal': 17,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ref_seq.db_align_end',\n", " 'ordinal': 18,\n", " 'revision_ordinal': 4},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.B_iso_or_equiv',\n", " 'ordinal': 19,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_x',\n", " 'ordinal': 20,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_y',\n", " 'ordinal': 21,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_z',\n", " 'ordinal': 22,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_bond.bond_deviation',\n", " 'ordinal': 23,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_rmsd_bond.bond_value',\n", " 'ordinal': 24,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_site.pdbx_auth_asym_id',\n", " 'ordinal': 25,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_site.pdbx_auth_comp_id',\n", " 'ordinal': 26,\n", " 'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_site.pdbx_auth_seq_id',\n", " 'ordinal': 27,\n", " 
'revision_ordinal': 5},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_x',\n", " 'ordinal': 28,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_y',\n", " 'ordinal': 29,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_site.Cartn_z',\n", " 'ordinal': 30,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[1][1]',\n", " 'ordinal': 31,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[1][2]',\n", " 'ordinal': 32,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[1][3]',\n", " 'ordinal': 33,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[2][1]',\n", " 'ordinal': 34,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[2][2]',\n", " 'ordinal': 35,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[2][3]',\n", " 'ordinal': 36,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[3][1]',\n", " 'ordinal': 37,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[3][2]',\n", " 'ordinal': 38,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_matrix[3][3]',\n", " 'ordinal': 39,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_vector[1]',\n", " 'ordinal': 40,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure 
model',\n", " 'item': '_atom_sites.fract_transf_vector[2]',\n", " 'ordinal': 41,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_atom_sites.fract_transf_vector[3]',\n", " 'ordinal': 42,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_2.pdbx_DOI',\n", " 'ordinal': 43,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_2.pdbx_database_accession',\n", " 'ordinal': 44,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[1][1]',\n", " 'ordinal': 45,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[1][2]',\n", " 'ordinal': 46,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[1][3]',\n", " 'ordinal': 47,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[2][1]',\n", " 'ordinal': 48,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[2][2]',\n", " 'ordinal': 49,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[2][3]',\n", " 'ordinal': 50,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[3][1]',\n", " 'ordinal': 51,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[3][2]',\n", " 'ordinal': 52,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx[3][3]',\n", " 'ordinal': 53,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx_vector[1]',\n", " 'ordinal': 
54,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx_vector[2]',\n", " 'ordinal': 55,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_database_PDB_matrix.origx_vector[3]',\n", " 'ordinal': 56,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_struct_conn_angle.value',\n", " 'ordinal': 57,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_close_contact.dist',\n", " 'ordinal': 58,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_peptide_omega.omega',\n", " 'ordinal': 59,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_planes.rmsd',\n", " 'ordinal': 60,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_polymer_linkage.dist',\n", " 'ordinal': 61,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_torsion.phi',\n", " 'ordinal': 62,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_pdbx_validate_torsion.psi',\n", " 'ordinal': 63,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[1][1]',\n", " 'ordinal': 64,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[1][2]',\n", " 'ordinal': 65,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[1][3]',\n", " 'ordinal': 66,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[2][1]',\n", " 'ordinal': 67,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': 
'_struct_ncs_oper.matrix[2][2]',\n", " 'ordinal': 68,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[2][3]',\n", " 'ordinal': 69,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[3][1]',\n", " 'ordinal': 70,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[3][2]',\n", " 'ordinal': 71,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.matrix[3][3]',\n", " 'ordinal': 72,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.vector[1]',\n", " 'ordinal': 73,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.vector[2]',\n", " 'ordinal': 74,\n", " 'revision_ordinal': 6},\n", " {'data_content_type': 'Structure model',\n", " 'item': '_struct_ncs_oper.vector[3]',\n", " 'ordinal': 75,\n", " 'revision_ordinal': 6}],\n", " 'pdbx_database_pdbobs_spr': [{'date': '1984-07-17T00:00:00+0000',\n", " 'id': 'SPRSDE',\n", " 'pdb_id': '4HHB',\n", " 'replace_pdb_id': '1HHB'}],\n", " 'pdbx_database_related': [{'content_type': 'unspecified',\n", " 'db_id': '2HHB',\n", " 'db_name': 'PDB',\n", " 'details': 'REFINED BY THE METHOD OF JACK AND LEVITT. THIS\\n ENTRY PRESENTS THE BEST ESTIMATE OF THE\\n COORDINATES.'},\n", " {'content_type': 'unspecified',\n", " 'db_id': '3HHB',\n", " 'db_name': 'PDB',\n", " 'details': 'SYMMETRY AVERAGED ABOUT THE (NON-CRYSTALLOGRAPHIC)\\n MOLECULAR AXIS AND THEN RE-REGULARIZED BY THE\\n ENERGY REFINEMENT METHOD OF LEVITT. 
THIS ENTRY\\n PRESENTS COORDINATES THAT ARE ADEQUATE FOR MOST\\n PURPOSES, SUCH AS COMPARISON WITH OTHER STRUCTURES.'},\n", " {'content_type': 'unspecified', 'db_id': '1GLI', 'db_name': 'PDB'}],\n", " 'pdbx_database_status': {'pdb_format_compatible': 'Y',\n", " 'process_site': 'BNL',\n", " 'recvd_initial_deposition_date': '1984-03-07T00:00:00+0000',\n", " 'status_code': 'REL'},\n", " 'rcsb_accession_info': {'deposit_date': '1984-03-07T00:00:00+0000',\n", " 'has_released_experimental_data': 'N',\n", " 'initial_release_date': '1984-07-17T00:00:00+0000',\n", " 'major_revision': 4,\n", " 'minor_revision': 1,\n", " 'revision_date': '2023-03-15T00:00:00+0000',\n", " 'status_code': 'REL'},\n", " 'rcsb_entry_container_identifiers': {'assembly_ids': ['1'],\n", " 'entity_ids': ['1', '2', '3', '4', '5'],\n", " 'entry_id': '4HHB',\n", " 'model_ids': [1],\n", " 'non_polymer_entity_ids': ['3', '4'],\n", " 'polymer_entity_ids': ['1', '2'],\n", " 'rcsb_id': '4HHB',\n", " 'pubmed_id': 6726807},\n", " 'rcsb_entry_info': {'assembly_count': 1,\n", " 'branched_entity_count': 0,\n", " 'cis_peptide_count': 0,\n", " 'deposited_atom_count': 4779,\n", " 'deposited_hydrogen_atom_count': 0,\n", " 'deposited_model_count': 1,\n", " 'deposited_modeled_polymer_monomer_count': 574,\n", " 'deposited_nonpolymer_entity_instance_count': 6,\n", " 'deposited_polymer_entity_instance_count': 4,\n", " 'deposited_polymer_monomer_count': 574,\n", " 'deposited_solvent_atom_count': 221,\n", " 'deposited_unmodeled_polymer_monomer_count': 0,\n", " 'disulfide_bond_count': 0,\n", " 'entity_count': 5,\n", " 'experimental_method': 'X-ray',\n", " 'experimental_method_count': 1,\n", " 'inter_mol_covalent_bond_count': 0,\n", " 'inter_mol_metalic_bond_count': 4,\n", " 'molecular_weight': 64.74,\n", " 'na_polymer_entity_types': 'Other',\n", " 'nonpolymer_bound_components': ['HEM'],\n", " 'nonpolymer_entity_count': 2,\n", " 'nonpolymer_molecular_weight_maximum': 0.62,\n", " 'nonpolymer_molecular_weight_minimum': 0.09,\n", 
" 'polymer_composition': 'heteromeric protein',\n", " 'polymer_entity_count': 2,\n", " 'polymer_entity_count_dna': 0,\n", " 'polymer_entity_count_rna': 0,\n", " 'polymer_entity_count_nucleic_acid': 0,\n", " 'polymer_entity_count_nucleic_acid_hybrid': 0,\n", " 'polymer_entity_count_protein': 2,\n", " 'polymer_entity_taxonomy_count': 2,\n", " 'polymer_molecular_weight_maximum': 15.89,\n", " 'polymer_molecular_weight_minimum': 15.15,\n", " 'polymer_monomer_count_maximum': 146,\n", " 'polymer_monomer_count_minimum': 141,\n", " 'resolution_combined': [1.74],\n", " 'selected_polymer_entity_types': 'Protein (only)',\n", " 'solvent_entity_count': 1,\n", " 'structure_determination_methodology': 'experimental',\n", " 'structure_determination_methodology_priority': 10,\n", " 'diffrn_resolution_high': {'provenance_source': 'From refinement resolution cutoff',\n", " 'value': 1.74}},\n", " 'rcsb_primary_citation': {'country': 'UK',\n", " 'id': 'primary',\n", " 'journal_abbrev': 'J.Mol.Biol.',\n", " 'journal_id_astm': 'JMOBAK',\n", " 'journal_id_csd': '0070',\n", " 'journal_id_issn': '0022-2836',\n", " 'journal_volume': '175',\n", " 'page_first': '159',\n", " 'page_last': '174',\n", " 'pdbx_database_id_doi': '10.1016/0022-2836(84)90472-8',\n", " 'pdbx_database_id_pub_med': 6726807,\n", " 'rcsb_orcididentifiers': ['?', '?', '?', '?'],\n", " 'rcsb_authors': ['Fermi, G.', 'Perutz, M.F.', 'Shaanan, B.', 'Fourme, R.'],\n", " 'rcsb_journal_abbrev': 'J Mol Biol',\n", " 'title': 'The crystal structure of human deoxyhaemoglobin at 1.74 A resolution',\n", " 'year': 1984},\n", " 'refine': [{'details': 'THE COORDINATES GIVEN HERE ARE IN THE ORTHOGONAL ANGSTROM\\nSYSTEM STANDARD FOR HEMOGLOBINS. 
THE Y AXIS IS THE\\n(NON CRYSTALLOGRAPHIC) MOLECULAR DIAD AND THE X AXIS IS THE\\nPSEUDO DIAD WHICH RELATES THE ALPHA-1 AND BETA-1 CHAINS.\\nTHE TRANSFORMATION GIVEN IN THE *MTRIX* RECORDS BELOW\\nWILL GENERATE COORDINATES FOR THE *C* AND *D* CHAINS FROM\\nTHE *A* AND *B* CHAINS RESPECTIVELY.',\n", " 'ls_rfactor_rwork': 0.135,\n", " 'ls_dres_high': 1.74,\n", " 'pdbx_diffrn_id': ['1'],\n", " 'pdbx_refine_id': 'X-RAY DIFFRACTION'}],\n", " 'refine_hist': [{'cycle_id': 'LAST',\n", " 'd_res_high': 1.74,\n", " 'number_atoms_solvent': 221,\n", " 'number_atoms_total': 4779,\n", " 'pdbx_number_atoms_ligand': 174,\n", " 'pdbx_number_atoms_nucleic_acid': 0,\n", " 'pdbx_number_atoms_protein': 4384,\n", " 'pdbx_refine_id': 'X-RAY DIFFRACTION'}],\n", " 'struct': {'title': 'THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 ANGSTROMS RESOLUTION'},\n", " 'struct_keywords': {'pdbx_keywords': 'OXYGEN TRANSPORT',\n", " 'text': 'OXYGEN TRANSPORT'},\n", " 'symmetry': {'int_tables_number': 4, 'space_group_name_hm': 'P 1 21 1'},\n", " 'rcsb_id': '4HHB'}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pdbinfo = json.loads(response.text)\n", "pdbinfo" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'assembly_count': 1,\n", " 'branched_entity_count': 0,\n", " 'cis_peptide_count': 0,\n", " 'deposited_atom_count': 4779,\n", " 'deposited_hydrogen_atom_count': 0,\n", " 'deposited_model_count': 1,\n", " 'deposited_modeled_polymer_monomer_count': 574,\n", " 'deposited_nonpolymer_entity_instance_count': 6,\n", " 'deposited_polymer_entity_instance_count': 4,\n", " 'deposited_polymer_monomer_count': 574,\n", " 'deposited_solvent_atom_count': 221,\n", " 'deposited_unmodeled_polymer_monomer_count': 0,\n", " 'disulfide_bond_count': 0,\n", " 'entity_count': 5,\n", " 'experimental_method': 'X-ray',\n", " 'experimental_method_count': 1,\n", " 'inter_mol_covalent_bond_count': 0,\n", " 
'inter_mol_metalic_bond_count': 4,\n", " 'molecular_weight': 64.74,\n", " 'na_polymer_entity_types': 'Other',\n", " 'nonpolymer_bound_components': ['HEM'],\n", " 'nonpolymer_entity_count': 2,\n", " 'nonpolymer_molecular_weight_maximum': 0.62,\n", " 'nonpolymer_molecular_weight_minimum': 0.09,\n", " 'polymer_composition': 'heteromeric protein',\n", " 'polymer_entity_count': 2,\n", " 'polymer_entity_count_dna': 0,\n", " 'polymer_entity_count_rna': 0,\n", " 'polymer_entity_count_nucleic_acid': 0,\n", " 'polymer_entity_count_nucleic_acid_hybrid': 0,\n", " 'polymer_entity_count_protein': 2,\n", " 'polymer_entity_taxonomy_count': 2,\n", " 'polymer_molecular_weight_maximum': 15.89,\n", " 'polymer_molecular_weight_minimum': 15.15,\n", " 'polymer_monomer_count_maximum': 146,\n", " 'polymer_monomer_count_minimum': 141,\n", " 'resolution_combined': [1.74],\n", " 'selected_polymer_entity_types': 'Protein (only)',\n", " 'solvent_entity_count': 1,\n", " 'structure_determination_methodology': 'experimental',\n", " 'structure_determination_methodology_priority': 10,\n", " 'diffrn_resolution_high': {'provenance_source': 'From refinement resolution cutoff',\n", " 'value': 1.74}}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pdbinfo['rcsb_entry_info']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "{'rcsb_cluster_membership': [{'cluster_id': 105, 'identity': 100},\n", " {'cluster_id': 112, 'identity': 95},\n", " {'cluster_id': 96, 'identity': 90},\n", " {'cluster_id': 47, 'identity': 70},\n", " {'cluster_id': 20, 'identity': 50},\n", " {'cluster_id': 31, 'identity': 30}],\n", " 'entity_poly': {'nstd_linkage': 'no',\n", " 'nstd_monomer': 'no',\n", " 'pdbx_seq_one_letter_code': 'VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR',\n", " 
'pdbx_seq_one_letter_code_can': 'VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR',\n", " 'pdbx_strand_id': 'A,C',\n", " 'rcsb_artifact_monomer_count': 0,\n", " 'rcsb_conflict_count': 0,\n", " 'rcsb_deletion_count': 0,\n", " 'rcsb_entity_polymer_type': 'Protein',\n", " 'rcsb_insertion_count': 0,\n", " 'rcsb_mutation_count': 0,\n", " 'rcsb_non_std_monomer_count': 0,\n", " 'rcsb_sample_sequence_length': 141,\n", " 'type': 'polypeptide(L)'},\n", " 'entity_src_gen': [{'gene_src_common_name': 'Human',\n", " 'gene_src_genus': 'Homo',\n", " 'pdbx_alt_source_flag': 'sample',\n", " 'pdbx_beg_seq_num': 1,\n", " 'pdbx_end_seq_num': 141,\n", " 'pdbx_gene_src_gene': 'HBA1, HBA2',\n", " 'pdbx_gene_src_ncbi_taxonomy_id': '9606',\n", " 'pdbx_gene_src_scientific_name': 'Homo sapiens',\n", " 'pdbx_seq_type': 'Biological sequence',\n", " 'pdbx_src_id': 1}],\n", " 'rcsb_entity_host_organism': [{'beg_seq_num': 1,\n", " 'end_seq_num': 141,\n", " 'pdbx_src_id': '1',\n", " 'provenance_source': 'Primary Data'}],\n", " 'rcsb_entity_source_organism': [{'beg_seq_num': 1,\n", " 'common_name': 'Human',\n", " 'end_seq_num': 141,\n", " 'ncbi_common_names': ['human'],\n", " 'ncbi_parent_scientific_name': 'Eukaryota',\n", " 'ncbi_scientific_name': 'Homo sapiens',\n", " 'ncbi_taxonomy_id': 9606,\n", " 'pdbx_src_id': '1',\n", " 'provenance_source': 'Primary Data',\n", " 'scientific_name': 'Homo sapiens',\n", " 'source_type': 'genetically engineered',\n", " 'taxonomy_lineage': [{'depth': 1,\n", " 'id': '131567',\n", " 'name': 'cellular organisms'},\n", " {'depth': 1, 'id': '131567', 'name': 'biota'},\n", " {'depth': 2, 'id': '2759', 'name': 'Eukaryota'},\n", " {'depth': 2, 'id': '2759', 'name': 'Eucarya'},\n", " {'depth': 2, 'id': '2759', 'name': 'Eucaryotae'},\n", " {'depth': 2, 'id': '2759', 'name': 'Eukarya'},\n", " {'depth': 2, 'id': '2759', 'name': 'Eukaryotae'},\n", " {'depth': 2, 'id': '2759', 'name': 
'eucaryotes'},\n", " {'depth': 2, 'id': '2759', 'name': 'eukaryotes'},\n", " {'depth': 3, 'id': '33154', 'name': 'Opisthokonta'},\n", " {'depth': 3, 'id': '33154', 'name': 'Fungi/Metazoa group'},\n", " {'depth': 3, 'id': '33154', 'name': 'opisthokonts'},\n", " {'depth': 4, 'id': '33208', 'name': 'Metazoa'},\n", " {'depth': 4, 'id': '33208', 'name': 'Animalia'},\n", " {'depth': 4, 'id': '33208', 'name': 'metazoans'},\n", " {'depth': 4, 'id': '33208', 'name': 'multicellular animals'},\n", " {'depth': 5, 'id': '6072', 'name': 'Eumetazoa'},\n", " {'depth': 6, 'id': '33213', 'name': 'Bilateria'},\n", " {'depth': 7, 'id': '33511', 'name': 'Deuterostomia'},\n", " {'depth': 7, 'id': '33511', 'name': 'deuterostomes'},\n", " {'depth': 8, 'id': '7711', 'name': 'Chordata'},\n", " {'depth': 8, 'id': '7711', 'name': 'chordates'},\n", " {'depth': 9, 'id': '89593', 'name': 'Craniata'},\n", " {'depth': 10, 'id': '7742', 'name': 'Vertebrata'},\n", " {'depth': 10, 'id': '7742', 'name': 'vertebrates'},\n", " {'depth': 11, 'id': '7776', 'name': 'Gnathostomata'},\n", " {'depth': 11, 'id': '7776', 'name': 'jawed vertebrates'},\n", " {'depth': 12, 'id': '117570', 'name': 'Teleostomi'},\n", " {'depth': 13, 'id': '117571', 'name': 'Euteleostomi'},\n", " {'depth': 13, 'id': '117571', 'name': 'bony vertebrates'},\n", " {'depth': 14, 'id': '8287', 'name': 'Sarcopterygii'},\n", " {'depth': 15, 'id': '1338369', 'name': 'Dipnotetrapodomorpha'},\n", " {'depth': 16, 'id': '32523', 'name': 'Tetrapoda'},\n", " {'depth': 16, 'id': '32523', 'name': 'tetrapods'},\n", " {'depth': 17, 'id': '32524', 'name': 'Amniota'},\n", " {'depth': 17, 'id': '32524', 'name': 'amniotes'},\n", " {'depth': 18, 'id': '40674', 'name': 'Mammalia'},\n", " {'depth': 18, 'id': '40674', 'name': 'mammals'},\n", " {'depth': 19, 'id': '32525', 'name': 'Theria'},\n", " {'depth': 20, 'id': '9347', 'name': 'Eutheria'},\n", " {'depth': 20, 'id': '9347', 'name': 'Placentalia'},\n", " {'depth': 20, 'id': '9347', 'name': 'eutherian 
mammals'},\n", " {'depth': 20, 'id': '9347', 'name': 'placental mammals'},\n", " {'depth': 20, 'id': '9347', 'name': 'placentals'},\n", " {'depth': 21, 'id': '1437010', 'name': 'Boreoeutheria'},\n", " {'depth': 21, 'id': '1437010', 'name': 'Boreotheria'},\n", " {'depth': 22, 'id': '314146', 'name': 'Euarchontoglires'},\n", " {'depth': 23, 'id': '9443', 'name': 'Primates'},\n", " {'depth': 23, 'id': '9443', 'name': 'Primata'},\n", " {'depth': 23, 'id': '9443', 'name': 'primate'},\n", " {'depth': 24, 'id': '376913', 'name': 'Haplorrhini'},\n", " {'depth': 25, 'id': '314293', 'name': 'Simiiformes'},\n", " {'depth': 25, 'id': '314293', 'name': 'Anthropoidea'},\n", " {'depth': 26, 'id': '9526', 'name': 'Catarrhini'},\n", " {'depth': 27, 'id': '314295', 'name': 'Hominoidea'},\n", " {'depth': 27, 'id': '314295', 'name': 'ape'},\n", " {'depth': 27, 'id': '314295', 'name': 'apes'},\n", " {'depth': 28, 'id': '9604', 'name': 'Hominidae'},\n", " {'depth': 28, 'id': '9604', 'name': 'Pongidae'},\n", " {'depth': 28, 'id': '9604', 'name': 'great apes'},\n", " {'depth': 29, 'id': '207598', 'name': 'Homininae'},\n", " {'depth': 29, 'id': '207598', 'name': 'Homo/Pan/Gorilla group'},\n", " {'depth': 30, 'id': '9605', 'name': 'Homo'},\n", " {'depth': 30, 'id': '9605', 'name': 'humans'},\n", " {'depth': 31, 'id': '9606', 'name': 'Homo sapiens'},\n", " {'depth': 31, 'id': '9606', 'name': 'human'}],\n", " 'rcsb_gene_name': [{'provenance_source': 'Primary Data', 'value': 'HBA1'},\n", " {'provenance_source': 'Primary Data', 'value': 'HBA2'},\n", " {'provenance_source': 'UniProt', 'value': 'HBA1'},\n", " {'provenance_source': 'UniProt', 'value': 'HBA2'}]}],\n", " 'rcsb_polymer_entity': {'formula_weight': 15.15,\n", " 'pdbx_description': 'Hemoglobin subunit alpha',\n", " 'pdbx_number_of_molecules': 2,\n", " 'rcsb_multiple_source_flag': 'N',\n", " 'rcsb_source_part_count': 1,\n", " 'rcsb_source_taxonomy_count': 1,\n", " 'src_method': 'man',\n", " 'rcsb_macromolecular_names_combined': [{'name': 
'Hemoglobin subunit alpha',\n", " 'provenance_code': 'ECO:0000304',\n", " 'provenance_source': 'PDB Preferred Name'},\n", " {'name': 'Alpha-globin',\n", " 'provenance_code': 'ECO:0000303',\n", " 'provenance_source': 'PDB Synonym'},\n", " {'name': 'Hemoglobin alpha chain',\n", " 'provenance_code': 'ECO:0000303',\n", " 'provenance_source': 'PDB Synonym'}],\n", " 'rcsb_polymer_name_combined': {'names': ['Hemoglobin subunit alpha'],\n", " 'provenance_source': 'UniProt Name'}},\n", " 'rcsb_polymer_entity_align': [{'provenance_source': 'SIFTS',\n", " 'reference_database_accession': 'P69905',\n", " 'reference_database_name': 'UniProt',\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 141,\n", " 'ref_beg_seq_id': 2}]}],\n", " 'rcsb_polymer_entity_annotation': [{'annotation_id': 'PF00042',\n", " 'assignment_version': '34.0',\n", " 'name': 'Globin (Globin)',\n", " 'provenance_source': 'Pfam',\n", " 'type': 'Pfam'},\n", " {'annotation_id': 'P69905',\n", " 'assignment_version': '1.0',\n", " 'name': 'Glycoprotein',\n", " 'provenance_source': 'PDB',\n", " 'type': 'GlyGen'}],\n", " 'rcsb_polymer_entity_container_identifiers': {'asym_ids': ['A', 'C'],\n", " 'auth_asym_ids': ['A', 'C'],\n", " 'chem_comp_monomers': ['ALA',\n", " 'ARG',\n", " 'ASN',\n", " 'ASP',\n", " 'CYS',\n", " 'GLN',\n", " 'GLU',\n", " 'GLY',\n", " 'HIS',\n", " 'LEU',\n", " 'LYS',\n", " 'MET',\n", " 'PHE',\n", " 'PRO',\n", " 'SER',\n", " 'THR',\n", " 'TRP',\n", " 'TYR',\n", " 'VAL'],\n", " 'entity_id': '1',\n", " 'entry_id': '4HHB',\n", " 'rcsb_id': '4HHB_1',\n", " 'reference_sequence_identifiers': [{'database_accession': 'P69905',\n", " 'database_name': 'UniProt',\n", " 'entity_sequence_coverage': 1.0,\n", " 'provenance_source': 'SIFTS',\n", " 'reference_sequence_coverage': 0.9929577464788732}],\n", " 'uniprot_ids': ['P69905']},\n", " 'rcsb_polymer_entity_feature': [{'assignment_version': '34.0',\n", " 'feature_id': 'PF00042',\n", " 'name': 'Globin (Globin)',\n", " 'provenance_source': 
'Pfam',\n", " 'type': 'Pfam',\n", " 'feature_positions': [{'beg_seq_id': 26, 'end_seq_id': 136}]},\n", " {'name': 'Hydropathy values',\n", " 'provenance_source': 'biojava-7.0.1',\n", " 'type': 'hydropathy',\n", " 'feature_positions': [{'beg_seq_id': 5,\n", " 'values': [-0.47,\n", " -0.47,\n", " -1.32,\n", " -1.03,\n", " -0.66,\n", " -0.96,\n", " -0.61,\n", " -0.61,\n", " -0.07,\n", " 0.28,\n", " 0.01,\n", " 0.09,\n", " 0.09,\n", " -0.16,\n", " -0.44,\n", " -0.54,\n", " -0.16,\n", " -0.42,\n", " -0.77,\n", " -0.77,\n", " 0.01,\n", " -0.58,\n", " -1.03,\n", " -0.43,\n", " 0.02,\n", " 0.49,\n", " 0.2,\n", " 0.9,\n", " 0.52,\n", " 0.02,\n", " 0.33,\n", " 0.4,\n", " 0.11,\n", " -0.34,\n", " -0.46,\n", " -0.54,\n", " -1.21,\n", " -0.72,\n", " -1.03,\n", " -0.53,\n", " -0.19,\n", " -0.47,\n", " -0.37,\n", " -0.77,\n", " -0.39,\n", " -0.42,\n", " -0.27,\n", " -0.31,\n", " -0.78,\n", " -1.04,\n", " -0.73,\n", " -1.12,\n", " -1.47,\n", " -1.2,\n", " -0.61,\n", " -1.47,\n", " -0.83,\n", " -0.37,\n", " -0.09,\n", " -0.43,\n", " 0.2,\n", " 1.1,\n", " 0.83,\n", " 0.28,\n", " 1.13,\n", " 0.54,\n", " -0.27,\n", " 0.02,\n", " 0.23,\n", " -0.36,\n", " -0.62,\n", " -0.4,\n", " -0.13,\n", " -0.4,\n", " 0.41,\n", " 0.71,\n", " 0.11,\n", " 0.71,\n", " 0.74,\n", " 0.74,\n", " -0.03,\n", " -0.38,\n", " -0.16,\n", " -1.08,\n", " -0.52,\n", " -0.52,\n", " -1.12,\n", " -0.3,\n", " -0.89,\n", " -0.22,\n", " -0.22,\n", " -0.22,\n", " 0.7,\n", " 0.14,\n", " 0.18,\n", " 0.63,\n", " 0.59,\n", " 1.4,\n", " 1.56,\n", " 1.91,\n", " 1.91,\n", " 1.69,\n", " 1.98,\n", " 1.98,\n", " 2.12,\n", " 1.52,\n", " 1.3,\n", " 0.44,\n", " 0.83,\n", " 0.33,\n", " -0.04,\n", " -0.04,\n", " 0.78,\n", " 0.0,\n", " 0.38,\n", " 0.09,\n", " 0.9,\n", " 0.2,\n", " -0.16,\n", " 0.33,\n", " 0.56,\n", " 0.29,\n", " 0.56,\n", " 0.82,\n", " 0.82,\n", " 0.32,\n", " 1.18,\n", " 2.03,\n", " 1.64,\n", " 1.13,\n", " 0.5,\n", " 0.44,\n", " -0.52]}]},\n", " {'name': 'Disordered binding sites',\n", " 'provenance_source': 
'Anchor2',\n", " 'type': 'disorder_binding',\n", " 'feature_positions': [{'beg_seq_id': 1,\n", " 'values': [0.39,\n", " 0.38,\n", " 0.37,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.35,\n", " 0.35,\n", " 0.35,\n", " 0.35,\n", " 0.36,\n", " 0.37,\n", " 0.38,\n", " 0.39,\n", " 0.39,\n", " 0.39,\n", " 0.39,\n", " 0.39,\n", " 0.4,\n", " 0.4,\n", " 0.39,\n", " 0.39,\n", " 0.4,\n", " 0.41,\n", " 0.42,\n", " 0.42,\n", " 0.4,\n", " 0.39,\n", " 0.37,\n", " 0.36,\n", " 0.36,\n", " 0.35,\n", " 0.35,\n", " 0.35,\n", " 0.36,\n", " 0.37,\n", " 0.38,\n", " 0.39,\n", " 0.4,\n", " 0.41,\n", " 0.41,\n", " 0.41,\n", " 0.4,\n", " 0.39,\n", " 0.39,\n", " 0.38,\n", " 0.38,\n", " 0.38,\n", " 0.38,\n", " 0.38,\n", " 0.39,\n", " 0.4,\n", " 0.4,\n", " 0.41,\n", " 0.42,\n", " 0.43,\n", " 0.42,\n", " 0.42,\n", " 0.4,\n", " 0.4,\n", " 0.39,\n", " 0.38,\n", " 0.38,\n", " 0.37,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.36,\n", " 0.37,\n", " 0.37,\n", " 0.37,\n", " 0.38,\n", " 0.39,\n", " 0.39,\n", " 0.4,\n", " 0.41,\n", " 0.42,\n", " 0.42,\n", " 0.43,\n", " 0.43,\n", " 0.42,\n", " 0.41,\n", " 0.4,\n", " 0.4,\n", " 0.41,\n", " 0.41,\n", " 0.42,\n", " 0.43,\n", " 0.42,\n", " 0.4,\n", " 0.38,\n", " 0.37,\n", " 0.35,\n", " 0.34,\n", " 0.32,\n", " 0.32,\n", " 0.31,\n", " 0.31,\n", " 0.31,\n", " 0.31,\n", " 0.3,\n", " 0.3,\n", " 0.29,\n", " 0.28,\n", " 0.26,\n", " 0.25,\n", " 0.23,\n", " 0.21,\n", " 0.2,\n", " 0.19,\n", " 0.17,\n", " 0.15,\n", " 0.14,\n", " 0.13,\n", " 0.12,\n", " 0.12,\n", " 0.11,\n", " 0.11,\n", " 0.11,\n", " 0.1,\n", " 0.09,\n", " 0.09,\n", " 0.09,\n", " 0.09,\n", " 0.08,\n", " 0.08,\n", " 0.08,\n", " 0.07,\n", " 0.07,\n", " 0.07,\n", " 0.07,\n", " 0.07,\n", " 0.08,\n", " 0.08,\n", " 0.08,\n", " 0.08,\n", " 0.08,\n", " 0.09]}]},\n", " {'name': 'Disordered regions',\n", " 'provenance_source': 'IUPred2(short)',\n", " 'type': 'disorder',\n", " 'feature_positions': [{'beg_seq_id': 1,\n", " 'values': [0.9,\n", " 0.85,\n", " 0.79,\n", " 0.68,\n", " 
0.64,\n", " 0.6,\n", " 0.5,\n", " 0.46,\n", " 0.38,\n", " 0.34,\n", " 0.27,\n", " 0.31,\n", " 0.37,\n", " 0.28,\n", " 0.26,\n", " 0.26,\n", " 0.27,\n", " 0.24,\n", " 0.18,\n", " 0.19,\n", " 0.26,\n", " 0.23,\n", " 0.18,\n", " 0.16,\n", " 0.21,\n", " 0.15,\n", " 0.17,\n", " 0.22,\n", " 0.22,\n", " 0.26,\n", " 0.25,\n", " 0.21,\n", " 0.14,\n", " 0.15,\n", " 0.24,\n", " 0.16,\n", " 0.19,\n", " 0.12,\n", " 0.13,\n", " 0.19,\n", " 0.17,\n", " 0.17,\n", " 0.18,\n", " 0.27,\n", " 0.28,\n", " 0.29,\n", " 0.37,\n", " 0.35,\n", " 0.35,\n", " 0.35,\n", " 0.35,\n", " 0.31,\n", " 0.36,\n", " 0.44,\n", " 0.38,\n", " 0.31,\n", " 0.38,\n", " 0.39,\n", " 0.44,\n", " 0.39,\n", " 0.37,\n", " 0.38,\n", " 0.32,\n", " 0.35,\n", " 0.33,\n", " 0.35,\n", " 0.38,\n", " 0.39,\n", " 0.36,\n", " 0.28,\n", " 0.29,\n", " 0.26,\n", " 0.22,\n", " 0.24,\n", " 0.24,\n", " 0.18,\n", " 0.25,\n", " 0.21,\n", " 0.2,\n", " 0.23,\n", " 0.18,\n", " 0.21,\n", " 0.14,\n", " 0.18,\n", " 0.21,\n", " 0.15,\n", " 0.18,\n", " 0.08,\n", " 0.08,\n", " 0.05,\n", " 0.05,\n", " 0.05,\n", " 0.06,\n", " 0.06,\n", " 0.03,\n", " 0.02,\n", " 0.02,\n", " 0.02,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.01,\n", " 0.02,\n", " 0.01,\n", " 0.01,\n", " 0.02,\n", " 0.02,\n", " 0.02,\n", " 0.01,\n", " 0.03,\n", " 0.05,\n", " 0.04,\n", " 0.03,\n", " 0.04,\n", " 0.04,\n", " 0.03,\n", " 0.03,\n", " 0.06,\n", " 0.03,\n", " 0.03,\n", " 0.02,\n", " 0.05,\n", " 0.05,\n", " 0.03,\n", " 0.03,\n", " 0.07,\n", " 0.1,\n", " 0.16,\n", " 0.2,\n", " 0.34,\n", " 0.38,\n", " 0.41,\n", " 0.57,\n", " 0.69,\n", " 0.76]}]}],\n", " 'rcsb_polymer_entity_feature_summary': [{'count': 0,\n", " 'coverage': 0.0,\n", " 'type': 'CARD_MODEL'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_DESCRIPTION'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_DOMAIN_NAME'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_GENE_ALLELE_NAME'},\n", " {'count': 0, 
'coverage': 0.0, 'type': 'IMGT_ANTIBODY_ORGANISM_NAME'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_PROTEIN_NAME'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_RECEPTOR_DESCRIPTION'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'IMGT_ANTIBODY_RECEPTOR_TYPE'},\n", " {'count': 1, 'coverage': 0.78723, 'type': 'Pfam'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'SABDAB_ANTIBODY_ANTIGEN_NAME'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'SABDAB_ANTIBODY_NAME'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'SABDAB_ANTIBODY_TARGET'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'artifact'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'modified_monomer'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'mutation'}],\n", " 'rcsb_polymer_entity_name_com': [{'name': 'Alpha-globin,Hemoglobin alpha chain'}],\n", " 'rcsb_related_target_references': [{'related_resource_name': 'ChEMBL',\n", " 'related_resource_version': '33',\n", " 'related_target_id': 'CHEMBL2887',\n", " 'target_taxonomy_id': 9606,\n", " 'aligned_target': [{'entity_beg_seq_id': 1,\n", " 'length': 140,\n", " 'target_beg_seq_id': 2}]},\n", " {'related_resource_name': 'ChEMBL',\n", " 'related_resource_version': '33',\n", " 'related_target_id': 'CHEMBL2095168',\n", " 'target_taxonomy_id': 9606,\n", " 'aligned_target': [{'entity_beg_seq_id': 1,\n", " 'length': 140,\n", " 'target_beg_seq_id': 2}]},\n", " {'related_resource_name': 'DrugBank',\n", " 'related_resource_version': '5.1',\n", " 'related_target_id': 'P69905',\n", " 'target_taxonomy_id': 9606,\n", " 'aligned_target': [{'entity_beg_seq_id': 1,\n", " 'length': 140,\n", " 'target_beg_seq_id': 2}]},\n", " {'related_resource_name': 'Pharos',\n", " 'related_resource_version': '6.13.4',\n", " 'related_target_id': '6231',\n", " 'target_taxonomy_id': 9606,\n", " 'aligned_target': [{'entity_beg_seq_id': 1,\n", " 'length': 140,\n", " 'target_beg_seq_id': 2}]}],\n", " 'rcsb_target_cofactors': [{'cofactor_name': 'Iron Dextran',\n", " 
'cofactor_resource_id': 'DB00893',\n", " 'mechanism_of_action': 'After iron dextran is injected, the circulating iron dextran is removed from the plasma by cells of the reticuloendothelial system, which split the complex into its components of iron and dextran. The iron is immediately bound to the available protein moieties to form hemosiderin or ferritin, the physiological forms of iron, or to a lesser extent to transferrin. This iron which is subject to physiological control replenishes hemoglobin and depleted iron stores.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [17139284, 17016423, 11752352],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'XEEYBQQBJWHFJM-UHFFFAOYSA-N',\n", " 'cofactor_smiles': '[Fe]',\n", " 'cofactor_name': 'Iron',\n", " 'cofactor_resource_id': 'DB01592',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'HCHKCACWOHOZIP-UHFFFAOYSA-N',\n", " 'cofactor_smiles': '[Zn]',\n", " 'cofactor_name': 'Zinc',\n", " 'cofactor_resource_id': 'DB01593',\n", " 'mechanism_of_action': '**Zinc has three primary biological roles**: _catalytic_, _structural_, and _regulatory_. The catalytic and structural role of zinc is well established, and there are various noteworthy reviews on these functions. For example, zinc is a structural constituent in numerous proteins, inclusive of growth factors, cytokines, receptors, enzymes, and transcription factors for different cellular signaling pathways. 
It is implicated in numerous cellular processes as a cofactor for approximately 3000 human proteins including enzymes, nuclear factors, and hormones [L2096].\\r\\n\\r\\nZinc promotes resistance to epithelial apoptosis through cell protection (cytoprotection) against reactive oxygen species and bacterial toxins, likely through the antioxidant activity of the cysteine-rich metallothioneins [A32419].\\r\\n\\r\\nIn HL-60 cells (promyelocytic leukemia cell line), zinc enhances the up-regulation of A20 mRNA, which, via TRAF pathway, decreases NF-kappaB activation, leading to decreased gene expression and generation of tumor necrosis factor-alpha (TNF-alpha), IL-1beta, and IL-8 [A32418].\\r\\n\\r\\nThere are several mechanisms of action of zinc on acute diarrhea. Various mechanisms are specific to the gastrointestinal system: zinc restores mucosal barrier integrity and enterocyte brush-border enzyme activity, it promotes the production of antibodies and circulating lymphocytes against intestinal pathogens, and has a direct effect on ion channels, acting as a potassium channel blocker of adenosine 3-5-cyclic monophosphate-mediated chlorine secretion. Cochrane researchers examined the evidence available up to 30 September 2016 [L2106].\\r\\n\\r\\nZinc deficiency in humans decreases the activity of serum _thymulin_ (a hormone of the thymus), which is necessary for the maturation of T-helper cells. T-helper 1 (Th(1)) cytokines are decreased but T-helper 2 (Th(2)) cytokines are not affected by zinc deficiency in humans [A32417].\\r\\n\\r\\nThe change of _Th(1)_ to _Th(2)_ function leads to cell-mediated immune dysfunction. Because IL-2 production (Th(1) cytokine) is decreased, this causes decreased activity of natural-killer-cell (NK cell) and T cytolytic cells, normally involved in killing viruses, bacteria, and malignant cells [A32424]. \\r\\n\\r\\nIn humans, zinc deficiency may lead to the generation of new CD4+ T cells, produced in the thymus. 
In cell culture studies (HUT-78, a Th(0) human malignant lymphoblastoid cell line), as a result of zinc deficiency, nuclear factor-kappaB (NF-kappaB) activation, phosphorylation of IkappaB, and binding of NF-kappaB to DNA are decreased and this results in decreased Th(1) cytokine production [A32417].\\r\\n\\r\\nIn another study, zinc supplementation in human subjects suppressed the gene expression and production of pro-inflammatory cytokines and decreased oxidative stress markers [A32424]. In HL-60 cells (a human pro-myelocytic leukemia cell line), zinc deficiency increased the levels of TNF-alpha, IL-1beta, and IL-8 cytokines and mRNA. In such cells, zinc was found to induce A20, a zinc finger protein that inhibited NF-kappaB activation by the tumor necrosis factor receptor-associated factor pathway. This process decreased gene expression of pro-inflammatory cytokines and oxidative stress markers [A32417].\\r\\n\\r\\nThe exact mechanism of zinc in acne treatment is poorly understood. However, zinc is considered to act directly on microbial inflammatory equilibrium and facilitate antibiotic absorption when used in combination with other agents. Topical zinc alone as well as in combination with other agents may be efficacious because of its anti-inflammatory activity and ability to reduce P. acnes bacteria by the inhibition of P. 
acnes lipases and free fatty acid levels [L2102].',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [23896426],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'HAEJSGLKJYIYTB-ZZXKWVIFSA-N',\n", " 'cofactor_smiles': 'OC(=O)\\\\C=C\\\\C1=CC=C(C=C1)C(O)=O',\n", " 'cofactor_chem_comp_id': 'CIN',\n", " 'cofactor_name': '4-Carboxycinnamic Acid',\n", " 'cofactor_resource_id': 'DB02126',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'TZRXHJWUDPFEEY-UHFFFAOYSA-N',\n", " 'cofactor_smiles': '[O-][N+](=O)OCC(CO[N+]([O-])=O)(CO[N+]([O-])=O)CO[N+]([O-])=O',\n", " 'cofactor_name': 'Pentaerythritol tetranitrate',\n", " 'cofactor_resource_id': 'DB06154',\n", " 'mechanism_of_action': 'Pentaerythritol tetranitrate is the lipid soluble polyol ester of nitric acid belonging to the family of _nitro-vasodilators_. Pentaerythritol tetranitrate releases free nitric oxide (NO) after the denitration reaction, which triggers NO-dependent signaling transduction involving soluble _guanylate cyclase (sGC_). Nitric oxide binds reversibly to the ferrous-heme center of sGC, causing conformational change and activating the enzyme. This enzyme activation results in increased cellular concentrations of _cyclic guanosine monophosphate _(cGMP) within the vascular smooth muscle, resulting in vasodilation mediated by cGMP-dependent protein kinases. 
Additionally, this agent causes dose-dependent arterial and venous bed [L2393].',\n", " 'neighbor_flag': 'N',\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'AVXQTLSOXWQOHO-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'COC1=C(OCC2=NC=CC=C2)C=C(C)C=C1',\n", " 'cofactor_chem_comp_id': 'B77',\n", " 'cofactor_name': '2-[(2-methoxy-5-methylphenoxy)methyl]pyridine',\n", " 'cofactor_resource_id': 'DB07427',\n", " 'neighbor_flag': 'Y',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'MBHBRRBLXCXQKV-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'COC1=CC(OCC2=CC=NC=C2)=C(C)C=C1',\n", " 'cofactor_chem_comp_id': 'B78',\n", " 'cofactor_name': '4-[(5-methoxy-2-methylphenoxy)methyl]pyridine',\n", " 'cofactor_resource_id': 'DB07428',\n", " 'neighbor_flag': 'Y',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'CXMXRPHRNRROMY-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'OC(=O)CCCCCCCCC(O)=O',\n", " 'cofactor_chem_comp_id': 'DEC',\n", " 'cofactor_name': 'Sebacic acid',\n", " 'cofactor_resource_id': 'DB07645',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'OYJPTSMWFKGZJM-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'CC(C)(OC1=CC=C(NC(=O)NC2=CC(Cl)=CC(Cl)=C2)C=C1)C(O)=O',\n", " 'cofactor_chem_comp_id': 'L35',\n", " 'cofactor_name': '2-[4-({[(3,5-DICHLOROPHENYL)AMINO]CARBONYL}AMINO)PHENOXY]-2-METHYLPROPANOIC ACID',\n", " 'cofactor_resource_id': 'DB08077',\n", " 'neighbor_flag': 'Y',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " 
{'cofactor_in_ch_ikey': 'RXOHFPCZGPKIRD-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'OC(=O)C1=CC2=CC=C(C=C2C=C1)C(O)=O',\n", " 'cofactor_chem_comp_id': 'NDD',\n", " 'cofactor_name': '2,6-dicarboxynaphthalene',\n", " 'cofactor_resource_id': 'DB08262',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'BNFRJXLZYUTIII-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'CC1=CC(NC(=O)CC2=CC=C(OC(C)(C)C(O)=O)C=C2)=CC(C)=C1',\n", " 'cofactor_chem_comp_id': 'RQ3',\n", " 'cofactor_name': 'Efaproxiral',\n", " 'cofactor_resource_id': 'DB08486',\n", " 'neighbor_flag': 'Y',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'QMKYBPDZANOJGF-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'OC(=O)C1=CC(=CC(=C1)C(O)=O)C(O)=O',\n", " 'cofactor_chem_comp_id': 'TMM',\n", " 'cofactor_name': 'Trimesic acid',\n", " 'cofactor_resource_id': 'DB08632',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10592235],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'IOVCWXUNBOPUCH-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'ON=O',\n", " 'cofactor_chem_comp_id': '2NO',\n", " 'cofactor_name': 'Nitrous acid',\n", " 'cofactor_resource_id': 'DB09112',\n", " 'mechanism_of_action': 'Cyanide has a high affinity for the oxidized form of iron (Fe3+) such as that found in cytochrome oxidase a3 [A19441]. Cyanide binds to and inhibits cytochrome oxidase a3, preventing oxidative phophorylation from occuring. The resultant lack of ATP cannot support normal cellular processes, particularly in the brain. Compensatory increases in anaerobic respiration result in rising levels of lactic acid and subsequent acidosis. 
\\r\\n\\r\\nNitrite primarily acts by oxidizing hemoglobin to methemoglobin [A19440]. The now oxidized Fe3+ in methemoglobin also binds cyanide with high affinity and accepts cyanide from cytochrome a3. This leaves cytochrome a3 free to resume its function in oxidative phosphorylation. The slow dissociation of cyanide from methemoglobin allows hepatic enzymes such as rhodanese to detoxify the compound without further systemic toxicity occuring. Methemoglobin is reduced back to hemoglobin by methemoglobin reductase allowing the affected blood cells to resume normal functioning.\\r\\n\\r\\nThe reduction of nitrite by hemoglobin results in the formation of nitric oxide [A19442]. Nitric oxide acts as a powerful vasodilator, producing vascular smooth muscle relaxation through activation of soluble guanylate cyclase and the subsequent cyclic guanylyl triphosphate mediated signalling cascade [A19443].',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [1569239],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'RYGMFSIKBFXOCR-UHFFFAOYSA-N',\n", " 'cofactor_smiles': '[Cu]',\n", " 'cofactor_name': 'Copper',\n", " 'cofactor_resource_id': 'DB09130',\n", " 'mechanism_of_action': \"Copper is absorbed from the gut via high affinity copper uptake protein and likely through low affinity copper uptake protein and natural resistance-associated macrophage protein-2 [A19528]. It is believed that copper is reduced to the Cu1+ form prior to transport. Once inside the enterocyte, it is bound to copper transport protein ATOX1 which shuttles the ion to copper transporting ATPase-1 on the golgi membrane which take up copper into the golgi apparatus. Once copper has been secreted by enterocytes into the systemic circulation it remain largely bound by ceruloplasmin (65-90%), albumin (18%), and alpha 2-macroglobulin (12%). 
\\r\\n\\r\\nCopper is an essential element in the body and is incorporated into many oxidase enzymes as a cofactor [A19518]. It is also a component of zinc/copper super oxide dismutase, giving it an anti-oxidant role. Copper defiency occurs in Occipital Horn Syndrome and Menke's disease both of which are associated with impaired development of connective tissue due to the lack of copper to act as a cofactor in protein-lysine-6-oxidase. Menke's disease is also associated with progressive neurological impairment leading to death in infancy. The precise mechanisms of the effects of copper deficiency are vague due to the wide range of enzymes which use the ion as a cofactor.\\r\\n\\r\\nCopper appears to reduce the viabilty and motility of spermatozoa [A19526]. This reduces the likelihood of fertilization with a copper IUD, producing copper's contraceptive effect [A19526]. The exact mechanism of copper's effect on sperm are unknown.\",\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [23896426],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'MYMOFIZGZYHOMD-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'O=O',\n", " 'cofactor_chem_comp_id': 'OXY',\n", " 'cofactor_name': 'Oxygen',\n", " 'cofactor_resource_id': 'DB09140',\n", " 'mechanism_of_action': 'Oxygen therapy increases the arterial pressure of oxygen and is effective in improving gas exchange and oxygen delivery to tissues, provided that there are functional alveolar units. Oxygen plays a critical role as an electron acceptor during oxidative phosphorylation in the electron transport chain through activation of cytochrome c oxidase (terminal enzyme of the electron transport chain). This process achieves successful aerobic respiration in organisms to generate ATP molecules as an energy source in many tissues. Oxygen supplementation acts to restore normal cellular activity at the mitochondrial level and reduce metabolic acidosis. 
There is also evidence that oxygen may interact with O2-sensitive voltage-gated potassium channels in glomus cells and cause hyperpolarization of mitochondrial membrane [A19120]. ',\n", " 'neighbor_flag': 'Y',\n", " 'pubmed_ids': [1634355],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'CADNYOZXMIKYPR-UHFFFAOYSA-B',\n", " 'cofactor_smiles': '[Fe+3].[Fe+3].[Fe+3].[Fe+3].[O-]P([O-])(=O)OP([O-])([O-])=O.[O-]P([O-])(=O)OP([O-])([O-])=O.[O-]P([O-])(=O)OP([O-])([O-])=O',\n", " 'cofactor_name': 'Ferric pyrophosphate',\n", " 'cofactor_resource_id': 'DB09147',\n", " 'mechanism_of_action': 'The usage of ferric pyrophosphate is based on the strong complex formation between these two species. Besides, the capacity of pyrophosphate to trigger iron removal from transferrin, enhance iron transfer from transferrin to ferritin and promote iron exchange between transferrin molecules. These properties make it a very suitable compound for parenteral administration, iron delivery into circulation and incorporation into hemoglobin.[A31979]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10231452],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'MQBDAEHWGRMADS-XNHLMZCASA-M',\n", " 'cofactor_smiles': '[O--].[O--].[O--].[Na+].[Fe+3].[Fe+3].OC[C@@H](O)[C@@H](O)[C@H](O)[C@@H](O)C([O-])=O.OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O.OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O.OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O.OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O.OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O',\n", " 'cofactor_name': 'Sodium ferric gluconate complex',\n", " 'cofactor_resource_id': 'DB09517',\n", " 'mechanism_of_action': 'The 
complex is endocytosed by macrophages of the reticuloendothelial system. Within an endosome of the macrophage , lysosome fuses with the endosome creating an acidic environment leading to the cleavage of the complex from iron. Iron is then incorporated in ferritin, transferrin or hemoglobin. Sodium ferric gluconate also normalizes RBC production by binding with hemoglobin ',\n", " 'neighbor_flag': 'N',\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'BAUYGSIQEAFULO-UHFFFAOYSA-L',\n", " 'cofactor_smiles': '[Fe++].[O-]S([O-])(=O)=O',\n", " 'cofactor_name': 'Ferrous sulfate anhydrous',\n", " 'cofactor_resource_id': 'DB13257',\n", " 'mechanism_of_action': 'Iron is required to maintain optimal health, particularly for helping to form red blood cells (RBC) that carry oxygen around the body. A deficiency in iron indicates that the body cannot produce enough normal red blood cells.[A32514,L11800] Iron deficiency anemia occurs when body stores of iron decrease to very low levels, and the stored iron is insufficient to support normal red blood cell (RBC) production. Insufficient dietary iron, impaired iron absorption, bleeding, pregnancy, or loss of iron through the urine can lead to iron deficiency.[A32514,L11794] Symptoms of iron deficiency anemia include fatigue, breathlessness, palpitations, dizziness, and headache.\\r\\n\\r\\nTaking iron in supplement form, such as ferrous sulfate, allows for more rapid increases in iron levels when dietary supply and stores are not sufficient.[L2175] Iron is transported by the divalent metal transporter 1 (DMT1) across the endolysosomal membrane to enter the macrophage. It can then can be incorporated into ferritin and be stored in the macrophage or carried of the macrophage by ferroportin. 
This exported iron is oxidized by the enzyme to ceruloplasmin to Fe3+, followed by sequestration by transferrin for transport in the serum to various sites, including the bone marrow for hemoglobin synthesis or into the liver.[A32524] Iron combines with porphyrin and globin chains to form hemoglobin, which is critical for oxygen delivery from the lungs to other tissues.[L2263]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [24310424, 21694802, 18954837],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'SXAWSYZURCZSDX-UHFFFAOYSA-B',\n", " 'cofactor_smiles': '[Fe+3].[Fe+3].[Fe+3].[Fe+3].OP(O)(=O)OP(O)(O)=O.OP(O)(=O)OP(O)(O)=O.OP([O-])(=O)OP([O-])([O-])=O.OC(CC([O-])=O)(CC([O-])=O)C([O-])=O.OC(CC([O-])=O)(CC([O-])=O)C([O-])=O.OC(CC([O-])=O)(CC([O-])=O)C([O-])=O',\n", " 'cofactor_name': 'Ferric pyrophosphate citrate',\n", " 'cofactor_resource_id': 'DB13995',\n", " 'mechanism_of_action': 'The usage of ferric pyrophosphate is based on the strong complex formation between these two species. Besides, the capacity of pyrophosphate to trigger iron removal from transferrin, enhance iron transfer from transferrin to ferritin and promote iron exchange between transferrin molecules. These properties make it a very suitable compound for parenteral administration, iron delivery into circulation and incorporation into hemoglobin.[A31979]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [10231452],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'DJWUNCQRNNEAKC-UHFFFAOYSA-L',\n", " 'cofactor_smiles': '[Zn++].CC([O-])=O.CC([O-])=O',\n", " 'cofactor_name': 'Zinc acetate',\n", " 'cofactor_resource_id': 'DB14487',\n", " 'mechanism_of_action': '**Zinc has three primary biological roles**: _catalytic_, _structural_, and _regulatory_. 
The catalytic and structural role of zinc is well established, and there are various noteworthy reviews on these functions. For example, zinc is a structural constituent in numerous proteins, inclusive of growth factors, cytokines, receptors, enzymes, and transcription factors for different cellular signaling pathways. It is implicated in numerous cellular processes as a cofactor for approximately 3000 human proteins including enzymes, nuclear factors, and hormones [L2096].\\r\\n\\r\\nZinc promotes resistance to epithelial apoptosis through cell protection (cytoprotection) against reactive oxygen species and bacterial toxins, likely through the antioxidant activity of the cysteine-rich metallothioneins [A32419].\\r\\n\\r\\nIn HL-60 cells (promyelocytic leukemia cell line), zinc enhances the up-regulation of A20 mRNA, which, via TRAF pathway, decreases NF-kappaB activation, leading to decreased gene expression and generation of tumor necrosis factor-alpha (TNF-alpha), IL-1beta, and IL-8 [A32418].\\r\\n\\r\\nThere are several mechanisms of action of zinc on acute diarrhea. Various mechanisms are specific to the gastrointestinal system: zinc restores mucosal barrier integrity and enterocyte brush-border enzyme activity, it promotes the production of antibodies and circulating lymphocytes against intestinal pathogens, and has a direct effect on ion channels, acting as a potassium channel blocker of adenosine 3-5-cyclic monophosphate-mediated chlorine secretion. Cochrane researchers examined the evidence available up to 30 September 2016 [L2106].\\r\\n\\r\\nZinc deficiency in humans decreases the activity of serum _thymulin_ (a hormone of the thymus), which is necessary for the maturation of T-helper cells. T-helper 1 (Th(1)) cytokines are decreased but T-helper 2 (Th(2)) cytokines are not affected by zinc deficiency in humans [A32417].\\r\\n\\r\\nThe change of _Th(1)_ to _Th(2)_ function leads to cell-mediated immune dysfunction. 
Because IL-2 production (Th(1) cytokine) is decreased, this causes decreased activity of natural-killer-cell (NK cell) and T cytolytic cells, normally involved in killing viruses, bacteria, and malignant cells [A32417]. \\r\\n\\r\\nIn humans, zinc deficiency may lead to the generation of new CD4+ T cells, produced in the thymus. In cell culture studies (HUT-78, a Th(0) human malignant lymphoblastoid cell line), as a result of zinc deficiency, nuclear factor-kappaB (NF-kappaB) activation, phosphorylation of IkappaB, and binding of NF-kappaB to DNA are decreased and this results in decreased Th(1) cytokine production [A32417].\\r\\n\\r\\nIn another study, zinc supplementation in human subjects suppressed the gene expression and production of pro-inflammatory cytokines and decreased oxidative stress markers [A32417]. In HL-60 cells (a human pro-myelocytic leukemia cell line), zinc deficiency increased the levels of TNF-alpha, IL-1beta, and IL-8 cytokines and mRNA. In such cells, zinc was found to induce A20, a zinc finger protein that inhibited NF-kappaB activation by the tumor necrosis factor receptor-associated factor pathway. This process decreased gene expression of pro-inflammatory cytokines and oxidative stress markers [A32417].\\r\\n\\r\\nThe exact mechanism of zinc in acne treatment is poorly understood. However, zinc is considered to act directly on microbial inflammatory equilibrium and facilitate antibiotic absorption when used in combination with other agents. Topical zinc alone as well as in combination with other agents may be efficacious because of its anti-inflammatory activity and ability to reduce P. acnes bacteria by the inhibition of P. 
acnes lipases and free fatty acid levels [L2102].',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [23896426],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'VRIVJOXICYMTAG-IYEMJOQQSA-L',\n", " 'cofactor_smiles': '[Fe++].[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O.[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O',\n", " 'cofactor_name': 'Ferrous gluconate',\n", " 'cofactor_resource_id': 'DB14488',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'MDXRFOWKIZPNTA-UHFFFAOYSA-L',\n", " 'cofactor_smiles': '[Fe++].[O-]C(=O)CCC([O-])=O',\n", " 'cofactor_name': 'Ferrous succinate',\n", " 'cofactor_resource_id': 'DB14489',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'RFBYLSCVRUTUSB-ZZMNMWMASA-L',\n", " 'cofactor_smiles': '[Fe++].[H][C@](O)(CO)[C@@]1([H])OC(=O)C(O)=C1O.[H][C@](O)(CO)[C@@]1([H])OC(=O)C([O-])=C1[O-]',\n", " 'cofactor_name': 'Ferrous ascorbate',\n", " 'cofactor_resource_id': 'DB14490',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. 
Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'PMVSDNDAUGGCCE-TYYBGVCCSA-L',\n", " 'cofactor_smiles': '[Fe++].[H]\\\\C(=C(\\\\[H])C([O-])=O)C([O-])=O',\n", " 'cofactor_name': 'Ferrous fumarate',\n", " 'cofactor_resource_id': 'DB14491',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'YJYOLOWXCPIBSY-UHFFFAOYSA-L',\n", " 'cofactor_smiles': '[Fe++].NCC([O-])=O.NCC([O-])=O.OS(O)(=O)=O',\n", " 'cofactor_name': 'Ferrous glycine sulfate',\n", " 'cofactor_resource_id': 'DB14501',\n", " 'mechanism_of_action': 'Iron is necessary for the production of hemoglobin. Iron-deficiency can lead to decreased production of hemoglobin and a microcytic, hypochromic anemia.',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [16901899],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'JIAARYAFYJHUJI-UHFFFAOYSA-L',\n", " 'cofactor_smiles': '[Cl-].[Cl-].[Zn++]',\n", " 'cofactor_name': 'Zinc chloride',\n", " 'cofactor_resource_id': 'DB14533',\n", " 'mechanism_of_action': 'Zinc performs catalytic, structural, and regulatory roles in the body. 
Zinc is a component of approximately 3000 human proteins.[A204104]\\r\\n\\r\\nZinc is cytoprotective against reactive oxygen species mediated apoptosis through the action of metallothioneins.[A32419]\\r\\n\\r\\nIn a promyelocytic leukemia cell line, zinc enhances the up-regulation of A20 mRNA, which, via the TRAF pathway, decreases NF-kappaB activation, leading to decreased gene expression and generation of TNF-α, IL-1β, and IL-8 [A32418].\\r\\n\\r\\nIn patients with diarrhea, zinc restores mucosal barrier integrity, restores enterocyte brush-border enzyme activity, promotes the production of antibodies, and promotes the production of circulating lymphocytes against intestinal pathogens.[A204101] Zinc also directly affects ion channels as a potassium channel blocker of cAMP-mediated chlorine secretion.[A204101]\\r\\n\\r\\nZinc deficiency decreases thymulin, inhibiting T-helper cell maturation and decreased Th-1 cytokines like IL-2.[A32417] Decreased IL-2 decreases the activity of NK cells and CD8+ T cells.[A32417] Zinc deficiency also leads to the generation of CD4+ T cells, decreased NF-κB activation, decreased phosphorylation of IκB, and decreased binding of NF-κB to DNA.[A32417]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [646791],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_name': 'Zinc sulfate, unspecified form',\n", " 'cofactor_resource_id': 'DB14548',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [646791],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'FWCVZAQENIZVMY-UHFFFAOYSA-N',\n", " 'cofactor_smiles': 'CC(C)N1N=CC=C1C1=C(COC2=CC=CC(O)=C2C=O)C=CC=N1',\n", " 'cofactor_name': 'Voxelotor',\n", " 'cofactor_resource_id': 'DB14975',\n", " 'mechanism_of_action': 'Deoxygenated sickle hemoglobin (HbS) polymerization is the causal factor for sickle cell disease. 
The genetic mutation associated with this disease leads to the formation of abnormal, sickle shaped red blood cells that aggregate and block blood vessels throughout the body, causing vaso-occlusive crises.[T734] Voxelotor binds irreversibly with the N‐terminal valine of the α‐chain of hemoglobin, leading to an allosteric modification of Hb20, which increases the affinity for oxygen. Oxygenated HbS does not polymerize.[A188126,A188129] By directly blocking HbS polymerization, voxelotor can successfully treat sickle cell disease by preventing the formation of abnormally shaped cells, which eventually cause lack of oxygenation and blood flow to organs.[A188123,A188138,T734]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [30655275],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'cofactor_in_ch_ikey': 'JTQTXQSGPZRXJF-DOJSGGEQSA-N',\n", " 'cofactor_smiles': '[Fe+3].OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO[C@H]1O[C@H](CO[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@H](O)[C@H]1O',\n", " 'cofactor_name': 'Ferric derisomaltose',\n", " 'cofactor_resource_id': 'DB15617',\n", " 'mechanism_of_action': 'This drug is a complex made of iron (III) hydroxide and derisomaltose, which is an iron carbohydrate oligosaccharide that works to releases iron. 
The released iron then binds to the transport protein, transferrin, and is taken to erythroid precursor cells[A190528] for incorporation into the hemoglobin molecule.[L11581,L11617]',\n", " 'neighbor_flag': 'N',\n", " 'pubmed_ids': [29261547, 24310424],\n", " 'resource_name': 'DrugBank',\n", " 'resource_version': '5.1',\n", " 'target_resource_id': 'P69905'},\n", " {'binding_assay_value': 5.09,\n", " 'binding_assay_value_type': 'pEC50',\n", " 'cofactor_smiles': 'CC(C)N1N=CC=C1C1=C(COC2=C(C=O)C(O)=CC=C2)C=CC=N1',\n", " 'cofactor_resource_id': 'CHEMBL4101807',\n", " 'mechanism_of_action': 'Voxelotor is a hemoglobin S (HbS) polymerization inhibitor that binds to HbS with a 1:1 stoichiometry and exhibits preferential partitioning to red blood cells (RBCs). By increasing the affinity of Hb for oxygen, voxelotor demonstrates dose-dependent inhibition of HbS polymerization. Nonclinical studies suggest that voxelotor may inhibit RBC sickling, improve RBC deformability, and reduce whole blood viscosity.',\n", " 'neighbor_flag': 'N',\n", " 'resource_name': 'Pharos',\n", " 'resource_version': '6.13.4',\n", " 'target_resource_id': '6231'}],\n", " 'rcsb_id': '4HHB_1',\n", " 'rcsb_polymer_entity_group_membership': [{'group_id': '47_70',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 70.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 46,\n", " 'ref_beg_seq_id': 5},\n", " {'entity_beg_seq_id': 47, 'length': 69, 'ref_beg_seq_id': 52},\n", " {'entity_beg_seq_id': 116, 'length': 26, 'ref_beg_seq_id': 122}]},\n", " {'group_id': '31_30',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 30.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 31,\n", " 'ref_beg_seq_id': 31},\n", " {'entity_beg_seq_id': 32, 'length': 15, 'ref_beg_seq_id': 64},\n", " {'entity_beg_seq_id': 47, 'length': 4, 'ref_beg_seq_id': 81},\n", " {'entity_beg_seq_id': 51, 'length': 10, 'ref_beg_seq_id': 90},\n", " 
{'entity_beg_seq_id': 61, 'length': 41, 'ref_beg_seq_id': 101},\n", " {'entity_beg_seq_id': 102, 'length': 14, 'ref_beg_seq_id': 143},\n", " {'entity_beg_seq_id': 116, 'length': 26, 'ref_beg_seq_id': 158}]},\n", " {'group_id': 'P69905',\n", " 'aggregation_method': 'matching_uniprot_accession',\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 141,\n", " 'ref_beg_seq_id': 2}]},\n", " {'group_id': '20_50',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 50.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 46,\n", " 'ref_beg_seq_id': 5},\n", " {'entity_beg_seq_id': 47, 'length': 4, 'ref_beg_seq_id': 52},\n", " {'entity_beg_seq_id': 51, 'length': 51, 'ref_beg_seq_id': 61},\n", " {'entity_beg_seq_id': 102, 'length': 14, 'ref_beg_seq_id': 113},\n", " {'entity_beg_seq_id': 116, 'length': 26, 'ref_beg_seq_id': 128}]},\n", " {'group_id': '96_90',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 90.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 141,\n", " 'ref_beg_seq_id': 5}]},\n", " {'group_id': '105_100',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 100.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 141,\n", " 'ref_beg_seq_id': 5}]},\n", " {'group_id': '112_95',\n", " 'aggregation_method': 'sequence_identity',\n", " 'similarity_cutoff': 95.0,\n", " 'aligned_regions': [{'entity_beg_seq_id': 1,\n", " 'length': 141,\n", " 'ref_beg_seq_id': 5}]}],\n", " 'rcsb_genomic_lineage': [{'id': '9606', 'name': 'Homo sapiens', 'depth': 0},\n", " {'id': '9606:16', 'name': 'Chromosome 16', 'depth': 1},\n", " {'id': '9606:16:hemoglobin_subunit_alpha_2',\n", " 'name': 'hemoglobin subunit alpha 2',\n", " 'depth': 2},\n", " {'id': '9606:16:hemoglobin_subunit_alpha_1',\n", " 'name': 'hemoglobin subunit alpha 1',\n", " 'depth': 2}],\n", " 'rcsb_cluster_flexibility': {'link': 
'http://pdbflex.org/cluster.html#!/1babA/252/4hhbA',\n", " 'label': 'Low',\n", " 'avg_rmsd': 0.738,\n", " 'max_rmsd': 4.392,\n", " 'provenance_code': 'PDBFlex'},\n", " 'rcsb_latest_revision': {'major_revision': 4, 'minor_revision': 1}}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = requests.get('https://data.rcsb.org/rest/v1/core/polymer_entity/4hhb/1')\n", "json.loads(response.text)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "{'rcsb_ligand_neighbors': [{'atom_id': 'NE2',\n", " 'auth_seq_id': 87,\n", " 'comp_id': 'HIS',\n", " 'distance': 2.143,\n", " 'ligand_asym_id': 'E',\n", " 'ligand_atom_id': 'FE',\n", " 'ligand_comp_id': 'HEM',\n", " 'ligand_entity_id': '3',\n", " 'ligand_is_bound': 'Y',\n", " 'ligand_model_id': 1,\n", " 'seq_id': 87}],\n", " 'rcsb_polymer_entity_instance_container_identifiers': {'asym_id': 'A',\n", " 'auth_asym_id': 'A',\n", " 'auth_to_entity_poly_seq_mapping': ['1',\n", " '2',\n", " '3',\n", " '4',\n", " '5',\n", " '6',\n", " '7',\n", " '8',\n", " '9',\n", " '10',\n", " '11',\n", " '12',\n", " '13',\n", " '14',\n", " '15',\n", " '16',\n", " '17',\n", " '18',\n", " '19',\n", " '20',\n", " '21',\n", " '22',\n", " '23',\n", " '24',\n", " '25',\n", " '26',\n", " '27',\n", " '28',\n", " '29',\n", " '30',\n", " '31',\n", " '32',\n", " '33',\n", " '34',\n", " '35',\n", " '36',\n", " '37',\n", " '38',\n", " '39',\n", " '40',\n", " '41',\n", " '42',\n", " '43',\n", " '44',\n", " '45',\n", " '46',\n", " '47',\n", " '48',\n", " '49',\n", " '50',\n", " '51',\n", " '52',\n", " '53',\n", " '54',\n", " '55',\n", " '56',\n", " '57',\n", " '58',\n", " '59',\n", " '60',\n", " '61',\n", " '62',\n", " '63',\n", " '64',\n", " '65',\n", " '66',\n", " '67',\n", " '68',\n", " '69',\n", " '70',\n", " '71',\n", " '72',\n", " '73',\n", " '74',\n", " '75',\n", " '76',\n", " '77',\n", " '78',\n", " '79',\n", 
" '80',\n", " '81',\n", " '82',\n", " '83',\n", " '84',\n", " '85',\n", " '86',\n", " '87',\n", " '88',\n", " '89',\n", " '90',\n", " '91',\n", " '92',\n", " '93',\n", " '94',\n", " '95',\n", " '96',\n", " '97',\n", " '98',\n", " '99',\n", " '100',\n", " '101',\n", " '102',\n", " '103',\n", " '104',\n", " '105',\n", " '106',\n", " '107',\n", " '108',\n", " '109',\n", " '110',\n", " '111',\n", " '112',\n", " '113',\n", " '114',\n", " '115',\n", " '116',\n", " '117',\n", " '118',\n", " '119',\n", " '120',\n", " '121',\n", " '122',\n", " '123',\n", " '124',\n", " '125',\n", " '126',\n", " '127',\n", " '128',\n", " '129',\n", " '130',\n", " '131',\n", " '132',\n", " '133',\n", " '134',\n", " '135',\n", " '136',\n", " '137',\n", " '138',\n", " '139',\n", " '140',\n", " '141'],\n", " 'entity_id': '1',\n", " 'entry_id': '4HHB',\n", " 'rcsb_id': '4HHB.A'},\n", " 'rcsb_polymer_instance_annotation': [{'annotation_id': '1.10.490.10',\n", " 'assignment_version': 'v4_2_0',\n", " 'name': 'Globins',\n", " 'ordinal': 1,\n", " 'provenance_source': 'CATH',\n", " 'type': 'CATH',\n", " 'annotation_lineage': [{'depth': 1, 'id': '1', 'name': 'Mainly Alpha'},\n", " {'depth': 2, 'id': '1.10', 'name': 'Orthogonal Bundle'},\n", " {'depth': 3, 'id': '1.10.490', 'name': 'Globin-like'},\n", " {'depth': 4, 'id': '1.10.490.10', 'name': 'Globins'}]},\n", " {'annotation_id': 'd4hhba_',\n", " 'assignment_version': '2.08-stable',\n", " 'name': 'Hemoglobin, alpha-chain',\n", " 'ordinal': 5,\n", " 'provenance_source': 'SCOPe',\n", " 'type': 'SCOP',\n", " 'annotation_lineage': [{'depth': 1,\n", " 'id': '46456',\n", " 'name': 'All alpha proteins'},\n", " {'depth': 2, 'id': '46457', 'name': 'Globin-like'},\n", " {'depth': 3, 'id': '46458', 'name': 'Globin-like'},\n", " {'depth': 4, 'id': '46463', 'name': 'Globins'},\n", " {'depth': 5, 'id': '46486', 'name': 'Hemoglobin, alpha-chain'}]},\n", " {'annotation_id': 'e4hhbA1',\n", " 'assignment_version': '1.6',\n", " 'name': 'Globin',\n", " 'ordinal': 9,\n", " 
'provenance_source': 'ECOD',\n", " 'type': 'ECOD',\n", " 'annotation_lineage': [{'depth': 1,\n", " 'id': '100006',\n", " 'name': 'A: alpha arrays'},\n", " {'depth': 2, 'id': '200282', 'name': 'X: Globin-like (From Topology)'},\n", " {'depth': 3, 'id': '300615', 'name': 'H: Globin-like (From Topology)'},\n", " {'depth': 4, 'id': '400688', 'name': 'T: Globin-like'},\n", " {'depth': 5, 'id': '504248', 'name': 'F: Globin'}]}],\n", " 'rcsb_polymer_instance_feature': [{'assignment_version': 'v4_2_0',\n", " 'feature_id': '1.10.490.10',\n", " 'name': 'Globins',\n", " 'ordinal': 1,\n", " 'provenance_source': 'CATH',\n", " 'type': 'CATH',\n", " 'feature_positions': [{'beg_seq_id': 1, 'end_seq_id': 141}],\n", " 'additional_properties': [{'name': 'CATH_NAME', 'values': ['Globins']},\n", " {'name': 'CATH_DOMAIN_ID', 'values': ['4hhbA00']}]},\n", " {'assignment_version': '2.08-stable',\n", " 'feature_id': 'd4hhba_',\n", " 'name': 'Hemoglobin, alpha-chain',\n", " 'ordinal': 5,\n", " 'provenance_source': 'SCOPe',\n", " 'type': 'SCOP',\n", " 'feature_positions': [{'beg_seq_id': 1, 'end_seq_id': 141}],\n", " 'additional_properties': [{'name': 'SCOP_NAME',\n", " 'values': ['Hemoglobin', ' alpha-chain']},\n", " {'name': 'SCOP_DOMAIN_ID', 'values': ['d4hhba_']},\n", " {'name': 'SCOP_SUN_ID', 'values': ['46486']}]},\n", " {'assignment_version': '1.6',\n", " 'feature_id': 'e4hhbA1',\n", " 'name': 'Globin',\n", " 'ordinal': 9,\n", " 'provenance_source': 'ECOD',\n", " 'type': 'ECOD',\n", " 'feature_positions': [{'beg_seq_id': 1, 'end_seq_id': 141}],\n", " 'additional_properties': [{'name': 'ECOD_FAMILY_NAME',\n", " 'values': ['Globin']},\n", " {'name': 'ECOD_DOMAIN_ID', 'values': ['e4hhbA1']}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P1',\n", " 'name': 'helix',\n", " 'ordinal': 13,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 3, 'end_seq_id': 18}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 
'HELX_P2',\n", " 'name': 'helix',\n", " 'ordinal': 14,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 20, 'end_seq_id': 35}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P3',\n", " 'name': 'helix',\n", " 'ordinal': 15,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 36, 'end_seq_id': 42}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P4',\n", " 'name': 'helix',\n", " 'ordinal': 16,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 50, 'end_seq_id': 51}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P5',\n", " 'name': 'helix',\n", " 'ordinal': 17,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 52, 'end_seq_id': 71}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P6',\n", " 'name': 'helix',\n", " 'ordinal': 18,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 80, 'end_seq_id': 88}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P7',\n", " 'name': 'helix',\n", " 'ordinal': 19,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 94, 'end_seq_id': 112}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': 'HELX_P8',\n", " 'name': 'helix',\n", " 'ordinal': 20,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'HELIX_P',\n", " 'feature_positions': [{'beg_seq_id': 118, 'end_seq_id': 138}]},\n", " {'assignment_version': 'V1.0',\n", " 'feature_id': '1',\n", " 'name': 'unassigned secondary structure',\n", " 'ordinal': 45,\n", " 'provenance_source': 'PROMOTIF',\n", " 'type': 'UNASSIGNED_SEC_STRUCT',\n", " 'feature_positions': [{'beg_seq_id': 1, 'end_seq_id': 2},\n", " {'beg_seq_id': 19, 'end_seq_id': 19},\n", " {'beg_seq_id': 43, 
'end_seq_id': 49},\n", " {'beg_seq_id': 72, 'end_seq_id': 79},\n", " {'beg_seq_id': 89, 'end_seq_id': 93},\n", " {'beg_seq_id': 113, 'end_seq_id': 117},\n", " {'beg_seq_id': 139, 'end_seq_id': 141}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Software generated binding site for ligand entity 3 component HEM instance E chain A',\n", " 'feature_id': 'AC3',\n", " 'name': 'ligand HEM',\n", " 'ordinal': 50,\n", " 'provenance_source': 'PDB',\n", " 'type': 'BINDING_SITE',\n", " 'feature_positions': [{'beg_seq_id': 42},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 91},\n", " {'beg_seq_id': 93},\n", " {'beg_seq_id': 97},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 101},\n", " {'beg_seq_id': 136}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Software generated binding site for ligand entity 3 component HEM instance G chain B',\n", " 'feature_id': 'AC4',\n", " 'name': 'ligand HEM',\n", " 'ordinal': 51,\n", " 'provenance_source': 'PDB',\n", " 'type': 'BINDING_SITE',\n", " 'feature_positions': [{'beg_seq_id': 53}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Molprobity bond angle outlier in instance A model 1',\n", " 'feature_id': 'ANGLE_OUTLIER_1',\n", " 'name': 'Molprobity bond angle outlier',\n", " 'ordinal': 55,\n", " 'provenance_source': 'PDB',\n", " 'type': 'ANGLE_OUTLIER',\n", " 'feature_positions': [{'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", 
" {'beg_seq_id': 4},\n", " {'beg_seq_id': 5},\n", " {'beg_seq_id': 5},\n", " {'beg_seq_id': 6},\n", " {'beg_seq_id': 6},\n", " {'beg_seq_id': 6},\n", " {'beg_seq_id': 6},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 10},\n", " {'beg_seq_id': 10},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 13},\n", " {'beg_seq_id': 13},\n", " {'beg_seq_id': 13},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 18},\n", " {'beg_seq_id': 18},\n", " {'beg_seq_id': 18},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 22},\n", " {'beg_seq_id': 22},\n", " {'beg_seq_id': 22},\n", " 
{'beg_seq_id': 23},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 25},\n", " {'beg_seq_id': 25},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 27},\n", " {'beg_seq_id': 27},\n", " {'beg_seq_id': 28},\n", " {'beg_seq_id': 28},\n", " {'beg_seq_id': 28},\n", " {'beg_seq_id': 29},\n", " {'beg_seq_id': 30},\n", " {'beg_seq_id': 30},\n", " {'beg_seq_id': 30},\n", " {'beg_seq_id': 31},\n", " {'beg_seq_id': 31},\n", " {'beg_seq_id': 31},\n", " {'beg_seq_id': 32},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 34},\n", " {'beg_seq_id': 34},\n", " {'beg_seq_id': 35},\n", " {'beg_seq_id': 36},\n", " {'beg_seq_id': 36},\n", " {'beg_seq_id': 38},\n", " {'beg_seq_id': 39},\n", " {'beg_seq_id': 39},\n", " {'beg_seq_id': 40},\n", " {'beg_seq_id': 41},\n", " {'beg_seq_id': 41},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 48},\n", 
" {'beg_seq_id': 48},\n", " {'beg_seq_id': 48},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 51},\n", " {'beg_seq_id': 51},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 53},\n", " {'beg_seq_id': 54},\n", " {'beg_seq_id': 54},\n", " {'beg_seq_id': 54},\n", " {'beg_seq_id': 55},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 57},\n", " {'beg_seq_id': 57},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 59},\n", " {'beg_seq_id': 59},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 63},\n", " {'beg_seq_id': 63},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 65},\n", " {'beg_seq_id': 65},\n", " {'beg_seq_id': 67},\n", " {'beg_seq_id': 67},\n", " {'beg_seq_id': 68},\n", " {'beg_seq_id': 69},\n", " {'beg_seq_id': 70},\n", " {'beg_seq_id': 70},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 
71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 79},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 82},\n", " {'beg_seq_id': 82},\n", " {'beg_seq_id': 82},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " 
{'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 93},\n", " {'beg_seq_id': 94},\n", " {'beg_seq_id': 95},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 100},\n", " {'beg_seq_id': 101},\n", " {'beg_seq_id': 101},\n", " {'beg_seq_id': 103},\n", " {'beg_seq_id': 103},\n", " {'beg_seq_id': 103},\n", " {'beg_seq_id': 103},\n", " {'beg_seq_id': 104},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 108},\n", " {'beg_seq_id': 109},\n", " {'beg_seq_id': 109},\n", " {'beg_seq_id': 109},\n", " {'beg_seq_id': 110},\n", " {'beg_seq_id': 110},\n", " {'beg_seq_id': 110},\n", " {'beg_seq_id': 111},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 113},\n", " {'beg_seq_id': 113},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 116},\n", " {'beg_seq_id': 116},\n", " {'beg_seq_id': 116},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 118},\n", " {'beg_seq_id': 119},\n", " {'beg_seq_id': 120},\n", " 
{'beg_seq_id': 122},\n", " {'beg_seq_id': 125},\n", " {'beg_seq_id': 126},\n", " {'beg_seq_id': 126},\n", " {'beg_seq_id': 126},\n", " {'beg_seq_id': 126},\n", " {'beg_seq_id': 127},\n", " {'beg_seq_id': 127},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 129},\n", " {'beg_seq_id': 131},\n", " {'beg_seq_id': 131},\n", " {'beg_seq_id': 131},\n", " {'beg_seq_id': 132},\n", " {'beg_seq_id': 132},\n", " {'beg_seq_id': 132},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 135},\n", " {'beg_seq_id': 136},\n", " {'beg_seq_id': 136},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 138},\n", " {'beg_seq_id': 138},\n", " {'beg_seq_id': 138},\n", " {'beg_seq_id': 139},\n", " {'beg_seq_id': 139},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Molprobity bond distance outlier in instance A model 1',\n", " 'feature_id': 'BOND_OUTLIER_2',\n", " 'name': 'Molprobity bond distance outlier',\n", " 'ordinal': 56,\n", " 'provenance_source': 'PDB',\n", " 'type': 'BOND_OUTLIER',\n", " 'feature_positions': [{'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 1},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 2},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 3},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 5},\n", " {'beg_seq_id': 5},\n", " {'beg_seq_id': 5},\n", " {'beg_seq_id': 6},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 7},\n", " {'beg_seq_id': 8},\n", " 
{'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 8},\n", " {'beg_seq_id': 9},\n", " {'beg_seq_id': 9},\n", " {'beg_seq_id': 10},\n", " {'beg_seq_id': 10},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 11},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 12},\n", " {'beg_seq_id': 13},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 14},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 15},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 17},\n", " {'beg_seq_id': 18},\n", " {'beg_seq_id': 18},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 19},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 20},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 21},\n", " {'beg_seq_id': 22},\n", " {'beg_seq_id': 22},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 23},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 24},\n", " {'beg_seq_id': 25},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 26},\n", " {'beg_seq_id': 27},\n", " {'beg_seq_id': 27},\n", " {'beg_seq_id': 27},\n", " {'beg_seq_id': 28},\n", " {'beg_seq_id': 29},\n", " {'beg_seq_id': 29},\n", " {'beg_seq_id': 30},\n", " {'beg_seq_id': 30},\n", " {'beg_seq_id': 30},\n", " 
{'beg_seq_id': 31},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 33},\n", " {'beg_seq_id': 34},\n", " {'beg_seq_id': 34},\n", " {'beg_seq_id': 35},\n", " {'beg_seq_id': 35},\n", " {'beg_seq_id': 36},\n", " {'beg_seq_id': 36},\n", " {'beg_seq_id': 37},\n", " {'beg_seq_id': 37},\n", " {'beg_seq_id': 37},\n", " {'beg_seq_id': 39},\n", " {'beg_seq_id': 39},\n", " {'beg_seq_id': 40},\n", " {'beg_seq_id': 40},\n", " {'beg_seq_id': 40},\n", " {'beg_seq_id': 41},\n", " {'beg_seq_id': 41},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 42},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 43},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 44},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 45},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 46},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 47},\n", " {'beg_seq_id': 48},\n", " {'beg_seq_id': 48},\n", " {'beg_seq_id': 48},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 49},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 50},\n", " {'beg_seq_id': 51},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 52},\n", " {'beg_seq_id': 53},\n", " {'beg_seq_id': 54},\n", " {'beg_seq_id': 54},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 56},\n", " {'beg_seq_id': 57},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 58},\n", " {'beg_seq_id': 59},\n", 
" {'beg_seq_id': 59},\n", " {'beg_seq_id': 59},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 60},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 61},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 62},\n", " {'beg_seq_id': 63},\n", " {'beg_seq_id': 63},\n", " {'beg_seq_id': 63},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 64},\n", " {'beg_seq_id': 67},\n", " {'beg_seq_id': 67},\n", " {'beg_seq_id': 68},\n", " {'beg_seq_id': 68},\n", " {'beg_seq_id': 68},\n", " {'beg_seq_id': 69},\n", " {'beg_seq_id': 69},\n", " {'beg_seq_id': 70},\n", " {'beg_seq_id': 70},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 71},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 72},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 73},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 74},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 76},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 77},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 78},\n", " {'beg_seq_id': 79},\n", " {'beg_seq_id': 79},\n", " {'beg_seq_id': 79},\n", " {'beg_seq_id': 80},\n", " {'beg_seq_id': 
81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 81},\n", " {'beg_seq_id': 83},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 85},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 86},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 87},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 88},\n", " {'beg_seq_id': 89},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 90},\n", " {'beg_seq_id': 91},\n", " {'beg_seq_id': 91},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 92},\n", " {'beg_seq_id': 93},\n", " {'beg_seq_id': 93},\n", " {'beg_seq_id': 94},\n", " {'beg_seq_id': 95},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 96},\n", " {'beg_seq_id': 97},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 98},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 99},\n", " {'beg_seq_id': 104},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 105},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 106},\n", " {'beg_seq_id': 107},\n", " {'beg_seq_id': 107},\n", " {'beg_seq_id': 108},\n", " {'beg_seq_id': 109},\n", " {'beg_seq_id': 109},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 112},\n", " {'beg_seq_id': 113},\n", " {'beg_seq_id': 113},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 114},\n", " {'beg_seq_id': 115},\n", " {'beg_seq_id': 115},\n", " {'beg_seq_id': 
116},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 117},\n", " {'beg_seq_id': 119},\n", " {'beg_seq_id': 120},\n", " {'beg_seq_id': 120},\n", " {'beg_seq_id': 121},\n", " {'beg_seq_id': 122},\n", " {'beg_seq_id': 122},\n", " {'beg_seq_id': 123},\n", " {'beg_seq_id': 127},\n", " {'beg_seq_id': 127},\n", " {'beg_seq_id': 127},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 128},\n", " {'beg_seq_id': 129},\n", " {'beg_seq_id': 129},\n", " {'beg_seq_id': 131},\n", " {'beg_seq_id': 131},\n", " {'beg_seq_id': 132},\n", " {'beg_seq_id': 132},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 134},\n", " {'beg_seq_id': 135},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 137},\n", " {'beg_seq_id': 138},\n", " {'beg_seq_id': 138},\n", " {'beg_seq_id': 139},\n", " {'beg_seq_id': 139},\n", " {'beg_seq_id': 139},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 140},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141},\n", " {'beg_seq_id': 141}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Molprobity Ramachandran outlier in instance A model 1',\n", " 'feature_id': 'RAMACHANDRAN_OUTLIER_3',\n", " 'name': 'Molprobity Ramachandran outlier',\n", " 'ordinal': 57,\n", " 'provenance_source': 'PDB',\n", " 'type': 'RAMACHANDRAN_OUTLIER',\n", " 'feature_positions': [{'beg_seq_id': 3},\n", " {'beg_seq_id': 16},\n", " {'beg_seq_id': 21}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'Molprobity rotamer outlier in instance A model 1',\n", " 'feature_id': 'ROTAMER_OUTLIER_4',\n", " 'name': 'Molprobity rotamer outlier',\n", " 'ordinal': 58,\n", " 'provenance_source': 'PDB',\n", " 'type': 'ROTAMER_OUTLIER',\n", " 'feature_positions': [{'beg_seq_id': 2},\n", " {'beg_seq_id': 4},\n", " {'beg_seq_id': 45},\n", " 
{'beg_seq_id': 52},\n", " {'beg_seq_id': 75},\n", " {'beg_seq_id': 84},\n", " {'beg_seq_id': 95},\n", " {'beg_seq_id': 138}]},\n", " {'assignment_version': 'V1.0',\n", " 'description': 'STEREO_OUTLIER in instance A model 1',\n", " 'feature_id': 'STEREO_OUTLIER_5',\n", " 'name': 'STEREO_OUTLIER',\n", " 'ordinal': 59,\n", " 'provenance_source': 'PDB',\n", " 'type': 'STEREO_OUTLIER',\n", " 'feature_positions': [{'beg_seq_id': 137}]},\n", " {'description': 'The Accessible Surface Area considering this polymer entity instance by itself (unbound).',\n", " 'name': 'Unbound ASA',\n", " 'ordinal': 60,\n", " 'provenance_source': 'null-null',\n", " 'type': 'ASA',\n", " 'feature_positions': [{'beg_seq_id': 1,\n", " 'end_seq_id': 141,\n", " 'values': [163.21193608797677,\n", " 18.05226368835482,\n", " 42.73879697266144,\n", " 107.38936955373997,\n", " 66.50730410483254,\n", " 13.991379449108642,\n", " 48.67727291088148,\n", " 85.28624205006685,\n", " 34.520959847459366,\n", " 1.3437094434228118,\n", " 92.863304857998,\n", " 65.47356555544295,\n", " 2.4012047164588566,\n", " 21.923797855183835,\n", " 53.0742572985793,\n", " 120.18688975909083,\n", " 0.7884492781826147,\n", " 43.17696746642822,\n", " 110.12019771123582,\n", " 81.17388093023601,\n", " 7.552498066654208,\n", " 18.85531760246544,\n", " 113.94073377620057,\n", " 14.61770997999541,\n", " 2.340662456112997,\n", " 13.526881152267718,\n", " 39.64487610263428,\n", " 0.2687418886845624,\n", " 3.4936445528993114,\n", " 47.89055402193245,\n", " 87.70392630494936,\n", " 9.092980537616087,\n", " 12.08677005331391,\n", " 135.11954691588275,\n", " 69.80058918801944,\n", " 57.82767547901989,\n", " 74.1190475823852,\n", " 94.89898526921169,\n", " 2.517197343777638,\n", " 87.87390908697569,\n", " 99.65368019290668,\n", " 72.37646394274378,\n", " 31.11039282732865,\n", " 109.87626058150693,\n", " 125.39761701574386,\n", " 47.02814411286198,\n", " 74.27110695000425,\n", " 53.52575065847143,\n", " 60.67111088359101,\n", " 
159.16967520998105,\n", " 34.21547012118724,\n", " 12.508562796259,\n", " 86.24288842901149,\n", " 78.15149915147933,\n", " 8.86848232659056,\n", " 127.58639603342641,\n", " 34.84356120059708,\n", " 48.02758024038556,\n", " 0.0,\n", " 139.02760775217652,\n", " 139.50191535791902,\n", " 27.671383083949387,\n", " 1.4780803877650932,\n", " 80.69898396171169,\n", " 21.344669125598724,\n", " 11.690272157778464,\n", " 56.79320179694933,\n", " 54.39928054889856,\n", " 0.2687418886845624,\n", " 12.353238685554334,\n", " 62.651300715445814,\n", " 92.26515943976219,\n", " 11.307510561961573,\n", " 114.67969920017796,\n", " 65.80122989930118,\n", " 9.0673350884663,\n", " 85.31542870245575,\n", " 103.18286998172576,\n", " 27.417676857704905,\n", " 4.416464775424207,\n", " 46.9556009685794,\n", " 78.64429817158087,\n", " 53.89481365368161,\n", " 1.4151354373577678,\n", " 61.48191699173364,\n", " 76.57396599339808,\n", " 34.59920184081555,\n", " 7.68458695809292,\n", " 105.82181535056358,\n", " 158.3769646755231,\n", " 88.65497687905943,\n", " 161.60157203002842,\n", " 23.78365714858377,\n", " 72.80740751677023,\n", " 63.2790122904631,\n", " 70.60291425263443,\n", " 18.667783370602052,\n", " 23.703777757136542,\n", " 146.79132263275392,\n", " 42.72996030084542,\n", " 38.16134819320786,\n", " 15.623877912257104,\n", " 96.23684506711352,\n", " 3.9836073431531753,\n", " 10.212191770013371,\n", " 13.885696272241882,\n", " 53.22958769246928,\n", " 0.1343709443422812,\n", " 5.374837773691248,\n", " 38.00745162220499,\n", " 64.76393381039064,\n", " 65.20523835698864,\n", " 4.702983051979842,\n", " 113.7395461826647,\n", " 84.57884586234685,\n", " 46.23905897632628,\n", " 44.632259997721235,\n", " 62.02522151197134,\n", " 117.39509334128527,\n", " 64.49508887746705,\n", " 20.29001259568446,\n", " 82.72431722336152,\n", " 59.32964328068581,\n", " 2.2659679491812454,\n", " 2.418676998161062,\n", " 70.20040837530826,\n", " 79.8176717179631,\n", " 1.8822412561011748,\n", " 
10.400201010685985,\n", " 58.28403453761389,\n", " 29.70257353094256,\n", " 10.346562714355652,\n", " 18.462073140282055,\n", " 84.21967511408428,\n", " 18.140077486207964,\n", " 21.096238261738147,\n", " 34.88131057792261,\n", " 70.15298163597261,\n", " 75.58824103285183,\n", " 56.74619728766633,\n", " 281.2892740261974]}]}],\n", " 'rcsb_polymer_instance_feature_summary': [{'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'BEND'},\n", " {'count': 2,\n", " 'coverage': 0.10638,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'BINDING_SITE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'C-MANNOSYLATION_SITE'},\n", " {'count': 1,\n", " 'coverage': 1.0,\n", " 'maximum_length': 141,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 141,\n", " 'minimum_value': 0.0,\n", " 'type': 'CATH'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'CIS-PEPTIDE'},\n", " {'count': 1,\n", " 'coverage': 1.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'ECOD'},\n", " {'count': 8,\n", " 'coverage': 0.78014,\n", " 'maximum_length': 21,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 2,\n", " 'minimum_value': 0.0,\n", " 'type': 'HELIX_P'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'HELX_LH_PP_P'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 
'HELX_RH_3T_P'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'HELX_RH_AL_P'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'HELX_RH_PI_P'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_CONTACT_PROBABILITY'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_DISTANCE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_ENERGY'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_IPTM'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_NORMALIZED_SCORE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_OTHER'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_PAE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 
'MA_QA_METRIC_LOCAL_TYPE_PLDDT'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_PLDDT_ALL-ATOM'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_PLDDT_ALL-ATOM_[0,1]'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_PLDDT_[0,1]'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_PTM'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MA_QA_METRIC_LOCAL_TYPE_ZSCORE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'MEMBRANE_SEGMENT'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'N-GLYCOSYLATION_SITE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'O-GLYCOSYLATION_SITE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'S-GLYCOSYLATION_SITE'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 
'SABDAB_ANTIBODY_HEAVY_CHAIN_SUBCLASS'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SABDAB_ANTIBODY_LIGHT_CHAIN_SUBCLASS'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SABDAB_ANTIBODY_LIGHT_CHAIN_TYPE'},\n", " {'count': 1,\n", " 'coverage': 1.0,\n", " 'maximum_length': 141,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 141,\n", " 'minimum_value': 0.0,\n", " 'type': 'SCOP'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SCOP2B_SUPERFAMILY'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SCOP2_FAMILY'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SCOP2_SUPERFAMILY'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'SHEET'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'STRN'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'TURN_TY1_P'},\n", " {'count': 1,\n", " 'coverage': 0.21986,\n", " 'maximum_length': 31,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 31,\n", " 'minimum_value': 0.0,\n", " 'type': 'UNASSIGNED_SEC_STRUCT'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 
'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'UNOBSERVED_ATOM_XYZ'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'UNOBSERVED_RESIDUE_XYZ'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'ZERO_OCCUPANCY_ATOM_XYZ'},\n", " {'count': 0,\n", " 'coverage': 0.0,\n", " 'maximum_length': 0,\n", " 'maximum_value': 0.0,\n", " 'minimum_length': 0,\n", " 'minimum_value': 0.0,\n", " 'type': 'ZERO_OCCUPANCY_RESIDUE_XYZ'},\n", " {'count': 444, 'coverage': 3.14894, 'type': 'ANGLE_OUTLIER'},\n", " {'count': 372, 'coverage': 2.6383, 'type': 'BOND_OUTLIER'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'MOGUL_ANGLE_OUTLIER'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'MOGUL_BOND_OUTLIER'},\n", " {'count': 3, 'coverage': 0.02128, 'type': 'RAMACHANDRAN_OUTLIER'},\n", " {'count': 8, 'coverage': 0.05674, 'type': 'ROTAMER_OUTLIER'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'RSCC_OUTLIER'},\n", " {'count': 0, 'coverage': 0.0, 'type': 'RSRZ_OUTLIER'},\n", " {'count': 1, 'coverage': 0.00709, 'type': 'STEREO_OUTLIER'}],\n", " 'rcsb_polymer_struct_conn': [{'connect_type': 'metal coordination',\n", " 'dist_value': 2.143,\n", " 'id': 'metalc1',\n", " 'ordinal_id': 1,\n", " 'connect_target': {'auth_seq_id': '87',\n", " 'label_asym_id': 'A',\n", " 'label_atom_id': 'NE2',\n", " 'label_comp_id': 'HIS',\n", " 'label_seq_id': 87,\n", " 'symmetry': '1_555'},\n", " 'connect_partner': {'label_asym_id': 'E',\n", " 'label_atom_id': 'FE',\n", " 'label_comp_id': 'HEM',\n", " 'symmetry': '1_555'}}],\n", " 'rcsb_id': '4HHB.A',\n", " 'rcsb_latest_revision': {'major_revision': 4, 'minor_revision': 1}}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = 
requests.get('https://data.rcsb.org/rest/v1/core/polymer_entity_instance/4HHB/A')\n", "json.loads(response.text)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# BLAST URLAPI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html](http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html)\n", "\n", "A URLAPI request looks like this:\n", "\n", "`http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=<command>&{<attribute>=<value>}`\n", "\n", "where CMD can be\n", "\n", "* **PUT** add a job to the queue\n", "* **GET** get formatted results from a job\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# BLAST PUT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some common attributes to PUT:\n", "\n", "* DATABASE - what database to search, *mandatory*\n", "* PROGRAM - what program to use (blastn, blastp, blastx, tblastn, tblastx)\n", "* QUERY - Accession(s), gi(s), or FASTA sequence(s), *mandatory*\n", "* MATRIX_NAME - matrix to use (default BLOSUM62)\n" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Submit a blastp search of the nr database; the attributes are sent as form data\n", "response = requests.post('https://www.ncbi.nlm.nih.gov/blast/Blast.cgi',\n", " data={'CMD': 'PUT', 'DATABASE': 'nr', 'PROGRAM': 'blastp',\n", " 'QUERY': 'SQETFSDLWKLLPEN'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=PUT&DATABASE=nr&QUERY=SQETFSDLWKLLPEN&PROGRAM=blastp" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "NCBI Blast\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\t\t\n", "
\n", "
\n", "
\n", "
\n", " \"U.S.\n", "

An official website of the United States government

\n", " \n", "
\n", "
\n", "
\n", "
\n", " \"Dot\n", "
\n", "

\n", " The .gov means it’s official.\n", "
\n", " Federal government websites often end in .gov or .mil. Before\n", " sharing sensitive information, make sure you’re on a federal\n", " government site.\n", "

\n", "
\n", "
\n", "
\n", " \"Https\"\n", "
\n", "

\n", " The site is secure.\n", "
\n", " The https:// ensures that you are connecting to the\n", " official website and that any information you provide is encrypted\n", " and transmitted securely.\n", "

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
  • \n", "

    \n", "This URL will be replaced with blast.ncbi.nlm.nih.gov on 12/1/2014.\n", "Please change your bookmark now.\n", "

    \n", "
\n", "
\n", "Skip to main page content\n", "
\n", "
\n", "
\n", " \n", " \"NIH\n", " \n", "
\n", "\n", "
\n", " Log in\n", " \n", "
\n", "\n", "
\n", "
\n", "
\n", " \n", "

Account

\n", "
\n", "
\n", " Logged in as:
\n", " username\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "\n", "
\n", "
\n", "
Check out the ClusteredNR database on BLAST+
\n", "
\n", "Learn more\n", "Give us feedback\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\t\t
\n", " Format Request\n", " \n", "
\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\n", "\t\t\t\t
\n", "
\n", "\t\t\t\t
\t\t\t\t\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "\n", "
Query
Protein Sequence
\n", "
Database
nr
\n", "
Job title
Protein Sequence
\n", "
Entrez Query
Note: Your search is limited to records matching this Entrez query
\n", "\n", "
\n", "\n", "\n", "\n", "\n", "
\n", "
Format
\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "Reset form to defaults\n", "Help\n", "
\n", "

\n", "These options control formatting of alignments in results pages. The\n", "default is HTML, but other formats (including plain text) are available.\n", "PSSM and PssmWithParameters are representations of Position Specific Scoring Matrices and are only available for PSI-BLAST. \n", "The Advanced view option allows the database descriptions to be sorted by various indices in a table.\n", "

\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "Help\n", "
\n", "

\n", "Choose how to view alignments.\n", "The default \"pairwise\" view shows how each subject sequence aligns\n", "individually to the query sequence. The \"query-anchored\" view shows how\n", "all subject sequences align to the query sequence. For each view type,\n", "you can choose to show \"identities\" (matching residues) as letters or\n", "dots.\n", "more...\n", "

\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Help\n", "
\n", "
    \n", "
  • Graphical Overview: Graphical Overview: Show graph of similar sequence regions aligned to query.\n", "more...\n", "
  • \n", "
  • NCBI-gi: Show NCBI gi identifiers.\n", "
  • \n", "
  • CDS feature: Show annotated coding region and translation.\n", "more...\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "Help\n", "
\n", "
    \n", "
  • Masking Character: Display masked (filtered) sequence regions as lower-case or as specific letters (N for nucleotide, P for protein).\n", "
  • \n", "
  • Masking Color: Display masked sequence regions in the given color.
  • \n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Help\n", "
\n", "
    \n", "
  • Descriptions: Show short descriptions for up to the given number of sequences.
  • \n", "
  • Alignments: Show alignments for up to the given number of sequences, in order of statistical significance.
  • \n", "
  • Line lenghth: Number of letters to show on one line in an alignment.
  • \n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "Type common name, binomial, taxid, or group name. Only 20 top taxa will be shown.
\n", "\n", "\n", " \n", "\n", "\"Add\n", "
\n", "\n", "
\n", "
\n", "Help\n", "
\n", "

\n", "Show only sequences from the given organism.\n", "

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "Help\n", "
\n", "

\n", "Show only those sequences that match the given Entrez query.\n", "more...\n", "

\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", "Help\n", "
\n", "

\n", "Show only sequences with expect values in the given range.\n", "more...\n", "

\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", "Help\n", "
\n", "

\n", " Show only sequences with percent identity values in the given range. \n", "

\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "Help\n", "
\n", "
    \n", "
  • Format for PSI-BLAST: The Position-Specific Iterated BLAST (PSI-BLAST) program performs iterative searches with a protein query, \n", "in which sequences found in one round of search are used to build a custom score model for the next round.\n", "more...\n", "
  • \n", "
  • Inclusion Threshold: This sets the statistical significance threshold for including a sequence in the model used \n", "by PSI-BLAST to create the PSSM on the next iteration.
  • \n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \t\t\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\t\t\t\t\t\t\t\t\t\n", "\n", "\n", "\n", "\n", "\n", "\n", "\t\t\t\n", "\t\t\t\t\n", "\n", "\t\t\t\t\t\n", "\t\t\t\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\t\t \t\t\n", "\t\t\t\t\n", "\t\t\t\t
\n", "\n", "
\n", "\n", "\t\t \n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n" ] } ], "source": [ "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# BLAST GET" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the request ID (RID) parsed from the response to the PUT, retrieve the results of the search with a GET request. Common parameters:\n", "\n", "* RID - mandatory\n", "* FORMAT_TYPE - HTML, Text, ASN.1, XML\n", "* ALIGNMENTS - number of alignments (default 500)\n", "* ALIGNMENT_VIEW - Pairwise, QueryAnchored, QueryAnchoredNoIdentities, FlatQueryAnchored, FlatQueryAnchoredNoIdentities, Tabular" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "KDJMCZ64013\n" ] } ], "source": [ "rid = re.search(r'RID = (\\S+)',response.text).group(1)\n", "print(rid)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "result = requests.get('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=GET&RID=%s&FORMAT_TYPE=XML' % rid).text" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "NCBI Blast:\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\t\t\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " Format Request Status \n", "
\n", "\t\t
\t\t\t\t \n", "\t\t\t\t [Formatting options] \n", "
\n", "

Job Title:

\n", "\t\t\t\t\t\t\t\t\n", "
\n", " \n", "
\n", "

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Request ID: KDJMCZ64013
Status: Searching
Submitted at: Mon Oct 23 15:33:32 2023
Current time: Mon Oct 23 15:33:52 2023
Time since submission: 00:00:20
\n", "

This page will be automatically updated in 2 seconds

\n", "
\n", " \n", " \n", " \n", " \n", "
\n", " \n", "\t\t\t\t
\n", "\t\t\t\t
\t\t\t\t\n", "\t\t\t\t \t\t\t\t\n", "\t\t\t\t \n", " \n", " \n", " \n", "\t\t\t\t \n", "\t\t\t\t \t\t\t\t\n", "\t\t\t\t \n", "\t\t\t\t\n", "\t\t\t\t \t\t\t\t \t\n", "\t\t\t\t \n", "\t\t\t\t \n", "\t\t\t\t \n", "
\t\t\t\t\n", "
\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "print(result)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "result = requests.get('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=GET&RID=%s&FORMAT_TYPE=XML' % rid).text" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "NCBI Blast:\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\t\t\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " Format Request Status \n", "
\n", "\t\t
\t\t\t\t \n", "\t\t\t\t [Formatting options] \n", "
\n", "

Job Title:

\n", "\t\t\t\t\t\t\t\t\n", "
\n", " \n", "
\n", "

\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Request ID: KDJMCZ64013
Status: Searching
Submitted at: Mon Oct 23 15:33:32 2023
Current time: Mon Oct 23 15:33:58 2023
Time since submission: 00:00:26
\n", "

This page will be automatically updated in 2 seconds

\n", "
\n", " \n", " \n", " \n", " \n", "
\n", " \n", "\t\t\t\t
\n", "\t\t\t\t
\t\t\t\t\n", "\t\t\t\t \t\t\t\t\n", "\t\t\t\t \n", " \n", " \n", " \n", "\t\t\t\t \n", "\t\t\t\t \t\t\t\t\n", "\t\t\t\t \n", "\t\t\t\t\n", "\t\t\t\t \t\t\t\t \t\n", "\t\t\t\t \n", "\t\t\t\t \n", "\t\t\t\t \n", "
\t\t\t\t\n", "
\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "print(result)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "# Exercise: Anything new?\n", "\n", "Write a script to see if a website has changed since the last time you checked. The script will save the text of the website in the current directory and compare the previously saved text to the current website." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Exercise: BLAST IT!\n", "\n", "Use the BLAST URL API to find structures whose sequences are similar to a user-supplied FASTA protein sequence. Your script will take the name of a FASTA file as its only argument. The contents of this file should be provided as the `QUERY` parameter in a BLAST URL PUT request querying the pdb database (`DATABASE=pdb`) using blastp (`PROGRAM=blastp`). You should extract the RID from the response using a regular expression.\n", "\n", "Using the RID, you then submit a GET request. Your GET request may either return the desired data or, if the search hasn't finished, a status HTML page (even if you request XML). You should look for the presence of the string `Status=WAITING` in the response. If this string is present, you should repeat your GET request every 5 seconds (`time.sleep(5)`) until you get a response without it. A search typically takes about 30 seconds.\n", "\n", "Print out the final XML response.\n", "\n", "Example FASTA file: http://mscbio2025.net/files/brca.fasta" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "import re\n", "import sys\n", "import time\n", "import xml.etree.ElementTree as ET\n", "\n", "import requests\n", "\n", "# Current host for the NCBI BLAST URL API\n", "BLAST_URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi'\n", "\n", "if len(sys.argv) < 2:\n", "    print(\"Need fasta file\")\n", "    sys.exit(1)\n", "\n", "# Read the query sequence; the with-block closes the file\n", "with open(sys.argv[1]) as f:\n", "    fasta = f.read()\n", "\n", "# Submit the search (CMD=PUT) and parse the request ID from the response\n", "values = {'CMD': 'PUT', 'DATABASE': 'pdb', 'PROGRAM': 'blastp', 'QUERY': fasta, 'PROGRAM_NAME': 'blastp', 'BLAST_PROGRAM': 'blastp'}\n", "res = requests.get(BLAST_URL, params=values)\n", "m = re.search(r'RID = (\\S+)', res.text)\n", "if not m:\n", "    print(\"No RID found in response\")\n", "    sys.exit(1)\n", "rid = m.group(1)\n", "\n", "# Poll every 5 seconds until the response no longer reports Status=WAITING\n", "result = 'Status=WAITING'\n", "values = {'CMD': 'GET', 'RID': rid, 'FORMAT_TYPE': 'XML'}\n", "while 'Status=WAITING' in result:\n", "    time.sleep(5)\n", "    result = requests.get(BLAST_URL, params=values).text\n", "\n", "root = ET.fromstring(result)\n", "cnt = 0\n", "# Print summary information for the first 10 hits\n", "for hit in root.iter('Hit'):\n", "    hsp = hit.find('Hit_hsps').find('Hsp')\n", "    evalue = hsp.find('Hsp_evalue').text\n", "    ident = float(hsp.find('Hsp_identity').text)\n", "    length = float(hsp.find('Hsp_align-len').text)\n", "    pdb_ch = hit.find('Hit_accession').text\n", "    # The text before the underscore is taken as the PDB ID\n", "    m = re.search(r'(\\S+)_', pdb_ch)\n", "    resolution = '0'\n", "    if m:\n", "        pdb = m.group(1)\n", "        # NOTE: describePDB is a legacy RCSB REST endpoint and may no longer be served;\n", "        # if the response has no resolution attribute, resolution stays '0'\n", "        response = requests.get('http://www.pdb.org/pdb/rest/describePDB', params={'structureId': pdb}).text\n", "        m = re.search(r'resolution=\"(\\S+)\"', response)\n", "        if m:\n", "            resolution = m.group(1)\n", "    # Columns: accession, e-value, fraction identity, resolution\n", "    print('%s %s %.2f %s' % (pdb_ch, evalue, ident / length, resolution))\n", "    cnt += 1\n", "    if cnt >= 10:\n", "        break\n" ] } ], "metadata": { "celltoolbar": "Slideshow", 
"kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }