{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Cheminformatics\n", "## 10/31/2023 🎃\n", "\n", "print view\n", "\n", "notebook\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Cheminformatics (also known as chemoinformatics, chemioinformatics and chemical informatics) is the use of computer and informational techniques applied to a range of problems in the field of chemistry.* \n", "--Wikipedia" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Open Source Cheminformatics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* rdkit [http://www.rdkit.org](http://www.rdkit.org)\n", " * BSD License\n", " * Relatively new, very nicely architected C++ backend\n", " * Actively developed\n", " * Native Python interface\n", "\n", "* OpenBabel [http://openbabel.org](http://openbabel.org)\n", " * GNU General Public License (GPL)\n", " * Older (forked from OpenEye's OELib in 2001), a bit crufty and complicated\n", " * Lots of functionality (e.g., support for more than 100 file formats)\n", " * Python interface is through SWIG (auto-generated) bindings to C/C++\n", " * Includes standalone programs: babel, obabel, etc.\n", " \n", "* Pybel\n", " * A native, user-friendly Python interface to OpenBabel\n", " * Limited functionality (but can always fall back to OpenBabel)\n", " * Simplest to use\n", " * **Note:** Pybel is installed as part of openbabel. 
There is a completely unrelated Python package called PyBEL that is *not* what you want." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# File Formats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

2D

\n", "\n", " SMILES\n", "\n", "

3D

\n", " pdb, sdf, mol2\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Simplified Molecular Input Line Entry System (SMILES)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Atoms

\n", "\n", "Specified by their atomic symbols inside brackets\n", "\n", "* [Au], [Fe], [Zn], etc\n", "\n", "No brackets needed for organic subset: B, C, N, O, P, S, F, Cl, Br, and I\n", "\n", "Aromatic atoms are lower case: c1ccccc1\n", "\n", "

Bonds

\n", "\n", "* Single -\n", "* Double =\n", "* Triple #\n", "* Aromatic :\n", "\n", "Single and aromatic can be omitted.\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SMILES, cont." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Branches\n", "\n", "Parentheses denote branches and can be nested.\n", "\n", "Example: SC(N)CO\n", "\n", "## Cycles\n", "\n", "Break a bond in the cycle and use a digit to label the break.\n", "\n", "\n", "\n", "As long as rings are separate, digits can be reused.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SMILES, cont.\n", "\n", "## Disconnections\n", "\n", "A period `.` separates nonbonded molecules.\n", "\n", "[Na+].[Cl-]\n", "\n", "## Isomeric Smiles\n", "Slashes (`/ \\`) denote configuration around double bonds.\n", "\n", "At (`@`) denotes configuration around chiral centers.\n", "\n" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Drawing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All but the simplest smiles can be challenging to interpret (especially if chirality is included). Fortunately, you can use pybel (or molecular viewers like [MarvinView](https://www.chemaxon.com/products/marvin/marvinview/)) to convert them to their 2D representation.\n", "\n", "Example: CC(NC1=CC=C(O)C=C1)=O" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "CH\n", "3\n", "NH\n", "OH\n", "O\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from openbabel import pybel\n", "mol = pybel.readstring('smi','CC(NC1=CC=C(O)C=C1)=O')\n", "mol" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "mol.draw(filename=\"imgs/accet.png\",show=False) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SMARTS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Regular expressions for molecules.\n", "\n", "All SMILES are SMARTS (exact matches). 
Additionally, SMARTS support\n", "\n", "* wild cards \n", " * `C~*~C` any atom can be between two carbons using any (~) bond\n", " * `a1aaaaa1` any aromatic 6 atom ring\n", "* property testing \n", " * `[R]` atom in a ring\n", " * `[#6]` atomic number is 6 (matches aromatic or aliphatic)\n", " * `[D3]` atom with three explicit bonds (degree)\n", "* logical operators (not - !, and - & ;, or - ,)\n", " * `[!C&R]` not aliphatic carbon and in ring\n", " * `[F,Cl,Br,I]` one of the first four halogens\n", "* matching an atomic environment ('recursive' SMARTS)\n", " * `[$(*O);$(*C)]` this matches one atom that is bound to both C and O" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Pybel Input/Output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pybel.readstring`\n", "\n", "Takes a format and string with molecular data in it and returns a single molecule." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol = pybel.readstring('smi','CCCC')\n", "len(mol.atoms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For simple output, use the molecule's `write` method, which takes the format" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " OpenBabel10302320482D\n", "\n", " 4 3 0 0 0 0 0 0 0 0999 V2000\n", " 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n", " 1 2 1 0 0 0 0\n", " 2 3 1 0 0 0 0\n", " 3 4 1 0 0 0 0\n", "M END\n", "$$$$\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "==============================\n", "*** Open Babel Warning in WriteMolecule\n", " No 2D or 3D coordinates exist. Stereochemical information will be stored using an Open Babel extension. To generate 2D or 3D coordinates instead use --gen2D or --gen3D.\n" ] } ], "source": [ "mol.write('sdf','output.sdf',overwrite=True) #write to file\n", "print(mol.write('sdf')) #no filename - return string" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `pybel.readfile`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pybel.readfile`\n", "\n", "Takes a format and file name and returns an *iterator* over all the molecules in the file." 
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = list(pybel.readfile('smi','../files/results.smi')) #expand the iterator into a list\n", "len(mols)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "OH\n", "N\n", "N\n", "OH\n", "O\n", "O\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol = next(pybel.readfile('smi','../files/results.smi')) #get just first molecule\n", "mol" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "N#Cc1c(O)c2C(=O)c3ccccc3C(=O)c2c(c1C#N)O\tNSC27034\n", "N#Cc1cc2SCCSCCCSCCSc2cc1C#N\tNSC680721\n", "N#Cc1cc2CN(CCN(CCN(CCN(Cc2cc1C#N)S(=O)(=O)c1ccc(cc1)C)S(=O)(=O)c1ccc(cc1)C)S(=O)(=O)c1ccc(cc1)C)S(=O)(=O)c1ccc(cc1)C\tNSC673657\n", "N#Cc1cc2/C(=N\\c3cccc(n3)N)/N=C(c2cc1C#N)Nc1cccc(n1)N\tNSC666078\n", "N#Cc1c(OC)ccc(c1C#N)O.COc1ccc(c(c1C#N)C#N)OC\tNSC618324\n", "N#Cc1c(C#N)c(O)c2c(c1O)c(N)ccc2\tNSC320651\n", "N#Cc1cc(ccc1C#N)NC(=O)CCCC(=O)Nc1ccc(c(c1)C#N)C#N\tNSC309816\n", "N#Cc1c(C#N)c(O)c(c(c1O)Cl)Cl\tNSC172566\n", "N#Cc1c(C#N)c(O)c2c(c1O)cccc2\tNSC128281\n", "N#Cc1cc(ccc1C#N)[N+](=O)[O-]\tNSC123374\n", "N#Cc1cc(ccc1C#N)Oc1ccc(c(c1)C#N)C#N\tNSC94808\n", "N#Cc1c2c(cc(c1C#N)[N+](=O)[O-])n(c1c2cccc1)C\tNSC92934\n", "N#Cc1c(O)ccc(c1C#N)O\tNSC43554\n", "N#Cc1ccccc1C#N\tNSC17562\n" ] } ], "source": [ "#of course, this is the most efficient way to read all\n", "for mol in 
pybel.readfile('smi','../files/results.smi'): \n", " print(mol.write('can').rstrip()) #canonical smiles" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `pybel.Outputfile`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To output many molecules to the same file, use `pybel.Outputfile`" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "output = pybel.Outputfile('sdf','output.sdf',overwrite=True)\n", "for m in mols:\n", " output.write(m)\n", "output.close() #close the file so everything is flushed to disk" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Molecules\n", "\n", "The molecule object provides a number of methods and access to the molecule's atoms and bonds." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6 6 6 6 6 6 6 7 6 7 " ] } ], "source": [ "for atom in mol:\n", " print(atom.atomicnum,end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Atom properties include `atomicmass`, `atomicnum`, `coords`, `formalcharge`, `hyb`, `isotope`, `partialcharge`, `degree`, `explicitvalence` and `totalvalence`.\n", "\n", "Atoms can also be accessed in `mol.atoms`" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "(1, 1, 1, 4)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = pybel.readstring('smi','CC')\n", "a1 = m.atoms[0]\n", "a1.degree, a1.heavydegree, a1.explicitvalence, a1.totalvalence" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SMARTS Matching" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SMARTS matching is done by initializing a `pybel.Smarts` object with a SMARTS expression. This can then be applied to any molecule to identify the matching atoms." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "N\n", "N\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, 6, 5, 4, 3, 2)]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aromatic_ring = pybel.Smarts('a1aaaaa1')\n", "aromatic_ring.findall(mol) #returns all _unique_ matches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned matches are tuples of atom indices. Open Babel numbers atoms starting from 1, so index `i` corresponds to `mol.atoms[i-1]`." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n", "8\n" ] } ], "source": [ "double_ring = pybel.Smarts('a1aaaa2a1aaaa2')\n", "for (i,m) in enumerate(mols):\n", " if double_ring.findall(m):\n", " print(i)\n", " m.draw(filename=\"r%d.png\"%i,show=False) \n" ] }, { "cell_type": "code", "execution_count": 62, 
"metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "OH\n", "N\n", "N\n", "OH\n", "NH\n", "2\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols[5]" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "OH\n", "N\n", "N\n", "OH\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols[8]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Molecular Properties" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "128.13076\n" ] } ], "source": [ "print(mol.molwt) #molecular weight" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'abonds': 6.0, 'atoms': 10.0, 'bonds': 10.0, 'cansmi': nan, 'cansmiNS': nan, 'dbonds': 0.0, 'formula': nan, 'HBA1': 2.0, 'HBA2': 2.0, 'HBD': 0.0, 'InChI': nan, 'InChIKey': nan, 'L5': nan, 'logP': 1.42996, 'MP': 79.79220000000001, 'MR': 35.872, 'MW': 128.13076, 'nF': 0.0, 'rotors': 0.0, 's': nan, 'sbonds': 2.0, 'smarts': nan, 'tbonds': 2.0, 'title': nan, 'TPSA': 47.58}\n" ] } ], "source": [ "desc = mol.calcdesc()\n", "print(desc)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.42996" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "desc['logP'] #calculated partition coefficient 
between octanol/water" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lipinski's Rule of Five" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In 1997 Christopher Lipinski analyzed existing drugs and came up with a set of molecular property rules for classifying a small molecule as *drug-like*.\n", "\n", "* No more than 5 hydrogen bond donors\n", "* No more than 10 hydrogen bond acceptors\n", "* Molecular weight less than 500 daltons\n", "* Partition coefficient logP less than 5\n", "* There is no fifth rule" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "def lipinski(mol):\n", " desc = mol.calcdesc()\n", " return desc['HBD'] <= 5 and desc['HBA1'] <= 10 and desc['MW'] <= 500 and desc['logP'] <= 5" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "True\n", "False\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n" ] } ], "source": [ "for m in mols:\n", " print(lipinski(m))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Fingerprints" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *molecular fingerprint* reduces the chemical features of a molecule into a *bit vector*. The features of the fingerprint correspond to a bit in the vector. This bit is set if the compound has that feature.\n", "\n", "The most common type of fingerprint is a Daylight style fingerprint where all the paths (up to a given length) are enumerated and *hashed* to their bit positions.\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Fingerprints, cont." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bit vectors can easily be compared, most commonly with the Tanimoto coefficient:\n", "$$\\frac{|A \\cap B|}{|A \\cup B|}$$\n", "where $A$ and $B$ are the sets of bits that are set in each fingerprint.\n", "\n", "This provides a quantitative measure of *chemical similarity*.\n", "\n", "Similarity search is a surprisingly effective mechanism of virtual screening (given enough data)." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "openbabel.pybel.Fingerprint" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fp = mol.calcfp()\n", "type(fp)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[75, 82, 224, 279, 296, 299, 348, 440, 442, 474, 503, 598, 656, 671, 711, 716, 728, 870, 906, 913, 937]\n" ] } ], "source": [ "print(fp.bits)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Chemical Similarity\n", "\n", "### Tanimoto coefficient\n", "$\\Large \\frac{|A \\cap B|}{|A \\cup B|}$ 1.0 means identical fingerprints\n", "\n", "To calculate the Tanimoto similarity between two fingerprints, use the **|** operator" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.28\n", "0.3\n", "0.19626168224299065\n", "0.12138728323699421\n", "0.5\n", "0.4666666666666667\n", "0.29577464788732394\n", "0.42857142857142855\n", "0.6176470588235294\n", "0.4117647058823529\n", "0.4772727272727273\n", "0.22826086956521738\n", "0.6363636363636364\n", "1.0\n" ] } ], "source": [ "fp = mol.calcfp()\n", "for m in mols:\n", " f = m.calcfp()\n", " print(f | fp)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 2D -> 3D" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2.468272539514969, 0.6349880847110817, -0.018730053230247772)\n", "(2.46371187360455, 0.6341729709888883, -0.010376048284164541)\n" ] } ], "source": [ "mol.make3D() #this makes a reasonable 3D structure\n", "print(mol.atoms[0].coords)\n", "mol.localopt() #this further optimizes the structure\n", "print(mol.atoms[0].coords)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "sdf = mol.write('sdf')" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n


\n
\n", "text/html": [ "
\n", "

\n", "

\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import py3Dmol\n", "view = py3Dmol.view()\n", "view.addModel(sdf)\n", "view.setStyle({'stick':{}})\n", "view.zoomTo()\n", "view.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `sdf` Molecules" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = list(pybel.readfile('sdf','../files/best.sdf'))\n", "len(mols)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(-0.5939, -56.8911, 14.3139)\n" ] } ], "source": [ "atom = mols[0].atoms[0]\n", "print(atom.coords)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ZINC78996542\r\n", "\r\n", "\r\n", " 39 44 0 0 0 0 0 0 0 0999 V2000\r\n", " -0.5939 -56.8911 14.3139 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3154 -57.8883 15.8741 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.3628 -55.5394 14.9296 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.0440 -55.7357 15.4805 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.3058 -57.7869 14.5684 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2724 -57.1748 15.3144 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.1864 -57.3893 16.5881 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.5650 -58.0576 12.9536 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.4112 -58.0403 11.5707 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.4635 -57.8375 13.7859 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.1560 -57.8031 11.0185 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.1883 -57.5962 13.2480 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.0573 -57.5833 11.8565 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9942 -57.3574 14.1090 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.7971 -57.1312 13.5121 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6648 -57.1197 12.0139 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 
-2.8049 -57.3464 11.2822 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.5742 -56.9136 11.4820 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7364 -57.3419 10.3001 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7198 -58.5030 16.2901 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5010 -56.2171 16.2506 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7946 -58.5045 17.6831 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5759 -56.2186 17.6435 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0729 -57.3592 15.5739 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2227 -57.3625 18.3596 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3046 -57.3655 19.8487 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.3111 -54.6466 15.3638 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.2563 -53.8710 14.6948 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.8121 -54.3645 13.5073 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.6178 -52.1004 11.5954 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.4057 -52.3452 11.0066 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.0836 -54.8943 14.7675 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.9949 -53.3368 13.4353 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.7496 -53.5881 12.8303 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.9229 -52.5915 12.8100 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4640 -53.0881 11.6153 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.6585 -59.9707 15.9992 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.1564 -60.0645 16.2196 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3350 -59.3635 15.5577 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 8 9 1 0 0 0\r\n", " 10 12 1 0 0 0\r\n", " 20 24 1 0 0 0\r\n", " 21 23 1 0 0 0\r\n", " 27 32 1 0 0 0\r\n", " 22 25 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 29 34 1 0 0 0\r\n", " 24 14 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 33 34 1 0 0 0\r\n", " 13 17 1 0 0 0\r\n", " 15 1 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 17 1 0 0 0\r\n", " 2 6 1 0 0 0\r\n", " 3 1 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 4 32 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 26 25 1 0 0 0\r\n", " 37 39 1 0 0 0\r\n", " 38 39 1 0 0 0\r\n", " 39 2 1 0 0 0\r\n", " 6 5 1 0 0 0\r\n", " 8 10 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 20 22 2 0 0 0\r\n", " 21 24 2 0 0 0\r\n", " 27 28 2 0 0 
0\r\n", " 23 25 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 30 31 2 0 0 0\r\n", " 30 35 2 0 0 0\r\n", " 31 36 2 0 0 0\r\n", " 12 13 2 0 0 0\r\n", " 33 35 2 0 0 0\r\n", " 34 36 2 0 0 0\r\n", " 14 15 2 0 0 0\r\n", " 1 5 2 0 0 0\r\n", " 16 18 2 0 0 0\r\n", " 2 7 2 0 0 0\r\n", " 17 19 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.83433\r\n", "\r\n", "> \r\n", "1.45522\r\n", "\r\n", "> \r\n", "475.372\r\n", "\r\n", "$$$$\r\n", "ZINC78996542\r\n", "\r\n", "\r\n", " 39 44 0 0 0 0 0 0 0 0999 V2000\r\n", " -0.5722 -56.8468 14.3132 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3170 -57.8869 15.8829 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.3244 -55.4995 14.9316 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.0775 -55.7161 15.4874 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.3140 -57.7556 14.5698 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2862 -57.1582 15.3202 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.1923 -57.4012 16.6007 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.5452 -57.9747 12.9290 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.3911 -57.9310 11.5468 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.4434 -57.7729 13.7658 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.1352 -57.6858 10.9997 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.1676 -57.5240 13.2330 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.0362 -57.4846 11.8422 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9733 -57.3042 14.0987 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.7756 -57.0691 13.5066 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6429 -57.0290 12.0089 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7832 -57.2393 11.2727 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.5516 -56.8149 11.4815 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7144 -57.2159 10.2909 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7038 -58.4927 16.2574 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4766 -56.2038 16.2618 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7789 -58.5209 17.6500 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5521 -56.2319 17.6545 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0525 -57.3342 15.5634 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2031 -57.3905 18.3484 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2855 -57.4221 19.8372 C 0 0 0 0 0 
0 0 0 0 0 0 0\r\n", " 3.3565 -54.6510 15.3845 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.3151 -53.8880 14.7200 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.8755 -54.3614 13.5148 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.7196 -52.1345 11.6161 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.5100 -52.3693 11.0185 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.1314 -54.8885 14.7794 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.0693 -53.3564 13.4560 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.8264 -53.5977 12.8421 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 5.0099 -52.6234 12.8352 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.5559 -53.0999 11.6229 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.6348 -59.9861 16.0001 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.1325 -60.0490 16.2302 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3170 -59.3619 15.5646 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 8 9 1 0 0 0\r\n", " 10 12 1 0 0 0\r\n", " 20 24 1 0 0 0\r\n", " 21 23 1 0 0 0\r\n", " 27 32 1 0 0 0\r\n", " 22 25 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 29 34 1 0 0 0\r\n", " 24 14 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 33 34 1 0 0 0\r\n", " 13 17 1 0 0 0\r\n", " 15 1 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 17 1 0 0 0\r\n", " 2 6 1 0 0 0\r\n", " 3 1 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 4 32 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 26 25 1 0 0 0\r\n", " 37 39 1 0 0 0\r\n", " 38 39 1 0 0 0\r\n", " 39 2 1 0 0 0\r\n", " 6 5 1 0 0 0\r\n", " 8 10 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 20 22 2 0 0 0\r\n", " 21 24 2 0 0 0\r\n", " 27 28 2 0 0 0\r\n", " 23 25 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 30 31 2 0 0 0\r\n", " 30 35 2 0 0 0\r\n", " 31 36 2 0 0 0\r\n", " 12 13 2 0 0 0\r\n", " 33 35 2 0 0 0\r\n", " 34 36 2 0 0 0\r\n", " 14 15 2 0 0 0\r\n", " 1 5 2 0 0 0\r\n", " 16 18 2 0 0 0\r\n", " 2 7 2 0 0 0\r\n", " 17 19 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.7915\r\n", "\r\n", "> \r\n", "1.18555\r\n", "\r\n", "> \r\n", "475.372\r\n", "\r\n", "$$$$\r\n", "ZINC78996534\r\n", "\r\n", "\r\n", " 39 44 0 0 0 0 0 0 0 0999 V2000\r\n", " -0.6060 -58.4259 14.4308 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.2622 -57.0761 
15.7885 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.3848 -59.6010 15.3414 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.0076 -59.2726 15.8653 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.2852 -57.4884 14.4946 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2358 -57.9070 15.3815 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.1167 -57.3919 16.6176 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.5490 -57.6468 12.7169 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.3640 -57.9721 11.3766 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.4660 -57.6655 13.6007 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.0959 -58.3170 10.9188 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.1782 -58.0108 13.1580 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.0156 -58.3341 11.8081 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0030 -58.0395 14.0758 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.7918 -58.3829 13.5703 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6256 -58.7300 12.1163 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7497 -58.6825 11.3283 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.5225 -59.0404 11.6676 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.6589 -58.9060 10.3738 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5198 -58.6843 16.4130 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.8163 -56.4211 15.9438 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6259 -58.3704 17.7678 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9225 -56.1072 17.2987 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1149 -57.7096 15.5009 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3273 -57.0819 18.2108 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4433 -56.7463 19.6592 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.8030 -59.9702 14.2440 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.7737 -60.8749 13.8173 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3083 -61.4219 16.0936 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 5.1631 -64.0341 14.7974 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.4366 -64.3052 15.9262 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.0669 -60.2450 15.3869 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.0225 -62.0537 14.5162 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.2759 -62.3324 15.6757 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.9642 -62.9109 14.0844 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.4901 -63.4609 16.3744 
N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2882 -54.7944 15.8358 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.6925 -55.1298 15.1839 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.2860 -55.7119 15.1441 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 8 9 1 0 0 0\r\n", " 10 12 1 0 0 0\r\n", " 20 24 1 0 0 0\r\n", " 21 23 1 0 0 0\r\n", " 27 32 1 0 0 0\r\n", " 22 25 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 29 34 1 0 0 0\r\n", " 24 14 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 33 34 1 0 0 0\r\n", " 13 17 1 0 0 0\r\n", " 15 1 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 17 1 0 0 0\r\n", " 2 6 1 0 0 0\r\n", " 3 1 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 4 32 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 26 25 1 0 0 0\r\n", " 37 39 1 0 0 0\r\n", " 38 39 1 0 0 0\r\n", " 39 2 1 0 0 0\r\n", " 6 5 1 0 0 0\r\n", " 8 10 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 20 22 2 0 0 0\r\n", " 21 24 2 0 0 0\r\n", " 27 28 2 0 0 0\r\n", " 23 25 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 30 31 2 0 0 0\r\n", " 30 35 2 0 0 0\r\n", " 31 36 2 0 0 0\r\n", " 12 13 2 0 0 0\r\n", " 33 35 2 0 0 0\r\n", " 34 36 2 0 0 0\r\n", " 14 15 2 0 0 0\r\n", " 1 5 2 0 0 0\r\n", " 16 18 2 0 0 0\r\n", " 2 7 2 0 0 0\r\n", " 17 19 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.60183\r\n", "\r\n", "> \r\n", "2.26383\r\n", "\r\n", "> \r\n", "475.372\r\n", "\r\n", "$$$$\r\n", "ZINC78996542\r\n", "\r\n", "\r\n", " 39 44 0 0 0 0 0 0 0 0999 V2000\r\n", " -1.1562 -57.7105 14.6555 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.0859 -57.6546 15.8298 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.1390 -56.2126 14.7797 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.3392 -55.9873 15.0720 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.0584 -58.3149 14.9816 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.8469 -57.3439 15.3020 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.9188 -56.8145 16.1731 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.8917 -60.1405 14.9063 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.9219 -60.5932 13.5908 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.7606 -59.4813 15.3968 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.8212 -60.3875 12.7645 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 
-4.6396 -59.2637 14.5786 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.6919 -59.7269 13.2610 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4188 -58.5636 15.0722 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.3780 -58.3922 14.2190 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.4425 -58.8944 12.8026 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5968 -59.5297 12.4148 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.4927 -58.7341 12.0364 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6562 -59.8644 11.4908 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7756 -58.8669 17.4469 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.7367 -56.7639 16.7466 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.6706 -58.3845 18.7515 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6320 -56.2815 18.0513 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3085 -58.0565 16.4443 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0988 -57.0919 19.0536 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9885 -56.5777 20.4491 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.6434 -55.4303 12.6355 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.3219 -54.7691 11.6131 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.1612 -54.4616 14.2263 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.1096 -52.5500 11.1925 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.5256 -52.3975 12.4883 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.0650 -55.2760 13.9477 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4186 -53.9532 11.8808 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.8462 -53.7965 13.2121 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.0567 -53.3258 10.8773 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.9009 -53.0163 13.5062 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4158 -59.7838 14.5992 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.6872 -59.3395 16.7216 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3789 -59.1280 15.9719 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 8 9 1 0 0 0\r\n", " 10 12 1 0 0 0\r\n", " 20 24 1 0 0 0\r\n", " 21 23 1 0 0 0\r\n", " 27 32 1 0 0 0\r\n", " 22 25 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 29 34 1 0 0 0\r\n", " 24 14 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 33 34 1 0 0 0\r\n", " 13 17 1 0 0 0\r\n", " 15 1 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 17 1 0 0 0\r\n", " 2 6 1 0 0 
0\r\n", " 3 1 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 4 32 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 26 25 1 0 0 0\r\n", " 37 39 1 0 0 0\r\n", " 38 39 1 0 0 0\r\n", " 39 2 1 0 0 0\r\n", " 6 5 1 0 0 0\r\n", " 8 10 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 20 22 2 0 0 0\r\n", " 21 24 2 0 0 0\r\n", " 27 28 2 0 0 0\r\n", " 23 25 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 30 31 2 0 0 0\r\n", " 30 35 2 0 0 0\r\n", " 31 36 2 0 0 0\r\n", " 12 13 2 0 0 0\r\n", " 33 35 2 0 0 0\r\n", " 34 36 2 0 0 0\r\n", " 14 15 2 0 0 0\r\n", " 1 5 2 0 0 0\r\n", " 16 18 2 0 0 0\r\n", " 2 7 2 0 0 0\r\n", " 17 19 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.58798\r\n", "\r\n", "> \r\n", "1.876\r\n", "\r\n", "> \r\n", "475.372\r\n", "\r\n", "$$$$\r\n", "ZINC35448294\r\n", "\r\n", "\r\n", " 33 38 0 0 0 0 0 0 0 0999 V2000\r\n", " 6.2193 -51.5392 13.7822 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 6.5893 -51.9773 15.0518 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 5.1156 -52.0958 13.1259 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 5.8703 -52.9821 15.7052 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.3763 -53.1129 13.7626 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.2347 -53.8835 13.4110 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.7696 -53.5311 15.0379 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.9678 -54.7425 14.4550 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.2954 -56.1404 13.1784 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3853 -53.8636 12.1948 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.6865 -55.2195 12.0267 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.8500 -55.7376 14.5366 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.8912 -54.5146 15.4387 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.0317 -55.6866 13.2732 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.9662 -56.1848 12.1474 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.9233 -54.9834 16.3038 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9363 -58.4538 18.1389 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3508 -57.1319 18.2899 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.1432 -58.8360 17.0510 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9875 -56.1521 17.3616 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.9885 -57.8949 14.9095 C 0 0 0 0 0 0 0 0 0 0 
0 0\r\n", " -1.7643 -57.8655 16.1021 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.1944 -56.5486 16.2790 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.9633 -56.6191 14.3950 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6932 -55.8143 15.2279 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.8386 -54.8497 15.0945 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4658 -57.5272 16.2097 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.6382 -58.0380 13.8546 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.9083 -58.8130 16.5209 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.0807 -59.3238 14.1659 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3307 -57.1398 14.8765 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.2157 -59.7112 15.4991 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.7617 -61.2970 15.8834 Cl 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 17 18 1 0 0 0\r\n", " 1 2 1 0 0 0\r\n", " 19 22 1 0 0 0\r\n", " 3 5 1 0 0 0\r\n", " 27 31 1 0 0 0\r\n", " 28 30 1 0 0 0\r\n", " 20 23 1 0 0 0\r\n", " 4 7 1 0 0 0\r\n", " 29 32 1 0 0 0\r\n", " 21 22 1 0 0 0\r\n", " 5 6 1 0 0 0\r\n", " 23 25 1 0 0 0\r\n", " 7 13 1 0 0 0\r\n", " 32 33 1 0 0 0\r\n", " 24 9 1 0 0 0\r\n", " 24 25 1 0 0 0\r\n", " 8 13 1 0 0 0\r\n", " 9 14 1 0 0 0\r\n", " 10 6 1 0 0 0\r\n", " 10 11 1 0 0 0\r\n", " 11 14 1 0 0 0\r\n", " 12 31 1 0 0 0\r\n", " 12 8 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 17 19 2 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 18 20 2 0 0 0\r\n", " 2 4 2 0 0 0\r\n", " 27 29 2 0 0 0\r\n", " 28 31 2 0 0 0\r\n", " 30 32 2 0 0 0\r\n", " 21 24 2 0 0 0\r\n", " 22 23 2 0 0 0\r\n", " 5 7 2 0 0 0\r\n", " 6 8 2 0 0 0\r\n", " 9 15 2 0 0 0\r\n", " 25 26 1 0 0 0\r\n", " 13 16 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.52352\r\n", "\r\n", "> \r\n", "6.72818\r\n", "\r\n", "> \r\n", "407.767\r\n", "\r\n", "$$$$\r\n", "ZINC72314638\r\n", "\r\n", "\r\n", " 34 38 0 0 0 0 0 0 0 0999 V2000\r\n", " -6.9192 -60.0249 14.8267 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.9224 -60.5532 13.5394 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.7591 -59.4408 15.3438 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.7655 -60.4981 12.7677 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.5814 -59.3754 14.5808 C 0 0 0 0 0 
0 0 0 0 0 0 0\r\n", " -4.6074 -59.9121 13.2905 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3284 -58.7584 15.1036 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.2330 -58.7339 14.3037 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.2694 -59.3144 12.9167 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4559 -59.8663 12.4993 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.2700 -59.2872 12.1989 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4977 -60.2504 11.5937 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.9776 -58.1386 14.7709 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.1138 -55.4726 15.4117 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.0692 -59.0069 15.4111 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2024 -57.9981 15.5499 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4884 -55.4636 16.0338 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.6963 -56.8766 14.7008 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.5616 -56.7232 15.2099 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.5532 -54.4163 15.1165 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.3449 -58.4107 12.4428 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.5462 -58.8702 12.9824 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.2658 -58.1304 13.2810 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.6683 -59.0498 14.3602 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3878 -58.3101 14.6587 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.5892 -58.7698 15.1985 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.7434 -58.9704 16.6705 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.8700 -58.9787 17.5299 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5363 -56.8305 16.6476 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7893 -58.4284 18.8090 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4557 -56.2801 17.9268 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2435 -58.1798 16.4491 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0821 -57.0791 19.0075 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9979 -56.4920 20.3757 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1 2 1 0 0 0\r\n", " 21 22 1 0 0 0\r\n", " 3 5 1 0 0 0\r\n", " 28 32 1 0 0 0\r\n", " 29 31 1 0 0 0\r\n", " 23 25 1 0 0 0\r\n", " 24 26 1 0 0 0\r\n", " 30 33 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 32 7 1 0 0 0\r\n", " 5 7 1 0 0 0\r\n", " 6 10 1 0 0 0\r\n", " 8 13 1 0 0 0\r\n", " 8 9 
1 0 0 0\r\n", " 9 10 1 0 0 0\r\n", " 14 19 1 0 0 0\r\n", " 15 13 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 25 1 0 0 0\r\n", " 16 19 1 0 0 0\r\n", " 34 33 1 0 0 0\r\n", " 27 26 1 0 0 0\r\n", " 17 14 1 0 0 0\r\n", " 19 18 1 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 21 23 2 0 0 0\r\n", " 22 24 2 0 0 0\r\n", " 2 4 2 0 0 0\r\n", " 28 30 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 31 33 2 0 0 0\r\n", " 5 6 2 0 0 0\r\n", " 25 26 2 0 0 0\r\n", " 7 8 2 0 0 0\r\n", " 13 18 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 14 20 2 0 0 0\r\n", " 10 12 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.51168\r\n", "\r\n", "> \r\n", "1.81673\r\n", "\r\n", "> \r\n", "411.326\r\n", "\r\n", "$$$$\r\n", "ZINC72314638\r\n", "\r\n", "\r\n", " 34 38 0 0 0 0 0 0 0 0999 V2000\r\n", " -6.9192 -60.0250 14.8266 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.9223 -60.5532 13.5393 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.7590 -59.4409 15.3438 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.7655 -60.4980 12.7677 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.5813 -59.3753 14.5807 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.6073 -59.9120 13.2905 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3283 -58.7583 15.1036 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.2330 -58.7336 14.3037 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.2693 -59.3140 12.9166 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4559 -59.8659 12.4992 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.2700 -59.2867 12.1988 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4977 -60.2499 11.5936 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.9776 -58.1384 14.7708 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.1137 -55.4723 15.4114 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.0691 -59.0065 15.4111 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2024 -57.9977 15.5499 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4883 -55.4632 16.0337 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.6963 -56.8762 14.7006 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.5616 -56.7229 15.2098 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.5532 -54.4160 15.1161 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.3452 -58.4103 12.4429 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.5464 -58.8703 12.9827 C 0 0 0 0 
0 0 0 0 0 0 0 0\r\n", " 2.2660 -58.1300 13.2811 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 4.6682 -59.0500 14.3606 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.3879 -58.3098 14.6589 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.5890 -58.7698 15.1986 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 3.7431 -58.9706 16.6707 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.8702 -58.9787 17.5299 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5360 -56.8304 16.6476 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.7895 -58.4286 18.8091 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4554 -56.2801 17.9268 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2434 -58.1797 16.4491 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0819 -57.0792 19.0075 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9978 -56.4922 20.3758 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1 2 1 0 0 0\r\n", " 21 22 1 0 0 0\r\n", " 3 5 1 0 0 0\r\n", " 28 32 1 0 0 0\r\n", " 29 31 1 0 0 0\r\n", " 23 25 1 0 0 0\r\n", " 24 26 1 0 0 0\r\n", " 30 33 1 0 0 0\r\n", " 4 6 1 0 0 0\r\n", " 32 7 1 0 0 0\r\n", " 5 7 1 0 0 0\r\n", " 6 10 1 0 0 0\r\n", " 8 13 1 0 0 0\r\n", " 8 9 1 0 0 0\r\n", " 9 10 1 0 0 0\r\n", " 14 19 1 0 0 0\r\n", " 15 13 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 25 1 0 0 0\r\n", " 16 19 1 0 0 0\r\n", " 34 33 1 0 0 0\r\n", " 27 26 1 0 0 0\r\n", " 17 14 1 0 0 0\r\n", " 19 18 1 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 21 23 2 0 0 0\r\n", " 22 24 2 0 0 0\r\n", " 2 4 2 0 0 0\r\n", " 28 30 2 0 0 0\r\n", " 29 32 2 0 0 0\r\n", " 31 33 2 0 0 0\r\n", " 5 6 2 0 0 0\r\n", " 25 26 2 0 0 0\r\n", " 7 8 2 0 0 0\r\n", " 13 18 2 0 0 0\r\n", " 9 11 2 0 0 0\r\n", " 14 20 2 0 0 0\r\n", " 10 12 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.51156\r\n", "\r\n", "> \r\n", "2.07052\r\n", "\r\n", "> \r\n", "411.326\r\n", "\r\n", "$$$$\r\n", "ZINC39912421\r\n", "\r\n", "\r\n", " 35 39 0 0 0 0 0 0 0 0999 V2000\r\n", " -2.8637 -58.0485 14.3831 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6591 -57.4567 14.0111 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.7718 -57.6110 13.4624 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.0742 -58.1435 13.6832 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4965 -58.9601 
15.3604 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.8048 -56.6783 12.9233 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1200 -56.7955 12.6090 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.8956 -58.9553 14.8275 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.0866 -57.9453 13.0303 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5472 -56.3394 11.8484 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.0881 -58.8830 14.9464 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.1072 -57.9326 15.8719 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.3763 -57.6028 14.6449 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.3300 -59.0479 15.5599 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.6430 -56.6523 15.5704 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.4011 -56.4874 14.9569 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.8249 -60.4168 15.8846 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4900 -55.4712 15.9138 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.0441 -55.2299 14.6663 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.4836 -54.4853 14.8788 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9761 -58.9905 17.8000 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6595 -56.9124 16.7746 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.8625 -58.3455 19.0315 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5458 -56.2672 18.0061 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3746 -58.2738 16.6715 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1474 -56.9839 19.1346 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0409 -56.3650 20.3177 F 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.9715 -59.7155 15.4305 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -8.5076 -61.3824 13.1820 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.7831 -62.4617 12.6762 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.9064 -60.4967 14.0761 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.4575 -62.6553 13.0647 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.5808 -60.6902 14.4647 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.8563 -61.7695 13.9589 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.2144 -62.0410 14.4164 Cl 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 29 30 1 0 0 0\r\n", " 21 25 1 0 0 0\r\n", " 22 24 1 0 0 0\r\n", " 31 33 1 0 0 0\r\n", " 23 26 1 0 0 0\r\n", " 32 34 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 13 2 1 0 0 0\r\n", 
" 1 2 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 19 1 0 0 0\r\n", " 26 27 1 0 0 0\r\n", " 34 35 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 3 7 1 0 0 0\r\n", " 4 8 1 0 0 0\r\n", " 5 25 1 0 0 0\r\n", " 5 1 1 0 0 0\r\n", " 5 8 1 0 0 0\r\n", " 17 14 1 0 0 0\r\n", " 18 15 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 28 8 1 0 0 0\r\n", " 7 6 1 0 0 0\r\n", " 29 31 2 0 0 0\r\n", " 30 32 2 0 0 0\r\n", " 21 23 2 0 0 0\r\n", " 22 25 2 0 0 0\r\n", " 24 26 2 0 0 0\r\n", " 11 14 2 0 0 0\r\n", " 12 15 2 0 0 0\r\n", " 13 16 2 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 33 34 2 0 0 0\r\n", " 2 6 2 0 0 0\r\n", " 4 9 2 0 0 0\r\n", " 7 10 1 0 0 0\r\n", " 19 20 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.49363\r\n", "\r\n", "> \r\n", "1.94002\r\n", "\r\n", "> \r\n", "442.764\r\n", "\r\n", "$$$$\r\n", "ZINC39912421\r\n", "\r\n", "\r\n", " 35 39 0 0 0 0 0 0 0 0999 V2000\r\n", " -2.8637 -58.0484 14.3832 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.6591 -57.4566 14.0111 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.7718 -57.6110 13.4624 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.0743 -58.1434 13.6832 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4964 -58.9600 15.3605 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.8048 -56.6783 12.9233 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1200 -56.7954 12.6090 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.8956 -58.9552 14.8276 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.0865 -57.9453 13.0303 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5473 -56.3392 11.8484 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.0881 -58.8830 14.9464 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.1074 -57.9325 15.8720 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.3763 -57.6028 14.6449 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.3299 -59.0479 15.5600 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.6428 -56.6523 15.5704 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.4011 -56.4874 14.9569 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.8249 -60.4168 15.8846 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4900 -55.4712 15.9137 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.0441 -55.2299 14.6664 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.4836 -54.4854 14.8789 H 0 0 0 0 0 0 0 0 0 0 
0 0\r\n", " -2.9763 -58.9904 17.8002 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6592 -56.9122 16.7746 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.8627 -58.3454 19.0317 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5455 -56.2671 18.0061 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.3746 -58.2738 16.6716 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1473 -56.9838 19.1347 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0408 -56.3649 20.3178 F 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.9714 -59.7154 15.4306 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -8.5075 -61.3823 13.1820 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.7831 -62.4617 12.6762 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.9063 -60.4966 14.0762 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.4575 -62.6552 13.0648 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.5808 -60.6901 14.4648 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.8563 -61.7695 13.9591 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -4.2143 -62.0411 14.4166 Cl 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 29 30 1 0 0 0\r\n", " 21 25 1 0 0 0\r\n", " 22 24 1 0 0 0\r\n", " 31 33 1 0 0 0\r\n", " 23 26 1 0 0 0\r\n", " 32 34 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 13 2 1 0 0 0\r\n", " 1 2 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 19 1 0 0 0\r\n", " 26 27 1 0 0 0\r\n", " 34 35 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 3 7 1 0 0 0\r\n", " 4 8 1 0 0 0\r\n", " 5 25 1 0 0 0\r\n", " 5 1 1 0 0 0\r\n", " 5 8 1 0 0 0\r\n", " 17 14 1 0 0 0\r\n", " 18 15 1 0 0 0\r\n", " 28 33 1 0 0 0\r\n", " 28 8 1 0 0 0\r\n", " 7 6 1 0 0 0\r\n", " 29 31 2 0 0 0\r\n", " 30 32 2 0 0 0\r\n", " 21 23 2 0 0 0\r\n", " 22 25 2 0 0 0\r\n", " 24 26 2 0 0 0\r\n", " 11 14 2 0 0 0\r\n", " 12 15 2 0 0 0\r\n", " 13 16 2 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 33 34 2 0 0 0\r\n", " 2 6 2 0 0 0\r\n", " 4 9 2 0 0 0\r\n", " 7 10 1 0 0 0\r\n", " 19 20 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.49359\r\n", "\r\n", "> \r\n", "2.12367\r\n", "\r\n", "> \r\n", "442.764\r\n", "\r\n", "$$$$\r\n", "ZINC39912344\r\n", "\r\n", "\r\n", " 35 39 0 0 0 0 0 0 0 0999 V2000\r\n", " -2.9655 -58.0579 14.4075 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.7487 
-57.5053 14.0153 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.8718 -57.6016 13.4941 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.1866 -58.0933 13.7357 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6126 -58.9419 15.4006 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -1.8851 -56.7323 12.9225 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.2070 -56.8131 12.6256 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.0176 -58.9002 14.8849 N 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.2006 -57.8708 13.0937 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6301 -56.3508 11.8663 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.0185 -58.9767 14.9117 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.0258 -58.0767 15.8325 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.4630 -57.6840 14.6346 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.2258 -59.1731 15.5107 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.5813 -56.7838 15.5554 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.3369 -56.5875 14.9564 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 1.6995 -60.5554 15.8092 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 2.4524 -55.6234 15.9088 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -0.0884 -55.3180 14.6896 O 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 0.4544 -54.5863 14.9086 H 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0661 -58.9675 17.8345 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.6938 -56.8776 16.7976 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -2.9180 -58.3157 19.0588 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.5456 -56.2257 18.0218 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.4539 -58.2484 16.7039 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.1577 -56.9448 19.1524 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -3.0181 -56.3191 20.3285 F 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.1077 -59.6230 15.5080 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -8.0602 -62.0396 13.3356 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.9663 -63.0767 13.9497 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.7088 -60.9008 14.0603 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -5.6148 -61.9379 14.6743 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.1891 -63.1276 13.2804 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -6.4861 -60.8499 14.7295 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " -7.5641 -64.3456 12.5058 C 0 0 0 0 0 0 0 0 0 0 0 0\r\n", " 21 25 1 
0 0 0\r\n", " 22 24 1 0 0 0\r\n", " 29 33 1 0 0 0\r\n", " 30 32 1 0 0 0\r\n", " 31 34 1 0 0 0\r\n", " 23 26 1 0 0 0\r\n", " 11 13 1 0 0 0\r\n", " 12 14 1 0 0 0\r\n", " 13 2 1 0 0 0\r\n", " 1 2 1 0 0 0\r\n", " 15 16 1 0 0 0\r\n", " 16 19 1 0 0 0\r\n", " 26 27 1 0 0 0\r\n", " 3 4 1 0 0 0\r\n", " 3 7 1 0 0 0\r\n", " 4 8 1 0 0 0\r\n", " 5 25 1 0 0 0\r\n", " 5 1 1 0 0 0\r\n", " 5 8 1 0 0 0\r\n", " 35 33 1 0 0 0\r\n", " 17 14 1 0 0 0\r\n", " 18 15 1 0 0 0\r\n", " 28 34 1 0 0 0\r\n", " 28 8 1 0 0 0\r\n", " 7 6 1 0 0 0\r\n", " 21 23 2 0 0 0\r\n", " 22 25 2 0 0 0\r\n", " 29 31 2 0 0 0\r\n", " 30 33 2 0 0 0\r\n", " 32 34 2 0 0 0\r\n", " 24 26 2 0 0 0\r\n", " 11 14 2 0 0 0\r\n", " 12 15 2 0 0 0\r\n", " 13 16 2 0 0 0\r\n", " 1 3 2 0 0 0\r\n", " 2 6 2 0 0 0\r\n", " 4 9 2 0 0 0\r\n", " 7 10 1 0 0 0\r\n", " 19 20 1 0 0 0\r\n", "M END\r\n", "> \r\n", "-7.4906\r\n", "\r\n", "> \r\n", "1.79745\r\n", "\r\n", "> \r\n", "419.322\r\n", "\r\n", "$$$$\r\n" ] } ], "source": [ "!cat ../files/best.sdf" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`sdf` files can carry arbitrary named data fields after each molecule's `M END` line. Each field is a `>  <name>` header followed by its value and a blank line, and `$$$$` ends the record:\n", "\n", "    M END\n", "    >  <minimizedAffinity>\n", "    -7.83433\n", "\n", "    >  <minimizedRMSD>\n", "    1.45522\n", "\n", "    >  <molecular weight>\n", "    475.372\n", "\n", "    $$$$" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'MOL Chiral Flag': '0', 'minimizedAffinity': '-7.83433', 'minimizedRMSD': '1.45522', 'molecular weight': '475.372', 'OpenBabel Symmetry Classes': '27 24 14 23 12 26 2 6 7 16 15 31 35 32 29 34 30 4 5 13 13 11 11 28 21 1 10 17 20 9 8 25 36 33 18 19 3 3 22'}" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols[0].data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Beyond Pybel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that Pybel is a Python-native wrapper around the OpenBabel SWIG bindings. 
The underlying OpenBabel objects are always accessible if you need the additional functionality that OpenBabel provides (this may be necessary if you are modifying or creating molecule objects)." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " >\n" ] } ], "source": [ "obmol = mol.OBMol  # the OpenBabel OBMol object wrapped by the Pybel molecule\n", "vec = obmol.Center(0)\n", "print(vec)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7085902947538542 -0.016861359909934506 -0.0005521894831451903\n" ] } ], "source": [ "print(vec.GetX(), vec.GetY(), vec.GetZ())" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['AddAtom',\n", " 'AddBond',\n", " 'AddConformer',\n", " 'AddHydrogens',\n", " 'AddNewHydrogens',\n", " 'AddNonPolarHydrogens',\n", " 'AddPolarHydrogens',\n", " 'AddResidue',\n", " 'Align',\n", " 'AreInSameRing',\n", " 'AssignSpinMultiplicity',\n", " 'AssignTotalChargeToAtoms',\n", " 'AutomaticFormalCharge',\n", " 'AutomaticPartialCharge',\n", " 'BeginAtom',\n", " 'BeginAtoms',\n", " 'BeginBond',\n", " 'BeginBonds',\n", " 'BeginConformer',\n", " 'BeginData',\n", " 'BeginInternalCoord',\n", " 'BeginModify',\n", " 'BeginResidue',\n", " 'BeginResidues',\n", " 'CBeginAtoms',\n", " 'CEndAtoms',\n", " 'Center',\n", " 'ClassDescription',\n", " 'Clear',\n", " 'CloneData',\n", " 'ConnectTheDots',\n", " 'ContigFragList',\n", " 'ConvertDativeBonds',\n", " 'ConvertZeroBonds',\n", " 'CopyConformer',\n", " 'CopySubstructure',\n", " 'CorrectForPH',\n", " 'DataSize',\n", " 'DecrementMod',\n", " 'DeleteAtom',\n", " 'DeleteBond',\n", " 'DeleteConformer',\n", " 'DeleteData',\n", " 'DeleteHydrogen',\n", " 'DeleteHydrogens',\n", " 'DeleteNonPolarHydrogens',\n", " 'DeletePolarHydrogens',\n", " 'DeleteResidue',\n", " 'DestroyAtom',\n", " 'DestroyBond',\n", " 'DestroyResidue',\n", " 'DoTransformations',\n", " 
'Empty',\n", " 'EndAtom',\n", " 'EndAtoms',\n", " 'EndBond',\n", " 'EndBonds',\n", " 'EndData',\n", " 'EndModify',\n", " 'EndResidue',\n", " 'EndResidues',\n", " 'FindAngles',\n", " 'FindChildren',\n", " 'FindLSSR',\n", " 'FindLargestFragment',\n", " 'FindRingAtomsAndBonds',\n", " 'FindSSSR',\n", " 'FindTorsions',\n", " 'GetAllData',\n", " 'GetAngle',\n", " 'GetAtom',\n", " 'GetAtomById',\n", " 'GetBond',\n", " 'GetBondById',\n", " 'GetConformer',\n", " 'GetConformers',\n", " 'GetCoordinates',\n", " 'GetData',\n", " 'GetDimension',\n", " 'GetEnergies',\n", " 'GetEnergy',\n", " 'GetExactMass',\n", " 'GetFirstAtom',\n", " 'GetFlags',\n", " 'GetFormula',\n", " 'GetGIDVector',\n", " 'GetGIVector',\n", " 'GetGTDVector',\n", " 'GetInternalCoord',\n", " 'GetLSSR',\n", " 'GetMod',\n", " 'GetMolWt',\n", " 'GetNextFragment',\n", " 'GetResidue',\n", " 'GetSSSR',\n", " 'GetSpacedFormula',\n", " 'GetTitle',\n", " 'GetTorsion',\n", " 'GetTotalCharge',\n", " 'GetTotalSpinMultiplicity',\n", " 'Has2D',\n", " 'Has3D',\n", " 'HasAromaticPerceived',\n", " 'HasAtomTypesPerceived',\n", " 'HasChainsPerceived',\n", " 'HasChiralityPerceived',\n", " 'HasClosureBondsPerceived',\n", " 'HasData',\n", " 'HasFlag',\n", " 'HasHybridizationPerceived',\n", " 'HasHydrogensAdded',\n", " 'HasLSSRPerceived',\n", " 'HasNonZeroCoords',\n", " 'HasPartialChargesPerceived',\n", " 'HasRingAtomsAndBondsPerceived',\n", " 'HasRingTypesPerceived',\n", " 'HasSSSRPerceived',\n", " 'HasSpinMultiplicityAssigned',\n", " 'IncrementMod',\n", " 'InsertAtom',\n", " 'IsCorrectedForPH',\n", " 'IsPeriodic',\n", " 'IsReaction',\n", " 'MakeDativeBonds',\n", " 'NewAtom',\n", " 'NewBond',\n", " 'NewResidue',\n", " 'NextConformer',\n", " 'NextInternalCoord',\n", " 'NumAtoms',\n", " 'NumBonds',\n", " 'NumConformers',\n", " 'NumHvyAtoms',\n", " 'NumResidues',\n", " 'NumRotors',\n", " 'PerceiveBondOrders',\n", " 'RenumberAtoms',\n", " 'ReserveAtoms',\n", " 'Rotate',\n", " 'Separate',\n", " 'SetAromaticPerceived',\n", " 
'SetAtomTypesPerceived',\n", " 'SetAutomaticFormalCharge',\n", " 'SetAutomaticPartialCharge',\n", " 'SetChainsPerceived',\n", " 'SetChiralityPerceived',\n", " 'SetClosureBondsPerceived',\n", " 'SetConformer',\n", " 'SetConformers',\n", " 'SetCoordinates',\n", " 'SetCorrectedForPH',\n", " 'SetData',\n", " 'SetDimension',\n", " 'SetEnergies',\n", " 'SetEnergy',\n", " 'SetFlag',\n", " 'SetFlags',\n", " 'SetFormula',\n", " 'SetHybridizationPerceived',\n", " 'SetHydrogensAdded',\n", " 'SetInternalCoord',\n", " 'SetIsPatternStructure',\n", " 'SetIsReaction',\n", " 'SetLSSRPerceived',\n", " 'SetPartialChargesPerceived',\n", " 'SetPeriodicMol',\n", " 'SetRingAtomsAndBondsPerceived',\n", " 'SetRingTypesPerceived',\n", " 'SetSSSRPerceived',\n", " 'SetSpinMultiplicityAssigned',\n", " 'SetTitle',\n", " 'SetTorsion',\n", " 'SetTotalCharge',\n", " 'SetTotalSpinMultiplicity',\n", " 'StripSalts',\n", " 'ToInertialFrame',\n", " 'Translate',\n", " 'UnsetFlag',\n", " '__class__',\n", " '__delattr__',\n", " '__dict__',\n", " '__dir__',\n", " '__doc__',\n", " '__eq__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__gt__',\n", " '__hash__',\n", " '__iadd__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__le__',\n", " '__lt__',\n", " '__module__',\n", " '__ne__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " '__swig_destroy__',\n", " '__weakref__',\n", " 'this',\n", " 'thisown']" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(obmol)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Project: Dimensionality-Reduced Molecules\n", "\n", "Given a SMILES file where the molecule names are property values (e.g. 
binding affinity), map the molecules into 2D space using PCA and visualize the data colored by the property.\n", "\n", " * Read the SMILES file\n", " * Save each molecule's title as the property value used for coloring\n", " * Compute a fingerprint for each molecule\n", " * Convert the fingerprint bits into a length-1024 array of zeros and ones (Pybel reports 1-based bit positions)\n", " * Use `sklearn.decomposition.PCA` to transform the fingerprints into 2D coordinates\n", " * Plot the coordinates, colored by the property" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.decomposition import PCA\n", "\n", "# Fill in fps with the fingerprint bit arrays and yvals with the title values\n", "pca = PCA(n_components=2)\n", "res = pca.fit_transform(fps)\n", "\n", "plt.scatter(res[:, 0], res[:, 1], c=yvals)\n", "plt.gca().set_aspect('equal', adjustable='box');" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2023-10-30 20:19:10-- http://mscbio2025.csb.pitt.edu/files/er.smi\r\n", "Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139\r\n", "Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected.\r\n", "HTTP request sent, awaiting response... 200 OK\r\n", "Length: 20022 (20K) [application/smil+xml]\r\n", "Saving to: ‘er.smi’\r\n", "\r\n", "\r", "er.smi 0%[ ] 0 --.-KB/s \r", "er.smi 100%[===================>] 19.55K --.-KB/s in 0s \r\n", "\r\n", "2023-10-30 20:19:10 (38.3 MB/s) - ‘er.smi’ saved [20022/20022]\r\n", "\r\n" ] } ], "source": [ "!wget http://mscbio2025.csb.pitt.edu/files/er.smi" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.decomposition import PCA\n", "\n", "yvals = []\n", "fps = []\n", "for mol in pybel.readfile('smi', 'er.smi'):\n", "    yvals.append(float(mol.title))  # the title holds the property value\n", "    fpbits = mol.calcfp().bits  # 1-based positions of the set bits\n", "    fp = np.zeros(1024)\n", "    fp[np.array(fpbits, dtype=int) - 1] = 1  # shift to 0-based indices\n", "    fps.append(fp)\n", "\n", "pca = PCA(n_components=2)\n", "res = pca.fit_transform(fps)\n", "\n", "plt.scatter(res[:, 0], res[:, 1], c=yvals)\n", "plt.gca().set_aspect('equal', adjustable='box');" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0.97673441]\n", " [0.97673441 1. ]]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEGCAYAAABsLkJ6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAg+UlEQVR4nO3df5RcZZ3n8fe3iwK7GaWDxBloEpJxmCAxkKw9kt3MeiQiYUQwIogRdtd11+ieHX8dJmMYGH44eMhs1h97dGdGBHfGA2T4EWzB4AQ1mUFyDEOHJoZAMiO6JBTMGDd0BNMcis53/6iqTnX1vVW3qm7Vrar7eZ3Th/StX0+RnOd7n+/zPN/H3B0REUmfvqQbICIiyVAAEBFJKQUAEZGUUgAQEUkpBQARkZQ6JukG1OOkk07yefPmJd0MEZGusmPHjl+6++zK610VAObNm8fo6GjSzRAR6Spm9mzQdaWARERSSgFARCSlFABERFJKAUBEJKUUAEREUqqrVgGJiPSikbEc6zfv5fnxCU4Z7GfNigWsXDJU87FmKQCIiCRoZCzH1fftYiI/CUBufIKr79s19XjYY3EEAQUAEZEElO7sc+MTMx6byE+yfvPeqT8HPaYAICLShSrv+oM8HxAYojxWD00Ci4i02frNe6t2/gCnDPZzymB/6GNxUAAQEWmzWnfw/dkMa1YsYM2KBfRnM4GPxUEpIBGRNjtlsD8w9w8wFLDSR6uARES6SLXlm2tWLJgxB9CfzXDzJYtmdO4rlwzF1uFXUgAQEYlZtaWd5R16q+7so1IAEBGJWdAkb+XyzVbe2UelSWARkZiFTfLGtXwzLhoBiIhEFJbXr7x+Qn+W8Yn8jNfHtXwzLgoAIiIRhOX1R589yMYduWnXsxkj22fkj/jU6+NcvhkXBQARSbWoxdbC8vobHt3PpPu06/lJZ9ZAloFjj0l0krcWBQARSa2wu/p7Rvex/WcvzujYg4Q9Z/xwnrHrzo+1vXFTABCRVCm/4+8zm9GBT+Qn2fbMwcjvZ0BQCOi0fH8QBQARSY3KO/4od/i1DByb4YgzY1NXp+X7g2gZqIh0jZGxHMvWbWH+2k0sW7eFkbFcXa+PUoStXodfneTmSxYxNNiPUSjlELSjtxNpBCAiXaHW7toowurvNOOUwf6O2NTVCI0ARKQrVNtdG8XIWA6LuU3Zj
HVFqieMAoCIdIVmd9eu37w3cLK2KbG/YXspAIhIx6iW42/2cJRqgaKy5n5U+SMeeQTSiRQARKQjlHL8ufEJnKM5/lIQWLNiAdm+6UmcbF/0FExYoChN2g41uGyz0+r71COxAGBmc8xsq5k9ZWa7zezTSbVFRJIXKcdfmcSvI6lf7XStlUuG2LZ2eUNBoBvW+4dJcgTwGnCVu58JLAX+u5mdmWB7RCRBtXL86zfvJT85s+RCZQomLI20cslQzeWatUYTrTyeMQmJLQN19xeAF4p/fsnMngaGgKeSapOIJCfsmMTSHXaUSeAoB7FUW665cskQNz6wmxcPz6zkWTqqMelDXOLUEfsAzGwesAR4NOCx1cBqgLlz57a3YSLSNmHHJJbusGsFCIh2EEulymJwF5518rTqnuXt6Nb1/mESDwBm9hvARuAz7v6rysfd/RbgFoDh4eEuX3QlImFqHZNYK0BA9KWipU4/Nz4xrZZPbnyCjTtyfOBtQ2zdc6Bn7vTDJBoAzCxLofO/w93vS7ItIpK8anfYUc7RDTuI5YT+7NSfK9NElXeVE/lJtu45wLa1y5v8Np0vsQBgZgbcBjzt7l9Kqh0i0j1qpWAsZFVQ+fUo9YC6eWlnPZJcBbQM+A/AcjN7ovjzngTbIyJdbjxg8rbyepTOvZuXdtYjyVVAj1DXKl6R3hT1RCqpLWyi2IEln3+IC886OfAMgHLdvrSzHolPAoukWRwVLtPk2pFdU0cwZsxYdc4chk87cSqAntCfJZuxGfsFAF48nOf27fsC37c0ETyUsgBsHsOBCO0yPDzso6OjSTdDJDbL1m0JvGMdGuxPxSRkPa4d2RXYgWf6jMmyw9ezfcZr7kTt2jJmfPGDZ/d0p29mO9x9uPK6RgAiEbUiVdNshcs02fDo/sDr5Z0/FAq01WPSvac7/2pUDE4kgpGxHGvu3TmtUNmae3fWfSJVpWYrXKZJHMc3BjFo+u+xWykAiERw4wO7A+vQ3PjA7qbet1qBMjmqlR20Q1eXdG6GUkAiEQTVhql2Paoom5skng56INvH4fyRwMfSmnJTABBJWK/Vl2mFWmf5Dg3283wxPRfGMWYNZAODdlpTbgoAIhEMhpQYGCwrMSDxKp90r6W0Ymre2k2hz5nIT3LcMX30ZzNV6wmlieYARCK44eKFgadR3XDxwoRa1BvCavdfO7KLz971xNSke1S1DnQ5NJGveSZAmmgEIBKBcvXxC9sEN/rsQe7Yvq+h89aDKoaWO2WwXym3MgoAIhGp44hXWO3+DY/ub6jzh6OBOuhQlzSnesIoBSQiiQjL7de73r8y7bNyyRBj153PVy5frFRPDRoBiEgiwgq3ZWoUa6sUdlevEVttGgGISCLCNsGtOmfOjOvVjD57MO6mpYYCgIgkYuWSocAVOTetXDTj+pVLw88DD6sRJLUpBSQiiQlL0wRdDyvl3KoaQWmgEYCIdIVMyHmPYdelNgUAEWmpsM1e9Vp1zpy6rkttSgGJSEuMjOW44f7d00poNHPi2fBpJ3Ln9n2Ul3PrK16XxmgEICKxK+3yDaqfNJGfbKi65/rNe6ms5XmE9JZyjoNGACIyTTMnn5VeW6t6ZyPll3V6WvwUAERkSjOH1Fe+tppGyi+HbRxLaynnOCgAiHSQVpw7XI+w+jylNEtl20afPciGR/fXtRSz0Zo8QYXeVN+nOeZdtIZ2eHjYR0dHk26GSEsE3UH3ZzNtrWEzf+2m0EJslXX0Deou2jZrIMv1Fy1s+PskHSC7lZntcPfhyusaAYh0iGp33+3q5KrV56lsW711+uPorFXfJ15aBSTSITphkjOsPk+ju237sxm+cvlitq1dro67AyUaAMzsm2b2CzN7Msl2iHSCsMnMdk5ylurzzBo4etTlccf0Tfs9CpVg7g5Jp4D+Gvga8K2E2yGSuE6a5Hwlf3TF/fhEnmyf0WdwJMJA4Mqlc7lp5aIWtk7ikmgAcPeHzWxekm0Q6RSdcuxk0
FxE/ogTpeTOsRlT599Fkh4B1GRmq4HVAHPnhpeEFekFnTDJGTbnUGsaIJsx/selZzf12Vrl014dPwns7re4+7C7D8+ePTvp5oj0vLA5h2pVN4cG+1l/6dlNddalZbC58Qmco5vQGi0eJ7V1fAAQkfaq56SuOFf51NqEJvHr+BSQiLReZerlA28bYuueAzNSMcOnndiyFE29y2CVLmpeogHAzDYA7wROMrPngOvd/bYk2ySSJmElm+96bD/HHzuze2jlHEU9tX6aqVkkRyWaAnL3Ve5+srtn3f1Udf4i7VOtZHN+0hmfyLc1Fx+WegpaBqt0UTw0ByCSUkGdaJh2dK5hh8QH3dF3wq7pXqA5AJGUqrezbEfnGjXFpNLQ8dAIQKRH1TqLt97OspM613rSRRJOAUCkB0VZU79mxQIibO4FOq9zrSddJOGUAhLpQVFKS69cMsToswe5ffu+0Pcx6Ngllp2wa7rbaQQg0oOiTpLetHJR1Uqfndr5SzwUAER6SCnvH1a2JyiPf/1FC2fk00tUjqG3KQCI9IjyvH+QsDx+eT49iNbX9y4FAJEeUW1df61J0pVLhti2dnnopLDW1/cmTQKL9IiwTtqAbWuXR3oPra9PF40ARHpEHEdKan19uigAiPSIODpvra9PF6WARHpEXEdKan19eigAiPQQdd5SD6WARERSSgFARCSlFABERFJKAUBEJKUUAEREUkoBQEQkpbQMVKRDjIzlml7DL1IPBQCRDlCq5Fkq5lYqwwwoCEjLKABI6rT7TjvK50U5wUskbqEBwMy+CqHnSuDun2pJi0RaqN132iNjOdbcs5P8EZ/6vDX37JzxeVFP8BKJU7VJ4FFgR5Ufka5T7U67FW64f/dU51+SP+LccP/uadfiqOQpUq/QEYC7/007GyLSDu2+0x6fyEe6vmbFgmkjE1AZZmm9mnMAZjYb+BxwJvC60nV3j3bCRPX3vgD4X0AGuNXd1zX7niLVdOqBJ3FV8hSpR5RJ4DuAu4ALgU8A/wk40OwHm1kG+N/Au4HngMfM7H53f6rZ9xYJ08l32qrkKe0WJQC80d1vM7NPu/s/AP9gZo/F8NlvB37q7j8DMLO/Bd4HKABIy7TyTjtotc+sgSwvHp6ZBpo1kG3680SaFWUncOlf7wtmdqGZLQFOjOGzh4D9Zb8/V7w2jZmtNrNRMxs9cKDpgYd0uZGxHMvWbWH+2k0sW7eFkbFc3a9vVed/9X27yI1P4BxdXXThWSeTzUw/aj2bMa6/aGHTnynSrCgB4CYzOwG4Cvgj4Fbgsy1tVRl3v8Xdh919ePbs2e36WOlAYZ1s1CDQ7OurCVtdtHXPAdZfeva0IxbXX3q2Uj3SEWqmgNz9u8U/HgLOjfGzc8Ccst9PLV4TCdTsZqlWbraqtrpIuX3pVFFWAf0fAjaEuftHm/zsx4DTzWw+hY7/Q8CHm3xP6WHNLOEcGcsFrv6J+vpaOnV1kUg1UVJA3wU2FX9+CLwBeLnZD3b314A/BDYDTwN3u/vu6q+SNGt0s1Qp9VPv+9ZjzYoF9Gcz0651yuoikTBRUkAby383sw3AI3F8uLs/CDwYx3tJ72t0CWdQ6qee10PtyWOt45du1EgxuNOBN8XdEJFaGu1kq6V4br5kUc3XR60fpFy/dJsocwAvMX0O4F8o7AwWabtGOtn+bB+H80dmXJ81kE188lgkSVFSQK9vR0NEWmFkLBfY+QN4aK3b6VSpU3pVzUlgM/thlGsinahalc9DIYXaKqlSp/SqaucBvA4YAE4ys1lAaTvjGwjYsSvSiardpVd24NeO7GLDo/uZdCdjxqpz5nDTykUdXT9IpBnVUkAfBz4DnEKh/n8pAPwK+FprmyUSj7D1+QZTHfjIWI5rvr2LX796tIOfdOf27fsAuGnlIkArfKT3mNdIhJrZJ939q21qT1XDw8M+OjqadDOki1Su4IFC53/F0rnctHJR4OPlMmY8c/N72tRakdYwsx3uPlx5P
coy0CNmNuju48U3mgWscve/iLmNIrEpX7d/Qn+W12X7GD+c55TBfs49YzZb9xxg3tpNNd9nMupMsUgXirIT+GOlzh/A3V8EPtayFok0qbLo2/hEnlfyR/jy5YuZ98Z+bt++L7QsRKWMWe0niXSpKCOAjJmZF3NFxYNcjm1ts0TqV7rrD+rcJ/KT/Ml9PwldEhpm1Tlzaj9JpEtFCQB/B9xlZl8v/v5x4Huta5L0umplFRqt118rlw/U1fmXzxOI9KooAeBzwGoKx0EC/AT4rZa1SHrayFiONffuJD9ZyK3nxidYc+/OqcejlFwIUq3eTz0yZnzxg6rXL+kQZSfwETN7FHgz8EHgJGBj9VeJBLvxgd1TnX9JftK58YHdDBx7TMMlF+LYldufzUSqDSTSK6ptBPtdYFXx55cUDobH3eM8FEZSJuh83NL18ZDHonTuYev9oxrS2n5JoWqrgPYAy4H3uvvvF/cCND/GFgnRTMmFNSsWzDh7N6qhwX62rV2uzl9Sp1oAuAR4AdhqZt8ws3dxdDewSEMG+7Oh15s5VGX02YMzUktRqaibpFVoAHD3EXf/EHAGsJVCWYg3mdlfmtn5bWqf9JgbLl5Itm/6fUS2z7jh4oWsXDLEzZcsmnaA+gfeNsT6zXuZv3YTy9ZtmXGA+8hYjjP/9HtTZRsaoaJuklZRJoF/DdwJ3FncBXwZhZVBD7W4bdKDah3qUl7vv9ZBLFd848dse+ZgU+3JZkxF3SS1atYC6iSqBZQuy9ZtCZzYnTWQ5dDhPPVt6QqW7TPWX6Zln9LbwmoBRSkFIZKIsNz8izF1/gD5I171zACRXtbImcAiTYm62/eE/izjEQ9taYYmgSWtFACkraIesA7QrjpsfWbMX7tJdf4ldZQCkraqdsB6pbBNY43K9lngXoFJd5yjwahypZFIr9IIQGJTLbVTftxikMrJ3jOueTCWNhngHN3pC0dXIPWZzWhP1NITIr1AAUBiUS21M/rswZrr9Et19+NY2lkSVt6h9Pv8kANhNCcgaaEAILGoltr5l0Ov1Hz9pHukE7qiKnX+6zfv5bN3PRGY3w+rH6SNYZIWicwBmNllZrbbzI6Y2Yy1qdJ9wu6anx+fSORYxdIIpHQqWFB+v5nSEyK9IKlJ4Ccp1Bp6OKHPl4hGxnIsW7cltBRDSbVCbkkcq5gxqznZHFR6QuWgJU0SSQG5+9MApvNWO1o9SzbXrFgw40Su0t30PaP7YsvrR9GfzYQeDlM5UikvPSGSNh2/DNTMVpvZqJmNHjhwIOnmpEo9SzbD7qYBHt93KJb2LHvziVXL0ZZ/7lATpaVF0qJlIwAz+wHBR0de4+7fifo+7n4LcAsUagHF1DyJoFpeP0hlIbewA9obtf1nL4ZO3JZq+pcLG5GISEHLAoC7n9eq95b2aHSVTJQD2hsx6V411VSuVtVREdEyUKkirLM994zZvOVPv8dEvlCSrc/gw+fM5aaVhZRPXAe0V8qY1dWxK78vUl0i5aDN7P3AV4HZwDjwhLuvqPU6lYNun/IUTqa4Y3ZosJ9zz5jNndv3hVbjHGrybN5qrlx6NMiISHQdVQ7a3b/t7qe6+3Hu/ptROn9pn1IKp9SRT7pPpVm27jlQtRRzqzr/44/NqPMXiZlSQDLNyFiOq+7eGVojJ6kyCYdfjT+lJJJ2CgAypXTnH7Zzt5Rzb9VdfjWdsHwz6jkGIt1CAUCm1Jq8HTg2wwuH2t/5d8LyzXo2xYl0i47fCCbtUy29Y8CvX53kSBvWDCx784kdV56hnk1xIt1CI4CUK09rBNXHL2nHWrGMGavOmVN1sjepNEy9m+JEuoECQIpVpjWSqNoJhdHFz9ddWPN5I2M51ty7k/xkoZ258QnW3LsTaH0aRqWjpRcpBZRirdqwVa+oneiND+ye6vxL8pPOjQ/sbkWzplHpaOlFGgGkWCekLww494zZkZ4bdkZw3GcHB1FpCelFCgApNjiQbUvnWY0DG3fkGD7tx
I7vTFVaQnqNUkAp9koHpH8g+mqawf5sXddFpDoFgJQaGctNFXPrBOXpqLBTyG64eCHZvuknAmT7jBsuXtjWtor0CqWAukxcyyA7bf16aSI4yoYr5eFF4qEA0EXi3I3a6nIORvDeATM4xox82Y6y8tU01TZclXLwvdbhq8SEJEUpoC4S127UkbFc1aMVG2EUcvGl3buhOwoc1l92duhO37RtuCqvvOocDeqltJdIK2kE0EXi6hzXb94b685eA66oqNW/bN2W0I1T1e7i07bhqtaIR6SVNALoIoMDIatgQq6HiTv948DWPQemXWt041Qnb7gKm5xuRtpGPNJZNALoImGVGuqp4HDOF77f0Gf3GZx8Qngp6MoOq9EJ206d6G1VNdC0jXiksygAdJFDE8GbtsYn8sxfu6lmZ/nuL/09//rSqw199hGHbWuXV03tVGp0wrYTJ3pblaqJesi9SCsoAHSRaoexlE8gwtG70vIVJs3k/TNWmDZOa4fVqlRNp454JB0UALpIUOdbqfyu9NqRXdyxfV8sE75Lf3sWy9Zt4fnxCQYHshx3TB+HJvKp6bBamarpxBGPpIMCQBepvFsM69ifH59gZCwXS+efMWPpb8/i8X2HpgLPi4fz9GczfPnyxanpuNI68pHeZp5QDfhGDA8P++joaNLN6Bhh+fhmzRrIcuFZJ7N1z4GqB8UMDfazbe3y2D+/U2nDlnQrM9vh7sOV1zUC6GJRUkL1Gip2bFEOiknbUkWlaqTXaB9AF1u5ZIibL1nEQDaev8ZSbf6oB8VoqaJId9MIoMvdM7qPw01U9Syv2VOqzR+l81f+W6T7KQB0sSu+8WO2PXOwqfeoTO5M5CfJhOT8M2YccVf+W6RHJBIAzGw9cBHwKvAM8J/dfTyJtnSL0gRkq6t4QiHn35/NzFjxUl60TUS6X1JzAN8H3uruZwH/BFydUDs6SlitmfKKkY047pjgv2YLKQlaqtAZVrFTRHpDIiMAd3+o7NftwKVJtKOThNWaGX32YMPr+UsregDW3LuT/OTRd8lmjMt/b86MnH8pt68VLyK9rxPmAD4K3BX2oJmtBlYDzJ07t11taruwWjO3b99X93ste/OJ3PGxfxv4GaU17OeeMZutew5My/kPKbcvkiot2whmZj8AfivgoWvc/TvF51wDDAOXeISG9PJGsPlrN8VSsiGs8y9XOdoA5fhFelnbN4K5+3k1GvQR4L3Au6J0/r2uWqG3KGYNZLn+ooWROvCw0cZVd+8EmitvLCLdI5FJYDO7APhj4GJ3P5xEGzpN0EEoUV25dC5j150fueMO28E76a7jCEVSJKlVQF8DXg9838yeMLO/SqgdHaO0q7ceBnzl8sXTjmKMotoO3kbOGBaR7pRIAHD333H3Oe6+uPjziSTa0WnqSb1k+6zhapy1Rhtpq/EjkladsApIyoTtwi0XZbVOtcqVpf9edffOwM9SjR+RdFAASEC1znnVOXNCl35euXRupHRPlPNrS/9VjXuR9FIAaLNanXOpg9/w6H4m3cmYseqcOXXl+aOeX6vjCEXSTQfCtFnYIS5xHq5SbU+BgTp6kZTRgTAJinIwe5wTr40cHi8i6aMDYVpoZCzH4hsf4jN3PUGuSucP8U68RtlToOWeIqIRQIsElVsIE/fEaz2Hx4tIeikAtMDIWC50iWW5Vubjy6t5hs07aLmnSLopAMSsdOcfZS1/XJO+tQQdHq/lniKiABCzKAeqt7vz1XJPEQmiABCzWnn1eqp2xkkHvIhIJQWAmIUtwcyY8cUPnq1OWEQ6hpaBxixoCWZ/NqPOX0Q6jkYAMVO+XUS6hQJACyjfLiLdQCkgEZGUUgAQEUkpBQARkZRSABARSSkFABGRlFIAEBFJKQUAEZGUUgAQEUkpBQARkZRSABARSSkFABGRlEqkFpCZ/RnwPuAI8AvgI+7+fCs+a2Qsp8JsIiIBkhoBrHf3s9x9MfBd4LpWfEjpeMZc8WD03PgEV9+3i
5GxXCs+TkSkqyQSANz9V2W/Hg9UP0C3QUHHM07kJ1m/eW8rPk5EpKskVg7azL4A/EfgEHBuleetBlYDzJ07t67PCDuesdaxjSIiadCyEYCZ/cDMngz4eR+Au1/j7nOAO4A/DHsfd7/F3YfdfXj27Nl1teGUwf66rouIpEnLAoC7n+fubw34+U7FU+8APtCKNoQdz7hmxYJWfJyISFdJahXQ6e7+z8Vf3wfsacXn6HhGEZFwSc0BrDOzBRSWgT4LfKJVH6TjGUVEgiUSANy9JSkfERGJTjuBRURSSgFARCSlFABERFJKAUBEJKXMvSVVGFrCzA5QWDXUiJOAX8bYnE6g79T5eu37gL5TN6j8Pqe5+4ydtF0VAJphZqPuPpx0O+Kk79T5eu37gL5TN4j6fZQCEhFJKQUAEZGUSlMAuCXpBrSAvlPn67XvA/pO3SDS90nNHICIiEyXphGAiIiUUQAQEUmpVAUAM/szM/uJmT1hZg+Z2SlJt6lZZrbezPYUv9e3zWww6TY1w8wuM7PdZnbEzLp6WZ6ZXWBme83sp2a2Nun2NMvMvmlmvzCzJ5NuSxzMbI6ZbTWzp4r/5j6ddJuaZWavM7N/NLOdxe90Y9Xnp2kOwMzeUDqP2Mw+BZzp7i0rRd0OZnY+sMXdXzOzPwdw988l3KyGmdlbKJQJ/zrwR+4+mnCTGmJmGeCfgHcDzwGPAavc/alEG9YEM3sH8DLwLXd/a9LtaZaZnQyc7O6Pm9nrgR3Ayi7/OzLgeHd/2cyywCPAp919e9DzUzUCaNdh9O3k7g+5+2vFX7cDpybZnma5+9PuvjfpdsTg7cBP3f1n7v4q8LcUDj/qWu7+MHAw6XbExd1fcPfHi39+CXga6OrDQ7zg5eKv2eJPaD+XqgAAhcPozWw/cAVwXdLtidlHge8l3QgBCh3J/rLfn6PLO5deZmbzgCXAowk3pWlmljGzJ4BfAN9399Dv1HMBIK7D6DtJre9UfM41wGsUvldHi/J9RNrFzH4D2Ah8piJL0JXcfdLdF1PIBrzdzELTdUkdCdky7n5exKfeATwIXN/C5sSi1ncys48A7wXe5V0wqVPH31E3ywFzyn4/tXhNOkgxT74RuMPd70u6PXFy93Ez2wpcAARO3PfcCKAaMzu97NeWHUbfTmZ2AfDHwMXufjjp9siUx4DTzWy+mR0LfAi4P+E2SZnihOltwNPu/qWk2xMHM5tdWgloZv0UFiGE9nNpWwW0EZh2GL27d/VdmZn9FDgO+H/FS9u7eWWTmb0f+CowGxgHnnD3FYk2qkFm9h7gK0AG+Ka7fyHZFjXHzDYA76RQavhfgevd/bZEG9UEM/t94EfALgp9AsCfuPuDybWqOWZ2FvA3FP7N9QF3u/vnQ5+fpgAgIiJHpSoFJCIiRykAiIiklAKAiEhKKQCIiKSUAoCISEopAEiqmNlksRrsk2Z2j5kNNPFef21mlxb/fKuZnVnlue80s3/XwGf8XzM7qdE2ilSjACBpM+Hui4vVLF8Fpu2ZMLOGdse7+3+tUUXynUDdAUCklRQAJM1+BPxO8e78R2Z2P/BUsZjWejN7rHjOwsehsHPUzL5WrPH/A+BNpTcys78vnV9QPAfg8WJN9h8WC419AvhscfTx74s7NjcWP+MxM1tWfO0brXBWxW4zuxWwNv8/kRTpuVpAIlEU7/T/APi74qV/A7zV3X9uZquBQ+7+e2Z2HLDNzB6iUC1yAXAm8JvAU8A3K953NvAN4B3F9zrR3Q+a2V8BL7v7/yw+707gy+7+iJnNBTYDb6FQm+oRd/+8mV0I/JeW/o+QVFMAkLTpL5bKhcII4DYKqZl/dPefF6+fD5xVyu8DJwCnA+8ANrj7JPC8mW0JeP+lwMOl93L3sPr55wFnFsrRAPCGYlXKdwCXFF+7ycxebOxritSmACBpM1EslTul2An/uvwS8El331zxvPfE2I4+YKm7vxLQFpG20ByAyEybgf9WLBWMmf2umR0PP
AxcXpwjOBk4N+C124F3mNn84mtPLF5/CXh92fMeAj5Z+sXMFhf/+DDw4eK1PwBmxfWlRCopAIjMdCuF/P7jVjgA/esURsvfBv65+Ni3gB9XvtDdDwCrgfvMbCdwV/GhB4D3lyaBgU8Bw8VJ5qc4uhrpRgoBZDeFVNC+Fn1HEVUDFRFJK40ARERSSgFARCSlFABERFJKAUBEJKUUAEREUkoBQEQkpRQARERS6v8DFNHsdHkJUooAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "from openbabel import pybel\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.linear_model import LassoCV\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "yvals = []\n", "fps = []\n", "for mol in pybel.readfile('smi','er.smi'):\n", " yvals.append(float(mol.title))\n", " fpbits = mol.calcfp().bits\n", " fp = np.zeros(1024)\n", " fp[fpbits] = 1\n", " fps.append(fp)\n", " \n", "fps = np.array(fps)\n", "yvals = np.array(yvals)\n", "lin = LinearRegression()\n", "lin.fit(fps,yvals)\n", "pred = lin.predict(fps)\n", "print(np.corrcoef(pred,yvals))\n", "plt.plot(pred,yvals,'o')\n", "plt.xlabel(\"Predicted\")\n", "plt.ylabel(\"Actual\")\n", "plt.show()\n", "\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }