Create an assignment, ideally related to your research.
The script resulting from the assignment should be a general purpose tool capable of taking different inputs. You are not required to decompose your assignment into different levels of partial credit (e.g. 70%, 80%, etc.) but you may find it useful to structure it that way for organizational purposes.
You may use any python packages as long as they can be installed with a package manager.
You are required to provide:
Make sure it is okay to publicly release the data.
Sometimes you need to integrate with programs that don't have a python interface (or you think it would just be easier to use the command line interface).
Python has a versatile subprocess module for calling and interacting with other programs.
However, first the venerable system command:
0
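The return value of os.system is the exit status of the command, not what it printed to the screen. (On Unix it is an encoded wait status; os.waitstatus_to_exitcode, available since Python 3.9, converts it back into the exit code.)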
348
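A minimal sketch of the point above, assuming a Unix-like system and Python 3.9 or later. Whatever the shell command prints goes straight to the terminal; Python only receives the status number.

```python
import os

# os.system runs the command through the shell; its printed output goes
# directly to the terminal and is NOT returned to Python.
status = os.system("echo hello from the shell")
print(status)  # 0 means success

# On Unix the return value is an encoded wait status (a nonzero exit code
# is shifted left by 8 bits), so decode it to get the real exit code.
raw = os.system("exit 3")
print(raw, os.waitstatus_to_exitcode(raw))
```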
The subprocess module replaces the following older modules and functions (so don't use them):
os.system
os.spawn*
os.popen*
popen2.*
commands.*
dump file with spaces ligs.sdf ligs.sdf.1 receptor.pdb receptor.pdb.1 smina
0
Run the command described by ARGS. Wait for command to complete, then return the returncode attribute.

ARGS specifies the command to call and its arguments. It can either be a string or a list of strings.
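A short sketch of subprocess.call with a list of arguments. It uses sys.executable (the running Python interpreter) as the called program purely so the example works on any machine; with ls or any other program the pattern is identical.

```python
import subprocess
import sys

# First element: the program; remaining elements: its arguments.
rc = subprocess.call([sys.executable, "-c", "print('hello')"])
print(rc)  # 0

# call does not raise on failure; it simply returns the exit code.
rc_fail = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print(rc_fail)  # 3
```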
0
hello
0
If shell=False (default) and args is a string, it must be only the name of the program (no arguments). If a list is provided, then the first element is the program name and the remaining elements are the arguments.
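If (and only if) shell=True, the string provided for args is parsed exactly as if you typed it on the command line. This means that the shell splits it into words at whitespace, expands wildcards such as *, and interprets special characters such as ;, | and >.
If shell=False, any arguments must be given as a list, and they are passed literally to the program (e.g., it would receive '*' as a file name rather than a list of matching files).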
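To see the difference, here is a sketch assuming a POSIX system with /bin/sh. A tiny Python program that prints its own arguments stands in for a real tool, and check_output (covered later in these notes) captures what it prints.

```python
import os
import subprocess
import sys
import tempfile

# A scratch directory with two files for the wildcard to match.
workdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    open(os.path.join(workdir, name), "w").close()

# A stand-in "program" that prints the arguments it actually received.
show_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

# shell=False (list form): '*' reaches the program unchanged.
literal = subprocess.check_output(show_args + ["*"], cwd=workdir, text=True)
print(literal.strip())  # ['*']

# shell=True: the shell expands '*' before echo ever runs.
expanded = subprocess.check_output("echo *", shell=True, cwd=workdir, text=True)
print(expanded.strip())  # a.txt b.txt
```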
shell is False by default for security reasons. Consider:
filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
subprocess.call("cat " + filename, shell=True) # Uh-oh. This will end badly...
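Two safer ways to handle untrusted input, sketched with echo in place of cat so nothing is read or deleted: quote the string with shlex.quote before handing it to a shell, or (simpler) skip the shell entirely and pass a list. This assumes a POSIX system.

```python
import shlex
import subprocess

# Hostile input similar to the one above (harmless payload).
filename = "non_existent; echo INJECTED"

# shlex.quote makes the shell treat the whole string as ONE word,
# so ';' is no longer a command separator.
cmd = "echo " + shlex.quote(filename)
out = subprocess.check_output(cmd, shell=True, text=True)
print(out, end="")  # non_existent; echo INJECTED

# No shell at all: the string is passed to echo as a single argument.
out2 = subprocess.check_output(["echo", filename], text=True)
```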
By default /bin/sh is used as the shell. You are probably using bash. You can specify what shell to use with the executable argument.
/bin/sh /bin/bash
0
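A sketch of the executable argument, assuming both /bin/sh and /bin/bash exist. Inside `sh -c` or `bash -c`, $0 expands to the shell's own name, so the command reports which shell actually ran it.

```python
import subprocess

# Default shell (/bin/sh on POSIX).
default_shell = subprocess.check_output("echo $0", shell=True, text=True).strip()

# Same command, but run by bash instead.
bash_shell = subprocess.check_output(
    "echo $0", shell=True, executable="/bin/bash", text=True
).strip()

print(default_shell, bash_shell)
```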
ls: cannot access '*': No such file or directory
2
dump file with spaces ligs.sdf ligs.sdf.1 receptor.pdb receptor.pdb.1 smina 0 2 file with spaces 0 file with spaces 0 2
ls: cannot access 'file': No such file or directory
ls: cannot access 'with': No such file or directory
ls: cannot access 'spaces': No such file or directory
ls: cannot access 'file\ with\ spaces': No such file or directory
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
/tmp/ipykernel_22967/3042384492.py in <module>
----> 1 subprocess.call('ls *') #why is this FileNotFoundError?

~/apps/anaconda3/lib/python3.9/subprocess.py in call(timeout, *popenargs, **kwargs)
347 retcode = call(["ls", "-l"])
348 """
--> 349 with Popen(*popenargs, **kwargs) as p:
350 try:
351 return p.wait(timeout=timeout)

~/apps/anaconda3/lib/python3.9/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
949 encoding=encoding, errors=errors)
950
--> 951 self._execute_child(args, executable, preexec_fn, close_fds,
952 pass_fds, cwd, env,
953 startupinfo, creationflags, shell,

~/apps/anaconda3/lib/python3.9/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
1819 if errno_num != 0:
1820 err_msg = os.strerror(errno_num)
-> 1821 raise child_exception_type(errno_num, err_msg, err_filename)
1822 raise child_exception_type(err_msg)
1823

PermissionError: [Errno 13] Permission denied: 'ls *'
Every process (program) has standard places to write output and read input.
On the command line, you can change these places with IO redirection (<, >, |). For example:
grep Congress cnn.html > congress
wc < congress
grep Congress cnn.html | wc
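The same `grep Congress cnn.html | wc` pipeline can be wired up without a shell by connecting two Popen objects (Popen is covered in detail later). This sketch assumes grep and wc are installed and writes a small hypothetical stand-in for cnn.html, since the real file isn't available here.

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for cnn.html: two of the three lines mention Congress.
path = os.path.join(tempfile.mkdtemp(), "cnn.html")
with open(path, "w") as f:
    f.write("Congress votes\nweather\nCongress adjourns\n")

# grep's stdout becomes wc's stdin, like the shell's '|'.
grep = subprocess.Popen(["grep", "Congress", path], stdout=subprocess.PIPE)
wc = subprocess.Popen(["wc", "-l"], stdin=grep.stdout, stdout=subprocess.PIPE)
grep.stdout.close()  # so grep gets SIGPIPE if wc exits early

count = int(wc.communicate()[0].decode().strip())
grep.wait()
print(count)  # 2
```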
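stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values include:
subprocess.PIPE - this enables communication between your script and the program
an open file object (for example, one returned by open)
Do not use subprocess.PIPE with subprocess.call: call waits for the program to exit without reading the pipe, so a program that writes a lot of output can block forever.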
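A sketch of passing an open file object as stdout, the Python analogue of `program > output.txt`. The file name output.txt is arbitrary, and sys.executable again stands in for a real program.

```python
import os
import subprocess
import sys
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# The child writes directly into the file; nothing passes through Python.
with open(path, "w") as out_file:
    rc = subprocess.call([sys.executable, "-c", "print('to a file')"], stdout=out_file)

with open(path) as f:
    saved = f.read()
print(rc, repr(saved))
```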
'dump\n'
2
ls: cannot access 'nonexistantfile': No such file or directory
check_call is identical to call, but throws an exception when the called program has a nonzero return value.
ls: cannot access 'missingfile': No such file or directory
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
/tmp/ipykernel_9256/1974956330.py in <module>
----> 1 subprocess.check_call(['ls','missingfile'])

~/apps/anaconda3/lib/python3.9/subprocess.py in check_call(*popenargs, **kwargs)
371 if cmd is None:
372 cmd = popenargs[0]
--> 373 raise CalledProcessError(retcode, cmd)
374 return 0
375

CalledProcessError: Command '['ls', 'missingfile']' returned non-zero exit status 2.
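In a script you would normally catch that exception. A minimal sketch: a Python one-liner that exits with status 2 imitates `ls missingfile` without depending on ls.

```python
import subprocess
import sys

failed_code = None
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(2)"])
except subprocess.CalledProcessError as err:
    # The exception carries the command and its exit status.
    failed_code = err.returncode
    print("command failed with exit status", failed_code)
```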

b'dump\nfile with spaces\n'
Typically, you are calling a program because you want to parse its output. check_output provides the easiest way to do this: its return value is whatever the program wrote to stdout (as bytes).
Nonzero return values result in a CalledProcessError exception (like check_call).
b'file with spaces\n'
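Since check_output returns bytes, a typical first step is to decode and split into lines before parsing. A small sketch with sys.executable as the stand-in program:

```python
import subprocess
import sys

raw = subprocess.check_output([sys.executable, "-c", "print('alpha'); print('beta')"])
lines = raw.decode().splitlines()  # bytes -> str -> list of lines
print(lines)  # ['alpha', 'beta']
```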
You can redirect stderr into stdout by passing stderr=subprocess.STDOUT.
b"ls: cannot access 'non_existent_file': No such file or directory\n"
Why exit 0?
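A sketch to experiment with, assuming a POSIX shell and that no file named non_existent_file exists in the current directory. The first call merges stderr into the captured output and ends the shell command with `exit 0`; the second drops `exit 0` and shows what happens to ls's nonzero exit status.

```python
import subprocess

# stderr is merged into stdout, and the shell's final status is forced to 0.
out = subprocess.check_output(
    "ls non_existent_file; exit 0", stderr=subprocess.STDOUT, shell=True
)
print(out)

# Without 'exit 0', ls's nonzero status makes check_output raise.
raised = False
try:
    subprocess.check_output(["ls", "non_existent_file"], stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as err:
    raised = True
    print(err.returncode, err.output)
```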
b'dump\nfile with spaces\nligs.sdf\nligs.sdf.1\nreceptor.pdb\nreceptor.pdb.1\nsmina\n'
How can we communicate with the program we are launching?

All the previous functions are just convenience wrappers around the Popen object.
dump file with spaces
<Popen: returncode: None args: 'ls'>
Popen has quite a few optional arguments. Shown are just the most common.
cwd sets the working directory for the process (if None, it defaults to the current working directory of the Python script).
env is a dictionary that defines the environment variables for the new process; it replaces the inherited environment rather than adding to it.
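A sketch of cwd and env together. GREETING is a made-up variable for illustration. Copying os.environ and adding to it keeps PATH and friends; a dict containing only GREETING would hand the child an otherwise empty environment. It also uses PIPE and communicate, which are explained further down.

```python
import os
import subprocess
import sys
import tempfile

workdir = os.path.realpath(tempfile.mkdtemp())

# Extend the current environment instead of replacing it.
child_env = dict(os.environ, GREETING="hi")

code = "import os; print(os.getcwd()); print(os.environ['GREETING'])"
proc = subprocess.Popen(
    [sys.executable, "-c", code],
    cwd=workdir, env=child_env, stdout=subprocess.PIPE, text=True,
)
out, _ = proc.communicate()
cwd_seen, greeting = out.splitlines()
print(cwd_seen == workdir, greeting)
```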
Popen is a constructor and returns a Popen object.
subprocess.Popen
Popen returns immediately: your Python script does not wait for the called process to finish.
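A sketch showing that Popen does not block: the child sleeps for a second, yet the constructor returns right away and poll() reports that the process is still running until wait() is called.

```python
import subprocess
import sys
import time

start = time.monotonic()
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
launched_after = time.monotonic() - start  # far less than the 1 s sleep

still_running = proc.poll() is None  # None means "not finished yet"
proc.wait()                          # now block until the child exits
print(round(launched_after, 2), still_running, proc.returncode)
```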
We can finally use PIPE.

If we set stdin/stdout/stderr to subprocess.PIPE then they are available to read/write to in the resulting Popen object.
_io.BufferedReader
b'dump\n'
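A sketch of reading from a piped stdout: proc.stdout is an ordinary binary file object, so readline() and read() behave as they do for files. sys.executable stands in for the called program.

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c", "print('first'); print('second')"],
    stdout=subprocess.PIPE,
)
print(type(proc.stdout))        # a buffered binary reader
first = proc.stdout.readline()  # one line, as bytes
rest = proc.stdout.read()       # everything up to end-of-file
proc.stdout.close()
proc.wait()
print(first, rest)
```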
Pipes enable communication between your script and the called program.
If stdout/stdin/stderr is set to subprocess.PIPE then that input/output stream of the process is accessible through a file object in the returned object.
b'Hello'
Python 3 strings (str) are Unicode, but by default the pipes to and from a subprocess carry raw bytes, so you must write byte strings to them.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_9256/2142223569.py in <module>
1 proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
----> 2 proc.stdin.write("Hello")
3 proc.stdin.close()
4 print(proc.stdout.read())

TypeError: a bytes-like object is required, not 'str'
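Two ways to fix the cell above (assuming cat is available): write a bytes literal, or open the pipes in text mode with text=True so they accept and return str.

```python
import subprocess

# Option 1: send bytes.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
proc.stdin.write(b"Hello")  # note the b prefix
proc.stdin.close()          # end-of-input, so cat can finish
echoed = proc.stdout.read()
proc.wait()
print(echoed)  # b'Hello'

# Option 2: text mode, so the pipes work with str.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
proc.stdin.write("Hello")
proc.stdin.close()
echoed_text = proc.stdout.read()
proc.wait()
print(echoed_text)  # Hello
```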
Byte strings (the default kind of string in Python 2) store each character in a single byte (ASCII, like in The Martian).
UTF-8, the most common Unicode encoding, uses 1 to 4 bytes per character.
This allows support for other languages and the all-important emoji.
💩
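A quick check of the variable-width encoding: encoding a few characters as UTF-8 and counting bytes.

```python
# Number of bytes each character needs in UTF-8.
sizes = {ch: len(ch.encode("utf-8")) for ch in ["a", "é", "€", "💩"]}
print(sizes)  # {'a': 1, 'é': 2, '€': 3, '💩': 4}
```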
Converting between bytes and strings
'a byte str'
b'a unicode string'
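The two conversions are decode (bytes to str) and encode (str to bytes). A minimal sketch, with UTF-8 written out explicitly although it is also the default:

```python
as_text = b"a byte str".decode("utf-8")        # bytes -> str
as_bytes = "a unicode string".encode("utf-8")  # str -> bytes
print(repr(as_text), as_bytes)  # 'a byte str' b'a unicode string'
```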
Managing simultaneous input and output is tricky and can easily lead to deadlocks.
For example, your script may be blocked waiting for output from the process which is blocked waiting for input.

Popen.communicate(input=None): Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
input is the data (bytes, or str if the pipes were opened with text=True) to be provided to stdin (which must be set to PIPE).
Likewise, to receive stdout/stderr, they must be set to PIPE.
This will not deadlock.
99% of the time if you have to both provide input and read output of a subprocess, communicate will do what you need.
x 1 a
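A sketch of communicate providing input and collecting output in one call, assuming the sort command is available. text=True lets us pass and receive str; stderr is not piped, so the second value returned is None.

```python
import subprocess

proc = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
out, err = proc.communicate("pear\napple\nfig\n")  # send, read, and wait
print(out.split())       # ['apple', 'fig', 'pear']
print(err, proc.returncode)
```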
Popen.poll() - check to see if the process has terminated
Popen.wait() - wait for the process to terminate (do not use it with PIPE: if the program fills the pipe buffer, both sides block)
Popen.terminate() - terminate the process (ask nicely)
Popen.kill() - kill the process with extreme prejudice
Note that if you are generating a large amount of data, communicate, which buffers all the data in memory, may not be an option (instead, read directly from Popen.stdout).
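A sketch of poll, terminate and wait on a long-running child, assuming a POSIX system (where terminate sends SIGTERM and a process killed by a signal gets a negative return code).

```python
import signal
import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(proc.poll())   # None: still running
proc.terminate()     # ask nicely (SIGTERM on POSIX)
proc.wait()          # reap the process
print(proc.returncode == -signal.SIGTERM)
```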
If you need to PIPE both stdin and stdout and can't use communicate, be very careful about controlling how data is communicated.
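When the output is too large to buffer with communicate, one option is to iterate over proc.stdout line by line as the data arrives. A sketch with a Python one-liner producing 100,000 lines as the stand-in program:

```python
import subprocess
import sys

producer = "for i in range(100000): print(i)"
proc = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE, text=True)

# Process each line as it arrives instead of holding all output in memory.
count = 0
total = 0
for line in proc.stdout:
    count += 1
    total += int(line)
proc.stdout.close()
proc.wait()
print(count, total)
```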
subprocess.call
subprocess.check_output
subprocess.Popen
subprocess.Popen, stdin=subprocess.PIPE, communicate
We want to predict the binding affinity of a small molecule to a protein using the program smina.
Download log (condensed): rec.pdb and lig.pdb from https://asinansaglam.github.io/python_bio_2022/files/ returned 404 Not Found; receptor.pdb and ligs.sdf from the same location were downloaded and saved as receptor.pdb.1 and ligs.sdf.1; the smina static binary from https://sourceforge.net/projects/smina/files/smina.static/download (9.4M) was saved as download.
Run smina -r rec.pdb -l lig.pdb --minimize on these files.
Parse the affinity and RMSD and print them on one line.
Run smina -r receptor.pdb -l ligs.sdf --minimize. Parse the affinities and RMSDs.
smina is based off AutoDock Vina. Please cite appropriately.
Weights Terms
-0.035579 gauss(o=0,_w=0.5,_c=8)
-0.005156 gauss(o=3,_w=2,_c=8)
0.840245 repulsion(o=0,_c=8)
-0.035069 hydrophobic(g=0.5,_b=1.5,_c=8)
-0.587439 non_dir_h_bond(g=-0.7,_b=0,_c=8)
1.923 num_tors_div
Affinity: -6.13684 -0.44100 (kcal/mol)
RMSD: 0.04666
Refine time 0.01145
Affinity: -5.86570 -0.18850 (kcal/mol)
RMSD: 0.16984
Refine time 0.00334
Affinity: -6.05768 -1.30419 (kcal/mol)
RMSD: 0.07613
Refine time 0.00347
Affinity: -6.59074 -0.53131 (kcal/mol)
RMSD: 0.22924
Refine time 0.00641
Affinity: -6.50168 0.08280 (kcal/mol)
RMSD: 0.04795
Refine time 0.00282
Affinity: -5.88335 -0.73565 (kcal/mol)
RMSD: 0.07531
Refine time 0.00204
Affinity: -6.94803 -0.27693 (kcal/mol)
RMSD: 0.08882
Refine time 0.00816
Affinity: -6.11432 -0.32757 (kcal/mol)
RMSD: 0.06710
Refine time 0.00543
Affinity: -5.85392 -0.32171 (kcal/mol)
RMSD: 0.98681
Refine time 0.00488
Affinity: -6.80549 -0.59248 (kcal/mol)
RMSD: 0.17517
Refine time 0.00431
Affinity: -6.73040 -0.57962 (kcal/mol)
RMSD: 0.03489
Refine time 0.00198
Affinity: -5.69268 -0.47264 (kcal/mol)
RMSD: 0.14554
Refine time 0.00318
Affinity: -5.08187 -2.76936 (kcal/mol)
RMSD: 0.10448
Refine time 0.00329
Affinity: -6.44079 -0.76932 (kcal/mol)
RMSD: 0.04434
Refine time 0.00256
Affinity: -6.45828 -0.45417 (kcal/mol)
RMSD: 0.09374
Refine time 0.00619
Affinity: -6.65080 -0.63172 (kcal/mol)
RMSD: 0.08727
Refine time 0.00643
Affinity: -7.12596 -0.32647 (kcal/mol)
RMSD: 0.14559
Refine time 0.00377
Affinity: -6.77129 -0.58484 (kcal/mol)
RMSD: 0.21863
Refine time 0.00683
Affinity: -7.54122 -1.03283 (kcal/mol)
RMSD: 0.06561
Refine time 0.00443
Affinity: -5.62031 -0.34329 (kcal/mol)
RMSD: 0.22742
Refine time 0.00298
Affinity: -6.35736 -0.69922 (kcal/mol)
RMSD: 0.12231
Refine time 0.00419
Affinity: -5.79781 -0.80878 (kcal/mol)
RMSD: 0.14716
Refine time 0.00204
Affinity: -5.88094 -0.42970 (kcal/mol)
RMSD: 0.11252
Refine time 0.00260
Affinity: -7.09409 0.36596 (kcal/mol)
RMSD: 0.41997
Refine time 0.00359
Affinity: -6.13325 -0.22617 (kcal/mol)
RMSD: 0.08001
Refine time 0.00307
Affinity: -7.47566 -1.20172 (kcal/mol)
RMSD: 0.35906
Refine time 0.00858
Affinity: -6.47657 -0.41204 (kcal/mol)
RMSD: 0.04145
Refine time 0.00277
Affinity: -6.58339 -0.62376 (kcal/mol)
RMSD: 0.22803
Refine time 0.00525
Affinity: -6.69025 -0.11960 (kcal/mol)
RMSD: 0.15410
Refine time 0.00368
Affinity: -5.73675 -0.69679 (kcal/mol)
RMSD: 0.06322
Refine time 0.00217
Loop time 0.16327
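One possible sketch for the parsing step, based only on the output format shown above. run_smina and parse_minimize are hypothetical helpers; run_smina assumes the downloaded binary has been renamed to ./smina and made executable, and it is not called here. The parser treats the first number on each Affinity line as the affinity; what the second number represents is an assumption to check against the smina documentation.

```python
import re
import subprocess

def run_smina(receptor, ligand, smina="./smina"):
    """Hypothetical helper: run smina --minimize and return its stdout as text."""
    return subprocess.check_output(
        [smina, "-r", receptor, "-l", ligand, "--minimize"], text=True
    )

# One "Affinity: <x> <y> (kcal/mol)" line followed by "RMSD: <r>".
PATTERN = re.compile(
    r"Affinity:\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+\(kcal/mol\)\s+RMSD:\s+(-?\d+\.\d+)"
)

def parse_minimize(output):
    """Return (affinity, rmsd) pairs, using the first Affinity number."""
    return [(float(a), float(r)) for a, _, r in PATTERN.findall(output)]

# Two records copied from the output above.
sample = """Affinity: -6.13684 -0.44100 (kcal/mol)
RMSD: 0.04666
Refine time 0.01145
Affinity: -5.86570 -0.18850 (kcal/mol)
RMSD: 0.16984
Refine time 0.00334
"""
for affinity, rmsd in parse_minimize(sample):
    print(affinity, rmsd)
```

With the binary in place, the same parser would apply to parse_minimize(run_smina("receptor.pdb", "ligs.sdf")).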