Subprocess¶
Going Outside the (Python) Box¶
Sometimes you need to integrate with programs that don't have a python interface (or you think it would just be easier to use the command line interface).
Python has a versatile subprocess module for calling and interacting with other programs.
However, first the venerable system command:
import os
os.system("curl http://mscbio2025.csb.pitt.edu -o class.html")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   152  100   152    0     0  11799      0 --:--:-- --:--:-- --:--:-- 12666
0
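The return value of os.system is the command's exit status, not what it prints to the screen. On Unix that status is encoded as a wait status, so a nonzero exit code is not returned as-is.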
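On Unix, os.system returns a wait status in which the exit code sits in the high byte. Here is a minimal sketch, assuming a Unix system with /bin/sh. It shows the raw value and then decodes it with os.waitstatus_to_exitcode (available since Python 3.9):

```python
import os

# 'exit 3' makes /bin/sh terminate with exit code 3.
status = os.system("exit 3")
print(status)                             # 768 on Unix (3 << 8), not 3
print(os.waitstatus_to_exitcode(status))  # 3
```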
f = open('class.html')
len(f.read())
152
Question: What exit code indicates success?
- -1
- 0
- 1
- empty string
subprocess¶
The subprocess module is meant to replace the following older functions and modules, so avoid them in new code (popen2 and commands exist only in Python 2):
- os.system
- os.spawn*
- os.popen*
- popen2.*
- commands.*
subprocess.call¶
import subprocess
subprocess.call(ARGS, stdin=None, stdout=None, stderr=None, shell=False)
0
Run the command described by ARGS. Wait for command to complete, then return the returncode attribute.

ARGS¶
ARGS specifies the command to call and its arguments.  It can either be a string or a list of strings.
subprocess.call('echo')
0
subprocess.call(['echo','hello'])
hello
0
If shell=False (default) and args is a string, it must be only the name of the program (no arguments).  If a list is provided, then the first element is the program name and the remaining elements are the arguments.
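If you already have a command line as a single string, shlex.split converts it into the list form. It applies POSIX shell quoting rules without starting a shell. The sketch below uses the running Python interpreter (sys.executable) as a stand-in for an arbitrary program, so that it runs anywhere:

```python
import shlex
import subprocess
import sys

# Split a command line into an argument list, honoring the quotes.
args = shlex.split("echo 'hello world' again")
print(args)  # ['echo', 'hello world', 'again']

# With shell=False (the default) the list is used as-is: element 0 is the
# program, the remaining elements are its arguments.
rc = subprocess.call([sys.executable, "-c", "import sys; print(sys.argv[1:])",
                      "hello world", "again"])
print(rc)  # 0
```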
shell¶
If (and only if) shell=True, the string provided for args is parsed exactly as if you had typed it on the command line. This means that:
- you must escape special characters (e.g. spaces in file names)
- you can use the wildcard '*' character to expand file names
- you can add IO redirection
If shell=False, pass the program and its arguments as a list. Each argument is passed to the program literally (e.g., the program receives '*' itself, not a list of file names).
shell is False by default for security reasons.  Consider:
filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
subprocess.call("cat " + filename, shell=True) # Uh-oh. This will end badly...
By default, /bin/sh is used as the shell, but your interactive shell is probably bash. You can choose the shell with the executable argument.
subprocess.call('echo $0',shell=True) #prints out /bin/sh
subprocess.call('echo $0',executable='/bin/bash',shell=True) #prints out /bin/bash
/bin/sh /bin/bash
0
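Returning to the cat example, there are two ways to defuse hostile input. The safest is to skip the shell entirely and pass a list. If you really need shell=True, quote the untrusted text with shlex.quote first. In this sketch a tiny Python child stands in for cat and simply reports the arguments it received. text=True (Python 3.7+) returns str instead of bytes:

```python
import shlex
import subprocess
import sys

filename = "non_existent; rm -rf / #"  # the hostile input from the example

# Option 1: no shell. The whole string reaches the program as ONE argument,
# so ';' and 'rm' are never interpreted.
report = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
received = subprocess.check_output(report + [filename], text=True)
print(received)  # ['non_existent; rm -rf / #']

# Option 2: with shell=True, quote untrusted text so the shell sees one word.
quoted = shlex.quote(filename)
echoed = subprocess.check_output("echo " + quoted, shell=True, text=True)
print(echoed)  # non_existent; rm -rf / #
```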
shell Examples¶
subprocess.call(['ls','*']) # ls: *: No such file or directory
ls: cannot access '*': No such file or directory
2
!touch file\ with\ spaces
#note - ls returns a nonzero exit code if it can't access a file it was asked to list
print(subprocess.call('ls *',shell=True)) #shows all files
print(subprocess.call('ls file with spaces',shell=True)) #tries to ls three different files
print(subprocess.call(r'ls file\ with\ spaces',shell=True)) #shows single file with spaces in name (raw string keeps the backslashes for the shell)
print(subprocess.call(['ls','file with spaces'])) #ditto
print(subprocess.call(['ls',r'file\ with\ spaces'])) #fails since it looks for file with backslashes in name
file with spaces
0
2
file with spaces
0
file with spaces
0
2
ls: cannot access 'file': No such file or directory
ls: cannot access 'with': No such file or directory
ls: cannot access 'spaces': No such file or directory
ls: cannot access 'file\ with\ spaces': No such file or directory
subprocess.call('ls *') #why is this FileNotFoundError?
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
----> 1 subprocess.call('ls *') #why is this FileNotFoundError?
...
FileNotFoundError: [Errno 2] No such file or directory: 'ls *'
Input/Output/Error¶
Every process (program) has standard places to write output and read input.
- stdin - standard input, usually read from the keyboard
- stdout - standard output, usually written to the screen; it is usually buffered
- stderr - standard error, also usually written to the screen; it is unbuffered (output appears immediately)
On the command line, you can change these places with IO redirection (<, >, |). For example:
    grep Congress cnn.html > congress
    wc < congress
    grep Congress cnn.html | wc
When calling external programs from scripts we'll usually want to provide input to the programs and read their output, so we'll have to change these 'places' as well.
Question: When you print something in python, where does it go?
- stdin
- stdout
- stderr
- the screen
stdin/stdout/stderr¶
stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are
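- subprocess.PIPE - creates a pipe so your script and the program can communicate
- an existing file object - e.g., one created with open
- None - the program inherits the existing stdin/stdout/stderr of your script
For stderr you can also pass subprocess.STDOUT, which sends errors to the same place as standard output.
Do not use subprocess.PIPE with subprocess.call. Nothing reads the pipe, so a program that produces enough output can fill it and block forever.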
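If you only want to silence a program's output, subprocess.DEVNULL (Python 3.3+) discards it without the risk of an unread pipe. This is a minimal sketch in which the running Python interpreter stands in for a chatty program:

```python
import subprocess
import sys

# The child writes about 1 MB to stdout. With stdout=subprocess.PIPE and
# call(), nobody would read the pipe and the child could block once it fills.
chatty = [sys.executable, "-c", "print('x' * 1_000_000)"]

# DEVNULL sends the output to the null device instead.
rc = subprocess.call(chatty, stdout=subprocess.DEVNULL)
print(rc)  # 0
```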
Redirecting to files¶
with open('dump','w') as f:
    subprocess.call('ls',stdout=f)
f = open('dump','r') #this would be a very inefficient way to get the stdout of a program
f.readline()
'dump\n'
with open('dump','w') as f:
    print(subprocess.call(['ls','nonexistantfile'],stdout=f,stderr=subprocess.STDOUT)) #you can redirect stderr to stdout
2
print(open('dump').read())
ls: cannot access 'nonexistantfile': No such file or directory
subprocess.check_call¶
check_call is identical to call, but raises a CalledProcessError exception when the called program returns a nonzero exit code.
subprocess.check_call(['ls','missingfile'])
ls: cannot access 'missingfile': No such file or directory
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
----> 1 subprocess.check_call(['ls','missingfile'])
...
CalledProcessError: Command '['ls', 'missingfile']' returned non-zero exit status 2.
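In a script you normally catch the exception instead of letting it crash the program. The exception object records the failing command (cmd) and its exit status (returncode). In this sketch a small Python child that exits with status 2 stands in for `ls missingfile`:

```python
import subprocess
import sys

# Stand-in program that fails with exit status 2.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]

status = None
try:
    subprocess.check_call(failing)
except subprocess.CalledProcessError as e:
    status = e.returncode
    print(f"{e.cmd} failed with exit status {e.returncode}")
```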
subprocess.check_output¶
subprocess.check_output(ARGS, stdin=None, stderr=None, shell=False)
b'dump\nfile with spaces\n'
Typically, you call a program because you want to parse its output, and check_output is the easiest way to get it: its return value is whatever the program wrote to stdout, as a bytes object.
Nonzero return values result in a CalledProcessError exception (like check_call).
files = subprocess.check_output('ls file*',shell=True)
print(files)
b'file with spaces\n'
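By default check_output returns bytes. Passing text=True (Python 3.7+) decodes the output to a str, which is easier to parse, for example with splitlines. A short sketch in which a Python child stands in for a program that prints three lines:

```python
import subprocess
import sys

# Stand-in program that prints three lines.
cmd = [sys.executable, "-c", "print('alpha'); print('beta'); print('gamma')"]

raw = subprocess.check_output(cmd)              # bytes
text = subprocess.check_output(cmd, text=True)  # str
lines = text.splitlines()
print(raw)    # b'alpha\nbeta\ngamma\n'
print(lines)  # ['alpha', 'beta', 'gamma']
```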
You can also capture error messages by redirecting stderr to stdout:
subprocess.check_output("ls non_existent_file; exit 0", stderr=subprocess.STDOUT, shell=True)
b"ls: cannot access 'non_existent_file': No such file or directory\n"
Why exit 0?
Question: Which of the following does NOT produce an error?
- subprocess.check_output(["ls *"],shell=True)
- subprocess.check_output("ls","*",shell=True)
- subprocess.check_output(["ls","*"])
- subprocess.check_output(["ls *"])
How can we communicate with the program we are launching?

Popen¶
All the previous functions are just convenience wrappers around the Popen class.
subprocess.Popen(ARGS, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, env=None)
<Popen: returncode: None args: 'ls'>
dump file with spaces
Popen has quite a few optional arguments. Shown are just the most common.
cwd sets the working directory for the process (if None, the process uses the current working directory of the python script).
env is a dictionary of environment variables for the process. If given, it replaces the inherited environment entirely rather than adding to it.
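The convenience wrappers pass cwd and env through to Popen, so check_output is enough to demonstrate them. In this sketch the variable name GREETING is hypothetical, a temporary directory serves as the working directory, and the current environment is copied before adding to it:

```python
import os
import subprocess
import sys
import tempfile

# Child that reports its working directory and one environment variable.
probe = [sys.executable, "-c",
         "import os; print(os.getcwd()); print(os.environ.get('GREETING'))"]

with tempfile.TemporaryDirectory() as tmpdir:
    # Copy the current environment and add to it; passing {"GREETING": "hi"}
    # alone would drop PATH, HOME, etc. for the child.
    env = dict(os.environ, GREETING="hi")
    out = subprocess.check_output(probe, cwd=tmpdir, env=env, text=True)
    child_cwd, greeting = out.splitlines()
    # realpath guards against symlinked temp dirs (e.g. /tmp on macOS).
    same_dir = os.path.realpath(child_cwd) == os.path.realpath(tmpdir)
    print(child_cwd, greeting)
```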
Popen is a constructor and returns a Popen object.
proc = subprocess.Popen('echo')
type(proc)
subprocess.Popen
Creating a Popen object does not wait for the called process to finish: the constructor returns immediately and your script keeps running alongside the program.
We can finally use PIPE.

subprocess.PIPE¶
If we set stdin/stdout/stderr to subprocess.PIPE then they are available to read/write to in the resulting Popen object.
proc = subprocess.Popen('ls',stdout=subprocess.PIPE)
type(proc.stdout)
_io.BufferedReader
print(proc.stdout.readline())
b'dump\n'
Pipes enable communication between your script and the called program.
If stdout/stdin/stderr is set to subprocess.PIPE then that input/output stream of the process is accessible through a file object in the returned object.
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write(b"Hello")
proc.stdin.close()
print(proc.stdout.read())
b'Hello'
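Pipes can also connect two programs directly, like the shell's | in the earlier `grep Congress cnn.html | wc` example. The sketch below follows the shell-pipeline pattern described in the subprocess documentation. It assumes grep and wc are installed, uses `wc -l` to count lines only, and lets a temporary file with made-up contents stand in for cnn.html:

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for cnn.html: a small throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("Congress met\nSenate met\nCongress adjourned\n")
    path = tmp.name

# Shell equivalent:  grep Congress cnn.html | wc -l
grep = subprocess.Popen(["grep", "Congress", path], stdout=subprocess.PIPE)
# wc reads grep's stdout directly; the data never passes through Python.
wc = subprocess.Popen(["wc", "-l"], stdin=grep.stdout, stdout=subprocess.PIPE)
grep.stdout.close()  # our copy is not needed; lets grep see SIGPIPE if wc exits
out, _ = wc.communicate()
grep.wait()
os.remove(path)

count = int(out)  # wc may pad the number with spaces; int() ignores them
print(count)  # 2
```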
Python 3 strings are Unicode, but pipes (and most programs) work with bytes, so writing a str to a pipe fails:
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write("Hello")
proc.stdin.close()
print(proc.stdout.read())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      1 proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
----> 2 proc.stdin.write("Hello")
      3 proc.stdin.close()
      4 print(proc.stdout.read())

TypeError: a bytes-like object is required, not 'str'
Unicode (aside)¶
Byte strings (the default kind of string in Python 2) store each character in a single byte (ASCII, as in The Martian).
Unicode text, in the UTF-8 encoding that Python uses by default for .encode() and .decode(), takes 1 to 4 bytes per character.
This allows support for other languages and the all-important emoji.
print('\U0001F984')
🦄
Converting between bytes and strings
b'a byte str'.decode()
'a byte str'
'a unicode string'.encode()
b'a unicode string'
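To see the variable width of UTF-8, you can encode single characters and count the bytes. This sketch uses escape codes for the non-ASCII characters (é, € and the unicorn):

```python
chars = ["a", "\u00e9", "\u20ac", "\U0001F984"]  # a, é, €, 🦄
sizes = [len(ch.encode("utf-8")) for ch in chars]
print(sizes)  # [1, 2, 3, 4]

# Decoding the bytes gives back the original text.
roundtrip = "".join(chars).encode().decode()
```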
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write(b"Hello")
proc.stdin.close()
print(proc.stdout.read())
b'Hello'
Question: What would happen if in the previous code we omitted the close?
- A: Nothing would change
- B: It would not print anything
- C: It would print Hello after a pause
- D: It would hang
Warning!¶
Managing simultaneous input and output is tricky and can easily lead to deadlocks.
For example, your script may be blocked waiting for output from the process which is blocked waiting for input.
 
Popen.communicate(input=None)¶
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
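input is the data to send to stdin (which must be set to PIPE). It must be bytes unless the pipes were opened in text mode with text=True.
Likewise, to receive stdout/stderr, they must be set to PIPE.
Unlike writing to and reading from the pipes yourself, this will not deadlock.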
99% of the time if you have to both provide input and read output of a subprocess, communicate will do what you need.
proc = subprocess.Popen("awk '{print $1}'",stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=True)
(out, err) = proc.communicate(b"x y z\n1 2 3\na b c\n") #returns tuple of output and error
print(out.decode())  # decode converts a bytes string to a regular unicode string
x 1 a
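communicate also accepts a timeout, and with text=True it takes and returns str instead of bytes. The sketch below uses the kill-then-communicate pattern recommended in the subprocess documentation for handling a timeout. A Python child that upper-cases its input stands in for awk:

```python
import subprocess
import sys

# Stand-in program: upper-cases whatever arrives on stdin.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

proc = subprocess.Popen(upper, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, text=True)
try:
    out, err = proc.communicate("x y z\n1 2 3\n", timeout=10)
except subprocess.TimeoutExpired:
    proc.kill()                    # give up on a hung child...
    out, err = proc.communicate()  # ...and collect whatever it produced
print(out)
```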
Interacting with Popen¶
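- Popen.poll() - check whether the process has terminated (returns None while it is still running)
- Popen.wait() - wait for the process to terminate. Do not use it with stdout=PIPE or stderr=PIPE: the process can block on a full pipe and wait() never returns (use communicate instead)
- Popen.terminate() - terminate the process (ask nicely)
- Popen.kill() - kill the process with extreme prejudice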
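A minimal sketch of these methods, assuming a POSIX system. A Python child that would sleep for a minute stands in for a long-running program:

```python
import subprocess
import sys

# A child that would run for a minute if left alone.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

first_poll = proc.poll()  # None: Popen returned immediately, child still running
print(first_poll)
proc.terminate()          # sends SIGTERM on POSIX
rc = proc.wait(timeout=10)
print(rc)                 # -15 on POSIX: negative means "killed by signal 15"
```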
Note that if the program generates a large amount of data, communicate, which buffers all of it in memory, may not be an option. Instead, read directly from Popen.stdout.
If you need to PIPE both stdin and stdout and can't use communicate, be very careful about controlling how data is communicated.
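Reading Popen.stdout incrementally handles large outputs without holding them all in memory. This sketch pipes only stdout, so there is no two-way deadlock risk. A Python child printing 100,000 numbers stands in for a program with large output. Using Popen as a context manager closes the pipe and waits for the process on exit:

```python
import subprocess
import sys

# Stand-in producer that prints many lines.
producer = [sys.executable, "-c", "for i in range(100000): print(i)"]

total = 0
with subprocess.Popen(producer, stdout=subprocess.PIPE, text=True) as proc:
    # Iterating over the pipe reads one line at a time.
    for line in proc.stdout:
        total += int(line)
print(total, proc.returncode)  # 4999950000 0
```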
Review¶
- Just want to run a command? subprocess.call

- Want the output of the command? subprocess.check_output

- Don't want to wait for the command to finish? subprocess.Popen

- Need to provide data through stdin? subprocess.Popen with stdin=subprocess.PIPE, then communicate
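Since Python 3.5, subprocess.run covers most of these cases in a single call. It waits for the program and returns a CompletedProcess with returncode, stdout and stderr. capture_output and text require Python 3.7+. This is a minimal sketch with a Python child standing in for a real program:

```python
import subprocess
import sys

# Stand-in program: upper-cases whatever it reads on stdin.
upper = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

result = subprocess.run(upper, input="hello", capture_output=True, text=True)
print(result.returncode, result.stdout)  # 0 HELLO

# check=True raises CalledProcessError on a nonzero exit, like check_call.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"], check=True)
    raised = False
except subprocess.CalledProcessError:
    raised = True
```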
 
Exercise¶
We want to predict the binding affinity of a small molecule to a protein using the program gnina.
For simplicity, run your code starting from this colab: https://colab.research.google.com/drive/1QYo5QLUE80N_G28PlpYs6OKGddhhd931?usp=sharing
!wget http://mscbio2025.csb.pitt.edu/files/rec.pdb
!wget http://mscbio2025.csb.pitt.edu/files/lig.pdb
!wget http://mscbio2025.csb.pitt.edu/files/receptor.pdb
!wget http://mscbio2025.csb.pitt.edu/files/ligs.sdf
Project¶
- Run the command smina -r rec.pdb -l lig.pdb --minimize on these files. Parse the affinity and RMSD and print them on one line.
- Run the command smina -r receptor.pdb -l ligs.sdf --minimize. Parse the affinities and RMSDs.
- Plot histograms of both
- Make a scatter plot of affinity versus RMSD
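import matplotlib.pyplot as plt
# affinities and rmsds are the lists of numbers parsed in the previous step
plt.hist(affinities);
plt.hist(rmsds);
plt.plot(affinities,rmsds,'o')
plt.xlabel("Affinity")
plt.ylabel("RMSD");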