Subprocess¶
%%html
<script src="https://bits.csb.pitt.edu/preamble.js"></script>
Going Outside the (Python) Box¶
Sometimes you need to integrate with programs that don't have a Python interface (or you think it would just be easier to use the command-line interface).
Python has a versatile subprocess module for calling and interacting with other programs.
But first, the venerable os.system command:
import os
os.system("curl http://mscbio2025.csb.pitt.edu -o class.html")
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 152 100 152 0 0 11799 0 --:--:-- --:--:-- --:--:-- 12666
0
The return value of os.system is the command's exit status, not what it printed to the screen.
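A small sketch of what that value looks like when a command fails. It assumes a Unix-like system, where os.system actually returns the raw wait status, so a nonzero exit code comes back shifted. os.waitstatus_to_exitcode (Python 3.9+) decodes it.

```python
import os

# Run a shell command that does nothing but exit with code 3
status = os.system("exit 3")

# On Unix this is the encoded wait status (3 << 8 == 768), not 3 itself
print(status)

# Decode the wait status into the real exit code (Python 3.9+)
exit_code = os.waitstatus_to_exitcode(status)
print(exit_code)  # 3
```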
f = open('class.html')
len(f.read())
152
%%html
<div id="sub1" style="width: 500px"></div>
<script>
var divid = '#sub1';
jQuery(divid).asker({
id: divid,
question: "What exit code indicates success?",
answers: ['-1','0','1','empty string'],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
subprocess¶
The subprocess module replaces the following older modules and functions (so don't use them):
os.system
os.spawn*
os.popen*
popen2.*
commands.*
subprocess.call¶
import subprocess
subprocess.call(ARGS, stdin=None, stdout=None, stderr=None, shell=False)
0
Run the command described by ARGS. Wait for the command to complete, then return its returncode attribute (the exit code).
ARGS¶
ARGS specifies the command to call and its arguments. It can be either a string or a list of strings.
subprocess.call('echo')
0
subprocess.call(['echo','hello'])
hello
0
If shell=False (the default) and args is a string, it must be only the name of the program (no arguments). If a list is provided, then the first element is the program name and the remaining elements are the arguments.
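If you already have a whole command line as one string, the standard library's shlex.split can turn it into the list form that shell=False expects, honoring quotes the way a POSIX shell would. A minimal sketch, assuming a POSIX system with echo on the PATH:

```python
import shlex
import subprocess

# Split a command line like a POSIX shell would, without running a shell
args = shlex.split("echo 'hello   world' again")
print(args)  # ['echo', 'hello   world', 'again']

ret = subprocess.call(args)  # echo receives two arguments

# With shell=False, a string containing arguments is treated as one program name
try:
    subprocess.call("echo hello")
    string_failed = False
except FileNotFoundError:
    string_failed = True
```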
shell¶
If (and only if) shell=True, then the string provided for args is parsed exactly as if you typed it on the command line. This means that:
- you must escape special characters (e.g. spaces in file names)
- you can use the wildcard '*' character to expand file names
- you can add IO redirection
If shell=False, then list arguments must be used, and they are passed literally to the program (e.g., it would receive '*' as a file name rather than a list of matching files).
shell is False by default for security reasons. Consider:
filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
subprocess.call("cat " + filename, shell=True) # Uh-oh. This will end badly...
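Here is the same attack with a harmless payload, next to two safer alternatives: passing a list (no shell is involved), or quoting untrusted input with the standard library's shlex.quote when a shell really is needed. The input string below is a made-up example in which echo stands in for something destructive; a POSIX system with cat is assumed.

```python
import shlex
import subprocess

# Hypothetical malicious input; echo stands in for something destructive
filename = "non_existent; echo INJECTED #"

# Unsafe: with shell=True the shell runs cat AND echo (INJECTED is printed)
unsafe_ret = subprocess.call("cat " + filename, shell=True,
                             stderr=subprocess.DEVNULL)

# Safe: as a list, the whole string is one (nonexistent) file name for cat
list_ret = subprocess.call(["cat", filename], stderr=subprocess.DEVNULL)

# If you must use shell=True, quote untrusted pieces with shlex.quote
quoted_cmd = "cat " + shlex.quote(filename)
print(quoted_cmd)  # cat 'non_existent; echo INJECTED #'
quoted_ret = subprocess.call(quoted_cmd, shell=True, stderr=subprocess.DEVNULL)
```

The unsafe call succeeds (exit code 0) because the injected echo was the last command to run; the two safe calls fail, because cat simply cannot find a file with that odd name.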
By default, /bin/sh is used as the shell, whereas your interactive shell is probably bash. You can specify which shell to use with the executable argument.
subprocess.call('echo $0',shell=True) #prints out /bin/sh
subprocess.call('echo $0',executable='/bin/bash',shell=True) #prints out /bin/bash
/bin/sh /bin/bash
0
shell Examples¶
subprocess.call(['ls','*']) # ls: *: No such file or directory
ls: cannot access '*': No such file or directory
2
!touch file\ with\ spaces
#note - ls returns nonzero exit code if can't list any files
print(subprocess.call('ls *',shell=True)) #shows all files
print(subprocess.call('ls file with spaces',shell=True)) #tries to ls three different files
print(subprocess.call(r'ls file\ with\ spaces',shell=True)) #shows single file with spaces in name
print(subprocess.call(['ls','file with spaces'])) #ditto
print(subprocess.call(['ls',r'file\ with\ spaces'])) #fails since it looks for file with backslashes in name
file with spaces 0 2 file with spaces 0 file with spaces 0 2
ls: cannot access 'file': No such file or directory ls: cannot access 'with': No such file or directory ls: cannot access 'spaces': No such file or directory ls: cannot access 'file\ with\ spaces': No such file or directory
subprocess.call('ls *') #why is this FileNotFoundError?
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[15], line 1
----> 1 subprocess.call('ls *') #why is this FileNotFoundError?
...
FileNotFoundError: [Errno 2] No such file or directory: 'ls *'
Input/Output/Error¶
Every process (program) has standard places to write output and read input.
- stdin - standard input is usually from the keyboard
- stdout - standard output is usually buffered
- stderr - standard error is unbuffered (output immediately)
On the command line, you can change these places with IO redirection (<, >, |). For example:
grep Congress cnn.html > congress
wc < congress
grep Congress cnn.html | wc
When calling external programs from scripts we'll usually want to provide input to the programs and read their output, so we'll have to change these 'places' as well.
%%html
<div id="stdoutq" style="width: 500px"></div>
<script>
var divid = '#stdoutq';
jQuery(divid).asker({
id: divid,
question: "When you print something in python, where does it go?",
answers: ['stdin','stdout','stderr','the screen'],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
stdin/stdout/stderr¶
stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are:
- subprocess.PIPE - this enables communication between your script and the program
- an existing file object - e.g., created with open
- None - the program inherits the existing stdin/stdout/stderr (the default)
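Do not use subprocess.PIPE with subprocess.call: nothing reads from the pipe while call waits, so the program can block forever once the pipe's buffer fills up.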
Redirecting to files¶
f = open('dump','w')
subprocess.call('ls',stdout=f)
f = open('dump','r') #this would be a very inefficient way to get the stdout of a program
f.readline()
'dump\n'
f = open('dump','w')
subprocess.call(['ls','nonexistantfile'],stdout=f,stderr=subprocess.STDOUT) #you can redirect stderr to stdout
2
print(open('dump').read())
ls: cannot access 'nonexistantfile': No such file or directory
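Redirection works for input too: any open file object can be passed as stdin. A minimal sketch, assuming the sort utility is available; the file names unsorted.txt and sorted.txt are made up for this example.

```python
import subprocess

# Create a small input file (hypothetical name)
with open('unsorted.txt', 'w') as f:
    f.write("pear\napple\nfig\n")

# Feed the file to sort's stdin and send sort's stdout to another file
with open('unsorted.txt') as fin, open('sorted.txt', 'w') as fout:
    ret = subprocess.call(['sort'], stdin=fin, stdout=fout)

with open('sorted.txt') as f:
    result = f.read()
print(result)
```

Using with blocks ensures both files are closed (and flushed) before the result is read back.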
subprocess.check_call¶
check_call is identical to call, but raises a CalledProcessError exception when the called program returns a nonzero exit code.
subprocess.check_call(['ls','missingfile'])
ls: cannot access 'missingfile': No such file or directory
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In[23], line 1
----> 1 subprocess.check_call(['ls','missingfile'])
File /usr/lib/python3.10/subprocess.py:369, in check_call(*popenargs, **kwargs)
    367     if cmd is None:
    368         cmd = popenargs[0]
--> 369     raise CalledProcessError(retcode, cmd)
    370 return 0
CalledProcessError: Command '['ls', 'missingfile']' returned non-zero exit status 2.
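In a script you would usually catch the exception rather than let it stop everything. A short sketch, assuming no file called missingfile exists in the current directory; the exception carries the failing command (cmd) and its exit code (returncode).

```python
import subprocess

try:
    # stderr is discarded so that only our own message is shown
    subprocess.check_call(['ls', 'missingfile'], stderr=subprocess.DEVNULL)
    failed_code = None
except subprocess.CalledProcessError as e:
    # The exception records what was run and how it exited
    failed_code = e.returncode
    print("Command", e.cmd, "failed with exit code", e.returncode)
```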
subprocess.check_output¶
subprocess.check_output(ARGS, stdin=None, stderr=None, shell=False)
b'dump\nfile with spaces\n'
Typically, you are calling a program because you want to parse its output. check_output provides the easiest way to do this. Its return value is whatever was written to stdout, as a bytes object.
Nonzero return values result in a CalledProcessError exception (like check_call).
files = subprocess.check_output('ls file*',shell=True)
print(files)
b'file with spaces\n'
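Since check_output returns bytes, you typically decode before parsing, or ask for a str up front with text=True (Python 3.7+). A minimal sketch assuming echo is available:

```python
import subprocess

raw = subprocess.check_output(['echo', 'hello world'])
print(raw)  # b'hello world\n'

# Option 1: decode the bytes yourself, then parse
words = raw.decode().split()
print(words)  # ['hello', 'world']

# Option 2: get a str directly (text=True needs Python 3.7+)
text = subprocess.check_output(['echo', 'hello world'], text=True)
print(repr(text))  # 'hello world\n'
```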
You can redirect stderr to STDOUT so that error messages are captured too:
subprocess.check_output("ls non_existent_file; exit 0", stderr=subprocess.STDOUT, shell=True)
b"ls: cannot access 'non_existent_file': No such file or directory\n"
Why exit 0?
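One way to explore that question is to drop the exit 0 and see what happens. In the sketch below (assuming no file named non_existent_file exists), check_output raises CalledProcessError; the captured text is not lost, though, because it is stored on the exception's output attribute.

```python
import subprocess

code = None
captured = None
try:
    subprocess.check_output("ls non_existent_file",
                            stderr=subprocess.STDOUT, shell=True)
except subprocess.CalledProcessError as e:
    code = e.returncode  # nonzero exit code from ls
    captured = e.output  # what ls wrote (stderr was merged into stdout)
print(code, captured)
```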
%%html
<div id="subck" style="width: 500px"></div>
<script>
var divid = '#subck';
jQuery(divid).asker({
id: divid,
question: "Which of the following does NOT produce an error?",
answers: ['subprocess.check_output(["ls *"],shell=True)','subprocess.check_output("ls","*",shell=True)',
'subprocess.check_output(["ls","*"])','subprocess.check_output(["ls *"])'],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
How can we communicate with the program we are launching?
Popen¶
All the previous functions are just convenience wrappers around the Popen class.
subprocess.Popen(ARGS, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, env=None)
<Popen: returncode: None args: 'ls'>
dump file with spaces
Popen has quite a few optional arguments. Shown are just the most common.
cwd sets the working directory for the process (if None, it defaults to the current working directory of the Python script).
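env is a dictionary that defines a new set of environment variables for the process. It replaces the inherited environment rather than adding to it (if None, the current environment is inherited).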
Popen is a constructor and returns a Popen object.
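check_output forwards extra keyword arguments to Popen, so cwd and env can be tried out with it. A minimal sketch, assuming a POSIX system; the file name marker.txt and the variable GREETING are made up for the example. Note that env starts from a copy of os.environ, because whatever you pass replaces the whole environment (including PATH).

```python
import os
import subprocess
import tempfile

# cwd: run ls inside a fresh temporary directory containing one file
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'marker.txt'), 'w').close()
    listing = subprocess.check_output(['ls'], cwd=d, text=True)
print(listing)

# env: copy the current environment, then add a hypothetical variable
env = dict(os.environ, GREETING='hello from env')
greeting = subprocess.check_output('echo $GREETING', shell=True,
                                   env=env, text=True)
print(greeting)
```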
proc = subprocess.Popen('echo')
type(proc)
subprocess.Popen
Popen returns immediately: the Python script does not wait for the called process to finish.
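A quick way to see this, assuming a POSIX sleep command: the constructor returns long before the one-second sleep is over, poll reports None while the child is still running, and wait blocks until it exits.

```python
import subprocess
import time

start = time.monotonic()
proc = subprocess.Popen(['sleep', '1'])  # starts sleep and returns right away
launch_time = time.monotonic() - start

first_poll = proc.poll()  # None: sleep is still running
print(first_poll)
exit_code = proc.wait()   # now block until sleep finishes
print(exit_code)
```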
We can finally use PIPE.
subprocess.PIPE¶
If we set stdin/stdout/stderr to subprocess.PIPE, then they are available for reading/writing as file objects on the resulting Popen object.
proc = subprocess.Popen('ls',stdout=subprocess.PIPE)
type(proc.stdout)
_io.BufferedReader
print(proc.stdout.readline())
b'dump\n'
Pipes enable communication between your script and the called program.
If stdout/stdin/stderr is set to subprocess.PIPE, then that input/output stream of the process is accessible through a file object in the returned object.
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write(b"Hello")
proc.stdin.close()
print(proc.stdout.read())
b'Hello'
Python 3 strings are Unicode, but these pipes are opened in binary mode, so you must write byte strings. Writing a regular string fails:
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write("Hello")
proc.stdin.close()
print(proc.stdout.read())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 2
      1 proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
----> 2 proc.stdin.write("Hello")
      3 proc.stdin.close()
      4 print(proc.stdout.read())
TypeError: a bytes-like object is required, not 'str'
Unicode (aside)¶
Byte strings (the default kind of string in Python 2) store each character in a single byte (ASCII, as in The Martian).
UTF-8, the most common Unicode encoding (and the default for Python's encode and decode), uses 1 to 4 bytes per character.
This allows support for other languages and the all-important emoji.
print('\U0001F984')
🦄
Converting between bytes and strings:
b'a byte str'.decode()
'a byte str'
'a unicode string'.encode()
b'a unicode string'
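To see the variable-width encoding in action, this sketch encodes four characters with UTF-8 and counts the bytes each one needs, then checks that decode undoes encode.

```python
# Each character below needs a different number of bytes in UTF-8
chars = ['a', '\u00e9', '\u4e2d', '\U0001F984']  # a, é, 中, 🦄
sizes = [len(c.encode('utf-8')) for c in chars]
print(sizes)  # [1, 2, 3, 4]

# encode and decode are inverses of each other
s = ''.join(chars)
round_trip = s.encode().decode()
print(round_trip == s)  # True
```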
proc = subprocess.Popen('cat',stdin=subprocess.PIPE,stdout=subprocess.PIPE)
proc.stdin.write(b"Hello")
proc.stdin.close()
print(proc.stdout.read())
b'Hello'
%%html
<div id="sublock" style="width: 500px"></div>
<script>
var divid = '#sublock';
jQuery(divid).asker({
id: divid,
question: "What would happen if in the previous code we omitted the close?",
answers: ['A','B','C','D'],
extra: ['Nothing would change','It would not print anything','It would print Hello after a pause','It would hang'],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Warning!¶
Managing simultaneous input and output is tricky and can easily lead to deadlocks.
For example, your script may be blocked waiting for output from the process, which is itself blocked waiting for input from your script.
Popen.communicate(input=None)¶
Interact with the process: send data to stdin, read data from stdout and stderr until end-of-file is reached, and wait for the process to terminate.
input is the data to be provided to stdin (which must be set to PIPE). It must be a bytes object unless the Popen was created in text mode, in which case it must be a string.
Likewise, to receive stdout/stderr, they must be set to PIPE.
This will not deadlock.
99% of the time if you have to both provide input and read output of a subprocess, communicate will do what you need.
proc = subprocess.Popen("awk '{print $1}'",stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=True)
(out, err) = proc.communicate(b"x y z\n1 2 3\na b c\n") #returns tuple of output and error
print(out.decode()) # decode converts a bytes string to a regular unicode string
x
1
a
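A variant of the same pattern, assuming the tr utility is available: text=True lets you pass and receive str instead of bytes, and timeout guards against a child that never finishes. Following the pattern in the subprocess documentation, the child is killed and then reaped if the timeout expires, since communicate does not do that on its own.

```python
import subprocess

proc = subprocess.Popen(['tr', 'a-z', 'A-Z'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, text=True)
try:
    # With text=True, input and outputs are str rather than bytes
    out, err = proc.communicate("hello\nworld\n", timeout=10)
except subprocess.TimeoutExpired:
    # communicate does not kill the child on timeout, so clean up ourselves
    proc.kill()
    out, err = proc.communicate()
print(out)
```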
Interacting with Popen¶
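- Popen.poll() - check whether the process has terminated (returns None while it is still running)
- Popen.wait() - wait for the process to terminate (do not use with PIPE: if the process fills a pipe's buffer while nothing reads it, the two deadlock)
- Popen.terminate() - terminate the process (ask nicely; sends SIGTERM on POSIX)
- Popen.kill() - kill the process with extreme prejudice (sends SIGKILL on POSIX)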
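A short sketch of stopping a long-running child, assuming a POSIX system with sleep. After terminate, wait reaps the process; a negative return code -N means the child was ended by signal N (a POSIX-only convention).

```python
import signal
import subprocess

proc = subprocess.Popen(['sleep', '60'])
still_running = proc.poll() is None  # True: sleep has a minute to go
proc.terminate()                     # POSIX: sends SIGTERM
rc = proc.wait()                     # reap the process and collect its status
print(rc)                            # negative: killed by signal -rc
```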
Note that if you are generating a large amount of data, communicate, which buffers all the data in memory, may not be an option (instead, just read from Popen.stdout).
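Reading Popen.stdout incrementally looks like the sketch below. To stay portable, the child here is just the current Python interpreter (sys.executable) printing numbers; in practice it would be the program whose large output you want to process line by line.

```python
import subprocess
import sys

# Child process: a tiny Python program printing 1..10000, one per line
child = [sys.executable, '-c', 'for i in range(1, 10001): print(i)']
proc = subprocess.Popen(child, stdout=subprocess.PIPE, text=True)

total = 0
for line in proc.stdout:  # handle each line as it arrives, not all at once
    total += int(line)
proc.stdout.close()
rc = proc.wait()          # safe here: the pipe has been read to the end
print(total)              # 50005000
```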
If you need to PIPE both stdin and stdout and can't use communicate, be very careful about controlling how data is communicated.
Review¶
- Just want to run a command? subprocess.call
- Want the output of the command? subprocess.check_output
- Don't want to wait for the command to finish? subprocess.Popen
- Need to provide data through stdin? subprocess.Popen with stdin=subprocess.PIPE, then communicate
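Not covered above, but worth knowing: Python 3.5+ also provides subprocess.run, which the official documentation recommends for the cases it can handle. It waits like call and returns a CompletedProcess object, and check=True makes it raise like check_call. A minimal sketch, assuming Python 3.7+ (for capture_output and text) and a POSIX system with echo and ls:

```python
import subprocess

# Like call, but returns a CompletedProcess with the exit code and output
result = subprocess.run(['echo', 'hello'], capture_output=True, text=True)
print(result.returncode)  # 0
print(result.stdout)      # hello

# check=True raises CalledProcessError on failure, like check_call
try:
    subprocess.run(['ls', 'missingfile'], capture_output=True, check=True)
    raised = False
except subprocess.CalledProcessError:
    raised = True
```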
Exercise¶
We want to predict the binding affinity of a small molecule to a protein using the program gnina.
For simplicity, run your code starting from this colab: https://colab.research.google.com/drive/1QYo5QLUE80N_G28PlpYs6OKGddhhd931?usp=sharing
!wget http://mscbio2025.csb.pitt.edu/files/rec.pdb
!wget http://mscbio2025.csb.pitt.edu/files/lig.pdb
!wget http://mscbio2025.csb.pitt.edu/files/receptor.pdb
!wget http://mscbio2025.csb.pitt.edu/files/ligs.sdf
--2023-11-08 16:58:28-- http://mscbio2025.csb.pitt.edu/files/rec.pdb Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 489908 (478K) [chemical/x-pdb] Saving to: ‘rec.pdb’ rec.pdb 100%[===================>] 478.43K --.-KB/s in 0.009s 2023-11-08 16:58:28 (52.6 MB/s) - ‘rec.pdb’ saved [489908/489908] --2023-11-08 16:58:28-- http://mscbio2025.csb.pitt.edu/files/lig.pdb Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 3536 (3.5K) [chemical/x-pdb] Saving to: ‘lig.pdb’ lig.pdb 100%[===================>] 3.45K --.-KB/s in 0s 2023-11-08 16:58:28 (519 MB/s) - ‘lig.pdb’ saved [3536/3536] --2023-11-08 16:58:28-- http://mscbio2025.csb.pitt.edu/files/receptor.pdb Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 143208 (140K) [chemical/x-pdb] Saving to: ‘receptor.pdb’ receptor.pdb 100%[===================>] 139.85K --.-KB/s in 0.003s 2023-11-08 16:58:28 (46.2 MB/s) - ‘receptor.pdb’ saved [143208/143208] --2023-11-08 16:58:28-- http://mscbio2025.csb.pitt.edu/files/ligs.sdf Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 65619 (64K) [chemical/x-mdl-sdfile] Saving to: ‘ligs.sdf’ ligs.sdf 100%[===================>] 64.08K --.-KB/s in 0.001s 2023-11-08 16:58:28 (44.5 MB/s) - ‘ligs.sdf’ saved [65619/65619]
Project¶
- Run the command smina -r rec.pdb -l lig.pdb --minimize on these files. Parse the affinity and RMSD and print them on one line.
- Run the command smina -r receptor.pdb -l ligs.sdf --minimize. Parse the affinities and RMSDs.
- Plot histograms of both.
- Plot a scatter plot.
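import matplotlib.pyplot as plt
# affinities and rmsds: the lists of numbers you parsed from the smina output
plt.hist(affinities);
plt.figure() # start a new figure so the two histograms don't overlap
plt.hist(rmsds);
plt.figure()
plt.plot(affinities,rmsds,'o')
plt.xlabel("Affinity")
plt.ylabel("RMSD");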