6.2.3. Multiprocessing with PYSPEX

As described in Session class structure, a PYSPEX session can be started only once per process or thread. This is a limitation if you want to run several SPEX sessions at once. It can be partly circumvented with the multiprocessing module of Python, which runs a function in parallel in separate processes, each with different input parameters.

So, if you have a problem that requires you to run SPEX multiple times with a very similar setup, but different input parameters, then this thread may be helpful to you.

The following script calculates the spectrum emitted by a CIE plasma for four different temperatures. This is just a simple example. This particular problem can be calculated more efficiently in a different way, but this shows the potential of the multiprocessing module if you need to do more complicated calculations.

First, we show the full script:

#!/usr/bin/env python

from multiprocessing import Pool
from pyspex.spex import Session

# Define the function that calls pyspex
# and does the calculation and/or
# analysis that you want

def ciecalc(kt):
    # Start the SPEX session
    s = Session()

    # Load a CIE model and choose
    # the SPEXACT version 3 database
    s.com('cie')
    s.var_calc(True)

    # Set the temperature of the plasma
    s.par(1,1,'t',kt[0])

    # Calculate the model
    s.calc()

    # Save the output spectral model
    # to a FITS file
    (pl, plt) = s.plot_model(show=False)
    pl.sector[0].tabmodel.write('cie_{0}.fits'.format(kt[0]),
                                format='fits', overwrite=True)

    # Close this SPEX session gently
    s.__del__()


# The block below runs only when this script is
# executed from the terminal, not when it is imported:

if __name__ == "__main__":

    # Maximum number of processes to
    # create at once
    nproc = 2

    # The temperatures for which to
    # calculate the CIE spectrum
    kt = (0.5, 1.0, 2.0, 4.0)

    # Pack the input parameters into one
    # iterable; each item is a 1-tuple such
    # as (0.5,), hence kt[0] in ciecalc
    arg = zip(kt)

    # Create a new pool of processes (nproc)
    pool = Pool(processes=nproc, maxtasksperchild=1)

    # Call the ciecalc function with each argument
    pool.map(ciecalc, arg)

    # Close the pool
    pool.close()
    pool.join()

Next, we will discuss the details of each part of this script.

6.2.3.1. Needed Python modules

For this script, PYSPEX is needed and is imported as usual. The other import is the Pool class from the multiprocessing module, which will handle the multiprocessing for us (see also Python multiprocessing):

from multiprocessing import Pool
from pyspex.spex import Session

6.2.3.2. The ciecalc function

The function ciecalc is our worker function here. This means that this function starts a PYSPEX session, executes a number of SPEX commands (based on the input arguments), and produces a result. In this case, we add one CIE component, set a temperature read from the function input arguments, and calculate the spectrum. In the last step, we get the calculated spectrum from a model plot and save it in FITS format. The temperature value is included in the output filename.

It is important here to have the start of the PYSPEX session s=Session() inside the function. Remember that you can only have one Session per process, so this needs to be defined in each process separately to make PYSPEX run in parallel.

In this example, we just vary the temperature in each calculation, but this can be expanded to more variables or options. There are multiple ways of approaching this:

  1. You create a worker function with multiple arguments (see starmap).

  2. You pack the variables in an object such that you can pass that object to the function. This object could be a Python list or dictionary, or a custom object that you define. The function should be able to read the variables and instructions from the input object.

  3. You pass a filename of a SPEX .com file or other configuration file at each iteration.
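The second option could look like the minimal sketch below. The dictionary keys kt and outfile and the names build_tasks and ciecalc_task are made up for this illustration. The sketch uses only the PYSPEX calls from the script above, and imports PYSPEX inside the worker so that the task list can be built and inspected without starting SPEX.

```python
def build_tasks(temperatures, prefix='cie'):
    """Return one dictionary per SPEX run (the keys are illustrative)."""
    return [{'kt': kt, 'outfile': '{0}_{1}.fits'.format(prefix, kt)}
            for kt in temperatures]


def ciecalc_task(task):
    """Worker: run one CIE calculation described by a task dictionary."""
    # Imported here so that building the tasks does not require PYSPEX
    from pyspex.spex import Session

    # One session per process, started inside the worker
    s = Session()
    s.com('cie')
    s.var_calc(True)
    s.par(1, 1, 't', task['kt'])
    s.calc()

    # Save the model spectrum under the name stored in the task
    (pl, plt) = s.plot_model(show=False)
    pl.sector[0].tabmodel.write(task['outfile'], format='fits',
                                overwrite=True)

    # Close this SPEX session gently
    s.__del__()
    return task['outfile']
```

Because each dictionary is a single argument, the list returned by build_tasks((0.5, 1.0, 2.0, 4.0)) can be passed directly to pool.map(ciecalc_task, tasks).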

Make sure to call s.__del__() at the very end of your function to close SPEX gently. This will delete all the *.dum files as well.
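If a command raises a Python exception halfway through the worker function, a final s.__del__() at the end of the function is never reached. One way to guard against this, sketched below, is a try/finally block. The helper run_in_session and its arguments are hypothetical and not part of PYSPEX; the session class is passed in as an argument so that the pattern can be tried without SPEX installed.

```python
def run_in_session(session_class, work):
    """Start a session, run work(session), and always close the session.

    session_class: a class such as pyspex.spex.Session, passed in by the caller
    work: a function that takes the session and issues the SPEX commands
    """
    s = session_class()
    try:
        return work(s)
    finally:
        # Runs after success and after an error, so SPEX is closed either way
        s.__del__()
```

Inside a worker function this could be used as run_in_session(Session, my_commands), where my_commands is your own function containing the s.com(...), s.par(...) and s.calc() calls.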

6.2.3.3. Set up multiprocessing

The nproc variable specifies the number of processes to run at the same time. Choose this parameter based on the number of processor cores and the amount of RAM in your computer. Also keep in mind that a single PYSPEX session may itself use multiple cores at the same time.

Warning

Please take care not to set nproc too high. A PYSPEX run can use between 1 and 4 GB of RAM. If you do not have a lot of RAM, these processes can make your system extremely slow or even crash it.

It may be helpful to limit the number of cores used by each process. This can be done by setting the environment variable OMP_NUM_THREADS. For example, on a computer with 16 cores and 32 GB of RAM, a reasonable setting would be nproc=4 together with export OMP_NUM_THREADS=4 in the shell. Alternatively, set the variable in the Python script itself; to be safe, do this before pyspex is imported and before the pool is created:

import os
os.environ["OMP_NUM_THREADS"] = "4"

Four processes that can each use 4 cores occupy at most 16 cores, which matches the processor specification. Assuming that each of the 4 processes uses at most 4 GB of memory, the total of 16 GB fits comfortably within the 32 GB of RAM.
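The same arithmetic can be written as a small helper. This is only a rough sketch: the function name suggest_nproc and its arguments are invented for this example, and the default of 4 GB per session is the upper bound quoted in the warning above, not a measured value.

```python
def suggest_nproc(n_cores, ram_gb, threads_per_session=4, gb_per_session=4):
    """Estimate how many PYSPEX processes fit on a machine."""
    by_cores = n_cores // threads_per_session   # limit set by the CPU cores
    by_memory = ram_gb // gb_per_session        # limit set by the RAM
    # Never return less than one process
    return max(1, min(by_cores, by_memory))


# The example machine from the text: 16 cores and 32 GB of RAM
nproc = suggest_nproc(16, 32)
print(nproc)   # prints 4
```

Here the cores are the limiting factor: the memory alone would allow 8 sessions, but with 4 threads each they would need more than the 16 available cores.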

The line that creates the pool of processes, pool = Pool(processes=nproc, maxtasksperchild=1), contains an essential option. The maxtasksperchild=1 option tells the pool to replace each worker process with a fresh one after it has completed a single task, so that every function call runs in its own process. This ensures that the s = Session() command is only given once in each process.
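The effect of maxtasksperchild=1 can be checked without SPEX by letting a stand-in worker report its process ID. The sketch below is such a check; report_pid and run_demo are names made up for this illustration. It also passes chunksize=1 to pool.map: for long input lists, map may otherwise group several inputs into one task, and all calls in that group would then run in the same process.

```python
import os
from multiprocessing import Pool


def report_pid(x):
    """Stand-in for the SPEX worker: return the input and this process ID."""
    return (x, os.getpid())


def run_demo(values, nproc=2):
    """Run report_pid on each value, with a fresh process for every call."""
    pool = Pool(processes=nproc, maxtasksperchild=1)
    # chunksize=1 makes every input its own task, so that
    # maxtasksperchild=1 gives each call a new process
    results = pool.map(report_pid, values, chunksize=1)
    pool.close()
    pool.join()
    return results
```

Calling run_demo((0.5, 1.0, 2.0, 4.0)) from within the if __name__ == "__main__": block of a script should return four different process IDs.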

The pool.map(ciecalc, arg) line divides the tasks over the different processes and passes each item of arg to the ciecalc function. The arg variable must be an iterable, such as a list or tuple, with one item per function call. The zip function can combine several input sequences into one sequence of tuples if necessary. See also the starmap function to pass multiple arguments to the worker function.
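To see what the worker function receives, it helps to print the packed arguments. The short sketch below uses only built-in Python; the names tuple is a made-up second input for illustration.

```python
kt = (0.5, 1.0, 2.0, 4.0)

# zip with a single input yields 1-tuples, which is why the
# worker in the script above reads the temperature as kt[0]
single = list(zip(kt))
print(single)   # [(0.5,), (1.0,), (2.0,), (4.0,)]

# zip with two inputs yields pairs; pool.starmap would unpack
# each pair into two positional arguments of the worker
names = ('a', 'b', 'c', 'd')
pairs = list(zip(kt, names))
print(pairs)    # [(0.5, 'a'), (1.0, 'b'), (2.0, 'c'), (4.0, 'd')]
```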

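The pool.close() call tells the pool that no more tasks will be submitted, and pool.join() waits until all worker processes have exited. If the worker function returns a value, pool.map collects these values in a list, in the order of the input arguments.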

The script above can be saved as pyspexmp.py and executed from the command line:

python pyspexmp.py

You should see multiple welcome messages from SPEX on your screen indicating that multiple instances of SPEX are running.

6.2.3.4. Final thoughts

The method above is especially suitable for repeating long and more complicated PYSPEX sessions. Since SPEX is restarted each time a process starts, you lose a couple of seconds per iteration. Therefore, this is only efficient if the runtime of one iteration is much longer than a couple of seconds.
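This trade-off can be made concrete with an idealized estimate. The sketch below assumes that every run takes the same time, that a single session could do all runs one after another with only one startup, and that parallel runs do not slow each other down. The function names and the 3 second startup time are assumptions chosen for illustration, not measurements.

```python
import math


def serial_time(n_runs, t_run, t_start=3.0):
    """Time (s) for all runs in one session: a single SPEX startup."""
    return t_start + n_runs * t_run


def parallel_time(n_runs, t_run, nproc, t_start=3.0):
    """Time (s) with nproc processes: every run pays the startup again."""
    return math.ceil(n_runs / nproc) * (t_start + t_run)


# Four runs on two processes, for a short and a long calculation
for t_run in (1.0, 60.0):
    print(t_run, serial_time(4, t_run), parallel_time(4, t_run, 2))
```

With these assumed numbers, four 1 second calculations take 7 s in a single session but 8 s in parallel, while four 60 second calculations drop from 243 s to 126 s.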

The multiprocessing module has many more options which may serve your needs. As long as you have only one SPEX session per process, everything should run well.