What is Python multiprocessing and how to use it

Python multiprocessing lets you divide a workload among multiple processes, cutting down on overall execution time. This is especially useful for heavy calculations or for handling large datasets.

What is Python multiprocessing?

Multiprocessing in Python refers to running multiple processes simultaneously, allowing you to make the most of multicore systems. Unlike single-threaded execution, which handles tasks one after another, multiprocessing lets different parts of a program run in parallel, each independently of the others. Every process gets its own memory space and can run on a separate processor core, slashing execution time for heavy-duty or time-sensitive operations.

Python multiprocessing has a wide range of applications. In data processing and analysis, it is often used to process large data sets faster and to accelerate complex analyses. It can also shorten the execution times of complex calculations in simulations and modelling (e.g., in scientific applications). Beyond that, it powers web scraping by fetching data from multiple sites simultaneously and boosts efficiency in image processing and computer vision, resulting in quicker image analysis.


How to implement Python multiprocessing

Python offers various options for implementing multiprocessing. In the following sections, we’ll introduce you to three common tools: the multiprocessing module, the concurrent.futures library and the joblib package.

multiprocessing module

The multiprocessing module is the standard module for Python multiprocessing. With this module, you can create processes, share data between them and synchronise them using locks, queues and other primitives; an example using a queue follows after the pool example below.

import multiprocessing

def task(n):
    # Compute the square of n in a separate process
    result = n * n
    print(f"Result: {result}")

if __name__ == "__main__":
    processes = []
    # Spawn one process per number from 1 to 5
    for i in range(1, 6):
        process = multiprocessing.Process(target=task, args=(i,))
        processes.append(process)
        process.start()
    # Wait for all processes to finish before continuing
    for process in processes:
        process.join()

In the example above, we use the multiprocessing.Process class to spawn five processes, each executing the task() function, which computes the square of a given number. After starting the processes, we call join() so that the main program waits for all of them to complete before proceeding. The result is displayed using an f-string, a Python string-formatting feature that embeds expressions directly in string literals. Note that the order of the output is non-deterministic, since it depends on how the operating system schedules the processes. You can also create a process pool with Python multiprocessing:

import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    # Distribute the calls to task() across a pool of worker processes
    with multiprocessing.Pool() as pool:
        results = pool.map(task, range(1, 6))
        print(results)  # Output: [1, 4, 9, 16, 25]

With pool.map(), the task() function is applied to every element of a sequence; the results are collected in a list in input order and printed.
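The module also provides the synchronisation tools mentioned above. The following is a minimal sketch (the function and variable names are our own illustration) of how a multiprocessing.Queue can be used to pass results from worker processes back to the main process:

import multiprocessing

def task(n, queue):
    # Send the result to the main process via the shared queue
    queue.put(n * n)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=task, args=(i, queue)) for i in range(1, 6)
    ]
    for process in processes:
        process.start()
    # Collect one result per process; the order depends on which finishes first
    results = [queue.get() for _ in processes]
    for process in processes:
        process.join()
    print(results)

Because a queue is used instead of print(), the main process can collect the values and continue working with them, for instance by sorting or summing them.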

concurrent.futures library

This module provides a high-level interface for the asynchronous, parallel execution of tasks. Its executor classes, ProcessPoolExecutor and ThreadPoolExecutor, run tasks on a pool of processes or threads, respectively. The concurrent.futures module is a simpler way to process asynchronous tasks and is in many cases easier to handle than the Python multiprocessing module.

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit the five calls and collect the resulting Future objects
        futures = [executor.submit(task, i) for i in range(1, 6)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())  # results arrive in completion order

The code uses the ProcessPoolExecutor from the concurrent.futures module to run tasks in parallel. executor.submit() schedules task(n) for the numbers 1 to 5 and returns a Future object for each call. The as_completed() method yields each future as soon as it has finished, so the results are printed in completion order rather than in the order of submission. (The if __name__ == "__main__": guard is required here, just as with the multiprocessing module, so that worker processes can be spawned safely.)
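If you need the results in input order instead, the executor also provides a map() method, which works much like pool.map() in the multiprocessing module. A minimal sketch, reusing the task() function from above:

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Unlike as_completed(), map() yields results in input order
        for result in executor.map(task, range(1, 6)):
            print(result)  # 1, 4, 9, 16, 25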

joblib package

joblib is an external Python library designed to simplify parallel processing in Python, for example for repeatable tasks such as executing functions with different input parameters or working with large amounts of data. The main features of joblib are the parallelisation of tasks, the caching of function results and the optimisation of memory and computing resources.

from joblib import Parallel, delayed

def task(n):
    return n * n

if __name__ == "__main__":
    # Run task() for the numbers 1 to 10 on up to four workers in parallel
    results = Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11))
    print(results)  # Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The expression Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11)) runs the function task() in parallel for the numbers 1 to 10. Parallel is configured with n_jobs=4, meaning up to four jobs are processed at the same time. Calling delayed(task)(i) wraps each call so that it can be dispatched to a worker instead of being executed immediately. The results are returned in input order, stored in results and printed.
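The caching mentioned above is handled by joblib's Memory class, which stores function results on disk so that repeated calls with the same arguments are not recomputed. A minimal sketch (the cache directory name is an arbitrary example):

from joblib import Memory

# Cache results in a local directory; the path is an arbitrary example
memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def task(n):
    return n * n

print(task(4))  # computed on the first call
print(task(4))  # served from the on-disk cache on subsequent calls

This is particularly useful when an expensive function is called repeatedly with the same inputs, even across separate runs of a script.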
