Performance Profiling in Python using cProfile

In this brief post, I will describe how to profile and test the performance of Python code, and how I used this method to solve a major performance issue in my own code. Python ships with the built-in cProfile library for profiling and the pstats library for analyzing the results. Together, these libraries let you view statistics on the execution time and call frequency of different parts of your code. I used cProfile to track down a performance issue in my new additions to the bunsen webview, the r-grid-testcase tool.

To start, cProfile can be run from the command line against an entire Python program:

```
python -m cProfile main.py
```

However, you may also want to profile a more specific part of your code. To do this, import both libraries:

```python
import cProfile
import pstats
```

Next, navigate to the part of the code you want to profile. In my case it was the r_grid_testcase view function (registered with the @app.route decorator) in the r-httpd-browse Flask web app. Instantiate a cProfile profiler as a context manager at the beginning of the code you want to profile:

```python
@app.route('/r_grid_testcase/')
def r_grid_testcase():
    with cProfile.Profile() as profile:
        ...
```

Finally, at the end of the function, collect the results from the profiler in a pstats object, sort them by the statistic of interest (in this case, cumulative time), and print them. If the function you are profiling has a return value, make sure the return statement is indented one level less than the code inside the with block, so that it sits outside the profiled block and the function still returns correctly:

```python
        ...
    results = pstats.Stats(profile)
    results.sort_stats(pstats.SortKey.CUMULATIVE)
    results.print_stats()
    return response
```

You can sort the results by various attributes, such as NAME for function name, CUMULATIVE for cumulative time, or FILENAME, among others.
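Putting the pieces above together, here is a minimal, self-contained sketch of the pattern. The slow_sum and handler functions are hypothetical stand-ins for whatever code you want to profile; they are not part of the bunsen codebase:

```python
import cProfile
import pstats

def slow_sum(n):
    # Deliberately inefficient: builds a full list before summing.
    return sum([i * i for i in range(n)])

def handler():
    # Profile only the code inside the with block.
    with cProfile.Profile() as profile:
        result = slow_sum(200_000)
    # Collect stats outside the with block, sort by cumulative time,
    # and print only the top 5 entries to keep the output short.
    stats = pstats.Stats(profile)
    stats.sort_stats(pstats.SortKey.CUMULATIVE)
    stats.print_stats(5)
    # The return sits at function-body level, outside the profiled block.
    return result

if __name__ == "__main__":
    handler()
```

Running this prints a stats table to stdout, with slow_sum near the top of the cumulative-time ordering.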
Now, when you run the program, the statistics are printed to stdout each time the function returns. Here are the top results for the old r-httpd-browse code:

```
119461 function calls (112011 primitive calls) in 12.938 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   12.795   12.795 /usr/lib64/python3.11/subprocess.py:506(run)
     1    0.000    0.000   12.791   12.791 /usr/lib64/python3.11/subprocess.py:1165(communicate)
     1    0.000    0.000   12.791   12.791 /usr/lib64/python3.11/subprocess.py:2060(_communicate)
     4    0.000    0.000   12.790    3.198 /usr/lib64/python3.11/selectors.py:402(select)
     4   12.790    3.198   12.790    3.198 {method 'poll' of 'select.poll' objects}
     1    0.000    0.000    0.138    0.138 /home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:139(render_template)
     1    0.000    0.000    0.092    0.092 /home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:1066(get_or_select_template)
     1    0.000    0.000    0.092    0.092 /home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:975(get_template)
     1    0.000    0.000    0.092    0.092 /home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:950(_load_template)
```

A few different statistics are shown: ncalls is the number of calls made to the function; tottime is the total time spent in the function itself, excluding time spent in calls to sub-functions; percall is the per-call time; cumtime is the cumulative time spent in this function and all sub-functions; and filename:lineno(function) identifies the function.

In our case, by far the most time was spent in the poll method, which means r-httpd-browse is waiting on some other program to run. With this knowledge, we can see that the only call to another program is the subprocess.run command. This spawns a new process to run the r-grid-testcase command, which extracts the required test results for the webview:

```python
p = subprocess.run(...)
```
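To see for yourself how waiting on a child process shows up as poll/select time in a profile, here is a small sketch. The child process and the run_child helper are hypothetical; the child simply sleeps so the parent blocks waiting for it:

```python
import cProfile
import io
import pstats
import subprocess
import sys

def run_child():
    # Spawn a child that sleeps briefly; the parent blocks until it exits.
    subprocess.run([sys.executable, "-c", "import time; time.sleep(0.2)"])

with cProfile.Profile() as profile:
    run_child()

# Send the stats to a string buffer instead of stdout so they can be inspected.
buf = io.StringIO()
stats = pstats.Stats(profile, stream=buf)
stats.sort_stats(pstats.SortKey.CUMULATIVE)
# print_stats accepts a regex filter on filename:lineno(function).
stats.print_stats("subprocess")
print(buf.getvalue())
```

In the filtered output, subprocess.run and its helpers dominate the cumulative-time column even though the parent process did almost no work itself, which is the same signature seen in the r-httpd-browse profile above.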
Following the breadcrumbs, I applied the same profiling steps to find the source of the slowdown in the r-grid-testcase file. I discovered that r-grid-testcase was running many small database queries to gather the test information it returned, which for a large search limit resulted in huge overhead. To fix this, I refactored r-grid-testcase to run a single large query instead of many smaller queries. This improved performance dramatically: the time to run the r_grid_testcase view function of r-httpd-browse dropped from almost 13 seconds to below 0.5 seconds.

New r-httpd-browse profile:

```
42320 function calls (41152 primitive calls) in 0.433 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.385    0.385 /usr/lib64/python3.11/subprocess.py:506(run)
     1    0.000    0.000    0.384    0.384 /usr/lib64/python3.11/subprocess.py:1165(communicate)
     1    0.000    0.000    0.384    0.384 /usr/lib64/python3.11/subprocess.py:2060(_communicate)
     4    0.000    0.000    0.383    0.096 /usr/lib64/python3.11/selectors.py:402(select)
     4    0.383    0.096    0.383    0.096 {method 'poll' of 'select.poll' objects}
     1    0.000    0.000    0.045    0.045 /home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:139(render_template)
     1    0.000    0.000    0.045    0.045 /home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:127(_render)
     1    0.000    0.000    0.045    0.045 /home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:1269(render)
 197/8    0.001    0.000    0.044    0.006 {method 'join' of 'str' objects}
  2785    0.004    0.000    0.043    0.000
```
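The post does not show the actual r-grid-testcase query code, but the general refactoring pattern (many per-item queries versus one batched query) can be sketched with sqlite3 and a hypothetical results table:

```python
import sqlite3

# Hypothetical schema standing in for the real test-results database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (testcase TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(f"test_{i}", "PASS" if i % 2 else "FAIL") for i in range(1000)],
)

names = [f"test_{i}" for i in range(500)]

def fetch_one_by_one(names):
    # Slow pattern: one query (and one round trip) per testcase.
    out = {}
    for name in names:
        row = conn.execute(
            "SELECT outcome FROM results WHERE testcase = ?", (name,)
        ).fetchone()
        out[name] = row[0]
    return out

def fetch_in_bulk(names):
    # Fast pattern: a single query covering all testcases at once.
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        f"SELECT testcase, outcome FROM results WHERE testcase IN ({placeholders})",
        names,
    ).fetchall()
    return dict(rows)

# Both approaches return the same data; only the query count differs.
assert fetch_one_by_one(names) == fetch_in_bulk(names)
```

Profiling each version with the cProfile pattern from earlier in the post makes the difference visible: the per-item version accumulates its time across hundreds of execute calls, while the batched version makes one.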