Performance Profiling in Python using cProfile

In this brief post, I will describe how to profile and test the performance
of code in Python, and how I used this method to solve a major performance
issue in my code. To profile and test the performance of Python code, you
can use the built-in cProfile library, and then use the pstats library to
analyze the results. These libraries enable you to view statistics related
to the execution time and frequency of different parts of Python code. I
used cProfile to work out a performance issue on my new additions to the
bunsen webview, the r-grid-testcase tool.

To start, cProfile can be run on any python file from the command line for
the whole program you want to profile with

python -m cProfile main.py

However, you may also want to profile more specific parts of your code. To
do this, you must import the library as well as pstats, with

import cProfile
import pstats

Next, navigate to the part of the code you want to profile. In my case it
was the @app.route r_grid_testcase decorator within the r-httpd-browse
flask web app. We instantiate a cProfile profiler at the beginning of the
code we want to profile.

   @app.route('/r_grid_testcase/')
    def r_grid_testcase():
        with cProfile.Profile() as profile:
...

Finally, collect the results from the profiler using a pstats object at the
end of the function, sort the results based on the variable of interest (in
this case, cumulative time), and then print the results. If the function
you are profiling has a return value, ensure the return line is indented
one line less than the cProfile line to ensure it is outside the statements
block and the function returns correctly.

...
  results = pstats.Stats(profile)
        results.sort_stats(pstats.SortKey.CUMULATIVE)
        results.print_stats()
return response


You can sort the results by various attributes, such as NAME for function
name, CUMULATIVE for cumulative time, or FILENAME, among others.

Now, when you run the program, the statistics will be printed to stdout
each time that function returns. Here are the top results of the old
r-httpd-browse code:

        119461 function calls (112011 primitive calls) in 12.938 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   12.795   12.795
/usr/lib64/python3.11/subprocess.py:506(run)
        1    0.000    0.000   12.791   12.791
/usr/lib64/python3.11/subprocess.py:1165(communicate)
        1    0.000    0.000   12.791   12.791
/usr/lib64/python3.11/subprocess.py:2060(_communicate)
        4    0.000    0.000   12.790    3.198
/usr/lib64/python3.11/selectors.py:402(select)
        4   12.790    3.198   12.790    3.198 {method 'poll' of
'select.poll' objects}
        1    0.000    0.000    0.138    0.138
/home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:139(render_template)
        1    0.000    0.000    0.092    0.092
/home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:1066(get_or_select_template)
        1    0.000    0.000    0.092    0.092
/home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:975(get_template)
        1    0.000    0.000    0.092    0.092
/home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:950(_load_template)


You can see a few different statistics shown: ncalls is the number of calls
made to the function, tottime is the total time spent in the function
excluding time made in calls to sub-functions, percall is the per call
time, cumtime is the cumulative time spent in this and all subfunctions,
and filename:lineno(function) is the function details.
In our case, we can see that by far the most amount of time was spent on
the poll method, which means r-httpd-browse is waiting on some other
program to run. With this knowledge, we can see that the only call to
another program is the subprocess.run command. This is spawning a new
process to run the r-grid-testcase command which is what extracts the
required test results for the webview.
 p = subprocess.run(...)

So following the breadcrumbs I applied the same profiling steps above to
find the source of the slowdown in the r-grid-testcase file. I discovered
that the r-grid-test case was running many small database queries to get
the test information it was returning, which for a large search limit
resulted in huge overhead. To rectify this, I refactored r-grid-test case
to run a single large query instead of many smaller queries. This improved
performance noticeably. The time to run the r_grid_testcase decorator of
r-httpd-browse before and after the change went from almost 13 seconds to
below 0.5 seconds.

New r-httpd-browse profile:

        42320 function calls (41152 primitive calls) in 0.433 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.385    0.385
/usr/lib64/python3.11/subprocess.py:506(run)
        1    0.000    0.000    0.384    0.384
/usr/lib64/python3.11/subprocess.py:1165(communicate)
        1    0.000    0.000    0.384    0.384
/usr/lib64/python3.11/subprocess.py:2060(_communicate)
        4    0.000    0.000    0.383    0.096
/usr/lib64/python3.11/selectors.py:402(select)
        4    0.383    0.096    0.383    0.096 {method 'poll' of
'select.poll' objects}
        1    0.000    0.000    0.045    0.045
/home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:139(render_template)
        1    0.000    0.000    0.045    0.045
/home/halamour/.local/lib/python3.11/site-packages/flask/templating.py:127(_render)
        1    0.000    0.000    0.045    0.045
/home/halamour/.local/lib/python3.11/site-packages/jinja2/environment.py:1269(render)
    197/8    0.001    0.000    0.044    0.006 {method 'join' of 'str'
objects}
     2785    0.004    0.000    0.043    0.000 <template>:4(root)
      431    0.003    0.000    0.029    0.000
/home/halamour/.local/lib/python3.11/site-packages/jinja2/runtime.py:260(call)
      195    0.001    0.000    0.024    0.000
/home/halamour/.local/lib/python3.11/site-packages/flask/app.py:1580(url_for)

Old r-grid-testcase profile:

       105721 function calls (103399 primitive calls) in 11.551 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     50/1    0.001    0.000   11.551   11.551 {built-in method
builtins.exec}
        1    0.000    0.000   11.551   11.551 r-grid-testcase:1(<module>)
        1    0.012    0.012   11.426   11.426 r-grid-testcase:207(main)
      201    7.564    0.038    7.564    0.038 {method 'fetchall' of
'sqlite3.Cursor' objects}
      235    3.734    0.016    3.734    0.016 {method 'execute' of
'sqlite3.Connection' objects}
        6    0.000    0.000    0.129    0.021 __init__.py:1(<module>)
     68/4    0.001    0.000    0.125    0.031 <frozen
importlib._bootstrap>:1165(_find_and_load)
     68/4    0.000    0.000    0.125    0.031 <frozen
importlib._bootstrap>:1120(_find_and_load_unlocked)
     65/4    0.001    0.000    0.123    0.031 <frozen
importlib._bootstrap>:666(_load_unlocked)
     48/4    0.000    0.000    0.123    0.031 <frozen
importlib._bootstrap_external>:934(exec_module)
    151/8    0.000    0.000    0.120    0.015 <frozen
importlib._bootstrap>:233(_call_with_frames_removed)


New r-grid-testcase profile:

      75916 function calls (73594 primitive calls) in 0.213 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     50/1    0.001    0.000    0.213    0.213 {built-in method
builtins.exec}
        1    0.000    0.000    0.213    0.213 r-grid-testcase:1(<module>)
        6    0.000    0.000    0.120    0.020 __init__.py:1(<module>)
     68/4    0.001    0.000    0.116    0.029 <frozen
importlib._bootstrap>:1165(_find_and_load)
     68/4    0.000    0.000    0.116    0.029 <frozen
importlib._bootstrap>:1120(_find_and_load_unlocked)
     65/4    0.001    0.000    0.115    0.029 <frozen
importlib._bootstrap>:666(_load_unlocked)
     48/4    0.000    0.000    0.114    0.029 <frozen
importlib._bootstrap_external>:934(exec_module)
    151/8    0.000    0.000    0.112    0.014 <frozen
importlib._bootstrap>:233(_call_with_frames_removed)


It was even obvious from using the webview, as the webpage loaded in much
faster.
In conclusion, cProfile is a simple yet powerful tool for profiling Python
code and quickly debugging performance issues.
You can explore the tool in more depth on the Python docs:
https://docs.python.org/3/library/profile.html#module-cProfile