Re: Why does this unrolled function write to the stack?

public inbox for gcc-help@gcc.gnu.org

From: Jonathan Wakely <jwakely.gcc@gmail.com>
To: Gaelan Steele <gbs3@st-andrews.ac.uk>
Cc: "gcc-help@gcc.gnu.org" <gcc-help@gcc.gnu.org>
Subject: Re: Why does this unrolled function write to the stack?
Date: Wed, 8 Feb 2023 13:49:50 +0000
Message-ID: <CAH6eHdQeS_D_6tM9cgD-yNZnHuRYDzoFtpo=+DZweT4Lm1iHKw@mail.gmail.com>
In-Reply-To: <VE1PR06MB705447E5FF4C3080E5F3E586F1D89@VE1PR06MB7054.eurprd06.prod.outlook.com>

On Wed, 8 Feb 2023 at 13:31, Gaelan Steele via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> Hi all,
>
> In a computer architecture class, we happened across a strange compilation choice by GCC that neither I nor my professor can make much sense of. The source is as follows:
>
> void foo(int *a, const int *__restrict b, const int *__restrict c)
> {
>   for (int i = 0; i < 16; i++) {
>     a[i] = b[i] + c[i];
>   }
> }
>
> I won't reproduce the full compiled output here, as it's rather long, but when compiled with -O3 -mno-avx -mno-sse, GCC 12.2 for x86-64 (via Compiler Explorer: https://godbolt.org/z/o9e4o7cj4) produces an unrolled loop that appears to write each sum into an array on the stack before copying it into the provided pointer a. This seems hugely inefficient - it's doing quite a few memory accesses - and I can't see why it would be necessary.

I don't think it's *necessary*. If you use -Os, -O1 or -O2 you get a
loop, so it's just an optimization choice at -O3, presumably based on
cost estimates that say fully unrolling the loop will make the code
faster than looping.
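
To make "fully unrolling" concrete, here is a minimal source-level sketch.
It is not GCC's actual output (which, per the question, also goes through
a stack buffer); foo_loop, foo_unrolled and the ADD_AT macro are
hypothetical names used only for illustration. The unrolled form performs
the same sixteen additions in the same order, just without the loop
counter, compare and branch.

```c
/* Loop form, as written in the question. */
void foo_loop(int *a, const int *__restrict b, const int *__restrict c)
{
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] + c[i];
    }
}

/* Hypothetical helper macro: one iteration of the loop body. */
#define ADD_AT(i) (a[(i)] = b[(i)] + c[(i)])

/* Hand-unrolled illustration: the same 16 statements in the same order,
 * with no loop counter, compare or branch. This only mirrors the idea of
 * full unrolling; it is not what the compiler literally emits. */
void foo_unrolled(int *a, const int *__restrict b, const int *__restrict c)
{
    ADD_AT(0);  ADD_AT(1);  ADD_AT(2);  ADD_AT(3);
    ADD_AT(4);  ADD_AT(5);  ADD_AT(6);  ADD_AT(7);
    ADD_AT(8);  ADD_AT(9);  ADD_AT(10); ADD_AT(11);
    ADD_AT(12); ADD_AT(13); ADD_AT(14); ADD_AT(15);
}
```

Calling both functions on the same b and c should fill two output arrays
with identical values; only the control flow differs.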

>
> Am I missing some reason why this is more efficient than the naive approach (computing each sum into an intermediate register, then writing it directly into a)?

Benchmarking the function at different optimization levels I get:

Run on (8 X 4500 MHz CPU s)
CPU Caches:
 L1 Data 32 KiB (x4)
 L1 Instruction 32 KiB (x4)
 L2 Unified 256 KiB (x4)
 L3 Unified 8192 KiB (x1)
Load Average: 0.14, 0.22, 0.39
***WARNING*** CPU scaling is enabled, the benchmark real time
measurements may be noisy and will incur extra overhead.
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
O3               1.60 ns         1.60 ns    432901632
O2               3.56 ns         3.56 ns    197086506
O1               6.87 ns         6.86 ns    101839250
Os               8.23 ns         8.22 ns     85273333


Using quickbench:
https://quick-bench.com/q/sSwVvtrkOCp9q-XyKAevthiaNAw
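
For anyone wanting to repeat a comparison like this locally, a rough
timing harness might look like the sketch below. It is not the benchmark
behind the numbers above; ns_per_call and kernel_fn are hypothetical
names, and the empty asm statement is a GCC/Clang-specific barrier that
is assumed here to stop the stores from being optimized away. Building
the same file once per optimization level (-Os, -O1, -O2, -O3) and
comparing the results gives a crude version of the table.

```c
#include <time.h>

/* The function from the question. */
void foo(int *a, const int *__restrict b, const int *__restrict c)
{
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] + c[i];
    }
}

/* Hypothetical function-pointer type for the kernel being timed. */
typedef void (*kernel_fn)(int *, const int *, const int *);

/* Returns the average CPU time per call in nanoseconds.
 * `iterations` is assumed to be positive. */
double ns_per_call(kernel_fn fn, long iterations)
{
    int a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) {
        b[i] = i;
        c[i] = 2 * i;
    }

    clock_t start = clock();
    for (long n = 0; n < iterations; n++) {
        fn(a, b, c);
        /* Compiler barrier (GCC/Clang extension): treats memory as
         * clobbered so the writes to `a` cannot be discarded. */
        __asm__ __volatile__("" : : "r"(a) : "memory");
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iterations;
}
```

A call such as ns_per_call(foo, 100000000L) gives a rough per-call
figure. clock() is coarse and CPU frequency scaling adds noise, so a
dedicated benchmarking library is likely to give more trustworthy
numbers.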
