public inbox for gcc-help@gcc.gnu.org
From: David Brown <david@westcontrol.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>,
	Gaelan Steele <gbs3@st-andrews.ac.uk>
Cc: "gcc-help@gcc.gnu.org" <gcc-help@gcc.gnu.org>
Subject: Re: Why does this unrolled function write to the stack?
Date: Wed, 8 Feb 2023 16:32:32 +0100
Message-ID: <69488f61-fac0-0c1d-1034-d8d84779f8e2@westcontrol.com>
In-Reply-To: <CAH6eHdR5GcWckSWrSrgTbL=T3CFTv5NiciuVA=6j6As6FJiyDg@mail.gmail.com>

On 08/02/2023 14:53, Jonathan Wakely via Gcc-help wrote:
> On Wed, 8 Feb 2023 at 13:49, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>
>> On Wed, 8 Feb 2023 at 13:31, Gaelan Steele via Gcc-help
>> <gcc-help@gcc.gnu.org> wrote:
>>>
>>> Hi all,
>>>
>>> In a computer architecture class, we happened across a strange compilation choice by GCC that neither I nor my professor can make much sense of. The source is as follows:
>>>
>>> void foo(int *a, const int *__restrict b, const int *__restrict c)
>>> {
>>>    for (int i = 0; i < 16; i++) {
>>>      a[i] = b[i] + c[i];
>>>    }
>>> }
>>>
>>> I won't reproduce the full compiled output here, as it's rather long, but when compiled with -O3 -mno-avx -mno-sse, GCC 12.2 for x86-64 (via Compiler Explorer: https://godbolt.org/z/o9e4o7cj4) produces an unrolled loop that appears to write each sum into an array on the stack before copying it into the provided pointer a. This seems hugely inefficient - it's doing quite a few memory accesses - and I can't see why it would be necessary.
>>
>> I don't think it's *necessary*. If you use -Os or -O1 or -O2 you get a
>> loop. So it's just an optimization choice at -O3 presumably based on
>> cost estimates that say that fully unrolling the loop will make the
>> code faster than looping.
>>

There's nothing wrong with the loop unrolling.  It's the use of space on 
the stack that's the problem.
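To make the distinction concrete, here is a minimal sketch of what full unrolling by itself implies: sixteen independent add-and-store steps going straight into a, with no intermediate buffer. foo is the function from the question; foo_unrolled is a hypothetical name used only for illustration. It is not GCC's output, just a hand-written equivalent showing that unrolling does not require a temporary array.

```c
/* Original loop, as posted in the question. */
void foo(int *a, const int *__restrict b, const int *__restrict c)
{
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] + c[i];
    }
}

/* Hypothetical hand-written full unroll: every sum is stored directly
   into a[], so no temporary array is needed. */
void foo_unrolled(int *a, const int *__restrict b, const int *__restrict c)
{
    a[0]  = b[0]  + c[0];
    a[1]  = b[1]  + c[1];
    a[2]  = b[2]  + c[2];
    a[3]  = b[3]  + c[3];
    a[4]  = b[4]  + c[4];
    a[5]  = b[5]  + c[5];
    a[6]  = b[6]  + c[6];
    a[7]  = b[7]  + c[7];
    a[8]  = b[8]  + c[8];
    a[9]  = b[9]  + c[9];
    a[10] = b[10] + c[10];
    a[11] = b[11] + c[11];
    a[12] = b[12] + c[12];
    a[13] = b[13] + c[13];
    a[14] = b[14] + c[14];
    a[15] = b[15] + c[15];
}
```

Both functions compute the same result; the point is only that an unrolled version can store each sum directly.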

> So it does look like GCC is making poor choices here.
> 

It seems to be a regression between gcc 10 and gcc 11 (discovered by 
changing the compiler on godbolt.org).  From gcc 11 onwards, the 
compiler seems to use the stack to combine two 4-byte elements at a 
time into a single 8-byte element.  The effect is easy to see if you 
reduce the loop count to 2.
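The reduced case can be written down as below. foo2 and foo2_stack_pair are hypothetical names. The second function is only a C-level approximation of the stack round-trip described above (assuming a 4-byte int, as on x86-64), not a reconstruction of GCC's internals, and an optimizing compiler will generally fold its memcpy calls away. Compiling foo2 with the flags from the original question (-O3 -mno-avx -mno-sse) under gcc 10 and gcc 11 on godbolt.org is one way to compare the two versions.

```c
#include <stdint.h>
#include <string.h>

/* Reduced test case: the function from the question with the trip
   count cut to 2 (foo2 is a hypothetical name). */
void foo2(int *a, const int *__restrict b, const int *__restrict c)
{
    for (int i = 0; i < 2; i++) {
        a[i] = b[i] + c[i];
    }
}

/* C-level approximation of the pattern seen in the gcc 11+ assembly:
   the two 4-byte sums go into a temporary buffer, are read back as one
   8-byte value, and that value is stored to a[] in a single write.
   Assumes 4-byte int, as on x86-64. */
_Static_assert(2 * sizeof(int) == sizeof(uint64_t), "assumes 4-byte int");

void foo2_stack_pair(int *a, const int *__restrict b, const int *__restrict c)
{
    int tmp[2];                        /* temporary buffer (on the stack) */
    tmp[0] = b[0] + c[0];
    tmp[1] = b[1] + c[1];

    uint64_t pair;
    memcpy(&pair, tmp, sizeof pair);   /* reload both sums as one 8-byte value */
    memcpy(a, &pair, sizeof pair);     /* one 8-byte store covering a[0] and a[1] */
}
```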

(I've no idea what causes the effect, or how to fix it - but knowing it 
is a regression might make it easier for you to find.)




Thread overview: 5+ messages
2023-02-08 13:29 Gaelan Steele
2023-02-08 13:49 ` Jonathan Wakely
2023-02-08 13:53   ` Jonathan Wakely
2023-02-08 15:32     ` David Brown [this message]
2023-02-08 18:52       ` Gaelan Steele
