public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets
@ 2023-07-18 17:11 javier.martinez.bugzilla at gmail dot com
  2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: javier.martinez.bugzilla at gmail dot com @ 2023-07-18 17:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

            Bug ID: 110724
           Summary: Unnecessary alignment on branch to unconditional
                    branch targets
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/f7qMxxfMj

void duff(int * __restrict to, const int * __restrict from, const int count)
{
    int n = (count+7) / 8;
    switch(count%8)
    {
       case 0: do { *to++ = *from++;
       case 7:      *to++ = *from++;
       case 6:      *to++ = *from++;
       case 5:      *to++ = *from++;
       case 4:      *to++ = *from++;
       case 3:      *to++ = *from++;
       case 2:      *to++ = *from++;
       [[likely]] case 1:      *to++ = *from++;
        } while (--n>0);
    }
}

Trunk with O3:
        jle     .L1
        [...]
        lea     rax, [rax+4]
        jmp     .L5            # <-- no fall-through to ret
        .p2align 4,,7          # <-- unnecessary alignment
        .p2align 3
.L1:
        ret


I believe this 16-byte alignment is done to put the branch target at the
beginning of a front-end instruction fetch block. That however seems
unnecessary when the branch target is itself an unconditional branch, as the
instructions to follow will not retire.

In this example the degrade is code size / instruction caching only, as there
is no possible fall-through to .L1 that would cause nop's to be consumed.
Changing the C++ attribute to [[unlikely]] introduces fall-through, and GCC
seems to remove the padding, which is great.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
@ 2023-07-18 17:38 ` pinskia at gcc dot gnu.org
  2023-07-18 17:48 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 17:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this is by design.

Adding -fno-align-jumps makes the alignment go away.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
  2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
@ 2023-07-18 17:48 ` pinskia at gcc dot gnu.org
  2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 17:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2023-07-18
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The generic tuning is:
  "16:11:8",                            /* Loop alignment.  */
  "16:11:8",                            /* Jump alignment.  */
  "0:0:8",                              /* Label alignment.  */

The the operands are described as:
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-falign-functions
```
-falign-functions=n:m:n2
...
Examples: -falign-functions=32 aligns functions to the next 32-byte boundary,
-falign-functions=24 aligns to the next 32-byte boundary only if this can be
done by skipping 23 bytes or less, -falign-functions=32:7 aligns to the next
32-byte boundary only if this can be done by skipping 6 bytes or less.

The second pair of n2:m2 values allows you to specify a secondary alignment:
-falign-functions=64:7:32:3 aligns to the next 64-byte boundary if this can be
done by skipping 6 bytes or less, otherwise aligns to the next 32-byte boundary
if this can be done by skipping 2 bytes or less. If m2 is not specified, it
defaults to n2.
```

So align jumps to 16 byte if 11 or less bytes can be used or 8 byte alignment
Which is exactly what this does:
        .p2align 4,,7          # <-- unnecessary alignment
        .p2align 3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
  2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
  2023-07-18 17:48 ` pinskia at gcc dot gnu.org
@ 2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
  2023-07-18 20:00 ` [Bug target/110724] " pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: javier.martinez.bugzilla at gmail dot com @ 2023-07-18 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #3 from Javier Martinez <javier.martinez.bugzilla at gmail dot com> ---
The generic tuning of 16:11:8 looks reasonable to me, I do not argue against
it.



From Anger Fog’s Optimizing subroutines in assembly language:

> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few 
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.



This looks like the reason behind the alignment. That section of the book
goes on to explain the inconvenience (execution of nops) of alignment on labels
reachable by other means than branching - which I presume lead to the :m and
:m2 tuning values, the distinction between -falign-labels and -falign-jumps,
and the reason padding is removed when my label is reachable by fall-through
with [[unlikely]].



All this is fine. 

My thesis is that this alignment strategy is always unnecessary in one specific
circumstance - when the branch target is itself an unconditional branch of size
1, as in:



.L1:

      ret 



Because the ret instruction will never cross a block boundary, and the
instructions following the ret must not execute, so there is no front-end stall
to avoid.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
                   ` (2 preceding siblings ...)
  2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
@ 2023-07-18 20:00 ` pinskia at gcc dot gnu.org
  2023-07-18 20:02 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 20:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I noticed clang/LLVM does not do any alignment here but I doubt that is correct
thing to do.

Anyways I don't there is any bug here in the middle-end because the alignment
is what the backend requests. Now the question is the alignment needed or not,
that is for some x86 person to do benchmarking really. And if this needs more
than the simple heurstics than what is currently supported in the middle-end
well that is still up to the Intel/AMD person to benchmark.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
                   ` (3 preceding siblings ...)
  2023-07-18 20:00 ` [Bug target/110724] " pinskia at gcc dot gnu.org
@ 2023-07-18 20:02 ` pinskia at gcc dot gnu.org
  2023-07-19  6:46 ` rguenth at gcc dot gnu.org
  2023-07-19  8:26 ` javier.martinez.bugzilla at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 20:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Plus if this is just the return case how important is that because maybe we
should be inlining this kind of function.  Plus this is a memcpy, why not just
use the expansion of __builtin_memcpy here (which is tuned for each target
better).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
                   ` (4 preceding siblings ...)
  2023-07-18 20:02 ` pinskia at gcc dot gnu.org
@ 2023-07-19  6:46 ` rguenth at gcc dot gnu.org
  2023-07-19  8:26 ` javier.martinez.bugzilla at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-19  6:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
We perform jump target alignment to optimize frontend (instruction decoding).
The reporter is correct that if the jump target only contains an unconditional
control transfer elsewhere such alignment is moot unless this jump itself
crosses an instruction fetch boundary.

  a0:   89 50 fc                mov    %edx,-0x4(%rax)
  a3:   8b 11                   mov    (%rcx),%edx
  a5:   48 83 c1 04             add    $0x4,%rcx
  a9:   89 10                   mov    %edx,(%rax)
  ab:   48 83 c0 04             add    $0x4,%rax
  af:   eb 94                   jmp    45 <duff+0x45>
  b1:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  b8:   c3                      ret
  b9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  c0:   48 89 cf                mov    %rcx,%rdi

so that doesn't seem to be the case here (in fact since 'ret' is a single
byte it never crosses a fetch boundary but other uncond jumps might).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
  2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
                   ` (5 preceding siblings ...)
  2023-07-19  6:46 ` rguenth at gcc dot gnu.org
@ 2023-07-19  8:26 ` javier.martinez.bugzilla at gmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: javier.martinez.bugzilla at gmail dot com @ 2023-07-19  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #7 from Javier Martinez <javier.martinez.bugzilla at gmail dot com> ---
Another case where it might be interesting to remove padding (or reduce the :m
threshold) is when the path is known to be cold. I can see Trunk padding labels
inside [clone .cold], and with attribute((cold)) &&  __builtin_expect hints.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-07-19  8:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
2023-07-18 17:48 ` pinskia at gcc dot gnu.org
2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
2023-07-18 20:00 ` [Bug target/110724] " pinskia at gcc dot gnu.org
2023-07-18 20:02 ` pinskia at gcc dot gnu.org
2023-07-19  6:46 ` rguenth at gcc dot gnu.org
2023-07-19  8:26 ` javier.martinez.bugzilla at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).