* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
@ 2023-07-18 17:38 ` pinskia at gcc dot gnu.org
2023-07-18 17:48 ` pinskia at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 17:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this is by design.
Adding -fno-align-jumps makes the alignment go away.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
@ 2023-07-18 17:48 ` pinskia at gcc dot gnu.org
2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 17:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2023-07-18
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The generic tuning is:
"16:11:8", /* Loop alignment. */
"16:11:8", /* Jump alignment. */
"0:0:8", /* Label alignment. */
The the operands are described as:
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-falign-functions
```
-falign-functions=n:m:n2
...
Examples: -falign-functions=32 aligns functions to the next 32-byte boundary,
-falign-functions=24 aligns to the next 32-byte boundary only if this can be
done by skipping 23 bytes or less, -falign-functions=32:7 aligns to the next
32-byte boundary only if this can be done by skipping 6 bytes or less.
The second pair of n2:m2 values allows you to specify a secondary alignment:
-falign-functions=64:7:32:3 aligns to the next 64-byte boundary if this can be
done by skipping 6 bytes or less, otherwise aligns to the next 32-byte boundary
if this can be done by skipping 2 bytes or less. If m2 is not specified, it
defaults to n2.
```
So align jumps to 16 byte if 11 or less bytes can be used or 8 byte alignment
Which is exactly what this does:
.p2align 4,,7 # <-- unnecessary alignment
.p2align 3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
2023-07-18 17:38 ` [Bug middle-end/110724] " pinskia at gcc dot gnu.org
2023-07-18 17:48 ` pinskia at gcc dot gnu.org
@ 2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
2023-07-18 20:00 ` [Bug target/110724] " pinskia at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: javier.martinez.bugzilla at gmail dot com @ 2023-07-18 19:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
--- Comment #3 from Javier Martinez <javier.martinez.bugzilla at gmail dot com> ---
The generic tuning of 16:11:8 looks reasonable to me, I do not argue against
it.
From Anger Fog’s Optimizing subroutines in assembly language:
> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.
This looks like the reason behind the alignment. That section of the book
goes on to explain the inconvenience (execution of nops) of alignment on labels
reachable by other means than branching - which I presume lead to the :m and
:m2 tuning values, the distinction between -falign-labels and -falign-jumps,
and the reason padding is removed when my label is reachable by fall-through
with [[unlikely]].
All this is fine.
My thesis is that this alignment strategy is always unnecessary in one specific
circumstance - when the branch target is itself an unconditional branch of size
1, as in:
.L1:
ret
Because the ret instruction will never cross a block boundary, and the
instructions following the ret must not execute, so there is no front-end stall
to avoid.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
` (2 preceding siblings ...)
2023-07-18 19:07 ` javier.martinez.bugzilla at gmail dot com
@ 2023-07-18 20:00 ` pinskia at gcc dot gnu.org
2023-07-18 20:02 ` pinskia at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 20:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |target
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I noticed clang/LLVM does not do any alignment here but I doubt that is correct
thing to do.
Anyways I don't there is any bug here in the middle-end because the alignment
is what the backend requests. Now the question is the alignment needed or not,
that is for some x86 person to do benchmarking really. And if this needs more
than the simple heurstics than what is currently supported in the middle-end
well that is still up to the Intel/AMD person to benchmark.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
` (3 preceding siblings ...)
2023-07-18 20:00 ` [Bug target/110724] " pinskia at gcc dot gnu.org
@ 2023-07-18 20:02 ` pinskia at gcc dot gnu.org
2023-07-19 6:46 ` rguenth at gcc dot gnu.org
2023-07-19 8:26 ` javier.martinez.bugzilla at gmail dot com
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-18 20:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Plus if this is just the return case how important is that because maybe we
should be inlining this kind of function. Plus this is a memcpy, why not just
use the expansion of __builtin_memcpy here (which is tuned for each target
better).
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
` (4 preceding siblings ...)
2023-07-18 20:02 ` pinskia at gcc dot gnu.org
@ 2023-07-19 6:46 ` rguenth at gcc dot gnu.org
2023-07-19 8:26 ` javier.martinez.bugzilla at gmail dot com
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-19 6:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
CC| |hubicka at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
We perform jump target alignment to optimize frontend (instruction decoding).
The reporter is correct that if the jump target only contains an unconditional
control transfer elsewhere such alignment is moot unless this jump itself
crosses an instruction fetch boundary.
a0: 89 50 fc mov %edx,-0x4(%rax)
a3: 8b 11 mov (%rcx),%edx
a5: 48 83 c1 04 add $0x4,%rcx
a9: 89 10 mov %edx,(%rax)
ab: 48 83 c0 04 add $0x4,%rax
af: eb 94 jmp 45 <duff+0x45>
b1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
b8: c3 ret
b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
c0: 48 89 cf mov %rcx,%rdi
so that doesn't seem to be the case here (in fact since 'ret' is a single
byte it never crosses a fetch boundary but other uncond jumps might).
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
2023-07-18 17:11 [Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets javier.martinez.bugzilla at gmail dot com
` (5 preceding siblings ...)
2023-07-19 6:46 ` rguenth at gcc dot gnu.org
@ 2023-07-19 8:26 ` javier.martinez.bugzilla at gmail dot com
6 siblings, 0 replies; 8+ messages in thread
From: javier.martinez.bugzilla at gmail dot com @ 2023-07-19 8:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724
--- Comment #7 from Javier Martinez <javier.martinez.bugzilla at gmail dot com> ---
Another case where it might be interesting to remove padding (or reduce the :m
threshold) is when the path is known to be cold. I can see Trunk padding labels
inside [clone .cold], and with attribute((cold)) && __builtin_expect hints.
^ permalink raw reply [flat|nested] 8+ messages in thread