public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
@ 2020-09-11 10:52 ` koen.zandberg at inria dot fr
  2022-09-01  7:24 ` kito at gcc dot gnu.org
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: koen.zandberg at inria dot fr @ 2020-09-11 10:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Koen Zandberg <koen.zandberg at inria dot fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |koen.zandberg at inria dot fr

--- Comment #6 from Koen Zandberg <koen.zandberg at inria dot fr> ---
We're hitting this bug on the an ARMv7 (cortex-m7) platform. Due to cache
alignment we see a significant increase in performance when aligning individual
functions on cache line boundaries, but we still prefer to compile with size
optimizations enabled due to the limited resources available on the platform.

Is there a way to work around this issue and enable -falign-functions together
with -Os?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
  2020-09-11 10:52 ` [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line koen.zandberg at inria dot fr
@ 2022-09-01  7:24 ` kito at gcc dot gnu.org
  2023-01-12 11:34 ` mark at kernel dot org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: kito at gcc dot gnu.org @ 2022-09-01  7:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #7 from Kito Cheng <kito at gcc dot gnu.org> ---
We are hitting this issue on RISC-V, and got some complain from linux kernel
developers, but in different form as the original report, we found cold
function or any function is marked as cold by `-fguess-branch-probability` are
all not honor to the -falign-functions=N setting, that become problem on some
linux kernel feature since they want to control the minimal alignment to make
sure they can atomically update the instruction which require align to 4 byte.

However current GCC behavior can't guarantee that even -falign-functions=4 is
given, there is 3 option in my mind:

1. Fix -falign-functions=N, let it work as expect on -Os and all cold functions
2. Force align to 4 byte if -fpatchable-function-entry is given, that's should
be doable by adjust RISC-V's FUNCTION_BOUNDARY
3. Adjust RISC-V's FUNCTION_BOUNDARY to let it honor to -falign-functions=N
4. Adding a -malign-functions=N...Okay, I know that suck idea, x86 already
deprecated that.

But I think ideally this should fixed by 1 option if possible.

Testcase from RISC-V kernel guy:
```
/* { dg-do compile } */
/* { dg-options "-march=rv64gc -mabi=lp64d -O1 -falign-functions=128" } */
/* { dg-final { scan-assembler-times ".align 7" 2 } } */

// Using 128 byte align rather than 4 byte align since it easier to observe.

__attribute__((__cold__)) void a() {} // This function isn't align to 128 byte
void b() {} // This function align to 128 byte.
```

Proposed fix:
```
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 49d5cda122f..6f8ed85fea9 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1907,8 +1907,7 @@ assemble_start_function (tree decl, const char *fnname)
      Note that we still need to align to DECL_ALIGN, as above,
      because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
   if (! DECL_USER_ALIGN (decl)
-      && align_functions.levels[0].log > align
-      && optimize_function_for_speed_p (cfun))
+      && align_functions.levels[0].log > align)
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       int align_log = align_functions.levels[0].log;

```

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
  2020-09-11 10:52 ` [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line koen.zandberg at inria dot fr
  2022-09-01  7:24 ` kito at gcc dot gnu.org
@ 2023-01-12 11:34 ` mark at kernel dot org
  2023-01-12 16:11 ` egallager at gcc dot gnu.org
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: mark at kernel dot org @ 2023-01-12 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Branko Drevensek <branko.drevensek at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |branko.drevensek at gmail dot com

Mark Rutland <mark at kernel dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at kernel dot org

--- Comment #8 from Branko Drevensek <branko.drevensek at gmail dot com> ---
Size optimization turning function alignment off assumes function alignment is
an optimization only, while for some architectures it might be requirement for
certain functions, such as interrupt handlers on risc-v.
This makes it impossible to have those functions aligned using this
switch/attribute regardless of optimization level selected as -Os will cause
alignment setting to be ignored.

--- Comment #9 from Mark Rutland <mark at kernel dot org> ---
This appears to be one case of several where GCC drops the alignment specified
by `-falign-functions=N`. I'm commenting with the other cases here rather than
creating new tickets on the assumption that's preferable.

Dropping the alignment specified by `-falign-functions=N` is a functional issue
for the arm64 Linux kernel port affecting our 'ftrace' tracing mechanism. I see
this with GCC 12.1.0 (and have no tested other versions), and LLVM seems to
always respect the alignment specified by `-falign-functions=N`

The arm64 Linux kernel port needs to use `-falign-functions=8` along with
`-fpatchable-function-entry=N,2` to place a naturally-aligned 8-byte literal at
the start of functions. There's some detail of that at:

  https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/

As noted earlier in this ticket, GCC does not seem to respect
`-falign-functions=N` when using `-Os`. For my use-case we cvan work around the
issue by not passing `-Os`, and I have one patch to do so, but this is not
ideal:

  https://lore.kernel.org/lkml/20230109135828.879136-3-mark.rutland@arm.com/


In addition, GCC seems to drop alignment for cold functions, whether those are
marked as cold explicitly or when determined by some interprocedural analysis.
I've noted this on LKML at:

  https://lore.kernel.org/lkml/Y77%2FqVgvaJidFpYt@FVFF77S0Q05N/

... the below summary is a copy-paste of that:

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c                 
| #define __cold \
|         __attribute__((cold))
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| __cold
| void cold_func_a(void) { }
| 
| __cold
| void cold_func_b(void) { }
| 
| __cold
| void cold_func_c(void) { }
| 
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -d test-cold.o                     
| 
| test-cold.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
| 
| 0000000000000004 <static_cold_func_b>:
|    4:   d65f03c0        ret
| 
| 0000000000000008 <static_cold_func_c>:
|    8:   d65f03c0        ret
| 
| 000000000000000c <cold_func_a>:
|    c:   d65f03c0        ret
| 
| 0000000000000010 <cold_func_b>:
|   10:   d65f03c0        ret
| 
| 0000000000000014 <cold_func_c>:
|   14:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -h test-cold.o
| 
| test-cold.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off 
Algn
|   0 .text         00000018  0000000000000000  0000000000000000  00000040 
2**2
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000058 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  00000070 
2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  00000070 
2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000083 
2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000090  0000000000000000  0000000000000000  00000088 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

In simple cases, alignment *can* be restored if an explicit function attribute
is used. For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c         
| #define __aligned(n) \
|         __attribute__((aligned(n)))
| 
| #define __cold \
|         __attribute__((cold)) __aligned(16)
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| __cold
| void cold_func_a(void) { }
| 
| __cold
| void cold_func_b(void) { }
| 
| __cold
| void cold_func_c(void) { }
| 
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -d test-aligned-cold.o                     
| 
| test-aligned-cold.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
|    4:   d503201f        nop
|    8:   d503201f        nop
|    c:   d503201f        nop
| 
| 0000000000000010 <static_cold_func_b>:
|   10:   d65f03c0        ret
|   14:   d503201f        nop
|   18:   d503201f        nop
|   1c:   d503201f        nop
| 
| 0000000000000020 <static_cold_func_c>:
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
| 
| 0000000000000030 <cold_func_a>:
|   30:   d65f03c0        ret
|   34:   d503201f        nop
|   38:   d503201f        nop
|   3c:   d503201f        nop
| 
| 0000000000000040 <cold_func_b>:
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
| 
| 0000000000000050 <cold_func_c>:
|   50:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -h test-aligned-cold.o
| 
| test-aligned-cold.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off 
Algn
|   0 .text         00000054  0000000000000000  0000000000000000  00000040 
2**4
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000098 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  000000b0 
2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  000000b0 
2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  000000c3 
2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000090  0000000000000000  0000000000000000  000000c8 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA


Unfortunately it appears that some interprocedural analysis determines that if
a callee is only called/referenced from cold callers, the callee is marked as
cold, and the alignment it would have got from the command line option is
dropped. If it's given an explicit alignment attribute, the alignment is
retained.

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c  
| #define noinline \
|         __attribute__((noinline))
| 
| #define __aligned(n) \
|         __attribute__((aligned(n)))
| 
| #define __cold \
|         __attribute__((cold)) __aligned(16)
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| static noinline void callee_a(void)
| {
|         asm volatile("// callee_a\n" ::: "memory");
| }
| 
| static noinline void callee_b(void)
| {
|         asm volatile("// callee_b\n" ::: "memory");
| }
| 
| static noinline void callee_c(void)
| {
|         asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
| 
| __cold
| void cold_func_b(void) { callee_b(); }
| 
| __cold
| void cold_func_c(void) { callee_c(); }
| 
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -d test-aligned-cold-caller.o                     
| 
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <callee_a>:
|    0:   d65f03c0        ret
| 
| 0000000000000004 <callee_b>:
|    4:   d65f03c0        ret
| 
| 0000000000000008 <callee_c>:
|    8:   d65f03c0        ret
|    c:   d503201f        nop
| 
| 0000000000000010 <static_cold_func_a>:
|   10:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   14:   910003fd        mov     x29, sp
|   18:   97fffffa        bl      0 <callee_a>
|   1c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
| 
| 0000000000000030 <static_cold_func_b>:
|   30:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   34:   910003fd        mov     x29, sp
|   38:   97fffff3        bl      4 <callee_b>
|   3c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
| 
| 0000000000000050 <static_cold_func_c>:
|   50:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   54:   910003fd        mov     x29, sp
|   58:   97ffffec        bl      8 <callee_c>
|   5c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   60:   d65f03c0        ret
|   64:   d503201f        nop
|   68:   d503201f        nop
|   6c:   d503201f        nop
| 
| 0000000000000070 <cold_func_a>:
|   70:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   74:   910003fd        mov     x29, sp
|   78:   97ffffe2        bl      0 <callee_a>
|   7c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   80:   d65f03c0        ret
|   84:   d503201f        nop
|   88:   d503201f        nop
|   8c:   d503201f        nop
| 
| 0000000000000090 <cold_func_b>:
|   90:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   94:   910003fd        mov     x29, sp
|   98:   97ffffdb        bl      4 <callee_b>
|   9c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   a0:   d65f03c0        ret
|   a4:   d503201f        nop
|   a8:   d503201f        nop
|   ac:   d503201f        nop
| 
| 00000000000000b0 <cold_func_c>:
|   b0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   b4:   910003fd        mov     x29, sp
|   b8:   97ffffd4        bl      8 <callee_c>
|   bc:   a8c17bfd        ldp     x29, x30, [sp], #16
|   c0:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -h test-aligned-cold-caller.o
| 
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off 
Algn
|   0 .text         000000c4  0000000000000000  0000000000000000  00000040 
2**4
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000108 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  00000120 
2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  00000120 
2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000133 
2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000110  0000000000000000  0000000000000000  00000138 
2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2023-01-12 11:34 ` mark at kernel dot org
@ 2023-01-12 16:11 ` egallager at gcc dot gnu.org
  2023-01-13 12:56 ` mark at kernel dot org
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-01-12 16:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Eric Gallager <egallager at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egallager at gcc dot gnu.org

--- Comment #10 from Eric Gallager <egallager at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> I disagree, with -Os you want smallest size.

There is -Oz now, so now *that's* the one for the smallest size

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2023-01-12 16:11 ` egallager at gcc dot gnu.org
@ 2023-01-13 12:56 ` mark at kernel dot org
  2023-01-17 16:26 ` ktkachov at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: mark at kernel dot org @ 2023-01-13 12:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #11 from Mark Rutland <mark at kernel dot org> ---
Further, at `-O1` and above GCC seems to silently drop the alignment of any
implementation of abort(), apparently implicitly marking it as cold.

I assume that may be true for other special functions.

For example:

[mark@lakrids:/mnt/data/tests/gcc-alignment]% cat abort.c 
void abort(void)
{
        __builtin_trap();
}
[mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc
-falign-functions=16 -c abort.c -O2
[mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -d abort.o                     

abort.o:     file format elf64-littleaarch64


Disassembly of section .text.unlikely:

0000000000000000 <abort>:
   0:   d4207d00        brk     #0x3e8
[mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0
aarch64-linux-objdump -h abort.o

abort.o:     file format elf64-littleaarch64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000040  2**0
                  ALLOC
  3 .text.unlikely 00000004  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .comment      00000013  0000000000000000  0000000000000000  00000044  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000057 
2**0
                  CONTENTS, READONLY
  6 .eh_frame     00000028  0000000000000000  0000000000000000  00000058  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2023-01-13 12:56 ` mark at kernel dot org
@ 2023-01-17 16:26 ` ktkachov at gcc dot gnu.org
  2023-01-17 16:37 ` kito at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2023-01-17 16:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |ktkachov at gcc dot gnu.org
   Last reconfirmed|                            |2023-01-17

--- Comment #12 from ktkachov at gcc dot gnu.org ---
(In reply to Kito Cheng from comment #7)
> We are hitting this issue on RISC-V, and got some complain from linux kernel
> developers, but in different form as the original report, we found cold
> function or any function is marked as cold by `-fguess-branch-probability`
> are all not honor to the -falign-functions=N setting, that become problem on
> some linux kernel feature since they want to control the minimal alignment
> to make sure they can atomically update the instruction which require align
> to 4 byte.
> 
> However current GCC behavior can't guarantee that even -falign-functions=4
> is given, there is 3 option in my mind:
> 
> 1. Fix -falign-functions=N, let it work as expect on -Os and all cold
> functions
> 2. Force align to 4 byte if -fpatchable-function-entry is given, that's
> should be doable by adjust RISC-V's FUNCTION_BOUNDARY
> 3. Adjust RISC-V's FUNCTION_BOUNDARY to let it honor to -falign-functions=N
> 4. Adding a -malign-functions=N...Okay, I know that suck idea, x86 already
> deprecated that.
> 
> But I think ideally this should fixed by 1 option if possible.
> 
> Testcase from RISC-V kernel guy:
> ```
> /* { dg-do compile } */
> /* { dg-options "-march=rv64gc -mabi=lp64d -O1 -falign-functions=128" } */
> /* { dg-final { scan-assembler-times ".align 7" 2 } } */
> 
> // Using 128 byte align rather than 4 byte align since it easier to observe.
> 
> __attribute__((__cold__)) void a() {} // This function isn't align to 128
> byte
> void b() {} // This function align to 128 byte.
> ```
> 
> Proposed fix:
> ```
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 49d5cda122f..6f8ed85fea9 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -1907,8 +1907,7 @@ assemble_start_function (tree decl, const char *fnname)
>       Note that we still need to align to DECL_ALIGN, as above,
>       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all. 
> */
>    if (! DECL_USER_ALIGN (decl)
> -      && align_functions.levels[0].log > align
> -      && optimize_function_for_speed_p (cfun))
> +      && align_functions.levels[0].log > align)
>      {
>  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
>        int align_log = align_functions.levels[0].log;
> 
> ```

I think this patch makes sense given the extra information you and Mark have
provided. Would you mind testing it and posting it to gcc-patches for review
please?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2023-01-17 16:26 ` ktkachov at gcc dot gnu.org
@ 2023-01-17 16:37 ` kito at gcc dot gnu.org
  2023-01-23 14:15 ` mark at kernel dot org
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: kito at gcc dot gnu.org @ 2023-01-17 16:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #13 from Kito Cheng <kito at gcc dot gnu.org> ---
Patch posted before, but seems like not everybody agree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603049.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2023-01-17 16:37 ` kito at gcc dot gnu.org
@ 2023-01-23 14:15 ` mark at kernel dot org
  2023-09-12 11:47 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: mark at kernel dot org @ 2023-01-23 14:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #14 from Mark Rutland <mark at kernel dot org> ---
> Patch posted before, but seems like not everybody agree:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603049.html

FWIW, from the arm64 kernel side all we need is a reliable mechanism to align
functions, so the `-malign-all-function` or `-falign-all-functions` options Jan
suggests there would be good enough going forwards. It's just unfortunate that
we don't currently have a reliable mechanism (and have to workaround that by
avoiding the use of `-Os` and the `__cold__` attribute), and I guess the new
option would not be back-ported to stable GCC releases.

I don't think we want this hidden behind a `-flive-patching` option, since may
also want the alignment when not live-patching (e.g. for our
DEBUG_FORCE_FUNCTION_ALIGN_64B option), and the arch-specific details vary
quite drastically (e.g. arm64 uses `-fpatchable-function-entry=M,N`, x86 uses
`-mfentry` for patching and `-fpatchable-function-entry` for some CFI-related
logic), and those details are liable to change over time. Being able to set the
alignment independent from those details is likely to be more future-proof.

If it's intended that '-falign-functions=N' is overridden in some cases, it
would be nice to have that more clearly documented.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2023-01-23 14:15 ` mark at kernel dot org
@ 2023-09-12 11:47 ` rguenth at gcc dot gnu.org
  2023-09-12 12:35 ` matz at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-12 11:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another option would be to add another number to -falign-functions,

@itemx -falign-functions=@var{n}:@var{m}:@var{n2}:@var{m2}:@var{n3}

where 'n3' applies unconditionally and defaults to a target specific
value.  And document that earlier values might not be honored for cold
functions  (or at -Os).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2023-09-12 11:47 ` rguenth at gcc dot gnu.org
@ 2023-09-12 12:35 ` matz at gcc dot gnu.org
  2023-11-22 13:20 ` hubicka at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: matz at gcc dot gnu.org @ 2023-09-12 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Michael Matz <matz at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #16 from Michael Matz <matz at gcc dot gnu.org> ---
I think -falign-functions=N should simply override any consideration of -Os or
coldness or suchlike.  Clearly the explicit use of such option is by some
intention of the compiler user, and as the compiler can't know _what_ exactly
is intended it needs to assume that the alignment needs to hold for all
functions.  Richis proposal would of course work, but seems a rather awkward
workaround for behaviour that actually is surprising.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2023-09-12 12:35 ` matz at gcc dot gnu.org
@ 2023-11-22 13:20 ` hubicka at gcc dot gnu.org
  2023-12-06  9:38 ` hubicka at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-22 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #17 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
-falign-functions/-falign-jumps/-falign-labels/-falign-loops are originally are
intended for performance tuning.  Starting function entry close to the end of
page of code cache may lead to wasted code cache space as well as higher
overhead calling the function when CPU fetches page which contains just little
useful information.

As such I would like to keep them affecting only hot code (we should update
documentation for that).  Internally we have FUNCTION_BOUNDARY which specifies
minimal alignment needed by ABI, which is set to 8bits for i386.  My
understanding is that -fpatchable-function-entry requires the alignment to be
64bits in order to make it possible to atomically change the instruction.

So perhaps we want to make FUNCTION_BOUNDARY to be 64 for functions where we
output the patchable entry?
I am also OK with extending the flag syntax or adding -fmin-function-alignment
to specify optional user-defined minimum (increase FUNCTION_BOUNDARY) if that
seems useful, but I think the first one is most consistent way to go with live
patching?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2023-11-22 13:20 ` hubicka at gcc dot gnu.org
@ 2023-12-06  9:38 ` hubicka at gcc dot gnu.org
  2023-12-06 13:38 ` matz at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-12-06  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |hubicka at gcc dot gnu.org

--- Comment #18 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Reading all the discussion again, I am leaning towards -falign-all-functions +
documentation update explaining that -falign-functions/-falign-loops are
optimizations and ignored for -Os.

I do use -falign-functions/-falign-loops when tuning for new generations of
CPUs and I definitely want to have way to specify alignment that is ignored for
cold functions (as perforance optimization) and we have this behavior since
profile code was introduced in 2002.

As an optimization, we also want to have hot functions aligned more than 8 byte
boundary needed for patching.

I will prepare patch for this and send it for disucssion.  Pehraps we want
-flive-patching to also imply FUNCTION_BOUNDARY increase on x86-64? Or is live
patching useful if function entries are not aligned?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2023-12-06  9:38 ` hubicka at gcc dot gnu.org
@ 2023-12-06 13:38 ` matz at gcc dot gnu.org
  2023-12-06 17:20 ` hubicka at ucw dot cz
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: matz at gcc dot gnu.org @ 2023-12-06 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #19 from Michael Matz <matz at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #18)
> Reading all the discussion again, I am leaning towards -falign-all-functions
> + documentation update explaining that -falign-functions/-falign-loops are
> optimizations and ignored for -Os.
> 
> I do use -falign-functions/-falign-loops when tuning for new generations of
> CPUs and I definitely want to have way to specify alignment that is ignored
> for cold functions (as perforance optimization) and we have this behavior
> since profile code was introduced in 2002.
> 
> As an optimization, we also want to have hot functions aligned more than 8
> byte boundary needed for patching.
> 
> I will prepare patch for this and send it for disucssion.  Pehraps we want
> -flive-patching to also imply FUNCTION_BOUNDARY increase on x86-64? Or is
> live patching useful if function entries are not aligned?

Live patching (user-space) doesn't depend on any particular alignment of
functions, on x86-64 at least.  (The plan for other architectures wouldn't need
any specific alignment either).  Note that the above complaints about missing
alignment is for kernel (!) ftrace/livepatching on arm64 (!), not on x86_64.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2023-12-06 13:38 ` matz at gcc dot gnu.org
@ 2023-12-06 17:20 ` hubicka at ucw dot cz
  2023-12-06 18:02 ` matz at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: hubicka at ucw dot cz @ 2023-12-06 17:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #20 from Jan Hubicka <hubicka at ucw dot cz> ---
> 
> Live patching (user-space) doesn't depend on any particular alignment of
> functions, on x86-64 at least.  (The plan for other architectures wouldn't need
> any specific alignment either).  Note that the above complaints about missing
> alignment is for kernel (!) ftrace/livepatching on arm64 (!), not on x86_64.

I had impression that x86-64 also needs forced alignment so the patching
can be always done atomically.  But it is not a big practical difference
if we go with a flag specifying minimal function alignment.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2023-12-06 17:20 ` hubicka at ucw dot cz
@ 2023-12-06 18:02 ` matz at gcc dot gnu.org
  2023-12-07 10:49 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 18+ messages in thread
From: matz at gcc dot gnu.org @ 2023-12-06 18:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #21 from Michael Matz <matz at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #20)
> > 
> > Live patching (user-space) doesn't depend on any particular alignment of
> > functions, on x86-64 at least.  (The plan for other architectures wouldn't need
> > any specific alignment either).  Note that the above complaints about missing
> > alignment is for kernel (!) ftrace/livepatching on arm64 (!), not on x86_64.
> 
> I had impression that x86-64 also needs forced alignment so the patching
> can be always done atomically.  But it is not a big practical difference
> if we go with a flag specifying minimal function alignment.

Not for userspace live patching (it's done while the process is stopped).
kernel live patching may or may not need it.  Point being that alignment
shouldn't be implied by the live patching options as its orthogonal.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2023-12-06 18:02 ` matz at gcc dot gnu.org
@ 2023-12-07 10:49 ` jamborm at gcc dot gnu.org
  2024-01-01 20:22 ` hubicka at gcc dot gnu.org
  2024-01-24 17:13 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-12-07 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #22 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Just to clarify, the case where this causes us problems is (indeed on Aarch64)
with option -fpatchable-function-entry (and NOT necessarily -flive-patching). 
But I agree that a separate orthogonal option for all cases is probably best.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2023-12-07 10:49 ` jamborm at gcc dot gnu.org
@ 2024-01-01 20:22 ` hubicka at gcc dot gnu.org
  2024-01-24 17:13 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: hubicka at gcc dot gnu.org @ 2024-01-01 20:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #23 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 56970
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56970&action=edit
Patch I am testing

Hi,
this adds -falign-all-functions parameter.  It still look like more reasonable
(and backward compatible) thing to do.  I also poked about Richi's suggestion
of extending the syntax of -falign-functions but I think it is less readable.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
       [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2024-01-01 20:22 ` hubicka at gcc dot gnu.org
@ 2024-01-24 17:13 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-24 17:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

--- Comment #24 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326

commit r14-8395-g0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Jan 24 18:13:17 2024 +0100

    Add -fmin-function-alignmnet

    -falign-functions is ignored in cold code, since it is an optimization
intended to
    improve instruction prefetch.  In some case it is necessary to force
alignment for
    all functions, so this patch adds -fmin-function-alignment for this
purpose.

    gcc/ChangeLog:

            PR middle-end/88345
            * common.opt: (flimit-function-alignment): Reorder alphabeticaly
            (fmin-function-alignment): New parameter.
            * doc/invoke.texi: (-fmin-function-alignment): Document.
            (-falign-functions,-falign-loops,-falign-labels): Mention that
            aglinments are ignored in cold code.
            * varasm.cc (assemble_start_function): Handle
min-function-alignment.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-01-24 17:13 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-88345-4@http.gcc.gnu.org/bugzilla/>
2020-09-11 10:52 ` [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line koen.zandberg at inria dot fr
2022-09-01  7:24 ` kito at gcc dot gnu.org
2023-01-12 11:34 ` mark at kernel dot org
2023-01-12 16:11 ` egallager at gcc dot gnu.org
2023-01-13 12:56 ` mark at kernel dot org
2023-01-17 16:26 ` ktkachov at gcc dot gnu.org
2023-01-17 16:37 ` kito at gcc dot gnu.org
2023-01-23 14:15 ` mark at kernel dot org
2023-09-12 11:47 ` rguenth at gcc dot gnu.org
2023-09-12 12:35 ` matz at gcc dot gnu.org
2023-11-22 13:20 ` hubicka at gcc dot gnu.org
2023-12-06  9:38 ` hubicka at gcc dot gnu.org
2023-12-06 13:38 ` matz at gcc dot gnu.org
2023-12-06 17:20 ` hubicka at ucw dot cz
2023-12-06 18:02 ` matz at gcc dot gnu.org
2023-12-07 10:49 ` jamborm at gcc dot gnu.org
2024-01-01 20:22 ` hubicka at gcc dot gnu.org
2024-01-24 17:13 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).