From: "mark at kernel dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
Date: Thu, 12 Jan 2023 11:34:35 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 9.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mark at kernel dot org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

Branko Drevensek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |branko.drevensek at gmail dot com

Mark Rutland changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at kernel dot org

--- Comment #8 from Branko Drevensek ---
Size optimization turning function alignment off assumes function alignment is
an optimization only, while
for some architectures it might be a requirement for certain functions, such as
interrupt handlers on RISC-V.
This makes it impossible to have those functions aligned using this
switch/attribute regardless of the optimization level selected, as -Os will
cause the alignment setting to be ignored.

--- Comment #9 from Mark Rutland ---
This appears to be one case of several where GCC drops the alignment specified
by `-falign-functions=N`. I'm commenting with the other cases here rather than
creating new tickets, on the assumption that's preferable.

Dropping the alignment specified by `-falign-functions=N` is a functional issue
for the arm64 Linux kernel port, affecting our 'ftrace' tracing mechanism. I see
this with GCC 12.1.0 (and have not tested other versions), and LLVM seems to
always respect the alignment specified by `-falign-functions=N`.

The arm64 Linux kernel port needs to use `-falign-functions=8` along with
`-fpatchable-function-entry=N,2` to place a naturally-aligned 8-byte literal at
the start of functions. There's some detail of that at:

  https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@arm.com/

As noted earlier in this ticket, GCC does not seem to respect
`-falign-functions=N` when using `-Os`. For my use-case we can work around the
issue by not passing `-Os`, and I have one patch to do so, but this is not
ideal:

  https://lore.kernel.org/lkml/20230109135828.879136-3-mark.rutland@arm.com/

In addition, GCC seems to drop alignment for cold functions, whether those are
marked as cold explicitly or when determined by some interprocedural analysis.
I've noted this on LKML at:

  https://lore.kernel.org/lkml/Y77%2FqVgvaJidFpYt@FVFF77S0Q05N/

...
the below summary is a copy-paste of that:

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c
| #define __cold \
|         __attribute__((cold))
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 :
|    0:   d65f03c0        ret
|
| 0000000000000004 :
|    4:   d65f03c0        ret
|
| 0000000000000008 :
|    8:   d65f03c0        ret
|
| 000000000000000c :
|    c:   d65f03c0        ret
|
| 0000000000000010 :
|   10:   d65f03c0        ret
|
| 0000000000000014 :
|   14:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           00000018  0000000000000000  0000000000000000  00000040  2**2
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000058  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  00000070  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  00000070  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000083  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000090  0000000000000000  0000000000000000  00000088  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

In simple cases, alignment *can* be restored if an explicit function attribute
is used. For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c
| #define __aligned(n) \
|         __attribute__((aligned(n)))
|
| #define __cold \
|         __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 :
|    0:   d65f03c0        ret
|    4:   d503201f        nop
|    8:   d503201f        nop
|    c:   d503201f        nop
|
| 0000000000000010 :
|   10:   d65f03c0        ret
|   14:   d503201f        nop
|   18:   d503201f        nop
|   1c:   d503201f        nop
|
| 0000000000000020 :
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
|
| 0000000000000030 :
|   30:   d65f03c0        ret
|   34:   d503201f        nop
|   38:   d503201f        nop
|   3c:   d503201f        nop
|
| 0000000000000040 :
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
|
| 0000000000000050 :
|   50:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           00000054  0000000000000000  0000000000000000  00000040  2**4
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000098  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  000000b0  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  000000b0  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  000000c3  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000090  0000000000000000  0000000000000000  000000c8  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Unfortunately it appears that some interprocedural analysis determines that if
a callee is only called/referenced from cold callers, the callee is marked as
cold, and the alignment it would have got from the command line option is
dropped. If it's given an explicit alignment attribute, the alignment is
retained.

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c
| #define noinline \
|         __attribute__((noinline))
|
| #define __aligned(n) \
|         __attribute__((aligned(n)))
|
| #define __cold \
|         __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
|
| static noinline void callee_a(void)
| {
|         asm volatile("// callee_a\n" ::: "memory");
| }
|
| static noinline void callee_b(void)
| {
|         asm volatile("// callee_b\n" ::: "memory");
| }
|
| static noinline void callee_c(void)
| {
|         asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
|
| __cold
| void cold_func_b(void) { callee_b(); }
|
| __cold
| void cold_func_c(void) { callee_c(); }
|
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 :
|    0:   d65f03c0        ret
|
| 0000000000000004 :
|    4:   d65f03c0        ret
|
| 0000000000000008 :
|    8:   d65f03c0        ret
|    c:   d503201f        nop
|
| 0000000000000010 :
|   10:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   14:   910003fd        mov     x29, sp
|   18:   97fffffa        bl      0
|   1c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
|
| 0000000000000030 :
|   30:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   34:   910003fd        mov     x29, sp
|   38:   97fffff3        bl      4
|   3c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
|
| 0000000000000050 :
|   50:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   54:   910003fd        mov     x29, sp
|   58:   97ffffec        bl      8
|   5c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   60:   d65f03c0        ret
|   64:   d503201f        nop
|   68:   d503201f        nop
|   6c:   d503201f        nop
|
| 0000000000000070 :
|   70:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   74:   910003fd        mov     x29, sp
|   78:   97ffffe2        bl      0
|   7c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   80:   d65f03c0        ret
|   84:   d503201f        nop
|   88:   d503201f        nop
|   8c:   d503201f        nop
|
| 0000000000000090 :
|   90:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   94:   910003fd        mov     x29, sp
|   98:   97ffffdb        bl      4
|   9c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   a0:   d65f03c0        ret
|   a4:   d503201f        nop
|   a8:   d503201f        nop
|   ac:   d503201f        nop
|
| 00000000000000b0 :
|   b0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   b4:   910003fd        mov     x29, sp
|   b8:   97ffffd4        bl      8
|   bc:   a8c17bfd        ldp     x29, x30, [sp], #16
|   c0:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           000000c4  0000000000000000  0000000000000000  00000040  2**4
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000108  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  00000120  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  00000120  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000133  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000110  0000000000000000  0000000000000000  00000138  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
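
A minimal sketch of the explicit-attribute approach applied to the callee case
above, assuming a compiler with GCC-style `__attribute__((aligned))` on
functions. The macro names (`FUNC_ALIGN`, `ALIGNED_FUNC`, `COLD_FUNC`,
`NOINLINE_FUNC`) and the helpers `func_is_aligned()` and
`check_func_alignment()` are hypothetical, chosen for illustration only; they
are not taken from the kernel or from the tests above. The run-time check is
just a convenience in place of reading objdump output, and it has not been run
against the aarch64 toolchain used above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macros; the names are illustrative only. */
#define FUNC_ALIGN	16
#define ALIGNED_FUNC	__attribute__((aligned(FUNC_ALIGN)))
#define COLD_FUNC	__attribute__((cold)) ALIGNED_FUNC
#define NOINLINE_FUNC	__attribute__((noinline)) ALIGNED_FUNC

/*
 * Callee that is only reachable from a cold caller. It carries an explicit
 * alignment attribute, so its alignment does not depend on the command-line
 * -falign-functions=N value surviving cold propagation.
 */
static NOINLINE_FUNC void callee_a(void)
{
	__asm__ volatile("" ::: "memory");
}

/* Explicitly cold caller, also explicitly aligned. */
COLD_FUNC void cold_func_a(void)
{
	callee_a();
}

/* Return non-zero if the entry address of fn is a multiple of align. */
static int func_is_aligned(void (*fn)(void), uintptr_t align)
{
	return ((uintptr_t)fn % align) == 0;
}

/* Assert that both the cold caller and its callee kept their alignment. */
static void check_func_alignment(void)
{
	assert(func_is_aligned(cold_func_a, FUNC_ALIGN));
	assert(func_is_aligned(callee_a, FUNC_ALIGN));
}
```

Calling `check_func_alignment()` early in `main()` (or from a test harness)
would trip an assertion if either function lost its alignment. Annotating every
callee this way does not scale to a whole kernel, which is why having the
command-line option respected remains preferable.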