* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
@ 2020-10-15 13:46 ` jakub at gcc dot gnu.org
2020-10-15 14:27 ` rguenth at gcc dot gnu.org
` (54 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-15 13:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
See https://gcc.gnu.org/bugs.html, you haven't provided either preprocessed
source, nor gcc command line options.
inline keyword itself is not a guarantee that the function will be inlined, it
is inlined if it is possible and seems beneficial to the inlining heuristics.
If you want to always inline, use __attribute__((always_inline)) in addition to
inline keyword.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
2020-10-15 13:46 ` [Bug c/97445] " jakub at gcc dot gnu.org
@ 2020-10-15 14:27 ` rguenth at gcc dot gnu.org
2020-10-15 14:49 ` segher at gcc dot gnu.org
` (53 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-10-15 14:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
alternatively use inline w/o static to get C99 inline semantics (you have to
provide a single out of line copy yourself then via the appropriate
declaration)
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
2020-10-15 13:46 ` [Bug c/97445] " jakub at gcc dot gnu.org
2020-10-15 14:27 ` rguenth at gcc dot gnu.org
@ 2020-10-15 14:49 ` segher at gcc dot gnu.org
2020-10-15 14:52 ` jakub at gcc dot gnu.org
` (52 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-15 14:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Segher Boessenkool <segher at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |segher at gcc dot gnu.org
--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
AFAICS the point is that this always compiles to just a handful of insns,
and the inliner should be able to see that (even if the source is biggish).
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (2 preceding siblings ...)
2020-10-15 14:49 ` segher at gcc dot gnu.org
@ 2020-10-15 14:52 ` jakub at gcc dot gnu.org
2020-10-15 15:08 ` christophe.leroy at csgroup dot eu
` (51 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-15 14:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Even if it is just a few insns, if it is larger than the function call, the
caller might already trigger threshold of how much it can be enlarged by
inlining.
If this bugreport would come with the requested details, we could look at the
inlining dump and see why gcc has decided not to inline it in particular cases.
Without that we can't do anything.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (3 preceding siblings ...)
2020-10-15 14:52 ` jakub at gcc dot gnu.org
@ 2020-10-15 15:08 ` christophe.leroy at csgroup dot eu
2020-10-15 15:13 ` christophe.leroy at csgroup dot eu
` (50 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-15 15:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #5 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
GCC version with the BUG:
Using built-in specs.
COLLECT_GCC=/opt/gcc-10.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
COLLECT_LTO_WRAPPER=/home/opt/gcc-10.1.0-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/10.1.0/lto-wrapper
Target: powerpc64-linux
Configured with: /home/arnd/git/gcc/configure --target=powerpc64-linux
--enable-targets=all
--prefix=/home/arnd/cross/x86_64/gcc-10.1.0-nolibc/powerpc64-linux
--enable-languages=c --without-headers --disable-bootstrap --disable-nls
--disable-threads --disable-shared --disable-libmudflap --disable-libssp
--disable-libgomp --disable-decimal-float --disable-libquadmath
--disable-libatomic --disable-libcc1 --disable-libmpx --enable-checking=release
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (GCC)
GCC version without the BUG:
Using built-in specs.
COLLECT_GCC=/opt/gcc-9.2.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
COLLECT_LTO_WRAPPER=/home/opt/gcc-9.2.0-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/9.2.0/lto-wrapper
Target: powerpc64-linux
Configured with: /home/arnd/git/gcc/configure --target=powerpc64-linux
--enable-targets=all
--prefix=/home/arnd/cross/x86_64/gcc-9.2.0-nolibc/powerpc64-linux
--enable-languages=c --without-headers --disable-bootstrap --disable-nls
--disable-threads --disable-shared --disable-libmudflap --disable-libssp
--disable-libgomp --disable-decimal-float --disable-libquadmath
--disable-libatomic --disable-libcc1 --disable-libmpx --enable-checking=release
Thread model: single
gcc version 9.2.0 (GCC)
Arguments used:
powerpc64-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.setup-common.o.d -nostdinc
-isystem
/home/opt/gcc-10.1.0-nolibc/powerpc64-linux/bin/../lib/gcc/powerpc64-linux/10.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc -Wall
-Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration
-Werror=implicit-int -Wno-format-security -std=gnu89 -mcpu=powerpc
-mcpu=powerpc -m32 -msoft-float -pipe -ffixed-r2 -mmultiple
-mno-readonly-in-sdata -mcpu=powerpc64 -mno-altivec -mno-vsx
-fno-asynchronous-unwind-tables -mno-string -mbig-endian
-mstack-protector-guard=tls -mstack-protector-guard-reg=r2
-fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation
-Wno-format-overflow -Wno-address-of-packed-member -O2
-fno-allow-store-data-races -Wframe-larger-than=1024 -fstack-protector
-Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable
-fomit-frame-pointer -fno-var-tracking-assignments
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation
-Wno-zero-length-bounds -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict
-Wno-maybe-uninitialized -fno-strict-overflow -fno-merge-all-constants
-fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time
-Werror=incompatible-pointer-types -Werror=designated-init
-fmacro-prefix-map=./= -Wno-packed-not-aligned
-mstack-protector-guard-offset=552 -Werror
-DKBUILD_MODFILE='"arch/powerpc/kernel/setup-common"'
-DKBUILD_BASENAME='"setup_common"' -DKBUILD_MODNAME='"setup_common"' -c -o
arch/powerpc/kernel/setup-common.o arch/powerpc/kernel/setup-common.c
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (4 preceding siblings ...)
2020-10-15 15:08 ` christophe.leroy at csgroup dot eu
@ 2020-10-15 15:13 ` christophe.leroy at csgroup dot eu
2020-10-17 16:23 ` christophe.leroy at csgroup dot eu
` (49 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-15 15:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #6 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Sorry, the above command is for another problem I'm about to report.
The command in question in this bug report is:
powerpc64-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.setup-common.o.d -nostdinc
-isystem
/home/opt/gcc-10.1.0-nolibc/powerpc64-linux/bin/../lib/gcc/powerpc64-linux/10.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc -Wall
-Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration
-Werror=implicit-int -Wno-format-security -std=gnu89 -mcpu=powerpc
-mcpu=powerpc -m32 -msoft-float -pipe -ffixed-r2 -mmultiple
-mno-readonly-in-sdata -mcpu=powerpc64 -mno-altivec -mno-vsx
-fno-asynchronous-unwind-tables -mno-string -mbig-endian
-fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation
-Wno-format-overflow -Wno-address-of-packed-member -O2
-fno-allow-store-data-races -Wframe-larger-than=1024 -fno-stack-protector
-Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable
-fomit-frame-pointer -fno-var-tracking-assignments
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation
-Wno-zero-length-bounds -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict
-Wno-maybe-uninitialized -fno-strict-overflow -fno-merge-all-constants
-fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time
-Werror=incompatible-pointer-types -Werror=designated-init
-fmacro-prefix-map=./= -Wno-packed-not-aligned -Werror
-DKBUILD_MODFILE='"arch/powerpc/kernel/setup-common"'
-DKBUILD_BASENAME='"setup_common"' -DKBUILD_MODNAME='"setup_common"' -c -o
arch/powerpc/kernel/setup-common.o arch/powerpc/kernel/setup-common.c
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (5 preceding siblings ...)
2020-10-15 15:13 ` christophe.leroy at csgroup dot eu
@ 2020-10-17 16:23 ` christophe.leroy at csgroup dot eu
2020-10-17 16:31 ` christophe.leroy at csgroup dot eu
` (48 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-17 16:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #7 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
With get_order(), that's even worse: there are instances of it that are never
called:
c000f94c <get_order>:
c005a7ac <get_order>:
c005a9c4: 4b ff fd e9 bl c005a7ac <get_order>
c005ab38: 4b ff fc 75 bl c005a7ac <get_order>
c005ac50: 4b ff fb 5d bl c005a7ac <get_order>
c005b540 <get_order>:
c005b588: 4b ff ff b9 bl c005b540 <get_order>
c0074fe4 <get_order>:
c00815c8 <get_order>:
c00b8690 <get_order>:
c00ba898: 4b ff dd f9 bl c00b8690 <get_order>
c00ba9d4: 4b ff dc bd bl c00b8690 <get_order>
c00bd650 <get_order>:
c00c3cbc: 4b ff 99 95 bl c00bd650 <get_order>
c00c3e90: 4b ff 97 c1 bl c00bd650 <get_order>
c00cfd08 <get_order>:
c00d02a8: 4b ff fa 61 bl c00cfd08 <get_order>
c00d0310: 4b ff f9 f9 bl c00cfd08 <get_order>
c00d0ac8: 4b ff f2 41 bl c00cfd08 <get_order>
c00d0c90: 4b ff f0 79 bl c00cfd08 <get_order>
c00d0d60: 4b ff ef a9 bl c00cfd08 <get_order>
c00d0dcc: 4b ff ef 3d bl c00cfd08 <get_order>
c00d0e10: 4b ff ee f9 bl c00cfd08 <get_order>
c00d10ac: 4b ff ec 5d bl c00cfd08 <get_order>
c00d19e4: 4b ff e3 25 bl c00cfd08 <get_order>
c00d4fa4: 4b ff ad 65 bl c00cfd08 <get_order>
c00d5cb4: 4b ff a0 55 bl c00cfd08 <get_order>
c00d5cc0: 4b ff a0 49 bl c00cfd08 <get_order>
c00d60fc: 4b ff 9c 0d bl c00cfd08 <get_order>
c00e9c70 <get_order>:
c0114b50 <get_order>:
c013643c <get_order>:
c013a520 <get_order>:
c013b3f4: 4b ff f1 2d bl c013a520 <get_order>
c013b40c: 4b ff f1 15 bl c013a520 <get_order>
c013b438: 4b ff f0 e9 bl c013a520 <get_order>
c013b454: 4b ff f0 cd bl c013a520 <get_order>
c014236c: 4b ff 81 b5 bl c013a520 <get_order>
c01423d4: 4b ff 81 4d bl c013a520 <get_order>
c0150ae8 <get_order>:
c015b968 <get_order>:
c0162040 <get_order>:
c016a710 <get_order>:
c0182aec <get_order>:
c01abe78 <get_order>:
c01cd598 <get_order>:
c01d2764 <get_order>:
c01ee3e0 <get_order>:
c01fea40 <get_order>:
c020bfd4 <get_order>:
c021cd80 <get_order>:
c021e510 <get_order>:
c02204b4 <get_order>:
c0225534 <get_order>:
c0228bec <get_order>:
c022c5a4 <get_order>:
c0231100 <get_order>:
c02314c0: 4b ff fc 41 bl c0231100 <get_order>
c02537a8 <get_order>:
c025a620 <get_order>:
c025d6bc: 4b ff cf 65 bl c025a620 <get_order>
c025d750: 4b ff ce d1 bl c025a620 <get_order>
c0270144 <get_order>:
c0287110 <get_order>:
c028717c: 4b ff ff 95 bl c0287110 <get_order>
c02871f0: 4b ff ff 21 bl c0287110 <get_order>
c028f3b8 <get_order>:
c029fa00 <get_order>:
c02a3ae0: 4b ff bf 21 bl c029fa00 <get_order>
c02a4c68: 4b ff ad 99 bl c029fa00 <get_order>
c02a5020: 4b ff a9 e1 bl c029fa00 <get_order>
c02a520c: 4b ff a7 f5 bl c029fa00 <get_order>
c02a5644: 4b ff a3 bd bl c029fa00 <get_order>
c02ad00c <get_order>:
c02ad0c0: 4b ff ff 4d bl c02ad00c <get_order>
c02b4ce0 <get_order>:
c02ba234 <get_order>:
c02bd758 <get_order>:
c02c292c <get_order>:
c02ccd00 <get_order>:
c0326dcc <get_order>:
c0327fc4: 4b ff ee 09 bl c0326dcc <get_order>
c032836c: 4b ff ea 61 bl c0326dcc <get_order>
c0328784: 4b ff e6 49 bl c0326dcc <get_order>
c0328810: 4b ff e5 bd bl c0326dcc <get_order>
c0328860: 4b ff e5 6d bl c0326dcc <get_order>
c032a754 <get_order>:
c03335d4 <get_order>:
c033c37c <get_order>:
c033e4ac: 4b ff de d1 bl c033c37c <get_order>
c033e4bc: 4b ff de c1 bl c033c37c <get_order>
c03447b4 <get_order>:
c0345cf8: 4b ff ea bd bl c03447b4 <get_order>
c035a3fc <get_order>:
c0362694 <get_order>:
c0375710 <get_order>:
c041ab78: 4b ca 2a d9 bl c00bd650 <get_order>
c0429068: 4b c9 45 e9 bl c00bd650 <get_order>
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (6 preceding siblings ...)
2020-10-17 16:23 ` christophe.leroy at csgroup dot eu
@ 2020-10-17 16:31 ` christophe.leroy at csgroup dot eu
2020-10-19 12:33 ` jakub at gcc dot gnu.org
` (47 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-17 16:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #8 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jakub Jelinek from comment #1)
> See https://gcc.gnu.org/bugs.html, you haven't provided either preprocessed
> source, nor gcc command line options.
> inline keyword itself is not a guarantee that the function will be inlined,
> it is inlined if it is possible and seems beneficial to the inlining
> heuristics.
> If you want to always inline, use __attribute__((always_inline)) in addition
> to inline keyword.
When adding -save-temps as explained in https://gcc.gnu.org/bugs.html I get an
error:
powerpc64-linux-gcc -Wp,-MMD,fs/.pipe.o.d -nostdinc -isystem
/home/opt/gcc-10.1.0-nolibc/powerpc64-linux/bin/../lib/gcc/powerpc64-linux/10.1.0/include
-I./arch/powerpc/include -I./arch/powerpc/include/generated -I./include
-I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc -Wall
-Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration
-Werror=implicit-int -Wno-format-security -std=gnu89 -mcpu=powerpc -m32
-msoft-float -pipe -ffixed-r2 -mmultiple -mno-readonly-in-sdata -mcpu=860
-mno-altivec -mno-vsx -fno-asynchronous-unwind-tables -mno-string -mbig-endian
-fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation
-Wno-format-overflow -Wno-address-of-packed-member -O2
-fno-allow-store-data-races -Wframe-larger-than=1024 -fno-stack-protector
-Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable
-fomit-frame-pointer -fno-var-tracking-assignments -g
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation
-Wno-zero-length-bounds -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict
-Wno-maybe-uninitialized -fno-strict-overflow -fno-merge-all-constants
-fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time
-Werror=incompatible-pointer-types -Werror=designated-init
-fmacro-prefix-map=./= -Wno-packed-not-aligned -DKBUILD_MODFILE='"fs/pipe"'
-DKBUILD_BASENAME='"pipe"' -DKBUILD_MODNAME='"pipe"' -c -o fs/pipe.o fs/pipe.c
-save-temps
powerpc64-linux-gcc.br_real: warning: -pipe ignored because -save-temps
specified
powerpc64-linux-gcc.br_real: error: unrecognized command line option
‘-fno-allow-store-data-races’
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (7 preceding siblings ...)
2020-10-17 16:31 ` christophe.leroy at csgroup dot eu
@ 2020-10-19 12:33 ` jakub at gcc dot gnu.org
2020-10-19 12:39 ` christophe.leroy at csgroup dot eu
` (46 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-19 12:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
-fno-allow-store-data-races is fairly new option, previously it has been
--param=allow-store-data-races=0
I have no idea how you've tried to add -save-temps, so can't answer why you get
the error.
What I was suggesting is build with make V=1 and copy/paste the command line
used to compile a particular source file and append -save-temps to those
options
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (8 preceding siblings ...)
2020-10-19 12:33 ` jakub at gcc dot gnu.org
@ 2020-10-19 12:39 ` christophe.leroy at csgroup dot eu
2020-10-19 12:57 ` christophe.leroy at csgroup dot eu
` (45 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-19 12:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #10 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jakub Jelinek from comment #9)
> What I was suggesting is build with make V=1 and copy/paste the command line
> used to compile a particular source file and append -save-temps to those
> options
That's exactly what I did, I built with V=1, then I copy/pasted the line and
added -save-temps at the end as you see in comment #8
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (9 preceding siblings ...)
2020-10-19 12:39 ` christophe.leroy at csgroup dot eu
@ 2020-10-19 12:57 ` christophe.leroy at csgroup dot eu
2020-10-19 12:59 ` christophe.leroy at csgroup dot eu
` (44 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-19 12:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #11 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49398
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49398&action=edit
Build of fs/pipe.o
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (10 preceding siblings ...)
2020-10-19 12:57 ` christophe.leroy at csgroup dot eu
@ 2020-10-19 12:59 ` christophe.leroy at csgroup dot eu
2020-10-19 13:01 ` christophe.leroy at csgroup dot eu
` (43 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-19 12:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #12 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49399
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49399&action=edit
pipe.s from build of fs/pipe.o
fs/pipe.o includes an instance of get_order() allthouth get_order() is declared
static and not called from any function.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (11 preceding siblings ...)
2020-10-19 12:59 ` christophe.leroy at csgroup dot eu
@ 2020-10-19 13:01 ` christophe.leroy at csgroup dot eu
2020-10-19 15:05 ` marxin at gcc dot gnu.org
` (42 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-19 13:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #13 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Christophe Leroy from comment #10)
> (In reply to Jakub Jelinek from comment #9)
> > What I was suggesting is build with make V=1 and copy/paste the command line
> > used to compile a particular source file and append -save-temps to those
> > options
>
> That's exactly what I did, I built with V=1, then I copy/pasted the line and
> added -save-temps at the end as you see in comment #8
This was a problem with PATH, when I was copying the command line, it was using
another (older) version of GCC which was the one in the PATH. Sorry.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (12 preceding siblings ...)
2020-10-19 13:01 ` christophe.leroy at csgroup dot eu
@ 2020-10-19 15:05 ` marxin at gcc dot gnu.org
2020-10-19 15:06 ` marxin at gcc dot gnu.org
` (41 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-19 15:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2020-10-19
Status|UNCONFIRMED |NEW
--- Comment #14 from Martin Liška <marxin at gcc dot gnu.org> ---
All right, I can reproduce it. I'm also attaching x86_64 pre-processes source
file.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (13 preceding siblings ...)
2020-10-19 15:05 ` marxin at gcc dot gnu.org
@ 2020-10-19 15:06 ` marxin at gcc dot gnu.org
2020-10-19 15:11 ` marxin at gcc dot gnu.org
` (40 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-19 15:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #15 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 49401
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49401&action=edit
x86_64 pre-processed source file
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (14 preceding siblings ...)
2020-10-19 15:06 ` marxin at gcc dot gnu.org
@ 2020-10-19 15:11 ` marxin at gcc dot gnu.org
2020-10-19 15:18 ` hubicka at ucw dot cz
` (39 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-19 15:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Martin Liška <marxin at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubicka at gcc dot gnu.org,
| |rguenth at gcc dot gnu.org
--- Comment #16 from Martin Liška <marxin at gcc dot gnu.org> ---
The following happens:
get_order is called by kmalloc_large which is called in kmalloc. And kmalloc
calls the function for larger allocations. Problem is that we eliminate all
calls to get_order late
pipe.i.108t.thread1:;; Function get_order (get_order, funcdef_no=295,
decl_uid=4528, cgraph_uid=300, symbol_order=303)
pipe.i.108t.thread1:get_order (long unsigned int size)
pipe.i.108t.thread1: _125 = get_order (_114);
pipe.i.108t.thread1: _67 = get_order (_56);
pipe.i.109t.cdce:;; Function get_order (get_order, funcdef_no=295,
decl_uid=4396, cgraph_uid=300, symbol_order=303)
pipe.i.109t.cdce:get_order (long unsigned int size)
so remove_unreachable_nodes is not called any more.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (15 preceding siblings ...)
2020-10-19 15:11 ` marxin at gcc dot gnu.org
@ 2020-10-19 15:18 ` hubicka at ucw dot cz
2020-10-19 15:33 ` marxin at gcc dot gnu.org
` (38 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-19 15:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #17 from Jan Hubicka <hubicka at ucw dot cz> ---
> The following happens:
>
> get_order is called by kmalloc_large which is called in kmalloc. And kmalloc
> calls the function for larger allocations. Problem is that we eliminate all
> calls to get_order late
>
> pipe.i.108t.thread1:;; Function get_order (get_order, funcdef_no=295,
> decl_uid=4528, cgraph_uid=300, symbol_order=303)
> pipe.i.108t.thread1:get_order (long unsigned int size)
> pipe.i.108t.thread1: _125 = get_order (_114);
> pipe.i.108t.thread1: _67 = get_order (_56);
> pipe.i.109t.cdce:;; Function get_order (get_order, funcdef_no=295,
> decl_uid=4396, cgraph_uid=300, symbol_order=303)
> pipe.i.109t.cdce:get_order (long unsigned int size)
>
> so remove_unreachable_nodes is not called any more.
Yep, that is by design - we are already outputting functions to
assembler file, so there is not much we can do at this moent. Option
wold be to do threading early of course. How often this happen in
practice?
Also note that -Winline outputs reasons why the static inline is not
inlined (which is also by design a decision of the inliner heuristics).
I suppose here the inliner sees the function called multiple times and
since it is quite long it decides to keep it offline. Opitmizing all
references late if of course unfortunate.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (16 preceding siblings ...)
2020-10-19 15:18 ` hubicka at ucw dot cz
@ 2020-10-19 15:33 ` marxin at gcc dot gnu.org
2020-10-19 16:13 ` hubicka at gcc dot gnu.org
` (37 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-19 15:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #18 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #17)
> > The following happens:
> >
> > get_order is called by kmalloc_large which is called in kmalloc. And kmalloc
> > calls the function for larger allocations. Problem is that we eliminate all
> > calls to get_order late
> >
> > pipe.i.108t.thread1:;; Function get_order (get_order, funcdef_no=295,
> > decl_uid=4528, cgraph_uid=300, symbol_order=303)
> > pipe.i.108t.thread1:get_order (long unsigned int size)
> > pipe.i.108t.thread1: _125 = get_order (_114);
> > pipe.i.108t.thread1: _67 = get_order (_56);
> > pipe.i.109t.cdce:;; Function get_order (get_order, funcdef_no=295,
> > decl_uid=4396, cgraph_uid=300, symbol_order=303)
> > pipe.i.109t.cdce:get_order (long unsigned int size)
> >
> > so remove_unreachable_nodes is not called any more.
> Yep, that is by design - we are already outputting functions to
> assembler file, so there is not much we can do at this moent. Option
> wold be to do threading early of course. How often this happen in
> practice?
>
> Also note that -Winline outputs reasons why the static inline is not
You are right, this is printed:
gcc -O2 pipe.i -c -fdump-tree-all -Winline
In file included from ./arch/x86/include/asm/page.h:77,
from ./arch/x86/include/asm/thread_info.h:12,
from ./include/linux/thread_info.h:38,
from ./arch/x86/include/asm/preempt.h:7,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:6,
from ./include/linux/mm.h:10,
from fs/pipe.c:8:
./include/linux/slab.h: In function ‘alloc_pipe_info’:
./include/asm-generic/getorder.h:29:146: warning: inlining failed in call to
‘get_order’: --param max-inline-insns-single limit reached [-Winline]
29 | static inline __attribute_const__ int get_order(unsigned long size)
|
^
In file included from fs/pipe.c:11:
./include/linux/slab.h:482:30: note: called from here
482 | unsigned int order = get_order(size);
| ^~~~~~~~~~~~~~~
In file included from ./arch/x86/include/asm/page.h:77,
from ./arch/x86/include/asm/thread_info.h:12,
from ./include/linux/thread_info.h:38,
from ./arch/x86/include/asm/preempt.h:7,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:6,
from ./include/linux/mm.h:10,
from fs/pipe.c:8:
./include/linux/slab.h: In function ‘pipe_resize_ring’:
./include/asm-generic/getorder.h:29:146: warning: inlining failed in call to
‘get_order’: --param max-inline-insns-single limit reached [-Winline]
29 | static inline __attribute_const__ int get_order(unsigned long size)
|
^
In file included from fs/pipe.c:11:
./include/linux/slab.h:482:30: note: called from here
482 | unsigned int order = get_order(size);
| ^~~~~~~~~~~~~~~
> inlined (which is also by design a decision of the inliner heuristics).
> I suppose here the inliner sees the function called multiple times and
> since it is quite long it decides to keep it offline. Opitmizing all
> references late if of course unfortunate.
>
> Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (17 preceding siblings ...)
2020-10-19 15:33 ` marxin at gcc dot gnu.org
@ 2020-10-19 16:13 ` hubicka at gcc dot gnu.org
2020-10-19 16:20 ` hubicka at ucw dot cz
` (36 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at gcc dot gnu.org @ 2020-10-19 16:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
get_order unwinds to:
<bb 2> [local count: 1073741824]:
_1 = __builtin_constant_p (size_68(D));
if (_1 != 0)
goto <bb 3>; [50.00%]
else
goto <bb 71>; [50.00%]
<bb 3> [local count: 536870913]:
if (size_68(D) == 0)
goto <bb 72>; [21.72%]
else
goto <bb 4>; [78.28%]
<bb 4> [local count: 420262548]:
if (size_68(D) <= 4095)
goto <bb 72>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 5> [local count: 210131274]:
_2 = size_68(D) + 18446744073709551615;
_3 = __builtin_constant_p (_2);
if (_3 != 0)
goto <bb 6>; [50.00%]
else
goto <bb 69>; [50.00%]
<bb 6> [local count: 105065637]:
_4 = (signed long) _2;
if (_4 >= 0)
goto <bb 7>; [59.00%]
else
goto <bb 70>; [41.00%]
... [very long code]
<bb 69> [local count: 105065637]:
__asm__("bsrq %1,%q0" : "=r" bitpos_75 : "rm" _2, "0" -1);
iftmp.1_73 = bitpos_75 + -11;
<bb 70> [local count: 210131274]:
# iftmp.1_67 = PHI <52(6), iftmp.1_73(69), 51(7), 50(8), 49(9), 48(10),
47(11), 46(12), 45(13), 44(14), 43(15), 42(16), 41(17), 40(18), 39(19), 38(20),
37(21), 36(22), 35(23), 34(24), 33(25), 32(26), 31(27), 30(28), 29(29), 28(30),
27(31), 26(32), 25(33), 24(34), 23(35), 22(36), 21(37), 20(38), 19(39), 18(40),
17(41), 16(42), 15(43), 14(44), 13(45), 12(46), 11(47), 10(48), 9(49), 8(50),
7(51), 6(52), 5(53), 4(54), 3(55), 2(56), 1(57), 0(58), -1(59), -2(60), -3(61),
-4(62), -5(63), -6(64), -7(65), -8(66), -10(68), -9(67)>
goto <bb 72>; [100.00%]
<bb 71> [local count: 536870913]:
size_69 = size_68(D) + 18446744073709551615;
size_70 = size_69 >> 12;
__asm__("bsrq %1,%q0" : "=r" bitpos_72 : "rm" size_70, "0" -1);
_74 = bitpos_72 + 1;
<bb 72> [local count: 1073741824]:
# _66 = PHI <52(3), 0(4), iftmp.1_67(70), _74(71)>
return _66;
We get summary:
IPA function summary for get_order/303 inlinable
global time: 8.716289
self size: 201
global size: 201
min size: 4
self stack: 0
global stack: 0
size:4.000000, time:3.000000
size:3.000000, time:2.000000, executed if:(not inlined)
size:4.000000, time:2.000000, executed if:(op0 not constant)
size:2.000000, time:0.782800, executed if:(op0 != 0)
size:3.000000, time:0.391400, executed if:(op0 > 4095) && (op0 != 0)
size:2.000000, time:0.195700, executed if:(op0 > 4095) && (op0 != 0) &&
(op0 not constant)
size:3.000000, time:0.173194, executed if:(op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.086597, executed if:(op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.043299, executed if:(op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.021649, executed if:(op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.000000, time:0.010825, executed if:(op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:168.000000, time:0.010825, executed if:(op0,(# +
18446744073709551615),(# & 288230376151711744) == 0) && (op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
calls:
__builtin_constant_p/4546 function body not available
freq:0.20 loop depth: 0 size: 0 time: 0 predicate: (op0 > 4095) && (op0
!= 0)
op0 points to local or readonly memory
__builtin_constant_p/4546 function body not available
freq:1.00 loop depth: 0 size: 0 time: 0
and then in calls to get_inline we do not know the constant parameter:
Estimating body: get_order/303
Known to be false: not inlined
size:198 time:6.716289 nonspec time:8.716289 loops with known
iterations:0.000000 known strides:0.000000
the problem here is size of 198 instructions while we inline up to 70
instructions. Of course after concluding that parameter is not constant this
would all collapse to just few instrutions.
It is difficult to handle builtin_constant_p correctly at this stage: ipa-prop
is missing a lot of known constants and it is quite possible parameter will be
folded to constant post inlining and thus we keep both variant.
We could teach ipa-predicates that the if is exclusive and thus only max of
both variants should be accounted byt it does not fit the way predicates works
very well. One option would be to takea hint that function with
builtin_constant_p on parameters really wants to be inlined and increase the
bounds (I think this owuld be good idea to do along with functions having
vector builtins in them), but that would cure only some cases, certainly not
all.
It is always possible to always_inline functions that are intended to be always
inlined.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (18 preceding siblings ...)
2020-10-19 16:13 ` hubicka at gcc dot gnu.org
@ 2020-10-19 16:20 ` hubicka at ucw dot cz
2020-10-19 16:52 ` jakub at gcc dot gnu.org
` (35 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-19 16:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #20 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> get_order unwinds to:
>
> <bb 2> [local count: 1073741824]:
> _1 = __builtin_constant_p (size_68(D));
> if (_1 != 0)
> goto <bb 3>; [50.00%]
> else
> goto <bb 71>; [50.00%]
>
> <bb 3> [local count: 536870913]:
> if (size_68(D) == 0)
> goto <bb 72>; [21.72%]
> else
> goto <bb 4>; [78.28%]
>
> <bb 4> [local count: 420262548]:
> if (size_68(D) <= 4095)
> goto <bb 72>; [50.00%]
> else
> goto <bb 5>; [50.00%]
>
> <bb 5> [local count: 210131274]:
> _2 = size_68(D) + 18446744073709551615;
> _3 = __builtin_constant_p (_2);
Forgot to note, things would be easier if we folded this to _1 :)
Among other we run on out of the limit on number of conditionals
recorded by the fnsummary pass.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (19 preceding siblings ...)
2020-10-19 16:20 ` hubicka at ucw dot cz
@ 2020-10-19 16:52 ` jakub at gcc dot gnu.org
2020-10-19 17:13 ` hubicka at ucw dot cz
` (34 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-19 16:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #20)
> > _2 = size_68(D) + 18446744073709551615;
> > _3 = __builtin_constant_p (_2);
> Forgot to note, things would be easier if we folded this to _1 :)
> Among other we run on out of the limit on number of conditionals
> recorded by the fnsummary pass.
Maybe better just have a match.pd rule that would fold
y = z binop cst;
x = __builtin_constant_p (y);
to
x = __builtin_constant_p (z);
and let SCCVN do the rest (or do it in fwprop or whatever else if we'd want to
write it in C without having to enumerate all possible binops we want to do it
for).
Not sure if we shouldn't stop on ops that could trap though, or on ops that
could possibly be invalid...
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (20 preceding siblings ...)
2020-10-19 16:52 ` jakub at gcc dot gnu.org
@ 2020-10-19 17:13 ` hubicka at ucw dot cz
2020-10-20 5:19 ` christophe.leroy at csgroup dot eu
` (33 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-19 17:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #22 from Jan Hubicka <hubicka at ucw dot cz> ---
> Maybe better just have a match.pd rule that would fold
> y = z binop cst;
> x = __builtin_constant_p (y);
> to
> x = __builtin_constant_p (z);
> and let SCCVN do the rest (or do it in fwprop or whatever else if we'd want to
> write it in C without having to enumerate all possible binops we want to do it
> for).
>
> Not sure if we shouldn't stop on ops that could trap though, or on ops that
> could possibly be invalid...
We need to establish that z binop cst folds to a constant whenever z is
a constant or we may run into cases where we go into builtin_constant_p
== true case and then fail to fold the actual value.
Honza
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (21 preceding siblings ...)
2020-10-19 17:13 ` hubicka at ucw dot cz
@ 2020-10-20 5:19 ` christophe.leroy at csgroup dot eu
2020-10-20 6:44 ` Jan Hubicka
2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
` (32 subsequent siblings)
55 siblings, 1 reply; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 5:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #23 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jan Hubicka from comment #19)
>
> It is always possible to always_inline functions that are intended to be
> always inlined.
> Honza
Yes and I sent a patch for that to the Linux kernel, but what I would like to
understand is why does GCC 10 completely fails to inline that while GCC 9 was
doing things properly ?
Find attached the same temp files generated with GCC 9. GCC9 sees that
get_order() is not used and doesn't generate it.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-20 5:19 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 6:44 ` Jan Hubicka
0 siblings, 0 replies; 62+ messages in thread
From: Jan Hubicka @ 2020-10-20 6:44 UTC (permalink / raw)
To: christophe.leroy at csgroup dot eu; +Cc: gcc-bugs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #23 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
> (In reply to Jan Hubicka from comment #19)
> >
> > It is always possible to always_inline functions that are intended to be
> > always inlined.
> > Honza
>
> Yes and I sent a patch for that to the Linux kernel, but what I would like to
> understand is why does GCC 10 completely fails to inline that while GCC 9 was
> doing things properly ?
It is because --param inline-insns-single was reduced for -O2 from 200
to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
enables auto-inlining at -O2.
Problem with inlininig funtions declared inline is that C++ codebases
tends to abuse this keyword for things that are really too large (and
get_order would be such example if it did not have builtin_constant_p
check which inliner does not understand well). So having same limit at
-O2 and -O3 turned out to be problematic with respect to code size and
especially with respect to LTO, where a lot more inlining oppurtunities
appear.
I will implement the heuristics to push up inline limits of functions
having builtin_constant_p of parameter which should help a bit in this
case (but not very systematically: as dicussed in the PR log it is quite
hard problem to get builtin_constant_p right in the code size metrics
used by inliner before it knows exactly what is going to be constant and
what is not).
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (22 preceding siblings ...)
2020-10-20 5:19 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
` (31 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 5:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #24 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49403
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49403&action=edit
pipe.i from build of fs/pipe.o with GCC 9.2
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (23 preceding siblings ...)
2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
2020-10-20 6:17 ` christophe.leroy at csgroup dot eu
` (30 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 5:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #25 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49404
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49404&action=edit
pipe.s from build of fs/pipe.o with GCC 9.2
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (24 preceding siblings ...)
2020-10-20 5:21 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 6:17 ` christophe.leroy at csgroup dot eu
2020-10-20 6:44 ` hubicka at ucw dot cz
` (29 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 6:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #26 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
In reality it is not perfect with GCC 9.2, but that's better, only two
instances are unused.
[root@po17688vm linux-powerpc]# ppc-linux-objdump -d vmlinux | grep get_order
c00c0470 <get_order>:
c013e238 <get_order>:
c0225f68 <get_order>:
c041ebfc: 4b ca 18 75 bl c00c0470 <get_order>
c042d34c: 4b c9 31 25 bl c00c0470 <get_order>
In the few places where get_order() remains with GCC 9.2, I get the following
warning:
warning: inlining failed in call to 'get_order': call is unlikely and code size
would grow [-Winline]
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (25 preceding siblings ...)
2020-10-20 6:17 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 6:44 ` hubicka at ucw dot cz
2020-10-20 7:09 ` christophe.leroy at csgroup dot eu
` (28 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 6:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #27 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #23 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
> (In reply to Jan Hubicka from comment #19)
> >
> > It is always possible to always_inline functions that are intended to be
> > always inlined.
> > Honza
>
> Yes and I sent a patch for that to the Linux kernel, but what I would like to
> understand is why does GCC 10 completely fails to inline that while GCC 9 was
> doing things properly ?
It is because --param inline-insns-single was reduced for -O2 from 200
to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
enables auto-inlining at -O2.
Problem with inlininig funtions declared inline is that C++ codebases
tends to abuse this keyword for things that are really too large (and
get_order would be such example if it did not have builtin_constant_p
check which inliner does not understand well). So having same limit at
-O2 and -O3 turned out to be problematic with respect to code size and
especially with respect to LTO, where a lot more inlining oppurtunities
appear.
I will implement the heuristics to push up inline limits of functions
having builtin_constant_p of parameter which should help a bit in this
case (but not very systematically: as dicussed in the PR log it is quite
hard problem to get builtin_constant_p right in the code size metrics
used by inliner before it knows exactly what is going to be constant and
what is not).
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (26 preceding siblings ...)
2020-10-20 6:44 ` hubicka at ucw dot cz
@ 2020-10-20 7:09 ` christophe.leroy at csgroup dot eu
2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
` (27 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 7:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #28 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Find attached the temporary files for net/core/skbuff.c, as it is indeed what
initially triggered my focus on this issue. I don't expect such a function at
all:
00000cc8 <csum_partial>:
cc8: 48 00 00 00 b cc8 <csum_partial>
cc8: R_PPC_REL24 __csum_partial
Every function should call __csum_partial() directly.
I guess that's the same problem, powerpc csum_partial() does sums 'manually'
for some particular values of constant 'len', then fallbacks on
__csum_partial().
When 'len' is not a constant, csum_partial() is reduced to a call to
__csum_partial(), and I'd expect GCC 10 to call __csum_partial() directly as
all previous GCC have always done.
Though, getting the following warning when building:
warning: inlining failed in call to 'csum_partial': --param
max-inline-insns-single limit reached [-Winline]
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (27 preceding siblings ...)
2020-10-20 7:09 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
` (26 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 7:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #29 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49406
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49406&action=edit
skbuff.s from build of net/core/skbuff.o with GCC 10
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (28 preceding siblings ...)
2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
2020-10-20 9:55 ` segher at gcc dot gnu.org
` (25 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 7:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #30 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
Created attachment 49407
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49407&action=edit
skbuff.i from build of net/core/skbuff.o with GCC 10
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (29 preceding siblings ...)
2020-10-20 7:10 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 9:55 ` segher at gcc dot gnu.org
2020-10-20 10:16 ` Jan Hubicka
2020-10-20 10:16 ` hubicka at ucw dot cz
` (24 subsequent siblings)
55 siblings, 1 reply; 62+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-20 9:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #27)
> It is because --param inline-insns-single was reduced for -O2 from 200
> to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
> enables auto-inlining at -O2.
>
> Problem with inlininig funtions declared inline is that C++ codebases
> tends to abuse this keyword for things that are really too large (and
> get_order would be such example if it did not have builtin_constant_p
> check which inliner does not understand well). So having same limit at
> -O2 and -O3 turned out to be problematic with respect to code size and
> especially with respect to LTO, where a lot more inlining oppurtunities
> appear.
Do the heuristics account for that not inlining a "static inline" results
in multiple copies?
> I will implement the heuristics to push up inline limits of functions
> having builtin_constant_p of parameter which should help a bit in this
> case
Thank you!
> (but not very systematically: as dicussed in the PR log it is quite
> hard problem to get builtin_constant_p right in the code size metrics
> used by inliner before it knows exactly what is going to be constant and
> what is not).
That is true for many other inlining things as well... builtin_constant_p
is worse than most I guess ;-)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-20 9:55 ` segher at gcc dot gnu.org
@ 2020-10-20 10:16 ` Jan Hubicka
0 siblings, 0 replies; 62+ messages in thread
From: Jan Hubicka @ 2020-10-20 10:16 UTC (permalink / raw)
To: segher at gcc dot gnu.org; +Cc: gcc-bugs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
> > enables auto-inlining at -O2.
> >
> > Problem with inlininig funtions declared inline is that C++ codebases
> > tends to abuse this keyword for things that are really too large (and
> > get_order would be such example if it did not have builtin_constant_p
> > check which inliner does not understand well). So having same limit at
> > -O2 and -O3 turned out to be problematic with respect to code size and
> > especially with respect to LTO, where a lot more inlining oppurtunities
> > appear.
>
> Do the heuristics account for that not inlining a "static inline" results
> in multiple copies?
It prevents inlining only when there are multiple calls in the unit
being compiled (there is no way to know that the same inline function is
duplicated in other units).
This is what happens here: there are multiple calls so inliner concludes
inlining would cost too much of code size and later they are optimized
away.
get_order is a wrapper around ffs64. This can be implemented w/o asm
statement as follows:
int
my_fls64 (__u64 x)
{
if (!x)
return 0;
return 64 - __builtin_clzl (x);
}
This results in longer assembly than the kernel asm implementation. If
that matters I would replace builtin_constnat_p part of get_order by this
implementation that is more transparent to the code size estimation and
things will get inlined.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (30 preceding siblings ...)
2020-10-20 9:55 ` segher at gcc dot gnu.org
@ 2020-10-20 10:16 ` hubicka at ucw dot cz
2020-10-20 10:40 ` jakub at gcc dot gnu.org
` (23 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 10:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #32 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #27)
> > It is because --param inline-insns-single was reduced for -O2 from 200
> > to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
> > enables auto-inlining at -O2.
> >
> > Problem with inlininig funtions declared inline is that C++ codebases
> > tends to abuse this keyword for things that are really too large (and
> > get_order would be such example if it did not have builtin_constant_p
> > check which inliner does not understand well). So having same limit at
> > -O2 and -O3 turned out to be problematic with respect to code size and
> > especially with respect to LTO, where a lot more inlining oppurtunities
> > appear.
>
> Do the heuristics account for that not inlining a "static inline" results
> in multiple copies?
It prevents inlining only when there are multiple calls in the unit
being compiled (there is no way to know that the same inline function is
duplicated in other units).
This is what happens here: there are multiple calls so inliner concludes
inlining would cost too much of code size and later they are optimized
away.
get_order is a wrapper around ffs64. This can be implemented w/o asm
statement as follows:
int
my_fls64 (__u64 x)
{
if (!x)
return 0;
return 64 - __builtin_clzl (x);
}
This results in longer assembly than the kernel asm implementation. If
that matters I would replace builtin_constnat_p part of get_order by this
implementation that is more transparent to the code size estimation and
things will get inlined.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (31 preceding siblings ...)
2020-10-20 10:16 ` hubicka at ucw dot cz
@ 2020-10-20 10:40 ` jakub at gcc dot gnu.org
2020-10-20 11:12 ` Jan Hubicka
2020-10-20 11:12 ` hubicka at ucw dot cz
` (22 subsequent siblings)
55 siblings, 1 reply; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 10:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> get_order is a wrapper around ffs64. This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
> if (!x)
> return 0;
> return 64 - __builtin_clzl (x);
> }
>
> This results in longer assembly than the kernel asm implementation. If
> that matters I would replace builtin_constnat_p part of get_order by this
> implementation that is more transparent to the code size estimation and
> things will get inlined.
Better __builtin_clzll so that it works also on 32-bit arches.
Anyway, if kernel's fls64 results in better code than the my_fls64, we should
look at GCC's code generation for that case.
And, perhaps kernel's const_ilog2 should be reimplemented using __builtin_clz*?
Or, maybe even better, keep const_ilog2 as is because as it is declared it
should be usable even in pedantic C constant expressions, and just change ilog2
to:
#define ilog2(n) \
( \
__builtin_constant_p(n) ? \
((n) < 2 ? 0 : 63 - __builtin_clzll (n)) : \
(sizeof(n) <= 4) ? \
__ilog2_u32(n) : \
__ilog2_u64(n) \
)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-20 10:40 ` jakub at gcc dot gnu.org
@ 2020-10-20 11:12 ` Jan Hubicka
0 siblings, 0 replies; 62+ messages in thread
From: Jan Hubicka @ 2020-10-20 11:12 UTC (permalink / raw)
To: jakub at gcc dot gnu.org; +Cc: gcc-bugs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #32)
> > get_order is a wrapper around ffs64. This can be implemented w/o asm
> > statement as follows:
> > int
> > my_fls64 (__u64 x)
> > {
> > if (!x)
> > return 0;
> > return 64 - __builtin_clzl (x);
> > }
> >
> > This results in longer assembly than the kernel asm implementation. If
> > that matters I would replace builtin_constnat_p part of get_order by this
> > implementation that is more transparent to the code size estimation and
> > things will get inlined.
>
> Better __builtin_clzll so that it works also on 32-bit arches.
> Anyway, if kernel's fls64 results in better code than the my_fls64, we should
> look at GCC's code generation for that case.
Original asm is:
__attribute__ ((noinline))
int fls64(__u64 x)
{
int bitpos = -1;
asm("bsrq %1,%q0"
: "+r" (bitpos)
: "rm" (x));
return bitpos + 1;
}
There seems to be bug in bsr{q} pattern. I can make GCC produce same
code with:
__attribute__ ((noinline))
int
my_fls64 (__u64 x)
{
asm volatile ("movl $-1, %eax");
return (__builtin_clzll (x) ^ 63) + 1;
}
But obviously the volatile asm should not be needed. I think bsrq is
incorrectly modelled as returning full register
(define_insn "bsr_rex64"
[(set (match_operand:DI 0 "register_operand" "=r")
(minus:DI (const_int 63)
(clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
(clobber (reg:CC FLAGS_REG))]
"TARGET_64BIT"
"bsr{q}\t{%1, %0|%0, %1}"
[(set_attr "type" "alu1")
(set_attr "prefix_0f" "1")
(set_attr "znver1_decode" "vector")
(set_attr "mode" "DI")])
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (32 preceding siblings ...)
2020-10-20 10:40 ` jakub at gcc dot gnu.org
@ 2020-10-20 11:12 ` hubicka at ucw dot cz
2020-10-20 11:17 ` Jan Hubicka
2020-10-20 11:17 ` hubicka at ucw dot cz
` (21 subsequent siblings)
55 siblings, 1 reply; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 11:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #34 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>
> --- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #32)
> > get_order is a wrapper around ffs64. This can be implemented w/o asm
> > statement as follows:
> > int
> > my_fls64 (__u64 x)
> > {
> > if (!x)
> > return 0;
> > return 64 - __builtin_clzl (x);
> > }
> >
> > This results in longer assembly than the kernel asm implementation. If
> > that matters I would replace builtin_constnat_p part of get_order by this
> > implementation that is more transparent to the code size estimation and
> > things will get inlined.
>
> Better __builtin_clzll so that it works also on 32-bit arches.
> Anyway, if kernel's fls64 results in better code than the my_fls64, we should
> look at GCC's code generation for that case.
Original asm is:
__attribute__ ((noinline))
int fls64(__u64 x)
{
int bitpos = -1;
asm("bsrq %1,%q0"
: "+r" (bitpos)
: "rm" (x));
return bitpos + 1;
}
There seems to be bug in bsr{q} pattern. I can make GCC produce same
code with:
__attribute__ ((noinline))
int
my_fls64 (__u64 x)
{
asm volatile ("movl $-1, %eax");
return (__builtin_clzll (x) ^ 63) + 1;
}
But obviously the volatile asm should not be needed. I think bsrq is
incorrectly modelled as returning full register
(define_insn "bsr_rex64"
[(set (match_operand:DI 0 "register_operand" "=r")
(minus:DI (const_int 63)
(clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))))
(clobber (reg:CC FLAGS_REG))]
"TARGET_64BIT"
"bsr{q}\t{%1, %0|%0, %1}"
[(set_attr "type" "alu1")
(set_attr "prefix_0f" "1")
(set_attr "znver1_decode" "vector")
(set_attr "mode" "DI")])
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-20 11:12 ` hubicka at ucw dot cz
@ 2020-10-20 11:17 ` Jan Hubicka
2020-10-20 13:12 ` Jan Hubicka
0 siblings, 1 reply; 62+ messages in thread
From: Jan Hubicka @ 2020-10-20 11:17 UTC (permalink / raw)
To: hubicka at ucw dot cz; +Cc: gcc-bugs
>
> Original asm is:
>
> __attribute__ ((noinline))
> int fls64(__u64 x)
> {
> int bitpos = -1;
> asm("bsrq %1,%q0"
> : "+r" (bitpos)
> : "rm" (x));
> return bitpos + 1;
> }
>
> There seems to be bug in bsr{q} pattern. I can make GCC produce same
> code with:
>
> __attribute__ ((noinline))
> int
> my_fls64 (__u64 x)
> {
> asm volatile ("movl $-1, %eax");
> return (__builtin_clzll (x) ^ 63) + 1;
> }
Aha, bsr is not doing anything if parameter is 0, so pattern is correct
(just the instruction is undefined for 0 which makes sense).
But with that pattern GCC can't synthetize the code sequence above :)
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-20 11:17 ` Jan Hubicka
@ 2020-10-20 13:12 ` Jan Hubicka
0 siblings, 0 replies; 62+ messages in thread
From: Jan Hubicka @ 2020-10-20 13:12 UTC (permalink / raw)
To: hubicka at ucw dot cz, mjambor; +Cc: gcc-bugs
[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]
Hi,
this implements the heuristics increasing bounds for functions having
__builtin_constant_p on parameters. Note that for get_order this is
still not enough, since we increase the bound twice if hint applies, so
it goes from 70 to 140 and not to 190 needed, however it will handle
ohter similar cases.
If hint weight is increased to 300%, so 210 I get:
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build10/gcc$ ./xgcc -B ./ -O2 -Winline pipe.i --param inline-heuristics-hint-percent=300
In file included from fs/pipe.c:11:
./include/linux/slab.h: In function ‘alloc_pipe_info’:
./include/linux/slab.h:586:121: warning: inlining failed in call to ‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached
[-Winline]
./include/linux/slab.h:605:9: note: called from here
./include/linux/slab.h: In function ‘pipe_resize_ring’:
./include/linux/slab.h:586:121: warning: inlining failed in call to ‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached [-Winline]
./include/linux/slab.h:605:9: note: called from here
So the problem only shifts to not inlininig kmalloc_array.
(that is why it would be nice to update kernel with the easier
get_order)
However it shows different problem: ipa-cp produces cone of
kmalloc_array since it is always used by constant size, but the clone
does not update the predicates, so we lose track about the parameter
being constant and that is why we optimize out only late.
Martin, I think this is caused by long lasting TODO in
ipa_fn_summary_t::duplicate and probably we should implement it: based
on the known partial assignment of params to constant we should fold the
conditions in predicates.
Indeed with ./xgcc -B ./ -O2 -Winline pipe.i -fno-ipa-cp --param inline-heuristics-hint-percent=300
the warning goes away. We still need the stronger hint though.
[-- Attachment #2: hintsp --]
[-- Type: text/plain, Size: 9192 bytes --]
gcc/ChangeLog:
2020-10-20 Jan Hubicka <hubicka@ucw.cz>
PR c/97445
* ipa-fnsummary.c (ipa_dump_hints): Handle
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary::~ipa_fn_summary): Free builtin_constant_p_parms.
(ipa_fn_summary_t::duplicate): Copy builtin_constant_p_parms.
(ipa_dump_fn_summary): Dump builtin_constant_p_parms.
(set_cond_stmt_execution_predicate): Compute builtin_constant_p_parms.
(ipa_call_context::estimate_size_and_time): Set
INLINE_HINT_builtin_constant_p.
(ipa_merge_fn_summary_after_inlining): Merge builtin_constant_p_parms.
(inline_read_section): Stream builtin_constant_p_parms.
(ipa_fn_summary_write): Stream builtin_constant_p_parms.
* ipa-fnsummary.h (enum ipa_hints_vals): Add
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary): Add builtin_constant_p_parms.
* ipa-inline.c (want_inline_small_function_p): Handle
INLINE_HINT_builtin_constant_p.
(edge_badness): Handle INLINE_HINT_builtin_constant_p.
gcc/testsuite/ChangeLog:
2020-10-20 Jan Hubicka <hubicka@ucw.cz>
* gcc.dg/ipa/inlinehint-5.c: New test.
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 9e3eda4d3cb..4292f1f5fe7 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -141,6 +141,11 @@ ipa_dump_hints (FILE *f, ipa_hints hints)
hints &= ~INLINE_HINT_known_hot;
fprintf (f, " known_hot");
}
+ if (hints & INLINE_HINT_builtin_constant_p)
+ {
+ hints &= ~INLINE_HINT_builtin_constant_p;
+ fprintf (f, " builtin_constant_p");
+ }
gcc_assert (!hints);
}
@@ -751,6 +756,7 @@ ipa_fn_summary::~ipa_fn_summary ()
vec_free (call_size_time_table);
vec_free (loop_iterations);
vec_free (loop_strides);
+ vec_free (builtin_constant_p_parms);
}
void
@@ -805,6 +811,10 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
that are known to be false or true. */
info->conds = vec_safe_copy (info->conds);
+ if (info->builtin_constant_p_parms)
+ info->builtin_constant_p_parms
+ = vec_safe_copy (info->builtin_constant_p_parms);
+
/* When there are any replacements in the function body, see if we can figure
out that something was optimized out. */
if (ipa_node_params_sum && dst->clone.tree_map)
@@ -1066,6 +1076,13 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
fprintf (f, " inlinable");
if (s->fp_expressions)
fprintf (f, " fp_expression");
+ if (s->builtin_constant_p_parms)
+ {
+ fprintf (f, " builtin_constant_p_parms");
+ for (unsigned int i = 0;
+ i < s->builtin_constant_p_parms->length (); i++)
+ fprintf (f, " %i", (*s->builtin_constant_p_parms)[i]);
+ }
fprintf (f, "\n global time: %f\n", s->time.to_double ());
fprintf (f, " self size: %i\n", ss->self_size);
fprintf (f, " global size: %i\n", ss->size);
@@ -1598,6 +1615,8 @@ set_cond_stmt_execution_predicate (struct ipa_func_body_info *fbi,
op2 = gimple_call_arg (set_stmt, 0);
if (!decompose_param_expr (fbi, set_stmt, op2, &index, ¶m_type, &aggpos))
return;
+ if (!aggpos.by_ref)
+ vec_safe_push (summary->builtin_constant_p_parms, index);
FOR_EACH_EDGE (e, ei, bb->succs) if (e->flags & EDGE_FALSE_VALUE)
{
predicate p = add_condition (summary, params_summary, index,
@@ -3717,6 +3736,9 @@ ipa_call_context::estimate_size_and_time (ipa_call_estimates *estimates,
hints |= INLINE_HINT_in_scc;
if (DECL_DECLARED_INLINE_P (m_node->decl))
hints |= INLINE_HINT_declared_inline;
+ if (info->builtin_constant_p_parms
+ && DECL_DECLARED_INLINE_P (m_node->decl))
+ hints |= INLINE_HINT_builtin_constant_p;
ipa_freqcounting_predicate *fcp;
for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
@@ -4044,8 +4066,13 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
operand_map[i] = map;
gcc_assert (map < ipa_get_param_count (params_summary));
}
+ int *ip;
+ for (i = 0; vec_safe_iterate (callee_info->builtin_constant_p_parms,
+ i, &ip); i++)
+ if (*ip < count && operand_map[*ip] > 0)
+ vec_safe_push (info->builtin_constant_p_parms, operand_map[*ip]);
}
- sreal freq = edge->sreal_frequency ();
+ sreal freq = edge->sreal_frequency ();
for (i = 0; vec_safe_iterate (callee_info->size_time_table, i, &e); i++)
{
predicate p;
@@ -4443,6 +4470,15 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data,
vec_safe_push (info->loop_strides, fcp);
}
}
+ count2 = streamer_read_uhwi (&ib);
+ if (info && count2)
+ vec_safe_reserve_exact (info->builtin_constant_p_parms, count2);
+ for (j = 0; j < count2; j++)
+ {
+ int parm = streamer_read_uhwi (&ib);
+ if (info)
+ info->builtin_constant_p_parms->quick_push (parm);
+ }
for (e = node->callees; e; e = e->next_callee)
read_ipa_call_summary (&ib, e, info != NULL);
for (e = node->indirect_calls; e; e = e->next_callee)
@@ -4618,6 +4654,13 @@ ipa_fn_summary_write (void)
fcp->predicate->stream_out (ob);
fcp->freq.stream_out (ob);
}
+ streamer_write_uhwi (ob,
+ vec_safe_length
+ (info->builtin_constant_p_parms));
+ int *ip;
+ for (i = 0; vec_safe_iterate (info->builtin_constant_p_parms, i, &ip);
+ i++)
+ streamer_write_uhwi (ob, *ip);
for (edge = cnode->callees; edge; edge = edge->next_callee)
write_ipa_call_summary (ob, edge);
for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index f4dd5b85ab9..4c4a90dd622 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -49,7 +49,10 @@ enum ipa_hints_vals {
Set by simple_edge_hints in ipa-inline-analysis.c. */
INLINE_HINT_cross_module = 64,
/* We know that the callee is hot by profile. */
- INLINE_HINT_known_hot = 128
+ INLINE_HINT_known_hot = 128,
+ /* There is builtin_constant_p dependent on parameter which is usually
+ a strong hint to inline. */
+ INLINE_HINT_builtin_constant_p = 256
};
typedef int ipa_hints;
@@ -123,10 +126,12 @@ public:
ipa_fn_summary ()
: min_size (0),
inlinable (false), single_caller (false),
- fp_expressions (false), estimated_stack_size (false),
+ fp_expressions (false),
+ estimated_stack_size (false),
time (0), conds (NULL),
size_time_table (NULL), call_size_time_table (NULL),
loop_iterations (NULL), loop_strides (NULL),
+ builtin_constant_p_parms (NULL),
growth (0), scc_no (0)
{
}
@@ -140,6 +145,7 @@ public:
time (s.time), conds (s.conds), size_time_table (s.size_time_table),
call_size_time_table (NULL),
loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
+ builtin_constant_p_parms (s.builtin_constant_p_parms),
growth (s.growth), scc_no (s.scc_no)
{}
@@ -182,6 +188,8 @@ public:
vec<ipa_freqcounting_predicate, va_gc> *loop_iterations;
/* Predicates on when some loops in the function can have known strides. */
vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
+ /* Parameters tested by builtin_constant_p. */
+ vec<int, va_gc> * GTY((skip)) builtin_constant_p_parms;
/* Estimated growth for inlining all copies of the function before start
of small functions inlining.
This value will get out of date as the callers are duplicated, but
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 225a0140725..99e6002149b 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -878,7 +878,8 @@ want_inline_small_function_p (struct cgraph_edge *e, bool report)
bool apply_hints = (hints & (INLINE_HINT_indirect_call
| INLINE_HINT_known_hot
| INLINE_HINT_loop_iterations
- | INLINE_HINT_loop_stride));
+ | INLINE_HINT_loop_stride
+ | INLINE_HINT_builtin_constant_p));
if (growth <= opt_for_fn (to->decl,
param_max_inline_insns_size))
@@ -1314,7 +1315,8 @@ edge_badness (struct cgraph_edge *edge, bool dump)
badness = badness.shift (badness > 0 ? 4 : -4);
if ((hints & (INLINE_HINT_indirect_call
| INLINE_HINT_loop_iterations
- | INLINE_HINT_loop_stride))
+ | INLINE_HINT_loop_stride
+ | INLINE_HINT_builtin_constant_p))
|| callee_info->growth <= 0)
badness = badness.shift (badness > 0 ? -2 : 2);
if (hints & (INLINE_HINT_same_scc))
diff --git a/gcc/testsuite/gcc.dg/ipa/inlinehint-5.c b/gcc/testsuite/gcc.dg/ipa/inlinehint-5.c
new file mode 100644
index 00000000000..3dd3a11dd3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/inlinehint-5.c
@@ -0,0 +1,33 @@
+/* { dg-options "-O2 -fdump-ipa-inline-details -fno-early-inlining " } */
+/* { dg-add-options bind_pic_locally } */
+int j,k,l;
+int test3(int);
+int test4(int);
+
+static inline int
+test2(int i)
+{
+ if (__builtin_constant_p (i))
+ {
+ switch (i)
+ {
+ case 1: return j;
+ case 2: return k;
+ case 3: return l;
+ }
+ }
+ else return test3(i)+test4(i);
+}
+
+static inline int
+test (int i)
+{
+ return test2(i) + test2(i+1) + test3 (i) + test3(i) + test3(i);
+}
+
+int
+run (int i)
+{
+ return test (i);
+}
+/* { dg-final { scan-ipa-dump-times "hints: declared_inline builtin_constant_p" 3 "inline" } } */
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (33 preceding siblings ...)
2020-10-20 11:12 ` hubicka at ucw dot cz
@ 2020-10-20 11:17 ` hubicka at ucw dot cz
2020-10-20 11:45 ` hubicka at ucw dot cz
` (20 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 11:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #35 from Jan Hubicka <hubicka at ucw dot cz> ---
>
> Original asm is:
>
> __attribute__ ((noinline))
> int fls64(__u64 x)
> {
> int bitpos = -1;
> asm("bsrq %1,%q0"
> : "+r" (bitpos)
> : "rm" (x));
> return bitpos + 1;
> }
>
> There seems to be bug in bsr{q} pattern. I can make GCC produce same
> code with:
>
> __attribute__ ((noinline))
> int
> my_fls64 (__u64 x)
> {
> asm volatile ("movl $-1, %eax");
> return (__builtin_clzll (x) ^ 63) + 1;
> }
Aha, bsr is not doing anything if parameter is 0, so pattern is correct
(just the instruction is undefined for 0 which makes sense).
But with that pattern GCC can't synthetize the code sequence above :)
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (34 preceding siblings ...)
2020-10-20 11:17 ` hubicka at ucw dot cz
@ 2020-10-20 11:45 ` hubicka at ucw dot cz
2020-10-20 13:12 ` hubicka at ucw dot cz
` (19 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 11:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #36 from Jan Hubicka <hubicka at ucw dot cz> ---
> Find attached the temporary files for net/core/skbuff.c, as it is indeed what
> initially triggered my focus on this issue. I don't expect such a function at
> all:
>
> 00000cc8 <csum_partial>:
> cc8: 48 00 00 00 b cc8 <csum_partial>
> cc8: R_PPC_REL24 __csum_partial
>
>
> Every function should call __csum_partial() directly.
Here csum_artial is:
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) __wsum csum_add(__wsum csum, __wsum
addend)
{
if (__builtin_constant_p(csum) && csum == 0)
return addend;
if (__builtin_constant_p(addend) && addend == 0)
return csum;
asm("addc %0,%0,%1;"
"addze %0,%0;"
: "+r" (csum) : "r" (addend) : "xer");
return csum;
}
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((__no_instrument_function__)) __wsum csum_partial(const void
*buff, int len, __wsum sum)
{
if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
if (len == 2)
sum = csum_add(sum, ( __wsum)*(const u16 *)buff);
if (len >= 4)
sum = csum_add(sum, ( __wsum)*(const u32 *)buff);
if (len == 6)
sum = csum_add(sum, ( __wsum)
*(const u16 *)(buff + 4));
if (len >= 8)
sum = csum_add(sum, ( __wsum)
*(const u32 *)(buff + 4));
if (len == 10)
sum = csum_add(sum, ( __wsum)
*(const u16 *)(buff + 8));
if (len >= 12)
sum = csum_add(sum, ( __wsum)
*(const u32 *)(buff + 8));
if (len == 14)
sum = csum_add(sum, ( __wsum)
*(const u16 *)(buff + 12));
if (len >= 16)
sum = csum_add(sum, ( __wsum)
*(const u32 *)(buff + 12));
} else if (__builtin_constant_p(len) && (len & 3) == 0) {
sum = csum_add(sum, ip_fast_csum_nofold(buff, len >> 2));
} else {
sum = __csum_partial(buff, len, sum);
}
return sum;
}
So again it expands to really large decision tree with many
builtion_constant_p checks that makes inliner to give up.
You should see all such cases easilly with -Winline
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (35 preceding siblings ...)
2020-10-20 11:45 ` hubicka at ucw dot cz
@ 2020-10-20 13:12 ` hubicka at ucw dot cz
2020-10-20 13:28 ` christophe.leroy at csgroup dot eu
` (18 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-20 13:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #37 from Jan Hubicka <hubicka at ucw dot cz> ---
Hi,
this implements the heuristics increasing bounds for functions having
__builtin_constant_p on parameters. Note that for get_order this is
still not enough, since we increase the bound twice if hint applies, so
it goes from 70 to 140 and not to 190 needed, however it will handle
ohter similar cases.
If hint weight is increased to 300%, so 210 I get:
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build10/gcc$ ./xgcc -B ./ -O2
-Winline pipe.i --param inline-heuristics-hint-percent=300
In file included from fs/pipe.c:11:
./include/linux/slab.h: In function ‘alloc_pipe_info’:
./include/linux/slab.h:586:121: warning: inlining failed in call to
‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached
[-Winline]
./include/linux/slab.h:605:9: note: called from here
./include/linux/slab.h: In function ‘pipe_resize_ring’:
./include/linux/slab.h:586:121: warning: inlining failed in call to
‘kmalloc_array.constprop’: --param max-inline-insns-single limit reached
[-Winline]
./include/linux/slab.h:605:9: note: called from here
So the problem only shifts to not inlininig kmalloc_array.
(that is why it would be nice to update kernel with the easier
get_order)
However it shows different problem: ipa-cp produces cone of
kmalloc_array since it is always used by constant size, but the clone
does not update the predicates, so we lose track about the parameter
being constant and that is why we optimize out only late.
Martin, I think this is caused by long lasting TODO in
ipa_fn_summary_t::duplicate and probably we should implement it: based
on the known partial assignment of params to constant we should fold the
conditions in predicates.
Indeed with ./xgcc -B ./ -O2 -Winline pipe.i -fno-ipa-cp --param
inline-heuristics-hint-percent=300
the warning goes away. We still need the stronger hint though.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (36 preceding siblings ...)
2020-10-20 13:12 ` hubicka at ucw dot cz
@ 2020-10-20 13:28 ` christophe.leroy at csgroup dot eu
2020-10-20 13:31 ` jakub at gcc dot gnu.org
` (17 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 13:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #38 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jan Hubicka from comment #32)
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> >
> > --- Comment #31 from Segher Boessenkool <segher at gcc dot gnu.org> ---
> > (In reply to Jan Hubicka from comment #27)
> > > It is because --param inline-insns-single was reduced for -O2 from 200
> > > to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and
> > > enables auto-inlining at -O2.
> > >
> > > Problem with inlininig funtions declared inline is that C++ codebases
> > > tends to abuse this keyword for things that are really too large (and
> > > get_order would be such example if it did not have builtin_constant_p
> > > check which inliner does not understand well). So having same limit at
> > > -O2 and -O3 turned out to be problematic with respect to code size and
> > > especially with respect to LTO, where a lot more inlining oppurtunities
> > > appear.
> >
> > Do the heuristics account for that not inlining a "static inline" results
> > in multiple copies?
>
> It prevents inlining only when there are multiple calls in the unit
> being compiled (there is no way to know that the same inline function is
> duplicated in other units).
> This is what happens here: there are multiple calls so inliner concludes
> inlining would cost too much of code size and later they are optimized
> away.
>
> get_order is a wrapper around ffs64. This can be implemented w/o asm
> statement as follows:
> int
> my_fls64 (__u64 x)
> {
> if (!x)
> return 0;
> return 64 - __builtin_clzl (x);
> }
>
> This results in longer assembly than the kernel asm implementation. If
> that matters I would replace builtin_constnat_p part of get_order by this
> implementation that is more transparent to the code size estimation and
> things will get inlined.
>
But on powerpc that's already the case and it doesn't solve the issue.
static inline int fls(unsigned int x)
{
return 32 - __builtin_clz(x);
}
static inline int fls64(__u64 x)
{
return 64 - __builtin_clzll(x);
}
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (37 preceding siblings ...)
2020-10-20 13:28 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 13:31 ` jakub at gcc dot gnu.org
2020-10-20 13:51 ` christophe.leroy at csgroup dot eu
` (16 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 13:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #39 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Christophe Leroy from comment #38)
> But on powerpc that's already the case and it doesn't solve the issue.
>
> static inline int fls(unsigned int x)
> {
> return 32 - __builtin_clz(x);
> }
>
> static inline int fls64(__u64 x)
> {
> return 64 - __builtin_clzll(x);
> }
That is clearly a kernel bug (__builtin_clz* is documented undefined for 0,
while fls* wants to be well defined there), and shouldn't change anything,
because
in the if (__builtin_constant_p (size))
case get_order doesn't use fls*, but ilog2. And it is ilog2 that should be
changed.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (38 preceding siblings ...)
2020-10-20 13:31 ` jakub at gcc dot gnu.org
@ 2020-10-20 13:51 ` christophe.leroy at csgroup dot eu
2020-10-20 14:18 ` jakub at gcc dot gnu.org
` (15 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 13:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #40 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jakub Jelinek from comment #39)
> (In reply to Christophe Leroy from comment #38)
> > But on powerpc that's already the case and it doesn't solve the issue.
> >
> > static inline int fls(unsigned int x)
> > {
> > return 32 - __builtin_clz(x);
> > }
> >
> > static inline int fls64(__u64 x)
> > {
> > return 64 - __builtin_clzll(x);
> > }
>
> That is clearly a kernel bug (__builtin_clz* is documented undefined for 0,
> while fls* wants to be well defined there), and shouldn't change anything,
> because
> in the if (__builtin_constant_p (size))
> case get_order doesn't use fls*, but ilog2. And it is ilog2 that should be
> changed.
What's the bug ?
int f(int x)
{
return __builtin_clz(x);
}
Compiles into
<f>:
cntlzw r3, r3
blr
Powerpc 32 bits documentation says:
cntlzw : Count Leading Zeros Word
A count of the number of consecutive zero bits starting at bit 0 of rS is
placed into rA. This number ranges from 0 to 32, inclusive.
I can't see a problem when x == 0
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (39 preceding siblings ...)
2020-10-20 13:51 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 14:18 ` jakub at gcc dot gnu.org
2020-10-20 14:22 ` christophe.leroy at csgroup dot eu
` (14 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 14:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #41 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It is documented to be undefined:
-- Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in X, starting at the most
significant bit position. If X is 0, the result is undefined.
Especially GCC 11 (but e.g. clang too) will e.g. during value range propagation
assume that e.g. the builtin return value will be only 0 to 31, not to 32, etc.
The portable way how to write this is x ? __builtin_clz (x) :
whatever_value_you_want_for_clz_0;
and the compiler should recognize that and if the instruction is well defined
for 0 and matches your choice, use optimal sequence.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (40 preceding siblings ...)
2020-10-20 14:18 ` jakub at gcc dot gnu.org
@ 2020-10-20 14:22 ` christophe.leroy at csgroup dot eu
2020-10-20 14:24 ` christophe.leroy at csgroup dot eu
` (13 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 14:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #42 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Jakub Jelinek from comment #41)
> It is documented to be undefined:
> -- Built-in Function: int __builtin_clz (unsigned int x)
> Returns the number of leading 0-bits in X, starting at the most
> significant bit position. If X is 0, the result is undefined.
> Especially GCC 11 (but e.g. clang too) will e.g. during value range
> propagation assume that e.g. the builtin return value will be only 0 to 31,
> not to 32, etc.
> The portable way how to write this is x ? __builtin_clz (x) :
> whatever_value_you_want_for_clz_0;
> and the compiler should recognize that and if the instruction is well
> defined for 0 and matches your choice, use optimal sequence.
int f(int x)
{
return x ? __builtin_clz(x) : 32;
}
Is compiled into (with -O2):
00000000 <f>:
0: 2c 03 00 00 cmpwi r3,0
4: 40 82 00 0c bne 10 <f+0x10>
8: 38 60 00 20 li r3,32
c: 4e 80 00 20 blr
10: 7c 63 00 34 cntlzw r3,r3
14: 4e 80 00 20 blr
Allthough
int g(void)
{
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (41 preceding siblings ...)
2020-10-20 14:22 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 14:24 ` christophe.leroy at csgroup dot eu
2020-10-20 14:29 ` jakub at gcc dot gnu.org
` (12 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 14:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #43 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Christophe Leroy from comment #42)
> (In reply to Jakub Jelinek from comment #41)
> > It is documented to be undefined:
> > -- Built-in Function: int __builtin_clz (unsigned int x)
> > Returns the number of leading 0-bits in X, starting at the most
> > significant bit position. If X is 0, the result is undefined.
> > Especially GCC 11 (but e.g. clang too) will e.g. during value range
> > propagation assume that e.g. the builtin return value will be only 0 to 31,
> > not to 32, etc.
> > The portable way how to write this is x ? __builtin_clz (x) :
> > whatever_value_you_want_for_clz_0;
> > and the compiler should recognize that and if the instruction is well
> > defined for 0 and matches your choice, use optimal sequence.
>
> int f(int x)
> {
> return x ? __builtin_clz(x) : 32;
> }
>
> Is compiled into (with -O2):
>
> 00000000 <f>:
> 0: 2c 03 00 00 cmpwi r3,0
> 4: 40 82 00 0c bne 10 <f+0x10>
> 8: 38 60 00 20 li r3,32
> c: 4e 80 00 20 blr
> 10: 7c 63 00 34 cntlzw r3,r3
> 14: 4e 80 00 20 blr
>
>
>
> Allthough
>
> int g(void)
> {
int g(int x)
{
return __builtin_clz(0);
}
Gives
00000018 <g>:
18: 38 60 00 20 li r3,32
1c: 4e 80 00 20 blr
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (42 preceding siblings ...)
2020-10-20 14:24 ` christophe.leroy at csgroup dot eu
@ 2020-10-20 14:29 ` jakub at gcc dot gnu.org
2020-10-20 16:58 ` jakub at gcc dot gnu.org
` (11 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 14:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #44 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Then perhaps some backends need to be improved.
Try e.g.:
void bar (void);
void
foo (int x)
{
if (__builtin_clz (x) == 32)
bar ();
}
with trunk GCC if you don't trust me.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (43 preceding siblings ...)
2020-10-20 14:29 ` jakub at gcc dot gnu.org
@ 2020-10-20 16:58 ` jakub at gcc dot gnu.org
2020-10-20 21:19 ` segher at gcc dot gnu.org
` (10 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 16:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #45 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > __attribute__ ((noinline))
> > int
> > my_fls64 (__u64 x)
> > {
> > asm volatile ("movl $-1, %eax");
> > return (__builtin_clzll (x) ^ 63) + 1;
> > }
>
> Aha, bsr is not doing anything if parameter is 0, so pattern is correct
> (just the instruction is undefined for 0 which makes sense).
> But with that pattern GCC can't synthetize the code sequence above :)
The docs explicitly say that bsf/bsr DEST is undefined if ZF is set (operand is
zero), so relying on it preserving the previous value of the register is quite
dangerous.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (44 preceding siblings ...)
2020-10-20 16:58 ` jakub at gcc dot gnu.org
@ 2020-10-20 21:19 ` segher at gcc dot gnu.org
2020-10-21 5:38 ` christophe.leroy at csgroup dot eu
` (9 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-20 21:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #46 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Christophe Leroy from comment #43)
> int g(int x)
> {
> return __builtin_clz(0);
> }
>
> Gives
>
> 00000018 <g>:
> 18: 38 60 00 20 li r3,32
> 1c: 4e 80 00 20 blr
That is because rs6000 has
/* The cntlzw and cntlzd instructions return 32 and 64 for input of zero. */
#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
((VALUE) = GET_MODE_BITSIZE (MODE), 2)
This says that at RTL level and in the optabs, clz of 0 *is* defined,
for rs6000. But the builtin is not valid with an arg of 0!
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (45 preceding siblings ...)
2020-10-20 21:19 ` segher at gcc dot gnu.org
@ 2020-10-21 5:38 ` christophe.leroy at csgroup dot eu
2020-10-21 15:04 ` [Bug ipa/97445] " hubicka at gcc dot gnu.org
` (8 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-21 5:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #47 from Christophe Leroy <christophe.leroy at csgroup dot eu> ---
(In reply to Segher Boessenkool from comment #46)
> (In reply to Christophe Leroy from comment #43)
> > int g(int x)
> > {
> > return __builtin_clz(0);
> > }
> >
> > Gives
> >
> > 00000018 <g>:
> > 18: 38 60 00 20 li r3,32
> > 1c: 4e 80 00 20 blr
>
> That is because rs6000 has
>
> /* The cntlzw and cntlzd instructions return 32 and 64 for input of zero. */
> #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> ((VALUE) = GET_MODE_BITSIZE (MODE), 2)
>
> This says that at RTL level and in the optabs, clz of 0 *is* defined,
> for rs6000. But the builtin is not valid with an arg of 0!
I opened bug #97503 for that
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (46 preceding siblings ...)
2020-10-21 5:38 ` christophe.leroy at csgroup dot eu
@ 2020-10-21 15:04 ` hubicka at gcc dot gnu.org
2020-10-21 15:16 ` hubicka at gcc dot gnu.org
` (7 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at gcc dot gnu.org @ 2020-10-21 15:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |ipa
--- Comment #48 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Changing component to IPA.
Concerning comment #37 about summaries not being updated after ipa-cp, I was
actually wrong there: they are updated and the behaviour is quite sane. We work
out that kmalloc has constant argument and produce specialized clone for it.
Because it is estimated quite large it is not inlined. While when ipa-cp is
disabled we work out that inlining it will simplify body a lot and bump up the
limits.
Jakub, concerning
asm volatile ("movl $-1, %eax")
that was of course a hack. I was confused about bsr instruction - for some
time I tought it stores only 8bit value until I re-read the manual.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (47 preceding siblings ...)
2020-10-21 15:04 ` [Bug ipa/97445] " hubicka at gcc dot gnu.org
@ 2020-10-21 15:16 ` hubicka at gcc dot gnu.org
2020-10-21 18:01 ` cvs-commit at gcc dot gnu.org
` (6 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at gcc dot gnu.org @ 2020-10-21 15:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Depends on| |97519, 97503
--- Comment #49 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Patch posted for the inline heuristics change
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556685.html
Also opened spearate PR on builtin_constant_p folding. I am not sure how to
implement that correctly (what are the conditions that make this valid -
perhaps for all "i op cst" after all?)
Martin, how does the if chain conversion behave on the example?
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
[Bug 97503] Suboptimal use of cntlzw and cntlzd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97519
[Bug 97519] builtin_constant_p (x + cst) should be optimized to
builtin_constant_p (x)
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (48 preceding siblings ...)
2020-10-21 15:16 ` hubicka at gcc dot gnu.org
@ 2020-10-21 18:01 ` cvs-commit at gcc dot gnu.org
2020-10-21 18:21 ` marxin at gcc dot gnu.org
` (5 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-21 18:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #50 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:caaa218f912ccf932fdb79243ded68bb462bbe63
commit r11-4192-gcaaa218f912ccf932fdb79243ded68bb462bbe63
Author: Jan Hubicka <jh@suse.cz>
Date: Wed Oct 21 20:00:22 2020 +0200
Inline functions with builtin_constant_p more agressively.
This patch implements heuristics that increases inline limits (by the hints
mechanism) for inline functions that use builtin_constant_p on parameter.
Those
are very likely intended to be always inlined and simplify after inlining.
The PR is about a function that we used to inline with
--param inline-insns-single=200 but with new default of 70 for -O2 we no
longer
do so. Hints are currently configured to bump the bound up twice, so we
get limit of 140 that is still not enough to inline the particular testcase
but it should help in general. I can implement a stronger bump if that
seems
useful (maybe it is). The example is bit operation written as a decision
chain
with 64 conditions.
This blows up the limit on number of conditions we track per funtion (which
is
30) and thus the size/time estimates are not working that well.
gcc/ChangeLog:
PR ipa/97445
* ipa-fnsummary.c (ipa_dump_hints): Add
INLINE_HINT_builtin_constant_p.
(ipa_fn_summary::~ipa_fn_summary): Free builtin_constant_p_parms.
(ipa_fn_summary_t::duplicate): Duplicate builtin_constant_p_parms.
(ipa_dump_fn_summary): Dump builtin_constant_p_parms.
(add_builtin_constant_p_parm): New function
(set_cond_stmt_execution_predicate): Update
builtin_constant_p_parms.
(ipa_call_context::estimate_size_and_time): Set
INLINE_HINT_builtin_constant_p..
(ipa_merge_fn_summary_after_inlining): Merge
builtin_constant_p_parms.
(inline_read_section): Read builtin_constant_p_parms.
(ipa_fn_summary_write): Write builtin_constant_p_parms.
* ipa-fnsummary.h (enum ipa_hints_vals): Add
INLINE_HINT_builtin_constant_p.
* ipa-inline.c (want_inline_small_function_p): Use
INLINE_HINT_builtin_constant_p.
(edge_badness): Use INLINE_HINT_builtin_constant_p.
gcc/testsuite/ChangeLog:
PR ipa/97445
* gcc.dg/ipa/inlinehint-5.c: New test.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (49 preceding siblings ...)
2020-10-21 18:01 ` cvs-commit at gcc dot gnu.org
@ 2020-10-21 18:21 ` marxin at gcc dot gnu.org
2020-10-21 18:24 ` hubicka at ucw dot cz
` (4 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-21 18:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #51 from Martin Liška <marxin at gcc dot gnu.org> ---
> Martin, how does the if chain conversion behave on the example?
I don't see how can if-to-switch conversion pass help us here. It's designed to
identify compact intervals. In this case we see:
<bb 12> :
_16 = _2 & 576460752303423488;
if (_16 == 0)
goto <bb 13>; [INV]
else
goto <bb 72>; [INV]
<bb 13> :
_18 = _2 & 288230376151711744;
if (_18 == 0)
goto <bb 14>; [INV]
else
goto <bb 72>; [INV]
...
So it's a series of masking. Do you see Honza how to convert it?
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (50 preceding siblings ...)
2020-10-21 18:21 ` marxin at gcc dot gnu.org
@ 2020-10-21 18:24 ` hubicka at ucw dot cz
2020-10-21 18:25 ` marxin at gcc dot gnu.org
` (3 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-21 18:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #52 from Jan Hubicka <hubicka at ucw dot cz> ---
> I don't see how can if-to-switch conversion pass help us here. It's designed to
> identify compact intervals. In this case we see:
>
> <bb 12> :
> _16 = _2 & 576460752303423488;
> if (_16 == 0)
> goto <bb 13>; [INV]
> else
> goto <bb 72>; [INV]
>
> <bb 13> :
> _18 = _2 & 288230376151711744;
> if (_18 == 0)
> goto <bb 14>; [INV]
> else
> goto <bb 72>; [INV]
> ...
>
> So it's a series of masking. Do you see Honza how to convert it?
It goes from 1 to 1<<63, so each of tests translates to a range.
Honza
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (51 preceding siblings ...)
2020-10-21 18:24 ` hubicka at ucw dot cz
@ 2020-10-21 18:25 ` marxin at gcc dot gnu.org
2020-10-21 18:34 ` hubicka at ucw dot cz
` (2 subsequent siblings)
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-21 18:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #53 from Martin Liška <marxin at gcc dot gnu.org> ---
> It goes from 1 to 1<<63, so each of tests translates to a range.
Yes, but these ranges are very large, nothing for a jump table or a bit-test.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (52 preceding siblings ...)
2020-10-21 18:25 ` marxin at gcc dot gnu.org
@ 2020-10-21 18:34 ` hubicka at ucw dot cz
2020-10-21 18:38 ` marxin at gcc dot gnu.org
2020-10-21 23:42 ` cvs-commit at gcc dot gnu.org
55 siblings, 0 replies; 62+ messages in thread
From: hubicka at ucw dot cz @ 2020-10-21 18:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #54 from Jan Hubicka <hubicka at ucw dot cz> ---
> > It goes from 1 to 1<<63, so each of tests translates to a range.
>
> Yes, but these ranges are very large, nothing for a jump table or a bit-test.
Yep, but theoretically you can recover the decision table and pattern
match it is a bit builtin. Not sure how many open coded bit builtins
are there.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (53 preceding siblings ...)
2020-10-21 18:34 ` hubicka at ucw dot cz
@ 2020-10-21 18:38 ` marxin at gcc dot gnu.org
2020-10-21 23:42 ` cvs-commit at gcc dot gnu.org
55 siblings, 0 replies; 62+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-10-21 18:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #55 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #54)
> > > It goes from 1 to 1<<63, so each of tests translates to a range.
> >
> > Yes, but these ranges are very large, nothing for a jump table or a bit-test.
> Yep, but theoretically you can recover the decision table and pattern
> match it is a bit builtin. Not sure how many open coded bit builtins
> are there.
Such a pattern matching is a bit similar to PR90838.
Anyway, not planning to implement it now.
^ permalink raw reply [flat|nested] 62+ messages in thread
* [Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined
2020-10-15 13:41 [Bug c/97445] New: Some fonctions marked static inline in Linux kernel are not inlined christophe.leroy at csgroup dot eu
` (54 preceding siblings ...)
2020-10-21 18:38 ` marxin at gcc dot gnu.org
@ 2020-10-21 23:42 ` cvs-commit at gcc dot gnu.org
55 siblings, 0 replies; 62+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-21 23:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
--- Comment #56 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:3fd5876793ddf882994acafc9c5b28e3be8897bd
commit r11-4196-g3fd5876793ddf882994acafc9c5b28e3be8897bd
Author: Jan Hubicka <jh@suse.cz>
Date: Thu Oct 22 01:42:11 2020 +0200
Strenghten bound for bulitin_constant_p hint.
this patch makes builtin_constant_p hint to combine with other loop hints
we already support.
gcc/ChangeLog:
2020-10-22 Jan Hubicka <hubicka@ucw.cz>
PR ipa/97445
* ipa-inline.c (inline_insns_single): Add hint2 parameter.
(inline_insns_auto): Add hint2 parameter.
(can_inline_edge_by_limits_p): Update.
(want_inline_small_function_p): Update.
(wrapper_heuristics_may_apply): Update.
^ permalink raw reply [flat|nested] 62+ messages in thread