On Thu, 8 Jun 2023 at 09:58, Maxim Kuvyrkov wrote:

> Hi Jonathan,
>
> Interestingly, this increases code-size of -O3 code on aarch64-linux-gnu
> on SPEC CPU2017's 641.leela_s benchmark [1].
>
> In particular, FastBoard::get_nearby_enemies() grew from 1444 to 2212
> bytes.  This seems like a corner-case; the rest of SPEC CPU2017 is, mostly,
> neutral to this patch.  Is this something you may be interested in
> investigating?  I'll be happy to assist.
>

I'd certainly like to avoid the regression, but I'm too dumb to understand
most inlining bugs myself.

> Looking at assembly, one of the differences I see is that the "after"
> version has calls to realloc_insert(), while "before" version seems to have
> them inlined [2].
>
> [1]
> https://git.linaro.org/toolchain/ci/interesting-commits.git/tree/gcc/sha1/b7b255e77a271974479c34d1db3daafc04b920bc/tcwg_bmk-code_size-cpu2017fast/status.txt
>

I find it annoying that adding `if (n < sz) __builtin_unreachable()` seems
to affect the size estimates for the function, and so perturbs inlining
decisions. That code shouldn't add any actual instructions, so shouldn't
affect size estimates.

I mentioned this in a meeting last week and Jason suggested checking
whether using __builtin_assume has the same undesirable consequences, so I
think I'll start by investigating that.

> [2] 641.leela_s is non-GPL/non-BSD benchmark, and I'm not sure if I can
> post its compiled and/or preprocessed code publicly.  I assume RedHat has
> SPEC CPU2017 license, and I can post details to you privately.
>

Yes, I think I can get the benchmark code from Vlad. Thanks for bringing
this to my attention.
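
For readers following the thread, here is a minimal sketch of the two hint forms under discussion. It is not the libstdc++ patch itself. The function names and the `HINT_ASSUME` macro are hypothetical, invented for illustration. The sketch also assumes that the "__builtin_assume" mentioned above corresponds to Clang's builtin of that name and, on GCC 13 or later, to the `assume` attribute; older compilers fall back to the `__builtin_unreachable()` form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro: pick whichever "assume" spelling the compiler offers.
#if defined(__clang__)
#  define HINT_ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__) && __GNUC__ >= 13
#  define HINT_ASSUME(cond) __attribute__((assume(cond)))
#else
#  define HINT_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#endif

// Form 1: the `if (n < sz) __builtin_unreachable()` pattern from the email.
// Appends one element and returns how many elements were added.
inline std::size_t append_with_unreachable(std::vector<int>& v, int value)
{
    const std::size_t sz = v.size();
    v.push_back(value);
    const std::size_t n = v.size();
    if (n < sz)
        __builtin_unreachable();   // tells the optimizer that n >= sz
    return n - sz;
}

// Form 2: the same fact stated through an "assume" facility.
inline std::size_t append_with_assume(std::vector<int>& v, int value)
{
    const std::size_t sz = v.size();
    v.push_back(value);
    const std::size_t n = v.size();
    HINT_ASSUME(n >= sz);          // same hint, different spelling
    return n - sz;
}
```

Both functions behave identically at run time. Whether the two forms lead to different size estimates and inlining decisions is the open question in the thread; comparing the output of `-S` or `-fdump-ipa-inline` for each form at `-O3` is one way to look into it.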