public inbox for gcc-bugs@sourceware.org
From: "grasland at lal dot in2p3.fr" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/94497] New: Branchless clamp in the general case gets a branch in a particular case ?
Date: Mon, 06 Apr 2020 09:35:30 +0000
Message-ID: <bug-94497-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94497

            Bug ID: 94497
           Summary: Branchless clamp in the general case gets a branch in a particular case ?
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: grasland at lal dot in2p3.fr
  Target Milestone: ---

(Triage note: I think this is probably a compiler middle-end or back-end issue, but I am not knowledgeable enough about the structure of the GCC codebase to pick the right component.)

---

I am trying to make a floating-point computation autovectorization-friendly, without mandating the use of -ffast-math for optimal performance as that is a numerical stability and compiler portability hazard. This turned out to be an interesting exercise in IEEE-754 pedantry, of course, but I can live with that.

However, while trying to optimize a "clamp" computation, I ended up at a point where the behavior of the GCC optimizer just does not make sense to me and I could use the opinion of an expert.

Consider the following functions:

```
double fast_min(double x, double y) {
    return (x < y) ? x : y;
}

double fast_max(double x, double y) {
    return (x > y) ? x : y;
}
```

The definitions of fast_min and fast_max are carefully crafted to match the semantics of x86's min and max instruction family, and indeed if I compile this code with -O1 or above I get minsd/maxsd or vminsd/vmaxsd instructions depending on which vector instruction sets are enabled.

This is exactly what I wanted, so far I'm happy. And if I now try to use these min and max functions to write a clamp function...
```
double fast_clamp(double x, double min, double max) {
    return fast_max(fast_min(x, max), min);
}
```

...again, at -O1 optimization level and above, I get a minsd/maxsd pair, short and sweet:

```
fast_clamp(double, double, double):
        minsd   xmm0, xmm2
        maxsd   xmm0, xmm1
        ret
```

Where this perfect picture becomes tainted, however, is as soon as I try to _use_ this function with certain min/max arguments.

```
double use_fast_clamp(double x) {
    return fast_clamp(x, 0.0, 1.0);
}
```

All of a sudden, the assembly becomes branchy and terrible-looking, even in -O3 mode!

```
use_fast_clamp(double):
        movapd  xmm1, xmm0
        movsd   xmm0, QWORD PTR .LC0[rip]
        comisd  xmm0, xmm1
        jbe     .L13
        maxsd   xmm1, QWORD PTR .LC1[rip]
        movapd  xmm0, xmm1
.L13:
        ret
.LC0:
        .long   0
        .long   1072693248
.LC1:
        .long   0
        .long   0
```

I can make the generated code go back to a minsd/maxsd pair if I enable -ffast-math (more precisely -ffinite-math-only -funsafe-math-optimizations), but to the best of my knowledge, I shouldn't need fast-math flags here.

Further, even if I did forget about an IEEE-754 oddity that requires fast-math flags, it would still mean that the above compilation of the general fast_clamp function is incorrect: if this compilation output should work for any pair of "min" and "max" double-precision arguments, then it trivially should work when the min is 0.0 and max is 1.0.

So one way or another, I think the GCC optimizer is doing something strange here.

---

This is the most minimal example of this behavior that I managed to come up with. Using only the fast_min or fast_max functions in isolation will behave as expected and codegen into a single minsd or maxsd:

```
double use_fast_min(double x) {
    return fast_min(x, 1.0);
}

double use_fast_max(double x) {
    return fast_max(x, 0.0);
}
```

I observed similar behavior on any GCC build I could get my hands on, all the way from the most recent GCC trunk build currently available on godbolt (10.0.1 20200405) to the most ancient build provided by godbolt (4.1.2).
Both my local system and godbolt are Linux-based. My local GCC build was configured with

../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver --disable-werror --with-gxx-include-dir=/usr/include/c++/9 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --enable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-libphobos --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-9 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --with-build-config=bootstrap-lto-lean --enable-link-mutex --build=x86_64-suse-linux --host=x86_64-suse-linux

As for godbolt builds, it is easy to go to godbolt.org and add a -v to the compiler options of the build you're interested in, so I will invite you to do that instead of cluttering this already long bug report further.

---

FWIW, clang 10 behaves the way I would expect without fast-math flags (and also generates the zero in place with a xorpd instead of loading it from memory, which is kind of cool), but I'm well aware of the danger of comparing the floating-point behavior of various compiler optimizers. So I wouldn't read too much into that:

```
.LCPI5_0:
        .quad   4607182418800017408     # double 1
use_fast_clamp(double):                 # @use_fast_clamp(double)
        minsd   xmm0, qword ptr [rip + .LCPI5_0]
        xorpd   xmm1, xmm1
        maxsd   xmm0, xmm1
        ret
```

If you would like to experiment on godbolt too, here's my setup: https://godbolt.org/z/eD-guY
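The consistency argument made in the report (whatever the general fast_clamp returns for min = 0.0 and max = 1.0, the specialised use_fast_clamp must return too) can also be checked at run time. The following is a minimal sketch and not part of the original report: `same_double` and `count_clamp_mismatches` are hypothetical helper names, and the `volatile` bounds are an assumption meant to keep the compiler from constant-propagating 0.0 and 1.0 into the general call. It only compares results on a few IEEE-754 edge cases (NaN, infinities, signed zeros); it says nothing about which instructions get emitted.

```c
#include <math.h>
#include <stddef.h>

/* Same definitions as in the report. */
double fast_min(double x, double y) { return (x < y) ? x : y; }
double fast_max(double x, double y) { return (x > y) ? x : y; }

double fast_clamp(double x, double min, double max) {
    return fast_max(fast_min(x, max), min);
}

double use_fast_clamp(double x) { return fast_clamp(x, 0.0, 1.0); }

/* Hypothetical helper: 1 if a and b are the same value, treating two NaNs
 * as equal and distinguishing -0.0 from +0.0. */
int same_double(double a, double b) {
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b && !signbit(a) == !signbit(b);
}

/* Hypothetical helper: number of edge-case inputs for which the general
 * and the specialised clamp disagree. */
int count_clamp_mismatches(void) {
    const double inputs[] = { -INFINITY, -1.0, -0.0, 0.0, 0.5,
                              1.0, 2.0, INFINITY, NAN };
    /* Assumption: volatile reads keep the bounds opaque to the optimizer,
     * so this call exercises the general code path. */
    volatile double lo = 0.0, hi = 1.0;
    int mismatches = 0;

    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i) {
        double general = fast_clamp(inputs[i], lo, hi);
        double special = use_fast_clamp(inputs[i]);
        if (!same_double(general, special))
            ++mismatches;
    }
    return mismatches;
}
```

Calling `count_clamp_mismatches()` from a small `main` and printing the result is enough to run the check. Built without fast-math flags, a mismatch count of zero would indicate that the branchy and branchless variants agree on these inputs, which is consistent with the report's framing of this as a missed optimization rather than a wrong-code issue.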
Thread overview: 9+ messages
2020-04-06  9:35 grasland at lal dot in2p3.fr [this message]
2020-04-06  9:39 ` [Bug middle-end/94497] " pinskia at gcc dot gnu.org
2020-04-06 13:33 ` rguenth at gcc dot gnu.org
2020-04-06 13:34 ` rguenth at gcc dot gnu.org
2020-04-06 13:37 ` rguenth at gcc dot gnu.org
2020-04-06 14:04 ` grasland at lal dot in2p3.fr
2020-04-06 16:31 ` rguenth at gcc dot gnu.org
2021-08-08 23:59 ` pinskia at gcc dot gnu.org
2023-07-20  8:38 ` rguenth at gcc dot gnu.org