public inbox for gcc-bugs@sourceware.org
* [Bug target/110591] New: [i386] (Maybe) Missed optimisation: _cmpccxadd sets flags
@ 2023-07-07 17:13 thiago at kde dot org
  2023-07-10  2:21 ` [Bug target/110591] " crazylht at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: thiago at kde dot org @ 2023-07-07 17:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110591

            Bug ID: 110591
           Summary: [i386] (Maybe) Missed optimisation: _cmpccxadd sets
                    flags
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

In:
// built with -mcmpccxadd (the intrinsic is only available when CMPCCXADD
// is enabled for the target)
#include <immintrin.h>

bool increment_if(int *ptr, int v)
{
    return _cmpccxadd_epi32(ptr, v, 1, _CMPCCX_Z) == v;
}

GCC generates (and current Clang does the same):

increment_if(int*, int):
        movl    $1, %edx
        movl    %esi, %eax
        cmpzxadd        %edx, %eax, (%rdi)
        cmpl    %eax, %esi
        sete    %al
        ret

The CMPccXADD instructions set EFLAGS according to the comparison of their
memory operand with the middle (register) operand, and that register receives
the previous value of the memory location whether or not the comparison
succeeds. The CMP instruction on the next line is therefore superfluous: it
sets the flags to exactly the values they already hold. This particular
example could instead be written:

        movl    $1, %edx
        cmpzxadd        %edx, %esi, (%rdi)
        sete    %al
        ret
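
For reference, here is a rough, non-atomic C++ model of the behaviour
described above (the function and variable names are my own, not from any
header or manual):

bool cmpzxadd_model(int *mem, int &reg, int addend)
{
    int old = *mem;              // the memory operand is always loaded
    bool zf = (old == reg);      // EFLAGS reflect *mem vs. the register
    if (zf)
        *mem = old + addend;     // the add happens only if "Z" holds
    reg = old;                   // the register always gets the old value
    return zf;                   // this ZF is what SETE/JE test afterwards
}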

Removing the MOV and the CMP saves two retire slots and one uop. This can be
done every time the result of the intrinsic is compared against the value
that was passed as the intrinsic's second parameter.
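
The same pattern can appear with other condition codes too; for example (my
own illustrative variant, assuming _CMPCCX_NZ from the same header), the
trailing compare should be just as redundant here:

bool increment_unless_equal(int *ptr, int v)
{
    // The result is again compared against the second argument, so the
    // flags left by the instruction already answer this question.
    return _cmpccxadd_epi32(ptr, v, 1, _CMPCCX_NZ) != v;
}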

However, in a real workload this function is likely to be inlined, in which
case the extra MOV may not be present at all and the CMP is likely to be
followed by a Jcc instead of a SETcc. In that case the CMP+Jcc pair would be
macro-fused, so removing the CMP would not save a uop. Moreover, this atomic
operation will likely take multiple cycles, and the conditional code after it
probably can't be speculated very well either.
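
As a hypothetical example of the inlined case, where the comparison feeds a
branch rather than a SETcc (the function name and loop body are purely
illustrative):

void spin_increment(int *counter, int expected)
{
    // After inlining, the compare feeds a JNE; CMP+JNE macro-fuse, so
    // dropping the CMP does not save a uop here.
    while (!increment_if(counter, expected))
        _mm_pause();             // any back-off / retry logic would do
}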

I'll leave it up to you to decide whether it's worth pursuing this.


Thread overview: 7+ messages
2023-07-07 17:13 [Bug target/110591] New: [i386] (Maybe) Missed optimisation: _cmpccxadd sets flags thiago at kde dot org
2023-07-10  2:21 ` [Bug target/110591] " crazylht at gmail dot com
2023-07-10  4:27 ` crazylht at gmail dot com
2023-07-10  6:22 ` crazylht at gmail dot com
2023-07-18  3:31 ` cvs-commit at gcc dot gnu.org
2023-07-18  3:32 ` crazylht at gmail dot com
2023-11-30 10:53 ` liuhongt at gcc dot gnu.org
