public inbox for gcc-bugs@sourceware.org
* [Bug target/108614] New: _subborrow_u32 generates suboptimal code when second subtraction operand is constant on x86 targets
From: john_platts at hotmail dot com @ 2023-01-31 12:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108614

            Bug ID: 108614
           Summary: _subborrow_u32 generates suboptimal code when second
                    subtraction operand is constant on x86 targets
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: john_platts at hotmail dot com
  Target Milestone: ---

The following C++ code compiles to suboptimal machine code with gcc 12.2.0
when built with the -O2 -march=skylake-avx512 -m32 options:
#include <stdint.h>
#include <utility>
#include <x86intrin.h>
#include <immintrin.h>

std::pair<uint32_t, uint32_t> ComputeHiMaskAndHiZeroAmt(uint32_t len) {
    uint32_t hiMask;
    uint32_t hiZeroAmt;

    _addcarry_u32(_subborrow_u32(0, len, 32, &hiZeroAmt),
        uint32_t{0xFFFFFFFFu}, 0, &hiMask);

    hiMask = _bzhi_u32(hiMask, hiZeroAmt);

    return std::make_pair(hiMask, hiZeroAmt);
}
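
For readers without BMI2 hardware, here is a minimal portable sketch of what the function above computes, assuming the documented semantics of _subborrow_u32, _addcarry_u32 and _bzhi_u32 (bzhi uses only the low 8 bits of its index and leaves the source unchanged when that index is 32 or more). The names BzhiModel and ComputeHiMaskAndHiZeroAmtModel are hypothetical and not part of the original report; this is a model for checking results, not a proposed replacement.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical model of _bzhi_u32: clear all bits at positions >= index[7:0].
// An index of 32 or more leaves the source value unchanged.
constexpr std::uint32_t BzhiModel(std::uint32_t src, std::uint32_t index) {
    const std::uint32_t n = index & 0xFFu;
    return n >= 32u ? src : (src & ((std::uint32_t{1} << n) - 1u));
}

// Hypothetical model of ComputeHiMaskAndHiZeroAmt without intrinsics.
constexpr std::pair<std::uint32_t, std::uint32_t>
ComputeHiMaskAndHiZeroAmtModel(std::uint32_t len) {
    // _subborrow_u32(0, len, 32, &hiZeroAmt): len - 32, wrapping modulo 2^32.
    const std::uint32_t hiZeroAmt = len - 32u;
    // The borrow out is set exactly when len < 32.
    const std::uint32_t borrow = len < 32u ? 1u : 0u;
    // _addcarry_u32(borrow, 0xFFFFFFFF, 0, &hiMask): wraps to 0 when borrow is 1.
    const std::uint32_t hiMask = 0xFFFFFFFFu + borrow;
    return std::pair<std::uint32_t, std::uint32_t>(
        BzhiModel(hiMask, hiZeroAmt), hiZeroAmt);
}
```

The only part that depends on the constant 32 is the single subtraction, which is why an immediate operand on the sub instruction is sufficient.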

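Here is the assembly code that gcc 12.2.0 generates for the above code with
the -O2 -march=skylake-avx512 -m32 options. The constant 32 is first loaded
into %edx and then subtracted with a register-register subl, and the stack
pointer is adjusted by 16 bytes even though no stack slots are used: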
_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $16, %esp
        movl    24(%esp), %eax
        movl    $32, %edx
        subl    %edx, %eax
        movl    20(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        addl    $16, %esp
        ret     $4

Here is a better version of the above code (for 32-bit x86), which subtracts
the immediate directly and omits the stack adjustment:
_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    8(%esp), %eax
        subl    $32, %eax
        movl    4(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        ret     $4

Here is the assembly code that gcc 12.2.0 generates for the above code with
the -O2 -march=skylake-avx512 options (64-bit x86). Again the constant 32 is
loaded into a register before the subtraction:
_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    $32, %eax
        subl    %eax, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret

Here is a better version of the above code (for 64-bit x86), which uses subl
with an immediate operand:
_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $32, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret
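
The trailing salq/movl/orq sequence in both 64-bit listings is not part of the problem; it packs the returned std::pair into %rax. As a rough sketch of that packing, inferred from the listings above (first member in the low 32 bits, second member in the high 32 bits), with PackPairModel being a hypothetical helper name used only for illustration:

```cpp
#include <cstdint>
#include <utility>

// Hypothetical illustration of how the 64-bit listings place the pair in %rax:
// zero-extended .first in the low half, .second shifted into the high half.
constexpr std::uint64_t PackPairModel(
    std::pair<std::uint32_t, std::uint32_t> p) {
    return (std::uint64_t{p.second} << 32) | std::uint64_t{p.first};
}
```

For example, a result of hiMask = 0xFF and hiZeroAmt = 8 would be returned as 0x00000008000000FF under this layout.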
