public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST
@ 2023-05-26  0:38 benjsith at gmail dot com
  2023-05-26  0:42 ` [Bug rtl-optimization/109973] " benjsith at gmail dot com
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: benjsith at gmail dot com @ 2023-05-26  0:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

            Bug ID: 109973
           Summary: Wrong code for AVX2 since 13.1 by combining VPAND and
                    VPTEST
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: benjsith at gmail dot com
  Target Milestone: ---

The following code is a minimal repro of the issue, when compiled with `gcc -O1
-mavx2`:

#include <immintrin.h>

int do_stuff(__m256i Y0, __m256i Y1, __m128i X2) {
  __m256i And01 = _mm256_and_si256(Y0, Y1);
  int TestResult = _mm256_testc_si256(And01, And01);
  return TestResult;
}

I have also attached the preprocessed version of that minimal repro

12.3 produces the following assembly
        vpand   ymm0, ymm0, ymm1  ; <<<<< missing in 13.1
        mov     eax, 0
        vptest  ymm0, ymm0
        setb    al
        ret

While 13.1 generates:
        mov     eax, 0
        vptest  ymm0, ymm1
        setb    al
        ret

Note that as of 13.1, the VPAND is removed, and just the VPTEST remains.
However, this is incorrect because _mm256_testc_si256 returns the results of
the Carry Flag, which is based on the bitwise AND of the second operand and the
bitwise NOT of the first operand, meaning these two can't be combined like that
for this intrinsic

Here is a Godbolt link showing the issue with a full execution case:
https://godbolt.org/z/x9or4WEWh

This issue is present in 12.3 but not 13.1. A bisect shows that it was most
likely introduced in a56c1641e9d25e46059168e811b4a2f185f07b6b

I have confirmed that this issue is still present on the latest trunk,
8d2fa90a41567670d2dbd4918d19d21d9bec4a8f

-O0 on trunk will also return the correct result

For triage/priority purposes: this bug was not found in manually written code,
but instead from a fuzzer meant to test SIMD codegen

PS: This is my first bug on the GCC tracker, so if I've done anything wrong let
me know. I marked it as in "rtl-optimization" though I'm not sure if that's
correct

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug rtl-optimization/109973] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
@ 2023-05-26  0:42 ` benjsith at gmail dot com
  2023-05-26  0:47 ` [Bug target/109973] [13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: benjsith at gmail dot com @ 2023-05-26  0:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #1 from Benji Smith <benjsith at gmail dot com> ---
Created attachment 55159
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55159&action=edit
A compressed preprocessed minimal repro of the VPAND/VPTEST issue

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
  2023-05-26  0:42 ` [Bug rtl-optimization/109973] " benjsith at gmail dot com
@ 2023-05-26  0:47 ` pinskia at gcc dot gnu.org
  2023-05-26  6:31 ` [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-26  0:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.2
            Summary|Wrong code for AVX2 since   |[13/14 Regression] Wrong
                   |13.1 by combining VPAND and |code for AVX2 since 13.1 by
                   |VPTEST                      |combining VPAND and VPTEST
           Keywords|                            |wrong-code

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The patch which introduced the failure is all in the x86_64 backend so it is
target issue.

r13-2006-ga56c1641e9d25e

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
  2023-05-26  0:42 ` [Bug rtl-optimization/109973] " benjsith at gmail dot com
  2023-05-26  0:47 ` [Bug target/109973] [13/14 Regression] " pinskia at gcc dot gnu.org
@ 2023-05-26  6:31 ` rguenth at gcc dot gnu.org
  2023-05-26  6:56 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-26  6:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-05-26
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
      Known to fail|                            |13.1.0
      Known to work|                            |12.3.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (2 preceding siblings ...)
  2023-05-26  6:31 ` [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e rguenth at gcc dot gnu.org
@ 2023-05-26  6:56 ` jakub at gcc dot gnu.org
  2023-05-26 12:06 ` benjsith at gmail dot com
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-26  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Guess the optimization is perfectly valid when it is just the ZF flag that is
tested, i.e. in bar:

#include <immintrin.h>

int
foo (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testc_si256 (a, a);
}

int
bar (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testz_si256 (a, a);
}

_mm256_testc_si256 (a, a) is dumb (always returns non-zero because a & ~a is
0), perhaps we could fold it in gimple folding to 1.  Still I'm afraid at RTL
we can't rely on that folding.  One option could be to use CCZmode instead of
CCmode for the _mm*_testz* cases and perform this optimization solely for
CCZmode and not for CCmode that would be used
for _mm*_testc*.  It has a disadvantage that we'd likely not be able to merge
_mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b) (or vice versa).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (3 preceding siblings ...)
  2023-05-26  6:56 ` jakub at gcc dot gnu.org
@ 2023-05-26 12:06 ` benjsith at gmail dot com
  2023-05-29 18:24 ` roger at nextmovesoftware dot com
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: benjsith at gmail dot com @ 2023-05-26 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #4 from Benji Smith <benjsith at gmail dot com> ---
Just as follow-up, I also tested the same code with
_mm_and_si128/_mm_testc_si128 on SSE4.1, and the same issue repros (via `gcc
-O1 -msse4.1`):

#include <immintrin.h>

int do_stuff(__m128i Y0, __m128i Y1, __m128i X2) {
  __m128i And01 = _mm_and_si128(Y0, Y1);
  int TestResult = _mm_testc_si128(And01, And01);

  return TestResult;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (4 preceding siblings ...)
  2023-05-26 12:06 ` benjsith at gmail dot com
@ 2023-05-29 18:24 ` roger at nextmovesoftware dot com
  2023-05-30 13:44 ` cvs-commit at gcc dot gnu.org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-05-29 18:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=80040
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com

--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
Many thanks to Benji for reporting this issue.  I've proposed a solution at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619973.html (following
Jakub's suggestions in comment #3).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (5 preceding siblings ...)
  2023-05-29 18:24 ` roger at nextmovesoftware dot com
@ 2023-05-30 13:44 ` cvs-commit at gcc dot gnu.org
  2023-06-01 14:12 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-30 13:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:69185294f322dd53d4e1592115014c5488302e2e

commit r14-1405-g69185294f322dd53d4e1592115014c5488302e2e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue May 30 14:40:50 2023 +0100

    PR target/107172: Avoid "unusual" MODE_CC comparisons in simplify-rtx.cc

    I believe that a better (or supplementary) fix to PR target/107172 is to
    avoid producing incorrect (but valid) RTL in
    simplify_const_relational_operation when presented with questionable
    (obviously invalid) expressions, such as those produced during combine.
    Just as with the "first do no harm" clause with the Hippocratic Oath,
    simplify-rtx (probably) shouldn't unintentionally transform invalid RTL
    expressions, into incorrect (non-equivalent) but valid RTL that may be
    inappropriately recognized by recog.

    In this specific case, many GCC backends represent their flags register via
    MODE_CC, whose representation is intentionally "opaque" to the middle-end.
    The only use of MODE_CC comprehensible to the middle-end's RTL optimizers
    is relational comparisons between the result of a COMPARE rtx (op0) and
zero
    (op1).  Any other uses of MODE_CC should be left alone, and some might
argue
    indicate representational issues in the backend.

    In practice, CPUs occasionally have numerous instructions that affect the
    flags register(s) other than comparisons [AVR's setc, powerpc's mtcrf,
    x86's clc, stc and cmc and x86_64's ptest that sets C and Z flags in
    non-obvious ways, c.f. PR target/109973].  Currently care has to be taken,
    wrapping these in UNSPEC, to avoid combine inappropriately merging flags
    setters with flags consumers (such as conditional jumps).  It's safer to
    teach simplify_const_relational_operation not to modify expressions that
    it doesn't understand/recognize.

    2023-05-30  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/107172
            * simplify-rtx.cc (simplify_const_relational_operation): Return
            early if we have a MODE_CC comparison that isn't a COMPARE against
            const0_rtx.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (6 preceding siblings ...)
  2023-05-30 13:44 ` cvs-commit at gcc dot gnu.org
@ 2023-06-01 14:12 ` cvs-commit at gcc dot gnu.org
  2023-06-03 18:01 ` benjsith at gmail dot com
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-01 14:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:3635e8c67e13e3da7e1e23a617dd9952218e93e0

commit r14-1466-g3635e8c67e13e3da7e1e23a617dd9952218e93e0
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Jun 1 15:10:09 2023 +0100

    PR target/109973: CCZmode and CCCmode variants of [v]ptest on x86.

    This is my proposed minimal fix for PR target/109973 (hopefully suitable
    for backporting) that follows Jakub Jelinek's suggestion that we introduce
    CCZmode and CCCmode variants of ptest and vptest, so that the i386
    backend treats [v]ptest instructions similarly to testl instructions;
    using different CCmodes to indicate which condition flags are desired,
    and then relying on the RTL cmpelim pass to eliminate redundant tests.

    This conveniently matches Intel's intrinsics, that provide different
    functions for retrieving different flags, _mm_testz_si128 tests the
    Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
    same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
    optimization is only valid when the ptest/vptest instruction is used to
    set/test the Z flag.

    The downside, as predicted by Jakub, is that GCC's cmpelim pass is
    currently COMPARE-centric and not able to merge the ptests from expressions
    such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
    known issue, PR target/80040.

    2023-06-01  Roger Sayle  <roger@nextmovesoftware.com>
                Uros Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog
            PR target/109973
            * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
            CODE_for_sse4_1_ptestzv2di.
            (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
            (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
            (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
            * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
            when expanding UNSPEC_PTEST to compare against zero.
            * config/i386/i386-features.cc (scalar_chain::convert_compare):
            Likewise generate CCZmode UNSPEC_PTESTs when converting
comparisons.
            (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
result.
            (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE
result.
            * config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
            * config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
            check for suitable matching modes for the UNSPEC_PTEST pattern.
            * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
            to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
            (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.  Remove
            ":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
            (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
            (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
            (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
            current behavior.
            (*ptest<mode>_and): Specify CCZ to only perform this optimization
            when only the Z flag is required.

    gcc/testsuite/ChangeLog
            PR target/109973
            * gcc.target/i386/pr109973-1.c: New test case.
            * gcc.target/i386/pr109973-2.c: Likewise.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (7 preceding siblings ...)
  2023-06-01 14:12 ` cvs-commit at gcc dot gnu.org
@ 2023-06-03 18:01 ` benjsith at gmail dot com
  2023-06-04 11:01 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: benjsith at gmail dot com @ 2023-06-03 18:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #8 from Benji Smith <benjsith at gmail dot com> ---
Yes, just pulled latest trunk and confirmed that the issue no longer repros.
Thanks for the fix!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (8 preceding siblings ...)
  2023-06-03 18:01 ` benjsith at gmail dot com
@ 2023-06-04 11:01 ` cvs-commit at gcc dot gnu.org
  2023-06-22  6:44 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-04 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:8ab9fb6b8e05cf9acca7bd8282979ede53524cf1

commit r14-1528-g8ab9fb6b8e05cf9acca7bd8282979ede53524cf1
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sun Jun 4 11:59:32 2023 +0100

    PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV.

    This patch fixes PR target/110083, an ICE-on-valid regression exposed by
    my recent PTEST improvements (to address PR target/109973).  The latent
    bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't
update
    or delete REG_EQUAL notes attached to COMPARE instructions.  As a result
    the operands of COMPARE would be mismatched, with the register transformed
    to V1TImode, but the immediate operand left as const_wide_int, which is
    valid for TImode but not V1TImode.  This remained latent when the STV
    conversion converted the mode of the COMPARE to CCmode, with later passes
    recognizing the REG_EQUAL note is obviously invalid as the modes didn't
    match, but now that we (correctly) preserve the CCZmode on COMPARE, the
    mismatched operand modes trigger a sanity checking ICE downstream.

    Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.

    Before:
        (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
            (const_wide_int 0x80000000000000000000000000000000))

    After:
        (expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
            (const_vector:V1TI [
                (const_wide_int 0x80000000000000000000000000000000)
             ]))

    2023-06-04  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/110083
            * config/i386/i386-features.cc (scalar_chain::convert_compare):
            Update or delete REG_EQUAL notes, converting CONST_INT and
            CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.

    gcc/testsuite/ChangeLog
            PR target/110083
            * gcc.target/i386/pr110083.c: New test case.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (9 preceding siblings ...)
  2023-06-04 11:01 ` cvs-commit at gcc dot gnu.org
@ 2023-06-22  6:44 ` cvs-commit at gcc dot gnu.org
  2023-07-03 13:28 ` [Bug target/109973] [13 " roger at nextmovesoftware dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-22  6:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:5322f009e8f7d1c7a1c9aab7cb4c90c433398fdd

commit r14-2030-g5322f009e8f7d1c7a1c9aab7cb4c90c433398fdd
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Jun 22 07:43:07 2023 +0100

    i386: Convert ptestz of pandn into ptestc.

    This patch is the next installment in a set of backend patches around
    improvements to ptest/vptest.  A previous patch optimized the sequence
    t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
    property that ZF is set to (X&Y) == 0.  This patch performs a similar
    transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
    equivalent ptestc(y,x), using the property that the CF flags is set to
    (~X&Y) == 0.  The tricky bit is that this sets the CF flag instead of
    the ZF flag, so we can only perform this transformation when we can
    also convert the flags consumer, as well as the producer.

    For the test case:

    int foo (__m128i x, __m128i y)
    {
      __m128i a = x & ~y;
      return __builtin_ia32_ptestz128 (a, a);
    }

    With -O2 -msse4.1 we previously generated:

    foo:    pandn   %xmm0, %xmm1
            xorl    %eax, %eax
            ptest   %xmm1, %xmm1
            sete    %al
            ret

    with this patch we now generate:

    foo:    xorl    %eax, %eax
            ptest   %xmm0, %xmm1
            setc    %al
            ret

    At the same time, this patch also provides alternative fixes for
    PR target/109973 and PR target/110118, by recognizing that ptestc(x,x)
    always sets the carry flag (X&~X is always zero).  This is achieved
    both by recognizing the special case in ix86_expand_sse_ptest and with
    a splitter to convert an eligible ptest into an stc.

    2023-06-22  Roger Sayle  <roger@nextmovesoftware.com>
                Uros Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog
            * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
            expansion of ptestc with equal operands as producing const1_rtx.
            * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
            estimates of UNSPEC_PTEST, where the ptest performs the PAND
            or PAND of its operands.
            * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
            of reg_equal_p operands into an x86_stc instruction.
            (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
            (define_split): Similar to above for strict_low_part destinations.
            (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/avx-vptest-4.c: New test case.
            * gcc.target/i386/avx-vptest-5.c: Likewise.
            * gcc.target/i386/avx-vptest-6.c: Likewise.
            * gcc.target/i386/pr109973-1.c: Update test case.
            * gcc.target/i386/pr109973-2.c: Likewise.
            * gcc.target/i386/sse4_1-ptest-4.c: New test case.
            * gcc.target/i386/sse4_1-ptest-5.c: Likewise.
            * gcc.target/i386/sse4_1-ptest-6.c: Likewise.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (10 preceding siblings ...)
  2023-06-22  6:44 ` cvs-commit at gcc dot gnu.org
@ 2023-07-03 13:28 ` roger at nextmovesoftware dot com
  2023-07-03 13:43 ` jakub at gcc dot gnu.org
  2023-07-20  7:28 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-03 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
            Summary|[13/14 Regression] Wrong    |[13 Regression] Wrong code
                   |code for AVX2 since 13.1 by |for AVX2 since 13.1 by
                   |combining VPAND and VPTEST  |combining VPAND and VPTEST
                   |since                       |since
                   |r13-2006-ga56c1641e9d25e    |r13-2006-ga56c1641e9d25e

--- Comment #11 from Roger Sayle <roger at nextmovesoftware dot com> ---
This issue is now fixed on mainline for GCC 14, and a backport of the relevant
bits (minimum viable fix) for GCC 13 was proposed here
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621256.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (11 preceding siblings ...)
  2023-07-03 13:28 ` [Bug target/109973] [13 " roger at nextmovesoftware dot com
@ 2023-07-03 13:43 ` jakub at gcc dot gnu.org
  2023-07-20  7:28 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-03 13:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Wrong-code issues like this shouldn't be just closed.
I think you should ping Uros on this, or another option would be to revert on
the branch the change that caused the regression.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/109973] [13 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e
  2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
                   ` (12 preceding siblings ...)
  2023-07-03 13:43 ` jakub at gcc dot gnu.org
@ 2023-07-20  7:28 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-20  7:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |13.1.1
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on the branch by reverting the offending commit instead.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-07-20  7:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26  0:38 [Bug rtl-optimization/109973] New: Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST benjsith at gmail dot com
2023-05-26  0:42 ` [Bug rtl-optimization/109973] " benjsith at gmail dot com
2023-05-26  0:47 ` [Bug target/109973] [13/14 Regression] " pinskia at gcc dot gnu.org
2023-05-26  6:31 ` [Bug target/109973] [13/14 Regression] Wrong code for AVX2 since 13.1 by combining VPAND and VPTEST since r13-2006-ga56c1641e9d25e rguenth at gcc dot gnu.org
2023-05-26  6:56 ` jakub at gcc dot gnu.org
2023-05-26 12:06 ` benjsith at gmail dot com
2023-05-29 18:24 ` roger at nextmovesoftware dot com
2023-05-30 13:44 ` cvs-commit at gcc dot gnu.org
2023-06-01 14:12 ` cvs-commit at gcc dot gnu.org
2023-06-03 18:01 ` benjsith at gmail dot com
2023-06-04 11:01 ` cvs-commit at gcc dot gnu.org
2023-06-22  6:44 ` cvs-commit at gcc dot gnu.org
2023-07-03 13:28 ` [Bug target/109973] [13 " roger at nextmovesoftware dot com
2023-07-03 13:43 ` jakub at gcc dot gnu.org
2023-07-20  7:28 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).