public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3
@ 2023-10-03 18:27 deodharvinit99 at gmail dot com
  2023-10-03 22:18 ` [Bug middle-end/111683] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (28 more replies)
  0 siblings, 29 replies; 30+ messages in thread
From: deodharvinit99 at gmail dot com @ 2023-10-03 18:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

            Bug ID: 111683
           Summary: Incorrect answer when using SSE2 intrinsics with -O3
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: deodharvinit99 at gmail dot com
  Target Milestone: ---

Created attachment 56042
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56042&action=edit
convn_script

g++ produces incorrect answers when SSE2 intrinsics are used with -O3. 
-O2 produces same answers compared to an equivalent code written without SSE2

A standalone repro shell script:

g++ -O2 convn_script.cpp
./a.out

g++ -O3 convn_script.cpp
./a.out


Target: x86_64-linux-gnu
gcc version 10.2.1 20210110 (Debian 10.2.1-6)


covn_script.cpp:

#include <cstring>
#include <emmintrin.h>
#include <iostream>

// Function Definitions
//
// Check properties of input in9
//
// Arguments    : const double in9[10]
//                const double in10[7]
//                double out5[16]
// Return Type  : void
//
void convn_script(const double in9[10], const double in10[7], double out5[16])
{
  int iB;
  int iC;
  //  Check properties of input in10
  std::memset(&out5[0], 0, 16U * sizeof(double));
  iC = 0;
  iB = 0;
  for (int i{0}; i < 7; i++) {
    int b_i;
    int vectorUB;
    if (i + 10 <= 15) {
      b_i = 9;
    } else {
      b_i = 15 - i;
    }
    vectorUB = (((b_i + 1) / 2) << 1) - 2;
    for (int r{0}; r <= vectorUB; r += 2) {
      __m128d b_r;
      b_i = iC + r;
      b_r = _mm_loadu_pd(&out5[b_i]);
      _mm_storeu_pd(&out5[b_i],
                    _mm_add_pd(b_r, _mm_mul_pd(_mm_set1_pd(in10[iB]),
                                               _mm_loadu_pd(&in9[r]))));
    }
    iC = iB + 1;
    iB++;
  }
}

int main() {

    double in9[10] = {0.8147, 0.9058, 0.1270, 0.9134,
    0.6324,
    0.0975,
    0.2785,
    0.5469,
    0.9575,
    0.9649};
    double in10[7] = { 0.1576,
    0.9706,
    0.9572,
    0.4854,
    0.8003,
    0.1419,
    0.4218};
    double out5[16];

    convn_script(in9, in10, out5);

    for(int i = 0; i < 16; i++) {

        std::cout << "Out[" << i << "] = " << out5[i] << "\n";
    }

}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
@ 2023-10-03 22:18 ` pinskia at gcc dot gnu.org
  2023-10-04  9:55 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-03 22:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
      Known to fail|                            |7.1.0
           Keywords|                            |needs-bisection,
                   |                            |needs-reduction
   Last reconfirmed|                            |2023-10-03
   Target Milestone|---                         |11.5
            Summary|Incorrect answer when using |[11/12/13/14 Regression]
                   |SSE2 intrinsics with -O3    |Incorrect answer when using
                   |                            |SSE2 intrinsics with -O3
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |6.1.0, 6.4.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. Started between GCC 6 and GCC 7. Looks unrolling related too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
  2023-10-03 22:18 ` [Bug middle-end/111683] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
@ 2023-10-04  9:55 ` rguenth at gcc dot gnu.org
  2023-10-04 16:56 ` amonakov at gcc dot gnu.org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-04  9:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |13.2.1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just to say, -fno-strict-aliasing -fstack-reuse=none doesn't help. 
convn_script can be made 'noipa' and still reproduces the failure. 
-fno-tree-vectorize
changes code and result but still differs from -O2.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
  2023-10-03 22:18 ` [Bug middle-end/111683] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
  2023-10-04  9:55 ` rguenth at gcc dot gnu.org
@ 2023-10-04 16:56 ` amonakov at gcc dot gnu.org
  2023-12-12 10:27 ` jakub at gcc dot gnu.org
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-10-04 16:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
This is predcom. It's easier to see what's going wrong with

#pragma GCC unroll 99

added to the innermost loop so it's unrolled at -O2 and comparing

g++ -O2 -fpredictive-commoning --param=max-unroll-times=1

vs. plain g++ -O2 (or diffing 'pcom' dump against the preceding pass).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (2 preceding siblings ...)
  2023-10-04 16:56 ` amonakov at gcc dot gnu.org
@ 2023-12-12 10:27 ` jakub at gcc dot gnu.org
  2024-03-07 21:06 ` law at gcc dot gnu.org
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-12-12 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
           Keywords|needs-bisection             |

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
Slightly reduced:
#include <emmintrin.h>

void convn_script(const double in9[10], const double in10[7], double out5[16])
{
  int iB = 0;
  int iC = 0;
  __builtin_memset(&out5[0], 0, 16U * sizeof(double));
  for (int i = 0; i < 7; i++) {
    int b_i;
    int vectorUB;
    if (i + 10 <= 15)
      b_i = 9;
    else
      b_i = 15 - i;
    vectorUB = (((b_i + 1) / 2) << 1) - 2;
    for (int r = 0; r <= vectorUB; r += 2) {
      __m128d b_r;
      b_i = iC + r;
      b_r = _mm_loadu_pd(&out5[b_i]);
      _mm_storeu_pd(&out5[b_i],
                    _mm_add_pd(b_r, _mm_mul_pd(_mm_set1_pd(in10[iB]),
                                               _mm_loadu_pd(&in9[r]))));
    }
    iC = iB + 1;
    iB++;
  }
}

int main() {
  double in9[10] = {0.8147, 0.9058, 0.1270, 0.9134, 0.6324, 0.0975, 0.2785,
0.5469, 0.9575, 0.9649};
  double in10[7] = {0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218};
  double out5[16];
  convn_script(in9, in10, out5);
  for(int i = 0; i < 16; i++)
    __builtin_printf ("%d %f\n", i, out5[i]);
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (3 preceding siblings ...)
  2023-12-12 10:27 ` jakub at gcc dot gnu.org
@ 2024-03-07 21:06 ` law at gcc dot gnu.org
  2024-03-10  5:55 ` [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e kugan at gcc dot gnu.org
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-07 21:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (4 preceding siblings ...)
  2024-03-07 21:06 ` law at gcc dot gnu.org
@ 2024-03-10  5:55 ` kugan at gcc dot gnu.org
  2024-03-10 11:15 ` sjames at gcc dot gnu.org
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: kugan at gcc dot gnu.org @ 2024-03-10  5:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #5 from kugan at gcc dot gnu.org ---
 -O3 -fno-tree-vectorize  and -O3 -fno-tree-vrp works. I looked at the ever
dump and it is not doing anything suspicious. Looks like range_info usage in
vectoriser is causing the problem.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (5 preceding siblings ...)
  2024-03-10  5:55 ` [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e kugan at gcc dot gnu.org
@ 2024-03-10 11:15 ` sjames at gcc dot gnu.org
  2024-03-12  1:58 ` pinskia at gcc dot gnu.org
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-03-10 11:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #6 from Sam James <sjames at gcc dot gnu.org> ---
Created attachment 57660
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57660&action=edit
reduced.c

Bit smaller.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (6 preceding siblings ...)
  2024-03-10 11:15 ` sjames at gcc dot gnu.org
@ 2024-03-12  1:58 ` pinskia at gcc dot gnu.org
  2024-03-12  2:24 ` pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-12  1:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57677&action=edit
Slightly more reduced

Removes the extra induction variables and manually unroll the inner loop.

What I am seeing is the first store is being removed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (7 preceding siblings ...)
  2024-03-12  1:58 ` pinskia at gcc dot gnu.org
@ 2024-03-12  2:24 ` pinskia at gcc dot gnu.org
  2024-03-12  2:31 ` pinskia at gcc dot gnu.org
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-12  2:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-reduction             |

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note with the reduced testcase in comment #7, -fno-tree-vrp has no effect on it
any more so it might make sense to do another bisection on that testcase.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (8 preceding siblings ...)
  2024-03-12  2:24 ` pinskia at gcc dot gnu.org
@ 2024-03-12  2:31 ` pinskia at gcc dot gnu.org
  2024-03-12  2:44 ` pinskia at gcc dot gnu.org
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-12  2:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> Note with the reduced testcase in comment #7, -fno-tree-vrp has no effect on
> it any more so it might make sense to do another bisection on that testcase.

I am suspecting r7-104-gfc9cf6da84a899 exposed the issue now. in GCC 6,
_mm_storeu_pd/_mm_loadu_pd were still builtins, but in GCC 7, they become
directly unaligned loads (starting with r7-104-gfc9cf6da84a899 ).

I suspect if manually change _mm_storeu_pd/_mm_loadu_pd to be similar to what
GCC 7 had, then we might even up with this failing before GCC 7.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (9 preceding siblings ...)
  2024-03-12  2:31 ` pinskia at gcc dot gnu.org
@ 2024-03-12  2:44 ` pinskia at gcc dot gnu.org
  2024-03-12 13:23 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-12  2:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57678&action=edit
testcase that fails all the way back to GCC 4.8.x

This one fails until GCC 4.8.1, works in GCC 4.7.4 but that is because the
outer loop is unrolled in GCC 4.7.4.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (10 preceding siblings ...)
  2024-03-12  2:44 ` pinskia at gcc dot gnu.org
@ 2024-03-12 13:23 ` jakub at gcc dot gnu.org
  2024-03-19 17:28 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-12 13:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 7-13 has been released with this bug, so P2.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (11 preceding siblings ...)
  2024-03-12 13:23 ` jakub at gcc dot gnu.org
@ 2024-03-19 17:28 ` jakub at gcc dot gnu.org
  2024-03-19 18:17 ` pinskia at gcc dot gnu.org
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-19 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Further simplified without any headers, -O1 works, -O2 fails.
double a[2] = { 0.8147, 0.9058 };
double b[6] = { 0.1576, 0.9706, 1, 1, 1, 1 };
double c[16];
typedef double U __attribute__ ((vector_size(16), may_alias, aligned(1)));
typedef double V __attribute__ ((vector_size(16), may_alias));

int
main ()
{
  for (int f = 0; f < 6; f++)
    {
      *(U *) &c[f] = *(U *) &c[f] + (V) { b[f], b[f] } * *(U *) a;
      *(U *) &c[f + 2] = *(U *) &c[f + 2] + (V) { b[f], b[f] } * *(U *) a;
    }
  if (c[1] <= 0.93 || c[1] >= 0.94)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (12 preceding siblings ...)
  2024-03-19 17:28 ` jakub at gcc dot gnu.org
@ 2024-03-19 18:17 ` pinskia at gcc dot gnu.org
  2024-03-21 16:47 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-19 18:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #12)
> Further simplified without any headers, -O1 works, -O2 fails.

For this testcase (at -O3), GCC 4.7.4 works but 4.8.1 fails.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (13 preceding siblings ...)
  2024-03-19 18:17 ` pinskia at gcc dot gnu.org
@ 2024-03-21 16:47 ` jakub at gcc dot gnu.org
  2024-03-22  7:33 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-21 16:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It is indeed pcom.
Slightly better testcase which doesn't use double arithmetics but long long:
long long b[6] = { 3, 4, 5, 6, 7, 8 };
long long c[16];
typedef long long U __attribute__ ((vector_size(16), may_alias, aligned(1)));
typedef long long V __attribute__ ((vector_size(16), may_alias));

int
main ()
{
  for (int f = 0; f < 6; f++)
    {
      *(U *) &c[f] = *(U *) &c[f] + (V) { b[f], b[f] };
      *(U *) &c[f + 2] = *(U *) &c[f + 2] + (V) { b[f], b[f] };
    }
  if (c[1] != 7)
    __builtin_abort ();
  return 0;
}

And it doesn't need to be about vector types either, nor use any aligned(1)
loads/stores in the source:
int b[6] = { 3, 4, 5, 6, 7, 8 }, c[12];
int d[16] = { 0, 1, 3, 6, 10, 14, 12, 9, 5, 0, 0, 0 };

int
main ()
{
  int i;
  if (sizeof (int) * 2 != sizeof (long long))
    return 0;
  for (i = 0; i < 6; i++)
    {
      long long a;
      __builtin_memcpy (&a, &c[i], sizeof (a));
      a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
      __builtin_memcpy (&c[i], &a, sizeof (a));
      __builtin_memcpy (&a, &c[i + 2], sizeof (a));
      a += (((long long) i) << (sizeof (int) * __CHAR_BIT__)) + i;
      __builtin_memcpy (&c[i + 2], &a, sizeof (a));
    }
  if (__builtin_memcmp (&c[0], &d[0], sizeof (c)))
    __builtin_abort ();
  return 0;
}
On the last testcase the *.pcom change against previous dump is:
 int main ()
 {
+  unsigned long D__lsm1.7;
+  unsigned long D__lsm0.6;
   long long int a;
   int i;
   int * _1;
@@ -21,18 +31,24 @@ int main ()
   unsigned long _20;
   sizetype _25;
   sizetype _28;
+  unsigned long _33;
+  unsigned long _37;
   unsigned int ivtmp_38;
   unsigned int ivtmp_39;

   <bb 2> [local count: 153437704]:
+  _37 = MEM <unsigned long> [(char * {ref-all})&c];
+  _33 = MEM <unsigned long> [(char * {ref-all})&c + 4B];

   <bb 3> [local count: 920304121]:
   # i_26 = PHI <i_23(7), 0(2)>
   # ivtmp_39 = PHI <ivtmp_38(7), 6(2)>
+  # D__lsm0.6_34 = PHI <D__lsm1.7_35(7), _37(2)>
+  # D__lsm1.7_35 = PHI <D__lsm0.6_36(7), _33(2)>
   _28 = (sizetype) i_26;
   _9 = _28 * 4;
   _1 = &c + _9;
-  _12 = MEM <unsigned long> [(char * {ref-all})_1];
+  _12 = D__lsm0.6_34;
   a_13 = (long long int) _12;
   _19 = _28 * 4294967297;
   _2 = (long long int) _19;
@@ -48,6 +64,7 @@ int main ()
   _6 = _2 + a_18;
   _20 = (unsigned long) _6;
   MEM <unsigned long> [(char * {ref-all})_5] = _20;
+  D__lsm0.6_36 = _20;
   i_23 = i_26 + 1;
   ivtmp_38 = ivtmp_39 - 1;
   if (ivtmp_38 != 0)

The data refs in the loop have access fns {0B, +, 4}_1 and {8B, +, 4}_1, so
within
the same iteration there is no overlap between the two, but because they are
actually
8 byte loads/stores, there is partial overlap between the adjacent iterations
in them that pcom doesn't take into account.

So, shall we somewhere punt for references with larger access sizes than their
DR_STEP?
Something else?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (14 preceding siblings ...)
  2024-03-21 16:47 ` jakub at gcc dot gnu.org
@ 2024-03-22  7:33 ` rguenth at gcc dot gnu.org
  2024-03-22  7:38 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  7:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
the problem is in dependence analysis itself - by design classic dependence
analysis relies on a[i] and a[i+1] not aliasing but for DRs like

Creating dr for MEM[(U * {ref-all})_1]
analyze_innermost: success.
        base_address: &c
        offset from base address: 0
        constant offset from base address: 0
        step: 8
        base alignment: 32
        base misalignment: 0 
        offset alignment: 128
        step alignment: 8
        base_object: MEM[(U * {ref-all})&c]
        Access function 0: {0B, +, 8}_1

The access function evolves in a way that this doesn't hold.

It might be tempting to fail data-ref analysis itself but it's really
subscript dependence testing that doesn't work.  OTOH we already
have code in dr_analyze_indices to deal with similar cases concerning
overlap by adjusted base offset.

Note in theory we handle non-constant increments, so we eventually
would have to ditch support for those at all.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (15 preceding siblings ...)
  2024-03-22  7:33 ` rguenth at gcc dot gnu.org
@ 2024-03-22  7:38 ` rguenth at gcc dot gnu.org
  2024-03-22  7:47 ` jakub at gcc dot gnu.org
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  7:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 57766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57766&action=edit
patch

Oddly enough the following patch doesn't fix it but it correctly prevents us
from recording the access functions and data dependence analysis now says

(Data Dep:
#(Data Ref:
#  bb: 3
#  stmt: _2 = MEM[(U * {ref-all})_1];
#  ref: MEM[(U * {ref-all})_1];
#  base_object: MEM[(U * {ref-all})_1];
#)
#(Data Ref:
#  bb: 3
#  stmt: _7 = MEM[(U * {ref-all})_6];
#  ref: MEM[(U * {ref-all})_6];
#  base_object: MEM[(U * {ref-all})_6];
#)
    (don't know)

but predictive commoning still does

Store-loads chain 0x48229b0
  max distance 2, may reuse first
  inits MEM[(U * {ref-all})&c] MEM[(U * {ref-all})&c + 8B]
  references:
    MEM[(U * {ref-all})_6] (id 3, write)
      offset -2
      distance 0
    MEM[(U * {ref-all})_1] (id 0)
      offset 0
      distance 2

Executing predictive commoning without unrolling.

predcom has its own analysis it seems and it makes a similar mistake.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (16 preceding siblings ...)
  2024-03-22  7:38 ` rguenth at gcc dot gnu.org
@ 2024-03-22  7:47 ` jakub at gcc dot gnu.org
  2024-03-22  7:50 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22  7:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 57767
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57767&action=edit
gcc14-pr111683.patch

Patch I've tested overnight for this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (17 preceding siblings ...)
  2024-03-22  7:47 ` jakub at gcc dot gnu.org
@ 2024-03-22  7:50 ` jakub at gcc dot gnu.org
  2024-03-22  8:06 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22  7:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Using wi::multiple_of_p is what I've tried first but that regressed the
generated code for pr71083.c (the test still PASSed, but predcom no longer
triggered on it).
I think it has access size 3 and step 4, so step is not a multiple of access
size, but that is not a problem, all we actually care about is just if access
size isn't bigger than step.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (18 preceding siblings ...)
  2024-03-22  7:50 ` jakub at gcc dot gnu.org
@ 2024-03-22  8:06 ` rguenth at gcc dot gnu.org
  2024-03-22  8:18 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
But note that {0, +, 3 } and {2, + , 3} with size 2 will still eventually
overlap.  See dr_analyze_indices where we attempt to make DR_INIT a
multiple of the element size (probably also not a perfect solution).

In the end representing the access function for the pointer with byte
granularity is a mistake and it should be translated to be based on
the access size.  That of course makes a byte step of 3 with an access
size of 2 not representable (but it avoids these kind of issues).

The other possibility is to filter the remaining problematic cases
during dependence analysis itself where we see both DR_INITs.  As said
the index based dependence analysis doesn't know about such thing as
partial element overlaps.

For the PR at hand your patch looks reasonable, but I think there's
a problem in how we handle the pointer-based access functions.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (19 preceding siblings ...)
  2024-03-22  8:06 ` rguenth at gcc dot gnu.org
@ 2024-03-22  8:18 ` jakub at gcc dot gnu.org
  2024-03-22 10:37 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is true.  I've been also wondering whether e.g. for the pr71083.c case we
couldn't just look through all COMPONENT_REFs of the DR_REF (and maybe
ARRAY_REFs with constant indexes) and check type size of the aggregate it is in
against the size.  That could perhaps fix the predcom of pr71083.c, because it
is a 24-bit bitfield in 4 byte structure.

Will you take this over, or should I tweak my patch?

Note, statistics I've gathered with the above posted patch was that the patch
did something in
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/torture/pr101636.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr32399.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/autopar/pr69109-2.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/pr71575-2.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/loop-versioning-6.c f1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-1.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-2.c main
but only changed the generated code in the last 2 (those are intentional).
The older patch with wi::multiple_of_p affected
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/torture/pr101636.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr32399.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/execute/pr71083.c bar
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/autopar/pr69109-2.c f
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/pr71575-2.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/loop-versioning-6.c f1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-1.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr111683-2.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/torture/pr112736.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/torture/pr68379.c fn1
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr101445.c foo
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr68502-2.c reset_nodes
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr81740-2.c main
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/vect-avg-16.c f
/home/jakub/src/gcc/gcc/testsuite/g++.dg/vect/pr95297.cc test
and changed code generation of the 2 expected tests and pr71083.c.  All this
statistics was solely from make check-gcc check-g++.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (20 preceding siblings ...)
  2024-03-22  8:18 ` jakub at gcc dot gnu.org
@ 2024-03-22 10:37 ` jakub at gcc dot gnu.org
  2024-03-22 10:46 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 57768
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57768&action=edit
gcc14-pr111683.patch

Updated patch which uses wi::multiple_of_p but doesn't regress pr71083.c.
To be precise, it actually wasn't about foo with the bitfield, that used to be
optimized before/after, but bar which uses packed struct with char and short
members, step was 3 but the short access had size 2.  But given that the 2
sized access is a COMPONENT_REF of the 3 byte sized struct, if step is a
multiple of
3, that is fine as well.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (21 preceding siblings ...)
  2024-03-22 10:37 ` jakub at gcc dot gnu.org
@ 2024-03-22 10:46 ` jakub at gcc dot gnu.org
  2024-03-22 11:35 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22 10:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #57768|0                           |1
        is obsolete|                            |

--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 57769
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57769&action=edit
gcc14-pr111683.patch

Actually for bit-fields, TREE_TYPE on the ref is the underlying type, so either
we'd need to somehow better try to understand the actual reference size in that
case, or simply look throught the BIT_FIELD_REF COMPONENT_REFs.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (22 preceding siblings ...)
  2024-03-22 10:46 ` jakub at gcc dot gnu.org
@ 2024-03-22 11:35 ` rguenth at gcc dot gnu.org
  2024-03-22 11:43 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-22 11:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
It looks like this could go to suitable_reference_p instead?

That said, I do wonder why with my patch split_data_refs_to_components
doesn't fixup.  I think it's supposed to handle the case where
dependences are unknown conservatively...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (23 preceding siblings ...)
  2024-03-22 11:35 ` rguenth at gcc dot gnu.org
@ 2024-03-22 11:43 ` jakub at gcc dot gnu.org
  2024-03-22 12:42 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-22 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #23)
> It looks like this could go to suitable_reference_p instead?

You mean return false for those making them not suitable at all?
I thought without a write such references would act more like RS_ANY, but
a reason I didn't just treat such references as RS_ANY rather than RS_NONZERO
in suitable_reference_p was because of the assert that all refs in a component
have the same ref_step_type but nothing actually comparing it before the
assertion.

But if you think I should just return false from suitable_reference_p if the
step isn't a multiple of the sizes, I can surely try that.

> That said, I do wonder why with my patch split_data_refs_to_components
> doesn't fixup.  I think it's supposed to handle the case where
> dependences are unknown conservatively...

I'm afraid I'm not familiar with data refs enough to know what was going on.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (24 preceding siblings ...)
  2024-03-22 11:43 ` jakub at gcc dot gnu.org
@ 2024-03-22 12:42 ` rguenther at suse dot de
  2024-03-23 10:18 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: rguenther at suse dot de @ 2024-03-22 12:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #25 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 22 Mar 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683
> 
> --- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #23)
> > It looks like this could go to suitable_reference_p instead?
> 
> You mean return false for those making them not suitable at all?
> I thought without a write such references would act more like RS_ANY, but
> a reason I didn't just treat such references as RS_ANY rather than RS_NONZERO
> in suitable_reference_p was because of the assert that all refs in a component
> have the same ref_step_type but nothing actually comparing it before the
> assertion.

Hmm, true.

> But if you think I should just return false from suitable_reference_p if the
> step isn't a multiple of the sizes, I can surely try that.
> 
> > That said, I do wonder why with my patch split_data_refs_to_components
> > doesn't fixup.  I think it's supposed to handle the case where
> > dependences are unknown conservatively...
> 
> I'm afraid I'm not familiar with data refs enough to know what was going on.

I tried to follow what happens there and I'm also somewhat lost.

Anyway, I think fixing this in predcom in a convenient place even if
it might be not the true fix is OK.  You might want to put a comment
before any such fix indicating there might be more latent issues
in predcom or dependence analysis in general.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (25 preceding siblings ...)
  2024-03-22 12:42 ` rguenther at suse dot de
@ 2024-03-23 10:18 ` cvs-commit at gcc dot gnu.org
  2024-03-23 10:21 ` [Bug middle-end/111683] [11/12/13 " jakub at gcc dot gnu.org
  2024-03-30  3:55 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 30+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-23 10:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #26 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:8fc5593df8e0d36cc5bd8ea21097a491a634a866

commit r14-9639-g8fc5593df8e0d36cc5bd8ea21097a491a634a866
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Mar 23 11:17:44 2024 +0100

    predcom: Punt for steps which aren't multiples of access size [PR111683]

    On the following testcases, there is no overlap between data references
    within a single iteration, but the data references have size which is twice
    as large as the step, which means the data references overlap with the next
    iteration which predcom doesn't take into account.
    As discussed in the PR, even if the reference size is smaller than step,
    if step isn't a multiple of the reference size, there could be overlaps
with
    some other iteration later on.
    The initial version of the patch regressed (test still passed, but predcom
    didn't optimize anymore) pr71083.c which has a packed char, short structure
    and was reading/writing the short 2 bytes in there with step 3.
    The following patch deals with that by retrying for COMPONENT_REFs also the
    aggregate sizes etc., so that it then compares 3 bytes against step 3.
    In make check-gcc/check-g++ this patch I believe affects code generation
    for only the 2 new testcases according to statistics I've gathered.

    2024-03-23  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/111683
            * tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
            and comp_step is RS_NONZERO, return false if any reference in the
            component doesn't have DR_STEP a multiple of access size.

            * gcc.dg/pr111683-1.c: New test.
            * gcc.dg/pr111683-2.c: New test.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (26 preceding siblings ...)
  2024-03-23 10:18 ` cvs-commit at gcc dot gnu.org
@ 2024-03-23 10:21 ` jakub at gcc dot gnu.org
  2024-03-30  3:55 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 30+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-03-23 10:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression]
                   |Incorrect answer when using |Incorrect answer when using
                   |SSE2 intrinsics with -O3    |SSE2 intrinsics with -O3
                   |since                       |since
                   |r7-3163-g973625a04b3d9351f2 |r7-3163-g973625a04b3d9351f2
                   |485e37f7d3382af2aed87e      |485e37f7d3382af2aed87e

--- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed on the trunk so far.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug middle-end/111683] [11/12/13 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e
  2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
                   ` (27 preceding siblings ...)
  2024-03-23 10:21 ` [Bug middle-end/111683] [11/12/13 " jakub at gcc dot gnu.org
@ 2024-03-30  3:55 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 30+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-30  3:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #28 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:4320e8483bb88b49bf251451307324c06d33c0a4

commit r13-8521-g4320e8483bb88b49bf251451307324c06d33c0a4
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Mar 23 11:17:44 2024 +0100

    predcom: Punt for steps which aren't multiples of access size [PR111683]

    On the following testcases, there is no overlap between data references
    within a single iteration, but the data references have size which is twice
    as large as the step, which means the data references overlap with the next
    iteration which predcom doesn't take into account.
    As discussed in the PR, even if the reference size is smaller than step,
    if step isn't a multiple of the reference size, there could be overlaps
with
    some other iteration later on.
    The initial version of the patch regressed (test still passed, but predcom
    didn't optimize anymore) pr71083.c which has a packed char, short structure
    and was reading/writing the short 2 bytes in there with step 3.
    The following patch deals with that by retrying for COMPONENT_REFs also the
    aggregate sizes etc., so that it then compares 3 bytes against step 3.
    In make check-gcc/check-g++ this patch I believe affects code generation
    for only the 2 new testcases according to statistics I've gathered.

    2024-03-23  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/111683
            * tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
            and comp_step is RS_NONZERO, return false if any reference in the
            component doesn't have DR_STEP a multiple of access size.

            * gcc.dg/pr111683-1.c: New test.
            * gcc.dg/pr111683-2.c: New test.

    (cherry picked from commit 8fc5593df8e0d36cc5bd8ea21097a491a634a866)

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-03-30  3:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-03 18:27 [Bug c++/111683] New: Incorrect answer when using SSE2 intrinsics with -O3 deodharvinit99 at gmail dot com
2023-10-03 22:18 ` [Bug middle-end/111683] [11/12/13/14 Regression] " pinskia at gcc dot gnu.org
2023-10-04  9:55 ` rguenth at gcc dot gnu.org
2023-10-04 16:56 ` amonakov at gcc dot gnu.org
2023-12-12 10:27 ` jakub at gcc dot gnu.org
2024-03-07 21:06 ` law at gcc dot gnu.org
2024-03-10  5:55 ` [Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e kugan at gcc dot gnu.org
2024-03-10 11:15 ` sjames at gcc dot gnu.org
2024-03-12  1:58 ` pinskia at gcc dot gnu.org
2024-03-12  2:24 ` pinskia at gcc dot gnu.org
2024-03-12  2:31 ` pinskia at gcc dot gnu.org
2024-03-12  2:44 ` pinskia at gcc dot gnu.org
2024-03-12 13:23 ` jakub at gcc dot gnu.org
2024-03-19 17:28 ` jakub at gcc dot gnu.org
2024-03-19 18:17 ` pinskia at gcc dot gnu.org
2024-03-21 16:47 ` jakub at gcc dot gnu.org
2024-03-22  7:33 ` rguenth at gcc dot gnu.org
2024-03-22  7:38 ` rguenth at gcc dot gnu.org
2024-03-22  7:47 ` jakub at gcc dot gnu.org
2024-03-22  7:50 ` jakub at gcc dot gnu.org
2024-03-22  8:06 ` rguenth at gcc dot gnu.org
2024-03-22  8:18 ` jakub at gcc dot gnu.org
2024-03-22 10:37 ` jakub at gcc dot gnu.org
2024-03-22 10:46 ` jakub at gcc dot gnu.org
2024-03-22 11:35 ` rguenth at gcc dot gnu.org
2024-03-22 11:43 ` jakub at gcc dot gnu.org
2024-03-22 12:42 ` rguenther at suse dot de
2024-03-23 10:18 ` cvs-commit at gcc dot gnu.org
2024-03-23 10:21 ` [Bug middle-end/111683] [11/12/13 " jakub at gcc dot gnu.org
2024-03-30  3:55 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).