public inbox for gcc-bugs@sourceware.org
* [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16  9:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

            Bug ID: 106322
           Summary: i386: Wrong code at O2 level (O0 / O1 are working)
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: malat at debian dot org
  Target Milestone: ---

I can trigger an assertion in the highway unit test suite on i386 when using
-O2; it does not happen at -O1 or -O0.

Symptoms:

% ./tests/mul_test "--gtest_filter=HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128"
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128


i16x8 expect [0+ ->]:
  0x3FFF,0x0FFF,0x03FF,0x00FF,0x003F,0x000F,0x0003,
i16x8 actual [0+ ->]:
  0xBFFF,0x0FFF,0xE400,0x00FF,0xF840,0x000F,0xFE04,
Abort at ./hwy/tests/mul_test.cc:131: Emu128, i16x8 lane 0 mismatch: expected
'0x3FFF', got '0xBFFF'.
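For reference, the expected row can be recomputed by hand. This is a scalar sketch, not Highway code: `MulHighI16` and `ExpectedLane` are hypothetical helpers, and the input pattern (lane i holds the type's maximum shifted right by i, multiplied by itself) is an assumption that reproduces the expected row printed above. The checks run at compile time (C++14 or later):

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar reference: upper 16 bits of the 32-bit signed product.
constexpr int16_t MulHighI16(int16_t a, int16_t b) {
  return static_cast<int16_t>(
      (static_cast<int32_t>(a) * static_cast<int32_t>(b)) >> 16);
}

// Assumed test input: lane i holds INT16_MAX >> i and is multiplied by itself.
constexpr int16_t ExpectedLane(int i) {
  const int16_t x =
      static_cast<int16_t>(std::numeric_limits<int16_t>::max() >> i);
  return MulHighI16(x, x);
}

static_assert(ExpectedLane(0) == 0x3FFF, "lane 0");
static_assert(ExpectedLane(1) == 0x0FFF, "lane 1");
static_assert(ExpectedLane(2) == 0x03FF, "lane 2");
```

Lane 0 is 0x7FFF * 0x7FFF = 0x3FFF0001, whose upper half is 0x3FFF; the failing build returned 0xBFFF instead.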

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16  9:55 UTC
  To: gcc-bugs

--- Comment #1 from Mathieu Malaterre <malat at debian dot org> ---
I can reduce the gtest code to simply:

```
HWY_NOINLINE void TestAllMulHigh() {
  ForPartialVectors<TestMulHigh> test;
  test(int16_t());
//  test(uint16_t());
}
```

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:00 UTC
  To: gcc-bugs

Mathieu Malaterre <malat at debian dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |malat at debian dot org

--- Comment #2 from Mathieu Malaterre <malat at debian dot org> ---
Created attachment 53305
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53305&action=edit
gcc-11 -O2 -save-temps

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:00 UTC
  To: gcc-bugs

--- Comment #3 from Mathieu Malaterre <malat at debian dot org> ---
Created attachment 53306
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53306&action=edit
gcc-12 -O2 -save-temps

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:02 UTC
  To: gcc-bugs

--- Comment #4 from Mathieu Malaterre <malat at debian dot org> ---
gcc-11 version on my side is:

 % gcc-11 --version
gcc-11 (Debian 11.3.0-4) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


gcc-12 version is:

% gcc-12 --version
gcc-12 (Debian 12.1.0-5) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Everything was executed on the public porterbox barriere.debian.org (Debian sid
chroot).

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:02 UTC
  To: gcc-bugs

--- Comment #5 from Mathieu Malaterre <malat at debian dot org> ---
Compilation line for -save-temps is:

% /usr/bin/g++-12 -DHWY_STATIC_DEFINE \
  -I"/home/malat/highway-0.17.1~git20220711.f0a396a" -O2 -fPIE \
  -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined \
  -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" \
  -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla \
  -Wnon-virtual-dtor -fmath-errno -fno-exceptions -DHWY_IS_TEST=1 \
  -DGTEST_HAS_PTHREAD=1 -MD -MT CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o \
  -MF CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o.d -o \
  CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -c \
  "/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc" \
  -save-temps

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:07 UTC
  To: gcc-bugs

--- Comment #6 from Mathieu Malaterre <malat at debian dot org> ---
Adding `-fno-strict-aliasing` does not help; the symptoms are the same:

/usr/bin/g++-12 -DHWY_STATIC_DEFINE \
  -I"/home/malat/highway-0.17.1~git20220711.f0a396a" -O2 -fno-strict-aliasing \
  -fPIE -fvisibility=hidden -fvisibility-inlines-hidden \
  -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" \
  -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants \
  -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor \
  -fmath-errno -fno-exceptions -DHWY_IS_TEST=1 -DGTEST_HAS_PTHREAD=1 -MD -MT \
  CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -MF \
  CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o.d -o \
  CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -c \
  "/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc"
[...]
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128


i16x8 expect [0+ ->]:
  0x3FFF,0x0FFF,0x03FF,0x00FF,0x003F,0x000F,0x0003,
i16x8 actual [0+ ->]:
  0xBFFF,0x0FFF,0xE400,0x00FF,0xF840,0x000F,0xFE04,
Abort at
/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc:131:
Emu128, i16x8 lane 0 mismatch: expected '0x3FFF', got '0xBFFF'.

* [Bug c++/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-16 10:15 UTC
  To: gcc-bugs

--- Comment #7 from Mathieu Malaterre <malat at debian dot org> ---
I can also make it fail with -fsanitize=address:

/usr/bin/g++-12 -DHWY_STATIC_DEFINE \
  -I"/home/malat/highway-0.17.1~git20220711.f0a396a" -O2 -fsanitize=address -fPIE \
  -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined \
  -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" \
  -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla \
  -Wnon-virtual-dtor -fmath-errno -fno-exceptions -DHWY_IS_TEST=1 \
  -DGTEST_HAS_PTHREAD=1 -MD -MT CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o \
  -MF CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o.d -o \
  CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -c \
  "/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc"
[...]
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128


i16x2 expect [0+ ->]:
  0x3FFF,0x0FFF,
i16x2 actual [0+ ->]:
  0xBFFF,0x0FFF,
Abort at
/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc:131:
Emu128, i16x2 lane 0 mismatch: expected '0x3FFF', got '0xBFFF'.

* [Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: pinskia at gcc dot gnu.org @ 2022-07-17 20:20 UTC
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i386-linux-gnu
          Component|c++                         |target

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Does -fwrapv fix the issue?
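Background for this question: in C++, `uint16_t` operands are promoted to `int` before a multiplication, so `0xFFFF * 0xFFFF` overflows a 32-bit `int`, which is undefined behaviour unless -fwrapv makes signed overflow wrap. The emulated MulHigh avoids this by casting to `uint32_t` first. A compile-time sketch, assuming a 32-bit `int` (as on i386); `WideProduct` is a hypothetical helper:

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Integer promotion: both uint16_t operands are converted to int before the
// multiplication (int can represent every uint16_t value on this target).
static_assert(std::is_same<decltype(uint16_t{1} * uint16_t{1}), int>::value,
              "uint16_t * uint16_t is computed in int");

// 0xFFFF * 0xFFFF = 0xFFFE0001 exceeds INT32_MAX, so the naive product
// overflows a 32-bit int (undefined behaviour unless -fwrapv is used).
static_assert(0xFFFFLL * 0xFFFFLL > std::numeric_limits<int32_t>::max(),
              "naive product does not fit in int32_t");

// Hypothetical helper: widening to uint32_t first keeps the product defined.
constexpr uint32_t WideProduct(uint16_t a, uint16_t b) {
  return static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
}
static_assert(WideProduct(0xFFFF, 0xFFFF) == 0xFFFE0001u, "full product");
```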

* [Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: marxin at gcc dot gnu.org @ 2022-07-18  8:48 UTC
  To: gcc-bugs

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2022-07-18
                 CC|                            |marxin at gcc dot gnu.org

* [Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-07-18 14:40 UTC
  To: gcc-bugs

--- Comment #9 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Andrew Pinski from comment #8)
> Does -fwrapv fix the issue?

No. The symptoms look exactly the same:

% ./tests/mul_test "--gtest_filter=HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128"
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128


i16x8 expect [0+ ->]:
  0x3FFF,0x0FFF,0x03FF,0x00FF,0x003F,0x000F,0x0003,
i16x8 actual [0+ ->]:
  0xBFFF,0x0FFF,0xE400,0x00FF,0xF840,0x000F,0xFE04,
Abort at
/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc:131:
Emu128, i16x8 lane 0 mismatch: expected '0x3FFF', got '0xBFFF'.


I can also run only the `uint16_t` portion of the unit test and get a failure
(so the behavior seems consistent with the signed counterpart):

```
HWY_NOINLINE void TestAllMulHigh() {
  ForPartialVectors<TestMulHigh> test;
//  test(int16_t());
  test(uint16_t());
}
```

And then:


```
% ./tests/mul_test "--gtest_filter=HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128"
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulHigh/Emu128


u16x8 expect [0+ ->]:
  0xFFFE,0x3FFF,0x0FFF,0x03FF,0x00FF,0x003F,0x000F,
u16x8 actual [0+ ->]:
  0xFFFF,0x3FFF,0xD000,0x03FF,0xF100,0x003F,0xFC10,
Abort at
/home/malat/highway-0.17.1~git20220711.f0a396a/hwy/tests/mul_test.cc:131:
Emu128, u16x8 lane 0 mismatch: expected '0xFFFE', got '0xFFFF'.
```

* [Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: ubizjak at gmail dot com @ 2022-07-19  7:58 UTC
  To: gcc-bugs

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Mathieu Malaterre from comment #9)

> Technically I can also execute the `uint16` portion of the unit test and
> produce a failure (so this seems to be consistent behavior with signed
> counterpart):
> 
> ```
> HWY_NOINLINE void TestAllMulHigh() {
>   ForPartialVectors<TestMulHigh> test;
> //  test(int16_t());
>   test(uint16_t());
> }


As this is a runtime failure, you will have to provide a (minimized) runtime
testcase. I took a quick look at the sources, and it looks to me that the
following procedure can produce a testcase:

Take hwy/tests/mul_test.cc and strip out as many lines as possible. Above the
part that you show there are several tests; please find out which one fails.

As can be seen from the test run, the failure is in the 128-bit emulation
(Emu128) part. These operations are in hwy/ops/emu128-inl.h, specifically:

--cut here--
template <size_t N>
HWY_API Vec128<uint16_t, N> MulHigh(Vec128<uint16_t, N> a,
                                    const Vec128<uint16_t, N> b) {
  for (size_t i = 0; i < N; ++i) {
    // Cast to uint32_t first to prevent overflow. Otherwise the result of
    // uint16_t * uint16_t is in "int" which may overflow. In practice the
    // result is the same but this way it is also defined.
    a.raw[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
        16);
  }
  return a;
}
--cut here--

Put everything together in one file, check that it still fails, and you have a
testcase. If possible, simplify it further; if you can convert it to plain C,
the testcase will be much easier to analyse.

The reason the test fails with gcc-12 is that gcc-12 enabled auto-vectorisation
for -O2. The failure suggests there are some issues with the vectorisation of
the above code, or perhaps with the preparation of test values before the loop.
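A plain scalar reference in the spirit of the emulated code quoted above could serve as the oracle for such a reduced testcase. This is a sketch under stated assumptions, not Highway's implementation: `ReferenceMulHigh` and `TestInput` are hypothetical names, `std::array` stands in for `Vec128`, and the input pattern `0xFFFF >> i` is an assumption that reproduces the expected row of the failing uint16_t run in comment #9. It needs C++17 for the `constexpr` loops:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: each lane becomes the upper 16 bits of the
// 32-bit product a[i] * b[i], widening to uint32_t to avoid int overflow.
template <size_t N>
constexpr std::array<uint16_t, N> ReferenceMulHigh(
    std::array<uint16_t, N> a, const std::array<uint16_t, N>& b) {
  for (size_t i = 0; i < N; ++i) {
    a[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i])) >> 16);
  }
  return a;
}

// Assumed input pattern: lane i holds 0xFFFF >> i.
template <size_t N>
constexpr std::array<uint16_t, N> TestInput() {
  std::array<uint16_t, N> in{};
  for (size_t i = 0; i < N; ++i) {
    in[i] = static_cast<uint16_t>(0xFFFFu >> i);
  }
  return in;
}

constexpr std::array<uint16_t, 8> kIn = TestInput<8>();
constexpr std::array<uint16_t, 8> kOut = ReferenceMulHigh(kIn, kIn);

// 0xFFFF * 0xFFFF = 0xFFFE0001, so lane 0 must be 0xFFFE (the failing run
// reported 0xFFFF).
static_assert(kOut[0] == 0xFFFE, "lane 0");
static_assert(kOut[1] == 0x3FFF, "lane 1");
```

With eight lanes this yields 0xFFFE,0x3FFF,0x0FFF,0x03FF,0x00FF,0x003F,0x000F,... which matches the "expect" line of that run.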

* [Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
From: malat at debian dot org @ 2022-08-03  8:41 UTC
  To: gcc-bugs

--- Comment #11 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Uroš Bizjak from comment #10)
> The reason the test fails with gcc-12 is that gcc-12 enabled
> auto-vectorisation for -O2.

I can make the symptoms go away with `-O2 -fno-tree-vectorize`. Since this
also affects arm5 and powerpc, the bug seems to be somewhere in code shared by
the 32-bit targets (for some reason it does not affect 64-bit architectures).

* [Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-03 12:31 UTC
  To: gcc-bugs

--- Comment #12 from Mathieu Malaterre <malat at debian dot org> ---
Created attachment 53406
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53406&action=edit
main function with no-tree-optimize attribute

* [Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-03 12:32 UTC
  To: gcc-bugs

--- Comment #13 from Mathieu Malaterre <malat at debian dot org> ---
Created attachment 53407
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53407&action=edit
main function with no-tree-optimize attribute

* [Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-03 12:33 UTC
  To: gcc-bugs

--- Comment #14 from Mathieu Malaterre <malat at debian dot org> ---
I can make the symptom go away with a single function attribute:

```
% diff -u *
--- /tmp/ii/mul_test.cc.ii.bad  2022-08-03 12:29:41.192263306 +0000
+++ /tmp/ii/mul_test.cc.ii.good 2022-08-03 12:29:41.196263281 +0000
@@ -124932,7 +124932,7 @@
    }
   template <typename T, class D>
   __attribute__((noinline)) void
-
+  __attribute__((optimize("no-tree-vectorize")))
   operator()(T , D d) {

     const size_t N = Lanes(d);
```
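The same per-function opt-out can be wrapped in a small macro around a plain loop while the vectorizer issue is investigated. A sketch only: `NO_TREE_VECTORIZE` and `MulHighNoVec` are hypothetical names, the attribute is GCC-specific (the macro expands to nothing elsewhere), and GCC's documentation describes the `optimize` attribute as intended for debugging rather than production code. The function is `constexpr` only so that its arithmetic can be checked at compile time; that check does not exercise the generated code:

```cpp
#include <cstddef>
#include <cstdint>

// GCC-specific: disable the tree vectorizer for one function only.
// On other compilers the macro expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE \
  __attribute__((noinline, optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

// In-place MulHigh over n lanes: a[i] = upper 16 bits of a[i] * b[i].
NO_TREE_VECTORIZE constexpr void MulHighNoVec(uint16_t* a, const uint16_t* b,
                                              size_t n) {
  for (size_t i = 0; i < n; ++i) {
    a[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i])) >> 16);
  }
}

// Compile-time check on the two lanes used in the reduced uint16_t testcase.
constexpr uint16_t FirstLane() {
  uint16_t a[2] = {0xFFFF, 0x7FFF};
  const uint16_t b[2] = {0xFFFF, 0x7FFF};
  MulHighNoVec(a, b, 2);
  return a[0];
}
static_assert(FirstLane() == 0xFFFE, "0xFFFF * 0xFFFF >> 16");
```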

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-05 13:14 UTC
  To: gcc-bugs

--- Comment #15 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Mathieu Malaterre from comment #11)
> (In reply to Uroš Bizjak from comment #10)
> > The reason the test fails with gcc-12 is that gcc-12 enabled
> > auto-vectorisation for -O2.
> 
> I can make the symptoms go away by doing: `-O2 -fno-tree-vectorize`. Since
> this affects also arm5 and powerpc, it seems the bug is somewhere in the
> shared 32bits code (bug does not affects 64bits arch for some reason).

The above is incorrect: the failure also shows up on mips64el and ppc64, where
the symptoms likewise go away with -fno-tree-vectorize.

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-08  7:12 UTC
  To: gcc-bugs

--- Comment #16 from Mathieu Malaterre <malat at debian dot org> ---
Simplified version:

% cat demo.cc
#include "hwy/highway.h"
#include "hwy/tests/test_util-inl.h"


struct TestMulHigh {
  template <typename T, class D>
  void operator()(T /*unused*/, D d) {
    const size_t N = 2;
    hwy::AlignedFreeUniquePtr<short unsigned int []> in_lanes =
        hwy::AllocateAligned<T>(N);
    uint16_t expected_lanes[2];

    for (size_t i = 0; i < N; ++i) {
      in_lanes[i] = T(hwy::LimitsMax<T>() >> i);
    }
    expected_lanes[0] = 65534;
    expected_lanes[1] = 16383;
    hwy::N_EMU128::Vec128<uint16_t, 2> v = Load(d, in_lanes.get());
    HWY_ASSERT_VEC_EQ(d, in_lanes.get(), v);
    hwy::N_EMU128::Vec128<uint16_t, 2> actual = MulHigh(v, v);

    HWY_ASSERT_VEC_EQ(d, expected_lanes, actual);
  }
};

int main()
{
  TestMulHigh()(uint16_t(), hwy::N_EMU128::FixedTag<uint16_t, 2>());
}


Working:

% g++ -O2 -fno-tree-vectorize -o demo demo.cc -lhwy -lhwy_test

Not working:

% g++ -O2 -o demo demo.cc -lhwy -lhwy_test

With:

% apt-cache policy libhwy-dev
libhwy-dev:
  Installed: 1.0.0-5
  Candidate: 1.0.0-5
  Version table:
 *** 1.0.0-5 500
        500 http://deb.debian.org/debian sid/main i386 Packages
        100 /var/lib/dpkg/status

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-08  7:20 UTC
  To: gcc-bugs

--- Comment #17 from Mathieu Malaterre <malat at debian dot org> ---
Created attachment 53424
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53424&action=edit
g++ -save-temps -O2 -o demo demo.cc -lhwy -lhwy_test

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-08 10:00 UTC
  To: gcc-bugs

--- Comment #18 from Mathieu Malaterre <malat at debian dot org> ---
Brushed-up example (with Makefile):

% more Makefile bytes.cc demo.cc
::::::::::::::
Makefile
::::::::::::::
CXXFLAGS := -O2

demo: demo.o bytes.o
        $(CXX) $(CXXFLAGS) -o $@ $^ -lhwy

clean:
        rm -f bytes.o demo.o
::::::::::::::
bytes.cc
::::::::::::::
#include <cstring>

bool BytesEqual2(const void *bytes1, const void *bytes2, const size_t size) {
  return memcmp(bytes1, bytes2, size) == 0;
}
::::::::::::::
demo.cc
::::::::::::::
#include "hwy/aligned_allocator.h"
#include "hwy/highway.h"

#include <cstring>

bool BytesEqual2(const void *p1, const void *p2, const size_t size);

template <class D, class V>
void AssertVecEqual2(D d, const uint16_t *expected, const V &actual) {
  const size_t N = 2;
  auto actual_lanes = hwy::AllocateAligned<uint16_t>(N);
  Store(actual, d, actual_lanes.get());
  const uint8_t *expected_array = reinterpret_cast<const uint8_t *>(expected);
  const uint8_t *actual_array =
      reinterpret_cast<const uint8_t *>(actual_lanes.get());
  for (size_t i = 0; i < N; ++i) {
    const uint8_t *expected_ptr = expected_array + i * 2;
    const uint8_t *actual_ptr = actual_array + i * 2;
#if 1
    // trigger bug
    if (!BytesEqual2(expected_ptr, actual_ptr, 2)) {
#else
    // no bug
    if (std::memcmp(expected_ptr, actual_ptr, 2) != 0) {
#endif
      abort();
    }
  }
}

int main() {
  hwy::N_EMU128::FixedTag<uint16_t, 2> d;
  const size_t N = 2;
  hwy::AlignedFreeUniquePtr<uint16_t[]> in_lanes =
      hwy::AllocateAligned<uint16_t>(N);
  uint16_t expected_lanes[2];
  in_lanes[0] = 65535;
  in_lanes[1] = 32767;
  expected_lanes[0] = 65534;
  expected_lanes[1] = 16383;
  hwy::N_EMU128::Vec128<uint16_t, 2> v = Load(d, in_lanes.get());
  hwy::N_EMU128::Vec128<uint16_t, 2> actual = MulHigh(v, v);
  AssertVecEqual2(d, expected_lanes, actual);
}

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
From: malat at debian dot org @ 2022-08-09  7:50 UTC
  To: gcc-bugs

--- Comment #19 from Mathieu Malaterre <malat at debian dot org> ---
Without hwy dependency:

 % more Makefile bytes.cc demo.cc
::::::::::::::
Makefile
::::::::::::::
CXXFLAGS := -O2

demo: demo.o bytes.o
        $(CXX) $(CXXFLAGS) -o $@ $^

clean:
        rm -f bytes.o demo.o
::::::::::::::
bytes.cc
::::::::::::::
#include <cstring>

bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
  return memcmp(bytes1, bytes2, size) == 0;
}
::::::::::::::
demo.cc
::::::::::::::
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <memory>

#define HWY_ALIGNMENT 64
constexpr size_t kAlignment = HWY_ALIGNMENT;
constexpr size_t kAlias = kAlignment * 4;

bool BytesEqual(const void *p1, const void *p2, const size_t size);

namespace hwy {
namespace N_EMU128 {
template <typename T, size_t N = 16 / sizeof(T)> struct Vec128 {
  T raw[16 / sizeof(T)] = {};
};
} // namespace N_EMU128
} // namespace hwy

template <typename T, size_t N>
static void Store(const hwy::N_EMU128::Vec128<T, N> v,
                  T *__restrict__ aligned) {
  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
}

template <typename T, size_t N>
static hwy::N_EMU128::Vec128<T, N> Load(const T *__restrict__ aligned) {
  hwy::N_EMU128::Vec128<T, N> v;
  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
  return v;
}

template <size_t N>
static hwy::N_EMU128::Vec128<uint16_t, N>
MulHigh(hwy::N_EMU128::Vec128<uint16_t, N> a,
        const hwy::N_EMU128::Vec128<uint16_t, N> b) {
  for (size_t i = 0; i < N; ++i) {
    // Cast to uint32_t first to prevent overflow. Otherwise the result of
    // uint16_t * uint16_t is in "int" which may overflow. In practice the
    // result is the same but this way it is also defined.
    a.raw[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
        16);
  }
  return a;
}

#define HWY_ASSERT(condition) assert((condition))
#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align))

#pragma pack(push, 1)
struct AllocationHeader {
  void *allocated;
  size_t payload_size;
};
#pragma pack(pop)

static void FreeAlignedBytes(const void *aligned_pointer) {
  HWY_ASSERT(aligned_pointer != nullptr);
  if (aligned_pointer == nullptr)
    return;

  const uintptr_t payload = reinterpret_cast<uintptr_t>(aligned_pointer);
  HWY_ASSERT(payload % kAlignment == 0);
  const AllocationHeader *header =
      reinterpret_cast<const AllocationHeader *>(payload) - 1;

  free(header->allocated);
}

class AlignedFreer {
public:
  template <typename T> void operator()(T *aligned_pointer) const {
    FreeAlignedBytes(aligned_pointer);
  }
};

template <typename T>
using AlignedFreeUniquePtr = std::unique_ptr<T, AlignedFreer>;

static inline constexpr size_t ShiftCount(size_t n) {
  return (n <= 1) ? 0 : 1 + ShiftCount(n / 2);
}

namespace {
static size_t NextAlignedOffset() {
  static std::atomic<uint32_t> next{0};
  constexpr uint32_t kGroups = kAlias / kAlignment;
  const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) % kGroups;
  const size_t offset = kAlignment * group;
  HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias);
  //  std::cerr << "O: " << offset << std::endl;
  return offset;
}
} // namespace

static void *AllocateAlignedBytes(const size_t payload_size) {
  HWY_ASSERT(payload_size != 0); // likely a bug in caller
  if (payload_size >= std::numeric_limits<size_t>::max() / 2) {
    HWY_ASSERT(false && "payload_size too large");
    return nullptr;
  }

  size_t offset = NextAlignedOffset();

  // What: | misalign | unused | AllocationHeader |payload
  // Size: |<= kAlias | offset                    |payload_size
  //       ^allocated.^aligned.^header............^payload
  // The header must immediately precede payload, which must remain aligned.
  // To avoid wasting space, the header resides at the end of `unused`,
  // which therefore cannot be empty (offset == 0).
  if (offset == 0) {
    offset = kAlignment; // = RoundUpTo(sizeof(AllocationHeader), kAlignment)
    static_assert(sizeof(AllocationHeader) <= kAlignment, "Else: round up");
  }

  const size_t allocated_size = kAlias + offset + payload_size;
  void *allocated = malloc(allocated_size);
  HWY_ASSERT(allocated != nullptr);
  if (allocated == nullptr)
    return nullptr;
  // Always round up even if already aligned - we already asked for kAlias
  // extra bytes and there's no way to give them back.
  uintptr_t aligned = reinterpret_cast<uintptr_t>(allocated) + kAlias;
  static_assert((kAlias & (kAlias - 1)) == 0, "kAlias must be a power of 2");
  static_assert(kAlias >= kAlignment, "Cannot align to more than kAlias");
  aligned &= ~(kAlias - 1);

  const uintptr_t payload = aligned + offset; // still aligned

  // Stash `allocated` and payload_size inside header for FreeAlignedBytes().
  // The allocated_size can be reconstructed from the payload_size.
  AllocationHeader *header = reinterpret_cast<AllocationHeader *>(payload) - 1;
  header->allocated = allocated;
  header->payload_size = payload_size;

  //printf("%d-byte aligned addr: %p\n", kAlignment, reinterpret_cast<void*>(payload));
  return HWY_ASSUME_ALIGNED(reinterpret_cast<void *>(payload), kAlignment);
}

template <typename T> static T *AllocateAlignedItems(size_t items) {
  constexpr size_t size = sizeof(T);

  constexpr bool is_pow2 = (size & (size - 1)) == 0;
  constexpr size_t bits = ShiftCount(size);
  static_assert(!is_pow2 || (1ull << bits) == size, "ShiftCount is incorrect");

  const size_t bytes = is_pow2 ? items << bits : items * size;
  const size_t check = is_pow2 ? bytes >> bits : bytes / size;
  if (check != items) {
    return nullptr; // overflowed
  }
  return static_cast<T *>(AllocateAlignedBytes(bytes));
}

template <typename T>
static AlignedFreeUniquePtr<T[]> AllocateAligned(const size_t items) {
  return AlignedFreeUniquePtr<T[]>(AllocateAlignedItems<T>(items),
                                   AlignedFreer());
}

int main() {
  AlignedFreeUniquePtr<uint16_t[]> in_lanes = AllocateAligned<uint16_t>(2);
  uint16_t expected_lanes[2];
  in_lanes[0] = 65535;
  in_lanes[1] = 32767;
  expected_lanes[0] = 65534;
  expected_lanes[1] = 16383;
  hwy::N_EMU128::Vec128<uint16_t, 2> v = Load<uint16_t, 2>(in_lanes.get());
  hwy::N_EMU128::Vec128<uint16_t, 2> actual = MulHigh(v, v);
  {
    auto actual_lanes = AllocateAligned<uint16_t>(2);
    Store(actual, actual_lanes.get());
    const uint8_t *expected_array =
        reinterpret_cast<const uint8_t *>(expected_lanes);
    const uint8_t *actual_array =
        reinterpret_cast<const uint8_t *>(actual_lanes.get());
    for (size_t i = 0; i < 2; ++i) {
      const uint8_t *expected_ptr = expected_array + i * 2;
      const uint8_t *actual_ptr = actual_array + i * 2;
#if 1
      // trigger bug
      if (!BytesEqual(expected_ptr, actual_ptr, 2)) {
#else
      // no bug
      if (std::memcmp(expected_ptr, actual_ptr, 2) != 0) {
#endif
        abort();
      }
    }
  }
}
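
The comment in MulHigh about widening to uint32_t can be checked in isolation. The sketch below is a hypothetical scalar reference for a single lane (the name MulHighScalar is not from highway). It keeps the upper 16 bits of the 32-bit product and, as compile-time checks, reproduces the expected_lanes values used in main(). Multiplying the two uint16_t values directly would promote them to int, and 65535 * 65535 = 4294836225 does not fit in a 32-bit int.

```cpp
#include <cstdint>

// Hypothetical scalar reference for one MulHigh lane: the upper 16 bits of
// the full 32-bit product of two uint16_t values. Both operands are widened
// to uint32_t before multiplying, so the product is computed in unsigned
// arithmetic and cannot overflow (0xFFFF * 0xFFFF < 2^32).
constexpr std::uint16_t MulHighScalar(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(
      (static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b)) >> 16);
}

// The inputs and expected outputs from main() in demo.cc.
static_assert(MulHighScalar(65535, 65535) == 65534, "lane 0");
static_assert(MulHighScalar(32767, 32767) == 16383, "lane 1");
```

Since main() squares each lane (MulHigh(v, v)), the correct results are 65534 and 16383, which are exactly the values stored in expected_lanes.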

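The pointer arithmetic behind the layout diagram in AllocateAlignedBytes can be traced with plain integers. This is a sketch only: PayloadAddress is a hypothetical helper that repeats the computation from demo.cc (kAlignment = 64, kAlias = 256) on a made-up address instead of a real malloc() result.

```cpp
#include <cstddef>
#include <cstdint>

// Constants as in demo.cc (HWY_ALIGNMENT is 64).
constexpr std::size_t kAlignment = 64;
constexpr std::size_t kAlias = kAlignment * 4;  // 256, a power of two

// Hypothetical helper repeating the address math of AllocateAlignedBytes:
// step kAlias bytes past `allocated`, round down to a multiple of kAlias,
// then add the offset. An offset of 0 is bumped to kAlignment so that the
// AllocationHeader fits immediately before the payload.
constexpr std::uintptr_t PayloadAddress(std::uintptr_t allocated,
                                        std::size_t offset) {
  return ((allocated + kAlias) & ~static_cast<std::uintptr_t>(kAlias - 1)) +
         (offset == 0 ? kAlignment : offset);
}

// 0x1008 + 0x100 = 0x1108, rounded down to 0x1100, plus 64 = 0x1140.
static_assert(PayloadAddress(0x1008, 0) == 0x1140, "offset 0 -> kAlignment");
// An already-aligned pointer still skips a full kAlias ("always round up").
static_assert(PayloadAddress(0x1000, 64) == 0x1140, "aligned input");
static_assert(PayloadAddress(0x1008, 128) % kAlignment == 0, "still aligned");
```

In demo.cc the offset comes from NextAlignedOffset(), which cycles through 0, 64, 128 and 192 (kAlias / kAlignment = 4 groups), with 0 then replaced by kAlignment as shown above.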
* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (19 preceding siblings ...)
  2022-08-09  7:50 ` malat at debian dot org
@ 2022-08-09 12:36 ` marxin at gcc dot gnu.org
  2022-08-09 12:58 ` malat at debian dot org
                   ` (33 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-08-09 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #20 from Martin Liška <marxin at gcc dot gnu.org> ---
Hmm, can't reproduce with x86_64 compiler with -m32:

$ g++ --version
g++ (SUSE Linux) 12.1.1 20220721 [revision
4f15d2234608e82159d030dadb17af678cfad626
...
$ g++ *.cc -O2 -m32 && ./a.out && echo Ok
Ok

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (20 preceding siblings ...)
  2022-08-09 12:36 ` marxin at gcc dot gnu.org
@ 2022-08-09 12:58 ` malat at debian dot org
  2022-08-09 13:00 ` ubizjak at gmail dot com
                   ` (32 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 12:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #21 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Martin Liška from comment #20)
> Hmm, can't reproduce with x86_64 compiler with -m32:
> 
> $ g++ --version
> g++ (SUSE Linux) 12.1.1 20220721 [revision
> 4f15d2234608e82159d030dadb17af678cfad626
> ...
> $ g++ *.cc -O2 -m32 && ./a.out && echo Ok
> Ok

I see the same behavior here. However, my x86 binary still produces the
expected 'abort' when run on my multi-arch amd64 system.

There is no point in attaching the *.o files here, right? A quick check seems to
indicate that the issue is:

schroot-32 $ g++ -O2 -c -o demo.o demo.cc
schroot-32 $ <ctrl+d>
amd64 $ g++ -O2 -m32 -c -o bytes.o bytes.cc
amd64 $ g++ -O2 -m32 -o demo demo.o bytes.o
amd64 $ ./demo
zsh: abort      ./demo

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (21 preceding siblings ...)
  2022-08-09 12:58 ` malat at debian dot org
@ 2022-08-09 13:00 ` ubizjak at gmail dot com
  2022-08-09 13:03 ` malat at debian dot org
                   ` (31 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: ubizjak at gmail dot com @ 2022-08-09 13:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #22 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Martin Liška from comment #20)
> Hmm, can't reproduce with x86_64 compiler with -m32:
> 
> $ g++ --version
> g++ (SUSE Linux) 12.1.1 20220721 [revision
> 4f15d2234608e82159d030dadb17af678cfad626
> ...
> $ g++ *.cc -O2 -m32 && ./a.out && echo Ok
> Ok

Do you need -msse2 to actually enable vectorization?
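
For a given command line, whether SSE2 is enabled can be checked through GCC's predefined __SSE2__ macro (g++ -dM -E -x c++ /dev/null lists all predefined macros for a set of flags). A minimal sketch; the function name is made up for illustration:

```cpp
// Hypothetical helper: true when the translation unit is compiled with SSE2
// enabled. GCC predefines __SSE2__ in that case; in 64-bit mode it is
// normally set, while with -m32 it depends on -march (i686 lacks SSE2) and
// on -msse2.
constexpr bool CompiledWithSse2() {
#ifdef __SSE2__
  return true;
#else
  return false;
#endif
}
```

The Debian compiler whose configuration appears in comment #25 is built with --with-arch-32=i686, so plain -m32 would be expected to report false there unless -msse2 (or a newer -march) is added.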

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (22 preceding siblings ...)
  2022-08-09 13:00 ` ubizjak at gmail dot com
@ 2022-08-09 13:03 ` malat at debian dot org
  2022-08-09 13:04 ` marxin at gcc dot gnu.org
                   ` (30 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #23 from Mathieu Malaterre <malat at debian dot org> ---
Never mind; I can reproduce the issue with a sid/amd64 chroot:

stable64 % schroot -c sid64
sid64 % g++ --version
g++ (Debian 12.1.0-7) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


sid64 %  g++ *.cc -O2 -m32 && ./a.out
zsh: IOT instruction  ./a.out

I'll report it against the Debian bug tracker for now.

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (23 preceding siblings ...)
  2022-08-09 13:03 ` malat at debian dot org
@ 2022-08-09 13:04 ` marxin at gcc dot gnu.org
  2022-08-09 13:05 ` malat at debian dot org
                   ` (29 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-08-09 13:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #24 from Martin Liška <marxin at gcc dot gnu.org> ---
> sid64 %  g++ *.cc -O2 -m32 && ./a.out

Please provide output with --verbose.

* [Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (24 preceding siblings ...)
  2022-08-09 13:04 ` marxin at gcc dot gnu.org
@ 2022-08-09 13:05 ` malat at debian dot org
  2022-08-09 13:11 ` [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5 marxin at gcc dot gnu.org
                   ` (28 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #25 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Martin Liška from comment #24)
> > sid64 %  g++ *.cc -O2 -m32 && ./a.out
> 
> Please provide output with --verbose.

% g++ --verbose *.cc -O2 -m32 && ./a.out
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Debian 12.1.0-7)
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch
i386-linux-gnu -D_GNU_SOURCE bytes.cc -quiet -dumpdir a- -dumpbase bytes.cc
-dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version
-fasynchronous-unwind-tables -o /tmp/cccQJh1u.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i386-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/x86_64-linux-gnu/c++/12/32
 /usr/include/c++/12/backward
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 8a56007e6299a53b3d2bb12e46ecf480
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 as -v --32 -o /tmp/ccG1Wx1X.o /tmp/cccQJh1u.s
GNU assembler version 2.38.90 (x86_64-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch
i386-linux-gnu -D_GNU_SOURCE demo.cc -quiet -dumpdir a- -dumpbase demo.cc
-dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version
-fasynchronous-unwind-tables -o /tmp/cccQJh1u.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i386-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/x86_64-linux-gnu/c++/12/32
 /usr/include/c++/12/backward
 /usr/lib/gcc/x86_64-linux-gnu/12/include
 /usr/local/include
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 8a56007e6299a53b3d2bb12e46ecf480
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a-'
 as -v --32 -o /tmp/ccXuCoem.o /tmp/cccQJh1u.s
GNU assembler version 2.38.90 (x86_64-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/12/32/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib32/:/lib/../lib32/:/usr/lib/../lib32/:/usr/lib/gcc/x86_64-linux-gnu/12/:/usr/lib/gcc/x86_64-linux-gnu/12/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a.'
 /usr/lib/gcc/x86_64-linux-gnu/12/collect2 -plugin
/usr/lib/gcc/x86_64-linux-gnu/12/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccDtT2bM.res -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --build-id
--eh-frame-hdr -m elf_i386 --hash-style=gnu --as-needed -dynamic-linker
/lib/ld-linux.so.2 -pie
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib32/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib32/crti.o
/usr/lib/gcc/x86_64-linux-gnu/12/32/crtbeginS.o
-L/usr/lib/gcc/x86_64-linux-gnu/12/32
-L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib32 -L/lib/../lib32
-L/usr/lib/../lib32 -L/usr/lib/gcc/x86_64-linux-gnu/12
-L/usr/lib/gcc/x86_64-linux-gnu/12/../../.. /tmp/ccG1Wx1X.o /tmp/ccXuCoem.o
-lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc
/usr/lib/gcc/x86_64-linux-gnu/12/32/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib32/crtn.o
COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic'
'-march=i686' '-dumpdir' 'a.'
zsh: IOT instruction  ./a.out

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (25 preceding siblings ...)
  2022-08-09 13:05 ` malat at debian dot org
@ 2022-08-09 13:11 ` marxin at gcc dot gnu.org
  2022-08-09 13:12 ` marxin at gcc dot gnu.org
                   ` (27 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-08-09 13:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org
            Summary|tree-vectorize: Wrong code  |[12/13 Regression]
                   |at O2 level                 |tree-vectorize: Wrong code
                   |(-fno-tree-vectorize is     |at O2 level
                   |working)                    |(-fno-tree-vectorize is
                   |                            |working) since
                   |                            |r12-2404-ga1d27560770818c5
             Status|WAITING                     |NEW

--- Comment #26 from Martin Liška <marxin at gcc dot gnu.org> ---
Cool! I can reproduce it now with:

$ g++ *.cc -O3 -m32 -mtune=generic -march=i686 && ./a.out
Aborted (core dumped)


and it started with r12-2404-ga1d27560770818c5.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (26 preceding siblings ...)
  2022-08-09 13:11 ` [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5 marxin at gcc dot gnu.org
@ 2022-08-09 13:12 ` marxin at gcc dot gnu.org
  2022-08-09 13:26 ` linkw at gcc dot gnu.org
                   ` (26 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-08-09 13:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #27 from Martin Liška <marxin at gcc dot gnu.org> ---
It also crashes with -fno-strict-aliasing.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (27 preceding siblings ...)
  2022-08-09 13:12 ` marxin at gcc dot gnu.org
@ 2022-08-09 13:26 ` linkw at gcc dot gnu.org
  2022-08-09 13:29 ` marxin at gcc dot gnu.org
                   ` (25 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-09 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |linkw at gcc dot gnu.org

--- Comment #28 from Kewen Lin <linkw at gcc dot gnu.org> ---
Sorry for the breakage, I'll have a look tomorrow.

btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (28 preceding siblings ...)
  2022-08-09 13:26 ` linkw at gcc dot gnu.org
@ 2022-08-09 13:29 ` marxin at gcc dot gnu.org
  2022-08-09 13:30 ` malat at debian dot org
                   ` (24 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-08-09 13:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #29 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #28)
> Sorry for the breakage, I'll have a look tomorrow.
> 
> btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?

No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (29 preceding siblings ...)
  2022-08-09 13:29 ` marxin at gcc dot gnu.org
@ 2022-08-09 13:30 ` malat at debian dot org
  2022-08-09 13:34 ` malat at debian dot org
                   ` (23 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #30 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Martin Liška from comment #29)
> (In reply to Kewen Lin from comment #28)
> > Sorry for the breakage, I'll have a look tomorrow.
> > 
> > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> 
> No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.

I could see unit-test failures of highway on most 32bits arch, as well as
mips64el and ppc64be.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (30 preceding siblings ...)
  2022-08-09 13:30 ` malat at debian dot org
@ 2022-08-09 13:34 ` malat at debian dot org
  2022-08-09 13:40 ` linkw at gcc dot gnu.org
                   ` (22 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #31 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Mathieu Malaterre from comment #30)
> (In reply to Martin Liška from comment #29)
> > (In reply to Kewen Lin from comment #28)
> > > Sorry for the breakage, I'll have a look tomorrow.
> > > 
> > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > 
> > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> 
> I could see unit-test failures of highway on most 32bits arch, as well as
> mips64el and ppc64be.

For reference, the complete list is:

* armel
* i386
* mips64el
* mipsel
* powerpc
* ppc64

See:

* https://buildd.debian.org/status/logs.php?pkg=highway&ver=1.0.1%7Egit20220802.5810c58-3&suite=experimental


(riscv64 is unrelated IMHO).

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (31 preceding siblings ...)
  2022-08-09 13:34 ` malat at debian dot org
@ 2022-08-09 13:40 ` linkw at gcc dot gnu.org
  2022-08-09 13:48 ` rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-09 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #32 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Mathieu Malaterre from comment #30)
> (In reply to Martin Liška from comment #29)
> > (In reply to Kewen Lin from comment #28)
> > > Sorry for the breakage, I'll have a look tomorrow.
> > > 
> > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > 
> > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> 
> I could see unit-test failures of highway on most 32bits arch, as well as
> mips64el and ppc64be.

Thanks to both guys! I'll try with ppc64 32bit first.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (32 preceding siblings ...)
  2022-08-09 13:40 ` linkw at gcc dot gnu.org
@ 2022-08-09 13:48 ` rguenth at gcc dot gnu.org
  2022-08-09 13:53 ` malat at debian dot org
                   ` (20 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-09 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|---                         |12.2
           Keywords|                            |wrong-code

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (33 preceding siblings ...)
  2022-08-09 13:48 ` rguenth at gcc dot gnu.org
@ 2022-08-09 13:53 ` malat at debian dot org
  2022-08-09 13:56 ` malat at debian dot org
                   ` (19 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #33 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Kewen Lin from comment #32)
> (In reply to Mathieu Malaterre from comment #30)
> > (In reply to Martin Liška from comment #29)
> > > (In reply to Kewen Lin from comment #28)
> > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > 
> > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > 
> > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > 
> > I could see unit-test failures of highway on most 32bits arch, as well as
> > mips64el and ppc64be.
> 
> Thanks to both guys! I'll try with ppc64 32bit first.

Note that I reduced the original test case on my local x86/32-bit machine.

It appears that I removed too much code for it to still reproduce the issue on
ppc32/be. Is it OK for you to use the reproducer from the previous comment instead:

* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (34 preceding siblings ...)
  2022-08-09 13:53 ` malat at debian dot org
@ 2022-08-09 13:56 ` malat at debian dot org
  2022-08-09 14:01 ` malat at debian dot org
                   ` (18 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 13:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #34 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Mathieu Malaterre from comment #33)
> (In reply to Kewen Lin from comment #32)
> > (In reply to Mathieu Malaterre from comment #30)
> > > (In reply to Martin Liška from comment #29)
> > > > (In reply to Kewen Lin from comment #28)
> > > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > > 
> > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > > 
> > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > > 
> > > I could see unit-test failures of highway on most 32bits arch, as well as
> > > mips64el and ppc64be.
> > 
> > Thanks to both guys! I'll try with ppc64 32bit first.
> 
> Note that I reduced the original test case on my local x86/32-bit
> machine.
> 
> It appears that I reduced the code too aggressively to reproduce the issue on
> ppc32/be. Is it OK for you to use the reproducer from a previous comment instead:
> 
> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

It appears this one was also reduced too aggressively to reproduce the issue on
ppc32/be.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (35 preceding siblings ...)
  2022-08-09 13:56 ` malat at debian dot org
@ 2022-08-09 14:01 ` malat at debian dot org
  2022-08-09 15:28 ` pinskia at gcc dot gnu.org
                   ` (17 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: malat at debian dot org @ 2022-08-09 14:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #35 from Mathieu Malaterre <malat at debian dot org> ---
(In reply to Mathieu Malaterre from comment #33)
> (In reply to Kewen Lin from comment #32)
> > (In reply to Mathieu Malaterre from comment #30)
> > > (In reply to Martin Liška from comment #29)
> > > > (In reply to Kewen Lin from comment #28)
> > > > > Sorry for the breakage, I'll have a look tomorrow.
> > > > > 
> > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well?
> > > > 
> > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
> > > 
> > > I could see unit-test failures of highway on most 32bits arch, as well as
> > > mips64el and ppc64be.
> > 
> > Thanks to both guys! I'll try with ppc64 32bit first.
> 
> Note that I reduced the original test case on my local x86/32-bit
> machine.
> 
> It appears that I reduced the code too aggressively to reproduce the issue on
> ppc32/be. Is it OK for you to use the reproducer from a previous comment instead:
> 
> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16

Never mind; I was using gcc-11.

I can reproduce the issue on ppc32/be using the (somewhat) reduced example:

* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c19

For reference:

% g++ -O2 -fno-tree-vectorize *.cc && ./a.out && echo "ok"
ok

But:

% g++ --verbose -O2 *.cc && ./a.out && echo "ok"
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-linux-gnu/12/lto-wrapper
Target: powerpc-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=powerpc-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --with-libphobos-druntime-only=yes
--enable-objc-gc=auto --enable-secureplt --disable-softfloat
--with-cpu=default32 --disable-softfloat
--enable-targets=powerpc-linux,powerpc64-linux --enable-multiarch
--disable-werror --with-long-double-128 --enable-multilib
--enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu
--target=powerpc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Debian 12.1.0-7)
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch
powerpc-linux-gnu -D_GNU_SOURCE bytes.cc -msecure-plt -quiet -dumpdir a-
-dumpbase bytes.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/include/powerpc-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/powerpc-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/../../../../powerpc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/powerpc-linux-gnu/c++/12
 /usr/include/c++/12/backward
 /usr/lib/gcc/powerpc-linux-gnu/12/include
 /usr/local/include
 /usr/include/powerpc-linux-gnu
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 56cdbc606649bdc6108da73e5dd1af6f
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 as -v -a32 -K PIC -mppc -many -mbig -o /tmp/ccKx6rlb.o /tmp/ccXa9nGd.s
GNU assembler version 2.38.90 (powerpc-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch
powerpc-linux-gnu -D_GNU_SOURCE demo.cc -msecure-plt -quiet -dumpdir a-
-dumpbase demo.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/include/powerpc-linux-gnu/c++/12"
ignoring nonexistent directory "/usr/local/include/powerpc-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc-linux-gnu/12/../../../../powerpc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/12
 /usr/include/powerpc-linux-gnu/c++/12
 /usr/include/c++/12/backward
 /usr/lib/gcc/powerpc-linux-gnu/12/include
 /usr/local/include
 /usr/include/powerpc-linux-gnu
 /usr/include
End of search list.
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 56cdbc606649bdc6108da73e5dd1af6f
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-'
 as -v -a32 -K PIC -mppc -many -mbig -o /tmp/ccTxltFy.o /tmp/ccXa9nGd.s
GNU assembler version 2.38.90 (powerpc-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.38.90.20220713
COMPILER_PATH=/usr/lib/gcc/powerpc-linux-gnu/12/:/usr/lib/gcc/powerpc-linux-gnu/12/:/usr/lib/gcc/powerpc-linux-gnu/:/usr/lib/gcc/powerpc-linux-gnu/12/:/usr/lib/gcc/powerpc-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/powerpc-linux-gnu/12/:/usr/lib/gcc/powerpc-linux-gnu/12/../../../powerpc-linux-gnu/:/usr/lib/gcc/powerpc-linux-gnu/12/../../../../lib/:/lib/powerpc-linux-gnu/:/lib/../lib/:/usr/lib/powerpc-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/powerpc-linux-gnu/12/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a.'
 /usr/lib/gcc/powerpc-linux-gnu/12/collect2 -plugin
/usr/lib/gcc/powerpc-linux-gnu/12/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/powerpc-linux-gnu/12/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccfJV0ms.res -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --build-id
--eh-frame-hdr -V --secure-plt -m elf32ppclinux --hash-style=gnu --as-needed
-dynamic-linker /lib/ld.so.1 -pie
/usr/lib/gcc/powerpc-linux-gnu/12/../../../powerpc-linux-gnu/Scrt1.o
/usr/lib/gcc/powerpc-linux-gnu/12/../../../powerpc-linux-gnu/crti.o
/usr/lib/gcc/powerpc-linux-gnu/12/crtbeginS.o
-L/usr/lib/gcc/powerpc-linux-gnu/12
-L/usr/lib/gcc/powerpc-linux-gnu/12/../../../powerpc-linux-gnu
-L/usr/lib/gcc/powerpc-linux-gnu/12/../../../../lib -L/lib/powerpc-linux-gnu
-L/lib/../lib -L/usr/lib/powerpc-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc/powerpc-linux-gnu/12/../../.. /tmp/ccKx6rlb.o /tmp/ccTxltFy.o
-lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc
/usr/lib/gcc/powerpc-linux-gnu/12/crtendS.o
/usr/lib/gcc/powerpc-linux-gnu/12/../../../powerpc-linux-gnu/crtn.o
GNU ld (GNU Binutils for Debian) 2.38.90.20220713
  Supported emulations:
   elf32ppclinux
   elf32ppc
   elf32ppcsim
   elf32lppclinux
   elf32lppc
   elf32lppcsim
   elf64ppc
   elf64lppc
COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a.'
zsh: IOT instruction  ./a.out

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (36 preceding siblings ...)
  2022-08-09 14:01 ` malat at debian dot org
@ 2022-08-09 15:28 ` pinskia at gcc dot gnu.org
  2022-08-10  5:25 ` linkw at gcc dot gnu.org
                   ` (16 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-08-09 15:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #36 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
You might need to use -O2 -fPIE -pie to reproduce the issue, as Debian's GCC
is configured with --enable-default-pie.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (37 preceding siblings ...)
  2022-08-09 15:28 ` pinskia at gcc dot gnu.org
@ 2022-08-10  5:25 ` linkw at gcc dot gnu.org
  2022-08-10  5:34 ` linkw at gcc dot gnu.org
                   ` (15 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-10  5:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #37 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #36)
> You might need to use -O2 -fPIE -pie to reproduce the issue, as Debian's GCC
> is configured with --enable-default-pie.

Thanks for the hint! I can reproduce this, but it also needs an explicit CPU
type such as -mcpu=power4/5/6. The problem comes from the slp1 pass, so
-fno-tree-slp-vectorize makes it pass.

It seems to expose a latent issue in this code in vect_recog_mulhs_pattern:

  vect_pattern_detected ("vect_recog_mulhs_pattern", last_stmt);

  /* Check for target support.  */
  tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
  if (!new_vectype
      || !direct_internal_fn_supported_p
            (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
    return NULL;

At this point, new_vectype is:

(gdb) pge new_vectype
vector(2) short unsigned int

The current target doesn't support the umul_highpart optab for V2HImode at all,
but the check doesn't fail because of this code in direct_optab_supported_p:

static bool
direct_optab_supported_p (direct_optab optab, tree_pair types,
                          optimization_type opt_type)
{
  machine_mode mode = TYPE_MODE (types.first);
  gcc_checking_assert (mode == TYPE_MODE (types.second));
  return direct_optab_handler (optab, mode, opt_type) != CODE_FOR_nothing;
}

(gdb) pge types.first
vector(2) short unsigned int
(gdb) p mode
$12 = E_SImode

The current target does support the umul_highpart optab for SImode, so the check
doesn't fail. But we expect the query to use a vector mode for the given type:
functionally, it's wrong to use a scalar insn for a vector operation here, so
this result is unexpected.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (38 preceding siblings ...)
  2022-08-10  5:25 ` linkw at gcc dot gnu.org
@ 2022-08-10  5:34 ` linkw at gcc dot gnu.org
  2022-08-10  6:03 ` pinskia at gcc dot gnu.org
                   ` (14 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-10  5:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #38 from Kewen Lin <linkw at gcc dot gnu.org> ---
Created attachment 53428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53428&action=edit
untested patch

An untested patch which makes it pass.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (39 preceding siblings ...)
  2022-08-10  5:34 ` linkw at gcc dot gnu.org
@ 2022-08-10  6:03 ` pinskia at gcc dot gnu.org
  2022-08-10  6:24 ` linkw at gcc dot gnu.org
                   ` (13 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-08-10  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #39 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Comment on attachment 53428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53428
untested patch

>diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>index d666ccccf67..7d8b4ac2200 100644
>--- a/gcc/internal-fn.cc
>+++ b/gcc/internal-fn.cc
>@@ -3750,7 +3750,12 @@ static bool
> direct_optab_supported_p (direct_optab optab, tree_pair types,
> 			  optimization_type opt_type)
> {
>-  machine_mode mode = TYPE_MODE (types.first);
>+  tree type = types.first;
>+  machine_mode mode = TYPE_MODE (type);
>+  /* Scalar mode optab can't work for vector type, return false if
>+     the given type is vector type but the mode isn't vector mode.  */
>+  if (VECTOR_TYPE_P (type) != VECTOR_MODE_P (mode))
>+    return false;

There are a few operations for which the scalar mode and vector type can differ
and still work: IOR, XOR, and AND. I wonder if those should be special-cased
here or somewhere else.

>   gcc_checking_assert (mode == TYPE_MODE (types.second));
>   return direct_optab_handler (optab, mode, opt_type) != CODE_FOR_nothing;
> }
>@@ -3763,6 +3768,12 @@ static bool
> convert_optab_supported_p (convert_optab optab, tree_pair types,
> 			   optimization_type opt_type)
> {
>+  tree type = types.first;
>+  machine_mode mode = TYPE_MODE (type);
>+  /* Scalar mode optab can't work for vector type, return false if
>+     the given type is vector type but the mode isn't vector mode.  */
>+  if (VECTOR_TYPE_P (type) != VECTOR_MODE_P (mode))
>+    return false;
>   return (convert_optab_handler (optab, TYPE_MODE (types.first),
> 				 TYPE_MODE (types.second), opt_type)
> 	  != CODE_FOR_nothing);
>@@ -3778,6 +3789,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>   gcc_assert (TREE_CODE (types.first) == ARRAY_TYPE);
>   machine_mode imode = TYPE_MODE (types.first);
>   machine_mode vmode = TYPE_MODE (TREE_TYPE (types.first));
>+  gcc_assert (VECTOR_MODE_P (vmode));
>   return (convert_optab_handler (optab, imode, vmode, opt_type)
> 	  != CODE_FOR_nothing);
> }

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (40 preceding siblings ...)
  2022-08-10  6:03 ` pinskia at gcc dot gnu.org
@ 2022-08-10  6:24 ` linkw at gcc dot gnu.org
  2022-08-10  9:47 ` linkw at gcc dot gnu.org
                   ` (12 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-10  6:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #40 from Kewen Lin <linkw at gcc dot gnu.org> ---
> >diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >index d666ccccf67..7d8b4ac2200 100644
> >--- a/gcc/internal-fn.cc
> >+++ b/gcc/internal-fn.cc
> >@@ -3750,7 +3750,12 @@ static bool
> > direct_optab_supported_p (direct_optab optab, tree_pair types,
> > 			  optimization_type opt_type)
> > {
> >-  machine_mode mode = TYPE_MODE (types.first);
> >+  tree type = types.first;
> >+  machine_mode mode = TYPE_MODE (type);
> >+  /* Scalar mode optab can't work for vector type, return false if
> >+     the given type is vector type but the mode isn't vector mode.  */
> >+  if (VECTOR_TYPE_P (type) != VECTOR_MODE_P (mode))
> >+    return false;
> 
> There are a few operations for which the scalar mode and vector type can
> differ and still work: IOR, XOR, and AND. I wonder if those should be
> special-cased here or somewhere else.

Good point! This is overkill then. I'm not sure whether there is an existing
routine to special-case them.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (41 preceding siblings ...)
  2022-08-10  6:24 ` linkw at gcc dot gnu.org
@ 2022-08-10  9:47 ` linkw at gcc dot gnu.org
  2022-08-10 12:32 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-10  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #41 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #40)
> > >diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > >index d666ccccf67..7d8b4ac2200 100644
> > >--- a/gcc/internal-fn.cc
> > >+++ b/gcc/internal-fn.cc
> > >@@ -3750,7 +3750,12 @@ static bool
> > > direct_optab_supported_p (direct_optab optab, tree_pair types,
> > > 			  optimization_type opt_type)
> > > {
> > >-  machine_mode mode = TYPE_MODE (types.first);
> > >+  tree type = types.first;
> > >+  machine_mode mode = TYPE_MODE (type);
> > >+  /* Scalar mode optab can't work for vector type, return false if
> > >+     the given type is vector type but the mode isn't vector mode.  */
> > >+  if (VECTOR_TYPE_P (type) != VECTOR_MODE_P (mode))
> > >+    return false;
> > 
> > There are a few operations for which the scalar mode and vector type can
> > differ and still work: IOR, XOR, and AND. I wonder if those should be
> > special-cased here or somewhere else.
> 
> Good point! This is overkill then. I'm not sure whether there is an existing
> routine to special-case them.

While I was writing a function to special-case Andrew's concern, I realized
that the touched functions direct_optab_supported_p, convert_optab_supported_p
and multi_vector_optab_supported_p are only used for optabs in internal-fn.def,
and currently there are no {and,ior,xor}_optab or similar entries there (I
quickly went through the binary/unary ones). So it seems we don't need to
consider this for now?

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (42 preceding siblings ...)
  2022-08-10  9:47 ` linkw at gcc dot gnu.org
@ 2022-08-10 12:32 ` rguenth at gcc dot gnu.org
  2022-08-10 12:36 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-10 12:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think this goes wrong in vectorizable_operation which does

  if (using_emulated_vectors_p
      && !vect_can_vectorize_without_simd_p (code))

to guard this but I'm not sure how this slips through?

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (43 preceding siblings ...)
  2022-08-10 12:32 ` rguenth at gcc dot gnu.org
@ 2022-08-10 12:36 ` rguenth at gcc dot gnu.org
  2022-08-11  1:18 ` linkw at gcc dot gnu.org
                   ` (9 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-08-10 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #42)
> I think this goes wrong in vectorizable_operation which does
> 
>   if (using_emulated_vectors_p
>       && !vect_can_vectorize_without_simd_p (code))
> 
> to guard this but I'm not sure how this slips through?

Ah, it's an internal function.  I think we should simply return false
during analysis for any vect_emulated_vector_p type in vectorizable_call.

Alternatively, pattern recognition could also be made to fail, but the above
is definitely more future-proof.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (44 preceding siblings ...)
  2022-08-10 12:36 ` rguenth at gcc dot gnu.org
@ 2022-08-11  1:18 ` linkw at gcc dot gnu.org
  2022-08-15  6:51 ` linkw at gcc dot gnu.org
                   ` (8 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-11  1:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #44 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #43)
> (In reply to Richard Biener from comment #42)
> > I think this goes wrong in vectorizable_operation which does
> > 
> >   if (using_emulated_vectors_p
> >       && !vect_can_vectorize_without_simd_p (code))
> > 
> > to guard this but I'm not sure how this slips through?
> 
> Ah, it's an internal function.  I think we should simply return false
> during analysis for any vect_emulated_vector_p type in vectorizable_call.
> 
> Alternatively, pattern recognition could also be made to fail, but the above
> is definitely more future-proof.

Thanks for the pointer!  I think you meant:

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c9534ef9b1e..ee10fa3e0fb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3388,6 +3388,14 @@ vectorizable_call (vec_info *vinfo,
       return false;
     }

+  if (vect_emulated_vector_p (vectype_in) || vect_emulated_vector_p (vectype_out))
+  {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "use emulated vector type for call\n");
+      return false;
+  }
+
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);

I will kick off some testing on x64/aarch64/ppc64{,le} and post it later.

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (45 preceding siblings ...)
  2022-08-11  1:18 ` linkw at gcc dot gnu.org
@ 2022-08-15  6:51 ` linkw at gcc dot gnu.org
  2022-08-16  5:50 ` cvs-commit at gcc dot gnu.org
                   ` (7 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: linkw at gcc dot gnu.org @ 2022-08-15  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #45 from Kewen Lin <linkw at gcc dot gnu.org> ---
One reduced C test case:

#define N 64
typedef unsigned short int uh;
typedef unsigned short int uw;
uh a[N];
uh b[N];
uh c[N];
uh e[N];

__attribute__ ((noipa)) void
foo ()
{
  for (int i = 0; i < N; i++)
    c[i] = ((uw) b[i] * (uw) a[i]) >> 16;
}

__attribute__ ((optimize ("-O0"))) void
init ()
{
  for (int i = 0; i < N; i++)
    {
      a[i] = (uh) (0x7ABC - 0x5 * i);
      b[i] = (uh) (0xEAB + 0xF * i);
      e[i] = ((uw) b[i] * (uw) a[i]) >> 16;
    }
}

__attribute__ ((optimize ("-O0"))) void
check ()
{
  for (int i = 0; i < N; i++)
    {
      if (c[i] != e[i])
        __builtin_abort ();
    }
}

int
main ()
{
  init ();
  foo ();
  check ();

  return 0;
}

* [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (46 preceding siblings ...)
  2022-08-15  6:51 ` linkw at gcc dot gnu.org
@ 2022-08-16  5:50 ` cvs-commit at gcc dot gnu.org
  2022-08-24  2:31 ` [Bug tree-optimization/106322] [12 " cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-16  5:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #46 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:5239e2bd48fb1e6a1d1b06a1bac49bee0a742e98

commit r13-2061-g5239e2bd48fb1e6a1d1b06a1bac49bee0a742e98
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Tue Aug 16 00:18:51 2022 -0500

    vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

    As PR106322 shows, in some cases for some vector type whose
    TYPE_MODE is a scalar integral mode instead of a vector mode,
    it's possible to obtain wrong target support information when
    querying with the scalar integral mode.  For example, for the
    test case in PR106322, on ppc64 32bit vectorizer gets vector
    type "vector(2) short unsigned int" for scalar type "short
    unsigned int", its mode is SImode instead of V2HImode.  The
    target support querying checks umul_highpart optab with SImode
    and considers it's supported, then vectorizer further generates
    .MULH IFN call for that vector type.  Unfortunately it's wrong
    to use SImode support for that vector type multiply highpart
    here.

    This patch is to teach vectorizable_call analysis not to allow
    vect_emulated_vector_p type for both vectype_in and vectype_out
    as Richi suggested.

            PR tree-optimization/106322

    gcc/ChangeLog:

            * tree-vect-stmts.cc (vectorizable_call): Don't allow
            vect_emulated_vector_p type for both vectype_in and vectype_out.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr106322.c: New test.
            * gcc.target/powerpc/pr106322.c: New test.

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
  2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
                   ` (47 preceding siblings ...)
  2022-08-16  5:50 ` cvs-commit at gcc dot gnu.org
@ 2022-08-24  2:31 ` cvs-commit at gcc dot gnu.org
  2022-08-24  2:53 ` linkw at gcc dot gnu.org
                   ` (5 subsequent siblings)
  54 siblings, 0 replies; 56+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-24  2:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #48 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:9f532fec01d6651cc3cc136073f044a7953d8560

commit r12-8710-g9f532fec01d6651cc3cc136073f044a7953d8560
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Tue Aug 16 00:18:51 2022 -0500

    vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

    As PR106322 shows, in some cases for some vector type whose
    TYPE_MODE is a scalar integral mode instead of a vector mode,
    it's possible to obtain wrong target support information when
    querying with the scalar integral mode.  For example, for the
    test case in PR106322, on ppc64 32bit vectorizer gets vector
    type "vector(2) short unsigned int" for scalar type "short
    unsigned int", its mode is SImode instead of V2HImode.  The
    target support querying checks umul_highpart optab with SImode
    and considers it's supported, then vectorizer further generates
    .MULH IFN call for that vector type.  Unfortunately it's wrong
    to use SImode support for that vector type multiply highpart
    here.

    This patch is to teach vectorizable_call analysis not to allow
    vect_emulated_vector_p type for both vectype_in and vectype_out
    as Richi suggested.

            PR tree-optimization/106322

    gcc/ChangeLog:

            * tree-vect-stmts.cc (vectorizable_call): Don't allow
            vect_emulated_vector_p type for both vectype_in and vectype_out.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr106322.c: New test.
            * gcc.target/powerpc/pr106322.c: New test.

    (cherry picked from commit 5239e2bd48fb1e6a1d1b06a1bac49bee0a742e98)
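
The key point of the commit message is that a multiply-highpart on
"vector(2) short unsigned int" is not the same operation as umul_highpart on
SImode, even though both occupy 32 bits. The following minimal C sketch is not
taken from the patch or its testsuite files. It models the two 16-bit lanes as
the halves of a uint32_t to make the difference concrete. The names
mulh_u16x2, mulh_u32 and check_example are hypothetical, chosen only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lane-wise unsigned multiply-high for a
   "vector(2) short unsigned int" value packed into a uint32_t.
   Each 16-bit lane keeps the upper 16 bits of its own 16x16->32 product,
   which is the per-lane meaning a .MULH call on that vector type needs.  */
static uint32_t
mulh_u16x2 (uint32_t a, uint32_t b)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 2; ++lane)
    {
      uint32_t x = (a >> (16 * lane)) & 0xFFFFu;
      uint32_t y = (b >> (16 * lane)) & 0xFFFFu;
      uint32_t hi = (x * y) >> 16; /* x * y <= 0xFFFE0001, fits in 32 bits */
      result |= hi << (16 * lane);
    }
  return result;
}

/* Hypothetical helper: what an SImode umul_highpart computes, i.e. the
   upper 32 bits of the full 32x32->64 product of the packed value treated
   as one scalar integer.  */
static uint32_t
mulh_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* Both lanes compute 1 * 0xFFFF, so each per-lane high half is 0, while
   the scalar high part of 0x00010001 * 0xFFFFFFFF is 0x00010000.  */
static void
check_example (void)
{
  assert (mulh_u16x2 (0x00010001u, 0xFFFFFFFFu) == 0x00000000u);
  assert (mulh_u32 (0x00010001u, 0xFFFFFFFFu) == 0x00010000u);
}
```

For the same packed inputs the lane-wise result and the SImode result
diverge. Silently substituting the scalar operation therefore produces wrong
values rather than a compile failure, which is consistent with the wrong-code
symptoms reported in this PR.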

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: linkw at gcc dot gnu.org @ 2022-08-24  2:53 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #49 from Kewen Lin <linkw at gcc dot gnu.org> ---
Hi Richi,

One thing I'm not sure about is whether we want to backport this to gcc-11 and
gcc-10. Although the failure was exposed by the .MULH pattern recognition, which
exists only in gcc-12, IMHO the underlying issue also exists in gcc-10 and gcc-11.

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: rguenth at gcc dot gnu.org @ 2022-08-24  6:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |12.2.1
             Status|ASSIGNED                    |RESOLVED
      Known to fail|                            |12.2.0

--- Comment #50 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #49)
> Hi Richi,
> 
> One thing I'm not sure about is whether we want to backport this to gcc-11
> and gcc-10. Although the failure was exposed by the .MULH pattern recognition,
> which exists only in gcc-12, IMHO the underlying issue also exists in gcc-10
> and gcc-11.

It's enough to backport to 12.

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: malat at debian dot org @ 2022-09-27 14:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Mathieu Malaterre <malat at debian dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #51 from Mathieu Malaterre <malat at debian dot org> ---
@Kewen I am using the latest gcc-12 update from doko@d.o:

* https://tracker.debian.org/news/1363780/accepted-gcc-12-1220-3-source-into-unstable/

It does include the patch for PR106322, but I am getting exactly the same
behavior as before:

```
% gcc -v -O2 -mtune=generic -march=i686 106322.c && ./a.out
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/12/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-3'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=i686-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-targets=all
--enable-multiarch --disable-werror --with-arch-32=i686
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu
--target=i686-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-3)
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc/i686-linux-gnu/12/cc1 -quiet -v -imultiarch i386-linux-gnu
106322.c -quiet -dumpdir a- -dumpbase 106322.c -dumpbase-ext .c -mtune=generic
-march=i686 -O2 -version -fasynchronous-unwind-tables -o /tmp/ccKGWuAf.s
GNU C17 (Debian 12.2.0-3) version 12.2.0 (i686-linux-gnu)
        compiled by GNU C version 12.2.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i686-linux-gnu/12/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/i686-linux-gnu/12/../../../../i686-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/i686-linux-gnu/12/include
 /usr/local/include
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
GNU C17 (Debian 12.2.0-3) version 12.2.0 (i686-linux-gnu)
        compiled by GNU C version 12.2.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: bab440224e23e29e673aafc5eddaffb6
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a-'
 as -v --32 -o /tmp/ccKi1Hwd.o /tmp/ccKGWuAf.s
GNU assembler version 2.39 (i686-linux-gnu) using BFD version (GNU Binutils for
Debian) 2.39
COMPILER_PATH=/usr/lib/gcc/i686-linux-gnu/12/:/usr/lib/gcc/i686-linux-gnu/12/:/usr/lib/gcc/i686-linux-gnu/:/usr/lib/gcc/i686-linux-gnu/12/:/usr/lib/gcc/i686-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/i686-linux-gnu/12/:/usr/lib/gcc/i686-linux-gnu/12/../../../i386-linux-gnu/:/usr/lib/gcc/i686-linux-gnu/12/../../../../lib/:/lib/i386-linux-gnu/:/lib/../lib/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/i686-linux-gnu/12/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a.'
 /usr/lib/gcc/i686-linux-gnu/12/collect2 -plugin
/usr/lib/gcc/i686-linux-gnu/12/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/i686-linux-gnu/12/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccJ6oy3P.res -plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id
--eh-frame-hdr -m elf_i386 --hash-style=gnu --as-needed -dynamic-linker
/lib/ld-linux.so.2 -pie
/usr/lib/gcc/i686-linux-gnu/12/../../../i386-linux-gnu/Scrt1.o
/usr/lib/gcc/i686-linux-gnu/12/../../../i386-linux-gnu/crti.o
/usr/lib/gcc/i686-linux-gnu/12/crtbeginS.o -L/usr/lib/gcc/i686-linux-gnu/12
-L/usr/lib/gcc/i686-linux-gnu/12/../../../i386-linux-gnu
-L/usr/lib/gcc/i686-linux-gnu/12/../../../../lib -L/lib/i386-linux-gnu
-L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc/i686-linux-gnu/12/../../.. /tmp/ccKi1Hwd.o -lgcc --push-state
--as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s
--pop-state /usr/lib/gcc/i686-linux-gnu/12/crtendS.o
/usr/lib/gcc/i686-linux-gnu/12/../../../i386-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a.'
zsh: IOT instruction  ./a.out
```

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: malat at debian dot org @ 2022-09-27 14:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #52 from Mathieu Malaterre <malat at debian dot org> ---
For comparison, gcc-snapshot taken from trunk is working as expected:

* https://packages.qa.debian.org/g/gcc-snapshot/news/20220920T113715Z.html

```
% /usr/lib/gcc-snapshot/bin/cc -v -O2 -mtune=generic -march=i686 106322.c && ./a.out && echo "success"
Using built-in specs.
COLLECT_GCC=/usr/lib/gcc-snapshot/bin/cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 20220920-1'
--with-bugurl=file:///usr/share/doc/gcc-snapshot/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++
--prefix=/usr/lib/gcc-snapshot --with-gcc-major-version-only --program-prefix=
--enable-shared --enable-linker-build-id --disable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-targets=all --enable-multiarch --disable-werror
--with-arch-32=i686 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=yes --build=i686-linux-gnu
--host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220920 (experimental) [master r13-2730-gd0c73b6c856]
(Debian 20220920-1)
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a-'
 /usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/cc1 -quiet -v -imultiarch
i386-linux-gnu 106322.c -quiet -dumpdir a- -dumpbase 106322.c -dumpbase-ext .c
-mtune=generic -march=i686 -O2 -version -o /tmp/ccUhEkRq.s
GNU C17 (Debian 20220920-1) version 13.0.0 20220920 (experimental) [master
r13-2730-gd0c73b6c856] (i686-linux-gnu)
        compiled by GNU C version 13.0.0 20220920 (experimental) [master
r13-2730-gd0c73b6c856], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/../../../../i686-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/include
 /usr/local/include
 /usr/lib/gcc-snapshot/include
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
GNU C17 (Debian 20220920-1) version 13.0.0 20220920 (experimental) [master
r13-2730-gd0c73b6c856] (i686-linux-gnu)
        compiled by GNU C version 13.0.0 20220920 (experimental) [master
r13-2730-gd0c73b6c856], GMP version 6.2.1, MPFR version 4.1.0, MPC version
1.2.1, isl version isl-0.25-GMP

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: ac95e449fd0e2a349cb0956fc7734c9a
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a-'
 as -v --32 -o /tmp/ccNPHzVO.o /tmp/ccUhEkRq.s
GNU assembler version 2.39 (i686-linux-gnu) using BFD version (GNU Binutils for
Debian) 2.39
COMPILER_PATH=/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/:/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/:/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/:/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/:/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/:/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/../../../../lib/:/lib/i386-linux-gnu/:/lib/../lib/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a.'
 /usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/collect2 -plugin
/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/liblto_plugin.so
-plugin-opt=/usr/lib/gcc-snapshot/libexec/gcc/i686-linux-gnu/13/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccYP5awc.res -plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id
--eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2
/lib/i386-linux-gnu/crt1.o /lib/i386-linux-gnu/crti.o
/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/crtbegin.o
-L/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13
-L/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/../../../../lib
-L/lib/i386-linux-gnu -L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/../../.. /tmp/ccNPHzVO.o
-lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state
--as-needed -lgcc_s --pop-state
/usr/lib/gcc-snapshot/lib/gcc/i686-linux-gnu/13/crtend.o
/lib/i386-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-O2' '-mtune=generic' '-march=i686' '-dumpdir' 'a.'
success
```

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: malat at debian dot org @ 2022-09-28  6:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

Mathieu Malaterre <malat at debian dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #53 from Mathieu Malaterre <malat at debian dot org> ---
Closing. I do not believe that Debian gcc-12 (12.2.0-3) really is an update to
git 20220920 from the gcc-12 branch. Sorry for the noise.

* [Bug tree-optimization/106322] [12 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
From: linkw at gcc dot gnu.org @ 2022-09-28  6:26 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #54 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Mathieu Malaterre from comment #53)
> Closing. I do not believe that Debian gcc-12 (12.2.0-3) really is an update
> to git 20220920 from the gcc-12 branch. Sorry for the noise.

OK, thanks for the update! FWIW, I just tried the reduced test case from
comment #45 on releases/gcc-12: r12-8709 (without my fix) reproduces the abort,
while r12-8710 (with my fix) runs correctly.

Thread overview: 56+ messages
2022-07-16  9:48 [Bug c++/106322] New: i386: Wrong code at O2 level (O0 / O1 are working) malat at debian dot org
2022-07-16  9:55 ` [Bug c++/106322] " malat at debian dot org
2022-07-16 10:00 ` malat at debian dot org
2022-07-16 10:00 ` malat at debian dot org
2022-07-16 10:02 ` malat at debian dot org
2022-07-16 10:02 ` malat at debian dot org
2022-07-16 10:07 ` malat at debian dot org
2022-07-16 10:15 ` malat at debian dot org
2022-07-17 20:20 ` [Bug target/106322] " pinskia at gcc dot gnu.org
2022-07-18  8:48 ` marxin at gcc dot gnu.org
2022-07-18 14:40 ` malat at debian dot org
2022-07-19  7:58 ` ubizjak at gmail dot com
2022-08-03  8:41 ` malat at debian dot org
2022-08-03 12:31 ` [Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) malat at debian dot org
2022-08-03 12:32 ` malat at debian dot org
2022-08-03 12:33 ` malat at debian dot org
2022-08-05 13:14 ` [Bug tree-optimization/106322] " malat at debian dot org
2022-08-08  7:12 ` malat at debian dot org
2022-08-08  7:20 ` malat at debian dot org
2022-08-08 10:00 ` malat at debian dot org
2022-08-09  7:50 ` malat at debian dot org
2022-08-09 12:36 ` marxin at gcc dot gnu.org
2022-08-09 12:58 ` malat at debian dot org
2022-08-09 13:00 ` ubizjak at gmail dot com
2022-08-09 13:03 ` malat at debian dot org
2022-08-09 13:04 ` marxin at gcc dot gnu.org
2022-08-09 13:05 ` malat at debian dot org
2022-08-09 13:11 ` [Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5 marxin at gcc dot gnu.org
2022-08-09 13:12 ` marxin at gcc dot gnu.org
2022-08-09 13:26 ` linkw at gcc dot gnu.org
2022-08-09 13:29 ` marxin at gcc dot gnu.org
2022-08-09 13:30 ` malat at debian dot org
2022-08-09 13:34 ` malat at debian dot org
2022-08-09 13:40 ` linkw at gcc dot gnu.org
2022-08-09 13:48 ` rguenth at gcc dot gnu.org
2022-08-09 13:53 ` malat at debian dot org
2022-08-09 13:56 ` malat at debian dot org
2022-08-09 14:01 ` malat at debian dot org
2022-08-09 15:28 ` pinskia at gcc dot gnu.org
2022-08-10  5:25 ` linkw at gcc dot gnu.org
2022-08-10  5:34 ` linkw at gcc dot gnu.org
2022-08-10  6:03 ` pinskia at gcc dot gnu.org
2022-08-10  6:24 ` linkw at gcc dot gnu.org
2022-08-10  9:47 ` linkw at gcc dot gnu.org
2022-08-10 12:32 ` rguenth at gcc dot gnu.org
2022-08-10 12:36 ` rguenth at gcc dot gnu.org
2022-08-11  1:18 ` linkw at gcc dot gnu.org
2022-08-15  6:51 ` linkw at gcc dot gnu.org
2022-08-16  5:50 ` cvs-commit at gcc dot gnu.org
2022-08-24  2:31 ` [Bug tree-optimization/106322] [12 " cvs-commit at gcc dot gnu.org
2022-08-24  2:53 ` linkw at gcc dot gnu.org
2022-08-24  6:51 ` rguenth at gcc dot gnu.org
2022-09-27 14:14 ` malat at debian dot org
2022-09-27 14:18 ` malat at debian dot org
2022-09-28  6:11 ` malat at debian dot org
2022-09-28  6:26 ` linkw at gcc dot gnu.org
