public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
@ 2021-11-23 19:20 jschoen4 at gmail dot com
  2021-11-24  8:57 ` [Bug tree-optimization/103393] [12 Regression] Generating " rguenth at gcc dot gnu.org
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: jschoen4 at gmail dot com @ 2021-11-23 19:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

            Bug ID: 103393
           Summary: [ 12 Regression ] Auto vectorizer generating 256bit
                    register usage with -mprefer-avx128
                    -mprefer-vector-width=128
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jschoen4 at gmail dot com
  Target Milestone: ---

gcc -v
Using built-in specs.
COLLECT_GCC=/gcc_build/bin/gcc
COLLECT_LTO_WRAPPER=/gcc_build/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/gcc_build --include=/gcc_build/include
--disable-multilib --enable-rpath --enable-__cxa_atexit --enable-nls
--disable-checking --disable-libunwind-exceptions --enable-bootstrap
--enable-shared --enable-static --enable-threads=posix --with-gcc --with-gnu-as
--with-gnu-ld --with-system-zlib
--enable-languages=c,c++,fortran,go,objc,obj-c++ --enable-lto
--enable-stage1-languages=c
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211123 (experimental) (GCC)

Branch: trunk, with latest commit 721d8b9e26bf8205c1f2125c2626919a408cdbe4

===========
=TEST CODE=
===========
# cat test.cpp
struct TestData {
  float arr[8];
};
void cpy( TestData& s1, TestData& s2 ) {
  for(int i=0; i<8; ++i) {
    s1.arr[i] = s2.arr[i];
  }
}

===========
=cmd      =
===========
gcc -S -masm=intel -O2 -mavx -mprefer-avx128 -mprefer-vector-width=128 -Wall
-Wextra test.cpp -o test.s

===========
=BAD ASM  =
= GCC 12  =
===========
cat test.s
        .file   "test.cpp"
        .intel_syntax noprefix
        .text
        .p2align 4
        .globl  _Z3cpyR8TestDataS0_
        .type   _Z3cpyR8TestDataS0_, @function
_Z3cpyR8TestDataS0_:
.LFB0:
        .cfi_startproc
        vmovdqu ymm0, YMMWORD PTR [rsi]
        vmovdqu YMMWORD PTR [rdi], ymm0
        vzeroupper
        ret
        .cfi_endproc
.LFE0:
        .size   _Z3cpyR8TestDataS0_, .-_Z3cpyR8TestDataS0_
        .ident  "GCC: (GNU) 12.0.0 20211123 (experimental)"
        .section        .note.GNU-stack,"",@progbits

===========
= GCC 11  = (GCC 10 generates identical asm)
===========
cat test.s
        .file   "test.cpp"
        .intel_syntax noprefix
        .text
        .p2align 4
        .globl  _Z3cpyR8TestDataS0_
        .type   _Z3cpyR8TestDataS0_, @function
_Z3cpyR8TestDataS0_:
.LFB0:
        .cfi_startproc
        mov     edx, 32
        jmp     memmove
        .cfi_endproc
.LFE0:
        .size   _Z3cpyR8TestDataS0_, .-_Z3cpyR8TestDataS0_
        .ident  "GCC: (GNU) 11.2.0"
        .section        .note.GNU-stack,"",@progbits

=========
= GCC 9 =
=========
cat test.s
        .file   "test.cpp"
        .intel_syntax noprefix
        .text
        .p2align 4
        .globl  _Z3cpyR8TestDataS0_
        .type   _Z3cpyR8TestDataS0_, @function
_Z3cpyR8TestDataS0_:
.LFB0:
        .cfi_startproc
        xor     eax, eax
        .p2align 4,,10
        .p2align 3
.L2:
        vmovss  xmm0, DWORD PTR [rsi+rax]
        vmovss  DWORD PTR [rdi+rax], xmm0
        add     rax, 4
        cmp     rax, 32
        jne     .L2
        ret
        .cfi_endproc
.LFE0:
        .size   _Z3cpyR8TestDataS0_, .-_Z3cpyR8TestDataS0_
        .ident  "GCC: (GNU) 9.3.0"
        .section        .note.GNU-stack,"",@progbits




The auto vectorizer is generating YMM / 256-bit vector instructions even with the
-mprefer-avx128 and -mprefer-vector-width=128 flags specified.  This is an
issue for low-latency software: using 256-bit and wider registers causes CPU
frequency jitter on Skylake / Cascade Lake / Ice Lake chips.  This happens even
when the instructions used are considered "light" AVX-256 instructions, because
the CPU derives its power/frequency level from the overall mix of instructions
(this is also mentioned in Intel's optimization manual).

The auto vectorizer needs to respect the preferred-width flags.  Enabling newer
instruction sets (AVX/AVX2/AVX-512) should not by itself imply use of the wider
register types.


* [Bug tree-optimization/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
@ 2021-11-24  8:57 ` rguenth at gcc dot gnu.org
  2021-11-24 13:45 ` [Bug target/103393] " hjl.tools at gmail dot com
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-24  8:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |hjl.tools at gmail dot com
            Summary|[ 12 Regression ] Auto      |[12 Regression] Generating
                   |vectorizer generating       |256bit register usage with
                   |256bit register usage with  |-mprefer-avx128
                   |-mprefer-avx128             |-mprefer-vector-width=128
                   |-mprefer-vector-width=128   |
             Target|                            |x86_64-*-* i?86-*-*
   Last reconfirmed|                            |2021-11-24

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It isn't the vectorizer but memmove inline expansion.  I'm not sure it's really
a bug, but there isn't a way to disable %ymm use besides disabling AVX
entirely.
HJ?


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
  2021-11-24  8:57 ` [Bug tree-optimization/103393] [12 Regression] Generating " rguenth at gcc dot gnu.org
@ 2021-11-24 13:45 ` hjl.tools at gmail dot com
  2021-11-24 13:53 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: hjl.tools at gmail dot com @ 2021-11-24 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> really a bug, but there isn't a way to disable %ymm use besides disabling
> AVX entirely.
> HJ?

YMM move is generated by loop distribution which doesn't check
TARGET_PREFER_AVX128.


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
  2021-11-24  8:57 ` [Bug tree-optimization/103393] [12 Regression] Generating " rguenth at gcc dot gnu.org
  2021-11-24 13:45 ` [Bug target/103393] " hjl.tools at gmail dot com
@ 2021-11-24 13:53 ` rguenth at gcc dot gnu.org
  2021-11-24 20:38 ` jschoen4 at gmail dot com
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-24 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rearnsha at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #2)
> (In reply to Richard Biener from comment #1)
> > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > really a bug, but there isn't a way to disable %ymm use besides disabling
> > AVX entirely.
> > HJ?
> 
> YMM move is generated by loop distribution which doesn't check
> TARGET_PREFER_AVX128.

I think it's generated by gimple_fold_builtin_memory_op, which since Richard's
changes now accepts bigger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
picking an integer mode via

              scalar_int_mode mode;
              if (int_mode_for_size (ilen * 8, 0).exists (&mode)
                  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
                  && have_insn_for (SET, mode)
                  /* If the destination pointer is not aligned we must be able
                     to emit an unaligned store.  */
                  && (dest_align >= GET_MODE_ALIGNMENT (mode)
                      || !targetm.slow_unaligned_access (mode, dest_align)
                      || (optab_handler (movmisalign_optab, mode)
                          != CODE_FOR_nothing)))

not sure if there's another way to validate things.


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (2 preceding siblings ...)
  2021-11-24 13:53 ` rguenth at gcc dot gnu.org
@ 2021-11-24 20:38 ` jschoen4 at gmail dot com
  2021-11-25  1:15 ` crazylht at gmail dot com
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jschoen4 at gmail dot com @ 2021-11-24 20:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #4 from John S <jschoen4 at gmail dot com> ---
I can confirm from my side that it does appear to be the memmove inline
expansion and not the auto vectorizer.  It also occurs with
__builtin_memset/__builtin_memcpy.
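
For reference, here is a minimal sketch of the __builtin_memcpy variant
described above (not part of the original report; the struct and function names
are made up for illustration, and it is assumed to behave like the loop
version):

struct TestDataB {
  float arr[8];
};
// 32-byte copy expressed directly as a builtin; with -mavx
// -mprefer-vector-width=128 this is expected to hit the same folding path.
void cpy_builtin( TestDataB& s1, TestDataB& s2 ) {
  __builtin_memcpy( s1.arr, s2.arr, sizeof(s1.arr) );
}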

For some context, this is an issue that would prevent the use of gcc in my
production environment, and it will certainly impact other use cases beyond my
own.  For example, it becomes impossible to combine "-mno-vzeroupper -mavx
-mprefer-vector-width=128" with _mm256_xxx + _mm256_zeroupper() intrinsics to
manage the ymm state explicitly (cleared or not), since the compiler can now
insert ymm uses almost anywhere via the memmove inlining.
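
As an illustration of that usage pattern, here is a minimal sketch (hypothetical
function, assuming <immintrin.h>) of code that manages the ymm state by hand
and therefore relies on the compiler never introducing ymm uses of its own:

#include <immintrin.h>

// Explicit 256-bit work with an explicit upper-state clear at the end; the
// whole -mno-vzeroupper scheme only stays correct if no other ymm uses exist.
void scale8( float* dst, const float* src, float k ) {
  __m256 v = _mm256_mul_ps( _mm256_loadu_ps( src ), _mm256_set1_ps( k ) );
  _mm256_storeu_ps( dst, v );
  _mm256_zeroupper();  // clear upper halves before returning to 128-bit code
}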

Up until now, prefer-width has always behaved such that no compiler-generated
vector use exceeds the preferred width.  Only explicit use of the
_mm256_/_mm512_ intrinsics or the vector types (i.e. `__m256 var;
__m512 var;`) would result in wider register usage.

I believe Clang and ICC behave this way as well, and there is code that depends
on this behavior.  The same also applies with AVX-512 enabled, where ZMM usage
with prefer=128/256 can make the downclocking issues even more pronounced.


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (3 preceding siblings ...)
  2021-11-24 20:38 ` jschoen4 at gmail dot com
@ 2021-11-25  1:15 ` crazylht at gmail dot com
  2021-11-25  1:25 ` crazylht at gmail dot com
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: crazylht at gmail dot com @ 2021-11-25  1:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> (In reply to H.J. Lu from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > AVX entirely.
> > > HJ?
> > 
> > YMM move is generated by loop distribution which doesn't check
> > TARGET_PREFER_AVX128.
> 
> I think it's generated by gimple_fold_builtin_memory_op which since Richards
> changes accepts bigger now, up to MOVE_MAX * MOVE_RATIO and that ends up
> picking an integer mode via
> 
>               scalar_int_mode mode;
>               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
>                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
>                   && have_insn_for (SET, mode)
>                   /* If the destination pointer is not aligned we must be
> able
>                      to emit an unaligned store.  */
>                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
>                       || !targetm.slow_unaligned_access (mode, dest_align)
>                       || (optab_handler (movmisalign_optab, mode)
>                           != CODE_FOR_nothing)))
> 
> not sure if there's another way to validate things.

For one single set operation, shouldn't the total size be less than MOVE_MAX
instead of MOVE_MAX * MOVE_RATIO?


      /* If we can perform the copy efficiently with first doing all loads and
         then all stores inline it that way.  Currently efficiently means that
         we can load all the memory with a single set operation and that the
         total size is less than MOVE_MAX * MOVE_RATIO.  */
      src_align = get_pointer_alignment (src);
      dest_align = get_pointer_alignment (dest);
      if (tree_fits_uhwi_p (len)
          && (compare_tree_int
              (len, (MOVE_MAX
                     * MOVE_RATIO (optimize_function_for_size_p (cfun))))
              <= 0)


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (4 preceding siblings ...)
  2021-11-25  1:15 ` crazylht at gmail dot com
@ 2021-11-25  1:25 ` crazylht at gmail dot com
  2021-11-25  7:16 ` rguenther at suse dot de
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: crazylht at gmail dot com @ 2021-11-25  1:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Richard Biener from comment #3)
> > (In reply to H.J. Lu from comment #2)
> > > (In reply to Richard Biener from comment #1)
> > > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > > AVX entirely.
> > > > HJ?
> > > 
> > > YMM move is generated by loop distribution which doesn't check
> > > TARGET_PREFER_AVX128.
> > 
> > I think it's generated by gimple_fold_builtin_memory_op which since Richards
> > changes accepts bigger now, up to MOVE_MAX * MOVE_RATIO and that ends up
> > picking an integer mode via
> > 
> >               scalar_int_mode mode;
> >               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> >                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> >                   && have_insn_for (SET, mode)
> >                   /* If the destination pointer is not aligned we must be
> > able
> >                      to emit an unaligned store.  */
> >                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> >                       || !targetm.slow_unaligned_access (mode, dest_align)
> >                       || (optab_handler (movmisalign_optab, mode)
> >                           != CODE_FOR_nothing)))
> > 
> > not sure if there's another way to validate things.
> 
> For one single set operation, shouldn't the total size be less than MOVE_MAX
> instead of MOVE_MAX * MOVE_RATIO?

r12-3482 changed MOVE_MAX to MOVE_MAX * MOVE_RATIO.


* [Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (5 preceding siblings ...)
  2021-11-25  1:25 ` crazylht at gmail dot com
@ 2021-11-25  7:16 ` rguenther at suse dot de
  2021-11-25  7:28 ` [Bug middle-end/103393] " rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenther at suse dot de @ 2021-11-25  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 25 Nov 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393
> 
> --- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Richard Biener from comment #3)
> > > (In reply to H.J. Lu from comment #2)
> > > > (In reply to Richard Biener from comment #1)
> > > > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > > > AVX entirely.
> > > > > HJ?
> > > > 
> > > > YMM move is generated by loop distribution which doesn't check
> > > > TARGET_PREFER_AVX128.
> > > 
> > > I think it's generated by gimple_fold_builtin_memory_op which since Richards
> > > changes accepts bigger now, up to MOVE_MAX * MOVE_RATIO and that ends up
> > > picking an integer mode via
> > > 
> > >               scalar_int_mode mode;
> > >               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> > >                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> > >                   && have_insn_for (SET, mode)
> > >                   /* If the destination pointer is not aligned we must be
> > > able
> > >                      to emit an unaligned store.  */
> > >                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > >                       || !targetm.slow_unaligned_access (mode, dest_align)
> > >                       || (optab_handler (movmisalign_optab, mode)
> > >                           != CODE_FOR_nothing)))
> > > 
> > > not sure if there's another way to validate things.
> > 
> > For one single set operation, shouldn't the total size be less than MOVE_MAX
> > instead of MOVE_MAX * MOVE_RATIO?
> 
> r12-3482 change MOVE_MAX to MOVE_MAX * MOVE_RATIO

Yes, IIRC it was specifically to allow vector register moves on
aarch64/arm, which doesn't seem to have a MOVE_MAX that exceeds
WORD_SIZE.  It looks like x86 carefully tries to have a MOVE_MAX
that honors -mprefer-xxx so as not to exceed a single move size.

Both seem to be in conflict here.  Richard - why could arm/aarch64
not increase MOVE_MAX here?


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (6 preceding siblings ...)
  2021-11-25  7:16 ` rguenther at suse dot de
@ 2021-11-25  7:28 ` rguenth at gcc dot gnu.org
  2021-11-25  7:40 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-25  7:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
          Component|target                      |middle-end

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So my suggestion would be to revert the * MOVE_RATIO change.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (7 preceding siblings ...)
  2021-11-25  7:28 ` [Bug middle-end/103393] " rguenth at gcc dot gnu.org
@ 2021-11-25  7:40 ` rguenth at gcc dot gnu.org
  2021-11-25 17:57 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-25  7:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
In particular MOVE_RATIO only looks applicable if the target (or RTL
expansion?) would split the bigger GIMPLE move into pieces honoring MOVE_MAX. 
Though technically even MOVE_MAX only guarantees:

"The maximum number of bytes that a single instruction can move _QUICKLY_
between memory and registers or between two memory locations."

(emphasis mine)


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (8 preceding siblings ...)
  2021-11-25  7:40 ` rguenth at gcc dot gnu.org
@ 2021-11-25 17:57 ` jakub at gcc dot gnu.org
  2021-11-25 18:09 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-25 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Alternatively, couldn't we check next to that new
                 && have_insn_for (SET, mode)
also that
                 && known_le (GET_MODE_SIZE (mode), MOVE_MAX)
?


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (9 preceding siblings ...)
  2021-11-25 17:57 ` jakub at gcc dot gnu.org
@ 2021-11-25 18:09 ` jakub at gcc dot gnu.org
  2021-11-25 20:54 ` rearnsha at gcc dot gnu.org
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-25 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Actually no, GET_MODE_SIZE in that case is the size of the whole operation.
To me the previous change looks extremely ARM-specific, with load lines in mind
which no other target has.  If we want to support more than one SET covering
it, there should be a loop to find out how large each load should be, and we
should decide that based on MOVE_MAX.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (10 preceding siblings ...)
  2021-11-25 18:09 ` jakub at gcc dot gnu.org
@ 2021-11-25 20:54 ` rearnsha at gcc dot gnu.org
  2021-11-25 20:57 ` rearnsha at gcc dot gnu.org
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-25 20:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #12 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #10)
> Alternatively, couldn't we check next to that new
>                  && have_insn_for (SET, mode)
> also that
>                  && known_le (GET_MODE_SIZE (mode), MOVE_MAX)
> ?

No, that would limit us to MOVE_MAX again, so what would be the point in having
a more relaxed test earlier?

I do wonder if MOVE_MAX * MOVE_RATIO should be replaced with the MOVE_BY_PIECES
infrastructure; I just haven't had time to cook up a patch to try that, though.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (11 preceding siblings ...)
  2021-11-25 20:54 ` rearnsha at gcc dot gnu.org
@ 2021-11-25 20:57 ` rearnsha at gcc dot gnu.org
  2021-11-25 22:49 ` hjl.tools at gmail dot com
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-25 20:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #13 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Also, note that the comment in gimple-fold.c prior to this change read:

      /* If we can perform the copy efficiently with first doing all loads
         and then all stores inline it that way.  Currently efficiently
         means that we can load all the memory into a single integer
         register which is what MOVE_MAX gives us.  */

Which would imply that the AArch64 definition of MOVE_MAX is the correct one.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (12 preceding siblings ...)
  2021-11-25 20:57 ` rearnsha at gcc dot gnu.org
@ 2021-11-25 22:49 ` hjl.tools at gmail dot com
  2021-11-26 11:31 ` rearnsha at gcc dot gnu.org
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: hjl.tools at gmail dot com @ 2021-11-25 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #14 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Richard Earnshaw from comment #13)
> Also, note that the comment in gimple-fold.c prior to this change read:
> 
>       /* If we can perform the copy efficiently with first doing all loads
>          and then all stores inline it that way.  Currently efficiently
>          means that we can load all the memory into a single integer
>          register which is what MOVE_MAX gives us.  */
> 
> Which would imply that the AArch64 definition of MOVE_MAX is the correct one.

The GCC manual has

- Macro: MOVE_MAX
     The maximum number of bytes that a single instruction can move
     quickly between memory and registers or between two memory
     locations.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (13 preceding siblings ...)
  2021-11-25 22:49 ` hjl.tools at gmail dot com
@ 2021-11-26 11:31 ` rearnsha at gcc dot gnu.org
  2021-11-26 11:37 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-26 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #15 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
It seems perverse to me that you have a standard named pattern in the x86
backend that is enabled, but then you somehow expect the generic parts of the
compiler to know that it shouldn't be used.  

Either the pattern should be disabled, or it should handle this case by
decomposing the operation into smaller chunks.  See, for example, how the arm
backend handles movmisaligndi.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (14 preceding siblings ...)
  2021-11-26 11:31 ` rearnsha at gcc dot gnu.org
@ 2021-11-26 11:37 ` jakub at gcc dot gnu.org
  2021-11-26 11:44 ` rearnsha at gcc dot gnu.org
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-26 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Earnshaw from comment #15)
> It seems perverse to me that you have a standard named pattern in the x86
> backend that is enabled, but then you somehow expect the generic parts of
> the compiler to know that it shouldn't be used.

They should be used, but only if the user code asks for them explicitly.
So, say, a 32-byte generic vector in user code, or the <x86intrin.h> intrinsics
that need 32-byte vectors, are just fine.
The option just asks that the compiler try hard not to introduce those on its
own (e.g. through vectorization, but this string-op expansion is similar to
that).

With those ISAs selected, such instructions are available, but on some CPUs
using them is not really beneficial for performance, and smaller vectors might
get better results.
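
For illustration, a minimal sketch (hypothetical names) of the "explicit user
code" case mentioned above, which is expected to keep using ymm even under
-mprefer-vector-width=128:

// A 32-byte GNU C generic vector explicitly requested by the user; the
// resulting 256-bit operations are fine because the user asked for them.
typedef float v8sf __attribute__ ((vector_size (32)));

void add8( v8sf* dst, const v8sf* a, const v8sf* b ) {
  *dst = *a + *b;
}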


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (15 preceding siblings ...)
  2021-11-26 11:37 ` jakub at gcc dot gnu.org
@ 2021-11-26 11:44 ` rearnsha at gcc dot gnu.org
  2021-11-26 11:48 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-26 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #17 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #16)
> (In reply to Richard Earnshaw from comment #15)
> > It seems perverse to me that you have a standard named pattern in the x86
> > backend that is enabled, but then you somehow expect the generic parts of
> > the compiler to know that it shouldn't be used.
> 
> They should be used, but only if the user code asks for it explicitly.
> So, say a 32-byte generic vector in user code, or the <x86intrin.h>
> intrinsics that need 32-byte vectors are just fine.
> The option just asks that the compiler tries hard not to introduce those on
> its own (e.g. vectorization but this string ops expansion is similar to
> that).
> 
> With those selected ISAs, such instructions are available, but on some CPUs
> use of those is not really performance beneficial and using smaller vectors
> might get better results.

So the intrinsic should use a non-standard pattern name to implement the
expansion.  Then the well-known name can be disabled and the mid-end will not
use it.  We have cases like that in the Neon intrinsics handling, for example
to ensure the intrinsics still work even without fast-math.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (16 preceding siblings ...)
  2021-11-26 11:44 ` rearnsha at gcc dot gnu.org
@ 2021-11-26 11:48 ` jakub at gcc dot gnu.org
  2021-11-26 11:51 ` rearnsha at gcc dot gnu.org
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-26 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
No.  Generic vectors need to work too.  And those always do use the standard
optabs.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (17 preceding siblings ...)
  2021-11-26 11:48 ` jakub at gcc dot gnu.org
@ 2021-11-26 11:51 ` rearnsha at gcc dot gnu.org
  2021-11-26 11:58 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-26 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #19 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
It sounds to me like you're trying to keep your cake and eat it.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (18 preceding siblings ...)
  2021-11-26 11:51 ` rearnsha at gcc dot gnu.org
@ 2021-11-26 11:58 ` jakub at gcc dot gnu.org
  2021-11-26 12:26 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-11-26 11:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #20 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The aarch64 MOVE_MAX definition of (UNITS_PER_WORD * 2) clearly doesn't match
the documentation, because with Neon/SVE around you can quickly move many more
bytes with a single instruction than that.  And the gimple-fold.c change was
just a workaround for that.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (19 preceding siblings ...)
  2021-11-26 11:58 ` jakub at gcc dot gnu.org
@ 2021-11-26 12:26 ` rguenth at gcc dot gnu.org
  2021-11-26 12:51 ` rearnsha at gcc dot gnu.org
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-11-26 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that using MOVE_RATIO in gimple-fold but then always emitting just a
single stmt, without honoring MOVE_MAX for it, is fishy - you seem to be
expecting RTL expansion to fix things up, but that's clearly not happening
(for the reasons Jakub is talking about).  So one can either teach gimple-fold
to emit multiple load/store stmts or go back to MOVE_MAX.  Note that emitting
multiple load/store stmts was deliberately avoided so as not to complicate the
code and bring in additional cost considerations (size and register pressure).
But sure, in principle the whole *_by_pieces machinery could be brought to
GIMPLE and we could leave RTL expansion of memcpy/memset and friends to always
emit calls or use optabs.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (20 preceding siblings ...)
  2021-11-26 12:26 ` rguenth at gcc dot gnu.org
@ 2021-11-26 12:51 ` rearnsha at gcc dot gnu.org
  2022-03-01 22:41 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-11-26 12:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #22 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Looking at the different port definitions of MOVE_MAX, it would appear that
only the i386 port uses a value that is not the size of a general-purpose
register.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (21 preceding siblings ...)
  2021-11-26 12:51 ` rearnsha at gcc dot gnu.org
@ 2022-03-01 22:41 ` hjl.tools at gmail dot com
  2022-03-02 14:52 ` hjl.tools at gmail dot com
  2022-03-31  7:31 ` rguenth at gcc dot gnu.org
  24 siblings, 0 replies; 26+ messages in thread
From: hjl.tools at gmail dot com @ 2022-03-01 22:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
A patch is posted at

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591093.html


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (22 preceding siblings ...)
  2022-03-01 22:41 ` hjl.tools at gmail dot com
@ 2022-03-02 14:52 ` hjl.tools at gmail dot com
  2022-03-31  7:31 ` rguenth at gcc dot gnu.org
  24 siblings, 0 replies; 26+ messages in thread
From: hjl.tools at gmail dot com @ 2022-03-02 14:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #24 from H.J. Lu <hjl.tools at gmail dot com> ---
Another testcase:

[hjl@gnu-tgl-2 pr103393]$ cat x.c
struct TestData {
  float arr[16];
};
void cpy(struct TestData *s1, struct TestData *s2 ) {
  for(int i=0; i<16; ++i) {
    s1->arr[i] = s2->arr[i];
  }
}
[hjl@gnu-tgl-2 pr103393]$ make x.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-march=skylake-avx512 -S x.c
[hjl@gnu-tgl-2 pr103393]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  cpy
        .type   cpy, @function
cpy:
.LFB0:
        .cfi_startproc
        vmovdqu64       (%rsi), %zmm0
        vmovdqu64       %zmm0, (%rdi)
        vzeroupper
        ret
        .cfi_endproc
.LFE0:
        .size   cpy, .-cpy
        .ident  "GCC: (GNU) 12.0.1 20220301 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-tgl-2 pr103393]$ 

ZMM is used even when we try to avoid it.


* [Bug middle-end/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128
  2021-11-23 19:20 [Bug tree-optimization/103393] New: [ 12 Regression ] Auto vectorizer generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128 jschoen4 at gmail dot com
                   ` (23 preceding siblings ...)
  2022-03-02 14:52 ` hjl.tools at gmail dot com
@ 2022-03-31  7:31 ` rguenth at gcc dot gnu.org
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-31  7:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
Should be fixed by the reversion of r12-3482-g5f6a6c91d7c592.
