[Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
@ 2020-10-11  6:41 peter at cordes dot ca
  2020-10-11  6:42 ` [Bug target/97366] " peter at cordes dot ca
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: peter at cordes dot ca @ 2020-10-11  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

            Bug ID: 97366
           Summary: [8/9/10/11 Regression] Redundant load with SSE/AVX
                    vector intrinsics
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

When you use the same _mm_load_si128 or _mm256_load_si256 result twice,
sometimes GCC loads it *and* uses it as a memory source operand.

I'm not certain this is specific to x86 back-ends, please check bug tags if it
happens elsewhere.  (But it probably doesn't on 3-operand load/store RISC
machines; it looks like one operation chooses to load and then operate, the
other chooses to use the original source as a memory operand.)

#include <immintrin.h>
void gcc_double_load_128(int8_t *__restrict out, const int8_t *__restrict
input)
{
    for (unsigned i=0 ; i<1024 ; i+=16){
        __m128i in = _mm_load_si128((__m128i*)&input[i]);
        __m128i high = _mm_srli_epi32(in, 4);
        _mm_store_si128((__m128i*)&out[i], _mm_or_si128(in,high));
    }
}

gcc 8 and later -O3 -mavx2, including 11.0.0 20200920, with 

gcc_double_load_128(signed char*, signed char const*):
        xorl    %eax, %eax
.L6:
        vmovdqa (%rsi,%rax), %xmm1         # load
        vpsrld  $4, %xmm1, %xmm0
        vpor    (%rsi,%rax), %xmm0, %xmm0  # reload as a memory operand
        vmovdqa %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L6
        ret

GCC7.5 and earlier use  vpor %xmm1, %xmm0, %xmm0 to use the copy of the
original that was already loaded.

`-march=haswell` happens to fix this for GCC trunk, for this 128-bit version
but not for a __m256i version.

restrict doesn't make a difference, and there's no overlapping anyway.  The two
redundant loads both happen between any other stores.

Using a memory source operand for vpsrld wasn't an option: the form with a
memory source takes the *count* from  memory, not the data. 
https://www.felixcloutier.com/x86/psllw:pslld:psllq

----

Note that *without* AVX, the redundant load is a possible win, for code running
on Haswell and later Intel (and AMD) CPUs.  Possibly some heuristic is saving
instructions for the legacy-SSE case (in a way that's probably worse overall)
and hurting the AVX case.

GCC 7.5, -O3  without any -m options
gcc_double_load_128(signed char*, signed char const*):
        xorl    %eax, %eax
.L2:
        movdqa  (%rsi,%rax), %xmm0
        movdqa  %xmm0, %xmm1         # this instruction avoided
        psrld   $4, %xmm1
        por     %xmm1, %xmm0         # with a memory source reload, in GCC8 and
later
        movaps  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L2
        rep ret

Using a memory-source POR saves 1 front-end uop by avoiding a register-copy, as
long as the indexed addressing mode can stay micro-fused on Intel.  (Requires
Haswell or later for that to happen, or any AMD.)  But in practice it's
probably worse.  Load-port pressure, and space in the out-of-order scheduler,
as well as code-size, is a problem for using an extra memory-source operand in
the SSE version, with the upside being saving 1 uop for the front-end.  (And
thus in the ROB.)  mov-elimination on modern CPUs means the movdqa register
copy costs no back-end resources (ivybridge and bdver1).

I don't know if GCC trunk is using por  (%rsi,%rax), %xmm0  on purpose for that
reason, of if it's just a coincidence.
I don't think it's a good idea on most CPUs, even if alignment is guaranteed.

This is of course 100% a loss with AVX; we have to `vmovdqa/u` load for the
shift, and it can leave the original value in a register so we're not saving a
vmovdqua.  And it's a bigger loss because indexed memory-source operands
unlaminate from 3-operand instructions even on Haswell/Skylake:
https://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes/31027695#31027695
so it hurts the front-end as well as wasting cycles on load ports, and taking
up space in the RS (scheduler).

The fact that -mtune=haswell fixes this for 128-bit vectors is interesting, but
it's clearly still a loss in the AVX version for all AVX CPUs.  2 memory ops /
cycle on Zen could become a bottleneck, and it's larger code size.  And
-mtune=haswell *doesn't* fix it for the -mavx2 _m256i version.

There is a possible real advantage in the SSE case, but it's very minor and
outweighed by disadvantages.  Especially for older CPUs like Nehalem that can
only do 1 load / 1 store per clock.  (Although this has so many uops in the
loop that it barely bottlenecks on that.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
@ 2020-10-11  6:42 ` peter at cordes dot ca
  2020-10-11 16:05 ` amonakov at gcc dot gnu.org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: peter at cordes dot ca @ 2020-10-11  6:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
Forgot to include https://godbolt.org/z/q44r13

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
  2020-10-11  6:42 ` [Bug target/97366] " peter at cordes dot ca
@ 2020-10-11 16:05 ` amonakov at gcc dot gnu.org
  2020-10-12  6:28 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-10-11 16:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Intrinsics being type-agnostic cause vector subregs to appear before register
allocation: the pseudo coming from the load has mode V2DI, the shift needs to
be done in mode V4SI, the bitwise-or and the store are done in mode V2DI again.
Subreg in the bitwise-or appears to be handled inefficiently. Didn't dig deeper
as to what happens during allocation.

FWIW, using generic vectors allows to avoid introducing such mismatches, and
indeed the variant coded with generic vectors does not have extra loads. For
your original code you'll have to convert between generic vectors and __m128i
to use the shuffle intrinsic. The last paragraphs in "Vector Extensions"
chapter [1] suggest using a union for that purpose in C; in C++ reinterpreting
via union is formally UB, so another approach could be used (probably simply
converting via assignment).

[1] https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

typedef uint32_t u32v4 __attribute__((vector_size(16)));
void gcc_double_load_128(int8_t *__restrict out, const int8_t *__restrict
input)
{
    u32v4 *vin = (u32v4 *)input;
    u32v4 *vout = (u32v4 *)out;
    for (unsigned i=0 ; i<1024; i+=16) {
        u32v4 in = *vin++;
        *vout++ = in | (in >> 4);
    }
}

Above code on Compiler Explorer: https://godbolt.org/z/MKPvxb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
  2020-10-11  6:42 ` [Bug target/97366] " peter at cordes dot ca
  2020-10-11 16:05 ` amonakov at gcc dot gnu.org
@ 2020-10-12  6:28 ` rguenth at gcc dot gnu.org
  2020-10-12  9:26 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-10-12  6:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |93943
   Target Milestone|---                         |8.5

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Maybe the same issue as PR93943?  We seem to be quite trigger-happy with
substituting memory for operands.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93943
[Bug 93943] IRA/LRA happily rematerialize (un-CSEs) loads without register
pressure

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (2 preceding siblings ...)
  2020-10-12  6:28 ` rguenth at gcc dot gnu.org
@ 2020-10-12  9:26 ` jakub at gcc dot gnu.org
  2020-10-12 11:04 ` amonakov at gcc dot gnu.org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-12  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-10-12
           Keywords|                            |ra
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems it is LRA that creates this (propagates the MEM load into the ior
instruction, even when the memory location is already loaded for another
instruction earlier).  Perhaps it wouldn't do it if there weren't the subregs
(the load is in V4SImode, while ior uses V2DImode).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (3 preceding siblings ...)
  2020-10-12  9:26 ` jakub at gcc dot gnu.org
@ 2020-10-12 11:04 ` amonakov at gcc dot gnu.org
  2020-10-12 11:43 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-10-12 11:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
afaict LRA is just following IRA decisions, and IRA allocates that pseudo to
memory due to costs.

Not sure where strange cost is coming from, but it depends on x86 tuning
options: with -mtune=skylake we get the expected code, with -mtune=haswell we
get 128-bit vectors right and extra load for 256-bit, with -mtune=generic both
cases have extra loads.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (4 preceding siblings ...)
  2020-10-12 11:04 ` amonakov at gcc dot gnu.org
@ 2020-10-12 11:43 ` rguenth at gcc dot gnu.org
  2020-10-20  7:59 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-10-12 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (5 preceding siblings ...)
  2020-10-12 11:43 ` rguenth at gcc dot gnu.org
@ 2020-10-20  7:59 ` crazylht at gmail dot com
  2021-02-19 20:19 ` vmakarov at gcc dot gnu.org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: crazylht at gmail dot com @ 2020-10-20  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Alexander Monakov from comment #5)
> afaict LRA is just following IRA decisions, and IRA allocates that pseudo to
> memory due to costs.
> 
> Not sure where strange cost is coming from, but it depends on x86 tuning
> options: with -mtune=skylake we get the expected code, with -mtune=haswell
> we get 128-bit vectors right and extra load for 256-bit, with -mtune=generic
> both cases have extra loads.

in 
----
  /* If this insn loads a parameter from its stack slot, then it
     represents a savings, rather than a cost, if the parameter is
     stored in memory.  Record this fact.

     Similarly if we're loading other constants from memory (constant
     pool, TOC references, small data areas, etc) and this is the only
     assignment to the destination pseudo.

     Don't do this if SET_SRC (set) isn't a general operand, if it is
     a memory requiring special instructions to load it, decreasing
     mem_cost might result in it being loaded using the specialized
     instruction into a register, then stored into stack and loaded
     again from the stack.  See PR52208.

     Don't do this if SET_SRC (set) has side effect.  See PR56124.  */
  if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
      && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
      && ((MEM_P (XEXP (note, 0))
           && !side_effects_p (SET_SRC (set)))
          || (CONSTANT_P (XEXP (note, 0))
              && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
                                                XEXP (note, 0))
              && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
      && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set)))
      /* LRA does not use equiv with a symbol for PIC code.  */
      && (! ira_use_lra_p || ! pic_offset_table_rtx
          || ! contains_symbol_ref_p (XEXP (note, 0))))
    {
      enum reg_class cl = GENERAL_REGS;
      rtx reg = SET_DEST (set);
      int num = COST_INDEX (REGNO (reg));

      COSTS (costs, num)->mem_cost
        -= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
      record_address_regs (GET_MODE (SET_SRC (set)),
                           MEM_ADDR_SPACE (SET_SRC (set)),
                           XEXP (SET_SRC (set), 0), 0, MEM, SCRATCH,
                           frequency * 2);
      counted_mem = true;
    }
---

for 

(insn 9 8 11 3 (set (reg:V2DI 88 [ _16 ])
        (mem:V2DI (plus:DI (reg/v/f:DI 91 [ input ])
                (reg:DI 89 [ ivtmp.11 ])) [0 MEM[(const __m128i *
{ref-all})input_7(D) + ivtmp.11_40 * 1]+0 S16 A128]))
"/export/users2/liuhongt/tools-build/build_gcc11_master_debug/gcc/include/emmintrin.h":697:10
1405 {movv2di_internal}

mem_cost for r88 would minus ira_memory_move_cost[V2DImode][GENERAL_REGS][1],
and got -11808 as an initial value, but for reality it should minus
ira_memory_move_cost[V2DImode][SSE_REGS][1], then have -5905 as an initial
value. It seems it adds too much preference to memory here.

Then in the later record_operand_costs, when ira found r88 would also be used
in shift and ior instruction, the mem_cost for r88 increases, but still smaller 
than costs of SSE_REGS because we add too much preference to memory in the
upper. Finally, ira would choose memory for r88 because it has lowest cost and
it's suboptimal.

a10(r88,l1) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 SSE_REGS:0,0
MEM:-984,-984

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (6 preceding siblings ...)
  2020-10-20  7:59 ` crazylht at gmail dot com
@ 2021-02-19 20:19 ` vmakarov at gcc dot gnu.org
  2021-02-19 20:20 ` vmakarov at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2021-02-19 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Created attachment 50225
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50225&action=edit
A candidate patch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (7 preceding siblings ...)
  2021-02-19 20:19 ` vmakarov at gcc dot gnu.org
@ 2021-02-19 20:20 ` vmakarov at gcc dot gnu.org
  2021-05-14  9:54 ` [Bug target/97366] [9/10/11/12 " jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2021-02-19 20:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

--- Comment #8 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
I've tried different approaches to fix it.  The best patch I have now is in the
attachment.

Unfortunately, the best patch results in two new failures on ppc64 (other
patches are even worse):

gcc.target/powerpc/dform-3.c scan-assembler-not mfvsrd
gcc.target/powerpc/dform-3.c scan-assembler-not mfvsrld

I'll think more how to avoid these 2 failures.  If I succeed, I'll submit a
patch.  But there is probability that the PR will not be fixed at all.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [9/10/11/12 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (8 preceding siblings ...)
  2021-02-19 20:20 ` vmakarov at gcc dot gnu.org
@ 2021-05-14  9:54 ` jakub at gcc dot gnu.org
  2021-06-01  8:18 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-14  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.5                         |9.4

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [9/10/11/12 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (9 preceding siblings ...)
  2021-05-14  9:54 ` [Bug target/97366] [9/10/11/12 " jakub at gcc dot gnu.org
@ 2021-06-01  8:18 ` rguenth at gcc dot gnu.org
  2022-05-27  9:43 ` [Bug target/97366] [10/11/12/13 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [10/11/12/13 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (10 preceding siblings ...)
  2021-06-01  8:18 ` rguenth at gcc dot gnu.org
@ 2022-05-27  9:43 ` rguenth at gcc dot gnu.org
  2022-06-28 10:42 ` jakub at gcc dot gnu.org
  2023-07-07 10:38 ` [Bug target/97366] [11/12/13/14 " rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27  9:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [10/11/12/13 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (11 preceding siblings ...)
  2022-05-27  9:43 ` [Bug target/97366] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:42 ` jakub at gcc dot gnu.org
  2023-07-07 10:38 ` [Bug target/97366] [11/12/13/14 " rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/97366] [11/12/13/14 Regression] Redundant load with SSE/AVX vector intrinsics
  2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
                   ` (12 preceding siblings ...)
  2022-06-28 10:42 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:38 ` rguenth at gcc dot gnu.org
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-07-07 10:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-11  6:41 [Bug target/97366] New: [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics peter at cordes dot ca
2020-10-11  6:42 ` [Bug target/97366] " peter at cordes dot ca
2020-10-11 16:05 ` amonakov at gcc dot gnu.org
2020-10-12  6:28 ` rguenth at gcc dot gnu.org
2020-10-12  9:26 ` jakub at gcc dot gnu.org
2020-10-12 11:04 ` amonakov at gcc dot gnu.org
2020-10-12 11:43 ` rguenth at gcc dot gnu.org
2020-10-20  7:59 ` crazylht at gmail dot com
2021-02-19 20:19 ` vmakarov at gcc dot gnu.org
2021-02-19 20:20 ` vmakarov at gcc dot gnu.org
2021-05-14  9:54 ` [Bug target/97366] [9/10/11/12 " jakub at gcc dot gnu.org
2021-06-01  8:18 ` rguenth at gcc dot gnu.org
2022-05-27  9:43 ` [Bug target/97366] [10/11/12/13 " rguenth at gcc dot gnu.org
2022-06-28 10:42 ` jakub at gcc dot gnu.org
2023-07-07 10:38 ` [Bug target/97366] [11/12/13/14 " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).