[Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2

public inbox for gcc-bugs@sourceware.org

* [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2
@ 2022-02-17 12:30 ubizjak at gmail dot com
  2022-02-17 12:35 ` [Bug tree-optimization/104582] " ubizjak at gmail dot com
                   ` (25 more replies)
  0 siblings, 26 replies; 27+ messages in thread
From: ubizjak at gmail dot com @ 2022-02-17 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

            Bug ID: 104582
           Summary: Unoptimal code for __negdi2 (and others) from libgcc2
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcase (taken from libgcc):

--cut here--
typedef          int DItype     __attribute__ ((mode (DI)));
typedef unsigned int UDItype    __attribute__ ((mode (DI)));
typedef          int TItype     __attribute__ ((mode (TI)));

#define Wtype   DItype
#define UWtype  UDItype
#define DWtype  TItype

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
  struct DWstruct {Wtype high, low;};
#else
  struct DWstruct {Wtype low, high;};
#endif

typedef union
{
  struct DWstruct s;
  DWtype ll;
} DWunion;

DWtype
__negdi2 (DWtype u)
{
  const DWunion uu = {.ll = u};
  const DWunion w = { {.low = -uu.s.low,
                       .high = -uu.s.high - ((UWtype) -uu.s.low > 0) } };

  return w.ll;
}
--cut here--

compiles with -O2 on x86_64 to:

__negdi2:
        movq    %rdi, %rax
        negq    %rsi
        negq    %rax
        cmpq    $1, %rdi
        adcq    $-1, %rsi
        movq    %rax, %xmm0
        movq    %rsi, %xmm1
        punpcklqdq      %xmm1, %xmm0
        movaps  %xmm0, -24(%rsp)
        movq    -24(%rsp), %rax
        movq    -16(%rsp), %rdx
        ret

Please note the convoluted sequence at the end, which moves the value through
XMM registers and the stack before loading it into %rax/%rdx.

gcc-10 compiles the code to:

__negdi2:
        negq    %rsi
        movq    %rdi, %rax
        negq    %rax
        movq    %rsi, %rdx
        cmpq    $1, %rdi
        adcq    $-1, %rdx
        ret
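
The borrow logic in the testcase can be checked against the compiler's own 128-bit negation. The following is only a sketch: `words128`, `neg_words` and `neg_ref` are hypothetical names that do not appear in libgcc, unsigned words are used to sidestep signed overflow, and `unsigned __int128` is a GCC/Clang extension.

```c
typedef unsigned long long u64;

/* Illustrative two-word representation; not the libgcc DWunion. */
typedef struct { u64 low, high; } words128;

/* Word-wise negation, mirroring the expression in the testcase. */
words128 neg_words(words128 u)
{
    words128 w;
    w.low  = -u.low;                  /* unsigned negation wraps mod 2^64 */
    w.high = -u.high - (w.low > 0);   /* borrow when the low word is non-zero */
    return w;
}

/* Reference result computed with the compiler's 128-bit arithmetic. */
words128 neg_ref(words128 u)
{
    unsigned __int128 v = ((unsigned __int128)u.high << 64) | u.low;
    v = -v;
    words128 w = { (u64)v, (u64)(v >> 64) };
    return w;
}
```

Comparing `neg_words` with `neg_ref` on a few inputs (zero, one, a value with only the high word set) exercises both the borrow and the no-borrow path.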

* [Bug tree-optimization/104582] Unoptimal code for __negdi2 (and others) from libgcc2
From: ubizjak at gmail dot com @ 2022-02-17 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
This happens due to unwanted vectorization, since -ftree-vectorize is now
enabled by default.

* [Bug tree-optimization/104582] Unoptimal code for __negdi2 (and others) from libgcc2
From: ubizjak at gmail dot com @ 2022-02-17 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
Please note that gcc-10 does not vectorize the testcase even with -O3
-ftree-vectorize.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: jakub at gcc dot gnu.org @ 2022-02-17 15:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
            Summary|Unoptimal code for __negdi2 |[11/12 Regression]
                   |(and others) from libgcc2   |Unoptimal code for __negdi2
                   |due to unwanted             |(and others) from libgcc2
                   |vectorization               |due to unwanted
                   |                            |vectorization
           Priority|P3                          |P2
   Target Milestone|---                         |11.3

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With -O3 this started with
r11-3204-gc9de716a59c873859df3b3e1fbb993200fce5a73
and with -O2 most likely with
r12-4240-g2b8453c401b699ed93c085d0413ab4b5030bcdb8
I am not sure whether it is better not to vectorize it or to be able to undo
the vectorization.
In particular in this case, the *.optimized dump has:
  _14 = {_1, _5};
  _8 = VIEW_CONVERT_EXPR<__int128>(_14);
where
  vector(2) long int _14;
I guess it depends on how _8 is then used.  If it is going to be used in some
vector context, perhaps the above can be a win, but when the __int128 is
returned as __int128 on targets that return it in a pair of registers, that is
never beneficial.  I bet that if it is stored into memory, it will hardly ever
be beneficial either.
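
A rough source-level analogue of that GIMPLE pair may help picture it. This is a sketch only: `v2di_t`, `dw_union`, `pack_via_vector` and `pack_via_words` are illustrative names, the vector type relies on the GCC/Clang `vector_size` extension, and the low/high ordering assumes a little-endian target such as x86_64.

```c
#include <string.h>

/* Two 64-bit lanes; illustrative name, GCC/Clang vector extension. */
typedef long long v2di_t __attribute__((vector_size(16)));

/* Little-endian word layout, as in the #else branch of the testcase. */
typedef union { struct { long long low, high; } s; __int128 ll; } dw_union;

/* Build a vector from two scalars, then reinterpret its bits. */
__int128 pack_via_vector(long long lo, long long hi)
{
    v2di_t v = { lo, hi };          /* roughly: _14 = {_1, _5};          */
    __int128 r;
    memcpy(&r, &v, sizeof r);       /* roughly: VIEW_CONVERT_EXPR to _8  */
    return r;
}

/* The scalar form: two word stores into the union. */
__int128 pack_via_words(long long lo, long long hi)
{
    dw_union w;
    w.s.low = lo;
    w.s.high = hi;
    return w.ll;
}
```

Both functions produce the same value; the question raised in the comment is only which form is cheaper once the result has to end up in a register pair.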

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: jakub at gcc dot gnu.org @ 2022-02-17 16:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What slp does is just
-  w.s.low = _1;
-  w.s.high = _5;
+  _14 = {_1, _5};
+  MEM[(union  *)&w] = _14;
I must say I don't really see that as a beneficial optimization; constructing
a vector from scalars just to store it in memory never looks like a win.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: jakub at gcc dot gnu.org @ 2022-02-17 16:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The costs look weird:
_1 1 times scalar_store costs 12 in body
_5 1 times scalar_store costs 12 in body
_1 1 times vector_store costs 12 in body
<unknown> 1 times vec_construct costs 8 in prologue
vec_construct is certainly more expensive than a store (especially in this case
when it is a store into a TImode variable which isn't addressable and will not
be in memory at all).
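
The arithmetic behind the decision can be spelled out. The sketch below assumes the simplified model that also shows up in the "Vector cost / Scalar cost" summary later in this thread: the entries of each variant are summed and the vector form is taken only if its total is lower. `cost_entry`, `total_cost` and `vector_looks_profitable` are hypothetical names, not vectorizer APIs.

```c
#include <stddef.h>

/* One line of the cost dump: a kind and the cost reported for it. */
struct cost_entry { const char *kind; int cost; };

static const struct cost_entry scalar_entries[] = {
    { "scalar_store", 12 },
    { "scalar_store", 12 },
};

static const struct cost_entry vector_entries[] = {
    { "vector_store", 12 },
    { "vec_construct", 8 },   /* recorded in the prologue */
};

/* Sum the reported costs of a list of entries. */
int total_cost(const struct cost_entry *e, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += e[i].cost;
    return sum;
}

/* Simplified profitability test: vector total must be below scalar total. */
int vector_looks_profitable(void)
{
    int scalar = total_cost(scalar_entries,
                            sizeof scalar_entries / sizeof scalar_entries[0]);
    int vector = total_cost(vector_entries,
                            sizeof vector_entries / sizeof vector_entries[0]);
    return vector < scalar;
}
```

With the numbers from the dump the scalar total is 24 and the vector total is 20, so the vector form wins on paper even though the generated code is worse.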

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: pinskia at gcc dot gnu.org @ 2022-02-18  0:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-02-18
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm:
 _14 = {_1, _5};
  _8 = VIEW_CONVERT_EXPR<__int128>(_14);

Wouldn't it be better to convert that to just (hopefully I got the order correct):
t1 = (__int128)_1
_8 = BIT_INSERT_EXPR(t1, 64, _5);

?
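
In plain C terms, the suggested form amounts to widening the low word and then inserting the high word at bit 64. This is a sketch under assumptions: `bit_insert64` and `combine_words` are hypothetical helpers that only mimic the idea of a bit insertion, and `unsigned __int128` is a GCC/Clang extension.

```c
typedef unsigned __int128 u128;

/* Replace the 64-bit field starting at bit `pos` (0 <= pos <= 64)
   of `base` with `val`; an illustration of a bit insertion. */
u128 bit_insert64(u128 base, unsigned pos, unsigned long long val)
{
    u128 mask = (u128)0xFFFFFFFFFFFFFFFFULL << pos;
    return (base & ~mask) | ((u128)val << pos);
}

/* Widen the low word, then insert the high word at bit 64. */
u128 combine_words(unsigned long long lo, unsigned long long hi)
{
    u128 t1 = (u128)lo;
    return bit_insert64(t1, 64, hi);
}
```

No vector register is involved at any point, which is what makes this form attractive when the result is returned in a pair of general-purpose registers.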

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> The costs look weird:
> _1 1 times scalar_store costs 12 in body
> _5 1 times scalar_store costs 12 in body
> _1 1 times vector_store costs 12 in body
> <unknown> 1 times vec_construct costs 8 in prologue
> vec_construct is certainly more expensive than a store (especially in this
> case when it is a store into a TImode variable which isn't addressable and
> will not be in memory at all).

x86 can do cheap move low/hi so the construct isn't expensive.  Note
it only gets expensive in the end because the "memory" isn't really memory
and the return ABI isn't exposed.

Just as a wild idea, maybe we can pessimize vector stores into
!TREE_ADDRESSABLE automatic variables ...

We do already have some "weird" code in vect_model_store_cost employing
hard_function_value to deal with stores to RESULT_DECLs, but here 'w'
isn't a RESULT_DECL.  In that code we assume what actually happens: a spill
of the vector and loads of the components.

What's missing in the CTOR cost is the move from GPR to XMM regs when
we are not dealing with FP or vector components (or direct memory
sources).  Getting that applied only for relevant cases isn't easy
since it requires looking at the defs.

One could try to amend the vect_model_store_cost handling: at the
beginning of the SLP pass, analyze the stmts feeding the function return,
mark the decls we return a loaded value from in some way, and handle
those in a similar way.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: jakub at gcc dot gnu.org @ 2022-02-18  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Just trying a dumb microbenchmark:
struct S { unsigned long a, b; } s;

__attribute__((noipa)) void
foo (unsigned long a, unsigned long b)
{
  s.a = a;
  s.b = b;
}

int
main ()
{
  int i;
  for (i = 0; i < 1000000000; i++)
    foo (42, 43);
  return 0;
}
the GCC 11 vs. GCC 12 code:
-       movq    %rdi, s(%rip)
-       movq    %rsi, s+8(%rip)
+       movq    %rdi, %xmm0
+       movq    %rsi, %xmm1
+       punpcklqdq      %xmm1, %xmm0
+       movaps  %xmm0, s(%rip)
seems to be exactly the same speed (on i9-7960X) and the GCC 11 code is 7 bytes
smaller.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #8)
> Just trying a dumb microbenchmark:
> struct S { unsigned long a, b; } s;
> 
> __attribute__((noipa)) void
> foo (unsigned long a, unsigned long b)
> {
>   s.a = a;
>   s.b = b;
> }
> 
> int
> main ()
> {
>   int i;
>   for (i = 0; i < 1000000000; i++)
>     foo (42, 43);
>   return 0;
> }
> the GCC 11 vs. GCC 12 code:
> -	movq	%rdi, s(%rip)
> -	movq	%rsi, s+8(%rip)
> +	movq	%rdi, %xmm0
> +	movq	%rsi, %xmm1
> +	punpcklqdq	%xmm1, %xmm0
> +	movaps	%xmm0, s(%rip)
> seems to be exactly the same speed (on i9-7960X) and the GCC 11 code is 7
> bytes smaller.

The GCC 12 code is 30% slower on Zen 2 (the gpr -> xmm move is comparatively
more costly there).  As said, we fail to account for that.  But as I said,
the cost is not there if it's

struct S { unsigned long a, b; } s;

__attribute__((noipa)) void
foo (unsigned long *a, unsigned long *b)
{
  unsigned long a_ = *a;
  unsigned long b_ = *b;
  s.a = a_;
  s.b = b_;
}

which vectorizes to

        movq    (%rdi), %xmm0
        movhps  (%rsi), %xmm0
        movaps  %xmm0, s(%rip)
        ret

which is _smaller_ than the scalar code.  So it's important to be able
to distinguish those cases.  The above is also costed as

a__3 1 times scalar_store costs 12 in body
b__5 1 times scalar_store costs 12 in body
a__3 1 times vector_store costs 12 in body
<unknown> 1 times vec_construct costs 8 in prologue

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, I think it makes sense to build libgcc with -mno-sse, maybe even
-mgeneral-regs-only.  Or globally with -fno-tree-vectorize (but we likely do
not want %xmm uses for parameter setup either with the move-by-pieces changes -
IIRC I've seen uses in the unwinder code trapping because of a misaligned stack
in an executable).
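
Outside of a build-system change, a per-function opt-out could look roughly like the following. This is a sketch, not what libgcc does: `NO_VECTORIZE` and `store_pair` are made-up names, GCC's documentation describes the `optimize` attribute as intended mainly for debugging, and whether it really suppresses the SLP store vectorization discussed here would need to be confirmed by inspecting the generated assembly.

```c
/* Request -fno-tree-vectorize for a single function on GCC;
   expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

struct S { unsigned long a, b; } s;

/* Same two adjacent stores as in the microbenchmark above. */
NO_VECTORIZE void
store_pair (unsigned long a, unsigned long b)
{
  s.a = a;
  s.b = b;
}
```

The command-line options named in the comment remain the more direct route when a whole library is to be kept free of vector register uses.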

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: jakub at gcc dot gnu.org @ 2022-02-18  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
True.
So another option is to try to undo some of those short vectorization cases
during isel, expansion or later, though e.g. for the negdi2 case it will
already go into memory during expansion.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenther at suse dot de @ 2022-02-18  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 18 Feb 2022, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582
> 
> --- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> True.
> So another option is to try to undo some of those short vectorization cases
> during isel, expansion or later, though e.g. for the negdi2 case it will go
> already during expansion into memory.

Yes, there are duplicates about this issue and it's really hard to
solve generally.  There's the possibility to try improving on the
costing side but currently the cost hooks just see

ix86_vector_costs::add_stmt_cost (this=0x41b88c0, count=1, 
kind=vec_construct, stmt_info=0x0, vectype=<vector_type 0x7ffff667a888>, 
misalign=0, where=vect_prologue)

so they have no idea about the feeding stmts.  The cost entry
is generated by vect_prologue_cost_for_slp which knows the
scalar operands but we do not pass the SLP node down to the cost
hooks (that's something on my list but my idea was to push it back
when we only have SLP nodes and thus could go w/o the stmt_info then).

The other possibility is (for the original testcase) to anticipate
that RTL expansion will expand 'w' to a TImode register and take
that as a reason to pessimize vectorization (but we don't know how
it's going to be used, so that's probably a flawed attempt).

The only short-term fixes are a) biasing the costing, which regresses
the from-memory case, or b) passing down the SLP node where available
and looking at the defs of the CTOR components, costing a gpr->xmm
move where it can be anticipated.

b) is more future-proof; if we'd take that at this point, I can
check how intrusive it would be.
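
Option b) can be pictured with a toy cost function. Everything in it is an assumption made for illustration: the `def_kind` classification, the `construct_cost` helper and the move cost passed in are invented and do not reflect the actual x86 cost hook.

```c
#include <stddef.h>

/* Where a constructor component comes from (hypothetical classification). */
enum def_kind { DEF_FROM_MEMORY, DEF_FP_OR_VECTOR, DEF_INTEGER_REG };

/* Start from the flat vec_construct cost and add a GPR->XMM move
   for every component that is expected to live in an integer register. */
int construct_cost(int base_cost, const enum def_kind *defs, size_t n,
                   int gpr_to_xmm_cost)
{
    int cost = base_cost;
    for (size_t i = 0; i < n; i++)
        if (defs[i] == DEF_INTEGER_REG)
            cost += gpr_to_xmm_cost;
    return cost;
}
```

With the base cost of 8 from the dump and an invented move cost of 6, two integer-register components give 20, so the vector total becomes 12 + 20 = 32 against a scalar total of 24 and the vectorization would be rejected. Two components loaded from memory keep the cost at 8, and that case would still be vectorized.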

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 10:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
                 CC|                            |rsandifo at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52476
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52476&action=edit
minimal patch

This is a minimal untested patch adjusting APIs to allow for the cost hook to
receive a slp_node in addition to a stmt_vec_info and make the x86 backend
use it and successfully disregard the vectorization that's not doing
a CTOR from memory.

Other targets need minimal adjustments as well of course, and some of the
cleanups (additional overloads for record/add_stmt_cost for scalar and branch
stmts and two fixes using scalar_stmt rather than vector_stmt kinds for
versioning costs) can and will be split out.

Richard - any comments?  Would you object to doing this for GCC 12 (given we
changed the costing API anyway)?

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 10:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another testcase is

struct S { double a, b; } s;

void
foo (double a, double b)
{
  s.a = a;
  s.b = b;
}

which also receives the same costs and compiles vectorized to

  unpcklpd %xmm1,%xmm0
  movaps %xmm0,0x0(%rip)  
  ret

which is also smaller than unvectorized.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
The patch will cause

FAIL: gcc.target/i386/pr91446.c scan-assembler-times vmovdqa[^\\n\\r]*xmm[0-9] 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
XPASS: gcc.target/i386/pr99881.c scan-assembler-not xmm[0-9]

I have to look into some of them.  The pr92658 one seems to be cases like

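/* Typedefs assumed here so the snippet stands alone; unsigned element
   types match the zero-extending conversion in the dump below.  */
typedef unsigned int v4si __attribute__ ((vector_size (16)));
typedef unsigned long long v2di __attribute__ ((vector_size (16)));

void
bar_u32_u64 (v2di * dst, v4si src)
{
  unsigned long long tem[2];
  tem[0] = src[0];
  tem[1] = src[1];
  dst[0] = *(v2di *) tem;
}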

where we fail to recognize the BIT_FIELD_REF as accessing a pre-existing
vector (we only support a subset of cases during SLP discovery):

  _1 = BIT_FIELD_REF <src_6(D), 32, 0>;
  _2 = (long long unsigned int) _1;
  tem[0] = _2;
  _3 = BIT_FIELD_REF <src_6(D), 32, 32>;
  _4 = (long long unsigned int) _3;
  tem[1] = _4;

but when vectorizing just the store and the conversion as

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <src_6(D), 32, 0>;
  _3 = BIT_FIELD_REF <src_6(D), 32, 32>;
  _13 = {_1, _3};
  vect__2.110_14 = (vector(2) long long unsigned int) _13;
  MEM <vector(2) long long unsigned int> [(long long unsigned int *)&tem] = vect__2.110_14;

we can recover things on the RTL side.

So we just realize that costing is a difficult thing.

Cost model analysis:
_2 1 times scalar_store costs 12 in body
_4 1 times scalar_store costs 12 in body
(long long unsigned int) _1 1 times scalar_stmt costs 4 in body
(long long unsigned int) _3 1 times scalar_stmt costs 4 in body
(long long unsigned int) _1 1 times vector_stmt costs 4 in body
node 0x415e268 1 times vec_construct costs 20 in prologue
_2 1 times vector_store costs 16 in body
Cost model analysis for part in loop 0:
  Vector cost: 40
  Scalar cost: 32
not vectorized: vectorization is not profitable.

Note this uses icelake-server costs, which have an unusually high
sse_to_integer cost.

The fix here would best be to recognize the BIT_FIELD_REF vector use of course.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=99881

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
See also PR99881 (where this XPASSes its testcase) for eventual fallout in
x264_r on CLX and 538.imagick_r on Kabylake.  Unlike the fix for that PR I'm
simply re-using x86_cost->sse_to_integer here.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: rguenth at gcc dot gnu.org @ 2022-02-18 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
For

FAIL: gcc.target/i386/pr91446.c scan-assembler-times vmovdqa[^\\n\\r]*xmm[0-9] 2

we used to produce

0000000000000000 <foo>:
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   c4 e1 f9 6e d7          vmovq  %rdi,%xmm2
   9:   c4 e1 f9 6e da          vmovq  %rdx,%xmm3
   e:   c4 e3 e9 22 ce 01       vpinsrq $0x1,%rsi,%xmm2,%xmm1
  14:   c4 e3 e1 22 c1 01       vpinsrq $0x1,%rcx,%xmm3,%xmm0
  1a:   48 89 e7                mov    %rsp,%rdi
  1d:   c5 f9 7f 0c 24          vmovdqa %xmm1,(%rsp)
  22:   c5 f9 7f 44 24 10       vmovdqa %xmm0,0x10(%rsp)
  28:   e8 00 00 00 00          call   2d <foo+0x2d>
  2d:   48 83 c4 28             add    $0x28,%rsp
  31:   c3                      ret    

but now reject this on costing grounds.  The scalar code is

0000000000000000 <foo>:
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   48 89 3c 24             mov    %rdi,(%rsp)
   8:   48 89 e7                mov    %rsp,%rdi
   b:   48 89 74 24 08          mov    %rsi,0x8(%rsp)
  10:   48 89 54 24 10          mov    %rdx,0x10(%rsp)
  15:   48 89 4c 24 18          mov    %rcx,0x18(%rsp)
  1a:   e8 00 00 00 00          call   1f <foo+0x1f>
  1f:   48 83 c4 28             add    $0x28,%rsp
  23:   c3                      ret    

I think the scalar variant is 5 uops up to the call while the vector variant
is 9 uops.  The scalar variant can also execute 4 of the uops in parallel
(well, I guess only up to 3 with 3 store ports).  I think the scalar
variant is better and so I'm inclined to adjust the testcase.

* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
From: pinskia at gcc dot gnu.org @ 2022-02-18 21:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #6)
> Hmm:
>  _14 = {_1, _5};
>   _8 = VIEW_CONVERT_EXPR<__int128>(_14);
> 
> Wouldn't it be better to convert that to just (hopefully I got the order
> correct):
> t1 = (__int128)_1
> _8 = BIT_INSERT_EXPR(t1, 64, _5);
> 
> ?

I filed that as PR 104600 since it might be useful in the general case too.
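
At the source level the two forms being compared look roughly like the
sketch below.  It assumes a little-endian target with `unsigned __int128`
support (a GCC extension), and the function names are made up for
illustration; it only shows that both ways of assembling the 128-bit value
agree, not what GIMPLE the compiler actually produces for either.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the vector CTOR {_1, _5} followed by VIEW_CONVERT_EXPR:
   put both halves next to each other and reinterpret the 16 bytes.
   Assumes little-endian layout (low half first).  */
unsigned __int128
build_by_view_convert (uint64_t lo, uint64_t hi)
{
  uint64_t v[2] = { lo, hi };
  unsigned __int128 r;
  memcpy (&r, v, sizeof r);
  return r;
}

/* Mirrors the suggested alternative: widen the low half, then insert
   the high half at bit position 64.  */
unsigned __int128
build_by_bit_insert (uint64_t lo, uint64_t hi)
{
  unsigned __int128 t1 = lo;
  return t1 | ((unsigned __int128) hi << 64);
}
```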


* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (18 preceding siblings ...)
  2022-02-18 21:16 ` pinskia at gcc dot gnu.org
@ 2022-02-22  7:58 ` cvs-commit at gcc dot gnu.org
  2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-22  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:61fc5e098e76c9809f35f449a70c9c8d74773d9d

commit r12-7317-g61fc5e098e76c9809f35f449a70c9c8d74773d9d
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 18 11:34:52 2022 +0100

    tree-optimization/104582 - Simplify vectorizer cost API and fixes

    This simplifies the vectorizer cost API by providing overloads
    to add_stmt_cost and record_stmt_cost suitable for scalar stmt
    and branch stmt costing which do not need information like
    a vector type or alignment.  It also fixes two mistakes where
    costs for versioning tests were recorded as vector stmt rather
    than scalar stmt.

    This is a first patch to simplify the actual fix for PR104582.

    2022-02-18  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/104582
            * tree-vectorizer.h (add_stmt_cost): New overload.
            (record_stmt_cost): Likewise.
            * tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
            Use add_stmt_costs.
            (vect_get_known_peeling_cost): Use new overloads.
            (vect_estimate_min_profitable_iters): Likewise.  Consistently
            use scalar_stmt for costing versioning checks.
            * tree-vect-stmts.cc (record_stmt_cost): New overload.


* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (19 preceding siblings ...)
  2022-02-22  7:58 ` cvs-commit at gcc dot gnu.org
@ 2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
  2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-22  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:f24dfc76177b3994434c8beb287cde1a9976b5ce

commit r12-7318-gf24dfc76177b3994434c8beb287cde1a9976b5ce
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 18 11:50:44 2022 +0100

    tree-optimization/104582 - make SLP node available in vector cost hook

    This adjusts the vectorizer costing API to allow passing down the
    SLP node the vector stmt is created from.

    2022-02-18  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/104582
            * tree-vectorizer.h (stmt_info_for_cost::node): New field.
            (vector_costs::add_stmt_cost): Add SLP node parameter.
            (dump_stmt_cost): Likewise.
            (add_stmt_cost): Likewise, new overload and adjust.
            (add_stmt_costs): Adjust.
            (record_stmt_cost): New overload.
            * tree-vectorizer.cc (dump_stmt_cost): Dump the SLP node.
            (vector_costs::add_stmt_cost): Adjust.
            * tree-vect-loop.cc (vect_estimate_min_profitable_iters):
            Adjust.
            * tree-vect-slp.cc (vect_prologue_cost_for_slp): Record
            the SLP node for costing.
            (vectorizable_slp_permutation): Likewise.
            * tree-vect-stmts.cc (record_stmt_cost): Adjust and add
            new overloads.
            * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
            Adjust.
            * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
            Adjust.
            * config/rs6000/rs6000.cc (rs6000_vector_costs::add_stmt_cost):
            Adjust.
            (rs6000_cost_data::adjust_vect_cost_per_loop): Likewise.


* [Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (20 preceding siblings ...)
  2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
@ 2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
  2022-02-22  7:59 ` [Bug tree-optimization/104582] [11 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-22  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:90d693bdc9d71841f51d68826ffa5bd685d7f0bc

commit r12-7319-g90d693bdc9d71841f51d68826ffa5bd685d7f0bc
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 18 14:32:14 2022 +0100

    target/99881 - x86 vector cost of CTOR from integer regs

    This uses the now passed SLP node to the vectorizer costing hook
    to adjust vector construction costs for the cost of moving an
    integer component from a GPR to a vector register when that's
    required for building a vector from components.  A crucial difference
    here is whether the component is loaded from memory or extracted
    from a vector register as in those cases no intermediate GPR is involved.

    The pr99881.c testcase can be Un-XFAILed with this patch, the
    pr91446.c testcase now produces scalar code which looks superior
    to me so I've adjusted it as well.

    2022-02-18  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/104582
            PR target/99881
            * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
            Cost GPR to vector register moves for integer vector construction.

            * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
            * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
            * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
            * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
            * gcc.target/i386/pr99881.c: Un-XFAIL.
            * gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
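
The idea in the commit message above can be pictured with a toy cost
function: a vector construction pays a base cost, plus an extra GPR to
vector register move for every component that lives in an integer register,
while components loaded from memory or extracted from a vector lane add
nothing.  Everything here (the enum, the function name, the cost values
passed in) is a hypothetical illustration and not the code or the numbers
used by ix86_vector_costs::add_stmt_cost.

```c
#include <stddef.h>

/* Where a vector component comes from (illustrative classification).  */
enum elt_source { FROM_GPR, FROM_MEMORY, FROM_VECTOR_LANE };

/* Toy cost of building a vector from N components.  Only components
   that sit in a GPR need an extra move into a vector register.  */
unsigned
ctor_cost (const enum elt_source *src, size_t n,
           unsigned base_cost, unsigned gpr_to_vec_cost)
{
  unsigned cost = base_cost;
  for (size_t i = 0; i < n; i++)
    if (src[i] == FROM_GPR)
      cost += gpr_to_vec_cost;
  return cost;
}
```

With made-up costs of 4 for the construction and 3 per GPR move, a
two-element vector built from function arguments costs 10, while the same
vector built from two loads stays at 4, which is the kind of difference
that can tip the comparison back toward scalar code.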


* [Bug tree-optimization/104582] [11 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (21 preceding siblings ...)
  2022-02-22  7:59 ` cvs-commit at gcc dot gnu.org
@ 2022-02-22  7:59 ` rguenth at gcc dot gnu.org
  2022-04-07  8:15 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-22  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
            Summary|[11/12 Regression]          |[11 Regression] Unoptimal
                   |Unoptimal code for __negdi2 |code for __negdi2 (and
                   |(and others) from libgcc2   |others) from libgcc2 due to
                   |due to unwanted             |unwanted vectorization
                   |vectorization               |
      Known to work|                            |12.0

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is now fixed on trunk for x86.


* [Bug tree-optimization/104582] [11 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (22 preceding siblings ...)
  2022-02-22  7:59 ` [Bug tree-optimization/104582] [11 " rguenth at gcc dot gnu.org
@ 2022-04-07  8:15 ` rguenth at gcc dot gnu.org
  2022-04-21  7:51 ` rguenth at gcc dot gnu.org
  2023-05-29 10:06 ` jakub at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-07  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
I do not plan to backport this given it's quite intrusive and had some fallout.


* [Bug tree-optimization/104582] [11 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (23 preceding siblings ...)
  2022-04-07  8:15 ` rguenth at gcc dot gnu.org
@ 2022-04-21  7:51 ` rguenth at gcc dot gnu.org
  2023-05-29 10:06 ` jakub at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-21  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.3                        |11.4

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.3 is being released, retargeting bugs to GCC 11.4.


* [Bug tree-optimization/104582] [11 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
  2022-02-17 12:30 [Bug target/104582] New: Unoptimal code for __negdi2 (and others) from libgcc2 ubizjak at gmail dot com
                   ` (24 preceding siblings ...)
  2022-04-21  7:51 ` rguenth at gcc dot gnu.org
@ 2023-05-29 10:06 ` jakub at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-29 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.


