public inbox for gcc-bugs@sourceware.org
* [Bug target/103554] New: -mavx generates worse code on scalar code
@ 2021-12-04 15:37 avi at scylladb dot com
  2021-12-04 22:08 ` [Bug target/103554] " pinskia at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: avi at scylladb dot com @ 2021-12-04 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

            Bug ID: 103554
           Summary: -mavx generates worse code on scalar code
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: avi at scylladb dot com
  Target Milestone: ---

Test case:

struct s1 {
    long a, b, c, d, e, f, g, h;
};

s1 move(s1 in) {
    s1 ret;

    ret.a = in.d;
    ret.b = in.e;
    ret.c = in.a;
    ret.d = in.b;
    return ret;
}


-O3 generates:

move(s1):
  movq 8(%rsp), %xmm0
  movq 32(%rsp), %xmm1
  movq %rdi, %rax
  movhps 16(%rsp), %xmm0
  movhps 40(%rsp), %xmm1
  movups %xmm1, (%rdi)
  movups %xmm0, 16(%rdi)
  ret


-O3 -mavx generates:

move(s1):
        pushq   %rbp
        movq    %rdi, %rax
        movq    %rsp, %rbp
        vmovq   16(%rbp), %xmm2
        vmovq   40(%rbp), %xmm3
        vpinsrq $1, 24(%rbp), %xmm2, %xmm1
        vpinsrq $1, 48(%rbp), %xmm3, %xmm0
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
        vmovdqu %ymm0, (%rdi)
        vzeroupper
        popq    %rbp
        ret

Clang -O3 generates this simple code, with or without -mavx (with -mavx it
uses VEX-encoded instructions):

move(s1): # @move(s1)
  movq %rdi, %rax
  movups 32(%rsp), %xmm0
  movups %xmm0, (%rdi)
  movaps 8(%rsp), %xmm0
  movups %xmm0, 16(%rdi)
  retq

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
@ 2021-12-04 22:08 ` pinskia at gcc dot gnu.org
  2021-12-06  8:36 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-04 22:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is SLP happening; I thought I had seen this issue before.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
  2021-12-04 22:08 ` [Bug target/103554] " pinskia at gcc dot gnu.org
@ 2021-12-06  8:36 ` rguenth at gcc dot gnu.org
  2021-12-06  9:00 ` avi at scylladb dot com
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-12-06  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-12-06
             Blocks|                            |53947
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is

t.ii:8:11: note:   ==> examining statement: _1 = in.d;
t.ii:8:11: missed:   BB vectorization with gaps at the end of a load is not
supported
t.ii:8:16: missed:   not vectorized: relevant stmt not supported: _1 = in.d;
t.ii:8:11: note:   Building vector operands of 0x447c768 from scalars instead

when trying to vectorize this with V4DI.  We don't realize that with a visible
decl we can load the gap.  Indeed I think I've seen this before as well.

Note with SSE the same issue is present but we create the V2DI vectors in
a more optimal way from scalars.  With the above issue fixed we'd instead
use two V4DI unaligned vector moves from the stack and a shuffle.

The locally optimal solution would be two unaligned V2DI loads and either
two V2DI stores or a V4DI merge and store.

_Note_ that the suboptimal solution presented here is likely faster
because it avoids STLF penalties from the call's stack setup, which very
likely uses scalar or differently aligned vector moves.

Note the x86 backend costs the SSE variant

t.ii:8:11: note: Cost model analysis for part in loop 0:
  Vector cost: 40
  Scalar cost: 48

and the AVX variant

t.ii:8:11: note: Cost model analysis for part in loop 0:
  Vector cost: 48
  Scalar cost: 48

but the x86 backend chooses to not let the vectorizer compare costs with
different vector sizes but instead asks it to pick the first working
solution from the vector of modes to consider (and in that order).  We
might want to reconsider that (maybe at least for BB vectorization and
maybe with some extra special mode?).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
  2021-12-04 22:08 ` [Bug target/103554] " pinskia at gcc dot gnu.org
  2021-12-06  8:36 ` rguenth at gcc dot gnu.org
@ 2021-12-06  9:00 ` avi at scylladb dot com
  2021-12-06 11:00 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: avi at scylladb dot com @ 2021-12-06  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #3 from Avi Kivity <avi at scylladb dot com> ---
> _Note_ that the suboptimal solution presented here is likely faster
> because it avoids STLF penalties from the call's stack setup, which very
> likely uses scalar or differently aligned vector moves.

Interesting point. Agner says (Icelake):

> A read that is bigger than the write, or a read that covers both written and unwritten bytes,
> fails to forward. The write-to-read latency is 19-20 clock cycles.

However, the same code is generated when `in` is a reference, in which case it
may not be in the store queue at all, so we're paying two extra instructions
for nothing. movhps is also 2 uops, so we're paying 3 uops to load 2 elements.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (2 preceding siblings ...)
  2021-12-06  9:00 ` avi at scylladb dot com
@ 2021-12-06 11:00 ` rguenther at suse dot de
  2021-12-06 11:23 ` avi at scylladb dot com
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2021-12-06 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Dec 2021, avi at scylladb dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> 
> --- Comment #3 from Avi Kivity <avi at scylladb dot com> ---
> > _Note_ that the suboptimal solution presented here is likely faster
> > because it avoids STLF penalties from the call's stack setup, which
> > very likely uses scalar or differently aligned vector moves.
> 
> Interesting point. Agner says (Icelake):
> 
> > A read that is bigger than the write, or a read that covers both written and unwritten bytes,
> > fails to forward. The write-to-read latency is 19-20 clock cycles.

Note the penalty is usually much bigger since the CPU speculatively issues
the load rather than using the data in the store buffers and thus when
the store retires it has to flush & restart.

> However, the same code is generated when `in` is a reference, in which case it
> may not be in the store queue at all, so we're paying two extra instructions
> for nothing. movhps is also 2 uops, so we're paying 3 uops to load 2 elements.

Yes - across function boundaries it's difficult to weigh possible STLF
penalties against less optimal code (we've talked about trying to use IPA
analysis to discover the likelihood of an STLF failure).

I just wanted to say that looking at such small code in isolation may
fail to cover important parts of the bigger picture ;)


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (3 preceding siblings ...)
  2021-12-06 11:00 ` rguenther at suse dot de
@ 2021-12-06 11:23 ` avi at scylladb dot com
  2021-12-06 11:52 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: avi at scylladb dot com @ 2021-12-06 11:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
Here's some big-picture data. Compiled with clang, which seems to ignore these
STLF issues.

no-slp:

42641.91 tps ( 75.1 allocs/op,  12.1 tasks/op,   44929 insns/op)
42446.41 tps ( 75.1 allocs/op,  12.1 tasks/op,   44870 insns/op)
42495.03 tps ( 75.1 allocs/op,  12.1 tasks/op,   44931 insns/op)
42703.40 tps ( 75.1 allocs/op,  12.1 tasks/op,   44916 insns/op)
42798.98 tps ( 75.1 allocs/op,  12.1 tasks/op,   44963 insns/op)

slp:

41536.46 tps ( 75.1 allocs/op,  12.1 tasks/op,   44828 insns/op)
41482.05 tps ( 75.1 allocs/op,  12.1 tasks/op,   44802 insns/op)
41707.23 tps ( 75.1 allocs/op,  12.1 tasks/op,   44874 insns/op)
41811.10 tps ( 75.1 allocs/op,  12.1 tasks/op,   44847 insns/op)
41764.39 tps ( 75.1 allocs/op,  12.1 tasks/op,   44846 insns/op)

So SLP definitely has a negative impact on ops/sec, even though it reduces
instructions/op. This is on an older machine (newer ones have ~5X the
perf, with 3X higher IPC and the rest due to higher frequency).


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (4 preceding siblings ...)
  2021-12-06 11:23 ` avi at scylladb dot com
@ 2021-12-06 11:52 ` rguenther at suse dot de
  2021-12-06 11:59 ` avi at scylladb dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2021-12-06 11:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Dec 2021, avi at scylladb dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> 
> --- Comment #5 from Avi Kivity <avi at scylladb dot com> ---
> Here's some big-picture data. Compiled with clang, which seems to ignore these
> STLF issues.
> 
> no-slp:
> 
> 42641.91 tps ( 75.1 allocs/op,  12.1 tasks/op,   44929 insns/op)
> 42446.41 tps ( 75.1 allocs/op,  12.1 tasks/op,   44870 insns/op)
> 42495.03 tps ( 75.1 allocs/op,  12.1 tasks/op,   44931 insns/op)
> 42703.40 tps ( 75.1 allocs/op,  12.1 tasks/op,   44916 insns/op)
> 42798.98 tps ( 75.1 allocs/op,  12.1 tasks/op,   44963 insns/op)
> 
> slp:
> 
> 41536.46 tps ( 75.1 allocs/op,  12.1 tasks/op,   44828 insns/op)
> 41482.05 tps ( 75.1 allocs/op,  12.1 tasks/op,   44802 insns/op)
> 41707.23 tps ( 75.1 allocs/op,  12.1 tasks/op,   44874 insns/op)
> 41811.10 tps ( 75.1 allocs/op,  12.1 tasks/op,   44847 insns/op)
> 41764.39 tps ( 75.1 allocs/op,  12.1 tasks/op,   44846 insns/op)
> 
> So SLP definitely has a negative impact on ops/sec, even though it
> reduces instructions/op. This is on an older machine (newer ones have
> ~5X the perf, with 3X higher IPC and the rest due to higher frequency).

Is that with the function inlined?  Can you show the argument setup
code at the caller side?


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (5 preceding siblings ...)
  2021-12-06 11:52 ` rguenther at suse dot de
@ 2021-12-06 11:59 ` avi at scylladb dot com
  2021-12-07  2:26 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: avi at scylladb dot com @ 2021-12-06 11:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #7 from Avi Kivity <avi at scylladb dot com> ---
Sorry, I was unclear. That's the entire program, which has a huge number
of SLP opportunities. Each iteration in the program is ~40k instructions.
It's not directly related to the test case, which is artificial and arose
from my exploring the differences between gcc and clang code generation.

It does demonstrate that clang's SLP is counterproductive here. I promise
to repeat the experiment with gcc once it stops ICEing.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (6 preceding siblings ...)
  2021-12-06 11:59 ` avi at scylladb dot com
@ 2021-12-07  2:26 ` crazylht at gmail dot com
  2021-12-07  8:36 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-12-07  2:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
> but the x86 backend chooses to not let the vectorizer compare costs with
> different vector sizes but instead asks it to pick the first working
> solution from the vector of modes to consider (and in that order).  We
> might want to reconsider that (maybe at least for BB vectorization and
> maybe with some extra special mode?).

Shouldn't the vectorizer compare the costs of different vector factors and
choose the smallest one? Or does the vectorizer already support the
corresponding framework, but the x86 backend doesn't implement the
corresponding target hook?


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (7 preceding siblings ...)
  2021-12-07  2:26 ` crazylht at gmail dot com
@ 2021-12-07  8:36 ` rguenth at gcc dot gnu.org
  2021-12-07  9:51 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-12-07  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #8)
> > but the x86 backend chooses to not let the vectorizer compare costs with
> > different vector sizes but instead asks it to pick the first working
> > solution from the vector of modes to consider (and in that order).  We
> > might want to reconsider that (maybe at least for BB vectorization and
> > maybe with some extra special mode?).
> 
> Shouldn't the vectorizer compare the costs of different vector factors
> and choose the smallest one? Or does the vectorizer already support the
> corresponding framework, but the x86 backend doesn't implement the
> corresponding target hook?

This is controlled by the autovectorize_vector_modes hook where the
return value is documented as

The hook returns a bitmask of flags that control how the modes in
@var{modes} are used.  The flags are:
@table @code
@item VECT_COMPARE_COSTS
Tells the loop vectorizer to try all the provided modes and pick the one
with the lowest cost.  By default the vectorizer will choose the first
mode that works.
@end table

The hook does not need to do anything if the vector returned by
@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant
for autovectorization.  The default implementation adds no modes and
returns 0.

IIRC we don't compare costs since we have so many vector modes and
iterating over them is (compile-time) costly, and the question is how well
we can trust our cost model here to make a precise decision.  The hook
currently is not told the vectorization mode (loop vs. basic-block
vectorization) - we might want to add this info and amend the hook
accordingly.  We might also want to add another mode that says to stop
iterating over modes when the vectorizer runs into a mode with larger
cost - like if we have mode/cost pairs { V64QI, 64 } { V32QI, 56 }
{ V16QI, 60 } then stop and not try V8QI and V4QI.

Note that returning VECT_COMPARE_COSTS has to be done carefully to avoid
changing the semantics of -mprefer-vector-width: currently, if the
preferred width can be used we use it, but when comparing costs we can end
up using a smaller vector size if that's deemed better.
-mprefer-vector-width would behave more like a -mmax-vector-width when
comparing costs.

We could add VECT_FIRST_PREFERRED and make the vectorizer pick the first
mode (which we'd then need to order first) and only if that isn't supported
compare costs.  Alternatively simply only return VECT_COMPARE_COSTS when
no -mprefer-* is given.

I was mostly pointing out that the cost modeling for this particular case
would have preferred SSE, but we told the vectorizer to pick the first
successful attempt.  Note the very first mode tried is _not_ the first
mode in the array but the one auto-detected from the testcase's uses and
the preferred_simd_mode hook.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (8 preceding siblings ...)
  2021-12-07  8:36 ` rguenth at gcc dot gnu.org
@ 2021-12-07  9:51 ` crazylht at gmail dot com
  2021-12-07 10:28 ` rguenther at suse dot de
  2021-12-09  3:07 ` crazylht at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-12-07  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
Got it, thanks for your detailed explanation. So there are two issues in
this case: first, the x86 target didn't choose the vector size w/ the
smallest cost; second, BB vectorization with gaps at the end of a load is
not supported.

On the other hand, if "BB vectorization with gaps at the end of a load is
not supported", the cost of the scalar version should be cheaper than
both 128-bit and 256-bit vectorization. I once tried to increase the cost
of vec_construct to make it more realistic, but the patch regressed
PR101929. The current cost model tends to generate more vectorized code.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (9 preceding siblings ...)
  2021-12-07  9:51 ` crazylht at gmail dot com
@ 2021-12-07 10:28 ` rguenther at suse dot de
  2021-12-09  3:07 ` crazylht at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: rguenther at suse dot de @ 2021-12-07 10:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 7 Dec 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> 
> --- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
> Got it, thanks for your detailed explanation. So there are two issues
> in this case: first, the x86 target didn't choose the vector size w/
> the smallest cost; second, BB vectorization with gaps at the end of a
> load is not supported.
> 
> On the other hand, if "BB vectorization with gaps at the end of a load
> is not supported", the cost of the scalar version should be cheaper
> than both 128-bit and 256-bit vectorization. I once tried to increase
> the cost of vec_construct to make it more realistic, but the patch
> regressed PR101929. The current cost model tends to generate more
> vectorized code.

The cost model would really need to look at more than a single stmt.  If
there is work to schedule in parallel to a vector build then it really
isn't that expensive.  It's just that if we are dependent on the result
and cannot proceed then it can end up being more expensive.

Remember we are really costing assuming stmts execute one at a time,
simply adding latencies.  We have ideas on how to improve on that side.


* [Bug target/103554] -mavx generates worse code on scalar code
  2021-12-04 15:37 [Bug target/103554] New: -mavx generates worse code on scalar code avi at scylladb dot com
                   ` (10 preceding siblings ...)
  2021-12-07 10:28 ` rguenther at suse dot de
@ 2021-12-09  3:07 ` crazylht at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  3:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to rguenther@suse.de from comment #11)
> On Tue, 7 Dec 2021, crazylht at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554
> > 
> > --- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
> > Got it, thanks for your detailed explanation. So there are two issues
> > in this case: first, the x86 target didn't choose the vector size w/
> > the smallest cost; second, BB vectorization with gaps at the end of a
> > load is not supported.
> > 
> > On the other hand, if "BB vectorization with gaps at the end of a
> > load is not supported", the cost of the scalar version should be
> > cheaper than both 128-bit and 256-bit vectorization. I once tried to
> > increase the cost of vec_construct to make it more realistic, but the
> > patch regressed PR101929. The current cost model tends to generate
> > more vectorized code.
> 
> The cost model would really need to look at more than a single stmt.  If
> there is work to schedule in parallel to a vector build then it really
> isn't that expensive.  It's just that if we are dependent on the result
> and cannot proceed then it can end up being more expensive.
> 
> Remember we are really costing assuming stmts execute one at a time,
> simply adding latencies.  We have ideas on how to improve on that side.

Agree, let me play with VECT_COMPARE_COSTS.

