[Bug target/95762] New: Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2

public inbox for gcc-bugs@sourceware.org

* [Bug target/95762] New: Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: gabravier at gmail dot com @ 2020-06-19  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

            Bug ID: 95762
           Summary: Failure to optimize __builtin_convertvector from
                    vector of 16 chars to vector of 16 shorts in a single
                    instruction on AVX2
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

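#include <cstdint>  // int8_t, int16_t

typedef int8_t v16i8 __attribute__((vector_size(16)));    // 16 x int8_t, 128 bits
typedef int16_t v16i16 __attribute__((vector_size(32)));  // 16 x int16_t, 256 bits

// Sign-extends each of the 16 bytes to a 16-bit element.
auto f(v16i8 a)
{
    return __builtin_convertvector(a, v16i16);
}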

With -O3 -mavx2, LLVM outputs this:

f(signed char __vector(16)):
  vpmovsxbw ymm0, xmm0
  ret

GCC outputs this:

f(signed char __vector(16)):
  vpmovsxbw xmm1, xmm0
  vpsrldq xmm0, xmm0, 8
  vpmovsxbw xmm0, xmm0
  vinserti128 ymm0, ymm1, xmm0, 0x1
  ret
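
For reference, the semantics both compilers must preserve can be checked with a small portable sketch. It is not part of the original report: the function names (`widen_builtin`, `widen_scalar`, `widen_matches_scalar`) are made up for illustration, and arrays are used at the interface so that it builds without -mavx2. The single-instruction form LLVM picks is the same operation that the AVX2 intrinsic `_mm256_cvtepi8_epi16` exposes.

```cpp
#include <cstdint>
#include <cstring>

typedef int8_t  v16i8  __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

// Widen 16 signed bytes to 16 signed shorts with the builtin.
// (Hypothetical helper; vectors stay local so no vector ABI is involved.)
void widen_builtin(const int8_t (&in)[16], int16_t (&out)[16])
{
    v16i8 a;
    std::memcpy(&a, in, sizeof a);
    v16i16 r = __builtin_convertvector(a, v16i16);
    std::memcpy(out, &r, sizeof r);
}

// Scalar reference: every lane is sign-extended independently.
void widen_scalar(const int8_t (&in)[16], int16_t (&out)[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = static_cast<int16_t>(in[i]);
}

// True when both versions agree on a sample that includes the int8_t extremes.
bool widen_matches_scalar()
{
    const int8_t in[16] = {-128, -127, -2, -1, 0, 1, 2, 3,
                           63, 64, 100, 126, 127, -64, -100, 5};
    int16_t a[16], b[16];
    widen_builtin(in, a);
    widen_scalar(in, b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Calling `widen_matches_scalar()` should return true with any compiler that implements `__builtin_convertvector`; the difference discussed in this bug is only in how many instructions are emitted.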


* [Bug target/95762] Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: rguenth at gcc dot gnu.org @ 2020-06-19  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-06-19
             Status|UNCONFIRMED                 |NEW
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're currently representing this as a .VEC_CONVERT IFN lowered at veclower
time to

  _4 = [vec_unpack_lo_expr] a_1(D);
  _5 = [vec_unpack_hi_expr] a_1(D);
  _2 = {_4, _5};

rather than using a NOP_EXPR as would be possible now.  I suppose we should
remove .VEC_CONVERT again for vector integer conversions and directly
use NOP_EXPRs plus make sure to lower those when not supported.  Not
sure if __builtin_convertvector also supports integer<->float conversions.
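
As a rough source-level model of the lowered form above (an illustration only, not GCC's internal representation), the conversion can be written as two half-width conversions whose results are concatenated. The helper names are hypothetical, and the mapping of "lo" to elements 0..7 assumes a little-endian target such as x86.

```cpp
#include <cstdint>
#include <cstring>

typedef int8_t  v8i8   __attribute__((vector_size(8)));
typedef int16_t v8i16  __attribute__((vector_size(16)));
typedef int8_t  v16i8  __attribute__((vector_size(16)));
typedef int16_t v16i16 __attribute__((vector_size(32)));

// Models the lowered sequence: unpack low half, unpack high half, build {lo, hi}.
void widen_in_halves(const int8_t (&in)[16], int16_t (&out)[16])
{
    v8i8 lo_in, hi_in;
    std::memcpy(&lo_in, in, sizeof lo_in);      // elements 0..7
    std::memcpy(&hi_in, in + 8, sizeof hi_in);  // elements 8..15
    v8i16 lo = __builtin_convertvector(lo_in, v8i16);  // role of vec_unpack_lo_expr
    v8i16 hi = __builtin_convertvector(hi_in, v8i16);  // role of vec_unpack_hi_expr
    std::memcpy(out, &lo, sizeof lo);           // role of the constructor {_4, _5}
    std::memcpy(out + 8, &hi, sizeof hi);
}

// Models the single whole-vector conversion that a NOP_EXPR would express.
void widen_whole(const int8_t (&in)[16], int16_t (&out)[16])
{
    v16i8 a;
    std::memcpy(&a, in, sizeof a);
    v16i16 r = __builtin_convertvector(a, v16i16);
    std::memcpy(out, &r, sizeof r);
}

// True when the split form and the whole-vector form produce the same lanes.
bool halves_match_whole()
{
    const int8_t in[16] = {-128, -1, 0, 1, 127, 42, -42, 7,
                           8, -9, 10, -11, 12, -13, 14, -15};
    int16_t a[16], b[16];
    widen_in_halves(in, a);
    widen_whole(in, b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Both forms compute the same values; the split form simply hides from the backend that one 128-bit to 256-bit sign extension would do.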


* [Bug target/95762] Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: jakub at gcc dot gnu.org @ 2020-06-19 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> We're currently representing this as a .VEC_CONVERT IFN lowered at veclower
> time to
> 
>   _4 = [vec_unpack_lo_expr] a_1(D);
>   _5 = [vec_unpack_hi_expr] a_1(D);
>   _2 = {_4, _5};
> 
> rather than using a NOP_EXPR as would be possible now.  I suppose we should
> remove .VEC_CONVERT again for vector integer conversions and directly
> use NOP_EXPRs plus make sure to lower those when not supported.  Not
> sure if __builtin_convertvector also supports integer<->float conversions.

__builtin_convertvector does support integer<->float conversions too.
I'd say we should just fold .VEC_CONVERT to something more appropriate if the
conditions are right (e.g. if an optab says it is possible to do it in a
different way that will also survive veclower) and otherwise keep it as is.
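
A minimal sketch of the integer<->float case, assuming the per-element behaviour matches a C-style cast (so float to int truncates toward zero). The function names are hypothetical and chosen only for this example.

```cpp
#include <cstdint>
#include <cstring>

typedef int32_t v4i32 __attribute__((vector_size(16)));
typedef float   v4f32 __attribute__((vector_size(16)));

// int32_t -> float, lane by lane.
void ints_to_floats(const int32_t (&in)[4], float (&out)[4])
{
    v4i32 a;
    std::memcpy(&a, in, sizeof a);
    v4f32 r = __builtin_convertvector(a, v4f32);
    std::memcpy(out, &r, sizeof r);
}

// float -> int32_t, lane by lane; fractional parts are discarded.
void floats_to_ints(const float (&in)[4], int32_t (&out)[4])
{
    v4f32 a;
    std::memcpy(&a, in, sizeof a);
    v4i32 r = __builtin_convertvector(a, v4i32);
    std::memcpy(out, &r, sizeof r);
}

// Checks both directions against values that are exactly representable.
bool int_float_conversions_ok()
{
    const int32_t i_in[4] = {-3, 0, 7, 1000};
    float f_out[4];
    ints_to_floats(i_in, f_out);
    bool ok = f_out[0] == -3.0f && f_out[1] == 0.0f &&
              f_out[2] == 7.0f && f_out[3] == 1000.0f;

    const float f_in[4] = {2.75f, -2.75f, 0.5f, 10.0f};
    int32_t i_out[4];
    floats_to_ints(f_in, i_out);
    return ok && i_out[0] == 2 && i_out[1] == -2 &&
           i_out[2] == 0 && i_out[3] == 10;
}
```

The only requirement the builtin places on the two vector types is that they have the same number of elements; the element widths and kinds may differ.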


* [Bug target/95762] Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: jakub at gcc dot gnu.org @ 2020-06-19 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
And I should note, because of offloading, it would be better to do that kind of
folding only after_inlining.


* [Bug target/95762] Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: rguenther at suse dot de @ 2020-06-19 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jun 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762
> 
> --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> And I should note, because of offloading, it would be better to do that kind of
> folding only after_inlining.

Hmm, does that also hold for all the vector permute/ctor "optimizations"
forwprop does?


* [Bug target/95762] Failure to optimize __builtin_convertvector from vector of 16 chars to vector of 16 shorts in a single instruction on AVX2
From: jakub at gcc dot gnu.org @ 2020-06-19 11:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95762

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'd say that applies to anything that depends on optabs, if possible.
