[Bug c++/107432] New: __builtin_convertvector generates inefficient code

From: g.peterhoff@t-online.de @ 2022-10-27 10:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

            Bug ID: 107432
           Summary: __builtin_convertvector generates inefficient code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterhoff@t-online.de
  Target Milestone: ---

Example: conversion int64_t -> int32_t

avx512f + avx512vl
HW conversions are available.

avx2
There is a correctly working 32-bit-permutation
(_mm256_permutevar8x32_epi32/vpermd) that can be used.
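
A minimal sketch of why a 32-bit permutation is enough here, assuming a little-endian target such as x86. It is written with GNU vector extensions instead of intrinsics so that it builds without -mavx2; the function names are made up for illustration and this is not the code behind the godbolt link. Viewing the four 64-bit lanes as eight 32-bit lanes and keeping lanes 0, 2, 4 and 6 should give the same values as __builtin_convertvector:

```c
#include <stdint.h>
#include <string.h>

typedef int32_t v4si __attribute__((vector_size(16)));
typedef int32_t v8si __attribute__((vector_size(32)));
typedef int64_t v4di __attribute__((vector_size(32)));

/* Reference: let the compiler choose how to lower the conversion. */
void trunc_convertvector(const int64_t in[4], int32_t out[4])
{
    v4di a;
    memcpy(&a, in, sizeof a);
    v4si r = __builtin_convertvector(a, v4si);
    memcpy(out, &r, sizeof r);
}

/* Model of the permutation idea: reinterpret 4 x int64 as 8 x int32 and
   keep the even lanes, which hold the low halves on little-endian.  */
void trunc_even_lanes(const int64_t in[4], int32_t out[4])
{
    v8si w;
    memcpy(&w, in, sizeof w);
    v4si r = { w[0], w[2], w[4], w[6] };
    memcpy(out, &r, sizeof r);
}
```

Whether a given GCC version turns the second form into a single vpermd is a separate question; the sketch only shows that the two forms agree on the values.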

I have not (yet) evaluated whether other conversions (larger int -> smaller
int) are also affected.
PS: On x86 it's already hell to optimize all cases depending on the instruction
set.
PPS: What about -march=znver4 ?

https://godbolt.org/z/3s79bnh7v

thx
Gero

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: pinskia at gcc dot gnu.org @ 2022-10-27 15:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 53781
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53781&action=edit
testcase

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: g.peterhoff@t-online.de @ 2022-10-27 16:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #2 from g.peterhoff@t-online.de ---
Another example. I want to convert an array<Bool> to array<Float64>.
There are basically 3 options:
- Copy
- Test (b2f64_default)
- optimized version (b2f64_manually)

gcc 12.2 + gcc trunk:
convertSIZE_copy only generates scalar code (_mm_cvtsi64_sd)
convertSIZE_default always generates conditional jumps

convertSIZE_manually:
gcc trunk always generates branch-free scalar code
gcc 12.2:
- convert1024_manually generates vector code, but does not use the HW
  conversion int8->int64 (_mm(256)_cvtepi8_epi64); instead it converts
  int8->int16->int32->int64 step by step
- convert8_manually generates branch-free scalar code
- convert4_manually generates vector code and uses the HW conversion
  int8->int64


NONE of these conversions are transformed/optimized to the extent that always
- all available intrinsics are used
- no "normal" registers are used
- branch-free code is generated
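
The three options can be sketched roughly as follows. The function bodies are guesses at what the godbolt code does, not copies of it: the b2f64_* names only echo the names mentioned above, and the "manual" variant assumes that sizeof(bool) == 1 and that true is stored as the byte value 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char v4qu __attribute__((vector_size(4)));
typedef double v4df __attribute__((vector_size(32)));

/* "Copy": element-wise assignment with the implicit bool -> double conversion. */
void b2f64_copy(const bool *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i];
}

/* "Test": explicit select between the two possible results. */
void b2f64_default(const bool *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] ? 1.0 : 0.0;
}

_Static_assert(sizeof(bool) == 1, "sketch assumes one byte per bool");

/* "Manual": load four bools as four bytes and widen them in one
   __builtin_convertvector call; leftover elements are handled one by one.  */
void b2f64_manually(const bool *in, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4qu b;
        memcpy(&b, in + i, sizeof b);
        v4df d = __builtin_convertvector(b, v4df);
        memcpy(out + i, &d, sizeof d);
    }
    for (; i < n; ++i)
        out[i] = in[i];
}
```

All three should produce identical values; the report is about how differently they are compiled.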

https://godbolt.org/z/f74vK79of

thx
Gero

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28  3:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
typedef int v4si __attribute__((vector_size(16)));
typedef long long v4di __attribute__((vector_size(32)));

v4si
foo (v4di a)
{
    return __builtin_convertvector (a, v4si);
}

hmm, we actually support truncv4div4si2, but somehow gcc failed to generate
.VEC_CONVERT with truncmn2.

hmm, what's the optab for convert_optab_handler?

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> typedef int v4si __attribute__((vector_size(16)));
> typedef long long v4di __attribute__((vector_size(32)));
> 
> v4si
> foo (v4di a)
> {
>     return __builtin_convertvector (a, v4si);
> }
> 
> hmm, we actually support truncv4div4si2, but somehow gcc failed to generate
> .VEC_CONVERT with truncmn2.
> 

/* IFN_VEC_CONVERT is supposed to be expanded at pass_lower_vector.  So this
   dummy function should never be called.  */

static void
expand_VEC_CONVERT (internal_fn, gcall *)
{
  gcc_unreachable ();
}

It's lowered by pass_lower_vector. Ideally, could we use truncmn2 in
expand_VEC_CONVERT when src has a wider integer mode than dest?

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28  5:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---

> It's lowered by pass_lower_vector, ideally, can we use truncmn2 in
> expand_VEC_CONVERT if src is bigger integer mode than dest.

Currently, expand_vector_conversion uses VEC_PACK_TRUNC_EXPR

---------------cut begins------------------------
  else if (modifier == NARROW)
    {
      switch (code)
        {
        CASE_CONVERT:
          code1 = VEC_PACK_TRUNC_EXPR;
          optab1 = optab_for_tree_code (code1, arg_type, optab_default);
          break;

---------------Cut ends------------------------

But the BB vectorizer does the right thing for:

void
foo (long long* a, int* b)
{
    b[0] = a[0];
    b[1] = a[1];
    b[2] = a[2];
    b[3] = a[3];
}



        vmovdqu ymm0, YMMWORD PTR [rdi]
        vpmovqd XMMWORD PTR [rsi], ymm0
        vzeroupper
        ret


  vect__1.5_16 = MEM <vector(4) long long int> [(long long int *)a_10(D)];
  vect__2.6_18 = (vector(4) int) vect__1.5_16;
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  MEM <vector(4) int> [(int *)b_11(D)] = vect__2.6_18;
  return;


Guess expand_vector_conversion can be optimized.
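
For comparison, a small sketch that puts the scalar sequence next to the __builtin_convertvector form. foo_builtin and the memcpy loads/stores are illustrative additions, not code from the bug; the point is only that both compute the same values, so the difference reported here is purely in the generated instructions.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));
typedef long long v4di __attribute__((vector_size(32)));

/* Scalar form from above, which the BB (SLP) vectorizer handles well. */
void foo_scalar(const long long *a, int *b)
{
    b[0] = a[0];
    b[1] = a[1];
    b[2] = a[2];
    b[3] = a[3];
}

/* The same narrowing expressed through the builtin. */
void foo_builtin(const long long *a, int *b)
{
    v4di va;
    memcpy(&va, a, sizeof va);
    v4si vb = __builtin_convertvector(va, v4si);
    memcpy(b, &vb, sizeof vb);
}
```

Compiling both with something like -O2 -mavx512vl and diffing the assembly is one way to check whether a particular compiler version still treats them differently.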

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28  5:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---

> Guess expand_vector_conversion can be optimized.

  if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
      && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
    code = FIX_TRUNC_EXPR;
  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
           && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
    code = FLOAT_EXPR;

It only supports floatmn2/fix_truncmn2 for float <-> integer.

But we can also support extendmn2/zero_extendmn2/truncmn2 for float <-> float,
integer <-> integer.

Or are there any concerns and VEC_PACK_TRUNC_EXPR,
VEC_PACK_FIX_TRUNC_EXPR,VEC_PACK_FLOAT_EXPR are used on purpose?
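
To make the cases concrete, here is a sketch of same-kind conversions written with __builtin_convertvector. The DEFINE_CONVERT macro and the function names are hypothetical helpers for this illustration, and the optab named in each comment is the one that would be the natural match, not something this code can verify.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t  v4si __attribute__((vector_size(16)));
typedef uint32_t v4su __attribute__((vector_size(16)));
typedef int64_t  v4di __attribute__((vector_size(32)));
typedef uint64_t v4du __attribute__((vector_size(32)));
typedef float    v4sf __attribute__((vector_size(16)));
typedef double   v4df __attribute__((vector_size(32)));

/* Defines name(in, out), converting four elements through the builtin. */
#define DEFINE_CONVERT(name, src_elem, src_vec, dst_elem, dst_vec) \
    void name(const src_elem in[4], dst_elem out[4])               \
    {                                                              \
        src_vec s;                                                 \
        memcpy(&s, in, sizeof s);                                  \
        dst_vec d = __builtin_convertvector(s, dst_vec);           \
        memcpy(out, &d, sizeof d);                                 \
    }

DEFINE_CONVERT(widen_signed,   int32_t,  v4si, int64_t,  v4di) /* extendmn2 */
DEFINE_CONVERT(widen_unsigned, uint32_t, v4su, uint64_t, v4du) /* zero_extendmn2 */
DEFINE_CONVERT(widen_float,    float,    v4sf, double,   v4df) /* extendmn2 */
DEFINE_CONVERT(narrow_float,   double,   v4df, float,    v4sf) /* truncmn2 */
```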

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> > Guess expand_vector_conversion can be optimized.
> 
>   if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
>       && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
>     code = FIX_TRUNC_EXPR;
>   else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> 	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
>     code = FLOAT_EXPR;
> 
> It only supports floatmn2/fix_truncmn2 for float <-> integer.
> 
> But we can also support extendmn2/zero_extendmn2/truncmn2 for float <->
> float, integer <-> integer.
> 
> Or are there any concerns and VEC_PACK_TRUNC_EXPR,
> VEC_PACK_FIX_TRUNC_EXPR,VEC_PACK_FLOAT_EXPR are used on purpose?

Maybe we can add some gimple simplification in match.pd to handle
  _4 = VEC_PACK_TRUNC_EXPR <a_1(D), { 0, 0, 0, 0 }>;
  _5 = BIT_FIELD_REF <_4, 128, 0>;

and

  _4 = [vec_unpack_lo_expr] a_1(D);
  _5 = [vec_unpack_hi_expr] a_1(D);
  _2 = {_4, _5};

Since loop vectorizer may also create vec_unpack_lo_expr/vec_unpack_hi_expr.
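
A scalar model of why those two patterns are just plain conversions. This is a sketch under assumptions: it takes VEC_PACK_TRUNC_EXPR <a, b> to mean "narrow both operands and concatenate them, a first", and takes the lo/hi unpack halves in little-endian lane order. The function names are invented for the illustration.

```c
#include <stdint.h>

/* Model of VEC_PACK_TRUNC_EXPR <a, b>: narrow both inputs and concatenate. */
static void pack_trunc(const int64_t a[4], const int64_t b[4], int32_t out[8])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = (int32_t)a[i];
        out[i + 4] = (int32_t)b[i];
    }
}

/* First pattern: pack with a zero vector, then keep the low 128 bits. */
void narrow_via_pack(const int64_t a[4], int32_t out[4])
{
    static const int64_t zero[4] = { 0, 0, 0, 0 };
    int32_t packed[8];
    pack_trunc(a, zero, packed);
    for (int i = 0; i < 4; ++i)
        out[i] = packed[i];          /* BIT_FIELD_REF <_4, 128, 0> */
}

/* What the pattern could be simplified to: a direct truncation. */
void narrow_direct(const int64_t a[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)a[i];
}

/* Second pattern: widen the low and high halves separately, then concatenate. */
void widen_via_unpack(const int32_t a[4], int64_t out[4])
{
    int64_t lo[2] = { a[0], a[1] };  /* vec_unpack_lo_expr */
    int64_t hi[2] = { a[2], a[3] };  /* vec_unpack_hi_expr */
    out[0] = lo[0];
    out[1] = lo[1];
    out[2] = hi[0];
    out[3] = hi[1];
}

/* What the pattern could be simplified to: a direct widening. */
void widen_direct(const int32_t a[4], int64_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i];
}
```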

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: rguenth at gcc dot gnu.org @ 2022-10-28 11:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Target|X86_64                      |x86_64-*-*
   Last reconfirmed|                            |2022-10-28
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
            Version|unknown                     |13.0

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> > Guess expand_vector_conversion can be optimized.
> 
>   if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
>       && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
>     code = FIX_TRUNC_EXPR;
>   else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> 	   && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
>     code = FLOAT_EXPR;
> 
> It only supports floatmn2/fix_truncmn2 for float <-> integer.
> 
> But we can also support extendmn2/zero_extendmn2/truncmn2 for float <->
> float, integer <-> integer.
> 
> Or are there any concerns and VEC_PACK_TRUNC_EXPR,
> VEC_PACK_FIX_TRUNC_EXPR,VEC_PACK_FLOAT_EXPR are used on purpose?

I think we do support FIX_TRUNC_EXPR or FLOAT_EXPR for float <-> int
conversion of vectors like we now support {CONVERT,NOP}_EXPR for
just widening/shortening.  At least the GIMPLE verifier allows that.

The optabs would be [us]fix and [us]float, not sure if aarch64 makes use
of those for vector modes or if Richard extended the vectorizer to
consider those (I only remember int <-> int conversions).

So I think if x86_64 can do float <-> int for vectors implementing
[us]fix/[us]float would be the way to go (and of course then make use
of those in lowering/vectorization).

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: rsandifo at gcc dot gnu.org @ 2022-10-31 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> I think we do support FIX_TRUNC_EXPR or FLOAT_EXPR for float <-> int
> conversion of vectors like we now support {CONVERT,NOP}_EXPR for
> just widening/shortening.  At least the GIMPLE verifier allows that.
> 
> The optabs would be [us]fix and [us]float, not sure if aarch64 makes use
> of those for vector modes or if Richard extended the vectorizer to
> consider those (I only remember int <-> int conversions).
AArch64 doesn't use mixed-size vector fix and float yet, but the hope
is that it will in future.  For SVE, the main difficulty is that FP
conversions could raise exceptions, so only the conditional forms
would be interesting for normal predicated loops under default flags.
The unpredicated optabs would require -ffast-math-like flags.

This is probably lower hanging fruit for Advanced SIMD though.

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-06-27  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:

https://gcc.gnu.org/g:c320a7efcd35ba6c6be70dc9b2fe562a9673e363

commit r15-1677-gc320a7efcd35ba6c6be70dc9b2fe562a9673e363
Author: Hu, Lin1 <lin1.hu@intel.com>
Date:   Thu Feb 1 15:15:01 2024 +0800

    vect: generate suitable convert insn for int -> int, float -> float and
    int <-> float.

    gcc/ChangeLog:

            PR target/107432
            * tree-vect-generic.cc
            (expand_vector_conversion): Support convert for int -> int,
            float -> float and int <-> float.
            * tree-vect-stmts.cc (vectorizable_conversion): Wrap the
            indirect convert part.
            (supportable_indirect_convert_operation): New function.
            * tree-vectorizer.h (supportable_indirect_convert_operation):
            Define the new function.

    gcc/testsuite/ChangeLog:

            PR target/107432
            * gcc.target/i386/pr107432-1.c: New test.
            * gcc.target/i386/pr107432-2.c: Ditto.
            * gcc.target/i386/pr107432-3.c: Ditto.
            * gcc.target/i386/pr107432-4.c: Ditto.
            * gcc.target/i386/pr107432-5.c: Ditto.
            * gcc.target/i386/pr107432-6.c: Ditto.
            * gcc.target/i386/pr107432-7.c: Ditto.

--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:

https://gcc.gnu.org/g:e5f8a39941f6f0f25dac88bd71fd368fb284a10f

commit r15-1678-ge5f8a39941f6f0f25dac88bd71fd368fb284a10f
Author: Hu, Lin1 <lin1.hu@intel.com>
Date:   Wed Feb 28 18:11:55 2024 +0800

    vect: Support v4hi -> v4qi.

    gcc/ChangeLog:

            PR target/107432
            * config/i386/mmx.md
            (VI2_32_64): New mode iterator.
            (mmxhalfmode): New mode attr.
            (mmxhalfmodelower): Ditto.
            (truncv2hiv2qi2): Extend mode v4hi and change name from
            truncv2hiv2qi to trunc<mode><mmxhalfmodelower>2.

    gcc/testsuite/ChangeLog:

            PR target/107432
            * gcc.target/i386/pr107432-1.c: Modify test.
            * gcc.target/i386/pr107432-6.c: Add test.
            * gcc.target/i386/pr108938-3.c: This patch supports
            truncv4hiv4qi affect bswap optimization, so I added
            the -mno-avx option for now, and open a bugzilla.
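
For reference, the kind of source-level conversion this commit is about might look like the sketch below. It is an illustration only, not the contents of the pr107432 tests, and the function name is made up.

```c
#include <stdint.h>
#include <string.h>

typedef int16_t v4hi __attribute__((vector_size(8)));
typedef int8_t  v4qi __attribute__((vector_size(4)));

/* Narrow four 16-bit elements to four 8-bit elements (v4hi -> v4qi). */
void trunc_v4hi_v4qi(const int16_t in[4], int8_t out[4])
{
    v4hi s;
    memcpy(&s, in, sizeof s);
    v4qi d = __builtin_convertvector(s, v4qi);
    memcpy(out, &d, sizeof d);
}
```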

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-06-27  8:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:

https://gcc.gnu.org/g:4385dc97b0d28e54541eb2418d6e68fc672441d7

commit r15-1679-g4385dc97b0d28e54541eb2418d6e68fc672441d7
Author: Hu, Lin1 <lin1.hu@intel.com>
Date:   Wed Mar 6 19:58:48 2024 +0800

    vect: support direct conversion under x86-64-v3.

    gcc/ChangeLog:

            PR target/107432
            * config/i386/i386-expand.cc
            (ix86_expand_trunc_with_avx2_noavx512f): New function to
            generate a series of suitable insns.
            * config/i386/i386-protos.h
            (ix86_expand_trunc_with_avx2_noavx512f): Define new function.
            * config/i386/sse.md: Extend trunc<mode><mode>2 for x86-64-v3.
            (ssebytemode): Add V8HI.
            (PMOV_DST_MODE_2_AVX2): New mode iterator.
            (PMOV_SRC_MODE_3_AVX2): Ditto.
            * config/i386/mmx.md
            (trunc<mode><mmxhalfmodelower>2): Ditto.
            (avx512vl_trunc<mode><mmxhalfmodelower>2): Ditto.
            (truncv2si<mode>2): Ditto.
            (avx512vl_truncv2si<mode>2): Ditto.
            (mmxbytemode): New mode attr.

    gcc/testsuite/ChangeLog:

            PR target/107432
            * gcc.target/i386/pr107432-8.c: New test.
            * gcc.target/i386/pr107432-9.c: Ditto.
            * gcc.target/i386/pr92645-4.c: Modify test.

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: liuhongt at gcc dot gnu.org @ 2024-07-02  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |liuhongt at gcc dot gnu.org
             Status|NEW                         |RESOLVED

--- Comment #13 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Fixed in GCC15.

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: pinskia at gcc dot gnu.org @ 2024-07-02  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-07-02 22:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:a7ad9cb813063ddf51269910f33b56116c10462c

commit r15-1800-ga7ad9cb813063ddf51269910f33b56116c10462c
Author: Andrew Pinski <quic_apinski@quicinc.com>
Date:   Tue Jul 2 15:02:17 2024 -0700

    aarch64: Add testcase for vectorconvert lowering [PR110473]

    Vectorconvert lowering was changed to use the convert optab directly
    starting in r15-1677-gc320a7efcd35ba. I had filed an aarch64-specific
    issue for exactly this, so it makes sense to add an aarch64-specific
    testcase instead of having only x86_64-specific ones.

    Pushed as obvious after testing for aarch64-linux-gnu.

            PR tree-optimization/110473
            PR tree-optimization/107432

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/vect-convert-1.c: New test.

    Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-07-16  1:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:

https://gcc.gnu.org/g:a902e35396d68f10bd27477153fafa4f5ac9c319

commit r15-2052-ga902e35396d68f10bd27477153fafa4f5ac9c319
Author: Hu, Lin1 <lin1.hu@intel.com>
Date:   Thu Jul 11 15:03:22 2024 +0800

    i386: extend trunc{128}2{16,32,64}'s scope.

    Based on actual usage, trunc{128}2{16,32,64} use some instructions from
    sse/sse3, so extend their scope to extend the scope of optimization.

    gcc/ChangeLog:

            PR target/107432
            * config/i386/sse.md
            (PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
            (PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
            (trunc<mode><pmov_dst_3_lower>2): Change constraint from
            TARGET_AVX2 to TARGET_SSSE3.
            (trunc<mode><pmov_dst_4_lower>2): Ditto.
            (truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE.

    gcc/testsuite/ChangeLog:

            PR target/107432
            * gcc.target/i386/pr107432-10.c: New test.
