* [Bug c++/107432] New: __builtin_convertvector generates inefficient code
From: g.peterhoff@t-online.de @ 2022-10-27 10:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
Bug ID: 107432
Summary: __builtin_convertvector generates inefficient code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterhoff@t-online.de
Target Milestone: ---
Example: conversion int64_t -> int32_t
- avx512f + avx512vl: hardware conversions are available.
- avx2: there is a correctly working 32-bit permutation
  (_mm256_permutevar8x32_epi32/vpermd) that can be used.
I have not (yet) evaluated whether other conversions (larger int -> smaller
int) are also affected.
PS: On x86 it's already hell to optimize all cases depending on the instruction
set.
PPS: What about -march=znver4 ?
https://godbolt.org/z/3s79bnh7v
thx
Gero
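A minimal sketch of the AVX2 idea above, written with GCC's generic vector extensions rather than intrinsics so that it builds without -mavx2. It assumes a little-endian target, and the helper names are hypothetical (they are not taken from the linked example). `__builtin_shuffle` with an even-lane mask stands in for the vpermd-style permutation; whether GCC actually selects vpermd for it depends on the target flags and compiler version.

```c
#include <stdint.h>
#include <string.h>

typedef int64_t v4di __attribute__((vector_size(32)));
typedef int32_t v4si __attribute__((vector_size(16)));
typedef int32_t v8si __attribute__((vector_size(32)));

/* Reference: element-wise narrowing through the builtin. */
void narrow_convertvector(const int64_t in[4], int32_t out[4])
{
    v4di a;
    memcpy(&a, in, sizeof a);
    v4si r = __builtin_convertvector(a, v4si);
    memcpy(out, &r, sizeof r);
}

/* Same result via a 32-bit lane permutation: view the 4 x int64 input as
   8 x int32 and keep the even lanes, which hold the low halves on a
   little-endian target (assumption). */
void narrow_permute(const int64_t in[4], int32_t out[4])
{
    v8si lanes;
    memcpy(&lanes, in, sizeof lanes);
    const v8si mask = {0, 2, 4, 6, 0, 2, 4, 6}; /* upper half is unused */
    v8si picked = __builtin_shuffle(lanes, mask);
    memcpy(out, &picked, 4 * sizeof(int32_t));  /* keep the low 128 bits */
}
```

Both helpers take pointers rather than vector arguments so that the sketch does not depend on the ABI for 256-bit vector parameters.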
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: pinskia at gcc dot gnu.org @ 2022-10-27 15:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 53781
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53781&action=edit
testcase
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: g.peterhoff@t-online.de @ 2022-10-27 16:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #2 from g.peterhoff@t-online.de ---
Another example. I want to convert an array<Bool> to array<Float64>.
There are basically 3 options:
- Copy
- Test (b2f64_default)
- optimized version (b2f64_manually)
gcc 12.2 + gcc trunk
- convertSIZE_copy only generates scalar code (_mm_cvtsi64_sd)
- convertSIZE_default always generates conditional jumps
- convertSIZE_manually
  - gcc trunk always generates branch-free scalar code
  - gcc 12.2
    - convert1024_manually generates vector code, but does not use the HW
      conversion int8->int64 (_mm(256)_cvtepi8_epi64); it converts
      int8->int16->int32->int64 manually
    - convert8_manually generates branch-free scalar code
    - convert4_manually generates vector code and uses the HW conversion
      int8->int64
NONE of these conversions are transformed/optimized to the extent that always
- all available intrinsics are used
- no "normal" registers are used
- branch-free code is generated
https://godbolt.org/z/f74vK79of
thx
Gero
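The three options above can be sketched as follows. This is not the code behind the Compiler Explorer link; the function names and the fixed length of 4 are assumptions made for illustration. The vector variant additionally assumes that `bool` is stored as a single byte holding 0 or 1, which holds for GCC on x86.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef int8_t v4qi __attribute__((vector_size(4)));
typedef double v4df __attribute__((vector_size(32)));

/* "Copy": rely on the implicit bool -> double conversion. */
void b2f64_copy4(const bool in[4], double out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = in[i];
}

/* "Test": select 1.0 or 0.0 per element. */
void b2f64_test4(const bool in[4], double out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = in[i] ? 1.0 : 0.0;
}

/* Vector form: treat the four bool bytes as int8 lanes, then widen and
   convert to double in one __builtin_convertvector call. */
void b2f64_vector4(const bool in[4], double out[4])
{
    v4qi bytes;
    memcpy(&bytes, in, sizeof bytes);
    v4df r = __builtin_convertvector(bytes, v4df);
    memcpy(out, &r, sizeof r);
}
```

All three produce the same values; the report is about how differently the compiler lowers them.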
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28 3:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
typedef int v4si __attribute__((vector_size(16)));
typedef long long v4di __attribute__((vector_size(32)));
v4si
foo (v4di a)
{
return __builtin_convertvector (a, v4si);
}
hmm, we actually support truncv4div4si2, but somehow GCC failed to generate
.VEC_CONVERT with truncmn2.
hmm, what's optab for convert_optab_handler?
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28 3:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> typedef int v4si __attribute__((vector_size(16)));
> typedef long long v4di __attribute__((vector_size(32)));
>
> v4si
> foo (v4di a)
> {
> return __builtin_convertvector (a, v4si);
> }
>
> hmm, we actually support truncv4div4si2, but some how gcc failed to generate
> .VEC_CONVERT with truncmn2.
>
/* IFN_VEC_CONVERT is supposed to be expanded at pass_lower_vector. So this
dummy function should never be called. */
static void
expand_VEC_CONVERT (internal_fn, gcall *)
{
gcc_unreachable ();
}
It's lowered by pass_lower_vector. Ideally, could we use truncmn2 in
expand_VEC_CONVERT when the source is a wider integer mode than the destination?
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28 5:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
> It's lowered by pass_lower_vector, ideally, can we use truncmn2 in
> expand_VEC_CONVERT if src is bigger integer mode than dest.
Currently, expand_vector_conversion uses VEC_PACK_TRUNC_EXPR
---------------cut begins------------------------
else if (modifier == NARROW)
{
switch (code)
{
CASE_CONVERT:
code1 = VEC_PACK_TRUNC_EXPR;
optab1 = optab_for_tree_code (code1, arg_type, optab_default);
break;
---------------Cut ends------------------------
But BB vectorizer can do the right thing for
void
foo (long long* a, int* b)
{
b[0] = a[0];
b[1] = a[1];
b[2] = a[2];
b[3] = a[3];
}
vmovdqu ymm0, YMMWORD PTR [rdi]
vpmovqd XMMWORD PTR [rsi], ymm0
vzeroupper
ret
vect__1.5_16 = MEM <vector(4) long long int> [(long long int *)a_10(D)];
vect__2.6_18 = (vector(4) int) vect__1.5_16;
# DEBUG BEGIN_STMT
# DEBUG BEGIN_STMT
# DEBUG BEGIN_STMT
MEM <vector(4) int> [(int *)b_11(D)] = vect__2.6_18;
return;
Guess expand_vector_conversion can be optimized.
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28 5:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
> Guess expand_vector_conversion can be optimized.
if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
&& SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
code = FIX_TRUNC_EXPR;
else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
&& SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
code = FLOAT_EXPR;
It only supports floatmn2/fix_truncmn2 for float <-> integer.
But we could also support extendmn2/zero_extendmn2/truncmn2 for float <-> float
and integer <-> integer.
Or are there concerns, i.e. are VEC_PACK_TRUNC_EXPR, VEC_PACK_FIX_TRUNC_EXPR and
VEC_PACK_FLOAT_EXPR used on purpose?
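For reference, the conversion classes named above map onto `__builtin_convertvector` as in this sketch. The typedef names, the macro, and the function names are hypothetical. The optab in each comment is the one this thread associates with that class; whether a single instruction results depends on the target and the GCC version.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t  v4si  __attribute__((vector_size(16)));
typedef uint32_t v4usi __attribute__((vector_size(16)));
typedef int64_t  v4di  __attribute__((vector_size(32)));
typedef uint64_t v4udi __attribute__((vector_size(32)));
typedef float    v4sf  __attribute__((vector_size(16)));
typedef double   v4df  __attribute__((vector_size(32)));

/* Define a 4-element wrapper that loads, converts and stores. */
#define DEFINE_CONVERT(name, src_t, src_v, dst_t, dst_v)   \
    void name(const src_t in[4], dst_t out[4])             \
    {                                                      \
        src_v a;                                           \
        memcpy(&a, in, sizeof a);                          \
        dst_v r = __builtin_convertvector(a, dst_v);       \
        memcpy(out, &r, sizeof r);                         \
    }

DEFINE_CONVERT(sext4,   int32_t,  v4si,  int64_t,  v4di)   /* extendmn2      */
DEFINE_CONVERT(zext4,   uint32_t, v4usi, uint64_t, v4udi)  /* zero_extendmn2 */
DEFINE_CONVERT(trunc4,  int64_t,  v4di,  int32_t,  v4si)   /* truncmn2       */
DEFINE_CONVERT(fext4,   float,    v4sf,  double,   v4df)   /* extendmn2      */
DEFINE_CONVERT(ftrunc4, double,   v4df,  float,    v4sf)   /* truncmn2       */
DEFINE_CONVERT(itof4,   int32_t,  v4si,  float,    v4sf)   /* floatmn2       */
DEFINE_CONVERT(ftoi4,   float,    v4sf,  int32_t,  v4si)   /* fix_truncmn2   */
```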
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: crazylht at gmail dot com @ 2022-10-28 6:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> > Guess expand_vector_conversion can be optimized.
>
> if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
> && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
> code = FIX_TRUNC_EXPR;
> else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
> code = FLOAT_EXPR;
>
> It only supports floatmn2/fix_truncmn2 for float <-> integer.
>
> But we can also supports extendmn2/zero_extendmn2/truncmn2 for float <->
> float, integer <-> integer.
>
> Or are there any concerns and VEC_PACK_TRUNC_EXPR,
> VEC_PACK_FIX_TRUNC_EXPR,VEC_PACK_FLOAT_EXPR are used on purpose?
Maybe we can add some GIMPLE simplification in match.pd to handle
_4 = VEC_PACK_TRUNC_EXPR <a_1(D), { 0, 0, 0, 0 }>;
_5 = BIT_FIELD_REF <_4, 128, 0>;
and
_4 = [vec_unpack_lo_expr] a_1(D);
_5 = [vec_unpack_hi_expr] a_1(D);
_2 = {_4, _5};
since the loop vectorizer may also create vec_unpack_lo_expr/vec_unpack_hi_expr.
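To make the first pattern concrete, here is a plain-C model of it. The function names are hypothetical, and the model only mirrors the documented semantics of VEC_PACK_TRUNC_EXPR (truncate the elements of both operands and concatenate them); lane order on big-endian targets is not modelled. Taking the low 128 bits of the pack of `a` with a zero vector yields the same four values as a direct truncation of `a`, which is the equivalence such a match.pd rule would rely on.

```c
#include <stdint.h>
#include <string.h>

/* Model of VEC_PACK_TRUNC_EXPR <a, b> for two 4 x int64 operands:
   truncate each element and concatenate into 8 x int32. */
void pack_trunc_model(const int64_t a[4], const int64_t b[4], int32_t out[8])
{
    for (int i = 0; i < 4; ++i) {
        out[i]     = (int32_t)(uint32_t)a[i];
        out[i + 4] = (int32_t)(uint32_t)b[i];
    }
}

/* Model of BIT_FIELD_REF <v, 128, 0>: the first 128 bits, i.e. 4 x int32. */
void low_128_model(const int32_t v[8], int32_t out[4])
{
    memcpy(out, v, 4 * sizeof(int32_t));
}

/* Direct truncation, the form the simplification would produce. */
void trunc_direct(const int64_t a[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)(uint32_t)a[i];
}
```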
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: rguenth at gcc dot gnu.org @ 2022-10-28 11:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org,
| |rsandifo at gcc dot gnu.org
Target|X86_64 |x86_64-*-*
Last reconfirmed| |2022-10-28
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Version|unknown |13.0
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> > Guess expand_vector_conversion can be optimized.
>
> if (INTEGRAL_TYPE_P (TREE_TYPE (ret_type))
> && SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg_type)))
> code = FIX_TRUNC_EXPR;
> else if (INTEGRAL_TYPE_P (TREE_TYPE (arg_type))
> && SCALAR_FLOAT_TYPE_P (TREE_TYPE (ret_type)))
> code = FLOAT_EXPR;
>
> It only supports floatmn2/fix_truncmn2 for float <-> integer.
>
> But we can also supports extendmn2/zero_extendmn2/truncmn2 for float <->
> float, integer <-> integer.
>
> Or are there any concerns and VEC_PACK_TRUNC_EXPR,
> VEC_PACK_FIX_TRUNC_EXPR,VEC_PACK_FLOAT_EXPR are used on purpose?
I think we do support FIX_TRUNC_EXPR or FLOAT_EXPR for float <-> int
conversion of vectors like we now support {CONVERT,NOP}_EXPR for
just widening/shortening. At least the GIMPLE verifier allows that.
The optabs would be [us]fix and [us]float; I'm not sure whether aarch64 makes
use of those for vector modes or whether Richard extended the vectorizer to
consider those (I only remember int <-> int conversions).
So I think if x86_64 can do float <-> int for vectors implementing
[us]fix/[us]float would be the way to go (and of course then make use
of those in lowering/vectorization).
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: rsandifo at gcc dot gnu.org @ 2022-10-31 13:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> I think we do support FIX_TRUNC_EXPR or FLOAT_EXPR for float <-> int
> conversion of vectors like we now support {CONVERT,NOP}_EXPR for
> just widening/shortening. At least the GIMPLE verifier allows that.
>
> The obtabs would be [us]fix and [us]float, not sure if aarch64 makes use
> of those for vector modes or if Richard extended the vectorizer to
> consider those (I only remember int <-> int conversions).
AArch64 doesn't use mixed-size vector fix and float yet, but the hope
is that it will in future. For SVE, the main difficulty is that FP
conversions could raise exceptions, so only the conditional forms
would be interesting for normal predicated loops under default flags.
The unpredicated optabs would require -ffast-math-like flags.
This is probably lower hanging fruit for Advanced SIMD though.
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-06-27 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:
https://gcc.gnu.org/g:c320a7efcd35ba6c6be70dc9b2fe562a9673e363
commit r15-1677-gc320a7efcd35ba6c6be70dc9b2fe562a9673e363
Author: Hu, Lin1 <lin1.hu@intel.com>
Date: Thu Feb 1 15:15:01 2024 +0800
vect: generate suitable convert insn for int -> int, float -> float and int
<-> float.
gcc/ChangeLog:
PR target/107432
* tree-vect-generic.cc
(expand_vector_conversion): Support convert for int -> int,
float -> float and int <-> float.
* tree-vect-stmts.cc (vectorizable_conversion): Wrap the
indirect convert part.
(supportable_indirect_convert_operation): New function.
* tree-vectorizer.h (supportable_indirect_convert_operation):
Define the new function.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-1.c: New test.
* gcc.target/i386/pr107432-2.c: Ditto.
* gcc.target/i386/pr107432-3.c: Ditto.
* gcc.target/i386/pr107432-4.c: Ditto.
* gcc.target/i386/pr107432-5.c: Ditto.
* gcc.target/i386/pr107432-6.c: Ditto.
* gcc.target/i386/pr107432-7.c: Ditto.
--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:
https://gcc.gnu.org/g:e5f8a39941f6f0f25dac88bd71fd368fb284a10f
commit r15-1678-ge5f8a39941f6f0f25dac88bd71fd368fb284a10f
Author: Hu, Lin1 <lin1.hu@intel.com>
Date: Wed Feb 28 18:11:55 2024 +0800
vect: Support v4hi -> v4qi.
gcc/ChangeLog:
PR target/107432
* config/i386/mmx.md
(VI2_32_64): New mode iterator.
(mmxhalfmode): New mode attr.
(mmxhalfmodelower): Ditto.
(truncv2hiv2qi2): Extend mode v4hi and change name from
truncv2hiv2qi to trunc<mode><mmxhalfmodelower>2.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-1.c: Modify test.
* gcc.target/i386/pr107432-6.c: Add test.
* gcc.target/i386/pr108938-3.c: This patch supports
truncv4hiv4qi affect bswap optimization, so I added
the -mno-avx option for now, and open a bugzilla.
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-06-27 8:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:
https://gcc.gnu.org/g:4385dc97b0d28e54541eb2418d6e68fc672441d7
commit r15-1679-g4385dc97b0d28e54541eb2418d6e68fc672441d7
Author: Hu, Lin1 <lin1.hu@intel.com>
Date: Wed Mar 6 19:58:48 2024 +0800
vect: support direct conversion under x86-64-v3.
gcc/ChangeLog:
PR target/107432
* config/i386/i386-expand.cc
(ix86_expand_trunc_with_avx2_noavx512f):
New function to generate a series of suitable insns.
* config/i386/i386-protos.h
(ix86_expand_trunc_with_avx2_noavx512f):
Define new function.
* config/i386/sse.md: Extend trunc<mode><mode>2 for x86-64-v3.
(ssebytemode) Add V8HI.
(PMOV_DST_MODE_2_AVX2): New mode iterator.
(PMOV_SRC_MODE_3_AVX2): Ditto.
* config/i386/mmx.md
(trunc<mode><mmxhalfmodelower>2): Ditto.
(avx512vl_trunc<mode><mmxhalfmodelower>2): Ditto.
(truncv2si<mode>2): Ditto.
(avx512vl_truncv2si<mode>2): Ditto.
(mmxbytemode): New mode attr.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-8.c: New test.
* gcc.target/i386/pr107432-9.c: Ditto.
* gcc.target/i386/pr92645-4.c: Modify test.
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: liuhongt at gcc dot gnu.org @ 2024-07-02 7:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
CC| |liuhongt at gcc dot gnu.org
Status|NEW |RESOLVED
--- Comment #13 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Fixed in GCC15.
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: pinskia at gcc dot gnu.org @ 2024-07-02 7:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |15.0
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-07-02 22:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:
https://gcc.gnu.org/g:a7ad9cb813063ddf51269910f33b56116c10462c
commit r15-1800-ga7ad9cb813063ddf51269910f33b56116c10462c
Author: Andrew Pinski <quic_apinski@quicinc.com>
Date: Tue Jul 2 15:02:17 2024 -0700
aarch64: Add testcase for vectorconvert lowering [PR110473]
Vectorconvert lowering was changed to use the convert optab directly
starting in r15-1677-gc320a7efcd35ba. I had filed an aarch64 specific
issue for this specific thing and it would make sense to add an aarch64
specific testcase instead of just having a x86_64 specific ones for
this.
Pushed as obvious after testing for aarch64-linux-gnu.
PR tree-optimization/110473
PR tree-optimization/107432
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-convert-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
* [Bug target/107432] __builtin_convertvector generates inefficient code
From: cvs-commit at gcc dot gnu.org @ 2024-07-16 1:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hu <hulin@gcc.gnu.org>:
https://gcc.gnu.org/g:a902e35396d68f10bd27477153fafa4f5ac9c319
commit r15-2052-ga902e35396d68f10bd27477153fafa4f5ac9c319
Author: Hu, Lin1 <lin1.hu@intel.com>
Date: Thu Jul 11 15:03:22 2024 +0800
i386: extend trunc{128}2{16,32,64}'s scope.
Based on actual usage, trunc{128}2{16,32,64} use some instructions from
sse/sse3, so extend their scope to extend the scope of optimization.
gcc/ChangeLog:
PR target/107432
* config/i386/sse.md
(PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
(PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
(trunc<mode><pmov_dst_3_lower>2): Change constraint from
TARGET_AVX2 to TARGET_SSSE3.
(trunc<mode><pmov_dst_4_lower>2): Ditto.
(truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-10.c: New test.