* [Bug tree-optimization/51492] New: vectorizer generates unnecessary code
@ 2011-12-10 1:38 drepper.fsp at gmail dot com
2011-12-12 10:25 ` [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns rguenth at gcc dot gnu.org
` (18 more replies)
0 siblings, 19 replies; 20+ messages in thread
From: drepper.fsp at gmail dot com @ 2011-12-10 1:38 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Bug #: 51492
Summary: vectorizer generates unnecessary code
Classification: Unclassified
Product: gcc
Version: 4.6.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: drepper.fsp@gmail.com
Build: x86_64-linux
Compile this code with GCC 4.6.2 on an x86-64 machine with -O3:
#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));
void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}
The result I see is this:
0000000000000000 <f>:
0: 66 0f ef d2 pxor %xmm2,%xmm2
4: b8 00 00 00 00 mov $0x0,%eax
5: R_X86_64_32 head
9: 66 0f 6f 25 00 00 00 movdqa 0x0(%rip),%xmm4 # 11 <f+0x11>
10: 00
d: R_X86_64_PC32 .LC0-0x4
11: 66 0f 6f 1d 00 00 00 movdqa 0x0(%rip),%xmm3 # 19 <f+0x19>
18: 00
15: R_X86_64_PC32 .LC1-0x4
19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
20: 66 0f 6f 00 movdqa (%rax),%xmm0
24: 66 0f 6f c8 movdqa %xmm0,%xmm1
28: 66 0f d9 c4 psubusw %xmm4,%xmm0
2c: 66 0f 75 c2 pcmpeqw %xmm2,%xmm0
30: 66 0f fd cb paddw %xmm3,%xmm1
34: 66 0f df c1 pandn %xmm1,%xmm0
38: 66 0f 7f 00 movdqa %xmm0,(%rax)
3c: 48 83 c0 10 add $0x10,%rax
40: 48 3d 00 00 00 00 cmp $0x0,%rax
42: R_X86_64_32S head+0x20000
46: 75 d8 jne 20 <f+0x20>
48: f3 c3 repz retq
There is a lot of unnecessary code. The psubusw instruction alone is
sufficient. The purpose of this instruction is to implement saturated
subtraction. Why does gcc create all this extra code? The code should just be
  movdqa (%rax), %xmm0
  psubusw %xmm1, %xmm0
  movdqa %xmm0, (%rax)
where %xmm1 holds WSIZE in each of its 16-bit lanes.
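For readers who want to see the requested sequence from C, here is a minimal sketch using the SSE2 intrinsic `_mm_subs_epu16`, which corresponds to psubusw. The function name `sat_sub_u16_array` and the scalar fallback are illustrative assumptions, not code from gzip or GCC. Unaligned loads are used so the sketch does not depend on the 64-byte alignment of `head`.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Subtract w from each of the n elements of buf, clamping at zero.
   With SSE2, eight 16-bit lanes are handled per psubusw; the scalar
   loop covers the tail and builds without SSE2.  */
static void
sat_sub_u16_array (uint16_t *buf, size_t n, uint16_t w)
{
  size_t i = 0;
#if defined(__SSE2__)
  const __m128i vw = _mm_set1_epi16 ((short) w);  /* w in every lane */
  for (; i + 8 <= n; i += 8)
    {
      __m128i v = _mm_loadu_si128 ((const __m128i *) (buf + i));
      v = _mm_subs_epu16 (v, vw);                 /* psubusw */
      _mm_storeu_si128 ((__m128i *) (buf + i), v);
    }
#endif
  for (; i < n; i++)
    buf[i] = (uint16_t) (buf[i] >= w ? buf[i] - w : 0);
}
```

Calling `sat_sub_u16_array (head, SIZE, WSIZE)` would then perform the same update as `f` in the report.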
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: rguenth at gcc dot gnu.org @ 2011-12-12 10:25 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2011-12-12
Summary|vectorizer generates |vectorizer does not support
|unnecessary code |saturated arithmetic
| |patterns
Ever Confirmed|0 |1
Severity|normal |enhancement
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-12-12 10:23:20 UTC ---
It's vectorized as
vect_var_.11_17 = MEM[base: D.1616_5, offset: 0B];
vect_var_.12_19 = vect_var_.11_17 + { 65472, 65472, 65472, 65472, 65472,
65472, 65472, 65472 };
vect_var_.14_22 = VEC_COND_EXPR <vect_var_.11_17 > { 63, 63, 63, 63, 63, 63,
63, 63 }, vect_var_.12_19, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
MEM[base: D.1616_5, offset: 0B] = vect_var_.14_22;
GCC has no notion that this is a "saturating subtraction". If targets support
saturating arithmetic only on vectors, the vectorizer's pattern recognition
would need to be enhanced, and the targets would eventually have to support
expanding saturating arithmetic.
On the other hand, middle-end support for saturating arithmetic needs to be
improved anyway, since scalar code could also benefit from the optimization.
On the RTL level we have [us]s_{plus,minus}, which the vectorizer could use
(if implemented on the target for vector types).
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: drepper.fsp at gmail dot com @ 2012-01-08 18:57 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #2 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-01-08 18:56:48 UTC ---
Note that this code appears in gzip and therefore, IIRC, in SPEC CPU (in
deflate.c:fill_window). However, when I compile gzip myself, with that code
embedded in a larger function, I cannot get the optimization to apply at all.
If this bug is fixed and the optimization is applied, the SPEC numbers could go
up if SPEC CPU is testing unzipping...
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: rguenth at gcc dot gnu.org @ 2012-07-13 8:39 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |53947
--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:39:43 UTC ---
Link to vectorizer missed-optimization meta-bug.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pinskia at gcc dot gnu.org @ 2021-08-24 23:44 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2016-01-04 00:00:00 |2021-8-24
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We do slightly better but not close:
movdqa (%rax), %xmm0
addq $16, %rax
psubusw %xmm1, %xmm0
paddw %xmm1, %xmm0
paddw %xmm2, %xmm0
movaps %xmm0, -16(%rax)
Which is expanded from:
vect__1.6_15 = MAX_EXPR <vect_m_6.5_3, { 64, 64, 64, 64, 64, 64, 64, 64 }>;
vect__2.7_17 = vect__1.6_15 + { 65472, 65472, 65472, 65472, 65472, 65472,
65472, 65472 };
With -mavx2 we get:
vpmaxuw (%rax), %ymm2, %ymm0
addq $32, %rax
vpaddw %ymm1, %ymm0, %ymm0
vmovdqa %ymm0, -32(%rax)
Note that 65472 is -64 as a 16-bit value (65536 - 64).
This shouldn't be too hard to detect as a saturating subtraction, and it could
even be lowered back to MAX_EXPR/PLUS_EXPR if us_minus does not exist.
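To make the MAX_EXPR/PLUS_EXPR form concrete, the following sketch writes both variants in scalar C so they can be compared over all 16-bit inputs. The function names are hypothetical; the only facts taken from the dump are the clamp to 64 and the addend 65472.

```c
#include <stdint.h>

/* Reference: the form written in the original report.  */
static uint16_t
sat_sub_ref (uint16_t m, uint16_t w)
{
  return (uint16_t) (m >= w ? m - w : 0);
}

/* The MAX_EXPR + PLUS_EXPR form: clamp m up to w, then add the 16-bit
   two's-complement negation of w.  For w == 64 that constant is
   65536 - 64 == 65472.  */
static uint16_t
sat_sub_max_plus (uint16_t m, uint16_t w)
{
  uint16_t clamped = m > w ? m : w;       /* MAX_EXPR <m, w> */
  uint16_t neg_w = (uint16_t) (0u - w);   /* 65472 when w == 64 */
  return (uint16_t) (clamped + neg_w);    /* wraps modulo 2^16 */
}
```

Because the clamp guarantees `clamped >= w`, the wrapping add can never produce anything other than `clamped - w`, which is why the two forms agree.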
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pinskia at gcc dot gnu.org @ 2021-08-25 3:54 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pan2.li at intel dot com @ 2024-02-01 13:06 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #7 from Li Pan <pan2.li at intel dot com> ---
Reproducer for the RISC-V backend; build with "-march=rv64gcv_zba_zbb_zbc_zbs
--param=riscv-autovec-preference=fixed-vlmax -Ofast -ffast-math":
typedef unsigned short uint16_t;
void AAA (uint16_t *x, uint16_t *y, unsigned wsize, unsigned count)
{
  unsigned m = 0, n = count;
  register uint16_t *p;
  p = x;
  do {
    m = *--p;
    *p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);
  n = wsize;
  p = y;
  do {
    m = *--p;
    *p = (uint16_t)(m >= wsize ? m-wsize : 0);
  } while (--n);
}
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: juzhe.zhong at rivai dot ai @ 2024-02-01 13:37 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The missing saturating vectorization makes RVV Clang perform 20% better than
RVV GCC in a recent benchmark evaluation, in the CoreMark-Pro zip-test. I
believe other targets should see the same.
I wonder how we should start to support it. Or has somebody already started
on it?
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: juzhe.zhong at rivai dot ai @ 2024-02-01 13:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. After investigation of LLVM:
Before loop vectorizer:
%cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize)
%conv13 = trunc i32 %cond12 to i16
After loop vectorizer:
%10 = call <16 x i32> @llvm.usub.sat.v16i32(<16 x i32> %9, <16 x i32>
%broadcast.splat)
%11 = trunc <16 x i32> %10 to <16 x i16>
I think GCC can follow this approach: first recognize the scalar saturation,
then let the loop vectorizer turn it into the vector saturation.
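As a rough C analogue of the IR above, the sketch below isolates the scalar saturating subtract in a helper and lets the loop call it. The names `usub_sat_u32` and `slide_window` are assumptions made for illustration; nothing here claims that GCC recognizes this exact shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar unsigned saturating subtract on 32 bits, the C counterpart
   of llvm.usub.sat.i32: a - b, or 0 if that would go below zero.  */
static uint32_t
usub_sat_u32 (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}

/* The reproducer's loop body expressed through the helper.  Each
   element is widened to 32 bits, saturated, and truncated back,
   mirroring the usub.sat + trunc pair in the IR.  */
static void
slide_window (uint16_t *p, size_t count, unsigned wsize)
{
  for (size_t i = 0; i < count; i++)
    p[i] = (uint16_t) usub_sat_u32 (p[i], wsize);
}
```

The truncation is lossless here: the saturated result never exceeds the original 16-bit element.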
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: juzhe.zhong at rivai dot ai @ 2024-02-01 14:40 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Tamar.
We are interested in supporting saturating and rounding.
We may need to support scalar first.
Do you have any suggestions?
Or are you already working on it?
Thanks.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: juzhe.zhong at rivai dot ai @ 2024-02-01 14:41 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Tamar.
We are interested in supporting saturating and rounding.
We may need to support scalar first.
Do you have any suggestions?
Or are you already working on it?
Thanks.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: tnfchris at gcc dot gnu.org @ 2024-02-01 15:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #12 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #11)
> Hi, Tamar.
>
> We are interested in supporting saturating and rounding.
Awesome!
>
> We may need to support scalar first.
>
> Do you have any suggestions ?
>
> Or you are already working on it?
No, at the moment we're not. It's on the backlog, but we haven't gotten to it,
so feel free to do so.
The general conclusion of the thread is that we should introduce new internal
functions in the mid-end for this (also see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600 for some other scalar
examples).
So e.g. we'd have IFN_SAT_ADD etc. and new optabs. By recognizing this on
scalar code you'll then automatically get autovectorization.
What I would do is create non-direct-optab IFNs, that is, have a default
fallback for architectures that don't implement the optab, and use the optab
on those that do.
I think we should be able to do better here in general even for scalar if we
know the operation is supposed to saturate like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600 shows.
This also simplifies optimizations because every target then has the same
GIMPLE representation for these operations.
The only outstanding question is where to do this. We obviously have to do so
before vectorization, but some of the saturation idioms require phi-opts
(https://godbolt.org/z/9oWP5vqee) while others can't be done in phi-opts;
those probably fit in match.pd or forwardprop.
Any suggestions on where best to add the detection, richi?
>
> Thanks.
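To illustrate why different idioms may call for different detection points, here are two ways to write the same unsigned saturating add. This is only a sketch with hypothetical function names; which pass actually sees which shape depends on earlier transformations and is not something the example settles.

```c
#include <stdint.h>

/* Branch form: after lowering this is a conditional with a PHI node
   at the join point, the kind of shape a phi-opt style pass would
   have to match.  */
static uint32_t
sat_add_branch (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  if (sum < x)            /* unsigned wraparound signals overflow */
    return UINT32_MAX;
  return sum;
}

/* Branchless form: a single expression tree, the kind of shape an
   expression matcher can pattern-match directly.  */
static uint32_t
sat_add_branchless (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | (uint32_t) -(uint32_t) (sum < x);
}
```

Both functions return the same value for every input; only the control-flow shape differs.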
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pan2.li at intel dot com @ 2024-02-02 1:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #13 from Li Pan <pan2.li at intel dot com> ---
I'll try to understand it and make it happen soon.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: tnfchris at gcc dot gnu.org @ 2024-02-02 11:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #14 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Awesome! Feel free to reach out if you need any help.
It’s likely easier to start with add and sub and get things pipe cleaned and
expand incrementally than to try and do it all at once.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pan2.li at intel dot com @ 2024-02-03 6:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #15 from Li Pan <pan2.li at intel dot com> ---
(In reply to Tamar Christina from comment #14)
> Awesome! Feel free to reach out if you need any help.
>
> It’s likely easier to start with add and sub and get things pipe cleaned and
> expand incrementally than to try and do it all at once.
Cool, thanks in advance.
I will first try to make a SAT_ADD to the direct optab for a POC following your
RFC and suggestion. Looks like at least match.pd and internal-fn.def will be
touched. I am learning how match.pd works right now.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pan2.li at intel dot com @ 2024-02-06 1:13 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #16 from Li Pan <pan2.li at intel dot com> ---
I gave it a try as below and finally have the standard name "SAT_ADD". Could
you please double-check that my understanding is correct?
Given the example code below:
typedef unsigned int uint32_t;
uint32_t
sat_add (uint32_t x, uint32_t y)
{
  return (x + y) | - ((x + y) < x);
}
I added one simplify to match.pd and defined a new DEF_INTERNAL_OPTAB_FN for
it. Then we have the SAT_ADD representation after expand.
uint32_t sat_add (uint32_t x, uint32_t y)
{
  uint32_t _6;
  ;; basic block 2, loop depth 0
  ;; pred: ENTRY
  _6 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _6;
  ;; succ: EXIT
}
If everything goes well, I will prepare the patch for it later. Thanks.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: tnfchris at gcc dot gnu.org @ 2024-02-06 22:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Li Pan from comment #16)
> I have a try like below and finally have the Standard Name "SAT_ADD". Could
> you please help to double-check if my understanding is correct?
>
> Given below example code below:
>
> typedef unsigned int uint32_t;
>
> uint32_t
> sat_add (uint32_t x, uint32_t y)
> {
> return (x + y) | - ((x + y) < x);
> }
>
> And then add one simplify to match.pd and define new DEF_INTERNAL_OPTAB_FN
> for it. Then we have the SAT_ADD representation after expand.
>
> uint32_t sat_add (uint32_t x, uint32_t y)
> {
> uint32_t _6;
>
> ;; basic block 2, loop depth 0
> ;; pred: ENTRY
> _6 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
> return _6;
> ;; succ: EXIT
>
> }
>
> If everything goes well, I will prepare the patch for it later. Thanks.
Yeah, that looks right. I assume you mean before expand above?
I believe saturating add is commutative but not associative, so you'd want to
add it to commutative_binary_fn_p in internal-fn.cc.
You may also want to provide some basic optimizations for it in match.pd such
as .SAT_ADD (a, 0) = a. etc.
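The properties mentioned here can be checked on small types. The sketch below uses an 8-bit unsigned helper for commutativity and the `.SAT_ADD (a, 0) = a` identity, and a signed 8-bit helper to exhibit a case where regrouping changes the result. Both helpers are hypothetical, and using the signed variant for the counterexample is an assumption of this sketch; the patch under discussion handles the unsigned case.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add, same idiom as in comment #16.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);         /* wraps modulo 256 */
  return (uint8_t) (sum | -(sum < x));     /* all-ones on overflow */
}

/* Signed 8-bit saturating add (hypothetical helper): clamp the exact
   sum to [INT8_MIN, INT8_MAX].  */
static int8_t
sat_add_s8 (int8_t x, int8_t y)
{
  int sum = x + y;                         /* exact in int */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}
```

With these helpers, `sat_add_s8 (sat_add_s8 (127, 1), -1)` is 126 while `sat_add_s8 (127, sat_add_s8 (1, -1))` is 127, so the grouping matters once an intermediate result has been clamped.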
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: pan2.li at intel dot com @ 2024-02-07 0:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #18 from Li Pan <pan2.li at intel dot com> ---
Thanks for the confirmation.
Yes, it was before expand. I will prepare a patch for this; I expect it will
target GCC 15.
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183
commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li <pan2.li@intel.com>
Date: Wed May 15 10:14:05 2024 +0800
Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
This patch adds the middle-end representation for saturating add, i.e. the
result of the add is set to the maximum value on overflow.
It matches patterns similar to the one below.
SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
Take uint8_t as example, we will have:
* SAT_ADD (1, 254) => 255.
* SAT_ADD (1, 255) => 255.
* SAT_ADD (2, 255) => 255.
* SAT_ADD (255, 255) => 255.
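The uint8_t cases above can be reproduced with a small macro that instantiates the pattern for a given type. The macro name `DEF_SAT_ADD` and the generated function names are assumptions for illustration. Note the role of the `(TYPE)` casts for types narrower than int: `x + y` is computed in int after promotion, so without truncating back to `TYPE` the comparison against `x` could not detect the wraparound.

```c
#include <stdint.h>

/* Define NAME as the unsigned saturating add for TYPE, following the
   pattern (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
#define DEF_SAT_ADD(TYPE, NAME)                       \
  static TYPE                                         \
  NAME (TYPE x, TYPE y)                               \
  {                                                   \
    TYPE sum = (TYPE) (x + y);                        \
    return (TYPE) (sum | (-(TYPE) (sum < x)));        \
  }

DEF_SAT_ADD (uint8_t, sat_add_u8)
DEF_SAT_ADD (uint64_t, sat_add_u64)
```

For example, `sat_add_u8 (1, 254)` yields 255 without overflow, while `sat_add_u8 (2, 255)` wraps to 1 internally and is then forced to 255 by the all-ones mask.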
Given the example below for the unsigned scalar integer type uint64_t:
uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}
Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
long unsigned int _1;
_Bool _2;
long unsigned int _3;
long unsigned int _4;
uint64_t _7;
long unsigned int _10;
__complex__ long unsigned int _11;
;; basic block 2, loop depth 0
;; pred: ENTRY
_11 = .ADD_OVERFLOW (x_5(D), y_6(D));
_1 = REALPART_EXPR <_11>;
_10 = IMAGPART_EXPR <_11>;
_2 = _10 != 0;
_3 = (long unsigned int) _2;
_4 = -_3;
_7 = _1 | _4;
return _7;
;; succ: EXIT
}
After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
uint64_t _7;
;; basic block 2, loop depth 0
;; pred: ENTRY
_7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
return _7;
;; succ: EXIT
}
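The .ADD_OVERFLOW call in the "before" dump is the same kind of operation that the GCC/Clang builtin `__builtin_add_overflow` exposes to C code. As a sketch only, a saturating add can be written with that builtin as follows; the function name is hypothetical, and whether a given GCC version turns this particular form into .SAT_ADD is not claimed here.

```c
#include <stdint.h>

/* Unsigned 64-bit saturating add via the overflow-checking builtin
   (a GNU extension available in GCC and Clang, not ISO C).  */
static uint64_t
sat_add_u64_builtin (uint64_t x, uint64_t y)
{
  uint64_t sum;
  if (__builtin_add_overflow (x, y, &sum))  /* true if the add wrapped */
    return UINT64_MAX;
  return sum;
}
```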
The following tests pass with this patch:
1. The riscv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
to the return true switch case(s).
* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for
us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation
arith.
(math_opts_dom_walker::after_dom_children): Try match saturation
arith when IOR expr.
Signed-off-by: Pan Li <pan2.li@intel.com>
* [Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #20 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010
commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2.li@intel.com>
Date: Wed May 15 10:14:06 2024 +0800
Vect: Support new IFN SAT_ADD for unsigned vector int
For vectorization, we leverage the existing vectorizer pattern recognition
to find the same pattern as in the scalar case and let the vectorizer do
the rest for the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the instruction group "Vector Single-Width
Saturating Add and Subtract", which can be leveraged when expanding
usadd<mode>3 in vector mode. For example:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;
  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}
Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
...
_80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
ivtmp_58 = _80 * 8;
vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
... }, vect__7.11_66);
.MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0,
vect__12.15_72);
vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
ivtmp_79 = ivtmp_78 - _80;
...
}
After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
...
_62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
ivtmp_46 = _62 * 8;
vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
.MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0,
vect__12.11_54);
...
}
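The add, compare and select sequence in the "before" dump can be imitated by hand with GNU C vector extensions, which may help when reading the GIMPLE. This is a sketch under assumptions: it uses eight 16-bit lanes instead of the 64-bit elements above, relies on the `vector_size` attribute (GCC and Clang, not ISO C), and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Eight unsigned 16-bit lanes in one 128-bit vector.  */
typedef uint16_t v8hu __attribute__ ((vector_size (16)));

/* Lane-wise unsigned saturating add.  The vector comparison yields
   all-ones in lanes where the sum wrapped and zero elsewhere, so
   OR-ing it in forces wrapped lanes to 0xFFFF.  */
static v8hu
vec_sat_add_u16 (v8hu x, v8hu y)
{
  v8hu sum = x + y;                  /* wraps modulo 2^16 per lane */
  v8hu wrapped = (v8hu) (sum < x);   /* 0xFFFF where overflow occurred */
  return sum | wrapped;
}
```

The `sum | wrapped` step plays the role of the .VCOND_MASK select with an all-ones constant; the "after" dump replaces the whole sequence with a single .SAT_ADD.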
The following test suites pass with this patch.
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
func decl generated by match.pd match.
(vect_recog_sat_add_pattern): New func impl to recog the pattern
for unsigned SAT_ADD.
Signed-off-by: Pan Li <pan2.li@intel.com>