public inbox for gcc-bugs@sourceware.org
* [Bug target/96933] New: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-04 9:31 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
Bug ID: 96933
Summary: inefficient code for char/short vec CTOR
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
While investigating the vectorization cost for vec_construct, I happened to
find that the code generated for vector construction is inefficient when
DIRECT_MOVE is supported.
The test case looks like:
vector unsigned char test_char(unsigned char f1, unsigned char f2,
                               unsigned char f3, unsigned char f4,
                               unsigned char f5, unsigned char f6,
                               unsigned char f7, unsigned char f8,
                               unsigned char f9, unsigned char f10,
                               unsigned char f11, unsigned char f12,
                               unsigned char f13, unsigned char f14,
                               unsigned char f15, unsigned char f16) {
  vector unsigned char v = {f1, f2,  f3,  f4,  f5,  f6,  f7,  f8,
                            f9, f10, f11, f12, f13, f14, f15, f16};
  return v;
}
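The summary also mentions short. As a minimal sketch of what the analogous constructor looks like, the following uses the generic GCC/Clang vector_size attribute instead of the AltiVec "vector" keyword so that it also builds off PowerPC. The type name v8hi_u and the function name test_short are illustrative assumptions, not taken from the report.

```c
#include <stdint.h>

/* Generic GCC/Clang vector type: eight 16-bit lanes in 16 bytes.
   Hypothetical name; the report itself uses the AltiVec
   "vector unsigned char" spelling. */
typedef uint16_t v8hi_u __attribute__((vector_size(16)));

/* Build a vector from eight scalar arguments (a vector CTOR). */
v8hi_u test_short(uint16_t f1, uint16_t f2, uint16_t f3, uint16_t f4,
                  uint16_t f5, uint16_t f6, uint16_t f7, uint16_t f8)
{
    v8hi_u v = {f1, f2, f3, f4, f5, f6, f7, f8};
    return v;
}
```

Compiling this for a PowerPC target with -O2 -mcpu=power9 -S should show whether the constructor goes through the stack or through direct moves; the exact output depends on the compiler version.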
The code currently generated with -mcpu=power9:
0000000000000000 <test_char>:
0: e8 ff a1 fb std r29,-24(r1)
4: f0 ff c1 fb std r30,-16(r1)
8: f8 ff e1 fb std r31,-8(r1)
c: 60 00 a1 8b lbz r29,96(r1)
10: 68 00 c1 8b lbz r30,104(r1)
14: 70 00 e1 8b lbz r31,112(r1)
18: d1 ff 81 98 stb r4,-47(r1)
1c: d2 ff a1 98 stb r5,-46(r1)
20: 78 00 81 89 lbz r12,120(r1)
24: 80 00 01 88 lbz r0,128(r1)
28: 88 00 61 89 lbz r11,136(r1)
2c: 90 00 81 88 lbz r4,144(r1)
30: 98 00 a1 88 lbz r5,152(r1)
34: d0 ff 61 98 stb r3,-48(r1)
38: d3 ff c1 98 stb r6,-45(r1)
3c: d4 ff e1 98 stb r7,-44(r1)
40: d8 ff a1 9b stb r29,-40(r1)
44: d5 ff 01 99 stb r8,-43(r1)
48: d6 ff 21 99 stb r9,-42(r1)
4c: d7 ff 41 99 stb r10,-41(r1)
50: d9 ff c1 9b stb r30,-39(r1)
54: da ff e1 9b stb r31,-38(r1)
58: db ff 81 99 stb r12,-37(r1)
5c: dc ff 01 98 stb r0,-36(r1)
60: dd ff 61 99 stb r11,-35(r1)
64: de ff 81 98 stb r4,-34(r1)
68: df ff a1 98 stb r5,-33(r1)
6c: e8 ff a1 eb ld r29,-24(r1)
70: f0 ff c1 eb ld r30,-16(r1)
74: f8 ff e1 eb ld r31,-8(r1)
78: d9 ff 41 f4 lxv vs34,-48(r1)
7c: 20 00 80 4e blr
But it can be more efficient with direct move and vector merge, such as:
0: 67 01 43 7c mtvsrd vs34,r3
4: 68 00 61 80 lwz r3,104(r1)
8: 60 00 61 81 lwz r11,96(r1)
c: 67 01 64 7c mtvsrd vs35,r4
10: 70 00 81 80 lwz r4,112(r1)
14: 67 01 03 7d mtvsrd vs40,r3
18: 78 00 61 80 lwz r3,120(r1)
1c: 67 01 85 7c mtvsrd vs36,r5
20: 67 01 a6 7c mtvsrd vs37,r6
24: 67 01 07 7c mtvsrd vs32,r7
28: 67 01 28 7c mtvsrd vs33,r8
2c: 67 01 24 7d mtvsrd vs41,r4
30: 80 00 81 80 lwz r4,128(r1)
34: 0c 10 43 10 vmrghb v2,v3,v2
38: 67 01 63 7c mtvsrd vs35,r3
3c: 88 00 61 80 lwz r3,136(r1)
40: 67 01 eb 7c mtvsrd vs39,r11
44: 0c 20 85 10 vmrghb v4,v5,v4
48: 67 01 a4 7c mtvsrd vs37,r4
4c: 90 00 81 80 lwz r4,144(r1)
50: 0c 00 01 10 vmrghb v0,v1,v0
54: 67 01 23 7c mtvsrd vs33,r3
58: 98 00 61 80 lwz r3,152(r1)
5c: 67 01 c9 7c mtvsrd vs38,r9
60: 0c 38 e8 10 vmrghb v7,v8,v7
64: 67 01 04 7d mtvsrd vs40,r4
68: 0c 48 63 10 vmrghb v3,v3,v9
6c: 67 01 23 7d mtvsrd vs41,r3
70: 0c 28 a1 10 vmrghb v5,v1,v5
74: 67 01 2a 7c mtvsrd vs33,r10
78: 0c 40 09 11 vmrghb v8,v9,v8
7c: 0c 30 21 10 vmrghb v1,v1,v6
80: 4c 11 44 10 vmrglh v2,v4,v2
84: 4c 39 63 10 vmrglh v3,v3,v7
88: 4c 29 88 10 vmrglh v4,v8,v5
8c: 4c 01 a1 10 vmrglh v5,v1,v0
90: 8c 19 64 10 vmrglw v3,v4,v3
94: 8c 11 45 10 vmrglw v2,v5,v2
98: 57 13 43 f0 xxmrgld vs34,vs35,vs34
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-04 9:33 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |linkw at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org,
                   |                            |wschmidt at gcc dot gnu.org
            Summary|inefficient code for        |rs6000: inefficient code
                   |char/short vec CTOR         |for char/short vec CTOR
   Last reconfirmed|                            |2020-09-04
             Target|                            |powerpc
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: segher at gcc dot gnu.org @ 2020-09-04 10:26 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Is that actually faster though? The original has shorter dependency
chains. Or is this to avoid some LHS/SHL?
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-04 10:46 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #1)
> Is that actually faster though? The original has shorter dependency
> chains. Or is this to avoid some LHS/SHL?
Yes. I tested it with one constructed case: the original version takes 18.20s
while the optimized version takes 8.40s. And yes, I guess it's due to LHS/SHL,
similar to the vec_insert issue xionghu is working on.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: rguenth at gcc dot gnu.org @ 2020-09-04 12:06 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
very likely the byte stores and then the following vector load will also
trigger STLF issues.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: segher at gcc dot gnu.org @ 2020-09-04 13:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #4 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yes, timing suggests there is some SHL/LHS flush.
On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
bytes into place at once), which reduces the number of moves from
16 to 8, and the number of merges from 15 to 7 (and reduces path
length by 1). This sounds like a no-brainer win with that :-)
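
The counts above follow from the shape of a balanced binary merge tree: n input vectors need n - 1 pairwise merges and log2(n) levels. A minimal sketch of that arithmetic, assuming a power-of-two number of inputs (the struct and function names are made up for illustration):

```c
#include <assert.h>

/* Shape of a balanced binary merge tree over `leaves` input vectors. */
struct merge_tree {
    unsigned merges; /* total pairwise merge instructions */
    unsigned depth;  /* levels on the longest dependency path */
};

struct merge_tree merge_tree_shape(unsigned leaves)
{
    /* This sketch only handles power-of-two leaf counts. */
    assert(leaves != 0 && (leaves & (leaves - 1)) == 0);

    struct merge_tree t = {0, 0};
    while (leaves > 1) {
        leaves /= 2;        /* each level pairs up the remaining vectors */
        t.merges += leaves; /* one merge per pair */
        t.depth += 1;
    }
    return t;
}
```

With 16 single-byte moves this gives 15 merges over 4 levels; with 8 paired moves it gives 7 merges over 3 levels, which is the "path length by 1" saving.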
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-07 2:39 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #4)
> Yes, timing suggests there is some SHL/LHS flush.
>
> On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
> bytes into place at one), which reduces the number of moves from
> 16 to 8, and the number of merges from 15 to 7 (and reduces path
> length by 1). This sounds like a no-brainer win with that :-)
Good idea, it looks better on P9. One thing to double-check: there are
currently no instructions like vmrgob and vmrgoh, so do the merges you
mentioned (vector bytes to vector short, and vector short to vector word)
need an artificial control vector?
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-07 7:26 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #6 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #5)
> (In reply to Segher Boessenkool from comment #4)
> > Yes, timing suggests there is some SHL/LHS flush.
> >
> > On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
> > bytes into place at one), which reduces the number of moves from
> > 16 to 8, and the number of merges from 15 to 7 (and reduces path
> > length by 1). This sounds like a no-brainer win with that :-)
>
> Good idea, it looks better on P9. One thing to double confirm, currently
> there are no instructions like vmrgob and vmrgoh, so of the mergings you
> mentioned from vector bytes to vector short and vector short to vector word
> needs artificial control vector?
I improved the patch to support mtvsrdd; the asm for char now looks like:
0000000000000000 <test_char>:
0: 00 00 4c 3c addis r2,r12,0
0: R_PPC64_REL16_HA .TOC.
4: 00 00 42 38 addi r2,r2,0
4: R_PPC64_REL16_LO .TOC.+0x4
8: e8 ff a1 fb std r29,-24(r1)
c: 00 00 a2 3f addis r29,r2,0
c: R_PPC64_TOC16_HA .rodata.cst16
10: f0 ff c1 fb std r30,-16(r1)
14: f8 ff e1 fb std r31,-8(r1)
18: 67 1b 24 7c mtvsrdd vs33,r4,r3
1c: 67 3b 28 7d mtvsrdd vs41,r8,r7
20: 68 00 c1 8b lbz r30,104(r1)
24: 78 00 e1 8b lbz r31,120(r1)
28: 00 00 bd 3b addi r29,r29,0
28: R_PPC64_TOC16_LO .rodata.cst16
2c: 60 00 81 89 lbz r12,96(r1)
30: 70 00 61 89 lbz r11,112(r1)
34: 80 00 81 88 lbz r4,128(r1)
38: 88 00 61 88 lbz r3,136(r1)
3c: 90 00 01 89 lbz r8,144(r1)
40: 98 00 e1 88 lbz r7,152(r1)
44: 67 2b 46 7c mtvsrdd vs34,r6,r5
48: 67 4b aa 7d mtvsrdd vs45,r10,r9
4c: 09 00 9d f5 lxv vs44,0(r29)
50: 67 63 5e 7d mtvsrdd vs42,r30,r12
54: 67 5b 1f 7c mtvsrdd vs32,r31,r11
58: e8 ff a1 eb ld r29,-24(r1)
5c: f0 ff c1 eb ld r30,-16(r1)
60: 67 23 63 7d mtvsrdd vs43,r3,r4
64: f8 ff e1 eb ld r31,-8(r1)
68: 3b 0b 42 10 vpermr v2,v2,v1,v12
6c: 67 43 27 7c mtvsrdd vs33,r7,r8
70: 3b 4b ad 11 vpermr v13,v13,v9,v12
74: 3b 53 00 10 vpermr v0,v0,v10,v12
78: 3b 5b 21 10 vpermr v1,v1,v11,v12
7c: 97 11 4d f0 xxmrglw vs34,vs45,vs34
80: 97 01 01 f0 xxmrglw vs32,vs33,vs32
84: 57 13 40 f0 xxmrgld vs34,vs32,vs34
88: 20 00 80 4e blr
Timings on Power9:
1) mtvsrdd under TARGET_DIRECT_MOVE_128: 7.28s
2) mtvsrd under TARGET_DIRECT_MOVE: 7.41s
3) original: 18.19s
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: segher at gcc dot gnu.org @ 2020-09-07 15:14 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
There are vmrglb and vmrghb etc.?
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-08 5:26 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #8 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #7)
> There are vmrglb and vrghb etc.?
But these only handle the low or the high part separately. With mtvsrdd, both
the low and high parts (doublewords) hold values, and we don't have Vector
Merge Even/Odd for char or short to merge them. For now I use an artificial
control vector for the merging; correct me if I'm missing something.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: segher at gcc dot gnu.org @ 2020-09-08 18:30 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
I'm not sure what you mean.
vmrglb merges the vectors
abcdefghijklmnop
and
ABCDEFGHIJKLMNOP
to
iIjJkKlLmMnNoOpP
... ah, I see what you mean I guess.
So, use something else instead? How about vpku*um?
First vpkudum, xforming
xxxxxxxAxxxxxxxB
and
xxxxxxxCxxxxxxxD
into
xxxAxxxBxxxCxxxD
and then vpkuwum:
xxxAxxxBxxxCxxxD
and
xxxExxxFxxxGxxxH
into
xAxBxCxDxExFxGxH
and finally vpkuhum:
xAxBxCxDxExFxGxH
and
xIxJxKxLxMxNxOxP
into
ABCDEFGHIJKLMNOP
?
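
The three pack steps above can be checked with a small byte-level model. This is only a sketch under stated assumptions: vectors are plain 16-byte arrays in the big-endian element order used in the diagrams, 'x' stands for don't-care bytes, and pack_modulo, pair_bytes and build_char_vector are hypothetical helper names, not GCC or ISA identifiers. It does not model little-endian operand ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Model of a "pack unsigned modulo" step (vpkudum/vpkuwum/vpkuhum for
   width 8/4/2): every `width`-byte element of a, then of b, is
   truncated to its low-order width/2 bytes. */
void pack_modulo(const uint8_t a[VEC_BYTES], const uint8_t b[VEC_BYTES],
                 size_t width, uint8_t out[VEC_BYTES])
{
    assert(width == 2 || width == 4 || width == 8);

    const uint8_t *src[2] = {a, b};
    uint8_t tmp[VEC_BYTES]; /* scratch, so out may alias a or b */
    size_t half = width / 2;
    size_t o = 0;

    for (int s = 0; s < 2; s++) {
        for (size_t e = 0; e < VEC_BYTES; e += width) {
            /* In big-endian order the low-order half is the trailing bytes. */
            memcpy(tmp + o, src[s] + e + half, half);
            o += half;
        }
    }
    memcpy(out, tmp, VEC_BYTES);
}

/* Model of the paired move: two doublewords side by side, with only the
   low byte of each carrying data and the rest filled with 'x'. */
void pair_bytes(uint8_t first, uint8_t second, uint8_t out[VEC_BYTES])
{
    memset(out, 'x', VEC_BYTES);
    out[7] = first;
    out[15] = second;
}

/* Build a 16-byte vector from 16 scalars: 8 paired moves, then
   4 + 2 + 1 = 7 packs with element width 8, 4 and 2. */
void build_char_vector(const uint8_t in[VEC_BYTES], uint8_t out[VEC_BYTES])
{
    uint8_t v[8][VEC_BYTES];
    for (int i = 0; i < 8; i++)
        pair_bytes(in[2 * i], in[2 * i + 1], v[i]);

    size_t n = 8;
    for (size_t width = 8; width >= 2; width /= 2) {
        n /= 2;
        for (size_t i = 0; i < n; i++)
            pack_modulo(v[2 * i], v[2 * i + 1], width, v[i]);
    }
    memcpy(out, v[0], VEC_BYTES);
}
```

Feeding the bytes 'A' through 'P' to build_char_vector reproduces the final row of the diagram, and a single width-8 pack of the (A,B) and (C,D) pairs reproduces the first intermediate row.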
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-09-09 5:20 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #10 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #9)
> I'm not sure what you mean.
>
> vmrglb merges the vectors
> abcdefghijklmnop
> and
> ABCDEFGHIJKLMNOP
> to
> iIjJkKlLmMnNoOpP
>
> ... ah, I see what you mean I guess.
>
> So, use something else instead? How about vpku*um?
>
> First vpkudum, xforming
> xxxxxxxAxxxxxxxB
> and
> xxxxxxxCxxxxxxxD
> into
> xxxAxxxBxxxCxxxD
>
> and then vpkuwum:
> xxxAxxxBxxxCxxxD
> and
> xxxExxxFxxxGxxxH
> into
> xAxBxCxDxExFxGxH
>
> and finally vpkuhum:
> xAxBxCxDxExFxGxH
> and
> xIxJxKxLxMxNxOxP
> into
> ABCDEFGHIJKLMNOP
>
> ?
Great, it works! Thanks for the advice. In testing, for type char it's on par
with the artificial control vector version (7.30s vs. 7.28s), while for type
short it's better (28.66s vs. 31.52s). I will update the posted patch to V2.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: cvs-commit at gcc dot gnu.org @ 2020-11-05 8:09 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:
https://gcc.gnu.org/g:025f434a87336e38bf5140fba2005081876aa911
commit r11-4731-g025f434a87336e38bf5140fba2005081876aa911
Author: Kewen Lin <linkw@linux.ibm.com>
Date: Thu Nov 5 00:04:10 2020 -0600
rs6000: Use direct move for char/short vector CTOR [PR96933]
This patch is to make vector CTOR with char/short leverage direct
move instructions when they are available. With one constructed
test case, it can speed up 145% for char and 190% for short on P9.
Tested SPEC2017 x264_r at -Ofast on P9, it gets 1.61% speedup
(but based on unexpected SLP see PR96789).
Bootstrapped/regtested on powerpc64{,le}-linux-gnu P8 and
powerpc64le-linux-gnu P9.
gcc/ChangeLog:
PR target/96933
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use direct move
instructions for vector construction with char/short types.
* config/rs6000/rs6000.md (p8_mtvsrwz_v16qisi2): New define_insn.
(p8_mtvsrd_v16qidi2): Likewise.
gcc/testsuite/ChangeLog:
PR target/96933
* gcc.target/powerpc/pr96933-1.c: New test.
* gcc.target/powerpc/pr96933-2.c: New test.
* gcc.target/powerpc/pr96933-3.c: New test.
* gcc.target/powerpc/pr96933-4.c: New test.
* gcc.target/powerpc/pr96933.h: New test.
* gcc.target/powerpc/pr96933-run.h: New test.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: linkw at gcc dot gnu.org @ 2020-11-05 8:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #12 from Kewen Lin <linkw at gcc dot gnu.org> ---
Should be done with latest trunk.
* [Bug target/96933] rs6000: inefficient code for char/short vec CTOR
From: cvs-commit at gcc dot gnu.org @ 2020-11-06 22:14 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:
https://gcc.gnu.org/g:e5502ae72f784470019de5850017ad0c87ffacef
commit r11-4805-ge5502ae72f784470019de5850017ad0c87ffacef
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date: Fri Nov 6 12:50:35 2020 +0000
rs6000: Fix TARGET_POWERPC64 vs. TARGET_64BIT confusion
I gave Ke Wen bad advice, luckily David corrected me: it is true that we
cannot use TARGET_POWERPC64 on many 32-bit OSes, since either the kernel
or userland does not save the top half of the 64-bit integer registers,
but we do not have to care about that in separate patterns or related
code. The flag is automatically not enabled by default on targets that
do not handle this correctly.
This patch fixes it.
Segher
2020-11-06 Segher Boessenkool <segher@kernel.crashing.org>
PR target/96933
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use
TARGET_POWERPC64 instead of TARGET_64BIT.