From: xionghu luo <luoxhu@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
GCC Patches <gcc-patches@gcc.gnu.org>,
David Edelsohn <dje.gcc@gmail.com>,
Bill Schmidt <wschmidt@linux.ibm.com>,
Jiufu Guo <guojiufu@linux.ibm.com>,
linkw@gcc.gnu.org, Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
Date: Fri, 25 Sep 2020 11:50:40 +0800 [thread overview]
Message-ID: <6d2ccf37-669f-c80d-2efd-1ceb086f14f0@linux.ibm.com> (raw)
In-Reply-To: <CAFiYyc2O+dj-ApbaDODW7i9Jmc0_pdrA3JXAoxQXuwHvqRmAeA@mail.gmail.com>
Hi,
On 2020/9/24 21:27, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <luoxhu@linux.ibm.com> wrote:
>
> I'll just comment that
>
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32
>
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate. I'm not fluend in
> ppc assembly but
>
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32? though
> I do not see those registers mentioned...
For V4SI:
rlwinm 6,6,2,28,29 // r6*4
mtvsrwz 0,5 // vs0 <- r5 (0xfe)
lvsr 1,0,6 // vs33 <- lvsr[r6]
lvsl 0,0,6 // vs32 <- lvsl[r6]
xxperm 34,34,33
xxinsertw 34,0,12
xxperm 34,34,32
blr
idx = idx * 4;
0 0 0x4000000030000000200000001 xxperm:0x4000000030000000200000001 vs33:0x101112131415161718191a1b1c1d1e1f vs32:0x102030405060708090a0b0c0d0e0f
1 4 0x4000000030000000200000001 xxperm:0x1000000040000000300000002 vs33:0xc0d0e0f101112131415161718191a1b vs32:0x405060708090a0b0c0d0e0f10111213
2 8 0x4000000030000000200000001 xxperm:0x2000000010000000400000003 vs33:0x8090a0b0c0d0e0f1011121314151617 vs32:0x8090a0b0c0d0e0f1011121314151617
3 12 0x4000000030000000200000001 xxperm:0x3000000020000000100000004 vs33:0x405060708090a0b0c0d0e0f10111213 vs32:0xc0d0e0f101112131415161718191a1b
vs34:
0x40000000300000002000000fe
0x400000003000000fe00000001
0x4000000fe0000000200000001
0xfe000000030000000200000001
"xxinsertw 34,0,12" will always insert vs0[32:63] content to the forth word of
target vector, bits[96:127]. Then the second xxperm rotate the modified vector
back.
All the instructions are register based operation, as Segher replied, power9
supports only fixed position inserts, so we need do some trick here to support
it instead of generate short store wide load instructions.
>
> This might be a generic viable expansion strathegy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ... the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
>
> But well, I did ask this question. Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
> addi 9,1,-16
> rldic 6,6,2,60
> stxv 34,-16(1)
> stwx 5,9,6
> lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
Sorry that if I didn't describe clearly and misunderstood you, I mean if insert many
instructions(tested with a loop inserted) between "stwx 5,9,6" and "lxv 34,-16(1)",
the store hit load performance issue could be elided, but this is not the solution
we want.
I also changed your test as below and build for X86, seems it also generates
inefficient code? What my patch does maybe different usage from your pasted
case?
#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg3 (V v, int idx, T val)
{
v[idx&31] = val;
return v;
}
-O2 -S -mavx -march=znver2:
setg3:
push rbp
and edi, 31
mov rbp, rsp
and rsp, -32
vmovdqa YMMWORD PTR [rsp-32], ymm0
mov DWORD PTR [rsp-32+rdi*4], esi
vmovdqa ymm0, YMMWORD PTR [rsp-32]
leave
ret
While idx is constant:
setg3:
vpinsrd xmm1, xmm0, esi, 3
vinserti128 ymm0, ymm0, xmm1, 0x0
ret
And ARM with -O2 -S -march=armv8.2-a+sve (N change to 16):
setg3:
sub sp, sp, #16
and x0, x0, 15
str q0, [sp]
str w1, [sp, x0, lsl 2]
ldr q0, [sp]
add sp, sp, 16
ret
While idx is constant:
setg3:
ins v0.s[3], w1
ret
Though I've no idea how to optimize this on X86 and ARM with vector instructions
to avoid short store with wide load followed on stack.
Thanks,
Xionghu
next prev parent reply other threads:[~2020-09-25 3:50 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-18 6:17 [PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR Xiong Hu Luo
2020-09-18 6:17 ` [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251] Xiong Hu Luo
2020-09-18 20:19 ` Segher Boessenkool
2020-09-24 8:21 ` xionghu luo
2020-09-24 13:27 ` Richard Biener
2020-09-24 14:55 ` Richard Biener
2020-09-24 19:36 ` Segher Boessenkool
2020-09-25 6:58 ` Richard Biener
2020-09-25 12:21 ` Richard Sandiford
2020-09-25 22:39 ` Segher Boessenkool
2020-09-28 8:00 ` Richard Biener
2020-09-24 19:02 ` Segher Boessenkool
2020-09-25 3:50 ` xionghu luo [this message]
2020-09-25 5:00 ` Richard Biener
2020-09-18 18:20 ` [PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR Segher Boessenkool
2020-09-21 8:31 ` Richard Biener
2020-09-22 3:55 ` [PATCH v3 " xionghu luo
2020-09-23 11:33 ` Richard Biener
2020-09-24 3:24 ` xionghu luo
2020-09-24 12:39 ` Richard Sandiford
2020-09-25 6:51 ` [PATCH v4 1/3] " xionghu luo
2020-09-25 11:32 ` Richard Biener
2020-09-25 13:28 ` Richard Sandiford
2020-09-27 5:45 ` xionghu luo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6d2ccf37-669f-c80d-2efd-1ceb086f14f0@linux.ibm.com \
--to=luoxhu@linux.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=guojiufu@linux.ibm.com \
--cc=linkw@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
--cc=richard.sandiford@arm.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).