From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 32FA93851C07; Wed, 2 Sep 2020 09:26:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 32FA93851C07 Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 08292HG9089416; Wed, 2 Sep 2020 05:26:57 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 33a7pa21qc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Sep 2020 05:26:57 -0400 Received: from m0098404.ppops.net (m0098404.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 08292EPw088944; Wed, 2 Sep 2020 05:26:56 -0400 Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com with ESMTP id 33a7pa21ph-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Sep 2020 05:26:56 -0400 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0829MX4e003511; Wed, 2 Sep 2020 09:26:54 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma05fra.de.ibm.com with ESMTP id 337en82scp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 02 Sep 2020 09:26:54 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 0829QpWE23331160 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 2 Sep 2020 09:26:51 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B8AC811C05B; Wed, 2 Sep 2020 09:26:51 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4878C11C064; Wed, 2 Sep 2020 09:26:49 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.197.240.238]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 2 Sep 2020 09:26:48 +0000 (GMT) Subject: Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251] To: Richard Biener Cc: Segher Boessenkool , Bill Schmidt , GCC Patches , linkw@gcc.gnu.org, David Edelsohn References: <20200831090647.152432-1-luoxhu@linux.ibm.com> <20200831170414.GJ28786@gate.crashing.org> <3aec8b6f-8cf5-bd48-eb4e-c7b82e88dcd7@linux.ibm.com> From: luoxhu Message-ID: <53017f5c-d81b-6fe2-6c5d-1496249313f1@linux.ibm.com> Date: Wed, 2 Sep 2020 17:26:47 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687 definitions=2020-09-02_03:2020-09-02, 2020-09-02 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 mlxlogscore=999 spamscore=0 phishscore=0 suspectscore=0 malwarescore=0 priorityscore=1501 mlxscore=0 clxscore=1015 impostorscore=0 adultscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009020079 X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Sep 2020 09:27:00 -0000 Hi, On 2020/9/1 21:07, Richard Biener wrote: > On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches > wrote: >> >> Hi, >> >> On 2020/9/1 01:04, Segher Boessenkool wrote: >>> Hi! >>> >>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote: >>>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value >>>> to be insert, arg2 is the place to insert arg1 to arg0. This patch adds >>>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to >>>> not expand too early in gimple stage if arg2 is variable, to avoid generate >>>> store hit load instructions. >>>> >>>> For Power9 V4SI: >>>> addi 9,1,-16 >>>> rldic 6,6,2,60 >>>> stxv 34,-16(1) >>>> stwx 5,9,6 >>>> lxv 34,-16(1) >>>> => >>>> addis 9,2,.LC0@toc@ha >>>> addi 9,9,.LC0@toc@l >>>> mtvsrwz 33,5 >>>> lxv 32,0(9) >>>> sradi 9,6,2 >>>> addze 9,9 >>>> sldi 9,9,2 >>>> subf 9,9,6 >>>> subfic 9,9,3 >>>> sldi 9,9,2 >>>> subfic 9,9,20 >>>> lvsl 13,0,9 >>>> xxperm 33,33,45 >>>> xxperm 32,32,45 >>>> xxsel 34,34,33,32 >>> >>> For v a V4SI, x a SI, j some int, what do we generate for >>> v[j&3] = x; >>> ? >>> This should be exactly the same as we generate for >>> vec_insert(x, v, j); >>> (the builtin does a modulo 4 automatically). >> >> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated currently. >> Is it feasible and acceptable to expand some kind of pattern in expander directly without >> builtin transition? >> >> I borrowed some of implementation from vec_extract. For vec_extract, the issue also exists: >> >> source: gimple: expand: asm: >> 1) i = vec_extract (v, n); => i = __builtin_vec_ext_v4si (v, n); => {r120:SI=unspec[r118:V4SI,r119:DI] 134;...} => slwi 9,6,2 vextuwrx 3,9,2 >> 2) i = vec_extract (v, 3); => i = __builtin_vec_ext_v4si (v, 3); => {r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12 vextuwrx 3,9,2 >> 3) i = v[n%4]; => _1 = n & 3; i = VIEW_CONVERT_EXPR(v)[_1]; => ... => stxv 34,-16(1);addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5 >> 4) i = v[3]; => i = BIT_FIELD_REF ; => {r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12; vextuwrx 3,9,2 > > Why are 1) and 2) handled differently than 3)/4)? 1) and 2) are calling builtin function vec_extract, it is defined to __builtin_vec_extract and will be resolved to ALTIVEC_BUILTIN_VEC_EXTRACT by resolve_overloaded_builtin, to generate a call __builtin_vec_ext_v4si to be expanded only in RTL. 3) is access variable v as array type with opcode VIEW_CONVERT_EXPR, I guess we should also generate builtin call instead of calling convert_vector_to_array_for_subscript to generate VIEW_CONVERT_EXPR expression for such kind of usage. 4) is translated to BIT_FIELD_REF with constant bitstart and bitsize, variable v could also be accessed by register instead of stack, so optabs could match the rs6000_expand_vector_insert to generate expected instruction through extract_bit_field. > >> Case 3) also couldn't handle the similar usage, and case 4) doesn't generate builtin as expected, >> it just expand to vec_select by coincidence. So does this mean both vec_insert and vec_extract >> and all other similar vector builtins should use IFN as suggested by Richard Biener, to match the >> pattern in gimple and expand both constant and variable index in expander? Will this also be >> beneficial for other targets except power? Or we should do that gradually after this patch >> approved as it seems another independent issue? Thanks:) > > If the code generated for 3)/4) isn't optimal you have to figure why > by tracing the RTL > expansion code and looking for missing optabs. > > Consider the amount of backend code you need to write if ever using > those in constexpr > context ... It seems too complicated to expand the "i = VIEW_CONVERT_EXPR(v)[_1];" or "VIEW_CONVERT_EXPR(v1)[_1] = i_6(D);" to rs6000_expand_vector_insert/rs6000_expand_vector_extract in RTL, as: 1) Vector v is stored to stack with array type; need extra load and store operation. 2) Requires amount of code to decompose VIEW_CONVERT_EXPR to extract the vector and index then call rs6000_expand_vector_insert/rs6000_expand_vector_extract. which means replace following instructions #9~#12 to new instruction sequences: 1: NOTE_INSN_DELETED 6: NOTE_INSN_BASIC_BLOCK 2 2: r119:V4SI=%2:V4SI 3: r120:DI=%5:DI 4: r121:DI=%6:DI 5: NOTE_INSN_FUNCTION_BEG 8: [r112:DI]=r119:V4SI 9: r122:DI=r121:DI&0x3 10: r123:DI=r122:DI<<0x2 11: r124:DI=r112:DI+r123:DI 12: [r124:DI]=r120:DI#0 13: r118:V4SI=[r112:DI] 17: %2:V4SI=r118:V4SI 18: use %2:V4SI => 1: NOTE_INSN_DELETED 6: NOTE_INSN_BASIC_BLOCK 2 2: r119:V4SI=%2:V4SI 3: r120:DI=%5:DI 4: r121:DI=%6:DI 5: NOTE_INSN_FUNCTION_BEG 8: [r112:DI]=r119:V4SI r130:V4SI=[r112:DI] rs6000_expand_vector_insert (r130, r121:DI&0x3, r120:DI#0) [r112:DI]=r130:V4SI 13: r118:V4SI=[r112:DI] 17: %2:V4SI=r118:V4SI 18: use %2:V4SI so maybe bypass convert_vector_to_array_for_subscript for special circumstance like "i = v[n%4]" or "v[n&3]=i" to generate vec_extract or vec_insert builtin call a relative simpler method? Xionghu