From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 4A5C13858401; Fri, 17 Sep 2021 05:44:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4A5C13858401 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18H5UX0k027605; Fri, 17 Sep 2021 01:44:05 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3b4mxc892d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Sep 2021 01:44:05 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 18H5eQ4g002801; Fri, 17 Sep 2021 01:44:05 -0400 Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0b-001b2d01.pphosted.com with ESMTP id 3b4mxc891r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Sep 2021 01:44:05 -0400 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 18H5giju012191; Fri, 17 Sep 2021 05:44:03 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma06ams.nl.ibm.com with ESMTP id 3b0kqkhsfd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Sep 2021 05:44:03 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 18H5i19G47186250 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Sep 2021 05:44:01 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EFB2CA405B; Fri, 17 Sep 2021 05:44:00 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 39CB6A4065; Fri, 17 Sep 2021 05:43:59 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.155.166]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Fri, 17 Sep 2021 05:43:58 +0000 (GMT) Subject: Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613] To: David Edelsohn Cc: Segher Boessenkool , Bill Schmidt , GCC Patches , linkw@gcc.gnu.org References: <20210430063258.2774866-1-luoxhu@linux.ibm.com> <20210513104931.GG10366@gate.crashing.org> <973f14d9-c626-89b4-753f-e5c613eef6f2@linux.ibm.com> <2b08faad-887f-73c4-2b84-75dd0feb51c7@linux.ibm.com> <8adb7d7f-44d6-7222-bb42-606092b140b0@linux.ibm.com> <2d6953e9-ddbd-edff-dc71-2a2f7f13afc3@linux.ibm.com> From: Xionghu Luo Message-ID: <6ceb35dd-71ff-c8bc-eb46-fbd20f9a1663@linux.ibm.com> Date: Fri, 17 Sep 2021 13:43:56 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: sVO3Z35_fWPbyfayNomuKTIIaw3G1p4W X-Proofpoint-GUID: 62BS-8IZOMcrOvTtniW3c21iSLAsBWNH Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-17_02,2021-09-16_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 suspectscore=0 spamscore=0 priorityscore=1501 mlxscore=0 bulkscore=0 impostorscore=0 lowpriorityscore=0 adultscore=0 mlxlogscore=999 phishscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109170033 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2021 05:44:08 -0000 On 2021/9/15 21:11, David Edelsohn wrote: > Hi, Xionhu > > Should "altivec_vsel2" .. 3 .. 4 be "*altivec_vsel2", etc. > because they are combiner patterns and never referenced by name? Only > the first, named pattern is referenced by the builtin code. Thanks, updated the patchset with Segher's review comments, he didn't mention about this and sorry to forget change this part, I am also not sure whether "altivec_vsel2" .. 3 .. 4 will be used/generated or optimized by expander in future, is there any benefit to add "*" to the define_insn patterns? > > Other than that question / suggestion, this patch is okay. Please > coordinate with Bill and his builtin patches. OK. > > Thanks, David > > On Wed, Sep 15, 2021 at 3:50 AM Xionghu Luo wrote: >> >> Ping^3, thanks. >> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html >> >> >> On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote: >>> Ping^2, thanks. >>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html >>> >>> On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote: >>>> Gentle ping, thanks. >>>> >>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html >>>> >>>> >>>> On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: >>>>> Hi, >>>>> >>>>> On 2021/5/13 18:49, Segher Boessenkool wrote: >>>>>> Hi! >>>>>> >>>>>> On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: >>>>>>> The vsel instruction is a bit-wise select instruction. Using an >>>>>>> IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code >>>>>>> being generated in the combine pass. Per element selection is a >>>>>>> subset of per bit-wise selection,with the patch the pattern is >>>>>>> written using bit operations. But there are 8 different patterns >>>>>>> to define "op0 := (op1 & ~op3) | (op2 & op3)": >>>>>>> >>>>>>> (~op3&op1) | (op3&op2), >>>>>>> (~op3&op1) | (op2&op3), >>>>>>> (op3&op2) | (~op3&op1), >>>>>>> (op2&op3) | (~op3&op1), >>>>>>> (op1&~op3) | (op3&op2), >>>>>>> (op1&~op3) | (op2&op3), >>>>>>> (op3&op2) | (op1&~op3), >>>>>>> (op2&op3) | (op1&~op3), >>>>>>> >>>>>>> Combine pass will swap (op1&~op3) to (~op3&op1) due to commutative >>>>>>> canonical, which could reduce it to the FIRST 4 patterns, but it won't >>>>>>> swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch >>>>>>> handles it with two patterns with different NOT op3 position and check >>>>>>> equality inside it. >>>>>> >>>>>> Yup, that latter case does not have canonicalisation rules. Btw, not >>>>>> only combine does this canonicalisation: everything should, >>>>>> non-canonical RTL is invalid RTL (in the instruction stream, you can do >>>>>> everything in temporary code of course, as long as the RTL isn't >>>>>> malformed). >>>>>> >>>>>>> -(define_insn "*altivec_vsel" >>>>>>> +(define_insn "altivec_vsel" >>>>>>> [(set (match_operand:VM 0 "altivec_register_operand" "=v") >>>>>>> - (if_then_else:VM >>>>>>> - (ne:CC (match_operand:VM 1 "altivec_register_operand" "v") >>>>>>> - (match_operand:VM 4 "zero_constant" "")) >>>>>>> - (match_operand:VM 2 "altivec_register_operand" "v") >>>>>>> - (match_operand:VM 3 "altivec_register_operand" "v")))] >>>>>>> - "VECTOR_MEM_ALTIVEC_P (mode)" >>>>>>> - "vsel %0,%3,%2,%1" >>>>>>> + (ior:VM >>>>>>> + (and:VM >>>>>>> + (not:VM (match_operand:VM 3 "altivec_register_operand" "v")) >>>>>>> + (match_operand:VM 1 "altivec_register_operand" "v")) >>>>>>> + (and:VM >>>>>>> + (match_operand:VM 2 "altivec_register_operand" "v") >>>>>>> + (match_operand:VM 4 "altivec_register_operand" "v"))))] >>>>>>> + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) >>>>>>> + && (rtx_equal_p (operands[2], operands[3]) >>>>>>> + || rtx_equal_p (operands[4], operands[3]))" >>>>>>> + { >>>>>>> + if (rtx_equal_p (operands[2], operands[3])) >>>>>>> + return "vsel %0,%1,%4,%3"; >>>>>>> + else >>>>>>> + return "vsel %0,%1,%2,%3"; >>>>>>> + } >>>>>>> [(set_attr "type" "vecmove")]) >>>>>> >>>>>> That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I >>>>>> think. So please write this as two patterns (and keep the expand if >>>>>> that helps). >>>>> >>>>> I was a bit concerned that there would be a lot of duplicate code if we >>>>> write two patterns for each vsel, totally 4 similar patterns in >>>>> altivec.md and another 4 in vsx.md make it difficult to maintain, >>>>> however >>>>> I updated it since you prefer this way, as you pointed out the xxsel in >>>>> vsx.md could be folded by later patch. >>>>> >>>>>> >>>>>>> +(define_insn "altivec_vsel2" >>>>>> >>>>>> (same here of course). >>>>>> >>>>>>> ;; Fused multiply add. >>>>>>> diff --git a/gcc/config/rs6000/rs6000-call.c >>>>>>> b/gcc/config/rs6000/rs6000-call.c >>>>>>> index f5676255387..d65bdc01055 100644 >>>>>>> --- a/gcc/config/rs6000/rs6000-call.c >>>>>>> +++ b/gcc/config/rs6000/rs6000-call.c >>>>>>> @@ -3362,11 +3362,11 @@ const struct altivec_builtin_types >>>>>>> altivec_overloaded_builtins[] = { >>>>>>> RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, >>>>>>> RS6000_BTI_unsigned_V2DI }, >>>>>>> { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, >>>>>>> RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, >>>>>>> RS6000_BTI_V2DI }, >>>>>>> - { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, >>>>>>> + { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS, >>>>>> >>>>>> Are the _uns things still used for anything? But, let's not change >>>>>> this until Bill's stuff is in :-) >>>>>> >>>>>> Why do you want to change this here, btw? I don't understand. >>>>> >>>>> OK, they are actually "unsigned type" overload builtin functions, change >>>>> it or not so far won't cause functionality issue, I will revert this >>>>> change >>>>> in the updated patch. >>>>> >>>>>> >>>>>>> + if (target == 0 >>>>>>> + || GET_MODE (target) != tmode >>>>>>> + || ! (*insn_data[icode].operand[0].predicate) (target, tmode)) >>>>>> >>>>>> No space after ! and other unary operators (except for casts and other >>>>>> operators you write with alphanumerics, like "sizeof"). I know you >>>>>> copied this code, but :-) >>>>> >>>>> OK, thanks. >>>>> >>>>>> >>>>>>> @@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx >>>>>>> op_true, rtx op_false, >>>>>>> case GEU: >>>>>>> case LTU: >>>>>>> case LEU: >>>>>>> - /* Mark unsigned tests with CCUNSmode. */ >>>>>>> - cc_mode = CCUNSmode; >>>>>>> /* Invert condition to avoid compound test if necessary. */ >>>>>>> if (rcode == GEU || rcode == LEU) >>>>>> >>>>>> So this is related to the _uns thing. Could you split off that change? >>>>>> Probably as an earlier patch (but either works for me). >>>>> >>>>> Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously >>>>> cc_mode >>>>> is a parameter to generate the condition for IF_THEN_ELSE >>>>> instruction, now >>>>> we don't need it again as we use IOR (AND... AND...) style, remove it >>>>> to avoid >>>>> build error. >>>>> >>>>> >>>>> - cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask), >>>>> - CONST0_RTX (dest_mode)); >>>>> - emit_insn (gen_rtx_SET (dest, >>>>> - gen_rtx_IF_THEN_ELSE (dest_mode, >>>>> - cond2, >>>>> - op_true, >>>>> - op_false))); >>>>> + rtx tmp = gen_rtx_IOR (dest_mode, >>>>> + gen_rtx_AND (dest_mode, gen_rtx_NOT >>>>> (dest_mode, mask), >>>>> + op_false), >>>>> + gen_rtx_AND (dest_mode, mask, op_true)); >>>>> + emit_insn (gen_rtx_SET (dest, tmp)); >>>>> >>>>> >>>>>> >>>>>>> @@ -15629,6 +15625,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx >>>>>>> op_true, rtx op_false, >>>>>>> if (!mask) >>>>>>> return 0; >>>>>>> + if (mask_mode != dest_mode) >>>>>>> + mask = simplify_gen_subreg (dest_mode, mask, mask_mode, 0); >>>>>> >>>>>> Indent just two characters please: line continuations (usually) align, >>>>>> but indents do not.> >>>>>> Can you fold vsel and xxsel together completely? They have exactly the >>>>>> same semantics! This does not have to be in this patch of course. >>>>> >>>>> I noticed that vperm/xxperm are folded together, do you mean fold >>>>> vsel/xxsel >>>>> like them? It's attached as: >>>>> 0002-rs6000-Fold-xxsel-to-vsel-since-they-have-same-seman.patch >>>>> >>>>> >>>>> Thanks, >>>>> Xionghu >>>> >>> >> >> -- >> Thanks, >> Xionghu -- Thanks, Xionghu