From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id D57FA3858C2C; Wed, 15 Sep 2021 07:50:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D57FA3858C2C Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.0.43) with SMTP id 18F6vPW3031763; Wed, 15 Sep 2021 03:50:31 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3b3c11s3qp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Sep 2021 03:50:30 -0400 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 18F7ZWxZ007504; Wed, 15 Sep 2021 03:50:30 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 3b3c11s3q0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Sep 2021 03:50:30 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 18F7hCKt026278; Wed, 15 Sep 2021 07:50:28 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma04ams.nl.ibm.com with ESMTP id 3b0m3a44p7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 15 Sep 2021 07:50:28 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 18F7oPpK36176358 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 15 Sep 2021 07:50:25 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 987DA11C05C; Wed, 15 Sep 2021 07:50:25 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1E42611C06C; Wed, 15 Sep 2021 07:50:24 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.155.166]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Wed, 15 Sep 2021 07:50:23 +0000 (GMT) Subject: Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613] To: Segher Boessenkool Cc: wschmidt@linux.ibm.com, gcc-patches@gcc.gnu.org, linkw@gcc.gnu.org, dje.gcc@gmail.com References: <20210430063258.2774866-1-luoxhu@linux.ibm.com> <20210513104931.GG10366@gate.crashing.org> <973f14d9-c626-89b4-753f-e5c613eef6f2@linux.ibm.com> <2b08faad-887f-73c4-2b84-75dd0feb51c7@linux.ibm.com> <8adb7d7f-44d6-7222-bb42-606092b140b0@linux.ibm.com> From: Xionghu Luo Message-ID: <2d6953e9-ddbd-edff-dc71-2a2f7f13afc3@linux.ibm.com> Date: Wed, 15 Sep 2021 15:50:21 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 In-Reply-To: <8adb7d7f-44d6-7222-bb42-606092b140b0@linux.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: CwIlm2VNYRgxuvcFVmBQl3krSNCfdkTw X-Proofpoint-GUID: ZUDOJdTs_2ZZSaP0KerD0cMXXfeM5WcL Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.687,Hydra:6.0.235,FMLib:17.0.607.475 definitions=2020-10-13_15,2020-10-13_02,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 clxscore=1015 spamscore=0 adultscore=0 bulkscore=0 malwarescore=0 lowpriorityscore=0 mlxscore=0 phishscore=0 mlxlogscore=999 priorityscore=1501 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109150033 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2021 07:50:34 -0000 Ping^3, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote: > Ping^2, thanks. > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html > > On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote: >> Gentle ping, thanks. >> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html >> >> >> On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: >>> Hi, >>> >>> On 2021/5/13 18:49, Segher Boessenkool wrote: >>>> Hi! >>>> >>>> On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: >>>>> The vsel instruction is a bit-wise select instruction.  Using an >>>>> IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code >>>>> being generated in the combine pass.  Per element selection is a >>>>> subset of per bit-wise selection,with the patch the pattern is >>>>> written using bit operations.  But there are 8 different patterns >>>>> to define "op0 := (op1 & ~op3) | (op2 & op3)": >>>>> >>>>> (~op3&op1) | (op3&op2), >>>>> (~op3&op1) | (op2&op3), >>>>> (op3&op2) | (~op3&op1), >>>>> (op2&op3) | (~op3&op1), >>>>> (op1&~op3) | (op3&op2), >>>>> (op1&~op3) | (op2&op3), >>>>> (op3&op2) | (op1&~op3), >>>>> (op2&op3) | (op1&~op3), >>>>> >>>>> Combine pass will swap (op1&~op3) to (~op3&op1) due to commutative >>>>> canonical, which could reduce it to the FIRST 4 patterns, but it won't >>>>> swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch >>>>> handles it with two patterns with different NOT op3 position and check >>>>> equality inside it. >>>> >>>> Yup, that latter case does not have canonicalisation rules.  Btw, not >>>> only combine does this canonicalisation: everything should, >>>> non-canonical RTL is invalid RTL (in the instruction stream, you can do >>>> everything in temporary code of course, as long as the RTL isn't >>>> malformed). >>>> >>>>> -(define_insn "*altivec_vsel" >>>>> +(define_insn "altivec_vsel" >>>>>     [(set (match_operand:VM 0 "altivec_register_operand" "=v") >>>>> -    (if_then_else:VM >>>>> -     (ne:CC (match_operand:VM 1 "altivec_register_operand" "v") >>>>> -        (match_operand:VM 4 "zero_constant" "")) >>>>> -     (match_operand:VM 2 "altivec_register_operand" "v") >>>>> -     (match_operand:VM 3 "altivec_register_operand" "v")))] >>>>> -  "VECTOR_MEM_ALTIVEC_P (mode)" >>>>> -  "vsel %0,%3,%2,%1" >>>>> +    (ior:VM >>>>> +     (and:VM >>>>> +      (not:VM (match_operand:VM 3 "altivec_register_operand" "v")) >>>>> +      (match_operand:VM 1 "altivec_register_operand" "v")) >>>>> +     (and:VM >>>>> +      (match_operand:VM 2 "altivec_register_operand" "v") >>>>> +      (match_operand:VM 4 "altivec_register_operand" "v"))))] >>>>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) >>>>> +  && (rtx_equal_p (operands[2], operands[3]) >>>>> +  || rtx_equal_p (operands[4], operands[3]))" >>>>> +  { >>>>> +    if (rtx_equal_p (operands[2], operands[3])) >>>>> +      return "vsel %0,%1,%4,%3"; >>>>> +    else >>>>> +      return "vsel %0,%1,%2,%3"; >>>>> +  } >>>>>     [(set_attr "type" "vecmove")]) >>>> >>>> That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I >>>> think.  So please write this as two patterns (and keep the expand if >>>> that helps). >>> >>> I was a bit concerned that there would be a lot of duplicate code if we >>> write two patterns for each vsel, totally 4 similar patterns in >>> altivec.md and another 4 in vsx.md make it difficult to maintain, >>> however >>> I updated it since you prefer this way, as you pointed out the xxsel in >>> vsx.md could be folded by later patch. >>> >>>> >>>>> +(define_insn "altivec_vsel2" >>>> >>>> (same here of course). >>>> >>>>>   ;; Fused multiply add. >>>>> diff --git a/gcc/config/rs6000/rs6000-call.c >>>>> b/gcc/config/rs6000/rs6000-call.c >>>>> index f5676255387..d65bdc01055 100644 >>>>> --- a/gcc/config/rs6000/rs6000-call.c >>>>> +++ b/gcc/config/rs6000/rs6000-call.c >>>>> @@ -3362,11 +3362,11 @@ const struct altivec_builtin_types >>>>> altivec_overloaded_builtins[] = { >>>>>       RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, >>>>> RS6000_BTI_unsigned_V2DI }, >>>>>     { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, >>>>>       RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, >>>>> RS6000_BTI_V2DI }, >>>>> -  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI, >>>>> +  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS, >>>> >>>> Are the _uns things still used for anything?  But, let's not change >>>> this until Bill's stuff is in :-) >>>> >>>> Why do you want to change this here, btw?  I don't understand. >>> >>> OK, they are actually "unsigned type" overload builtin functions, change >>> it or not so far won't cause functionality issue, I will revert this >>> change >>> in the updated patch. >>> >>>> >>>>> +  if (target == 0 >>>>> +      || GET_MODE (target) != tmode >>>>> +      || ! (*insn_data[icode].operand[0].predicate) (target, tmode)) >>>> >>>> No space after ! and other unary operators (except for casts and other >>>> operators you write with alphanumerics, like "sizeof").  I know you >>>> copied this code, but :-) >>> >>> OK, thanks. >>> >>>> >>>>> @@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx >>>>> op_true, rtx op_false, >>>>>       case GEU: >>>>>       case LTU: >>>>>       case LEU: >>>>> -      /* Mark unsigned tests with CCUNSmode.  */ >>>>> -      cc_mode = CCUNSmode; >>>>>         /* Invert condition to avoid compound test if necessary.  */ >>>>>         if (rcode == GEU || rcode == LEU) >>>> >>>> So this is related to the _uns thing.  Could you split off that change? >>>> Probably as an earlier patch (but either works for me). >>> >>> Not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things, previously >>> cc_mode >>> is a parameter to generate the condition for IF_THEN_ELSE >>> instruction, now >>> we don't need it again as we use IOR (AND... AND...) style, remove it >>> to avoid >>> build error. >>> >>> >>> -  cond2 = gen_rtx_fmt_ee (NE, cc_mode, gen_lowpart (dest_mode, mask), >>> -                         CONST0_RTX (dest_mode)); >>> -  emit_insn (gen_rtx_SET (dest, >>> -                         gen_rtx_IF_THEN_ELSE (dest_mode, >>> -                                               cond2, >>> -                                               op_true, >>> -                                               op_false))); >>> +  rtx tmp = gen_rtx_IOR (dest_mode, >>> +                        gen_rtx_AND (dest_mode, gen_rtx_NOT >>> (dest_mode, mask), >>> +                                     op_false), >>> +                        gen_rtx_AND (dest_mode, mask, op_true)); >>> +  emit_insn (gen_rtx_SET (dest, tmp)); >>> >>> >>>> >>>>> @@ -15629,6 +15625,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx >>>>> op_true, rtx op_false, >>>>>     if (!mask) >>>>>       return 0; >>>>> +  if (mask_mode != dest_mode) >>>>> +      mask = simplify_gen_subreg (dest_mode, mask, mask_mode, 0); >>>> >>>> Indent just two characters please: line continuations (usually) align, >>>> but indents do not.> >>>> Can you fold vsel and xxsel together completely?  They have exactly the >>>> same semantics!  This does not have to be in this patch of course. >>> >>> I noticed that vperm/xxperm are folded together, do you mean fold >>> vsel/xxsel >>> like them?  It's attached as: >>> 0002-rs6000-Fold-xxsel-to-vsel-since-they-have-same-seman.patch >>> >>> >>> Thanks, >>> Xionghu >> > -- Thanks, Xionghu