From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 542FD3858C52 for ; Mon, 21 Nov 2022 02:01:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 542FD3858C52 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2AL0u2FR032481; Mon, 21 Nov 2022 02:01:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=bEdTnQkyhRfKS/6aZeeZGOv1dnXLGt+G/rMoQ8E+5rY=; b=koel+gqt46IaZw77ml6t0eHE5RT+Avpy4mQMKD7VDxjxseMTLd69pmCwwYqo7Tds04yY hoeKZvEtKmgXXEcqRafG9Aea4+4uTlDrwEQwvEpRF9pbCccOVwMtZo0YIaQxehB0e7UZ xryHboYMGByjxEQlhYc/zwSBZgbpK328+V2+Fh0KdZY6S5WWjZ/9N2BKCFe/de4EUiKS 9HcdbLrOhPr6jcdL7XAAxOd633p9Mc7w81g+dKRvGa3JABVhlj1zBvbzwaT/kRH4TJKr PkjHrcA0batfMObrDzA0jcYa3gw/bCZSFb/kJSiNzdD/hFtthNSKuZwp0YX6+1cqC9I9 Mw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3ky8wc494f-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 21 Nov 2022 02:01:24 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AL1FbB7028812; Mon, 21 Nov 2022 02:01:24 GMT Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3ky8wc493j-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 21 Nov 2022 02:01:24 +0000 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AL1q6br000705; Mon, 21 Nov 2022 02:01:21 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma06fra.de.ibm.com with ESMTP id 3kxpdj1et6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 21 Nov 2022 02:01:21 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AL21IHH13500884 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 21 Nov 2022 02:01:18 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 50167A405B; Mon, 21 Nov 2022 02:01:18 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3FDA3A4054; Mon, 21 Nov 2022 02:01:16 +0000 (GMT) Received: from [9.200.45.208] (unknown [9.200.45.208]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 21 Nov 2022 02:01:15 +0000 (GMT) Message-ID: <7ccd6eda-8d29-8e41-6e9f-041097eee42e@linux.ibm.com> Date: Mon, 21 Nov 2022 10:01:14 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare Content-Language: en-US To: Segher Boessenkool Cc: GCC Patches , David Edelsohn , Peter Bergner , Michael Meissner References: <20221116184437.GW25951@gate.crashing.org> <30553cb3-35d0-c51b-ffc4-0f8a320bd1ed@linux.ibm.com> <20221118151030.GA25951@gate.crashing.org> From: "Kewen.Lin" In-Reply-To: <20221118151030.GA25951@gate.crashing.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: wmTuageOviFBrmsfooIOh8EyfreJqRRx X-Proofpoint-GUID: U65f_M3i9u8LRb7f4-_3cRn7fkMjP4Nt X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-20_13,2022-11-18_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 suspectscore=0 clxscore=1015 lowpriorityscore=0 bulkscore=0 mlxlogscore=999 phishscore=0 adultscore=0 malwarescore=0 priorityscore=1501 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211210010 X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Segher, on 2022/11/18 23:10, Segher Boessenkool wrote: > Hi! > > On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote: >> on 2022/11/17 02:44, Segher Boessenkool wrote: >>> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote: >>>> * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove >>>> float only comparison operators. >>> >>> Why? Is that correct? Your mail says nothing about this :-( >>> >>> Is there any testcase that covers this, and that shows things still >>> generate the same code? >>> >> >> Sorry for the unclear description, I thought mistakenly that it's >> probably straightforward. >> >> With the change in this patch, all 14 vector float comparison operators >> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt) >> would be handled early in rs6000_emit_vector_compare. >> >> For unordered/ordered/ltgt/uneq, the new way is exactly the same >> as what we do in rs6000_emit_vector_compare_inner, it means there is >> no chance to get into rs6000_emit_vector_compare_inner with any of them. > > Ah! In that case, please add an assert there. It helps catch problems, > but much more importantly even, if helps the reader understand what is > going on :-) Good idea, will do. > >> For eq/ge/gt, it's the same too, but they are shared with vector integer >> comparison, I just left them alone here. Just noticed we can remove ge >> safely too as it's guarded with !MODE_VECTOR_INT. > > ge is nasty for float, it means something different with and without > -ffast-math (with fast-math ge means not lt, le means not gt; both can > be done with a simple single condition, no cror needed. (Compare to ne > which is the same with and without -ffast-math, that is because it has a > "not" in its definition!) > It's true for scalar float comparison, but the context here is for vector comparison, the result of comparison is still vector (of boolean), and we have the corresponding vector comparison instruction for ge, so I think it should be fine here. >> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code >> with reverse_condition_maybe_unordered and invert the result, it's the >> same as what we have in vector.md. >> >> ; unge(a,b) = ~lt(a,b) >> ; unle(a,b) = ~gt(a,b) >> ; ne(a,b) = ~eq(a,b) >> ; ungt(a,b) = ~le(a,b) >> ; unlt(a,b) = ~ge(a,b) > > But for these last two do we generate identical code still? Since > forever we have only use cror here (with CCEQ), not crnor etc. (and will > CCEQ still do the correct thing always then?) For ge (~ge), yes; while for le (~le), it's not, as explained previously below. > >> Then eq/ge/gt on the right side would match the cases that were mentioned >> above. So we just need to focus on lt and le then. >> >> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt, >> it's the same as what we have in vector.md: >> >> ; lt(a,b) = gt(b,a) >> >> , and further matches the case mentioned above. >> >> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq, >> and further handle lt recursively, that is: >> le = lt(a,b) || eq(a,b) >> = gt(b,a) || eq(a,b) >> >> actually this is worse than what vector.md supports: >> >> ; le(a,b) = ge(b,a) >> >> In short, the function rs6000_emit_vector_compare_inner is only called by >> twice in rs6000_emit_vector_compare, there is no chance to enter >> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq >> any more, I think it's safe to make the change in function >> rs6000_emit_vector_compare_inner. Besides, the proposed way to handle >> vector float comparison can improve slightly for UNGT and LE handlings. > > Thanks for the explanation! > > Can you do this in multiple steps, which will make it much easier to > review, and to spot the problem if some unexpected problem shows up? Sure, I'll try my best to separate it into some steps and show how it evolves gradually. > >> I constructed a test case, compiled with option -O2 -ftree-vectorize >> -fno-vect-cost-model on ppc64le, which goes into this function >> rs6000_emit_vector_compare with all 14 vector float comparison codes, >> the assembly of most functions doesn't change after this patch, >> excepting for test_UNGT_{float,double} and test_LE_{float,double}. > > For, this is a separate change, a separate and the other patches will > show no changes in generated code at all. Good point, will separate it. > >> Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double} >> and scan not xvcmp{gt,eq}[sd]p. > > In the patch that changes code gen for those, sure :-) > Thanks for all the comments again. BR, Kewen