From: "Kewen.Lin"
Date: Mon, 25 Sep 2023 14:09:40 +0800
Subject: Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
To: HAO CHEN GUI
Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches, Richard Biener, Richard Sandiford, Jeff Law, Jakub Jelinek

Hi,

on 2023/9/20 16:49, HAO CHEN GUI wrote:
> Hi,
>   This patch enables vector compare for 16-byte memory equality compare.
> The 16-byte memory equality compare can be efficiently implemented by
> instruction "vcmpequb." It reduces one branch and one compare compared
> with two 8-byte compare sequence.

It looks nice to exploit vcmpequb. for this comparison.

>
> 16-byte vector compare is not enabled on 32bit sub-targets as TImode
> hasn't been supported well on 32bit sub-targets.

But it sounds weird to say this is done with TImode when the underlying
instruction operates on V16QImode.  This does NOT necessarily depend on
TImode, so if it were coded with V16QImode it would not suffer from this
unsupported-mode issue.  The reason you hacked it with TImode seems to be
that the generic part of the code only considers scalar modes?  I wonder
if we can extend the generic code to consider vector modes as well.  That
would also make things better if we have wider vector modes one day.  I
guess there is no blocker/limitation that prevents considering vector
modes?  CC some experts.

BR,
Kewen

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Enable vector compare for 16-byte memory equality compare
>
> gcc/
> PR target/111449
> * config/rs6000/altivec.md (cbranchti4): New expand pattern.
> * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
> sequence for TImode vector equality compare.
> * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
> (COMPARE_MAX_PIECES): Define.
>
> gcc/testsuite/
> PR target/111449
> * gcc.target/powerpc/pr111449.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index e8a596fb7e9..99264235cbe 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2605,6 +2605,24 @@ (define_insn "altivec_vupklpx"
>  }
>    [(set_attr "type" "vecperm")])
>
> +(define_expand "cbranchti4"
> +  [(use (match_operator 0 "equality_operator"
> +         [(match_operand:TI 1 "memory_operand")
> +          (match_operand:TI 2 "memory_operand")]))
> +   (use (match_operand 3))]
> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
> +{
> +  rtx op1 = simplify_subreg (V16QImode, operands[1], TImode, 0);
> +  rtx op2 = simplify_subreg (V16QImode, operands[2], TImode, 0);
> +  operands[1] = force_reg (V16QImode, op1);
> +  operands[2] = force_reg (V16QImode, op2);
> +  rtx_code code = GET_CODE (operands[0]);
> +  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1],
> +                                operands[2]);
> +  rs6000_emit_cbranch (TImode, operands);
> +  DONE;
> +})
> +
>  ;; Compare vectors producing a vector result and a predicate, setting CR6 to
>  ;; indicate a combined status
>  (define_insn "altivec_vcmpequ<VI_char>_p"
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index efe9adce1f8..c6b935a64e7 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
>            else
>              emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
>          }
> +      else if (mode == TImode)
> +        {
> +          gcc_assert (code == EQ || code == NE);
> +
> +          rtx result_vector = gen_reg_rtx (V16QImode);
> +          compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
> +          emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
> +          code = (code == NE) ? GE : LT;
> +        }
>        else
>          emit_insn (gen_rtx_SET (compare_result,
>                                  gen_rtx_COMPARE (comp_mode, op0, op1)));
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..dc33bca0802 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1730,6 +1730,8 @@ typedef struct rs6000_args
>     in one reasonably fast instruction.  */
>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>  #define MAX_MOVE_MAX 8
> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>
>  /* Nonzero if access to memory by bytes is no faster than for words.
>     Also nonzero if doing byte operations (specifically shifts) in registers
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> new file mode 100644
> index 00000000000..ab9583f47bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-maltivec -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* Ensure vector comparison is used for 16-byte memory equality compare.  */
> +
> +int compare (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 16) == 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvcmpequb\M} 1 } } */
> +/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
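
For illustration only, here is a minimal portable C sketch of the two shapes being discussed: a 16-byte equality test written as two 8-byte compares (roughly the scalar sequence the patch description says it wants to replace) next to the plain 16-byte memcmp form used in pr111449.c. The helper names equal16_two_dwords, equal16_memcmp and check_equal16_agree are hypothetical and not part of the patch, and memcmp stands in for __builtin_memcmp; whether a given compiler actually emits vcmpequb. for the second form depends on the target and on the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 16-byte equality as two 8-byte compares.
   memcpy is used for the loads to avoid alignment/aliasing issues.  */
int equal16_two_dwords (const char *s1, const char *s2)
{
  uint64_t a0, a1, b0, b1;
  memcpy (&a0, s1, 8);
  memcpy (&a1, s1 + 8, 8);
  memcpy (&b0, s2, 8);
  memcpy (&b1, s2 + 8, 8);
  if (a0 != b0)      /* first compare plus branch */
    return 0;
  return a1 == b1;   /* second compare */
}

/* Same shape as compare () in pr111449.c: a single 16-byte equality
   test that the compiler is free to expand inline.  */
int equal16_memcmp (const char *s1, const char *s2)
{
  return memcmp (s1, s2, 16) == 0;
}

/* Hypothetical self-check: both forms must agree on any input pair.  */
void check_equal16_agree (const char *s1, const char *s2)
{
  assert (equal16_two_dwords (s1, s2) == equal16_memcmp (s1, s2));
}
```

Both functions compute the same result; the difference under discussion is only in the instruction sequence the backend generates for them.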