From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <f2003443-435a-f154-b6cb-50f2bb1be5c5@linux.ibm.com>
Date: Tue, 7 Nov 2023 10:40:25 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
 Gecko/20100101 Thunderbird/91.6.1
Subject: Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality
 compare [PR111449]
Content-Language: en-US
To: HAO CHEN GUI <guihaoc@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
        David
 <dje.gcc@gmail.com>, Peter Bergner <bergner@linux.ibm.com>,
        gcc-patches <gcc-patches@gcc.gnu.org>
References: <92676187-37ed-4d1f-aad1-c8eb4c938fa5@linux.ibm.com>
From: "Kewen.Lin" <linkw@linux.ibm.com>
In-Reply-To: <92676187-37ed-4d1f-aad1-c8eb4c938fa5@linux.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>

Hi Haochen,

on 2023/11/6 10:36, HAO CHEN GUI wrote:
> Hi,
>   This patch enables vector mode for by pieces equality compare. It
> adds a new expand pattern - cbranchv16qi4 and sets MOVE_MAX_PIECES
> and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare
> relies on both move and compare instructions, so both macros are changed.
> The vector load/store might be unaligned, so the 16-byte move and
> compare are only enabled when p8 vector is enabled (TARGET_VSX +
> TARGET_EFFICIENT_UNALIGNED_VSX).
> 
>   This patch also enables 16-byte by pieces move. As vector mode is not
> enabled for by pieces move, TImode is used for the move. This caused some
> regressions. I drafted a third patch to fix them.

Could you also list the failures if the total number is small?

> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Enable vector mode for by pieces equality compare
> 
> This patch adds a new expand pattern - cbranchv16qi4 to enable vector
> mode by pieces equality compare on rs6000.  The macro MOVE_MAX_PIECES
> (COMPARE_MAX_PIECES) is set to 16 bytes when P8 vector is enabled,
> and is otherwise left unchanged.  The macro STORE_MAX_PIECES is set to
> the same value as MOVE_MAX_PIECES by default, so it's now explicitly
> defined to keep its value unchanged.
> 
> gcc/
> 	PR target/111449
> 	* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
> 	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
> 	insn sequence for V16QImode equality compare.
> 	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
> 	(STORE_MAX_PIECES): Define.
> 
> gcc/testsuite/
> 	PR target/111449
> 	* gcc.target/powerpc/pr111449-1.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index e8a596fb7e9..d0937f192d6 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2605,6 +2605,45 @@ (define_insn "altivec_vupklpx"
>  }
>    [(set_attr "type" "vecperm")])
> 
> +/* The cbranch_optabs doesn't allow FAIL, so altivec load/store
> +   instructions are disabled as the cost is high for unaligned
> +   load/store.  */
> +(define_expand "cbranchv16qi4"
> +  [(use (match_operator 0 "equality_operator"
> +	[(match_operand:V16QI 1 "reg_or_mem_operand")
> +	 (match_operand:V16QI 2 "reg_or_mem_operand")]))
> +   (use (match_operand 3))]
> +  "VECTOR_MEM_VSX_P (V16QImode)
> +   && TARGET_EFFICIENT_UNALIGNED_VSX"
> +{
> +  if (!TARGET_P9_VECTOR
> +      && !BYTES_BIG_ENDIAN

Nit: If these two conditions are meant to match the comment below ("P8 little
endian"), this would be easier to read as "TARGET_P8_VECTOR && !BYTES_BIG_ENDIAN".

> +      && MEM_P (operands[1])
> +      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)

This needs a comment explaining why it checks
altivec_indexed_or_indirect_operand; it's not obvious.

> +      && MEM_P (operands[2])
> +      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
> +    {
> +      /* Use direct move for P8 little endian to skip bswap, as the byte
> +	 order doesn't matter for equality compare.  */
> +      rtx reg_op1 = gen_reg_rtx (V16QImode);
> +      rtx reg_op2 = gen_reg_rtx (V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
> +      operands[1] = reg_op1;
> +      operands[2] = reg_op2;
> +    }
> +  else
> +    {
> +      operands[1] = force_reg (V16QImode, operands[1]);
> +      operands[2] = force_reg (V16QImode, operands[2]);
> +    }
> +
> +  rtx_code code = GET_CODE (operands[0]);
> +  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
> +  rs6000_emit_cbranch (V16QImode, operands);
> +  DONE;
> +})
> +
>  ;; Compare vectors producing a vector result and a predicate, setting CR6 to
>  ;; indicate a combined status
>  (define_insn "altivec_vcmpequ<VI_char>_p"
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index cc24dd5301e..10279052636 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15472,6 +15472,18 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
>  	  else
>  	    emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
>  	}
> +      else if (mode == V16QImode)
> +	{
> +	  gcc_assert (code == EQ || code == NE);
> +
> +	  rtx result_vector = gen_reg_rtx (V16QImode);
> +	  rtx cc_bit = gen_reg_rtx (SImode);
> +	  emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
> +	  emit_insn (gen_cr6_test_for_lt (cc_bit));
> +	  emit_insn (gen_rtx_SET (compare_result,
> +				  gen_rtx_COMPARE (comp_mode, cc_bit,
> +						   const1_rtx)));
> +	}
>        else
>  	emit_insn (gen_rtx_SET (compare_result,
>  				gen_rtx_COMPARE (comp_mode, op0, op1)));
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 22595f6ebd7..51441825e20 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1730,6 +1730,8 @@ typedef struct rs6000_args
>     in one reasonably fast instruction.  */
>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>  #define MAX_MOVE_MAX 8
> +#define MOVE_MAX_PIECES (TARGET_P8_VECTOR ? 16 : (TARGET_POWERPC64 ? 8 : 4))

But the cbranchv16qi4 condition uses TARGET_EFFICIENT_UNALIGNED_VSX, which can be
disabled separately while TARGET_P8_VECTOR stays enabled.  So I think we should
keep the two conditions consistent.
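
To illustrate the mismatch, here is a small standalone model (not GCC code).
The struct and function names are hypothetical stand-ins for the target
macros, and the "consistent" variant is only one possible way to key the
piece size on the expander condition; the final form is up to you.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags.  In GCC these are macros
   derived from the command-line options, not struct fields.  */
struct target_flags
{
  bool vsx_mem_v16qi;            /* models VECTOR_MEM_VSX_P (V16QImode) */
  bool efficient_unaligned_vsx;  /* models TARGET_EFFICIENT_UNALIGNED_VSX */
  bool p8_vector;                /* models TARGET_P8_VECTOR */
  bool powerpc64;                /* models TARGET_POWERPC64 */
};

/* Mirrors the condition on the cbranchv16qi4 expander in the patch.  */
static bool
cbranchv16qi4_enabled (const struct target_flags *t)
{
  return t->vsx_mem_v16qi && t->efficient_unaligned_vsx;
}

/* MOVE_MAX_PIECES as posted: keyed on TARGET_P8_VECTOR only.  */
static int
move_max_pieces_posted (const struct target_flags *t)
{
  return t->p8_vector ? 16 : (t->powerpc64 ? 8 : 4);
}

/* One possible alternative (an assumption, not the agreed fix): key the
   16-byte size on the same condition as the expander.  */
static int
move_max_pieces_consistent (const struct target_flags *t)
{
  return cbranchv16qi4_enabled (t) ? 16 : (t->powerpc64 ? 8 : 4);
}
```

With p8_vector set but efficient_unaligned_vsx cleared, the posted variant
still returns 16 although the expander is unavailable, while the consistent
variant falls back to 8 or 4.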

> +#define STORE_MAX_PIECES (TARGET_POWERPC64 ? 8 : 4)
> 
>  /* Nonzero if access to memory by bytes is no faster than for words.
>     Also nonzero if doing byte operations (specifically shifts) in registers
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-1.c b/gcc/testsuite/gcc.target/powerpc/pr111449-1.c
> new file mode 100644
> index 00000000000..d8583d3303b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr111449-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { has_arch_pwr8 } } } */

Nit: has_arch_pwr8 would leave this untested in a default Power7 environment.
I'd prefer to remove "has_arch_pwr8" and add "-mdejagnu-cpu=power8" to
dg-options.
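
Roughly, the test could then look like the sketch below.  Keeping "-mvsx"
and the powerpc_p8vector_ok requirement as they are is my assumption here;
only the dg-do and dg-options lines differ from what you posted.

```c
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */

/* Ensure vector mode is used for 16-byte memory equality compare.  */

int compare1 (const char* s1, const char* s2)
{
  return __builtin_memcmp (s1, s2, 16) == 0;
}

int compare2 (const char* s1)
{
  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
}

/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
```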

Otherwise LGTM.

BR,
Kewen

> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mvsx -O2" } */
> +
> +/* Ensure vector mode is used for 16-byte memory equality compare.  */
> +
> +int compare1 (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 16) == 0;
> +}
> +
> +int compare2 (const char* s1)
> +{
> +  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
> +/* { dg-final { scan-assembler-not {\mcmpd\M} } } */