From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <15d40d24-f546-4351-9bed-e99b503ec1b9@linux.ibm.com>
Date: Mon, 18 Dec 2023 10:44:09 +0800
User-Agent: Mozilla
Thunderbird
Content-Language: en-US
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [Patchv2, rs6000] Clean up pre-checkings of expand_block_compare
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Hi,
  This patch cleans up the pre-checks in expand_block_compare. It does the
following:
1. Asserts that only P7 and above can enter this function, as this is
already guarded by the expander.
2. Returns false when optimizing for size.
3. Removes the P7 processor test, since only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE,
the inline expansion performs better than the library call when the length
is long.

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main change is to add some comments and move the variable definitions
closer to their use.
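
  For readers who want the resulting order of the pre-checks at a glance,
here is a minimal standalone sketch. It is only a model, not the code in
rs6000-string.cc: the struct precheck_state, its fields, and the function
prechecks_pass are hypothetical names made up for illustration, and each
field merely stands in for the GCC macro or hook named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the compiler state that the real function
   consults; none of these names exist in GCC.  */
struct precheck_state
{
  bool has_popcntd;       /* models TARGET_POPCNTD (P7 and above) */
  bool optimize_for_size; /* models optimize_insn_for_size_p () */
  int inline_limit;       /* models rs6000_block_compare_inline_limit */
  bool is_32bit_ppc64;    /* models TARGET_32BIT && TARGET_POWERPC64 */
  bool slow_unaligned;    /* models targetm.slow_unaligned_access being
                             true for either source operand */
};

/* Return true if inline expansion may go ahead, false if the caller
   should fall back to the library memcmp.  */
static bool
prechecks_pass (const struct precheck_state *s)
{
  /* The expander is assumed to have filtered out older CPUs already.  */
  assert (s->has_popcntd);

  if (s->optimize_for_size)
    return false;

  /* A limit of zero shuts off all expansion.  */
  if (s->inline_limit == 0)
    return false;

  if (s->is_32bit_ppc64)
    return false;

  if (s->slow_unaligned)
    return false;

  return true;
}
```

  In this model a state with optimize_for_size set is rejected before any
other check is consulted, which is the behaviour the new -Os test relies on.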

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checkings of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function. Return false (call
	library) when optimizing for size. Remove P7 CPU test as only P7 and
	above can enter this function and P7 LE is excluded by the checking
	of targetm.slow_unaligned_access on word_mode. Also, performance
	testing shows that expanding block compares of 16 to 64 bytes in
	length is faster than the library on P7 BE.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index cb9eeef05d8..49670cef4d7 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi. */
+  gcc_assert (TARGET_POPCNTD);
 
-  /* This case is complicated to handle because the subtract
-     with carry instructions do not generate the 64-bit
-     carry and so we must emit code to calculate it ourselves.
-     We choose not to implement this yet. */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  if (optimize_insn_for_size_p ())
     return false;
 
-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion. */
   if (rs6000_block_compare_inline_limit == 0)
     return false;
 
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there. */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-          || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
+  /* This case is complicated to handle because the subtract
+     with carry instructions do not generate the 64-bit
+     carry and so we must emit code to calculate it ourselves.
+     We choose not to implement this yet. */
+  if (TARGET_32BIT && TARGET_POWERPC64)
     return false;
 
-  /* Unaligned l*brx traps on P7 so don't do this. However this should
-     not affect much because LE isn't really supported on P7 anyway. */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff. */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;
 
   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);
 
-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions. */
-  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
-      && ((base_align == 1 && bytes > 16)
-          || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;
 
   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 00000000000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}