Date: Mon, 26 Feb 2024 10:25:35 +0800
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [Patch, rs6000] Enable overlap memory store for block memory clear

Hi,
  This patch enables overlapping memory stores for block memory clear,
which reduces the number of store instructions. The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped
block clear and calls smallest_fixed_size_mode_for_block_clear to get
the mode for the last, overlapped clear.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
	* config/rs6000/rs6000-string.cc
	(widest_fixed_size_mode_for_block_clear): New.
	(smallest_fixed_size_mode_for_block_clear): New.
	(expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
	get the mode for looped memory stores and call
	smallest_fixed_size_mode_for_block_clear to get the mode for the last
	overlapped memory store.

gcc/testsuite/
	* gcc.target/powerpc/block-clear-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"
 
+/* Return the widest mode whose size is less than or equal to the
+   size.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+                                        bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+      && size >= 16
+      && (align >= 128
+          || unaligned_vsx_ok))
+    mode = V4SImode;
+  else if (size >= 8
+           && TARGET_POWERPC64
+           && (align >= 64
+               || !STRICT_ALIGNMENT))
+    mode = DImode;
+  else if (size >= 4
+           && (align >= 32
+               || !STRICT_ALIGNMENT))
+    mode = SImode;
+  else if (size >= 2
+           && (align >= 16
+               || !STRICT_ALIGNMENT))
+    mode = HImode;
+  else
+    mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   the size.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
 
@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;
 
   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
 
   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
 
-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+                                                      unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
     {
-      machine_mode mode = BLKmode;
-      rtx dest;
+      unsigned int size = GET_MODE_SIZE (mode);
 
-      if (TARGET_ALTIVEC
-          && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+      while (bytes >= size)
         {
-          clear_bytes = 16;
-          mode = V4SImode;
-        }
-      else if (bytes >= 8 && TARGET_POWERPC64
-               && (align >= 64 || !STRICT_ALIGNMENT))
-        {
-          clear_bytes = 8;
-          mode = DImode;
-          if (offset == 0 && align < 64)
-            {
-              rtx addr;
+          dest = adjust_address (orig_dest, mode, offset);
+          emit_move_insn (dest, CONST0_RTX (mode));
 
-              /* If the address form is reg+offset with offset not a
-                 multiple of four, reload into reg indirect form here
-                 rather than waiting for reload.  This way we get one
-                 reload, not one per store.  */
-              addr = XEXP (orig_dest, 0);
-              if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
-                  && CONST_INT_P (XEXP (addr, 1))
-                  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-                {
-                  addr = copy_addr_to_reg (addr);
-                  orig_dest = replace_equiv_address (orig_dest, addr);
-                }
-            }
-        }
-      else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-        {                       /* move 4 bytes */
-          clear_bytes = 4;
-          mode = SImode;
-        }
-      else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-        {                       /* move 2 bytes */
-          clear_bytes = 2;
-          mode = HImode;
-        }
-      else /* move 1 byte at a time */
-        {
-          clear_bytes = 1;
-          mode = QImode;
+          offset += size;
+          bytes -= size;
         }
 
-      dest = adjust_address (orig_dest, mode, offset);
+      if (bytes == 0)
+        return 1;
 
-      emit_move_insn (dest, CONST0_RTX (mode));
+      mode = smallest_fixed_size_mode_for_block_clear (bytes);
+      int gap = GET_MODE_SIZE (mode) - bytes;
+      if (gap > 0)
+        {
+          offset -= gap;
+          bytes += gap;
+        }
     }
-
-  return 1;
+  while (1);
 }
 
 /* Figure out the correct instructions to generate to load data for
diff --git a/gcc/testsuite/gcc.target/powerpc/block-clear-1.c b/gcc/testsuite/gcc.target/powerpc/block-clear-1.c
new file mode 100644
index 00000000000..5e16c44fea3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-clear-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\mst[hb]\M} } } */
+
+/* Verify that memclear takes overlap store.  */
+void* foo (char* s1)
+{
+  __builtin_memset (s1, 0, 31);
+}
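
For illustration, the store sequence that the new loop produces can be
modelled outside the compiler. The following is only a sketch in plain C
and is not part of the patch: struct target_model, struct store,
widest_size, smallest_size and plan_clear are hypothetical names. The
struct fields stand in for the target macros, byte counts replace machine
modes, and UNITS_PER_WORD is assumed to be 8 when powerpc64 is set and 4
otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target description.  The fields stand in for
   TARGET_ALTIVEC, TARGET_POWERPC64, STRICT_ALIGNMENT and
   TARGET_EFFICIENT_UNALIGNED_VSX; they are not GCC definitions.  */
struct target_model
{
  int altivec;
  int powerpc64;
  int strict_alignment;
  int efficient_unaligned_vsx;
};

/* One emitted store: byte offset into the block and width in bytes.  */
struct store
{
  unsigned offset;
  unsigned size;
};

/* Widest store width (in bytes) that is not larger than SIZE.
   ALIGN is in bits, as in the patch.  */
static unsigned
widest_size (unsigned size, unsigned align, const struct target_model *t,
             int unaligned_vsx_ok)
{
  if (t->altivec && size >= 16 && (align >= 128 || unaligned_vsx_ok))
    return 16;
  if (size >= 8 && t->powerpc64 && (align >= 64 || !t->strict_alignment))
    return 8;
  if (size >= 4 && (align >= 32 || !t->strict_alignment))
    return 4;
  if (size >= 2 && (align >= 16 || !t->strict_alignment))
    return 2;
  return 1;
}

/* Smallest store width (in bytes) that covers SIZE remaining bytes.  */
static unsigned
smallest_size (unsigned size, const struct target_model *t)
{
  unsigned word = t->powerpc64 ? 8 : 4;  /* assumed UNITS_PER_WORD */
  if (size > word)
    return 16;
  unsigned s = 1;
  while (s < size)
    s *= 2;
  return s;
}

/* Fill OUT with the stores used to clear BYTES bytes and return how
   many there are.  At most MAX entries are written.  */
static size_t
plan_clear (unsigned bytes, unsigned align, const struct target_model *t,
            struct store *out, size_t max)
{
  size_t n = 0;
  unsigned offset = 0;
  int unaligned_vsx_ok = bytes >= 32 && t->efficient_unaligned_vsx;
  unsigned size = widest_size (bytes, align, t, unaligned_vsx_ok);

  while (bytes > 0)
    {
      /* Looped stores with the current width.  */
      while (bytes >= size)
        {
          assert (n < max);
          out[n].offset = offset;
          out[n].size = size;
          n++;
          offset += size;
          bytes -= size;
        }
      if (bytes == 0)
        break;

      /* Tail: pick a width that covers the rest and step back by the
         gap so that the final store overlaps the previous one.  */
      size = smallest_size (bytes, t);
      unsigned gap = size - bytes;
      assert (gap <= offset);
      offset -= gap;
      bytes += gap;
    }
  return n;
}
```

With altivec and powerpc64 set, strict_alignment clear and 8-bit
alignment, plan_clear (31, 8, ...) gives four 8-byte stores at offsets
0, 8, 16 and 23. The last store overlaps byte 23 of the previous one,
whereas the old loop would need six stores (8+8+8+4+2+1).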