From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <91af8479-93bf-4137-ad02-512855439aa8@linux.vnet.ibm.com>
Date: Tue, 27 Feb 2024 18:49:43 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: [PING 4][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
From: Surya Kumari Jangala
To: Segher Boessenkool, GCC Patches, Peter Bergner
References: <73f178e8-a38f-4a51-933c-27e2cf961cb4@linux.vnet.ibm.com>
 <95fb9ff4-c68e-4e6f-81d0-ee482c059e1b@linux.vnet.ibm.com>
Content-Language: en-US
In-Reply-To: <95fb9ff4-c68e-4e6f-81d0-ee482c059e1b@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Ping

On 08/01/24 11:19 am, Surya Kumari Jangala wrote:
> Ping
>
> On 28/11/23 6:24 pm, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
>>> Ping
>>>
>>> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>>>> Hi Segher,
>>>> I have incorporated changes in the code as per the review comments provided by you
>>>> for version 2 of the patch. Please review.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>>
>>>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>>
>>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>>> instructions is done even if the webs that contain the swappable instructions
>>>> are not optimized, i.e., the webs do not contain any permuting load/store
>>>> instructions along with the associated register swap instructions. Doing special
>>>> handling in such webs will result in the extracted lane being adjusted
>>>> unnecessarily for vec_extract.
>>>>
>>>> Another issue is that the existing code treats non-permuting loads/stores as
>>>> special swappables. Non-permuting loads/stores (that have not yet been split into a
>>>> permuting load/store and a swap) are handled by converting them into a permuting
>>>> load/store (which effectively removes the swap). As a result, if special
>>>> swappables are handled only in webs containing permuting loads/stores, then
>>>> non-optimal code is generated for non-permuting loads/stores.
>>>>
>>>> Hence, in this patch, all webs containing either permuting loads/stores or
>>>> non-permuting loads/stores are marked as requiring special handling of
>>>> swappables. Swaps associated with permuting loads/stores are marked for removal,
>>>> and non-permuting loads/stores are converted to permuting loads/stores. Then the
>>>> special swappables in the webs are fixed up.
>>>>
>>>> This patch also ensures that swappable instructions are not modified in the
>>>> following webs as it is incorrect to do so:
>>>> - webs containing permuting load/store instructions and associated swap
>>>>   instructions that are transformed by converting the permuting memory
>>>>   instructions into non-permuting instructions and removing the swap
>>>>   instructions.
>>>> - webs where swap(load(vector constant)) instructions are replaced with
>>>>   load(swapped vector constant).
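
To make the lane adjustment described in the commit message above concrete, here is a minimal sketch in plain C++ rather than RTL. The names V2, swap_doublewords, extract and adjust_lane are hypothetical and exist only for illustration; they are not functions in rs6000-p8swap.cc. The half-vector rotation in adjust_lane is an assumption about what the SH_EXTRACT fix-up amounts to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a two-element vector register (not a GCC type).
using V2 = std::array<long long, 2>;

// Models the doubleword swap that accompanies a permuting load/store.
V2 swap_doublewords (const V2 &v)
{
  return {v[1], v[0]};
}

// Reads one lane of the modelled register.
long long extract (const V2 &v, std::size_t lane)
{
  assert (lane < v.size ());
  return v[lane];
}

// Assumed lane fix-up: rotate the lane index by half the vector length.
// This is only correct when the swaps in the web have actually been removed.
std::size_t adjust_lane (std::size_t lane, std::size_t n_elts)
{
  assert (n_elts % 2 == 0 && lane < n_elts);
  std::size_t half = n_elts / 2;
  return lane >= half ? lane - half : lane + half;
}
```

With raw = {10, 20}, extract (swap_doublewords (raw), 1) and extract (raw, adjust_lane (1, 2)) both give 10, so adjusting the lane is the right compensation once the swap is gone. extract (swap_doublewords (raw), adjust_lane (1, 2)) gives 20, which models the unnecessary adjustment in a web whose swaps were never removed.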
>>>>
>>>> 2023-09-10  Surya Kumari Jangala
>>>>
>>>> gcc/
>>>>     PR rtl-optimization/106770
>>>>     * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>>>     (convert_mem_insn): New function.
>>>>     (rs6000_analyze_swaps): Handle swappable instructions only in certain
>>>>     webs.
>>>>     (web_requires_special_handling): New instance variable.
>>>>     (handle_special_swappables): Remove handling of non-permuting load/store
>>>>     instructions.
>>>>
>>>> gcc/testsuite/
>>>>     PR rtl-optimization/106770
>>>>     * gcc.target/powerpc/pr106770.c: New test.
>>>> ---
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> index 0388b9bd736..02ea299bc3d 100644
>>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>>    unsigned int special_handling : 4;
>>>>    /* Set if the web represented by this entry cannot be optimized. */
>>>>    unsigned int web_not_optimizable : 1;
>>>> +  /* Set if the swappable insns in the web represented by this entry
>>>> +     have to be fixed. Swappable insns have to be fixed in:
>>>> +       - webs containing permuting loads/stores and the swap insns
>>>> +         in such webs have been marked for removal
>>>> +       - webs where non-permuting loads/stores have been converted
>>>> +         to permuting loads/stores */
>>>> +  unsigned int web_requires_special_handling : 1;
>>>>    /* Set if this insn should be deleted. */
>>>>    unsigned int will_delete : 1;
>>>>  };
>>>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, unsigned i)
>>>>        if (dump_file)
>>>>          fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>>>        break;
>>>> -    case SH_NOSWAP_LD:
>>>> -      /* Convert a non-permuting load to a permuting one. */
>>>> -      permute_load (insn);
>>>> -      break;
>>>> -    case SH_NOSWAP_ST:
>>>> -      /* Convert a non-permuting store to a permuting one. */
>>>> -      permute_store (insn);
>>>> -      break;
>>>>      case SH_EXTRACT:
>>>>        /* Change the lane on an extract operation. */
>>>>        adjust_extract (insn);
>>>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>>>>    free (to_delete);
>>>>  }
>>>>  
>>>> +/* Return true if insn is a non-permuting load/store. */
>>>> +static bool
>>>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>>>> +{
>>>> +  return insn_entry[i].special_handling == SH_NOSWAP_LD
>>>> +         || insn_entry[i].special_handling == SH_NOSWAP_ST;
>>>> +}
>>>> +
>>>> +/* Convert a non-permuting load/store insn to a permuting one. */
>>>> +static void
>>>> +convert_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>>>> +{
>>>> +  rtx_insn *insn = insn_entry[i].insn;
>>>> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
>>>> +    permute_load (insn);
>>>> +  if (insn_entry[i].special_handling == SH_NOSWAP_ST)
>>>> +    permute_store (insn);
>>>> +}
>>>> +
>>>>  /* Main entry point for this pass. */
>>>>  unsigned int
>>>>  rs6000_analyze_swaps (function *fun)
>>>> @@ -2624,25 +2642,55 @@ rs6000_analyze_swaps (function *fun)
>>>>        dump_swap_insn_table (insn_entry);
>>>>      }
>>>>  
>>>> -  /* For each load and store in an optimizable web (which implies
>>>> -     the loads and stores are permuting), find the associated
>>>> -     register swaps and mark them for removal. Due to various
>>>> -     optimizations we may mark the same swap more than once. Also
>>>> -     perform special handling for swappable insns that require it. */
>>>> +  /* There are two kinds of optimizations that can be performed on an
>>>> +     optimizable web:
>>>> +     1. Remove the register swaps associated with permuting load/store
>>>> +        in an optimizable web
>>>> +     2. Convert the vanilla loads/stores (that have not yet been split
>>>> +        into a permuting load/store and a swap) into a permuting
>>>> +        load/store (which effectively removes the swap)
>>>> +     In both the cases, swappable instructions in the webs need
>>>> +     special handling to fix them up. */
>>>>    for (i = 0; i < e; ++i)
>>>> +    /* For each permuting load/store in an optimizable web, find
>>>> +       the associated register swaps and mark them for removal.
>>>> +       Due to various optimizations we may mark the same swap more
>>>> +       than once. */
>>>>      if ((insn_entry[i].is_load || insn_entry[i].is_store)
>>>>          && insn_entry[i].is_swap)
>>>>        {
>>>>          swap_web_entry* root_entry
>>>>            = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>>>>          if (!root_entry->web_not_optimizable)
>>>> -          mark_swaps_for_removal (insn_entry, i);
>>>> +          {
>>>> +            mark_swaps_for_removal (insn_entry, i);
>>>> +            root_entry->web_requires_special_handling = true;
>>>> +          }
>>>>        }
>>>> -    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
>>>> +    /* Convert the non-permuting loads/stores into a permuting
>>>> +       load/store. */
>>>> +    else if (insn_entry[i].is_swappable
>>>> +             && non_permuting_mem_insn (insn_entry, i))
>>>>        {
>>>>          swap_web_entry* root_entry
>>>>            = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>>>>          if (!root_entry->web_not_optimizable)
>>>> +          {
>>>> +            convert_mem_insn (insn_entry, i);
>>>> +            root_entry->web_requires_special_handling = true;
>>>> +          }
>>>> +      }
>>>> +
>>>> +  /* Now that the webs which require special handling have been
>>>> +     identified, modify the instructions that are sensitive to
>>>> +     element order. */
>>>> +  for (i = 0; i < e; ++i)
>>>> +    if (insn_entry[i].is_swappable && insn_entry[i].special_handling
>>>> +        && !non_permuting_mem_insn (insn_entry, i))
>>>> +      {
>>>> +        swap_web_entry* root_entry
>>>> +          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>>>> +        if (root_entry->web_requires_special_handling)
>>>>            handle_special_swappables (insn_entry, i);
>>>>        }
>>>>  
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>>>> new file mode 100644
>>>> index 00000000000..5b300b94a41
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>>>> +/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */
>>>> +/* The 2 xxpermdi instructions are generated by the two
>>>> +   calls to vec_promote() */
>>>> +/* { dg-final { scan-assembler-times {xxpermdi} 2 } } */
>>>> +
>>>> +/* Test case to resolve PR106770 */
>>>> +
>>>> +#include <altivec.h>
>>>> +
>>>> +int cmp2(double a, double b)
>>>> +{
>>>> +  vector double va = vec_promote(a, 1);
>>>> +  vector double vb = vec_promote(b, 1);
>>>> +  vector long long vlt = (vector long long)vec_cmplt(va, vb);
>>>> +  vector long long vgt = (vector long long)vec_cmplt(vb, va);
>>>> +  vector signed long long vr = vec_sub(vlt, vgt);
>>>> +
>>>> +  return vec_extract(vr, 1);
>>>> +}
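
For readers who want the control flow of the last rs6000_analyze_swaps hunk without the RTL details, the two loops can be modelled roughly as below. This is a simplified sketch under stated assumptions: Entry, Web and analyze are hypothetical stand-ins, a plain web index replaces unionfind_root (), and a boolean fixed_up replaces the real call to handle_special_swappables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for swap_web_entry.
struct Entry
{
  std::size_t web;        // index of the web root (replaces unionfind_root ())
  bool permuting_mem;     // is_load/is_store together with is_swap
  bool non_permuting_mem; // models SH_NOSWAP_LD / SH_NOSWAP_ST
  bool special;           // other special handling, e.g. SH_EXTRACT
  bool fixed_up;          // set where handle_special_swappables would run
};

// Hypothetical per-web flags kept on the root entry in the real pass.
struct Web
{
  bool not_optimizable;
  bool requires_special_handling;
};

void analyze (std::vector<Entry> &entries, std::vector<Web> &webs)
{
  // Pass 1: a web needs fix-ups only if it is optimizable and contains a
  // permuting or a non-permuting load/store that gets transformed.
  for (Entry &e : entries)
    {
      assert (e.web < webs.size ());
      Web &w = webs[e.web];
      if ((e.permuting_mem || e.non_permuting_mem) && !w.not_optimizable)
        w.requires_special_handling = true;
    }

  // Pass 2: fix up element-order-sensitive insns only in flagged webs.
  for (Entry &e : entries)
    if (e.special && !e.non_permuting_mem
        && webs[e.web].requires_special_handling)
      e.fixed_up = true;
}
```

In this model an extract in a web with no loads or stores is left alone, an extract that shares a web with a permuting load is fixed up, and nothing is touched in a web marked not optimizable. The second loop has to be separate because a web's flag may be set by an entry that comes after the extract in the table.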