Message-ID: <7bd6cb29-a107-a7f2-463f-75bf811792a7@linux.ibm.com>
Date: Tue, 27 Sep 2022 19:40:23 +0200
Subject: Re: [RFC] postreload cse'ing vector constants
From: Robin Dapp
To: Jeff Law , gcc-patches@gcc.gnu.org

> I did bootstrapping and ran the testsuite on x86(-64), aarch64, Power9
> and s390. Everything looks good except two additional fails on x86
> where code actually looks worse.
>
> gcc.target/i386/keylocker-encodekey128.c
>
> 17c17,18
> <       movaps  %xmm4, k2(%rip)
> ---
>>       pxor    %xmm0, %xmm0
>>       movaps  %xmm0, k2(%rip)
>
> gcc.target/i386/keylocker-encodekey256.c:
>
> 19c19,20
> <       movaps  %xmm4, k3(%rip)
> ---
>>       pxor    %xmm0, %xmm0
>>       movaps  %xmm0, k3(%rip)

Before the patch and after postreload we have:

(insn (set (reg:V2DI xmm0)
        (reg:V2DI xmm4))
     (expr_list:REG_DEAD (reg:V2DI 24 xmm4)
        (expr_list:REG_EQUIV (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ])))))
(insn (set (mem/c:V2DI (symbol_ref:DI ("k2"))
        (reg:V2DI xmm0))))

which is converted by cprop_hardreg to:

(insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
        (reg:V2DI xmm4))))

With the change there is:

(insn (set (reg:V2DI xmm0)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])))
(insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
        (reg:V2DI xmm0))))

which is not simplified further because xmm0 needs to be explicitly
zeroed while xmm4 is assumed to be zeroed by encodekey128. I'm not
familiar with this so I'm supposing this is correct even though I found
"XMM4 through XMM6 are reserved for future usages and software should
not rely upon them being zeroed." online.
Even if xmm4 were zeroed explicitly, I guess in this case the simple
costing of mov reg,reg vs mov reg,imm (with the latter not being more
expensive) falls short? cprop_hardreg can actually propagate the zeroed
xmm4 into the next move. The same mechanism could possibly even elide
many such moves, which would mean we'd unnecessarily emit many
mov reg,0? Hmm...
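To make the concern concrete, here is a minimal toy model of the effect, not GCC's actual cprop_hardreg implementation. The names `Copy`, `LoadConst`, `Store` and `toy_cprop` are made up for illustration, and the removal of the dead copy is folded into the same function for brevity. It only shows why a register copy leaves something to propagate while a constant load does not.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Copy:
    """dst <- src (both hard registers)."""
    dst: str
    src: str


@dataclass(frozen=True)
class LoadConst:
    """dst <- constant value (e.g. pxor for zero)."""
    dst: str
    value: int


@dataclass(frozen=True)
class Store:
    """mem[symbol] <- src register."""
    symbol: str
    src: str


Insn = Union[Copy, LoadConst, Store]


def _invalidate(copy_of, reg):
    """Forget everything known about reg once it is overwritten."""
    copy_of.pop(reg, None)
    for key in [k for k, v in copy_of.items() if v == reg]:
        del copy_of[key]


def toy_cprop(insns, live_out=frozenset()):
    """Forward-propagate register copies, then drop dead definitions.

    Hypothetical simplification: straight-line code only, and registers
    not listed in live_out are assumed dead at the end of the sequence.
    """
    copy_of = {}  # register -> older register holding the same value
    rewritten = []
    for insn in insns:
        if isinstance(insn, Copy):
            src = copy_of.get(insn.src, insn.src)
            _invalidate(copy_of, insn.dst)
            if src != insn.dst:
                copy_of[insn.dst] = src
            rewritten.append(Copy(insn.dst, src))
        elif isinstance(insn, LoadConst):
            # A constant load gives no older register to substitute.
            _invalidate(copy_of, insn.dst)
            rewritten.append(insn)
        else:
            rewritten.append(Store(insn.symbol, copy_of.get(insn.src, insn.src)))

    # Backward pass: keep only definitions whose result is still used.
    live = set(live_out)
    kept = []
    for insn in reversed(rewritten):
        if isinstance(insn, Store):
            live.add(insn.src)
            kept.append(insn)
        elif insn.dst in live:
            live.discard(insn.dst)
            if isinstance(insn, Copy):
                live.add(insn.src)
            kept.append(insn)
    kept.reverse()
    return kept


if __name__ == "__main__":
    before_patch = [Copy("xmm0", "xmm4"), Store("k2", "xmm0")]
    with_patch = [LoadConst("xmm0", 0), Store("k2", "xmm0")]
    print(toy_cprop(before_patch))
    print(toy_cprop(with_patch))
```

In this model the copy-based sequence collapses to a single store from xmm4, while the constant-based sequence keeps both instructions. That matches the keylocker diffs above: a costing that treats mov reg,imm as no more expensive than mov reg,reg does not account for the copy being removable later.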