From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tamar Christina
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH]middle-end: Implement conditional store vectorizer pattern [PR115531]
Date: Wed, 26 Jun 2024 13:46:37 +0000
In-Reply-To: <5r1oo799-nq2o-670q-2s2p-3r38no245837@fhfr.qr>
References: <5r1oo799-nq2o-670q-2s2p-3r38no245837@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: Richard Biener
> Sent: Wednesday, June 26, 2024 2:23 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> Subject: Re: [PATCH]middle-end: Implement conditional store vectorizer
> pattern [PR115531]
>
> On Tue, 25 Jun 2024, Tamar Christina wrote:
>
> > Hi All,
> >
> > This adds a conditional store optimization for the vectorizer as a pattern.
> > The vectorizer already supports modifying memory accesses because of the
> > pattern-based gather/scatter recognition.
> >
> > Doing it in the vectorizer allows us to still keep the ability to vectorize
> > such loops for architectures that don't have MASK_STORE support, whereas
> > doing this in ifcvt makes us commit to MASK_STORE.
> >
> > Concretely for this loop:
> >
> > void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > {
> >   if (stride <= 1)
> >     return;
> >
> >   for (int i = 0; i < n; i++)
> >     {
> >       int res = c[i];
> >       int t = b[i+stride];
> >       if (a[i] != 0)
> >         res = t;
> >       c[i] = res;
> >     }
> > }
> >
> > today we generate:
> >
> > .L3:
> >         ld1b    z29.s, p7/z, [x0, x5]
> >         ld1w    z31.s, p7/z, [x2, x5, lsl 2]
> >         ld1w    z30.s, p7/z, [x1, x5, lsl 2]
> >         cmpne   p15.b, p6/z, z29.b, #0
> >         sel     z30.s, p15, z30.s, z31.s
> >         st1w    z30.s, p7, [x2, x5, lsl 2]
> >         add     x5, x5, x4
> >         whilelo p7.s, w5, w3
> >         b.any   .L3
> >
> > which in gimple is:
> >
> >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> >   vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
> >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> >   vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>;
> >   .MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);
> >
> > A MASK_STORE is already conditional, so there's no need to perform the load
> > of the old values and the VEC_COND_EXPR.  This patch makes it so we generate:
> >
> >   vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
> >   vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
> >   mask__34.16_79 = vect__9.15_77 != { 0, ... };
> >   .MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68);
> >
> > which generates:
> >
> > .L3:
> >         ld1b    z30.s, p7/z, [x0, x5]
> >         ld1w    z31.s, p7/z, [x1, x5, lsl 2]
> >         cmpne   p7.b, p7/z, z30.b, #0
> >         st1w    z31.s, p7, [x2, x5, lsl 2]
> >         add     x5, x5, x4
> >         whilelo p7.s, w5, w3
> >         b.any   .L3
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> The idea looks good but I wonder if it's not slower in practice.
> The issue with masked stores, in particular those where any elements
> are actually masked out, is that such stores do not forward on any
> uarch I know.  They also usually have a penalty for the merging
> (the load has to be carried out anyway).
>

Yes, but when the predicate has all bits set it usually does.  Forwarding
aside, this gets rid of the select and the additional load, so purely from
an instruction latency perspective it's a win.

> So - can you do an actual benchmark on real hardware where the
> loop has (way) more than one vector iteration and where there's
> at least one masked element during each vector iteration?
>

Sure, this optimization comes from exchange2, where vectorizing with SVE
ends up being slower than not vectorizing.  This change makes the
vectorization profitable and recovers about a 3% difference overall between
vectorizing and not.

I did run microbenchmarks over all current and future Arm cores and it was
a universal win.

I can run more benchmarks with various masks, but as mentioned above, even
without forwarding you still have 2 instructions less, so it's almost
always going to win.

> > Ok for master?
>
> Few comments below.
>
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/115531
> > 	* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
> > 	(vect_recog_cond_store_pattern): New.
> > 	(vect_vect_recog_func_ptrs): Use it.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/115531
> > 	* gcc.dg/vect/vect-conditional_store_1.c: New test.
> > 	* gcc.dg/vect/vect-conditional_store_2.c: New test.
> > 	* gcc.dg/vect/vect-conditional_store_3.c: New test.
> > 	* gcc.dg/vect/vect-conditional_store_4.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3884a3c3d0a2dc2258097348c75bb7c0b3b37c72
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_1.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > +{
> > +  if (stride <= 1)
> > +    return;
> > +
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      int res = c[i];
> > +      int t = b[i+stride];
> > +      if (a[i] != 0)
> > +        res = t;
> > +      c[i] = res;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..bc965a244f147c199b1726e5f6b44229539cd225
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_2.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +void foo2 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > +{
> > +  if (stride <= 1)
> > +    return;
> > +
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      int res = c[i];
> > +      int t = b[i+stride];
> > +      if (a[i] != 0)
> > +        t = res;
> > +      c[i] = t;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ab6889f967b330a652917925c2748b16af59b9fd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_3.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int stride)
> > +{
> > +  if (stride <= 1)
> > +    return;
> > +
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      int res = c[i];
> > +      int t = b[i+stride];
> > +      if (a[i] >= 0)
> > +        t = res;
> > +      c[i] = t;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3bfe2f81dc2d47096aa23529d43263be52cd422c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_4.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +void foo4 (signed char *restrict a, int *restrict b, int *restrict c, int *restrict d, int n, int stride)
> > +{
> > +  if (stride <= 1)
> > +    return;
> > +
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      int res1 = c[i];
> > +      int res2 = d[i];
> > +      int t = b[i+stride];
> > +      if (a[i] > 0)
> > +        t = res1;
> > +      else if (a[i] < 0)
> > +        t = res2 * 2;
> > +
> > +      c[i] = t;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" } } */
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index cef901808eb97c0c92b51da20535ea7f397b4742..f0da6c932f48d2992a501d0ced3efc8924912c77 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6397,6 +6397,177 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo,
> >    return pattern_stmt;
> >  }
> >
> > +/* Helper method of vect_recog_cond_store_pattern, checks to see if COND_ARG
> > +   points to a load statement that reads the same data as that of
> > +   STORE_VINFO.  */
> > +
> > +static bool
> > +vect_cond_store_pattern_same_ref (loop_vec_info loop_vinfo,
> > +                                  stmt_vec_info store_vinfo, tree cond_arg)
> > +{
> > +  stmt_vec_info load_stmt_vinfo = loop_vinfo->lookup_def (cond_arg);
> > +  if (!load_stmt_vinfo
> > +      || !STMT_VINFO_DATA_REF (load_stmt_vinfo)
> > +      || DR_IS_WRITE (STMT_VINFO_DATA_REF (load_stmt_vinfo))
>
> can you use !DR_IS_READ?
>
> > +      || !same_data_refs (STMT_VINFO_DATA_REF (store_vinfo),
> > +                          STMT_VINFO_DATA_REF (load_stmt_vinfo)))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Function vect_recog_cond_store_pattern
> > +
> > +   Try to find the following pattern:
> > +
> > +   x = *_3;
> > +   c = a CMP b;
> > +   y = c ? t_20 : x;
> > +   *_3 = y;
> > +
> > +   where the store of _3 happens on a conditional select on a value loaded
> > +   from the same location.  In such case we can elide the initial load if
> > +   MASK_STORE is supported and instead only conditionally write out the result.
> > +
> > +   The pattern produces for the above:
> > +
> > +   c = a CMP b;
> > +   .MASK_STORE (_3, c, t_20)
> > +
> > +   Input:
> > +
> > +   * STMT_VINFO: The stmt from which the pattern search begins.  In the
> > +   example, when this function is called with _3 then the search begins.
> > +
> > +   Output:
> > +
> > +   * TYPE_OUT: The type of the output of this pattern.
> > +
> > +   * Return value: A new stmt that will be used to replace the sequence.  */
> > +
> > +static gimple *
> > +vect_recog_cond_store_pattern (vec_info *vinfo,
> > +                               stmt_vec_info stmt_vinfo, tree *type_out)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo)
> > +    return NULL;
>
> Why only for loops?  We run BB vect for if-converted loop bodies
> if loop vect failed on them for example.  Or is it that you imply
> this is only profitable when loop masking is applied - which of course
> you do not check?
>

I don't think it's possible when masking isn't applied, no?  The check is
implicit in checking for MASK_STORE support, or can you have masked store
support without masking?

> > +  gimple *store_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > +
> > +  /* Needs to be a gimple store we have DR info for.  */
> > +  if (!STMT_VINFO_DATA_REF (stmt_vinfo)
> > +      || !DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_vinfo))
> > +      || !gimple_store_p (store_stmt))
> > +    return NULL;
> > +
> > +  tree st_rhs = gimple_assign_rhs1 (store_stmt);
> > +  tree st_lhs = gimple_assign_lhs (store_stmt);
> > +
> > +  if (TREE_CODE (st_rhs) != SSA_NAME)
> > +    return NULL;
> > +
> > +  gassign *cond_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (st_rhs));
> > +  if (!cond_stmt || gimple_assign_rhs_code (cond_stmt) != COND_EXPR)
> > +    return NULL;
> > +
> > +  /* Check if the else value matches the original loaded one.  */
> > +  bool invert = false;
> > +  tree cmp_ls = gimple_arg (cond_stmt, 0);
> > +  tree cond_arg1 = gimple_arg (cond_stmt, 1);
> > +  tree cond_arg2 = gimple_arg (cond_stmt, 2);
> > +
> > +  if (!vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo, cond_arg2)
> > +      && !(invert = vect_cond_store_pattern_same_ref (loop_vinfo, stmt_vinfo,
> > +                                                      cond_arg1)))
> > +    return NULL;
> > +
> > +  vect_pattern_detected ("vect_recog_cond_store_pattern", store_stmt);
> > +
> > +  tree scalar_type = TREE_TYPE (st_rhs);
> > +  if (VECTOR_TYPE_P (scalar_type))
> > +    return NULL;
> > +
> > +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vectype == NULL_TREE)
> > +    return NULL;
> > +
> > +  machine_mode mask_mode;
> > +  machine_mode vecmode = TYPE_MODE (vectype);
> > +  if (!targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode)
> > +      || !can_vec_mask_load_store_p (vecmode, mask_mode, false))
> > +    return NULL;
> > +
> > +  /* Convert the mask to the right form.  */
> > +  tree gs_vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
>
> same as vectype above?  You sometimes use 'vinfo' and sometimes
> 'loop_vinfo'.
>
> > +  tree cookie = build_int_cst (build_pointer_type (scalar_type),
> > +                               TYPE_ALIGN (scalar_type));
>
> please do this next to the use.  It's also wrong, you need to
> preserve alias info and alignment of the ref properly - see if-conversion
> on how to do that.
>
> > +  tree base = TREE_OPERAND (st_lhs, 0);
>
> You assume this is a MEM_REF?  I think you want build_fold_addr_expr
> (st_lhs) and you need to be prepared to put this to a separate stmt if
> it's not invariant.  See if-conversion again.
>
> > +  tree cond_store_arg = cond_arg1;
> > +
> > +  /* If we have to invert the condition, i.e. use the true argument rather than
> > +     the false argument, we should check whether we can just invert the
> > +     comparison or if we have to negate the result.  */
> > +  if (invert)
> > +    {
> > +      gimple *cond = SSA_NAME_DEF_STMT (cmp_ls);
> > +      /* We need to use the false parameter of the conditional select.  */
> > +      cond_store_arg = cond_arg2;
> > +      tree_code new_code = ERROR_MARK;
> > +      tree mask_vec_type, itype;
> > +      gassign *conv;
> > +      tree var = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > +
> > +      if (is_gimple_assign (cond)
> > +          && TREE_CODE_CLASS (gimple_assign_rhs_code (cond)) == tcc_comparison)
> > +        {
> > +          tree_code cond_code = gimple_assign_rhs_code (cond);
> > +          tree cond_expr0 = gimple_assign_rhs1 (cond);
> > +          tree cond_expr1 = gimple_assign_rhs2 (cond);
> > +
> > +          /* We have to invert the comparison, see if we can flip it.  */
> > +          bool honor_nans = HONOR_NANS (TREE_TYPE (cond_expr0));
> > +          new_code = invert_tree_comparison (cond_code, honor_nans);
> > +          if (new_code != ERROR_MARK)
> > +            {
> > +              itype = TREE_TYPE (cond_expr0);
> > +              conv = gimple_build_assign (var, new_code, cond_expr0,
> > +                                          cond_expr1);
> > +            }
>
> I think this is premature optimization here.  Actual inversion should
> be cheaper than having a second comparison.  So just invert.

Fair, the reason I did so was because the vectorizer already tracks masks
and their inverses.  So if the negated version was live we wouldn't
materialize it.  That said, that also means I can just negate and leave it
to the same code to track, so I'll just try negating.

>
> > +        }
> > +
> > +      if (new_code == ERROR_MARK)
> > +        {
> > +          /* We couldn't flip the condition, so invert the mask instead.  */
> > +          itype = TREE_TYPE (cmp_ls);
> > +          conv = gimple_build_assign (var, BIT_XOR_EXPR, cmp_ls,
> > +                                      build_int_cst (itype, 1));
> > +        }
> > +
> > +      mask_vec_type = get_mask_type_for_scalar_type (loop_vinfo, itype);
> > +      append_pattern_def_seq (vinfo, stmt_vinfo, conv, mask_vec_type, itype);
> > +      /* Then prepare the boolean mask as the mask conversion pattern
> > +         won't hit on the pattern statement.  */
> > +      cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, stmt_vinfo);
>
> Isn't this somewhat redundant with the below call?
>
> I fear bad [non-]interactions with bool pattern recognition btw.

So this is again the issue that patterns don't apply to newly produced
patterns, and so they can't serve as roots for new patterns.  This is why
the scatter/gather pattern addition refactored part of the work into these
helper functions.

I did actually try to just add a secondary loop that iterates over newly
produced patterns, but you later run into problems where a new pattern
completely cancels out an old pattern rather than just extending it.

So at the moment, unless the code ends up being hybrid, whatever the bool
recog pattern does is just ignored as irrelevant.

But if we don't invert the compare then it should be simpler, as the
original compare is never in a pattern.

I'll respin with these changes.

Thanks,
Tamar

>
> > +    }
> > +
> > +  tree mask = vect_convert_mask_for_vectype (cmp_ls, gs_vectype, stmt_vinfo,
> > +                                             loop_vinfo);
> > +  gcall *call
> > +    = gimple_build_call_internal (IFN_MASK_STORE, 4, base, cookie, mask,
> > +                                  cond_store_arg);
> > +  gimple_set_location (call, gimple_location (store_stmt));
> > +  gimple_set_lhs (call, make_ssa_name (integer_type_node));
> > +
> > +  /* Copy across relevant vectorization info and associate DR with the
> > +     new pattern statement instead of the original statement.  */
> > +  stmt_vec_info pattern_stmt_info = loop_vinfo->add_stmt (call);
> > +  loop_vinfo->move_dr (pattern_stmt_info, stmt_vinfo);
> > +
> > +  *type_out = vectype;
> > +  return call;
> > +}
> > +
> >  /* Return true if TYPE is a non-boolean integer type.  These are the types
> >     that we want to consider for narrowing.  */
> >
> > @@ -7061,6 +7232,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> >       of mask conversion that are needed for gather and scatter
> >       internal functions.  */
> >    { vect_recog_gather_scatter_pattern, "gather_scatter" },
> > +  { vect_recog_cond_store_pattern, "cond_store" },
> >    { vect_recog_mask_conversion_pattern, "mask_conversion" },
> >    { vect_recog_widen_plus_pattern, "widen_plus" },
> >    { vect_recog_widen_minus_pattern, "widen_minus" },
> >
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)