From: Tamar Christina
To: Richard Sandiford
CC: "gcc-patches@gcc.gnu.org", nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Subject: RE: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands
Date: Wed, 30 Jun 2021 16:08:44 +0000
List-Id: Gcc-patches mailing list

> -----Original Message-----
> From: Richard Sandiford
> Sent: Monday, June 14, 2021 4:55 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
> Kyrylo Tkachov
> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on
> inverted operands
>
> Tamar Christina writes:
> > Hi Richard,
> >> -----Original Message-----
> >> From: Richard Sandiford
> >> Sent: Monday, June 14, 2021 3:50 PM
> >> To: Tamar Christina
> >> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
> >> Kyrylo Tkachov
> >> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks
> >> on inverted operands
> >>
> >> Tamar Christina writes:
> >> > Hi All,
> >> >
> >> > This RFC is trying to address the following inefficiency when
> >> > vectorizing conditional statements with SVE.
> >> >
> >> > Consider the case
> >> >
> >> > void f10(double * restrict z, double * restrict w, double * restrict x,
> >> >          double * restrict y, int n)
> >> > {
> >> >     for (int i = 0; i < n; i++) {
> >> >         z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
> >> >     }
> >> > }
> >> >
> >> >
> >> > For which we currently generate at -O3:
> >> >
> >> > f10:
> >> >         cmp     w4, 0
> >> >         ble     .L1
> >> >         mov     x5, 0
> >> >         whilelo p1.d, wzr, w4
> >> >         ptrue   p3.b, all
> >> > .L3:
> >> >         ld1d    z1.d, p1/z, [x1, x5, lsl 3]
> >> >         fcmgt   p2.d, p1/z, z1.d, #0.0
> >> >         fcmgt   p0.d, p3/z, z1.d, #0.0
> >> >         ld1d    z2.d, p2/z, [x2, x5, lsl 3]
> >> >         bic     p0.b, p3/z, p1.b, p0.b
> >> >         ld1d    z0.d, p0/z, [x3, x5, lsl 3]
> >> >         fsub    z0.d, p0/m, z0.d, z1.d
> >> >         movprfx z0.d, p2/m, z1.d
> >> >         fadd    z0.d, p2/m, z0.d, z2.d
> >> >         st1d    z0.d, p1, [x0, x5, lsl 3]
> >> >         incd    x5
> >> >         whilelo p1.d, w5, w4
> >> >         b.any   .L3
> >> > .L1:
> >> >         ret
> >> >
> >> > Notice that the condition for the else branch duplicates the same
> >> > predicate as the then branch and then uses BIC to negate the results.
> >> >
> >> > The reason for this is that during instruction generation in the
> >> > vectorizer we emit
> >> >
> >> >   mask__41.11_66 = vect__4.10_64 > vect_cst__65;
> >> >   vec_mask_and_69 = mask__41.11_66 & loop_mask_63;
> >> >   vec_mask_and_71 = mask__41.11_66 & loop_mask_63;
> >> >   mask__43.16_73 = ~mask__41.11_66;
> >> >   vec_mask_and_76 = mask__43.16_73 & loop_mask_63;
> >> >   vec_mask_and_78 = mask__43.16_73 & loop_mask_63;
> >> >
> >> > which ultimately gets optimized to
> >> >
> >> >   mask__41.11_66 = vect__4.10_64 > { 0.0, ... };
> >> >   vec_mask_and_69 = loop_mask_63 & mask__41.11_66;
> >> >   mask__43.16_73 = ~mask__41.11_66;
> >> >   vec_mask_and_76 = loop_mask_63 & mask__43.16_73;
> >> >
> >> > Notice how the negate is on the operation and not the predicate
> >> > resulting from the operation. When this is expanded this turns
> >> > into RTL where the negate is on the compare directly. This means
> >> > the RTL is different from the one without the negate and so CSE is
> >> > unable to recognize that they are essentially same operation.
> >> >
> >> > To fix this my patch changes it so you negate the mask rather than
> >> > the operation
> >> >
> >> >   mask__41.13_55 = vect__4.12_53 > { 0.0, ... };
> >> >   vec_mask_and_58 = loop_mask_52 & mask__41.13_55;
> >> >   vec_mask_op_67 = ~vec_mask_and_58;
> >> >   vec_mask_and_65 = loop_mask_52 & vec_mask_op_67;
> >>
> >> But to me this looks like a pessimisation in gimple terms. We've
> >> increased the length of the critical path: vec_mask_and_65 now needs
> >> a chain of 4 operations instead of 3.
> >
> > True, but it should reduce the number of RTL patterns. I would have
> > thought RTL is more expensive to handle than gimple.
>
> I think this is only a fair gimple optimisation if gimple does the isel
> itself (to a predicated compare and a predicated NOT).
>
> >> We also need to be careful not to pessimise the case in which the
> >> comparison is an integer one.
> >> At the moment we'll generate opposed
> >> conditions, which is the intended behaviour:
> >
> > This still happens with this patch at `-Ofast` because that flips the
> > conditions, So the different representation doesn't harm it.
>
> OK, that's good.
>
> >>
> >> .L3:
> >>         ld1d    z1.d, p0/z, [x1, x5, lsl 3]
> >>         cmpgt   p2.d, p0/z, z1.d, #0
> >>         movprfx z2, z1
> >>         scvtf   z2.d, p3/m, z1.d
> >>         cmple   p1.d, p0/z, z1.d, #0
> >>         ld1d    z0.d, p2/z, [x2, x5, lsl 3]
> >>         ld1d    z1.d, p1/z, [x3, x5, lsl 3]
> >>         fadd    z0.d, p2/m, z0.d, z2.d
> >>         movprfx z0.d, p1/m, z1.d
> >>         fsub    z0.d, p1/m, z0.d, z2.d
> >>         st1d    z0.d, p0, [x0, x5, lsl 3]
> >>         add     x5, x5, x6
> >>         whilelo p0.d, w5, w4
> >>         b.any   .L3
> >>
> >> Could we handle the fcmp case using a 3->2 define_split instead:
> >> convert
> >>
> >>    (set res (and (not (fcmp X Y)) Z)) ->
> >>      (set res (fcmp X Y))
> >>      (set res (and (not res) Z))
> >>
> >
> > This was the other approach I mentioned. It works, and gives you the neg,
> > but only in the case where the compare is single use.
>
> But in the original example we duplicate the comparison through a
> 2->2 combine, which leaves the original comparison as a single use.
> Isn't that enough?
>
> > e.g. in
> >
> > void f11(double * restrict z, double * restrict w, double * restrict x,
> >          double * restrict y, int n)
> > {
> >     for (int i = 0; i < n; i++) {
> >         z[i] = (w[i] > 0) ? w[i] : y[i] - w[i];
> >     }
> > }
> >
> > You have some of the same problem. It generates
> >
> >         ld1d    z0.d, p0/z, [x1, x2, lsl 3]
> >         fcmgt   p2.d, p3/z, z0.d, #0.0
> >         bic     p1.b, p3/z, p0.b, p2.b
> >         ld1d    z1.d, p1/z, [x3, x2, lsl 3]
> >         fsub    z1.d, p1/m, z1.d, z0.d
> >         sel     z0.d, p2, z0.d, z1.d
> >         st1d    z0.d, p0, [x0, x2, lsl 3]
> >         incd    x2
> >         whilelo p0.d, w2, w4
> >
> > which has two problems. fcmgt doesn't need to be predicated on p3
> > which is ptrue all, it can/should be p0.
> >
> > With that fixed the splitter won't match because p2 is needed in the
> > sel, so it's not single use and so combine won't try to build the RTL
> > so it can be split.
>
> I think having the vectoriser avoid the dual use between the
> IFN_MASK_LOAD/STORE and the VEC_COND_EXPR is fair game, since that is
> the only pass that has the context to prove that including the loop mask
> in the VEC_COND_EXPR condition is correct. We already try to do that to
> some extent:
>

Sorry, I have been looking at this for the past couple of days and I just
don't know how this is supposed to work.

In the above example the problem is not just the use of p2 in the
VEC_COND_EXPR. If the VEC_COND_EXPR is changed to use p1 then p1 now has
3 uses, which makes combine still not try the combination.

But the general case

void f10(double * restrict z, double * restrict w, double * restrict x,
         double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
    }
}

produces

f10:
        cmp     w4, 0
        ble     .L1
        mov     x5, 0
        whilelo p1.d, wzr, w4
        ptrue   p3.b, all
        .p2align 5,,15
.L3:
        ld1d    z1.d, p1/z, [x1, x5, lsl 3]
        fcmgt   p2.d, p1/z, z1.d, #0.0
        fcmgt   p0.d, p3/z, z1.d, #0.0
        ld1d    z2.d, p2/z, [x2, x5, lsl 3]
        bic     p0.b, p3/z, p1.b, p0.b
        ld1d    z0.d, p0/z, [x3, x5, lsl 3]
        fsub    z0.d, p0/m, z0.d, z1.d
        movprfx z0.d, p2/m, z1.d
        fadd    z0.d, p2/m, z0.d, z2.d
        st1d    z0.d, p1, [x0, x5, lsl 3]
        incd    x5
        whilelo p1.d, w5, w4
        b.any   .L3

where the VEC_COND_EXPR has been elided.

The problem is that the comparison for the inverse case is unpredicated,
which causes the compiler to predicate it on ptrue when it's being
generated from the load. The resulting Gimple is

  [local count: 105119322]:
  bnd.9_49 = (unsigned int) n_14(D);
  max_mask_79 = .WHILE_ULT (0, bnd.9_49, { 0, ... });

  [local count: 630715932]:
  # ivtmp_76 = PHI
  # loop_mask_52 = PHI
  _27 = &MEM [(double *)w_15(D) + ivtmp_76 * 8];
  vect__4.12_53 = .MASK_LOAD (_27, 64B, loop_mask_52);
  mask__41.13_55 = vect__4.12_53 > { 0.0, ... };
  vec_mask_and_58 = loop_mask_52 & mask__41.13_55;
  _48 = &MEM [(double *)x_18(D) + ivtmp_76 * 8];
  vect__6.16_59 = .MASK_LOAD (_48, 64B, vec_mask_and_58);
  mask__43.18_62 = ~mask__41.13_55;
  vec_mask_and_65 = loop_mask_52 & mask__43.18_62;
  _25 = &MEM [(double *)y_16(D) + ivtmp_76 * 8];
  vect__8.21_66 = .MASK_LOAD (_25, 64B, vec_mask_and_65);
  vect_iftmp.22_68 = .COND_SUB (vec_mask_and_65, vect__8.21_66, vect__4.12_53, vect__8.21_66);
  _50 = .COND_ADD (vec_mask_and_58, vect__4.12_53, vect__6.16_59, vect_iftmp.22_68);
  _1 = &MEM [(double *)z_20(D) + ivtmp_76 * 8];
  .MASK_STORE (_1, 64B, loop_mask_52, _50);
  ivtmp_77 = ivtmp_76 + POLY_INT_CST [2, 2];
  _2 = (unsigned int) ivtmp_77;
  next_mask_80 = .WHILE_ULT (_2, bnd.9_49, { 0, ... });

where the mask vec_mask_and_65 is masking the negate and not the compare
because of how the expansion is done.

>       /* See whether another part of the vectorized code applies a loop
>          mask to the condition, or to its inverse.  */
>
> but it would need extending to handle this case.

This code is fine as far as I can tell, but there's nothing you can do
here. The mask it needs is ~original, so it does not find an inverse mask
to use because it has to honor floating point exceptions.

And indeed `-fno-trapping-math` or `-Ofast` generates the optimal
sequence, but when honoring traps it can't re-use and invert the existing
mask, which leaves the operation unpredicated.

So is what you're requesting that it looks inside unary operators and
tries to CSE the thing they're pointed to? In which case isn't it about
the same as what I had before, just that the vectorizer did the CSE itself?

If that's the case maybe it's better to do lookups into
loop_vinfo->scalar_cond_masked_set in prepare_load_store_mask? So that it
just applies to everything?

Thanks,
Tamar

>
> Thanks,
> Richard
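
A note on the mask rewrite discussed in the thread: the two GIMPLE forms
for the else-branch mask, loop_mask & ~cmp and loop_mask & ~(loop_mask & cmp),
compute the same predicate, and that can be checked with a small model.
This is only a sketch, assuming an 8-lane predicate held in a uint8_t; the
function names are made up for illustration and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-lane predicate model: bit i set means lane i is active.
   Illustrative only; none of these helpers exist in GCC.  */

/* Else-branch mask with the negate applied to the comparison result:
   loop_mask & ~cmp.  */
uint8_t else_mask_from_compare(uint8_t loop_mask, uint8_t cmp)
{
    return (uint8_t)(loop_mask & (uint8_t)~cmp);
}

/* Else-branch mask with the negate applied to the already-masked
   then-branch predicate: loop_mask & ~(loop_mask & cmp).  */
uint8_t else_mask_from_masked_compare(uint8_t loop_mask, uint8_t cmp)
{
    uint8_t then_mask = (uint8_t)(loop_mask & cmp);
    return (uint8_t)(loop_mask & (uint8_t)~then_mask);
}

/* Exhaustively assert that both forms agree for every pair of
   8-bit loop mask and comparison result.  */
void check_else_masks_agree(void)
{
    for (unsigned loop_mask = 0; loop_mask < 256; loop_mask++) {
        for (unsigned cmp = 0; cmp < 256; cmp++) {
            assert(else_mask_from_compare((uint8_t)loop_mask, (uint8_t)cmp)
                   == else_mask_from_masked_compare((uint8_t)loop_mask,
                                                    (uint8_t)cmp));
        }
    }
}
```

Calling check_else_masks_agree() walks all 65536 input pairs. The identity
only shows that the two forms select the same lanes; it says nothing about
which form expands to better RTL or how trapping comparisons are handled.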