From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>, Marcus Shawcroft
 <Marcus.Shawcroft@arm.com>, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: RE: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on
 inverted operands
Thread-Topic: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on
 inverted operands
Thread-Index: AQHXYSMxRzrhr1N50E2P2j3Yox4PoKsTluyKgAAA4JCAABEiZoAY+M1Q
Date: Wed, 30 Jun 2021 16:08:44 +0000
Message-ID: <VI1PR08MB532548799CA7FAB7BCC00229FF019@VI1PR08MB5325.eurprd08.prod.outlook.com>
References: <patch-14553-tamar@arm.com> <mpt7diwjyam.fsf@arm.com>
 <VI1PR08MB532566E232B32751F7225EEFFF319@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptim2gigqs.fsf@arm.com>
In-Reply-To: <mptim2gigqs.fsf@arm.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Jun 2021 16:08:59 -0000

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, June 14, 2021 4:55 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on
> inverted operands
>
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi Richard,
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Monday, June 14, 2021 3:50 PM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> >> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> >> <Kyrylo.Tkachov@arm.com>
> >> Subject: Re: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks
> >> on inverted operands
> >>
> >> Tamar Christina <tamar.christina@arm.com> writes:
> >> > Hi All,
> >> >
> >> > This RFC is trying to address the following inefficiency when
> >> > vectorizing conditional statements with SVE.
> >> >
> >> > Consider the case
> >> >
> >> > void f10(double * restrict z, double * restrict w, double * restrict x,
> >> > 	 double * restrict y, int n)
> >> > {
> >> >     for (int i = 0; i < n; i++) {
> >> >         z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
> >> >     }
> >> > }
> >> >
> >> >
> >> > For which we currently generate at -O3:
> >> >
> >> > f10:
> >> >         cmp     w4, 0
> >> >         ble     .L1
> >> >         mov     x5, 0
> >> >         whilelo p1.d, wzr, w4
> >> >         ptrue   p3.b, all
> >> > .L3:
> >> >         ld1d    z1.d, p1/z, [x1, x5, lsl 3]
> >> >         fcmgt   p2.d, p1/z, z1.d, #0.0
> >> >         fcmgt   p0.d, p3/z, z1.d, #0.0
> >> >         ld1d    z2.d, p2/z, [x2, x5, lsl 3]
> >> >         bic     p0.b, p3/z, p1.b, p0.b
> >> >         ld1d    z0.d, p0/z, [x3, x5, lsl 3]
> >> >         fsub    z0.d, p0/m, z0.d, z1.d
> >> >         movprfx z0.d, p2/m, z1.d
> >> >         fadd    z0.d, p2/m, z0.d, z2.d
> >> >         st1d    z0.d, p1, [x0, x5, lsl 3]
> >> >         incd    x5
> >> >         whilelo p1.d, w5, w4
> >> >         b.any   .L3
> >> > .L1:
> >> >         ret
> >> >
> >> > Notice that the condition for the else branch duplicates the same
> >> > predicate as the then branch and then uses BIC to negate the results.
> >> >
> >> > The reason for this is that during instruction generation in the
> >> > vectorizer we emit
> >> >
> >> >   mask__41.11_66 = vect__4.10_64 > vect_cst__65;
> >> >   vec_mask_and_69 = mask__41.11_66 & loop_mask_63;
> >> >   vec_mask_and_71 = mask__41.11_66 & loop_mask_63;
> >> >   mask__43.16_73 = ~mask__41.11_66;
> >> >   vec_mask_and_76 = mask__43.16_73 & loop_mask_63;
> >> >   vec_mask_and_78 = mask__43.16_73 & loop_mask_63;
> >> >
> >> > which ultimately gets optimized to
> >> >
> >> >   mask__41.11_66 = vect__4.10_64 > { 0.0, ... };
> >> >   vec_mask_and_69 = loop_mask_63 & mask__41.11_66;
> >> >   mask__43.16_73 = ~mask__41.11_66;
> >> >   vec_mask_and_76 = loop_mask_63 & mask__43.16_73;
> >> >
> >> > Notice how the negate is on the operation and not the predicate
> >> > resulting from the operation.  When this is expanded this turns
> >> > into RTL where the negate is on the compare directly.  This means
> >> > the RTL is different from the one without the negate and so CSE is
> >> > unable to recognize that they are essentially the same operation.
> >> >
> >> > To fix this my patch changes it so you negate the mask rather than
> >> > the operation
> >> >
> >> >   mask__41.13_55 = vect__4.12_53 > { 0.0, ... };
> >> >   vec_mask_and_58 = loop_mask_52 & mask__41.13_55;
> >> >   vec_mask_op_67 = ~vec_mask_and_58;
> >> >   vec_mask_and_65 = loop_mask_52 & vec_mask_op_67;
> >>
> >> But to me this looks like a pessimisation in gimple terms.  We've
> >> increased the length of the critical path: vec_mask_and_65 now needs
> >> a chain of
> >> 4 operations instead of 3.
> >
> > True, but it should reduce the number of RTL patterns.  I would have
> > thought RTL is more expensive to handle than gimple.
>
> I think this is only a fair gimple optimisation if gimple does the isel
> itself (to a predicated compare and a predicated NOT).
>
> >> We also need to be careful not to pessimise the case in which the
> >> comparison is an integer one.  At the moment we'll generate opposed
> >> conditions, which is the intended behaviour:
> >
> > This still happens with this patch at `-Ofast` because that flips the
> > conditions, So the different representation doesn't harm it.
>
> OK, that's good.
>
> >>
> >> .L3:
> >>         ld1d    z1.d, p0/z, [x1, x5, lsl 3]
> >>         cmpgt   p2.d, p0/z, z1.d, #0
> >>         movprfx z2, z1
> >>         scvtf   z2.d, p3/m, z1.d
> >>         cmple   p1.d, p0/z, z1.d, #0
> >>         ld1d    z0.d, p2/z, [x2, x5, lsl 3]
> >>         ld1d    z1.d, p1/z, [x3, x5, lsl 3]
> >>         fadd    z0.d, p2/m, z0.d, z2.d
> >>         movprfx z0.d, p1/m, z1.d
> >>         fsub    z0.d, p1/m, z0.d, z2.d
> >>         st1d    z0.d, p0, [x0, x5, lsl 3]
> >>         add     x5, x5, x6
> >>         whilelo p0.d, w5, w4
> >>         b.any   .L3
> >>
> >> Could we handle the fcmp case using a 3->2 define_split instead:
> >> convert
> >>
> >>    (set res (and (not (fcmp X Y)) Z)) ->
> >>      (set res (fcmp X Y))
> >>      (set res (and (not res) Z))
> >>
> >
> > This was the other approach I mentioned. It works, and gives you the neg,
> > but only in the case where the compare is single use.
>
> But in the original example we duplicate the comparison through a
> 2->2 combine, which leaves the original comparison as a single use.
> Isn't that enough?
>=20
> > e.g. in
> >
> > void f11(double * restrict z, double * restrict w, double * restrict
> > x, double * restrict y, int n) {
> >     for (int i = 0; i < n; i++) {
> >         z[i] = (w[i] > 0) ? w[i] : y[i] - w[i];
> >     }
> > }
> >
> > You have some of the same problem. It generates
> >
> >         ld1d    z0.d, p0/z, [x1, x2, lsl 3]
> >         fcmgt   p2.d, p3/z, z0.d, #0.0
> >         bic     p1.b, p3/z, p0.b, p2.b
> >         ld1d    z1.d, p1/z, [x3, x2, lsl 3]
> >         fsub    z1.d, p1/m, z1.d, z0.d
> >         sel     z0.d, p2, z0.d, z1.d
> >         st1d    z0.d, p0, [x0, x2, lsl 3]
> >         incd    x2
> >         whilelo p0.d, w2, w4
> >
> > which has two problems. fcmgt doesn't need to be predicated on p3
> > which is ptrue all, it can/should be p0.
> >
> > With that fixed the splitter won't match because p2 is needed in the
> > sel, so it's not single use and so combine won't try to build the RTL so
> > it can be split.
>
> I think having the vectoriser avoid the dual use between the
> IFN_MASK_LOAD/STORE and the VEC_COND_EXPR is fair game, since that is
> the only pass that has the context to prove that including the loop mask in
> the VEC_COND_EXPR condition is correct.  We already try to do that to some
> extent:
>

Sorry, I have been looking at this for the past couple of days and I just
don't see how this is supposed to work.

In the above example the problem is not just the use of p2 in the
VEC_COND_EXPR.  If the VEC_COND_EXPR is changed to use p1, then p1 has 3 uses,
so combine still does not try the combination.

But the general case

void f10(double * restrict z, double * restrict w, double * restrict x,
	 double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
    }
}

produces

f10:
        cmp     w4, 0
        ble     .L1
        mov     x5, 0
        whilelo p1.d, wzr, w4
        ptrue   p3.b, all
        .p2align 5,,15
.L3:
        ld1d    z1.d, p1/z, [x1, x5, lsl 3]
        fcmgt   p2.d, p1/z, z1.d, #0.0
        fcmgt   p0.d, p3/z, z1.d, #0.0
        ld1d    z2.d, p2/z, [x2, x5, lsl 3]
        bic     p0.b, p3/z, p1.b, p0.b
        ld1d    z0.d, p0/z, [x3, x5, lsl 3]
        fsub    z0.d, p0/m, z0.d, z1.d
        movprfx z0.d, p2/m, z1.d
        fadd    z0.d, p2/m, z0.d, z2.d
        st1d    z0.d, p1, [x0, x5, lsl 3]
        incd    x5
        whilelo p1.d, w5, w4
        b.any   .L3

where the VEC_COND_EXPR has been elided.

The problem is that the comparison for the inverse case is unpredicated, which
causes the compiler to predicate it on ptrue when it is generated from the
load.  The resulting Gimple is

  <bb 3> [local count: 105119322]:
  bnd.9_49 = (unsigned int) n_14(D);
  max_mask_79 = .WHILE_ULT (0, bnd.9_49, { 0, ... });

  <bb 4> [local count: 630715932]:
  # ivtmp_76 = PHI <ivtmp_77(4), 0(3)>
  # loop_mask_52 = PHI <next_mask_80(4), max_mask_79(3)>
  _27 = &MEM <vector([2,2]) double> [(double *)w_15(D) + ivtmp_76 * 8];
  vect__4.12_53 = .MASK_LOAD (_27, 64B, loop_mask_52);
  mask__41.13_55 = vect__4.12_53 > { 0.0, ... };
  vec_mask_and_58 = loop_mask_52 & mask__41.13_55;
  _48 = &MEM <vector([2,2]) double> [(double *)x_18(D) + ivtmp_76 * 8];
  vect__6.16_59 = .MASK_LOAD (_48, 64B, vec_mask_and_58);
  mask__43.18_62 = ~mask__41.13_55;
  vec_mask_and_65 = loop_mask_52 & mask__43.18_62;
  _25 = &MEM <vector([2,2]) double> [(double *)y_16(D) + ivtmp_76 * 8];
  vect__8.21_66 = .MASK_LOAD (_25, 64B, vec_mask_and_65);
  vect_iftmp.22_68 = .COND_SUB (vec_mask_and_65, vect__8.21_66, vect__4.12_53, vect__8.21_66);
  _50 = .COND_ADD (vec_mask_and_58, vect__4.12_53, vect__6.16_59, vect_iftmp.22_68);
  _1 = &MEM <vector([2,2]) double> [(double *)z_20(D) + ivtmp_76 * 8];
  .MASK_STORE (_1, 64B, loop_mask_52, _50);
  ivtmp_77 = ivtmp_76 + POLY_INT_CST [2, 2];
  _2 = (unsigned int) ivtmp_77;
  next_mask_80 = .WHILE_ULT (_2, bnd.9_49, { 0, ... });

where the mask vec_mask_and_65 is masking the negate and not the compare,
because of how the expansion is done.
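
For reference, forming the else mask from the raw comparison and forming it
from the already-masked then mask give the same bits, which is what negating
the mask rather than the operation relies on.  A minimal scalar sketch, not
GCC code: it assumes an 8-lane predicate modelled as a uint8_t, and all the
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Else mask as in the dump above: loop_mask & ~cmp.  */
static uint8_t else_mask_from_cmp(uint8_t loop_mask, uint8_t cmp)
{
    return loop_mask & (uint8_t)~cmp;
}

/* Else mask derived from the masked then mask: loop_mask & ~(loop_mask & cmp).  */
static uint8_t else_mask_from_then(uint8_t loop_mask, uint8_t cmp)
{
    uint8_t then_mask = loop_mask & cmp;
    return loop_mask & (uint8_t)~then_mask;
}

/* Exhaustively compare both forms over every 8-lane loop mask and
   comparison result.  */
static void check_masks(void)
{
    for (unsigned l = 0; l < 256; l++)
        for (unsigned c = 0; c < 256; c++)
            assert(else_mask_from_cmp((uint8_t)l, (uint8_t)c)
                   == else_mask_from_then((uint8_t)l, (uint8_t)c));
}
```

Calling check_masks() from a test driver passes for all 65536 combinations,
since L & ~(L & C) simplifies to L & ~C.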

>   /* See whether another part of the vectorized code applies a loop
>      mask to the condition, or to its inverse.  */
>
> but it would need extending to handle this case.

This code is fine as far as I can tell, but there's nothing you can do here.
The mask it needs is ~original, so it does not find an inverse mask to use,
because it has to honor floating point exceptions.

And indeed `-fno-trapping-math` or `-Ofast` generates the optimal sequence,
but when honoring traps it can't invert and reuse the existing mask, which
leaves the operation unpredicated.

So is what you're requesting that it looks inside unary operators and tries to
CSE the thing they point to?  In which case, isn't that about the same as what
I had before, except that the vectorizer does the CSE itself?

If that's the case, maybe it's better to do the lookups into
loop_vinfo->scalar_cond_masked_set in prepare_load_store_mask, so that it
applies to everything?

Thanks,
Tamar

>
> Thanks,
> Richard