Date: Mon, 14 Jun 2021 14:43:01 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH][RFC]AArch64 SVE: Fix multiple comparison masks on inverted operands
List-Id: Gcc-patches mailing list

Hi All,

This RFC is trying to address the following inefficiency when vectorizing
conditional statements with SVE.

Consider the case

void f10(double * restrict z, double * restrict w, double * restrict x,
         double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
    }
}

For which we currently generate at -O3:

f10:
        cmp     w4, 0
        ble     .L1
        mov     x5, 0
        whilelo p1.d, wzr, w4
        ptrue   p3.b, all
.L3:
        ld1d    z1.d, p1/z, [x1, x5, lsl 3]
        fcmgt   p2.d, p1/z, z1.d, #0.0
        fcmgt   p0.d, p3/z, z1.d, #0.0
        ld1d    z2.d, p2/z, [x2, x5, lsl 3]
        bic     p0.b, p3/z, p1.b, p0.b
        ld1d    z0.d, p0/z, [x3, x5, lsl 3]
        fsub    z0.d, p0/m, z0.d, z1.d
        movprfx z0.d, p2/m, z1.d
        fadd    z0.d, p2/m, z0.d, z2.d
        st1d    z0.d, p1, [x0, x5, lsl 3]
        incd    x5
        whilelo p1.d, w5, w4
        b.any   .L3
.L1:
        ret

Notice that the condition for the else branch duplicates the same predicate as
the then branch and then uses BIC to negate the results.
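As a rough illustration of why the second compare is redundant, the sketch
below models the predicates of the loop above as small lane bitmasks in plain
C.  It is not part of the patch: LANES, f10_scalar and f10_masked are
hypothetical names, the vector length is an assumption, and the model only
covers which lanes each instruction activates.  It forms the else mask once the
way the listing does (second fcmgt under p3, then BIC) and once from p2 alone,
and asserts that the two agree.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumed number of doubles per vector; illustrative only */

/* Scalar reference: the loop from f10, minus the restrict qualifiers.  */
static void f10_scalar(double *z, const double *w, const double *x,
                       const double *y, int n)
{
  for (int i = 0; i < n; i++)
    z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
}

/* Lane-mask model of the loop body; bit l of a mask stands for lane l.  */
static void f10_masked(double *z, const double *w, const double *x,
                       const double *y, int n)
{
  const uint8_t p3 = (1u << LANES) - 1;         /* ptrue: every lane */
  for (int base = 0; base < n; base += LANES) {
    uint8_t p1 = 0, gt = 0;
    for (int l = 0; l < LANES; l++) {
      if (base + l < n) {
        p1 |= (uint8_t)(1u << l);               /* whilelo: lane in range */
        if (w[base + l] > 0)
          gt |= (uint8_t)(1u << l);             /* raw w > 0 result */
      }
      /* Out-of-range lanes stay 0 in gt: the zeroing load gives 0 > 0.  */
    }
    uint8_t p2 = p1 & gt;                       /* fcmgt p2.d, p1/z */
    uint8_t p0 = p3 & gt;                       /* fcmgt p0.d, p3/z */
    uint8_t else_bic = p3 & p1 & (uint8_t)~p0;  /* bic p0.b, p3/z, p1.b, p0.b */
    uint8_t else_from_p2 = p1 & (uint8_t)~p2;   /* no second compare needed */
    assert(else_bic == else_from_p2);
    for (int l = 0; l < LANES; l++) {
      int i = base + l;
      if (p2 & (1u << l))
        z[i] = x[i] + w[i];                     /* then lanes */
      else if (else_bic & (1u << l))
        z[i] = y[i] - w[i];                     /* else lanes */
    }
  }
}
```

Calling f10_masked and f10_scalar on the same inputs should fill z with the
same values, and the assert inside the model should never fire, which is the
sense in which the else predicate can be derived from p2 and the loop mask.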
The reason for this is that during instruction generation in the vectorizer we
emit

  mask__41.11_66 = vect__4.10_64 > vect_cst__65;
  vec_mask_and_69 = mask__41.11_66 & loop_mask_63;
  vec_mask_and_71 = mask__41.11_66 & loop_mask_63;
  mask__43.16_73 = ~mask__41.11_66;
  vec_mask_and_76 = mask__43.16_73 & loop_mask_63;
  vec_mask_and_78 = mask__43.16_73 & loop_mask_63;

which ultimately gets optimized to

  mask__41.11_66 = vect__4.10_64 > { 0.0, ... };
  vec_mask_and_69 = loop_mask_63 & mask__41.11_66;
  mask__43.16_73 = ~mask__41.11_66;
  vec_mask_and_76 = loop_mask_63 & mask__43.16_73;

Notice how the negate is on the operation and not the predicate resulting from
the operation.  When this is expanded this turns into RTL where the negate is on
the compare directly.  This means the RTL is different from the one without the
negate and so CSE is unable to recognize that they are essentially the same
operation.

To fix this my patch changes it so you negate the mask rather than the operation

  mask__41.13_55 = vect__4.12_53 > { 0.0, ... };
  vec_mask_and_58 = loop_mask_52 & mask__41.13_55;
  vec_mask_op_67 = ~vec_mask_and_58;
  vec_mask_and_65 = loop_mask_52 & vec_mask_op_67;

which means the negate ends up on the masked operation.  This removes the
additional comparisons

f10:
        cmp     w4, 0
        ble     .L1
        mov     x5, 0
        whilelo p0.d, wzr, w4
        ptrue   p3.b, all
        .p2align 5,,15
.L3:
        ld1d    z1.d, p0/z, [x1, x5, lsl 3]
        fcmgt   p1.d, p0/z, z1.d, #0.0
        bic     p2.b, p3/z, p0.b, p1.b
        ld1d    z2.d, p1/z, [x2, x5, lsl 3]
        ld1d    z0.d, p2/z, [x3, x5, lsl 3]
        fsub    z0.d, p2/m, z0.d, z1.d
        movprfx z0.d, p1/m, z1.d
        fadd    z0.d, p1/m, z0.d, z2.d
        st1d    z0.d, p0, [x0, x5, lsl 3]
        incd    x5
        whilelo p0.d, w5, w4
        b.any   .L3
.L1:
        ret

But this is still not optimal.  The problem is the BIC pattern, aarch64_pred__z,
which will replace the NOT and AND with BIC.
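For reference, the three ways of forming the else predicate in this discussion
can be compared with a small bitmask model in C.  This is only a sketch under
the usual reading of the zeroing predicate forms (AND gives pg & pn & pm, NOT
gives pg & ~pn, BIC gives pg & pn & ~pm); the helper names are hypothetical and
nothing here is taken from the GCC sources.  It checks a separate NOT followed
by an AND with the loop mask, the fused BIC governed by an all-true predicate,
and a single NOT governed by the loop mask, over every 8-lane combination.

```c
#include <assert.h>
#include <stdint.h>

/* and pd.b, pg/z, pn.b, pm.b  ->  pd = pg & pn & pm  */
static uint8_t pred_and_z(uint8_t pg, uint8_t pn, uint8_t pm)
{
  return pg & pn & pm;
}

/* not pd.b, pg/z, pn.b  ->  pd = pg & ~pn  */
static uint8_t pred_not_z(uint8_t pg, uint8_t pn)
{
  return pg & (uint8_t)~pn;
}

/* bic pd.b, pg/z, pn.b, pm.b  ->  pd = pg & pn & ~pm  */
static uint8_t pred_bic_z(uint8_t pg, uint8_t pn, uint8_t pm)
{
  return pg & pn & (uint8_t)~pm;
}

/* Count (loop mask, compare result) pairs where the three forms differ.  */
static int count_else_mask_mismatches(void)
{
  const uint8_t all = 0xFF;                 /* ptrue p3.b, all */
  int mismatches = 0;
  for (unsigned loop = 0; loop < 256; loop++)
    for (unsigned cmp = 0; cmp < 256; cmp++) {
      uint8_t p0 = (uint8_t)loop;           /* loop mask */
      uint8_t p1 = p0 & (uint8_t)cmp;       /* fcmgt p1.d, p0/z */
      uint8_t not_then_and = pred_and_z(all, p0, pred_not_z(all, p1));
      uint8_t fused_bic = pred_bic_z(all, p0, p1);
      uint8_t governed_not = pred_not_z(p0, p1);
      if (not_then_and != fused_bic || fused_bic != governed_not)
        mismatches++;
    }
  return mismatches;
}

/* Convenience self-check: aborts if any combination disagrees.  */
static void check_else_mask_forms(void)
{
  assert(count_else_mask_mismatches() == 0);
}
```

In this model all three forms select the same lanes, so the choice between them
is about instruction count and register pressure rather than semantics: the
BIC form needs the extra all-true governing predicate, while the NOT form can
reuse the loop mask.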
However in this case since p1 is the result of a predicate operation on p0 the
BIC should instead be a NOT.  Disabling the pattern for combine (adding
&& reload_completed) gives me the codegen I'm after:

f10:
        cmp     w4, 0
        ble     .L1
        mov     x5, 0
        whilelo p0.d, wzr, w4
        .p2align 5,,15
.L3:
        ld1d    z1.d, p0/z, [x1, x5, lsl 3]
        fcmgt   p1.d, p0/z, z1.d, #0.0
        not     p2.b, p0/z, p1.b
        ld1d    z2.d, p1/z, [x2, x5, lsl 3]
        ld1d    z0.d, p2/z, [x3, x5, lsl 3]
        fsub    z0.d, p2/m, z0.d, z1.d
        movprfx z0.d, p1/m, z1.d
        fadd    z0.d, p1/m, z0.d, z2.d
        st1d    z0.d, p0, [x0, x5, lsl 3]
        incd    x5
        whilelo p0.d, w5, w4
        b.any   .L3
.L1:
        ret

This uses a NOT predicated on p0 instead, which is what the code was
pre-combine, and it also removes the need for the third predicate p3 that the
BIC case has.

I can't use combine to remove the BIC since in the case above the fcmgt isn't
single use so combine won't try.  Of course disabling the early recog for BIC
isn't ideal since you miss genuine BICs.

Any feedback on the approach and how to fix the BIC issue?  I did try an
approach using combine where I matched against the full sequence and split it
early in combine.  This works for the case above but falls apart in other cases
where the cmp isn't single use from the start.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues, but I would
like to solve the remaining issues.

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.c (prepare_load_store_mask): Expand unary operators
	on the mask instead of the operation.
--- inline copy of patch --
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index eeef96a2eb60853e9c18a288af9e49ae9ad65128..35e5212c77d7cb26b1a2b9645cbac22c30078fb8 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1785,8 +1785,38 @@ prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
 
   gcc_assert (TREE_TYPE (loop_mask) == mask_type);
   tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
+  tree final_mask = vec_mask;
+
+  /* Check if what vec_mask is pointing at is a unary operator and if so
+     expand the operand before the mask and not on the operation to allow
+     for better CSE.  */
+  if (TREE_CODE (vec_mask) == SSA_NAME)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (vec_mask);
+      if (is_gimple_assign (stmt)
+          && gimple_assign_rhs_class (stmt) == GIMPLE_UNARY_RHS)
+        {
+          tree_code code = gimple_assign_rhs_code (stmt);
+          tree pred_op = gimple_assign_rhs1 (stmt);
+
+          /* Predicate the operation first.  */
+          gimple *pred_stmt;
+          tree pred_res1 = make_temp_ssa_name (mask_type, NULL, "vec_mask_op");
+          pred_stmt = gimple_build_assign (pred_res1, BIT_AND_EXPR,
+                                           pred_op, loop_mask);
+          gsi_insert_before (gsi, pred_stmt, GSI_SAME_STMT);
+
+          /* Now move the operation to the top and predicate it.  */
+          tree pred_res2 = make_temp_ssa_name (mask_type, NULL, "vec_mask_op");
+          pred_stmt = gimple_build_assign (pred_res2, code,
+                                           pred_res1);
+          gsi_insert_before (gsi, pred_stmt, GSI_SAME_STMT);
+          final_mask = pred_res2;
+        }
+    }
+
   gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
-                                          vec_mask, loop_mask);
+                                          final_mask, loop_mask);
   gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
   return and_res;
 }