From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2082.outbound.protection.outlook.com [40.107.21.82]) by sourceware.org (Postfix) with ESMTPS id 54B343858C2B for ; Fri, 11 Nov 2022 18:52:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 54B343858C2B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=BL6VQtaRhuLiInHJfHMQOON/AaEdus/s9w7cXNUOx/1vtMDLR7Y2v+gk74xEpFua61HJU/KlcIfpJa8M3ti1ei2d+CqCam3Hvq/HjlxUuuF9b5RF+VPZPMN200m/gpZMyvQuBNyOR22ZmNmhbRMf4L/Xm5qeC9ijarHfJ9JBza/6FkUR3xaldGvOLkj8PNjlejdxWSORGWiRJ0ASCrAvUaSWSEFKxy4LNQRFJpsuygsTfKOCouw5wEsgnx9OTAi5Oo4+/UGX+0lcdNSyou69FtzHS1HKCeyKGD8b38MSlY9g4hak1DWmxmguiTHKSvNyNidFHDhLZfGmUdbVNf2Cnw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5lHc1j/m/cNgoh1jXOVIkMk1fiMZn5ulABWhAbgRTus=; b=C4z/jTGL3BUEWFKalHQlG0fqMpt7xV27SMProkoqHDXNcQwqe7zePaqRfSt6TMuUdLxSO2kZFKl+3XSpIvhBsScImL2qsFlYmNJEiWEBWlKoDpBjyW0TzmNFBeUdebRh6uL8W7rWPJDxs5+40CitViOwZ3HRYOJZPY/kOa4h6UXi9a7k9YMLQOaVKC4+k6VNfGqQYfQVTjE3FvlEYmNerCFLFvDNT+xmt07ozvzRsthWjUN8g8ka1/bHjWE/JC7MLcOafFSuznbwa4wTq+2IkWMqO0YovCyNLUjgtaAZYTVDFVoZBjVTZDqFwTano5TKBpoNiAHIBi9tK338J0uThw== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=5lHc1j/m/cNgoh1jXOVIkMk1fiMZn5ulABWhAbgRTus=; b=I+62brYwHfIDbBPvLNcvM40NaP4MW494LdPFW/iI4RY5iZNWeR9q1hyq/i7+jTBrs3+KSrTpUxGiU/pfKtUNV0mx83yg5CbSuQsXmzjUToK38sylSbH3l/JYqP21sN0xVodEG1USU247jcp4FFDU4UfogwlINtyKXQM6eTovL/k= Received: from AS9PR07CA0010.eurprd07.prod.outlook.com (2603:10a6:20b:46c::32) by AS2PR08MB10083.eurprd08.prod.outlook.com (2603:10a6:20b:647::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5813.13; Fri, 11 Nov 2022 18:52:24 +0000 Received: from AM7EUR03FT031.eop-EUR03.prod.protection.outlook.com (2603:10a6:20b:46c:cafe::df) by AS9PR07CA0010.outlook.office365.com (2603:10a6:20b:46c::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5813.15 via Frontend Transport; Fri, 11 Nov 2022 18:52:24 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT031.mail.protection.outlook.com (100.127.140.84) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5813.12 via Frontend Transport; Fri, 11 Nov 2022 18:52:24 +0000 Received: ("Tessian outbound f394866f3f2b:v130"); Fri, 11 Nov 2022 18:52:24 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 28f3ed8f92b646f9 X-CR-MTA-TID: 64aa7808 Received: from ba9e08d33d68.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 935EF38C-63E2-4771-AA9A-497F5612BE61.1; Fri, 11 Nov 2022 18:52:17 +0000 Received: from EUR04-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id ba9e08d33d68.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 11 Nov 2022 18:52:17 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=kBoPzKlsphhZJZOvh4iNuFCu3GOFLadhYB8x7rpN/kJgW4TFfqgHCEUVigFS0nku1P9VwRBAsWSgk7FjoJAC1L6MPEBJg93iKDlByhf5pB4cgWZTNpqKDRH0bbSHN+1KKPi/Ev3hGHnb7YgHIKFfSm0oD3vjVMg3rQHXmi9543r+weG/gMcJy7CtsNeeFVdiJSp4S6gO57Ma8om8PfeBB+7ZE4NVjdNG4EaUzgPD0wZEQr+h8Gt/E67iEeQaL7w9gRv4XQgWTfGorAFtOMTRJ6BeZ/X5Lbow7IfbgBt1PR4puAJDCNGWr5b6GIyQQhjGMExkyWeDcjPQRa8p2DX8Xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5lHc1j/m/cNgoh1jXOVIkMk1fiMZn5ulABWhAbgRTus=; b=cXn8eFowk3J0tgGLi1RdSFbXR3roUMKwnlJf3VEkcJpVho2Ar7bdx8qIwDerdmNkomOUAyz1cB8YGUN7THckYGEPqrexc/sKOa8RIYWPCNQ1L7uu7sZjN1y1yWHso07D8fnS0V15WIqW7kfWih2kuLFyHXgEbv15zb7bde6V+RY5lAe2q7pIhZQlK50A5C1xwwiDYqDXLWItVQ4yIGw/Bmzl22BvwlgGX9fTruOumQeywW4+0fEPBYE4nYidesZeMyEfpwL4reiztLrWKPZgJ5kaUC8lKPAK9n9VaZwGK22Ih/G9+zX305xhZ1XZgv/JDFwfMJc4lsht4QD10ckPXQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=5lHc1j/m/cNgoh1jXOVIkMk1fiMZn5ulABWhAbgRTus=; b=I+62brYwHfIDbBPvLNcvM40NaP4MW494LdPFW/iI4RY5iZNWeR9q1hyq/i7+jTBrs3+KSrTpUxGiU/pfKtUNV0mx83yg5CbSuQsXmzjUToK38sylSbH3l/JYqP21sN0xVodEG1USU247jcp4FFDU4UfogwlINtyKXQM6eTovL/k= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from PAXPR08MB6686.eurprd08.prod.outlook.com (2603:10a6:102:13e::8) by AS8PR08MB7790.eurprd08.prod.outlook.com (2603:10a6:20b:527::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5813.12; Fri, 11 Nov 2022 18:52:14 +0000 Received: from PAXPR08MB6686.eurprd08.prod.outlook.com ([fe80::44db:f442:b299:a701]) by PAXPR08MB6686.eurprd08.prod.outlook.com ([fe80::44db:f442:b299:a701%3]) with mapi id 15.20.5813.013; Fri, 11 Nov 2022 18:52:14 +0000 Date: Fri, 11 Nov 2022 18:50:12 +0000 From: Andrew Carlotti To: gcc-patches@gcc.gnu.org Subject: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition Message-ID: References: Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LNXP265CA0070.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:5d::34) To PAXPR08MB6686.eurprd08.prod.outlook.com (2603:10a6:102:13e::8) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: PAXPR08MB6686:EE_|AS8PR08MB7790:EE_|AM7EUR03FT031:EE_|AS2PR08MB10083:EE_ X-MS-Office365-Filtering-Correlation-Id: 6e28c3ec-a588-4066-5db7-08dac415df7b x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: k/mSE8JWpMDG92H4u6gDes/o7UsWbWnuEOY06/Xw23Po3dSOvbwq7RUcrGwbfm29NRSi/joo5Yfk7QtYvlwGp/e3+bxMF/hOghS6AyJzeVpCE6B0m8swW+A/4qyhKdiDzTuo9p7adOx3quI8Gp+zvRs+IT6WCRFo+tF3rHtuwhzBPeKhgwQNvirzZ2ZH2U6sUR978k/EfWlXP0LfJrAyCXrwLsDiQoEODzpykJfmFuupABn+3lxqHEbxIyUDSx351jDPBXo0eq3+WcjntdGSAwtC+dI33C02L7dn4cdKjun3wm01iZo8oevlBJ2gnWfBqgWNEfJcoFLX0SYEIne3m8wztMbcst5FBVSmQ9oZF8gC9GJrvzIhFIpnc7jIYNjHGpI3fAUX1iN6RM1y8Z6yiZlvobo6KKwjec32wiGnVGWi14PWIfL47Lwl+Ltp5qI9LzcuDPNjfGvMP48TcbceR0vD9AkX0/DBfgdz8jbBEC88rgAZd6SNyulea80rgNMJzqmcjLpZaNZ9+dZSKS3iRidXX4h8YIkJeTu/ADB5s/CjQ7s1ePAvgUJi6++13ZGJ2w9SI8UOQHDTdcWT0jgFghUmh/RYlIffhmo9ksvaCg3o9risxtcjMCAQNCqiKE/3xC5RTDAPtV+70hni16PtTCkXjWH04sYfrGbZDQqtP9u/ox7RsjoUiZ+jga8GfQgd0LSKemxRrKFkqk2JqF+5OrjUrwnoYJ/UUrMUGDDe/bW0ur6iEi8YVwenHniTHqmbbpx2an6/+Vh2seC+e4RgqA== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PAXPR08MB6686.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(4636009)(376002)(136003)(396003)(39860400002)(346002)(366004)(451199015)(8936002)(84970400001)(6916009)(30864003)(316002)(44832011)(41300700001)(478600001)(83380400001)(5660300002)(6486002)(66476007)(66946007)(8676002)(2906002)(66556008)(6512007)(86362001)(26005)(186003)(6506007)(38100700002);DIR:OUT;SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB7790 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM7EUR03FT031.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 07bcd8fd-5215-47aa-7ac8-08dac415d930 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: vAr6JU4bCs1/cvZlZCoqyhY9TiZpeQlpRTaC39XduJhd5+789cFz7YeICBhafh63VtQMDPYa9Nf+H16Ox3yT0OCe4V8Zukox65tVpJWVJw7ZqeHcvqoS3hHbQFVxTfDoxnc8wwYu+zIireYG14g2liBTXADRoR5GrhgIsm54svTI//Fmq6HKircFdcWwxQWa0Olpgcy2qIbT5UytKWov9prCijRAjHc6mG4ryK3WV72KiFYe+lBic8KK/ugMc+cUi/Z99N42Ze/yrYLpLfSbjwUeRW/yRZv32X1xN97h+vAsQAU5pVHBjtygWqDlB49iVad5ch5folM8OD6GWLAr8HO4KmgOGn4LI80QG9HbDtBzqpJNJap7rHOIB+Ju4xvveYPVOr4hqkMFtgfPyAao0RYQCpeUo2dDyOD+Q1I3VwbFAEV16O+c+PExROt7THcYfIeAkI6Usm2+NLCQ3U83QGqtbsUW7lUEDtbShrD+0dwxg30vb88o6vNK2QmV9yPXDWHkSl1MKGTJZmlaBn+9e7B1QlY5uGMk1L9shiglLyOiQHdlsU25203kD1bAPo4R6NcktbSA0Zlagt2aFExyoMLiwhOrbMqcIxWsfwTK0tcxdj7wScaFD4czHvUGxZ86r8KqlKo/uezzMmb84pHDMhlWDNekJotZ+SHdp+NU11KQQ8Xko4hqzGqiwYYc9WE2vs/4liYEZ9w4nNMscKAq0c9h7YHO2RTBXN5NpuBp6Y/5jfPckXZoJINsUt6jbNxRD8od7+gq8XKJ4JnvRxzJVE1gQCebjExS8I6qWyHYSPkVl1x7EjIXNTdDZxNv+fF1gvEP1Ux1ANUEjWGLS4o7/Q== X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230022)(4636009)(39860400002)(396003)(346002)(376002)(136003)(451199015)(40470700004)(46966006)(36840700001)(82740400003)(40480700001)(40460700003)(86362001)(6916009)(316002)(70586007)(81166007)(70206006)(6486002)(356005)(30864003)(5660300002)(44832011)(83380400001)(8676002)(41300700001)(47076005)(2906002)(8936002)(478600001)(6512007)(6506007)(26005)(36860700001)(186003)(82310400005)(336012)(84970400001);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Nov 2022 18:52:24.8326 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 6e28c3ec-a588-4066-5db7-08dac415df7b X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT031.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB10083 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,GIT_PATCH_0,KAM_DMARC_NONE,KAM_LOTSOFHASH,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This recognises patterns of the form: while (n) { n >>= 1 } This patch results in improved (but still suboptimal) codegen: foo (unsigned int b) { int c = 0; while (b) { b >>= 1; c++; } return c; } foo: .LFB11: .cfi_startproc cbz w0, .L3 clz w1, w0 tst x0, 1 mov w0, 32 sub w0, w0, w1 csel w0, w0, wzr, ne ret The conditional is unnecessary. phiopt could recognise a redundant csel (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a clz call, but it cannot recognise the redunancy when the input is (e.g.) (32 - clz). I could perhaps extend this function to recognise this pattern in a later patch, if this is a good place to recognise more patterns. gcc/ChangeLog: * tree-scalar-evolution.cc (expression_expensive_p): Add checks for c[lt]z optabs. * tree-ssa-loop-niter.cc (build_cltz_expr): New. (number_of_iterations_cltz_complement): New. (number_of_iterations_bitcount): Add call to the above. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_clz) (check_effective_target_clzl, check_effective_target_clzll) (check_effective_target_ctz, check_effective_target_clzl) (check_effective_target_ctzll): New. * gcc.dg/tree-ssa/cltz-complement-max.c: New test. * gcc.dg/tree-ssa/clz-complement-char.c: New test. * gcc.dg/tree-ssa/clz-complement-int.c: New test. * gcc.dg/tree-ssa/clz-complement-long-long.c: New test. * gcc.dg/tree-ssa/clz-complement-long.c: New test. * gcc.dg/tree-ssa/ctz-complement-char.c: New test. * gcc.dg/tree-ssa/ctz-complement-int.c: New test. * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test. * gcc.dg/tree-ssa/ctz-complement-long.c: New test. -- diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c new file mode 100644 index 0000000000000000000000000000000000000000..1a29ca52e42e50822e4e3213b2cb008b766d0318 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/cltz-complement-max.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-tree-loop-optimize -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__) + +int clz_complement_count1 (unsigned char b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + if (c <= PREC) + return 0; + else + return 34567; +} + +int clz_complement_count2 (unsigned char b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + if (c <= PREC - 1) + return 0; + else + return 76543; +} + +int ctz_complement_count1 (unsigned char b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + if (c <= PREC) + return 0; + else + return 23456; +} + +int ctz_complement_count2 (unsigned char b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + if (c <= PREC - 1) + return 0; + else + return 65432; +} +/* { dg-final { scan-tree-dump-times "34567" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "76543" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "23456" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "65432" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c new file mode 100644 index 0000000000000000000000000000000000000000..2ebe8fabcaf0ce88f3a6a46e9ba4ba79b7d3672e --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-char.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target clz } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned char b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(5) != 3) + __builtin_abort (); + if (foo(255) != 8) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c new file mode 100644 index 0000000000000000000000000000000000000000..f2c5c23f6a7d84ecb637c6961698b0fc30d7426b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-int.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target clz } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_INT__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned int b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(5) != 3) + __builtin_abort (); + if (foo(1 << (PREC - 1)) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long-long.c b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long-long.c new file mode 100644 index 0000000000000000000000000000000000000000..7f7793f0efac1f0d793e6e99b84988e5cc5221c9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long-long.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target clzll } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned long long b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(5) != 3) + __builtin_abort (); + if (foo(1LL << (PREC - 1)) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long.c b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long.c new file mode 100644 index 0000000000000000000000000000000000000000..97161bb7a74260bea20e325ebab64acb33a2b696 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/clz-complement-long.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target clzl } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_LONG__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned long b) { + int c = 0; + + while (b) { + b >>= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(5) != 3) + __builtin_abort (); + if (foo(1L << (PREC - 1)) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c new file mode 100644 index 0000000000000000000000000000000000000000..b9afe8852d8ffbc7ee9a0760cf04b8f98af293a2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ctz } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned char b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(96) != PREC - 5) + __builtin_abort (); + if (foo(35) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c new file mode 100644 index 0000000000000000000000000000000000000000..d2702a65daf34db66550d2255395db68a29a4797 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ctz } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_INT__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned int b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(96) != PREC - 5) + __builtin_abort (); + if (foo(35) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c new file mode 100644 index 0000000000000000000000000000000000000000..1ea0d5d7d9f8be1824c4177c33edd91e66b4ddab --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ctzll } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned long long b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(96) != PREC - 5) + __builtin_abort (); + if (foo(35) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c new file mode 100644 index 0000000000000000000000000000000000000000..80fb02dcfa68bc022ae69b26fb189323e01fc6fc --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-require-effective-target ctzl } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define PREC (__CHAR_BIT__ * __SIZEOF_LONG__) + +int +__attribute__ ((noinline, noclone)) +foo (unsigned long b) { + int c = 0; + + while (b) { + b <<= 1; + c++; + } + + return c; +} + +int main() +{ + if (foo(0) != 0) + __builtin_abort (); + if (foo(96) != PREC - 5) + __builtin_abort (); + if (foo(35) != PREC) + __builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index c7f583d6d1498401a7c106ed3f539dcd04f95451..325f12d62324793d6b2cf55b074ef6cc9cf4dd4d 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -8687,6 +8687,72 @@ proc check_effective_target_popcount { } { } "" ] } +# Return 1 if the target supports clz on int. + +proc check_effective_target_clz { } { + return [check_no_messages_and_pattern clz "!\\(call" rtl-expand { + int foo (int b) + { + return __builtin_clz (b); + } + } "" ] +} + +# Return 1 if the target supports clz on long. + +proc check_effective_target_clzl { } { + return [check_no_messages_and_pattern clzl "!\\(call" rtl-expand { + int foo (long b) + { + return __builtin_clzl (b); + } + } "" ] +} + +# Return 1 if the target supports clz on long long. + +proc check_effective_target_clzll { } { + return [check_no_messages_and_pattern clzll "!\\(call" rtl-expand { + int foo (long long b) + { + return __builtin_clzll (b); + } + } "" ] +} + +# Return 1 if the target supports ctz on int. + +proc check_effective_target_ctz { } { + return [check_no_messages_and_pattern ctz "!\\(call" rtl-expand { + int foo (int b) + { + return __builtin_ctz (b); + } + } "" ] +} + +# Return 1 if the target supports ctz on long. + +proc check_effective_target_ctzl { } { + return [check_no_messages_and_pattern ctzl "!\\(call" rtl-expand { + int foo (long b) + { + return __builtin_ctzl (b); + } + } "" ] +} + +# Return 1 if the target supports ctz on long long. + +proc check_effective_target_ctzll { } { + return [check_no_messages_and_pattern ctzll "!\\(call" rtl-expand { + int foo (long long b) + { + return __builtin_ctzll (b); + } + } "" ] +} + # Return 1 if the target supports atomic operations on "long long" # and can execute them. # diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc index 7e2a3e986619de87e4ae9daf16198be1f13b917c..1ac9526c69b5fe80c26022f2fa1176d222e2dfb9 100644 --- a/gcc/tree-scalar-evolution.cc +++ b/gcc/tree-scalar-evolution.cc @@ -3406,12 +3406,21 @@ expression_expensive_p (tree expr, hash_map &cache, library call for popcount when backend does not have an instruction to do so. We consider this to be expensive and generate __builtin_popcount only when backend defines it. */ + optab optab; combined_fn cfn = get_call_combined_fn (expr); switch (cfn) { CASE_CFN_POPCOUNT: + optab = popcount_optab; + goto bitcount_call; + CASE_CFN_CLZ: + optab = clz_optab; + goto bitcount_call; + CASE_CFN_CTZ: + optab = ctz_optab; +bitcount_call: /* Check if opcode for popcount is available in the mode required. */ - if (optab_handler (popcount_optab, + if (optab_handler (optab, TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0)))) == CODE_FOR_nothing) { @@ -3424,7 +3433,7 @@ expression_expensive_p (tree expr, hash_map &cache, instructions. */ if (is_a (mode, &int_mode) && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD - && (optab_handler (popcount_optab, word_mode) + && (optab_handler (optab, word_mode) != CODE_FOR_nothing)) break; return true; diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index fece876099c1687569d6351e7d2416ea6acae5b5..16e8e25919d808cea27adbd72f0be01ae2f0e547 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -2198,6 +2198,195 @@ number_of_iterations_popcount (loop_p loop, edge exit, return true; } +/* Return an expression that counts the leading/trailing zeroes of src. */ + +static tree +build_cltz_expr (tree src, bool leading, bool defined_at_zero) +{ + tree fn; + int prec = TYPE_PRECISION (TREE_TYPE (src)); + int i_prec = TYPE_PRECISION (integer_type_node); + int li_prec = TYPE_PRECISION (long_integer_type_node); + int lli_prec = TYPE_PRECISION (long_long_integer_type_node); + if (prec <= i_prec) + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZ) + : builtin_decl_implicit (BUILT_IN_CTZ); + else if (prec == li_prec) + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZL) + : builtin_decl_implicit (BUILT_IN_CTZL); + else if (prec == lli_prec || prec == 2 * lli_prec) + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZLL) + : builtin_decl_implicit (BUILT_IN_CTZLL); + else + return NULL_TREE; + + tree utype = unsigned_type_for (TREE_TYPE (src)); + src = fold_convert (utype, src); + if (prec < i_prec) + src = fold_convert (unsigned_type_node, src); + + tree call; + if (prec == 2 * lli_prec) + { + tree src1 = fold_convert (long_long_unsigned_type_node, + fold_build2 (RSHIFT_EXPR, TREE_TYPE (src), + unshare_expr (src), + build_int_cst (integer_type_node, + lli_prec))); + tree src2 = fold_convert (long_long_unsigned_type_node, src); + /* We count the zeroes in src1, and add the number in src2 when src1 + is 0. */ + if (!leading) + std::swap(src1, src2); + tree call1 = build_call_expr (fn, 1, src1); + tree call2 = build_call_expr (fn, 1, src2); + if (defined_at_zero) + { + tree is_zero2 = fold_build2 (NE_EXPR, boolean_type_node, src2, + build_zero_cst (TREE_TYPE (src2))); + call2 = fold_build3(COND_EXPR, integer_type_node, is_zero2, call2, + build_int_cst (integer_type_node, lli_prec)); + } + tree is_zero1 = fold_build2 (NE_EXPR, boolean_type_node, src1, + build_zero_cst (TREE_TYPE (src1))); + call = fold_build3(COND_EXPR, integer_type_node, is_zero1, call1, + fold_build2 (PLUS_EXPR, integer_type_node, call2, + build_int_cst (integer_type_node, + lli_prec))); + } + else + { + call = build_call_expr (fn, 1, src); + if (defined_at_zero) + { + tree is_zero = fold_build2 (NE_EXPR, boolean_type_node, src, + build_zero_cst (TREE_TYPE (src))); + call = fold_build3(COND_EXPR, integer_type_node, is_zero, call, + build_int_cst (integer_type_node, prec)); + } + } + + if (leading && prec < i_prec) + call = fold_build2(MINUS_EXPR, integer_type_node, call, + build_int_cst (integer_type_node, + i_prec - prec)); + + return call; +} + +/* See comment below for number_of_iterations_bitcount. + For c[lt]z complement, we have: + + modify: + iv_2 = iv_1 >> 1 OR iv_1 << 1 + + test: + if (iv != 0) + + modification count: + src precision - c[lt]z (src) + + */ + +static bool +number_of_iterations_cltz_complement (loop_p loop, edge exit, + enum tree_code code, + class tree_niter_desc *niter) +{ + bool modify_before_test = true; + HOST_WIDE_INT max; + + /* Check that condition for staying inside the loop is like + if (iv != 0). */ + gimple *cond_stmt = last_stmt (exit->src); + if (!cond_stmt + || gimple_code (cond_stmt) != GIMPLE_COND + || code != NE_EXPR + || !integer_zerop (gimple_cond_rhs (cond_stmt)) + || TREE_CODE (gimple_cond_lhs (cond_stmt)) != SSA_NAME) + return false; + + tree iv_2 = gimple_cond_lhs (cond_stmt); + gimple *iv_2_stmt = SSA_NAME_DEF_STMT (iv_2); + + /* If the test comes before the iv modification, then these will actually be + iv_1 and a phi node. */ + if (gimple_code (iv_2_stmt) == GIMPLE_PHI + && gimple_bb (iv_2_stmt) == loop->header + && gimple_phi_num_args (iv_2_stmt) == 2 + && (TREE_CODE (gimple_phi_arg_def (iv_2_stmt, + loop_latch_edge (loop)->dest_idx)) + == SSA_NAME)) + { + /* iv_2 is actually one of the inputs to the phi. */ + iv_2 = gimple_phi_arg_def (iv_2_stmt, loop_latch_edge (loop)->dest_idx); + iv_2_stmt = SSA_NAME_DEF_STMT (iv_2); + modify_before_test = false; + } + + /* Make sure iv_2_stmt is a logical shift by one stmt: + iv_2 = iv_1 {>>|<<} 1 */ + if (!is_gimple_assign (iv_2_stmt) + || (gimple_assign_rhs_code (iv_2_stmt) != LSHIFT_EXPR + && (gimple_assign_rhs_code (iv_2_stmt) != RSHIFT_EXPR + || !TYPE_UNSIGNED (TREE_TYPE (gimple_assign_lhs (iv_2_stmt))))) + || !integer_onep (gimple_assign_rhs2 (iv_2_stmt))) + return false; + + bool left_shift = (gimple_assign_rhs_code (iv_2_stmt) == LSHIFT_EXPR); + + tree iv_1 = gimple_assign_rhs1 (iv_2_stmt); + + /* Check the recurrence. */ + gimple *phi = SSA_NAME_DEF_STMT (iv_1); + if (gimple_code (phi) != GIMPLE_PHI + || (gimple_bb (phi) != loop_latch_edge (loop)->dest) + || (iv_2 != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx))) + return false; + + /* We found a match. */ + tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx); + int src_precision = TYPE_PRECISION (TREE_TYPE (src)); + + /* Get the corresponding c[lt]z builtin. */ + tree expr = build_cltz_expr (src, !left_shift, true); + + if (!expr) + return false; + + expr = fold_build2 (MINUS_EXPR, integer_type_node, + build_int_cst (integer_type_node, src_precision), + expr); + + max = src_precision; + + tree may_be_zero = boolean_false_node; + + if (modify_before_test) + { + expr = fold_build2 (MINUS_EXPR, integer_type_node, expr, + integer_one_node); + max = max - 1; + may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, + build_zero_cst (TREE_TYPE (src))); + } + + expr = fold_convert (unsigned_type_node, expr); + + niter->assumptions = boolean_true_node; + niter->may_be_zero = simplify_using_initial_conditions (loop, may_be_zero); + niter->niter = simplify_using_initial_conditions (loop, expr); + + if (TREE_CODE (niter->niter) == INTEGER_CST) + niter->max = tree_to_uhwi (niter->niter); + else + niter->max = max; + + niter->bound = NULL_TREE; + niter->cmp = ERROR_MARK; + return true; +} + /* See if LOOP contains a bit counting idiom. The idiom consists of two parts: 1. A modification to the induction variabler;. 2. A test to determine whether or not to exit the loop. @@ -2244,7 +2433,8 @@ number_of_iterations_bitcount (loop_p loop, edge exit, enum tree_code code, class tree_niter_desc *niter) { - return number_of_iterations_popcount (loop, exit, code, niter); + return (number_of_iterations_popcount (loop, exit, code, niter) + || number_of_iterations_cltz_complement (loop, exit, code, niter)); } /* Substitute NEW_TREE for OLD in EXPR and fold the result.