From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2077.outbound.protection.outlook.com [40.107.22.77]) by sourceware.org (Postfix) with ESMTPS id 94B6E385223C for ; Mon, 21 Nov 2022 15:53:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 94B6E385223C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=vkJnf6SpwZAncwM4SUDTPfT1qP/57T9pGut9HbavP8U=; b=iZurzgioIYJZ6PDxBby3xBQxHE+rVv/bx+hEFAr7rXpxuQWMiTRGFkHKEuhKmFqlzK0/Ntt2KHsLH0cj5lz2SvjvjOjhN02Nt7apHbV1SZ2c7EHmkWzMCxRSkd+zney0HCE4ofTcjo4T9qQVZlSAAo8Xvx2age8UmQCG+idqCE4= Received: from AM5PR1001CA0057.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:206:15::34) by DBBPR08MB5995.eurprd08.prod.outlook.com (2603:10a6:10:20b::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.17; Mon, 21 Nov 2022 15:53:15 +0000 Received: from AM7EUR03FT017.eop-EUR03.prod.protection.outlook.com (2603:10a6:206:15:cafe::55) by AM5PR1001CA0057.outlook.office365.com (2603:10a6:206:15::34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.15 via Frontend Transport; Mon, 21 Nov 2022 15:53:15 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT017.mail.protection.outlook.com (100.127.140.184) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5834.8 via Frontend Transport; Mon, 21 Nov 2022 15:53:15 +0000 Received: ("Tessian outbound aeae1c7b66fd:v130"); Mon, 21 Nov 2022 15:53:15 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 7d50b2b922165cf3 X-CR-MTA-TID: 64aa7808 Received: from 0db919d3c26d.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 85A5B602-57EF-4BED-B515-CFC51C595C59.1; Mon, 21 Nov 2022 15:53:08 +0000 Received: from EUR04-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 0db919d3c26d.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 21 Nov 2022 15:53:08 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=of+xpjcbDJjlTcFr2YBWV27ttI7FoezoXk7NhxmZ9/Ksr8iXI0ue9ISxXo3fB/7WJr/CHzwU6zjAInnIDcC2ZVpgFt2l9v1lta2ygAIN+oFj6HGjDEz3iUqT0RdvzSlWvkFP/F3GSEZJdKjIRRLLvZPodFP1RCjo00WnPMVnM/T5+4DEt9exPgcFTq2xYzReiuinfdaD/X+MhRiFp36c2AZKuckwPfEdzxWmR6/7QdXS43H+IV9g/zh+QnSmqyp3/xkpkWubyuxInq/9Zg7Zldq9NYfxG0kKWB0iEc04rdgSwyb8/sT/XHs4K31H+LUQc6pwkBMPyhiuBIREx96gSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=vkJnf6SpwZAncwM4SUDTPfT1qP/57T9pGut9HbavP8U=; b=ofXnYT02pQslWwc3tW29MdPmwVMpW/ZwK/3bjotziTn2eXybmm9/lWWTrtrGvRNo4TpZCfxx3JB2x9aTTmtMV5LuUw1Vy/XnG9VOYWXVi9z3WjXZvQGaK01zTCrpCEzoq5a9NZDj364ens400mWbTxtxD+TJG4NA2CCQxSLYNWhkJ+YbNIHFtGIFi8zuUdjejrmJw0HGcURdEOVI78iG2VODUdwNp99Yv5kFi2DiyvkDFKS3tj2gN6d/gD/sZ7gkYpbwkxW/zNi4scixRLQTK/wJMBMQIpIZ3hBwHLfru1kh6n+4jS573elYo93Co2uyDd5tCZwIh+DfnDus4yT0cA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=vkJnf6SpwZAncwM4SUDTPfT1qP/57T9pGut9HbavP8U=; b=iZurzgioIYJZ6PDxBby3xBQxHE+rVv/bx+hEFAr7rXpxuQWMiTRGFkHKEuhKmFqlzK0/Ntt2KHsLH0cj5lz2SvjvjOjhN02Nt7apHbV1SZ2c7EHmkWzMCxRSkd+zney0HCE4ofTcjo4T9qQVZlSAAo8Xvx2age8UmQCG+idqCE4= Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from AS8PR08MB6678.eurprd08.prod.outlook.com (2603:10a6:20b:398::8) by AM9PR08MB6177.eurprd08.prod.outlook.com (2603:10a6:20b:283::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.17; Mon, 21 Nov 2022 15:53:06 +0000 Received: from AS8PR08MB6678.eurprd08.prod.outlook.com ([fe80::8256:29ca:bcd8:b754]) by AS8PR08MB6678.eurprd08.prod.outlook.com ([fe80::8256:29ca:bcd8:b754%5]) with mapi id 15.20.5857.017; Mon, 21 Nov 2022 15:53:06 +0000 Date: Mon, 21 Nov 2022 15:53:04 +0000 From: Andrew Carlotti To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition Message-ID: References: Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO2P265CA0115.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:c::31) To AS8PR08MB6678.eurprd08.prod.outlook.com (2603:10a6:20b:398::8) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: AS8PR08MB6678:EE_|AM9PR08MB6177:EE_|AM7EUR03FT017:EE_|DBBPR08MB5995:EE_ X-MS-Office365-Filtering-Correlation-Id: 161e1310-7199-424f-ab81-08dacbd8809e x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: euTbSKkLKSLTb1bwbq6QkEApLl0sfC/cZwhLWfu6uDsxdnqfdMd9LNLGR1A/cqDMlO8c+qijaG+E7iAWSqf5LQV8ucF5tk6dftH5HAtR7e0qClSQyl8pt9LDmBeVOWH7MJAA0/QTeGVO0MHwiured7nMlxbcg7/NBREQ5Rp/H7qtLrNo0RzYEWXjk3h0Tod4Wdg+U7+bfH2YpS6ZEYbYGFelwQnf73zhWPDlkBBw7PpmoZFX7d6ujV+YAP2SAd9+a6LsHfUqNR+j94FndoSFrie9Far1kxR//oLLvlYWqffEjlPthS73pVHt2JO/kt9oW5aQ0YU/J4/RV5GmBvm6GdVX3mVGAxKbOaJi0+xHxq71b/fcmghLm12MpyEXpnOSwEOilrhJSG0Z8jqwwHsMAHzubSFtTmRudrupjUM1ITwTn6Vse18okGdwwxtGewmKAs/tzCljGg5blnUqIr1GS1KEhE2xYpgChKKsgbbOsFqZaWAq1ssX2lXh+GrEQ/WaWbTaGKPlniflJDBzzZc8ikV47PgY5hnvgc4JkLdORyXitezZGuFmJtZ0I7i7MngbMfJNvI6ptaaXq2UxJeFxMS7PssftRLNzxVEpFccorhdXKuf0eqJQu2ghsaWQ1VGRdsrS9A0foKAaRSn0NHQSzRGOYO8qcxn1x12nIeE+x60ZSdSLhdOjEKNdTES2EGLlxjzCUyJKKlWFvfEDIWs8fAGjIvRD/fqV+m6iMQQKz+curG6C6x3z1RZp1GmM5Zb6 X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AS8PR08MB6678.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(4636009)(366004)(346002)(376002)(39860400002)(136003)(396003)(451199015)(41300700001)(44832011)(8936002)(6916009)(5660300002)(4326008)(66476007)(8676002)(66556008)(84970400001)(316002)(66946007)(6486002)(83380400001)(478600001)(6506007)(30864003)(26005)(2906002)(86362001)(6512007)(186003)(38100700002)(53546011)(67856001);DIR:OUT;SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR08MB6177 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM7EUR03FT017.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 3dca4a9c-5576-435c-c355-08dacbd87b0c X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 9ZncfH1IurpOBePnABXlg9qM8PMFuLn8XJXWnYJTYPMwZRKzUIgbaTkmYIefSVlv3h0HdwgItN+Df/ALeyoLptJtNHSWyyCdef8Pwv+vW9sgSEEDG2+SnK3/xkOPSYtkgWUe+UhekD9NuTY7ycUOpSzertX4Yp4SD//0ZPThSd0hSpkXLRTEGzRC279OfFy0OBGTYJXI4M0WDuWv/feopsxJ2OCQwEY/c2yNtC9WIxFU8A/pau5Lw3hkS/2aZqA1njEicFH/3mdiBR759Vl0W+LzxV9lNWSRN0p/CBV84VzNClrJWfrkThCfliyL5CdY9S/KpFgTmlbPflVPCtMf7N15EEYBpM9/zGJGV9DVzqd1a0ENGFKSO/Ga14cLEjQVbhQVIz4kWRJsFzd560EtAjRNReWfsCO3jgL++W/qI0Lp4PGOwc+KCxpwl7w+IMY2Djfp8lL8X+rksr41UNExfnZgT7mX8PR+IZKyniEbukMd+dROghkRG4PFkZV8+CYJ+7JQZ8bBTm7NIYa4QpylfDApGL9/eQG78ebghKW273hWqtB9/o6PoATuRb4AZPIf4G+Yc5A5qOyUPlEXkSqZAewnqkGCA0r62jcedGDSnW4XLUvPfQSxZVB4jT8R9OvyztYwzAf+2LgK+IPDpFjUIbXGXqObk7E6FnPTjAh+pQ6mWTD/ISE0xBOZ+xFPWGXOC+2rSnP9CSXg+yV8yIx0gHdcmKLhmvANXuKRWi8RBXGm9zXiL06e3zruBG9U9H3lvrfsiJiMHk7rC7ZEm7VK19w7uRGBCKm7J3iDDjdr/fY= X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230022)(4636009)(136003)(376002)(396003)(39850400004)(346002)(451199015)(46966006)(36840700001)(40470700004)(84970400001)(82740400003)(81166007)(8676002)(86362001)(4326008)(2906002)(356005)(6862004)(8936002)(36860700001)(83380400001)(316002)(47076005)(336012)(70586007)(6486002)(40480700001)(186003)(478600001)(44832011)(30864003)(82310400005)(41300700001)(70206006)(40460700003)(5660300002)(6506007)(6512007)(53546011)(26005)(67856001);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 Nov 2022 15:53:15.6822 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 161e1310-7199-424f-ab81-08dacbd8809e X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT017.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBBPR08MB5995 X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,GIT_PATCH_0,KAM_DMARC_NONE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Nov 14, 2022 at 04:10:22PM +0100, Richard Biener wrote: > On Fri, Nov 11, 2022 at 7:53 PM Andrew Carlotti via Gcc-patches > wrote: > > > > This recognises patterns of the form: > > while (n) { n >>= 1 } > > > > This patch results in improved (but still suboptimal) codegen: > > > > foo (unsigned int b) { > > int c = 0; > > > > while (b) { > > b >>= 1; > > c++; > > } > > > > return c; > > } > > > > foo: > > .LFB11: > > .cfi_startproc > > cbz w0, .L3 > > clz w1, w0 > > tst x0, 1 > > mov w0, 32 > > sub w0, w0, w1 > > csel w0, w0, wzr, ne > > ret > > > > The conditional is unnecessary. phiopt could recognise a redundant csel > > (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a > > clz call, but it cannot recognise the redunancy when the input is (e.g.) > > (32 - clz). > > > > I could perhaps extend this function to recognise this pattern in a later > > patch, if this is a good place to recognise more patterns. > > > > gcc/ChangeLog: > > + PR tree-optimization/94793 > > * tree-scalar-evolution.cc (expression_expensive_p): Add checks > > for c[lt]z optabs. > > * tree-ssa-loop-niter.cc (build_cltz_expr): New. > > (number_of_iterations_cltz_complement): New. > > (number_of_iterations_bitcount): Add call to the above. > > > > gcc/testsuite/ChangeLog: > > > > * lib/target-supports.exp (check_effective_target_clz) > > (check_effective_target_clzl, check_effective_target_clzll) > > (check_effective_target_ctz, check_effective_target_clzl) > > (check_effective_target_ctzll): New. > > * gcc.dg/tree-ssa/cltz-complement-max.c: New test. > > * gcc.dg/tree-ssa/clz-complement-char.c: New test. > > * gcc.dg/tree-ssa/clz-complement-int.c: New test. > > * gcc.dg/tree-ssa/clz-complement-long-long.c: New test. > > * gcc.dg/tree-ssa/clz-complement-long.c: New test. > > * gcc.dg/tree-ssa/ctz-complement-char.c: New test. > > * gcc.dg/tree-ssa/ctz-complement-int.c: New test. > > * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test. > > * gcc.dg/tree-ssa/ctz-complement-long.c: New test. > > > > > > -- > > > > [snip test diffs] > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc > > index 7e2a3e986619de87e4ae9daf16198be1f13b917c..1ac9526c69b5fe80c26022f2fa1176d222e2dfb9 100644 > > --- a/gcc/tree-scalar-evolution.cc > > +++ b/gcc/tree-scalar-evolution.cc > > @@ -3406,12 +3406,21 @@ expression_expensive_p (tree expr, hash_map &cache, > > library call for popcount when backend does not have an instruction > > to do so. We consider this to be expensive and generate > > __builtin_popcount only when backend defines it. */ > > + optab optab; > > combined_fn cfn = get_call_combined_fn (expr); > > switch (cfn) > > { > > CASE_CFN_POPCOUNT: > > + optab = popcount_optab; > > + goto bitcount_call; > > + CASE_CFN_CLZ: > > + optab = clz_optab; > > + goto bitcount_call; > > + CASE_CFN_CTZ: > > + optab = ctz_optab; > > +bitcount_call: > > /* Check if opcode for popcount is available in the mode required. */ > > - if (optab_handler (popcount_optab, > > + if (optab_handler (optab, > > TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0)))) > > == CODE_FOR_nothing) > > { > > @@ -3424,7 +3433,7 @@ expression_expensive_p (tree expr, hash_map &cache, > > instructions. */ > > if (is_a (mode, &int_mode) > > && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD > > - && (optab_handler (popcount_optab, word_mode) > > + && (optab_handler (optab, word_mode) > > != CODE_FOR_nothing)) > > break; > > return true; > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc > > index fece876099c1687569d6351e7d2416ea6acae5b5..16e8e25919d808cea27adbd72f0be01ae2f0e547 100644 > > --- a/gcc/tree-ssa-loop-niter.cc > > +++ b/gcc/tree-ssa-loop-niter.cc > > @@ -2198,6 +2198,195 @@ number_of_iterations_popcount (loop_p loop, edge exit, > > return true; > > } > > > > +/* Return an expression that counts the leading/trailing zeroes of src. */ > > Can you expand the comment on how you handle defined_at_zero and how > that relates to the C[LT]Z_DEFINED_VALUE_AT_ZERO target macros? > The loop examples you gave above all have a defined value for zero, I'm > not sure how you'd write a C loop which has that undefined. > How about: /* Return an expression that counts the leading/trailing zeroes of src. If defined_at_zero is true, then the built expression uses a conditonal expression to return the precision of src when src == 0. Otherwise, we can elide the conditional expression and let src = 0 invoke undefined behaviour. */ > > +static tree > > +build_cltz_expr (tree src, bool leading, bool defined_at_zero) > > +{ > > + tree fn; > > + int prec = TYPE_PRECISION (TREE_TYPE (src)); > > + int i_prec = TYPE_PRECISION (integer_type_node); > > + int li_prec = TYPE_PRECISION (long_integer_type_node); > > + int lli_prec = TYPE_PRECISION (long_long_integer_type_node); > > + if (prec <= i_prec) > > I don't think we can use <= for both CLZ and CTZ, no? You probably > need a GIMPLE testcase or a language that doesn't promote char/short > to int for a testcase that fails though. I don't see an issue here. I've ensured that src is the mode used in the iterator, and if a longer mode is used for the __builtin_clz call then I account for the difference below... > > + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZ) > > + : builtin_decl_implicit (BUILT_IN_CTZ); > > + else if (prec == li_prec) > > + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZL) > > + : builtin_decl_implicit (BUILT_IN_CTZL); > > + else if (prec == lli_prec || prec == 2 * lli_prec) > > + fn = leading ? builtin_decl_implicit (BUILT_IN_CLZLL) > > + : builtin_decl_implicit (BUILT_IN_CTZLL); > > + else > > + return NULL_TREE; > > + > > + tree utype = unsigned_type_for (TREE_TYPE (src)); > > + src = fold_convert (utype, src); > > + if (prec < i_prec) > > + src = fold_convert (unsigned_type_node, src); > > + > > + tree call; > > + if (prec == 2 * lli_prec) > > + { > > + tree src1 = fold_convert (long_long_unsigned_type_node, > > + fold_build2 (RSHIFT_EXPR, TREE_TYPE (src), > > + unshare_expr (src), > > + build_int_cst (integer_type_node, > > + lli_prec))); > > + tree src2 = fold_convert (long_long_unsigned_type_node, src); > > + /* We count the zeroes in src1, and add the number in src2 when src1 > > + is 0. */ > > + if (!leading) > > + std::swap(src1, src2); > > + tree call1 = build_call_expr (fn, 1, src1); > > + tree call2 = build_call_expr (fn, 1, src2); > > + if (defined_at_zero) > > + { > > + tree is_zero2 = fold_build2 (NE_EXPR, boolean_type_node, src2, > > + build_zero_cst (TREE_TYPE (src2))); > > + call2 = fold_build3(COND_EXPR, integer_type_node, is_zero2, call2, > > + build_int_cst (integer_type_node, lli_prec)); > > + } > > + tree is_zero1 = fold_build2 (NE_EXPR, boolean_type_node, src1, > > + build_zero_cst (TREE_TYPE (src1))); > > + call = fold_build3(COND_EXPR, integer_type_node, is_zero1, call1, > > + fold_build2 (PLUS_EXPR, integer_type_node, call2, > > + build_int_cst (integer_type_node, > > + lli_prec))); > > + } > > + else > > + { > > + call = build_call_expr (fn, 1, src); > > + if (defined_at_zero) > > + { > > + tree is_zero = fold_build2 (NE_EXPR, boolean_type_node, src, > > + build_zero_cst (TREE_TYPE (src))); > > + call = fold_build3(COND_EXPR, integer_type_node, is_zero, call, > > + build_int_cst (integer_type_node, prec)); > > + } > > + } > > + ...with this code: > > + if (leading && prec < i_prec) > > + call = fold_build2(MINUS_EXPR, integer_type_node, call, > > + build_int_cst (integer_type_node, > > + i_prec - prec)); > > + > > + return call; > > +} > > + > > +/* See comment below for number_of_iterations_bitcount. > > + For c[lt]z complement, we have: > > + > > + modify: > > + iv_2 = iv_1 >> 1 OR iv_1 << 1 > > + > > + test: > > + if (iv != 0) > > + > > + modification count: > > + src precision - c[lt]z (src) > > + > > + */ > > + > > +static bool > > +number_of_iterations_cltz_complement (loop_p loop, edge exit, > > + enum tree_code code, > > + class tree_niter_desc *niter) > > +{ > > + bool modify_before_test = true; > > + HOST_WIDE_INT max; > > + > > + /* Check that condition for staying inside the loop is like > > + if (iv != 0). */ > > + gimple *cond_stmt = last_stmt (exit->src); > > + if (!cond_stmt > > + || gimple_code (cond_stmt) != GIMPLE_COND > > + || code != NE_EXPR > > + || !integer_zerop (gimple_cond_rhs (cond_stmt)) > > + || TREE_CODE (gimple_cond_lhs (cond_stmt)) != SSA_NAME) > > + return false; > > + > > + tree iv_2 = gimple_cond_lhs (cond_stmt); > > + gimple *iv_2_stmt = SSA_NAME_DEF_STMT (iv_2); > > + > > + /* If the test comes before the iv modification, then these will actually be > > + iv_1 and a phi node. */ > > + if (gimple_code (iv_2_stmt) == GIMPLE_PHI > > + && gimple_bb (iv_2_stmt) == loop->header > > + && gimple_phi_num_args (iv_2_stmt) == 2 > > + && (TREE_CODE (gimple_phi_arg_def (iv_2_stmt, > > + loop_latch_edge (loop)->dest_idx)) > > + == SSA_NAME)) > > + { > > + /* iv_2 is actually one of the inputs to the phi. */ > > + iv_2 = gimple_phi_arg_def (iv_2_stmt, loop_latch_edge (loop)->dest_idx); > > + iv_2_stmt = SSA_NAME_DEF_STMT (iv_2); > > + modify_before_test = false; > > + } > > + > > + /* Make sure iv_2_stmt is a logical shift by one stmt: > > + iv_2 = iv_1 {>>|<<} 1 */ > > + if (!is_gimple_assign (iv_2_stmt) > > + || (gimple_assign_rhs_code (iv_2_stmt) != LSHIFT_EXPR > > + && (gimple_assign_rhs_code (iv_2_stmt) != RSHIFT_EXPR > > + || !TYPE_UNSIGNED (TREE_TYPE (gimple_assign_lhs (iv_2_stmt))))) > > + || !integer_onep (gimple_assign_rhs2 (iv_2_stmt))) > > + return false; > > + > > + bool left_shift = (gimple_assign_rhs_code (iv_2_stmt) == LSHIFT_EXPR); > > + > > + tree iv_1 = gimple_assign_rhs1 (iv_2_stmt); > > + > > + /* Check the recurrence. */ > > + gimple *phi = SSA_NAME_DEF_STMT (iv_1); > > + if (gimple_code (phi) != GIMPLE_PHI > > + || (gimple_bb (phi) != loop_latch_edge (loop)->dest) > > + || (iv_2 != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx))) > > + return false; > > + > > + /* We found a match. */ > > + tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx); > > + int src_precision = TYPE_PRECISION (TREE_TYPE (src)); > > + > > + /* Get the corresponding c[lt]z builtin. */ > > + tree expr = build_cltz_expr (src, !left_shift, true); > > So we always have defined_at_zero == true? In the patch: yes. The next patch uses defined_at_zero == false, and I don't think there's any point in submitting a simpler build_cltz_expr for just this patch. > > + > > + if (!expr) > > + return false; > > + > > + expr = fold_build2 (MINUS_EXPR, integer_type_node, > > + build_int_cst (integer_type_node, src_precision), > > + expr); > > + > > + max = src_precision; > > + > > + tree may_be_zero = boolean_false_node; > > + > > + if (modify_before_test) > > + { > > + expr = fold_build2 (MINUS_EXPR, integer_type_node, expr, > > + integer_one_node); > > + max = max - 1; > > + may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > > + build_zero_cst (TREE_TYPE (src))); > > + } > > + > > + expr = fold_convert (unsigned_type_node, expr); > > + > > + niter->assumptions = boolean_true_node; > > + niter->may_be_zero = simplify_using_initial_conditions (loop, may_be_zero); > > + niter->niter = simplify_using_initial_conditions (loop, expr); > > + > > + if (TREE_CODE (niter->niter) == INTEGER_CST) > > + niter->max = tree_to_uhwi (niter->niter); > > + else > > + niter->max = max; > > + > > + niter->bound = NULL_TREE; > > + niter->cmp = ERROR_MARK; > > + return true; > > +} > > + > > /* See if LOOP contains a bit counting idiom. The idiom consists of two parts: > > 1. A modification to the induction variabler;. > > 2. A test to determine whether or not to exit the loop. > > @@ -2244,7 +2433,8 @@ number_of_iterations_bitcount (loop_p loop, edge exit, > > enum tree_code code, > > class tree_niter_desc *niter) > > { > > - return number_of_iterations_popcount (loop, exit, code, niter); > > + return (number_of_iterations_popcount (loop, exit, code, niter) > > + || number_of_iterations_cltz_complement (loop, exit, code, niter)); > > I'm kind-of missing the non-complement variant ;) See next patch :) > Otherwise looks OK to me. > > Thanks, > Richard. > > > } > > > > /* Substitute NEW_TREE for OLD in EXPR and fold the result.