Message-ID: <6cab7952-9b00-4d85-8fbb-c8058d2142d2@arm.com>
Date: Thu, 16 Nov 2023 11:36:44 +0000
Subject: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
From: Stamatis Markianos-Wright
To: Stamatis Markianos-Wright via Gcc-patches, Richard Earnshaw, richard.sandiford@arm.com, Kyrylo Tkachov
In-Reply-To: <05ab69cf-dea0-44d2-875c-983985a26b99@arm.com>

Pinging back to the top of reviewers' inboxes due to worry about Stage 1 End in a few days :)

See the last email for the latest version of the 2/2 patch. The 1/2 patch is A-Ok from Kyrill's earlier target-backend review.
On 10/11/2023 12:41, Stamatis Markianos-Wright wrote:
>
> On 06/11/2023 17:29, Stamatis Markianos-Wright wrote:
>>
>> On 06/11/2023 11:24, Richard Sandiford wrote:
>>> Stamatis Markianos-Wright writes:
>>>>> One of the main reasons for reading the arm bits was to try to answer
>>>>> the question: if we switch to a downcounting loop with a GE condition,
>>>>> how do we make sure that the start value is not a large unsigned
>>>>> number that is interpreted as negative by GE?  E.g. if the loop
>>>>> originally counted up in steps of N and used an LTU condition,
>>>>> it could stop at a value in the range [INT_MAX + 1, UINT_MAX].
>>>>> But the loop might never iterate if we start counting down from
>>>>> most values in that range.
>>>>>
>>>>> Does the patch handle that?
>>>> So AFAICT this is actually handled in the generic code in
>>>> `doloop_valid_p`:
>>>>
>>>> These kinds of loops fail because they are "desc->infinite", so no
>>>> loop-doloop conversion is attempted at all (even for standard
>>>> dls/le loops).
>>>>
>>>> Thanks to that check I haven't been able to trigger anything like the
>>>> behaviour you describe. Do you think the doloop_valid_p checks are
>>>> robust enough?
>>> The loops I was thinking of are provably not infinite though. E.g.:
>>>
>>>    for (unsigned int i = 0; i < UINT_MAX - 100; ++i)
>>>      ...
>>>
>>> is known to terminate.  And doloop conversion is safe with the normal
>>> count-down-by-1 approach, so I don't think current code would need
>>> to reject it.  I.e. a conversion to:
>>>
>>>    unsigned int i = UINT_MAX - 101;
>>>    do
>>>      ...
>>>    while (--i != ~0U);
>>>
>>> would be safe, but a conversion to:
>>>
>>>    int i = UINT_MAX - 101;
>>>    do
>>>      ...
>>>    while ((i -= step, i > 0));
>>>
>>> wouldn't, because the loop body would only be executed once.
>>>
>>> I'm only going off the name "infinite" though :)  It's possible that
>>> it has more connotations than that.
>>>
>>> Thanks,
>>> Richard
>>
>> Ack, yep, I see what you mean now, and yep, that kind of loop does
>> indeed pass through doloop_valid_p.
>>
>> Interestingly, in the v8-M Arm ARM this is done with:
>>
>> ```
>>
>> boolean IsLastLowOverheadLoop(INSTR_EXEC_STATE_Type state)
>> // This does not check whether a loop is currently active.
>> // If the PE were in a loop, would this be the last one?
>> return UInt(state.LoopCount) <= (1 << (4 - LTPSIZE));
>>
>> ```
>>
>> So architecturally the asm we output would be ok (except maybe for the
>> "branch too far" subs;bgt;lctp fallback in
>> `predicated_doloop_end_internal`, which should perhaps use `bhi`).
>> But now GE isn't looking like an accurate representation of this
>> operation in the compiler.
>>
>> I'm wondering if I should try to make
>> `predicated_doloop_end_internal` contain a comparison along the lines
>> of:
>> (gtu: (plus: (LR) (const_int -num_lanes)) (const_int num_lanes_minus_1))
>>
>> I'll give that a try :)
>>
>> The only reason I'd chosen to go with GE earlier, tbh, was because of
>> the existing handling of GE in loop-doloop.cc.
>>
>> Let me know if any other ideas come to your mind!
>>
>>
>> Cheers,
>>
>> Stam
>
>
> It looks like I've had success with the below (diff to previous patch),
> trimmed a bit to only the functionally interesting things:
>
>
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 368d5138ca1..54dd4ee564b 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -1649,16 +1649,28 @@
>            && (decrement_num = arm_attempt_dlstp_transform (operands[1]))
>            && (INTVAL (decrement_num) != 1))
>          {
> -          insn = emit_insn
> -              (gen_thumb2_addsi3_compare0
> -              (s0, s0, GEN_INT ((-1) * (INTVAL (decrement_num)))));
> -          cmp = XVECEXP (PATTERN (insn), 0, 0);
> -          cc_reg = SET_DEST (cmp);
> -          bcomp = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
>            loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[1]);
> -          emit_jump_insn (gen_rtx_SET (pc_rtx,
> -                       gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
> -                                 loc_ref, pc_rtx)));
> +          switch (INTVAL (decrement_num))
> +        {
> +          case 2:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal2
> +                        (s0, loc_ref));
> +            break;
> +          case 4:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal4
> +                        (s0, loc_ref));
> +            break;
> +          case 8:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal8
> +                        (s0, loc_ref));
> +            break;
> +          case 16:
> +            insn = emit_jump_insn (gen_predicated_doloop_end_internal16
> +                        (s0, loc_ref));
> +            break;
> +          default:
> +            gcc_unreachable ();
> +        }
>            DONE;
>          }
>      }
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 93905583b18..c083f965fa9 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -6922,23 +6922,24 @@
>  ;; Originally expanded by 'predicated_doloop_end'.
>  ;; In the rare situation where the branch is too far, we do also need to
>  ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration.
> -(define_insn "*predicated_doloop_end_internal"
> +(define_insn "predicated_doloop_end_internal"
>    [(set (pc)
>      (if_then_else
> -       (ge (plus:SI (reg:SI LR_REGNUM)
> -            (match_operand:SI 0 "const_int_operand" ""))
> -        (const_int 0))
> -     (label_ref (match_operand 1 "" ""))
> +       (gtu (unspec:SI [(plus:SI (match_operand:SI 0 "s_register_operand" "=r")
> +                     (const_int ))]
> +        LETP)
> +        (const_int ))
> +     (match_operand 1 "" "")
>       (pc)))
> -   (set (reg:SI LR_REGNUM)
> -    (plus:SI (reg:SI LR_REGNUM) (match_dup 0)))
> +   (set (match_dup 0)
> +    (plus:SI (match_dup 0) (const_int )))
>     (clobber (reg:CC CC_REGNUM))]
>    "TARGET_HAVE_MVE"
>    {
>      if (get_attr_length (insn) == 4)
>        return "letp\t%|lr, %l1";
>      else
> -      return "subs\t%|lr, #%n0\n\tbgt\t%l1\n\tlctp";
> +      return "subs\t%|lr, #\n\tbhi\t%l1\n\tlctp";
>    }
>    [(set (attr "length")
>      (if_then_else
> @@ -6947,11 +6948,11 @@
>          (const_int 6))
>     (set_attr "type" "branch")])
>
> -(define_insn "dlstp_insn"
> +(define_insn "dlstp_insn"
>    [
>      (set (reg:SI LR_REGNUM)
>       (unspec:SI [(match_operand:SI 0 "s_register_operand" "r")]
>        DLSTP))
>    ]
>    "TARGET_HAVE_MVE"
> -  "dlstp.\t%|lr, %0")
> +  "dlstp.\t%|lr, %0")
>
> diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc
> index 6a72700a127..47fdef989b4 100644
> --- a/gcc/loop-doloop.cc
> +++ b/gcc/loop-doloop.cc
> @@ -185,6 +185,7 @@ doloop_condition_get (rtx_insn *doloop_pat)
>        || XEXP (inc_src, 0) != reg
>        || !CONST_INT_P (XEXP (inc_src, 1)))
>      return 0;
> +  int dec_num = abs (INTVAL (XEXP (inc_src, 1)));
>
>    /* Check for (set (pc) (if_then_else (condition)
>                                         (label_ref (label))
> @@ -199,21 +200,32 @@ doloop_condition_get (rtx_insn *doloop_pat)
>    /* Extract loop termination condition.  */
>    condition = XEXP (SET_SRC (cmp), 0);
>
> -  /* We expect a GE or NE comparison with 0 or 1.  */
> -  if ((GET_CODE (condition) != GE
> -       && GET_CODE (condition) != NE)
> -      || (XEXP (condition, 1) != const0_rtx
> -          && XEXP (condition, 1) != const1_rtx))
> +  /* We expect a GE or NE comparison with 0 or 1, or a GTU comparison with
> +     dec_num - 1.  */
> +  if (!((GET_CODE (condition) == GE
> +     || GET_CODE (condition) == NE)
> +    && (XEXP (condition, 1) == const0_rtx
> +        || XEXP (condition, 1) == const1_rtx ))
> +      &&!(GET_CODE (condition) == GTU
> +      && ((INTVAL (XEXP (condition, 1))) == (dec_num - 1))))
>      return 0;
>
> -  if ((XEXP (condition, 0) == reg)
> +  /* For the ARM special case of having a GTU: re-form the condition without
> +     the unspec for the benefit of the middle-end.  */
> +  if (GET_CODE (condition) == GTU)
> +    {
> +      condition = gen_rtx_fmt_ee (GTU, VOIDmode, inc_src, GEN_INT (dec_num - 1));
> +      return condition;
> +    }
> +  else if ((XEXP (condition, 0) == reg)
>        /* For the third case:  */
>        || ((cc_reg != NULL_RTX)
>        && (XEXP (condition, 0) == cc_reg)
>        && (reg_orig == reg))
> @@ -245,20 +257,11 @@ doloop_condition_get (rtx_insn *doloop_pat)
>                         (label_ref (label))
>                         (pc))))])
>
> -    So we return the second form instead for the two cases when n == 1.
> -
> -    For n > 1, the final value may be exceeded, so use GE instead of NE.
> +    So we return the second form instead for the two cases.
>       */
> -     if (GET_CODE (pattern) != PARALLEL)
> -       {
> -    if (INTVAL (XEXP (inc_src, 1)) != -1)
> -      condition = gen_rtx_fmt_ee (GE, VOIDmode, inc_src, const0_rtx);
> -    else
> -      condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);;
> -       }
> -
> +    condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
>      return condition;
> -   }
> +    }
>
>    /* ??? If a machine uses a funny comparison, we could return a
>       canonicalized form here.  */
> @@ -501,7 +504,8 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      case GE:
>        /* Currently only GE tests against zero are supported.  */
>        gcc_assert (XEXP (condition, 1) == const0_rtx);
> -
> +      /* FALLTHRU */
> +    case GTU:
>        noloop = constm1_rtx;
>
>        /* The iteration count does not need incrementing for a GE test.  */
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index a6a7ff507a5..9398702cddd 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -2673,8 +2673,16 @@
>  (define_int_attr mrrc [(VUNSPEC_MRRC "mrrc") (VUNSPEC_MRRC2 "mrrc2")])
>  (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
>
> -(define_int_attr mode1 [(DLSTP8 "8") (DLSTP16 "16") (DLSTP32 "32")
> -            (DLSTP64 "64")])
> +(define_int_attr dlstp_elemsize [(DLSTP8 "8") (DLSTP16 "16") (DLSTP32 "32")
> +                 (DLSTP64 "64")])
> +
> +(define_int_attr letp_num_lanes [(LETP8 "16") (LETP16 "8") (LETP32 "4")
> +                 (LETP64 "2")])
> +(define_int_attr letp_num_lanes_neg [(LETP8 "-16") (LETP16 "-8") (LETP32 "-4")
> +                     (LETP64 "-2")])
> +
> +(define_int_attr letp_num_lanes_minus_1 [(LETP8 "15") (LETP16 "7") (LETP32 "3")
> +                     (LETP64 "1")])
>
>  (define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
>                 (UNSPEC_DOT_U "u8")
> @@ -2921,6 +2929,8 @@
>  (define_int_iterator VQSHLUQ_N [VQSHLUQ_N_S])
>  (define_int_iterator DLSTP [DLSTP8 DLSTP16 DLSTP32
>                     DLSTP64])
> +(define_int_iterator LETP [LETP8 LETP16 LETP32
> +               LETP64])
>
>  ;; Define iterators for VCMLA operations
>  (define_int_iterator VCMLA_OP [UNSPEC_VCMLA
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 12ae4c4f820..2d6f27c14f4 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -587,6 +587,10 @@
>    DLSTP16
>    DLSTP32
>    DLSTP64
> +  LETP8
> +  LETP16
> +  LETP32
> +  LETP64
>    VPNOT
>    VCREATEQ_F
>    VCVTQ_N_TO_F_S
>
>
> I've attached the whole [2/2] patch diff with this change and
> the required comment changes in doloop_condition_get.
> WDYT?
>
>
> Thanks,
>
> Stam
>
>
>>
>>
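Richard's counter-example from the thread can be reproduced with a small C program. This is a scaled-down sketch that assumes 16-bit counters so the loops finish quickly. The function names are hypothetical and the step value is a free parameter. The same reasoning applies at 32 bits: with GCC's modulo conversion, `int i = UINT_MAX - 101` also yields -102.

```c
#include <stdint.h>

/* Hypothetical 16-bit model of the loops in Richard's example.
   None of these names come from GCC.  */

/* The original up-counting loop: for (i = 0; i < MAX - 100; ++i).  */
static uint32_t
iters_original (void)
{
  uint32_t n = 0;
  for (uint16_t i = 0; i < UINT16_MAX - 100; ++i)
    n++;
  return n;
}

/* Safe doloop form: unsigned counter, count down by 1, stop when the
   counter wraps from 0 to the maximum value.  */
static uint32_t
iters_countdown_ne (void)
{
  uint32_t n = 0;
  uint16_t i = UINT16_MAX - 101;
  do
    n++;
  while (--i != UINT16_MAX);
  return n;
}

/* Unsafe form: the same start value viewed as a signed number, stepping
   down by STEP and continuing only while the counter stays positive.  */
static uint32_t
iters_countdown_signed (int32_t step)
{
  uint32_t n = 0;
  /* 65434 reinterpreted as a signed 16-bit value is 65434 - 65536 = -102.
     The subtraction is written out to avoid an implementation-defined
     narrowing conversion.  */
  int32_t i = (int32_t) (UINT16_MAX - 101) - 65536;
  do
    n++;
  while ((i -= step, i > 0));
  return n;
}
```

Both `iters_original ()` and `iters_countdown_ne ()` count 65435 iterations. `iters_countdown_signed ()` executes the body only once for any positive step, because the start value already looks negative to a signed test.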
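The Arm ARM check quoted in the thread, and the reason the "branch too far" fallback would want `bhi` rather than `bgt`, can be checked with a small C model. This is a sketch under several assumptions:

* It models only the N, Z, C and V flags that `subs lr, #lanes` sets.
* It takes LTPSIZE to be 0..3, giving 16, 8, 4 or 2 lanes, which matches the `letp_num_lanes` values in the diff.
* All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the Arm ARM check quoted above:
   UInt(LoopCount) <= (1 << (4 - LTPSIZE)).  */
static bool
is_last_iteration (uint32_t loop_count, unsigned ltpsize)
{
  uint32_t lanes = UINT32_C (1) << (4 - ltpsize);
  return loop_count <= lanes;
}

/* Condition flags produced by SUBS Rd, Rn, #imm (Rd = Rn - imm).  */
struct flags { bool n, z, c, v; };

static struct flags
subs_flags (uint32_t rn, uint32_t imm)
{
  uint32_t r = rn - imm;
  struct flags f;
  f.n = (r >> 31) != 0;
  f.z = r == 0;
  f.c = rn >= imm;                              /* C means "no borrow".  */
  f.v = (((rn ^ imm) & (rn ^ r)) >> 31) != 0;   /* Signed overflow.  */
  return f;
}

/* Does "subs lr, #lanes; bhi loop" branch back?  HI is C set and Z clear,
   i.e. an unsigned LR > lanes.  */
static bool
fallback_bhi_loops (uint32_t lr, unsigned ltpsize)
{
  struct flags f = subs_flags (lr, UINT32_C (1) << (4 - ltpsize));
  return f.c && !f.z;
}

/* The same sequence with "bgt".  GT is Z clear and N == V, i.e. a signed
   LR > lanes.  */
static bool
fallback_bgt_loops (uint32_t lr, unsigned ltpsize)
{
  struct flags f = subs_flags (lr, UINT32_C (1) << (4 - ltpsize));
  return !f.z && f.n == f.v;
}
```

As a concrete case, take 32-bit elements (LTPSIZE 2, four lanes) and LR = 0x80000000. The model says the architecture would keep looping and `bhi` would branch back. `bgt` would not, because GT treats the large unsigned count as negative. This is the same signed-versus-unsigned issue Richard raised about GE.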