From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Corallo
To: gcc-patches@gcc.gnu.org
Cc: ook@ucw.cz, nd@arm.com, richard.sandiford@arm.com, Richard Biener
Subject: [PATCH V2] vec: don't select partial vectors when looping on full vectors
Date: Mon, 14 Sep 2020 11:23:43 +0200
In-Reply-To: (Richard Sandiford's message of "Fri, 11 Sep 2020 08:20:54 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: Gcc-patches mailing list

--=-=-=
Content-Type: text/plain

Hi all,

here is the updated version of the patch, implementing the suggestions.

The check for 'vect_need_peeling_or_partial_vectors_p' (and its
comment) has also been moved just before the partial vector handling,
so we can short-circuit that handling if we know we are using full
vectors.

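As a rough illustration of the check described above (not part of the
patch), the following self-contained C model sketches the idea behind
'vect_need_peeling_or_partial_vectors_p'.  The struct 'loop_model' and
the functions 'log2_pow2' and 'model_need_peeling_or_partial_vectors_p'
are hypothetical stand-ins for the real loop_vec_info accessors.  The
model assumes a constant power-of-two vectorization factor and a known
alignment peel, and it leaves out the versioning/cost-threshold case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the loop_vec_info fields the check reads.  */
struct loop_model
{
  bool niters_known;       /* Is the iteration count a constant?  */
  uint64_t niters;         /* The constant count, when niters_known.  */
  unsigned niters_ctz;     /* Known trailing zero bits of a variable count.  */
  unsigned peel_for_align; /* Iterations peeled for alignment.  */
  bool peel_for_gaps;      /* One extra iteration peeled for gaps.  */
  unsigned vf;             /* Vectorization factor (power of two here).  */
};

/* Return log2 of X, where X is a nonzero power of two.  */
unsigned
log2_pow2 (unsigned x)
{
  unsigned n = 0;
  while (x > 1)
    {
      x >>= 1;
      ++n;
    }
  return n;
}

/* Return true if iterations are left over after running whole vectors,
   i.e. the loop would need an epilogue or partial vectors.  */
bool
model_need_peeling_or_partial_vectors_p (const struct loop_model *m)
{
  /* Simplification: only constant power-of-two factors are modelled.  */
  assert (m->vf != 0 && (m->vf & (m->vf - 1)) == 0);

  if (m->niters_known)
    {
      /* Iterations peeled for reasons other than niters.  */
      uint64_t peel = m->peel_for_align + (m->peel_for_gaps ? 1 : 0);
      assert (m->niters >= peel);
      return (m->niters - peel) % m->vf != 0;
    }

  /* Variable count: any peeling makes the remainder unknown.  */
  if (m->peel_for_align || m->peel_for_gaps)
    return true;

  /* Otherwise the count must be provably a multiple of VF.  */
  return m->niters_ctz < log2_pow2 (m->vf);
}
```

With a vectorization factor of 8 and an iteration count of the form
n * 8 (three known trailing zero bits), the model reports that nothing
is left over, which is the case where full vectors can be used instead
of partial ones.
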
Bootstrapped and regtested on aarch64-linux-gnu.

Okay for trunk?

Thanks

  Andrea

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment;
 filename=0001-vec-don-t-select-partial-vectors-when-unnecessary.patch

>From 45b5e45a7ab2eecfa8489d2a7b8341556e9e8d7c Mon Sep 17 00:00:00 2001
From: Andrea Corallo
Date: Fri, 28 Aug 2020 16:01:15 +0100
Subject: [PATCH] vec: don't select partial vectors when unnecessary

gcc/ChangeLog

2020-09-09  Andrea Corallo

        * tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
        function.
        (vect_analyze_loop_2): Make use of it not to select partial
        vectors if no peel is required.
        (determine_peel_for_niter): Move out some logic into
        'vect_need_peeling_or_partial_vectors_p'.

gcc/testsuite/ChangeLog

2020-09-09  Andrea Corallo

        * gcc.target/aarch64/sve/cost_model_10.c: New test.
        * gcc.target/aarch64/sve/clastb_8.c: Update test for new
        vectorization strategy.
        * gcc.target/aarch64/sve/cost_model_5.c: Likewise.
        * gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
        * gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
        * gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
        * gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
---
 .../gcc.target/aarch64/sve/clastb_8.c         |  5 +-
 .../gcc.target/aarch64/sve/cost_model_10.c    | 12 +++
 .../gcc.target/aarch64/sve/cost_model_5.c     |  4 +-
 .../gcc.target/aarch64/sve/struct_vect_14.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_15.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_16.c   |  8 +-
 .../gcc.target/aarch64/sve/struct_vect_17.c   |  8 +-
 gcc/tree-vect-loop.c                          | 86 +++++++++++--------
 8 files changed, 81 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
index 57c42082449..e61ff4ac92d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
@@ -23,7 +23,4 @@ TEST_TYPE (uint64_t);
 /* { dg-final { scan-assembler {\tclastb\t(h[0-9]+), p[0-7], \1, z[0-9]+\.h\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(s[0-9]+), p[0-7], \1, z[0-9]+\.s\n} } } */
 /* { dg-final { scan-assembler {\tclastb\t(d[0-9]+), p[0-7], \1, z[0-9]+\.d\n} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.b,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.h,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.s,} } } */
-/* { dg-final { scan-assembler {\twhilelo\tp[0-9]+\.d,} } } */
+/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.b,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
new file mode 100644
index 00000000000..bfac09ed1c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_10.c
@@ -0,0 +1,12 @@
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+void
+f (int *restrict x, int *restrict y, unsigned int n)
+{
+  for (unsigned int i = 0; i < n * 8; ++i)
+    x[i] += y[i];
+}
+
+/* { dg-final { scan-assembler-not {\twhilelo\t} } } */
+/* { dg-final { scan-assembler {\tptrue\tp} } } */
+/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, x[0-9]+\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
index 250ca837324..f3a29fc38a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_5.c
@@ -9,5 +9,5 @@ vset (int *restrict dst, int *restrict src, int count)
       *dst++ = 1;
 }
 
-/* { dg-final { scan-assembler-not {\tst1w\tz} } } */
-/* { dg-final { scan-assembler-times {\tstp\tq} 2 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz} 2 } } */
+/* { dg-final { scan-assembler-not {\tstp\tq} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
index a16a79e51c0..45644b67bda 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
@@ -43,12 +43,12 @@
 #undef NAME
 #undef TYPE
 
-/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 
 /* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c
index bc00267c8e7..814dbb3ae41 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c
@@ -3,12 +3,12 @@
 
 #include "struct_vect_14.c"
 
-/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 
 /* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c
index 9e2a549f5e8..6ecf89b5442 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c
@@ -3,12 +3,12 @@
 
 #include "struct_vect_14.c"
 
-/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 
 /* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c
index e791e2e12a6..571c6d0d33b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c
@@ -3,12 +3,12 @@
 
 #include "struct_vect_14.c"
 
-/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+, x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+, x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
 
 /* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 80e78f7adf4..7bb5e83b7b1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -991,6 +991,51 @@ vect_min_prec_for_max_niters (loop_vec_info loop_vinfo, unsigned int factor)
   return wi::min_precision (max_ni * factor, UNSIGNED);
 }
 
+/* true if the loop needs peeling or partial vectors when vectorized.  */
+
+static bool
+vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
+{
+  unsigned HOST_WIDE_INT const_vf;
+  HOST_WIDE_INT max_niter
+    = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
+  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+    th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
+                                          (loop_vinfo));
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+    {
+      /* Work out the (constant) number of iterations that need to be
+         peeled for reasons other than niters.  */
+      unsigned int peel_niter = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+        peel_niter += 1;
+      if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_niter,
+                       LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+        return true;
+    }
+  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+           /* ??? When peeling for gaps but not alignment, we could
+              try to check whether the (variable) niters is known to be
+              VF * N + 1.  That's something of a niche case though.  */
+           || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+           || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
+           || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
+                < (unsigned) exact_log2 (const_vf))
+               /* In case of versioning, check if the maximum number of
+                  iterations is greater than th.  If they are identical,
+                  the epilogue is unnecessary.  */
+               && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+                   || ((unsigned HOST_WIDE_INT) max_niter
+                       > (th / const_vf) * const_vf))))
+    return true;
+
+  return false;
+}
+
 /* Each statement in LOOP_VINFO can be masked where necessary.  Check
    whether we can actually generate the masks required.  Return true if so,
    storing the type of the scalar IV in LOOP_VINFO_RGROUP_COMPARE_TYPE.  */
@@ -1967,44 +2012,10 @@ determine_peel_for_niter (loop_vec_info loop_vinfo)
 {
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
 
-  unsigned HOST_WIDE_INT const_vf;
-  HOST_WIDE_INT max_niter
-    = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
-
-  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
-    th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
-                                          (loop_vinfo));
-
   if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
     /* The main loop handles all iterations.  */
     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
-  else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-           && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
-    {
-      /* Work out the (constant) number of iterations that need to be
-         peeled for reasons other than niters.  */
-      unsigned int peel_niter = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-        peel_niter += 1;
-      if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_niter,
-                       LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
-        LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
-    }
-  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
-           /* ??? When peeling for gaps but not alignment, we could
-              try to check whether the (variable) niters is known to be
-              VF * N + 1.  That's something of a niche case though.  */
-           || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-           || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
-           || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
-                < (unsigned) exact_log2 (const_vf))
-               /* In case of versioning, check if the maximum number of
-                  iterations is greater than th.  If they are identical,
-                  the epilogue is unnecessary.  */
-               && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
-                   || ((unsigned HOST_WIDE_INT) max_niter
-                       > (th / const_vf) * const_vf))))
+  else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo))
     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
 }
 
@@ -2265,7 +2276,10 @@ start_over:
      this vectorization factor.  */
   if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
     {
-      if (param_vect_partial_vector_usage == 0)
+      /* Don't use partial vectors if we don't need to peel the
+         loop.  */
+      if (param_vect_partial_vector_usage == 0
+          || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
         LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
       else if (vect_verify_full_masking (loop_vinfo)
                || vect_verify_loop_lens (loop_vinfo))
-- 
2.20.1

--=-=-=--