From: Wilco Dijkstra
To: GCC Patches
CC: Kyrylo Tkachov, Richard Sandiford, Richard Earnshaw
Subject: [PATCH][AArch64] Improve popcount expansion
Date: Mon, 03 Feb 2020 15:01:00 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2020-02/txt/msg00076.txt.bz2

The popcount expansion uses umov to extend the result and move it back to the integer register file.
If we model ADDV as a zero-extending operation, fmov can be used to move back
to the integer side. This results in a ~0.5% speedup on deepsjeng on Cortex-A57.

A typical __builtin_popcount expansion is now:

        fmov    s0, w0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        fmov    w0, s0

Bootstrap OK, passes regress.

ChangeLog
2020-02-02  Wilco Dijkstra

gcc/
        * config/aarch64/aarch64.md (popcount2): Improve expansion.
        * config/aarch64/aarch64-simd.md (aarch64_zero_extend_reduc_plus_): New pattern.
        * config/aarch64/iterators.md (VDQV_E): New iterator.

testsuite/
        * gcc.target/aarch64/popcnt2.c: New test.

--
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..34765ff43c1a090a31e2aed64ce95510317ab8c3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2460,6 +2460,17 @@ (define_insn "aarch64_reduc_plus_internal"
   [(set_attr "type" "neon_reduc_add")]
 )
 
+;; ADDV with result zero-extended to SI/DImode (for popcount).
+(define_insn "aarch64_zero_extend_reduc_plus_"
+  [(set (match_operand:GPI 0 "register_operand" "=w")
+        (zero_extend:GPI
+          (unspec: [(match_operand:VDQV_E 1 "register_operand" "w")]
+                   UNSPEC_ADDV)))]
+  "TARGET_SIMD"
+  "add\\t%0, %1."
+  [(set_attr "type" "neon_reduc_add")]
+)
+
 (define_insn "aarch64_reduc_plus_internalv2si"
  [(set (match_operand:V2SI 0 "register_operand" "=w")
        (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 86c2cdfc7973f4b964ba233cfbbe369b24e0ac10..5edc76ee14b55b2b4323530e10bd22b3ffca483e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4829,7 +4829,6 @@ (define_expand "popcount2"
 {
   rtx v = gen_reg_rtx (V8QImode);
   rtx v1 = gen_reg_rtx (V8QImode);
-  rtx r = gen_reg_rtx (QImode);
   rtx in = operands[1];
   rtx out = operands[0];
   if(mode == SImode)
@@ -4843,8 +4842,7 @@ (define_expand "popcount2"
     }
   emit_move_insn (v, gen_lowpart (V8QImode, in));
   emit_insn (gen_popcountv8qi2 (v1, v));
-  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
-  emit_insn (gen_zero_extendqi2 (out, r));
+  emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index fc973086cb91ae0dc54eeeb0b832d522539d7982..926779bf2442fa60d184ef17308f91996d6e8d1b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -208,6 +208,9 @@ (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI])
 ;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes.
 (define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI])
 
+;; Advanced SIMD modes for Integer reduction across lanes (zero/sign extended).
+(define_mode_iterator VDQV_E [V8QI V16QI V4HI V8HI])
+
 ;; All double integer narrow-able modes.
 (define_mode_iterator VDN [V4HI V2SI DI])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt2.c b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e321858afa4d6ecb6fc7348f39f6e5c6c0c46147
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+unsigned long
+foo1 (int x)
+{
+  return __builtin_popcount (x);
+}
+
+/* { dg-final { scan-assembler-not {popcount} } } */
+/* { dg-final { scan-assembler-times {cnt\t} 2 } } */
+/* { dg-final { scan-assembler-times {fmov} 4 } } */
+/* { dg-final { scan-assembler-not {umov} } } */
+/* { dg-final { scan-assembler-not {uxtw} } } */
+/* { dg-final { scan-assembler-not {sxtw} } } */
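
For reference, here is a minimal C sketch of what the four-instruction sequence quoted earlier computes. It is a behavioural model only, not part of the patch and not how GCC implements anything. All function names (model_cnt_8b, model_addv_8b, model_popcount32, model_popcount32_selfcheck) are hypothetical, and the SWAR bit counting merely stands in for the CNT instruction. The model relies on a scalar write such as ADDV's b0 result clearing the remaining bits of the vector register, which is why a plain fmov can return an already zero-extended value.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "cnt v0.8b, v0.8b": replace each of the eight byte lanes
   with the number of set bits in that lane (SWAR bit counting).  */
uint64_t
model_cnt_8b (uint64_t v)
{
  v = v - ((v >> 1) & 0x5555555555555555ULL);
  v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
  v = (v + (v >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  return v;
}

/* Model of "addv b0, v0.8b": sum the eight byte lanes into an 8-bit
   result.  Returning it as uint32_t models the upper bits being zero,
   i.e. the zero-extension the new pattern exposes.  */
uint32_t
model_addv_8b (uint64_t v)
{
  uint8_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum = (uint8_t) (sum + ((v >> (8 * i)) & 0xff));
  return sum;
}

/* Model of the whole SImode expansion.  */
unsigned
model_popcount32 (uint32_t x)
{
  uint64_t v = x;             /* fmov s0, w0: upper lanes are zero.  */
  v = model_cnt_8b (v);       /* cnt  v0.8b, v0.8b  */
  return model_addv_8b (v);   /* addv b0, v0.8b; fmov w0, s0  */
}

/* Compare the model against the compiler builtin on a few inputs.  */
void
model_popcount32_selfcheck (void)
{
  static const uint32_t samples[] =
    { 0u, 1u, 0x80000000u, 0xffu, 0xdeadbeefu, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (model_popcount32 (samples[i])
            == (unsigned) __builtin_popcount (samples[i]));
}
```

Calling model_popcount32_selfcheck () from a small driver exercises the model. The byte-lane sum never exceeds 64 for a 64-bit input, so it always fits in the 8-bit ADDV result.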