From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Przemyslaw Wirkus
To: "gcc-patches@gcc.gnu.org"
CC: nd, "nickc@redhat.com", Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov
Subject: Re: [PATCH][arm] Implement usadv16qi and ssadv16qi standard names
Date: Fri, 07 Jun 2019 09:09:00 -0000
Content-Type: multipart/mixed; boundary="_002_VI1PR0801MB2062E6870C83709BF801C15CE4100VI1PR0801MB2062_"
MIME-Version: 1.0
X-SW-Source: 2019-06/txt/msg00417.txt.bz2

--_002_VI1PR0801MB2062E6870C83709BF801C15CE4100VI1PR0801MB2062_
Content-Type: text/plain; charset="iso-8859-1"

Hi all,

This patch implements the usadv16qi and ssadv16qi standard names for arm.

The V16QImode variant is important as it is the most commonly used pattern:
reducing vectors of bytes into an int.
The midend expects the optab to compute the absolute differences of operands 1
and 2 and reduce them while widening along the way up to SImode. So the inputs
are V16QImode and the output is V4SImode.

I've based my solution on the current AArch64 implementation of the usadv16qi
and ssadv16qi standard names (r260437).
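For readers less familiar with the optab, the widening reduction described
above can be sketched in scalar C. This is only an illustrative model, not code
from the patch: the function name usad_v16qi_model and its parameter names are
hypothetical, and the assignment of partial sums to output lanes is an
assumption chosen to mirror a lowpart/highpart split followed by a pairwise
add. What the description above fixes is the widening from bytes to SImode, so
only the total across the four lanes should be relied on.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of an unsigned V16QI -> V4SI sum of absolute
   differences.  a and b stand for the two byte vectors, acc for the
   incoming V4SI accumulator and out for the result.  The lane layout is an
   assumption made for illustration.  */
void
usad_v16qi_model (uint32_t out[4], const uint8_t a[16], const uint8_t b[16],
                  const uint32_t acc[4])
{
  uint16_t tmp[8];
  int i;

  /* Widening absolute difference of the low halves (8 bytes -> 8 halfwords).  */
  for (i = 0; i < 8; i++)
    tmp[i] = (uint16_t) abs ((int) a[i] - (int) b[i]);

  /* Widening absolute difference of the high halves, accumulated into tmp.
     Each element is at most 255 + 255, so it fits in 16 bits.  */
  for (i = 0; i < 8; i++)
    tmp[i] = (uint16_t) (tmp[i] + abs ((int) a[i + 8] - (int) b[i + 8]));

  /* Pairwise add adjacent halfwords and accumulate into the 32-bit lanes.  */
  for (i = 0; i < 4; i++)
    out[i] = acc[i] + tmp[2 * i] + tmp[2 * i + 1];
}
```

Summing out[0] through out[3] gives the scalar result that a loop such as
i_sum += __builtin_abs (pix1[i] - pix2[i]) accumulates over one 16-byte chunk.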

This solution emits the following sequence of instructions:

        VABDL.u8        tmp, op1, op2   # op1, op2 lowpart
        VABAL.u8        tmp, op1, op2   # op1, op2 highpart
        VPADAL.u16      op3, tmp

So, for the code:

$ arm-none-linux-gnueabihf-gcc -S -O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard usadv16qi.c -dp

#define N 1024
unsigned char pix1[N];
unsigned char pix2[N];

int
foo (void)
{
  int i_sum = 0;
  int i;
  for (i = 0; i < N; i++)
    i_sum += __builtin_abs (pix1[i] - pix2[i]);
  return i_sum;
}

we now generate on arm:
foo:
        movw        r3, #:lower16:pix2      @ 57   [c=4 l=4]  *arm_movsi_vfp/3
        movt        r3, #:upper16:pix2      @ 58   [c=4 l=4]  *arm_movt/0
        vmov.i32    q9, #0  @ v4si          @ 3    [c=4 l=4]  *neon_movv4si/2
        movw        r2, #:lower16:pix1      @ 59   [c=4 l=4]  *arm_movsi_vfp/3
        movt        r2, #:upper16:pix1      @ 60   [c=4 l=4]  *arm_movt/0
        add         r1, r3, #1024           @ 8    [c=4 l=4]  *arm_addsi3/4
.L2:
        vld1.8      {q11}, [r3]!            @ 11   [c=8 l=4]  *movmisalignv16qi_neon_load
        vld1.8      {q10}, [r2]!            @ 10   [c=8 l=4]  *movmisalignv16qi_neon_load
        cmp         r1, r3                  @ 21   [c=4 l=4]  *arm_cmpsi_insn/2
        vabdl.u8    q8, d20, d22            @ 12   [c=8 l=4]  neon_vabdluv8qi
        vabal.u8    q8, d21, d23            @ 15   [c=88 l=4]  neon_vabaluv8qi
        vpadal.u16  q9, q8                  @ 16   [c=8 l=4]  neon_vpadaluv8hi
        bne         .L2                     @ 22   [c=16 l=4]  arm_cond_branch
        vadd.i32    d18, d18, d19           @ 24   [c=120 l=4]  quad_halves_plusv4si
        vpadd.i32   d18, d18, d18           @ 25   [c=8 l=4]  neon_vpadd_internalv2si
        vmov.32     r0, d18[0]              @ 30   [c=12 l=4]  vec_extractv2sisi/1

instead of:
foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw        r3, #:lower16:pix1
        movt        r3, #:upper16:pix1
        vmov.i32    q9, #0  @ v4si
        movw        r2, #:lower16:pix2
        movt        r2, #:upper16:pix2
        add         r1, r3, #1024
.L2:
        vld1.8      {q8}, [r3]!
        vld1.8      {q11}, [r2]!
        vmovl.u8    q10, d16
        cmp         r1, r3
        vmovl.u8    q8, d17
        vmovl.u8    q12, d22
        vmovl.u8    q11, d23
        vsub.i16    q10, q10, q12
        vsub.i16    q8, q8, q11
        vabs.s16    q10, q10
        vabs.s16    q8, q8
        vaddw.s16   q9, q9, d20
        vaddw.s16   q9, q9, d21
        vaddw.s16   q9, q9, d16
        vaddw.s16   q9, q9, d17
        bne         .L2
        vadd.i32    d18, d18, d19
        vpadd.i32   d18, d18, d18
        vmov.32     r0, d18[0]

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Przemyslaw

2019-05-29  Przemyslaw Wirkus

        * config/arm/iterators.md (VABAL): New int iterator.
        * config/arm/neon.md (<sup>sadv16qi): New define_expand.
        * config/arm/unspecs.md ("unspec"): Define UNSPEC_VABAL_S,
        UNSPEC_VABAL_U values.

2019-05-29  Przemyslaw Wirkus

        * gcc.target/arm/ssadv16qi.c: New test.
        * gcc.target/arm/usadv16qi.c: Likewise.

--_002_VI1PR0801MB2062E6870C83709BF801C15CE4100VI1PR0801MB2062_
Content-Type: text/plain; name="patch.txt"
Content-Description: patch.txt
Content-Disposition: attachment; filename="patch.txt"; size=4805; creation-date="Fri, 07 Jun 2019 09:07:42 GMT"; modification-date="Fri, 07 Jun 2019 09:07:42 GMT"
Content-Transfer-Encoding: base64
Content-length: 6515

ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYXJtL2l0ZXJhdG9ycy5tZCBiL2dj Yy9jb25maWcvYXJtL2l0ZXJhdG9ycy5tZAppbmRleCBlYjA3YzViOTBjMWIx OTA1ZDM1ZDdiNDgwYmRiZTdkN2E0NWFiN2JhLi4yNDYyYjhjODdlYTdkYmU2 MGJhNTBkMjJiMWU0OTRiYjRmZTkwNWMyIDEwMDY0NAotLS0gYS9nY2MvY29u ZmlnL2FybS9pdGVyYXRvcnMubWQKKysrIGIvZ2NjL2NvbmZpZy9hcm0vaXRl cmF0b3JzLm1kCkBAIC0zNDEsNiArMzQxLDggQEAKIAogKGRlZmluZV9pbnRf aXRlcmF0b3IgVlNVQkhOIFtVTlNQRUNfVlNVQkhOIFVOU1BFQ19WUlNVQkhO XSkKIAorKGRlZmluZV9pbnRfaXRlcmF0b3IgVkFCQUwgW1VOU1BFQ19WQUJB TF9TIFVOU1BFQ19WQUJBTF9VXSkKKwogKGRlZmluZV9pbnRfaXRlcmF0b3Ig VkFCRCBbVU5TUEVDX1ZBQkRfUyBVTlNQRUNfVkFCRF9VXSkKIAogKGRlZmlu ZV9pbnRfaXRlcmF0b3IgVkFCREwgW1VOU1BFQ19WQUJETF9TIFVOU1BFQ19W QUJETF9VXSkKQEAgLTgzNCw2ICs4MzYsNyBAQAogICAoVU5TUEVDX1ZTVUJX X1MgInMiKSAoVU5TUEVDX1ZTVUJXX1UgInUiKQogICAoVU5TUEVDX1ZIU1VC X1MgInMiKSAoVU5TUEVDX1ZIU1VCX1UgInUiKQogICAoVU5TUEVDX1ZRU1VC
X1MgInMiKSAoVU5TUEVDX1ZRU1VCX1UgInUiKQorICAoVU5TUEVDX1ZBQkFM X1MgInMiKSAoVU5TUEVDX1ZBQkFMX1UgInUiKQogICAoVU5TUEVDX1ZBQkRf UyAicyIpIChVTlNQRUNfVkFCRF9VICJ1IikKICAgKFVOU1BFQ19WQUJETF9T ICJzIikgKFVOU1BFQ19WQUJETF9VICJ1IikKICAgKFVOU1BFQ19WTUFYICJz IikgKFVOU1BFQ19WTUFYX1UgInUiKQpkaWZmIC0tZ2l0IGEvZ2NjL2NvbmZp Zy9hcm0vbmVvbi5tZCBiL2djYy9jb25maWcvYXJtL25lb24ubWQKaW5kZXgg ZGU5YWU0Mzg0OTAzOGIzY2Y3NWZlY2VlYzM2NDI5ZDVjNDBjNjNmMi4uNTFl ZDExYWJjNTE5ZWE5ZDRmOWUzMTc1MWFjNmQyNmEzZDFhZTVjZCAxMDA2NDQK LS0tIGEvZ2NjL2NvbmZpZy9hcm0vbmVvbi5tZAorKysgYi9nY2MvY29uZmln L2FybS9uZW9uLm1kCkBAIC0zMjU1LDYgKzMyNTUsMzIgQEAKICAgWyhzZXRf YXR0ciAidHlwZSIgIm5lb25fYXJpdGhfYWNjPHE+IildCiApCiAKKyhkZWZp bmVfZXhwYW5kICI8c3VwPnNhZHYxNnFpIgorICBbKHVzZSAobWF0Y2hfb3Bl cmFuZDpWNFNJIDAgInJlZ2lzdGVyX29wZXJhbmQiKSkKKyAgICh1bnNwZWM6 VjE2UUkgWyh1c2UgKG1hdGNoX29wZXJhbmQ6VjE2UUkgMSAicmVnaXN0ZXJf b3BlcmFuZCIpKQorICAgICAgICAgICAgICAgICAgKHVzZSAobWF0Y2hfb3Bl cmFuZDpWMTZRSSAyICJyZWdpc3Rlcl9vcGVyYW5kIikpXSBWQUJBTCkKKyAg ICh1c2UgKG1hdGNoX29wZXJhbmQ6VjRTSSAzICJyZWdpc3Rlcl9vcGVyYW5k IikpXQorICAiVEFSR0VUX05FT04iCisgIHsKKyAgICBydHggcmVkdWMgPSBn ZW5fcmVnX3J0eCAoVjhISW1vZGUpOworICAgIHJ0eCBvcDFfaGlnaHBhcnQg PSBnZW5fcmVnX3J0eCAoVjhRSW1vZGUpOworICAgIHJ0eCBvcDJfaGlnaHBh cnQgPSBnZW5fcmVnX3J0eCAoVjhRSW1vZGUpOworCisgICAgZW1pdF9pbnNu IChnZW5fbmVvbl92YWJkbDxzdXA+djhxaSAocmVkdWMsCisgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZ2VuX2xvd3BhcnQgKFY4 UUltb2RlLCBvcGVyYW5kc1sxXSksCisgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgZ2VuX2xvd3BhcnQgKFY4UUltb2RlLCBvcGVy YW5kc1syXSkpKTsKKworICAgIGVtaXRfaW5zbiAoZ2VuX25lb25fdmdldF9o aWdodjE2cWkgKG9wMV9oaWdocGFydCwgb3BlcmFuZHNbMV0pKTsKKyAgICBl bWl0X2luc24gKGdlbl9uZW9uX3ZnZXRfaGlnaHYxNnFpIChvcDJfaGlnaHBh cnQsIG9wZXJhbmRzWzJdKSk7CisgICAgZW1pdF9pbnNuIChnZW5fbmVvbl92 YWJhbDxzdXA+djhxaSAocmVkdWMsIHJlZHVjLAorICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgIG9wMV9oaWdocGFydCwgb3AyX2hp Z2hwYXJ0KSk7CisgICAgZW1pdF9pbnNuIChnZW5fbmVvbl92cGFkYWw8c3Vw 
PnY4aGkgKG9wZXJhbmRzWzNdLCBvcGVyYW5kc1szXSwgcmVkdWMpKTsKKwor ICAgIGVtaXRfbW92ZV9pbnNuIChvcGVyYW5kc1swXSwgb3BlcmFuZHNbM10p OworICAgIERPTkU7CisgIH0KKykKKwogKGRlZmluZV9pbnNuICJuZW9uX3Y8 bWF4bWluPjxzdXA+PG1vZGU+IgogICBbKHNldCAobWF0Y2hfb3BlcmFuZDpW RFFJVyAwICJzX3JlZ2lzdGVyX29wZXJhbmQiICI9dyIpCiAgICAgICAgICh1 bnNwZWM6VkRRSVcgWyhtYXRjaF9vcGVyYW5kOlZEUUlXIDEgInNfcmVnaXN0 ZXJfb3BlcmFuZCIgInciKQpkaWZmIC0tZ2l0IGEvZ2NjL2NvbmZpZy9hcm0v dW5zcGVjcy5tZCBiL2djYy9jb25maWcvYXJtL3Vuc3BlY3MubWQKaW5kZXgg MTc0YmNjNWUzZDVlMTEyM2NiMWMxYTU5NWY1MDAzODg0ODQwYWVhOC4uNDEw NjhiYWM5MGFhMGNlNmZlZjUzMTc4OWEzOGU1ZjdiM2IyN2RmZiAxMDA2NDQK LS0tIGEvZ2NjL2NvbmZpZy9hcm0vdW5zcGVjcy5tZAorKysgYi9nY2MvY29u ZmlnL2FybS91bnNwZWNzLm1kCkBAIC0yMDAsNiArMjAwLDggQEAKICAgVU5T UEVDX1NIQTI1NlNVMQogICBVTlNQRUNfVk1VTExQNjQKICAgVU5TUEVDX0xP QURfQ09VTlQKKyAgVU5TUEVDX1ZBQkFMX1MKKyAgVU5TUEVDX1ZBQkFMX1UK ICAgVU5TUEVDX1ZBQkRfRgogICBVTlNQRUNfVkFCRF9TCiAgIFVOU1BFQ19W QUJEX1UKZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9h cm0vc3NhZHYxNnFpLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYXJt L3NzYWR2MTZxaS5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAw MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAuLmRiYTVlZjRm NmI5YzBiNzU0NjNhMDg1NDllOTg5ZWRjOWMyMmE5ZDcKLS0tIC9kZXYvbnVs bAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYXJtL3NzYWR2MTZx aS5jCkBAIC0wLDAgKzEsMjkgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9ICov CisvKiB7IGRnLWFkZGl0aW9uYWwtb3B0aW9ucyAiLU8zIC0tc2F2ZS10ZW1w cyIgfSAqLworLyogeyBkZy1yZXF1aXJlLWVmZmVjdGl2ZS10YXJnZXQgYXJt X2ZwX29rIH0gKi8KKy8qIHsgZGctcmVxdWlyZS1lZmZlY3RpdmUtdGFyZ2V0 IGFybV9uZW9uX29rIH0gKi8KKy8qIHsgZGctYWRkLW9wdGlvbnMgYXJtX25l b24gfSAqLworCisjZGVmaW5lIE4gMTAyNAorCitzaWduZWQgY2hhciBwaXgx W05dLCBwaXgyW05dOworCitpbnQKK2ZvbyAodm9pZCkKK3sKKyAgaW50IGlf c3VtID0gMDsKKyAgaW50IGk7CisKKyAgZm9yIChpID0gMDsgaSA8IE47IGkr KykKKyAgICBpX3N1bSArPSBfX2J1aWx0aW5fYWJzIChwaXgxW2ldIC0gcGl4 MltpXSk7CisKKyAgcmV0dXJuIGlfc3VtOworfQorCisvKiB7IGRnLWZpbmFs IHsgc2Nhbi1hc3NlbWJsZXIge1x0dmFiZGxcLnM4XHR9IH0gfSAqLworLyog 
eyBkZy1maW5hbCB7IHNjYW4tYXNzZW1ibGVyIHtcdHZhYmFsXC5zOFx0fSB9 IH0gKi8KKy8qIHsgZGctZmluYWwgeyBzY2FuLWFzc2VtYmxlciB7XHR2cGFk YWxcLnMxNlx0fSB9IH0gKi8KKworLyogeyBkZy1maW5hbCB7IHNjYW4tYXNz ZW1ibGVyLW5vdCB7XHR2bW92bH0gfSB9ICovCisvKiB7IGRnLWZpbmFsIHsg c2Nhbi1hc3NlbWJsZXItbm90IHtcdHZzdWJ9IH0gfSAqLworLyogeyBkZy1m aW5hbCB7IHNjYW4tYXNzZW1ibGVyLW5vdCB7XHR2YWJzfSB9IH0gKi8KZGlm ZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hcm0vdXNhZHYx NnFpLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYXJtL3VzYWR2MTZx aS5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwMDAw MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAuLmQ3NDRiY2JhYjU3NTg1MGRl ODRiNzAzOGEyYjY1ZTQ2NDYxYzAxODUKLS0tIC9kZXYvbnVsbAorKysgYi9n Y2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYXJtL3VzYWR2MTZxaS5jCkBAIC0w LDAgKzEsMjkgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9ICovCisvKiB7IGRn LWFkZGl0aW9uYWwtb3B0aW9ucyAiLU8zIC0tc2F2ZS10ZW1wcyIgfSAqLwor LyogeyBkZy1yZXF1aXJlLWVmZmVjdGl2ZS10YXJnZXQgYXJtX2ZwX29rIH0g Ki8KKy8qIHsgZGctcmVxdWlyZS1lZmZlY3RpdmUtdGFyZ2V0IGFybV9uZW9u X29rIH0gKi8KKy8qIHsgZGctYWRkLW9wdGlvbnMgYXJtX25lb24gfSAqLwor CisjZGVmaW5lIE4gMTAyNAorCit1bnNpZ25lZCBjaGFyIHBpeDFbTl0sIHBp eDJbTl07CisKK2ludAorZm9vICh2b2lkKQoreworICBpbnQgaV9zdW0gPSAw OworICBpbnQgaTsKKworICBmb3IgKGkgPSAwOyBpIDwgTjsgaSsrKQorICAg IGlfc3VtICs9IF9fYnVpbHRpbl9hYnMgKHBpeDFbaV0gLSBwaXgyW2ldKTsK KworICByZXR1cm4gaV9zdW07Cit9CisKKy8qIHsgZGctZmluYWwgeyBzY2Fu LWFzc2VtYmxlciB7XHR2YWJkbFwudThcdH0gfSB9ICovCisvKiB7IGRnLWZp bmFsIHsgc2Nhbi1hc3NlbWJsZXIge1x0dmFiYWxcLnU4XHR9IH0gfSAqLwor LyogeyBkZy1maW5hbCB7IHNjYW4tYXNzZW1ibGVyIHtcdHZwYWRhbFwudTE2 XHR9IH0gfSAqLworCisvKiB7IGRnLWZpbmFsIHsgc2Nhbi1hc3NlbWJsZXIt bm90IHtcdHZtb3ZsfSB9IH0gKi8KKy8qIHsgZGctZmluYWwgeyBzY2FuLWFz c2VtYmxlci1ub3Qge1x0dnN1Yn0gfSB9ICovCisvKiB7IGRnLWZpbmFsIHsg c2Nhbi1hc3NlbWJsZXItbm90IHtcdHZhYnN9IH0gfSAqLwo= --_002_VI1PR0801MB2062E6870C83709BF801C15CE4100VI1PR0801MB2062_--