From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 17 May 2016 12:28:00 -0000
From: James Greenhalgh
To: Jiong Wang
CC: GCC Patches
Subject: Re: [AArch64, 2/4] Extend vector multiply by element to all supported modes
Message-ID: <20160517122743.GB13508@arm.com>
References: <57398D3D.1040806@foss.arm.com> <57398E4B.1000309@foss.arm.com>
In-Reply-To: <57398E4B.1000309@foss.arm.com>
On Mon, May 16, 2016 at 10:09:31AM +0100, Jiong Wang wrote:
> AArch64 supports vector multiply by element for V2DF, V2SF, V4SF, V2SI,
> V4SI, V4HI and V8HI.
>
> All of the above are well supported by the "*aarch64_mul3_elt" pattern,
> and by "*aarch64_mul3_elt_" when there is a lane size change.
>
> These patterns match "(mul (vec_dup (vec_select)))", which is genuinely a
> vector multiply by element.
>
> A vector multiply by element can also come from "(mul (vec_dup (scalar)))",
> where the scalar value is already sitting in a vector register and is then
> duplicated to the other lanes, with no lane size change.
>
> We already have "*aarch64_mul3_elt_to_128df" to match this, but it is
> restricted to V2DF, while this patch extends the support to more modes,
> for example the vector integer operations.
>
> For the testcase included, the following codegen change will happen:
>
> -        ldr     w0, [x3, 160]
> -        dup     v1.2s, w0
> -        mul     v1.2s, v1.2s, v2.2s
> +        ldr     s1, [x3, 160]
> +        mul     v1.2s, v0.2s, v1.s[0]
>
> OK for trunk?
>
> 2016-05-16  Jiong Wang
>
> gcc/
>   * config/aarch64/aarch64-simd.md (*aarch64_mul3_elt_to_128df): Extend
>     to all supported modes.  Rename to "*aarch64_mul3_elt_from_dup".
>
> gcc/testsuite/
>   * /gcc.target/aarch64/simd/vmul_elem_1.c: New.

This ChangeLog formatting is incorrect. It should look like:

gcc/

2016-05-17  Jiong Wang

	* config/aarch64/aarch64-simd.md (*aarch64_mul3_elt_to_128df): Extend
	to all supported modes.  Rename to...
	(*aarch64_mul3_elt_from_dup): ...this.

gcc/testsuite/

2016-05-17  Jiong Wang

	* gcc.target/aarch64/simd/vmul_elem_1.c: New.

Otherwise, this patch is OK.

Thanks,
James
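As an illustration of what the extended pattern catches (this example is not
part of the patch; the function name and the exact registers in the comment
are invented), source along these lines, built at -O2 on AArch64 with this
change applied, should use the by-element form from the codegen diff above
instead of a dup followed by a vector-by-vector mul:

  /* Minimal sketch, in the same style as the new testcase: the scalar is
     loaded straight into a SIMD register and used as element 0 of the
     multiply, rather than being duplicated across a vector first.  */
  #include <arm_neon.h>

  int32x2_t
  scale_by_loaded_scalar (int32x2_t v, const int32_t *p)
  {
    /* GCC's vector extension broadcasts the scalar operand, giving
       (mul (vec_dup (scalar)) (reg)) in RTL, which is the shape
       *aarch64_mul3_elt_from_dup now matches for integer modes too,
       e.g. something like:
           ldr  s1, [x0]
           mul  v0.2s, v0.2s, v1.s[0]  */
    return v * p[0];
  }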
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index eb18defef15c24bf2334045e92bf7c34b989136d..7f338ff78fabccee868a4befbffed54c3e842dc9 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -371,15 +371,15 @@
>    [(set_attr "type" "neon_mul__scalar")]
>  )
>
> -(define_insn "*aarch64_mul3_elt_to_128df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=w")
> -    (mult:V2DF
> -      (vec_duplicate:V2DF
> -        (match_operand:DF 2 "register_operand" "w"))
> -      (match_operand:V2DF 1 "register_operand" "w")))]
> +(define_insn "*aarch64_mul3_elt_from_dup"
> +  [(set (match_operand:VMUL 0 "register_operand" "=w")
> +    (mult:VMUL
> +      (vec_duplicate:VMUL
> +        (match_operand: 1 "register_operand" ""))
> +      (match_operand:VMUL 2 "register_operand" "w")))]
>    "TARGET_SIMD"
> -  "fmul\\t%0.2d, %1.2d, %2.d[0]"
> -  [(set_attr "type" "neon_fp_mul_d_scalar_q")]
> +  "mul\t%0., %2., %1.[0]";
> +  [(set_attr "type" "neon_mul__scalar")]
>  )
>
>  (define_insn "aarch64_rsqrte_2"
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..290a4e9adbc5d9ce1335ca28120e437293776f30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
> @@ -0,0 +1,519 @@
> +/* Test the vmul_n_f64 AArch64 SIMD intrinsic.  */
> +
> +/* { dg-do run } */
> +/* { dg-options "-O2 --save-temps" } */
> +
> +#include "arm_neon.h"
> +
> +extern void abort (void);
> +
> +#define A (132.4f)
> +#define B (-0.0f)
> +#define C (-34.8f)
> +#define D (289.34f)
> +float32_t expected2_1[2] = {A * A, B * A};
> +float32_t expected2_2[2] = {A * B, B * B};
> +float32_t expected4_1[4] = {A * A, B * A, C * A, D * A};
> +float32_t expected4_2[4] = {A * B, B * B, C * B, D * B};
> +float32_t expected4_3[4] = {A * C, B * C, C * C, D * C};
> +float32_t expected4_4[4] = {A * D, B * D, C * D, D * D};
> +float32_t _elemA = A;
> +float32_t _elemB = B;
> +float32_t _elemC = C;
> +float32_t _elemD = D;
> +
> +#define AD (1234.5)
> +#define BD (-0.0)
> +#define CD (71.3)
> +#define DD (-1024.4)
> +float64_t expectedd2_1[2] = {AD * CD, BD * CD};
> +float64_t expectedd2_2[2] = {AD * DD, BD * DD};
> +float64_t _elemdC = CD;
> +float64_t _elemdD = DD;
> +
> +
> +#define AS (1024)
> +#define BS (-31)
> +#define CS (0)
> +#define DS (655)
> +int32_t expecteds2_1[2] = {AS * AS, BS * AS};
> +int32_t expecteds2_2[2] = {AS * BS, BS * BS};
> +int32_t expecteds4_1[4] = {AS * AS, BS * AS, CS * AS, DS * AS};
> +int32_t expecteds4_2[4] = {AS * BS, BS * BS, CS * BS, DS * BS};
> +int32_t expecteds4_3[4] = {AS * CS, BS * CS, CS * CS, DS * CS};
> +int32_t expecteds4_4[4] = {AS * DS, BS * DS, CS * DS, DS * DS};
> +int32_t _elemsA = AS;
> +int32_t _elemsB = BS;
> +int32_t _elemsC = CS;
> +int32_t _elemsD = DS;
> +
> +#define AH ((int16_t) 0)
> +#define BH ((int16_t) -32)
> +#define CH ((int16_t) 102)
> +#define DH ((int16_t) -51)
> +#define EH ((int16_t) 71)
> +#define FH ((int16_t) -91)
> +#define GH ((int16_t) 48)
> +#define HH ((int16_t) 255)
> +int16_t expectedh4_1[4] = {AH * AH, BH * AH, CH * AH, DH * AH};
> +int16_t expectedh4_2[4] = {AH * BH, BH * BH, CH * BH, DH * BH};
> +int16_t expectedh4_3[4] = {AH * CH, BH * CH, CH * CH, DH * CH};
> +int16_t expectedh4_4[4] = {AH * DH, BH * DH, CH * DH, DH * DH};
> +int16_t expectedh8_1[8] = {AH * AH, BH * AH, CH * AH, DH * AH,
> +                           EH * AH, FH * AH, GH * AH, HH * AH};
> +int16_t expectedh8_2[8] = {AH * BH, BH * BH, CH * BH, DH * BH,
> +                           EH * BH, FH * BH, GH * BH, HH * BH};
> +int16_t expectedh8_3[8] = {AH * CH, BH * CH, CH * CH, DH * CH,
> +                           EH * CH, FH * CH, GH * CH, HH * CH};
> +int16_t expectedh8_4[8] = {AH * DH, BH * DH, CH * DH, DH * DH,
> +                           EH * DH, FH * DH, GH * DH, HH * DH};
> +int16_t expectedh8_5[8] = {AH * EH, BH * EH, CH * EH, DH * EH,
> +                           EH * EH, FH * EH, GH * EH, HH * EH};
> +int16_t expectedh8_6[8] = {AH * FH, BH * FH, CH * FH, DH * FH,
> +                           EH * FH, FH * FH, GH * FH, HH * FH};
> +int16_t expectedh8_7[8] = {AH * GH, BH * GH, CH * GH, DH * GH,
> +                           EH * GH, FH * GH, GH * GH, HH * GH};
> +int16_t expectedh8_8[8] = {AH * HH, BH * HH, CH * HH, DH * HH,
> +                           EH * HH, FH * HH, GH * HH, HH * HH};
> +int16_t _elemhA = AH;
> +int16_t _elemhB = BH;
> +int16_t _elemhC = CH;
> +int16_t _elemhD = DH;
> +int16_t _elemhE = EH;
> +int16_t _elemhF = FH;
> +int16_t _elemhG = GH;
> +int16_t _elemhH = HH;
> +
> +#define AUS (1024)
> +#define BUS (31)
> +#define CUS (0)
> +#define DUS (655)
> +uint32_t expectedus2_1[2] = {AUS * AUS, BUS * AUS};
> +uint32_t expectedus2_2[2] = {AUS * BUS, BUS * BUS};
> +uint32_t expectedus4_1[4] = {AUS * AUS, BUS * AUS, CUS * AUS, DUS * AUS};
> +uint32_t expectedus4_2[4] = {AUS * BUS, BUS * BUS, CUS * BUS, DUS * BUS};
> +uint32_t expectedus4_3[4] = {AUS * CUS, BUS * CUS, CUS * CUS, DUS * CUS};
> +uint32_t expectedus4_4[4] = {AUS * DUS, BUS * DUS, CUS * DUS, DUS * DUS};
> +uint32_t _elemusA = AUS;
> +uint32_t _elemusB = BUS;
> +uint32_t _elemusC = CUS;
> +uint32_t _elemusD = DUS;
> +
> +#define AUH ((uint16_t) 0)
> +#define BUH ((uint16_t) 32)
> +#define CUH ((uint16_t) 102)
> +#define DUH ((uint16_t) 51)
> +#define EUH ((uint16_t) 71)
> +#define FUH ((uint16_t) 91)
> +#define GUH ((uint16_t) 48)
> +#define HUH ((uint16_t) 255)
> +uint16_t expecteduh4_1[4] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH};
> +uint16_t expecteduh4_2[4] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH};
> +uint16_t expecteduh4_3[4] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH};
> +uint16_t expecteduh4_4[4] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH};
> +uint16_t expecteduh8_1[8] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH,
> +                             EUH * AUH, FUH * AUH, GUH * AUH, HUH * AUH};
> +uint16_t expecteduh8_2[8] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH,
> +                             EUH * BUH, FUH * BUH, GUH * BUH, HUH * BUH};
> +uint16_t expecteduh8_3[8] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH,
> +                             EUH * CUH, FUH * CUH, GUH * CUH, HUH * CUH};
> +uint16_t expecteduh8_4[8] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH,
> +                             EUH * DUH, FUH * DUH, GUH * DUH, HUH * DUH};
> +uint16_t expecteduh8_5[8] = {AUH * EUH, BUH * EUH, CUH * EUH, DUH * EUH,
> +                             EUH * EUH, FUH * EUH, GUH * EUH, HUH * EUH};
> +uint16_t expecteduh8_6[8] = {AUH * FUH, BUH * FUH, CUH * FUH, DUH * FUH,
> +                             EUH * FUH, FUH * FUH, GUH * FUH, HUH * FUH};
> +uint16_t expecteduh8_7[8] = {AUH * GUH, BUH * GUH, CUH * GUH, DUH * GUH,
> +                             EUH * GUH, FUH * GUH, GUH * GUH, HUH * GUH};
> +uint16_t expecteduh8_8[8] = {AUH * HUH, BUH * HUH, CUH * HUH, DUH * HUH,
> +                             EUH * HUH, FUH * HUH, GUH * HUH, HUH * HUH};
> +uint16_t _elemuhA = AUH;
> +uint16_t _elemuhB = BUH;
> +uint16_t _elemuhC = CUH;
> +uint16_t _elemuhD = DUH;
> +uint16_t _elemuhE = EUH;
> +uint16_t _elemuhF = FUH;
> +uint16_t _elemuhG = GUH;
> +uint16_t _elemuhH = HUH;
> +
> +void
> +check_v2sf (float32_t elemA, float32_t elemB)
> +{
> +  int32_t indx;
> +  const float32_t vec32x2_buf[2] = {A, B};
> +  float32x2_t vec32x2_src = vld1_f32 (vec32x2_buf);
> +  float32x2_t vec32x2_res = vec32x2_src * elemA;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_1[indx])
> +      abort ();
> +
> +  vec32x2_res = vec32x2_src * elemB;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_2[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 2 } } */
> +}
> +
> +void
> +check_v4sf (float32_t elemA, float32_t elemB, float32_t elemC, float32_t elemD)
> +{
> +  int32_t indx;
> +  const float32_t vec32x4_buf[4] = {A, B, C, D};
> +  float32x4_t vec32x4_src = vld1q_f32 (vec32x4_buf);
> +  float32x4_t vec32x4_res = vec32x4_src * elemA;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_1[indx])
> +      abort ();
> +
> +  vec32x4_res = vec32x4_src * elemB;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_2[indx])
> +      abort ();
> +
> +  vec32x4_res = vec32x4_src * elemC;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_3[indx])
> +      abort ();
> +
> +  vec32x4_res = vec32x4_src * elemD;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_4[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
> +}
> +
> +void
> +check_v2df (float64_t elemdC, float64_t elemdD)
> +{
> +  int32_t indx;
> +  const float64_t vec64x2_buf[2] = {AD, BD};
> +  float64x2_t vec64x2_src = vld1q_f64 (vec64x2_buf);
> +  float64x2_t vec64x2_res = vec64x2_src * elemdC;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_1[indx])
> +      abort ();
> +
> +  vec64x2_res = vec64x2_src * elemdD;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_2[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[0\\\]" 2 } } */
> +}
> +
> +void
> +check_v2si (int32_t elemsA, int32_t elemsB)
> +{
> +  int32_t indx;
> +  const int32_t vecs32x2_buf[2] = {AS, BS};
> +  int32x2_t vecs32x2_src = vld1_s32 (vecs32x2_buf);
> +  int32x2_t vecs32x2_res = vecs32x2_src * elemsA;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (vecs32x2_res[indx] != expecteds2_1[indx])
> +      abort ();
> +
> +  vecs32x2_res = vecs32x2_src * elemsB;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (vecs32x2_res[indx] != expecteds2_2[indx])
> +      abort ();
> +}
> +
> +void
> +check_v2si_unsigned (uint32_t elemusA, uint32_t elemusB)
> +{
> +  int indx;
> +  const uint32_t vecus32x2_buf[2] = {AUS, BUS};
> +  uint32x2_t vecus32x2_src = vld1_u32 (vecus32x2_buf);
> +  uint32x2_t vecus32x2_res = vecus32x2_src * elemusA;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (vecus32x2_res[indx] != expectedus2_1[indx])
> +      abort ();
> +
> +  vecus32x2_res = vecus32x2_src * elemusB;
> +
> +  for (indx = 0; indx < 2; indx++)
> +    if (vecus32x2_res[indx] != expectedus2_2[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
> +}
> +
> +void
> +check_v4si (int32_t elemsA, int32_t elemsB, int32_t elemsC, int32_t elemsD)
> +{
> +  int32_t indx;
> +  const int32_t vecs32x4_buf[4] = {AS, BS, CS, DS};
> +  int32x4_t vecs32x4_src = vld1q_s32 (vecs32x4_buf);
> +  int32x4_t vecs32x4_res = vecs32x4_src * elemsA;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecs32x4_res[indx] != expecteds4_1[indx])
> +      abort ();
> +
> +  vecs32x4_res = vecs32x4_src * elemsB;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecs32x4_res[indx] != expecteds4_2[indx])
> +      abort ();
> +
> +  vecs32x4_res = vecs32x4_src * elemsC;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecs32x4_res[indx] != expecteds4_3[indx])
> +      abort ();
> +
> +  vecs32x4_res = vecs32x4_src * elemsD;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecs32x4_res[indx] != expecteds4_4[indx])
> +      abort ();
> +}
> +
> +void
> +check_v4si_unsigned (uint32_t elemusA, uint32_t elemusB, uint32_t elemusC,
> +                     uint32_t elemusD)
> +{
> +  int indx;
> +  const uint32_t vecus32x4_buf[4] = {AUS, BUS, CUS, DUS};
> +  uint32x4_t vecus32x4_src = vld1q_u32 (vecus32x4_buf);
> +  uint32x4_t vecus32x4_res = vecus32x4_src * elemusA;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecus32x4_res[indx] != expectedus4_1[indx])
> +      abort ();
> +
> +  vecus32x4_res = vecus32x4_src * elemusB;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecus32x4_res[indx] != expectedus4_2[indx])
> +      abort ();
> +
> +  vecus32x4_res = vecus32x4_src * elemusC;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecus32x4_res[indx] != expectedus4_3[indx])
> +      abort ();
> +
> +  vecus32x4_res = vecus32x4_src * elemusD;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecus32x4_res[indx] != expectedus4_4[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 8 } } */
> +}
> +
> +
> +void
> +check_v4hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD)
> +{
> +  int32_t indx;
> +  const int16_t vech16x4_buf[4] = {AH, BH, CH, DH};
> +  int16x4_t vech16x4_src = vld1_s16 (vech16x4_buf);
> +  int16x4_t vech16x4_res = vech16x4_src * elemhA;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vech16x4_res[indx] != expectedh4_1[indx])
> +      abort ();
> +
> +  vech16x4_res = vech16x4_src * elemhB;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vech16x4_res[indx] != expectedh4_2[indx])
> +      abort ();
> +
> +  vech16x4_res = vech16x4_src * elemhC;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vech16x4_res[indx] != expectedh4_3[indx])
> +      abort ();
> +
> +  vech16x4_res = vech16x4_src * elemhD;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vech16x4_res[indx] != expectedh4_4[indx])
> +      abort ();
> +}
> +
> +void
> +check_v4hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
> +                     uint16_t elemuhD)
> +{
> +  int indx;
> +  const uint16_t vecuh16x4_buf[4] = {AUH, BUH, CUH, DUH};
> +  uint16x4_t vecuh16x4_src = vld1_u16 (vecuh16x4_buf);
> +  uint16x4_t vecuh16x4_res = vecuh16x4_src * elemuhA;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecuh16x4_res[indx] != expecteduh4_1[indx])
> +      abort ();
> +
> +  vecuh16x4_res = vecuh16x4_src * elemuhB;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecuh16x4_res[indx] != expecteduh4_2[indx])
> +      abort ();
> +
> +  vecuh16x4_res = vecuh16x4_src * elemuhC;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecuh16x4_res[indx] != expecteduh4_3[indx])
> +      abort ();
> +
> +  vecuh16x4_res = vecuh16x4_src * elemuhD;
> +
> +  for (indx = 0; indx < 4; indx++)
> +    if (vecuh16x4_res[indx] != expecteduh4_4[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.4h, v\[0-9\]+\.4h, v\[0-9\]+\.h\\\[0\\\]" 8 } } */
> +}
> +
> +void
> +check_v8hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD,
> +            int16_t elemhE, int16_t elemhF, int16_t elemhG, int16_t elemhH)
> +{
> +  int32_t indx;
> +  const int16_t vech16x8_buf[8] = {AH, BH, CH, DH, EH, FH, GH, HH};
> +  int16x8_t vech16x8_src = vld1q_s16 (vech16x8_buf);
> +  int16x8_t vech16x8_res = vech16x8_src * elemhA;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_1[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhB;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_2[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhC;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_3[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhD;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_4[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhE;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_5[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhF;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_6[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhG;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_7[indx])
> +      abort ();
> +
> +  vech16x8_res = vech16x8_src * elemhH;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vech16x8_res[indx] != expectedh8_8[indx])
> +      abort ();
> +}
> +
> +void
> +check_v8hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
> +                     uint16_t elemuhD, uint16_t elemuhE, uint16_t elemuhF,
> +                     uint16_t elemuhG, uint16_t elemuhH)
> +{
> +  int indx;
> +  const uint16_t vecuh16x8_buf[8] = {AUH, BUH, CUH, DUH, EUH, FUH, GUH, HUH};
> +  uint16x8_t vecuh16x8_src = vld1q_u16 (vecuh16x8_buf);
> +  uint16x8_t vecuh16x8_res = vecuh16x8_src * elemuhA;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_1[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhB;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_2[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhC;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_3[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhD;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_4[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhE;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_5[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhF;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_6[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhG;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_7[indx])
> +      abort ();
> +
> +  vecuh16x8_res = vecuh16x8_src * elemuhH;
> +
> +  for (indx = 0; indx < 8; indx++)
> +    if (vecuh16x8_res[indx] != expecteduh8_8[indx])
> +      abort ();
> +
> +/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.8h, v\[0-9\]+\.8h, v\[0-9\]+\.h\\\[0\\\]" 16 } } */
> +}
> +
> +int
> +main (void)
> +{
> +  check_v2sf (_elemA, _elemB);
> +  check_v4sf (_elemA, _elemB, _elemC, _elemD);
> +  check_v2df (_elemdC, _elemdD);
> +  check_v2si (_elemsA, _elemsB);
> +  check_v4si (_elemsA, _elemsB, _elemsC, _elemsD);
> +  check_v4hi (_elemhA, _elemhB, _elemhC, _elemhD);
> +  check_v8hi (_elemhA, _elemhB, _elemhC, _elemhD,
> +              _elemhE, _elemhF, _elemhG, _elemhH);
> +  check_v2si_unsigned (_elemusA, _elemusB);
> +  check_v4si_unsigned (_elemusA, _elemusB, _elemusC, _elemusD);
> +  check_v4hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD);
> +  check_v8hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD,
> +                       _elemuhE, _elemuhF, _elemuhG, _elemuhH);
> +
> +  return 0;
> +}
> +
>
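A related sketch (again illustrative only, not part of the patch or the
testsuite; the function name is invented): the same multiply-by-element form
should also be reachable through the vmul_n/vmulq_n intrinsic family, whose
semantics are a scalar broadcast followed by a lane-wise multiply, for
example:

  #include <arm_neon.h>

  int16x8_t
  scale_s16x8 (int16x8_t v, int16_t s)
  {
    /* With the generalized pattern, and when the scalar ends up in a
       SIMD register, this can assemble to something like
           mul  v0.8h, v0.8h, v1.h[0]
       rather than dup v1.8h, w0 followed by mul v0.8h, v0.8h, v1.8h.  */
    return vmulq_n_s16 (v, s);
  }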