From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, Richard Sandiford
Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
Date: Wed, 2 Jun 2021 09:28:17 +0000
List-Id: Gcc-patches mailing list

Ping,

Did you have any comments Richard S? Otherwise I'll proceed with respinning according to Richi's comments.

Regards,
Tamar

> -----Original Message-----
> From: Richard Biener
> Sent: Wednesday, May 26, 2021 9:57 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; Richard Sandiford
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
>
> On Tue, 25 May 2021, Tamar Christina wrote:
>
> > Hi Richi,
> >
> > Here's a respun version of the patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
>
> index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
>                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>               || (!INTEGRAL_TYPE_P (lhs_type)
>                   && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -         || !types_compatible_p (rhs1_type, rhs2_type)
> +         || (!types_compatible_p (rhs1_type, rhs2_type)
> +             && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> +             && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
>
> I think this doesn't capture the constraints - instead please do
>
> -         || !types_compatible_p (rhs1_type, rhs2_type)
> +         /* rhs1_type and rhs2_type may differ in sign.  */
> +         || !tree_nop_conversion_p (rhs1_type, rhs2_type)
>
>
> +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> +   most CODE this will be optab_vector, however for certain operations such as
> +   DOT_PROD_EXPR where the operation can different signs for the operands we
> +   need to be able to pick the right optabs.  */
> +
> +static enum optab_subtype
> +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
>
> vect_determine_optab_subkind would be a better name.  'code' is
> redundant (or should better match stmt_vinfo->stmts code).  I wonder
> if it might be clearer to compute the subtype where we compute 'code'
> and the relation to stmt_info is obvious, I mean here:
>
>   /* 3. Check the operands of the operation.  The first operands are defined
>         inside the loop body.  The last operand is the reduction variable,
>         which is defined by the loop-header-phi.  */
>
>   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
>   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
>   enum tree_code code = gimple_assign_rhs_code (stmt);
>   bool lane_reduc_code_p
>     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
>
> so just add
>
>   enum optab_subtype optab_query_kind = optab_vector;
>   if (code == DOT_PROD_EXPR
>       && )
>     optab_query_kind = optab_vector_mixed_sign;
>
> in this place and avoid adding the new function?
>
> I'm not too familiar with the pattern recog code, a 2nd eye would be
> preferred (Richard?), but
>
> +  /* Check if the mismatch is only in the sign and if we have
> +     allow_short_sign_mismatch then allow it.  */
> +  if (unprom_type
> +      && TYPE_SIGN (unprom_type) == SIGNED
> +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> +    {
> +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> +      tree eq_type
> +        = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> +                                          sign);
> +
> +      if (types_compatible_p (*common_type, eq_type))
> +        return true;
> +    }
>
> looks somewhat complicated - is that equal to
>
>   if (unprom_type
>       && tree_nop_conversion_p (*common_type, new_type))
>     return true;
>
> ?  That is, *common_type and new_type only differ in sign?
>
> @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>          if (unprom[j].op == unprom[i].op)
>            break;
> +      bool only_sign = allow_short_sign_mismatch
> +                       && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> +                       && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
>
> this could use the same tree_nop_conversion_p predicate.
>
> Otherwise the patch looks good.
>
> Thanks,
> Richard.
>
>
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >   * optabs.def (usdot_prod_optab): New.
> >   * doc/md.texi: Document it and clarify other dot prod optabs.
> >   * optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> >   * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> >   * optabs.c (expand_widen_pattern_expr): Likewise.
> >   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> >   * tree-vect-loop.c (vect_determine_dot_kind): New.
> >   (vectorizable_reduction): Query dot-product kind.
> >   * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> >   optab subtype.
> >   (vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> >   mismatch types.
> >   (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener
> > > Sent: Monday, May 10, 2021 2:29 PM
> > > To: Tamar Christina
> > > Cc: gcc-patches@gcc.gnu.org; nd
> > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Mon, 10 May 2021, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener
> > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > To: Tamar Christina
> > > > > Cc: gcc-patches@gcc.gnu.org; nd
> > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > > where the sign for the multiplicant changes.
> > > > >
> > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi Richi,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener
> > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > To: Tamar Christina
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd
> > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > >
> > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > This patch adds support for a dot product where the sign of
> > > > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > > > one is unsigned but the precisions are the same.
> > > > > > > >
> > > > > > > > #define N 480
> > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > >
> > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > > > > >                                            SIGNEDNESS_3 char *restrict a,
> > > > > > > >                                            SIGNEDNESS_4 char *restrict b) {
> > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > >     {
> > > > > > > >       int av = a[i];
> > > > > > > >       int bv = b[i];
> > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > >       res += mult;
> > > > > > > >     }
> > > > > > > >   return res;
> > > > > > > > }
> > > > > > > >
> > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > to a 32-bit value.
> > > > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > > > conversion to an unsigned value. i.e. if SIGNEDNESS_2 is unsigned.
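
[Illustrative aside, not part of the quoted patch.] A minimal C sketch of why the intermediate type of `mult` matters, assuming a 16-bit short and a wider int. The helper names dot_signed_mult and dot_unsigned_mult are hypothetical; they only replay the loop above with SIGNEDNESS_2 fixed to signed and to unsigned respectively.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: SIGNEDNESS_2 is signed, as in the testcase above.  */
unsigned int
dot_signed_mult (unsigned int res, const signed char *a,
                 const unsigned char *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      int av = a[i];          /* sign-extended */
      int bv = b[i];          /* zero-extended */
      short mult = av * bv;   /* product lies in [-32640, 32385], fits a 16-bit short */
      res += mult;            /* a negative mult wraps modulo UINT_MAX + 1 */
    }
  return res;
}

/* Hypothetical helper: SIGNEDNESS_2 is unsigned, the case the patch rejects.  */
unsigned int
dot_unsigned_mult (unsigned int res, const signed char *a,
                   const unsigned char *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      int av = a[i];
      int bv = b[i];
      unsigned short mult = av * bv;  /* a negative product is reduced modulo 2^16 */
      res += mult;                    /* the sign information is already lost here */
    }
  return res;
}
```

With a[0] = -1 and b[0] = 2 the product is -2. The signed variant adds -2 to res (wrapping in unsigned arithmetic), while the unsigned variant adds 65534 on a target with 16-bit short, so the two sums differ. That difference is why a 32-bit mixed-sign dot product cannot stand in for the second loop.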
> > > > > > > >
> > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > > the optab expansion.
> > > > > > > >
> > > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > > optionally ignore operands with different signs and stores
> > > > > > > > this information in the optab subtype which is now made a bitfield.
> > > > > > > >
> > > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > > can expand to.
> > > > > > > >
> > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > > > >
> > > > > > > > Ok for master?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > >   * optabs.def (usdot_prod_optab): New.
> > > > > > > >   * doc/md.texi: Document it.
> > > > > > > >   * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > >   * optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > >   * optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > >   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > >   * tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > >   (vectorizable_reduction): Query dot-product kind.
> > > > > > > >   * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > > > > > >   optab subtype.
> > > > > > > >   (vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > > > > > >   mismatch types.
> > > > > > > >   (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > >
> > > > > > > > --- inline copy of patch --
> > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > > > -is of the same mode as operand 3.
> > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > > > > >
> > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > > > addition we only need to know the effective sign of the
> > > > > > > multiplication so I think the existing 's' and 'u'
> > > > > > > are enough to cover all cases?
> > > > > >
> > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > multiplication are of the same sign.  So for e.g. 'u' both operands
> > > > > > must be unsigned.
> > > > > >
> > > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > > this does a sign extension to the wider type for the signed value,
> > > > > > and the unsigned value gets zero extended first, and then converts
> > > > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > > > to the C promotion rules.
> > > > > >
> > > > > > TL;DR; Without a new optab I can't tell during expansion which
> > > > > > semantic the operation had at the gimple/C level as modes don't carry signs.
> > > > > >
> > > > > > Long version:
> > > > > >
> > > > > > The problem with using the existing patterns, because of their
> > > > > > enforcement of `av` and `bv` being the same sign is that we can't
> > > > > > remove the explicit sign extensions, but the multiplication must
> > > > > > be done on the sign/zero extended char input in the same sign.
> > > > > >
> > > > > > Which means (unless I am mistaken) to get the correct result, you
> > > > > > can use neither `udot` nor `sdot` as semantically these would
> > > > > > zero or sign extend both operands from char to int to perform the
> > > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > > parameter is zero and one parameter is sign extended and the
> > > > > > result is always an unsigned number.
> > > > > >
> > > > > > So basically
> > > > > >
> > > > > > udot<...> ==
> > > > > >   c = zero-ext (a) * zero-ext (b)
> > > > > > sdot<... b> ==
> > > > > >   c = sign-ext (a) * sign-ext (b)
> > > > > > usdot<... signed b> ==
> > > > > >   c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > >
> > > > > > So semantically the existing optabs won't fit here.  udot would
> > > > > > internally promote to unsigned types before the multiplication so
> > > > > > the result of the multiplication would be wrong.  sdot would
> > > > > > promote both to signed and do signed multiplication, so the
> > > > > > result is also wrong.
> > > > > >
> > > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > > are two problems:
> > > > > > RTL Modes don't contain signs.  So a target can't tell me how the
> > > > > > operands will be promoted.
> > > > > > So:
> > > > > >
> > > > > > 1) I can't really check which semantics the target will adhere to
> > > > > >    on expansion.
> > > > > > 2) at expand time I have no way to differentiate between the two
> > > > > >    instruction variants, given just modes
> > > > > >    I can't tell whether I expand to the normal dot-product or
> > > > > >    the new instruction.
> > > > >
> > > > > Ah, OK.  Indeed with such a weird instruction the new variant makes sense.
> > > > > Still can you please amend the optab documentation to say which
> > > > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > > > is bad.
> > > >
> > > > Sure, will expand on it.
> > > >
> > > > >
> > > > > Since the multiplication is commutative I wonder why you need to
> > > > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > > > just enforce a canonical order (like the optab does).
> > > >
> > > > Sure, I thought it would have been better to change the order at
> > > > expand time, but can do so at detection time.
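
[Illustrative aside, not part of the quoted thread.] The three semantics discussed above can be sketched as scalar reference loops. This is only a model of the quoted formulas, not GCC code: the function names udot_ref, sdot_ref and usdot_ref are hypothetical, and the parameter order of usdot_ref (signed array first) is an assumption of the sketch, since the thread has not yet fixed which operand is the signed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* udot model: both inputs are zero-extended before the multiply.  */
uint32_t
udot_ref (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* sdot model: both inputs are sign-extended before the multiply.  */
int32_t
sdot_ref (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* usdot model: the signed input S is sign-extended and then converted to
   unsigned, the unsigned input U is zero-extended, and the multiply is done
   in unsigned 32-bit arithmetic, following the formula quoted above.  */
uint32_t
usdot_ref (uint32_t acc, const int8_t *s, const uint8_t *u, size_t n)
{
  assert (s != NULL && u != NULL);
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) (int32_t) s[i] * (uint32_t) u[i];
  return acc;
}
```

Feeding the same bit patterns to all three shows why neither existing optab fits: with s = {-2, 5} and u = {200, 3} the mixed-sign sum is -385 (as a 32-bit unsigned value), whereas reading both inputs as unsigned gives 50815 and reading both as signed gives 127.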
> > > >
> > > > > I also think it's a particular bad fit for the bad
> > > > > optab_for_tree_code API - would any of that improve when using a
> > > > > direct internal function here?
> > > >
> > > > Somewhat, but this has considerable knock on effects, e.g. currently
> > > > DOT_PROD is treated as a widening operation and so is handled by
> > > > supportable_widening_operation which does not support calls. There's a
> > > > significant number of places which work on the tree EXPR (including
> > > > constant folding) which all need to be changed.
> > > >
> > > > > In particular all the changes around optab_subtype look like they
> > > > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > > > should suffice here, no need to make it a flags kind.
> > > >
> > > > The reason I did so is because depending on where the query is done it
> > > > does use different subtypes currently.  During detection it uses
> > > > optab_default, and during vectorization optab_vector.  For this
> > > > instruction this difference doesn't seem to be used, but did not want to
> > > > lose this information in case something depended on it.
> > > >
> > > > But can make it just one.
> > > >
> > > > >
> > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > +     type of the dot-product.  */
> > > > > +  if (subtype != optab_default)
> > > > > +    {
> > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > +          && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > +        return NULL;
> > > > > +    }
> > > > >
> > > > > I don't understand this - how do we ever arrive at a result with less precision?
> > > >
> > > > The user could have manually truncated the results, i.e. in the
> > > > detection code notice `mult`
> > > >
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >
> > > > which is a short, so it's manually truncating the multiplication which
> > > > is done as int by the instruction.  If `mult` is unsigned then it will
> > > > truncate the result if the signed input to usdot was negative, unless
> > > > the intermediate calculation is of the same precision as the
> > > > instruction.  i.e. if mult is unsigned int then there's no truncation
> > > > going on, it's casting from int to unsigned int so it's safe to use
> > > > then as the instruction does the same thing internally.
> > >
> > > It looks to me that we simply should only ever allow sign-changes from
> > > multiplication result to the sum.  At least your example above is not special to
> > > mixed sign multiplications, no?
> > >
> > > > > And why's this not an issue for signed multiplication?
> > > >
> > > > It is, but in that case it's handled by the type jousting, which
> > > > doesn't allow the type mismatch.  i.e.
> > > >
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 unsigned
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 signed
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > >                                            SIGNEDNESS_3 char *restrict a,
> > > >                                            SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > >     {
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >     }
> > > >   return res;
> > > > }
> > > >
> > > > Is also not detected as a dot product.  By adding the carve out to the
> > > > widen multiplication detection it now allows this case through so I
> > > > handle it in the detection code.
> > > > Thinking about it now, it seems more
> > > > logical to add this case handling inside the type jousting code as I
> > > > don't think it's ever something you'd want.
> > >
> > > Yeah, I think we only need to look through sign changes on the multiplication
> > > result.
> > >
> > > > > Also...
> > > > >
> > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > +  tree mult_arg1, mult_arg2;
> > > > > +  if (subtype == optab_default)
> > > > > +    {
> > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      mult_arg1 = unprom0[0].op;
> > > > > +      mult_arg2 = unprom0[1].op;
> > > > > +    }
> > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > -                                      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > +                                      mult_arg1, mult_arg2, oprnd1);
> > > > >
> > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > and unprom0 are just misnamed here?
> > > >
> > > > Somewhat, in a normal dot-product the sign of the multiplication are
> > > > the same here as the "unpromoted" types.  So after vect_convert_input
> > > > these two types are the same.
> > > >
> > > > However because here the sign changes and to maintain the semantics of
> > > > the C code there's an extra conversion here to get the arguments in
> > > > the same sign.  That needs to be stripped before given to the
> > > > instruction which does the conversion internally.
> > >
> > > Yes, but then why's that not done by the detection code?  That is, does it
> > > (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> > > want the mixed-sign handling and not strip the unsigned short conversion
> > > from short_b?
> > >
> > > Richard.
> > >
> > > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > >
> > > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > > don't see this covered by the optab so we should eventually
> > > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > > verifier requires the addition to be not widening - thus only
> > > > > > > tree.def needs adjustment.
> > > > > > >
> > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern
> > > > > > > >  @item @samp{ssad@var{m}}
> > > > > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > > > > index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > > > > >  enum optab_subtype
> > > > > > > >  {
> > > > > > > > -  optab_default,
> > > > > > > > -  optab_scalar,
> > > > > > > > -  optab_vector
> > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > > > >  };
> > > > > > > >
> > > > > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > +inline enum optab_subtype&
> > > > > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > > > > +{
> > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +                                          | static_cast<int>(b));
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > +inline enum optab_subtype
> > > > > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > > > > +{
> > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +                                      | static_cast<int>(b));
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Return the optab used for computing the given operation on the type given by
> > > > > > > >     the second argument.  The third argument distinguishes between the types of
> > > > > > > >     vector shifts and rotates.  */
> > > > > > > > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> > > > > > > > index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
> > > > > > > >
> > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > > > > > > > +      {
> > > > > > > > +        gcc_assert (subtype & optab_default
> > > > > > > > +                    || subtype & optab_vector
> > > > > > > > +                    || subtype & optab_signed_to_unsigned
> > > > > > > > +                    || subtype & optab_unsigned_to_signed);
> > > > > > > > +
> > > > > > > > +        if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> > > > > > > > +          return usdot_prod_optab;
> > > > > > > > +
> > > > > > > > +        return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> > > > > > > > +      }
> > > > > > > >
> > > > > > > >      case SAD_EXPR:
> > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c
> > > > > > > > index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
> > > > > > > > --- a/gcc/optabs.c
> > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >    bool sbool = false;
> > > > > > > >
> > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > +  if (nops >= 2)
> > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > +  if (nops >= 3)
> > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > +
> > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> > > > > > > > @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >                     ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > > > > > >        sbool = true;
> > > > > > > >      }
> > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > +    {
> > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > +      if (sign1 == sign2)
> > > > > > > > +        ;
> > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > +        {
> > > > > > > > +          subtype |= optab_signed_to_unsigned;
> > > > > > > > +          /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > > > > > +          std::swap (op0, op1);
> > > > > > > > +        }
> > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > +        subtype |= optab_unsigned_to_signed;
> > > > > > > > +      else
> > > > > > > > +        gcc_unreachable ();
> > > > > > > > +
> > > > > > > > +      widen_pattern_optab
> > > > > > > > +        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > > > > > +    }
> > > > > > > >    else
> > > > > > > >      widen_pattern_optab
> > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> > > > > > > > @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > >
> > > > > > > >    if (nops >= 2)
> > > > > > > > -    {
> > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > -    }
> > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > >    else if (sbool)
> > > > > > > >      {
> > > > > > > >        nops = 2;
> > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >      {
> > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > >        gcc_assert (op1);
> > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > >      }
> > > > > > > >
> > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > > > > index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
> > > > > > > > --- a/gcc/optabs.def
> > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > > > >  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > > > > > > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > > > > > > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > > > > > > >  OPTAB_D (usad_optab, "usad$I$a")
> > > > > > > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > > > > > > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> > > > > > > > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
> > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > > > > > >                 && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > >                || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > >                    && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > -          || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +          || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +              && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > > > > > >
> > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > >
> > > > > > >    && element_precision (rhs1_type) != element_precision (rhs2_type)
> > > > > > >
> > > > > > > instead.
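One way to read "not restrictive enough" is sketched below with a toy model
in C++. Nothing here is GCC code: ToyType, toy_compatible and the two
rejected_by_* helpers are hypothetical stand-ins, and toy_compatible is a
much cruder test than the real types_compatible_p. The sketch only compares
which operand pairs each form of the condition would flag.

```cpp
// Toy stand-in for an integer type; not a GCC data structure.
struct ToyType
{
  unsigned precision;
  bool is_unsigned;
};

// Simplified stand-in for types_compatible_p: same precision and same sign.
static bool toy_compatible (ToyType a, ToyType b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

// Operand check as written in the patch: an incompatible pair is only
// rejected when both operands have the same sign.
static bool rejected_by_patch (ToyType a, ToyType b)
{
  return !toy_compatible (a, b) && a.is_unsigned == b.is_unsigned;
}

// Operand check with the suggested replacement: an incompatible pair is
// rejected whenever the precisions differ.
static bool rejected_by_suggestion (ToyType a, ToyType b)
{
  return !toy_compatible (a, b) && a.precision != b.precision;
}
```

Under this model a signed 8-bit operand paired with an unsigned 16-bit
operand passes the patch's check, because the signs differ, but is flagged
by the suggested one. A pair that differs only in sign passes both.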
> > > > > > >
> > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > >
> > > > > > > Please elaborate.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > >            || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > >            || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > > > > > >                         2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > > > > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > > > > > > > index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
> > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
> > > > > > > >      }
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > +   DOT_PROD_EXPR where the operation can different signs for the operands we
> > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > +
> > > > > > > > +static enum optab_subtype
> > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> > > > > > > > +{
> > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > +  switch (code)
> > > > > > > > +  {
> > > > > > > > +    case DOT_PROD_EXPR:
> > > > > > > > +      {
> > > > > > > > +        gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > > > > +        signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> > > > > > > > +        signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> > > > > > > > +        if (rhs1_sign != rhs2_sign)
> > > > > > > > +          subtype |= optab_unsigned_to_signed;
> > > > > > > > +        break;
> > > > > > > > +      }
> > > > > > > > +    default:
> > > > > > > > +      break;
> > > > > > > > +  }
> > > > > > > > +
> > > > > > > > +  return subtype;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > >
> > > > > > > >     Check if STMT_INFO performs a reduction operation that can be vectorized.
> > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> > > > > > > >        bool ok = true;
> > > > > > > >
> > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
> > > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> > > > > > > >        if (!optab)
> > > > > > > >          {
> > > > > > > >            if (dump_enabled_p ())
> > > > > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > > index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
> > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
> > > > > > > >  static bool
> > > > > > > >  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > >                                   tree itype, tree *vecotype_out,
> > > > > > > > -                                 tree *vecitype_out = NULL)
> > > > > > > > +                                 tree *vecitype_out = NULL,
> > > > > > > > +                                 enum optab_subtype subtype = optab_default)
> > > > > > > >  {
> > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > >    if (!vecitype)
> > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > >    if (!vecotype)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > > > > >    if (!optab)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> > > > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> > > > > > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
> > > > > > > > +   may be of different signs but equal precision.  */
> > > > > > > >
> > > > > > > >  static bool
> > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> > > > > > > > +                         bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > >      return true;
> > > > > > > >
> > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > > +    {
> > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > +      tree eq_type
> > > > > > > > +        = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > > > > +                                          sign);
> > > > > > > > +
> > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > +        return true;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > >
> > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > +
> > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > > > > >     exists.  */
> > > > > > > >
> > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> > > > > > > >  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > >                        tree_code widened_code, bool shift_p,
> > > > > > > >                        unsigned int max_nops,
> > > > > > > > -                      vect_unpromoted_value *unprom, tree *common_type)
> > > > > > > > +                      vect_unpromoted_value *unprom, tree *common_type,
> > > > > > > > +                      bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > > @@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > >                  = vinfo->lookup_def (this_unprom->op);
> > > > > > > >                nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > > > > > >                                             widened_code, shift_p, max_nops,
> > > > > > > > -                                           this_unprom, common_type);
> > > > > > > > +                                           this_unprom, common_type,
> > > > > > > > +                                           allow_short_sign_mismatch);
> > > > > > > >                if (nops == 0)
> > > > > > > >                  return 0;
> > > > > > > >
> > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > >                if (i == 0)
> > > > > > > >                  *common_type = this_unprom->type;
> > > > > > > >                else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > > > > > -                                                 common_type))
> > > > > > > > +                                                 common_type,
> > > > > > > > +                                                 allow_short_sign_mismatch))
> > > > > > > >                  return 0;
> > > > > > > >              }
> > > > > > > >          }
> > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> > > > > > > >
> > > > > > > >     Try to find the following pattern:
> > > > > > > >
> > > > > > > > -     type x_t, y_t;
> > > > > > > > +     type1a x_t
> > > > > > > > +     type1b y_t;
> > > > > > > >       TYPE1 prod;
> > > > > > > >       TYPE2 sum = init;
> > > > > > > >     loop:
> > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > >       S1  x_t = ...
> > > > > > > >       S2  y_t = ...
> > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > >
> > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > > > > +   bigger and must be the same sign. This is a special case of a reduction
> > > > > > > >     computation.
> > > > > > > >
> > > > > > > >     Input:
> > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > >
> > > > > > > >    /* Look for the following pattern
> > > > > > > >            DX = (TYPE1) X;
> > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > +          DY = (TYPE2) Y;
> > > > > > > >            DPROD = DX * DY;
> > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > +          DDPROD = (TYPE3) DPROD;
> > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > >       In which
> > > > > > > >       - DX is double the size of X
> > > > > > > >       - DY is double the size of Y
> > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > >
> > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> > > > > > > > -                             false, 2, unprom0, &half_type))
> > > > > > > > +                             false, 2, unprom0, &half_type, true))
> > > > > > > >      return NULL;
> > > > > > > >
> > > > > > > > +  /* Check to see if there is a sign change happening in the operands of the
> > > > > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > +    subtype = optab_default;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > +           && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > +    subtype = optab_signed_to_unsigned;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > +           && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > +    subtype = optab_unsigned_to_signed;
> > > > > > > > +  else
> > > > > > > > +    gcc_unreachable ();
> > > > > > > > +
> > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > +     type of the dot-product.  */
> > > > > > > > +  if (subtype != optab_default)
> > > > > > > > +    {
> > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > +          && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > > +        return NULL;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > > > > > > >
> > > > > > > >    tree half_vectype;
> > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> > > > > > > > -                                        type_out, &half_vectype))
> > > > > > > > +                                        type_out, &half_vectype, subtype))
> > > > > > > >      return NULL;
> > > > > > > >
> > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > >                         unprom0, half_vectype);
> > > > > > > >
> > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > +
> > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > +  if (subtype == optab_default)
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > +    }
> > > > > > > > +  else
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > +    }
> > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > -                                      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > > > +                                      mult_arg1, mult_arg2, oprnd1);
> > > > > > > >
> > > > > > > >    return pattern_stmt;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener
> > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imend