From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=zn6w=4G=amd.com=Venkataramanan.Kumar@sourceware.org>
Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2057.outbound.protection.outlook.com [40.107.237.57])
	by sourceware.org (Postfix) with ESMTPS id 5FA893855153
	for <gcc-patches@gcc.gnu.org>; Thu,  8 Dec 2022 09:43:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5FA893855153
Authentication-Results: sourceware.org; dmarc=fail (p=quarantine dis=none) header.from=amd.com
Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=amd.com
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=aJh7QBREXeHnHkz6kXRDenw6wKV1U46ZBJmSY2VEeOSltg7qEVyDeLwCIdeowWXJ+jwUSrDMyne/QgqSse3C0BJIko+dqUAaQ8Kgh4/lYQOOdopbLr0GGs8VcBpH8iOJ/lpiF+27g6nAa1enUEWmZ/yGDLFROfJIm3uhwB5AGj5NIXkrxnPSew41SvXOMrrtLAmVyQ0bTCj9rKfg8CQfCOhDZyOdQ2wEzAVxp44AoOZN28YKxei7lcpfw6D4xCxKw2dtpBaAYlV46l3NobvBuY2xIWtrtLMuYeCPzv8AsPE9ZsbE7YOphj2TDEEFIDA+HfLVUmLRbHZy4TtHsmgi0Q==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=EXBxOkclhTp7fHN8DxuxZzvghaoiDVPQ4DdswF3+WiE=;
 b=Q5MUv/4UWoMZcr/u7Cr1QcQaXqLIkhYQbRAfe9TyZsCn02NaRyYRQIDdWwtVM/aOM0Oop1FRmCnM/XIKGAcER2ez0uxN4suoa3L40jJ0lX+S8aEEjemBUlaSp9FbzDmmR7vEX9lmQF3ZJk6PDD+43O8wGRQB0Jb/qBmcpwR0x6UgzpMnY+PZm83o+gro4AoNt4VKyPEb/whXeEYmG5sVBqVh0jWQ7HU3rvNdp/urd8xAjFMMHTMOqf+6SB3Exot6fGIcVs3gWFG3rcE0pQvBi9yITOZrvrFzGBTe8okFv5PXE83AnOdtg07BAT2O1lQedjHyyidBEvrkLhd+hzaHLA==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass
 header.d=amd.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=EXBxOkclhTp7fHN8DxuxZzvghaoiDVPQ4DdswF3+WiE=;
 b=VrCqhmSUf74jWhYdf09iG8rNDhN9jT2kxI1LLggqwTd9Hk/R534YicOiNN4dyuQXFTxg9AY0yDJYP2YEmT9DQhM3jZs/zYc43Jj880bMrm365uvegoXCA+WWu7l256AqfIcm8mhoBqoOzlLn7C9U6DRsig6wUYXroEQdGwbDnmA=
Received: from DM6PR12MB3081.namprd12.prod.outlook.com (2603:10b6:5:38::27) by
 CY8PR12MB7290.namprd12.prod.outlook.com (2603:10b6:930:55::13) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.5880.14; Thu, 8 Dec 2022 09:43:16 +0000
Received: from DM6PR12MB3081.namprd12.prod.outlook.com
 ([fe80::223b:bdae:16c2:cd07]) by DM6PR12MB3081.namprd12.prod.outlook.com
 ([fe80::223b:bdae:16c2:cd07%4]) with mapi id 15.20.5880.014; Thu, 8 Dec 2022
 09:43:16 +0000
From: "Kumar, Venkataramanan" <Venkataramanan.Kumar@amd.com>
To: Jan Hubicka <hubicka@ucw.cz>, "gcc-patches@gcc.gnu.org"
	<gcc-patches@gcc.gnu.org>, "mjambor@suse.cz" <mjambor@suse.cz>, Alexander
 Monakov <amonakov@ispras.ru>, "Joshi, Tejas Sanjay"
	<TejasSanjay.Joshi@amd.com>
Subject: RE: Zen4 tuning part 1 - cost tables
Thread-Topic: Zen4 tuning part 1 - cost tables
Thread-Index: AQHZCVmkR41aEbKUT0i2CNBp8Q256K5juXLA
Date: Thu, 8 Dec 2022 09:43:16 +0000
Message-ID:
 <DM6PR12MB308132008253D20AD69C61288F1D9@DM6PR12MB3081.namprd12.prod.outlook.com>
References: <Y48S1d7kqcbRhfJ3@kam.mff.cuni.cz>
In-Reply-To: <Y48S1d7kqcbRhfJ3@kam.mff.cuni.cz>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
msip_labels:
 MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_ActionId=ae892748-8150-4e57-8b85-669b776f9474;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_ContentBits=0;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_Enabled=true;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_Method=Standard;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_Name=General;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_SetDate=2022-12-08T09:18:45Z;MSIP_Label_4342314e-0df4-4b58-84bf-38bed6170a0f_SiteId=3dd8961f-e488-4e60-8e11-a82d994e183d;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=amd.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: DM6PR12MB3081:EE_|CY8PR12MB7290:EE_
x-ms-office365-filtering-correlation-id: bc4b24f7-dd05-43ce-84bf-08dad900a1ab
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: amd.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: DM6PR12MB3081.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: bc4b24f7-dd05-43ce-84bf-08dad900a1ab
X-MS-Exchange-CrossTenant-originalarrivaltime: 08 Dec 2022 09:43:16.1790
 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3dd8961f-e488-4e60-8e11-a82d994e183d
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: uOvxeRKiTJMg1MfKhae7TQk3lq4MF74VZYvIaiW71iYFaFjkRcUatexyUQsWYKgK
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY8PR12MB7290
X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi Honza,

Thank you for posting the tuning patch.

> -----Original Message-----
> From: Jan Hubicka <hubicka@ucw.cz>
> Sent: Tuesday, December 6, 2022 3:31 PM
> To: gcc-patches@gcc.gnu.org; mjambor@suse.cz; Alexander Monakov
> <amonakov@ispras.ru>; Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com>; Joshi, Tejas Sanjay
> <TejasSanjay.Joshi@amd.com>
> Subject: Zen4 tuning part 1 - cost tables
>
> Hi
> this patch updates the costs of znver4, mostly based on data measured
> by Agner Fog.
> Compared to previous generations x87 became a bit slower, which is
> probably not a big deal (and we have minimal benchmarking coverage for
> it).  One interesting improvement is the reduction of FMA cost.  I also
> updated costs of AVX256 loads/stores based on latencies (not throughput
> which is twice of avx256).
> Overall AVX512 vectorization seems to noticeably improve some of the
> TSVC benchmarks, but since internally 512-bit vectors are split into
> 256-bit vectors it is somewhat risky and does not win in SPEC scores
> (mostly by regressing benchmarks with loops that have small trip
> counts, like x264 and exchange), so for now I am going to set the
> AVX256_OPTIMAL tune, but I am still playing with it.
> We have improved since ZNVER1 on choosing vectorization size and also
> have vectorized prologues/epilogues, so it may be possible to make
> avx512 a small win overall.

I also noted improvements to TSVC benchmarks when we enable AVX512
vectorization.  I think we should allow full 512-bit AVX512
vectorization for znver4.  Even if the 512-bit vectors are broken into
two 256-bit vectors, we can pipeline the higher half immediately in the
next cycle.  Also, we have fewer instructions to decode with AVX512
instructions.  Overall, AVX512 operations should be better.
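
To make the small-trip-count concern concrete, here is a rough sketch of
how a loop divides into full vector iterations and leftover elements at
each vector width.  The helper split_loop and its struct are hypothetical
and only do the arithmetic; they are not GCC code.  The saxpy loop is
just an example of the kind of loop in question, and I am assuming a
build along the lines of
"gcc -O3 -march=znver4 -mprefer-vector-width=512" to request 512-bit
vectors.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: split a loop of TRIP_COUNT
   iterations into full vector iterations and leftover elements
   for a given vector width and element size (both in bits).  */
struct loop_split
{
  unsigned vector_iters;	/* Iterations of the main vector loop.  */
  unsigned leftover;		/* Elements left for an epilogue.  */
};

static struct loop_split
split_loop (unsigned trip_count, unsigned vector_bits, unsigned elt_bits)
{
  assert (elt_bits > 0 && vector_bits >= elt_bits);
  unsigned lanes = vector_bits / elt_bits;
  struct loop_split s = { trip_count / lanes, trip_count % lanes };
  return s;
}

/* Example loop that the vectorizer can handle at either width.  */
void
saxpy (float *restrict y, const float *restrict x, float a, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

With 12 floats, a 512-bit main loop (16 lanes) never runs a full
iteration and all 12 elements fall to the epilogue, while a 256-bit main
loop (8 lanes) runs once and leaves 4.  Vectorized epilogues can recover
part of that, so this is only the first-order picture.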

>
> In general I would like to keep cost tables latency based unless we
> have a good reason not to do so.  There are some interesting
> differences in the znver3 tables that I also patched, and that seems
> performance neutral.  I will send that separately.
>
> Bootstrapped/regtested x86_64-linux, also benchmarked on SPEC2017 along
> with AVX512 tuning.  I plan to commit it tomorrow unless there are some
> comments.
>
> Honza
>
>         * x86-tune-costs.h (znver4_cost): Update costs of FP and SSE
>         moves, division, multiplication, gathers, L2 cache size, and more
>         complex FP instructions.
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index f01b8ee9eef..3a6ce02f093 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1867,9 +1868,9 @@ struct processor_costs znver4_cost =3D {
>    {8, 8, 8},                           /* cost of storing integer
>                                            registers.  */
>    2,                                   /* cost of reg,reg fld/fst.  */
> -  {6, 6, 16},                          /* cost of loading fp registers
> +  {14, 14, 17},                                /* cost of loading fp reg=
isters
>                                            in SFmode, DFmode and XFmode. =
 */
> -  {8, 8, 16},                          /* cost of storing fp registers
> +  {12, 12, 16},                                /* cost of storing fp reg=
isters
>                                            in SFmode, DFmode and XFmode. =
 */
>    2,                                   /* cost of moving MMX register.  =
*/
>    {6, 6},                              /* cost of loading MMX registers
> @@ -1878,13 +1879,13 @@ struct processor_costs znver4_cost =3D {
>                                            in SImode and DImode.  */
>    2, 2, 3,                             /* cost of moving XMM,YMM,ZMM
>                                            register.  */
> -  {6, 6, 6, 6, 12},                    /* cost of loading SSE registers
> +  {6, 6, 10, 10, 12},                  /* cost of loading SSE registers
>                                            in 32,64,128,256 and 512-bit. =
 */
> -  {8, 8, 8, 8, 16},                    /* cost of storing SSE registers
> +  {8, 8, 8, 12, 12},                   /* cost of storing SSE registers
>                                            in 32,64,128,256 and 512-bit. =
 */
> -  6, 6,                                        /* SSE->integer and integ=
er->SSE
> +  6, 8,                                        /* SSE->integer and integ=
er->SSE
>                                            moves.  */
> -  8, 8,                                /* mask->integer and integer->mas=
k moves */
> +  8, 8,                                        /* mask->integer and inte=
ger->mask moves */
>    {6, 6, 6},                           /* cost of loading mask register
>                                            in QImode, HImode, SImode.  */
>    {8, 8, 8},                           /* cost if storing mask register
> @@ -1894,6 +1895,7 @@ struct processor_costs znver4_cost =3D {
>    },
>
>    COSTS_N_INSNS (1),                   /* cost of an add instruction.  *=
/
> +  /* TODO: Lea with 3 components has cost 2.  */
>    COSTS_N_INSNS (1),                   /* cost of a lea instruction.  */
>    COSTS_N_INSNS (1),                   /* variable shift costs.  */
>    COSTS_N_INSNS (1),                   /* constant shift costs.  */
> @@ -1904,11 +1906,11 @@ struct processor_costs znver4_cost =3D {
>     COSTS_N_INSNS (3)},                 /*                      other.  *=
/
>    0,                                   /* cost of multiply per each bit
>                                            set.  */
> -  {COSTS_N_INSNS (9),                  /* cost of a divide/mod for QI.  =
*/
> -   COSTS_N_INSNS (10),                 /*                          HI.  =
*/
> -   COSTS_N_INSNS (12),                 /*                          SI.  =
*/
> -   COSTS_N_INSNS (17),                 /*                          DI.  =
*/
> -   COSTS_N_INSNS (17)},                        /*                       =
   other.  */
> +  {COSTS_N_INSNS (12),                 /* cost of a divide/mod for QI.  =
*/
> +   COSTS_N_INSNS (13),                 /*                          HI.  =
*/
> +   COSTS_N_INSNS (13),                 /*                          SI.  =
*/
> +   COSTS_N_INSNS (18),                 /*                          DI.  =
*/
> +   COSTS_N_INSNS (18)},                        /*                       =
   other.  */
>    COSTS_N_INSNS (1),                   /* cost of movsx.  */
>    COSTS_N_INSNS (1),                   /* cost of movzx.  */
>    8,                                   /* "large" insn.  */
> @@ -1919,22 +1921,22 @@ struct processor_costs znver4_cost =3D {
>                                            Relative to reg-reg move (2). =
 */
>    {8, 8, 8},                           /* cost of storing integer
>                                            registers.  */
> -  {6, 6, 6, 6, 12},                    /* cost of loading SSE registers
> +  {6, 6, 10, 10, 12},                  /* cost of loading SSE registers
>                                            in 32bit, 64bit, 128bit, 256bi=
t and 512bit */
> -  {8, 8, 8, 8, 16},                    /* cost of storing SSE register
> +  {8, 8, 8, 12, 12},                   /* cost of storing SSE register
>                                            in 32bit, 64bit, 128bit, 256bi=
t and 512bit */
> -  {6, 6, 6, 6, 12},                    /* cost of unaligned loads.  */
> -  {8, 8, 8, 8, 16},                    /* cost of unaligned stores.  */
> -  2, 2, 3,                             /* cost of moving XMM,YMM,ZMM
> +  {6, 6, 6, 6, 6},                     /* cost of unaligned loads.  */
> +  {8, 8, 8, 8, 8},                     /* cost of unaligned stores.  */
> +  2, 2, 2,                             /* cost of moving XMM,YMM,ZMM
>                                            register.  */
>    6,                                   /* cost of moving SSE register to=
 integer.  */
> -  /* VGATHERDPD is 15 uops and throughput is 4, VGATHERDPS is 23 uops,
> -     throughput 9.  Approx 7 uops do not depend on vector size and every
> load
> -     is 4 uops.  */
> -  14, 8,                               /* Gather load static, per_elt.  =
*/
> -  14, 10,                              /* Gather store static, per_elt. =
 */
> +  /* VGATHERDPD is 17 uops and throughput is 4, VGATHERDPS is 24 uops,
> +     throughput 5.  Approx 7 uops do not depend on vector size and every
> load
> +     is 5 uops.  */
> +  14, 10,                              /* Gather load static, per_elt.  =
*/
> +  14, 20,                              /* Gather store static, per_elt. =
 */
>    32,                                  /* size of l1 cache.  */
> -  512,                                 /* size of l2 cache.  */
> +  1024,                                        /* size of l2 cache.  */
>    64,                                  /* size of prefetch block.  */
>    /* New AMD processors never drop prefetches; if they cannot be
> performed
>       immediately, they are queued.  We set number of simultaneous
> prefetches @@ -1943,26 +1945,26 @@ struct processor_costs znver4_cost =3D
> {
>       time).  */
>    100,                                 /* number of parallel prefetches.=
  */
>    3,                                   /* Branch cost.  */
> -  COSTS_N_INSNS (5),                   /* cost of FADD and FSUB insns.  =
*/
> -  COSTS_N_INSNS (5),                   /* cost of FMUL instruction.  */
> +  COSTS_N_INSNS (7),                   /* cost of FADD and FSUB insns.  =
*/
> +  COSTS_N_INSNS (7),                   /* cost of FMUL instruction.  */
>    /* Latency of fdiv is 8-15.  */
>    COSTS_N_INSNS (15),                  /* cost of FDIV instruction.  */
>    COSTS_N_INSNS (1),                   /* cost of FABS instruction.  */
>    COSTS_N_INSNS (1),                   /* cost of FCHS instruction.  */
>    /* Latency of fsqrt is 4-10.  */
> -  COSTS_N_INSNS (10),                  /* cost of FSQRT instruction.  */
> +  COSTS_N_INSNS (25),                  /* cost of FSQRT instruction.  */
>
>    COSTS_N_INSNS (1),                   /* cost of cheap SSE instruction.=
  */
>    COSTS_N_INSNS (3),                   /* cost of ADDSS/SD SUBSS/SD insn=
s.  */
>    COSTS_N_INSNS (3),                   /* cost of MULSS instruction.  */
>    COSTS_N_INSNS (3),                   /* cost of MULSD instruction.  */
> -  COSTS_N_INSNS (5),                   /* cost of FMA SS instruction.  *=
/
> -  COSTS_N_INSNS (5),                   /* cost of FMA SD instruction.  *=
/
> -  COSTS_N_INSNS (10),                  /* cost of DIVSS instruction.  */
> +  COSTS_N_INSNS (4),                   /* cost of FMA SS instruction.  *=
/
> +  COSTS_N_INSNS (4),                   /* cost of FMA SD instruction.  *=
/
> +  COSTS_N_INSNS (13),                  /* cost of DIVSS instruction.  */
>    /* 9-13.  */
>    COSTS_N_INSNS (13),                  /* cost of DIVSD instruction.  */
> -  COSTS_N_INSNS (10),                  /* cost of SQRTSS instruction.  *=
/
> -  COSTS_N_INSNS (15),                  /* cost of SQRTSD instruction.  *=
/
> +  COSTS_N_INSNS (15),                  /* cost of SQRTSS instruction.  *=
/
> +  COSTS_N_INSNS (21),                  /* cost of SQRTSD instruction.  *=
/
>    /* Zen can execute 4 integer operations per cycle.  FP operations
>       take 3 cycles and it can execute 2 integer additions and 2
>       multiplications thus reassociation may make sense up to with of 6.

The cost changes look fine.
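
For anyone reading the table entries for the first time, a small sketch
of how the numbers scale.  COSTS_N_INSNS is reproduced as it is defined
in GCC's rtl.h; the cut-down struct mini_costs and the helper
gather_load_cost are hypothetical, and the "static plus per-element"
gather formula is my simplified reading of how the two gather fields
combine, not the exact code in the i386 backend.  The old and new values
are taken from the hunks above.

```c
#include <assert.h>

/* As in GCC's rtl.h: one "fast" instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cut-down version of struct processor_costs; the real
   structure has many more fields.  */
struct mini_costs
{
  int fma_ss;		/* Cost of FMA SS instruction.  */
  int sqrtsd;		/* Cost of SQRTSD instruction.  */
  int gather_static;	/* Gather load, fixed part.  */
  int gather_per_elt;	/* Gather load, per-element part.  */
};

/* Values before and after the patch.  */
static const struct mini_costs znver4_old
  = { COSTS_N_INSNS (5), COSTS_N_INSNS (15), 14, 8 };
static const struct mini_costs znver4_new
  = { COSTS_N_INSNS (4), COSTS_N_INSNS (21), 14, 10 };

/* Assumed simplified model: fixed overhead plus a per-element term.  */
static int
gather_load_cost (const struct mini_costs *c, int nelts)
{
  assert (nelts > 0);
  return c->gather_static + c->gather_per_elt * nelts;
}
```

Under that assumed model, an 8-element gather load goes from
14 + 8 * 8 = 78 to 14 + 10 * 8 = 94 units, while the FMA SS cost drops
from 20 to 16 units.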

Regards,
Venkat.