public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Kumar, Venkataramanan" <Venkataramanan.Kumar@amd.com>
To: "Alexander Monakov" <amonakov@ispras.ru>,
	"Jan Hubička" <honza.hubicka@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Richard Biener <richard.guenther@gmail.com>,
	"Joshi, Tejas Sanjay" <TejasSanjay.Joshi@amd.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU
Date: Wed, 26 Oct 2022 18:07:12 +0000	[thread overview]
Message-ID: <DM6PR12MB308107B8515269F15F7CD1608F309@DM6PR12MB3081.namprd12.prod.outlook.com> (raw)
In-Reply-To: <4549f27b-238a-7d77-f72b-cc77df8ae36e@ispras.ru>

[AMD Official Use Only - General]

Hi Alexander,

Thank you for looking in to this issue.

> -----Original Message-----
> From: Alexander Monakov <amonakov@ispras.ru>
> Sent: Tuesday, October 25, 2022 12:18 AM
> To: Jan Hubička <honza.hubicka@gmail.com>
> Cc: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; Jakub
> Jelinek <jakub@redhat.com>; Richard Biener
> <richard.guenther@gmail.com>; Joshi, Tejas Sanjay
> <TejasSanjay.Joshi@amd.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen4 CPU
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> On Mon, 24 Oct 2022, Jan Hubička wrote:
>
> > > By the way, it appears pre-existing znver[123] models are also
> > > causing some kind of combinatorial blow-up, but before znver4 it was
> > > not a blocking issue:
> > >
> > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgc
> > >
> c.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D87832&amp;data=05%7C
> 01%7C
> > >
> Venkataramanan.Kumar%40amd.com%7C5d22bec311ac43b3f56a08dab5f
> 03fc7%7C
> > >
> 3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638022340726474
> 812%7CUnkn
> > >
> own%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik
> 1haW
> > >
> wiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=kg2zKCBxDEeYYKijH
> 204QpOC4
> > > 0SJBADOvqlk0LhzJhc%3D&amp;reserved=0
> >
> >
> > It is really easy to make DFA size to grow if there are possibly many
> > instructions in the pipeline (as every possible state of a modelled
> > pipeline needs to be a new state of the automaton). This is
> > essentially depth_of_pipeline * number_of_units with additional states
> > to repesent special instructions and this naturally keeps growing.
> >
> > We could try to break the FP automata into multiple ones, but there
> > are instructions that can go down any pipe which makes this hard or we
> > can try toreduce number of different reservation types (possibly by
> > breaking the automaton to znver1-3 and 4 or so).
> > With znver2 model I experimented with broken up version and common
> one
> > and ended up with smaller binary for combined one.
>
> Looking at znver1.md again, I think the problem is caused by incorrect
> modeling of division instructions: they have descriptions like
>
> (define_insn_reservation "znver1_idiv_DI" 41
>                         (and (eq_attr "cpu" "znver1,znver2")
>                              (and (eq_attr "type" "idiv")
>                                   (and (eq_attr "mode" "DI")
>                                        (eq_attr "memory" "none"))))
>                         "znver1-double,znver1-ieu2*41")
>
> which says that DImode idiv has latency 41 (which is correct) and that it
> occupies 2nd integer execution unit for 41 consecutive cycles, but that is
> not correct:

Yes you are correct. It does not block the 2nd integer execution pipe consecutively for 41 cycles.

>
> 1) the division instruction is partially pipelined, and has throughput 1/14

"Div" unit takes one instruction and in the worst case the latency will be 41 cycles in znver1/2.
But I agree that we can put best case latency of 14 cycles for the scheduler model in znver1/2 .

>
> 2) for the most part it occupies a separate division unit, not the general
> arithmetic unit.

Agreed.

>
> (incidentally, I think the blowup is caused by interaction of such super-long
> 41-cycle paths with the rest of reservations)
>
> I think we should fix this by modeling the separate division unit properly,
> and fixing reservations to use the measured reciprocal throughput of those
> instructions (available from uops.info). The following patch does that for
> integer divisions and completely eliminates the integer part of the problem;
> the issue with floating-point divisions remains.
>
> Top 5 znver table sizes, before:
>
> 68692 r znver1_ieu_check
> 68692 r znver1_ieu_transitions
> 99792 r znver1_ieu_min_issue_delay
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> After:
>
> 1454 r znver1_ieu_translate
> 1454 r znver1_translate
> 2304 r znver1_ieu_transitions
> 428108 r znver1_fp_min_issue_delay
> 856216 r znver1_fp_transitions
>
> Will you help getting this reviewed for trunk?
>
>
>
> diff --git a/gcc/config/i386/znver1.md b/gcc/config/i386/znver1.md index
> 9c25b4e27..39b59343d 100644
> --- a/gcc/config/i386/znver1.md
> +++ b/gcc/config/i386/znver1.md
> @@ -24,7 +24,7 @@
>  ;; AMD znver1, znver2 and znver3 Scheduling  ;; Modeling automatons for
> zen decoders, integer execution pipes,  ;; AGU pipes and floating point
> execution units.
> -(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu")
> +(define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu,
> +znver1_idiv")
>
>  ;; Decoders unit has 4 decoders and all of them can decode fast path  ;; and
> vector type instructions.
> @@ -50,6 +50,7 @@
>  (define_cpu_unit "znver1-ieu1" "znver1_ieu")  (define_cpu_unit "znver1-
> ieu2" "znver1_ieu")  (define_cpu_unit "znver1-ieu3" "znver1_ieu")
> +(define_cpu_unit "znver1-idiv" "znver1_idiv")
>  (define_reservation "znver1-ieu" "znver1-ieu0|znver1-ieu1|znver1-
> ieu2|znver1-ieu3")
>
>  ;; 2 AGU pipes in znver1 and 3 AGU pipes in znver2 and znver3 @@ -
> 176,28 +177,28 @@
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "DI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*41")
> +                        "znver1-double,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_SI" 25
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "SI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*25")
> +                        "znver1-double,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_HI" 17
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "HI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*17")
> +                        "znver1-double,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_QI" 12
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "QI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-direct,znver1-ieu2*12")
> +                        "znver1-direct,znver1-idiv*13")
>
>  ;; Mem operands
>  (define_insn_reservation "znver1_idiv_mem_DI" 45 @@ -205,84 +206,84
> @@
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "DI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-load,znver1-ieu2*41")
> +                        "znver1-double,znver1-load,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_mem_SI" 29
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "SI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-load,znver1-ieu2*25")
> +                        "znver1-double,znver1-load,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_mem_HI" 21
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "HI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-load,znver1-ieu2*17")
> +                        "znver1-double,znver1-load,znver1-idiv*14")
>
>  (define_insn_reservation "znver1_idiv_mem_QI" 16
>                          (and (eq_attr "cpu" "znver1,znver2")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "QI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-direct,znver1-load,znver1-ieu2*12")
> +                        "znver1-direct,znver1-load,znver1-idiv*13")
>
>  (define_insn_reservation "znver3_idiv_DI" 18
>                          (and (eq_attr "cpu" "znver3")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "DI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*18")
> +                        "znver1-double,znver1-idiv*7")
>
>  (define_insn_reservation "znver3_idiv_SI" 12
>                          (and (eq_attr "cpu" "znver3")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "SI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*12")
> +                        "znver1-double,znver1-idiv*6")
>
>  (define_insn_reservation "znver3_idiv_HI" 10
>                          (and (eq_attr "cpu" "znver3")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "HI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-double,znver1-ieu2*10")
> +                        "znver1-double,znver1-idiv*4")
>
>  (define_insn_reservation "znver3_idiv_QI" 9
>                          (and (eq_attr "cpu" "znver3")
>                               (and (eq_attr "type" "idiv")
>                                    (and (eq_attr "mode" "QI")
>                                         (eq_attr "memory" "none"))))
> -                        "znver1-direct,znver1-ieu2*9")
> +                        "znver1-direct,znver1-idiv*4")
>
>  (define_insn_reservation "znver3_idiv_mem_DI" 22
>                           (and (eq_attr "cpu" "znver3")
>                                (and (eq_attr "type" "idiv")
>                                     (and (eq_attr "mode" "DI")
>                                          (eq_attr "memory" "load"))))
> -                         "znver1-double,znver1-load,znver1-ieu2*22")
> +                         "znver1-double,znver1-load,znver1-idiv*7")
>
>  (define_insn_reservation "znver3_idiv_mem_SI" 16
>                           (and (eq_attr "cpu" "znver3")
>                                (and (eq_attr "type" "idiv")
>                                     (and (eq_attr "mode" "SI")
>                                          (eq_attr "memory" "load"))))
> -                         "znver1-double,znver1-load,znver1-ieu2*16")
> +                         "znver1-double,znver1-load,znver1-idiv*6")
>
>  (define_insn_reservation "znver3_idiv_mem_HI" 14
>                           (and (eq_attr "cpu" "znver3")
>                                (and (eq_attr "type" "idiv")
>                                     (and (eq_attr "mode" "HI")
>                                          (eq_attr "memory" "load"))))
> -                         "znver1-double,znver1-load,znver1-ieu2*10")
> +                         "znver1-double,znver1-load,znver1-idiv*4")
>
>  (define_insn_reservation "znver3_idiv_mem_QI" 13
>                           (and (eq_attr "cpu" "znver3")
>                                (and (eq_attr "type" "idiv")
>                                     (and (eq_attr "mode" "QI")
>                                          (eq_attr "memory" "load"))))
> -                         "znver1-direct,znver1-load,znver1-ieu2*9")
> +                         "znver1-direct,znver1-load,znver1-idiv*4")
>
>  ;; STR ISHIFT which are micro coded.
>  ;; Fix me: Latency need to be rechecked.

The changes looks good.  But we will do a quick benchmarking with your patch and update you .

Regards,
Venkat.

  reply	other threads:[~2022-10-26 18:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-13 15:32 Joshi, Tejas Sanjay
2022-10-16 17:48 ` Uros Bizjak
2022-10-17 14:39   ` Joshi, Tejas Sanjay
2022-10-21  9:59     ` Kumar, Venkataramanan
2022-10-21 11:51       ` Richard Biener
2022-10-21 12:52         ` Jan Hubicka
2022-10-21 14:02           ` Joshi, Tejas Sanjay
2022-10-21 17:59             ` Joshi, Tejas Sanjay
2022-10-22 17:11         ` Jakub Jelinek
2022-10-23 14:29           ` Kumar, Venkataramanan
2022-10-24 14:26             ` Alexander Monakov
2022-10-24 14:40               ` Jan Hubička
2022-10-24 18:47                 ` Alexander Monakov
2022-10-26 18:07                   ` Kumar, Venkataramanan [this message]
2022-10-26 18:23                     ` Alexander Monakov
2022-10-31 10:39                       ` Joshi, Tejas Sanjay
2022-10-31 10:59                         ` Jan Hubička
2022-11-01 12:22                           ` Alexander Monakov
2022-10-25  9:17                 ` Joshi, Tejas Sanjay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR12MB308107B8515269F15F7CD1608F309@DM6PR12MB3081.namprd12.prod.outlook.com \
    --to=venkataramanan.kumar@amd.com \
    --cc=TejasSanjay.Joshi@amd.com \
    --cc=amonakov@ispras.ru \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=honza.hubicka@gmail.com \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).