From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <crazylht@gmail.com>
MIME-Version: 1.0
References: <c8ca748c-d7fd-6fb1-5ef2-567935d38722@linux.ibm.com>
 <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com>
 <CAMZc-bzzUSrg33sL=9HUYHs_CaLY1XGR+P6cFK57uAQJ2CD_oQ@mail.gmail.com>
 <CAMZc-bxuwG-XiEWRfDRsA3CDj8jdbTppr6GuiWhM3wkH030-Gg@mail.gmail.com>
 <bb221383-bb9f-d6c6-3982-0f93fbb9aecf@linux.ibm.com>
In-Reply-To: <bb221383-bb9f-d6c6-3982-0f93fbb9aecf@linux.ibm.com>
From: Hongtao Liu <crazylht@gmail.com>
Date: Wed, 30 Jun 2021 16:53:54 +0800
Message-ID: <CAMZc-bxSTVMan+bvjPx7OGRs=YmzWMvKxN9TmQ7Rtz0ReSuAFw@mail.gmail.com>
Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with
 param [PR100328]
To: "Kewen.Lin" <linkw@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
 Vladimir Makarov <vmakarov@redhat.com>, bergner@linux.ibm.com, 
 Bill Schmidt <wschmidt@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>, 
 Richard Sandiford <richard.sandiford@arm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 28, 2021 at 3:27 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/6/28 3:20 PM, Hongtao Liu wrote:
> > On Mon, Jun 28, 2021 at 3:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> On Mon, Jun 28, 2021 at 2:50 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> Hi!
> >>>
> >>> on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> >>>> Hi,
> >>>>
> >>>> PR100328 has the details of this issue; I will try to summarize
> >>>> it here.  In the hottest function LBM_performStreamCollideTRT
> >>>> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
> >>>> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
> >>>> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG
> >>>> register class has 64 registers, the first 32 of which make up
> >>>> the whole of FLOAT_REG.  There are some differences between
> >>>> these two flavors, taking "*fma<mode>4_fpr" as an example:
> >>>>
> >>>> (define_insn "*fma<mode>4_fpr"
> >>>>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
> >>>>       (fma:SFDF
> >>>>         (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
> >>>>         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
> >>>>         (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
> >>>>
> >>>> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> >>>> // <Ff> (f/d) => A floating point register, aka. FLOAT_REG.
> >>>>
> >>>> So for VSX_REG we only have the destructive form: when a VSX_REG
> >>>> alternative is used, operand 2 or operand 3 is required to be
> >>>> the same as operand 0.  reload has to take care of this
> >>>> constraint and create some non-free register copies if required.
> >>>>
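To make the destructive-form requirement concrete, here is a toy C model. It is not GCC code: struct toy_fma and both functions are invented for illustration. Given hard registers already assigned to op0..op3, it counts the copies needed to use the cheaper of the two VSX alternatives quoted above. It ignores the commutative '%' on operand 1 and all cost details.

```c
#include <assert.h>

/* Toy model, not GCC code: one FMA insn with operands op0..op3,
   each already assigned a hard register number.  */
struct toy_fma
{
  int reg[4]; /* reg[0] is the output, reg[1..3] are the inputs.  */
};

/* TIED is the input operand that the alternative requires to share a
   register with operand 0.  Return how many extra register copies
   are needed: 0 if the tie already holds, 1 otherwise.  */
int
toy_copies_for_tie (const struct toy_fma *insn, int tied)
{
  assert (tied >= 1 && tied <= 3);
  return insn->reg[tied] == insn->reg[0] ? 0 : 1;
}

/* Cheapest choice between the two VSX alternatives: one ties op3
   to op0 (constraint "0" on operand 3), the other ties op2 to op0
   (constraint "0" on operand 2).  */
int
toy_min_copies (const struct toy_fma *insn)
{
  int tie_op3 = toy_copies_for_tie (insn, 3);
  int tie_op2 = toy_copies_for_tie (insn, 2);
  return tie_op3 < tie_op2 ? tie_op3 : tie_op2;
}
```

In this model an allocation that puts op2 or op3 in the same register as op0 needs no copy, and any other allocation needs one.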
> >>>> Assume one fma insn looks like:
> >>>>   op0 = FMA (op1, op2, op3)
> >>>>
> >>>> The best regclass for all of them is VSX_REG.  When op1, op2 and op3
> >>>> are all dead, IRA simply creates three shuffle copies for them (here
> >>>> the operand order matters, since with the same freq, the one with the
> >>>> smaller number takes preference), but IMO both op2 and op3 should
> >>>> take higher priority in the copy queue due to the matching constraint.
> >>>>
> >>>> I noticed that there is one function, ira_get_dup_out_num, which is
> >>>> meant to create this kind of constraint copy, but the code below
> >>>> appears to refuse to create one if there is an alternative which
> >>>> has a valid regclass that is not likely to be spilled.
> >>>>
> >>>>       default:
> >>>>       {
> >>>>         enum constraint_num cn = lookup_constraint (str);
> >>>>         enum reg_class cl = reg_class_for_constraint (cn);
> >>>>         if (cl != NO_REGS
> >>>>             && !targetm.class_likely_spilled_p (cl))
> >>>>           goto fail;
> >>>>
> >>>>        ...
> >>>>
> >>>> I cooked up the attached patch to make ira respect this kind of
> >>>> matching constraint, guarded with one parameter.  As I stated in the
> >>>> PR, I was not sure this is on the right track.  The RFC patch checks
> >>>> the matching constraint in all alternatives; if there is one
> >>>> alternative with a matching constraint that matches the current
> >>>> preferred regclass (or best of allocno?), it records the output
> >>>> operand number and further creates one constraint copy for it.
> >>>> Normally it gets priority over shuffle copies, so the matching
> >>>> constraint is more likely to be satisfied and reload doesn't need
> >>>> to create the extra copies it would otherwise have to create to
> >>>> meet the matching constraint or the desirable register class.
> >>>>
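The check described in the paragraph above can be sketched in a few lines of C. This is a toy model and not the patch itself: enum toy_class, struct toy_alt and toy_dup_out_num are hypothetical names, class matching is reduced to plain equality, and none of the later refinements are modelled.

```c
#include <assert.h>

/* Toy stand-ins for register classes; not GCC's enum reg_class.  */
enum toy_class { TOY_FLOAT_REG, TOY_VSX_REG };

/* What one alternative says about one input operand (hypothetical).  */
struct toy_alt
{
  enum toy_class cl;   /* Class the alternative uses for this operand.  */
  int matches_output;  /* Output operand it must match, or -1 if none.  */
};

/* Scan all alternatives of one input operand.  Return the output
   operand number of the first alternative that carries a matching
   constraint and uses the preferred class PREF, or -1 if there is
   no such alternative.  */
int
toy_dup_out_num (const struct toy_alt *alts, int n_alts, enum toy_class pref)
{
  assert (alts != 0 || n_alts == 0);
  for (int i = 0; i < n_alts; i++)
    if (alts[i].cl == pref && alts[i].matches_output >= 0)
      return alts[i].matches_output;
  return -1;
}
```

For operand 3 of the fma pattern ("<Ff>,0,wa"), the model returns 0 when the preferred class is the VSX one and -1 when it is the FP one, since only the second alternative ties operand 3 to operand 0.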
> >>>> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can at first
> >>>> stay as shuffle copies.  Later, any of A,B,C,D may get assigned a
> >>>> hardware register which is a VSX register (VSX_REG) but not an FP
> >>>> register (FLOAT_REG), which means it has to pay costs if we can NOT
> >>>> go with the VSX alternatives.  At that point it becomes important
> >>>> to respect the matching constraint, and we can increase the freq of
> >>>> the remaining copies related to it (A/B, A/C, A/D).  This idea
> >>>> requires some side tables to record some information and seems a
> >>>> bit complicated in the current framework, so the proposed patch
> >>>> aggressively emphasizes the matching constraint at the time of
> >>>> creating copies.
> >>>>
> >>>
> >>> Compared with the original patch (v1), this patch v3 has
> >>> considered the following (this should be v2 for this mailing
> >>> list, but I bumped it to be consistent with the PR's numbering):
> >>>
> >>>   - Excluding the case where, for one preferred register class,
> >>>     there are two or more alternatives, one of which has the
> >>>     matching constraint while another doesn't.  For the given
> >>>     operand, even if it's assigned a hardware reg which doesn't
> >>>     meet the matching constraint, it can simply use the
> >>>     alternative which doesn't have the matching constraint, so
> >>>     no register move is needed.  One typical case is
> >>>     define_insn *mov<mode>_internal2 on rs6000.  So we
> >>>     shouldn't create a constraint copy for it.
> >>>
> >>>   - The possible free register move in the same register class:
> >>>     disable this if so, since the register move to meet the
> >>>     constraint is considered free.
> >>>
> >>>   - Making it on by default, as suggested by Segher & Vladimir; we
> >>>     hope to get rid of the parameter if the benchmarking results
> >>>     look good on major targets.
> >>>
> >>>   - Tweaking the cost when either side of the matching constraint
> >>>     is a hardware register.  Before this patch, the constraint
> >>>     copy is simply taken as a real move insn for pref and
> >>>     conflict cost with one hardware register.  After this patch,
> >>>     several input operands are allowed to respect the same
> >>>     matching constraint (but in different alternatives), so we
> >>>     should treat it like a shuffle copy in some cases to avoid
> >>>     over preferring/disparaging.
> >>>
> >>> Please check the PR comments for more details.
> >>>
> >>> This patch was bootstrapped & regtested on
> >>> powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but there are
> >>> some "XFAIL->XPASS" failures on aarch64-linux-gnu.  The failure
> >>> list is attached in the PR, and I think the new assembly looks
> >>> improved (as expected).
> >>>
> >>> With Ofast plus unrolling, this patch improves SPEC2017
> >>> bmk 508.namd_r by +2.42% and 519.lbm_r by +2.43% on Power8, and
> >>> 508.namd_r by +3.02% and 519.lbm_r by +3.85% on Power9, without
> >>> any notable degradations.

Here are the SPEC2017 rate results tested on AMD Milan.
Options: -march=znver2 -Ofast -funroll-loops -mfpmath=sse -flto

fprate:
      503.bwaves_r                 0.01    (A)  shliclel219
      507.cactuBSSN_r             -0.19    (A)  shliclel219
      508.namd_r                   0.02    (A)  shliclel219
      510.parest_r                -0.68    (A)  shliclel219
      511.povray_r                 1.59    (A)  shliclel219
      521.wrf_r                    0.19    (A)  shliclel219
      526.blender_r                0.68    (A)  shliclel219
      527.cam4_r                  -0.30    (A)  shliclel219
      538.imagick_r               -3.81 <- (A)  shliclel219
      544.nab_r                    0.02    (A)  shliclel219
      549.fotonik3d_r              0.02    (A)  shliclel219
      554.roms_r                  -0.43    (A)  shliclel219
      997.specrand_fr             -3.80 <- (A)  shliclel219
                                    Geometric mean:  -0.52
intrate:
      500.perlbench_r             -1.54    (A)  shliclel219
      502.gcc_r                   -0.38    (A)  shliclel219
      505.mcf_r                   -0.10    (A)  shliclel219
      520.omnetpp_r               -0.24    (A)  shliclel219
      523.xalancbmk_r             -1.04    (A)  shliclel219
      525.x264_r                   0.31    (A)  shliclel219
      531.deepsjeng_r             -0.02    (A)  shliclel219
      541.leela_r                  0.95    (A)  shliclel219
      548.exchange2_r              0.08    (A)  shliclel219
      557.xz_r                    -0.40    (A)  shliclel219
                                    Geometric mean:  -0.24
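
For anyone who wants to recheck the summary lines, here is a small C sketch of one way to get a geometric mean from per-benchmark percentage changes. It assumes each entry is a percent change and that the mean is taken over the ratios 1 + pct/100; that is my assumption about how the summary is derived, not something the report states. The function names are made up, and the n-th root uses Newton's method only to avoid depending on libm.

```c
#include <assert.h>

/* Integer power by repeated multiplication.  */
static double
toy_powi (double x, int n)
{
  double r = 1.0;
  for (int i = 0; i < n; i++)
    r *= x;
  return r;
}

/* Geometric mean of the ratios (1 + pct[i]/100), returned as a
   percent change.  Hypothetical helper, not part of any SPEC tool.  */
double
toy_geomean_pct (const double *pct, int n)
{
  assert (n > 0);
  double prod = 1.0;
  for (int i = 0; i < n; i++)
    prod *= 1.0 + pct[i] / 100.0;

  /* n-th root of PROD by Newton's method on x^n - prod = 0,
     starting from 1.0, which is close for small changes.  */
  double root = 1.0;
  for (int iter = 0; iter < 50; iter++)
    root = ((n - 1) * root + prod / toy_powi (root, n - 1)) / n;

  return (root - 1.0) * 100.0;
}
```

Feeding it the ten intrate numbers above gives a value close to the reported mean. Since the inputs are already rounded to two decimals, the last digit may differ.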
> >>>
> >>> Since this patch likely benefits x86_64 and aarch64, but I don't
> >>> have performance machines with these arches at hand, could
> >>> someone kindly help to benchmark it if possible?
> >> I can help test it on Intel cascade lake and AMD milan.
>
>
> Thanks for your help, Hongtao!
>
>
> > And could you rebase your patch on the latest trunk?  I got several
> > failures when applying the patch:
> > ~ git apply ira-v3.diff
> > error: patch failed: gcc/doc/invoke.texi:13845
> > error: gcc/doc/invoke.texi: patch does not apply
> > error: patch failed: gcc/ira-conflicts.c:233
> > error: gcc/ira-conflicts.c: patch does not apply
> > error: patch failed: gcc/ira-int.h:971
> > error: gcc/ira-int.h: patch does not apply
> > error: patch failed: gcc/ira.c:1922
> > error: gcc/ira.c: patch does not apply
> > error: patch failed: gcc/params.opt:330
> > error: gcc/params.opt: patch does not apply
> >
>
> I think it's due to unexpected git stat lines in the previously attached diff.
>
> I have attached the format-patch file.  Please have a check.  Thanks!
>
>
> BR,
> Kewen


-- 
BR,
Hongtao