[Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well
@ 2021-04-29  6:41 linkw at gcc dot gnu.org
  2021-04-30  8:30 ` [Bug rtl-optimization/100328] " linkw at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-04-29  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

            Bug ID: 100328
           Summary: IRA doesn't model dup num constraint well
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

source: function LBM_performStreamCollideTRT in SPEC2017 519.lbm_r

This issue was exposed by O2 vectorization enablement evaluation on 519.lbm_r.

baseline option: -O2 -mcpu=power9 -ffast-math

test option: -O2 -mcpu=power9 -ffast-math -ftree-vectorize
             -fvect-cost-model=very-cheap

The ratio with test option will degrade -1.66% against baseline (-1.74% without
the very-cheap cost model).

The hotspot LBM_performStreamCollideTRT isn't vectorized at all, but the
pre-pass if-conversion of vectorization gets the issue exposed. Firstly,
if-conversion will use the new copied loop as the scalar version after loop
versioning, once vectorization fails, we end up with one loop which has a
little
difference against before.

The difference mainly comes from: 

1) Different basic block placement. For this function, the fall through BB and
branch BB are switched. The reason is that the new copied loop BBs are adjusted
as dom_order while the idom insertion order changes when it sets the idom
during
copying. Anyway, it's acceptable. 

2) SSA names difference. The new copied loop can reuse some discarded
SSA_names,
the gimple commutative operands  canonicalization will change some order.

I did some hack to filter the fall through/branch BB difference, the gap
becomes
smaller but still some. The remaining difference on gimple are some operand
orders as mentioned above, the difference on assembly file are some different
insns choices mainly on fma style insns, one remarkable difference is the
number
of register copies: 

  fmr + xxlor: 16 (baseline) vs 21 (test respecting fall through)

In this function, there are many FMA style expressions (27 FMA, 19 FMS, 11
FNMA). Their VSX_REG version insns are destructive and the define_insns look
like:

(define_insn "*nfma<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (neg:SFDF
         (fma:SFDF
          (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,wa,wa")
          (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
          (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fnmadd<s> %0,%1,%2,%3
   xsnmadda<sd>p %x0,%x1,%x2
   xsnmaddm<sd>p %x0,%x1,%x3"
  [(set_attr "type" "fp")
   (set_attr "isa" "*,<Fisa>,<Fisa>")])

(define_insn "*fms<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (fma:SFDF
         (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,wa,wa")
         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
         (neg:SFDF (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa"))))]
...

(define_insn "*fma<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (fma:SFDF
          (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
          (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
          (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
...

Since the 1st alternative are with class FLOAT_REG, which are the subset of
VSX_REG whose total number are 64 while fp shares the first 32, in most cases
the preferred rclass for these insns are VSX_REG. Assuming we have the
expression that:

  FMA A,B,C,D

If these four register are totally different, it can not meet with the
alternatives with duplicated number constraint. If it prefers to use the
remaining alternative (1st), at the same time, if one of these isn't low 32 vsx
(can't fit with fp), we have to generate register copy from vsx register (high
number vsx reg) to fp register (low number vsx reg).

How the commutative operand order affects this? 

IRA tries to create copy for register coalescing, for FMA expression above,
assuming both B and C are dead at the current insn, it will have copy on A/B
and
A/C, later when it does thread forming, if both A/B and A/C have the same freq,
lower copy number comes first. It means the operand order can affect how we
form
the thread, different pulled-in allocno will probably produce different
conflict
set, it further affects the global thread forming and final assignment.

But I think the root cause is that when we create copy for these fma style
insns, ira doesn't fully consider the duplicate number constraint, for example,
for FMS if the operands 1,2,3 are dead, both 2 and 3 should take higher
priority
in copy queue.

I noticed that there is one function ira_get_dup_out_num, which meant to create
this kind of copy, but the below code looks to refuse to create if there is an
alternative which has valid regclass without spilled need. 

              default:
                {
                  enum constraint_num cn = lookup_constraint (str);
                  enum reg_class cl = reg_class_for_constraint (cn);
                  if (cl != NO_REGS
                      && !targetm.class_likely_spilled_p (cl))
                    goto fail

                 ...

Is there some particular reason for this behavior?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model dup num constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
@ 2021-04-30  8:30 ` linkw at gcc dot gnu.org
  2021-06-23 18:28 ` [Bug rtl-optimization/100328] IRA doesn't model matching " vmakarov at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-04-30  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
Created attachment 50715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50715&action=edit
ira:consider matching cstr in all alternatives

With little understanding on ira, I am not quite sure this patch is on the
reasonable direction. It aims to check the matching constraint in all
alternatives, if there is one alternative with matching constraint and matches
the current preferred regclass, it will record the output operand number and
further create one copy for it. Normally it can get the priority against
shuffle copies and the matching constraint will get satisfied with higher
possibility, reload doesn't create extra copies to meet the matching constraint
or the desirable register class when it has to.

For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as
shuffle copies, and later any of A,B,C,D gets assigned by one hardware register
which is a VSX register but not a FP register, which means it has to pay costs
once we can NOT go with VSX alternatives, so at that time we can increase the
freq for the remaining copies related to this, once the matching constraint
gets satisfied further, there aren't any extra costs to pay. This idea seems a
bit complicated in the current framework, so the proposed patch aggressively
emphasizes the matching constraint at the time of creating copies.

FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with
Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r
+2.51%, no remarkable degradation is observed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
  2021-04-30  8:30 ` [Bug rtl-optimization/100328] " linkw at gcc dot gnu.org
@ 2021-06-23 18:28 ` vmakarov at gcc dot gnu.org
  2021-06-24 12:11 ` linkw at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2021-06-23 18:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #2 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #1)
> Created attachment 50715 [details]
> ira:consider matching cstr in all alternatives
> 
> With little understanding on ira, I am not quite sure this patch is on the
> reasonable direction. It aims to check the matching constraint in all
> alternatives, if there is one alternative with matching constraint and
> matches the current preferred regclass, it will record the output operand
> number and further create one copy for it. Normally it can get the priority
> against shuffle copies and the matching constraint will get satisfied with
> higher possibility, reload doesn't create extra copies to meet the matching
> constraint or the desirable register class when it has to.
> 
> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as
> shuffle copies, and later any of A,B,C,D gets assigned by one hardware
> register which is a VSX register but not a FP register, which means it has
> to pay costs once we can NOT go with VSX alternatives, so at that time we
> can increase the freq for the remaining copies related to this, once the
> matching constraint gets satisfied further, there aren't any extra costs to
> pay. This idea seems a bit complicated in the current framework, so the
> proposed patch aggressively emphasizes the matching constraint at the time
> of creating copies.
> 
> FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with
> Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r
> +2.51%, no remarkable degradation is observed.

Thank you for working on this issue.

The current implementation of ira_get_dup_out_num was specifically tuned for
better register allocation for x86-64 div insns.

Your patch definitely improves code for power9 and I would say significantly
(congratulations!).  The patch you proposed makes me think that it might work
for major targets as well.

I would prefer to avoid introducing new parameter because there are too many of
them already and its description is cryptic.

It would be nice if you benchmark the patch on x86-64 too, If there is no
overall degradation with new behaviour we could remove the parameter and make
the new behaviour as a default. If it is not, well we will keep the parameter.

As for the patch itself, I don't like some variable names.  Sorry.  Could you
use op_regno, out_regno, and present_alt instead of op_no, out_no, tot. 
Please, in general use longer variable names reflecting their purpose as GCC
developers reads code in many times more than writing it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
  2021-04-30  8:30 ` [Bug rtl-optimization/100328] " linkw at gcc dot gnu.org
  2021-06-23 18:28 ` [Bug rtl-optimization/100328] IRA doesn't model matching " vmakarov at gcc dot gnu.org
@ 2021-06-24 12:11 ` linkw at gcc dot gnu.org
  2021-06-24 12:36 ` linkw at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-06-24 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Vladimir Makarov from comment #2)
> (In reply to Kewen Lin from comment #1)
> > Created attachment 50715 [details]
> > ira:consider matching cstr in all alternatives
> > 
> > With little understanding on ira, I am not quite sure this patch is on the
> > reasonable direction. It aims to check the matching constraint in all
> > alternatives, if there is one alternative with matching constraint and
> > matches the current preferred regclass, it will record the output operand
> > number and further create one copy for it. Normally it can get the priority
> > against shuffle copies and the matching constraint will get satisfied with
> > higher possibility, reload doesn't create extra copies to meet the matching
> > constraint or the desirable register class when it has to.
> > 
> > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as
> > shuffle copies, and later any of A,B,C,D gets assigned by one hardware
> > register which is a VSX register but not a FP register, which means it has
> > to pay costs once we can NOT go with VSX alternatives, so at that time we
> > can increase the freq for the remaining copies related to this, once the
> > matching constraint gets satisfied further, there aren't any extra costs to
> > pay. This idea seems a bit complicated in the current framework, so the
> > proposed patch aggressively emphasizes the matching constraint at the time
> > of creating copies.
> > 
> > FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with
> > Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r
> > +2.51%, no remarkable degradation is observed.
> 
> Thank you for working on this issue.
> 
> The current implementation of ira_get_dup_out_num was specifically tuned for
> better register allocation for x86-64 div insns.
> 
> Your patch definitely improves code for power9 and I would say significantly
> (congratulations!).  The patch you proposed makes me think that it might
> work for major targets as well.
> 
> I would prefer to avoid introducing new parameter because there are too many
> of them already and its description is cryptic.
> 

Thanks for your comments, Vladimir!  Yeah, Segher also thought it can benefit
other targets and suggested making it on by default, I've made this parameter
on by default in v2, if it's fine on x86-64 and aarch64 with some testing and
benchmarking later, I think we can simply get rid of the parameter as you
suggested. 

> It would be nice if you benchmark the patch on x86-64 too, If there is no
> overall degradation with new behaviour we could remove the parameter and
> make the new behaviour as a default. If it is not, well we will keep the
> parameter.
> 

Sorry that I don't have a x86-64 or aarch64 performance machine at hand, the
new version v2 was bootstrapped/regtested on powerpc64le-linux-gnu P9 and
x86_64-redhat-linux, but had some failures on aarch64, I was still
investigating it.  Once it got root-caused and fixed, I would ask around folks
to help to benchmark this.

> As for the patch itself, I don't like some variable names.  Sorry.  Could
> you use op_regno, out_regno, and present_alt instead of op_no, out_no, tot. 
> Please, in general use longer variable names reflecting their purpose as GCC
> developers reads code in many times more than writing it.

Got it, thanks for the suggestion! This part has been simplified with
recog_op_alt, hope it looks better.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-06-24 12:11 ` linkw at gcc dot gnu.org
@ 2021-06-24 12:36 ` linkw at gcc dot gnu.org
  2021-06-28  3:27 ` linkw at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-06-24 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |linkw at gcc dot gnu.org
   Last reconfirmed|                            |2021-06-24
     Ever confirmed|0                           |1

--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
Created attachment 51059
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51059&action=edit
ira-respect-matching-constraint-v2

This v2 considers the situation that: for one preferred register class
there can be two or more alternatives, one of them has the matching
constraint, while another doesn't have.  For the given operand, even if
it's assigned by a hardware reg which doesn't meet the matching
constraint, it can simply use the alternative which doesn't have
matching constraint so no register copy is needed.  One typical case is
define_insn *mov<mode>_internal2 on rs6000:

(define_insn "*mov<mode>_internal2"
  [(set (match_operand:CC 2 "cc_reg_operand" "=y,x,?y")
        (compare:CC (match_operand:P 1 "gpc_reg_operand" "0,r,r")
                    (const_int 0)))
   (set (match_operand:P 0 "gpc_reg_operand" "=r,r,r") (match_dup 1))]
  ""
  "@
   cmp<wd>i %2,%0,0
   mr. %0,%1
   #"

So we shouldn't create constraint copy for it.  For fma style insns on
rs6000, there are also several alternatives for preferred regclass and
also only one with matching constraint.  The difference is that for the
given operand although there is no matching constraint applied for the
alternaitve but matching constraint is being applied for one other input
operand in this same alternative, it means when one matching constraint
can be applied to more than one input operand, it has to have several
alternatives like this.  And to create constraint copies for all of
these input operands with matching constraint is fine, once the matching
constraint is honored on one input operand, it implicitly disable the
others due to the interference relationship.  So this patch is going
to record and check all the other alternatives, which don't have matching
constraint but with preferred classes, whether there is one input operand
having same matching constraint.

It also considers the possible free register move in the same register
class, disable this if so since the register copy to meet the constraint
is considered as free.

I re-evaluated SPEC2017 performance with option set Ofast unroll, bmks 
508.namd_r and 519.lbm_r were observed to be improved by 2.4% ~ 3.8%
on Power8 and Power9.

As mentioned before, it's bootstrapped/regtested on powerpc64le-linux-gnu P9
and x86_64-redhat-linux, but hit some regression failures on aarch64, I am
still
investigating the only one PASS->FAIL: (the others are XFAIL->XPASS)

PASS->FAIL: gcc.target/aarch64/sve/acle/general/pr94683.c -march=armv8.2-a+sve
-moverride=tune=none  check-function-bodies test

In this case, the newly created constraint copy is expected (which was shuffle
copy), but this copy change somehow affects the full cost on register 92 due to
conflict with reg 102. Need more digging on this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-06-24 12:36 ` linkw at gcc dot gnu.org
@ 2021-06-28  3:27 ` linkw at gcc dot gnu.org
  2021-06-28  5:25 ` linkw at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-06-28  3:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
Created attachment 51065
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51065&action=edit
ira: Consider matching constraint heavily with some parameter v3

The mentioned only one aarch64-linux-gnu "PASS->FAIL" regression failure
exposes one issue: how to set up the costs when either of matching constraint
two sides is hardware register (hw reg for short later) meanwhile there are
several operands can have the matching constraint. Without this patch, the
constraint copy is simply taken as a real move insn, so when there is one hw
reg, it set the preference and conflict cost as big value, it's good since we
used to create constraint copy when there is only one operand respecting
matching constraint. With this patch, we allow there are several input operands
respecting the same matching constraint (but in different alternatives), so
even there is one hw reg, we can't set the preference/conflict cost with high
value.

CASE 1:
  hw_reg0 = reg1, reg2, reg3 ... (op1/2/3 has the matching constraint on op0)

We can make reg1/reg2/reg3 be preferred to be assigned by hw_reg0 over the
other hw_regs. But reg1/reg2/reg3 conflicts with any of the others, we can't
make reg1's conflict objects be disparaged on hw_reg0 since it can affect
hw_reg0 to be assigned onto reg2/reg3 and also their allocno copies. So make it
simply as shuffle copy.

CASE 2:
  reg0 = hw_reg1, reg2, reg3 ... (op1/2/3 has the matching constraint on op0)

hw_reg1 is just one option to be assigned to reg0, shouldn't give it much cost
otherwise it can disparage reg0's copies with reg2/reg3. Now the patch simple
takes it as shuffle copy, so it slightly prefer hw_reg1, but still have the
flexibility to go with some hw_reg in reg2/reg3. Note that we can early catch
this in function ira_get_dup_out_num, but I think it might be more clear to
leave the handling in process_regs_for_copy near the CASE1 handling and easier
to tweak it in case we have the need.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-06-28  3:27 ` linkw at gcc dot gnu.org
@ 2021-06-28  5:25 ` linkw at gcc dot gnu.org
  2021-06-29 16:01 ` rsandifo at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-06-28  5:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #6 from Kewen Lin <linkw at gcc dot gnu.org> ---
Created attachment 51066
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51066&action=edit
aarch64 XPASS failure list

The patch v3 bootstrapped and regression-tested on x86_64-redhat-linux and
powerpc64le-linux-gnu, but still have some "XFAIL -> XPASS" regression failures
on aarch64-linux-gnu, I think these remaining failures are expected (assembly
gets improved). Need Richard's help to double check/confirm them.

For example, for div_f16.c, mad_f32.c and sub_f64.c, the related different
assembly looks like below:

BEFORE:

div_h4_f16_x_untied:

        mov     z4.h, h4
        movprfx z0, z1
        fdiv    z0.h, p0/m, z0.h, z4.h
        ret

mad_s4_f32_x_untied:

        mov     z4.s, s4
        movprfx z0, z4
        fmla    z0.s, p0/m, z1.s, z2.s
        ret

sub_d4_f64_x_untied:

        mov     z4.d, d4
        movprfx z0, z1
        fsub    z0.d, p0/m, z0.d, z4.d
        ret

AFTER:

div_h4_f16_x_untied:

        mov     z0.h, h4
        fdivr   z0.h, p0/m, z0.h, z1.h
        ret

mad_s4_f32_x_untied:

        mov     z0.s, s4
        fmla    z0.s, p0/m, z1.s, z2.s
        ret

sub_d4_f64_x_untied:

        mov     z0.d, d4
        fsubr   z0.d, p0/m, z0.d, z1.d
        ret

The assembly with this patch saves movprfx.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-06-28  5:25 ` linkw at gcc dot gnu.org
@ 2021-06-29 16:01 ` rsandifo at gcc dot gnu.org
  2021-07-01  6:18 ` linkw at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2021-06-29 16:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #7 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #6)
> Created attachment 51066 [details]
> aarch64 XPASS failure list
> 
> The patch v3 bootstrapped and regression-tested on x86_64-redhat-linux and
> powerpc64le-linux-gnu, but still have some "XFAIL -> XPASS" regression
> failures on aarch64-linux-gnu, I think these remaining failures are expected
> (assembly gets improved). Need Richard's help to double check/confirm them.
Nice!  Yeah, these are all progressions rather than regressions.

The scan-assemblers in gcc.target/sve/acle check for the preferred output.
We don't always get there yet, so some of them have been XFAILed.  But for
this part of the testsuite all XFAILs->XPASSes are improvements.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-06-29 16:01 ` rsandifo at gcc dot gnu.org
@ 2021-07-01  6:18 ` linkw at gcc dot gnu.org
  2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-07-01  6:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #8 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #7)
> (In reply to Kewen Lin from comment #6)
> > Created attachment 51066 [details]
> > aarch64 XPASS failure list
> > 
> > The patch v3 bootstrapped and regression-tested on x86_64-redhat-linux and
> > powerpc64le-linux-gnu, but still have some "XFAIL -> XPASS" regression
> > failures on aarch64-linux-gnu, I think these remaining failures are expected
> > (assembly gets improved). Need Richard's help to double check/confirm them.
> Nice!  Yeah, these are all progressions rather than regressions.
> 
> The scan-assemblers in gcc.target/sve/acle check for the preferred output.
> We don't always get there yet, so some of them have been XFAILed.  But for
> this part of the testsuite all XFAILs->XPASSes are improvements.

Thanks for your time, Richard! I'll make a patch to fix these expected test
case changes.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-07-01  6:18 ` linkw at gcc dot gnu.org
@ 2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
  2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
  2021-07-07  3:07 ` linkw at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-06  2:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:8ffe25eefae57fb3a228a2d31a57af5bdab8911f

commit r12-2045-g8ffe25eefae57fb3a228a2d31a57af5bdab8911f
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Mon Jul 5 20:53:19 2021 -0500

    ira: Support more matching constraint forms with param [PR100328]

    This patch is to make IRA consider matching constraint heavily,
    even if there is at least one other alternative with non-NO_REG
    register class constraint, it will continue and check matching
    constraint in all available alternatives and respect the
    matching constraint with preferred register class.

    One typical case is destructive FMA style instruction on rs6000.
    Without this patch, for the mentioned FMA instruction, IRA won't
    respect the matching constraint on VSX_REG since there are some
    alternative with FLOAT_REG which doesn't have matching constraint.
    It can cause extra register copies since later reload has to make
    code to respect the constraint.  This patch make IRA respect this
    matching constraint on VSX_REG which is the preferred regclass,
    but it excludes some cases where for one preferred register class
    there can be two or more alternatives, one of them has the
    matching constraint, while another doesn't have.  It also
    considers the possibility of free register copy.

    With option Ofast unroll, this patch can help to improve SPEC2017
    bmk 508.namd_r +2.42% and 519.lbm_r +2.43% on Power8 while
    508.namd_r +3.02% and 519.lbm_r +3.85% on Power9 without any
    remarkable degradations.  It also improved something on SVE as
    testcase changes showed and Richard's confirmation.

    Bootstrapped & regtested on powerpc64le-linux-gnu P9,
    x86_64-redhat-linux and aarch64-linux-gnu.

    gcc/ChangeLog:

            PR rtl-optimization/100328
            * doc/invoke.texi (ira-consider-dup-in-all-alts): Document new
            parameter.
            * ira.c (ira_get_dup_out_num): Adjust as parameter
            param_ira_consider_dup_in_all_alts.
            * params.opt (ira-consider-dup-in-all-alts): New.
            * ira-conflicts.c (process_regs_for_copy): Add one parameter
            single_input_op_has_cstr_p.
            (get_freq_for_shuffle_copy): New function.
            (add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p.
            * ira-int.h (ira_get_dup_out_num): Add one bool parameter.

    gcc/testsuite/ChangeLog:

            PR rtl-optimization/100328
            * gcc.target/aarch64/sve/acle/asm/div_f16.c: Remove one xfail.
            * gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/sub_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/sub_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/sub_f64.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise.
            * gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
@ 2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
  2021-07-07  3:07 ` linkw at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-06  2:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:c64d15224c6dc4dc3b6b3c2c6b6fd5c7e0e9c072

commit r12-2046-gc64d15224c6dc4dc3b6b3c2c6b6fd5c7e0e9c072
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Mon Jul 5 20:54:16 2021 -0500

    i386: Disable param ira-consider-dup-in-all-alts [PR100328]

    With Hongtao's SPEC2017 performance evaluation result here:
    https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573986.html
    this new parameter ira-consider-dup-in-all-alts has negative
    effects on i386, this patch is to disable it explicitly on
    i386.

    Bootstrapped & regtested on x86_64-redhat-linux.

    gcc/ChangeLog:

            PR rtl-optimization/100328
            * config/i386/i386-options.c (ix86_option_override_internal):
            Set param_ira_consider_dup_in_all_alts to 0.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
  2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
@ 2021-07-07  3:07 ` linkw at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-07-07  3:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #11 from Kewen Lin <linkw at gcc dot gnu.org> ---
Should be resolved on trunk.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-07  3:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
2021-04-30  8:30 ` [Bug rtl-optimization/100328] " linkw at gcc dot gnu.org
2021-06-23 18:28 ` [Bug rtl-optimization/100328] IRA doesn't model matching " vmakarov at gcc dot gnu.org
2021-06-24 12:11 ` linkw at gcc dot gnu.org
2021-06-24 12:36 ` linkw at gcc dot gnu.org
2021-06-28  3:27 ` linkw at gcc dot gnu.org
2021-06-28  5:25 ` linkw at gcc dot gnu.org
2021-06-29 16:01 ` rsandifo at gcc dot gnu.org
2021-07-01  6:18 ` linkw at gcc dot gnu.org
2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
2021-07-07  3:07 ` linkw at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).