public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well
@ 2021-04-29  6:41 linkw at gcc dot gnu.org
From: linkw at gcc dot gnu.org @ 2021-04-29  6:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

            Bug ID: 100328
           Summary: IRA doesn't model dup num constraint well
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

source: function LBM_performStreamCollideTRT in SPEC2017 519.lbm_r

This issue was exposed while evaluating enabling vectorization at -O2 on 519.lbm_r.

baseline option: -O2 -mcpu=power9 -ffast-math

test option: -O2 -mcpu=power9 -ffast-math -ftree-vectorize
             -fvect-cost-model=very-cheap

With the test options, the ratio changes by -1.66% relative to the baseline
(-1.74% without the very-cheap cost model).

The hotspot LBM_performStreamCollideTRT isn't vectorized at all, but
if-conversion, which runs as a pre-pass of vectorization, exposes the issue.
After loop versioning, if-conversion uses the newly copied loop as the scalar
version; once vectorization fails, we end up with a loop that differs slightly
from the original one.

The difference mainly comes from: 

1) Different basic block placement. For this function, the fall-through BB and
the branch BB are swapped. The reason is that the BBs of the newly copied loop
are arranged in dominator order (dom_order), while the order in which immediate
dominators are inserted changes when the idoms are set during copying. Anyway,
this is acceptable.

2) Different SSA names. The newly copied loop can reuse some discarded
SSA_NAMEs, and since the GIMPLE canonicalization of commutative operands
depends on SSA name versions, it changes the order of some operands.
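To illustrate point 2, here is a toy C model. It assumes that, for a commutative operation whose operands are both SSA names, the canonical form puts the operand with the lower SSA version first (roughly what GCC's tree_swap_operands_p does for two SSA_NAMEs). The struct and function names are hypothetical.

```c
/* Hypothetical toy model: an SSA name is identified only by its
   version number, e.g. _12 has version 12.  */
struct toy_binop
{
  unsigned op0, op1;  /* SSA versions of the two operands */
};

/* Canonicalize a commutative binary operation by putting the operand
   with the lower SSA version first (assumed ordering rule).  */
static struct toy_binop canonicalize (unsigned a, unsigned b)
{
  struct toy_binop e = { a, b };
  if (e.op0 > e.op1)
    {
      unsigned tmp = e.op0;
      e.op0 = e.op1;
      e.op1 = tmp;
    }
  return e;
}
```

In the original loop, `_45 * _12` is canonicalized to `_12 * _45`. If the copied loop reuses the discarded name `_7` for the value that used to be `_45`, the same computation becomes `_7 * _12`, so the operand order flips although nothing else changed.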

I hacked the compiler to filter out the fall-through/branch BB difference; the
gap becomes smaller but does not disappear. The remaining differences in GIMPLE
are some operand orders, as mentioned above. In the assembly, the differences
are some different instruction choices, mainly for FMA-style instructions; one
notable difference is the number of register copies:

  fmr + xxlor: 16 (baseline) vs 21 (test respecting fall through)

This function contains many FMA-style expressions (27 FMA, 19 FMS, 11 FNMA).
Their VSX_REGS versions are destructive (the output must share a register with
one of the inputs), and the define_insns look like:

(define_insn "*nfma<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (neg:SFDF
         (fma:SFDF
          (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,wa,wa")
          (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
          (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fnmadd<s> %0,%1,%2,%3
   xsnmadda<sd>p %x0,%x1,%x2
   xsnmaddm<sd>p %x0,%x1,%x3"
  [(set_attr "type" "fp")
   (set_attr "isa" "*,<Fisa>,<Fisa>")])

(define_insn "*fms<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (fma:SFDF
         (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,wa,wa")
         (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
         (neg:SFDF (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa"))))]
...

(define_insn "*fma<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (fma:SFDF
          (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
          (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
          (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]
...


Since the 1st alternative uses class FLOAT_REGS, which is a subset of VSX_REGS
(VSX_REGS has 64 registers and the FP registers are the first 32 of them), in
most cases the preferred rclass for these insns is VSX_REGS. Suppose we have
the expression (A is the output, operand 0; B, C and D are operands 1 to 3):

  FMA A,B,C,D

If these four registers are all different, the insn cannot match the
alternatives with a matching (duplicate number) constraint. It then has to use
the remaining (1st) alternative, and if any of these registers is not one of
the low 32 VSX registers (i.e. cannot be an FP register), we have to generate a
register copy from the VSX register (a high-numbered VSX reg) to an FP register
(a low-numbered VSX reg).
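The reasoning above can be sketched as a toy C model of how the register choice for FMA A,B,C,D (A = operand 0, B/C/D = operands 1/2/3) interacts with the three alternatives of *fma<mode>4_fpr. It assumes VSX registers are numbered 0..63 with the FP registers being 0..31, ignores the "%" commutativity marker, and uses hypothetical names; the real choice is made by the register allocator's cost model, not by a function like this.

```c
#include <stdbool.h>

/* Assumption: VSX registers are 0..63 and the FP registers are the
   first 32 of them (FLOAT_REGS is a subset of VSX_REGS).  */
#define NUM_FP_REGS 32

static bool is_fp_reg (int regno)
{
  return regno >= 0 && regno < NUM_FP_REGS;
}

/* Return the alternative of *fma<mode>4_fpr that matches the given
   register numbers:
     2: D == A (operand 3 has the matching constraint "0"),
     3: C == A (operand 2 has the matching constraint "0"),
     1: otherwise; every operand must then be an FP register.
   *COPIES receives the number of operands that are not FP registers
   and would therefore need a register copy under alternative 1.  */
static int pick_alternative (int a, int b, int c, int d, int *copies)
{
  *copies = 0;
  if (d == a)
    return 2;
  if (c == a)
    return 3;
  const int ops[4] = { a, b, c, d };
  for (int i = 0; i < 4; i++)
    if (!is_fp_reg (ops[i]))
      (*copies)++;
  return 1;
}
```

For example, with A=40, B=41, C=42, D=40 the destructive VSX form (alternative 2) needs no copy, while four distinct registers with D=40 fall back to alternative 1 and need one copy of a high VSX register.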

How does the commutative operand order affect this?

IRA creates copies for register coalescing. For the FMA expression above,
assuming both B and C die at the current insn, it creates the copies A/B and
A/C. Later, when it forms threads, if A/B and A/C have the same frequency, the
copy with the lower number comes first. This means the operand order can
affect how the threads are formed: a different pulled-in allocno will probably
produce a different conflict set, which further affects the global thread
forming and the final assignment.
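The tie-break described above can be sketched with a toy comparator: copies are ordered by decreasing frequency and, on equal frequency, by increasing copy number. The struct and functions are hypothetical simplifications, not IRA's data structures; the sketch only shows that swapping the creation order of two equally frequent copies changes which allocno is pulled into A's thread first.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an IRA copy between allocnos.  */
struct toy_copy
{
  int num;       /* creation order: lower number = created earlier */
  int freq;      /* frequency of the insn that gave rise to the copy */
  char a1, a2;   /* the two allocnos, named by letter */
};

/* Order copies by decreasing frequency; on a tie, lower copy number
   first, matching the tie-break described above.  */
static int toy_copy_cmp (const void *p1, const void *p2)
{
  const struct toy_copy *c1 = p1;
  const struct toy_copy *c2 = p2;
  if (c1->freq != c2->freq)
    return c1->freq > c2->freq ? -1 : 1;
  return c1->num - c2->num;
}

/* For FMA A,B,C,D with two dying inputs, copies are created in operand
   order: A/FIRST_OP gets number 0 and A/SECOND_OP gets number 1.
   Return the allocno that is pulled into A's thread first.  */
static char first_partner_of_a (char first_op, char second_op, int freq)
{
  struct toy_copy copies[2] = {
    { 0, freq, 'A', first_op },
    { 1, freq, 'A', second_op },
  };
  qsort (copies, 2, sizeof copies[0], toy_copy_cmp);
  return copies[0].a2;
}
```

With equal frequencies, `first_partner_of_a ('B', 'C', 1000)` picks B, while the canonicalization-swapped order `first_partner_of_a ('C', 'B', 1000)` picks C.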

However, I think the root cause is that when IRA creates copies for these
FMA-style insns, it doesn't fully consider the matching (duplicate number)
constraints. For example, for FMS, if operands 1, 2 and 3 are all dead, the
copies for operands 2 and 3 (each of which can be tied to operand 0 in one
alternative) should take higher priority in the copy queue.

I noticed there is a function, ira_get_dup_out_num, which is meant to create
this kind of copy, but the code below seems to refuse to create it if there is
an alternative with a valid register class that is not likely to be spilled:

              default:
                {
                  enum constraint_num cn = lookup_constraint (str);
                  enum reg_class cl = reg_class_for_constraint (cn);
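                  /* Some alternative accepts a register class that is not
                     likely spilled, so don't record a dup.  */
                  if (cl != NO_REGS
                      && !targetm.class_likely_spilled_p (cl))
                    goto fail;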

                 ...

Is there some particular reason for this behavior?
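To make the effect of that check concrete, here is a toy C model of it; this is not GCC's code, and the function name and the single flag standing in for targetm.class_likely_spilled_p are hypothetical. It scans one operand's constraint string alternative by alternative, as in "f,0,wa" for operand 3 of *fma<mode>4_fpr (writing <Ff> as "f"), and gives up as soon as an alternative names a register class that is not likely spilled.

```c
#include <ctype.h>
#include <stdbool.h>

/* CONSTRAINTS lists one constraint per alternative, separated by commas.
   CLASSES_LIKELY_SPILLED stands in for targetm.class_likely_spilled_p
   and is applied to every register class.  Returns the operand number
   named by a matching constraint, or -1 if some alternative offers a
   register class that is not likely spilled (the "goto fail" path).  */
static int toy_dup_out_num (const char *constraints,
                            bool classes_likely_spilled)
{
  int dup = -1;
  for (const char *p = constraints; *p != '\0'; p++)
    {
      if (*p == ',')
        continue;                 /* start of the next alternative */
      if (isdigit ((unsigned char) *p))
        dup = *p - '0';           /* matching constraint, e.g. "0" */
      else if (!classes_likely_spilled)
        return -1;                /* usable register class: give up */
      /* Skip the remaining letters of this alternative's constraint.  */
      while (p[1] != '\0' && p[1] != ',')
        p++;
    }
  return dup;
}
```

Assuming the FP/VSX register classes are not likely spilled, `toy_dup_out_num ("f,0,wa", false)` returns -1: the "f" alternative triggers the fail path, so no dup is recorded even though the second alternative ties operand 3 to operand 0.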



Thread overview: 12+ messages
2021-04-29  6:41 [Bug rtl-optimization/100328] New: IRA doesn't model dup num constraint well linkw at gcc dot gnu.org
2021-04-30  8:30 ` [Bug rtl-optimization/100328] " linkw at gcc dot gnu.org
2021-06-23 18:28 ` [Bug rtl-optimization/100328] IRA doesn't model matching " vmakarov at gcc dot gnu.org
2021-06-24 12:11 ` linkw at gcc dot gnu.org
2021-06-24 12:36 ` linkw at gcc dot gnu.org
2021-06-28  3:27 ` linkw at gcc dot gnu.org
2021-06-28  5:25 ` linkw at gcc dot gnu.org
2021-06-29 16:01 ` rsandifo at gcc dot gnu.org
2021-07-01  6:18 ` linkw at gcc dot gnu.org
2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
2021-07-06  2:35 ` cvs-commit at gcc dot gnu.org
2021-07-07  3:07 ` linkw at gcc dot gnu.org
