[Bug tree-optimization/114932] New: Improvement in CHREC can give large performance gains

From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114932] New: Improvement in CHREC can give large performance gains
Date: Fri, 03 May 2024 05:51:26 +0000
Message-ID: <bug-114932-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

            Bug ID: 114932
           Summary: Improvement in CHREC can give large performance gains
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

With the original fix from PR114074 applied (e.g.
g:a0b1798042d033fd2cc2c806afbb77875dd2909b) we saw not only regressions but
also big improvements.

The following testcase:

---
  module brute_force
    integer, parameter :: r=9
     integer  block(r, r, r)
    contains
  subroutine brute
      k = 1
call digits_2(k)
  end
   recursive subroutine digits_2(row)
  integer, intent(in) :: row
  logical OK
     do i1 = 0, 1
      do i2 = 1, 1
          do i3 = 1, 1
           do i4 = 0, 1
                do i5 = 1, select
                     do i6 = 0, 1
                         do i7 = l0, u0
                       select case(1 )
                       case(1)
                           block(:2, 7:, i7) = block(:2, 7:, i7) - 1
                       end select
                            do i8 = 1, 1
                               do i9 = 1, 1
                            if(row == 5) then
                          elseif(OK)then
                                    call digits_2(row + 1)
                                end if
                                end do
                          end do
                       block(:, 1, i7) =   select
                    end do
                    end do
              end do
              end do
           end do
        block = 1
     end do
     block = 1
     block = block0 + select
  end do
 end
  end
---

compiled with: -mcpu=neoverse-v1 -Ofast -fomit-frame-pointer foo.f90

is vectorized after SRA and constprop, but the final addressing modes are so
complicated that IVopts generates a register-offset addressing mode:

  4c:   2f00041d        mvni    v29.2s, #0x0
  50:   fc666842        ldr     d2, [x2, x6]
  54:   fc656841        ldr     d1, [x2, x5]
  58:   fc646840        ldr     d0, [x2, x4]
  5c:   0ebd8442        add     v2.2s, v2.2s, v29.2s
  60:   0ebd8421        add     v1.2s, v1.2s, v29.2s
  64:   0ebd8400        add     v0.2s, v0.2s, v29.2s

which is harder for prefetchers to follow.  With the patch applied, the compiler
was able to correctly lower these to the immediate-offset loads that the scalar
code was using:

  38:   2f00041d        mvni    v29.2s, #0x0
  3c:   fc594002        ldur    d2, [x0, #-108]
  40:   fc5b8001        ldur    d1, [x0, #-72]
  44:   fc5dc000        ldur    d0, [x0, #-36]
  48:   0ebd8442        add     v2.2s, v2.2s, v29.2s
  4c:   0ebd8421        add     v1.2s, v1.2s, v29.2s
  50:   0ebd8400        add     v0.2s, v0.2s, v29.2s

and it also removed all the additional instructions needed to keep x6, x5 and
x4 up to date.

This gave 10%+ improvements on various workloads.

(PS: I'm looking at the __brute_force_MOD_digits_2.constprop.3.isra.0
specialization.)

I will try to reduce it further, but I am filing this now so we can keep track
of it and hopefully fix it.

Thread overview: 16+ messages
2024-05-03  5:51 tnfchris at gcc dot gnu.org [this message]
2024-05-03  6:26 ` [Bug tree-optimization/114932] " rguenth at gcc dot gnu.org
2024-05-03  7:03 ` pinskia at gcc dot gnu.org
2024-05-03  8:09 ` tnfchris at gcc dot gnu.org
2024-05-03  8:41 ` tnfchris at gcc dot gnu.org
2024-05-03  8:44 ` tnfchris at gcc dot gnu.org
2024-05-03  8:45 ` tnfchris at gcc dot gnu.org
2024-05-03  9:12 ` rguenth at gcc dot gnu.org
2024-05-13  8:28 ` tnfchris at gcc dot gnu.org
2024-06-05  9:42 ` [Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing tnfchris at gcc dot gnu.org
2024-06-05 10:23 ` rguenth at gcc dot gnu.org
2024-06-05 19:02 ` tnfchris at gcc dot gnu.org
2024-06-06  6:17 ` rguenther at suse dot de
2024-06-06  6:40 ` tnfchris at gcc dot gnu.org
2024-06-06  7:55 ` rguenther at suse dot de
2024-06-06  8:01 ` tnfchris at gcc dot gnu.org
