public inbox for gcc-bugs@sourceware.org
* [Bug fortran/110888] New: Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
@ 2023-08-03  7:06 rimvydas.jas at gmail dot com
  2023-08-03  7:11 ` [Bug fortran/110888] " rimvydas.jas at gmail dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: rimvydas.jas at gmail dot com @ 2023-08-03  7:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

            Bug ID: 110888
           Summary: Missing optimization for trivial MATMUL cases,
                    requires -fno-signed-zeros
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rimvydas.jas at gmail dot com
  Target Milestone: ---

$ cat foo.f90
subroutine foo(x,y,z)
  implicit none
  real(kind=selected_real_kind(9,99)) :: x(1), y(1,1), z(1)
  z = matmul(x,y)
end subroutine

$ gfortran -c -S -Wall -Wextra -O2 -fdump-tree-optimized foo.f90

The do loop gets reduced in the matmul intrinsic implementation; however, the
redundant accumulator store is not optimized out during PRE unless
-ffast-math (-fno-signed-zeros) is used:
   <bb 2> [local count: 536870912]:
   __builtin_memset (z_11(D), 0, 8);
-  _18 = (*z_11(D))[0];
   _19 = (*x_13(D))[0];
   _20 = (*y_14(D))[0];
   _21 = _19 * _20;
-  _22 = _18 + _21;
+  _22 = _21 + 0.0;
   (*z_11(D))[0] = _22;
   return;

Not sure if it is possible to mark the accumulator expr as artificial so that
optimizers could ignore side effects by default, but luckily this is easily
avoidable in the frontend itself.


* [Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
From: rimvydas.jas at gmail dot com @ 2023-08-03  7:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

--- Comment #1 from Rimvydas (RJ) <rimvydas.jas at gmail dot com> ---
Created attachment 55680
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55680&action=edit
possible fix

With this patch an extra register is freed and the compiler produces the
expected code on x86_64:
        movsd   (%rdi), %xmm0
        mulsd   (%rsi), %xmm0
        movsd   %xmm0, (%rdx)
        ret

The patch could be expanded to consider inlining trivial matmul cases even for
known arrays of size 1 with rank > 2.


* [Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
From: jvdelisle at gcc dot gnu.org @ 2023-08-04  1:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvdelisle at gcc dot gnu.org,
                   |                            |tkoenig at gcc dot gnu.org

--- Comment #2 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Copying Thomas on this and adding myself.


* [Bug middle-end/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
From: tkoenig at gcc dot gnu.org @ 2023-08-04 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |middle-end

--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Interesting problem.

For

  _19 = (*x_13(D))[0];
  _20 = (*y_14(D))[0];
  _21 = _19 * _20;
  _22 = _21 + 0.0;

the multiplication cannot produce a signalling NaN, so the addition
of zero should always be a no-op. For this, a simpler test case would
be

double add(double a, double b)
{
  return a*b + 0.0;
}

which gets me, on x86_64, 

        mulsd   %xmm1, %xmm0
        pxor    %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        ret

According to godbolt, icc produces

add:
        mulsd     %xmm1, %xmm0                                  #3.12
        ret                           

which should be fine.

So, an issue for tree optimization?


* [Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
From: tkoenig at gcc dot gnu.org @ 2023-08-04 15:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |fortran

--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Hm, on second thought, signed zeros are an issue; resetting to Fortran.
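
A minimal sketch of why signed zeros matter here, assuming IEEE 754
arithmetic in the default round-to-nearest mode and a build without
-ffast-math.  The function names are illustrative only and do not come from
the patch.  When the product is -0.0 (for example -1.0 * 0.0), adding +0.0
yields +0.0, so dropping the addition changes the sign of the result even
though the two values still compare equal:

```c
#include <assert.h>
#include <math.h>

/* Mirrors the reduced test case: product plus a +0.0 accumulator. */
double mul_add_zero(double a, double b)
{
  return a * b + 0.0;
}

/* The form an optimizer would like to emit instead. */
double mul_only(double a, double b)
{
  return a * b;
}

/* Self-check, assuming IEEE 754 semantics and no -fno-signed-zeros. */
void check_signed_zero(void)
{
  double with_add = mul_add_zero(-1.0, 0.0);
  double without_add = mul_only(-1.0, 0.0);

  assert(with_add == without_add);  /* -0.0 == +0.0 compares equal */
  assert(!signbit(with_add));       /* (-0.0) + (+0.0) gives +0.0 */
  assert(signbit(without_add));     /* the bare product keeps -0.0 */
}
```

Calling check_signed_zero() should pass on a conforming IEEE 754 target; the
difference is only observable through signbit() or operations such as 1.0/z.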

Generally, we are in an intrinsic, so we can do whatever we please
(we certainly do in the library case, and this is expected behavior).

Having -ffast-math applied locally to the BLOCK that the matmul
is executed in would be a possibility.
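
As a rough C analogue of that idea (not the gfortran implementation, and at
function rather than BLOCK granularity), GCC's optimize function attribute
can relax signed-zero handling for a single function.  This is only a sketch:
matmul_1x1 and the NO_SIGNED_ZEROS macro are hypothetical names, the
attribute is GCC-specific, and the GCC manual describes it as intended for
debugging rather than production use.  Whether the addition of 0.0 is
actually removed depends on the GCC version and optimization level.

```c
#include <assert.h>

/* GCC-only attribute; other compilers get a plain function. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SIGNED_ZEROS __attribute__((optimize("no-signed-zeros")))
#else
#define NO_SIGNED_ZEROS
#endif

/* Hypothetical stand-in for an inlined 1x1 matmul with a zeroed
   accumulator, compiled as if -fno-signed-zeros were given.  */
NO_SIGNED_ZEROS
void matmul_1x1(const double *x, const double *y, double *z)
{
  z[0] = 0.0;
  for (int i = 0; i < 1; i++)
    z[0] += x[0] * y[0];
}

/* Self-check on a non-zero product, where the sign of zero is irrelevant. */
void check_matmul_1x1(void)
{
  double x = 2.0, y = 3.0, z = -1.0;
  matmul_1x1(&x, &y, &z);
  assert(z == 6.0);
}
```

The check deliberately avoids zero-valued products, since under
-fno-signed-zeros the sign of a zero result is unspecified.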


* [Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros
From: rimvydas.jas at gmail dot com @ 2023-08-05  4:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

--- Comment #5 from Rimvydas (RJ) <rimvydas.jas at gmail dot com> ---
It is more like this problem:
$ cat foo.c
void foo_(double *x, double *y, double *z)
{
  int i;
  __builtin_memset(z, 0, 8); /* z[0] = 0.0; */
  for (i=0; i<1 ; i++)
    z[0] += x[0] * y[0];
}

$ gcc -O2 -Wall -Wextra -c foo.c -S -fdump-tree-optimized
  <bb 2> [local count: 536870913]:
  __builtin_memset (z_9(D), 0, 8);
  _17 = *x_11(D);
  _18 = *y_12(D);
  _19 = _17 * _18;
  _20 = _19 + 0.0;
  *z_9(D) = _20;
  return;
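
The dump above shows the memset-cleared accumulator being replaced by the
constant 0.0.  A small sketch of why that substitution is valid, assuming
IEEE 754 binary64 doubles where an all-zero bit pattern represents +0.0 (the
helper names are illustrative only):

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Returns a double whose bytes were all cleared with memset. */
double zeroed_accumulator(void)
{
  double z;
  memset(&z, 0, sizeof z);
  return z;
}

/* Self-check, assuming IEEE 754 binary64: all-zero bytes are +0.0. */
void check_zeroed_accumulator(void)
{
  double z = zeroed_accumulator();
  assert(z == 0.0);
  assert(!signbit(z));
}
```

Because the accumulator is +0.0 rather than -0.0, the remaining addition
still rewrites a -0.0 product to +0.0, which is why it cannot be dropped
while signed zeros are honored.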

It would be beneficial for all frontends if the use of __builtin_memset() to
zero out accumulators were considered as !HONOR_SIGNED_ZEROS, at least
during the PRE pass in the middle-end.  If that would complicate things, then
an easier solution is to add a special case in the gfortran frontend passes
that simply transforms the expression to drop the accumulator:
z[0] = x[0] * y[0];

* side note: in C the redundant __builtin_memset() does not get optimized out,
unlike in the gfortran "zero" expr version.  This might be a middle-end
optimization pass ordering issue.  At least the PRE pass does take accumulator
zeroing into account.

