public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102483] New: Regression in codegen of reduction of 4 chars
@ 2021-09-25 15:22 david.bolvansky at gmail dot com
From: david.bolvansky at gmail dot com @ 2021-09-25 15:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

            Bug ID: 102483
           Summary: Regression in codegen of reduction of 4 chars
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

char foo (char* p)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  return sum;
}

-O3 -march=x86-64


GCC trunk:

foo:
        mov     edx, DWORD PTR [rdi]
        movzx   eax, dh
        mov     ecx, edx
        add     eax, edx
        shr     ecx, 16
        add     eax, ecx
        shr     edx, 24
        add     eax, edx
        ret


GCC 11 (much better):
foo:
        movzx   eax, BYTE PTR [rdi+1]
        add     al, BYTE PTR [rdi]
        add     al, BYTE PTR [rdi+2]
        add     al, BYTE PTR [rdi+3]
        ret


Best? llvm-mca says so:

foo:                                    # @foo
        movd    xmm0, dword ptr [rdi]           # xmm0 = mem[0],zero,zero,zero
        pxor    xmm1, xmm1
        psadbw  xmm1, xmm0
        movd    eax, xmm1
        ret
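A hedged C sketch of the psadbw sequence above, assuming SSE2 (baseline on x86-64) and the standard `<emmintrin.h>` intrinsics; the names `sum4_scalar` and `sum4_psadbw` are hypothetical. psadbw against a zero vector sums the bytes of each 8-byte half into a 16-bit result, and because movd zero-fills the upper twelve bytes, only the four loaded chars contribute. Keeping the low byte gives the same wrapped sum as the char loop, since the bit pattern is identical whether char is signed or unsigned. A scalar fallback is used when SSE2 is unavailable.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: wraps modulo 256 like the char loop in foo(). */
unsigned char sum4_scalar(const unsigned char *p)
{
    unsigned char sum = 0;
    for (int i = 0; i != 4; i++)
        sum += p[i];
    return sum;
}

/* Hypothetical intrinsics version mirroring movd/pxor/psadbw/movd. */
unsigned char sum4_psadbw(const unsigned char *p)
{
#if defined(__SSE2__)
    int32_t word;
    memcpy(&word, p, sizeof word);                   /* movd xmm0, dword ptr [rdi] */
    __m128i v = _mm_cvtsi32_si128(word);              /* upper 12 bytes are zero */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128()); /* pxor + psadbw */
    return (unsigned char)_mm_cvtsi128_si32(s);       /* movd eax, xmm1; keep low byte */
#else
    return sum4_scalar(p);                            /* portable fallback */
#endif
}
```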


https://godbolt.org/z/sT9svvj7W


* [Bug target/102483] Reduction of 4 chars can be improved
From: pinskia at gcc dot gnu.org @ 2021-09-25 19:05 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-25
             Target|                            |x86_64-linux-gnu
            Summary|Regression in codegen of    |Reduction of 4 chars can be
                   |reduction of 4 chars        |improved
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
There is a cost model issue as the backend does not implement a reduction add
for vector(4) char.

Note that just a slightly different function (doing a store rather than a return)
produces very different results too (even with LLVM):
void foo (unsigned char* p, unsigned char *t)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  *t = sum;
}


* [Bug target/102483] Reduction of 4 chars can be improved
From: rguenth at gcc dot gnu.org @ 2021-09-27  8:36 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |12.0
             Blocks|                            |53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's an interesting sub-case which we could eventually handle generally as a
mechanism for the final reduction.  Note we need to constrain vector [us]sad
as to which lanes are summed, otherwise we might get them to spread to
4 integer lanes.

Of course the target could simply provide reduc_plus_scal_v4qi ...
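Two small C models may make this concrete; both are illustrations under stated assumptions, not GCC code. `reduc_plus_v4qi_model` sketches what a reduc_plus_scal_v4qi expander is expected to compute: the four QImode lanes summed into one QImode scalar, wrapping modulo 256. `psadbw_half_model` is a scalar model of one 64-bit half of psadbw against zero, which sums all eight bytes of that half. That is one related way to see why lane selection matters for a sad-based reduction: any nonzero bytes beyond the four wanted lanes end up in the sum. The `v4qi_t` type uses the GCC/Clang `vector_size` extension; all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 4 x 8-bit vector type (GCC/Clang vector extension). */
typedef unsigned char v4qi_t __attribute__((vector_size(4)));

/* Model of the optab's semantics: sum of 4 lanes, truncated to 8 bits. */
unsigned char reduc_plus_v4qi_model(v4qi_t v)
{
    unsigned char s = 0;
    for (int i = 0; i < 4; i++)
        s += v[i];
    return s;
}

/* Scalar model of one 64-bit half of psadbw against zero:
   all 8 bytes are summed into a 16-bit result. */
uint16_t psadbw_half_model(const unsigned char b[8])
{
    uint16_t s = 0;
    for (int i = 0; i < 8; i++)
        s += b[i];
    return s;
}
```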


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug target/102483] Reduction of 4 chars can be improved
From: crazylht at gmail dot com @ 2021-09-27  8:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
Also for reduc_umin/umax/smin/smax_scal_v4qi.


* [Bug target/102483] Reduction of 4 chars can be improved
From: crazylht at gmail dot com @ 2021-10-11  6:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> Also for reduc_umin/umax/smin/smax_scal_v4qi.


After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi, performance
of the functions below is a little bit better than before with -O2 -march=haswell,
-O2 -march=skylake-avx512 and -Ofast -march=skylake-avx512.

char
__attribute__((noipa, optimize("Ofast"),target("sse4.1")))
reduce_add (char* p)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  return sum;
}

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) > (b) ? (b) : (a))

unsigned char
__attribute__((noipa))
reduce_umax (unsigned char* p)
{
  unsigned char sum = p[0];
  for (int i = 0; i != 4; i++)
    sum = MAX(sum, p[i]);
  return sum;
}

unsigned char
__attribute__((noipa))
reduce_umin (unsigned char* p)
{
  unsigned char sum = p[0];
  for (int i = 0; i != 4; i++)
    sum = MIN(sum, p[i]);
  return sum;
}

char
__attribute__((noipa))
reduce_smax (char* p)
{
  char sum = p[0];
  for (int i = 0; i != 4; i++)
    sum = MAX(sum, p[i]);
  return sum;
}

char
__attribute__((noipa))
reduce_smin (char* p)
{
  char sum = p[0];
  for (int i = 0; i != 4; i++)
    sum = MIN(sum, p[i]);
  return sum;
}


* [Bug target/102483] Reduction of 4 chars can be improved
From: cvs-commit at gcc dot gnu.org @ 2021-10-12  7:25 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:73c535a00bc4dfe9a939cd80facbe79a929cab3e

commit r12-4339-g73c535a00bc4dfe9a939cd80facbe79a929cab3e
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Oct 9 14:34:38 2021 +0800

    Support reduc_{plus,smax,smin,umax,umin}_scal_v4qi.

    gcc/ChangeLog

            PR target/102483
            * config/i386/i386-expand.c (emit_reduc_half): Handle
            V4QImode.
            * config/i386/mmx.md (reduc_<code>_scal_v4qi): New expander.
            (reduc_plus_scal_v4qi): Ditto.

    gcc/testsuite/ChangeLog

            * gcc.target/i386/pr102483.c: New test.
            * gcc.target/i386/pr102483-2.c: New test.
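The ChangeLog says emit_reduc_half now handles V4QImode. As a rough, portable model of the halving strategy such a helper supports (the shuffles GCC actually emits may differ), a four-lane reduction needs log2(4) = 2 steps: fold lanes {2,3} onto {0,1}, then lane 1 onto lane 0. The same skeleton serves plus, umax, umin, smax and smin, matching the five expanders in the commit title. All function names below are hypothetical; `op_smax`/`op_smin` reinterpret bytes as `signed char`, which relies on GCC/Clang's modulo conversion.

```c
#include <stddef.h>

/* Hypothetical lane operation on 8-bit values. */
typedef unsigned char (*byte_op)(unsigned char, unsigned char);

unsigned char op_add(unsigned char a, unsigned char b)  { return (unsigned char)(a + b); }
unsigned char op_umax(unsigned char a, unsigned char b) { return a > b ? a : b; }
unsigned char op_umin(unsigned char a, unsigned char b) { return a > b ? b : a; }
/* Signed variants compare the bytes as signed char. */
unsigned char op_smax(unsigned char a, unsigned char b) { return (signed char)a > (signed char)b ? a : b; }
unsigned char op_smin(unsigned char a, unsigned char b) { return (signed char)a > (signed char)b ? b : a; }

/* Reduce 4 lanes in two halving steps:
   half = 2: lanes {0,1} op= lanes {2,3}
   half = 1: lane 0 op= lane 1 */
unsigned char reduce4_halving(const unsigned char in[4], byte_op op)
{
    unsigned char v[4] = { in[0], in[1], in[2], in[3] };
    for (size_t half = 2; half >= 1; half /= 2)
        for (size_t i = 0; i < half; i++)
            v[i] = op(v[i], v[i + half]);
    return v[0];
}
```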

