public inbox for gcc-bugs@sourceware.org
* [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES
@ 2020-12-25  1:38 crazylht at gmail dot com
  2020-12-31  3:38 ` [Bug target/98442] " crazylht at gmail dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: crazylht at gmail dot com @ 2020-12-25  1:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

            Bug ID: 98442
           Summary: [X86] suboptimal for memset with CLEAR_BY_PIECES
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com, wei3.xiao at intel dot com,
                    wwwhhhyyy333 at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

cat test.c

--------
char Tab[64];
void foo(int n)
{
    for (int i = 0; i != 64; i++)
        Tab[i] = 0;
}
----


GCC generates:

------
foo(int):
  vpxor xmm0, xmm0, xmm0
  vmovdqa XMMWORD PTR Tab[rip], xmm0
  vmovdqa XMMWORD PTR Tab[rip+16], xmm0
  vmovdqa XMMWORD PTR Tab[rip+32], xmm0
  vmovdqa XMMWORD PTR Tab[rip+48], xmm0
  ret
Tab:
  .zero 64
---------

It could be better:

----
foo(int):
        vpxor     ymm0, ymm0, ymm0                              #4.5
        vmovdqu   YMMWORD PTR Tab[rip], ymm0                    #4.5
        vmovdqu   YMMWORD PTR 32+Tab[rip], ymm0                 #4.5
        vzeroupper                                              #6.1
        ret                                                     #6.1
Tab:
-----

GCC uses 128-bit by default:
----
bool
default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
                                        unsigned int alignment,
                                        enum by_pieces_operation op,
                                        bool speed_p)
{
  unsigned int max_size = 0;
  unsigned int ratio = 0;

  switch (op)
    {
    case CLEAR_BY_PIECES:
      max_size = STORE_MAX_PIECES + 1;
      ratio = CLEAR_RATIO (speed_p);
----

Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P for i386?

* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
  2020-12-25  1:38 [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES crazylht at gmail dot com
@ 2020-12-31  3:38 ` crazylht at gmail dot com
  2020-12-31  3:48 ` hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: crazylht at gmail dot com @ 2020-12-31  3:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
> 
> Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P for i386?

It's actually determined by MOVE_MAX_PIECES, which is related to MAX_FIXED_MODE_SIZE:

/* ??? We should use TImode in 32-bit mode and use OImode or XImode
   if they are available.  But since by_pieces_ninsns determines the
   widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
   64-bit mode.  */
#define MOVE_MAX_PIECES \
  ((TARGET_64BIT \
    && TARGET_SSE2 \
    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)

* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
  2020-12-25  1:38 [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES crazylht at gmail dot com
  2020-12-31  3:38 ` [Bug target/98442] " crazylht at gmail dot com
@ 2020-12-31  3:48 ` hjl.tools at gmail dot com
  2020-12-31  3:56 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: hjl.tools at gmail dot com @ 2020-12-31  3:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-12-31
     Ever confirmed|0                           |1

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Please take a look at users/hjl/pieces/master branch:

https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master

You may get some ideas.

* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
  2020-12-25  1:38 [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES crazylht at gmail dot com
  2020-12-31  3:38 ` [Bug target/98442] " crazylht at gmail dot com
  2020-12-31  3:48 ` hjl.tools at gmail dot com
@ 2020-12-31  3:56 ` hjl.tools at gmail dot com
  2021-01-05 10:05 ` rguenth at gcc dot gnu.org
  2021-10-06 23:48 ` hjl.tools at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: hjl.tools at gmail dot com @ 2020-12-31  3:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #2)
> Please take a look at users/hjl/pieces/master branch:
> 
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
> 
> You may get some ideas.

I got

[hjl@gnu-cfl-1 gcc]$ cat /tmp/x.c
char Tab[64];
void foo(int n)
{
    for (int i= 0; i != 64; i++)
     Tab[i] = 0;
}
[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./ -march=skylake -S -O2 /tmp/x.c
[hjl@gnu-cfl-1 gcc]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        vpxor   %xmm0, %xmm0, %xmm0
        vmovups %ymm0, Tab(%rip)
        vmovups %ymm0, Tab+32(%rip)
        vzeroupper
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .comm   Tab,64,32
        .ident  "GCC: (GNU) 10.0.0 20190523 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 gcc]$ 

This requires middle-end changes.

* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
  2020-12-25  1:38 [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES crazylht at gmail dot com
                   ` (2 preceding siblings ...)
  2020-12-31  3:56 ` hjl.tools at gmail dot com
@ 2021-01-05 10:05 ` rguenth at gcc dot gnu.org
  2021-10-06 23:48 ` hjl.tools at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-05 10:05 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Using ymm might also trigger dynamic stack realignment if we ever spill.  Using
ymm can also be slower when the memory is unaligned (and/or when the CPU has
only split AVX support).  It will also require vzeroupper.

So I wonder if it is really worth it for small structures like this?  And with
fast rep;stosb isn't that even better?  [can fast rep;stosb stores be forwarded?]

* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
  2020-12-25  1:38 [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES crazylht at gmail dot com
                   ` (3 preceding siblings ...)
  2021-01-05 10:05 ` rguenth at gcc dot gnu.org
@ 2021-10-06 23:48 ` hjl.tools at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: hjl.tools at gmail dot com @ 2021-10-06 23:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
Dup.

*** This bug has been marked as a duplicate of bug 90773 ***
