public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern
@ 2020-07-18  0:39 gabravier at gmail dot com
  2020-07-20  6:15 ` [Bug tree-optimization/96239] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: gabravier at gmail dot com @ 2020-07-18  0:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

            Bug ID: 96239
           Summary: Failure to recognize __builtin_bswap16 pattern
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

union BytesOverlay
{
    uint8_t u8[2];
    uint16_t u16;
};

uint16_t f(uint16_t from)
{
    BytesOverlay overlay;
    overlay.u16 = from;
    uint8_t byte1 = overlay.u8[0];
    uint8_t byte2 = overlay.u8[1];
    overlay.u8[0] = byte2;
    overlay.u8[1] = byte1;
    return overlay.u16;
}

This can be transformed to `return __builtin_bswap16(from);`. This
transformation is done by LLVM, but not by GCC.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
@ 2020-07-20  6:15 ` rguenth at gcc dot gnu.org
  2020-12-14 14:41 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-07-20  6:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-20
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see at -O2

Coalescing successful!
Merged into 1 stores
16 bit bswap implementation found at: _9
New sequence of 1 stores to replace old one of 2 stores

  _9 = from_2(D) r>> 8;
  MEM[(union BytesOverlay *)&overlay] = _9;

and

        movl    %edi, %eax
        rolw    $8, %ax
        ret

with -O3 the vectorizer gets in the way and elides overlay:

  _7 = (unsigned char) from_2(D);
  _8 = BIT_FIELD_REF <from_2(D), 8, 8>;
  _9 = {_8, _7};
  _3 = VIEW_CONVERT_EXPR<short unsigned int>(_9);
  overlay ={v} {CLOBBER};
  return _3;

now we could fold the V_C_E of the vector ctor to a bswap on from_2 though
that would be quite a big special pattern.  Maybe vector-ctor folding can
consider this case to convert it to a V_C_E to a vector type from the
bswap result.  OTOH "fixing" the vectorizer to emit a bswap would be
even nicer.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
  2020-07-20  6:15 ` [Bug tree-optimization/96239] " rguenth at gcc dot gnu.org
@ 2020-12-14 14:41 ` jakub at gcc dot gnu.org
  2020-12-14 19:54 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-12-14 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Doing it in the vectorizer wouldn't optimize it when users write it by hand,
like:

typedef char V __attribute__((vector_size (2)));
typedef char W __attribute__((vector_size (8)));

V
foo (unsigned short x)
{
  return (V) { x >> 8, x };
}

W
bar (unsigned long long x)
{
  return (W) { x >> 56, x >> 48, x >> 40, x >> 32, x >> 24, x >> 16, x >> 8, x
};
}

and forwprop doesn't have the infrastructure needed for this.
Wouldn't the bswap pass be the right spot to handle this?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
  2020-07-20  6:15 ` [Bug tree-optimization/96239] " rguenth at gcc dot gnu.org
  2020-12-14 14:41 ` jakub at gcc dot gnu.org
@ 2020-12-14 19:54 ` jakub at gcc dot gnu.org
  2020-12-16 12:11 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-12-14 19:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49766&action=edit
gcc11-pr96239.patch

So, I wrote this untested patch which fixes my testcases, but doesn't fix due
to pass ordering issue the original one, as the bswap pass is done before
vectorization only (both loop and slp).
I guess some of the options are schedule another bswap pass instance later, or
include just the vector CONSTRUCTOR handling in e.g. the store-merging pass
(yes, it would be related just because the bswap code lives in the
store-merging source file and performs also bswap optimizations upon stores).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2020-12-14 19:54 ` jakub at gcc dot gnu.org
@ 2020-12-16 12:11 ` cvs-commit at gcc dot gnu.org
  2021-01-05 15:17 ` cvs-commit at gcc dot gnu.org
  2021-01-05 15:24 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-12-16 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:cd676dfa57e643a4f7d8445e6ebad0f21cf3fd84

commit r11-6115-gcd676dfa57e643a4f7d8445e6ebad0f21cf3fd84
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Dec 16 13:08:07 2020 +0100

    bswap: Handle vector CONSTRUCTORs [PR96239]

    The following patch teaches the bswap pass to handle for small (2/4/8 byte
    long) vectors a CONSTRUCTOR by determining if the bytes of the constructor
    come from non-vector sources and are either nop or bswap and changing the
    CONSTRUCTOR in that case to VIEW_CONVERT_EXPR from scalar integer to
    the vector type.

    Unfortunately, as I found after the patch was written, due to pass ordering
    this doesn't really fix the original testcase, just the one I wrote,
    because both loop and slp vectorization is done only after the bswap pass.
    A possible way out of that would be to perform just this particular bswap
    optimization (i.e. for CONSTRUCTOR assignments with integral vector types
    call find_bswap_or_nop and bswap_replace if successful) also during the
    store merging pass, it isn't really a store, but the store merging pass
    already performs bswapping when handling store, so it wouldn't be that big
    hack.  What do you think?

    2020-12-16  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/96239
            * gimple-ssa-store-merging.c (find_bswap_or_nop): Handle a vector
            CONSTRUCTOR.
            (bswap_replace): Likewise.

            * gcc.dg/pr96239.c: New test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2020-12-16 12:11 ` cvs-commit at gcc dot gnu.org
@ 2021-01-05 15:17 ` cvs-commit at gcc dot gnu.org
  2021-01-05 15:24 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-01-05 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a7553ad60bebc419d510564b8b9f9e5e03725ff5

commit r11-6470-ga7553ad60bebc419d510564b8b9f9e5e03725ff5
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 5 16:16:06 2021 +0100

    store-merging: Handle vector CONSTRUCTORs using bswap [PR96239]

    I've tried to add such helper, but handling over just analysis and letting
    each pass handle it differently seems complicated given the limitations of
    the bswap infrastructure.

    So, this patch just hooks the optimization also into store-merging so that
    the original testcase from the PR can be fixed.

    2021-01-05  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/96239
            * gimple-ssa-store-merging.c (maybe_optimize_vector_constructor):
New
            function.
            (get_status_for_store_merging): Don't return BB_INVALID for blocks
            with potential bswap optimizable CONSTRUCTORs.
            (pass_store_merging::execute): Optimize vector CONSTRUCTORs with
bswap
            if possible.

            * gcc.dg/tree-ssa/pr96239.c: New test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/96239] Failure to recognize __builtin_bswap16 pattern
  2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2021-01-05 15:17 ` cvs-commit at gcc dot gnu.org
@ 2021-01-05 15:24 ` jakub at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-05 15:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-01-05 15:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-18  0:39 [Bug tree-optimization/96239] New: Failure to recognize __builtin_bswap16 pattern gabravier at gmail dot com
2020-07-20  6:15 ` [Bug tree-optimization/96239] " rguenth at gcc dot gnu.org
2020-12-14 14:41 ` jakub at gcc dot gnu.org
2020-12-14 19:54 ` jakub at gcc dot gnu.org
2020-12-16 12:11 ` cvs-commit at gcc dot gnu.org
2021-01-05 15:17 ` cvs-commit at gcc dot gnu.org
2021-01-05 15:24 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).