public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/63259] New: Detecting byteswap sequence
@ 2014-09-13 17:41 bisqwit at iki dot fi
  2014-09-19 23:24 ` [Bug rtl-optimization/63259] " hp at gcc dot gnu.org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: bisqwit at iki dot fi @ 2014-09-13 17:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

            Bug ID: 63259
           Summary: Detecting byteswap sequence
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bisqwit at iki dot fi

This is just silly. GCC optimizes the first function into a single opcode
(bswap), but not the second. For Clang, it's the other way around.

unsigned byteswap_gcc(unsigned result)
{
    result = ((result & 0xFFFF0000u) >>16) | ((result & 0x0000FFFFu) <<16);
    result = ((result & 0xFF00FF00u) >> 8) | ((result & 0x00FF00FFu) << 8);
    return result;
}
unsigned byteswap_clang(unsigned result)
{
    result = ((result & 0xFF00FF00u) >> 8) | ((result & 0x00FF00FFu) << 8);
    result = ((result & 0xFFFF0000u) >>16) | ((result & 0x0000FFFFu) <<16);
    return result;
}

unsigned byteswap(unsigned v)
{
    #ifdef __clang__
     return byteswap_clang(v);
    #else
     return byteswap_gcc(v);
    #endif
}

GCC output:

    byteswap_gcc:
        movl    %edi, %eax
        bswap   %eax
        ret

    byteswap_clang:
        movl    %edi, %eax
        andl    $-16711936, %eax
        shrl    $8, %eax
        movl    %eax, %edx
        movl    %edi, %eax
        andl    $16711935, %eax
        sall    $8, %eax
        orl     %edx, %eax
        roll    $16, %eax
        ret

    byteswap:
        movl    %edi, %eax
        bswap   %eax
        ret

Clang output:

    byteswap_gcc:                           # @byteswap_gcc
        roll    $16, %edi
        movl    %edi, %eax
        shrl    $8, %eax
        andl    $16711935, %eax         # imm = 0xFF00FF
        shll    $8, %edi
        andl    $-16711936, %edi        # imm = 0xFFFFFFFFFF00FF00
        orl     %eax, %edi
        movl    %edi, %eax
        retq

    byteswap_clang:                         # @byteswap_clang
        bswapl  %edi
        movl    %edi, %eax
        retq

    byteswap:                               # @byteswap
        bswapl  %edi
        movl    %edi, %eax
        retq


Tested both -m32 and -m64, with options: -Ofast -S
Tested versions:
- gcc (Debian 4.9.1-11) 4.9.1  Target: x86_64-linux-gnu
- Debian clang version 3.5.0-+rc1-2 (tags/RELEASE_35/rc1) (based on LLVM 3.5.0)
 Target: x86_64-pc-linux-gnu


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
@ 2014-09-19 23:24 ` hp at gcc dot gnu.org
  2014-09-25 18:31 ` olegendo at gcc dot gnu.org
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: hp at gcc dot gnu.org @ 2014-09-19 23:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

Hans-Peter Nilsson <hp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-19
                 CC|                            |hp at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
Confirmed the general observation; observed (as far back as) 4.7 era and trunk
at r215401 for cris-elf (-march=v8 or higher needed).


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
  2014-09-19 23:24 ` [Bug rtl-optimization/63259] " hp at gcc dot gnu.org
@ 2014-09-25 18:31 ` olegendo at gcc dot gnu.org
  2014-09-28 10:09 ` thopre01 at gcc dot gnu.org
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-09-25 18:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org,
                   |                            |thopre01 at gcc dot gnu.org

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Thomas just recently did some bswap patterns work, maybe he's got an idea.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
  2014-09-19 23:24 ` [Bug rtl-optimization/63259] " hp at gcc dot gnu.org
  2014-09-25 18:31 ` olegendo at gcc dot gnu.org
@ 2014-09-28 10:09 ` thopre01 at gcc dot gnu.org
  2014-09-30  1:09 ` thopre01 at gcc dot gnu.org
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-09-28 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

thopre01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #3 from thopre01 at gcc dot gnu.org ---
The reason is that the last gimple statement of this bswap is a rotate right
on the result of a bitwise OR. Right now a bswap is only searched for starting
from a bitwise OR statement and recursing until the source is reached. The
analysis is therefore only done on a subset of the statements making up this
bswap.

The fix is easy but needs compilation benchmarking to make sure it doesn't
increase the cost of compiling in a noticeable way. The more statements are
considered as finishing a bswap, the closer we get to an O(n²) algorithm.

* [Bug libstdc++/62056] Long compile times with large tuples
From: piotrdz at gmail dot com @ 2014-09-28 12:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

Piotr Dziwinski <piotrdz at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |piotrdz at gmail dot com

--- Comment #1 from Piotr Dziwinski <piotrdz at gmail dot com> ---
I can confirm this bug report as I also noticed longer compilation times with
GCC's implementation of `std::tuple`, specifically when switching from
`std::tr1::tuple` to `std::tuple` (problem originally reported in Google Test
library:
https://groups.google.com/forum/#!topic/googletestframework/TGrf26S65n0).

Using the same sample code as above, when I replace `std::tuple` to
`std::tr1::tuple`, I get better compilation time (tested with GCC 4.9.1):

std::tuple       0m0.388s
std::tr1::tuple  0m0.134s

This difference is small in such small example, but it has noticeable impact on
larger projects (total compilation time increase counted in minutes).

I would also second the proposal to fix this issue by implementing flat version
of std::tuple. Perhaps the existing std::tr1::tuple implementation can be
re-used here?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (2 preceding siblings ...)
  2014-09-28 10:09 ` thopre01 at gcc dot gnu.org
@ 2014-09-30  1:09 ` thopre01 at gcc dot gnu.org
  2014-10-31 12:09 ` thopre01 at gcc dot gnu.org
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-09-30  1:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #4 from thopre01 at gcc dot gnu.org ---
I detect no noticeable difference when bootstrapping gcc with or without the
patch so I think we're in for a fix. :-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (3 preceding siblings ...)
  2014-09-30  1:09 ` thopre01 at gcc dot gnu.org
@ 2014-10-31 12:09 ` thopre01 at gcc dot gnu.org
  2014-12-14 15:54 ` olegendo at gcc dot gnu.org
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-10-31 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #5 from thopre01 at gcc dot gnu.org ---
Author: thopre01
Date: Fri Oct 31 11:55:07 2014
New Revision: 216971

URL: https://gcc.gnu.org/viewcvs?rev=216971&root=gcc&view=rev
Log:
2014-10-31  Thomas Preud'homme  <thomas.preudhomme@arm.com>

    gcc/
    PR tree-optimization/63259
    * tree-ssa-math-opts.c (bswap_replace): Replace expression by a
    rotation left if it is a 16 bit byte swap.
    (pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR
    and RROTATE_EXPR statements if it is a byte rotation.

    gcc/testsuite/
    PR tree-optimization/63259
    * optimize-bswapsi-1.c (swap32_f): New bswap pass test.
    * optimize-bswaphi-1.c: Drop useless SIType definition and fix typo in
    following comment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
    trunk/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
    trunk/gcc/tree-ssa-math-opts.c


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (4 preceding siblings ...)
  2014-10-31 12:09 ` thopre01 at gcc dot gnu.org
@ 2014-12-14 15:54 ` olegendo at gcc dot gnu.org
  2014-12-14 19:35 ` thopre01 at gcc dot gnu.org
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-14 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
With r218705 on SH (-O2 -m4 -ml) I get the following:

unsigned short test_099 (unsigned short a, unsigned short b)
{
  return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
}

compiles to:
        extu.w    r4,r4
        rts
        swap.b    r4,r0


unsigned short test_08 (unsigned short a, unsigned short b)
{
  return b + (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
}

compiles to:
        extu.w  r4,r4
        mov     r4,r0
        shll8   r4
        shlr8   r0
        or      r0,r4
        add     r4,r5
        rts
        extu.w  r5,r0


Byte swapping of signed short types seems to be not working:

short test_func_111 (short a, short b, short c)
{
  return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
}

        exts.w  r4,r4
        mov     r4,r0
        shlr8   r0
        extu.b  r0,r0
        shll8   r4
        or      r0,r4
        rts
        exts.w  r4,r0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (5 preceding siblings ...)
  2014-12-14 15:54 ` olegendo at gcc dot gnu.org
@ 2014-12-14 19:35 ` thopre01 at gcc dot gnu.org
  2014-12-14 23:53 ` olegendo at gcc dot gnu.org
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-14 19:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #7 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #6)
> With r218705 on SH (-O2 -m4 -ml) I get the following:
> 
> unsigned short test_099 (unsigned short a, unsigned short b)
> {
>   return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> }
> 
> compiles to:
>         extu.w	r4,r4
>         rts
>         swap.b	r4,r0

This one looks ok except for the zero-extension. Is the swap.b instruction
limited to 32 bit values?

> 
> 
> unsigned short test_08 (unsigned short a, unsigned short b)
> {
>   return b + (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> }
> 
> compiles to:
>         extu.w  r4,r4
>         mov     r4,r0
>         shll8   r4
>         shlr8   r0
>         or      r0,r4
>         add     r4,r5
>         rts
>         extu.w  r5,r0

Strange, could you show the output of -fdump-tree-bswap?

> 
> 
> Byte swapping of signed short types seems to be not working:
> 
> short test_func_111 (short a, short b, short c)
> {
>   return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> }
> 
>         exts.w  r4,r4
>         mov     r4,r0
>         shlr8   r0
>         extu.b  r0,r0
>         shll8   r4
>         or      r0,r4
>         rts
>         exts.w  r4,r0

That's expected. Think about what happens if a = 0x8001. Doing a right shift by
8 bit would give 0xFF80 (due to the most significant bit being 1). The right
part of the bitwise OR would give 0x0100 as expected and the result would be
0xFF80, so not a byte swap. It would work with an int though as the highest bit
would then be 0, or with unsigned short as a right shift would introduce 0 in
the most significant bits.


Best regards,

Thomas

* [Bug rtl-optimization/63259] Detecting byteswap sequence
From: schwab@linux-m68k.org @ 2014-12-14 20:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #8 from Andreas Schwab <schwab@linux-m68k.org> ---
(a & 0xFF00) >> 8 with short a = 0x8001 evaluates to 0x80, since all operands
are first promoted to int.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (6 preceding siblings ...)
  2014-12-14 19:35 ` thopre01 at gcc dot gnu.org
@ 2014-12-14 23:53 ` olegendo at gcc dot gnu.org
  2014-12-15 10:13 ` thopre01 at gcc dot gnu.org
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-14 23:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #9 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to thopre01 from comment #7)
> (In reply to Oleg Endo from comment #6)
> > With r218705 on SH (-O2 -m4 -ml) I get the following:
> > 
> > unsigned short test_099 (unsigned short a, unsigned short b)
> > {
> >   return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> > }
> > 
> > compiles to:
> >         extu.w	r4,r4
> >         rts
> >         swap.b	r4,r0
> 
> This one looks ok except for the zero-extension. Is the swap.b instruction
> limited to 32 bit values?

Yes, in sh.md swap.b is defined for SImode values only.  But never mind the
extu.w, it's an SH ABI thing (PR 52441).

> 
> > 
> > 
> > unsigned short test_08 (unsigned short a, unsigned short b)
> > {
> >   return b + (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> > }
> > 
> > compiles to:
> >         extu.w  r4,r4
> >         mov     r4,r0
> >         shll8   r4
> >         shlr8   r0
> >         or      r0,r4
> >         add     r4,r5
> >         rts
> >         extu.w  r5,r0
> 
> Strange, could you show the output of -fdump-tree-bswap?

Not so strange at all.  After looking at the RTL dumps, I've noticed that the
swap.b insn is generated by the combine pass.  I've got a few combine patterns
for matching byte swaps on SH.  The pattern for swap.b doesn't combine well
with other ops around/on it.  -fdump-tree-bswap says:

;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1,
symbol_order=1)

test_08 (short unsigned int a, short unsigned int b)
{
  short unsigned int _2;
  signed short _3;
  int _4;
  int _5;
  signed short _6;
  signed short _7;
  short unsigned int _8;
  short unsigned int _10;

  <bb 2>:
  _2 = a_1(D) >> 8;
  _3 = (signed short) _2;
  _4 = (int) a_1(D);
  _5 = _4 << 8;
  _6 = (signed short) _5;
  _7 = _3 | _6;
  _8 = (short unsigned int) _7;
  _10 = _8 + b_9(D);
  return _10;

}


> > Byte swapping of signed short types seems to be not working:
> > 
> > short test_func_111 (short a, short b, short c)
> > {
> >   return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> > }
> > 
> >         exts.w  r4,r4
> >         mov     r4,r0
> >         shlr8   r0
> >         extu.b  r0,r0
> >         shll8   r4
> >         or      r0,r4
> >         rts
> >         exts.w  r4,r0
> 
> That's expected. Think about what happens if a = 0x8001. Doing a right shift
> by 8 bit would give 0xFF80 (due to the most significant bit being 1). The
> right part of the bitwise OR would give 0x0100 as expected and the result
> would be 0xFF80, so not a byte swap. It would work with an int though as the
> highest bit would then be 0, or with unsigned short as a right shift would
> introduce 0 in the most significant bits.

As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap.

* [Bug ipa/64218] [5 Regression] ICE: Segmentation fault (symtab_node::get_alias_target()) running Boost testsuite
From: hubicka at gcc dot gnu.org @ 2014-12-15  0:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64218

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I guess the problem is that we inline function but do not eliminate its (now
dead) alias.  Probably it is better to do so in inline-transform like this:
Index: ipa-inline-transform.c
===================================================================
--- ipa-inline-transform.c      (revision 218722)
+++ ipa-inline-transform.c      (working copy)
@@ -199,6 +199,12 @@
             until after these clones are materialized.  */
          && !master_clone_with_noninline_clones_p (e->callee))
        {
+         ipa_ref *alias;
+         /* Remove aliases (that must be dead by can_remove_node_now_p)
+            so they do not confuse us later.  */
+         while (e->callee->iterate_direct_aliases (0, alias))
+           alias->referred->remove ();
+
          /* TODO: When callee is in a comdat group, we could remove all of it,
             including all inline clones inlined into it.  That would however
             need small function inlining to register edge removal hook to

will need to find a way to reproduce this though.

Honza


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (7 preceding siblings ...)
  2014-12-14 23:53 ` olegendo at gcc dot gnu.org
@ 2014-12-15 10:13 ` thopre01 at gcc dot gnu.org
  2014-12-17 11:50 ` thopre01 at gcc dot gnu.org
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-15 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #10 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #9)
> (In reply to thopre01 from comment #7)
> > 
> > Strange, could you show the output of -fdump-tree-bswap?
> 
> Not so strange at all. 

What is strange is that the pass should detect such a bswap pattern.

> After looking at the RTL dumps, I've noticed that
> the swap.b insn is generated by the combine pass.  I've got a few combine
> patterns for matching byte swaps on SH.  The pattern for swap.b doesn't
> combine well with other ops around/on it.  -fdump-tree-bswap says:
> 
> ;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1,
> symbol_order=1)
> 
> test_08 (short unsigned int a, short unsigned int b)
> {
>   short unsigned int _2;
>   signed short _3;
>   int _4;
>   int _5;
>   signed short _6;
>   signed short _7;
>   short unsigned int _8;
>   short unsigned int _10;
> 
>   <bb 2>:
>   _2 = a_1(D) >> 8;
>   _3 = (signed short) _2;
>   _4 = (int) a_1(D);
>   _5 = _4 << 8;
>   _6 = (signed short) _5;
>   _7 = _3 | _6;
>   _8 = (short unsigned int) _7;
>   _10 = _8 + b_9(D);
>   return _10;
> 
> }

I have the same gimple and for me the bswap is correctly detected. Can you
break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if
(!source_stmt) and show me the output of p/x n->n ?

> 
> 
> > > Byte swapping of signed short types seems to be not working:
> > > 
> > > short test_func_111 (short a, short b, short c)
> > > {
> > >   return (((a & 0xFF00) >> 8) | ((a & 0xFF) << 8));
> > > }
> > > 
> > >         exts.w  r4,r4
> > >         mov     r4,r0
> > >         shlr8   r0
> > >         extu.b  r0,r0
> > >         shll8   r4
> > >         or      r0,r4
> > >         rts
> > >         exts.w  r4,r0
> > 
> > That's expected. Think about what happens if a = 0x8001. Doing a right shift
> > by 8 bit would give 0xFF80 (due to the most significant bit being 1). The
> > right part of the bitwise OR would give 0x0100 as expected and the result
> > would be 0xFF80, so not a byte swap. It would work with an int though as the
> > highest bit would then be 0, or with unsigned short as a right shift would
> > introduce 0 in the most significant bits.
> 
> As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap.

Indeed, my mistake. Ok I tested a bit and found that the problem is the depth
at which it's looking. Try to recompile tree-ssa-math-opts.c after increasing
the limit number in find_bswap_or_nop. Right now the limit will evaluate to 4
and the gimple I have has a depth of 5.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (8 preceding siblings ...)
  2014-12-15 10:13 ` thopre01 at gcc dot gnu.org
@ 2014-12-17 11:50 ` thopre01 at gcc dot gnu.org
  2014-12-19  1:53 ` olegendo at gcc dot gnu.org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-17 11:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #12 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #11)
> (In reply to thopre01 from comment #10)
> > 
> > I have the same gimple and for me the bswap is correctly detected. Can you
> > break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if
> > (!source_stmt) and show me the output of p/x n->n ?
> 
> n->n = 0x0000000000000102  limit = 4

That's good, it means the pattern is recognized. Is there an optab defined for
bswap16?

> 
> For both, test_099 and test_08.
> 
> > Indeed, my mistake. Ok I tested a bit and found that the problem is the
> > depth at which it's looking. Try to recompile tree-ssa-math-opts.c after
> > increasing the limit number in find_bswap_or_nop. Right now the limit will
> > evaluate to 4 and the gimple I have has a depth of 5.
> 
> I've tried ...
> 
>   limit += 10 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
> 
> ... but it doesn't change anything here.

Same as the other pattern, can you try to print n->n in hex with this new
limit? My guess is that the pattern is now recognized but fails later for the
same reason as above.

Best regards.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (9 preceding siblings ...)
  2014-12-17 11:50 ` thopre01 at gcc dot gnu.org
@ 2014-12-19  1:53 ` olegendo at gcc dot gnu.org
  2014-12-19 10:53 ` thopre01 at gcc dot gnu.org
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-19  1:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #13 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to thopre01 from comment #12)
> 
> That's good, it means the pattern is recognized. Is there an optab defined
> for bswap16?

Nope.  Just this:

(define_insn "rotlhi3_8"
  [(set (match_operand:HI 0 "arith_reg_dest" "=r")
    (rotate:HI (match_operand:HI 1 "arith_reg_operand" "r")
           (const_int 8)))]
  "TARGET_SH1"
  "swap.b    %1,%0"
  [(set_attr "type" "arith")])

If I remember correctly, there was something that would check whether it can
use the rotate pattern instead of a bswaphi2.  After adding the following
pattern:

(define_expand "bswaphi2"
  [(set (match_operand:HI 0 "arith_reg_dest")
    (bswap:HI (match_operand:HI 1 "arith_reg_operand")))]
  "TARGET_SH1"
{
  if (!can_create_pseudo_p ())
    FAIL;
  else
    {
      emit_insn (gen_rotlhi3_8 (operands[0], operands[1]));
      DONE;
    }
})

it looks much better.  The cases above work, except for the signed short.  On
SH the bswap:HI HW insn actually doesn't modify the upper 16 bits of the 32-bit
register.  What it does is:

unsigned int test_0999 (unsigned int a, unsigned int b)
{
  return (a & 0xFFFF0000) | ((a & 0xFF00) >> 8) | ((a & 0xFF) << 8);
}

I was afraid that using a bswap:HI will result in unnecessary code around it:


    mov.l    .L6,r0  ! r0 = 0xFFFF0000
    and    r4,r0
    extu.w    r4,r4
    swap.b    r4,r4
    rts
    or    r4,r0

In my case, combine is looking for the pattern:

Failed to match this instruction:
(set (reg:SI 172 [ D.1356 ])
    (ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ])
                    (const_int 8 [0x8]))
                (const_int 65280 [0xff00]))
            (zero_extract:SI (reg/v:SI 170 [ a ])
                (const_int 8 [0x8])
                (const_int 8 [0x8])))
        (and:SI (reg/v:SI 170 [ a ])
            (const_int -65536 [0xffffffffffff0000]))))

Which should then work.

> 
> Same as the other pattern, can you try to print n->n in hex with this new
> limit? My guess is that the pattern is now recognized but fails later for
> the same reason as above.

With increased limit, the number is/was the same.  The bswap:HI expander was
missing.  Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (10 preceding siblings ...)
  2014-12-19  1:53 ` olegendo at gcc dot gnu.org
@ 2014-12-19 10:53 ` thopre01 at gcc dot gnu.org
  2014-12-19 14:41 ` thopre01 at gcc dot gnu.org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-19 10:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #14 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #13)
> (In reply to thopre01 from comment #12)
> > 
> > That's good, it means the pattern is recognized. Is there an optab defined
> > for bswap16?
> 
> Nope.  Just this:
> 
> (define_insn "rotlhi3_8"
>   [(set (match_operand:HI 0 "arith_reg_dest" "=r")
> 	(rotate:HI (match_operand:HI 1 "arith_reg_operand" "r")
> 		   (const_int 8)))]
>   "TARGET_SH1"
>   "swap.b	%1,%0"
>   [(set_attr "type" "arith")])
> 
> If I remember correctly, there was something that would check whether it can
> use the rotate pattern instead of a bswaphi2.

It's rather the contrary. The bswap pass will replace the statements with an
8-bit rotation if the value is 16 bits wide, and the expander will choose a
bswaphi pattern for that if the backend has one; otherwise it will keep the
rotation.

The problem is that currently the bswap pass still bails out if there is no
16-bit bswap available. I shall fix that.
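
To illustrate the equivalence involved for 16-bit values, here is a minimal
sketch (the function names are made up for this example and are not taken from
GCC): a byte swap written with masks and shifts gives the same result as a
rotation by 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit byte swap written with masks and shifts. */
static uint16_t swap16_masks(uint16_t v)
{
    return (uint16_t)(((v & 0xFF00u) >> 8) | ((v & 0x00FFu) << 8));
}

/* The same operation written as a rotate left by 8 bits. */
static uint16_t swap16_rotate(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Exhaustively check that both forms agree for every 16-bit input. */
static void check_swap16_forms(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(swap16_masks((uint16_t)v) == swap16_rotate((uint16_t)v));
}
```

Calling check_swap16_forms() should pass without tripping an assertion, which
is why either a rotate pattern or a bswaphi pattern can implement the same
source sequence.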

> After adding the following
> pattern:
> 
> (define_expand "bswaphi2"
>   [(set (match_operand:HI 0 "arith_reg_dest")
> 	(bswap:HI (match_operand:HI 1 "arith_reg_operand")))]
>   "TARGET_SH1"
> {
>   if (!can_create_pseudo_p ())
>     FAIL;
>   else
>     {
>       emit_insn (gen_rotlhi3_8 (operands[0], operands[1]));
>       DONE;
>     }
> })
> 
> it looks much better.  The cases above work, except for the signed short. 

You mean with the added bswaphi2 pattern the pattern is still unchanged?

> On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the
> 32 bit register.  What it does is:
> 
> unsigned int test_0999 (unsigned int a, unsigned int b)
> {
>   return (a & 0xFFFF0000) | ((a & 0xFF00) >> 8) | ((a & 0xFF) << 8);
> }
> 
> I was afraid that using a bswap:HI will result in unnecessary code around it:
> 
> 
> 	mov.l	.L6,r0  ! r0 = 0xFFFF0000
> 	and	r4,r0
> 	extu.w	r4,r4
> 	swap.b	r4,r4
> 	rts
> 	or	r4,r0

Looks good to me, what exactly is the problem?

> 
> In my case, combine is looking for the pattern:
> 
> Failed to match this instruction:
> (set (reg:SI 172 [ D.1356 ])
>     (ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ])
>                     (const_int 8 [0x8]))
>                 (const_int 65280 [0xff00]))
>             (zero_extract:SI (reg/v:SI 170 [ a ])
>                 (const_int 8 [0x8])
>                 (const_int 8 [0x8])))
>         (and:SI (reg/v:SI 170 [ a ])
>             (const_int -65536 [0xffffffffffff0000]))))
> 
> Which should then work.

If you have a bswap instruction it seems better to define a pattern for it,
which the expander will then use. Detecting a bswap is the job of the bswap
pass; it shouldn't be done by combine.

> 
> > 
> > Same as the other pattern, can you try to print n->n in hex with this new
> > limit? My guess is that the pattern is now recognized but fails later for
> > the same reason as above.
> 
> With increased limit, the number is/was the same.  The bswap:HI expander was
> missing.  Thanks!

Ok, so the limit ought to be increased and the check for bswaphi removed. I'll
take a stab at that, but it will probably only happen after Christmas.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (11 preceding siblings ...)
  2014-12-19 10:53 ` thopre01 at gcc dot gnu.org
@ 2014-12-19 14:41 ` thopre01 at gcc dot gnu.org
  2014-12-19 15:05 ` olegendo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-19 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #16 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #15)
> (In reply to thopre01 from comment #14)
> 
> > You mean with the added bswaphi2 pattern the pattern is still unchanged?
> > 
> 
> After adding bswaphi2, the bswap pass does the transformation.  Except for
> the non-working 'signed short' mentioned above.  But we already figured that
> out earlier.

Did we? All I can find is you and Andreas mentioning that it should work
because it will be sign extended to int when doing the bitwise AND with 0xFF00.

What did I miss?

> 
> The expected sequence for the function above is:
> 
>         rts
>         swap.b  r4,r0
> 
> i.e. no anding and oring of lower/higher 16 bit word, since the swap.b insn
> operates on a SImode value and does not alter the high 16 bits.

Oh yeah right.

> 
> > 
> > If you have a bswap instruction it seems better to define a pattern for that
> > which the expander will use. That's the job of the bswap pass to detect a
> > bswap, it shouldn't be done by combine.
> 
> The combine parts I was talking about are to eliminate the anding and oring
> of higher 16 bits when a 16 bit byte swap is done on a 32 bit value.

I'm surprised that it's not the semantics of a bswaphi.

Best regards,

Thomas


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (12 preceding siblings ...)
  2014-12-19 14:41 ` thopre01 at gcc dot gnu.org
@ 2014-12-19 15:05 ` olegendo at gcc dot gnu.org
  2014-12-21 23:18 ` olegendo at gcc dot gnu.org
  2014-12-22 10:07 ` thopre01 at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-19 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to thopre01 from comment #16)
> 
> Did we? All I can find is you and Andreas mentioning that it should work
> because it will be sign extended to int when doing the bitwise AND with
> 0xFF00.
> 
> What did I miss?

Ah sorry ... me bad.  I haven't tried increasing the 'level' with the bswaphi
expander pattern in place.  Will do that later.

> > > If you have a bswap instruction it seems better to define a pattern for that
> > > which the expander will use. That's the job of the bswap pass to detect a
> > > bswap, it shouldn't be done by combine.
> > 
> > The combine parts I was talking about are to eliminate the anding and oring
> > of higher 16 bits when a 16 bit byte swap is done on a 32 bit value.
> 
> I'm surprised that it's not the semantics of a bswaphi.

As far as I know...
bswaphi's input/output modes are HImode and it operates on the whole reg, not
on a subreg.  If applied to a SImode subreg the other bits need to be
saved/restored, which is what happens with the anding/oring.
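
A rough C model of that difference, as a sketch only (bswap_hi, bswap_low_half
and swap_b_model are names invented here; swap_b_model just restates test_0999
from earlier in the thread):

```c
#include <assert.h>
#include <stdint.h>

/* HImode view: only 16 bits exist, so nothing is said about bits 16..31. */
static uint16_t bswap_hi(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Applying the 16-bit swap to the low half of a 32-bit value:
   the upper half is saved with an AND and merged back with an OR. */
static uint32_t bswap_low_half(uint32_t a)
{
    uint32_t upper = a & 0xFFFF0000u;
    uint32_t lower = bswap_hi((uint16_t)(a & 0xFFFFu));
    return upper | lower;
}

/* Model of what the SH swap.b insn does on the full 32-bit register. */
static uint32_t swap_b_model(uint32_t a)
{
    return (a & 0xFFFF0000u) | ((a & 0xFF00u) >> 8) | ((a & 0xFFu) << 8);
}

/* Spot-check that the two formulations agree on a few inputs. */
static void check_low_half_swap(void)
{
    const uint32_t samples[] = { 0u, 0x12345678u, 0xFFFF00FFu, 0xDEADBEEFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bswap_low_half(samples[i]) == swap_b_model(samples[i]));
}
```

Since both functions compute the same value, the and/or around the swap is
redundant on SH whenever the full 32-bit result is wanted.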


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (13 preceding siblings ...)
  2014-12-19 15:05 ` olegendo at gcc dot gnu.org
@ 2014-12-21 23:18 ` olegendo at gcc dot gnu.org
  2014-12-22 10:07 ` thopre01 at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-21 23:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #18 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #17)
> (In reply to thopre01 from comment #16)
> > 
> > Did we? All I can find is you and Andreas mentioning that it should work
> > because it will be sign extended to int when doing the bitwise AND with
> > 0xFF00.
> > 
> > What did I miss?
> 
> Ah sorry ... me bad.  I haven't tried increasing the 'level' with the
> bswaphi expander pattern in place.  Will do that later.

So I did.  Increasing the limit to

  limit += 2 + (int) ceil_log2 ...

works for me for the signed short case.

However, the following function from the CSiBE set:

void
_nrrdSwap32Endian(void *_data, size_t N) {
  int *data, w, fix;
  size_t I;

  if (_data) {
    data = (int *)_data;
    for (I=0; I<N; I++) {
      w = data[I];
      fix =  (w & 0x000000FF);
      fix = ((w & 0x0000FF00) >> 0x08) | (fix << 0x08);
      fix = ((w & 0x00FF0000) >> 0x10) | (fix << 0x08);
      fix = ((w & 0xFF000000) >> 0x18) | (fix << 0x08);
      data[I] = fix;
    }
  }
}

seems to require 'limit += 3 + (int) ...' for the bswap insn to be detected.
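
For reference, the loop body can be isolated as below.  This is only a sketch:
it uses uint32_t instead of int to keep the shifts well defined, and the names
swap32_chain and swap32_ref are mine, not from CSiBE.  Each statement depends
on the previous value of fix, so the chain of dependent statements grows with
the number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* The statement chain from _nrrdSwap32Endian, applied to one value. */
static uint32_t swap32_chain(uint32_t w)
{
    uint32_t fix;
    fix =  (w & 0x000000FFu);
    fix = ((w & 0x0000FF00u) >> 0x08) | (fix << 0x08);
    fix = ((w & 0x00FF0000u) >> 0x10) | (fix << 0x08);
    fix = ((w & 0xFF000000u) >> 0x18) | (fix << 0x08);
    return fix;
}

/* Flat reference byte swap, for comparison. */
static uint32_t swap32_ref(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Spot-check that the chain really is a 32-bit byte swap. */
static void check_swap32_chain(void)
{
    const uint32_t samples[] = { 0u, 0x12345678u, 0xDEADBEEFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(swap32_chain(samples[i]) == swap32_ref(samples[i]));
}
```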

Then, this fine function (from the same set)

void
_nrrdSwap64Endian(void *_data, size_t N) {
  airLLong *data, l, fix;
  size_t I;

  if (_data) {
    data = (airLLong *)_data;
    for (I=0; I<N; I++) {
      l = data[I];
      fix =  (l & 0x00000000000000FF);
      fix = ((l & 0x000000000000FF00) >> 0x08) | (fix << 0x08);
      fix = ((l & 0x0000000000FF0000) >> 0x10) | (fix << 0x08);
      fix = ((l & 0x00000000FF000000) >> 0x18) | (fix << 0x08);
#if defined(_WIN32)
      fix = ((l & 0x000000FF00000000i64) >> 0x20) | (fix << 0x08);
      fix = ((l & 0x0000FF0000000000i64) >> 0x28) | (fix << 0x08);
      fix = ((l & 0x00FF000000000000i64) >> 0x30) | (fix << 0x08);
      fix = ((l & 0xFF00000000000000i64) >> 0x38) | (fix << 0x08);
#else
      fix = ((l & 0x000000FF00000000LL) >> 0x20) | (fix << 0x08);
      fix = ((l & 0x0000FF0000000000LL) >> 0x28) | (fix << 0x08);
      fix = ((l & 0x00FF000000000000LL) >> 0x30) | (fix << 0x08);
      fix = ((l & 0xFF00000000000000LL) >> 0x38) | (fix << 0x08);
#endif
      data[I] = fix;
    }
  }
}

requires 'limit += 6 + (int) ...'

(luckily, there's no _nrrdSwap128Endian in the set)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug rtl-optimization/63259] Detecting byteswap sequence
  2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
                   ` (14 preceding siblings ...)
  2014-12-21 23:18 ` olegendo at gcc dot gnu.org
@ 2014-12-22 10:07 ` thopre01 at gcc dot gnu.org
  15 siblings, 0 replies; 17+ messages in thread
From: thopre01 at gcc dot gnu.org @ 2014-12-22 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #19 from thopre01 at gcc dot gnu.org ---
Yeah, when doing something like (((((x[0] << 8) | x[1]) << 8) | x[2]) << 8) |
x[3] the expression depth is already proportional to the size of the value
being byte swapped, with an extra coefficient due to casting. But I need to
evaluate the impact of increasing the limit on compilation time. If the impact
is noticeable, it might be necessary to do the refactoring suggested by
Richard Biener in [1] first.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00616.html
"From the performance side the pass could be re-structured to populate
a lattice, thus work from def to use instead of the other way around.  Which
means we visit each stmt exactly once, compute its value symbolically
and check it against a rotate."
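
The chained form mentioned at the top of this comment can be written out as
follows.  This is a sketch with invented names (load_be32, load_be) and only
illustrates why the nesting grows linearly with the byte count; it does not
claim to show how the pass counts its limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Left-nested chain for 4 bytes: one shift/or pair per additional byte. */
static uint32_t load_be32(const unsigned char *x)
{
    return ((((((uint32_t)x[0] << 8) | x[1]) << 8) | x[2]) << 8) | x[3];
}

/* The same chain for n <= 8 bytes; unrolled, it nests n - 1 levels deep. */
static uint64_t load_be(const unsigned char *x, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | x[i];
    return v;
}

/* The 4-byte chain and the generic loop must agree. */
static void check_load_be(void)
{
    const unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };
    assert(load_be32(bytes) == load_be(bytes, 4));
}
```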


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-12-22 10:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-13 17:41 [Bug rtl-optimization/63259] New: Detecting byteswap sequence bisqwit at iki dot fi
2014-09-19 23:24 ` [Bug rtl-optimization/63259] " hp at gcc dot gnu.org
2014-09-25 18:31 ` olegendo at gcc dot gnu.org
2014-09-28 10:09 ` thopre01 at gcc dot gnu.org
2014-09-30  1:09 ` thopre01 at gcc dot gnu.org
2014-10-31 12:09 ` thopre01 at gcc dot gnu.org
2014-12-14 15:54 ` olegendo at gcc dot gnu.org
2014-12-14 19:35 ` thopre01 at gcc dot gnu.org
2014-12-14 23:53 ` olegendo at gcc dot gnu.org
2014-12-15 10:13 ` thopre01 at gcc dot gnu.org
2014-12-17 11:50 ` thopre01 at gcc dot gnu.org
2014-12-19  1:53 ` olegendo at gcc dot gnu.org
2014-12-19 10:53 ` thopre01 at gcc dot gnu.org
2014-12-19 14:41 ` thopre01 at gcc dot gnu.org
2014-12-19 15:05 ` olegendo at gcc dot gnu.org
2014-12-21 23:18 ` olegendo at gcc dot gnu.org
2014-12-22 10:07 ` thopre01 at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).