public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized
@ 2004-10-08  0:04 ak at muc dot de
  2004-10-08  0:05 ` [Bug middle-end/17886] " ak at muc dot de
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: ak at muc dot de @ 2004-10-08  0:04 UTC (permalink / raw)
  To: gcc-bugs

gcc can detect the (x << y)|(x >> (bitwidth-y)) idiom for rotate and convert
it into the machine rotate instruction. But it only works when y is a constant
and is not long long.

Enhancement request is to handle it for long long too (on 32bit) and
to handle variable shifts.

The attached test case should use rol in f1-f4

-- 
           Summary: variable rotate and long long rotate should be better
                    optimized
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ak at muc dot de
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
@ 2004-10-08  0:05 ` ak at muc dot de
  2004-10-08  0:07 ` steven at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: ak at muc dot de @ 2004-10-08  0:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From ak at muc dot de  2004-10-08 00:05 -------
Created an attachment (id=7307)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7307&action=view)
test case showing the various cases


Only f4 is currently optimized into rol.

f1-f3 should be too. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
  2004-10-08  0:05 ` [Bug middle-end/17886] " ak at muc dot de
@ 2004-10-08  0:07 ` steven at gcc dot gnu dot org
  2004-10-08  0:16 ` pinskia at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-10-08  0:07 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
  2004-10-08  0:05 ` [Bug middle-end/17886] " ak at muc dot de
  2004-10-08  0:07 ` steven at gcc dot gnu dot org
@ 2004-10-08  0:16 ` pinskia at gcc dot gnu dot org
  2005-09-28  0:15 ` mmitchel at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-08  0:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-08 00:16 -------
Really this should have been split up into two different bugs as the last two expamples are done on tree 
level wich means that is middle-end/target problem, the first two examples are needed to be done on 
the tree level before fixing it on the middle-end/target problem.

Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-08 00:16:56
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (2 preceding siblings ...)
  2004-10-08  0:16 ` pinskia at gcc dot gnu dot org
@ 2005-09-28  0:15 ` mmitchel at gcc dot gnu dot org
  2005-09-28  0:18 ` mmitchel at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:15 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:15 -------
Shouldn't f2 use (32 - y) instead of (64 - y), since unsigned is a 32-bit type?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (3 preceding siblings ...)
  2005-09-28  0:15 ` mmitchel at gcc dot gnu dot org
@ 2005-09-28  0:18 ` mmitchel at gcc dot gnu dot org
  2005-09-28  0:57 ` mmitchel at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:18 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:18 -------
With the change suggested in Comment #4, we do indeed get roll for f2 and rorl
for f4.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (4 preceding siblings ...)
  2005-09-28  0:18 ` mmitchel at gcc dot gnu dot org
@ 2005-09-28  0:57 ` mmitchel at gcc dot gnu dot org
  2005-09-28  1:53 ` mmitchel at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:57 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:57 -------
I don't understand how, on IA32, we can use rol; rcl to perform the rotation in
f3.  Would you please add the complete code sequence you have in mind?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (5 preceding siblings ...)
  2005-09-28  0:57 ` mmitchel at gcc dot gnu dot org
@ 2005-09-28  1:53 ` mmitchel at gcc dot gnu dot org
  2005-09-28  2:11 ` mmitchel at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  1:53 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 01:53 -------
I think the optimal sequence for f3 would look something like this, assuming
that EAX contains the low-order word and EDX contains the high-order word after
the prologue:

movl %edx, %ebx
shrl $23, %ebx
sall $9, %edx
movl %eax, %ecx
shrl $23, %ecx
sall $9, %eax
orl %ebx, %eax
orl %ecx, %edx


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (6 preceding siblings ...)
  2005-09-28  1:53 ` mmitchel at gcc dot gnu dot org
@ 2005-09-28  2:11 ` mmitchel at gcc dot gnu dot org
  2005-09-28 16:24 ` mmitchel at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  2:11 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 02:11 -------
Actuall, I think this is better:

mov %edx, %ebx
shld $9, %eax, %edx
shld %9, %ebx, %eax

Right?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (7 preceding siblings ...)
  2005-09-28  2:11 ` mmitchel at gcc dot gnu dot org
@ 2005-09-28 16:24 ` mmitchel at gcc dot gnu dot org
  2005-09-29  3:32 ` cvs-commit at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-28 16:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 16:24 -------
I am working on a patch to improve the rotation of "long long" by a constant.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |mark at codesourcery dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (8 preceding siblings ...)
  2005-09-28 16:24 ` mmitchel at gcc dot gnu dot org
@ 2005-09-29  3:32 ` cvs-commit at gcc dot gnu dot org
  2005-09-29  3:52 ` mmitchel at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2005-09-29  3:32 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-09-29 03:31 -------
Subject: Bug 17886

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	mmitchel@gcc.gnu.org	2005-09-29 03:31:27

Modified files:
	gcc            : ChangeLog expmed.c optabs.c 
	gcc/config/i386: i386.md 

Log message:
	PR 17886
	* expmed.c (expand_shift): Move logic to reverse rotation
	direction when 	rotating by constants ...
	* optabs.c (expand_binop): ... here.
	* config/i386/i386.md (rotrdi3): Handle 32-bit mode.
	(ix86_rotrdi3): New pattern.
	(rotldi3): Handle 32-bit mode.
	(ix86_rotldi3): New pattern.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10044&r2=2.10045
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expmed.c.diff?cvsroot=gcc&r1=1.236&r2=1.237
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&r1=1.294&r2=1.295
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.656&r2=1.657



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (9 preceding siblings ...)
  2005-09-29  3:32 ` cvs-commit at gcc dot gnu dot org
@ 2005-09-29  3:52 ` mmitchel at gcc dot gnu dot org
  2005-09-29  4:36 ` dank at kegel dot com
  2005-09-29  5:05 ` mmitchel at gcc dot gnu dot org
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-29  3:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-29 03:52 -------
Here is the current status for the four functions in Andi's testcase, with "f2"
changed to use "32 - y" so that it is a proper rotation:

* f still generates a complex code sequence, but I'm not sure how much better we
can do.  Our code sequence doesn't look a lot worse than the sequence generated
by icc 9.0, at first glance.  We could try something like:  

  if %ecx > 31:
    mov %eax, %ebx
    shldl $31, %edx, %eax
    shldl $31, %ebx, %edx
    %ecx -= 31
  if %ecx > 31:
    mov %eax, %ebx
    shldl $31, %edx, %eax
    shldl $31, %ebx, %edx
    %ecx -= 31
  if %ecx != 0:
    mov %eax, %ebx
    shldl %cl, %edx, %eax
    shldl %cl, %ebx, %edx

but, that doesn't seem clearly better than what we presently generate.

* f2 uses the roll instruction, which appears optimal.

* f3 uses two shdl instructions, which appears optimal.

* f4 uses the rorl instruction, which appears optimal.

For all of f2 and f3, it looks like we generate code better than you get with
icc 9.0.

I have no plans to work on this further, for the time being, but I'll not close
out the PRt; someone else might want to try to attack the code generated for the
variable rotation case.   Or, if people are satisfied, we can close the PR.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mark at codesourcery dot com|unassigned at gcc dot gnu
                   |                            |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (10 preceding siblings ...)
  2005-09-29  3:52 ` mmitchel at gcc dot gnu dot org
@ 2005-09-29  4:36 ` dank at kegel dot com
  2005-09-29  5:05 ` mmitchel at gcc dot gnu dot org
  12 siblings, 0 replies; 25+ messages in thread
From: dank at kegel dot com @ 2005-09-29  4:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-09-29 04:36 -------
Thanks - I'll try to get this benchmarked on a semi-real app.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
  2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
                   ` (11 preceding siblings ...)
  2005-09-29  4:36 ` dank at kegel dot com
@ 2005-09-29  5:05 ` mmitchel at gcc dot gnu dot org
  12 siblings, 0 replies; 25+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-09-29  5:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-29 05:05 -------
Here's the best I can think of for the first case, assuming that %ecx contains
the rotate-left count, %eax contains the low order word, and %ebx contains the
high-order word.

  mov %eax, %ebx
  cmp %ecx, $32
  ja l1
  je l2
  shldl %cl, %edx, %eax
  shldl %cl, %ebx, %edx
  jmp l3
l1:
  negl %ecx
  shrdl %cl, %edx, %eax
  shrdl %cl, %ebx, %edx
  jmp l3
l2:
  mov %edx, %eax
  mov %ebx, %edx
l3:

I have no current plans to try to teach GCC to generate that, though.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2005-11-10 22:44 ` tkho at ucla dot edu
@ 2005-11-11  1:26 ` tkho at ucla dot edu
  10 siblings, 0 replies; 25+ messages in thread
From: tkho at ucla dot edu @ 2005-11-11  1:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from tkho at ucla dot edu  2005-11-11 01:26 -------
For comparison, here's the code from gcc 2.95.3. It generates the same 18
instructions for both march=i386 and march=pentiumpro.
`gcc -c test3.c -save-temps -O2 -momit-leaf-frame-pointer -march=pentiumpro`:
        pushl %ebx
        movl 8(%esp),%ecx
        movl 12(%esp),%ebx
        movl 16(%esp),%eax
        movl 20(%esp),%edx
        shldl $8,%ecx,%ebx
        sall $8,%ecx
        movl %edx,%eax
        xorl %edx,%edx
        shrl $16,%eax
        andl $255,%eax
        andl $0,%edx
        orl %eax,%ecx
        orl %edx,%ebx
        movl %ecx,%eax
        movl %ebx,%edx
        popl %ebx
        ret

Also, in comment #23, I erronously used g++. Luckily, the same code was
generated with gcc.

On another note, Mark, I tried your patch in comment #10. I grabbed gcc-head
from 2005-09-28 and compared a clean build with a build that had your patches
applied. There was no difference in the assembly for the test case in comment
#23, and there was no performance gain in our benchmark application.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2005-10-25 21:38 ` cvs-commit at gcc dot gnu dot org
@ 2005-11-10 22:44 ` tkho at ucla dot edu
  2005-11-11  1:26 ` tkho at ucla dot edu
  10 siblings, 0 replies; 25+ messages in thread
From: tkho at ucla dot edu @ 2005-11-10 22:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from tkho at ucla dot edu  2005-11-10 22:43 -------
Created an attachment (id=10211)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10211&action=view)
Another long long rotate test case

Hi Michael,

I tried your patch in comment #16, and it didn't optimize our code. Attached is
another minimal testcase. I hope it helps and you can do something with it.

Here is a rough instruction-count comparison for f() compiled at -O2,
march=pentiumpro with icc9 and gcc head 20051108 with your patch applied:

icc: 11
gcc: 23

`icc -O2 -march=pentiumpro -S test3.c` gives:
        movl      4(%esp), %eax
        movl      8(%esp), %ecx
        movl      %eax, %edx
        shrl      $24, %edx
        shll      $8, %eax
        shll      $8, %ecx
        orl       %ecx, %edx
        movzwl    18(%esp), %ecx
        movzbl    %cl, %ecx
        orl       %ecx, %eax
        ret

`g++ -c test3.c -save-temps -O2 -march=pentiumpro -momit-leaf-frame-pointer`
gives:
        subl    $12, %esp
        movl    %edi, 8(%esp)
        movl    28(%esp), %edi
        movl    16(%esp), %eax
        movl    20(%esp), %edx
        movl    %esi, 4(%esp)
        movl    24(%esp), %esi
        movl    %edi, %esi
        xorl    %edi, %edi
        movl    8(%esp), %edi
        movl    %ebx, (%esp)
        shrl    $16, %esi
        xorl    %ebx, %ebx
        shldl   $8, %eax, %edx
        movl    %esi, %ecx
        movl    4(%esp), %esi
        orl     %ebx, %edx
        movl    (%esp), %ebx
        andl    $255, %ecx
        sall    $8, %eax
        addl    $12, %esp
        orl     %ecx, %eax
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2005-10-04 20:46 ` michael dot meissner at amd dot com
@ 2005-10-25 21:38 ` cvs-commit at gcc dot gnu dot org
  2005-11-10 22:44 ` tkho at ucla dot edu
  2005-11-11  1:26 ` tkho at ucla dot edu
  10 siblings, 0 replies; 25+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2005-10-25 21:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from cvs-commit at gcc dot gnu dot org  2005-10-25 21:38 -------
Subject: Bug 17886

CVSROOT:        /cvs/gcc
Module name:    gcc
Branch:         apple-local-200502-branch
Changes by:     echristo@gcc.gnu.org    2005-10-25 21:38:27

Modified files:
        gcc            : ChangeLog.apple-ppc expmed.c optabs.c 
        gcc/config/i386: i386.md 

Log message:
        2005-10-25  Eric Christopher  <echristo@apple.com>

        Import from mainline:
        2005-09-28  Mark Mitchell  <mark@codesourcery.com>

        PR 17886
        * expmed.c (expand_shift): Move logic to reverse rotation
        direction when rotating by constants ...
        * optabs.c (expand_binop):  ... here.
        * config/i386/i386.md (rotrdi3): Handle 32-bit mode.
        (ix86_rotrdi3): New pattern.
        (rotldi3): Handle 32-bit mode.
        (ix86_rotldi3): New pattern.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.apple-ppc.diff?cvsroot=gcc&only_with_tag=apple-local-200502-branch&r1=1.1.4.200&r2=1.1.4.201
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expmed.c.diff?cvsroot=gcc&only_with_tag=apple-local-200502-branch&r1=1.223.2.2&r2=1.223.2.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&only_with_tag=apple-local-200502-branch&r1=1.260&r2=1.260.2.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&only_with_tag=apple-local-200502-branch&r1=1.618.2.10&r2=1.618.2.11


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2005-10-04 20:39 ` ak at muc dot de
@ 2005-10-04 20:46 ` michael dot meissner at amd dot com
  2005-10-25 21:38 ` cvs-commit at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 20:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from michael dot meissner at amd dot com  2005-10-04 20:46 -------
Subject: RE:  variable rotate and long long rotate
 should be better optimized

Sorry, I got mixed up as to who the original poster was.

SSE2 is harder to use because it deals with 128 bit items instead of 64
bit (unless you are in 64-bit and working on TImode values).
Ultimately, it is a matter whether it is important enough for somebody
to spend a week or two of work to use the multimedia instructions for
this case.  I suspect in most cases, it might be better to isolate the
code and use #ifdef's and builtin functions/asm's.

-----Original Message-----
From: ak at muc dot de [mailto:gcc-bugzilla@gcc.gnu.org] 
Sent: Tuesday, October 04, 2005 4:40 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



------- Comment #20 from ak at muc dot de  2005-10-04 20:39 -------
Newer linux does that of course, although not always in older releases.

But even in user space it's not a good idea to use SSE2 unless you
really need
it because it increases the cost of the context switch and costs an
exception
each time first in a timeslice.

P.S.: I was the original poster, but the application wasn't a kernel but
I
doubt
it's a good idea to use SSE2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2005-10-04 20:35 ` michael dot meissner at amd dot com
@ 2005-10-04 20:39 ` ak at muc dot de
  2005-10-04 20:46 ` michael dot meissner at amd dot com
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: ak at muc dot de @ 2005-10-04 20:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from ak at muc dot de  2005-10-04 20:39 -------
Newer linux does that of course, although not always in older releases.

But even in user space it's not a good idea to use SSE2 unless you really need
it because it increases the cost of the context switch and costs an exception
each time first in a timeslice.

P.S.: I was the original poster, but the application wasn't a kernel but I
doubt
it's a good idea to use SSE2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2005-10-04 20:32 ` michael dot meissner at amd dot com
@ 2005-10-04 20:35 ` michael dot meissner at amd dot com
  2005-10-04 20:39 ` ak at muc dot de
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 20:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from michael dot meissner at amd dot com  2005-10-04 20:35 -------
Subject: RE:  variable rotate and long long rotate
 should be better optimized

I almost forgot, kernels should be using -mno-mmx and -mno-sse as a
matter of course (or -msoft-float).  I first ran into this problem in
1990 when I was supporting the MIPS platform, and the kernel guys were
surprised that the compiler would use the double precision registers to
do block copies, since it could double the bandwidth of doing 32-bit
moves.

-----Original Message-----
From: ak at muc dot de [mailto:gcc-bugzilla@gcc.gnu.org] 
Sent: Tuesday, October 04, 2005 4:20 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



------- Comment #17 from ak at muc dot de  2005-10-04 20:20 -------
The code now looks fine to me thanks

I would prefer if it didn't generate SSE2/MMX code because that would be
a
problem for kernels. Also in many x86 implementations moving things
between
normal integer registers and SIMD registers is quite slow and would
likely eat
all advantages


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2005-10-04 20:20 ` ak at muc dot de
@ 2005-10-04 20:32 ` michael dot meissner at amd dot com
  2005-10-04 20:35 ` michael dot meissner at amd dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 20:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from michael dot meissner at amd dot com  2005-10-04 20:32 -------
Subject: RE:  variable rotate and long long rotate
 should be better optimized

Yep, all valid points.  So I don't think it should be done by default.
But I suspect the original poster's application may be well behaved to
be able to use it.  Certainly if the only reason for doing long long is
to do heavy duty bit banging (shift/rotate/and/or/test), but no
arithmetic it would speed up since it could do one instruction instead
of multiple, and it would lesson the register pressure that long longs
put on the x86.

-----Original Message-----
From: ak at muc dot de [mailto:gcc-bugzilla@gcc.gnu.org] 
Sent: Tuesday, October 04, 2005 4:20 PM
To: Meissner, Michael
Subject: [Bug middle-end/17886] variable rotate and long long rotate
should be better optimized



------- Comment #17 from ak at muc dot de  2005-10-04 20:20 -------
The code now looks fine to me thanks

I would prefer if it didn't generate SSE2/MMX code because that would be
a
problem for kernels. Also in many x86 implementations moving things
between
normal integer registers and SIMD registers is quite slow and would
likely eat
all advantages


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2005-10-04 20:06 ` michael dot meissner at amd dot com
@ 2005-10-04 20:20 ` ak at muc dot de
  2005-10-04 20:32 ` michael dot meissner at amd dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: ak at muc dot de @ 2005-10-04 20:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from ak at muc dot de  2005-10-04 20:20 -------
The code now looks fine to me thanks

I would prefer if it didn't generate SSE2/MMX code because that would be a
problem for kernels. Also in many x86 implementations moving things between
normal integer registers and SIMD registers is quite slow and would likely eat
all advantages


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
  2005-10-04 18:59 ` michael dot meissner at amd dot com
  2005-10-04 19:51 ` michael dot meissner at amd dot com
@ 2005-10-04 20:06 ` michael dot meissner at amd dot com
  2005-10-04 20:20 ` ak at muc dot de
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 20:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from michael dot meissner at amd dot com  2005-10-04 20:06 -------
Created an attachment (id=9880)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9880&action=view)
Respin of 17886 patch to match new tree contents

This patch is meant to apply on top of Mark's changes, but provides the same
code as my previous patch.


-- 

michael dot meissner at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #9876 is|0                           |1
           obsolete|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
  2005-10-04 18:59 ` michael dot meissner at amd dot com
@ 2005-10-04 19:51 ` michael dot meissner at amd dot com
  2005-10-04 20:06 ` michael dot meissner at amd dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 19:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from michael dot meissner at amd dot com  2005-10-04 19:51 -------
Note, Mark's patch as applied to the tree has a minor typo in it.  The rotrdi3
define_expand uses (rotate:DI ...) instead of (rotatert:DI ...).  It doesn't
matter in practice, since the generator function is never called, but it is
useful to have the right insns listed.


-- 

michael dot meissner at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael dot meissner at amd
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
       [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
@ 2005-10-04 18:59 ` michael dot meissner at amd dot com
  2005-10-04 19:51 ` michael dot meissner at amd dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: michael dot meissner at amd dot com @ 2005-10-04 18:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from michael dot meissner at amd dot com  2005-10-04 18:59 -------
Created an attachment (id=9876)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9876&action=view)
Patch for x86 double word shifts

This patch fixes the bug from the x86 side of things instead of from the
machine independent, by adding direct expanders for the best code (for doing 64
bit rotates in 32-bit mode and 128 bit rotates in 64-bit mode).  On a machine
with conditional move (all recent processors), the code becomes:

        movl    %edx, %ebx
        shldl   %eax, %edx
        shldl   %ebx, %eax
        movl    %edx, %ebx
        andl    $32, %ecx
        cmovne  %eax, %edx
        cmovne  %ebx, %eax

However, I suspect using MMX or SSE2 instructions will provide even more of a
speedup, since there are direct 64-bit shifts, and, or, load/store directly
(but no direct rotate).  In the MMX space you have to be careful not to have
active floating point going on, and to switch out of MMX mode before doing
calls or returns. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2005-11-11  1:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-08  0:04 [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized ak at muc dot de
2004-10-08  0:05 ` [Bug middle-end/17886] " ak at muc dot de
2004-10-08  0:07 ` steven at gcc dot gnu dot org
2004-10-08  0:16 ` pinskia at gcc dot gnu dot org
2005-09-28  0:15 ` mmitchel at gcc dot gnu dot org
2005-09-28  0:18 ` mmitchel at gcc dot gnu dot org
2005-09-28  0:57 ` mmitchel at gcc dot gnu dot org
2005-09-28  1:53 ` mmitchel at gcc dot gnu dot org
2005-09-28  2:11 ` mmitchel at gcc dot gnu dot org
2005-09-28 16:24 ` mmitchel at gcc dot gnu dot org
2005-09-29  3:32 ` cvs-commit at gcc dot gnu dot org
2005-09-29  3:52 ` mmitchel at gcc dot gnu dot org
2005-09-29  4:36 ` dank at kegel dot com
2005-09-29  5:05 ` mmitchel at gcc dot gnu dot org
     [not found] <bug-17886-7834@http.gcc.gnu.org/bugzilla/>
2005-10-04 18:59 ` michael dot meissner at amd dot com
2005-10-04 19:51 ` michael dot meissner at amd dot com
2005-10-04 20:06 ` michael dot meissner at amd dot com
2005-10-04 20:20 ` ak at muc dot de
2005-10-04 20:32 ` michael dot meissner at amd dot com
2005-10-04 20:35 ` michael dot meissner at amd dot com
2005-10-04 20:39 ` ak at muc dot de
2005-10-04 20:46 ` michael dot meissner at amd dot com
2005-10-25 21:38 ` cvs-commit at gcc dot gnu dot org
2005-11-10 22:44 ` tkho at ucla dot edu
2005-11-11  1:26 ` tkho at ucla dot edu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).