public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/17886] New: variable rotate and long long rotate should be better optimized
From: ak at muc dot de @ 2004-10-08  0:04 UTC (permalink / raw)
  To: gcc-bugs

gcc can detect the (x << y)|(x >> (bitwidth-y)) idiom for rotate and convert
it into the machine rotate instruction. But this only works when y is a
constant and x is not long long.

The enhancement request is to handle long long as well (on 32-bit targets)
and to handle variable shift counts.

The attached test case should use rol in f1-f4
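
For readers without the attachment, the idiom in question can be sketched as follows. This is a hypothetical illustration, not the attached test case: the function names and the constant count 9 are assumptions chosen for the example.

```c
#include <stdint.h>

/* Hypothetical illustrations of the rotate idiom; these are NOT the
   functions from the attached test case. */

/* 32-bit rotate left by a constant count. */
uint32_t rotl32_const(uint32_t x)
{
    return (x << 9) | (x >> (32 - 9));
}

/* 32-bit rotate left by a variable count. The expression is only
   defined for 0 < y < 32, because shifting a 32-bit value by 32 is
   undefined in C. */
uint32_t rotl32_var(uint32_t x, unsigned y)
{
    return (x << y) | (x >> (32 - y));
}

/* 64-bit rotate left by a constant count. On a 32-bit target this has
   to be built from operations on the two 32-bit halves. */
uint64_t rotl64_const(uint64_t x)
{
    return (x << 9) | (x >> (64 - 9));
}

/* 64-bit rotate left by a variable count, defined for 0 < y < 64. */
uint64_t rotl64_var(uint64_t x, unsigned y)
{
    return (x << y) | (x >> (64 - y));
}
```

Whether a given compiler version turns each of these into a single rotate instruction depends on the target and the optimization level.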

-- 
           Summary: variable rotate and long long rotate should be better
                    optimized
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ak at muc dot de
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: ak at muc dot de @ 2004-10-08  0:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From ak at muc dot de  2004-10-08 00:05 -------
Created an attachment (id=7307)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7307&action=view)
test case showing the various cases


Only f4 is currently optimized into rol.

f1-f3 should be too. 


-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: steven at gcc dot gnu dot org @ 2004-10-08  0:07 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu dot
                   |                            |org



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: pinskia at gcc dot gnu dot org @ 2004-10-08  0:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-08 00:16 -------
Really this should have been split up into two different bugs, as the last two examples are done at the tree
level, which means that is a middle-end/target problem; the first two examples need to be done at
the tree level before fixing the middle-end/target problem.

Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-08 00:16:56
               date|                            |



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:15 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:15 -------
Shouldn't f2 use (32 - y) instead of (64 - y), since unsigned is a 32-bit type?
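
The point here is that the complementary shift count has to match the width of the operand. A minimal sketch, using hypothetical helper names that do not come from the test case, derives the width from the operand type so that the two cannot disagree:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the test case. The width is
   computed from the operand type, so a 32-bit operand always gets
   (32 - y) and a 64-bit operand always gets (64 - y). */

/* Rotate left, defined for 0 < y < 32. */
uint32_t rotl_u32(uint32_t x, unsigned y)
{
    const unsigned width = sizeof(x) * CHAR_BIT;   /* 32 */
    return (x << y) | (x >> (width - y));
}

/* Rotate left, defined for 0 < y < 64. */
uint64_t rotl_u64(uint64_t x, unsigned y)
{
    const unsigned width = sizeof(x) * CHAR_BIT;   /* 64 */
    return (x << y) | (x >> (width - y));
}
```

With a 32-bit operand and a count below 32, (64 - y) is larger than the operand width, so the right shift is undefined in C and the expression is not a rotation.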

-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:18 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:18 -------
With the change suggested in Comment #4, we do indeed get roll for f2 and rorl
for f4.

-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  0:57 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 00:57 -------
I don't understand how, on IA32, we can use rol; rcl to perform the rotation in
f3.  Would you please add the complete code sequence you have in mind?


-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  1:53 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 01:53 -------
I think the optimal sequence for f3 would look something like this, assuming
that EAX contains the low-order word and EDX contains the high-order word after
the prologue:

movl %edx, %ebx
shrl $23, %ebx
sall $9, %edx
movl %eax, %ecx
shrl $23, %ecx
sall $9, %eax
orl %ebx, %eax
orl %ecx, %edx


-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28  2:11 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 02:11 -------
Actually, I think this is better:

mov %edx, %ebx
shld $9, %eax, %edx
shld $9, %ebx, %eax

Right?
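
One way to check a sequence like this is to model it in C and compare it against a plain 64-bit rotate. The sketch below assumes, as in the earlier comment, that EAX holds the low-order word and EDX the high-order word; the helper names are hypothetical and the shld model only covers counts 0 < n < 32.

```c
#include <stdint.h>

/* Model of the IA-32 instruction "shld $n, src, dst" for 0 < n < 32:
   dst is shifted left by n and the vacated low bits are filled with
   the top n bits of src. */
static uint32_t shld(uint32_t dst, uint32_t src, unsigned n)
{
    return (dst << n) | (src >> (32 - n));
}

/* Rotate a 64-bit value left by n (0 < n < 32) with the
   three-instruction sequence: save the old high word, then two shld. */
uint64_t rotl64_by_halves(uint64_t x, unsigned n)
{
    uint32_t eax = (uint32_t)x;          /* low-order word  */
    uint32_t edx = (uint32_t)(x >> 32);  /* high-order word */

    uint32_t ebx = edx;                  /* mov  %edx, %ebx     */
    edx = shld(edx, eax, n);             /* shld $n, %eax, %edx */
    eax = shld(eax, ebx, n);             /* shld $n, %ebx, %eax */

    return ((uint64_t)edx << 32) | eax;
}
```

Calling rotl64_by_halves(x, 9) should agree with (x << 9) | (x >> 55) for any 64-bit x.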

-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-28 16:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-28 16:24 -------
I am working on a patch to improve the rotation of "long long" by a constant.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |mark at codesourcery dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: cvs-commit at gcc dot gnu dot org @ 2005-09-29  3:32 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2005-09-29 03:31 -------
Subject: Bug 17886

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	mmitchel@gcc.gnu.org	2005-09-29 03:31:27

Modified files:
	gcc            : ChangeLog expmed.c optabs.c 
	gcc/config/i386: i386.md 

Log message:
	PR 17886
	* expmed.c (expand_shift): Move logic to reverse rotation
	direction when 	rotating by constants ...
	* optabs.c (expand_binop): ... here.
	* config/i386/i386.md (rotrdi3): Handle 32-bit mode.
	(ix86_rotrdi3): New pattern.
	(rotldi3): Handle 32-bit mode.
	(ix86_rotldi3): New pattern.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10044&r2=2.10045
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/expmed.c.diff?cvsroot=gcc&r1=1.236&r2=1.237
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&r1=1.294&r2=1.295
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.656&r2=1.657



-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-29  3:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-29 03:52 -------
Here is the current status for the four functions in Andi's testcase, with "f2"
changed to use "32 - y" so that it is a proper rotation:

* f still generates a complex code sequence, but I'm not sure how much better we
can do.  Our code sequence doesn't look a lot worse than the sequence generated
by icc 9.0, at first glance.  We could try something like:  

  if %ecx > 31:
    mov %eax, %ebx
    shldl $31, %edx, %eax
    shldl $31, %ebx, %edx
    %ecx -= 31
  if %ecx > 31:
    mov %eax, %ebx
    shldl $31, %edx, %eax
    shldl $31, %ebx, %edx
    %ecx -= 31
  if %ecx != 0:
    mov %eax, %ebx
    shldl %cl, %edx, %eax
    shldl %cl, %ebx, %edx

but, that doesn't seem clearly better than what we presently generate.

* f2 uses the roll instruction, which appears optimal.

* f3 uses two shld instructions, which appears optimal.

* f4 uses the rorl instruction, which appears optimal.

For both f2 and f3, it looks like we generate better code than you get with
icc 9.0.

I have no plans to work on this further, for the time being, but I'll not close
out the PR; someone else might want to try to attack the code generated for the
variable rotation case.   Or, if people are satisfied, we can close the PR.
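
The staged idea sketched for f (rotate by 31 up to twice, then by whatever count remains) can be modelled in C to check that it covers every count from 0 to 63. This is only a sketch under the assumption that EAX is the low word and EDX the high word; rot_step and rotl64_staged are hypothetical names.

```c
#include <stdint.h>

/* Rotate the pair hi:lo left by n, 0 < n < 32, the way one mov plus
   two shld instructions would. */
static void rot_step(uint32_t *lo, uint32_t *hi, unsigned n)
{
    uint32_t old_lo = *lo;                     /* mov  %eax, %ebx      */
    *lo = (*lo << n) | (*hi >> (32 - n));      /* shld n, %edx, %eax   */
    *hi = (*hi << n) | (old_lo >> (32 - n));   /* shld n, %ebx, %edx   */
}

/* 64-bit rotate left by a variable count, built from at most three
   steps that each rotate by fewer than 32 bits. */
uint64_t rotl64_staged(uint64_t x, unsigned count)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    count &= 63;                 /* a 64-bit rotate is periodic in 64 */
    if (count > 31) {            /* first stage: 32..63 becomes 1..32 */
        rot_step(&lo, &hi, 31);
        count -= 31;
    }
    if (count > 31) {            /* second stage: only 32 remains, becomes 1 */
        rot_step(&lo, &hi, 31);
        count -= 31;
    }
    if (count != 0)              /* final stage: remaining 1..31 */
        rot_step(&lo, &hi, count);

    return ((uint64_t)hi << 32) | lo;
}
```

The model says nothing about speed; it only confirms that the three stages add up to the requested count.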

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mark at codesourcery dot com|unassigned at gcc dot gnu
                   |                            |dot org
             Status|ASSIGNED                    |NEW



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: dank at kegel dot com @ 2005-09-29  4:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dank at kegel dot com  2005-09-29 04:36 -------
Thanks - I'll try to get this benchmarked on a semi-real app.

-- 



* [Bug middle-end/17886] variable rotate and long long rotate should be better optimized
From: mmitchel at gcc dot gnu dot org @ 2005-09-29  5:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From mmitchel at gcc dot gnu dot org  2005-09-29 05:05 -------
Here's the best I can think of for the first case, assuming that %ecx contains
the rotate-left count, %eax contains the low-order word, and %edx contains the
high-order word.

  mov %eax, %ebx
  cmpl $32, %ecx
  ja l1
  je l2
  shldl %cl, %edx, %eax
  shldl %cl, %ebx, %edx
  jmp l3
l1:
  negl %ecx
  shrdl %cl, %edx, %eax
  shrdl %cl, %ebx, %edx
  jmp l3
l2:
  mov %edx, %eax
  mov %ebx, %edx
l3:

I have no current plans to try to teach GCC to generate that, though.
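
The three-way split above (count below 32, equal to 32, above 32) can be modelled in C as a sanity check. This is a sketch, not GCC output: the function name is hypothetical, the count is assumed to be reduced to 0..63, and a count of zero gets an explicit guard because a C shift by 32 is undefined, whereas the hardware double shifts leave the operand unchanged for a zero count.

```c
#include <stdint.h>

/* Model of a variable 64-bit rotate left on two 32-bit halves,
   with eax as the low word and edx as the high word. */
uint64_t rotl64_branchy(uint64_t x, unsigned count)
{
    uint32_t eax = (uint32_t)x;          /* low-order word  */
    uint32_t edx = (uint32_t)(x >> 32);  /* high-order word */
    uint32_t ebx = eax;                  /* mov %eax, %ebx: saved low word */

    count &= 63;
    if (count > 32) {
        /* Rotating left by count equals rotating right by 64 - count.
           In the assembly, negl plus the hardware's mod-32 masking of
           %cl yields this same value (1..31). */
        unsigned k = 64 - count;
        eax = (eax >> k) | (edx << (32 - k));   /* shrdl %cl, %edx, %eax */
        edx = (edx >> k) | (ebx << (32 - k));   /* shrdl %cl, %ebx, %edx */
    } else if (count == 32) {
        eax = edx;                              /* swap the two halves */
        edx = ebx;
    } else if (count != 0) {
        eax = (eax << count) | (edx >> (32 - count));  /* shldl %cl, %edx, %eax */
        edx = (edx << count) | (ebx >> (32 - count));  /* shldl %cl, %ebx, %edx */
    }

    return ((uint64_t)edx << 32) | eax;
}
```

Each branch touches both halves at most once, which is the property that makes this shape attractive compared with repeated partial rotations.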

-- 

