public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/23726] New: Missed optimizations for divmod
@ 2005-09-04 22:00 bjoern dot m dot haase at web dot de
  2005-09-04 22:02 ` [Bug rtl-optimization/23726] " bjoern dot m dot haase at web dot de
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: bjoern dot m dot haase at web dot de @ 2005-09-04 22:00 UTC (permalink / raw)
  To: gcc-bugs

Prologue: 
 
Purpose of this enhancement request is not only to resolve a missed 
optimization issue for the avr port. It aims to show up ways for resolving a 
general difficulty for the RTL optimizers that we are presently having for 
double-set insn: A difficulty that shows up when dealing with side effects of 
double-set RTL insns that produce unexpected useful results.  
 
IMO this is a general issue that will not only show up for the divmod4 patterns 
but will probably also be very important for the efficiency of code generated 
after the CCmode transition for targets where 1.) arithmetic or logic 
operations leave the CC register with useful information and 2.) arithmetic 
operations on SImode or DImode objects need to be lowered to a sequence of 
subreg operations. 
 
At the end of this bug report I'm about to ask for an early CSE pass or a small 
change of the jump2 pass. 
 
 
Text body: 
 
When having a function like 
 
uint32_t d,m; 
void foo (uint32_t z, uint32_t n) 
{ 
  d = z / n; 
  m = z % n; 
} 
 
the avr port presently does not recognize that it needs to call the udivmodsi 
function only once. Assembly code generated reads, thus,  
        movw r14,r22 
        movw r16,r24 
        movw r10,r18 
        movw r12,r20 
        call __udivmodsi4 
        sts d,r18 
        sts (d)+1,r19 
        sts (d)+2,r20 
        sts (d)+3,r21 
        movw r24,r16 
        movw r22,r14 
        movw r20,r12 
        movw r18,r10 
        call __udivmodsi4 
        sts m,r22 
        sts (m)+1,r23 
        sts (m)+2,r24 
        sts (m)+3,r25 
(function prologue/epilogue stripped). The issue is that the CSE passes 
presently are not able to realize that the same result is calculated twice. The 
optimizers do, thus, not recognize that the first divmod4 call had a useful 
side-effect (the one that it calculates as well the mod). 
This could be fixed by attaching appropriate REG_EQUAL register notes to the 
insns that copy the results from the library call that reside in hard regs to 
target pseudos. The present implementation of the avr port does not generate 
such notes since it generates the calls explicitly and not by optabs.c (because 
the implementation knows exactly which registers will be clobbered by the 
library call). I will attach a patch that adds these required REG_EQUAL notes.  
 
The important issue is, that the REG_EQUAL note needs to refer to the original 
pseudos that were used as input parameters for the library call. Otherwise CSE 
will not realize the optimization opportunity. 
 
When using the attached patch, i.e. when adding the appropriate REG_EQUAL notes 
to the RTL, I veryfied that CSE is able to realize the optimization 
opportunity. I.e. one ends up with assembly output reading 
        call __udivmodsi4 
        sts d,r18 
        sts (d)+1,r19 
        sts (d)+2,r20 
        sts (d)+3,r21 
        sts m,r22 
        sts (m)+1,r23 
        sts (m)+2,r24 
        sts (m)+3,r25 
. The only difficulty is, that with the present optimization setup, the jump2 
pass removes the insn that contain the useful REG_EQUAL notes for the 
unexpected useful side-effect because it correctly recognizes that it is 
"trivially dead". For this reason, I had to place the first CSE pass before the 
jump2 pass in "passes.c" in order to get above results. I.e. my "passes.c" 
excerpt reads 
  NEXT_PASS (pass_instantiate_virtual_regs); 
  NEXT_PASS (pass_cse); 
  NEXT_PASS (pass_jump2); 
instead of  
  NEXT_PASS (pass_instantiate_virtual_regs); 
  NEXT_PASS (pass_jump2); 
  NEXT_PASS (pass_cse); 
. 
 
Epilogue: 
 
For this reason the attached patch could only resolve this issue IMO only if 
one of the following solutions is realized: 
 
1.) the jump2 pass is changed such that it refrains from deleting trivially 
dead insn if they are containing REG_EQUAL notes because possibly it could be 
necessary to make them being seen at least once by a CSE pass. 
2.) an earlier CSE pass is added to the optimizer flow. Possibly controlled by 
using a target-dependent flag. 
 
Yours, 
 
Bjoern

-- 
           Summary: Missed optimizations for divmod
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P1
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bjoern dot m dot haase at web dot de
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: unknown-x86_64-linux
  GCC host triplet: unknown-x86_64-linux
GCC target triplet: avr-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
@ 2005-09-04 22:02 ` bjoern dot m dot haase at web dot de
  2005-09-04 22:33 ` pinskia at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bjoern dot m dot haase at web dot de @ 2005-09-04 22:02 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bjoern dot m dot haase at web dot de  2005-09-04 22:02 -------
Created an attachment (id=9665)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9665&action=view)
Patch adding REG_EQUAL notes to the divmod4 expanders


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
  2005-09-04 22:02 ` [Bug rtl-optimization/23726] " bjoern dot m dot haase at web dot de
@ 2005-09-04 22:33 ` pinskia at gcc dot gnu dot org
  2005-09-05  6:59 ` bjoern dot m dot haase at web dot de
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-04 22:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-04 22:33 -------
Or even better add support for multiple sets in CSE and forget about adding notes.   That will improve 
more targets than just AVR.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|unknown-x86_64-linux        |
   GCC host triplet|unknown-x86_64-linux        |
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
  2005-09-04 22:02 ` [Bug rtl-optimization/23726] " bjoern dot m dot haase at web dot de
  2005-09-04 22:33 ` pinskia at gcc dot gnu dot org
@ 2005-09-05  6:59 ` bjoern dot m dot haase at web dot de
  2005-09-05 14:55 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bjoern dot m dot haase at web dot de @ 2005-09-05  6:59 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bjoern dot m dot haase at web dot de  2005-09-05 06:59 -------
IMO, one would not be able to handle the issue by changing CSE. E.g. I 
presently don't see how to avoid using register notes for the following 
situation. Imagine a target not having DImode operations so that DImode 
arithmetic needs to be lowered to a sequence of SImode operations. Imagine 
further that after the sequence the condition code contains useful information 
that we want to re-use. 
E.g a minus:DI operation would be expanded into two parallels like  
 
(parallel[ 
  (set (subreg:SI (reg:DI operand0) 0) 
        minus:SI ((subreg:SI (reg:DI operand1) 0)  
                  (subreg:SI (reg:DI operand2) 0)))) 
  (set (reg:CCmode CC) (generate borrow))]) 
(parallel[ 
  (set (subreg:SI (reg:DI operand0) 4) 
        minus:SI ( 
         (extract_borrow:SI (reg:CCmode CC)) 
         (minus:SI ((subreg:SI (reg:DI operand1) 4)  
                    (subreg:SI (reg:DI operand2) 4))))) 
  (set (reg:CCmode CC) (generate condition code))]) 
 
In order to describe what information is written to the CC register in the 
second parallel, one needs to refer to both, input parameters of the parallel 
for the lower 4 bytes and input parameters for the higher 4 bytes. E.g. the 
information that CC contains the result of a compare of operand1 and operand2 
could therefore not be expressed in the RTL! One could add, however, a third 
instruction to the expanded sequence reading 
 
(set (reg:CCmode CC) (reg:CCmode CC))  
-> WITH ATTACHED REGISTER NOTE "is equal to compare:DI (operand1) (operand2)" 
 
where the REG_EQUAL note gives the required information. IMO this is a general 
issue relevant for all targets that are aiming to do subreg lowering of 
arithmetic and logic operations at expand and that wish to recycle the 
condition codes generated. Of course 8 bit targets like AVR that need to use 
subreg lowering for almost everything will benefit most :-). 
 
I agree that generally register notes are kind of ugly. But for this kind of 
information, I think that they could be useful. 
 
Yours, 
 
Bjoern 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
                   ` (2 preceding siblings ...)
  2005-09-05  6:59 ` bjoern dot m dot haase at web dot de
@ 2005-09-05 14:55 ` pinskia at gcc dot gnu dot org
  2005-09-06  7:56 ` bjoern dot m dot haase at web dot de
  2005-09-06 13:00 ` bjoern dot m dot haase at web dot de
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-05 14:55 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-05 14:55 -------
You most likely want to use a libcall blocks instead of the regnotes here.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
                   ` (3 preceding siblings ...)
  2005-09-05 14:55 ` pinskia at gcc dot gnu dot org
@ 2005-09-06  7:56 ` bjoern dot m dot haase at web dot de
  2005-09-06 13:00 ` bjoern dot m dot haase at web dot de
  5 siblings, 0 replies; 7+ messages in thread
From: bjoern dot m dot haase at web dot de @ 2005-09-06  7:56 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bjoern dot m dot haase at web dot de  2005-09-06 07:52 -------
Do you know of any doc on how libcall block RTL is supposed to look like 
(except from e.g. code reading optabs.c)? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug rtl-optimization/23726] Missed optimizations for divmod
  2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
                   ` (4 preceding siblings ...)
  2005-09-06  7:56 ` bjoern dot m dot haase at web dot de
@ 2005-09-06 13:00 ` bjoern dot m dot haase at web dot de
  5 siblings, 0 replies; 7+ messages in thread
From: bjoern dot m dot haase at web dot de @ 2005-09-06 13:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bjoern dot m dot haase at web dot de  2005-09-06 12:59 -------
I have done some code reading now and come to the following conclusion: 
 
When having an expanded sequence that provides one single result, libcall 
blocks are an appropriate method for making sure that a single-set insn that 
carries a REG_EQUAL note is *not* deleted too early. 
 
Libcall notes, however, do not provide a method so far for dealing with library 
calls or expanded sequences that yield *two* results. E.g. they are no solution 
for both, divmod4 on the one hand and arithmetic expanders doing subreg 
lowering and also yielding a CC on the other hand. 
 
In order to keep changes as small as possible, my suggestion is to change  
 
static void 
rest_of_handle_jump2 (void) 
 
such that it no longer calls "delete_trivially_dead_insns ()" at it's very 
beginning. (I'd have posted a patch, if savannah.gnu.org is down right now.) 
 
IIUC that the trivially dead insn are in first line a performance issue because 
they need memory and handling that would otherwise not be necessary, my 
suggested change would not be too serious. "delete_trivially_dead_insns ()" 
would be among the first things that is called in the pass that immediately 
follows jump2: The first cse pass. 
I'd appreciate comments on whether such trivially dead insn could prevent jump2 
from realizing important optimization steps. 
 
Bjoern. 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-09-06 13:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-04 22:00 [Bug rtl-optimization/23726] New: Missed optimizations for divmod bjoern dot m dot haase at web dot de
2005-09-04 22:02 ` [Bug rtl-optimization/23726] " bjoern dot m dot haase at web dot de
2005-09-04 22:33 ` pinskia at gcc dot gnu dot org
2005-09-05  6:59 ` bjoern dot m dot haase at web dot de
2005-09-05 14:55 ` pinskia at gcc dot gnu dot org
2005-09-06  7:56 ` bjoern dot m dot haase at web dot de
2005-09-06 13:00 ` bjoern dot m dot haase at web dot de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).