* RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
@ 2005-06-04 12:27 Björn Haase
2005-06-04 13:05 ` Paolo Bonzini
2005-06-05 2:16 ` Ian Lance Taylor
0 siblings, 2 replies; 5+ messages in thread
From: Björn Haase @ 2005-06-04 12:27 UTC (permalink / raw)
To: Denis Chertykov; +Cc: Andy Hutchinson, marekm, gcc
Hi,
During the last weeks I have experimented a bit with the AVR back-end. IMO
there presently are two areas where it is worth to concentrating efforts on:
1.) cc0 -> CCmode transition
2.) splitting of HI- and SI-mode operations so that the RTL finally gets some
similarity with the actually existing AVR instructions.
I'd like to discuss with you on how to address these issues best because I
think it's better to have a good plan of what to do before starting
programing.
Concerning 1.) Ian Lance Taylor has made a couple of suggestions on how to
make the transition easier for the back-end maintainers. So it seems that
there is already some activity around. For issue 2.) Richard Henderson has
recently posted a patch that implements a "subreg-lowering" pass run directly
after expand that gives the register allocator the freedom to handle and
allocate the individual subreg expressions individually (could give some
tremendous improvements for AVR, e.g. when dealing with expressions combining
expressions with different modes).
IMO the two issues 1.) and 2.) are somewhat linked since, IMO it would be a
good idea to implement a cc0->CCmode transition method that does not make it
difficult to use Richard Henderson's patch later-on.
1.) Concerning the cc0 -> CCmode transition, IIUC, there are two main
problems:
A) We must make sure the reload does not insert condition-code (CC) clobbering
instructions in between a CC setter and a CC user (i.e. conditional
branches).
B) We must find a way to teach GCC to re-use as frequently as possible the
condition codes that are set by arithmetic and logic operations.
IMO, the easiest and most straight-forward solution of A) for the avr target
would be to follow Ian's suggestion:
> Ian Lance Taylor wrote:
>
>> Is there a way that makes it possible that only reload uses the patterns
>> that save and restore the condition code while everywhere else the usual
>> patterns are generated?
>> In case of the AVR target, most of the time reload will be
>> able to use processor instructions that do not require the save/restore
>> operations (it could freely access addresses up to a constant offset of 64
>> from the stack pointer) and the costly operations would not show up very
>> frequently (only for large offsets).
>
>You could do it by writing your insn patterns as expanders which check
>reload_in_progress || reload_completed and under those circumstances
>emit insns which do not clobber the condition codes.
>
>Ian
I would expect that in the rare case that reload needs to access data beyond
the 64-Byte boundary it could be tolerated to expand the memory access to a
sequence of type
"(set (temp_register) (SR))
(memory_move_instruction_instruction_inserted_by_reload)
(set (SR) (temp_register))"
One would have to confirm that, but I assume that the remaining passes after
reload would be sufficiently smart to optimize away the two copy operations
for saving and restoring the status register in case that they are not
necessary.? E.g. I think that it is justified to assume that if above
reload-generated memory access is followed by an instruction that clobbers
the condition code, both of the status register operations that are embracing
the memory move would be deleted. Possibly these two instructions would also
be deleted, if the memory move instruction does not clobber the condition
code.
Doing this, I think that one could resolve the most difficult issue A) of
cc0->CCmode conversion without having to face what has been called "the
combinatorical pattern explosion" in the mailing list archives.
Concerning issue B) Ian Lance Taylor suggests, IIUC, to implement a new pass
after reload: After reload, it is known which alternative of the different
instructions that possibly set CC has been chosen and one could find out
whether some of the compare instructions could be deleted.
IMO, if this new pass is available by the time the cc0->CCmode transition is
implemented for AVR, one should probably try to use it.
Meanwhile I'd like to suggest an approach that tries to remove the unnecessary
compare instructions already before reload. I.e. what I am thinking about is
to try to use the (G)CSE passes to identify situations where arithmetic
instructions calculate the same condition code that is again calculated
later-on by compare or test instructions. Disadvantage would be that at this
time one would not be able to know which alternative of an instruction will
be chosen by reload. In my present opinion, however, this is an issue that is
not a very big problem for AVR.
Also, IMO, one should probably try hard to identify the
"single-bit-test-and-branch" patterns before reload.
The condition-code re-use issue is the point, where, IMO, the link to the
subreg-lowering 2.) shows up. After, e.g., breaking down a HI mode "sub"
operation into two QI mode "sub" and "sub-with-carry"s at expand, I consider
it to be extremely difficult to make the mid-end smart enough to identify
that a the end of the QI "sub-with-carry" the condition code is set according
to the corresponding HImode substract operation. For DImode operations the
mid-end would already need to take 8 (!) Instructions into account for
finding out what the calculated condition code actually represents. This,
also, will be a major difficulty when considering Ian's suggested optimizer
pass after reload.
IMO condition code re-use will also be an issue for the bigger targets (e.g.
x86) when dealing with DI mode operations. My suggestion to address this
issue is: Let's not try to make the mid-end hyper smart, but let's give the
back-end designer a possibility to guide the mid-end by inserting helpful
information in the RTL. In order to illustrate what I am thinking about,
let's take above example with the minus:HI. I'd suggest to expand a subhi3
operation with 3 register operands into a sequence like
; copying the input operands in individual QImode variables
(set (reg:QI 1) (subreg:QI (operands[1]) 0))
(set (reg:QI 2) (subreg:QI (operands[1]) 1))
(set (reg:QI 3) (subreg:QI (operands[2]) 0))
(set (reg:QI 4) (subreg:QI (operands[2]) 1))
; first add
(parallel [
(set (reg:QI 5)(minus:QI (reg:QI 1) (reg:QI 3))
(set (reg:CC_carry SR) (unspec:CC_carry [(reg:QI 1) (reg:QI 3)]
carry_from_sub_code))])
; second add (with carry)
(parallel [
(set (reg:QI 6) (minus:QI (minus:QI (reg:QI 2) (reg:QI 4)) (reg:CC_carry SR))
(set (reg:CC_cc SR) (unspec:CC_cary [(reg:QI 1) (reg:QI 3) (reg:CC_carry SR)]
carry_from_sub_with_carry_code))])
; write back the result into the output operands
(set (subreg:QI (operands[0]) 0) (reg:QI 5))
(set (subreg:QI (operands[0]) 1) (reg:QI 6))
; Additional "Marker" instructions to be used by GCSE
(parallel [
(use (reg:CC_cc SR))
(set (reg:CC_cc SR) (compare:HI (operands[1]) (operands[2]))
(note "please delete the entire embracing parallel instruction before
register life-time analysis by a new pass: It pretends to use operands 1 and
2 while in fact this instruction does nothing except from giving hints to
GCSE.")
])
(parallel [
(use (reg:CC_cc SR))
(set (reg:CC_cc SR) (compare:HI (operands[0]) (const_0))
(note "please delete the entire embracing parallel instruction before
register life-time analysis by a new pass: It pretends to use operand[0]
while in fact this instruction does nothing except from giving hints to
GCSE.")
])
(parallel [
(use (operands[0]))
(set (operands[0]) (minus:HI (operands[1]) (operands[2]))
(note "please delete the entire embracing parallel instruction before
register life-time analysis by a new pass: It pretends to use operands 1 and
2 while in fact this instruction does nothing except from giving hints to
GCSE.")
])
IIUC, (g)cse would be well adapted to deal with the probably unnecessary move
instructions without any serious problems. It also will be able to identify
if a later compare instruction calculates the same condition code as the
minus operation had calculated. When having to choose between (a) teaching
the mid-end to understand what add/substract-with-carry instructions actually
do and (b) using a method like above to give hints to the present mid-end,
(b) is my favorite approach. The simplified pass sequence before reload would
then look similar to
- expand
- Richard's subreg lowering pass
- gcse and friends
- ( possible place 1 for marker instruction removal )
- combine
- ( possible place 2 for marker instruction removal )
- register allocation and reload
I know that the RTL generated by the back-end will get much more complex in
comparison to the present case. But IMO this would be part of the price one
always will have to pay when exposing the complexity of the subreg-lowering
to the RTL optimizers.
In order to be able to identify the case of single-bit tests, it might be
possible to implement a second to the optimizing procedure: By using a
special optimizer switch it would be nice if one could change the pass flow
before reload by
- expand
- gcse and friends
- combine
- ( marker instruction removal )
- Richard's subreg lowering pass
- gcse and friends -re run
- combine -re-run
- register allocation and reload
This way, it would be possibly easier for the combine pass to find out whether
a conditional branch in fact is a "branch on a single bit condition" since it
would still find the information in the RTL which kind of original
aggregate-mode expressions (i.e. HImode and larger) correspond to the zoo of
atomic (i.e. QImode) instructions expand had generated.
I think that in case that above ideas work, it will be the AVR-back-end that
would benefit most. But I also think that a similar approach could be very
for all of the other targets that have only 16 bit instructions, e.g. like
MSP430 and HC12. Possibly a similar procedure might also be helpful for the
bigger targets when dealing with DI mode expressions!
So I am having hope that addressing this issue justifies changes in gcc's mid
end.
Hoping to receive many comments and additional ideas from more competent
people ... best regards
Björn
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
2005-06-04 12:27 RFC: Strategy for cc0 -> CCmode conversion for the AVR target Björn Haase
@ 2005-06-04 13:05 ` Paolo Bonzini
2005-06-04 14:20 ` Björn Haase
2005-06-05 2:16 ` Ian Lance Taylor
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2005-06-04 13:05 UTC (permalink / raw)
To: gcc
> The condition-code re-use issue is the point, where, IMO, the link to the
> subreg-lowering 2.) shows up. After, e.g., breaking down a HI mode "sub"
> operation into two QI mode "sub" and "sub-with-carry"s at expand, I consider
> it to be extremely difficult to make the mid-end smart enough to identify
> that a the end of the QI "sub-with-carry" the condition code is set according
> to the corresponding HImode substract operation.
As a last resort, ou can always use peephole2's to remove unnecessary
subtracts on the HImode values.
> (parallel [
> (use (operands[0]))
> (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> (note "please delete the entire embracing parallel instruction before
> register life-time analysis by a new pass: It pretends to use operands 1 and
> 2 while in fact this instruction does nothing except from giving hints to
> GCSE.")
> ])
This seems define_insn_and_split, but it is a lot more complex than what
you probably can do...
IIRC, s390 does use add with carry and subtract with borrow instructions
effectively (alc and slb in IBM360^W s390-ese). Search the archives on
google or gmane.
> - combine -re-run
No way, combine is too expensive... Its simplification engine is fine,
but a lot of things ought to be redone from scratch so that it becomes a
serious instruction selection pass.
Paolo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
2005-06-04 13:05 ` Paolo Bonzini
@ 2005-06-04 14:20 ` Björn Haase
0 siblings, 0 replies; 5+ messages in thread
From: Björn Haase @ 2005-06-04 14:20 UTC (permalink / raw)
To: gcc; +Cc: Paolo Bonzini
Am Samstag, 4. Juni 2005 15:04 schrieb Paolo Bonzini:
> > (parallel [
> > (use (operands[0]))
> > (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> > (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operands 1
> > and 2 while in fact this instruction does nothing except from giving
> > hints to GCSE.")
> > ])
>
> This seems define_insn_and_split, but it is a lot more complex than what
> you probably can do...
I already have confirmed that GCSE is smart enough to deal with such kind of
expressions. It effectively ignores the (use) operand when searching for
common expressions and I know that it *will* optimize away a later
instruction that has the same "set" statement: I have seen that it works when
experimenting with a sub-optimal divmodsi4 expand pattern that has been
supplemented by such type of parallel [use) (set)] instruction.
Concerning the seeming similarity with "define_insn_and_split": The *huge*
benefit of subreg lowering at expand in comparison to define_insn_and_split
is that all of the power of the optimizers before reload can work on the
resulting instruction sequences. E.g. IMO there is no way to implement
"
uint8_t a; // automatic variable in registers
int8_t b; // automatic variable in registers
int16_t c; // variable in static memory
c = a | (b << 8);
"
efficiently for AVR when splitting after expand: Gcc before reload will first
allocate four additional 8-Bit registers: It will store the sign-extended b
in to two of them and the zero-extended a in the other two new registers. For
doing this it will insert instruction sequences for calculating the
sign-extension for b. It will insert instruction sequences for the
zero-extension of a. It will excecute a shift instruction for the resulting
16 bit-value of the sign-extended b. Afterwards will come a 16 bit "ior"
instruction of the two new 16-bit values and finally, after all this
unnecessary work it will emit two QImode memory moves for the two bytes of
the 16 bit-result.
With appropriately use of Richard Henderson's patch, all that comes out after
the optimizers before reload is "two QImode moves to memory".
The optimizer passes after reload are not smart enough to identify the
optimization opportunities. Also it would be too late to take profit of the
four registers that have been allocated without actually needing them.
> IIRC, s390 does use add with carry and subtract with borrow instructions
> effectively (alc and slb in IBM360^W s390-ese). Search the archives on
> google or gmane.
Thank's for the hint. I'll have a look at the 360 port.
> > - combine -re-run
>
> No way, combine is too expensive... Its simplification engine is fine,
> but a lot of things ought to be redone from scratch so that it becomes a
> serious instruction selection pass.
I agree with you that this might not be a good choice for targets like x86 and
I am not suggesting to include this option in the lists of passes run with
any of the default options like -O0 and -O3. Maybe it could be included with
"expensive-optimizations".
Concerning compile times, one should keep in mind that a typical AVR target
disposes of 8k - 64k program memory only. IMO, compile time degradation on
the host machine would be readily accepted by all of the AVR users if it
improves code. Build time for my entire projects amount to roughly 30
seconds, most of which used for checking dependencies, linking and report
file generation. In my personal opinion: Everyone using the AVR port would be
happy even if degradation would amount to a factor of 10!
Yours,
Björn
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
2005-06-04 12:27 RFC: Strategy for cc0 -> CCmode conversion for the AVR target Björn Haase
2005-06-04 13:05 ` Paolo Bonzini
@ 2005-06-05 2:16 ` Ian Lance Taylor
2005-06-05 8:44 ` Björn Haase
1 sibling, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2005-06-05 2:16 UTC (permalink / raw)
To: Björn Haase; +Cc: Denis Chertykov, Andy Hutchinson, marekm, gcc
Björn Haase <bjoern.m.haase@web.de> writes:
> Concerning 1.) Ian Lance Taylor has made a couple of suggestions on how to
> make the transition easier for the back-end maintainers. So it seems that
> there is already some activity around.
In fact Hans-Peter Nilsson is implementing code to support this
transition. I think he started following my suggestions, but has of
course modified them in the course of actual implementation.
> I would expect that in the rare case that reload needs to access data beyond
> the 64-Byte boundary it could be tolerated to expand the memory access to a
> sequence of type
> "(set (temp_register) (SR))
> (memory_move_instruction_instruction_inserted_by_reload)
> (set (SR) (temp_register))"
>
> One would have to confirm that, but I assume that the remaining passes after
> reload would be sufficiently smart to optimize away the two copy operations
> for saving and restoring the status register in case that they are not
> necessary.? E.g. I think that it is justified to assume that if above
> reload-generated memory access is followed by an instruction that clobbers
> the condition code, both of the status register operations that are embracing
> the memory move would be deleted. Possibly these two instructions would also
> be deleted, if the memory move instruction does not clobber the condition
> code.
Assuming the memory move instruction does not clobber the SR, then I
would expect the reload CSE pass to remove the (set (SR) ...).
Presumably the temporary register would then be dead, in which case
the flow2 pass should remove the (set (temp_register) ...).
> The condition-code re-use issue is the point, where, IMO, the link to the
> subreg-lowering 2.) shows up. After, e.g., breaking down a HI mode "sub"
> operation into two QI mode "sub" and "sub-with-carry"s at expand, I consider
> it to be extremely difficult to make the mid-end smart enough to identify
> that a the end of the QI "sub-with-carry" the condition code is set according
> to the corresponding HImode substract operation. For DImode operations the
> mid-end would already need to take 8 (!) Instructions into account for
> finding out what the calculated condition code actually represents. This,
> also, will be a major difficulty when considering Ian's suggested optimizer
> pass after reload.
I agree that there is a problem here, but it's not clear to me how you
can address it under the current cc0 scheme either.
> ; Additional "Marker" instructions to be used by GCSE
> (parallel [
> (use (reg:CC_cc SR))
> (set (reg:CC_cc SR) (compare:HI (operands[1]) (operands[2]))
> (note "please delete the entire embracing parallel instruction before
> register life-time analysis by a new pass: It pretends to use operands 1 and
> 2 while in fact this instruction does nothing except from giving hints to
> GCSE.")
> ])
> (parallel [
> (use (reg:CC_cc SR))
> (set (reg:CC_cc SR) (compare:HI (operands[0]) (const_0))
> (note "please delete the entire embracing parallel instruction before
> register life-time analysis by a new pass: It pretends to use operand[0]
> while in fact this instruction does nothing except from giving hints to
> GCSE.")
> ])
> (parallel [
> (use (operands[0]))
> (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> (note "please delete the entire embracing parallel instruction before
> register life-time analysis by a new pass: It pretends to use operands 1 and
> 2 while in fact this instruction does nothing except from giving hints to
> GCSE.")
> ])
I'm not crazy about these marker instructions personally. They are
describing something which I think could be handled via parallel sets
or register notes. The more serious problem I see is that if part of
the subtraction disappears for some reason, the information, however
stored, will be incorrect. How can that problem be avoided?
Ian
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.
2005-06-05 2:16 ` Ian Lance Taylor
@ 2005-06-05 8:44 ` Björn Haase
0 siblings, 0 replies; 5+ messages in thread
From: Björn Haase @ 2005-06-05 8:44 UTC (permalink / raw)
To: Ian Lance Taylor; +Cc: gcc
Thank's for your response,
Sunday, 5. Juni 2005 04:16 Ian Lance Taylor wrote:
> > The condition-code re-use issue is the point, where, IMO, the link to the
> > subreg-lowering 2.) shows up. After, e.g., breaking down a HI mode "sub"
> > operation into two QI mode "sub" and "sub-with-carry"s at expand, I
> > consider it to be extremely difficult to make the mid-end smart enough to
> > identify that a the end of the QI "sub-with-carry" the condition code is
> > set according to the corresponding HImode substract operation. For DImode
> > operations the mid-end would already need to take 8 (!) Instructions into
> > account for finding out what the calculated condition code actually
> > represents. This, also, will be a major difficulty when considering Ian's
> > suggested optimizer pass after reload.
>
> I agree that there is a problem here, but it's not clear to me how you
> can address it under the current cc0 scheme either.
I agree. That problem is existing as well with the cc0 scheme. What I was
aiming to suggest is: When it's necessary to work on the CC re-use for the
cc0->CCmode transition anyway, let's try to use a path that makes it possible
to switch to splitting after expand without facing too much problems.
> > ; Additional "Marker" instructions to be used by GCSE
> > (parallel [
> > (use (reg:CC_cc SR))
> > (set (reg:CC_cc SR) (compare:HI (operands[0]) (const_0))
> > (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operand[0]
> > while in fact this instruction does nothing except from giving hints to
> > GCSE.")
> > ])
> > (parallel [
> > (use (operands[0]))
> > (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> > (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operands 1
> > and 2 while in fact this instruction does nothing except from giving
> > hints to GCSE.")
> > ])
>
> I'm not crazy about these marker instructions personally. They are
> describing something which I think could be handled via parallel sets
> or register notes.
The advantage of using actual instructions that I see is, that already the
present implementation of the (G)CSE is able to work with the way the
information is stored in the marker instruction. And it's, IMO, one possible
method for carrying around the information what is actually contained in the
zoo of subregs. One would have a bit freedom to choose at which time the
information concerning the larger modes is discarded.
The reason why (IMO) it is not possible to realize the same functionality with
"double set" instructions is, that the condition code generally depends on
*all* of the involved subregs. It will be set, however, only by the last
instruction of the sequence and this one works only on some *sub*-set of the
input operands. I.e. in the example the later "substract with carry"
operation (the one working on the most significant byte) would have to
pretend to use both, the least significant bytes and the most significant
bytes if it included a parallel set for writing the condition code of the
entire sequence. The double-set instruction would therefore include an
incorrect dependency. This will cause problems, e.g. when the input and
output operands of the earlier instructions overlap: Gcc would be forced to
hold an unchanged copy of the overwritten input operands in additional
registers since it assumes that the last instruction needs them.
> The more serious problem I see is that if part of
> the subtraction disappears for some reason, the information, however
> stored, will be incorrect. How can that problem be avoided?
IIUC, the existence of the use statement in the marker instruction prevents
that the registers written by the subtraction ever may be marked as "dead"
unless the marker instruction itself is removed. I so far have expected that
no pass would be allowed to remove an instruction if a later instruction
depends on it's result, independent of whether the dependence stems from a
(use) statement or the use of the value in a conventional arithmetic or logic
operation.
Maybe I understood something wrong, but so far I do not see a problem that one
would need to avoid.
Yours,
Björn
P.S.:
As a detail: I have just realized that the first one of the three marker
instructions (the one setting CC according to the compare of operands[1] and
operands[2]) probably would have to be placed in front of the two
instructions writing (operands[0]). operands[0] and operands[1] possibly
could overlap.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2005-06-05 8:44 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-04 12:27 RFC: Strategy for cc0 -> CCmode conversion for the AVR target Björn Haase
2005-06-04 13:05 ` Paolo Bonzini
2005-06-04 14:20 ` Björn Haase
2005-06-05 2:16 ` Ian Lance Taylor
2005-06-05 8:44 ` Björn Haase
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).