public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: Exploiting dual mode operation
@ 2004-04-22 13:58 Leehod Baruch
  2004-04-22 14:59 ` Joern Rennecke
  0 siblings, 1 reply; 8+ messages in thread
From: Leehod Baruch @ 2004-04-22 13:58 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc, Mircea Namolaru

Hello,

Thank you for your message.

We see redundant sign extension removal as a partial 
redundancy problem. This will give us not only the redundant 
sign extensions, but also better placement for the 
remaining ones. As a result sign extension instructions 
may be moved and instructions that uses the highpart
may be replaced with others that doesn't use it.

This is not equivalent with live information for the highest 
part of the registers. 

Our approach will not require changes in the description file 
(which in our opinion is a major problem with your code) and will 
limit the changes to much fewer files.

Leehod and Mircea 






Joern Rennecke <joern.rennecke@superh.com>
21/04/2004 09:26 PM
 
        To:     Mircea Namolaru/Haifa/IBM@IBMIL
        cc:     gcc@gcc.gnu.org, Leehod Baruch/Haifa/IBM@IBMIL
        Subject:        Re: Exploiting dual mode operation


> We are working on an optimization for eliminating redundant sign 
extension
> instructions by making use of this duality. Currently this is done only 
in 
> very 
> simple cases.

I have already submitted a patch for this:
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg03259.html

It is still waiting for review.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Exploiting dual mode operation
  2004-04-22 13:58 Exploiting dual mode operation Leehod Baruch
@ 2004-04-22 14:59 ` Joern Rennecke
  2004-04-26 16:41   ` Mircea Namolaru
  0 siblings, 1 reply; 8+ messages in thread
From: Joern Rennecke @ 2004-04-22 14:59 UTC (permalink / raw)
  To: Leehod Baruch; +Cc: Joern Rennecke, gcc, Mircea Namolaru

> We see redundant sign extension removal as a partial 
> redundancy problem. This will give us not only the redundant 
> sign extensions, but also better placement for the 
> remaining ones. As a result sign extension instructions 
> may be moved and instructions that uses the highpart
> may be replaced with others that doesn't use it.

Why would you be using the highpart if you don't need to in the
first place?

Or are you saying that the port will have to duplicate patterns
that manipulate the highpart, to have a variant that uses and sets
only a smaller SUBREG, so that you can avoid tracking the lifeness
of the highpart separately?  That would be much more intrusive than
adding an attribute.

> This is not equivalent with live information for the highest 
> part of the registers. 
> 
> Our approach will not require changes in the description file 
> (which in our opinion is a major problem with your code) and will 
> limit the changes to much fewer files.

Without a special attribute in the machine description, or new code
that analyzes the md rtl to automatically generate this information,
you will have only scant information to feed into PRE computations.

And when you fix this problem, you will find that you need low-level
rtl to feed this pass, and then you don't want to push up the register
pressure with PRE.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Exploiting dual mode operation
  2004-04-22 14:59 ` Joern Rennecke
@ 2004-04-26 16:41   ` Mircea Namolaru
  2004-04-26 16:57     ` Joern Rennecke
  0 siblings, 1 reply; 8+ messages in thread
From: Mircea Namolaru @ 2004-04-26 16:41 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc, Joern Rennecke, Leehod Baruch

Hello,

1. Trying to solve the sign extension removal problem using the live 
highpart 
information has some limitations. For instance in the following case 
(which
appears during computation of array addresses):

i = sign extension i1;
....
index = 64-bit shift of i  // the target and the source are 64 bits 

In some architectures we may get the same result without using an explicit 
sign
extension. As we understand it, your algorithm will found that the 
highpart  of 
"i" is live and the sign extension will not be discarded.

Another example is:
 
int i, s;
for (i = 0; i < N; i++)
{
   s1 = s + i;
   s = sign extend s1;
}
return s;
The sign extension is required for the return only, so the sign extension 
can be 
removed from the loop and placed before the return. The highpart of "s" is 
live,
but this information alone will not help to improve the code. 

2. To exploit the dual mode operation, for instructions that uses the 
result of 
explicit sign extensions we need to found if it is possible to get the 
same result 
via an instruction that doesn't require explicit sign extensions. 
Basically we 
need to found if: 

s1 = sign extend s
t1 = sign extend  t
result = inst s1, t1 

can be replaced by an instruction inst1:

result = inst1 s, t.

But this seems similar with what combine does, so the information 
from the description file should suffice.

3. One possible way for implementation is to use reaching definitions
to propagate the sign extensions forward right before the uses. This will 
create 
opportunities for combine and gcse to do the rest of the work afterward.

Another possible way is to extend gcse (but there are some issues that 
we still need to clarify).

Maybe there is a way to use your code (or part of it) ? 

Mircea and Leehod 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Exploiting dual mode operation
  2004-04-26 16:41   ` Mircea Namolaru
@ 2004-04-26 16:57     ` Joern Rennecke
  2004-04-29 14:26       ` Mircea Namolaru
  0 siblings, 1 reply; 8+ messages in thread
From: Joern Rennecke @ 2004-04-26 16:57 UTC (permalink / raw)
  To: Mircea Namolaru; +Cc: Joern Rennecke, gcc, Leehod Baruch

> 1. Trying to solve the sign extension removal problem using the live 
> highpart 
> information has some limitations. For instance in the following case 
> (which
> appears during computation of array addresses):
> 
> i = sign extension i1;
> ....
> index = 64-bit shift of i  // the target and the source are 64 bits 
> 
> In some architectures we may get the same result without using an explicit 
> sign
> extension. As we understand it, your algorithm will found that the 
> highpart  of 
> "i" is live and the sign extension will not be discarded.

It depends. If you have a static right shift such that the highpart of the
value being shifted does not actually influence the result, the highpart
attribute can be 'ignore', and the sign extension can be eliminated.

But sign extension / shift combination can actually be handled generally
and much simpler in the combiner.  You only have to make sure that your
machine description contains the matching patterns, and it will just work -
no patches to the machine independent code required.

> Another example is:
>  
> int i, s;
> for (i = 0; i < N; i++)
> {
>    s1 = s + i;
>    s = sign extend s1;
> }
> return s;
> The sign extension is required for the return only, so the sign extension 
> can be 
> removed from the loop and placed before the return. The highpart of "s" is 
> live,
> but this information alone will not help to improve the code. 

Yes, this is not covered by the highpart liveness optimization.
The SHmedia intruction set has (among others) an addition instruction
that does a 32->64 bit sign extension of the result, so again this
can be handled by the combiner.
I have some across some code that uses short or unsigned short basic
induction variables, though.
I've written some patches for the loop optimizer to pre-condition the loop
so that it stops at the end or at the signed overflow, whichever is earlier,
and then use an outer loop to handle the sign extend.
If vector addition is available, that is also used to do a zero-extending
increment of a value that has been pre-conditioned to be zero-extended.

> 2. To exploit the dual mode operation, for instructions that uses the 
> result of 
> explicit sign extensions we need to found if it is possible to get the 
> same result 
> via an instruction that doesn't require explicit sign extensions. 
> Basically we 
> need to found if: 
> 
> s1 = sign extend s
> t1 = sign extend  t
> result = inst s1, t1 
> 
> can be replaced by an instruction inst1:
> 
> result = inst1 s, t.
> 
> But this seems similar with what combine does, so the information 
> from the description file should suffice.

Yes.  Just write a testcase, start gdb on cc1, set a breakpoint at
combine_instructions, start cc1 on your testcase, look for the patterns
you want combined, then set a breakpoint at try_combine with a condition
that the uids of i2 and i3 are your two patterns.
Then step through it to see if there is any snag that prevents the pattern
from being combined, or if not, look what pattern it generates.
Than add a matching pattern to your machine description.

> 3. One possible way for implementation is to use reaching definitions
> to propagate the sign extensions forward right before the uses. This will 
> create 
> opportunities for combine and gcse to do the rest of the work afterward.

Do you mean putting the sign extended values into new pseudo registers?
That seems to have about as much potential for harm as good, since it
can leave you with extra register-register copies, and you might
loose strength reduction unless you change the loop optimizer to grok
these new copies too.

> Another possible way is to extend gcse (but there are some issues that 
> we still need to clarify).

gcse works by computing the values in separate pseudos, thus creating new
register-register copies as discussed above.
> 
> Maybe there is a way to use your code (or part of it) ? 

Would you like a unidiff of all our patches against
gcc 3.4.0 20040414 (prerelease) ?
It's 736615 bytes raw, or 216022 bytes gzipped & uuencoded.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Exploiting dual mode operation
  2004-04-26 16:57     ` Joern Rennecke
@ 2004-04-29 14:26       ` Mircea Namolaru
  0 siblings, 0 replies; 8+ messages in thread
From: Mircea Namolaru @ 2004-04-29 14:26 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: gcc, Joern Rennecke, Leehod Baruch

> Would you like a unidiff of all our patches against
> gcc 3.4.0 20040414 (prerelease) ?
> It's 736615 bytes raw, or 216022 bytes gzipped & uuencoded.

We are looking at the patches submitted by you to FSF.

Mircea and Leehod

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Exploiting dual mode operation
@ 2004-06-07 17:17 Mircea Namolaru
  0 siblings, 0 replies; 8+ messages in thread
From: Mircea Namolaru @ 2004-06-07 17:17 UTC (permalink / raw)
  To: gcc; +Cc: joern.rennecke, Leehod Baruch

Hello,

Following our message ( http://gcc.gnu.org/ml/gcc/2004-04/msg00970.html )
regarding expoiting dual mode operations, we enclose an overview of our 
algorithm.

In order to support 32 bit computations on a 64 bit machine, sign 
extension
instructions are generated to ensure the correctness of the computation.
A possible policy (currently implemented in gcc) is to generate a sign
extension after each 32 bit computation. Depending on the instruction set 
of
the architecture some of these sign extension instructions may be 
redundant.

There are two cases:

Case1: 
The instruction using the 64bit operands (after they are sign-extended)
has a dual mode that works with 32bit operands.

For example:

int32 a, b;

a = ....                               a = ....
a = sign extend a          <-->
b = ....                               b = ....
b = sign extend a          <-->

cmpd a, b                              cmpw a, b  //half word compare

Case2:
The instruction defining the 64bit operand (which is later sign-extended)
has a dual mode that defines and sign-extends a 32bit operand.

For example:

int32 a;

ld a                                   lwa a     // load half and sign 
extend
a = sign extend a          <-->

return a                               return a


We'll present the algorithm on the following example:

Example:

We have two definitions of int32, and multiple uses as below:
Source:

                         def1           def2
                          | \           / |
                          |  \         /  |
                          |   \       /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                        use1    use12    use2

Assume that only a single sign extend (se) instruction is needed on the 
path
between def1 and use12 - for all other paths, the se can be combined with
either the def or the use.

0. Initial code, as currently generated by gcc.
   (There are sign extension after each definition)

                         def1           def2
                          se             se
                          | \           / |
                          |  \         /  |
                          |   \       /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                        use1    use12    use2

1. Combine
1.a prepare for combine 
   (Redundant) sign extensions generated also before uses.

                         def1           def2
                          se             se
                          | \           / |
                          |  \         /  |
                          |   \       /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                         se      se       se
                        use1    use12    use2


1.b combine
   - Combine tries to merge se's with defs and with uses.
   Assume it succeeds for def2 and use1:

                         def1           def2
                          se            [se removed]
                          | \           / |
                          |  \         /  |
                          |   \       /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                   [se removed]  se       se
                        use1    use12    use2

2. PRE
2.a  prepare for PRE
  - if combine did not remove an se after a def (as in def1), we remove it 
now
  (recall that it is redundant) - PRE should compute an optimal placement 
of
  se's which may or may not include this after-the-def location.
  - if combine did remove an se after a def (as in def2), we want PRE to 
know
  that the new def contains an se, so that it will remove other redundant 
se's.
  So we regenerate the se explicitly, and remove it eventually.

  - Note that if combine removed an se before a use, this use no longer 
requires
  an se, so PRE needs not consider this use any longer (and hence we do 
not
  regenerate the se).

                         def1           def2
                     [se removed]        se [regenerated]
                          | \           / |
                          |  \         /  |
                          |   \       /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                                 se       se
                        use1    use12    use2

2.b PRE
   An optimal placement is:

                         def1           def2
                                         se [regenerated]
                          | \           / |
                          |  \         /  |
                          |   se      /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                        use1    use12    use2

4. Cleanup
   - The regenerated sign extension after def2 is removed.

                         def1           def2
                                     [se removed] 
                          | \           / |
                          |  \         /  |
                          |   se      /   |
                          |    \     /    |
                          |     \   /     |
                          |      \ /      |
                        use1    use12    use2

Finally we got the best placement for the sign extensions as required.

We will soon post part of code for review. 

Comments welcomed. Thanks,

Mircea and Leehod


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Exploiting dual mode operation
  2004-04-21 14:33 Mircea Namolaru
@ 2004-04-21 18:42 ` Joern Rennecke
  0 siblings, 0 replies; 8+ messages in thread
From: Joern Rennecke @ 2004-04-21 18:42 UTC (permalink / raw)
  To: Mircea Namolaru; +Cc: gcc, Leehod Baruch

> We are working on an optimization for eliminating redundant sign extension
> instructions by making use of this duality. Currently this is done only in 
> very 
> simple cases.

I have already submitted a patch for this:
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg03259.html

It is still waiting for review.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Exploiting dual mode operation
@ 2004-04-21 14:33 Mircea Namolaru
  2004-04-21 18:42 ` Joern Rennecke
  0 siblings, 1 reply; 8+ messages in thread
From: Mircea Namolaru @ 2004-04-21 14:33 UTC (permalink / raw)
  To: gcc; +Cc: Leehod Baruch

Hello,

For a 64-bit architecture, values defined as 32-bit values in a program 
must
be sign extended (or zero extended, in the case of unsigned values) to 
make 
them 64-bit values.
Some 64-bit architectures support certain operations in both 64-bit and 
32-bit form.
We are working on an optimization for eliminating redundant sign extension
instructions by making use of this duality. Currently this is done only in 
very 
simple cases.

For example in the following code there are explicit sign extension
instructions that could be eliminated :

i2 = extsw i1      //  i1, j1 are 32-bit integers in 64-bits architecture.
j2 = extsw j1
.......
cmpd i2, j2        // This compare is for double word (64-bits).

But, if a half-word compare instruction is available, we could use it and
remove the sign extensions:

cmpw i1, j1        // This compare is for single word (32-bits).

The sign extension instructions are generated after each definition of a
32-bit integer. By examining all the uses of a sign extension instruction
we can decide whether it is redundent or not.

Comments welcome. Thanks,

Leehod and Mircea

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-06-07 17:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-22 13:58 Exploiting dual mode operation Leehod Baruch
2004-04-22 14:59 ` Joern Rennecke
2004-04-26 16:41   ` Mircea Namolaru
2004-04-26 16:57     ` Joern Rennecke
2004-04-29 14:26       ` Mircea Namolaru
  -- strict thread matches above, loose matches on Subject: below --
2004-06-07 17:17 Mircea Namolaru
2004-04-21 14:33 Mircea Namolaru
2004-04-21 18:42 ` Joern Rennecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).