public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* paradoxical subreg problem
@ 2002-01-28 11:36 law
  2002-01-28 11:50 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: law @ 2002-01-28 11:36 UTC (permalink / raw)
  To: gcc


What a rats nest.  This relates of PRs 5169, 5185 and 5264.

I'll start simple.  On a big endian machine, can this expression be optimized
into true/false at compile time, or must it be run-time computed?

(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0)
    (mem/s:SI (plus:SI (reg:SI 3 %r3)
            (const_int 12 [0xc])) 1))


Think very very carefully about the semantics of a paradoxical subreg.

According to my reading, the compiler is allowed to optimize the expression
into (true) because the bits outside of QImode on the subreg are "don't care
bits" -- meaning they can have any value that is convenient to us.

Agree/Disagree?

Now consider if byte loads zero extend.  Does your answer change?  In the
subreg arm, those "don't care" bits, have a well defined meaning -- ie, we
can't pretend they have whatever value is convenient for us.  So, unless
we have some more specific knowledge about the other arm, then this
expression must be evaluated at runtime.

As you can probably guess, the compiler evaluates the expression at compile
time, which is wrong.


Can someone explain to me clearly what the semantics of a paradoxical
subreg really are?


jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 11:36 paradoxical subreg problem law
@ 2002-01-28 11:50 ` Richard Henderson
  2002-01-28 11:58   ` law
  2002-01-28 12:15 ` Michael Matz
  2002-01-28 15:41 ` Jim Wilson
  2 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2002-01-28 11:50 UTC (permalink / raw)
  To: law; +Cc: gcc

On Mon, Jan 28, 2002 at 11:31:16AM -0700, law@redhat.com wrote:
> I'll start simple.  On a big endian machine, can this expression be optimized
> into true/false at compile time, or must it be run-time computed?
> 
> (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
>                 (const_int 15 [0xf])) 1) 0)
>     (mem/s:SI (plus:SI (reg:SI 3 %r3)
>             (const_int 12 [0xc])) 1))
> 
> 
> Think very very carefully about the semantics of a paradoxical subreg.

I'll remind you that we should only have subregs of memory during
reload (while pseudos are being replaced), and even then we don't
create paradoxical subregs of memory.

So this is a "shouldn't happen".


r~

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 11:50 ` Richard Henderson
@ 2002-01-28 11:58   ` law
  2002-01-28 12:05     ` Richard Henderson
  0 siblings, 1 reply; 32+ messages in thread
From: law @ 2002-01-28 11:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

In message <20020128105031.D25844@redhat.com>, Richard Henderson writes:
 > On Mon, Jan 28, 2002 at 11:31:16AM -0700, law@redhat.com wrote:
 > > I'll start simple.  On a big endian machine, can this expression be optimi
 > zed
 > > into true/false at compile time, or must it be run-time computed?
 > > 
 > > (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
 > >                 (const_int 15 [0xf])) 1) 0)
 > >     (mem/s:SI (plus:SI (reg:SI 3 %r3)
 > >             (const_int 12 [0xc])) 1))
 > > 
 > > 
 > > Think very very carefully about the semantics of a paradoxical subreg.
 > 
 > I'll remind you that we should only have subregs of memory during
 > reload (while pseudos are being replaced), and even then we don't
 > create paradoxical subregs of memory.
 > 
 > So this is a "shouldn't happen"
More correctly, they're allowed between combine and reload inclusively.  And
it's the combiner that is creating the paradoxical, then incorrectly
optimizing away a necessary comparison.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 11:58   ` law
@ 2002-01-28 12:05     ` Richard Henderson
  2002-01-28 12:09       ` law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2002-01-28 12:05 UTC (permalink / raw)
  To: law; +Cc: gcc

On Mon, Jan 28, 2002 at 11:59:37AM -0700, law@redhat.com wrote:
> More correctly, they're allowed between combine and reload inclusively.

I don't think so.

> And it's the combiner that is creating the paradoxical,

That would be a bug.  A failure to call simplify_subreg or
simplify_gen_subreg.


r~

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:05     ` Richard Henderson
@ 2002-01-28 12:09       ` law
  2002-01-28 12:39         ` Richard Henderson
  0 siblings, 1 reply; 32+ messages in thread
From: law @ 2002-01-28 12:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

In message <20020128112330.A25997@redhat.com>, Richard Henderson writes:
 > On Mon, Jan 28, 2002 at 11:59:37AM -0700, law@redhat.com wrote:
 > > More correctly, they're allowed between combine and reload inclusively.
 > 
 > I don't think so.
The documentation has allowed this for years.  From rtl.texi:


@cindex combiner pass
@cindex reload pass
@cindex @code{subreg}, special reload handling
Between the combiner pass and the reload pass, it is possible to have a
paradoxical @code{subreg} which contains a @code{mem} instead of a
@code{reg} as its first operand.  After the reload pass, it is also
possible to have a non-paradoxical @code{subreg} which contains a
@code{mem}; this usually occurs when the @code{mem} is a stack slot
which replaced a pseudo register.

 > > And it's the combiner that is creating the paradoxical,
 > 
 > That would be a bug.  A failure to call simplify_subreg or
 > simplify_gen_subreg.
I disagree given the above docs.

jeff
 > 
 > 
 > r~


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 11:36 paradoxical subreg problem law
  2002-01-28 11:50 ` Richard Henderson
@ 2002-01-28 12:15 ` Michael Matz
  2002-01-28 15:41 ` Jim Wilson
  2 siblings, 0 replies; 32+ messages in thread
From: Michael Matz @ 2002-01-28 12:15 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: gcc

Hi,

On Mon, 28 Jan 2002 law@redhat.com wrote:

> I'll start simple.  On a big endian machine, can this expression be optimized
> into true/false at compile time, or must it be run-time computed?
>
> (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
>                 (const_int 15 [0xf])) 1) 0)
>     (mem/s:SI (plus:SI (reg:SI 3 %r3)
>             (const_int 12 [0xc])) 1))
>
> Think very very carefully about the semantics of a paradoxical subreg.
>
> According to my reading, the compiler is allowed to optimize the expression
> into (true) because the bits outside of QImode on the subreg are "don't care
> bits" -- meaning they can have any value that is convenient to us.
>
> Agree/Disagree?

Disagree.  Richard H. pointed out that paradoxical subreg of mem are
normally not allowed, so this shouldn't happen, and Richard K. later
transformed it into what reload would do to this, and concludes that this
optimization is valid.  I disagree to the latter for deeper reasons:

The non-touched bit's of paradoxical subregs (be they from mem or from
pseudos) are _not_ "don't care".  They are "we don't know", i.e.
undefined.  Any optimizations making use of knowledge of the content of
those bits must be invalid.  In some very constrained circumstances we
might relax that (e.g. when we know that byte-loading will zero-extend).
In those cases we of course shouldn't simply rely on
  (use (subreg:SI (reg:QI x) 0) ...) doing the right thing, but instead
make that knowledge explicit, e.g. by
  (use (zero_extend:SI (reg:QI x))

I.e. replace paradoxical subregs by known operations, and if we can't, due
to lack of knowledge or other reasons, inhibit any optimizations referring
to the undefined parts.  The documentation seems to be misleading: it
talks about forming subregs, when the code doesn't care about the other
bits.  If that is correct combine shouldn't have produced that subreg of
mem, because clearly we _do_ care about those other bits in the compare.
OTOH once such a subreg is produced we can't deduce anything anymore.

So this is twofold: may be it was too difficult to prove that we don't
care about those bits (cf. combine), so such subregs are produced in the
hope to get better code later.  Then we should deactivate optimizations
making use of knowledge about undefined bits (i.e. when we later see, that
we do care nontheless).

Or we could get rid of paradoxical subregs ;-)  They are so paradoxical.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:09       ` law
@ 2002-01-28 12:39         ` Richard Henderson
  2002-01-28 13:03           ` law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2002-01-28 12:39 UTC (permalink / raw)
  To: law; +Cc: gcc

On Mon, Jan 28, 2002 at 12:26:13PM -0700, law@redhat.com wrote:
>  > That would be a bug.  A failure to call simplify_subreg or
>  > simplify_gen_subreg.
> I disagree given the above docs.

Given the problem you're seeing, wouldn't it be better to 
disallow it anyway and fix the docs?  I can't imagine what
good you could do by allowing such a thing.


r~

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:39         ` Richard Henderson
@ 2002-01-28 13:03           ` law
  2002-01-28 13:18             ` Richard Henderson
  2002-01-28 14:25             ` Michael Matz
  0 siblings, 2 replies; 32+ messages in thread
From: law @ 2002-01-28 13:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

In message <20020128115045.A26034@redhat.com>, Richard Henderson writes:
 > On Mon, Jan 28, 2002 at 12:26:13PM -0700, law@redhat.com wrote:
 > >  > That would be a bug.  A failure to call simplify_subreg or
 > >  > simplify_gen_subreg.
 > > I disagree given the above docs.
 > 
 > Given the problem you're seeing, wouldn't it be better to 
 > disallow it anyway and fix the docs?  I can't imagine what
 > good you could do by allowing such a thing.
Possibly -- I don't even known why it's allowed in this case.  There
may be a good reason for it, then again, there may not.  My gut
instinct is to disallow it, given the semantic issues that arise.

However, even if we do that I think we need to clarify precisely what
the semantics of a paradoxical subreg really are.    Clearly there is
some confusion.

[ Actually my preference would be to zap paradoxical subregs, but I
  don't think we're in a position to do that right now. ]

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 13:03           ` law
@ 2002-01-28 13:18             ` Richard Henderson
  2002-01-28 14:25             ` Michael Matz
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Henderson @ 2002-01-28 13:18 UTC (permalink / raw)
  To: law; +Cc: gcc

On Mon, Jan 28, 2002 at 01:03:20PM -0700, law@redhat.com wrote:
> However, even if we do that I think we need to clarify precisely what
> the semantics of a paradoxical subreg really are.    Clearly there is
> some confusion.

With plain registers, I don't think there is any confusion -- the
upper bits are simply undefined.  With memories, there is confusion.
But I think we should "clarify" by disallowing them.

> [ Actually my preference would be to zap paradoxical subregs, but I
>   don't think we're in a position to do that right now. ]

Nope.


r~

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 13:03           ` law
  2002-01-28 13:18             ` Richard Henderson
@ 2002-01-28 14:25             ` Michael Matz
  1 sibling, 0 replies; 32+ messages in thread
From: Michael Matz @ 2002-01-28 14:25 UTC (permalink / raw)
  To: law; +Cc: Richard Henderson, gcc

Ho,

On Mon, 28 Jan 2002 law@redhat.com wrote:

> However, even if we do that I think we need to clarify precisely what
> the semantics of a paradoxical subreg really are.    Clearly there is
> some confusion.

Well, para. subregs introduce undefined bits.  From that everything
follows.  They can only be introduced if the code doesn't care about that
undefinedness (IMHO that is what the docu talks about).
If they are introduced there exists no knowledge about the undefined bits.
Under this rule alone para. subregs of mem's are not disallowed.  In fact
all are handled the same.  (mem subregs might be (and are) disallowed for
other reasons)

Btw. (and:SI (subreg:SI (reg:QI X) 0) 255) and (reg:QI X) are in that
definition equal.  The subreg first introduces undefinedness just to throw
it away again with the 'and'.  Problems only arise if that subreg is
copied to a (SImode) pseudo.  That pseudo still carries that partial
undefinedness property, which is what is problematic about them, because
we have no real way to note that.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 11:36 paradoxical subreg problem law
  2002-01-28 11:50 ` Richard Henderson
  2002-01-28 12:15 ` Michael Matz
@ 2002-01-28 15:41 ` Jim Wilson
  2002-01-29  1:57   ` Jim Wilson
                     ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Jim Wilson @ 2002-01-28 15:41 UTC (permalink / raw)
  To: law; +Cc: gcc

In article <16350.1012242676@porcupine.cygnus.com> you write:
>Think very very carefully about the semantics of a paradoxical subreg.

A paradoxical subreg can have one of two different meanings, depending on
context.

1) The extra bits are don't care bits.  This usage can occur anywhere, and
   is primarily used when manipulating values larger than the word size.
   This perhaps requires that the operand be a register, but I'm not sure.
   For instance, on a 32-bit target, given (subreg:DI (reg:SI 100)), the
   extra 32-bits are don't care bits.
2) The extra bits are known to be zero or one, depending on whether loads
   zero or sign extend by default.  This is only used for values smaller than
   or equal to the the word size, and I think this also requires that the
   operand be a MEM, but I'm not positive about the last condition.  For
   instance, on a 32-bit target with zero-extending loads, given
   (subreg:SI (mem:QI ...)), the extra 24-bits are known to be zero.  This
   usage occurs only between combine and reload.

It has been this way for over a decade I'd say.

The first meaning came first, and has obvious uses.  I believe the second
meaning arose because of our primarily CISC oriented view at the time.
Since predicates know about subregs already, using subregs for zero/sign
extended loads means that they would be accepted automatically.  If we
added the explicit zero/sign extension operators, then a lot of predicates
and patterns would have to be modified to allow zero/sign extension operators.
Nowadays, the prevailing wisdom is that you should never create an insn
that will require a reload, and thus it may be that the second type of
paradoxical subreg is no longer useful.  We already have the problem that
combine.c and recog.c have special code checking INSN_SCHEDULING to get rid
of the second class of paradoxical subregs.  This code is unclean though,
as we never should have created these paradoxical subregs on a load/store
(RISC) machine to begin with.  If the patterns in the md file don't allow
MEM, then recog should not be accepting (subreg (mem)) as a register operand.

There might be a second justification that this feature allows for better
optimization, by exposing zero/sign extensions that would otherwise be
hidden.  However, I don't think this is convincing.  Combine has code to
keep track of sign-bit copies now, and we can do this just as well by using
explicit zero_extend/sign_extend operators instead of using subregs.

Jim





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 15:41 ` Jim Wilson
@ 2002-01-29  1:57   ` Jim Wilson
  2002-01-29 10:55   ` Richard Earnshaw
  2002-01-29 12:21   ` paradoxical subreg problem [ patch attached ] law
  2 siblings, 0 replies; 32+ messages in thread
From: Jim Wilson @ 2002-01-29  1:57 UTC (permalink / raw)
  To: gcc; +Cc: law

In article <200201282147.NAA19688@rtl.cygnus.com> you write:
>In article <16350.1012242676@porcupine.cygnus.com> you write:
>2) The extra bits are known to be zero or one, depending on whether loads
>   zero or sign extend by default.  This is only used for values smaller than
>   or equal to the the word size, and I think this also requires that the
>   operand be a MEM, but I'm not positive about the last condition.  For
>   instance, on a 32-bit target with zero-extending loads, given
>   (subreg:SI (mem:QI ...)), the extra 24-bits are known to be zero.  This
>   usage occurs only between combine and reload.

I left out an important piece.  This is only true if WORD_REGISTER_OPERATIONS
is defined.  If WORD_REGISTER_OPERATIONS is not defined, then the upper
24 bits of (subreg:SI (mem:QI ...)) are undefined.  A QImode load on a CISC
tends to modify the low 8-bits of the register, and leave the upper 24-bit
unchanged, which means they have no useful value.  If WORD_REGISTER_OPERATIONS
is defined, then the upper 24 bits of (subreg:SI (mem:QI ...)) are defined,
and are known to be zero or one depending on the value of LOAD_EXTEND_OP.
A QImode load on a RISC tends to set the entire 32-bits of a register, so we
want to use that knowledge to optimize away unnecessary zero/sign extend
instructions.  Except that these paradoxical subregs prevent us from
properly scheduling memory operations, so we try to avoid recognizing them when
INSN_SCHEDULING, which partially defeats the purpose of creating them when
WORD_REGISTER_OPERATIONS is true, since most any target that has one of
these defined has the other one defined too.  They still are created and
used inside combine though, and help with some simplications inside combine,
but I think we would get the same effect by using explicit zero/sign extend
operators.

Searching combine.c for WORD_REGISTER_OPERATIONS shows that it does some
interesting things, some of them in conjunction with LOAD_EXTEND_OP.

Curiously, I just noticed that the INSN_SCHEDULING code in combine.c looks
wrong.  It has:
          /* If *SPLIT is a paradoxical SUBREG, when we split it, it should
             be written as a ZERO_EXTEND.  */
          if (split_code == SUBREG && GET_CODE (SUBREG_REG (*split)) == MEM)
            SUBST (*split, gen_rtx_ZERO_EXTEND  (split_mode,
                                                 SUBREG_REG (*split)));
I think it should be using LOAD_EXTEND_OP instead of assuming a ZERO_EXTEND,
but that code has been there for about 10 years, so if it was wrong, we
should have noticed long ago.

>(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
>                (const_int 15 [0xf])) 1) 0)
>    (mem/s:SI (plus:SI (reg:SI 3 %r3)
>            (const_int 12 [0xc])) 1))

If the word size is 32-bits or more, and WORD_REGISTER_OPERATIONS is defined,
and LOAD_EXTEND_OP is defined, then this is a 32-bit compare.  It can not
be simplified at compile time unless we know the values in the memory locations
at offsets 12, 13, and 14, and know that these values match the value of
the byte at offset 15 extended as per LOAD_EXTEND_OP.

>Can someone explain to me clearly what the semantics of a paradoxical
>subreg really are?

No, because the use of subregs has gotten too complicated over time.
They are used for more than one purpose, and have more than one meaning,
depending on the context, and depending on the value of some target macros.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 15:41 ` Jim Wilson
  2002-01-29  1:57   ` Jim Wilson
@ 2002-01-29 10:55   ` Richard Earnshaw
  2002-01-29 12:21   ` paradoxical subreg problem [ patch attached ] law
  2 siblings, 0 replies; 32+ messages in thread
From: Richard Earnshaw @ 2002-01-29 10:55 UTC (permalink / raw)
  To: Jim Wilson; +Cc: law, gcc, Richard.Earnshaw

> In article <16350.1012242676@porcupine.cygnus.com> you write:
> >Think very very carefully about the semantics of a paradoxical subreg.
> 
> A paradoxical subreg can have one of two different meanings, depending on
> context.
> 
> 1) The extra bits are don't care bits.  This usage can occur anywhere, and
>    is primarily used when manipulating values larger than the word size.
>    This perhaps requires that the operand be a register, but I'm not sure.
>    For instance, on a 32-bit target, given (subreg:DI (reg:SI 100)), the
>    extra 32-bits are don't care bits.
> 2) The extra bits are known to be zero or one, depending on whether loads
>    zero or sign extend by default.  This is only used for values smaller than
>    or equal to the the word size, and I think this also requires that the
>    operand be a MEM, but I'm not positive about the last condition.  For
>    instance, on a 32-bit target with zero-extending loads, given
>    (subreg:SI (mem:QI ...)), the extra 24-bits are known to be zero.  This
>    usage occurs only between combine and reload.
> 
> It has been this way for over a decade I'd say.
> 
> The first meaning came first, and has obvious uses.  I believe the second
> meaning arose because of our primarily CISC oriented view at the time.
> Since predicates know about subregs already, using subregs for zero/sign
> extended loads means that they would be accepted automatically.  If we
> added the explicit zero/sign extension operators, then a lot of predicates
> and patterns would have to be modified to allow zero/sign extension operators.
> Nowadays, the prevailing wisdom is that you should never create an insn
> that will require a reload, and thus it may be that the second type of
> paradoxical subreg is no longer useful.  We already have the problem that
> combine.c and recog.c have special code checking INSN_SCHEDULING to get rid
> of the second class of paradoxical subregs.  This code is unclean though,
> as we never should have created these paradoxical subregs on a load/store
> (RISC) machine to begin with.  If the patterns in the md file don't allow
> MEM, then recog should not be accepting (subreg (mem)) as a register operand.
> 
> There might be a second justification that this feature allows for better
> optimization, by exposing zero/sign extensions that would otherwise be
> hidden.  However, I don't think this is convincing.  Combine has code to
> keep track of sign-bit copies now, and we can do this just as well by using
> explicit zero_extend/sign_extend operators instead of using subregs.

Yep, that summarizes my understanding as well.

Most of this rubbish seems to come about because register_operand accepts 
subreg(mem) as a valid register.  This has always seemed daft to me, and 
indeed the arm back-end hardly ever uses register_operand because of this 
(it seems to generate worse code by forcing spills during reload that 
would otherwise have been caught during general register allocation).

R.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem  [ patch attached ]
  2002-01-28 15:41 ` Jim Wilson
  2002-01-29  1:57   ` Jim Wilson
  2002-01-29 10:55   ` Richard Earnshaw
@ 2002-01-29 12:21   ` law
  2002-01-29 13:46     ` Richard Earnshaw
  2002-01-29 16:41     ` Jim Wilson
  2 siblings, 2 replies; 32+ messages in thread
From: law @ 2002-01-29 12:21 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc, gcc-patches

 In message <200201282147.NAA19688@rtl.cygnus.com>, Jim Wilson writes:
 > 1) The extra bits are don't care bits.  This usage can occur anywhere, and
 >    is primarily used when manipulating values larger than the word size.
 >    This perhaps requires that the operand be a register, but I'm not sure.
 >    For instance, on a 32-bit target, given (subreg:DI (reg:SI 100)), the
 >    extra 32-bits are don't care bits.
OK.  This is basically I've always understood a paradoxical subreg to mean;
I don't recall a requirement that this only apply to registers, but that
restriction makes a fair amount of sense.


 > 2) The extra bits are known to be zero or one, depending on whether loads
 >    zero or sign extend by default.  This is only used for values smaller 
than
 >    or equal to the the word size, and I think this also requires that the
 >    operand be a MEM, but I'm not positive about the last condition.  For
 >    instance, on a 32-bit target with zero-extending loads, given
 >    (subreg:SI (mem:QI ...)), the extra 24-bits are known to be zero.  This
 >    usage occurs only between combine and reload.
This one is new to me, but appears to correspond to parts of combine that
I reviewed yesterday.   Based on my reading of combine, I suspect this 
requires a MEM.  And as pointed out later, it also requires LOAD_EXTEND_OP --
else the bits are undefined.

[ ... ]
Thanks.  This was extremely helpful in working through this issue.




So let's pretend we're in try_combine with the following:

[ Note the current sources will not exhibit this problem due to unrelated
  changes in how we order arguments for a comparison.  However, the
  underlying bug is very real. ]

(gdb) p debug_rtx(i3)
(jump_insn 83 82 92 (set (pc)
        (if_then_else (eq (reg:SI 107)
                (reg/v:SI 94))
            (label_ref 45)
            (pc))) 44 {absdi2+5} (insn_list 82 (nil))
    (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_BR_PROB (const_int 3999 [0xf9f])
            (nil))))

(gdb) p debug_rtx(i2)
(insn 82 79 83 (set (reg:SI 107)
        (zero_extend:SI (subreg:QI (reg/v:SI 94) 0))) 135 {zero_extendqidi2-1} 
(insn_list 11 (nil))
    (nil))

(gdb) p debug_rtx(i1)
(insn 11 7 14 (set (reg/v:SI 94)
        (mem/s:SI (plus:SI (reg:SI 3 %r3)
                (const_int 12 [0xc])) 1)) 69 {pre_ldw-4} (nil)
    (nil))


The first thing we do is substitute the SET_SRC of i2 into i3 giving us

(jump_insn 83 82 92 (set (pc)
        (if_then_else (eq (zero_extend:SI (subreg:QI (reg/v:SI 94) 0))
                (reg/v:SI 94))
            (label_ref 45)
            (pc))) 44 {absdi2+5} (insn_list 82 (nil))
    (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_BR_PROB (const_int 3999 [0xf9f])
            (nil))))


i1 is unchanged at this point, and i2 would be deleted if the combination
is a success.  I think we would all agree this transformation has not changed
the meaning of the code.

The next step is to substitute the SET_SRC of i1 into i3.  This happens in
several steps as (reg 94) appears twice in i3.

The first step creates an insn which looks something like this:

(jump_insn 83 82 92 (set (pc)
        (if_then_else (eq (zero_extend:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                            (const_int 15 [0xf])) 1))
                (reg/v:SI 94))
            (label_ref 45)
            (pc))) 44 {absdi2+5} (insn_list 82 (nil))
    (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_BR_PROB (const_int 3999 [0xf9f])
            (nil))))


We then simplify the zero_extension into a subreg, which is OK given #2 in the
semantics Jim detailed.  The result of that simplification looks like:

(jump_insn 83 82 92 (set (pc)
        (if_then_else (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                            (const_int 15 [0xf])) 1) 0)
                (reg/v:SI 94))
            (label_ref 45)
            (pc))) 44 {absdi2+5} (insn_list 82 (nil))
    (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_BR_PROB (const_int 3999 [0xf9f])
            (nil))))


Now we proceed to replace the other occurrence of (reg 94) with its
memory location.  That results in the following insn:

(jump_insn 83 82 92 (set (pc)
        (if_then_else (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                            (const_int 15 [0xf])) 1) 0)
                (mem/s:SI (plus:SI (reg:SI 3 %r3)
                        (const_int 12 [0xc])) 1))
            (label_ref 45)
            (pc))) 44 {absdi2+5} (insn_list 82 (nil))
    (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_BR_PROB (const_int 3999 [0xf9f])
            (nil))))

Again, given Jim's semantics, we're still OK at this point.  We proceed to
call combine_simplify_rtx on the (eq ...) expression.  We simplify it to


(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0)
    (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0))

Which is wrong as the second operand of the EQ has been incorrectly changed.


The culprit is this code in combine.c::simplify_comparison:

[ Kenner, does this look familiar :-) ]

  /* Now make any compound operations involved in this comparison.  Then,
     check for an outmost SUBREG on OP0 that is not doing anything or is
     paradoxical.  The latter case can only occur when it is known that the
     "extra" bits will be zero.  Therefore, it is safe to remove the SUBREG.
     We can never remove a SUBREG for a non-equality comparison because the
     sign bit is in a different place in the underlying object.  */

  op0 = make_compound_operation (op0, op1 == const0_rtx ? COMPARE : SET);
  op1 = make_compound_operation (op1, SET);

  if (GET_CODE (op0) == SUBREG && subreg_lowpart_p (op0)
      && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
      && (code == NE || code == EQ)
      && ((GET_MODE_SIZE (GET_MODE (op0))
           > GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))))))
    {
      op0 = SUBREG_REG (op0);
      op1 = gen_lowpart_for_combine (GET_MODE (op0), op1);
    }

The compound operations look OK.

(gdb) p debug_rtx(op0)
(subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
            (const_int 15 [0xf])) 1) 0)

(gdb) p debug_rtx(op1)
(mem/s:SI (plus:SI (reg:SI 3 %r3)
        (const_int 12 [0xc])) 1)


However, the logic for removing the SUBREG seems wrong.

Consider that we're comparing op0 and op1 for equality -- both operands
are SImode, so we're doing a 32bit comparison.  Narrowing the operands
to a smaller mode effectively removes bits from consideration.  This 
transformation is valid if and only if we know those bits have identical
values for *both* operands.

[ Time to go band and re-review the semantics. ]

So it seems like there's three cases to consider.

  1. SUBREG_REG (op0)) is a register.  In this case the bits are don't
  care bits and we can assume they have any convenient value.  So
  making the transformation is safe.

  2. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is not defined.  In
  this case the upper bits of op0 are undefined.  It seems to me we should
  not make the simplification in that case as we do not know the contents
  of those bits.

  3. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is defined.  In that
  case we know those bits are zeros or ones; however we do not know if
  they are the same as the upper bits of op1.  It seems to me we should
  not make the simplification in this case.


The net result is this simplification should only be valid if
SUBREG_REG (op0) is a register.  

With that in mind, this patch fixes the problem.  I'm testing this in an
older tree which exhibits the bug -- I'll refrain from checking this
in until folks have had a chance to comment.


	* combine.c (simplify_comparison): Avoid narrowing a comparison
	with a paradoxical subreg when doing so would drop signficant bits.

Index: combine.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gcc/combine.c,v
retrieving revision 1.227.6.4
diff -c -3 -p -r1.227.6.4 combine.c
*** combine.c	2001/02/13 05:06:59	1.227.6.4
--- combine.c	2002/01/29 16:18:53
*************** simplify_comparison (code, pop0, pop1)
*** 10912,10917 ****
--- 10912,10918 ----
    op1 = make_compound_operation (op1, SET);
  
    if (GET_CODE (op0) == SUBREG && subreg_lowpart_p (op0)
+       && GET_CODE (SUBREG_REG (op0)) == REG
        && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
        && (code == NE || code == EQ)
        && ((GET_MODE_SIZE (GET_MODE (op0))


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem [ patch attached ]
  2002-01-29 12:21   ` paradoxical subreg problem [ patch attached ] law
@ 2002-01-29 13:46     ` Richard Earnshaw
  2002-01-29 16:41     ` Jim Wilson
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Earnshaw @ 2002-01-29 13:46 UTC (permalink / raw)
  To: law; +Cc: Jim Wilson, gcc, gcc-patches, Richard.Earnshaw


>   3. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is defined. 

And doesn't return NIL for the mode of op0.  The ARM port can do this for 
HImode, since there are variants of the ARM (older ones) that don't extend 
half-word memory fetches.

>  In that
>   case we know those bits are zeros or ones; however we do not know if
>   they are the same as the upper bits of op1.  It seems to me we should
>   not make the simplification in this case.

R.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem [ patch attached ]
  2002-01-29 12:21   ` paradoxical subreg problem [ patch attached ] law
  2002-01-29 13:46     ` Richard Earnshaw
@ 2002-01-29 16:41     ` Jim Wilson
  2002-01-29 18:08       ` law
  2002-01-31 13:22       ` law
  1 sibling, 2 replies; 32+ messages in thread
From: Jim Wilson @ 2002-01-29 16:41 UTC (permalink / raw)
  To: law; +Cc: gcc, gcc-patches

>[ Kenner, does this look familiar :-) ]

Kenner's original patch was added in 1991, a couple of months after he
completely rewrote combine.c.  There was a lot of change in combine.c at
the time, and it is very unlikely anyone will remember all of the details
of any one particular change.

Kenner's change is correct if you exclude (subreg (mem)).  The (subreg (mem))
meaning changed as part of Kenner's combine.c rewrite, as this is when
BYTE_LOADS_ZERO_EXTEND was introduced.  BYTE_LOADS_ZERO_EXTEND was the
forerunner of the current WORD_REGISTER_OPERATIONS and LOAD_EXTEND_OP macros.
I believe Kenner's change will correct again in the future after we get rid
of (subreg (mem)).  But for now, it is wrong.  It looks like a simple
oversight.  Or maybe BYTE_LOADS_ZERO_EXTEND was so new at the time that we
had not yet realized that the meaning of (subreg (mem)) had (accidentally?)
changed.  Or maybe the (subreg (mem)) confusion happened later.

>	* combine.c (simplify_comparison): Avoid narrowing a comparison
>	with a paradoxical subreg when doing so would drop signficant bits.

This patch looks reasonable to me.  Excluding (subreg (mem)) should do the
right thing.  You might want to add a comment pointing to LOAD_EXTEND_OP
and/or WORD_REGISTER_OPERATIONS so that this extra check can be deleted if/when
(subreg (mem)) goes away.

Long term, we should get rid of (subreg (mem)).  Looking at Alan Modra's
comments in PR 5169, it looks like the problem in this specific case is an
accidental bad interaction in combine RTL simplification.  We started with
a zero_extend, converted it to two shifts, converted them to an AND, and
then optimized away the AND because nonzero_bits knows that (subreg (mem))
has a special meaning, leaving us with (subreg (mem)) which is undesirable.
We should either stop expand_compound_operation from converting zero_extend
to subreg, which is what Alan Modra's patch does, or else we should fix
nonzero_bits to stop handling (subreg (mem)) specially.  The latter is risky
since it might result in lost optimizations.  The former seems reasonably safe,
but not as safe as your change, since Alan's patch changes how we canonicalize
RTL inside combine, but your patch only diasables a specific RTL rewrite that
is unsafe.  Alan's patch has the advantage that it might fix more instances
of this problem, since your patch only fixes one specific case.  But perhaps
this is the only specific case at the moment that is broken.  I think we
would want Alan's patch if/when we eliminate (subreg (mem)) though.  Or maybe
a revised version of Alan's patch that aborts if we think this RTL should
never be created at this point.  That depends on whether the nonzero_bits
handling of (subreg (mem)) is doing anything useful.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem [ patch attached ]
  2002-01-29 16:41     ` Jim Wilson
@ 2002-01-29 18:08       ` law
  2002-01-31 13:22       ` law
  1 sibling, 0 replies; 32+ messages in thread
From: law @ 2002-01-29 18:08 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc, gcc-patches

In message <200201292139.NAA31158@wilson.cygnus.com>, Jim Wilson writes:
 > >[ Kenner, does this look familiar :-) ]
 > 
 > Kenner's original patch was added in 1991, a couple of months after he
 > completely rewrote combine.c.  There was a lot of change in combine.c at
 > the time, and it is very unlikely anyone will remember all of the details
 > of any one particular change.
Just a note -- that was a little joke.  The joke being I asked Kenner if
he had any memory of that decade old code a few weeks ago that might help
analyzing this problem.


jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem [ patch attached ]
  2002-01-29 16:41     ` Jim Wilson
  2002-01-29 18:08       ` law
@ 2002-01-31 13:22       ` law
  1 sibling, 0 replies; 32+ messages in thread
From: law @ 2002-01-31 13:22 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc, gcc-patches

 In message <200201292139.NAA31158@wilson.cygnus.com>, Jim Wilson writes:
 > Kenner's change is correct if you exclude (subreg (mem)).
 [ ... ]
Agreed.

 > I believe Kenner's change will correct again in the future after we get rid
 > of (subreg (mem)).  But for now, it is wrong.  It looks like a simple
 > oversight.  Or maybe BYTE_LOADS_ZERO_EXTEND was so new at the time that we
 > had not yet realized that the meaning of (subreg (mem)) had (accidentally?)
 > changed.  Or maybe the (subreg (mem)) confusion happened later.
Agreed.

 > 
 > >	* combine.c (simplify_comparison): Avoid narrowing a comparison
 > >	with a paradoxical subreg when doing so would drop signficant bits.
 > 
 > This patch looks reasonable to me.  Excluding (subreg (mem)) should do the
 > right thing.  You might want to add a comment pointing to LOAD_EXTEND_OP
 > and/or WORD_REGISTER_OPERATIONS so that this extra check can be deleted
 > if/when (subreg (mem)) goes away.
Definitely needs comments.  Thanks for pointing that out.


 > Long term, we should get rid of (subreg (mem)).
I couldn't agree more.

 > Looking at Alan Modra's
 > comments in PR 5169, it looks like the problem in this specific case is an
 > accidental bad interaction in combine RTL simplification.

 > We started with
 > a zero_extend, converted it to two shifts, converted them to an AND, and
 > then optimized away the AND because nonzero_bits knows that (subreg (mem))
 > has a special meaning, leaving us with (subreg (mem)) which is undesirable.
Correct.  I skipped some of those conversions as they were completely
non-controversial (converting zero_extend into shifts, shifts into an AND)).

 > We should either stop expand_compound_operation from converting zero_extend
 > to subreg, which is what Alan Modra's patch does, or else we should fix
 > nonzero_bits to stop handling (subreg (mem)) specially.
I was nearly ready to twiddle nonzero_bits until we decided that the
the high bits in a paradoxical (subreg (mem)) were known to be zero/one
when LOAD_EXTEND_OP is defined.

As for Alan's change to avoid zero_extend->subreg conversion, I decided
against it for the same reason.  It's safe and correct given the semantics
for paradoxical (subreg (mem)).


 > is unsafe.  Alan's patch has the advantage that it might fix more instances
 > of this problem, since your patch only fixes one specific case.
Correct.  BUt as you say, it might be more risky in other ways.

 > But perhaps this is the only specific case at the moment that is broken.
Hard to know.

 > I think we would want Alan's patch if/when we eliminate (subreg (mem))
 > though.
Maybe -- if we get rid of (subreg (mem)), we would fix nonzero_bits and
num_sign_bit_copies at the same time.    I poked briefly at this and I'm
pretty sure fixing nonzero_bits would avoid the problem without the need
for Alan's patch.

So, the question becomes, which patch do we want to go with?  I'll go with
the majority on this one since there's no clear right/wrong answer.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 14:53 ` law
@ 2002-01-29 10:31   ` Richard Earnshaw
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Earnshaw @ 2002-01-29 10:31 UTC (permalink / raw)
  To: law; +Cc: Richard Kenner, gcc, Richard.Earnshaw

> In message <10201281935.AA25716@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
>  >     Don't assume you can break it into two expressions.  Consider the
>  >     expression as it stands (and as combine creates it).
>  > 
>  > Sure, but I'm trying to define what it means by comparison with
>  > two expressions.
>  > 
>  >     So with your assertions in mind are these two expresions equivalent?
>  > 
>  >     (and:SI (subreg:SI (mem:QI) 0) (const_int 255))
>  > 
>  >     (subreg:SI (mem:QI X) 0)
> 
> Are you sure?
> 
> Are the bits outside the mode of SUBREG_REG undefined or "don't care"?  That
> is the crux of the issue.
> 
> If those bits are undefined, then those expressions are not equivalent as
> the result of the first expression has 24 zeros in its high bits whereas
> the second expression has 24 undefined bits.  If those bits are "don't care"
> then the compiler can treat those expressions as equivalent.
> 

I was under the impression that we would only create paradoxical subregs 
of a mem if LOAD_EXTEND_OP returns either ZERO_EXTEND or SIGN_EXTEND for 
the mode of the mem (ie, we won't create such paradoxical subregs if 
LOAD_EXTEND_OP returns NIL).  That is, the operation is an alternate way 
of expressing the extend operation, because that is what will happen in 
practice.

Given the above, then on such machines

	(subreg:SI (mem:QI X) 0)

is defined for all 32 bits.

R.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 15:20   ` law
@ 2002-01-28 15:50     ` Geoff Keating
  0 siblings, 0 replies; 32+ messages in thread
From: Geoff Keating @ 2002-01-28 15:50 UTC (permalink / raw)
  To: law; +Cc: gcc

law@redhat.com writes:

> In message <200201282037.PAA24126@makai.watson.ibm.com>, David Edelsohn writes:
>  > >>>>> Richard Kenner writes:
>  > 
>  > Richard> Let me rephrase: why are we genering this if we don't know what tho
>  > se
>  > Richard> bits are?
>  > 
>  > 	I agree with Kenner.  There are two things we can discuss: should
>  > this ever be generated and should this be generated in this particular
>  > instance causing problems.  Regardless whether this compare result can be
>  > determined at compile-time or run-time, if the bits are undefined, why is
>  > GCC generating an expression that relies on undefined bits?
> Kenner's initial claim was not that they were undefined, but that they were
> bits we could pretend had any value that was interesting to us (don't care).
> If you read the section on paradoxical subregs, this is what it implies.
> 
> I think we really need to clarify if the bits are "don't care" or "undefined";
> once that's settled it shouldn't be terribly difficult to deal with combine
> to make it follow those semantics.
> 
> I think the semantics should be "undefined".

I think you're using the word "undefined" in a confusing way.  You
probably mean "opaque".  If the expression (subreg:SI (mem:QI ...))
had no defined value in the high bits, then the generator of the
expression should know this, and so presumably the value of the bits
doesn't affect the computation and the rest of the compiler can pick
any convenient value.

In this particular case, though, IMO either way the bug is in the
generator of the RTL.  We know that the bits will actually be zero,
and so either we rely on this and a ZERO_EXTEND should be used, or we
don't care what the bits will be, in which case the SUBREG is valid no
matter which definition of it we choose.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 15:21 Richard Kenner
@ 2002-01-28 15:41 ` law
  0 siblings, 0 replies; 32+ messages in thread
From: law @ 2002-01-28 15:41 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

In message <10201282114.AA26279@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >     Kenner's initial claim was not that they were undefined, but that they
 >     were bits we could pretend had any value that was interesting to us
 >     (don't care).  If you read the section on paradoxical subregs, this is
 >     what it implies.
 > 
 > I must say I don't understand the distinction between those two cases.
In the don't care case, you can assume they have any value you want and
make optimizations based on that.  Going back to the original comparison:

(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0)
    (mem/s:SI (plus:SI (reg:SI 3 %r3)
            (const_int 12 [0xc])) 1))

The ability to assume that the upper 24 bits in the first operand have any
convenient value (don't care) allows you do equate both arms, creating
an expression that optimizes into a compile-time constant.  This is what
combine does right now.  [ ie, you are allowed to assume that the upper
bits in the first operand match the upper bits in the second operand].

If the upper 24 bits in the first operand are undefined, then we can make
*no* assumptions about what value they might have.  Thus we can't optimize
away the comparison.

Or to look at the second example.  Given these two expressions:

(and:SI (subreg:SI (mem:QI) 0) (const_int 255))

(subreg:SI (mem:QI X) 0)

If the semantics are "don't care", then we can assume the upper bits of
the second expression are all zero and the two expressions are equivalent
(ie, they can be used interchangably).

However, if the semantics are "undefined", then the second expression's
upper 24bits are undefined whereas the first expressions upper bits are known
to have the value 0.  Which means the expressions can not be used 
interchangably.


jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
@ 2002-01-28 15:21 Richard Kenner
  2002-01-28 15:41 ` law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2002-01-28 15:21 UTC (permalink / raw)
  To: law; +Cc: gcc

    Kenner's initial claim was not that they were undefined, but that they
    were bits we could pretend had any value that was interesting to us
    (don't care).  If you read the section on paradoxical subregs, this is
    what it implies.

I must say I don't understand the distinction between those two cases.

But, as I said, I think the real issue here is why this is being emitted when
we don't know for sure what those bits are.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 13:46 ` David Edelsohn
@ 2002-01-28 15:20   ` law
  2002-01-28 15:50     ` Geoff Keating
  0 siblings, 1 reply; 32+ messages in thread
From: law @ 2002-01-28 15:20 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Kenner, gcc

In message <200201282037.PAA24126@makai.watson.ibm.com>, David Edelsohn writes:
 > >>>>> Richard Kenner writes:
 > 
 > Richard> Let me rephrase: why are we genering this if we don't know what tho
 > se
 > Richard> bits are?
 > 
 > 	I agree with Kenner.  There are two things we can discuss: should
 > this ever be generated and should this be generated in this particular
 > instance causing problems.  Regardless whether this compare result can be
 > determined at compile-time or run-time, if the bits are undefined, why is
 > GCC generating an expression that relies on undefined bits?
Kenner's initial claim was not that they were undefined, but that they were
bits we could pretend had any value that was interesting to us (don't care).
If you read the section on paradoxical subregs, this is what it implies.

I think we really need to clarify if the bits are "don't care" or "undefined";
once that's settled it shouldn't be terribly difficult to deal with combine
to make it follow those semantics.

I think the semantics should be "undefined".

jeff



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:18 Richard Kenner
  2002-01-28 12:47 ` law
@ 2002-01-28 14:53 ` law
  2002-01-29 10:31   ` Richard Earnshaw
  1 sibling, 1 reply; 32+ messages in thread
From: law @ 2002-01-28 14:53 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

In message <10201281935.AA25716@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >     Don't assume you can break it into two expressions.  Consider the
 >     expression as it stands (and as combine creates it).
 > 
 > Sure, but I'm trying to define what it means by comparison with
 > two expressions.
 > 
 >     So with your assertions in mind are these two expresions equivalent?
 > 
 >     (and:SI (subreg:SI (mem:QI) 0) (const_int 255))
 > 
 >     (subreg:SI (mem:QI X) 0)

Are you sure?

Are the bits outside the mode of SUBREG_REG undefined or "don't care"?  That
is the crux of the issue.

If those bits are undefined, then those expressions are not equivalent as
the result of the first expression has 24 zeros in its high bits whereas
the second expression has 24 undefined bits.  If those bits are "don't care"
then the compiler can treat those expressions as equivalent.

jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 13:25 Richard Kenner
@ 2002-01-28 13:46 ` David Edelsohn
  2002-01-28 15:20   ` law
  0 siblings, 1 reply; 32+ messages in thread
From: David Edelsohn @ 2002-01-28 13:46 UTC (permalink / raw)
  To: Richard Kenner; +Cc: law, gcc

>>>>> Richard Kenner writes:

Richard> Let me rephrase: why are we genering this if we don't know what those
Richard> bits are?

	I agree with Kenner.  There are two things we can discuss: should
this ever be generated and should this be generated in this particular
instance causing problems.  Regardless whether this compare result can be
determined at compile-time or run-time, if the bits are undefined, why is
GCC generating an expression that relies on undefined bits?

David

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
@ 2002-01-28 13:25 Richard Kenner
  2002-01-28 13:46 ` David Edelsohn
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2002-01-28 13:25 UTC (permalink / raw)
  To: law; +Cc: gcc

Let me rephrase: why are we genering this if we don't know what those
bits are?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:47 Richard Kenner
@ 2002-01-28 13:10 ` law
  0 siblings, 0 replies; 32+ messages in thread
From: law @ 2002-01-28 13:10 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

In message <10201281959.AA25855@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 > I think the code issue here is that if the bits are undefined, why is
 > this being generated?
Err, if the bits were undefined, then we can't make any assumptions about
their value.  Contrast that to how the compiler treats those bits -- it
treats them as "don't care, assume any useful value".

If the bits are undefined in the traditional sense, then this expression is
not a compile-time constant:

(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0)
    (mem/s:SI (plus:SI (reg:SI 3 %r3)
            (const_int 12 [0xc])) 1))


If the bits are undefined in the traditional sense, then these two expressions
are _NOT_ equivalent:

(and:SI (subreg:SI (mem:QI) 0) (const_int 255))

(subreg:SI (mem:QI X) 0)



jeff
jeff










^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:18 Richard Kenner
@ 2002-01-28 12:47 ` law
  2002-01-28 14:53 ` law
  1 sibling, 0 replies; 32+ messages in thread
From: law @ 2002-01-28 12:47 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

In message <10201281935.AA25716@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >     Don't assume you can break it into two expressions.  Consider the
 >     expression as it stands (and as combine creates it).
 > 
 > Sure, but I'm trying to define what it means by comparison with
 > two expressions.
 > 
 >     So with your assertions in mind are these two expresions equivalent?
 > 
 >     (and:SI (subreg:SI (mem:QI) 0) (const_int 255))
 > 
 >     (subreg:SI (mem:QI X) 0)
OK.  So let's go back to this expression:

(eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                (const_int 15 [0xf])) 1) 0)
    (mem/s:SI (plus:SI (reg:SI 3 %r3)
            (const_int 12 [0xc])) 1))

Your claim is that the paradoxical subreg allows us to pretend the bits
are anything that is convenient for the optimizer.  I claim we can't make
that assumption.  Let's pretend the memory in question has the value
0x12345678.  Plug it in (remembering byte loads zero extend, big endian)
and the hardware actually performs the following comparison:

(eq (0x00000078) (0x12345678)

Which is false.

However, combine treats this as

(eq (0x12345678) (0x12345678)

Which is true and combine optimizes away the comparison.

So, something is inconsistent.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
@ 2002-01-28 12:47 Richard Kenner
  2002-01-28 13:10 ` law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2002-01-28 12:47 UTC (permalink / raw)
  To: law; +Cc: gcc

I think the code issue here is that if the bits are undefined, why is
this being generated?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
@ 2002-01-28 12:18 Richard Kenner
  2002-01-28 12:47 ` law
  2002-01-28 14:53 ` law
  0 siblings, 2 replies; 32+ messages in thread
From: Richard Kenner @ 2002-01-28 12:18 UTC (permalink / raw)
  To: law; +Cc: gcc

    Don't assume you can break it into two expressions.  Consider the
    expression as it stands (and as combine creates it).

Sure, but I'm trying to define what it means by comparison with
two expressions.

    So with your assertions in mind are these two expresions equivalent?

    (and:SI (subreg:SI (mem:QI) 0) (const_int 255))

    (subreg:SI (mem:QI X) 0)

I'd be inclined to say "yes".

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: paradoxical subreg problem
  2002-01-28 12:03 paradoxical subreg problem Richard Kenner
@ 2002-01-28 12:15 ` law
  0 siblings, 0 replies; 32+ messages in thread
From: law @ 2002-01-28 12:15 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc

In message <10201281902.AA25419@vlsi1.ultra.nyu.edu>, Richard Kenner writes:
 >     I'll start simple.  On a big endian machine, can this expression be
 >     optimized into true/false at compile time, or must it be run-time
 >     computed?
 > 
 >     (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
 >                     (const_int 15 [0xf])) 1) 0)
 >         (mem/s:SI (plus:SI (reg:SI 3 %r3)
 >               (const_int 12 [0xc])) 1))
 > 
 > This is equivalent to:
 > 
 >     (set (reg:QI xx) (mem/s:QI (plus:SI (reg:SI 3 %r3)
 > 				        (const_int 15 [0xf])) 0))
 >     (eq (subreg:SI (reg:QI xx) 0)
 >         (mem/s:SI (plus:SI (reg:SI 3 %r3)
 > 	                   (const_int 12 [0xc])) 1))
 > 
 > 
 > on all machines.
Don't assume you can break it into two expressions.  Consider the
expression as it stands (and as combine creates it).

 >     According to my reading, the compiler is allowed to optimize the
 >     expression into (true) because the bits outside of QImode on the
 >     subreg are "don't care bits" -- meaning they can have any value that
 >     is convenient to us.
 > 
 >     Agree/Disagree?
 > 
 > I agree.
OK.

 > 
 >     Now consider if byte loads zero extend.  Does your answer change?  In
 >     the subreg arm, those "don't care" bits, have a well defined meaning --
 >     ie, we can't pretend they have whatever value is convenient for us.
 >     So, unless we have some more specific knowledge about the other arm,
 >     then this expression must be evaluated at runtime.
 > 
 > I disagree.  We "know" what they will be, but the undefined semantics still
 > holds.  So this can also be true.
OK.

So with your assertions in mind are these two expresions equivalent?

(and:SI (subreg:SI (mem:QI) 0) (const_int 255))

(subreg:SI (mem:QI X) 0)


jeff


















^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re:  paradoxical subreg problem
@ 2002-01-28 12:03 Richard Kenner
  2002-01-28 12:15 ` law
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Kenner @ 2002-01-28 12:03 UTC (permalink / raw)
  To: law; +Cc: gcc

    I'll start simple.  On a big endian machine, can this expression be
    optimized into true/false at compile time, or must it be run-time
    computed?

    (eq (subreg:SI (mem/s:QI (plus:SI (reg:SI 3 %r3)
                    (const_int 15 [0xf])) 1) 0)
        (mem/s:SI (plus:SI (reg:SI 3 %r3)
              (const_int 12 [0xc])) 1))

This is equivalent to:

    (set (reg:QI xx) (mem/s:QI (plus:SI (reg:SI 3 %r3)
				        (const_int 15 [0xf])) 0))
    (eq (subreg:SI (reg:QI xx) 0)
        (mem/s:SI (plus:SI (reg:SI 3 %r3)
	                   (const_int 12 [0xc])) 1))


on all machines.

    According to my reading, the compiler is allowed to optimize the
    expression into (true) because the bits outside of QImode on the
    subreg are "don't care bits" -- meaning they can have any value that
    is convenient to us.

    Agree/Disagree?

I agree.

    Now consider if byte loads zero extend.  Does your answer change?  In the
    subreg arm, those "don't care" bits, have a well defined meaning -- ie, we
    can't pretend they have whatever value is convenient for us.  So, unless
    we have some more specific knowledge about the other arm, then this
    expression must be evaluated at runtime.

I disagree.  We "know" what they will be, but the undefined semantics still
holds.  So this can also be true.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2002-01-31 18:15 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-28 11:36 paradoxical subreg problem law
2002-01-28 11:50 ` Richard Henderson
2002-01-28 11:58   ` law
2002-01-28 12:05     ` Richard Henderson
2002-01-28 12:09       ` law
2002-01-28 12:39         ` Richard Henderson
2002-01-28 13:03           ` law
2002-01-28 13:18             ` Richard Henderson
2002-01-28 14:25             ` Michael Matz
2002-01-28 12:15 ` Michael Matz
2002-01-28 15:41 ` Jim Wilson
2002-01-29  1:57   ` Jim Wilson
2002-01-29 10:55   ` Richard Earnshaw
2002-01-29 12:21   ` paradoxical subreg problem [ patch attached ] law
2002-01-29 13:46     ` Richard Earnshaw
2002-01-29 16:41     ` Jim Wilson
2002-01-29 18:08       ` law
2002-01-31 13:22       ` law
2002-01-28 12:03 paradoxical subreg problem Richard Kenner
2002-01-28 12:15 ` law
2002-01-28 12:18 Richard Kenner
2002-01-28 12:47 ` law
2002-01-28 14:53 ` law
2002-01-29 10:31   ` Richard Earnshaw
2002-01-28 12:47 Richard Kenner
2002-01-28 13:10 ` law
2002-01-28 13:25 Richard Kenner
2002-01-28 13:46 ` David Edelsohn
2002-01-28 15:20   ` law
2002-01-28 15:50     ` Geoff Keating
2002-01-28 15:21 Richard Kenner
2002-01-28 15:41 ` law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).