[Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2

public inbox for gcc-bugs@sourceware.org

* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: joern.rennecke@superh.com @ 2003-06-02 11:40 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6526

------- Additional Comments From joern.rennecke@superh.com  2003-06-02 11:39 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2

If you want to change fpscr so that the change is effective and preserved
in and across function calls, you should use __set_fpscr.
The new value for fpscr is passed in r4; it is written to fpscr and,
with appropriate modifications to the PR and SZ bits, to the two
locations of __fpscr_values.
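
For readers following along, here is a minimal Python sketch of the data flow described above. It is a model, not the libgcc source. The bit positions are the SH-4 FPSCR layout (PR = bit 19, SZ = bit 20, FR = bit 21). How PR and SZ are adjusted for each entry is an assumption made for illustration (entry 0 as the single-precision variant, entry 1 as the double-precision variant, SZ cleared in both), and `set_fpscr_model` is a hypothetical name.

```python
# Toy model of the behaviour described for __set_fpscr; not the real routine.
FPSCR_PR = 1 << 19  # precision: 0 = single, 1 = double
FPSCR_SZ = 1 << 20  # fmov transfer size: 0 = 32-bit, 1 = 64-bit pair
FPSCR_FR = 1 << 21  # register bank select, toggled by frchg


def set_fpscr_model(new_value):
    """Return (fpscr, fpscr_values) as this sketch assumes they end up.

    Assumption: both table entries have SZ cleared, and differ only in PR.
    """
    single = new_value & ~(FPSCR_PR | FPSCR_SZ)  # entry 0: PR = 0
    double = single | FPSCR_PR                   # entry 1: PR = 1
    return new_value, [single, double]


fpscr, fpscr_values = set_fpscr_model(FPSCR_PR | FPSCR_FR)
print(hex(fpscr), [hex(v) for v in fpscr_values])
```

In this model every other bit of the new value, including FR, is copied into both table entries unchanged.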

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: marcus@mc.pp.se @ 2003-06-02 12:25 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From marcus@mc.pp.se  2003-06-02 12:25 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2


"joern.rennecke@superh.com" <gcc-bugzilla@gcc.gnu.org> writes:

> If you want to change fpscr so that the change is effective and preserved
> in and across function calls, you should use __set_fpscr.
> The new value for fpscr is passed in r4; it is written to fpscr and,
> with appropriate modifications to the PR and SZ bits, to the two
> locations of __fpscr_values.

Sorry, but this isn't really helpful.  First, the current libgcc
functions do _not_ use __fpscr_values, so using __set_fpscr won't help
anyway (as you can see in the test case, which _does_ use __set_fpscr
but still fails).  Second, I need to use frchg for performance reasons.


  // Marcus







* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: joern.rennecke@superh.com @ 2003-06-03 15:55 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From joern.rennecke@superh.com  2003-06-03 15:55 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2

"marcus@mc.pp.se" wrote:
> Sorry, but this isn't really helpful.  First, the current libgcc
> functions do _not_ use __fpscr_values, so using __set_fpscr won't help
> anyway (as you can see in the test case, which _does_ use __set_fpscr
> but still fails).  Second, I need to use frchg for performance reasons.

Sorry, I hadn't looked at the issue closely enough at first.

Having FR set to 1 at the start of a gcc compiled or supplied function
is not supported.  Supporting this would require too much overhead for
no apparent gain.
The patch you provided does not make integer division slower, it also
causes it to give different results when the floating point rounding
mode is changed, or trap when e.g. inexact traps are enabled.

I suggest that you switch to the alternate floating point register bank
to put your matrix there, and then switch back to use the ordinary
floating point registers for generic floating point operations.


* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: marcus@mc.pp.se @ 2003-06-03 17:14 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From marcus@mc.pp.se  2003-06-03 17:14 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2

"joern.rennecke@superh.com" <gcc-bugzilla@gcc.gnu.org> writes:

> The patch you provided does not make integer division slower,

Well, that's a good thing isn't it?

> it also causes it to give different results when the floating point
> rounding mode is changed, or trap when e.g. inexact traps are enabled.

This is already the case with several other implementations in
lib1funcs.asm, so presumably changing rounding/trap mode is also "not
supported", and fixing that is a separate issue.

> I suggest that you switch to the alternate floating point register bank
> to put your matrix there, and then switch back to use the ordinary
> floating point registers for generic floating point operations.

That's what I'm doing.  The trick comes when applying a transformation
operation to the matrix.  Then the operation matrix is loaded into the
"ordinary" floating point registers, and four ftrv operations are
carried out to compute the new matrix.  The new matrix however ends up
in fr0-fr15, but it's just a matter of doing a frchg and it becomes
the new xmtrx, and I get a new set of ordinary floating point
registers.  Unless I'm very much mistaken, this is the whole idea of
the frchg instruction.  The alternative would be to copy all the
computed values manually, which would be much slower.
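
The bank-swap trick can be illustrated with a small Python toy model. This is a sketch under stated assumptions, not SH-4 code: the class `Sh4FpuModel` and the helper `apply_transform` are hypothetical names, the two register banks are plain lists, and XMTRX is taken to be stored column-major in the back bank (xf0-xf3 forming the first column), which is how ftrv reads it.

```python
class Sh4FpuModel:
    """Toy model: two banks of 16 floats; the FR bit selects the front bank."""

    def __init__(self):
        self.banks = [[0.0] * 16, [0.0] * 16]
        self.fr = 0  # models FPSCR.FR

    @property
    def front(self):
        """The bank currently visible as fr0-fr15."""
        return self.banks[self.fr]

    @property
    def back(self):
        """The bank currently visible as xf0-xf15, i.e. XMTRX."""
        return self.banks[self.fr ^ 1]

    def frchg(self):
        """Swap the banks by toggling FR; no data is copied."""
        self.fr ^= 1

    def ftrv(self, n):
        """fv[n] = XMTRX * fv[n] for n in (0, 4, 8, 12)."""
        v = self.front[n:n + 4]
        m = self.back
        self.front[n:n + 4] = [
            sum(m[4 * j + i] * v[j] for j in range(4)) for i in range(4)
        ]


def apply_transform(fpu, op_column_major):
    """Load the operation matrix, run four ftrv, then swap banks."""
    fpu.front[:] = op_column_major
    for n in (0, 4, 8, 12):
        fpu.ftrv(n)
    fpu.frchg()  # the product in the old front bank becomes the new XMTRX


fpu = Sh4FpuModel()
fpu.back[:] = [2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1]  # scale by 2
apply_transform(fpu, [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 3, 1])
print(fpu.fr, fpu.back)
```

Each call to `apply_transform` toggles FR once, so after an odd number of calls the model is left with FR set to 1.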

  // Marcus


* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: joern.rennecke@superh.com @ 2003-06-03 18:40 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From joern.rennecke@superh.com  2003-06-03 18:39 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2

"marcus@mc.pp.se" wrote:
> "joern.rennecke@superh.com" <gcc-bugzilla@gcc.gnu.org> writes:
> 
> > The patch you provided does not make integer division slower,
> 
> Well, that's a good thing isn't it?

Oops, scratch that 'not'.
FWIW, division by small numbers can be made faster by using single
precision, but that makes handling larger numbers a bit slower.

> > it also causes it to give different results when the floating point
> > rounding mode is changed, or trap when e.g. inexact traps are enabled.
> 
> This is already the case with several other implementations in
> lib1funcs.asm, so presumably changing rounding/trap mode is also "not
> supported", and fixing that is a separate issue.

Oops, I forgot that changing the rounding mode only changes the number
of guard bits available, and that is not an issue when doing 32 bit/32 bit
divisions in double precision, as there are plenty of spare bits.  When
you calculate x/y, the result is either an exact integer, or a fraction
that is at least 1/y larger than the correct integral result for truncating
divide, and at least 1/y less than the next larger integer.
The smallest representable delta is x/y/2**51, which is smaller than
1/y/2**19.  So the ftrc at the end will do the right rounding, no matter
how the floating point divide was rounded.
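
The argument above can be checked numerically. The following Python sketch (Python 3.9 or later, for `math.nextafter`) does not change the hardware rounding mode; instead it assumes that, under any IEEE rounding mode, the double result of x/y is one of the two doubles bracketing the exact quotient, and verifies that truncating either one gives the truncating integer quotient. It covers signed 32-bit operands only, and all function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def trunc_div(x, y):
    """Truncating (round-toward-zero) integer division, as in C."""
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q


def rounding_candidates(x, y):
    """Doubles that x/y could round to under any IEEE rounding mode."""
    exact = Fraction(x, y)
    q = float(x) / float(y)  # round-to-nearest result
    if Fraction(q) == exact:
        return (q,)
    if Fraction(q) < exact:
        return (q, math.nextafter(q, math.inf))
    return (math.nextafter(q, -math.inf), q)


def division_is_rounding_safe(x, y):
    """True if truncating every candidate gives the exact integer quotient."""
    expected = trunc_div(x, y)
    return all(math.trunc(c) == expected for c in rounding_candidates(x, y))


INT_MIN, INT_MAX = -2**31, 2**31 - 1
edge = [INT_MIN, INT_MIN + 1, -3, -1, 1, 2, 3, 7, INT_MAX - 1, INT_MAX]
rng = random.Random(6526)
samples = [(x, y) for x in edge for y in edge]
samples += [(rng.randint(INT_MIN, INT_MAX), rng.randint(1, INT_MAX))
            for _ in range(10000)]
print(all(division_is_rounding_safe(x, y) for x, y in samples))
```

The edge cases include quotients such as (2**31 - 2)/(2**31 - 1), which lie as close to an integer as 32-bit operands allow.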

> That's what I'm doing.  The trick comes when applying a transformation
> operation to the matrix.  Then the operation matrix is loaded into the
> "ordinary" floating point registers, and four ftrv operations are
> carried out to compute the new matrix.  The new matrix however ends up
> in fr0-fr15, but it's just a matter of doing a frchg and it becomes
> the new xmtrx, and I get a new set of ordinary floating point
> registers.  Unless I'm very much mistaken, this is the whole idea of
> the frchg instruction.  The alternative would be to copy all the
> computed values manually, which would be much slower.

So, are you using integer division somewhere in this loop?
Or are you doing an uneven number of iterations, so you end up
with a switched register set at the end?


* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: marcus@mc.pp.se @ 2003-06-03 19:19 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From marcus@mc.pp.se  2003-06-03 19:19 -------
Subject: Re:  [SH4] sdivsi3_i4 can clobber xd0/xd2

"joern.rennecke@superh.com" <gcc-bugzilla@gcc.gnu.org> writes:

> > > The patch you provided does not make integer division slower,
> > 
> > Well, that's a good thing isn't it?
> 
> Oops, scratch that 'not'.

Ah.  I did some quick measurements and the slowdown seems to be around
3.5% for signed and 8% for unsigned division.  A little higher than I
expected actually.

> So, are you using integer division somewhere in this loop?

Not in the inner loop, which is 100% assembler code.

> Or are you doing an uneven number of iterations, so you end up
> with a switched register set at the end?

Yup.  And the "end" can be followed by more iterations later, after I
have done some integer operations (modulus).  An outer loop, if you
will.  I made some attempts at adding extra switches to "balance" the
thing, but it all turned out far too messy.

  // Marcus


* [Bug target/6526] [SH4] sdivsi3_i4 can clobber xd0/xd2
From: dhazeghi@yahoo.com @ 2003-06-16 18:18 UTC (permalink / raw)
  To: gcc-bugs

dhazeghi@yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|                            |sparc-sun-solaris2.8


------- Additional Comments From dhazeghi@yahoo.com  2003-06-16 18:18 -------
Hello,

So just to clarify, this problem is not yet solved, correct? Thanks,

Dara

