[Bug c/24743] New: NaN or correct result after divrp with 3 FPU registers

public inbox for gcc-bugs@sourceware.org

* [Bug c/24743]  New: NaN or correct result after divrp with 3 FPU registers
@ 2005-11-08 21:17 sraa at kse dot nl
  2005-11-08 21:23 ` [Bug target/24743] " pinskia at gcc dot gnu dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: sraa at kse dot nl @ 2005-11-08 21:17 UTC (permalink / raw)
  To: gcc-bugs

Hello,

(OS, machine and compiler info at the bottom of this message).

In certain sources I get a problem where a calculation with floats gives a NaN.
It occurs when parentheses are used around the division (they are not
necessary, but they are not wrong either). When two identical statements are
placed directly after each other, there is no problem with the second one.
Also, when other lines sit between the first statement with parentheses and
the second one, there is no problem.
Unfortunately, I cannot reproduce the problem with a short program. I let three
systems run calculations for days; the disassembly shows the same instructions,
and there are no problems at all. So I cannot deliver a small program that
reproduces the problem... but sometimes, under some conditions, a NaN is the
result of a calculation with normal values (we don't need high precision, and
the numbers are not above 5000.000).

The software is also not multithreaded... I also tested with schedctl_start and
schedctl_stop to see whether it has something to do with context switching, but
the results were the same.

Short piece of disassembly where it often goes wrong:

  gewenst1      = uitspit->gewenst[dosering] * batch / 100.0;     <-- Works always
    20e2:       8b 15 00 00 00 00       mov    0x0,%edx
    20e8:       8b 45 08                mov    0x8(%ebp),%eax
    20eb:       d9 84 82 6c 07 00 00    flds   0x76c(%edx,%eax,4)
    20f2:       d8 4d f0                fmuls  0xfffffff0(%ebp)
    20f5:       d9 05 78 04 00 00       flds   0x478
    20fb:       de f9                   fdivrp %st,%st(1)
    20fd:       d9 9d d4 f5 ff ff       fstps  0xfffff5d4(%ebp)
  gewenst2      = uitspit->gewenst[dosering] * (batch / 100.0);     <-- Works NOT always, sometimes NaN!!!
    2103:       8b 15 00 00 00 00       mov    0x0,%edx
    2109:       8b 45 08                mov    0x8(%ebp),%eax
    210c:       d9 84 82 6c 07 00 00    flds   0x76c(%edx,%eax,4)
    2113:       d9 45 f0                flds   0xfffffff0(%ebp)
    2116:       dd 05 80 04 00 00       fldl   0x480                   <----- opcode not in Intel manual?!
    211c:       de f9                   fdivrp %st,%st(1)              <--- opcode DE F9 belongs to fdiv
    211e:       de c9                   fmulp  %st,%st(1)
    2120:       d9 9d d0 f5 ff ff       fstps  0xfffff5d0(%ebp)
  gewenst3      = uitspit->gewenst[dosering] * (batch / 100.0);     <-- Works always
    2126:       8b 15 00 00 00 00       mov    0x0,%edx
    212c:       8b 45 08                mov    0x8(%ebp),%eax
    212f:       d9 84 82 6c 07 00 00    flds   0x76c(%edx,%eax,4)
    2136:       d9 45 f0                flds   0xfffffff0(%ebp)
    2139:       dd 05 80 04 00 00       fldl   0x480
    213f:       de f9                   fdivrp %st,%st(1)
    2141:       de c9                   fmulp  %st,%st(1)
    2143:       d9 9d cc f5 ff ff       fstps  0xfffff5cc(%ebp)

So I can describe the problem as follows:
The result of "fdivrp %st, %st(1)" (or fdiv) goes wrong when the FPU stack is
used more than normal: as long as only ST(0) and ST(1) are used the results
are OK, but when ST(2) is used it goes wrong. In this calculation, X = A * (B
/ 100.0), all 3 values are pushed on the FPU stack. The result is "NaN". But
only the first time that this construction is used in this source!
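
For reference, the two source forms from the disassembly above can be written
as a small C sketch. This is only an illustration: the function names and the
plain float parameters are assumptions (the real code reads
uitspit->gewenst[dosering] and batch), and is_nan is a hypothetical helper.
For ordinary inputs both forms should give a finite result that agrees to
within float rounding.

```c
/* Hypothetical helper: a NaN is the only value that differs from itself. */
int is_nan(float x)
{
    return x != x;
}

/* Form 1: gewenst * batch / 100.0 (multiply first, then divide). */
float gewenst_without_parens(float gewenst, float batch)
{
    return gewenst * batch / 100.0;
}

/* Form 2: gewenst * (batch / 100.0) (divide first, then multiply).
 * This is the form that keeps three values on the x87 stack at once. */
float gewenst_with_parens(float gewenst, float batch)
{
    return gewenst * (batch / 100.0);
}
```

Calling both functions with the same normal values (for example 1250.5 and
40.0) and checking the results with is_nan is a quick way to confirm that the
arithmetic itself is not the source of a NaN.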

Is gcc producing wrong assembly? Going by the opcode, fdivrp is fdiv. Also, the
opcode of fldl is not in the Intel manual... Or is the disassembler (gobjdump)
wrong?

When the 100.0 is supplied via a float variable, the generated code is
different: no fldl is used, only flds.

We don't want to rewrite our complete application of roughly 300.000 lines. So
it would be nice if someone knows where the problem is.

Sorry if this bug report is not quite what is expected. But after three weeks
of working on this problem this is the best I can do, and I hope that someone
can say something wise about it.

Thanks in advance.

Best Regards,
Stefan Raaijmakers

We use Advantech hardware, with Intel Pentium 4 2.8 GHz Celeron with Solaris 10
x86 (32bit) installed.

gcc -v save-temps:
mira@promas */packages/dos 77 : make
Checking dependencies, please wait...
mdept - Dependency DosUitspit.c newer than ./DosUitspit.o
`DosDiv.o' is up to date.
`DosLib.o' is up to date.
cc -g -Wcast-align -v save-temps -DNO_ANSI -I../../include -I/usr/pm_tools/include -c DosUitspit.c
cc: save-temps: No such file or directory
Reading specs from /usr/sfw/lib/gcc/i386-pc-solaris2.10/3.4.3/specs
Configured with: /builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
 /usr/sfw/libexec/gcc/i386-pc-solaris2.10/3.4.3/cc1 -quiet -v -I../../include -I/usr/pm_tools/include -DNO_ANSI DosUitspit.c -quiet -dumpbase DosUitspit.c -auxbase DosUitspit -g -Wcast-align -version -o /var/tmp//cctIUOWh.s
ignoring nonexistent directory "/usr/sfw/lib/gcc/i386-pc-solaris2.10/3.4.3/../../../../i386-pc-solaris2.10/include"
#include "..." search starts here:
#include <...> search starts here:
 ../../include
 /usr/pm_tools/include
 /usr/local/include
 /usr/sfw/include
 /usr/sfw/lib/gcc/i386-pc-solaris2.10/3.4.3/include
 /usr/include
End of search list.
GNU C version 3.4.3 (csl-sol210-3_4-branch+sol_rpath) (i386-pc-solaris2.10)
        compiled by GNU C version 3.4.3 (csl-sol210-3_4-branch+sol_rpath).
GGC heuristics: --param ggc-min-expand=46 --param ggc-min-heapsize=31687
 /usr/sfw/bin/gas --traditional-format -V -Qy -s -o DosUitspit.o /var/tmp//cctIUOWh.s
GNU assembler version 2.15 (i386-pc-solaris2.10) using BFD version 2.15
`DosVid.o' is up to date.
mira@promas */packages/dos 78 :

-- 
           Summary: NaN or correct result after divrp with 3 FPU registers
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sraa at kse dot nl
 GCC build triplet: /builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure -
                    -prefix
  GCC host triplet: GCC 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
GCC target triplet: Solaris 10 GA x86 SunOs 5.10 i386

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24743


* [Bug target/24743] NaN or correct result after divrp with 3 FPU registers
  2005-11-08 21:17 [Bug c/24743] New: NaN or correct result after divrp with 3 FPU registers sraa at kse dot nl
@ 2005-11-08 21:23 ` pinskia at gcc dot gnu dot org
  2005-11-08 21:57 ` sraa at kse dot nl
  2005-11-12 12:07 ` sraa at kse dot nl
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-11-08 21:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2005-11-08 21:23 -------
Do you have a small source code which exposes the issue here?
(Note I saw "So I can not deliver a small program that
reproduces the problem.... " but we really cannot do anything about it if there
is not a testcase)

Another thing is that you are using a GCC provided by Sun/Codesourcery and we
(FSF) really don't support that version of GCC.  You really should have filed a
bug report with them first.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24743



* [Bug target/24743] NaN or correct result after divrp with 3 FPU registers
  2005-11-08 21:17 [Bug c/24743] New: NaN or correct result after divrp with 3 FPU registers sraa at kse dot nl
  2005-11-08 21:23 ` [Bug target/24743] " pinskia at gcc dot gnu dot org
@ 2005-11-08 21:57 ` sraa at kse dot nl
  2005-11-12 12:07 ` sraa at kse dot nl
  2 siblings, 0 replies; 4+ messages in thread
From: sraa at kse dot nl @ 2005-11-08 21:57 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #2 from sraa at kse dot nl  2005-11-08 21:57 -------
Subject: RE:  NaN or correct result after divrp with 3 FPU registers

Hello,

That's the big problem. I have been trying for three weeks to get the problem
to expose itself in a small program. But I can't make it go wrong on purpose.
Otherwise I would have filed a bug report 3 weeks ago.

So it's a combination of the application, with all its use of memory, and the
instructions. It manifests regularly but not constantly at our customer. It's
a large application that communicates with PLCs. Even for me it is hard to
simulate the process of the factory at our office. When I'm "lucky" I get the
problem once a day.

A lot of questions:
- Is there another way that someone with knowledge of gcc (development) can
help me? Maybe someone in The Netherlands.
- Or can you recommend someone at Sun?
- Why isn't this version supported?
- Is it better for me to build a new version of gcc myself?
- When I build a new version on one system, is it possible to copy only the
gcc executable to other systems, or do I need more, or do I even have to build
again on all systems needing the new version (I don't expect this, but I ask
to be sure)?
- When I send a .i or .s file of the source where it goes wrong, is there
nothing you can say about it?

I really have a very big problem since a few systems are waiting on a
solution.

Anyhow, thanks for the fast reply, and I hope that you can help me or answer
my questions!

Best regards,

Stefan Raaijmakers.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24743


* [Bug target/24743] NaN or correct result after divrp with 3 FPU registers
  2005-11-08 21:17 [Bug c/24743] New: NaN or correct result after divrp with 3 FPU registers sraa at kse dot nl
  2005-11-08 21:23 ` [Bug target/24743] " pinskia at gcc dot gnu dot org
  2005-11-08 21:57 ` sraa at kse dot nl
@ 2005-11-12 12:07 ` sraa at kse dot nl
  2 siblings, 0 replies; 4+ messages in thread
From: sraa at kse dot nl @ 2005-11-12 12:07 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #3 from sraa at kse dot nl  2005-11-12 12:07 -------
We found out that the stackpointer was unbalanced at a given point. Using a
small program in assembly which reads the stackpointer, which I placed at
suspicious points in the code, I found a function that didn't restore the
stackpointer correctly.

This function was of type float and returned a float, but the result was not
assigned to any variable... So this has to be it. I will know for sure on
Monday, since production has stopped now and I cannot test it.
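
The probe itself is not shown in this message. A minimal sketch of one way to
do such a check from C is given below, assuming that "stackpointer" means the
top-of-stack (TOP) field of the x87 status word and that the compiler is GCC
or a compatible one on x86. The names x87_top, x87_balanced and
integer_only_work are hypothetical. TOP occupies bits 11 to 13 of the status
word; a value left behind on the FPU stack changes it.

```c
/* Returns the TOP field (bits 11-13) of the x87 status word.
 * Assumption: GCC-style inline assembly on an x86 target.
 * On any other target there is nothing to probe and 0 is returned. */
unsigned x87_top(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short sw;
    __asm__ volatile ("fnstsw %0" : "=a" (sw));  /* store status word in AX */
    return (sw >> 11) & 7u;
#else
    return 0u;
#endif
}

/* Hypothetical check: returns 1 if TOP is the same before and after fn().
 * Only meaningful for calls that should leave nothing on the FPU stack. */
int x87_balanced(void (*fn)(void))
{
    unsigned before = x87_top();
    fn();
    return x87_top() == before;
}

/* Sample workload that does no floating-point work at all. */
void integer_only_work(void)
{
    volatile int sum = 0;
    int i;
    for (i = 0; i < 10; i++)
        sum += i;
}
```

Calling x87_balanced around a suspicious call, or comparing x87_top() before
and after a statement, narrows down where a value is left on the FPU stack.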

Example of the faulty code:

   partijlocmut_kg(geleverd1);              /* returns a float ..... */

Instead of:

   geleverd1 = partijlocmut_kg(geleverd1);

So a Sun did not have this problem, because there the stack is managed
differently and the floating-point calculations are done in software instead
of by an FPU that uses stack registers. Or it did something smart, or by
coincidence returned the float in a register anyhow. I think it returned the
value to the parameter with which the function was called; otherwise the
software wouldn't have worked.

If our software had been written in ANSI C, with correct prototyping of
functions, the compiler would have been able to signal the error.
Unfortunately it isn't. But now we know what to search for and what we have
to do.....

The cause of my problems was not a buggy OS or compiler but just badly written
code.
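
A minimal sketch of the prototyped form, for illustration only: the body of
partijlocmut_kg and the wrapper update_geleverd are hypothetical stand-ins,
and it assumes the faulty call site had no declaration of the function in
scope. With the prototype visible, the compiler knows that a float comes back
and can handle the return value correctly, even if the caller ignores it.

```c
/* ANSI C prototype: the compiler now knows a float is returned. */
float partijlocmut_kg(float kg);

/* Hypothetical body, only here to make the sketch complete. */
float partijlocmut_kg(float kg)
{
    return kg * 0.5f;
}

/* Call site in the corrected form: the result is assigned. */
float update_geleverd(float geleverd1)
{
    geleverd1 = partijlocmut_kg(geleverd1);
    return geleverd1;
}
```

GCC options such as -Wimplicit-function-declaration (part of -Wall) and
-Wstrict-prototypes are one way to find calls and declarations that lack a
prototype in a large code base.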

-- 

sraa at kse dot nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24743

