public inbox for gcc-bugs@sourceware.org
* [Bug target/29852]  New: x86_64: SSE version missing for fmod{d,s,x}f3
@ 2006-11-15 20:46 burnus at gcc dot gnu dot org
  2006-11-15 21:20 ` [Bug target/29852] " rguenth at gcc dot gnu dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: burnus at gcc dot gnu dot org @ 2006-11-15 20:46 UTC (permalink / raw)
  To: gcc-bugs

There is currently no SSE version of fmod on x86_64.

fmod{d,s,x}f3 intrinsics are constrained by:
 "TARGET_USE_FANCY_MATH_387
  && (!(TARGET_SSE2 && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387)"
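
As a reading aid, the condition above can be modelled as a plain C predicate.
This is only a sketch: the struct target_flags and the function
fmod_pattern_enabled are hypothetical names, and the boolean fields merely
stand in for the GCC target macros quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC target macros quoted in the report. */
struct target_flags {
    bool use_fancy_math_387; /* models TARGET_USE_FANCY_MATH_387 */
    bool sse2;               /* models TARGET_SSE2 */
    bool sse_math;           /* models TARGET_SSE_MATH */
    bool mix_sse_i387;       /* models TARGET_MIX_SSE_I387 */
};

/* Mirrors the quoted condition term by term. */
bool fmod_pattern_enabled(struct target_flags t)
{
    return t.use_fancy_math_387
           && (!(t.sse2 && t.sse_math) || t.mix_sse_i387);
}
```

With use_fancy_math_387, sse2 and sse_math set and mix_sse_i387 clear, the
predicate is false, which matches the libcall described in this report. Setting
mix_sse_i387 as well makes it true.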

The need for these intrinsics can be seen in the Polyhedron Fortran performance
test "ac". As soon as gfortran started to use fmod, the execution time of the
program "ac" almost tripled on x86_64, because a libcall to the math library is
made. For the performance, see:
http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-ac-3.png
at http://www.suse.de/~gcctest/c++bench/polyhedron/

See the mailing list thread which starts with
http://gcc.gnu.org/ml/fortran/2006-11/msg00333.html
though the actually interesting part starts with:
http://gcc.gnu.org/ml/fortran/2006-11/msg00353.html


-- 
           Summary: x86_64: SSE version missing for fmod{d,s,x}f3
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
@ 2006-11-15 21:20 ` rguenth at gcc dot gnu dot org
  2006-11-29 10:38 ` burnus at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-11-15 21:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2006-11-15 21:20 -------
Confirmed.  SSE doesn't have anything like the 387 fprem though, so this is
probably a library problem.  (Note that remainder is one of the few operations
beyond basic arithmetic that IEEE 754 specifies.)


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-11-15 21:20:17
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
  2006-11-15 21:20 ` [Bug target/29852] " rguenth at gcc dot gnu dot org
@ 2006-11-29 10:38 ` burnus at gcc dot gnu dot org
  2006-11-29 10:50 ` rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: burnus at gcc dot gnu dot org @ 2006-11-29 10:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from burnus at gcc dot gnu dot org  2006-11-29 10:38 -------
If one uses -mfpmath=387 or -mfpmath=sse,387, the speed also increases
dramatically.

Results with the test case below on an Athlon64:

icc -O3 test.c; time ./a.out
d=100002.216410, r=100000.000026
real    0m2.549s; user    0m2.548s; sys     0m0.000s

gcc -ftree-vectorize -O3 -msse3 -ffast-math -lm test.c
d=100002.216410, r=100000.000026
real    0m5.444s; user    0m5.444s; sys     0m0.000s

gcc -ftree-vectorize -O3 -msse3 -mfpmath=sse,387 -ffast-math -lm test.c
d=100002.216410, r=100000.000026
real    0m1.363s; user    0m1.192s; sys     0m0.000s

----------------
#include <math.h>
#include <stdio.h>

int main() {
  double r,d;
  d = 0.0;
  for(r=0.0; r < 100000.0; r += 0.001)
    d = fmod(d,5.0)+r;
  printf("d=%f, r=%f\n",d,r);
  return 0;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
  2006-11-15 21:20 ` [Bug target/29852] " rguenth at gcc dot gnu dot org
  2006-11-29 10:38 ` burnus at gcc dot gnu dot org
@ 2006-11-29 10:50 ` rguenth at gcc dot gnu dot org
  2006-11-29 15:58 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-11-29 10:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2006-11-29 10:49 -------
So another possibility is to adjust the 387 patterns to be enabled even without
TARGET_MIX_SSE_I387.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2006-11-29 10:50 ` rguenth at gcc dot gnu dot org
@ 2006-11-29 15:58 ` ubizjak at gmail dot com
  2006-11-29 16:02 ` rguenth at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-29 15:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from ubizjak at gmail dot com  2006-11-29 15:58 -------
(In reply to comment #3)
> So another possibility is to adjust the 387 patterns to be enabled even without
> TARGET_MIX_SSE_I387.
> 

Considering the fact that even the Solaris x86_64 libm [1] uses these functions
for DFmode and SFmode, I propose that we use only the
"TARGET_USE_FANCY_MATH_387" constraint.

[1] http://svn.genunix.org/repos/devpro/trunk/usr/src/libm/src/i386/amd64/


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2006-11-29 15:58 ` ubizjak at gmail dot com
@ 2006-11-29 16:02 ` rguenth at gcc dot gnu dot org
  2006-11-29 18:19 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-11-29 16:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2006-11-29 16:02 -------
Can we make sure to always emit proper truncation to SF/DFmode if not
TARGET_MIX_SSE_I387?  Just in case two fprem instructions follow each other
and so we don't truncate by moving to memory or SSE registers.  It would be
bad to let excess precision (aka bug 323) sneak in for fpmath=sse when we
tell people to use that to prevent excess precision.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2006-11-29 16:02 ` rguenth at gcc dot gnu dot org
@ 2006-11-29 18:19 ` ubizjak at gmail dot com
  2006-11-29 18:21 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-29 18:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from ubizjak at gmail dot com  2006-11-29 18:18 -------
(In reply to comment #5)
> Can we make sure to always emit proper truncation to SF/DFmode if not
> TARGET_MIX_SSE_I387?  Just in case two fprem instructions follow each other
> and so we don't truncate by moving to memory or SSE registers.  It would be
> bad to let excess precision (aka bug 323) sneak in for fpmath=sse when we
> tell people to use that to prevent excess precision.

We can't make any guarantees about truncation, but ...
... the following patch can.

2006-11-29  Uros Bizjak  <ubizjak@gmail.com>

        PR target/XXX
        config/i386/i386.md (*truncxfsf2_mixed, *truncxfdf2_mixed): Enable
        patterns for TARGET_80387.
        (*truncxfsf2_i387, *truncxfdf2_i387): Remove.

        (fmod<mode>3, remainder<mode>3): Enable patterns for SSE math.
        Generate truncxf<mode>2 instructions for strict SSE math.

for the testcase:

double test1(double a)
{
  double x = fmod(a, 1.1);
  return fmod(x, 2.1);
}

patched gcc generates (-fno-math-errno for clarity):

test1:
.LFB2:
        movsd   %xmm0, -16(%rsp)
        fldl    -16(%rsp)
        fldl    .LC0(%rip)
        fxch    %st(1)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
        fstpl   -8(%rsp)
        fldl    -8(%rsp)
        fldl    .LC1(%rip)
        fxch    %st(1)
.L3:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L3
        fstp    %st(1)
        fstpl   -8(%rsp)
        movsd   -8(%rsp), %xmm0
        ret
.LFE2:
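
The fnstsw/testb/jne sequence in each loop above re-runs fprem until the x87
status word reports that the reduction is complete. fprem sets condition flag
C2 (bit 10 of the status word) while only a partial remainder has been
produced; after fnstsw %ax that bit is 0x04 in %ah. A minimal C sketch of the
same test follows; the function name fprem_reduction_incomplete is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Takes the 16-bit x87 status word as fnstsw %ax would store it. */
bool fprem_reduction_incomplete(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8); /* upper byte, i.e. %ah */
    return (ah & 0x04) != 0;                  /* testb $4, %ah: C2 set? */
}
```

The loops at .L2 and .L3 keep iterating for as long as this check would return
true.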

In order to get optimal code, the truncxf?f2_mixed patterns have to be enabled,
otherwise reload does its job by moving values to memory and back again. The
patch bootstraps OK, but the regression test will take overnight.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2006-11-29 18:19 ` ubizjak at gmail dot com
@ 2006-11-29 18:21 ` ubizjak at gmail dot com
  2006-11-29 18:36 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-29 18:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from ubizjak at gmail dot com  2006-11-29 18:20 -------
Created an attachment (id=12707)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12707&action=view)
Patch to enable x87 fprem and fprem1 for SSE math

I know that I've forgotten something ;)


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2006-11-29 18:21 ` ubizjak at gmail dot com
@ 2006-11-29 18:36 ` rguenth at gcc dot gnu dot org
  2006-11-29 21:05 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-11-29 18:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2006-11-29 18:36 -------
The patch doesn't like me ;)

richard@trick:~/src/trunk/gcc/config/i386$ patch -p0 < /tmp/p
patching file i386.md
Hunk #1 succeeded at 3892 (offset -49 lines).
Hunk #2 succeeded at 3919 (offset -47 lines).
Hunk #3 succeeded at 3990 (offset -47 lines).
Hunk #4 succeeded at 4017 (offset -45 lines).
Hunk #5 FAILED at 15622.
patch: **** unexpected end of file in patch

What does it generate for

double foo(double a, double b)
{
  double x = fmod(a, 1.1);
  return x + b;
}

Does it do the truncation as part of the x87 -> SSE register move, or
are there extra operations involved?  If we can get all variants optimal
(store to memory comes to my mind as well) it would be nice!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (7 preceding siblings ...)
  2006-11-29 18:36 ` rguenth at gcc dot gnu dot org
@ 2006-11-29 21:05 ` ubizjak at gmail dot com
  2006-11-30  6:55 ` uros at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-29 21:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from ubizjak at gmail dot com  2006-11-29 21:05 -------
(In reply to comment #8)
> The patch doesn't like me ;)
> 
> richard@trick:~/src/trunk/gcc/config/i386$ patch -p0 < /tmp/p
> patching file i386.md
> Hunk #1 succeeded at 3892 (offset -49 lines).
> Hunk #2 succeeded at 3919 (offset -47 lines).
> Hunk #3 succeeded at 3990 (offset -47 lines).
> Hunk #4 succeeded at 4017 (offset -45 lines).
> Hunk #5 FAILED at 15622.
> patch: **** unexpected end of file in patch

That is because I have 4 open projects in one branch. In about an hour, the
regression test will finish and I'll post a clean patch to gcc-patches.
> 
> what does it generate for
> 
> double foo(double a, double b)
> {
>   double x = fmod(a, 1.1);
>   return x + b;
> }
> 
> does it do the truncation as part of the x87 -> SSE register move or
> is there extra operations involved?  If we can get all variants optimal
> (store to memory comes to my mind as well) it would be nice!
> 

        movsd   %xmm0, -16(%rsp)
        fldl    -16(%rsp)
        fldl    .LC0(%rip)
        fxch    %st(1)
.L2:
        fprem
        fnstsw  %ax
        testb   $4, %ah
        jne     .L2
        fstp    %st(1)
        fstpl   -8(%rsp)
        movsd   -8(%rsp), %xmm0
        addsd   %xmm1, %xmm0
        ret

The x87 store represents the truncation.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (8 preceding siblings ...)
  2006-11-29 21:05 ` ubizjak at gmail dot com
@ 2006-11-30  6:55 ` uros at gcc dot gnu dot org
  2006-11-30  7:17 ` ubizjak at gmail dot com
  2006-11-30  7:18 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: uros at gcc dot gnu dot org @ 2006-11-30  6:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from uros at gcc dot gnu dot org  2006-11-30 06:55 -------
Subject: Bug 29852

Author: uros
Date: Thu Nov 30 06:54:47 2006
New Revision: 119356

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=119356
Log:
        PR target/29852
        * config/i386/i386.md (*truncxfsf2_mixed, *truncxfdf2_mixed): Enable
        insn patterns for TARGET_80387.
        (*truncxfsf2_i387, *truncxfdf2_i387): Remove.
        (*truncxfsf2_i387_1): Rename to *truncxfsf2_i387.
        (*truncxfdf2_i387_1): Rename to *truncxfdf2_i387.
        (fmod<mode>3, remainder<mode>3): Enable expanders for SSE math.
        Generate truncxf<mode>2 insn patterns for strict SSE math.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (9 preceding siblings ...)
  2006-11-30  6:55 ` uros at gcc dot gnu dot org
@ 2006-11-30  7:17 ` ubizjak at gmail dot com
  2006-11-30  7:18 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-30  7:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from ubizjak at gmail dot com  2006-11-30 07:17 -------
Fixed, by introducing x87 helpers.

Let's see those benchmarks fly again ;)


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2006-
                   |                            |11/msg02000.html
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



* [Bug target/29852] x86_64: SSE version missing for fmod{d,s,x}f3
  2006-11-15 20:46 [Bug target/29852] New: x86_64: SSE version missing for fmod{d,s,x}f3 burnus at gcc dot gnu dot org
                   ` (10 preceding siblings ...)
  2006-11-30  7:17 ` ubizjak at gmail dot com
@ 2006-11-30  7:18 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2006-11-30  7:18 UTC (permalink / raw)
  To: gcc-bugs



-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29852



