public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/37489]  New: In cse.c:fold_rtx(), "true" is represented in floating-point modes as const_true_rtx, if FLOAT_STORE_FLAG_VALUE is undefined.
@ 2008-09-11 23:36 raksit at gcc dot gnu dot org
From: raksit at gcc dot gnu dot org @ 2008-09-11 23:36 UTC
  To: gcc-bugs

Consider the C++ code:
------
class StatVal {
 public:
  StatVal(double ev, double va)
    : m(ev),
      v(va) {}
  StatVal(const StatVal& other)
    : m(other.m),
      v(other.v) {}
  StatVal& operator*=(const StatVal& other) {
    double A = m == 0 ? 1.0 : v / (m * m);
    double B = other.m == 0 ? 1.0 : other.v / (other.m * other.m);
    m = m * other.m;
    v = m * m * (A + B);
    return *this;
  }
  double m;
  double v;
};

extern "C" void abort (void);
const StatVal two_dot_three(2, 0.3);

int main(int argc, char **argv) {
  StatVal product3(two_dot_three);
  product3 *= two_dot_three;
  if (product3.v > 2.5)
    abort();
}
------

In the above code, product3.v should be 2.4, and the program aborts if this
value is greater than 2.5.
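
To make the expected value easy to check, here is a minimal sketch that replays
the arithmetic of operator*= as a free function. The names MeanVar and multiply
are hypothetical and exist only for this illustration; the update rule is copied
from the test case above.

```cpp
#include <cmath>

// Hypothetical stand-in for StatVal: just the mean and the variance.
struct MeanVar {
  double m;
  double v;
};

// Same update rule as StatVal::operator*= in the test case, written as a
// free function so each step of the arithmetic is visible.
MeanVar multiply(MeanVar lhs, MeanVar rhs) {
  double A = lhs.m == 0 ? 1.0 : lhs.v / (lhs.m * lhs.m);  // 0.3 / 4 = 0.075
  double B = rhs.m == 0 ? 1.0 : rhs.v / (rhs.m * rhs.m);  // 0.3 / 4 = 0.075
  double m = lhs.m * rhs.m;                               // 2 * 2 = 4
  return MeanVar{m, m * m * (A + B)};                     // 16 * 0.15 = 2.4
}
```

Calling multiply({2, 0.3}, {2, 0.3}) gives m = 4 and v of about 2.4, so a
correctly compiled program never reaches abort().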

Compiled with trunk GCC, the program aborts when the options "-O1
-fno-guess-branch-probability -fcse-follow-jumps -fgcse -frerun-cse-after-loop"
are used. Let's call these "baseOptions".
On the gcc-4.3 branch, the program aborts when compiled with those options, and
also when compiled with just "-O2".

Let's look at the interesting snippets of assembly code generated by the trunk
compiler:
(1) First, using "baseOptions -fno-if-conversion
-fno-rerun-cse-after-loop" (the test program passes with these options):
        ...snip...
        ucomisd %xmm3, %xmm0
        jne     .L12
        .p2align 4,,3
        .p2align 3
        jp      .L12
        movsd   .LC3(%rip), %xmm0
        jmp     .L7
.L12:
        movapd  %xmm2, %xmm0
.L7:
        mulsd   %xmm1, %xmm1
        ...

(2) Second, using "baseOptions -fno-rerun-cse-after-loop" (the test
program passes with these options):
        ...snip...
        cmpneqsd        %xmm3, %xmm0
        movapd  %xmm2, %xmm3
        andpd   %xmm0, %xmm3
        movsd   .LC3(%rip), %xmm4
        andnpd  %xmm4, %xmm0
        orpd    %xmm3, %xmm0
        ...

Comparing (1) and (2), the if-conversion gets rid of a few branches by
converting:
if (condX) x=a; else x = b;
into:
maskX = condX ? 0xfff.. : 0;  // cmpneqsd %xmm3, %xmm0
x1 = a & maskX;               // andpd    %xmm0, %xmm3
x2 = b & ~maskX;              // andnpd   %xmm4, %xmm0
x = x1 | x2;                  // orpd     %xmm3, %xmm0
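
The same mask-based select can be written out in portable C++ on the raw bit
patterns of the doubles. This is only a sketch of the idea, not what the
compiler emits; the helper names bits_of, double_of and select_by_mask are
hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its raw 64-bit pattern.
std::uint64_t bits_of(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

// Reinterpret a raw 64-bit pattern as a double.
double double_of(std::uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// Branch-free equivalent of: if (cond) x = a; else x = b;
double select_by_mask(bool cond, double a, double b) {
  std::uint64_t mask = cond ? ~std::uint64_t{0} : 0;  // cmpneqsd: all ones or zero
  std::uint64_t x1 = bits_of(a) & mask;               // andpd
  std::uint64_t x2 = bits_of(b) & ~mask;              // andnpd
  return double_of(x1 | x2);                          // orpd
}
```

The select is only correct when "true" is represented as all ones: any other
non-zero mask would let bits of both a and b leak into the result.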

(3) Lastly, using "baseOptions" alone (the test program now fails):
        ...snip...
        cmpneqsd        %xmm3, %xmm0
        movapd  %xmm2, %xmm3
        andpd   %xmm0, %xmm3
        movapd  %xmm3, %xmm0
        movsd   .LC3(%rip), %xmm3
        orpd    %xmm0, %xmm3
        ...

What has happened is that the "cse2" phase has deleted the "andnpd"
instruction.
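
To see why losing that one instruction is enough to trip the abort(), here is a
small sketch that compares sequence (2) with sequence (3) on raw bit patterns.
It assumes the operands the test case produces for this select, namely
a = v / (m * m) = 0.075 and b = 1.0, with the mask being all ones because the
condition is true. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its raw 64-bit pattern and back.
std::uint64_t bits_of(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

double double_of(std::uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// Sequence (2): x = (a & mask) | (b & ~mask).
double select_with_andn(std::uint64_t mask, double a, double b) {
  return double_of((bits_of(a) & mask) | (bits_of(b) & ~mask));
}

// Sequence (3): the andnpd is gone, so b is OR-ed in unmasked:
// x = (a & mask) | b.
double select_without_andn(std::uint64_t mask, double a, double b) {
  return double_of((bits_of(a) & mask) | bits_of(b));
}
```

With an all-ones mask, select_with_andn returns 0.075 as intended, while
select_without_andn ORs the bit patterns of 0.075 and 1.0 together and yields
1.2. Even if only one of the two selects goes wrong this way, v becomes
16 * (0.075 + 1.2) = 20.4, which is well above 2.5.
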
We will have to look at the (.cse2) dump file to figure out why:

---- snip ----
(insn 69 15 70 3 /home/raksit/bug-test.C:17 (set (reg:DF 77)
        (mem/u/c/i:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2]) [0 S8 A64])) 103 {*movdf_integer_rex64} (expr_list:REG_EQUAL (const_double:DF 1.0e+0 [0x0.8p+1])
        (nil)))

(insn 70 69 71 3 /home/raksit/bug-test.C:17 (set (reg:DF 78)
        (ne:DF (reg:DF 61 [ D.2927 ])
            (reg:DF 68))) 615 {*sse_setccdf} (expr_list:REG_EQUAL (const_int 1 [0x1])
        (nil)))

(insn 71 70 73 3 /home/raksit/bug-test.C:17 (set (reg:DF 79)
        (and:DF (reg/v:DF 60 [ B ])
            (reg:DF 78))) 1277 {*anddf3} (nil))

(insn 73 71 31 3 /home/raksit/bug-test.C:17 (set (reg/v:DF 58 [ B.25 ])
        (ior:DF (reg:DF 77)
            (reg:DF 79))) 1278 {*iordf3} (nil))
--------------

The interesting part is:
(insn 70 69 71 3 /home/raksit/bug-test.C:17 (set (reg:DF 78)
        (ne:DF (reg:DF 61 [ D.2927 ])
            (reg:DF 68))) 615 {*sse_setccdf} (expr_list:REG_EQUAL (const_int 1 [0x1])
        (nil)))

This instruction corresponds to:
maskX = condX ? 0xfff.. : 0;  // cmpneqsd %xmm3, %xmm0

GCC is able to figure out at compile time that condX evaluates to true, and
this is conveyed by the "REG_EQUAL (const_int 1 [0x1])" note on the
instruction.
This note, which says that maskX is equal to "const_int 1", is added by
cse.c:fold_rtx(). It is this REG_EQUAL note that ultimately causes the CSE
phase to delete the andnpd instruction. The note is wrong for the generated
code sequence: maskX should be folded into 0xfff.., not "const_int 1".

The problem is in cse.c:fold_rtx(), when it folds a floating-point-mode
RTX into a true/false value. The code in fold_rtx() checks the
FLOAT_STORE_FLAG_VALUE macro to find the correct representation of "true" in
floating-point modes. But if this macro is not defined (it is not for the i386
target), fold_rtx() uses const_true_rtx, which is equal to "const_int 1".

This differs from the closely related code in simplify-rtx.c.
In simplify-rtx.c:simplify_relational_operation():
---- snip ----
  tem = simplify_const_relational_operation (code, cmp_mode, op0, op1);
  if (tem)
    {
      if (SCALAR_FLOAT_MODE_P (mode))
        {
          if (tem == const0_rtx)
            return CONST0_RTX (mode);
#ifdef FLOAT_STORE_FLAG_VALUE
          {
            REAL_VALUE_TYPE val;
            val = FLOAT_STORE_FLAG_VALUE (mode);
            return CONST_DOUBLE_FROM_REAL_VALUE (val, mode);
          }
#else
          return NULL_RTX;
#endif
        }
--------------

Above, if simplify_const_relational_operation() can simplify the given rel-op
expression to a compile-time true/false, the returned value tem may be
const_true_rtx or const0_rtx.
When it is const_true_rtx and the expression has a floating-point mode, the
FLOAT_STORE_FLAG_VALUE macro is used to return the correct floating-point
representation of true. If the macro is undefined, NULL_RTX is returned instead
of const_true_rtx (i.e., the given expression couldn't be simplified).

One fix for the bug described above is to make cse.c:fold_rtx() behave the same
way as simplify-rtx.c, i.e., when a floating-point-mode expression can be
folded into "true", but the FLOAT_STORE_FLAG_VALUE is undefined, give up on
folding instead of returning const_true_rtx as the folded expression.
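
The intended policy can be summarized with a toy model. This is not GCC code
and not a patch: fold_float_compare is a hypothetical function, a double stands
in for the folded constant, and an empty std::optional stands in for "give up"
(the analogue of returning NULL_RTX). It assumes a C++17 compiler.

```cpp
#include <optional>

// Toy model only. known_result is what the comparison folds to at compile
// time; float_store_flag_value is empty when the target does not define
// FLOAT_STORE_FLAG_VALUE (as on i386).
std::optional<double> fold_float_compare(
    bool known_result, std::optional<double> float_store_flag_value) {
  if (!known_result)
    return 0.0;                      // analogue of CONST0_RTX (mode)
  if (float_store_flag_value)
    return *float_store_flag_value;  // analogue of FLOAT_STORE_FLAG_VALUE (mode)
  return std::nullopt;               // macro undefined: do not fold at all
}
```

The point of the last branch is that "no answer" is safe, whereas an integer
"1" in a floating-point mode is an answer that later passes will trust.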


-- 
           Summary: In cse.c:fold_rtx(), "true" is represented in floating-
                    point modes as const_true_rtx, if FLOAT_STORE_FLAG_VALUE
                    is undefined.
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Keywords: wrong-code, ssemmx
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: raksit at gcc dot gnu dot org
        ReportedBy: raksit at gcc dot gnu dot org
GCC target triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37489

