public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug middle-end/53100] New: Optimize __int128 with range information
@ 2012-04-24  7:08 marc.glisse at normalesup dot org
  2012-04-29  8:06 ` [Bug middle-end/53100] " marc.glisse at normalesup dot org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-04-24  7:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

             Bug #: 53100
           Summary: Optimize __int128 with range information
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: marc.glisse@normalesup.org


(not sure about the "component" field)

In the following program, on x86_64, the first version generates two imulq,
while the second generates 4 imulq and 2 mulq. It would be convenient if I
could just write the whole code with __int128 and let the compiler do the
optimization by tracking the range of numbers.

int f(int a,int b,int c,int d,int e,int f){
#if 0
  long x=a;
  long y=b;
  long z=c;
  long t=d;
  long u=e;
  long v=f;
  return (z-x)*(__int128)(v-y) < (u-x)*(__int128)(t-y);
#else
  __int128 x=a;
  __int128 y=b;
  __int128 z=c;
  __int128 t=d;
  __int128 u=e;
  __int128 v=f;
  return (z-x)*(v-y) < (u-x)*(t-y);
#endif
}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/53100] Optimize __int128 with range information
  2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
@ 2012-04-29  8:06 ` marc.glisse at normalesup dot org
  2012-04-29  8:43 ` marc.glisse at normalesup dot org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-04-29  8:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

--- Comment #1 from Marc Glisse <marc.glisse at normalesup dot org> 2012-04-29 08:05:59 UTC ---
(In reply to comment #0)
> It would be convenient if I
> could just write the whole code with __int128 and let the compiler do the
> optimization by tracking the range of numbers.

The transformation from an __int128 to a pair of long happens extremely late
(optabs.c), so we can't count on tree-vrp to notice that one of them is always
zero (and actually it is either 0 or -1, as a sign extension, which would make
this hard). On the other hand, tree-vrp does have the information that the
differences are in [-4294967295, 4294967295], which comfortably fits in a type
half the size of __int128. It seems a possible strategy would be to have
tree-vrp mark variables that fit in a type half their size (only for TImode?),
try and preserve that information along the way, and finally use it in
expand_doubleword_mult. But that seems to imply storing the information in an
rtx, and rtx seems a bit too densely packed to add this.

Better ideas?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/53100] Optimize __int128 with range information
  2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
  2012-04-29  8:06 ` [Bug middle-end/53100] " marc.glisse at normalesup dot org
@ 2012-04-29  8:43 ` marc.glisse at normalesup dot org
  2012-05-01 12:47 ` marc.glisse at normalesup dot org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-04-29  8:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-04-29 08:42:36 UTC ---
(In reply to comment #1)
> On the other hand, tree-vrp does have the information that the
> differences are in [-4294967295, 4294967295], which comfortably fits in a type
> half the size of __int128. It seems a possible strategy would be to have
> tree-vrp mark variables that fit in a type half their size (only for TImode?),
> try and preserve that information along the way, and finally use it in
> expand_doubleword_mult.

An other possibility would be, when the range analysis detects this situation,
to have it introduce a double-cast: (__int128)(long)var. In the example here,
it would give:

((__int128)(long)((__int128)c-(__int128)a))*((__int128)(long)((__int128)f-(__int128)b))

and existing optimizations already handle:

(long)((__int128)c-(__int128)a) as (long)c-(long)a

and

(__int128)mylong1*(__int128)mylong2 as a widening multiplication.

But then we'd have to be careful not to introduce too many such casts, not to
introduce them too late, and not to introduce them just before an optimization
that removes them. And find the appropriate half-sized type to cast to. And
possibly do this only for modes not handled natively.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug middle-end/53100] Optimize __int128 with range information
  2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
  2012-04-29  8:06 ` [Bug middle-end/53100] " marc.glisse at normalesup dot org
  2012-04-29  8:43 ` marc.glisse at normalesup dot org
@ 2012-05-01 12:47 ` marc.glisse at normalesup dot org
  2021-08-06  0:33 ` [Bug tree-optimization/53100] " pinskia at gcc dot gnu.org
  2023-07-21 23:32 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: marc.glisse at normalesup dot org @ 2012-05-01 12:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-05-01 12:47:03 UTC ---
(In reply to comment #2)
> and not to introduce them just before an optimization that removes them.

Usually, doing (long)num1*(__int128)(long)num2 does the right thing. I tried in
the example here replacing the plain __int128 multiplications with:

inline bool g1(__int128 x){
  //return(x<=LONG_MAX)&&(x>=LONG_MIN);
  //on 2 lines because of PR30318, unless you apply the patch I posted there
  bool b1 = x<=LONG_MAX;
  bool b2 = x>=LONG_MIN;
  return b1&&b2;
}
inline __int128 mul(__int128 a,__int128 b){
  bool B=g1(a)&&g1(b);
  if(__builtin_constant_p(B)&&B)
    return (long)a*(__int128)(long)b;
  return a*b;
}

__builtin_constant_p does detect we are in the right case, however, because of
bad timing between the various optimizations, the double cast
(__int128)(long)(u-x) is simplified to just (u-x) before it gets a chance to
help. I need to replace the subtraction instead (or in addition) to the
multiplication:

inline __int128 sub(__int128 a,__int128 b){
  bool B=g1(a)&&g1(b)&& g1(a-b);
  if(__builtin_constant_p(B)&&B)
    return (long)a-(long)b;
  return a-b;
}

But it would fit better inside the compiler than as a fragile use of
__builtin_constant_p.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/53100] Optimize __int128 with range information
  2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
                   ` (2 preceding siblings ...)
  2012-05-01 12:47 ` marc.glisse at normalesup dot org
@ 2021-08-06  0:33 ` pinskia at gcc dot gnu.org
  2023-07-21 23:32 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-06  0:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-06
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So in the #if 0 case we get:
  x_13 = (long int) a_12(D);
  y_15 = (long int) b_14(D);
  z_17 = (long int) c_16(D);
  t_19 = (long int) d_18(D);
  u_21 = (long int) e_20(D);
  v_23 = (long int) f_22(D);
  _1 = z_17 - x_13;
  _3 = v_23 - y_15;
  _5 = _1 w* _3;

Notice the w*

While with the original case we get:
  x_9 = (__int128) a_8(D);
  y_11 = (__int128) b_10(D);
  z_13 = (__int128) c_12(D);
  t_15 = (__int128) d_14(D);
  u_17 = (__int128) e_16(D);
  v_19 = (__int128) f_18(D);
  _1 = z_13 - x_9;
  _2 = v_19 - y_11;
  _3 = _1 * _2;

If we had simplified/shortened: ((__int128) c_12(D)) - ((__int128) a_8(D))
to
(__int128)(long)((unsigned long) c_12(D)) - ((unsigned long) a_8(D))
We might have optimized this.
There might be another bug about having some PLUS_EXPR which has WRAPPING
effects rather than undefined OVERFLOW which will help the casting issue.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/53100] Optimize __int128 with range information
  2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
                   ` (3 preceding siblings ...)
  2021-08-06  0:33 ` [Bug tree-optimization/53100] " pinskia at gcc dot gnu.org
@ 2023-07-21 23:32 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-21 23:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-08-06 00:00:00         |2023-7-21

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(simplify
 (plus (convert @0) (convert @1))
 (if (INTEGRAL_TYPE (type)
      && types_match (TREE_TYPE (@0), TREE_TYPE (@1))
      && element_precision (TREE_TYPE (@0)) < element_precision (type))
 (with { type ntype = find_next_best_type (TREE_TYPE (@0));
  }
  (if (ntype
       && !types_match (ntype, type)
       && element_precision (ntype) < element_precision (type))
   (convert (plus (convert:ntype @0) (convert:ntype @1)))
  )
 )
)

Where find_next_best_type checks int, long, long long types ...

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-07-21 23:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-24  7:08 [Bug middle-end/53100] New: Optimize __int128 with range information marc.glisse at normalesup dot org
2012-04-29  8:06 ` [Bug middle-end/53100] " marc.glisse at normalesup dot org
2012-04-29  8:43 ` marc.glisse at normalesup dot org
2012-05-01 12:47 ` marc.glisse at normalesup dot org
2021-08-06  0:33 ` [Bug tree-optimization/53100] " pinskia at gcc dot gnu.org
2023-07-21 23:32 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).