public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/49471] New: ICE when -ftree-parallelize-loops is enabled together with -m32 on power7
@ 2011-06-20  8:57 razya at il dot ibm.com
From: razya at il dot ibm.com @ 2011-06-20  8:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49471

           Summary: ICE when -ftree-parallelize-loops is enabled together
                    with -m32 on power7
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: razya@il.ibm.com


When building cactusADM on igoo, I get an ICE when autopar is enabled together
with -m32 (it does not happen with -m64).

This is the command line which is executed:

> /home/razya/gcc-bin/bin/gcc -c -o PUGHReduce/ReductionNormInf.o -DSPEC_CPU -DNDEBUG  -Iinclude -I../include -DCCODE -m32  -ffast-math -O3  -fno-tree-vectorize -fno-vect-cost-model -ftree-parallelize-loops=6         PUGHReduce/ReductionNormInf.c


The failure originates in the code generated (at the tree level) to expand the
omp_for pragma.
The expansion of omp_for generates prolog code that calculates the particular
interval of the loop's iterations which the thread executing this code should
run.
This prolog calculation involves some arithmetic operations, in particular
MULT and DIV statements, which cause the failures.

For example, this is the tree-level code generated for the prolog; the div and
mult operations are the statements that use / and *:


  D.7313_5 = MEM[(struct  *).paral_data_param_1(D)].D.7288; /* Number of loop iterations.  */
  D.7316_8 = __builtin_omp_get_num_threads ();
  D.7317_9 = (<unnamed-unsigned:128>) D.7316_8;
  D.7318_10 = __builtin_omp_get_thread_num ();
  D.7319_11 = (<unnamed-unsigned:128>) D.7318_10;
  D.7320_12 = D.7313_5 / D.7317_9;
  D.7321_13 = D.7320_12 * D.7317_9;
  D.7322_14 = D.7321_13 != D.7313_5;
  D.7323_15 = D.7322_14 + D.7320_12;
  ivtmp.575_16 = D.7323_15 * D.7319_11;
  D.7325_17 = ivtmp.575_16 + D.7323_15;
  D.7326_18 = MIN_EXPR <D.7325_17, D.7313_5>;
  if (ivtmp.575_16 >= D.7326_18)
    goto <bb 3>;
  else
    goto <bb 4>;
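
Read as plain C, the prolog above appears to compute a static, per-thread chunk
of the iteration space. The following is a minimal sketch of that arithmetic,
not the code GCC emits: the function name chunk_bounds and its parameters are
hypothetical, and 64-bit unsigned integers stand in for the 128-bit type seen
in the dump.

```c
#include <assert.h>

/* Hypothetical helper mirroring the GIMPLE prolog above (an illustration,
   not GCC code).
     n        ~ D.7313_5  (number of loop iterations)
     nthreads ~ D.7317_9  (value of __builtin_omp_get_num_threads)
     tid      ~ D.7319_11 (value of __builtin_omp_get_thread_num)
   Stores the half-open interval [*start, *end) and returns 1 if the
   calling thread has iterations to run, 0 otherwise.  */
int
chunk_bounds (unsigned long long n, unsigned long long nthreads,
              unsigned long long tid,
              unsigned long long *start, unsigned long long *end)
{
  assert (nthreads > 0);

  unsigned long long q = n / nthreads;     /* D.7320_12: the DIV */
  unsigned long long back = q * nthreads;  /* D.7321_13: the first MULT */
  q += (back != n);                        /* D.7323_15: round chunk size up */

  unsigned long long s = q * tid;          /* ivtmp.575_16: the second MULT */
  unsigned long long e = s + q;            /* D.7325_17 */
  if (e > n)                               /* D.7326_18: MIN_EXPR */
    e = n;

  *start = s;
  *end = e;
  return s < e;                            /* final test: skip when s >= e */
}
```

With 10 iterations and 6 threads (the thread count on the command line above),
the chunk size rounds up to 2, so thread 4 gets iterations [8, 10) and thread 5
gets none.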



When the div expr is being expanded to RTL code, we fail in executing the
following sign_expand_binop call in expmed.c:

  quotient = sign_expand_binop (compute_mode,
                                udiv_optab, sdiv_optab,
                                op0, op1, target,
                                unsignedp, OPTAB_LIB_WIDEN);

The call to this function returns NULL (which it shouldn't, as far as I
understand).

When I tried removing the div instruction, the mult expr caused an assert
failure in expand_mult() because the following call in expmed.c returned NULL:

  /* This used to use umul_optab if unsigned, but for non-widening multiply
     there is no difference between signed and unsigned.  */
  op0 = expand_binop (mode,
                      ! unsignedp
                      && flag_trapv && (GET_MODE_CLASS(mode) == MODE_INT)
                      ? smulv_optab : smul_optab,
                      op0, op1, target, unsignedp, OPTAB_LIB_WIDEN);
  gcc_assert (op0);

-------------------------------------------------------------------------------

I found that the variables being divided/multiplied are of 128 bit types.
They are created when canonicalize_loop_ivs is called:

tree
canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
{
  unsigned precision = TYPE_PRECISION (TREE_TYPE (*nit));   // precision of the number of iterations

  for (psi = gsi_start_phis (loop->header);         
       !gsi_end_p (psi); gsi_next (&psi))
    {
      gimple phi = gsi_stmt (psi);
      tree res = PHI_RESULT (phi);

      if (is_gimple_reg (res) && TYPE_PRECISION (TREE_TYPE (res)) > precision)
        precision = TYPE_PRECISION (TREE_TYPE (res));
    }

  type = lang_hooks.types.type_for_size (precision, 1);    // here precision is 128

....
}
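
The precision scan in the excerpt above can be modelled outside the compiler.
This is a minimal standalone sketch, not GCC code: widest_precision and its
parameters are hypothetical stand-ins, with plain unsigned values replacing
TYPE_PRECISION of the tree types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scan in canonicalize_loop_ivs.
   nit_precision plays the role of TYPE_PRECISION (TREE_TYPE (*nit)), and
   phi_precisions[] the precisions of the register PHI results in the
   loop header.  Returns the widest precision seen.  */
unsigned
widest_precision (unsigned nit_precision,
                  const unsigned *phi_precisions, size_t count)
{
  assert (phi_precisions != NULL || count == 0);

  unsigned precision = nit_precision;
  for (size_t i = 0; i < count; i++)
    if (phi_precisions[i] > precision)
      precision = phi_precisions[i];
  return precision;
}
```

Under this model a single 128-bit PHI result in the loop header is enough to
make the canonical induction variable type 128 bits wide, even if the iteration
count itself is narrower.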


Note that the precision is also 128 when -m64 is enabled.
The difference is that the type created by lang_hooks for the -m32 case is
<unnamed-unsigned:128>
and for -m64 it is __int128 unsigned, whose arithmetic operations apparently
are handled correctly by the compiler.
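
A similar target difference is visible from user code through GCC's predefined
macro __SIZEOF_INT128__, which is defined when the target supports __int128.
The sketch below is an illustration only and not part of the failing build:
the function names are hypothetical, and the 64-bit fallback is an assumption
about what a narrower computation could look like, not the fix for this bug.

```c
#include <assert.h>

/* Returns 1 if this compiler/target provides __int128, 0 otherwise.  */
int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: the rounded-up chunk size from the prolog,
   computed in 128 bits when __int128 exists and in 64 bits otherwise.  */
unsigned long long
chunk_size (unsigned long long n, unsigned long long nthreads)
{
  assert (nthreads > 0);
#ifdef __SIZEOF_INT128__
  unsigned __int128 wide_n = n;
  unsigned __int128 q = wide_n / nthreads;   /* 128-bit DIV */
  q += (q * nthreads != wide_n);             /* 128-bit MULT and compare */
  return (unsigned long long) q;
#else
  unsigned long long q = n / nthreads;       /* 64-bit fallback */
  q += (q * nthreads != n);
  return q;
#endif
}
```

Both paths return the same value for inputs that fit in 64 bits; only the width
of the intermediate arithmetic differs.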


Thread overview: 10+ messages
2011-06-20  8:57 [Bug tree-optimization/49471] New: ICE when -ftree-parallelize-loops is enabled together with -m32 on power7 razya at il dot ibm.com
2011-06-20 10:24 ` [Bug tree-optimization/49471] " rguenth at gcc dot gnu.org
2011-06-20 12:31 ` razya at il dot ibm.com
2011-06-20 12:55 ` rguenth at gcc dot gnu.org
2011-06-20 14:15 ` razya at il dot ibm.com
2011-07-13 15:10 ` razya at gcc dot gnu.org
2011-07-13 15:15 ` razya at gcc dot gnu.org
2011-07-25 13:33 ` [Bug tree-optimization/49471] cactusADM/dealII build with autopar fails on x86, and fails on power7 when -m32 is enabled razya at gcc dot gnu.org
2011-07-27 16:54 ` spop at gcc dot gnu.org
2011-07-27 16:59 ` spop at gcc dot gnu.org
