From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-455650-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 119288 invoked by alias); 12 Jun 2017 07:37:36 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 119252 invoked by uid 89); 12 Jun 2017 07:37:33 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Received:Mon
X-HELO: mail-vk0-f43.google.com
Received: from mail-vk0-f43.google.com (HELO mail-vk0-f43.google.com) (209.85.213.43) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 12 Jun 2017 07:37:30 +0000
Received: by mail-vk0-f43.google.com with SMTP id p62so44997189vkp.0        for <gcc-patches@gcc.gnu.org>; Mon, 12 Jun 2017 00:37:35 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc;        bh=1QcbQprzt1pQneU6GeirB9HCX/fWPch1/1GEFMiPDto=;        b=KW4j+tonygcb4hpUv/bsV965deh9fwoV8yKQ8xFI2Bk0ImVamB5pjxkzRn93TW1/iO         iXPMQNJZJT7g4Utu+0lQQNJ1CoQc3XCyk36/Khx/JNq6fupT0z7lJ8ccwMJAPsQDDvSo         uHPEBm7gZLX+4eSMY+oPiKMcfAeAXvc8Z3tWFSbSo2qKrTIw/Vd2Z6CILt6jXCc2Q20V         C8/tWLiByPHwycIlBnbprZX5EhC2ncVi8P6iRCrM+mMhrXDKta4BLNN3CbiUIXKL3lm1         wvjDNBItDGl0W0DoS3/nm7ASsHpmUb8aDp7YgrBQziQhZbP2UKCVc7DXYcu99cQp+bb2         wX1w==
X-Gm-Message-State: AODbwcAOolKUr1mk3FgYDEq/JyIvjfiggANvoRBa7KJmZL/yX7iNxbH3	NDSAtxcloxVXyJv/UjhYzsN9xjaRb+mw
X-Received: by 10.31.128.11 with SMTP id b11mr6710916vkd.93.1497253053359; Mon, 12 Jun 2017 00:37:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.50.131 with HTTP; Mon, 12 Jun 2017 00:37:32 -0700 (PDT)
In-Reply-To: <099B16EC-5ABF-4DCF-8876-D4FD7D913AE8@gmail.com>
References: <alpine.LSU.2.20.1706081117290.7349@zhemvz.fhfr.qr> <alpine.LSU.2.20.1706081548461.7349@zhemvz.fhfr.qr> <CAKdteOYroGjsFfoeo1Ur=16hgUCPFj04g2mtRAp-7K1Z5N4Q6w@mail.gmail.com> <099B16EC-5ABF-4DCF-8876-D4FD7D913AE8@gmail.com>
From: Christophe Lyon <christophe.lyon@linaro.org>
Date: Mon, 12 Jun 2017 07:37:00 -0000
Message-ID: <CAKdteObujRDGUfTMozBTJETW_Z3YZ98=4K60uqgoNP5uRB65wQ@mail.gmail.com>
Subject: Re: [PATCH] Fix PR66623
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2017-06/txt/msg00745.txt.bz2

On 9 June 2017 at 17:48, Richard Biener <richard.guenther@gmail.com> wrote:
> On June 9, 2017 5:32:10 PM GMT+02:00, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>On 8 June 2017 at 15:49, Richard Biener <rguenther@suse.de> wrote:
>>> On Thu, 8 Jun 2017, Richard Biener wrote:
>>>
>>>>
>>>> The following fixes unsafe vectorization of reductions in outer loop
>>>> vectorization.  It combines it with a bit of cleanup in the affected
>>>> function.
>>>>
>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>>
>>> Re-testing the following variant -- multiple loop-closed PHI nodes
>>> for the same SSA name happen.
>>>
>>> Richard.
>>>
>>> 2017-06-08  Richard Biener  <rguenther@suse.de>
>>>
>>>         PR tree-optimization/66623
>>>         * tree-vect-loop.c (vect_is_simple_reduction): Cleanup,
>>>         refactor check_reduction into two parts, properly computing
>>>         whether we have to check reduction validity for outer loop
>>>         vectorization.
>>>
>>>         * gcc.dg/vect/pr66623.c: New testcase.
>>>
>>
>>Hi Richard,
>>
>>The new testcase fails on armeb* targets:
>>FAIL:    gcc.dg/vect/pr66623.c -flto -ffat-lto-objects execution test
>>FAIL:    gcc.dg/vect/pr66623.c execution test
>
> Ah, the xfail probably isn't enough.  Can you try a dg-skip or so?
>

I can try, but do we really want to ignore/hide the fact that the generated
code in non-functional on arm big-endian? (execution is OK on little-endian)


>>Christophe
>>
>>
>>> Index: gcc/tree-vect-loop.c
>>> ===================================================================
>>> --- gcc/tree-vect-loop.c        (revision 249007)
>>> +++ gcc/tree-vect-loop.c        (working copy)
>>> @@ -2727,8 +2727,6 @@ vect_is_simple_reduction (loop_vec_info
>>>  {
>>>    struct loop *loop = (gimple_bb (phi))->loop_father;
>>>    struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
>>> -  edge latch_e = loop_latch_edge (loop);
>>> -  tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
>>>    gimple *def_stmt, *def1 = NULL, *def2 = NULL, *phi_use_stmt =
>>NULL;
>>>    enum tree_code orig_code, code;
>>>    tree op1, op2, op3 = NULL_TREE, op4 = NULL_TREE;
>>> @@ -2742,11 +2740,6 @@ vect_is_simple_reduction (loop_vec_info
>>>    *double_reduc = false;
>>>    *v_reduc_type = TREE_CODE_REDUCTION;
>>>
>>> -  /* Check validity of the reduction only for the innermost loop.
>>*/
>>> -  bool check_reduction = ! flow_loop_nested_p (vect_loop, loop);
>>> -  gcc_assert ((check_reduction && loop == vect_loop)
>>> -              || (!check_reduction && flow_loop_nested_p (vect_loop,
>>loop)));
>>> -
>>>    name = PHI_RESULT (phi);
>>>    /* ???  If there are no uses of the PHI result the inner loop
>>reduction
>>>       won't be detected as possibly double-reduction by
>>vectorizable_reduction
>>> @@ -2775,13 +2768,15 @@ vect_is_simple_reduction (loop_vec_info
>>>          {
>>>            if (dump_enabled_p ())
>>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> -                            "reduction used in loop.\n");
>>> +                            "reduction value used in loop.\n");
>>>            return NULL;
>>>          }
>>>
>>>        phi_use_stmt = use_stmt;
>>>      }
>>>
>>> +  edge latch_e = loop_latch_edge (loop);
>>> +  tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
>>>    if (TREE_CODE (loop_arg) != SSA_NAME)
>>>      {
>>>        if (dump_enabled_p ())
>>> @@ -2795,18 +2790,22 @@ vect_is_simple_reduction (loop_vec_info
>>>      }
>>>
>>>    def_stmt = SSA_NAME_DEF_STMT (loop_arg);
>>> -  if (!def_stmt)
>>> +  if (gimple_nop_p (def_stmt))
>>>      {
>>>        if (dump_enabled_p ())
>>>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> -                        "reduction: no def_stmt.\n");
>>> +                        "reduction: no def_stmt\n");
>>>        return NULL;
>>>      }
>>>
>>>    if (!is_gimple_assign (def_stmt) && gimple_code (def_stmt) !=
>>GIMPLE_PHI)
>>>      {
>>>        if (dump_enabled_p ())
>>> -       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, def_stmt, 0);
>>> +       {
>>> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> +                          "reduction: unhandled reduction operation:
>>");
>>> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
>>def_stmt, 0);
>>> +       }
>>>        return NULL;
>>>      }
>>>
>>> @@ -2822,6 +2821,7 @@ vect_is_simple_reduction (loop_vec_info
>>>      }
>>>
>>>    nloop_uses = 0;
>>> +  auto_vec<gphi *, 3> lcphis;
>>>    FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
>>>      {
>>>        gimple *use_stmt = USE_STMT (use_p);
>>> @@ -2829,6 +2829,9 @@ vect_is_simple_reduction (loop_vec_info
>>>         continue;
>>>        if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
>>>         nloop_uses++;
>>> +      else
>>> +       /* We can have more than one loop-closed PHI.  */
>>> +       lcphis.safe_push (as_a <gphi *> (use_stmt));
>>>        if (nloop_uses > 1)
>>>         {
>>>           if (dump_enabled_p ())
>>> @@ -2873,6 +2876,27 @@ vect_is_simple_reduction (loop_vec_info
>>>        return NULL;
>>>      }
>>>
>>> +  /* If we are vectorizing an inner reduction we are executing that
>>> +     in the original order only in case we are not dealing with a
>>> +     double reduction.  */
>>> +  bool check_reduction = true;
>>> +  if (flow_loop_nested_p (vect_loop, loop))
>>> +    {
>>> +      gphi *lcphi;
>>> +      unsigned i;
>>> +      check_reduction = false;
>>> +      FOR_EACH_VEC_ELT (lcphis, i, lcphi)
>>> +       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result
>>(lcphi))
>>> +         {
>>> +           gimple *use_stmt = USE_STMT (use_p);
>>> +           if (is_gimple_debug (use_stmt))
>>> +             continue;
>>> +           if (! flow_bb_inside_loop_p (vect_loop, gimple_bb
>>(use_stmt)))
>>> +             check_reduction = true;
>>> +         }
>>> +    }
>>> +
>>> +  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
>>>    code = orig_code = gimple_assign_rhs_code (def_stmt);
>>>
>>>    /* We can handle "res -= x[i]", which is non-associative by
>>> @@ -2887,27 +2911,8 @@ vect_is_simple_reduction (loop_vec_info
>>>
>>>    if (code == COND_EXPR)
>>>      {
>>> -      if (check_reduction)
>>> +      if (! nested_in_vect_loop)
>>>         *v_reduc_type = COND_REDUCTION;
>>> -    }
>>> -  else if (!commutative_tree_code (code) || !associative_tree_code
>>(code))
>>> -    {
>>> -      if (dump_enabled_p ())
>>> -       report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> -                       "reduction: not commutative/associative: ");
>>> -      return NULL;
>>> -    }
>>> -
>>> -  if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
>>> -    {
>>> -      if (code != COND_EXPR)
>>> -        {
>>> -         if (dump_enabled_p ())
>>> -           report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> -                           "reduction: not binary operation: ");
>>> -
>>> -          return NULL;
>>> -        }
>>>
>>>        op3 = gimple_assign_rhs1 (def_stmt);
>>>        if (COMPARISON_CLASS_P (op3))
>>> @@ -2918,30 +2923,35 @@ vect_is_simple_reduction (loop_vec_info
>>>
>>>        op1 = gimple_assign_rhs2 (def_stmt);
>>>        op2 = gimple_assign_rhs3 (def_stmt);
>>> -
>>> -      if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) !=
>>SSA_NAME)
>>> -        {
>>> -          if (dump_enabled_p ())
>>> -            report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> -                           "reduction: uses not ssa_names: ");
>>> -
>>> -          return NULL;
>>> -        }
>>>      }
>>> -  else
>>> +  else if (!commutative_tree_code (code) || !associative_tree_code
>>(code))
>>> +    {
>>> +      if (dump_enabled_p ())
>>> +       report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> +                       "reduction: not commutative/associative: ");
>>> +      return NULL;
>>> +    }
>>> +  else if (get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS)
>>>      {
>>>        op1 = gimple_assign_rhs1 (def_stmt);
>>>        op2 = gimple_assign_rhs2 (def_stmt);
>>> +    }
>>> +  else
>>> +    {
>>> +      if (dump_enabled_p ())
>>> +       report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> +                       "reduction: not handled operation: ");
>>> +      return NULL;
>>> +    }
>>>
>>> -      if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) !=
>>SSA_NAME)
>>> -        {
>>> -          if (dump_enabled_p ())
>>> -           report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> -                           "reduction: uses not ssa_names: ");
>>> +  if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
>>> +    {
>>> +      if (dump_enabled_p ())
>>> +       report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
>>> +                       "reduction: both uses not ssa_names: ");
>>>
>>> -          return NULL;
>>> -        }
>>> -   }
>>> +      return NULL;
>>> +    }
>>>
>>>    type = TREE_TYPE (gimple_assign_lhs (def_stmt));
>>>    if ((TREE_CODE (op1) == SSA_NAME
>>> @@ -3091,7 +3101,7 @@ vect_is_simple_reduction (loop_vec_info
>>>                            == vect_internal_def
>>>                       && !is_loop_header_bb_p (gimple_bb (def2)))))))
>>>      {
>>> -      if (check_reduction && orig_code != MINUS_EXPR)
>>> +      if (! nested_in_vect_loop && orig_code != MINUS_EXPR)
>>>         {
>>>           /* Check if we can swap operands (just for simplicity - so
>>that
>>>              the rest of the code can assume that the reduction
>>variable
>>> @@ -3145,7 +3155,8 @@ vect_is_simple_reduction (loop_vec_info
>>>      }
>>>
>>>    /* Try to find SLP reduction chain.  */
>>> -  if (check_reduction && code != COND_EXPR
>>> +  if (! nested_in_vect_loop
>>> +      && code != COND_EXPR
>>>        && vect_is_slp_reduction (loop_info, phi, def_stmt))
>>>      {
>>>        if (dump_enabled_p ())
>>> Index: gcc/testsuite/gcc.dg/vect/pr66623.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/vect/pr66623.c (revision 0)
>>> +++ gcc/testsuite/gcc.dg/vect/pr66623.c (working copy)
>>> @@ -0,0 +1,86 @@
>>> +/* { dg-require-effective-target vect_float } */
>>> +
>>> +#include "tree-vect.h"
>>> +
>>> +extern void abort (void);
>>> +
>>> +#define OP *
>>> +#define INIT 1.0
>>> +
>>> +float __attribute__((noinline,noclone))
>>> +foo (float *__restrict__ i)
>>> +{
>>> +  float l = INIT;
>>> +  int a;
>>> +  int b;
>>> +
>>> +  for (a = 0; a < 4; a++)
>>> +    for (b = 0; b < 4; b++)
>>> +      l = l OP i[b];
>>> +
>>> +  return l;
>>> +}
>>> +
>>> +float __attribute__((noinline,noclone))
>>> +foo_ref (float *__restrict__ i)
>>> +{
>>> +  float l = INIT;
>>> +
>>> +  l = l OP i[0];
>>> +  l = l OP i[1];
>>> +  l = l OP i[2];
>>> +  l = l OP i[3];
>>> +
>>> +  l = l OP i[0];
>>> +  l = l OP i[1];
>>> +  l = l OP i[2];
>>> +  l = l OP i[3];
>>> +
>>> +  l = l OP i[0];
>>> +  l = l OP i[1];
>>> +  l = l OP i[2];
>>> +  l = l OP i[3];
>>> +
>>> +  l = l OP i[0];
>>> +  l = l OP i[1];
>>> +  l = l OP i[2];
>>> +  l = l OP i[3];
>>> +
>>> +  return l;
>>> +}
>>> +
>>> +union u
>>> +{
>>> +  float f;
>>> +  unsigned int u;
>>> +};
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  union u res, res2;
>>> +  float a[4];
>>> +
>>> +  if (sizeof (float) != sizeof (unsigned int))
>>> +    return 0;
>>> +
>>> +  check_vect ();
>>> +
>>> +  a[0] = 0.01;
>>> +  a[1] = 0.01;
>>> +  a[2] = 0.01;
>>> +  a[3] = 1.0;
>>> +
>>> +  res.f = foo_ref (a);
>>> +
>>> +  res2.f = foo (a);
>>> +
>>> +  if (res.u != res2.u)
>>> +    abort ();
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +/* need -ffast-math to vectorize this loop.  */
>>> +/* ARM NEON passes -ffast-math to these tests, so expect this to
>>fail.  */
>>> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {
>>xfail arm_neon_ok } } } */
>