From: Richard Biener
Date: Wed, 16 Jun 2021 16:22:35 +0200
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
To: Stefan Schulze Frielinghaus
Cc: GCC Patches
References: <20210208124716.487709-1-stefansf@linux.ibm.com>
On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus wrote:
>
> On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> [...]
> > > but we won't ever arrive here because of the niters condition.  But
> > > yes, doing the pattern matching in the innermost loop processing code
> > > looks good to me - for the specific case it would be
> > >
> > >   /* Don't distribute loop if niters is unknown.  */
> > >   tree niters = number_of_latch_executions (loop);
> > >   if (niters == NULL_TREE || niters == chrec_dont_know)
> > > ---> here?
> > >     continue;
> >
> > Right, please find attached a new version of the patch where everything
> > is included in the loop distribution pass.  I will do a bootstrap and
> > regtest on IBM Z over night.  If you give me green light I will also do
> > the same on x86_64.
>
> Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> least the ldist-strlen testcase).  If you are Ok with the patch, then I
> would rebase and run the testsuites again and post a patch series
> including the rawmemchr implementation for IBM Z.

@@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop *loop,
				   vec<gimple *> *work_list)
   return work_list->length () > 0;
 }

+static void
+generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
+			    data_reference_p store_dr, tree base, tree pattern,
+			    location_t loc)
+{

this new function needs a comment.  Applies to all of the new ones, btw.

+  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
+		       && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE (pattern));

this looks fragile and is probably unnecessary as well.
+  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));

in general you want types_compatible_p () checks which for pointers
means all pointers are compatible ...

(skipping stuff)

@@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
	       && !optimize_loop_for_speed_p (loop)))
	continue;

-      /* Don't distribute loop if niters is unknown.  */
+      /* If niters is unknown don't distribute loop but rather try to transform
+	 it to a call to a builtin.  */
       tree niters = number_of_latch_executions (loop);
       if (niters == NULL_TREE || niters == chrec_dont_know)
-	continue;
+	{
+	  if (transform_reduction_loop (loop))
+	    {
+	      changed = true;
+	      loops_to_be_destroyed.safe_push (loop);
+	      if (dump_file)
+		fprintf (dump_file, "Loop %d transformed into a builtin.\n",
+			 loop->num);
+	    }
+	  continue;
+	}

please look at

      if (nb_generated_loops + nb_generated_calls > 0)
	{
	  changed = true;
	  if (dump_enabled_p ())
	    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
			     loc, "Loop%s %d distributed: split to %d loops "
			     "and %d library calls.\n", str, loop->num,
			     nb_generated_loops, nb_generated_calls);

and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
transforms are reported with -fopt-info-loop

+
+  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
+}

what's the point in tail-calling here and visually splitting the
function in half?

(sorry for picking random pieces now ;))

+      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi), ++ninsns)
+	{

this counts debug insns, I guess you want gsi_next_nondebug at least.
not sure why you are counting PHIs at all btw - for the loops you match
you are expecting at most two, one IV and eventually one for the
virtual operand of the store?

+	  if (gimple_has_volatile_ops (phi))
+	    return false;

PHIs never have volatile ops.

+	  if (gimple_clobber_p (phi))
+	    continue;

or are clobbers.

Btw, can you factor out a helper from find_single_drs working on a stmt
to reduce code duplication?
+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+    {
+    case GIMPLE_PHI:
+      reduction_var = gimple_phi_result (reduction_stmt);
+      break;
+    case GIMPLE_ASSIGN:
+      reduction_var = gimple_assign_lhs (reduction_stmt);
+      break;
+    default:
+      /* Bail out e.g. for GIMPLE_CALL.  */
+      return false;

gimple_get_lhs (reduction_stmt); would work for both PHIs and assigns.

+  if (reduction_var == NULL)
+    return false;

it can never be NULL here.

+  /* Bail out if this is a bitfield memory reference.  */
+  if (TREE_CODE (DR_REF (load_dr)) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (load_dr), 1)))
+    return false;

... I see this is again quite some code copied from find_single_drs,
please see how to avoid this much duplication by splitting out helpers.

+static bool
+transform_reduction_loop_1 (loop_p loop,
+			    data_reference_p load_dr,
+			    data_reference_p store_dr,
+			    tree reduction_var)
+{
+  tree load_ref = DR_REF (load_dr);
+  tree load_type = TREE_TYPE (load_ref);
+  tree load_access_base = build_fold_addr_expr (load_ref);
+  tree load_access_size = TYPE_SIZE_UNIT (load_type);
+  affine_iv load_iv, reduction_iv;
+  tree pattern;
+
+  /* A limitation of the current implementation is that we only support
+     constant patterns.  */
+  edge e = single_exit (loop);
+  gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (e->src));
+  if (!cond_stmt)
+    return false;

that looks like checks to be done at the start of
transform_reduction_loop, not this late.

+  if (gimple_cond_code (cond_stmt) != NE_EXPR
+      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (DR_STMT (load_dr))
+      || TREE_CODE (pattern) != INTEGER_CST)
+    return false;

half of this as well.  Btw, there's no canonicalization for the tests
so you have to verify the false edge actually exits the loop and allow
for EQ_EXPR in case the false edge does.

+  /* Handle strlen like loops.  */
+  if (store_dr == NULL
+      && integer_zerop (pattern)
+      && TREE_CODE (reduction_iv.base) == INTEGER_CST
+      && TREE_CODE (reduction_iv.step) == INTEGER_CST
+      && integer_onep (reduction_iv.step)
+      && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
+	  || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
+    {

I wonder what goes wrong with a larger or smaller wrapping IV type?
The iteration only stops when you load a NUL and the increments just
wrap along (you're using the pointer IVs to compute the strlen result).
Can't you simply truncate?  For larger than size_type_node (actually
larger than ptr_type_node would matter I guess), the argument is that
since pointer wrapping would be undefined anyway the IV cannot wrap
either.  Now, the correct check here would IMHO be

  TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION (ptr_type_node)
  || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))

?

+      if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
+	{
+	  const char *msg = G_("assuming signed overflow does not occur "
+			       "when optimizing strlen like loop");
+	  fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
+	}

no, please don't add any new strict-overflow warnings ;)

The generate_*_builtin routines need some factoring - if you
code-generate into a gimple_seq you could use gimple_build () which
would do the fold_stmt (not sure why you do that - you should see to
fold the call, not necessarily the rest).  The replacement of
reduction_var and the dumping could be shared.

There's also GET_MODE_NAME for the printing.

I think overall the approach is sound now but the details still need
work.

Thanks,
Richard.

> Cheers,
> Stefan