From: Richard Biener
Date: Wed, 16 Jun 2021 16:22:35 +0200
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
To: Stefan Schulze Frielinghaus
Cc: GCC Patches
References: <20210208124716.487709-1-stefansf@linux.ibm.com>
On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus wrote:
>
> On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> [...]
> > > but we won't ever arrive here because of the niters condition.  But
> > > yes, doing the pattern matching in the innermost loop processing code
> > > looks good to me - for the specific case it would be
> > >
> > >   /* Don't distribute loop if niters is unknown.  */
> > >   tree niters = number_of_latch_executions (loop);
> > >   if (niters == NULL_TREE || niters == chrec_dont_know)
> > > ---> here?
> > >     continue;
> >
> > Right, please find attached a new version of the patch where everything
> > is included in the loop distribution pass.  I will do a bootstrap and
> > regtest on IBM Z over night.  If you give me green light I will also do
> > the same on x86_64.
>
> Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> least the ldist-strlen testcase).  If you are Ok with the patch, then I
> would rebase and run the testsuites again and post a patch series
> including the rawmemchr implementation for IBM Z.

@@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop *loop,
				   vec<gimple *> *work_list)
   return work_list->length () > 0;
 }

+static void
+generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
+			    data_reference_p store_dr, tree base, tree pattern,
+			    location_t loc)
+{

this new function needs a comment.  Applies to all of the new ones, btw.

+  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
+		       && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE (pattern));

this looks fragile and is probably unnecessary as well.
+  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));

in general you want types_compatible_p () checks which for pointers
means all pointers are compatible ...

(skipping stuff)

@@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
	       && !optimize_loop_for_speed_p (loop)))
	continue;

-      /* Don't distribute loop if niters is unknown.  */
+      /* If niters is unknown don't distribute loop but rather try to transform
+	 it to a call to a builtin.  */
       tree niters = number_of_latch_executions (loop);
       if (niters == NULL_TREE || niters == chrec_dont_know)
-	continue;
+	{
+	  if (transform_reduction_loop (loop))
+	    {
+	      changed = true;
+	      loops_to_be_destroyed.safe_push (loop);
+	      if (dump_file)
+		fprintf (dump_file, "Loop %d transformed into a builtin.\n",
+			 loop->num);
+	    }
+	  continue;
+	}

please look at

      if (nb_generated_loops + nb_generated_calls > 0)
	{
	  changed = true;
	  if (dump_enabled_p ())
	    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
			     loc, "Loop%s %d distributed: split to %d loops "
			     "and %d library calls.\n", str, loop->num,
			     nb_generated_loops, nb_generated_calls);

and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
transforms are reported with -fopt-info-loop

+
+  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
+}

what's the point in tail-calling here and visually splitting the
function in half?

(sorry for picking random pieces now ;))

+      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi), ++ninsns)
+	{

this counts debug insns, I guess you want gsi_next_nondebug at least.
not sure why you are counting PHIs at all btw - for the loops you match
you are expecting at most two, one IV and eventually one for the
virtual operand of the store?

+	  if (gimple_has_volatile_ops (phi))
+	    return false;

PHIs never have volatile ops.

+	  if (gimple_clobber_p (phi))
+	    continue;

or are clobbers.

Btw, can you factor out a helper from find_single_drs working on a stmt
to reduce code duplication?
+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+    {
+    case GIMPLE_PHI:
+      reduction_var = gimple_phi_result (reduction_stmt);
+      break;
+    case GIMPLE_ASSIGN:
+      reduction_var = gimple_assign_lhs (reduction_stmt);
+      break;
+    default:
+      /* Bail out e.g. for GIMPLE_CALL.  */
+      return false;

gimple_get_lhs (reduction_stmt); would work for both PHIs and assigns.

+  if (reduction_var == NULL)
+    return false;

it can never be NULL here.

+  /* Bail out if this is a bitfield memory reference.  */
+  if (TREE_CODE (DR_REF (load_dr)) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (load_dr), 1)))
+    return false;

... I see this is again quite some code copied from find_single_drs,
please see how to avoid this much duplication by splitting out helpers.

+static bool
+transform_reduction_loop_1 (loop_p loop,
+			    data_reference_p load_dr,
+			    data_reference_p store_dr,
+			    tree reduction_var)
+{
+  tree load_ref = DR_REF (load_dr);
+  tree load_type = TREE_TYPE (load_ref);
+  tree load_access_base = build_fold_addr_expr (load_ref);
+  tree load_access_size = TYPE_SIZE_UNIT (load_type);
+  affine_iv load_iv, reduction_iv;
+  tree pattern;
+
+  /* A limitation of the current implementation is that we only support
+     constant patterns.  */
+  edge e = single_exit (loop);
+  gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (e->src));
+  if (!cond_stmt)
+    return false;

that looks like checks to be done at the start of
transform_reduction_loop, not this late.

+  if (gimple_cond_code (cond_stmt) != NE_EXPR
+      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (DR_STMT (load_dr))
+      || TREE_CODE (pattern) != INTEGER_CST)
+    return false;

half of this as well.  Btw, there's no canonicalization for the tests
so you have to verify the false edge actually exits the loop and allow
for EQ_EXPR in case the false edge does.

+  /* Handle strlen like loops.  */
+  if (store_dr == NULL
+      && integer_zerop (pattern)
+      && TREE_CODE (reduction_iv.base) == INTEGER_CST
+      && TREE_CODE (reduction_iv.step) == INTEGER_CST
+      && integer_onep (reduction_iv.step)
+      && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
+	  || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
+    {

I wonder what goes wrong with a larger or smaller wrapping IV type?
The iteration only stops when you load a NUL and the increments just
wrap along (you're using the pointer IVs to compute the strlen result).
Can't you simply truncate?  For larger than size_type_node (actually
larger than ptr_type_node would matter I guess), the argument is that
since pointer wrapping would be undefined anyway the IV cannot wrap
either.  Now, the correct check here would IMHO be

  TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION (ptr_type_node)
  || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))

?

+      if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
+	{
+	  const char *msg = G_("assuming signed overflow does not occur "
+			       "when optimizing strlen like loop");
+	  fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
+	}

no, please don't add any new strict-overflow warnings ;)

The generate_*_builtin routines need some factoring - if you
code-generate into a gimple_seq you could use gimple_build () which
would do the fold_stmt (not sure why you do that - you should see to
fold the call, not necessarily the rest).  The replacement of
reduction_var and the dumping could be shared.

There's also GET_MODE_NAME for the printing.

I think overall the approach is sound now but the details still need
work.

Thanks,
Richard.

> Cheers,
> Stefan