From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id EADD4384647C for ; Mon, 12 Jul 2021 10:02:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EADD4384647C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id D529C1FF6E; Mon, 12 Jul 2021 10:02:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1626084154; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=PwZLqDzrm1IzCFTVw1grOnq94pXvACZg7jfdSvHqZBg=; b=xIGfQcWizYDqwq1Xp+u0C4NompLJA6lvfFcHRQOmCrbsyEu8H/HlS8FNvTdmnEbnAy3WkQ FHQDTxjvXJEFQxTTkUyxhJW/oy5Gc6zPTCRLIu7Dw9mddMHjy61AFHJ63qXIPmLAzZkMgp 7IUphOo6Q3nVlY7VjS6pWWAG1hfSRy4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1626084154; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=PwZLqDzrm1IzCFTVw1grOnq94pXvACZg7jfdSvHqZBg=; b=TSOUfOa48PuzRvPOEH/M36L48r8+kRNE3ntIX+ccg+PUvEqA1l0+M1M2opW++wDWc2oM/c k5z49xiSF7zc/TAw== Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id C975BA3B85; Mon, 12 Jul 2021 10:02:34 +0000 (UTC) Date: Mon, 12 Jul 2021 12:02:34 +0200 (CEST) From: Richard Biener To: guojiufu cc: Segher Boessenkool , gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com, dje.gcc@gmail.com, jlaw@tachyum.com, amker.cheng@gmail.com Subject: Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837] In-Reply-To: <619972b1f8eb6f567801a8f9e1695b89@imap.linux.ibm.com> Message-ID: References: <20210709020718.177302-1-guojiufu@linux.ibm.com> <20210709170832.GW1583@gate.crashing.org> <619972b1f8eb6f567801a8f9e1695b89@imap.linux.ibm.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, PDS_TONAME_EQ_TOLOCAL_HDRS_LCASE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jul 2021 10:02:37 -0000 On Mon, 12 Jul 2021, guojiufu wrote: > On 2021-07-12 16:57, Richard Biener wrote: > > On Mon, 12 Jul 2021, guojiufu wrote: > > > >> On 2021-07-12 14:20, Richard Biener wrote: > >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote: > >> > > >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote: > >> >> > I wonder if there's a way to query the target what modes the doloop > >> >> > pattern can handle (not being too familiar with the doloop code). > >> >> > >> >> You can look what modes are allowed for operand 0 of doloop_end, > >> >> perhaps? Although that is a define_expand, not a define_insn, so it is > >> >> hard to introspect. > >> >> > >> >> > Why do you need to do any checks besides the new type being able to > >> >> > represent all IV values? The original doloop IV will never wrap > >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become > >> >> > zero ... I suppose the doloop might still do the correct thing here > >> >> > but it also still will with a IV with larger type). > >> > >> The issue comes from U*_MAX (original short MAX), as you said: on which > >> niter + 1 becomes zero. And because the step for doloop is -1; then, on > >> larger type 'zero - 1' will be a very large number on larger type > >> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small > >> value > >> (e.g. "0xff"). > > > > But for the larger type the small type MAX + 1 fits and does not yield > > zero so it should still work exactly as before, no? Of course you > > have to compute the + 1 in the larger type. > > > You are right, if compute the "+ 1" in the larger type it is ok, as below > code: > ``` > /* Use type in word size may fast. */ > if (TYPE_PRECISION (ntype) < BITS_PER_WORD) > { > ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1); > niter = fold_convert (ntype, niter); > } > > tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter), > build_int_cst (ntype, 1)); > > > add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL, > true); > ``` > The issue of this is, this code generates more stmt for doloop.xxx: > _12 = (unsigned int) xx(D); > _10 = _12 + 4294967295; > _24 = (long unsigned int) _10; > doloop.6_8 = _24 + 1; > > if use previous patch, "+ 1" on original type, then the stmts will looks like: > _12 = (unsigned int) xx(D); > doloop.6_8 = (long unsigned int) _12; > > This is the reason for checking > wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype))) But this then only works when there's an upper bound on the number of iterations. Note you should not use TYPE_MAX_VALUE here but you can instead use wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN (ntype)))); I think the -1 above comes from number of latch iterations vs. header entries - it's a common source for this kind of issues. range analysis might be able to prove that we can still merge the two adds even with the intermediate extension. Is this pre-loop extra add really offsetting the in-loop doloop improvements? > >> >> > >> >> doloop_valid_p guarantees it is simple and doesn't wrap. > >> >> > >> >> > I'd have expected sth like > >> >> > > >> >> > ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED > >> >> > (ntype)); > >> >> > > >> >> > thus the decision made using a mode - which is also why I wonder > >> >> > if there's a way to query the target for this. As you say, > >> >> > it _may_ be fast, so better check (somehow). > >> > >> > >> I was also thinking of using hooks like type_for_size/type_for_mode. > >> /* Use type in word size may fast. */ > >> if (TYPE_PRECISION (ntype) < BITS_PER_WORD > >> && Wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE > >> (ntype)))) > >> { > >> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1); > >> base = fold_convert (ntype, base); > >> } > >> > >> As you pointed out, this does not query the mode from targets. > >> As Segher pointed out "doloop_end" checks unsupported mode, while it seems > >> not easy to use it in tree-ssa-loop-ivopts.c. > >> For implementations of doloop_end, tartgets like rs6000/aarch64/ia64 > >> requires > >> Pmode/DImode; while there are other targets that work on other 'mode' (e.g. > >> SI). > >> > >> > >> In doloop_optimize, there is code: > >> > >> ``` > >> mode = desc->mode; > >> ..... > >> doloop_reg = gen_reg_rtx (mode); > >> rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, > >> start_label); > >> > >> word_mode_size = GET_MODE_PRECISION (word_mode); > >> word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - > >> 1; > >> if (! doloop_seq > >> && mode != word_mode > >> /* Before trying mode different from the one in that # of > >> iterations is > >> computed, we must be sure that the number of iterations fits > >> into > >> the new mode. */ > >> && (word_mode_size >= GET_MODE_PRECISION (mode) > >> || wi::leu_p (iterations_max, word_mode_max))) > >> { > >> if (word_mode_size > GET_MODE_PRECISION (mode)) > >> count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode); > >> else > >> count = lowpart_subreg (word_mode, count, mode); > >> PUT_MODE (doloop_reg, word_mode); > >> doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label); > >> } > >> if (! doloop_seq) > >> { > >> if (dump_file) > >> fprintf (dump_file, > >> "Doloop: Target unwilling to use doloop pattern!\n"); > >> return false; > >> } > >> ``` > >> The above code first tries the mode of niter_desc by call > >> targetm.gen_doloop_end > >> to see if the target can generate doloop insns, if fail, then try to use > >> 'word_mode' against gen_doloop_end. > >> > >> > >> >> > >> >> Almost all targets just use Pmode, but there is no such guarantee I > >> >> think, and esp. some targets that do not have machine insns for this > >> >> (but want to generate different code for this anyway) can do pretty much > >> >> anything. > >> >> > >> >> Maybe using just Pmode here is good enough though? > >> > > >> > I think Pmode is a particularly bad choice and I'd prefer word_mode > >> > if we go for any hardcoded mode. s390x for example seems to handle > >> > both SImode and DImode (but names the helper gen_doloop_si64 > >> > for SImode?!). But indeed it looks like somehow querying doloop_end > >> > is going to be difficult since the expander doesn't have any mode, > >> > so we'd have to actually try emit RTL here. > >> > >> Instead of using hardcode mode, maybe we could add a hook for targets to > >> return > >> the preferred mode. > > > > That's a possiblity of course. Like the following which just shows the > > default implementation then (pass in current mode, return a more preferred > > mode or the mode itself) > > > > enum machine_mode > > prefered_doloop_mode (enum machine_mode mode) > > { > > return mode; > > } > > > Yes, thanks! > > Checking current do_loop_end in targets, in general, when do_loop_end requires > SI mode, the target is defining Pmode as SImode and word_mode (from > BITS_PER_WORD > which defaults from UNITS_PER_WORD) is also defined to align with SI mode. > When do_loop_end requires DI mode, the target is defining Pmode as DImode > and word_mode/UNITS_PER_WORD is also defined to align with DI mode. > > So, if aggressively, then by default we may just return word_mode. Note we still have to check whether the prefered mode is valid (passing in TImode but then returning DImode would be wrong). Richard. > BR, > > Jiufu Guo. > > > >> > >> Thanks for those valuable comments! > >> > >> Jiufu Guo > >> > >> > >> > >> > > >> > Richard. > >> > >> > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)