From: Richard Biener
Date: Mon, 6 Sep 2021 11:56:21 +0200
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
To: Stefan Schulze Frielinghaus
Cc: GCC Patches
List-Id: Gcc-patches mailing list

On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus wrote:
>
> On Fri, Aug 20, 2021 at 12:35:58PM +0200, Richard Biener wrote:
> [...]
> > > >
> > > > +  /* Handle strlen like loops.  */
> > > > +  if (store_dr == NULL
> > > > +      && integer_zerop (pattern)
> > > > +      && TREE_CODE (reduction_iv.base) == INTEGER_CST
> > > > +      && TREE_CODE (reduction_iv.step) == INTEGER_CST
> > > > +      && integer_onep (reduction_iv.step)
> > > > +      && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
> > > > +          || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
> > > > +    {
> > > >
> > > > I wonder what goes wrong with a larger or smaller wrapping IV type?
> > > > The iteration
> > > > only stops when you load a NUL and the increments just wrap along (you're
> > > > using the pointer IVs to compute the strlen result).  Can't you simply truncate?
> > >
> > > I think truncation is enough as long as no overflow occurs in strlen or
> > > strlen_using_rawmemchr.
> > >
> > > > For larger than size_type_node (actually larger than ptr_type_node would matter
> > > > I guess), the argument is that since pointer wrapping would be undefined anyway
> > > > the IV cannot wrap either.  Now, the correct check here would IMHO be
> > > >
> > > >   TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION
> > > >   (ptr_type_node)
> > > >   || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))
> > > >
> > > > ?
> > >
> > > Regarding the implementation which makes use of rawmemchr:
> > >
> > > We can count at most PTRDIFF_MAX many bytes without an overflow.  Thus,
> > > the maximal length we can determine of a string where each character has
> > > size S is PTRDIFF_MAX / S without an overflow.
> > > Since an overflow for
> > > ptrdiff type is undefined we have to make sure that if an overflow
> > > occurs, then an overflow occurs for reduction variable, too, and that
> > > this is undefined, too.  However, I'm not sure anymore whether we want
> > > to respect overflows in all cases.  If TYPE_PRECISION (ptr_type_node)
> > > equals TYPE_PRECISION (ptrdiff_type_node) and an overflow occurs, then
> > > this would mean that a single string consumes more than half of the
> > > virtual addressable memory.  At least for architectures where
> > > TYPE_PRECISION (ptrdiff_type_node) == 64 holds, I think it is reasonable
> > > to neglect the case where computing pointer difference may overflow.
> > > Otherwise we are talking about strings with lengths of multiple
> > > pebibytes.  For other architectures we might have to be more precise
> > > and make sure that reduction variable overflows first and that this is
> > > undefined.
> > >
> > > Thus a conservative condition would be (I assumed that the size of any
> > > integral type is a power of two which I'm not sure if this really holds;
> > > IIRC the C standard requires only that the alignment is a power of two
> > > but not necessarily the size so I might need to change this):
> > >
> > > /* Compute precision (reduction_var) < (precision (ptrdiff_type) - 1 - log2 (sizeof (load_type))
> > >    or in other words return true if reduction variable overflows first
> > >    and false otherwise.
> > >  */
> > >
> > > static bool
> > > reduction_var_overflows_first (tree reduction_var, tree load_type)
> > > {
> > >   unsigned precision_ptrdiff = TYPE_PRECISION (ptrdiff_type_node);
> > >   unsigned precision_reduction_var = TYPE_PRECISION (TREE_TYPE (reduction_var));
> > >   unsigned size_exponent = wi::exact_log2 (wi::to_wide (TYPE_SIZE_UNIT (load_type)));
> > >   return wi::ltu_p (precision_reduction_var, precision_ptrdiff - 1 - size_exponent);
> > > }
> > >
> > >   TYPE_PRECISION (ptrdiff_type_node) == 64
> > >   || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
> > >       && reduction_var_overflows_first (reduction_var, load_type))
> > >
> > > Regarding the implementation which makes use of strlen:
> > >
> > > I'm not sure what it means if strlen is called for a string with a
> > > length greater than SIZE_MAX.  Therefore, similar to the implementation
> > > using rawmemchr where we neglect the case of an overflow for 64bit
> > > architectures, a conservative condition would be:
> > >
> > >   TYPE_PRECISION (size_type_node) == 64
> > >   || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
> > >       && TYPE_PRECISION (reduction_var) <= TYPE_PRECISION (size_type_node))
> > >
> > > I still included the overflow undefined check for reduction variable in
> > > order to rule out situations where the reduction variable is unsigned
> > > and overflows as many times until strlen(,_using_rawmemchr) overflows,
> > > too.  Maybe this is all theoretical nonsense but I'm afraid of uncommon
> > > architectures.  Anyhow, while writing this down it becomes clear that
> > > this deserves a comment which I will add once it becomes clear which way
> > > to go.
> >
> > I think all the arguments about objects bigger than half of the address-space
> > also are valid for 32bit targets and thus 32bit size_type_node (or
> > 32bit pointer size).
> > I'm not actually sure what's the canonical type to check against, whether
> > it's size_type_node (C's size_t), ptr_type_node (C's void *) or sizetype (the
> > middle-end "offset" type used for all address computations).  For weird reasons
> > I'd lean towards 'sizetype' (for example some embedded targets have 24bit
> > pointers but 16bit 'sizetype').
>
> Ok, for the strlen implementation I changed from size_type_node to
> sizetype and assume that no overflow occurs for string objects bigger
> than half of the address space for 32-bit targets and up:
>
>   (TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
>    && TYPE_PRECISION (ptr_type_node) >= 32)
>   || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>       && TYPE_PRECISION (reduction_var) <= TYPE_PRECISION (sizetype))
>
> and similarly for the rawmemchr implementation:
>
>   (TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION (ptr_type_node)
>    && TYPE_PRECISION (ptrdiff_type_node) >= 32)
>   || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>       && reduction_var_overflows_first (reduction_var, load_type))
>
> > >
> > > >
> > > > +      if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
> > > > +        {
> > > > +          const char *msg = G_("assuming signed overflow does not occur "
> > > > +                               "when optimizing strlen like loop");
> > > > +          fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
> > > > +        }
> > > >
> > > > no, please don't add any new strict-overflow warnings ;)
> > >
> > > I just stumbled over code which produces such a warning and thought this
> > > is a hard requirement :D  The new patch doesn't contain it anymore.
> > >
> > > >
> > > > The generate_*_builtin routines need some factoring - if you code-generate
> > > > into a gimple_seq you could use gimple_build () which would do the fold_stmt
> > > > (not sure why you do that - you should see to fold the call, not necessarily
> > > > the rest).  The replacement of reduction_var and the dumping could be shared.
> > > > There's also GET_MODE_NAME for the printing.
> > >
> > > I wasn't really sure which way to go.  Use a gsi, as it is done by
> > > existing generate_* functions, or make use of gimple_seq.  Since the
> > > latter uses internally also gsi I thought it is better to stick to gsi
> > > in the first place.  Now, after changing to gimple_seq I see the beauty
> > > of it :)
> > >
> > > I created two helper functions generate_strlen_builtin_1 and
> > > generate_reduction_builtin_1 in order to reduce code duplication.
> > >
> > > In function generate_strlen_builtin I changed from using
> > > builtin_decl_implicit (BUILT_IN_STRLEN) to builtin_decl_explicit
> > > (BUILT_IN_STRLEN) since the former could return a NULL pointer.  I'm not
> > > sure whether my intuition about the difference between implicit and
> > > explicit builtins is correct.  In builtins.def there is a small example
> > > given which I would paraphrase as "use builtin_decl_explicit if the
> > > semantics of the builtin is defined by the C standard; otherwise use
> > > builtin_decl_implicit" but probably my intuition is wrong?
> > >
> > > Beside that I'm not sure whether I really have to call
> > > build_fold_addr_expr which looks superfluous to me since
> > > gimple_build_call can deal with ADDR_EXPR as well as FUNCTION_DECL:
> > >
> > >   tree fn = build_fold_addr_expr (builtin_decl_explicit (BUILT_IN_STRLEN));
> > >   gimple *fn_call = gimple_build_call (fn, 1, mem);
> > >
> > > However, since it is also used that way in the context of
> > > generate_memset_builtin I didn't remove it so far.
> > >
> > > > I think overall the approach is sound now but the details still need work.
> > >
> > > Once again thank you very much for your review.  Really appreciated!
> >
> > The patch lacks a changelog entry / description.  It's nice if patches sent
> > out for review are basically the rev as git format-patch produces.
> >
> > The rawmemchr optab needs documenting in md.texi
>
> While writing the documentation in md.texi I realised that other
> instructions expect an address to be a memory operand which is not the
> case for rawmemchr currently.  At the moment the address is either an
> SSA_NAME or ADDR_EXPR with a tree pointer type in expand_RAWMEMCHR.  As a
> consequence in the backend define_expand rawmemchr expects a
> register operand and not a memory operand.  Would it make sense to build
> a MEM_REF out of SSA_NAME/ADDR_EXPR in expand_RAWMEMCHR?  Not sure if
> MEM_REF is supposed to be the canonical form here.

I suppose the expander could use code similar to what
expand_builtin_memset_args does, using get_memory_rtx.  I suppose that
we're using MEM operands because those can convey things like alias
info or alignment info, something which REG operands cannot (easily).
I wouldn't build a MEM_REF and try to expand that.

> >
> > +}
> > +
> > +static bool
> > +reduction_var_overflows_first (tree reduction_var, tree load_type)
> > +{
> > +  unsigned precision_ptrdiff = TYPE_PRECISION (ptrdiff_type_node);
> >
> > this function needs a comment.
>
> Done.
>
> >
> > +      if (stmt_has_scalar_dependences_outside_loop (loop, phi))
> > +        {
> > +          if (reduction_stmt)
> > +            return false;
> >
> > you leak bbs here and elsewhere where you early exit the function.
> > In fact you fail to free it at all.
>
> Whoopsy.  I factored the whole loop out into static function
> determine_reduction_stmt in order to deal with all early exits.
>
> >
> > Otherwise the patch looks good - thanks for all the improvements.
> >
> > What I do wonder is
> >
> > +  tree fn = build_fold_addr_expr (builtin_decl_explicit (BUILT_IN_STRLEN));
> > +  gimple *fn_call = gimple_build_call (fn, 1, mem);
> >
> > using builtin_decl_explicit means that in a TU where strlen is neither
> > declared nor used we can end up emitting calls to it.
> > For memcpy/memmove
> > that's usually OK since we require those to be present even in a
> > freestanding environment.  But I'm not sure about strlen here so I'd
> > lean towards using builtin_decl_implicit and checking that for NULL which
> > IIRC should prevent emitting strlen when it's not declared and maybe even
> > if it's declared but not used.  All other uses that generate STRLEN
> > use that at least.
>
> Thanks for clarification.  I changed it back to builtin_decl_implicit
> and check for null pointers.

Thanks,
Richard.

> Thanks,
> Stefan
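
For readers following the truncation and "overflows first" arguments in
the thread, here is a minimal standalone C sketch.  It is not the patch
and not GCC code.  All names (sketch_rawmemchr, count_until_nul_u8,
count_via_rawmemchr_u8, sketch_exact_log2, sketch_overflows_first) are
hypothetical and chosen for illustration only.  It assumes an 8-bit
unsigned counter as the reduction variable and uses plain integers in
place of TYPE_PRECISION and TYPE_SIZE_UNIT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GNU rawmemchr: scan for byte C without a
   length bound.  The caller must guarantee that C occurs.  */
static const char *
sketch_rawmemchr (const char *s, int c)
{
  while (*s != (char) c)
    ++s;
  return s;
}

/* A strlen-like loop: the pointer drives the iteration, while the
   reduction variable N (base 0, step 1) is unsigned and narrower than
   a pointer, so it simply wraps modulo 256.  */
static uint8_t
count_until_nul_u8 (const char *s)
{
  uint8_t n = 0;
  for (; *s != '\0'; ++s)
    ++n;
  return n;
}

/* The same result computed from a pointer difference and then
   truncated to the type of the reduction variable.  */
static uint8_t
count_via_rawmemchr_u8 (const char *s)
{
  ptrdiff_t len = sketch_rawmemchr (s, '\0') - s;
  return (uint8_t) len;
}

/* Return log2 (X) if X is a power of two, otherwise -1.  */
static int
sketch_exact_log2 (unsigned x)
{
  int n = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x >>= 1)
    ++n;
  return n;
}

/* Plain-integer version of the condition quoted in the thread:
   precision (reduction_var) < precision (ptrdiff_type) - 1 - log2 (size).
   Returning false for sizes that are not a power of two is an
   assumption of this sketch, not something the thread specifies.  */
static bool
sketch_overflows_first (unsigned prec_reduction, unsigned prec_ptrdiff,
                        unsigned elem_size)
{
  int e = sketch_exact_log2 (elem_size);
  if (e < 0 || prec_ptrdiff < 1u + (unsigned) e)
    return false;
  return prec_reduction < prec_ptrdiff - 1u - (unsigned) e;
}
```

For a 300-character string both counting functions should agree on
300 mod 256, which is the "can't you simply truncate" observation for an
unsigned wrapping counter.  With a 64-bit ptrdiff type and one-byte
elements, a 32-bit reduction variable satisfies the precision condition
while a 64-bit one does not.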