From: Tom de Vries <tdevries@suse.de>
To: Richard Biener <richard.guenther@gmail.com>,
Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
Date: Mon, 31 Jan 2022 14:16:52 +0100 [thread overview]
Message-ID: <57323305-3384-e128-4627-ad26969a36f3@suse.de> (raw)
In-Reply-To: <CAFiYyc3zDoGU3iRRfXBg-4AyGb4r_Ut7gQUS4VA_g8+ZiLg3KQ@mail.gmail.com>
On 9/17/21 10:08, Richard Biener via Gcc-patches wrote:
> On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus
> <stefansf@linux.ibm.com> wrote:
>>
>> On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote:
>>> On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus
>>> <stefansf@linux.ibm.com> wrote:
>>>>
>>>> On Fri, Aug 20, 2021 at 12:35:58PM +0200, Richard Biener wrote:
>>>> [...]
>>>>>>>
>>>>>>> + /* Handle strlen like loops. */
>>>>>>> + if (store_dr == NULL
>>>>>>> + && integer_zerop (pattern)
>>>>>>> + && TREE_CODE (reduction_iv.base) == INTEGER_CST
>>>>>>> + && TREE_CODE (reduction_iv.step) == INTEGER_CST
>>>>>>> + && integer_onep (reduction_iv.step)
>>>>>>> + && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
>>>>>>> + || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
>>>>>>> + {
>>>>>>>
>>>>>>> I wonder what goes wrong with a larger or smaller wrapping IV type?
>>>>>>> The iteration
>>>>>>> only stops when you load a NUL and the increments just wrap along (you're
>>>>>>> using the pointer IVs to compute the strlen result). Can't you simply truncate?
>>>>>>
>>>>>> I think truncation is enough as long as no overflow occurs in strlen or
>>>>>> strlen_using_rawmemchr.
>>>>>>
>>>>>>> For larger than size_type_node (actually larger than ptr_type_node would matter
>>>>>>> I guess), the argument is that since pointer wrapping would be undefined anyway
>>>>>>> the IV cannot wrap either. Now, the correct check here would IMHO be
>>>>>>>
>>>>>>> TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION
>>>>>>> (ptr_type_node)
>>>>>>> || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))
>>>>>>>
>>>>>>> ?
>>>>>>
>>>>>> Regarding the implementation which makes use of rawmemchr:
>>>>>>
>>>>>> We can count at most PTRDIFF_MAX many bytes without an overflow. Thus,
>>>>>> the maximal length we can determine of a string where each character has
>>>>>> size S is PTRDIFF_MAX / S without an overflow. Since an overflow for
>>>>>> ptrdiff type is undefined we have to make sure that if an overflow
>>>>>> occurs, then an overflow occurs for reduction variable, too, and that
>>>>>> this is undefined, too. However, I'm not sure anymore whether we want
>>>>>> to respect overflows in all cases. If TYPE_PRECISION (ptr_type_node)
>>>>>> equals TYPE_PRECISION (ptrdiff_type_node) and an overflow occurs, then
>>>>>> this would mean that a single string consumes more than half of the
>>>>>> virtual addressable memory. At least for architectures where
>>>>>> TYPE_PRECISION (ptrdiff_type_node) == 64 holds, I think it is reasonable
>>>>>> to neglect the case where computing pointer difference may overflow.
>>>>>> Otherwise we are talking about strings with lenghts of multiple
>>>>>> pebibytes. For other architectures we might have to be more precise
>>>>>> and make sure that reduction variable overflows first and that this is
>>>>>> undefined.
>>>>>>
>>>>>> Thus a conservative condition would be (I assumed that the size of any
>>>>>> integral type is a power of two which I'm not sure if this really holds;
>>>>>> IIRC the C standard requires only that the alignment is a power of two
>>>>>> but not necessarily the size so I might need to change this):
>>>>>>
>>>>>> /* Compute precision (reduction_var) < (precision (ptrdiff_type) - 1 - log2 (sizeof (load_type))
>>>>>> or in other words return true if reduction variable overflows first
>>>>>> and false otherwise. */
>>>>>>
>>>>>> static bool
>>>>>> reduction_var_overflows_first (tree reduction_var, tree load_type)
>>>>>> {
>>>>>> unsigned precision_ptrdiff = TYPE_PRECISION (ptrdiff_type_node);
>>>>>> unsigned precision_reduction_var = TYPE_PRECISION (TREE_TYPE (reduction_var));
>>>>>> unsigned size_exponent = wi::exact_log2 (wi::to_wide (TYPE_SIZE_UNIT (load_type)));
>>>>>> return wi::ltu_p (precision_reduction_var, precision_ptrdiff - 1 - size_exponent);
>>>>>> }
>>>>>>
>>>>>> TYPE_PRECISION (ptrdiff_type_node) == 64
>>>>>> || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>>>>>> && reduction_var_overflows_first (reduction_var, load_type)
>>>>>>
>>>>>> Regarding the implementation which makes use of strlen:
>>>>>>
>>>>>> I'm not sure what it means if strlen is called for a string with a
>>>>>> length greater than SIZE_MAX. Therefore, similar to the implementation
>>>>>> using rawmemchr where we neglect the case of an overflow for 64bit
>>>>>> architectures, a conservative condition would be:
>>>>>>
>>>>>> TYPE_PRECISION (size_type_node) == 64
>>>>>> || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>>>>>> && TYPE_PRECISION (reduction_var) <= TYPE_PRECISION (size_type_node))
>>>>>>
>>>>>> I still included the overflow undefined check for reduction variable in
>>>>>> order to rule out situations where the reduction variable is unsigned
>>>>>> and overflows as many times until strlen(,_using_rawmemchr) overflows,
>>>>>> too. Maybe this is all theoretical nonsense but I'm afraid of uncommon
>>>>>> architectures. Anyhow, while writing this down it becomes clear that
>>>>>> this deserves a comment which I will add once it becomes clear which way
>>>>>> to go.
>>>>>
>>>>> I think all the arguments about objects bigger than half of the address-space
>>>>> also are valid for 32bit targets and thus 32bit size_type_node (or
>>>>> 32bit pointer size).
>>>>> I'm not actually sure what's the canonical type to check against, whether
>>>>> it's size_type_node (Cs size_t), ptr_type_node (Cs void *) or sizetype (the
>>>>> middle-end "offset" type used for all address computations). For weird reasons
>>>>> I'd lean towards 'sizetype' (for example some embedded targets have 24bit
>>>>> pointers but 16bit 'sizetype').
>>>>
>>>> Ok, for the strlen implementation I changed from size_type_node to
>>>> sizetype and assume that no overflow occurs for string objects bigger
>>>> than half of the address space for 32-bit targets and up:
>>>>
>>>> (TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
>>>> && TYPE_PRECISION (ptr_type_node) >= 32)
>>>> || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>>>> && TYPE_PRECISION (reduction_var) <= TYPE_PRECISION (sizetype))
>>>>
>>>> and similarly for the rawmemchr implementation:
>>>>
>>>> (TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION (ptr_type_node)
>>>> && TYPE_PRECISION (ptrdiff_type_node) >= 32)
>>>> || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
>>>> && reduction_var_overflows_first (reduction_var, load_type))
>>>>
>>>>>
>>>>>>>
>>>>>>> + if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
>>>>>>> + {
>>>>>>> + const char *msg = G_("assuming signed overflow does not occur "
>>>>>>> + "when optimizing strlen like loop");
>>>>>>> + fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
>>>>>>> + }
>>>>>>>
>>>>>>> no, please don't add any new strict-overflow warnings ;)
>>>>>>
>>>>>> I just stumbled over code which produces such a warning and thought this
>>>>>> is a hard requirement :D The new patch doesn't contain it anymore.
>>>>>>
>>>>>>>
>>>>>>> The generate_*_builtin routines need some factoring - if you code-generate
>>>>>>> into a gimple_seq you could use gimple_build () which would do the fold_stmt
>>>>>>> (not sure why you do that - you should see to fold the call, not necessarily
>>>>>>> the rest). The replacement of reduction_var and the dumping could be shared.
>>>>>>> There's also GET_MODE_NAME for the printing.
>>>>>>
>>>>>> I wasn't really sure which way to go. Use a gsi, as it is done by
>>>>>> existing generate_* functions, or make use of gimple_seq. Since the
>>>>>> latter uses internally also gsi I thought it is better to stick to gsi
>>>>>> in the first place. Now, after changing to gimple_seq I see the beauty
>>>>>> of it :)
>>>>>>
>>>>>> I created two helper functions generate_strlen_builtin_1 and
>>>>>> generate_reduction_builtin_1 in order to reduce code duplication.
>>>>>>
>>>>>> In function generate_strlen_builtin I changed from using
>>>>>> builtin_decl_implicit (BUILT_IN_STRLEN) to builtin_decl_explicit
>>>>>> (BUILT_IN_STRLEN) since the former could return a NULL pointer. I'm not
>>>>>> sure whether my intuition about the difference between implicit and
>>>>>> explicit builtins is correct. In builtins.def there is a small example
>>>>>> given which I would paraphrase as "use builtin_decl_explicit if the
>>>>>> semantics of the builtin is defined by the C standard; otherwise use
>>>>>> builtin_decl_implicit" but probably my intuition is wrong?
>>>>>>
>>>>>> Beside that I'm not sure whether I really have to call
>>>>>> build_fold_addr_expr which looks superfluous to me since
>>>>>> gimple_build_call can deal with ADDR_EXPR as well as FUNCTION_DECL:
>>>>>>
>>>>>> tree fn = build_fold_addr_expr (builtin_decl_explicit (BUILT_IN_STRLEN));
>>>>>> gimple *fn_call = gimple_build_call (fn, 1, mem);
>>>>>>
>>>>>> However, since it is also used that way in the context of
>>>>>> generate_memset_builtin I didn't remove it so far.
>>>>>>
>>>>>>> I think overall the approach is sound now but the details still need work.
>>>>>>
>>>>>> Once again thank you very much for your review. Really appreciated!
>>>>>
>>>>> The patch lacks a changelog entry / description. It's nice if patches sent
>>>>> out for review are basically the rev as git format-patch produces.
>>>>>
>>>>> The rawmemchr optab needs documenting in md.texi
>>>>
>>>> While writing the documentation in md.texi I realised that other
>>>> instructions expect an address to be a memory operand which is not the
>>>> case for rawmemchr currently. At the moment the address is either an
>>>> SSA_NAME or ADDR_EXPR with a tree pointer type in expand_RAWMEMCHR. As a
>>>> consequence in the backend define_expand rawmemchr<mode> expects a
>>>> register operand and not a memory operand. Would it make sense to build
>>>> a MEM_REF out of SSA_NAME/ADDR_EXPR in expand_RAWMEMCHR? Not sure if
>>>> MEM_REF is supposed to be the canonical form here.
>>>
>>> I suppose the expander could use code similar to what
>>> expand_builtin_memset_args does,
>>> using get_memory_rtx. I suppose that we're using MEM operands because those
>>> can convey things like alias info or alignment info, something which
>>> REG operands cannot
>>> (easily). I wouldn't build a MEM_REF and try to expand that.
>>
>> The new patch contains the following changes:
>>
>> - In expand_RAWMEMCHR I'm using get_memory_rtx now. This means I had to
>> change linkage of get_memory_rtx to extern.
>>
>> - In function generate_strlen_builtin_using_rawmemchr I'm not
>> reconstructing the load type anymore from the base pointer but rather
>> pass it as a parameter from function transform_reduction_loop where we
>> also ensured that it is of integral type. Reconstructing the load
>> type was error prone since e.g. I didn't distinct between
>> pointer_plus_expr or addr_expr. Thus passing the load type should be
>> more solid.
>>
>> Regtested on IBM Z and x86. Ok for mainline?
>
> OK, and sorry for all the repeated delays.
>
I'm running into PR56888 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 ) on nvptx due to
this, f.i. in gcc/testsuite/gcc.c-torture/execute/builtins/strlen.c,
where gcc/testsuite/gcc.c-torture/execute/builtins/lib/strlen.c contains
a strlen function, with a strlen loop, which is transformed by
pass_loop_distribution into a __builtin_strlen, which is then expanded
into a strlen call, creating a self-recursive function. [ And on nvptx,
that happens to result in a compilation failure, which is how I found
this. ]
According to this (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888#c21 ) comment:
...
-fno-tree-loop-distribute-patterns is the reliable way to not
transform loops into library calls.
...
Then should we have something along the lines of:
...
$ git diff
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 6fe59cd56855..9a211d30cd7e 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -3683,7 +3683,11 @@ loop_distribution::transform_reduction_loop
&& TYPE_PRECISION (ptr_type_node) >= 32)
|| (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
&& TYPE_PRECISION (reduction_var_type) <=
TYPE_PRECISION (sizetype)))
- && builtin_decl_implicit (BUILT_IN_STRLEN))
+ && builtin_decl_implicit (BUILT_IN_STRLEN)
+ && flag_tree_loop_distribute_patterns)
generate_strlen_builtin (loop, reduction_var, load_iv.base,
reduction_iv.base, loc);
else if (direct_optab_handler (rawmemchr_optab, TYPE_MODE
(load_type))
...
?
Or is the comment no longer valid?
Thanks,
- Tom
next prev parent reply other threads:[~2022-01-31 13:16 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-08 12:47 Stefan Schulze Frielinghaus
2021-02-09 8:57 ` Richard Biener
2021-02-14 10:27 ` Stefan Schulze Frielinghaus
2021-02-25 17:01 ` Stefan Schulze Frielinghaus
2021-02-25 23:49 ` Jeff Law
2021-03-02 12:29 ` Richard Biener
2021-03-03 17:17 ` Stefan Schulze Frielinghaus
2021-03-16 17:13 ` Stefan Schulze Frielinghaus
2021-04-08 8:23 ` Stefan Schulze Frielinghaus
2021-05-04 17:25 ` Stefan Schulze Frielinghaus
2021-05-05 9:36 ` Richard Biener
2021-05-05 10:03 ` Richard Biener
2021-05-07 12:32 ` Stefan Schulze Frielinghaus
2021-05-20 9:24 ` Richard Biener
2021-05-20 18:37 ` Stefan Schulze Frielinghaus
2021-06-14 17:26 ` Stefan Schulze Frielinghaus
2021-06-16 14:22 ` Richard Biener
2021-06-25 10:23 ` Stefan Schulze Frielinghaus
2021-08-06 14:02 ` Stefan Schulze Frielinghaus
2021-08-20 10:35 ` Richard Biener
2021-09-03 8:00 ` Stefan Schulze Frielinghaus
2021-09-06 9:56 ` Richard Biener
2021-09-13 14:53 ` Stefan Schulze Frielinghaus
2021-09-17 8:08 ` Richard Biener
2021-10-11 16:02 ` Stefan Schulze Frielinghaus
2022-01-31 13:16 ` Tom de Vries [this message]
2022-01-31 15:00 ` Richard Biener
2022-01-31 16:26 ` [PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns Tom de Vries
2022-02-01 7:04 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57323305-3384-e128-4627-ad26969a36f3@suse.de \
--to=tdevries@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
--cc=stefansf@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).