On Mon, Nov 14, 2022 at 8:28 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows to inline calls to strcmp() and strncmp().
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled, by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >         prototype.
> >       * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >         macros.
> >       (GEN_EMIT_HELPER2): New helper macros.
> >       (expand_strncmp_zbb_sequence): New function.
> >       (riscv_emit_str_compare_zbb): New function.
> >       (riscv_expand_strn_compare): New function.
> >       * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >         for strn_compare.
> >       (cmpstrsi): Invoke expansion functions for strn_compare.
> >       * config/riscv/riscv.opt: Add new parameter
> >         '-mstring-compare-inline-limit'.
>
> Presumably the hybrid inline + out of line approach is to capture the
> fact that most strings compare unequal early, then punt out to the
> library if they don't follow that model?  It looks like we're structured
> for that case by peeling iterations rather than having a fully inlined
> approach.  Just want to confirm...
>

Yes, this was inspired by gcc/config/rs6000/rs6000-string.cc
(e.g. expand_strncmp_gpr_sequence).

The current implementation emits an unrolled loop to process up to N
characters.
For longer strings, we do a handover to libc to process the remainder there.
The hand-over implies a call overhead and, of course, a well-optimized
str(n)cmp
implementation would be beneficial (once we have the information in user
space for ifuncs).

We can take this further, but then the following questions pop up:
* how much data processing per loop iteration?
* what about unaligned strings?

Happy to get suggestions/opinions for improvement.


> I was a bit worried about the "readahead" problem that arises when
> reading more than a byte and a NUL is found in the first string.  If
> you're not careful, the readahead of the second string could fault.  But
> it looks like we avoid that by requiring word alignment on both strings.
>

Yes, aligned strings are not affected by the readahead.

I wonder if we should add dynamic tests in case the compiler cannot derive
XLEN-alignment so we capture more cases (e.g. character-arrays have
guaranteed alignment 1, but are allocated with a higher actual alignment on
the stack).


> > +
> > +/* Emit a string comparison sequence using Zbb instruction.
> > +
> > +   OPERANDS[0] is the target (result).
> > +   OPERANDS[1] is the first source.
> > +   OPERANDS[2] is the second source.
> > +   If NO_LENGTH is zero, then:
> > +   OPERANDS[3] is the length.
> > +   OPERANDS[4] is the alignment in bytes.
> > +   If NO_LENGTH is nonzero, then:
> > +   OPERANDS[3] is the alignment in bytes.
>
> Ugh.  I guess it's inevitable unless we want to drop the array and pass
> each element individually (in which case we'd pass a NULL_RTX in the
> case we don't have a length argument).
>

I will split the array into individual rtx arguments as suggested.


> I'd like to give others a chance to chime in here.  Everything looks
> sensible here, but I may have missed something.  So give the other
> maintainers a couple days to chime in before committing.
>
>
> Jeff
>
>