On Mon, Nov 14, 2022 at 8:28 PM Jeff Law wrote: > > On 11/13/22 16:05, Christoph Muellner wrote: > > From: Christoph Müllner > > > > This patch implements expansions for the cmpstrsi and the cmpstrnsi > > builtins using Zbb instructions (if available). > > This allows to inline calls to strcmp() and strncmp(). > > > > The expansion basically emits a peeled comparison sequence (i.e. a peeled > > comparison loop) which compares XLEN bits per step if possible. > > > > The emitted sequence can be controlled, by setting the maximum number > > of compared bytes (-mstring-compare-inline-limit). > > > > gcc/ChangeLog: > > > > * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New > > prototype. > > * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper > > macros. > > (GEN_EMIT_HELPER2): New helper macros. > > (expand_strncmp_zbb_sequence): New function. > > (riscv_emit_str_compare_zbb): New function. > > (riscv_expand_strn_compare): New function. > > * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions > > for strn_compare. > > (cmpstrsi): Invoke expansion functions for strn_compare. > > * config/riscv/riscv.opt: Add new parameter > > '-mstring-compare-inline-limit'. > > Presumably the hybrid inline + out of line approach is to capture the > fact that most strings compare unequal early, then punt out to the > library if they don't follow that model? It looks like we're structured > for that case by peeling iterations rather than having a fully inlined > approach. Just want to confirm... > Yes, this was inspired by gcc/config/rs6000/rs6000-string.cc (e.g. expand_strncmp_gpr_sequence). The current implementation emits an unrolled loop to process up to N characters. For longer strings, we do a handover to libc to process the remainder there. The hand-over implies a call overhead and, of course, a well-optimized str(n)cmp implementation would be beneficial (once we have the information in user space for ifuncs). We can take this further, but then the following questions pop up: * how much data processing per loop iteration? * what about unaligned strings? Happy to get suggestions/opinions for improvement. > I was a bit worried about the "readahead" problem that arises when > reading more than a byte and a NUL is found in the first string. If > you're not careful, the readahead of the second string could fault. But > it looks like we avoid that by requiring word alignment on both strings. > Yes, aligned strings are not affected by the readahead. I wonder if we should add dynamic tests in case the compiler cannot derive XLEN-alignment so we capture more cases (e.g. character-arrays have guaranteed alignment 1, but are allocated with a higher actual alignment on the stack). > > + > > +/* Emit a string comparison sequence using Zbb instruction. > > + > > + OPERANDS[0] is the target (result). > > + OPERANDS[1] is the first source. > > + OPERANDS[2] is the second source. > > + If NO_LENGTH is zero, then: > > + OPERANDS[3] is the length. > > + OPERANDS[4] is the alignment in bytes. > > + If NO_LENGTH is nonzero, then: > > + OPERANDS[3] is the alignment in bytes. > > Ugh. I guess it's inevitable unless we want to drop the array and pass > each element individually (in which case we'd pass a NULL_RTX in the > case we don't have a length argument). > I will split the array into individual rtx arguments as suggested. > I'd like to give others a chance to chime in here. Everything looks > sensible here, but I may have missed something. So give the other > maintainers a couple days to chime in before committing. > > > Jeff > >