On 11/14/22 14:49, Christoph Müllner wrote:
>
> We can take this further, but then the following questions pop up:
> * how much data processing per loop iteration?

I have no idea because I don't have any real data.  Last time I gathered
any data on this issue was circa 1988 :-)

> * what about unaligned strings?

I'd punt.  I don't think we can depend on having high-performance
unaligned access.  You could do a dynamic check of alignment, but you'd
really need to know that the strings are aligned often enough that the
cost of the dynamic check can be recovered.

>
> Happy to get suggestions/opinions for improvement.

I think this is pretty good as is, absent additional data indicating that
handling unaligned cases or a different number of loop peels would be a
notable improvement.

Jeff
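
For illustration only, the dynamic alignment check mentioned above could look roughly like the C sketch below. It is not taken from the patch under discussion: the names `both_word_aligned`, `has_zero_byte` and `sketch_strcmp` are hypothetical. It also assumes that reading a whole aligned word that contains the terminating NUL is acceptable, that is, that the buffers are padded to a word boundary or the platform tolerates aligned over-reads. The test on entry is the cost that has to be recovered by the word-at-a-time loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if both pointers are word-aligned.
   This is the dynamic check whose cost must be won back. */
static int both_word_aligned(const char *a, const char *b)
{
    return (((uintptr_t)a | (uintptr_t)b) & (sizeof(unsigned long) - 1)) == 0;
}

/* Classic bit trick: nonzero if any byte of w is zero. */
static int has_zero_byte(unsigned long w)
{
    const unsigned long ones  = ~0UL / 0xff;  /* 0x0101...01 */
    const unsigned long highs = ones << 7;    /* 0x8080...80 */
    return ((w - ones) & ~w & highs) != 0;
}

/* Sketch of a strcmp with a dynamic alignment check.
   Assumption: aligned whole-word reads that cover the NUL terminator
   are safe for the caller's buffers. */
int sketch_strcmp(const char *a, const char *b)
{
    if (both_word_aligned(a, b)) {
        for (;;) {
            unsigned long wa, wb;
            memcpy(&wa, a, sizeof wa);   /* aligned word load */
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;                   /* finish this word bytewise */
            a += sizeof wa;
            b += sizeof wb;
        }
    }
    /* Bytewise path: unaligned inputs, or the final word. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Unaligned inputs pay for the check and then take the bytewise loop anyway, so the check only helps if aligned strings are common enough in practice.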