> Is there a meaningful performance difference on always using the
> __riscv_strict_align code path

Minimum VLEN is 16 bytes, so for random operations we will average 8
iterations of a bytewise loop vs 2 of a word-oriented one.

That said, I'm currently working on a v2 patch that removes the scalar
fallback entirely - over a suite of random operation sizes, keeping a
fallback with only the bytewise loop (i.e. dropping the word-based loop)
is more expensive than just not having the fallback at all. So these
patterns will all go away and the code will look much more like what's
in the ISA manual.

> The VLEN arbitrary upper bound and page size limit is also worrisome, as
> Andrew has pointed out. I would prefer to either have a generic
> implementation that works without such limits

I realise I have not responded there yet - I'm certainly not ignoring
this, but I am investigating options before I commit. For example,
another option might be to gate the affected code behind a compile-time
VLEN check, and to use fault-only-first loads, as Andrew suggests, where
not enough information was provided at compile time to prove the
approach is safe.

These decisions will all become much more straightforward with ifunc
support - a generic version for the most common situation, plus runtime
selection of more specific versions, would resolve all these issues. It
would also open the gates for people working on widely different
implementations to easily provide their own versions of as many or as
few of these functions as needed - and I do expect there will be a
number of these, since the architecture is super flexible and the
ecosystem is already looking quite fragmented. Accordingly, I am also
investigating what will be involved in getting ifunc support in place.
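
For anyone wanting to sanity-check the "8 vs 2" iteration estimate for
the scalar fallback, here is a rough sketch. It is not code from the
patch. It assumes the fallback handles tail sizes uniformly distributed
below a 16-byte VLEN, that a word is 4 bytes, and that a partial word
costs a full iteration; avg_iterations is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper, not from the patch.
   Average number of loop iterations needed to process n bytes, with n
   uniformly distributed in [0, vlen_bytes), when each iteration handles
   `step` bytes.  Assumption: a partial step counts as one iteration. */
static double avg_iterations(unsigned vlen_bytes, unsigned step)
{
    unsigned total = 0;

    for (unsigned n = 0; n < vlen_bytes; n++)
        total += (n + step - 1) / step;   /* ceil(n / step) */

    return (double)total / vlen_bytes;
}
```

Under those assumptions, avg_iterations(16, 1) gives 7.5 for the
bytewise loop and avg_iterations(16, 4) gives 2.25 for a 4-byte word
loop, which is roughly the 8 vs 2 quoted above. A different word size or
tail handling would shift the second figure.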