> Is there a meaningful performance difference on always using the
> __riscv_strict_align code path?

The minimum VLEN is 128 bits (16 bytes), so for randomly sized operations we
will average 8 iterations of a bytewise loop vs 2 of a word-oriented one.
That said, I'm currently working on a v2 patch that removes the scalar
fallback entirely: over a suite of random operation sizes, dropping into the
word-based loop is more expensive than not having the fallback at all.
So these patterns will all go away and the code will look much more like
what's in the ISA manual.
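
For anyone less familiar with the ISA manual style, here is a rough sketch of
why no scalar fallback is needed there: the loop is strip-mined, so each pass
asks vsetvl for up to VLMAX elements and the final pass simply gets a shorter
vl. This is a portable C model, not RVV code. `model_vsetvl` and
`stripmined_memcpy` are hypothetical names, `memcpy` stands in for the
vle8.v/vse8.v pair, and vl = min(avl, VLMAX) is assumed, which is one valid
choice under the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for vsetvli: grant min(avl, vlmax) elements.
   The spec allows other values when avl < 2 * vlmax; min() is always valid. */
static size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined copy: every pass handles vl bytes and the last pass just gets
   a smaller vl, so there is no separate scalar tail loop.
   Returns the number of passes taken. */
static size_t stripmined_memcpy(unsigned char *dst, const unsigned char *src,
                                size_t n, size_t vlmax) {
    assert(vlmax > 0);
    size_t passes = 0;
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax);
        memcpy(dst, src, vl); /* models vle8.v followed by vse8.v */
        dst += vl;
        src += vl;
        n -= vl;
        passes++;
    }
    return passes;
}
```

With vlmax = 16 (the minimum VLEN at e8/m1), a 37-byte copy takes three
passes of 16, 16 and 5 bytes.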


> The VLEN arbitrary upper bound and page size limit is also worrisome, as
> Andrew has pointed out.  I would prefer to either have a generic
> implementation that works without such limits

I realise I have not responded there yet. I'm certainly not ignoring this;
I'm investigating options before I commit. For example, another option might
be to gate the affected code behind a compile-time VLEN check, and to use
fault-only-first loads, as Andrew suggests, wherever there is not enough
information at compile time to prove the approach is safe.
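
To illustrate the fault-only-first option, a rough model in portable C (not
RVV intrinsics): `model_vle8ff` mimics vle8ff.v by trapping only if element 0
is unreadable and otherwise trimming vl at the first element that would
fault, and `model_strlen` always requests a full VLMAX load. The names, the
`struct model_mem` layout and the 256-byte chunk limit are all hypothetical;
the point is that the loop needs no assumption about page size or an upper
bound on VLEN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory model: bytes [0, readable_end) are mapped and anything
   past that would fault, like an unmapped page right after a string. */
struct model_mem {
    const unsigned char *bytes;
    size_t readable_end;
};

/* Model of vle8ff.v: a fault on element 0 traps (the assert), while later
   elements that would fault are not loaded and vl is trimmed instead. */
static size_t model_vle8ff(const struct model_mem *m, size_t addr,
                           size_t vl, unsigned char *out) {
    assert(addr < m->readable_end);
    size_t avail = m->readable_end - addr;
    size_t new_vl = vl < avail ? vl : avail;
    for (size_t i = 0; i < new_vl; i++)
        out[i] = m->bytes[addr + i];
    return new_vl;
}

/* strlen over the modelled memory: request vlmax bytes every time and let
   the fault-only-first semantics shrink the load near the end of mapping. */
static size_t model_strlen(const struct model_mem *m, size_t start,
                           size_t vlmax) {
    unsigned char chunk[256]; /* hypothetical limit for this model only */
    assert(vlmax > 0 && vlmax <= sizeof chunk);
    size_t pos = start;
    for (;;) {
        size_t vl = model_vle8ff(m, pos, vlmax, chunk);
        for (size_t i = 0; i < vl; i++) /* stands in for vmseq + vfirst */
            if (chunk[i] == 0)
                return pos + i - start;
        pos += vl;
    }
}
```

A plain 16-byte load of "hello" sitting 6 bytes before the end of mapped
memory would fault; here the load is simply trimmed to 6 elements and the
terminator is still found.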

These decisions will all become much more straightforward with ifunc
support: a generic version for the most common situation, plus runtime
selection of more specific versions, would resolve all these issues. It
would also open the gates for people working on widely different
implementations to easily provide their own versions of as many or as few
of these functions as needed, and I do expect there will be a number of
these, since the architecture is super flexible and the ecosystem is already
looking quite fragmented. Accordingly, I am also investigating what will be
involved in getting ifunc support in place.
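
As a rough picture of what ifunc-style selection buys us: a resolver runs
once, inspects the hardware and binds the function to one implementation.
The sketch below models that with a plain function pointer so it runs
anywhere; it is not GNU ifunc itself (where the dynamic loader calls the
resolver while processing relocations), and `struct cpu_features`,
`resolve_memcpy` and both implementations are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Common signature shared by every implementation. */
typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Generic version for the most common situation: a plain byte loop. */
static void *memcpy_generic(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical placeholder for a vector-specialised version. It defers to
   the host C library here so that the sketch runs on any machine. */
static void *memcpy_vector(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

/* Hypothetical hardware description; a real resolver would discover this
   at runtime rather than have it passed in by hand. */
struct cpu_features {
    int has_vector;
};

/* Resolver: called once, returns the implementation to bind, in the same
   spirit as an ifunc resolver. */
static memcpy_fn resolve_memcpy(const struct cpu_features *f) {
    assert(f != NULL);
    return f->has_vector ? memcpy_vector : memcpy_generic;
}
```

Supporting another tuned variant then means adding one more branch to the
resolver, without touching callers or the generic version.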