public inbox for libc-alpha@sourceware.org
* [Question] New mmap64 syscall?
@ 2016-12-06 18:55 Yury Norov
  2016-12-06 21:21 ` Arnd Bergmann
  2016-12-07 13:24 ` Florian Weimer
  0 siblings, 2 replies; 12+ messages in thread
From: Yury Norov @ 2016-12-06 18:55 UTC (permalink / raw)
  To: libc-alpha, linux-arch, linux-kernel
  Cc: Catalin Marinas, szabolcs.nagy, heiko.carstens, cmetcalf,
	philipp.tomsich, joseph, zhouchengming1, Prasun.Kapoor, agraf,
	geert, kilobyte, manuel.montezelo, arnd, pinskia, linyongting,
	klimov.linux, broonie, bamvor.zhangjian, linux-arm-kernel,
	maxim.kuvyrkov, Nathan_Lynch, schwidefsky, davem,
	christoph.muellner

Hi all,

(Sorry if there is a similar discussion and I missed it. I didn't
find anything on LKML from the last half year.)

In the aarch64/ilp32 discussion Catalin wondered why we don't pass the offset
in mmap() as a 64-bit value (in 2 registers if needed). Looking at the kernel
code I found that there's no generic interface for it. But almost all
architectures provide their own implementations, like this:

SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags, unsigned long,
                fd, off_t, offset)
{
        unsigned long result;

        result = -EINVAL;
        if (offset & ~PAGE_MASK)
                goto out;

        result = sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);

out:
        return result;
}

On the glibc side things are even worse. There's no mmap() implementation
that allows passing a 64-bit offset on a 32-bit architecture. mmap64(), which
is supposed to do this, is simply broken:

void *
__mmap64 (void *addr, size_t len, int prot, int flags, int fd,
          off64_t offset)
{
        [...]
        void *result;
        result = (void *) INLINE_SYSCALL (mmap2, 6, addr,
                                         len, prot, flags, fd,
                                         (off_t) (offset >> page_shift));
        return result;
}

It explicitly declares offset as a 64-bit value, but casts it to 32-bit
before passing it to the kernel, which looks wrong to me. Even if the arch has
a 64-bit off_t, like aarch64/ilp32, the value will still be truncated because
the offset is passed in a single register, which is 32-bit.
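
To make the truncation concrete: with a 12-bit page shift, a 32-bit page
offset can only describe byte offsets below 2^32 * 2^12 = 2^44 (16 TiB).
The following is a minimal sketch, not glibc code; the helper names
pgoff_as_passed() and pgoff_is_lossless() are hypothetical and only model
the narrowing to a single 32-bit register.

```c
#include <stdint.h>

/* Hypothetical helper: models what reaches the kernel when the page
 * offset has to fit into one 32-bit register. */
static uint32_t pgoff_as_passed(uint64_t offset, unsigned page_shift)
{
    return (uint32_t)(offset >> page_shift);
}

/* Returns 1 if the page offset survives the narrowing unchanged,
 * 0 if high bits were silently dropped. */
static int pgoff_is_lossless(uint64_t offset, unsigned page_shift)
{
    uint64_t pgoff = offset >> page_shift;

    return pgoff == (uint64_t)pgoff_as_passed(offset, page_shift);
}
```

With page_shift = 12, the last 4K page below 2^44 is still representable,
while an offset of exactly 2^44 wraps around to page offset 0.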

I see 3 solutions for my problem:
1. Reuse the aarch64/lp64 mmap code for ilp32 in glibc, but wrap the offset in
the SYSCALL_LL64() macro, which converts the offset to a register pair on
32-bit ports. This is a simple but local solution, and most probably it's enough.

2. Add a new flag to mmap, like MAP_OFFSET_IN_PAIR. This will also work.
The problem here is that too many arches implement
their own custom sys_mmap2(). And, of course, this kind of flag
looks ugly.

3. Introduce new mmap64() syscall like this:
sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
(The pointer here because otherwise we have 7 args, if simply pass off_hi and
off_lo in registers.)

With new 64-bit interface we can deprecate mmap2(), and generalize all
implementations in kernel.
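
The "offset as a pair" idea behind options 1 and 3 can be sketched as
follows. This is only an illustration under stated assumptions: struct
off_halves, split_off64() and join_off64() are hypothetical names, and the
order in which a real ABI places the two halves in registers is
architecture-specific and is not modelled here.

```c
#include <stdint.h>

/* Hypothetical container for the two 32-bit halves of a 64-bit offset. */
struct off_halves {
    uint32_t hi;
    uint32_t lo;
};

/* Split a 64-bit offset into halves that each fit a 32-bit register. */
static struct off_halves split_off64(uint64_t off)
{
    struct off_halves h;

    h.hi = (uint32_t)(off >> 32);
    h.lo = (uint32_t)(off & 0xffffffffu);
    return h;
}

/* Reassemble the original 64-bit offset on the receiving side. */
static uint64_t join_off64(struct off_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}
```

Passing both halves directly would add a seventh argument to mmap, which is
why option 3 passes a pointer instead.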

I think we can discuss it because 64-bit is the default size for off_t 
in all new 32-bit architectures. So generic solution may take place.

The last question here is how important it is to support offsets bigger than
2^44 on 32-bit machines in practice. It may matter for ARM64 servers,
which look like the main aarch64/ilp32 users. If not, we can leave
things as they are and do nothing.

Yury

On Mon, Dec 05, 2016 at 05:12:43PM +0000, Catalin Marinas wrote:
> On Fri, Oct 21, 2016 at 11:33:10PM +0300, Yury Norov wrote:
> > off_t is passed in register pair just like in aarch32.
> > In this patch corresponding aarch32 handlers are shared to
> > ilp32 code.
> [...]
> > +/*
> > + * Note: off_4k (w5) is always in units of 4K. If we can't do the
> > + * requested offset because it is not page-aligned, we return -EINVAL.
> > + */
> > +ENTRY(compat_sys_mmap2_wrapper)
> > +#if PAGE_SHIFT > 12
> > +	tst	w5, #~PAGE_MASK >> 12
> > +	b.ne	1f
> > +	lsr	w5, w5, #PAGE_SHIFT - 12
> > +#endif
> > +	b	sys_mmap_pgoff
> > +1:	mov	x0, #-EINVAL
> > +	ret
> > +ENDPROC(compat_sys_mmap2_wrapper)
> 
> For compat sys_mmap2, the pgoff argument is in multiples of 4K. This was
> traditionally used for architectures where off_t is 32-bit to allow
> mapping files to 2^44.
> 
> Since off_t is 64-bit with AArch64/ILP32, should we just pass the off_t
> as a 64-bit value in two different registers (w5 and w6)?
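
For readers less familiar with the assembly, the wrapper quoted above can be
read roughly as the C below. This is a sketch, not kernel code:
EXAMPLE_PAGE_SHIFT and off4k_to_pgoff() are hypothetical names, and 64K
pages are assumed purely for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption for illustration only: 64K pages. */
#define EXAMPLE_PAGE_SHIFT 16

/* Convert an mmap2-style offset in 4K units into a page offset.
 * Returns the page offset, or -EINVAL if the 4K-unit offset does not
 * fall on a page boundary. */
static int64_t off4k_to_pgoff(uint32_t off_4k)
{
    /* Same value as (~PAGE_MASK >> 12) in the wrapper. */
    uint32_t low_mask = (UINT32_C(1) << (EXAMPLE_PAGE_SHIFT - 12)) - 1;

    if (off_4k & low_mask)
        return -EINVAL;
    return (int64_t)(off_4k >> (EXAMPLE_PAGE_SHIFT - 12));
}
```

With 4K pages the mask is zero and the shift is a no-op, which matches the
#if PAGE_SHIFT > 12 guard in the wrapper.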

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Question] New mmap64 syscall?
  2016-12-06 18:55 [Question] New mmap64 syscall? Yury Norov
@ 2016-12-06 21:21 ` Arnd Bergmann
  2016-12-07 10:35   ` Yury Norov
  2016-12-07 13:24 ` Florian Weimer
  1 sibling, 1 reply; 12+ messages in thread
From: Arnd Bergmann @ 2016-12-06 21:21 UTC (permalink / raw)
  To: Yury Norov
  Cc: libc-alpha, linux-arch, linux-kernel, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, philipp.tomsich, joseph,
	zhouchengming1, Prasun.Kapoor, agraf, geert, kilobyte,
	manuel.montezelo, pinskia, linyongting, klimov.linux, broonie,
	bamvor.zhangjian, linux-arm-kernel, maxim.kuvyrkov, Nathan_Lynch,
	schwidefsky, davem, christoph.muellner

On Wednesday, December 7, 2016 12:24:40 AM CET Yury Norov wrote:

> I see 3 solutions for my problem:
> 1. Reuse the aarch64/lp64 mmap code for ilp32 in glibc, but wrap the offset in
> the SYSCALL_LL64() macro, which converts the offset to a register pair on
> 32-bit ports. This is a simple but local solution, and most probably it's enough.

I wouldn't want arm64 to be different from all other architectures
here for the 32-bit API. The mmap() API used to be done entirely
in architecture specific code, while nowadays at least new architectures
use something resembling sys_mmap_pgoff(). I think that was originally
introduced to be the default API for 32-bit architectures, but it
failed to address architectures with variable page sizes.

> 2. Add a new flag to mmap, like MAP_OFFSET_IN_PAIR. This will also work.
> The problem here is that too many arches implement
> their own custom sys_mmap2(). And, of course, this kind of flag
> looks ugly.

Right, better not to complicate it further. The other problem
is that mmap2() already has six arguments, and on most architectures
that is the limit for the number of syscall arguments, so you
cannot add another argument for the upper half.

> 3. Introduce new mmap64() syscall like this:
> sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
> (The pointer here because otherwise we have 7 args, if simply pass off_hi and
> off_lo in registers.)

This wouldn't have to be a pair, just a pointer to a 64-bit number.

> With new 64-bit interface we can deprecate mmap2(), and generalize all
> implementations in kernel.
> 
> I think we can discuss it because 64-bit is the default size for off_t 
> in all new 32-bit architectures. So generic solution may take place.
> 
> The last question here is how important it is to support offsets bigger than
> 2^44 on 32-bit machines in practice. It may matter for ARM64 servers,
> which look like the main aarch64/ilp32 users. If not, we can leave
> things as they are and do nothing.

If there is a use case for larger than 16TB offsets, we should add
the call on all architectures, probably using your approach 3. I don't
think that we should treat it as anything special for arm64 though.

	Arnd


* Re: [Question] New mmap64 syscall?
  2016-12-06 21:21 ` Arnd Bergmann
@ 2016-12-07 10:35   ` Yury Norov
  2016-12-07 11:07     ` Dr. Philipp Tomsich
  0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2016-12-07 10:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: libc-alpha, linux-arch, linux-kernel, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, philipp.tomsich, joseph,
	zhouchengming1, Prasun.Kapoor, agraf, geert, kilobyte,
	manuel.montezelo, pinskia, linyongting, klimov.linux, broonie,
	bamvor.zhangjian, linux-arm-kernel, maxim.kuvyrkov, Nathan_Lynch,
	schwidefsky, davem, christoph.muellner

On Tue, Dec 06, 2016 at 10:20:20PM +0100, Arnd Bergmann wrote:
> On Wednesday, December 7, 2016 12:24:40 AM CET Yury Norov wrote:
> > 3. Introduce new mmap64() syscall like this:
> > sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
> > (The pointer here because otherwise we have 7 args, if simply pass off_hi and
> > off_lo in registers.)
> 
> This wouldn't have to be a pair, just a pointer to a 64-bit number.
> 
> > With new 64-bit interface we can deprecate mmap2(), and generalize all
> > implementations in kernel.
> > 
> > I think we can discuss it because 64-bit is the default size for off_t 
> > in all new 32-bit architectures. So generic solution may take place.
> > 
> > The last question here is how important it is to support offsets bigger than
> > 2^44 on 32-bit machines in practice. It may matter for ARM64 servers,
> > which look like the main aarch64/ilp32 users. If not, we can leave
> > things as they are and do nothing.
> 
> If there is a use case for larger than 16TB offsets, we should add
> the call on all architectures, probably using your approach 3. I don't
> think that we should treat it as anything special for arm64 though.

From this point of view, 16+TB offset is a matter of 16+TB storage,
and it's more than real. The other consideration to add it is that
we have 64-bit support for offsets in syscalls like sys_llseek().
So mmap64() will simply extend this support.

I can prepare this patch. Some implementation details I'd like to
clarify:
Syscall declaration:
SYSCALL_DEFINE6(mmap64, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long long *, offset);

sys_mmap64() deprecates sys_mmap2(), and __ARCH_WANT_MMAP2 is
introduced to keep it enabled for all existing architectures.
All modern arches (aarch64/ilp32 is the first candidate) will have
mmap64() only. Examples are the set/getrlimit() or renameat() drop
patches (b0da6d44).

On the glibc side, __OFF_T_MATCHES_OFF64_T will wire mmap() from
linux/generic/wordsize32/mmap.c to mmap64() from linux/mmap64.c.

mmap64() will first try __NR_mmap64, and if it is not defined, or ENOSYS
is returned, __NR_mmap2 will be called. This lets userspace that
supports both mmap2() and mmap64() get full 64-bit offset support, not
just 44-bit.

For the __NR_mmap2 case, I'd also add a check for offsets that do not fit
in 44 bits, and set errno to EOVERFLOW in that case.
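
The range check for the mmap2 fallback could look roughly like this. It is
a sketch under stated assumptions, not the proposed glibc patch:
offset_to_mmap2_units() and EXAMPLE_MMAP2_SHIFT are hypothetical names, and
the offset is assumed to be passed in 4K units as in the compat wrapper
discussed earlier in the thread.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: mmap2 takes its offset in 4K (2^12 byte) units. */
#define EXAMPLE_MMAP2_SHIFT 12

/* Convert a 64-bit byte offset into mmap2 units.
 * Returns the unit count (0 .. 2^32 - 1) on success, -EINVAL if the
 * offset is not aligned to the unit size, or -EOVERFLOW if the result
 * does not fit into a 32-bit register. */
static int64_t offset_to_mmap2_units(uint64_t offset)
{
    uint64_t units;

    if (offset & ((UINT64_C(1) << EXAMPLE_MMAP2_SHIFT) - 1))
        return -EINVAL;

    units = offset >> EXAMPLE_MMAP2_SHIFT;
    if (units > UINT32_MAX)
        return -EOVERFLOW;

    return (int64_t)units;
}
```

A wrapper would run this check only after __NR_mmap64 turned out to be
unavailable, so callers with offsets that do not fit get EOVERFLOW instead
of a silently wrong mapping.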

Any thoughts?

Yury.


* Re: [Question] New mmap64 syscall?
  2016-12-07 10:35   ` Yury Norov
@ 2016-12-07 11:07     ` Dr. Philipp Tomsich
  2016-12-07 12:40       ` Yury Norov
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. Philipp Tomsich @ 2016-12-07 11:07 UTC (permalink / raw)
  To: Yury Norov
  Cc: Arnd Bergmann, libc-alpha, linux-arch, LKML, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, Joseph S. Myers,
	zhouchengming1, Kapoor, Prasun, Alexander Graf, geert, kilobyte,
	manuel.montezelo, Andrew Pinski, linyongting, Alexey Klimov,
	broonie, Zhangjian (Bamvor),
	linux-arm-kernel, Maxim Kuvyrkov, Nathan_Lynch, schwidefsky,
	davem, christoph.muellner

[Resend, as my mail-client had insisted on using the wrong MIME type…]

> On 07 Dec 2016, at 11:34, Yury Norov <ynorov@caviumnetworks.com> wrote:
> 
>> If there is a use case for larger than 16TB offsets, we should add
>> the call on all architectures, probably using your approach 3. I don't
>> think that we should treat it as anything special for arm64 though.
> 
> From this point of view, 16+TB offset is a matter of 16+TB storage,
> and it's more than real. The other consideration to add it is that
> we have 64-bit support for offsets in syscalls like sys_llseek().
> So mmap64() will simply extend this support.

I believe the question is rather if the 16TB offset is a real use-case for ILP32.  This
seems to bring the discussion full-circle, as this would indicate that 64bit is the 
preferred bit-width for all sizes, offsets, etc. throughout all filesystem-related calls 
(i.e. stat, seek, etc.).

But if that is the case, then we should have gone with 64bit arguments in a single
register for our ILP32 definition on AArch64.

In other words: Why not keep ILP32 simple and ask users that need a 16TB+ offset
to use LP64? It seems much more consistent with the other choices taken so far.

—Phil.



* Re: [Question] New mmap64 syscall?
  2016-12-07 11:07     ` Dr. Philipp Tomsich
@ 2016-12-07 12:40       ` Yury Norov
       [not found]         ` <20161207163210.GB31779@e104818-lin.cambridge.arm.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2016-12-07 12:40 UTC (permalink / raw)
  To: Dr.Philipp Tomsich
  Cc: Arnd Bergmann, libc-alpha, linux-arch, LKML, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, Joseph S. Myers,
	zhouchengming1, Kapoor, Prasun, Alexander Graf, geert, kilobyte,
	manuel.montezelo, Andrew Pinski, linyongting, Alexey Klimov,
	broonie, Zhangjian (Bamvor),
	linux-arm-kernel, Maxim Kuvyrkov, Nathan_Lynch, schwidefsky,
	davem, christoph.muellner

Hi Philipp,

On Wed, Dec 07, 2016 at 12:07:24PM +0100, Dr.Philipp Tomsich wrote:
> [Resend, as my mail-client had insisted on using the wrong MIME type…]
> 
> > On 07 Dec 2016, at 11:34, Yury Norov <ynorov@caviumnetworks.com> wrote:
> > 
> >> If there is a use case for larger than 16TB offsets, we should add
> >> the call on all architectures, probably using your approach 3. I don't
> >> think that we should treat it as anything special for arm64 though.
> > 
> > From this point of view, 16+TB offset is a matter of 16+TB storage,
> > and it's more than real. The other consideration to add it is that
> > we have 64-bit support for offsets in syscalls like sys_llseek().
> > So mmap64() will simply extend this support.
> 
> I believe the question is rather if the 16TB offset is a real use-case for ILP32.

This is not just for ilp32, but for all 32-bit architectures, both native
and compat. And because the scope is so generic, I think it's a
strong reason for us to support a true 64-bit offset in mmap().

> This seems to bring the discussion full-circle, as this would indicate that 64bit is the 
> preferred bit-width for all sizes, offsets, etc. throughout all filesystem-related calls 
> (i.e. stat, seek, etc.).

AARCH64/ILP32 (and all new arches) exposes ino_t, off_t, blkcnt_t,
fsblkcnt_t, fsfilcnt_t and rlim_t as 64-bit types. (size_t should
be 32-bit of course, because it's the same length as a pointer.)

This allows the syscalls that take them to support 64-bit values; refer to
Documentation/arm64/ilp32.txt for details. Stat and seek both
support 64-bit types. From this point of view, mmap() is the (only?)
exception in the current ILP32 ABI.

> But if that is the case, then we should have gone with 64bit arguments in a single
> register for our ILP32 definition on AArch64.
 
There are 2 unrelated matters: the size of types and the size of
registers. Most 32-bit architectures have a hardware limitation on
register size (consider aarch32). That doesn't mean they are
forced to stick with a 32-bit off_t etc. It is still an open question
how to pass 64-bit parameters in aarch64/ilp32, because there we have
the choice (the reason why it's an RFC). If you have new ideas, you are
welcome to join that discussion. This topic also covers architectures that
have to pass 64-bit parameters in a pair.

> In other words: Why not keep ILP32 simple and ask users that need a 16TB+ offset
> to use LP64? It seems much more consistent with the other choices taken so far.

If user can switch to lp64, he doesn't need ilp32 at all, right? :)
Also, I don't understand how true 64-bit offset in mmap64() would
complicate this port.

Yury


* Re: [Question] New mmap64 syscall?
  2016-12-06 18:55 [Question] New mmap64 syscall? Yury Norov
  2016-12-06 21:21 ` Arnd Bergmann
@ 2016-12-07 13:24 ` Florian Weimer
  2016-12-07 15:49   ` Yury Norov
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2016-12-07 13:24 UTC (permalink / raw)
  To: Yury Norov, libc-alpha, linux-arch, linux-kernel
  Cc: Catalin Marinas, szabolcs.nagy, heiko.carstens, cmetcalf,
	philipp.tomsich, joseph, zhouchengming1, Prasun.Kapoor, agraf,
	geert, kilobyte, manuel.montezelo, arnd, pinskia, linyongting,
	klimov.linux, broonie, bamvor.zhangjian, linux-arm-kernel,
	maxim.kuvyrkov, Nathan_Lynch, schwidefsky, davem,
	christoph.muellner

On 12/06/2016 07:54 PM, Yury Norov wrote:
> 3. Introduce new mmap64() syscall like this:
> sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
> (The pointer here because otherwise we have 7 args, if simply pass off_hi and
> off_lo in registers.)

I would prefer a batched mmap/munmap/mremap/mprotect/madvise interface, 
so that VM changes can be coalesced and the output reduced.  This 
interface could then be used to implement mmap on 32-bit architectures 
as well because the offset restrictions would not apply there.

Thanks,
Florian


* Re: [Question] New mmap64 syscall?
  2016-12-07 13:24 ` Florian Weimer
@ 2016-12-07 15:49   ` Yury Norov
  2016-12-08 15:47     ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2016-12-07 15:49 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, linux-arch, linux-kernel, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, philipp.tomsich, joseph,
	zhouchengming1, Prasun.Kapoor, agraf, geert, kilobyte,
	manuel.montezelo, arnd, pinskia, linyongting, klimov.linux,
	broonie, bamvor.zhangjian, linux-arm-kernel, maxim.kuvyrkov,
	Nathan_Lynch, schwidefsky, davem, christoph.muellner

On Wed, Dec 07, 2016 at 02:23:55PM +0100, Florian Weimer wrote:
> On 12/06/2016 07:54 PM, Yury Norov wrote:
> >3. Introduce new mmap64() syscall like this:
> >sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
> >(The pointer here because otherwise we have 7 args, if simply pass off_hi and
> >off_lo in registers.)
> 
> I would prefer a batched mmap/munmap/mremap/mprotect/madvise interface, so
> that VM changes can be coalesced and the output reduced.  This interface
> could then be used to implement mmap on 32-bit architectures as well because
> the offset restrictions would not apply there.

Hi Florian,

I frankly don't understand what you mean. None of the syscalls you mentioned
takes off_t or other 64-bit arguments. 'VM changes': virtual
memory? If so, I don't see any changes in VM with this approach, just
correct handling of big offsets.

> This interface
> could then be used to implement mmap on 32-bit architectures as well 

This is for 32-bit architectures only. 64-bit arches use
sysdeps/unix/sysv/linux/wordsize-64/mmap.c for both mmap and mmap64,
and they don't need those tricks with off_t. Or do you mean to switch
64-bit mmap to this interface?

Please explain in detail what you mean.

Yury


* Re: [Question] New mmap64 syscall?
       [not found]         ` <20161207163210.GB31779@e104818-lin.cambridge.arm.com>
@ 2016-12-07 16:43           ` Dr. Philipp Tomsich
  2016-12-07 21:32             ` Arnd Bergmann
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. Philipp Tomsich @ 2016-12-07 16:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Yury Norov, Arnd Bergmann, libc-alpha, linux-arch, LKML,
	szabolcs.nagy, heiko.carstens, cmetcalf, Joseph S. Myers,
	zhouchengming1, Kapoor, Prasun, Alexander Graf, geert, kilobyte,
	manuel.montezelo, Andrew Pinski, linyongting, Alexey Klimov,
	broonie, Zhangjian (Bamvor),
	linux-arm-kernel, Maxim Kuvyrkov, Nathan Lynch,
	Martin Schwidefsky, davem, christoph.muellner

Catalin,

> On 07 Dec 2016, at 17:32, Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
>>> In other words: Why not keep ILP32 simple and ask users that need a 16TB+ offset
>>> to use LP64? It seems much more consistent with the other choices taken so far.
>> 
>> If user can switch to lp64, he doesn't need ilp32 at all, right? :)
>> Also, I don't understand how true 64-bit offset in mmap64() would
>> complicate this port.
> 
> It's more like the user wanting a quick transition from code that was
> only ever compiled for AArch32 (or other 32-bit architecture) with a
> goal of full LP64 transition in the long run. I have yet to see
> convincing benchmarks showing ILP32 as an advantage over LP64 (of
> course, I hear the argument of reading a pointer in a loop is twice as fast
> with a half-size pointer but I don't consider such benchmarks relevant).

Most of the performance advantage in benchmarks comes from a reduction
in the size of data-structures and/or tighter packing of arrays.  In other words,
we can make slightly better use of the caches and push the memory subsystem
a little further when running multiple instances of benchmarks.

Most of these advantages should eventually go away, when struct-reorg makes
its way into the compiler. That said, it’s a marginal (but real) improvement for a
subset of SPEC.

In the real world, the importance of ILP32 is as an aid to transition legacy code
that is not 64bit clean… and this should drive the ILP32 discussion. That we
get a boost in our SPEC scores is just a nice extra that we get from it ;-)


Regards,
Philipp.


* Re: [Question] New mmap64 syscall?
  2016-12-07 16:43           ` Dr. Philipp Tomsich
@ 2016-12-07 21:32             ` Arnd Bergmann
  0 siblings, 0 replies; 12+ messages in thread
From: Arnd Bergmann @ 2016-12-07 21:32 UTC (permalink / raw)
  To: Dr. Philipp Tomsich
  Cc: Catalin Marinas, Yury Norov, libc-alpha, linux-arch, LKML,
	szabolcs.nagy, heiko.carstens, cmetcalf, Joseph S. Myers,
	zhouchengming1, Kapoor, Prasun, Alexander Graf, geert, kilobyte,
	manuel.montezelo, Andrew Pinski, linyongting, Alexey Klimov,
	broonie, Zhangjian (Bamvor),
	linux-arm-kernel, Maxim Kuvyrkov, Nathan Lynch,
	Martin Schwidefsky, davem, christoph.muellner

On Wednesday, December 7, 2016 5:43:27 PM CET Dr. Philipp Tomsich wrote:
> Catalin,
> 
> > On 07 Dec 2016, at 17:32, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> >>> In other words: Why not keep ILP32 simple and ask users that need a 16TB+ offset
> >>> to use LP64? It seems much more consistent with the other choices taken so far.
> >> 
> >> If user can switch to lp64, he doesn't need ilp32 at all, right? 
> >> Also, I don't understand how true 64-bit offset in mmap64() would
> >> complicate this port.
> > 
> > It's more like the user wanting a quick transition from code that was
> > only ever compiled for AArch32 (or other 32-bit architecture) with a
> > goal of full LP64 transition in the long run. I have yet to see
> > convincing benchmarks showing ILP32 as an advantage over LP64 (of
> > course, I hear the argument of reading a pointer in a loop is twice as fast
> > with a half-size pointer but I don't consider such benchmarks relevant).
> 
> Most of the performance advantage in benchmarks comes from a reduction
> in the size of data-structures and/or tighter packing of arrays.  In other words,
> we can make slightly better use of the caches and push the memory subsystem
> a little further when running multiple instances of benchmarks.
> 
> Most of these advantages should eventually go away, when struct-reorg makes
> its way into the compiler. That said, it’s a marginal (but real) improvement for a
> subset of SPEC.
> 
> In the real world, the importance of ILP32 is as an aid to transition legacy code
> that is not 64bit clean… and this should drive the ILP32 discussion. That we
> get a boost in our SPEC scores is just a nice extra that we get from it 

To bring this back from the philosophical questions of ABI design
to the specific point of what file offset width you want for mmap()
on 32-bit architectures.

For all I can tell, using mmap() to access a file that is many thousands of
times larger than your virtual address space is completely crazy.
Adding a new mmap64() syscall on all 32-bit architectures would be
trivial if there was a use case for it, but without at
least one specific application asking for it (with good reasons), we
shouldn't even be talking about that.

Note that until commit f8b7256096a2 ("Unify sys_mmap"), we actually
had a sys_mmap64 implementation on a couple of architectures, but
removed it.

	Arnd


* Re: [Question] New mmap64 syscall?
  2016-12-07 15:49   ` Yury Norov
@ 2016-12-08 15:47     ` Florian Weimer
       [not found]       ` <20170103205437.GA22548@amd>
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2016-12-08 15:47 UTC (permalink / raw)
  To: Yury Norov
  Cc: libc-alpha, linux-arch, linux-kernel, Catalin Marinas,
	szabolcs.nagy, heiko.carstens, cmetcalf, philipp.tomsich, joseph,
	zhouchengming1, Prasun.Kapoor, agraf, geert, kilobyte,
	manuel.montezelo, arnd, pinskia, linyongting, klimov.linux,
	broonie, bamvor.zhangjian, linux-arm-kernel, maxim.kuvyrkov,
	Nathan_Lynch, schwidefsky, davem, christoph.muellner

On 12/07/2016 04:48 PM, Yury Norov wrote:
> On Wed, Dec 07, 2016 at 02:23:55PM +0100, Florian Weimer wrote:
>> On 12/06/2016 07:54 PM, Yury Norov wrote:
>>> 3. Introduce new mmap64() syscall like this:
>>> sys_mmap64(void *addr, size_t len, int prot, int flags, int fd, struct off_pair *off);
>>> (The pointer here because otherwise we have 7 args, if simply pass off_hi and
>>> off_lo in registers.)
>>
>> I would prefer a batched mmap/munmap/mremap/mprotect/madvise interface, so
>> that VM changes can be coalesced and the output reduced.  This interface
>> could then be used to implement mmap on 32-bit architectures as well because
>> the offset restrictions would not apply there.
>
> Hi Florian,
>
> I frankly don't understand what you mean. None of the syscalls you mentioned
> takes off_t or other 64-bit arguments. 'VM changes': virtual
> memory? If so, I don't see any changes in VM with this approach, just
> correct handling of big offsets.

What I was trying to suggest is a completely different interface which 
is not subject to register size constraints and which has been requested 
before (a mechanism for batching mm updates).

Thanks,
Florian


* Re: [Question] New mmap64 syscall?
       [not found]       ` <20170103205437.GA22548@amd>
@ 2017-01-12 16:13         ` Florian Weimer
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2017-01-12 16:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Yury Norov, libc-alpha, linux-arch, linux-kernel,
	Catalin Marinas, szabolcs.nagy, heiko.carstens, cmetcalf,
	philipp.tomsich, joseph, zhouchengming1, Prasun.Kapoor, agraf,
	geert, kilobyte, manuel.montezelo, arnd, pinskia, linyongting,
	klimov.linux, broonie, bamvor.zhangjian, linux-arm-kernel,
	maxim.kuvyrkov, Nathan_Lynch, schwidefsky, davem,
	christoph.muellner

On 01/03/2017 09:54 PM, Pavel Machek wrote:
> ...actually, with strace and batched interface, it will be impossible
> to see what is going on because of races. So I'm not sure if I like
> the batched interface at all...

I'm not sure if I understand this problem.

ioctl, fcntl, most socket system calls, even open all have this problem 
as well, right?

Thanks,
Florian


* Re: [Question] New mmap64 syscall?
       [not found] <20161210092130.GA19309@xo-6d-61-c0.localdomain>
@ 2016-12-11 12:57 ` Yury Norov
  0 siblings, 0 replies; 12+ messages in thread
From: Yury Norov @ 2016-12-11 12:57 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Yury Norov, Arnd Bergmann, Dr. Philipp Tomsich, Catalin Marinas,
	libc-alpha, linux-arch, LKML, szabolcs.nagy, heiko.carstens,
	cmetcalf, Joseph S. Myers, zhouchengming1, Kapoor, Prasun,
	Alexander Graf, geert, kilobyte, manuel.montezelo, Andrew Pinski,
	linyongting, Alexey Klimov, broonie, Zhangjian (Bamvor),
	linux-arm-kernel, Maxim Kuvyrkov, Nathan Lynch,
	Martin Schwidefsky, davem, christoph.muellner

This is the draft of sys_mmap64() support in the kernel. For 64-bit
kernels everything is simple. For 32-bit kernels we have a problem:
pgoff_t is declared as unsigned long, and should be turned into
unsigned long long. That affects a number of structures and interfaces.
The last patch makes the change. It should be widely tested on 32-bit kernels,
which I didn't do. The arm64 kernel is working with this patchset, and I don't
expect difficulties there.

Yury Norov (3):
  mm: move argument checkers of mmap_pgoff() to separated routine
  sys_mmap64()
  mm: turn page offset types to 64-bit

 fs/btrfs/extent_io.c              |  2 +-
 fs/ext2/dir.c                     |  4 +--
 include/linux/mm.h                |  9 +++---
 include/linux/radix-tree.h        |  8 ++---
 include/linux/syscalls.h          |  3 ++
 include/linux/types.h             |  2 +-
 include/uapi/asm-generic/unistd.h |  4 ++-
 lib/radix-tree.c                  |  8 ++---
 mm/debug.c                        |  2 +-
 mm/internal.h                     |  2 +-
 mm/memory.c                       |  4 +--
 mm/mmap.c                         | 66 ++++++++++++++++++++++++++++++++-------
 mm/readahead.c                    |  4 +--
 mm/util.c                         |  3 +-
 14 files changed, 85 insertions(+), 36 deletions(-)

-- 
2.7.4

