How easy or hard would it be for glibc malloc to automatically align larger allocations (e.g. 4KB or more, in sizes that are also multiples of 4KB) to a page boundary, so that they are always properly aligned for O_DIRECT IO? I _thought_ that was already being done by default, but much to my surprise it is not.

For improved IO efficiency, I was looking at whether it would be possible to transparently avoid the user->kernel data copy during large write() calls and just submit the IO directly to the underlying flash storage, but since the input buffers are not properly aligned, this isn't possible.

I'm of course aware of posix_memalign(), but I was wondering about "normal" applications written by users who don't know anything about this, and who just allocate memory and use it to submit IO.

I'd think that keeping this kind of "friendly" 4KB-multiple allocation in its own heap would be very efficient for malloc, but I am not really familiar with the details.

Cheers,
Andreas
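
For illustration, here is a minimal sketch of the policy asked about above, done on the application side rather than inside glibc. The helper names page_size(), is_page_aligned() and alloc_io_buffer() are hypothetical and not part of any library; only posix_memalign(), malloc(), free() and sysconf() are real calls. It assumes page alignment is sufficient for O_DIRECT on the target filesystem, which can vary by kernel and filesystem.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>   /* assert(), for callers' sanity checks */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the system page size, falling back to 4096 if sysconf() fails. */
size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}

/* Non-zero if p sits on a page boundary. */
int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % page_size()) == 0;
}

/*
 * Hypothetical helper: page-align any request that is at least one page
 * and a whole multiple of the page size; pass everything else straight
 * to malloc(). The result can be released with free() in both cases.
 */
void *alloc_io_buffer(size_t size)
{
    size_t ps = page_size();

    if (size >= ps && size % ps == 0) {
        void *p = NULL;
        /* posix_memalign() returns 0 on success, an error number otherwise. */
        if (posix_memalign(&p, ps, size) != 0)
            return NULL;
        return p;
    }
    return malloc(size);
}
```

A buffer obtained from alloc_io_buffer(4 * page_size()) satisfies is_page_aligned(), whereas plain malloc() of the same size only guarantees alignment suitable for any built-in type, so it may or may not land on a page boundary. The trade-off is that each aligned request can cost extra padding inside the allocator, which is the overhead a dedicated heap for such sizes would aim to avoid.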