On 10/28/2016 07:55 PM, Paul Eggert wrote:
> Thanks for working on this. The idea looks good. I second Carlos's
> comments on comment and NEWS. Some other (minor) issues:
>
>> +/* The heap cookie.  The lowest three bits (corresponding to
>> +   SIZE_BITS) in __malloc_guard_header must be clear. Initialized
>
> It's __malloc_header_guard, not __malloc_guard_header.

Fixed.

>> +   during libc startup, and computed by elf/dl-keysetup.c.  */
>> +INTERNAL_SIZE_T __malloc_header_guard; /* For size.  */
>> +INTERNAL_SIZE_T __malloc_footer_guard; /* For prev_size.  */
>
> Why two cookies? Doesn't one suffice, and mightn't that be a tad more
> efficient? The comment should explain.

I think we need two cookies because on is more likely to leak through 
uninitialized heap chunks because it protects data which overlaps with 
application data once the chunk is in use.

The question is whether the prev_size field needs protecting at all. 
It's easy just to mark the previous chunk as allocated in an injected 
chunk header, so that the attacker does not have to spoof it.  It is 
probably more beneficial to obfuscate the fastbin next pointer, and the 
forward/backward pointers of binned chunks.

I hope to receive feedback on this point later this week.

>> +/* Decrypt a heap header chunk.  */
>
> These macros both encrypt and decrypt, right?  At least, they're used
> that way.  The comment should say so.

> Come to think of it, suppose we want to microimprove the implementation
> to use val+guard to encrypt and val-guard to decrypt; this might be bit
> faster on some platforms, and would be safe in unsigned arithmetic.
> Would we then need two sets of macros, one for encryption and one for
> decryption?  Then perhaps we should have two sets of macros now, even
> though the implementations are the same now, to make invoking code
> clearer and more-portable to a +/- approach.

I'm not aware of any architectures where add/sub is faster than xor.  I 
assumed the uniform instruction encoding between encryption and 
decryption would be beneficial for other reasons.

Anyway, I want to experiment with byte-swapping as well.  (Big endian is 
much more difficult to exploit with single-byte overflows than little 
endian.)  I have introduced the additional macros.

>> +#define HEAP_CRYPT_SIZE(val) (__malloc_header_guard ^
> ((INTERNAL_SIZE_T) val))
>> +#define HEAP_CRYPT_PREVSIZE(val) \
>> +  (__malloc_footer_guard ^ ((INTERNAL_SIZE_T) val))
>
> 'val' needs to be parenthesized in these two macro definientia.

Right.

>> +#define set_foot(p, s) \
>> +  (((mchunkptr) ((char *) (p) + (s)))->mchunk_prev_size \
>> +    = HEAP_CRYPT_PREVSIZE (s))
>
> Last line is indented one space too much.

I don't have firm opinion on this one.  In general, operators on 
continued lines are indented by two spaces.

>> +      else if (usable > size + 4096)
>
> Rephrase as 'usable - size > 4096' so that we don't have to worry about
> SIZE being ridiculously large.  The subtraction cannot possibly overflow
> here.

Okay.

>> +     are treated as fake mmapped chunks.  We cannot use the regular
>> +     accessors because the chunks we read are not yet encrypted.  */
>
> Avoid the royal "we", and use imperative voice instead: "We cannot use"
> -> "Don't use", "the chunks we read" -> "the chunks read".

This writing style is used in comments throughout glibc.  I see nothing 
wrong with it.

I'm attaching an updated version.  As I said above, it's still not final 
because we'll likely want to apply the secondary/prev_size chunk 
hardening to other malloc_chunk members instead.

Thanks,
Florian