Received: (qmail 76867 invoked by alias); 18 Oct 2016 15:16:54 -0000
Subject: Re: RFC: Program Properties
To: "H.J. Lu", "Maciej W. Rozycki"
References: <969fb6da-f13c-eb14-3e53-94a594384518@redhat.com>
Cc: "Carlos O'Donell", gnu-gabi@sourceware.org
From: Nick Clifton
Organization: Red Hat
Message-ID: <85ae4c67-9e0c-2fcd-03aa-33028c6aac90@redhat.com>
Date: Fri, 01 Jan 2016 00:00:00 -0000

Hi Guys,

It looks like we are all working towards a common goal, which is
good.

H.J. - I like your program property scheme, especially the idea of
having a lightweight, allocatable section that can be quickly parsed
by the loader. I assume that the NT_GNU_PROPERTY_TYPE_0 note type
also serves as a version indicator?
I.e., would future versions of the specification use a different
value to indicate newer features?

One thing that I would like to propose is an extension to the scheme
to add a second, larger note section that is not allocatable, but
which instead contains information to be parsed by static tools. In
particular this section would contain information about the tools
used to build the binary and the security features enabled (or
disabled). Also this section would be able to discriminate this
information on a per-symbol basis if necessary, so that multiple,
conflicting properties can be recorded for a single file.

I have a preliminary proposal to implement this second section (see
below) and I would be very interested in any thoughts that you might
have.

Cheers
  Nick

The purpose of this non-allocatable note section is to provide a way
for package maintainers and distributions to answer questions about
the binaries in their distribution, especially security-related
questions.

Here is the current, preliminary, specification:

* The information is stored in a new section in the file using the
  ELF NOTE format. Creator tools (compilers, assemblers etc) place
  the notes into the binary files. Consumer tools (none written yet,
  but readelf and/or objdump could be enhanced for this purpose) read
  the notes and answer questions about the binaries concerned. Static
  linkers need special care to handle merging of the notes.

* The information is stored in a section called
  .gnu.build.attributes. (The name can be changed - it is basically
  irrelevant anyway; it is the new section flag (defined below) that
  matters.) The section has the SHT_NOTE type and a new section flag
  set: SHF_GNU_BUILD_ATTRIBUTES (suggested value: 0x00100000). This
  indicates the special needs when merging notes (see below). The
  sh_link field should be set to contain the index of the symbol
  table section.
  If this field is 0 then the consumer should assume that the first
  section of type SHT_SYMTAB in the section headers is the symbol
  table being used.

* The specification breaks the name/description convention of ELF
  notes to instead use a key/value/applies-to list. (This is not a
  problem as we are only breaking a convention, not a requirement of
  the ELF NOTE specification.) The type of the note is the key. The
  name of the note is the value and the description field is the
  applies-to list.

  By default the description field contains the filename of the
  source file that was used to produce the binary. (FIXME: Absolute
  pathname? Relative pathname? Just the filename with no path?) This
  indicates that the key/value pair applies to all symbols in the
  file. The length of this string must *not* be a multiple of 4 (with
  the terminating NUL byte included). If necessary the filename
  should be padded with an extra NUL byte. (Note - this padding byte
  is separate from the padding bytes used to align the description
  field to its normal boundary.) This restriction is so that a
  description containing symbol indices (see below) can be
  distinguished from a description containing a file name.

  If a key/value pair applies to just some of the symbols in a file,
  then the description instead contains a list of 4-byte or 8-byte
  wide numbers. These are indices into the symbol table (pointed to
  by the sh_link field of the section header).

  Notes:

    + In unrelocated files the number stored should instead be zero,
      with a relocation present to set the actual value once the file
      is linked.

      FIXME: Unable to implement at the moment. Instead the
      relocation generated by the assembler evaluates to the *value*
      of the symbol, not its index in the ELF symbol table section.
      May have to change this spec if I cannot find a way around
      this.

    + The numbers are stored in the same endian format as that
      specified in the EI_DATA field of the ELF header of the file
      containing the note.
    + The symbol table is indexed rather than the string table
      because consumers are most likely to be interested in the
      symbol as a whole, not just its name. (FIXME: Is this true?)

  An empty description field is a special case. It should be treated
  as if it had the same filename as the nearest preceding version
  note. (See NT_GNU_BUILD_ATTRIBUTE_VERSION below.) FIXME: This
  assumes that a linker will preserve the order of notes when
  linking. Does this actually happen?

  Multiple notes with the same key can exist, provided that they have
  different values and that their applies-to lists do not intersect.
  (FIXME: Is this restriction necessary? Perhaps there are times when
  a symbol can have multiple values for the same key.)

  Where notes for the same key exist in both symbol index form and
  filename form, the symbol index form takes precedence. Any symbol
  in the given file not explicitly indexed by one of the notes will
  take its value from the note using the filename form.

  At most one note for a given key can exist containing a filename
  rather than symbol indices. If this rule is broken then this
  indicates that the file has been created by a linker that has not
  been enhanced to support this specification. In such cases all
  notes containing symbol indices should be ignored.

* When the linker merges two or more files containing these notes it
  should ensure that the above rules are maintained, and that the
  notes are merged appropriately. The linker will create a new
  version note (see the definition of NT_GNU_BUILD_ATTRIBUTE_VERSION
  below), with the output filename as its description, and the name
  set to any version of this specification that it chooses. Any input
  version notes that match this version are discarded. Other version
  notes are preserved and included in the output file.

  When notes are merged the following rules apply:

  1. If all input notes of a given type just contain filenames and
     they all have the same value string then a single output note is
     created with this type/value and the output filename as its
     description. Otherwise:

  2. If rule 1 would match except for one or more symbol-containing
     notes then rule 1 is executed, but the symbol-containing notes
     are also preserved and copied to the output. If this is a
     relocatable link then the relocations associated with the symbol
     indices should also be updated. Otherwise:

  3. [This rule triggers if there are filename-containing notes with
     different value strings.] The linker chooses one of the input
     value strings to be the default for the output and creates an
     output note using this value. (Presumably the linker will choose
     the value with the most matching input files.) Input notes
     containing filenames but with a value that does not match this
     output value must be converted into symbol-containing notes
     listing *all* of the symbols in the input file. Failure to do
     this breaks the requirement that there be only one
     filename-containing output note for the given key.

  If this is a final link, then relocations on the notes should of
  course be resolved.

  The linker is also able to create and insert its own notes, e.g. to
  indicate that -z relro is enabled.

  Linkers that have not been enhanced to support this proposal will
  simply concatenate the notes. (They may also eliminate duplicate
  notes, although this is not guaranteed. They may also sort the
  notes, which would break the use of empty description fields, as
  mentioned above.) In this case the output file is likely to contain
  multiple notes with the same key/value pair. Consumers can detect
  this situation by noticing that there is no
  NT_GNU_BUILD_ATTRIBUTE_VERSION note with the output file name, and
  hence deduce that any notes containing symbol indices are broken.
  (The linker will not have updated the indices when merging the
  notes.)
  Although they then support only file-level granularity, these notes
  may still prove useful.

* Three new note types are defined (so far):

  Type: NT_GNU_BUILD_ATTRIBUTE_VERSION (0x100)
  Name: A string identifying the version of this specification that
        is implemented in the accompanying notes. Currently set to
        "1.0".

  Type: NT_GNU_BUILD_ATTRIBUTE_CREATOR (0x101)
  Name: A string identifying the tool that created the symbols and
        their associated code, e.g.:
        "gcc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)"
        which includes name, date and version.

  Type: NT_GNU_BUILD_ATTRIBUTE_OPTIONS (0x102)
  Name: A string identifying the *significant* compile time options
        affecting the specified symbols, i.e. those that affect ABI,
        security, etc.

  Note: selection of *significant* compile time options may be
  subject to debate. The actual choice can vary over time without
  affecting the current proposal.
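
As an illustration of the description-field rules in the third bullet
(this is not part of the proposal itself), a consumer could tell the
filename form from the symbol index form by size alone. Below is a
minimal Python sketch. It assumes that the index width is supplied by
the caller (the proposal does not say how 4-byte versus 8-byte is
chosen), that filenames are UTF-8, and that the bytes passed in are
exactly the note's description without alignment padding. All function
names are hypothetical.

```python
import struct

# Note types (keys) with the values suggested in the proposal.
NT_GNU_BUILD_ATTRIBUTE_VERSION = 0x100
NT_GNU_BUILD_ATTRIBUTE_CREATOR = 0x101
NT_GNU_BUILD_ATTRIBUTE_OPTIONS = 0x102


def _index_format(word_size, little_endian):
    """struct format for one symbol index (4 or 8 bytes wide)."""
    return ("<" if little_endian else ">") + {4: "I", 8: "Q"}[word_size]


def encode_filename_desc(filename):
    """Build a filename-form description (hypothetical helper).

    The length, terminating NUL included, must not be a multiple of 4,
    so one extra NUL is appended when it otherwise would be.
    """
    data = filename.encode("utf-8") + b"\0"
    if len(data) % 4 == 0:
        data += b"\0"
    return data


def encode_symbol_desc(indices, word_size=4, little_endian=True):
    """Build a symbol-index-form description (hypothetical helper)."""
    fmt = _index_format(word_size, little_endian)
    return b"".join(struct.pack(fmt, i) for i in indices)


def decode_desc(desc, word_size=4, little_endian=True):
    """Classify a description field by its size.

    Returns ("inherit", None), ("file", name) or ("symbols", [indices]).
    """
    if len(desc) == 0:
        # Empty: same filename as the nearest preceding version note.
        return ("inherit", None)
    if len(desc) % 4 != 0:
        # Not a multiple of 4, so this must be the filename form.
        return ("file", desc.rstrip(b"\0").decode("utf-8"))
    if len(desc) % word_size != 0:
        raise ValueError("description is not a whole number of indices")
    fmt = _index_format(word_size, little_endian)
    return ("symbols", [v for (v,) in struct.iter_unpack(fmt, desc)])
```

For instance, encode_filename_desc("abc") yields 5 bytes, because "abc"
plus its terminating NUL would otherwise be exactly 4 bytes long, and
decode_desc() on that result returns ("file", "abc").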