From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from emagii.se (www.emagii.com [185.133.207.17]) by sourceware.org (Postfix) with ESMTPS id 39A4C3858CDB for ; Thu, 2 Mar 2023 22:02:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 39A4C3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=emagii.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=emagii.com Received: from [10.175.196.145] (84-55-68-216.customers.ownit.se [84.55.68.216]) by emagii.se (Postfix) with ESMTPSA id 0D58F12018B; Thu, 2 Mar 2023 23:02:34 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=emagii.com; s=default; t=1677794554; bh=bu/IgKeOKdGrhGZW4UZSqKaWTu98wofm76Wyc99ha9U=; h=Subject:To:From; b=eM5TIvYOsw7p6luZxFYqD3OTaPE9PEy+L5lp5MfNSGUfZZs7P95HWF8Nj+0ZYSW01 8u+SMM3FCiLhTcjpv4sUb8mhRfhtD6L4DvdvVKTgjBBJYE5a/YsFY8yX3stUVG94Jo GNabR73Ug9GYA5gM5CjMXV8IUawWESCNIjmgIA7U= Authentication-Results: emagii.beebytevps.io; spf=pass (sender IP is 84.55.68.216) smtp.mailfrom=binutils@emagii.com smtp.helo=[10.175.196.145] Received-SPF: pass (emagii.beebytevps.io: connection is authenticated) Message-ID: Date: Thu, 2 Mar 2023 23:02:30 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1 Subject: Re: [PATCH v8 0/11 Add support for CRC64 generation in linker Content-Language: en-US To: Nick Clifton , binutils@sourceware.org References: <20230301174325.1858522-1-binutils@emagii.com> <2b4f460e-3796-10b0-7fc4-a0635acdea18@redhat.com> From: Ulf Samuelsson In-Reply-To: <2b4f460e-3796-10b0-7fc4-a0635acdea18@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-PPP-Message-ID: <167779455429.1167616.14570820210303322501@localhost.localdomain> X-PPP-Vhost: emagii.com X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_ASCII_DIVIDERS,NICE_REPLY_A,SPF_HELO_FAIL,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 2023-03-02 17:43, Nick Clifton wrote: > Hi Ulf, > >   The patch set is looking good so far. > >   I have been running some experiments on misusing the DIGEST directive >   and found a few problems.  For example, if I want to have two digests >   in my binary: > >      % ld foo.o -T test.t >     ld:test.t:19: warning: Only the first CRC polynome is used >     ld:crc.t:0: warning: updating CRC failed >     CRC generation aborted > >   which would be fine, except for the fact that the link succeeds: > >     % echo $? >     0 > >  (Hint: You can use %X in your message string when calling einfo() to set >   an error exit code...) > >   Even better would be to allow an number of digests.  I still do not > know >   why you want to have a limit of 1. I only see the DIGEST feature as useful for an embedded system executing from flash. It would not be used for applications running linux - until linux starts to CRC check the applications before they run. Embedded Linux, typically compresses the kernel and encapsulates the the compressed data in a header (U-Boot mkimage) is an example, but that, I think is outside the scope of the linker. In an embedded system you use the digest in two ways. 1. The bootloader will start by verifying its own contents.     At reset, the ".data" and ".bss" sections of the bootloader are uninitialized.     They will be initialized by the bootloader, from the ".text" section.     The CRC check will be done before initialization so you cannot     use this to check the ".data" or ".bss". 2. The bootloader will verify the real application before it is started.     The application has a header containing the digest in a location known to the bootloader.     The check should be of the *entire flash contents* except the digest itself.     There is nothing else to check so having additional checks is meaningless.     The bootloader cannot check the ".data" or ".bss" of the application     because they are initialized by the application, and the bootloader will check     before the application is started.     The bootloader will again check the entire flash contents, leaving nothing else to check.     If the application needs to do a selfcheck of the flash while running,     it will use the same digest as the bootloader and check the same area.     There is no need for a new digest for this.     If the application needs to calculate CRC on other things, like communications,     using another CRC algorithm, it can add the CRC routines and table in the     application code. Doing it in the linker just adds complexity.     That is why I suspect it does not make sense to support anything but ".text"     and it does not make sense to have more than one digest. =================================================     I can see that it would be good to have a built in function inside the gcc compiler     that would generate crc routines.     I.E: Something like this:     #pragma digest "CRC64-ECMA" "crc_ecma" using "crc_tab"     #pragma digest_table "CRC64-ECMA" "crc_tab"       extern uint64_t crc_ecma(char *s, uint32_t numbytes); ================================================= The digest table is strictly not neccessary, but it is convenient and removes the risk that the user uses a different table than the table used to generate the digest in the linker. ============================================= If you try to use two digests in error, then you have two cases. 1. You add a second digest which is not used by the application.     Then the link completes with a warning.     Really not worrysome. 2. You add a second digest which is used by the application.     DIGEST "digest1" "CRC64-ECMA"     DIGEST "digest2" "CRC64-ISO"     The application needs to refer to the digest and declares.     extern uint64_t digest1;     extern uint64_t digest2;     uint64_t mycrc = digest2;     The linker will not find "digest2" and will report that as an error;     and then the warning message will be detected as the user looks at the log. =============================================== > >   I do not like using the DIGEST SECTION directive to specify the section >   for the digest.  This should be implied by the presence of the DIGEST >   directive itself.  For example: > >     .data : { DIGEST ... } I did not manage to figure out which section I am in when the digest command is executed, so I tool the easy way out. As I said above, this is not going to happen in the real world as you cannot use this to check the ".data" section in an proper setup. I'd like to remove the function and only support ".text". > >   ought to create a digest in the .data section.  Furthermore it seems > that >   the DIGEST SECTION directive does not work reliably: > >     % cat test.t >     _start = 0x000000; >     SECTIONS >     { >       . = 0x1000 + SIZEOF_HEADERS; > >       .foo ALIGN (CONSTANT (COMMONPAGESIZE)) : >        { >           foo_start = .; >           *(.foo) >           foo_end = .; >         } > >         .data ALIGN (CONSTANT (COMMONPAGESIZE)) : >         { >           DIGEST SECTION ".foo" >           DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (foo_start , foo_end) >           *(.data) >         } > >        /DISCARD/ : { *(*) } >     } > >     % ld -T test.t foo.o >     ld:test.t:0: warning: updating CRC failed >     CRC generation aborted > > The digest, start and end needs to be in the same section. You put the start and end in ".foo" and the digest in ".data". I do a check that the digest and the start/end are in the selected section, but I do not understand why there is no error message. =============================================== > Another issue - if the digest is unrecognised then the error message >   is not very helpful.  For example I tried: > >      DIGEST "0.BE" "TCRC32" (___CRC32_START___ , text_end) > >   and received the error message: > >     ld: Unknown DIGEST: no error > Will see if I can fix that. > (There is no line number in the error message, nor does it repeat of the >   unrecognized name). > >   Incidentally if a valid digest name is used in the above example a > symbol >   with the name "0" is generated, which is not really allowed as symbol >   names should not start with a numeric value.  But that is probably > being >   too pedantic. If the user tries to access the digest, he needs to declare a variable with the name "0" I.E: uint64_t   0; Good luck with that :-) If I read the manual, I find this: #################### 3.10.3 Symbol Names Unless quoted, symbol names start with a letter, underscore, or period and may include letters, digits, underscores, periods, and hyphens. Unquoted symbol names must not conflict with any keywords. You can specify a symbol which contains odd characters or has the same name as a keyword by surrounding the symbol name in double quotes:   "SECTION" = 9;   "with a space" = "also with a space" + 10; Since symbols can contain many non-alphabetic characters, it is safest to delimit symbols with spaces. For example, ‘A-B’ is one symbol, whereas ‘A - B’ is an expression involving subtraction. #################### so "0" is actually a legal symbol in the linker. Since '.' is also a legal character, I will change the endian character to '#'. I.E:    DIGEST "CRC#BE" "CRC64-ECMA" =============================================== >   Oh - and here is a good one.  If the start symbol is beyond the end > symbol >   the code triggers a segmentation fault: > >     % cat test.t >     [...] >     .text ALIGN (CONSTANT (COMMONPAGESIZE)) : >       { >         DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (text_end , text_start) >         text_start = .; >         *(.text) >         text_end = .; >       } >      [...] > >     % ld foo.o -T test.t >     Segmentation fault (core dumped) OK,  I verify the sanity of those variables, and that is a simple check which I will add. > > >   I am not sure if the next one matters or not, but just in case: The >   checksum and crc table can be generated on unaligned boundaries if >   the code does not happen to be aligned.  For example: > >     .text ALIGN (CONSTANT (COMMONPAGESIZE)) : >       { >         BYTE(0); >         DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (text_start , text_end) >         text_start = .; >         *(.text) >         text_end = .; >       } Since there are architectures which supports byte aligned instructions it is up to the user to specify alignment in the linker file. ( I was raised with the Series 32000 which supported instruction lengths between 2..23 bytes ) If they need the alignment but fail in the linker file, then this will result in a trap when testing. ================================================ >   If the new directives are used outside of an output section then they >   do not generate very helpful error messages.  For example: > >     % cat test.t >     _start = 0x000000; >     SECTIONS >     { >       DEBUG ON >       /DISCARD/ : { *(*) } >     } > >     % ld foo.o -T test.t >     ld:test.t:5: syntax error > >   (Note the incorrect line number for the error).  This is probably a > limitation >   of the grammar supported by bison however, so I would not be > surprised if there >   is no easy way to fix it. That is part of the higher level parsing, so there is probably nothing I can do about that. The line number problem is also high level parsing, so I think that should be reported as a bug. I suspect you get the same problem if you replace the DEBUG with a QUAD directive in the same place. ================================================ > >   On a related note - I think that there is a problem with the checksum >   generation.  Specifically using the ".BE" suffix to indicate that a >   big-endian checksum should be computed.  The problem is that some > targets >   support both big-endian and little-endian output, selectable via a >   command line option, and so a static choice in a linker script is >   inappropriate.  May I suggest an alternative ?  Support both ".LE" and >   ".BE" suffixes, and if neither is specified then use whichever endian >   format has been selected by the command line. > I can have a look at that. It will be small endian if nothing is specified in the label nor in the command line. > Cheers >   Nick >