From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=WvAX=62=emagii.com=binutils@sourceware.org>
Received: from emagii.se (www.emagii.com [185.133.207.17])
	by sourceware.org (Postfix) with ESMTPS id 39A4C3858CDB
	for <binutils@sourceware.org>; Thu,  2 Mar 2023 22:02:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 39A4C3858CDB
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=emagii.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=emagii.com
Received: from [10.175.196.145] (84-55-68-216.customers.ownit.se [84.55.68.216])
	by emagii.se (Postfix) with ESMTPSA id 0D58F12018B;
	Thu,  2 Mar 2023 23:02:34 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=emagii.com;
	s=default; t=1677794554;
	bh=bu/IgKeOKdGrhGZW4UZSqKaWTu98wofm76Wyc99ha9U=; h=Subject:To:From;
	b=eM5TIvYOsw7p6luZxFYqD3OTaPE9PEy+L5lp5MfNSGUfZZs7P95HWF8Nj+0ZYSW01
	 8u+SMM3FCiLhTcjpv4sUb8mhRfhtD6L4DvdvVKTgjBBJYE5a/YsFY8yX3stUVG94Jo
	 GNabR73Ug9GYA5gM5CjMXV8IUawWESCNIjmgIA7U=
Authentication-Results: emagii.beebytevps.io;
        spf=pass (sender IP is 84.55.68.216) smtp.mailfrom=binutils@emagii.com smtp.helo=[10.175.196.145]
Received-SPF: pass (emagii.beebytevps.io: connection is authenticated)
Message-ID: <d125743e-21f2-e113-3b12-919774383a2f@emagii.com>
Date: Thu, 2 Mar 2023 23:02:30 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.7.1
Subject: Re: [PATCH v8 0/11 Add support for CRC64 generation in linker
Content-Language: en-US
To: Nick Clifton <nickc@redhat.com>, binutils@sourceware.org
References: <20230301174325.1858522-1-binutils@emagii.com>
 <2b4f460e-3796-10b0-7fc4-a0635acdea18@redhat.com>
From: Ulf Samuelsson <binutils@emagii.com>
In-Reply-To: <2b4f460e-3796-10b0-7fc4-a0635acdea18@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-PPP-Message-ID: 
 <167779455429.1167616.14570820210303322501@localhost.localdomain>
X-PPP-Vhost: emagii.com
X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_ASCII_DIVIDERS,NICE_REPLY_A,SPF_HELO_FAIL,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>


On 2023-03-02 17:43, Nick Clifton wrote:
> Hi Ulf,
>
>   The patch set is looking good so far.
>
>   I have been running some experiments on misusing the DIGEST directive
>   and found a few problems.  For example, if I want to have two digests
>   in my binary:
>
>      % ld foo.o -T test.t
>     ld:test.t:19: warning: Only the first CRC polynome is used
>     ld:crc.t:0: warning: updating CRC failed
>     CRC generation aborted
>
>   which would be fine, except for the fact that the link succeeds:
>
>     % echo $?
>     0
>
>  (Hint: You can use %X in your message string when calling einfo() to set
>   an error exit code...)
>
>   Even better would be to allow an number of digests.  I still do not 
> know
>   why you want to have a limit of 1.

I only see the DIGEST feature as useful for an embedded system executing 
from flash.
It would not be used for applications running linux - until linux starts 
to CRC check
the applications before they run.
Embedded Linux, typically compresses the kernel and encapsulates the the
compressed data in a header (U-Boot mkimage) is an example,
but that, I think is outside the scope of the linker.

In an embedded system you use the digest in two ways.

1. The bootloader will start by verifying its own contents.
     At reset, the ".data" and ".bss" sections of the bootloader are 
uninitialized.
     They will be initialized by the bootloader, from the ".text" section.
     The CRC check will be done before initialization so you cannot
     use this to check the ".data" or ".bss".

2. The bootloader will verify the real application before it is started.

     The application has a header containing the digest in a location 
known to the bootloader.
     The check should be of the *entire flash contents* except the 
digest itself.
     There is nothing else to check so having additional checks is 
meaningless.
     The bootloader cannot check the ".data" or ".bss" of the application
     because they are initialized by the application, and the bootloader 
will check
     before the application is started.
     The bootloader will again check the entire flash contents, leaving 
nothing else to check.

     If the application needs to do a selfcheck of the flash while running,
     it will use the same digest as the bootloader and check the same area.
     There is no need for a new digest for this.

     If the application needs to calculate CRC on other things, like 
communications,
     using another CRC algorithm, it can add the CRC routines and table 
in the
     application code. Doing it in the linker just adds complexity.

     That is why I suspect it does not make sense to support anything 
but ".text"
     and it does not make sense to have more than one digest.

=================================================

     I can see that it would be good to have a built in function inside 
the gcc compiler
     that would generate crc routines.

     I.E: Something like this:

     #pragma digest "CRC64-ECMA" "crc_ecma" using "crc_tab"

     #pragma digest_table "CRC64-ECMA" "crc_tab"

       extern uint64_t crc_ecma(char *s, uint32_t numbytes);

=================================================

The digest table is strictly not neccessary, but it is convenient and 
removes the risk

that the user uses a different table than the table used to generate the 
digest in the linker.

=============================================

If you try to use two digests in error, then you have two cases.

1. You add a second digest which is not used by the application.
     Then the link completes with a warning.
     Really not worrysome.

2. You add a second digest which is used by the application.

     DIGEST "digest1" "CRC64-ECMA"
     DIGEST "digest2" "CRC64-ISO"

     The application needs to refer to the digest and declares.
     extern uint64_t digest1;
     extern uint64_t digest2;

     uint64_t mycrc = digest2;

     The linker will not find "digest2" and will report that as an error;
     and then the warning message will be detected as the user looks at 
the log.

===============================================
>
>   I do not like using the DIGEST SECTION directive to specify the section
>   for the digest.  This should be implied by the presence of the DIGEST
>   directive itself.  For example:
>
>     .data : { DIGEST ... }

I did not manage to figure out which section I am in
when the digest command is executed, so I tool the easy way out.

As I said above, this is not going to happen in the real world
as you cannot use this to check the ".data" section in an proper setup.

I'd like to remove the function and only support ".text".

>
>   ought to create a digest in the .data section.  Furthermore it seems 
> that
>   the DIGEST SECTION directive does not work reliably:
>
>     % cat test.t
>     _start = 0x000000;
>     SECTIONS
>     {
>       . = 0x1000 + SIZEOF_HEADERS;
>
>       .foo ALIGN (CONSTANT (COMMONPAGESIZE)) :
>        {
>           foo_start = .;
>           *(.foo)
>           foo_end = .;
>         }
>
>         .data ALIGN (CONSTANT (COMMONPAGESIZE)) :
>         {
>           DIGEST SECTION ".foo"
>           DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (foo_start , foo_end)
>           *(.data)
>         }
>
>        /DISCARD/ : { *(*) }
>     }
>
>     % ld -T test.t foo.o
>     ld:test.t:0: warning: updating CRC failed
>     CRC generation aborted
>
>
The digest, start and end needs to be in the same section.
You put the start and end in ".foo" and the digest in ".data".

I do a check that the digest and the start/end are in the selected
section, but I do not understand why there is no error message.

===============================================
> Another issue - if the digest is unrecognised then the error message
>   is not very helpful.  For example I tried:
>
>      DIGEST "0.BE" "TCRC32" (___CRC32_START___ , text_end)
>
>   and received the error message:
>
>     ld: Unknown DIGEST: no error
>
Will see if I can fix that.

> (There is no line number in the error message, nor does it repeat of the
>   unrecognized name).
>
>   Incidentally if a valid digest name is used in the above example a 
> symbol
>   with the name "0" is generated, which is not really allowed as symbol
>   names should not start with a numeric value.  But that is probably 
> being
>   too pedantic.

If the user tries to access the digest, he needs to declare a variable with
the name "0"

I.E:

uint64_t   0;

Good luck with that :-)

If I read the manual, I find this:

####################

3.10.3 Symbol Names
Unless quoted, symbol names start with a letter, underscore, or period 
and may include
letters, digits, underscores, periods, and hyphens. Unquoted symbol 
names must not conflict
with any keywords. You can specify a symbol which contains odd 
characters or has the
same name as a keyword by surrounding the symbol name in double quotes:

   "SECTION" = 9;

   "with a space" = "also with a space" + 10;

Since symbols can contain many non-alphabetic characters, it is safest 
to delimit symbols
with spaces. For example, ‘A-B’ is one symbol, whereas ‘A - B’ is an 
expression involving
subtraction.

####################

so "0" is actually a legal symbol in the linker.

Since '.' is also a legal character, I will change the endian character 
to '#'.

I.E:

    DIGEST "CRC#BE" "CRC64-ECMA"

===============================================
>   Oh - and here is a good one.  If the start symbol is beyond the end 
> symbol
>   the code triggers a segmentation fault:
>
>     % cat test.t
>     [...]
>     .text ALIGN (CONSTANT (COMMONPAGESIZE)) :
>       {
>         DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (text_end , text_start)
>         text_start = .;
>         *(.text)
>         text_end = .;
>       }
>      [...]
>
>     % ld foo.o -T test.t
>     Segmentation fault (core dumped)
OK,  I verify the sanity of those variables, and that is a simple check 
which I will add.
>
>
>   I am not sure if the next one matters or not, but just in case: The
>   checksum and crc table can be generated on unaligned boundaries if
>   the code does not happen to be aligned.  For example:
>
>     .text ALIGN (CONSTANT (COMMONPAGESIZE)) :
>       {
>         BYTE(0);
>         DIGEST "BE.BE.BE" POLY (64,a,b,c,d,e,f) (text_start , text_end)
>         text_start = .;
>         *(.text)
>         text_end = .;
>       }

Since there are architectures which supports byte aligned instructions
it is up to the user to specify alignment in the linker file.
( I was raised with the Series 32000 which supported instruction lengths 
between 2..23 bytes )

If they need the alignment but fail in the linker file,
then this will result in a trap when testing.

================================================

>   If the new directives are used outside of an output section then they
>   do not generate very helpful error messages.  For example:
>
>     % cat test.t
>     _start = 0x000000;
>     SECTIONS
>     {
>       DEBUG ON
>       /DISCARD/ : { *(*) }
>     }
>
>     % ld foo.o -T test.t
>     ld:test.t:5: syntax error
>
>   (Note the incorrect line number for the error).  This is probably a 
> limitation
>   of the grammar supported by bison however, so I would not be 
> surprised if there
>   is no easy way to fix it.

That is part of the higher level parsing, so there is probably nothing I 
can do about that.

The line number problem is also high level parsing, so I think that 
should be reported as a bug.
I suspect you get the same problem if you replace the DEBUG with a QUAD 
directive in the same place.

================================================

>
>   On a related note - I think that there is a problem with the checksum
>   generation.  Specifically using the ".BE" suffix to indicate that a
>   big-endian checksum should be computed.  The problem is that some 
> targets
>   support both big-endian and little-endian output, selectable via a
>   command line option, and so a static choice in a linker script is
>   inappropriate.  May I suggest an alternative ?  Support both ".LE" and
>   ".BE" suffixes, and if neither is specified then use whichever endian
>   format has been selected by the command line.
>
I can have a look at that.

It will be small endian if nothing is specified in the label nor in the 
command line.


> Cheers
>   Nick
>