[PATCH v1 0/7] ASCII output section command

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [PATCH v1 0/7] ASCII output section command
@ 2024-06-28 21:46 binutils
  2024-06-28 21:46 ` [PATCH v1 1/7] ldlex.l: Add ASCII token binutils
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc

Implement the ASCII command for output sections

Introduce a command which adds a fixed size string.

   ASCII '(' <size> ')' , <string>

Size specified in command. 
If the string is shorter than the allocated area, it is zero filled
If the string is longer than the allocated area, it is truncated
and a NULL character is added at the end.

[PATCH v1 1/7] ldlex.l: Add ASCII token
[PATCH v1 2/7] ldgram.y: Add ASCII parsing
[PATCH v1 3/7] ldlang.*: process ASCII command
[PATCH v1 4/7] ld.texi: Add ASCII to info file
[PATCH v1 5/7] Add testsuite for ASCII command
[PATCH v1 6/7] NEWS: Add ASCII command
[PATCH v1 7/7] ChangeLog: Add ASCII command


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 1/7] ldlex.l: Add ASCII token
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
@ 2024-06-28 21:46 ` binutils
  2024-06-28 21:46 ` [PATCH v1 2/7] ldgram.y: Add ASCII parsing binutils
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/ldlex.l | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ld/ldlex.l b/ld/ldlex.l
index aa613100db0..c67c83689d7 100644
--- a/ld/ldlex.l
+++ b/ld/ldlex.l
@@ -311,6 +311,7 @@ V_IDENTIFIER [*?.$_a-zA-Z\[\]\-\!\^\\]([*?.$_a-zA-Z0-9\[\]\-\!\^\\]|::)*
 <WILD>"LONG"				{ RTOKEN(LONG); }
 <WILD>"SHORT"				{ RTOKEN(SHORT); }
 <WILD>"BYTE"				{ RTOKEN(BYTE); }
+<WILD>"ASCII"				{ RTOKEN(ASCII); }
 <WILD>"ASCIZ"				{ RTOKEN(ASCIZ); }
 <WILD>"LINKER_VERSION"			{ RTOKEN(LINKER_VERSION); }
 <SCRIPT>"NOFLOAT"			{ RTOKEN(NOFLOAT); }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 2/7] ldgram.y: Add ASCII parsing
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
  2024-06-28 21:46 ` [PATCH v1 1/7] ldlex.l: Add ASCII token binutils
@ 2024-06-28 21:46 ` binutils
  2024-06-28 21:46 ` [PATCH v1 3/7] ldlang.*: process ASCII command binutils
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/ldgram.y | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/ld/ldgram.y b/ld/ldgram.y
index 7c04025f526..cee798db198 100644
--- a/ld/ldgram.y
+++ b/ld/ldgram.y
@@ -128,7 +128,7 @@ static void yyerror (const char *);
 %right UNARY
 %token END
 %left <token> '('
-%token <token> ALIGN_K BLOCK BIND QUAD SQUAD LONG SHORT BYTE ASCIZ
+%token <token> ALIGN_K BLOCK BIND QUAD SQUAD LONG SHORT BYTE ASCII ASCIZ
 %token SECTIONS PHDRS INSERT_K AFTER BEFORE LINKER_VERSION
 %token DATA_SEGMENT_ALIGN DATA_SEGMENT_RELRO_END DATA_SEGMENT_END
 %token SORT_BY_NAME SORT_BY_ALIGNMENT SORT_NONE
@@ -703,9 +703,18 @@ statement:
 		{
 		  lang_add_data ((int) $1, $3);
 		}
+        | ASCII '(' mustbe_exp ')' NAME
+		{
+		  /* 'value' is a memory leak, do we care?  */
+		  etree_type *value = $3;
+		  if (value->type.node_code == INT)
+		    lang_add_string (value->value.value, $5);
+		  else
+		    einfo (_("%X%P:%pS: ASCII expression must be an integer\n"), NULL);
+		}
 	| ASCIZ NAME
 		{
-		  lang_add_string ($2);
+		  lang_add_string (0, $2);
 		}
 	| FILL '(' fill_exp ')'
 		{
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 3/7] ldlang.*: process ASCII command
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
  2024-06-28 21:46 ` [PATCH v1 1/7] ldlex.l: Add ASCII token binutils
  2024-06-28 21:46 ` [PATCH v1 2/7] ldgram.y: Add ASCII parsing binutils
@ 2024-06-28 21:46 ` binutils
  2024-07-01  6:35   ` Jan Beulich
  2024-06-28 21:46 ` [PATCH v1 4/7] ld.texi: Add ASCII to info file binutils
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/ldlang.c | 65 +++++++++++++++++++++++++++++++++++++++++------------
 ld/ldlang.h |  2 +-
 2 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/ld/ldlang.c b/ld/ldlang.c
index 9e8cc224f4d..c5955059859 100644
--- a/ld/ldlang.c
+++ b/ld/ldlang.c
@@ -8663,15 +8663,20 @@ lang_add_data (int type, union etree_union *exp)
   new_stmt->type = type;
 }
 
-void
-lang_add_string (const char *s)
+/* Convert escape codes in S.
+   Supports \n, \r, \t and \NNN octals.
+   Returns a copy of S in a malloc'ed buffer.  */
+
+static char *
+convert_string (const char * s)
 {
-  bfd_vma  len = strlen (s);
-  bfd_vma  i;
-  bool     escape = false;
+  size_t  len = strlen (s);
+  size_t  i;
+  bool    escape = false;
+  char *  buffer = malloc (len + 1);
+  char *  b;
 
-  /* Add byte expressions until end of string.  */
-  for (i = 0 ; i < len; i++)
+  for (i = 0, b = buffer; i < len; i++)
     {
       char c = *s++;
 
@@ -8729,21 +8734,53 @@ lang_add_string (const char *s)
 	      }
 	      break;
 	    }
-
-	  lang_add_data (BYTE, exp_intop (c));
 	  escape = false;
 	}
       else
 	{
 	  if (c == '\\')
-	    escape = true;
-	  else
-	    lang_add_data (BYTE, exp_intop (c));
+	    {
+	      escape = true;
+	      continue;
+	    }
 	}
+
+      * b ++ = c;
     }
 
-  /* Remeber to terminate the string.  */
-  lang_add_data (BYTE, exp_intop (0));
+  * b = 0;
+  return buffer;
+}
+
+void
+lang_add_string (size_t size, const char *s)
+{
+  size_t  len;
+  size_t  i;
+  char *  string;
+
+  string = convert_string (s);
+  len = strlen (string);
+
+  /* Check if it is ASCIZ command (len == 0) */
+  if (size == 0)
+    /* Make sure that we include the terminating nul byte.  */
+    size = len + 1;
+  else if (len >= size)
+    {
+      len = size - 1;
+
+      einfo (_("%P:%pS: warning: ASCII string does not fit in allocated space,"
+		" truncated\n"), NULL);
+    }
+
+  for (i = 0 ; i < len ; i++)
+    lang_add_data (BYTE, exp_intop (string[i]));
+
+  while (i++ < size)
+    lang_add_data (BYTE, exp_intop ('\0'));
+
+  free (string);
 }
 
 /* Create a new reloc statement.  RELOC is the BFD relocation type to
diff --git a/ld/ldlang.h b/ld/ldlang.h
index f36e04c586a..bf1b602799c 100644
--- a/ld/ldlang.h
+++ b/ld/ldlang.h
@@ -656,7 +656,7 @@ extern void pop_stat_ptr
 extern void lang_add_data
   (int, union etree_union *);
 extern void lang_add_string
-  (const char *);
+  (size_t, const char *s);
 extern void lang_add_reloc
   (bfd_reloc_code_real_type, reloc_howto_type *, asection *, const char *,
    union etree_union *);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
                   ` (2 preceding siblings ...)
  2024-06-28 21:46 ` [PATCH v1 3/7] ldlang.*: process ASCII command binutils
@ 2024-06-28 21:46 ` binutils
  2024-07-01  6:50   ` Jan Beulich
  2024-06-28 21:46 ` [PATCH v1 5/7] Add testsuite for ASCII command binutils
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/ld.texi | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/ld/ld.texi b/ld/ld.texi
index 89e3913317a..5e00413f5b3 100644
--- a/ld/ld.texi
+++ b/ld/ld.texi
@@ -5643,6 +5643,7 @@ C identifiers because they contain a @samp{.} character.
 @cindex section data
 @cindex output section data
 @kindex ASCIZ ``@var{string}''
+@kindex ASCII (@var{expression}) ``@var{string}''
 @kindex BYTE(@var{expression})
 @kindex SHORT(@var{expression})
 @kindex LONG(@var{expression})
@@ -5691,6 +5692,27 @@ For example, this string of 16 characters will create a 17 byte area
   ASCIZ "This is 16 bytes"
 @end smallexample
 
+You can include a fixed size string in an output section by using
+@code{ASCII}.  The keyword is followed by a size enclosed in
+parentheses and then a string.  The string is stored at the current
+value of the location counter and zero bytes are added at the end to
+fill up to the specified size.  Note the fill value is ignored for
+this padding. If the string includes spaces it must be enclosed in double
+quotes.  The string may contain '\n', '\r', '\t' and octal numbers.
+Hex numbers are not supported.
+
+If the string is too long, a warning is issued and the string is
+truncated.  The string will still be zero-terminated in this case.
+
+If the expression evaluates to zero then the directive will be treated
+as if it were @code{ASCIZ} instead.
+
+Example: This is string of 16 characters and will create a 32 byte
+area:
+@smallexample
+  ASCII (32) "This is 16 bytes"
+@end smallexample
+
 Note---these commands only work inside a section description and not
 between them, so the following will produce an error from the linker:
 @smallexample
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 5/7] Add testsuite for ASCII command
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
                   ` (3 preceding siblings ...)
  2024-06-28 21:46 ` [PATCH v1 4/7] ld.texi: Add ASCII to info file binutils
@ 2024-06-28 21:46 ` binutils
  2024-06-28 21:46 ` [PATCH v1 6/7] NEWS: Add " binutils
  2024-06-28 21:46 ` [PATCH v1 7/7] ChangeLog: " binutils
  6 siblings, 0 replies; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/testsuite/ld-scripts/ascii.d    | 25 ++++++++++++++++++++
 ld/testsuite/ld-scripts/ascii.s    | 11 +++++++++
 ld/testsuite/ld-scripts/ascii.t    | 38 ++++++++++++++++++++++++++++++
 ld/testsuite/ld-scripts/script.exp |  1 +
 4 files changed, 75 insertions(+)
 create mode 100644 ld/testsuite/ld-scripts/ascii.d
 create mode 100644 ld/testsuite/ld-scripts/ascii.s
 create mode 100644 ld/testsuite/ld-scripts/ascii.t

diff --git a/ld/testsuite/ld-scripts/ascii.d b/ld/testsuite/ld-scripts/ascii.d
new file mode 100644
index 00000000000..ff3bd328c15
--- /dev/null
+++ b/ld/testsuite/ld-scripts/ascii.d
@@ -0,0 +1,25 @@
+#source: ascii.s
+#ld: -T ascii.t
+#objdump: -s -j .header
+#target: [is_elf_format] [is_coff_format]
+#notarget: tic4x-*-* tic54x-*-*
+
+.*:     file format .*
+
+Contents of section .header:
+ .... 70726f67 72616d20 6e616d65 00000000  program name....
+ .... 656d7074 79000000 00000000 00000000  empty...........
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 636f6d6d 656e7420 310a0000 00000000  comment 1.......
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 636f6d6d 656e7420 320a0000 00000000  comment 2.......
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 636f6d6d 656e7420 330a0000 00000000  comment 3.......
+ .... 00000000 00000000 00000000 00000000  ................
+ .... 636f6d6d 656e7420 340a0000 00000000  comment 4.......
+ .... 00000000 00000000 49206d65 616e7420  ........I meant 
+ .... 746f2073 61793a20 54686973 20697320  to say: This is 
+ .... 77617920 746f6f20 6c6f6e67 00000000  way too long....
+#pass
diff --git a/ld/testsuite/ld-scripts/ascii.s b/ld/testsuite/ld-scripts/ascii.s
new file mode 100644
index 00000000000..a1b6148dc79
--- /dev/null
+++ b/ld/testsuite/ld-scripts/ascii.s
@@ -0,0 +1,11 @@
+        .extern ecc_start
+	.section .text
+main:
+	.long 0x45444F43
+	.long 0x12345678
+	
+	.section .data
+	.long 0x9abcdef0
+	
+	.section .bss
+	.long 0
diff --git a/ld/testsuite/ld-scripts/ascii.t b/ld/testsuite/ld-scripts/ascii.t
new file mode 100644
index 00000000000..6f682fabd38
--- /dev/null
+++ b/ld/testsuite/ld-scripts/ascii.t
@@ -0,0 +1,38 @@
+_start = 0x000000;
+
+SECTIONS
+{
+  . = 0x1000 + SIZEOF_HEADERS;
+
+  .header ALIGN (0x100) (READONLY) :
+    {
+      ASCII (16) "program name"
+      ASCII (64) "empty"
+      ASCII (4 * 8) "comment 1\n"
+      ASCII (32) "comment 2\n"
+      ASCII (32) "comment 3\n"
+      ASCII (24) "comment 4\n"
+      ASCII (64) "I meant to say: This is way too long"
+    }
+
+  .text ALIGN (0x100) :
+  {
+      entry = .;
+      *(.text)
+  }
+
+  .data : AT (0x400000)
+  {
+	*(.data)
+  }
+  
+  . = ALIGN(0x20);
+  
+  .bss :
+  {
+	*(.bss)
+  }
+
+  /DISCARD/ : { *(*) }
+}
+
diff --git a/ld/testsuite/ld-scripts/script.exp b/ld/testsuite/ld-scripts/script.exp
index 213383de962..bc3fffe92d8 100644
--- a/ld/testsuite/ld-scripts/script.exp
+++ b/ld/testsuite/ld-scripts/script.exp
@@ -228,6 +228,7 @@ foreach test_script $test_script_list {
 }
 
 run_dump_test "asciz"
+run_dump_test "ascii"
 run_dump_test "align-with-input"
 run_dump_test "pr20302"
 run_dump_test "output-section-types"
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 6/7] NEWS: Add ASCII command
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
                   ` (4 preceding siblings ...)
  2024-06-28 21:46 ` [PATCH v1 5/7] Add testsuite for ASCII command binutils
@ 2024-06-28 21:46 ` binutils
  2024-06-28 21:46 ` [PATCH v1 7/7] ChangeLog: " binutils
  6 siblings, 0 replies; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/NEWS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/ld/NEWS b/ld/NEWS
index e0b9341a8ef..58144afd7a4 100644
--- a/ld/NEWS
+++ b/ld/NEWS
@@ -1,5 +1,9 @@
 -*- text -*-
 
+* The linker script syntax has a new command for output sections: ASCII (<size>) "string"
+  This will insert a string of the specified length at the current location.
+  It will be zero-filled if the string is shorter than <size> and truncated if longer.
+
 * Add -z isa-level-report=[none|all|needed|used] to the x86 ELF linker
   to report needed and used x86-64 ISA levels.
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 7/7] ChangeLog: Add ASCII command
  2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
                   ` (5 preceding siblings ...)
  2024-06-28 21:46 ` [PATCH v1 6/7] NEWS: Add " binutils
@ 2024-06-28 21:46 ` binutils
  2024-07-01  6:38   ` Jan Beulich
  6 siblings, 1 reply; 26+ messages in thread
From: binutils @ 2024-06-28 21:46 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 ld/ChangeLog | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/ld/ChangeLog b/ld/ChangeLog
index ab881707e11..0d7fc62a984 100644
--- a/ld/ChangeLog
+++ b/ld/ChangeLog
@@ -1,3 +1,18 @@
+2024-06-28  Ulf Samuelsson <binutils@emagii.com>
+	    Nick Clifton  <nickc@redhat.com>
+
+	* ldlex.l: Add ASCII token.
+	* ldgram.y: Add parsing of the ASCII command.
+	* ldlang.c (lang_add_string): Add maximum size parameter.  Move
+	escape character handling code into separate function.
+	* ldlang.h (lang_add_string): Update prototype.
+	* NEWS: Mention the new feature.
+	* ld.texi (Output Section Data): Document the new directives.
+	* testsuite/ld-scripts/ascii.d: New test driver.
+	* testsuite/ld-scripts/ascii.s: New test assembler source.
+	* testsuite/ld-scripts/ascii.t: New test script.
+	* testsuite/ld-scripts/script.exp: Run the new test.
+
 2024-01-15  Nick Clifton  <nickc@redhat.com>
 
 	* configure: Regenerate.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/7] ldlang.*: process ASCII command
  2024-06-28 21:46 ` [PATCH v1 3/7] ldlang.*: process ASCII command binutils
@ 2024-07-01  6:35   ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2024-07-01  6:35 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson, binutils

On 28.06.2024 23:46, binutils@emagii.com wrote:
> --- a/ld/ldlang.c
> +++ b/ld/ldlang.c
> @@ -8663,15 +8663,20 @@ lang_add_data (int type, union etree_union *exp)
>    new_stmt->type = type;
>  }
>  
> -void
> -lang_add_string (const char *s)
> +/* Convert escape codes in S.
> +   Supports \n, \r, \t and \NNN octals.
> +   Returns a copy of S in a malloc'ed buffer.  */
> +
> +static char *
> +convert_string (const char * s)
>  {
> -  bfd_vma  len = strlen (s);
> -  bfd_vma  i;
> -  bool     escape = false;
> +  size_t  len = strlen (s);
> +  size_t  i;
> +  bool    escape = false;
> +  char *  buffer = malloc (len + 1);

You want to avoid crashing if this allocation fails. Perhaps by simply
using xmalloc() or e.g. XNEWVEC().

> +  char *  b;
>  
> -  /* Add byte expressions until end of string.  */
> -  for (i = 0 ; i < len; i++)
> +  for (i = 0, b = buffer; i < len; i++)
>      {
>        char c = *s++;
>  
> @@ -8729,21 +8734,53 @@ lang_add_string (const char *s)
>  	      }
>  	      break;
>  	    }
> -
> -	  lang_add_data (BYTE, exp_intop (c));
>  	  escape = false;
>  	}
>        else
>  	{
>  	  if (c == '\\')
> -	    escape = true;
> -	  else
> -	    lang_add_data (BYTE, exp_intop (c));
> +	    {
> +	      escape = true;
> +	      continue;
> +	    }
>  	}
> +
> +      * b ++ = c;
>      }
>  
> -  /* Remeber to terminate the string.  */
> -  lang_add_data (BYTE, exp_intop (0));
> +  * b = 0;
> +  return buffer;
> +}
> +
> +void
> +lang_add_string (size_t size, const char *s)
> +{
> +  size_t  len;
> +  size_t  i;
> +  char *  string;
> +
> +  string = convert_string (s);
> +  len = strlen (string);
> +
> +  /* Check if it is ASCIZ command (len == 0) */
> +  if (size == 0)
> +    /* Make sure that we include the terminating nul byte.  */
> +    size = len + 1;
> +  else if (len >= size)
> +    {
> +      len = size - 1;
> +
> +      einfo (_("%P:%pS: warning: ASCII string does not fit in allocated space,"
> +		" truncated\n"), NULL);
> +    }
> +
> +  for (i = 0 ; i < len ; i++)

Nit: No blanks before ; please.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 7/7] ChangeLog: Add ASCII command
  2024-06-28 21:46 ` [PATCH v1 7/7] ChangeLog: " binutils
@ 2024-07-01  6:38   ` Jan Beulich
       [not found]     ` <094CF629-FDF8-436D-8B5B-62D35C3EA396@emagii.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-01  6:38 UTC (permalink / raw)
  To: binutils; +Cc: nickc, Ulf Samuelsson, binutils

On 28.06.2024 23:46, binutils@emagii.com wrote:
> --- a/ld/ChangeLog
> +++ b/ld/ChangeLog
> @@ -1,3 +1,18 @@
> +2024-06-28  Ulf Samuelsson <binutils@emagii.com>
> +	    Nick Clifton  <nickc@redhat.com>
> +
> +	* ldlex.l: Add ASCII token.
> +	* ldgram.y: Add parsing of the ASCII command.
> +	* ldlang.c (lang_add_string): Add maximum size parameter.  Move
> +	escape character handling code into separate function.
> +	* ldlang.h (lang_add_string): Update prototype.
> +	* NEWS: Mention the new feature.
> +	* ld.texi (Output Section Data): Document the new directives.
> +	* testsuite/ld-scripts/ascii.d: New test driver.
> +	* testsuite/ld-scripts/ascii.s: New test assembler source.
> +	* testsuite/ld-scripts/ascii.t: New test script.
> +	* testsuite/ld-scripts/script.exp: Run the new test.

The latest here it is clear: No, this can't be all separate commits.
ChangeLog entries, if any, should be added by the commit with the
changes actually being described. Separation of testcases and doc
additions is questionable, too. Perhaps all of this series really
would best be a single commit?

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-06-28 21:46 ` [PATCH v1 4/7] ld.texi: Add ASCII to info file binutils
@ 2024-07-01  6:50   ` Jan Beulich
       [not found]     ` <6FDD8EBD-2371-40DB-B25B-052278BCBF49@emagii.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-01  6:50 UTC (permalink / raw)
  To: binutils, binutils; +Cc: nickc, Ulf Samuelsson

On 28.06.2024 23:46, binutils@emagii.com wrote:
> --- a/ld/ld.texi
> +++ b/ld/ld.texi
> @@ -5643,6 +5643,7 @@ C identifiers because they contain a @samp{.} character.
>  @cindex section data
>  @cindex output section data
>  @kindex ASCIZ ``@var{string}''
> +@kindex ASCII (@var{expression}) ``@var{string}''

The arrangement is pretty odd, imo. I've tried to spot whether there was
any other construct with an argument following the closing parenthesis,
but I couldn't spot any. I may guess that it's ASCIZ that you're trying
to follow, yet there it's at least _only_ a string that follows the
keyword.

> @@ -5691,6 +5692,27 @@ For example, this string of 16 characters will create a 17 byte area
>    ASCIZ "This is 16 bytes"
>  @end smallexample
>  
> +You can include a fixed size string in an output section by using
> +@code{ASCII}.  The keyword is followed by a size enclosed in
> +parentheses and then a string.  The string is stored at the current
> +value of the location counter and zero bytes are added at the end to
> +fill up to the specified size.  Note the fill value is ignored for
> +this padding. If the string includes spaces it must be enclosed in double
> +quotes.

By requiring quotation, moving this 2nd argument into the parentheses
ought to be possible?

>  The string may contain '\n', '\r', '\t' and octal numbers.
> +Hex numbers are not supported.
> +
> +If the string is too long, a warning is issued and the string is
> +truncated.  The string will still be zero-terminated in this case.

This aspect in particular makes ASCII as the keyword misleading. If you
want to avoid overloading ASCIZ (as doing so would risk backwards
compatibility issues with unquoted strings used there, when those
were enclosed in parentheses), I think it would want to be ASCIIZ.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
       [not found]     ` <6FDD8EBD-2371-40DB-B25B-052278BCBF49@emagii.com>
@ 2024-07-02  8:12       ` Jan Beulich
  2024-07-05 15:38         ` Ulf Samuelsson
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-02  8:12 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Binutils, Nick Clifton

First: Please don't drop the mailing list from Cc. Second: Please don't
top-post.

On 02.07.2024 00:21, Ulf Samuelsson wrote:
> I submitted the code in this patch series more than a year ago, and it got accepted.
> Then a lot was reverted since I had not assigned the copyright to FSF.
> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
> 
> If the string is too long, so it does not fit, it is really an error.
> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.

Then did I read the code wrong? I was left with the impression that you'll
truncate at size-1, to ensure zero termination. Plus as you say in this
doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.

> I would prefer the syntax ”ASCII <size>, string” but I never got it to work. Adding parantheses made it work.
> ”ASCII (<size>, string)” seems downright ugly and also very different from the ASCIZ directive.

In which case I have to question that original ASCIZ directive's syntax.
Imo it should have used parentheses, too, matching other statements with
"operands". Together with requiring quotation, overloading ASCIZ then
wouldn't have been a problem now.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 7/7] ChangeLog: Add ASCII command
       [not found]     ` <094CF629-FDF8-436D-8B5B-62D35C3EA396@emagii.com>
@ 2024-07-02  8:37       ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2024-07-02  8:37 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Binutils, Nick Clifton

On 02.07.2024 00:26, Ulf Samuelsson wrote:
> I think it is better to keep them as separate commits while the patch is being reviewed and to create a single commit once each part is OK.
> Would that be OK with you guys?

That'll be a little strange, but may work. It'll want making clear though
what exactly is intended to be folded for committing.

Jan

>> 1 juli 2024 kl. 08:38 skrev Jan Beulich <jbeulich@suse.com>:
>>
>> On 28.06.2024 23:46, binutils@emagii.com wrote:
>>> --- a/ld/ChangeLog
>>> +++ b/ld/ChangeLog
>>> @@ -1,3 +1,18 @@
>>> +2024-06-28  Ulf Samuelsson <binutils@emagii.com>
>>> +        Nick Clifton  <nickc@redhat.com>
>>> +
>>> +    * ldlex.l: Add ASCII token.
>>> +    * ldgram.y: Add parsing of the ASCII command.
>>> +    * ldlang.c (lang_add_string): Add maximum size parameter.  Move
>>> +    escape character handling code into separate function.
>>> +    * ldlang.h (lang_add_string): Update prototype.
>>> +    * NEWS: Mention the new feature.
>>> +    * ld.texi (Output Section Data): Document the new directives.
>>> +    * testsuite/ld-scripts/ascii.d: New test driver.
>>> +    * testsuite/ld-scripts/ascii.s: New test assembler source.
>>> +    * testsuite/ld-scripts/ascii.t: New test script.
>>> +    * testsuite/ld-scripts/script.exp: Run the new test.
>>
>> The latest here it is clear: No, this can't be all separate commits.
>> ChangeLog entries, if any, should be added by the commit with the
>> changes actually being described. Separation of testcases and doc
>> additions is questionable, too. Perhaps all of this series really
>> would best be a single commit?
>>
>> Jan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-02  8:12       ` Jan Beulich
@ 2024-07-05 15:38         ` Ulf Samuelsson
  2024-07-08  6:18           ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-05 15:38 UTC (permalink / raw)
  To: Jan Beulich, Ulf Samuelsson; +Cc: Binutils, Nick Clifton


Den 2024-07-02 kl. 10:12, skrev Jan Beulich:
> First: Please don't drop the mailing list from Cc. Second: Please don't
> top-post.
>
> On 02.07.2024 00:21, Ulf Samuelsson wrote:
>> I submitted the code in this patch series more than a year ago, and it got accepted.
>> Then a lot was reverted since I had not assigned the copyright to FSF.
>> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
>> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
>>
>> If the string is too long, so it does not fit, it is really an error.
>> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
>> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.
> Then did I read the code wrong? I was left with the impression that you'll
> truncate at size-1, to ensure zero termination. Plus as you say in this
> doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.

              "          111111"
              "0123456789012345"

   ASCII (16) "This is a string"

   results in 'size' == 16. Since the string is 16 characters, 'len' 
will be 16 as well.

   This loop will be run 16 times, and i will be 16 at the exit of

   for (i = 0; i < len; i++)
     lang_add_data (BYTE, exp_intop (string[i]));

   Since i == 16, and size == 16, the following loop will not be run 
even once.

   while (i++ < size)
     lang_add_data (BYTE, exp_intop ('\0'));

   so no '\0' will be added.

===============================

   ASCII (16) "This is a string."

   size = 16, len = 17.  len > size, solen is set to 16-1 = 15.

   copy the first 15 characters in the first loop, add a single '\0' in 
the second loop.

===============================

   ASCII (16) "This is string"

   size = 16, len = 14

   copy the first 14 characters in the first loop, add a '\0' twice in 
the second loop.


>
>> I would prefer the syntax ”ASCII <size>, string” but I never got it to work. Adding parantheses made it work.
>> ”ASCII (<size>, string)” seems downright ugly and also very different from the ASCIZ directive.
> In which case I have to question that original ASCIZ directive's syntax.
> Imo it should have used parentheses, too, matching other statements with
> "operands". Together with requiring quotation, overloading ASCIZ then
> wouldn't have been a problem now.
>
> Jan

-- 
Best Regards
Ulf Samuelsson


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-05 15:38         ` Ulf Samuelsson
@ 2024-07-08  6:18           ` Jan Beulich
  2024-07-08 10:57             ` Ulf Samuelsson
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-08  6:18 UTC (permalink / raw)
  To: Ulf Samuelsson, Ulf Samuelsson; +Cc: Binutils, Nick Clifton

On 05.07.2024 17:38, Ulf Samuelsson wrote:
> 
> Den 2024-07-02 kl. 10:12, skrev Jan Beulich:
>> First: Please don't drop the mailing list from Cc. Second: Please don't
>> top-post.
>>
>> On 02.07.2024 00:21, Ulf Samuelsson wrote:
>>> I submitted the code in this patch series more than a year ago, and it got accepted.
>>> Then a lot was reverted since I had not assigned the copyright to FSF.
>>> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
>>> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
>>>
>>> If the string is too long, so it does not fit, it is really an error.
>>> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
>>> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.
>> Then did I read the code wrong? I was left with the impression that you'll
>> truncate at size-1, to ensure zero termination. Plus as you say in this
>> doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.
> 
>               "          111111"
>               "0123456789012345"
> 
>    ASCII (16) "This is a string"
> 
>    results in 'size' == 16. Since the string is 16 characters, 'len' 
> will be 16 as well.

And hence

  else if (len >= size)
    {
      len = size - 1;

would reduce len down to 15 afaics, such that ...


>    This loop will be run 16 times, and i will be 16 at the exit of
> 
>    for (i = 0; i < len; i++)
>      lang_add_data (BYTE, exp_intop (string[i]));
> 
>    Since i == 16, and size == 16, the following loop will not be run 
> even once.
> 
>    while (i++ < size)
>      lang_add_data (BYTE, exp_intop ('\0'));
> 
>    so no '\0' will be added.

... a '\0' will be added here? Otherwise, if I continue to overlook
something, what I can say is that ...

> ===============================
> 
>    ASCII (16) "This is a string."
> 
>    size = 16, len = 17.  len > size, solen is set to 16-1 = 15.
> 
>    copy the first 15 characters in the first loop, add a single '\0' in 
> the second loop.

... this would be pretty odd to me: Why would a nul be inserted in
this case, but not in the one further up?

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08  6:18           ` Jan Beulich
@ 2024-07-08 10:57             ` Ulf Samuelsson
  2024-07-08 11:07               ` Jan Beulich
  2024-07-08 11:18               ` Andreas Schwab
  0 siblings, 2 replies; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 10:57 UTC (permalink / raw)
  To: Jan Beulich, Ulf Samuelsson; +Cc: Binutils, Nick Clifton

[-- Attachment #1: Type: text/plain, Size: 3154 bytes --]


Den 2024-07-08 kl. 08:18, skrev Jan Beulich:
> On 05.07.2024 17:38, Ulf Samuelsson wrote:
>> Den 2024-07-02 kl. 10:12, skrev Jan Beulich:
>>> First: Please don't drop the mailing list from Cc. Second: Please don't
>>> top-post.
>>>
>>> On 02.07.2024 00:21, Ulf Samuelsson wrote:
>>>> I submitted the code in this patch series more than a year ago, and it got accepted.
>>>> Then a lot was reverted since I had not assigned the copyright to FSF.
>>>> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
>>>> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
>>>>
>>>> If the string is too long, so it does not fit, it is really an error.
>>>> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
>>>> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.
>>> Then did I read the code wrong? I was left with the impression that you'll
>>> truncate at size-1, to ensure zero termination. Plus as you say in this
>>> doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.
>>                "          111111"
>>                "0123456789012345"
>>
>>     ASCII (16) "This is a string"
>>
>>     results in 'size' == 16. Since the string is 16 characters, 'len'
>> will be 16 as well.
> And hence
>
>    else if (len >= size)
>      {
>        len = size - 1;
>
> would reduce len down to 15 afaics, such that ...
>
Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16 
bytes") == 16,
the linker should add a 16 byte area without a terminating NUL,

If the string is shorter, then the linker should pad with NUL.

If the string is longer, then the linker should truncate the string 
adding a NUL
at the end, still fitting in the area. A warning should be issued.

You could issue an error, but a warning is less intrusive.

Checked again and it needs to be changed from

   else if (len >= size)

to

   else if (len > size)

then the logic hopefully works out.


>>     This loop will be run 16 times, and i will be 16 at the exit of
>>
>>     for (i = 0; i < len; i++)
>>       lang_add_data (BYTE, exp_intop (string[i]));
>>
>>     Since i == 16, and size == 16, the following loop will not be run
>> even once.
>>
>>     while (i++ < size)
>>       lang_add_data (BYTE, exp_intop ('\0'));
>>
>>     so no '\0' will be added.
> ... a '\0' will be added here? Otherwise, if I continue to overlook
> something, what I can say is that ...
>
>> ===============================
>>
>>     ASCII (16) "This is a string."
>>
>>     size = 16, len = 17.  len > size, solen is set to 16-1 = 15.
>>
>>     copy the first 15 characters in the first loop, add a single '\0' in
>> the second loop.
> ... this would be pretty odd to me: Why would a nul be inserted in
> this case, but not in the one further up?
>
> Jan

-- 
Best Regards
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 10:57             ` Ulf Samuelsson
@ 2024-07-08 11:07               ` Jan Beulich
  2024-07-08 11:19                 ` Ulf Samuelsson
  2024-07-08 11:18               ` Andreas Schwab
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-08 11:07 UTC (permalink / raw)
  To: Ulf Samuelsson, Ulf Samuelsson; +Cc: Binutils, Nick Clifton

On 08.07.2024 12:57, Ulf Samuelsson wrote:
> 
> Den 2024-07-08 kl. 08:18, skrev Jan Beulich:
>> On 05.07.2024 17:38, Ulf Samuelsson wrote:
>>> Den 2024-07-02 kl. 10:12, skrev Jan Beulich:
>>>> First: Please don't drop the mailing list from Cc. Second: Please don't
>>>> top-post.
>>>>
>>>> On 02.07.2024 00:21, Ulf Samuelsson wrote:
>>>>> I submitted the code in this patch series more than a year ago, and it got accepted.
>>>>> Then a lot was reverted since I had not assigned the copyright to FSF.
>>>>> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
>>>>> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
>>>>>
>>>>> If the string is too long, so it does not fit, it is really an error.
>>>>> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
>>>>> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.
>>>> Then did I read the code wrong? I was left with the impression that you'll
>>>> truncate at size-1, to ensure zero termination. Plus as you say in this
>>>> doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.
>>>                "          111111"
>>>                "0123456789012345"
>>>
>>>     ASCII (16) "This is a string"
>>>
>>>     results in 'size' == 16. Since the string is 16 characters, 'len'
>>> will be 16 as well.
>> And hence
>>
>>    else if (len >= size)
>>      {
>>        len = size - 1;
>>
>> would reduce len down to 15 afaics, such that ...
>>
> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16 
> bytes") == 16,
> the linker should add a 16 byte area without a terminating NUL,
> 
> If the string is shorter, then the linker should pad with NUL.
> 
> If the string is longer, then the linker should truncate the string 
> adding a NUL
> at the end, still fitting in the area. A warning should be issued.

What I'm still actually missing a reason for is why in this case you
want to insert a nul while in the len == size case you want to keep
the string as is. To me this is inconsistent and hence yielding
surprising behavior. Yet with that harmonized (whichever direction)
it'll then become clear whether the directive wants to be ASCII or
ASCIZ (or alike).

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 10:57             ` Ulf Samuelsson
  2024-07-08 11:07               ` Jan Beulich
@ 2024-07-08 11:18               ` Andreas Schwab
  2024-07-08 11:29                 ` Ulf Samuelsson
  1 sibling, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Jan Beulich, Ulf Samuelsson, Binutils, Nick Clifton

On Jul 08 2024, Ulf Samuelsson wrote:

> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
> bytes") == 16,
> the linker should add a 16 byte area without a terminating NUL,
>
> If the string is shorter, then the linker should pad with NUL.
>
> If the string is longer, then the linker should truncate the string adding
> a NUL
> at the end, still fitting in the area. A warning should be issued.

This inonsistency is confusing.  The directive should either always use
the first size characters from the string, only padding with nulls when
the string is shorter, or reject longer strings at all.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:07               ` Jan Beulich
@ 2024-07-08 11:19                 ` Ulf Samuelsson
  0 siblings, 0 replies; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 11:19 UTC (permalink / raw)
  To: Jan Beulich, Ulf Samuelsson; +Cc: Binutils, Nick Clifton


Den 2024-07-08 kl. 13:07, skrev Jan Beulich:
> On 08.07.2024 12:57, Ulf Samuelsson wrote:
>> Den 2024-07-08 kl. 08:18, skrev Jan Beulich:
>>> On 05.07.2024 17:38, Ulf Samuelsson wrote:
>>>> Den 2024-07-02 kl. 10:12, skrev Jan Beulich:
>>>>> First: Please don't drop the mailing list from Cc. Second: Please don't
>>>>> top-post.
>>>>>
>>>>> On 02.07.2024 00:21, Ulf Samuelsson wrote:
>>>>>> I submitted the code in this patch series more than a year ago, and it got accepted.
>>>>>> Then a lot was reverted since I had not assigned the copyright to FSF.
>>>>>> For unknown reasons, my ASCIZ patches remained, but the ASCII stuff was reverted.
>>>>>> The reason for the naming is to be similar to how I remember naming in some assemblers where ASCIZ is zero terminated and ASCII is not zero terminated.
>>>>>>
>>>>>> If the string is too long, so it does not fit, it is really an error.
>>>>>> Adding a zero when truncating is a little less intrusive and may reduce runtime errors.
>>>>>> ASCII tells people what it is meant to be doing. ASCIIZ will make a lot of people wonder what the difference is between ASCIZ and ASCIIZ. It will also be misleading, because it does not add a zero if the string fits.
>>>>> Then did I read the code wrong? I was left with the impression that you'll
>>>>> truncate at size-1, to ensure zero termination. Plus as you say in this
>>>>> doc, size being 0 results in ASCIZ behavior, i.e. zero-terminates too.
>>>>                 "          111111"
>>>>                 "0123456789012345"
>>>>
>>>>      ASCII (16) "This is a string"
>>>>
>>>>      results in 'size' == 16. Since the string is 16 characters, 'len'
>>>> will be 16 as well.
>>> And hence
>>>
>>>     else if (len >= size)
>>>       {
>>>         len = size - 1;
>>>
>>> would reduce len down to 15 afaics, such that ...
>>>
>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>> bytes") == 16,
>> the linker should add a 16 byte area without a terminating NUL,
>>
>> If the string is shorter, then the linker should pad with NUL.
>>
>> If the string is longer, then the linker should truncate the string
>> adding a NUL
>> at the end, still fitting in the area. A warning should be issued.
> What I'm still actually missing a reason for is why in this case you
> want to insert a nul while in the len == size case you want to keep
> the string as is. To me this is inconsistent and hence yielding
> surprising behavior. Yet with that harmonized (whichever direction)
> it'll then become clear whether the directive wants to be ASCII or
> ASCIZ (or alike).
>
> Jan

Because if the warning is ignored, then I expect less havoc if the code 
is running.


The typical use I see is in a header where a field has a predefined size.

The user will be responsible for manually adding a NUL byte so the
typical directive is

ASCII 16, "This is 15 byte\0"   ; 16 byte string

ASCII 16, "This is 15\0\0\0\0\0\0"   ; 16 byte string

If the user adds

ASCII 16, "This is 15 bytes\0"  ; 17 byte string

the string is too long.

As the string is truncated, there is no ending NUL character,
and if the string is printed, expecting an ending NUL character
a severe problem may occur.

-- 
Best Regards
Ulf Samuelsson


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:18               ` Andreas Schwab
@ 2024-07-08 11:29                 ` Ulf Samuelsson
  2024-07-08 11:40                   ` Jan Beulich
  2024-07-08 11:48                   ` Andreas Schwab
  0 siblings, 2 replies; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 11:29 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Jan Beulich, Ulf Samuelsson, Binutils, Nick Clifton


Den 2024-07-08 kl. 13:18, skrev Andreas Schwab:
> On Jul 08 2024, Ulf Samuelsson wrote:
>
>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>> bytes") == 16,
>> the linker should add a 16 byte area without a terminating NUL,
>>
>> If the string is shorter, then the linker should pad with NUL.
>>
>> If the string is longer, then the linker should truncate the string adding
>> a NUL
>> at the end, still fitting in the area. A warning should be issued.
> This inonsistency is confusing.  The directive should either always use
> the first size characters from the string, only padding with nulls when
> the string is shorter, or reject longer strings at all.
>
I do not see that the user can be confused if the user reads the warning 
message:

"warning: ASCII string does not fit in allocated space, truncated"


-- 
Best Regards
Ulf Samuelsson


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:29                 ` Ulf Samuelsson
@ 2024-07-08 11:40                   ` Jan Beulich
  2024-07-08 12:10                     ` Ulf Samuelsson
  2024-07-08 11:48                   ` Andreas Schwab
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2024-07-08 11:40 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Ulf Samuelsson, Binutils, Nick Clifton, Andreas Schwab

On 08.07.2024 13:29, Ulf Samuelsson wrote:
> 
> Den 2024-07-08 kl. 13:18, skrev Andreas Schwab:
>> On Jul 08 2024, Ulf Samuelsson wrote:
>>
>>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>>> bytes") == 16,
>>> the linker should add a 16 byte area without a terminating NUL,
>>>
>>> If the string is shorter, then the linker should pad with NUL.
>>>
>>> If the string is longer, then the linker should truncate the string adding
>>> a NUL
>>> at the end, still fitting in the area. A warning should be issued.
>> This inonsistency is confusing.  The directive should either always use
>> the first size characters from the string, only padding with nulls when
>> the string is shorter, or reject longer strings at all.
>>
> I do not see that the user can be confused if the user reads the warning 
> message:
> 
> "warning: ASCII string does not fit in allocated space, truncated"

Well, paying attention to this warning is just one thing. Yet even if you
do, you still won't know _how_ the truncation was carried out. And for a
directive named ASCII I for one would expect "plain" truncation. Much
like you also do not warn when size == len.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:29                 ` Ulf Samuelsson
  2024-07-08 11:40                   ` Jan Beulich
@ 2024-07-08 11:48                   ` Andreas Schwab
  2024-07-08 12:01                     ` Ulf Samuelsson
  1 sibling, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2024-07-08 11:48 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Jan Beulich, Ulf Samuelsson, Binutils, Nick Clifton

On Jul 08 2024, Ulf Samuelsson wrote:

> Den 2024-07-08 kl. 13:18, skrev Andreas Schwab:
>> On Jul 08 2024, Ulf Samuelsson wrote:
>>
>>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>>> bytes") == 16,
>>> the linker should add a 16 byte area without a terminating NUL,
>>>
>>> If the string is shorter, then the linker should pad with NUL.
>>>
>>> If the string is longer, then the linker should truncate the string adding
>>> a NUL
>>> at the end, still fitting in the area. A warning should be issued.
>> This inonsistency is confusing.  The directive should either always use
>> the first size characters from the string, only padding with nulls when
>> the string is shorter, or reject longer strings at all.
>>
> I do not see that the user can be confused if the user reads the warning
> message:
>
> "warning: ASCII string does not fit in allocated space, truncated"

That does not tell the whole truth.  ASCII 4, "foob" gives you a 4 byte
string, but ASCII 4, "foobar" gives you a three byte string.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:48                   ` Andreas Schwab
@ 2024-07-08 12:01                     ` Ulf Samuelsson
  2024-07-08 12:16                       ` Andreas Schwab
  0 siblings, 1 reply; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 12:01 UTC (permalink / raw)
  To: Andreas Schwab, Ulf Samuelsson; +Cc: Jan Beulich, Binutils, Nick Clifton


Den 2024-07-08 kl. 13:48, skrev Andreas Schwab:
> On Jul 08 2024, Ulf Samuelsson wrote:
>
>> Den 2024-07-08 kl. 13:18, skrev Andreas Schwab:
>>> On Jul 08 2024, Ulf Samuelsson wrote:
>>>
>>>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>>>> bytes") == 16,
>>>> the linker should add a 16 byte area without a terminating NUL,
>>>>
>>>> If the string is shorter, then the linker should pad with NUL.
>>>>
>>>> If the string is longer, then the linker should truncate the string adding
>>>> a NUL
>>>> at the end, still fitting in the area. A warning should be issued.
>>> This inonsistency is confusing.  The directive should either always use
>>> the first size characters from the string, only padding with nulls when
>>> the string is shorter, or reject longer strings at all.
>>>
>> I do not see that the user can be confused if the user reads the warning
>> message:
>>
>> "warning: ASCII string does not fit in allocated space, truncated"
> That does not tell the whole truth.  ASCII 4, "foob" gives you a 4 byte
> string, but ASCII 4, "foobar" gives you a three byte string.

ASCII 4, "foob" gives you a 4 byte area which is not a zero-terminated 
string.
If you intended it to be a zero-terminated string, there is an uncatched 
bug.

ASCII 4, "foobar" gives you a 4 byte area which is different from your 
intention and a warning message.
It also tries to minimizes the damage of your mistake.


>
-- 
Best Regards,
Ulf Samuelsson
eMagii
+46 722 427437
ulf@emagii.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 11:40                   ` Jan Beulich
@ 2024-07-08 12:10                     ` Ulf Samuelsson
  0 siblings, 0 replies; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 12:10 UTC (permalink / raw)
  To: Jan Beulich, Ulf Samuelsson; +Cc: Binutils, Nick Clifton, Andreas Schwab


Den 2024-07-08 kl. 13:40, skrev Jan Beulich:
> On 08.07.2024 13:29, Ulf Samuelsson wrote:
>> Den 2024-07-08 kl. 13:18, skrev Andreas Schwab:
>>> On Jul 08 2024, Ulf Samuelsson wrote:
>>>
>>>> Given *ASCII 16, "This is 16 bytes*"  size == 16 and strlen("This is 16
>>>> bytes") == 16,
>>>> the linker should add a 16 byte area without a terminating NUL,
>>>>
>>>> If the string is shorter, then the linker should pad with NUL.
>>>>
>>>> If the string is longer, then the linker should truncate the string adding
>>>> a NUL
>>>> at the end, still fitting in the area. A warning should be issued.
>>> This inonsistency is confusing.  The directive should either always use
>>> the first size characters from the string, only padding with nulls when
>>> the string is shorter, or reject longer strings at all.
>>>
>> I do not see that the user can be confused if the user reads the warning
>> message:
>>
>> "warning: ASCII string does not fit in allocated space, truncated"
> Well, paying attention to this warning is just one thing. Yet even if you
> do, you still won't know _how_ the truncation was carried out. And for a
> directive named ASCII I for one would expect "plain" truncation. Much
> like you also do not warn when size == len.

You know what happens if you look at the output data from the linker.

It is of course possible to clarify this in the warning or include it in 
the documentation.

size == len is a valid case, and therefore no warning.

While I suspect the main use is to add a zero-terminated string,
there is nothing that stop the user from copying a fixed number of 
characters
before use


>
> Jan

-- 
Best Regards,
Ulf Samuelsson
eMagii
+46 722 427437
ulf@emagii.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 12:01                     ` Ulf Samuelsson
@ 2024-07-08 12:16                       ` Andreas Schwab
  2024-07-08 12:22                         ` Ulf Samuelsson
  0 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2024-07-08 12:16 UTC (permalink / raw)
  To: Ulf Samuelsson; +Cc: Ulf Samuelsson, Jan Beulich, Binutils, Nick Clifton

On Jul 08 2024, Ulf Samuelsson wrote:

> ASCII 4, "foob" gives you a 4 byte area which is not a zero-terminated
> string.
> If you intended it to be a zero-terminated string, there is an uncatched
> bug.
>
> ASCII 4, "foobar" gives you a 4 byte area which is different from your
> intention and a warning message.
> It also tries to minimizes the damage of your mistake.

This incosistency makes it a questionable feature.  The second example
should really be rejected at all.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/7] ld.texi: Add ASCII to info file
  2024-07-08 12:16                       ` Andreas Schwab
@ 2024-07-08 12:22                         ` Ulf Samuelsson
  0 siblings, 0 replies; 26+ messages in thread
From: Ulf Samuelsson @ 2024-07-08 12:22 UTC (permalink / raw)
  To: Andreas Schwab, Ulf Samuelsson; +Cc: Jan Beulich, Binutils, Nick Clifton


Den 2024-07-08 kl. 14:16, skrev Andreas Schwab:
> On Jul 08 2024, Ulf Samuelsson wrote:
>
>> ASCII 4, "foob" gives you a 4 byte area which is not a zero-terminated
>> string.
>> If you intended it to be a zero-terminated string, there is an uncatched
>> bug.
>>
>> ASCII 4, "foobar" gives you a 4 byte area which is different from your
>> intention and a warning message.
>> It also tries to minimizes the damage of your mistake.
> This incosistency makes it a questionable feature.  The second example
> should really be rejected at all.
>
Since the issue is normally harmless, I believe that exiting the linker 
with an error message is too harsh.

If you feel the use of "ASCII", makes it confusing, feel free to suggest 
a better name.

I'd rather keep the functionality than the name.

-- 
Best Regards
Ulf Samuelsson


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-07-08 12:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28 21:46 [PATCH v1 0/7] ASCII output section command binutils
2024-06-28 21:46 ` [PATCH v1 1/7] ldlex.l: Add ASCII token binutils
2024-06-28 21:46 ` [PATCH v1 2/7] ldgram.y: Add ASCII parsing binutils
2024-06-28 21:46 ` [PATCH v1 3/7] ldlang.*: process ASCII command binutils
2024-07-01  6:35   ` Jan Beulich
2024-06-28 21:46 ` [PATCH v1 4/7] ld.texi: Add ASCII to info file binutils
2024-07-01  6:50   ` Jan Beulich
     [not found]     ` <6FDD8EBD-2371-40DB-B25B-052278BCBF49@emagii.com>
2024-07-02  8:12       ` Jan Beulich
2024-07-05 15:38         ` Ulf Samuelsson
2024-07-08  6:18           ` Jan Beulich
2024-07-08 10:57             ` Ulf Samuelsson
2024-07-08 11:07               ` Jan Beulich
2024-07-08 11:19                 ` Ulf Samuelsson
2024-07-08 11:18               ` Andreas Schwab
2024-07-08 11:29                 ` Ulf Samuelsson
2024-07-08 11:40                   ` Jan Beulich
2024-07-08 12:10                     ` Ulf Samuelsson
2024-07-08 11:48                   ` Andreas Schwab
2024-07-08 12:01                     ` Ulf Samuelsson
2024-07-08 12:16                       ` Andreas Schwab
2024-07-08 12:22                         ` Ulf Samuelsson
2024-06-28 21:46 ` [PATCH v1 5/7] Add testsuite for ASCII command binutils
2024-06-28 21:46 ` [PATCH v1 6/7] NEWS: Add " binutils
2024-06-28 21:46 ` [PATCH v1 7/7] ChangeLog: " binutils
2024-07-01  6:38   ` Jan Beulich
     [not found]     ` <094CF629-FDF8-436D-8B5B-62D35C3EA396@emagii.com>
2024-07-02  8:37       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).