public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] [Ada] Make middle-end string literals NUL terminated
@ 2018-07-31 12:07 Bernd Edlinger
  2018-08-03 16:50 ` Olivier Hainque
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-07-31 12:07 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

Hi!


This fixes a couple STRING_CST that are not explicitly NUL terminated.
These were caught in a new check in varasm.c I am currently working on.

Having a NUL terminated string does not change the binary output, but it
makes things easier for he middle-end.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-ada-strings.diff --]
[-- Type: text/x-patch; name="patch-ada-strings.diff", Size: 1347 bytes --]

2018-07-31  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc-interface/trans.c (gnat_to_gnu): Make string literal properly
	NUL terminated.
	* gcc-interface/utils2.c (expand_sloc): Likewise.

diff -pur gcc/ada/gcc-interface/trans.c gcc/ada/gcc-interface/trans.c
--- gcc/ada/gcc-interface/trans.c	2018-07-17 10:10:04.000000000 +0200
+++ gcc/ada/gcc-interface/trans.c	2018-07-31 11:16:27.350728886 +0200
@@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node)
 	     where GCC will want to treat it as a C string.  */
 	  string[i] = 0;
 
-	  gnu_result = build_string (length, string);
+	  gnu_result = build_string (length + 1, string);
 
 	  /* Strings in GCC don't normally have types, but we want
 	     this to not be converted to the array type.  */
diff -pur gcc/ada/gcc-interface/utils2.c gcc/ada/gcc-interface/utils2.c
--- gcc/ada/gcc-interface/utils2.c	2017-12-21 07:57:41.000000000 +0100
+++ gcc/ada/gcc-interface/utils2.c	2018-07-31 11:44:01.517117923 +0200
@@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi
     }
 
   const int len = strlen (str);
-  *filename = build_string (len, str);
+  *filename = build_string (len + 1, str);
   TREE_TYPE (*filename) = build_array_type (char_type_node,
 					    build_index_type (size_int (len)));
   *line = build_int_cst (NULL_TREE, line_number);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-07-31 12:07 [PATCH] [Ada] Make middle-end string literals NUL terminated Bernd Edlinger
@ 2018-08-03 16:50 ` Olivier Hainque
  2018-08-03 18:38   ` Bernd Edlinger
  2018-08-04 11:33   ` Bernd Edlinger
  0 siblings, 2 replies; 41+ messages in thread
From: Olivier Hainque @ 2018-08-03 16:50 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3318 bytes --]

Hi Bernd,

> On 31 Jul 2018, at 14:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> 
> Hi!
> 
> 
> This fixes a couple STRING_CST that are not explicitly NUL terminated.
> These were caught in a new check in varasm.c I am currently working on.
> 
> Having a NUL terminated string does not change the binary output, but it
> makes things easier for he middle-end.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

--- gcc/ada/gcc-interface/utils2.c	2017-12-21 07:57:41.000000000 +0100
+++ gcc/ada/gcc-interface/utils2.c	2018-07-31 11:44:01.517117923 +0200
@@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi
     }
 
   const int len = strlen (str);
-  *filename = build_string (len, str);
+  *filename = build_string (len + 1, str);
   TREE_TYPE (*filename) = build_array_type (char_type_node,
 					    build_index_type (size_int (len)));
   *line = build_int_cst (NULL_TREE, line_number);

This one looks good to me. I'm not officially listed as a maintainer
so I'll let Eric have the final word. I'm answering because ...


--- gcc/ada/gcc-interface/trans.c	2018-07-17 10:10:04.000000000 +0200
+++ gcc/ada/gcc-interface/trans.c	2018-07-31 11:16:27.350728886 +0200
@@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node)
 	     where GCC will want to treat it as a C string.  */
 	  string[i] = 0;
 
-	  gnu_result = build_string (length, string);
+	  gnu_result = build_string (length + 1, string);
 
 	  /* Strings in GCC don't normally have types, but we want
 	     this to not be converted to the array type.  */

We have a local variant of this one, on which I worked after we realized
that it was not enough to achieve the intended effect.

The difference at this spot is simply that we prevent the +1 if the
original string happens to be explicitly nul terminated already. Like:

 	    build_string
               ((length > 0 && string[length-1] == 0) ? length : length + 1,
                string);

This is however not enough because the extra zero is only conveyed
through the string value, not the corresponding type, and 

  varasm.c: mergeable_string_section

currently uses the type bounds to search for a zero termination.

We can't really change the type, so came up with an adjustment to
varasm. The motivation for using the type bounds was

  https://gcc.gnu.org/ml/gcc-patches/2006-06/msg01004.html

which, IIUC, was tightly linked to string constants used as
initializers for objects which have a size of their own.
(Jakub cc'ed)

For string constants not used as initializers, it seems that
we might be able to use the string bounds instead, possibly
wider. The attached patch does that and passed testing a few weeks
ago.

So, while we're at it, does that look right, in light of all the
string length related issues that have been discussed lately ?

I'm not sure I grasped all the ramifications.

Thanks in advance!


        * varasm.c (mergeable_string_section): Accept an extra SCAT
        argument conveying the section category from which the request
        originates.  Only restrict the search to the string type bounds
        if we're queried for an initializer.  Consider the string bounds
        otherwise.
        (default_elf_select_section, STR and STR_INIT): Pass the section
        category to mergeable_string_section.


[-- Attachment #2: ada-string.diff --]
[-- Type: application/octet-stream, Size: 3406 bytes --]


    	* varasm.c (mergeable_string_section): Accept an extra SCAT
    	argument conveying the section category from which the request
    	originates.  Only restrict the search to the string type bounds
    	if we're queried for an initializer.  Consider the string bounds
    	otherwise.
    	(default_elf_select_section, STR and STR_INIT): Pass the section
    	category to mergeable_string_section.
    
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 75595c0..1b1e0d6 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -772,16 +772,17 @@ function_mergeable_rodata_prefix (void)
 static section *
 mergeable_string_section (tree decl ATTRIBUTE_UNUSED,
 			  unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED,
-			  unsigned int flags ATTRIBUTE_UNUSED)
+			  unsigned int flags ATTRIBUTE_UNUSED,
+			  enum section_category scat ATTRIBUTE_UNUSED)
 {
-  HOST_WIDE_INT len;
+  HOST_WIDE_INT type_len, str_len;
 
   if (HAVE_GAS_SHF_MERGE && flag_merge_constants
       && TREE_CODE (decl) == STRING_CST
       && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
       && align <= 256
-      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
-      && TREE_STRING_LENGTH (decl) >= len)
+      && (type_len = int_size_in_bytes (TREE_TYPE (decl))) > 0
+      && (str_len = TREE_STRING_LENGTH (decl)) >= type_len)
     {
       machine_mode mode;
       unsigned int modesize;
@@ -796,22 +797,28 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUSED,
       if (modesize >= 8 && modesize <= 256
 	  && (modesize & (modesize - 1)) == 0)
 	{
+	  /* Check if we have a terminating NUL at the end of the string to
+	     emit, and no other.  The length of the string we'll output is
+	     restricted to the type size if this is for an initializer.  */
+
+	  const HOST_WIDE_INT output_len =
+	    (scat == SECCAT_RODATA_MERGE_STR_INIT) ? type_len : str_len;
+
 	  if (align < modesize)
 	    align = modesize;
 
 	  str = TREE_STRING_POINTER (decl);
 	  unit = GET_MODE_SIZE (mode);
 
-	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < output_len; i += unit)
 	    {
 	      for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
 		  break;
 	      if (j == unit)
-		break;
+		break;  /* found a NUL.  */
 	    }
-	  if (i == len - unit)
+	  if (i == output_len - unit)
 	    {
 	      sprintf (name, "%s.str%d.%d", prefix,
 		       modesize / 8, (int) (align / 8));
@@ -6565,7 +6572,10 @@ default_elf_select_section (tree decl, int reloc,
 			    unsigned HOST_WIDE_INT align)
 {
   const char *sname;
-  switch (categorize_decl_for_section (decl, reloc))
+  const enum section_category scat
+    = categorize_decl_for_section (decl, reloc);
+
+  switch (scat)
     {
     case SECCAT_TEXT:
       /* We're not supposed to be called on FUNCTION_DECLs.  */
@@ -6573,9 +6583,9 @@ default_elf_select_section (tree decl, int reloc,
     case SECCAT_RODATA:
       return readonly_data_section;
     case SECCAT_RODATA_MERGE_STR:
-      return mergeable_string_section (decl, align, 0);
+      return mergeable_string_section (decl, align, 0, scat);
     case SECCAT_RODATA_MERGE_STR_INIT:
-      return mergeable_string_section (DECL_INITIAL (decl), align, 0);
+      return mergeable_string_section (DECL_INITIAL (decl), align, 0, scat);
     case SECCAT_RODATA_MERGE_CONST:
       return mergeable_constant_section (DECL_MODE (decl), align, 0);
     case SECCAT_SRODATA:

[-- Attachment #3: Type: text/plain, Size: 4 bytes --]






^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-03 16:50 ` Olivier Hainque
@ 2018-08-03 18:38   ` Bernd Edlinger
  2018-08-06 18:17     ` Olivier Hainque
  2018-08-04 11:33   ` Bernd Edlinger
  1 sibling, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-03 18:38 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

On 08/03/18 18:50, Olivier Hainque wrote:
> Hi Bernd,
> 
>> On 31 Jul 2018, at 14:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>> 
>> Hi!
>> 
>> 
>> This fixes a couple STRING_CST that are not explicitly NUL terminated.
>> These were caught in a new check in varasm.c I am currently working on.
>> 
>> Having a NUL terminated string does not change the binary output, but it
>> makes things easier for he middle-end.
>> 
>> 
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> --- gcc/ada/gcc-interface/utils2.c      2017-12-21 07:57:41.000000000 +0100
> +++ gcc/ada/gcc-interface/utils2.c      2018-07-31 11:44:01.517117923 +0200
> @@ -1844,7 +1844,7 @@ expand_sloc (Node_Id gnat_node, tree *fi
>       }
> 
>     const int len = strlen (str);
> -  *filename = build_string (len, str);
> +  *filename = build_string (len + 1, str);
>     TREE_TYPE (*filename) = build_array_type (char_type_node,
>                                               build_index_type (size_int (len)));
>     *line = build_int_cst (NULL_TREE, line_number);
> 
> This one looks good to me. I'm not officially listed as a maintainer
> so I'll let Eric have the final word. I'm answering because ...
> 
> 
> --- gcc/ada/gcc-interface/trans.c       2018-07-17 10:10:04.000000000 +0200
> +++ gcc/ada/gcc-interface/trans.c       2018-07-31 11:16:27.350728886 +0200
> @@ -6079,7 +6079,7 @@ gnat_to_gnu (Node_Id gnat_node)
>                where GCC will want to treat it as a C string.  */
>             string[i] = 0;
> 
> -         gnu_result = build_string (length, string);
> +         gnu_result = build_string (length + 1, string);
> 
>             /* Strings in GCC don't normally have types, but we want
>                this to not be converted to the array type.  */
> 
> We have a local variant of this one, on which I worked after we realized
> that it was not enough to achieve the intended effect.
> 
> The difference at this spot is simply that we prevent the +1 if the
> original string happens to be explicitly nul terminated already. Like:
> 
>               build_string
>                 ((length > 0 && string[length-1] == 0) ? length : length + 1,
>                  string);
> 
> This is however not enough because the extra zero is only conveyed
> through the string value, not the corresponding type, and
> 
>    varasm.c: mergeable_string_section
> 
> currently uses the type bounds to search for a zero termination.
> 
> We can't really change the type, so came up with an adjustment to
> varasm. The motivation for using the type bounds was
> 
> https://gcc.gnu.org/ml/gcc-patches/2006-06/msg01004.html
> 
> which, IIUC, was tightly linked to string constants used as
> initializers for objects which have a size of their own.
> (Jakub cc'ed)
> 
> For string constants not used as initializers, it seems that
> we might be able to use the string bounds instead, possibly
> wider. The attached patch does that and passed testing a few weeks
> ago.
> 
> So, while we're at it, does that look right, in light of all the
> string length related issues that have been discussed lately ?
> 
> I'm not sure I grasped all the ramifications.
> 
> Thanks in advance!
> 
> 
>          * varasm.c (mergeable_string_section): Accept an extra SCAT
>          argument conveying the section category from which the request
>          originates.  Only restrict the search to the string type bounds
>          if we're queried for an initializer.  Consider the string bounds
>          otherwise.
>          (default_elf_select_section, STR and STR_INIT): Pass the section
>          category to mergeable_string_section.
> 
> 
> 
> 
> 

Oh, how interesting.

My patch only intends to add a dummy byte that is stripped again
in the assembly.  The purpose is to make it trivial to find out if
a string constant is NUL-terminated in  the middle-end by comparing
TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).
TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,
Such strings shall never chop off more than one nul character.
Ada strings are generally not zero terminated, right?

So in your search loop you would not have to look after
type_len, because that byte would be guaranteed to be zero.

That is if we agree that we want to restrict the string constants
in that way as I proposed.

In the case str_len > type_len I do not understand if that is right.

Because varasm will only output type_size bytes,
this seems only to select a string section
but the last nul byte will not be assembled.
Something that is not contained in the type_size
should not be assembled.

What exactly do you want to accomplish?


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-03 16:50 ` Olivier Hainque
  2018-08-03 18:38   ` Bernd Edlinger
@ 2018-08-04 11:33   ` Bernd Edlinger
  2018-08-04 15:43     ` [PATCH] Handle not explicitly zero terminated strings in merge sections Bernd Edlinger
  1 sibling, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-04 11:33 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

Hi Olivier,


I think I like your idea a lot, it should be highly useful for Fortran and GO as well.

Would somthing something like this (untested patch) work for you on top of my patch series.
It makes use of the zero-termination properties of STRING_CSTs, that the other patch ensures.

I have two C test cases for this:

$ cat y3.c
const char str[3] = "abc";
$ gcc -fmerge-all-constants -S y3.c
$ cat y3.s
	.file	"y3.c"
	.text
	.globl	str
	.section	.rodata.str1.1,"aMS",@progbits,1
	.type	str, @object
	.size	str, 3
str:
	.string	"abc"
	.ident	"GCC: (GNU) 9.0.0 20180730 (experimental)"
	.section	.note.GNU-stack,"",@progbits

$ cat y4.c
const char str[4] = "abc";
$ gcc -fmerge-all-constants -S y4.c
$ cat y4.s
$ cat y4.s
	.file	"y4.c"
	.text
	.globl	str
	.section	.rodata.str1.1,"aMS",@progbits,1
	.type	str, @object
	.size	str, 4
str:
	.string	"abc"
	.ident	"GCC: (GNU) 9.0.0 20180730 (experimental)"
	.section	.note.GNU-stack,"",@progbits


The .size directive uses the DECL_SIZE_UNIT, and is still 3.
I don't know of that needs to change or not.
But what is important that we have .string "abc"
and not .ascii "abc".

What do you think?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-merge-strings.diff --]
[-- Type: text/x-patch; name="patch-merge-strings.diff", Size: 3878 bytes --]

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 263072)
+++ gcc/varasm.c	(working copy)
@@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int, bool);
+					       unsigned int, bool,
+					       bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
 	  unit = GET_MODE_SIZE (mode);
 
 	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < len - unit; i += unit)
 	    {
 	      for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			    bool dont_output_data)
+			    bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB
 	switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_variable_contents (decl, name, dont_output_data);
+      assemble_variable_contents (decl, name, dont_output_data,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT
 output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
 		    oc_outer_state *);
 
+/* Find out if a string P of length LEN and character size UNIT
+   has double nul termination.
+   The string is already known to have a nul termination at
+   offset LEN.  Find out if there is already a NUL termination
+   at offset LEN - UNIT.  */
+
+static bool
+redundant_nul_term_p (const char *p, unsigned len, unsigned unit)
+{
+  gcc_assert(len >= unit);
+  return !memcmp (p + len, p + len - unit, unit);
+}
+
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
    Assumes output_addressed_constants has been done on EXP already.
@@ -4812,7 +4828,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT,
 
 static unsigned HOST_WIDE_INT
 output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
-		 bool reverse)
+		 bool reverse, bool merge_strings /* = false */)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
@@ -4940,8 +4956,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT
 	case CONSTRUCTOR:
 	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
-	  thissize
-	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp);
+	  if (thissize > size && merge_strings
+	      && !redundant_nul_term_p (TREE_STRING_POINTER (exp),
+				        size, thissize - size))
+	    ;
+ 	  else
+	    thissize = MIN (thissize, size);
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
 	case VECTOR_CST:

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] Handle not explicitly zero terminated strings in merge sections
  2018-08-04 11:33   ` Bernd Edlinger
@ 2018-08-04 15:43     ` Bernd Edlinger
  2018-08-04 15:44       ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-04 15:43 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

Hi!


This patch is inspired by Olivier's feedback to my previous patch on the
zero-termination of Ada STRING_CST.

The idea is that strings that do not have embedded nul characters _and_
do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently
not in the merge string sections.  To be in the merge string section
they need a terminating nul character, that gets written directly
in varasm while assembling the string constant.  This patch uses
the new string properties that my previous patch series implements,
and is based on the other patches here:

[PATCH] Check the STRING_CSTs in varasm.c
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html

[PATCH] Handle overlength strings in the C FE
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html

[PATCH] Handle overlength strings in C++ FE
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html
Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html

[PATCH] Make GO string literals properly NUL terminated
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html

[PATCH] [Ada] Make middle-end string literals NUL terminated
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html


The existing test case gcc.dg/merge-all-constants-1.c
contains two overlength strings that get now stripped down
by the C FE, and look in the middle end exactly like normal
Ada, Fortran or Go strings.  And get now allocated
in the merge.str section, unless they have an embedded
nul character, which I changed to make the test pass again.

Olivier, could you add test cases from the Ada side to this?


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk (after all pre-condition patches
are approved/committed)?


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] Handle not explicitly zero terminated strings in merge sections
  2018-08-04 15:43     ` [PATCH] Handle not explicitly zero terminated strings in merge sections Bernd Edlinger
@ 2018-08-04 15:44       ` Bernd Edlinger
  2018-08-04 16:06         ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-04 15:44 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1936 bytes --]

Again, with patch....

On 08/04/18 17:43, Bernd Edlinger wrote:
> Hi!
> 
> 
> This patch is inspired by Olivier's feedback to my previous patch on the
> zero-termination of Ada STRING_CST.
> 
> The idea is that strings that do not have embedded nul characters _and_
> do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently
> not in the merge string sections.  To be in the merge string section
> they need a terminating nul character, that gets written directly
> in varasm while assembling the string constant.  This patch uses
> the new string properties that my previous patch series implements,
> and is based on the other patches here:
> 
> [PATCH] Check the STRING_CSTs in varasm.c
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html
> 
> [PATCH] Handle overlength strings in the C FE
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html
> 
> [PATCH] Handle overlength strings in C++ FE
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html
> Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html
> 
> [PATCH] Make GO string literals properly NUL terminated
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html
> 
> [PATCH] [Ada] Make middle-end string literals NUL terminated
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html
> 
> 
> The existing test case gcc.dg/merge-all-constants-1.c
> contains two overlength strings that get now stripped down
> by the C FE, and look in the middle end exactly like normal
> Ada, Fortran or Go strings.  And get now allocated
> in the merge.str section, unless they have an embedded
> nul character, which I changed to make the test pass again.
> 
> Olivier, could you add test cases from the Ada side to this?
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk (after all pre-condition patches
> are approved/committed)?
> 
> 
> Thanks
> Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-merge-strings.diff --]
[-- Type: text/x-patch; name="patch-merge-strings.diff", Size: 5250 bytes --]

gcc:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* varasm.c (output_constant): Add new parameter merge_strings.
	(assemble_variable_contents): Likewise.
	(mergeable_string_section): Don't fail if the last char is non-zero.
	(assemble_variable): Pass merge_strings for string merge sections.
	(redundant_nul_term_p): New helper function.
	(output_constant): Make strings properly zero terminated in merge
	string sections.

testsuite:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.dg/merge-all-constants-1.c: Adjust test.

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 263072)
+++ gcc/varasm.c	(working copy)
@@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int, bool);
+					       unsigned int, bool,
+					       bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
 	  unit = GET_MODE_SIZE (mode);
 
 	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < len - unit; i += unit)
 	    {
 	      for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			    bool dont_output_data)
+			    bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB
 	switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_variable_contents (decl, name, dont_output_data);
+      assemble_variable_contents (decl, name, dont_output_data,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT
 output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
 		    oc_outer_state *);
 
+/* Find out if a string P of length LEN and character size UNIT
+   has double nul termination.
+   The string is already known to have a nul termination at
+   offset LEN.  Find out if there is already a NUL termination
+   at offset LEN - UNIT.  */
+
+static bool
+redundant_nul_term_p (const char *p, unsigned len, unsigned unit)
+{
+  gcc_assert(len >= unit);
+  return !memcmp (p + len, p + len - unit, unit);
+}
+
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
    Assumes output_addressed_constants has been done on EXP already.
@@ -4812,7 +4828,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT,
 
 static unsigned HOST_WIDE_INT
 output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
-		 bool reverse)
+		 bool reverse, bool merge_strings /* = false */)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
@@ -4940,8 +4956,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT
 	case CONSTRUCTOR:
 	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
-	  thissize
-	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp);
+	  if (thissize > size && merge_strings
+	      && !redundant_nul_term_p (TREE_STRING_POINTER (exp),
+					size, thissize - size))
+	    ;
+ 	  else
+	    thissize = MIN (thissize, size);
 	  gcc_checking_assert (check_string_literal (exp, thissize));
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
 	case VECTOR_CST:
Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-1.c	(revision 263072)
+++ gcc/testsuite/gcc.dg/merge-all-constants-1.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-w -O2 -fmerge-all-constants" } */
 
-const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz";
-const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 
 /* { dg-final { scan-assembler-not "\.rodata\.str" } } */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] Handle not explicitly zero terminated strings in merge sections
  2018-08-04 15:44       ` Bernd Edlinger
@ 2018-08-04 16:06         ` Bernd Edlinger
  2018-08-07 12:56           ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-04 16:06 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

Aehm, I forgot the Fortran FE patch
which is also a pre-condition (at least for building with all languages):

[PATCH] Create internally nul terminated string literals in fortan FE
https://gcc.gnu.org/ml/fortran/2018-08/msg00000.html


Thanks
Bernd.


On 08/04/18 17:44, Bernd Edlinger wrote:
> Again, with patch....
> 
> On 08/04/18 17:43, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> This patch is inspired by Olivier's feedback to my previous patch on the
>> zero-termination of Ada STRING_CST.
>>
>> The idea is that strings that do not have embedded nul characters _and_
>> do not happen to be zero-terminated in the DECL_UNIT_SIZE, are currently
>> not in the merge string sections.  To be in the merge string section
>> they need a terminating nul character, that gets written directly
>> in varasm while assembling the string constant.  This patch uses
>> the new string properties that my previous patch series implements,
>> and is based on the other patches here:
>>
>> [PATCH] Check the STRING_CSTs in varasm.c
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00331.html
>>
>> [PATCH] Handle overlength strings in the C FE
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00142.html
>>
>> [PATCH] Handle overlength strings in C++ FE
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00045.html
>> Approved: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00105.html
>>
>> [PATCH] Make GO string literals properly NUL terminated
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01931.html
>>
>> [PATCH] [Ada] Make middle-end string literals NUL terminated
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01929.html
>>
>>
>> The existing test case gcc.dg/merge-all-constants-1.c
>> contains two overlength strings that get now stripped down
>> by the C FE, and look in the middle end exactly like normal
>> Ada, Fortran or Go strings.  And get now allocated
>> in the merge.str section, unless they have an embedded
>> nul character, which I changed to make the test pass again.
>>
>> Olivier, could you add test cases from the Ada side to this?
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk (after all pre-condition patches
>> are approved/committed)?
>>
>>
>> Thanks
>> Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-03 18:38   ` Bernd Edlinger
@ 2018-08-06 18:17     ` Olivier Hainque
  2018-08-07 13:08       ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Olivier Hainque @ 2018-08-06 18:17 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jakub Jelinek

Hi Bernd,

Things are moving fast so I'll answer your
messages incrementally :-)

> On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
[...]
> Oh, how interesting.
> 
> My patch only intends to add a dummy byte that is stripped again
> in the assembly.  The purpose is to make it trivial to find out if
> a string constant is NUL-terminated in  the middle-end by comparing
> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).

Ah, Ok. I hadn't quite realized that yet.

> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,

That is opposite to the situation we were having
for Ada before the patch series:

> Such strings shall never chop off more than one nul character.
> Ada strings are generally not zero terminated, right?

Right. Strings have explicit bounds in Ada, and a NUL character
is like any other. It can appear anywhere. For example, the
excerpt below constructs a string of 5 characters, bounds 1 to 5,
with a nul character in second position:

 X : String (1 .. 5) := "1" & Ascii.Nul & "345";

('&' is the catenation operator)

The problem we had was that string literals not explicitly
NUL terminated (as in X : String := "123" & Ascii.NUL) would
never go in mergeable sections. For example:

Ada.Text_IO.Put_Line ("Hello World!");

Ada has so called "generic" units that are sometimes
instanciated many many times so there is real interest
in improving the situation there.

> So in your search loop you would not have to look after
> type_len, because that byte would be guaranteed to be zero.
> 
> That is if we agree that we want to restrict the string constants
> in that way as I proposed.
> 
> In the case str_len > type_len I do not understand if that is right.

In your new model, I don't know what sense it would make, indeed.

In the previous model, it was clearly expected to be a possibility.

> Because varasm will only output type_size bytes,
> this seems only to select a string section
> but the last nul byte will not be assembled.
> Something that is not contained in the type_size
> should not be assembled.
> 
> What exactly do you want to accomplish?

Based on the previous (say, gcc-7 or gcc-8) code base, arrange
for most Ada string literals to end up in a mergeable section
on ELF targets.

In at least gcc-7 and gcc-8, our gigi adjustment + the
patch I posted achieve this effect.

For example, for the following source:

procedure Hello is
  procedure Process (S : String) with Import, Convention => C;
begin
 Process ("1234");
end;

Without the gigi change, I get (x86_64-linux,
-O2 -fmerge-all-constants):

  .section .rodata
  .LC0:
      .ascii "1234"

Regular section, no terminating zero.

The gigi change alone gets us the terminating nul (*),
but not yet a mergeable section, because the extra nul
isn't covered by the type bounds:

  .section .rodata
  .LC0:
      .string "1234"

With the varasm change on top, checking beyond the
type bounds when the string constant isn't an initializer,
we get:

  .section .rodata.str1.1,"aMS",@progbits,1
  .LC0:
      .string "1234"

(*) The terminating zero, part of the string value,
not spanned by the type bounds, gets assembled through:

assemble_constant_contents (...)
...
size = get_constant_size (exp);

then:

get_constant_size (tree exp)

  size = int_size_in_bytes (TREE_TYPE (exp));
  if (TREE_CODE (exp) == STRING_CST)
    size = MAX (TREE_STRING_LENGTH (exp), size);

Again, this is just providing more context on the
original patch I posted, based on your questions.

I understand there are proposals to change things
pretty significantly in this area, so ...

now on to your next messages and ideas :-) 


Olivier

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] Handle not explicitly zero terminated strings in merge sections
  2018-08-04 16:06         ` Bernd Edlinger
@ 2018-08-07 12:56           ` Bernd Edlinger
  2018-08-17 17:35             ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-07 12:56 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Hi,

I realized that the previous version did not handle
zero terminated Ada constants correctly as in

$ cat hello.adb
procedure Hello is
    procedure puts  (S : String) with Import, Convention => C;
    X : String(1..8) := "abcdefg" & Ascii.Nul;
begin
   puts ("1234" & Ascii.Nul );
   puts (X);
end;

accidentally strings were doubly nul-terminated, because I forgot to
handle merge string sections in assemble_constant_contents, and
because get_constant_size increased the string literal by one.

Fixed, and added a positive test case for the merging non-zero terminated
strings in C.


Boot-strapped and regtested on x86_64-pc-linux-gnu.
Is it OK for trunk (after pre-condition patches)?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-merge-strings.diff --]
[-- Type: text/x-patch; name="patch-merge-strings.diff", Size: 7977 bytes --]

gcc:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* varasm.c (output_constant): Add new parameter merge_strings.
	Make strings properly zero terminated in merge string sections.
	(mergeable_string_section): Don't fail if the last char is non-zero.
	(assemble_variable_contents): Handle merge string sections.
	(assemble_variable): Likewise.
	(assemble_constant_contents): Likewise.
	(output_constant_def_contents): Likewise.
	(get_constant_size): Remove special handling of STRING_CSTs.
	(redundant_nul_term_p): New helper function.

testsuite:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.dg/merge-all-constants-1.c: Adjust test.
	* gcc.dg/merge-all-constants-2.c: New test.

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 263072)
+++ gcc/varasm.c	(working copy)
@@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int, bool);
+					       unsigned int, bool,
+					       bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
 	  unit = GET_MODE_SIZE (mode);
 
 	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < len - unit; i += unit)
 	    {
 	      for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			    bool dont_output_data)
+			    bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB
 	switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_variable_contents (decl, name, dont_output_data);
+      assemble_variable_contents (decl, name, dont_output_data,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -3300,12 +3303,7 @@ get_constant_section (tree exp, unsigned int align
 static HOST_WIDE_INT
 get_constant_size (tree exp)
 {
-  HOST_WIDE_INT size;
-
-  size = int_size_in_bytes (TREE_TYPE (exp));
-  if (TREE_CODE (exp) == STRING_CST)
-    size = MAX (TREE_STRING_LENGTH (exp), size);
-  return size;
+  return int_size_in_bytes (TREE_TYPE (exp));
 }
 
 /* Subroutine of output_constant_def:
@@ -3468,7 +3466,8 @@ maybe_output_constant_def_contents (struct constan
    constant's alignment in bits.  */
 
 static void
-assemble_constant_contents (tree exp, const char *label, unsigned int align)
+assemble_constant_contents (tree exp, const char *label, unsigned int align,
+			    bool merge_strings = false)
 {
   HOST_WIDE_INT size;
 
@@ -3478,7 +3477,7 @@ static void
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align, false);
+  output_constant (exp, size, align, false, merge_strings);
 
   targetm.asm_out.decl_end ();
 }
@@ -3519,10 +3518,13 @@ output_constant_def_contents (rtx symbol)
 		   || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl))
 		   ? DECL_ALIGN (decl)
 		   : symtab_node::get (decl)->definition_alignment ());
-      switch_to_section (get_constant_section (exp, align));
+      section *sect = get_constant_section (exp, align);
+      switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_constant_contents (exp, XSTR (symbol, 0), align);
+      assemble_constant_contents (exp, XSTR (symbol, 0), align,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  HOST_WIDE_INT size = get_constant_size (exp);
@@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT
 output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
 		    oc_outer_state *);
 
+/* Find out if a string P of length LEN and character size UNIT
+   has double nul termination.
+   The string is already known to have a nul termination at
+   offset LEN.  Find out if there is already a NUL termination
+   at offset LEN - UNIT.  */
+
+static bool
+redundant_nul_term_p (const char *p, unsigned len, unsigned unit)
+{
+  gcc_assert(len >= unit);
+  return !memcmp (p + len, p + len - unit, unit);
+}
+
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
    Assumes output_addressed_constants has been done on EXP already.
@@ -4812,7 +4850,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT,
 
 static unsigned HOST_WIDE_INT
 output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
-		 bool reverse)
+		 bool reverse, bool merge_strings /* = false */)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
@@ -4940,8 +4978,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT
 	case CONSTRUCTOR:
 	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
-	  thissize
-	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp);
+	  if (thissize > size && merge_strings
+	      && !redundant_nul_term_p (TREE_STRING_POINTER (exp),
+					size, thissize - size))
+	    ;
+	  else
+	    thissize = MIN (thissize, size);
 	  gcc_checking_assert (check_string_literal (exp, size));
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-1.c	(revision 263072)
+++ gcc/testsuite/gcc.dg/merge-all-constants-1.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-w -O2 -fmerge-all-constants" } */
 
-const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz";
-const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 
 /* { dg-final { scan-assembler-not "\.rodata\.str" } } */
Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
--- /dev/null	2018-07-24 12:25:16.122475518 +0200
+++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	2018-08-07 10:14:35.384190377 +0200
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-w -O2 -fmerge-all-constants" } */
+
+const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+
+/* { dg-final { scan-assembler-not "\.rodata\[\n\r\]" } } */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-06 18:17     ` Olivier Hainque
@ 2018-08-07 13:08       ` Bernd Edlinger
  2018-08-07 16:41         ` Olivier Hainque
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-07 13:08 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

Hi Olivier,

On 08/06/18 20:01, Olivier Hainque wrote:
> Hi Bernd,
> 
> Things are moving fast so I'll answer your
> messages incrementally :-)
> 
>> On 3 Aug 2018, at 20:37, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> [...]
>> Oh, how interesting.
>>
>> My patch only intends to add a dummy byte that is stripped again
>> in the assembly.  The purpose is to make it trivial to find out if
>> a string constant is NUL-terminated in  the middle-end by comparing
>> TYPE_SIZE_UNIT (TREE_TYPE (decl)) with TREE_STRING_LENGTH (decl).
> 
> Ah, Ok. I hadn't quite realized that yet.
> 
>> TYPE_SIZE_UNIT >= TREE_STRING_LENGTH means zero terminated
>> TYPE_SIZE_UNIT < TREE_STRING_LENGTH means not necessarily zero terminated,
> 
> That is opposite to the situation we were having
> for Ada before the patch series:
> 
>> Such strings shall never chop off more than one nul character.
>> Ada strings are generally not zero terminated, right?
> 
> Right. Strings have explicit bounds in Ada, and a NUL character
> is like any other. It can appear anywhere. For example, the
> excerpt below constructs a string of 5 characters, bounds 1 to 5,
> with a nul character in second position:
> 
>   X : String (1 .. 5) := "1" & Ascii.Nul & "345";
> 
> ('&' is the catenation operator)
> 
> The problem we had was that string literals not explicitly
> NUL terminated (as in X : String := "123" & Ascii.NUL) would
> never go in mergeable sections. For example:
> 
> Ada.Text_IO.Put_Line ("Hello World!");
> 
> Ada has so called "generic" units that are sometimes
> instanciated many many times so there is real interest
> in improving the situation there.
> 
>> So in your search loop you would not have to look after
>> type_len, because that byte would be guaranteed to be zero.
>>
>> That is if we agree that we want to restrict the string constants
>> in that way as I proposed.
>>
>> In the case str_len > type_len I do not understand if that is right.
> 
> In your new model, I don't know what sense it would make, indeed.
> 
> In the previous model, it was clearly expected to be a possibility.
> 
>> Because varasm will only output type_size bytes,
>> this seems only to select a string section
>> but the last nul byte will not be assembled.
>> Something that is not contained in the type_size
>> should not be assembled.
>>
>> What exactly do you want to accomplish?
> 
> Based on the previous (say, gcc-7 or gcc-8) code base, arrange
> for most Ada string literals to end up in a mergeable section
> on ELF targets.
> 
> In at least gcc-7 and gcc-8, our gigi adjustment + the
> patch I posted achieve this effect.
> 
> For example, for the following source:
> 
> procedure Hello is
>    procedure Process (S : String) with Import, Convention => C;
> begin
>   Process ("1234");
> end;
> 

Yes, thanks for this example,

$ cat hello.adb
procedure Hello is
    procedure puts  (S : String) with Import, Convention => C;
    X : String(1..8) := "abcdefg" & Ascii.Nul;
begin
   puts ("1234" & Ascii.Nul );
   puts (X);
end;

produces again identical output with the latest patch that I posted
right now.



> Without the gigi change, I get (x86_64-linux,
> -O2 -fmerge-all-constants):
> 
>    .section .rodata
>    .LC0:
>        .ascii "1234"
> 
> Regular section, no terminating zero.
> 
> The gigi change alone gets us the terminating nul (*),
> but not yet a mergeable section, because the extra nul
> isn't covered by the type bounds:
> 
>    .section .rodata
>    .LC0:
>        .string "1234"
> 
> With the varasm change on top, checking beyond the
> type bounds when the string constant isn't an initializer,
> we get:
> 
>    .section .rodata.str1.1,"aMS",@progbits,1
>    .LC0:
>        .string "1234"
> 
> (*) The terminating zero, part of the string value,
> not spanned by the type bounds, gets assembled through:
> 
> assemble_constant_contents (...)
> ...
> size = get_constant_size (exp);
> 
> then:
> 
> get_constant_size (tree exp)
> 
>    size = int_size_in_bytes (TREE_TYPE (exp));
>    if (TREE_CODE (exp) == STRING_CST)
>      size = MAX (TREE_STRING_LENGTH (exp), size);
> 
> Again, this is just providing more context on the
> original patch I posted, based on your questions.
> 

Ah, good point. That needs to be changed...
Actually I would like to have get_constant_size
return just the type size.

> I understand there are proposals to change things
> pretty significantly in this area, so ...
> 
> now on to your next messages and ideas :-)
> 
> 

When I try this example:

$ cat array9.adb
-- { dg-do run }

procedure Array9 is

   V1 : String(1..10) := "1234567890";
   V2 : String(1..-1) := "";

   procedure Compare (S : String) is
   begin
     if S'Size /= 8*S'Length then
       raise Program_Error;
     end if;
   end;

begin
   Compare ("");
   Compare ("1234");
   Compare (V1);
   Compare (V2);
end;

I see that "1234" is put in the merge section,
but V1 is not.  Maybe because of the alignment requirement?

But it sould not be much different from the C test case,
which is now able to merge the non-zero terminated strings.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-07 13:08       ` Bernd Edlinger
@ 2018-08-07 16:41         ` Olivier Hainque
  2018-08-07 17:10           ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Olivier Hainque @ 2018-08-07 16:41 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: Olivier Hainque, gcc-patches

Hi Bernd,

(Thanks for your interest in the Ada case as well :)

> On 7 Aug 2018, at 15:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:

> When I try this example:
> 
> $ cat array9.adb
> -- { dg-do run }
> 
> procedure Array9 is
> 
>   V1 : String(1..10) := "1234567890";
>   V2 : String(1..-1) := "";
> 
>   procedure Compare (S : String) is
>   begin
>     if S'Size /= 8*S'Length then
>       raise Program_Error;
>     end if;
>   end;
> 
> begin
>   Compare ("");
>   Compare ("1234");
>   Compare (V1);
>   Compare (V2);
> end;
> 
> I see that "1234" is put in the merge section,
> but V1 is not.  Maybe because of the alignment requirement?
> 
> But it sould not be much different from the C test case,
> which is now able to merge the non-zero terminated strings.

I'm happy to have a look. I'd just like to make
sure I'll be looking at the right thing:

I would need to start from trunk + the patches you
referenced at

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html

+ the last one you sent at

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html

Correct ?

Can you then confirm the target triplet and
compilation options ?

Thanks in advance!

Olivier




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-07 16:41         ` Olivier Hainque
@ 2018-08-07 17:10           ` Bernd Edlinger
  2018-08-09 13:08             ` Olivier Hainque
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-07 17:10 UTC (permalink / raw)
  To: Olivier Hainque; +Cc: gcc-patches

On 08/07/18 18:40, Olivier Hainque wrote:
> Hi Bernd,
> 
> (Thanks for your interest in the Ada case as well :)
> 
>> On 7 Aug 2018, at 15:07, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> 
>> When I try this example:
>>
>> $ cat array9.adb
>> -- { dg-do run }
>>
>> procedure Array9 is
>>
>>    V1 : String(1..10) := "1234567890";
>>    V2 : String(1..-1) := "";
>>
>>    procedure Compare (S : String) is
>>    begin
>>      if S'Size /= 8*S'Length then
>>        raise Program_Error;
>>      end if;
>>    end;
>>
>> begin
>>    Compare ("");
>>    Compare ("1234");
>>    Compare (V1);
>>    Compare (V2);
>> end;
>>
>> I see that "1234" is put in the merge section,
>> but V1 is not.  Maybe because of the alignment requirement?
>>
>> But it sould not be much different from the C test case,
>> which is now able to merge the non-zero terminated strings.
> 
> I'm happy to have a look. I'd just like to make
> sure I'll be looking at the right thing:
> 
> I would need to start from trunk + the patches you
> referenced at
> 
>    https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00339.html
> 
> + the last one you sent at
> 
>    https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html
> 
> Correct ?
> 

Yes,

Please use the latest patch patch here:
[PATCH] Check the STRING_CSTs in varasm.c
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00361.html

There is also a Fortran patch:
[PATCH] Create internally nul terminated string literals in fortan FE
https://gcc.gnu.org/ml/fortran/2018-08/msg00000.html

And a JIT FE patch:
[PATCH] Fix not properly nul-terminated string constants in JIT
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html


My host triplet is x86_64-pc-linux-gnu (I let config.guess determine it).
I use gmp, mpfr, mpc, isl in-tree.
And configure simply with ../gcc-trunk/configure --prefix=/home/ed/gnu/install --enable-languages=all


Thanks
Bernd.

> Can you then confirm the target triplet and
> compilation options ?
> 
> Thanks in advance!
> 
> Olivier
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [Ada] Make middle-end string literals NUL terminated
  2018-08-07 17:10           ` Bernd Edlinger
@ 2018-08-09 13:08             ` Olivier Hainque
  0 siblings, 0 replies; 41+ messages in thread
From: Olivier Hainque @ 2018-08-09 13:08 UTC (permalink / raw)
  To: Bernd Edlinger; +Cc: gcc-patches

Hi Bernd,

> On 07 Aug 2018, at 19:10, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> 
>>> procedure Array9 is

>>> I see that "1234" is put in the merge section,
>>> but V1 is not.  Maybe because of the alignment requirement?
>>> 
>>> But it sould not be much different from the C test case,
>>> which is now able to merge the non-zero terminated strings.
>> 
>> I'm happy to have a look.

An effect of alignments and inlining I think.
I see expected behavior with calls to an external program
instead of the local one.

I tried the testcases we have here plus a couple
of variations, e.g. with strings explicitly terminated
by multiple ascii.nul, and they all seem to be working
fine :-)

Regarding Ada testcases for the dejagnu suite, I'm not
quite sure how to go about them.

For starters at least, we could for example contemplate
something like:

-- { dg-do compile }                                                            
-- { dg-options "-O1 -fmerge-all-constants" }                                   

procedure String_Merge1 is
   procedure Process (X : String);
   pragma Import (Ada, Process);
begin
   Process ("1234");
   Process ("ABCD" & Ascii.NUL);
end;

-- We expect something like:                                                    

-- .section  .rodata.str1.1,"aMS",@progbits,1                                   
-- .LC2:                                                                        
--     .string "1234"                                                           
-- .LC3:                                                                        
--     .string "ABCD"                                                           

-- Not clear how to match that generally enough.                                

-- { dg-final { scan-assembler ".rodata.str1.1,\"aMS\"\[^\n\r\]*\n\.LC.:\n\[^\n\
\r\]*\.string \"1234\"\n\.LC.:\n\[^\n\r\]*\.string \"ABCD\"\n"  } }             


As the comment suggests, I'm not clear if that's robust/general enough.
I suppose we probably can't always expect .string, for instance. I suppose
we could also have other bits between the .section and the literals.

Would be fine as long as it's not something like ".section  .rodata".

Olivier

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] Handle not explicitly zero terminated strings in merge sections
  2018-08-07 12:56           ` Bernd Edlinger
@ 2018-08-17 17:35             ` Bernd Edlinger
  2018-09-14 19:02               ` [PATCHv2] " Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-08-17 17:35 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

Hi,

this is an update of the patch, which has just two Ada test cases
added for string merging which are based on Olivier's suggested test case.

Note that except the "Check STRING_CSTs in varasm.c" patch series,
there is now another patch needed to work around issues with 71625:

[PATCH] Call braced_list_to_string after array size is fixed
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01015.html


Thanks
Bernd.


On 08/07/18 14:55, Bernd Edlinger wrote:
> Hi,
> 
> I realized that the previous version did not handle
> zero terminated Ada constants correctly as in
> 
> $ cat hello.adb
> procedure Hello is
>     procedure puts  (S : String) with Import, Convention => C;
>     X : String(1..8) := "abcdefg" & Ascii.Nul;
> begin
>    puts ("1234" & Ascii.Nul );
>    puts (X);
> end;
> 
> accidentally strings were doubly nul-terminated, because I forgot to
> handle merge string sections in assemble_constant_contents, and
> because get_constant_size increased the string literal by one.
> 
> Fixed, and added a positive test case for the merging non-zero terminated
> strings in C.
> 
> 
> Boot-strapped and regtested on x86_64-pc-linux-gnu.
> Is it OK for trunk (after pre-condition patches)?
> 
> 
> Thanks
> Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-merge-strings.diff --]
[-- Type: text/x-patch; name="patch-merge-strings.diff", Size: 9527 bytes --]

gcc:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* varasm.c (output_constant): Add new parameter merge_strings.
	Make strings properly zero terminated in merge string sections.
	(mergeable_string_section): Don't fail if the last char is non-zero.
	(assemble_variable_contents): Handle merge string sections.
	(assemble_variable): Likewise.
	(assemble_constant_contents): Likewise.
	(output_constant_def_contents): Likewise.
	(get_constant_size): Remove special handling of STRING_CSTs.
	(redundant_nul_term_p): New helper function.

testsuite:
2018-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gnat.dg/string_merge1.adb: New test.
	* gnat.dg/string_merge2.adb: New test.
	* gcc.dg/merge-all-constants-1.c: Adjust test.
	* gcc.dg/merge-all-constants-2.c: New test.

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 263072)
+++ gcc/varasm.c	(working copy)
@@ -111,7 +111,8 @@ static int compare_constant (const tree, const tre
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int, bool);
+					       unsigned int, bool,
+					       bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -827,7 +828,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
 	  unit = GET_MODE_SIZE (mode);
 
 	  /* Check for embedded NUL characters.  */
-	  for (i = 0; i < len; i += unit)
+	  for (i = 0; i < len - unit; i += unit)
 	    {
 	      for (j = 0; j < unit; j++)
 		if (str[i + j] != '\0')
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, const char
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			    bool dont_output_data)
+			    bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, const char
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_level ATTRIB
 	switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_variable_contents (decl, name, dont_output_data);
+      assemble_variable_contents (decl, name, dont_output_data,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -3300,12 +3303,7 @@ get_constant_section (tree exp, unsigned int align
 static HOST_WIDE_INT
 get_constant_size (tree exp)
 {
-  HOST_WIDE_INT size;
-
-  size = int_size_in_bytes (TREE_TYPE (exp));
-  if (TREE_CODE (exp) == STRING_CST)
-    size = MAX (TREE_STRING_LENGTH (exp), size);
-  return size;
+  return int_size_in_bytes (TREE_TYPE (exp));
 }
 
 /* Subroutine of output_constant_def:
@@ -3468,7 +3466,8 @@ maybe_output_constant_def_contents (struct constan
    constant's alignment in bits.  */
 
 static void
-assemble_constant_contents (tree exp, const char *label, unsigned int align)
+assemble_constant_contents (tree exp, const char *label, unsigned int align,
+			    bool merge_strings = false)
 {
   HOST_WIDE_INT size;
 
@@ -3478,7 +3477,7 @@ static void
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align, false);
+  output_constant (exp, size, align, false, merge_strings);
 
   targetm.asm_out.decl_end ();
 }
@@ -3519,10 +3518,13 @@ output_constant_def_contents (rtx symbol)
 		   || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl))
 		   ? DECL_ALIGN (decl)
 		   : symtab_node::get (decl)->definition_alignment ());
-      switch_to_section (get_constant_section (exp, align));
+      section *sect = get_constant_section (exp, align);
+      switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_constant_contents (exp, XSTR (symbol, 0), align);
+      assemble_constant_contents (exp, XSTR (symbol, 0), align,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  HOST_WIDE_INT size = get_constant_size (exp);
@@ -4786,6 +4789,19 @@ static unsigned HOST_WIDE_INT
 output_constructor (tree, unsigned HOST_WIDE_INT, unsigned int, bool,
 		    oc_outer_state *);
 
+/* Find out if a string P of length LEN and character size UNIT
+   has double nul termination.
+   The string is already known to have a nul termination at
+   offset LEN.  Find out if there is already a NUL termination
+   at offset LEN - UNIT.  */
+
+static bool
+redundant_nul_term_p (const char *p, unsigned len, unsigned unit)
+{
+  gcc_assert(len >= unit);
+  return !memcmp (p + len, p + len - unit, unit);
+}
+
 /* Output assembler code for constant EXP, with no label.
    This includes the pseudo-op such as ".int" or ".byte", and a newline.
    Assumes output_addressed_constants has been done on EXP already.
@@ -4812,7 +4850,7 @@ output_constructor (tree, unsigned HOST_WIDE_INT,
 
 static unsigned HOST_WIDE_INT
 output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
-		 bool reverse)
+		 bool reverse, bool merge_strings /* = false */)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
@@ -4940,8 +4978,13 @@ output_constant (tree exp, unsigned HOST_WIDE_INT
 	case CONSTRUCTOR:
 	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
-	  thissize
-	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp);
+	  if (thissize > size && merge_strings
+	      && !redundant_nul_term_p (TREE_STRING_POINTER (exp),
+					size, thissize - size))
+	    ;
+	  else
+	    thissize = MIN (thissize, size);
 	  gcc_checking_assert (check_string_literal (exp, size));
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
Index: gcc/testsuite/gnat.dg/string_merge1.adb
===================================================================
--- gcc/testsuite/gnat.dg/string_merge1.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/string_merge1.adb	(working copy)
@@ -0,0 +1,19 @@
+-- { dg-do compile }
+-- { dg-options "-O1 -fmerge-all-constants" }
+
+procedure String_Merge1 is
+   procedure Process (X : String);
+   pragma Import (Ada, Process);
+begin
+   Process ("ABCD");
+end;
+
+-- We expect something like:
+
+-- .section  .rodata.str1.1,"aMS",@progbits,1
+-- .LC1:
+--     .string "ABCD"
+
+-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
+-- { dg-final { scan-assembler-times "\\.string" 1 } }
+-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
Index: gcc/testsuite/gnat.dg/string_merge2.adb
===================================================================
--- gcc/testsuite/gnat.dg/string_merge2.adb	(revision 0)
+++ gcc/testsuite/gnat.dg/string_merge2.adb	(working copy)
@@ -0,0 +1,19 @@
+-- { dg-do compile }
+-- { dg-options "-O1 -fmerge-all-constants" }
+
+procedure String_Merge2 is
+   procedure Process (X : String);
+   pragma Import (Ada, Process);
+begin
+   Process ("ABCD" & Ascii.NUL);
+end;
+
+-- We expect something like:
+
+-- .section  .rodata.str1.1,"aMS",@progbits,1
+-- .LC1:
+--     .string "ABCD"
+
+-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
+-- { dg-final { scan-assembler-times "\\.string" 1 } }
+-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
Index: gcc/testsuite/gcc.dg/merge-all-constants-1.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-1.c	(revision 263072)
+++ gcc/testsuite/gcc.dg/merge-all-constants-1.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-w -O2 -fmerge-all-constants" } */
 
-const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz";
-const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 
-/* { dg-final { scan-assembler-not "\.rodata\.str" } } */
+/* { dg-final { scan-assembler-not "\\.rodata\\.str" } } */
Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-w -O2 -fmerge-all-constants" } */
+
+const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+
+/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-08-17 17:35             ` Bernd Edlinger
@ 2018-09-14 19:02               ` Bernd Edlinger
  2018-09-14 19:06                 ` Jakub Jelinek
                                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Bernd Edlinger @ 2018-09-14 19:02 UTC (permalink / raw)
  To: Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

Hi,

this is an upate of the string-merge section, it is based on the V2-STRING_CST
semantic patch series, which was finally installed yesterday.
It merges single-byte string constants with or without terminating NUL.
The patch has the same Ada and C test cases that were already in the V1 patch.

Thus there are no pre-requisite patches necessary at this time.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-merge-strings-v2.diff --]
[-- Type: text/x-patch; name="patch-merge-strings-v2.diff", Size: 8615 bytes --]

gcc:
2018-08-27  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* varasm.c (output_constant): Add new parameter merge_strings.
	Make strings properly zero terminated in merge string sections.
	(mergeable_string_section): Don't fail if the last char is non-zero.
	(assemble_variable_contents): Handle merge string sections.
	(assemble_variable): Likewise.
	(assemble_constant_contents): Likewise.
	(output_constant_def_contents): Likewise.

testsuite:
2018-08-27  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gnat.dg/string_merge1.adb: New test.
	* gnat.dg/string_merge2.adb: New test.
	* gcc.dg/merge-all-constants-1.c: Adjust test.
	* gcc.dg/merge-all-constants-2.c: New test.

diff -Npur gcc/varasm.c gcc/varasm.c
--- gcc/varasm.c	2018-08-26 15:02:43.157905415 +0200
+++ gcc/varasm.c	2018-08-26 17:57:26.488494866 +0200
@@ -111,7 +111,8 @@ static int compare_constant (const tree,
 static void output_constant_def_contents (rtx);
 static void output_addressed_constants (tree);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
-					       unsigned int, bool);
+					       unsigned int, bool,
+					       bool = false);
 static void globalize_decl (tree);
 static bool decl_readonly_section_1 (enum section_category);
 #ifdef BSS_SECTION_ASM_OP
@@ -804,8 +805,8 @@ mergeable_string_section (tree decl ATTR
       && TREE_CODE (decl) == STRING_CST
       && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
       && align <= 256
-      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
-      && TREE_STRING_LENGTH (decl) >= len)
+      && (len = int_size_in_bytes (TREE_TYPE (decl))) >= 0
+      && TREE_STRING_LENGTH (decl) == len)
     {
       scalar_int_mode mode;
       unsigned int modesize;
@@ -835,7 +836,7 @@ mergeable_string_section (tree decl ATTR
 	      if (j == unit)
 		break;
 	    }
-	  if (i == len - unit)
+	  if (i == len - unit || (unit == 1 && i == len))
 	    {
 	      sprintf (name, "%s.str%d.%d", prefix,
 		       modesize / 8, (int) (align / 8));
@@ -2117,7 +2118,7 @@ assemble_noswitch_variable (tree decl, c
 
 static void
 assemble_variable_contents (tree decl, const char *name,
-			    bool dont_output_data)
+			    bool dont_output_data, bool merge_strings = false)
 {
   /* Do any machine/system dependent processing of the object.  */
 #ifdef ASM_DECLARE_OBJECT_NAME
@@ -2140,7 +2141,7 @@ assemble_variable_contents (tree decl, c
 	output_constant (DECL_INITIAL (decl),
 			 tree_to_uhwi (DECL_SIZE_UNIT (decl)),
 			 get_variable_align (decl),
-			 false);
+			 false, merge_strings);
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
@@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_le
 	switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_variable_contents (decl, name, dont_output_data);
+      assemble_variable_contents (decl, name, dont_output_data,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  unsigned HOST_WIDE_INT int size
@@ -3471,7 +3474,8 @@ maybe_output_constant_def_contents (stru
    constant's alignment in bits.  */
 
 static void
-assemble_constant_contents (tree exp, const char *label, unsigned int align)
+assemble_constant_contents (tree exp, const char *label, unsigned int align,
+			    bool merge_strings = false)
 {
   HOST_WIDE_INT size;
 
@@ -3481,7 +3485,7 @@ assemble_constant_contents (tree exp, co
   targetm.asm_out.declare_constant_name (asm_out_file, label, exp, size);
 
   /* Output the value of EXP.  */
-  output_constant (exp, size, align, false);
+  output_constant (exp, size, align, false, merge_strings);
 
   targetm.asm_out.decl_end ();
 }
@@ -3522,10 +3526,13 @@ output_constant_def_contents (rtx symbol
 		   || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl))
 		   ? DECL_ALIGN (decl)
 		   : symtab_node::get (decl)->definition_alignment ());
-      switch_to_section (get_constant_section (exp, align));
+      section *sect = get_constant_section (exp, align);
+      switch_to_section (sect);
       if (align > BITS_PER_UNIT)
 	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
-      assemble_constant_contents (exp, XSTR (symbol, 0), align);
+      assemble_constant_contents (exp, XSTR (symbol, 0), align,
+				  sect->common.flags & SECTION_MERGE
+				  && sect->common.flags & SECTION_STRINGS);
       if (asan_protected)
 	{
 	  HOST_WIDE_INT size = get_constant_size (exp);
@@ -4838,7 +4845,7 @@ output_constructor (tree, unsigned HOST_
 
 static unsigned HOST_WIDE_INT
 output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
-		 bool reverse)
+		 bool reverse, bool merge_strings /* = false */)
 {
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
@@ -4966,8 +4973,11 @@ output_constant (tree exp, unsigned HOST
 	case CONSTRUCTOR:
 	  return output_constructor (exp, size, align, reverse, NULL);
 	case STRING_CST:
-	  thissize
-	    = MIN ((unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp), size);
+	  thissize = (unsigned HOST_WIDE_INT)TREE_STRING_LENGTH (exp);
+	  if (merge_strings
+	      && (thissize == 0
+		  || TREE_STRING_POINTER (exp) [thissize - 1] != '\0'))
+	    thissize++;
 	  gcc_checking_assert (check_string_literal (exp, size));
 	  assemble_string (TREE_STRING_POINTER (exp), thissize);
 	  break;
diff -Npur gcc/testsuite/gcc.dg/merge-all-constants-1.c gcc/testsuite/gcc.dg/merge-all-constants-1.c
--- gcc/testsuite/gcc.dg/merge-all-constants-1.c	2018-08-16 17:28:11.000000000 +0200
+++ gcc/testsuite/gcc.dg/merge-all-constants-1.c	2018-08-26 16:31:12.650271931 +0200
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-w -O2 -fmerge-all-constants" } */
 
-const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str1[36] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 const char str2[38] = "0123456789abcdefghijklmnopqrstuvwxyz";
-const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "\000123456789abcdefghijklmnopqrstuvwxyz";
 
-/* { dg-final { scan-assembler-not "\.rodata\.str" } } */
+/* { dg-final { scan-assembler-not "\\.rodata\\.str" } } */
diff -Npur gcc/testsuite/gcc.dg/merge-all-constants-2.c gcc/testsuite/gcc.dg/merge-all-constants-2.c
--- gcc/testsuite/gcc.dg/merge-all-constants-2.c	1970-01-01 01:00:00.000000000 +0100
+++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	2018-08-26 16:31:12.650271931 +0200
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-w -O2 -fmerge-all-constants" } */
+
+const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
+const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
+
+/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
diff -Npur gcc/testsuite/gnat.dg/string_merge1.adb gcc/testsuite/gnat.dg/string_merge1.adb
--- gcc/testsuite/gnat.dg/string_merge1.adb	1970-01-01 01:00:00.000000000 +0100
+++ gcc/testsuite/gnat.dg/string_merge1.adb	2018-08-26 16:31:12.650271931 +0200
@@ -0,0 +1,19 @@
+-- { dg-do compile }
+-- { dg-options "-O1 -fmerge-all-constants" }
+
+procedure String_Merge1 is
+   procedure Process (X : String);
+   pragma Import (Ada, Process);
+begin
+   Process ("ABCD");
+end;
+
+-- We expect something like:
+
+-- .section  .rodata.str1.1,"aMS",@progbits,1
+-- .LC1:
+--     .string "ABCD"
+
+-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
+-- { dg-final { scan-assembler-times "\\.string" 1 } }
+-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
diff -Npur gcc/testsuite/gnat.dg/string_merge2.adb gcc/testsuite/gnat.dg/string_merge2.adb
--- gcc/testsuite/gnat.dg/string_merge2.adb	1970-01-01 01:00:00.000000000 +0100
+++ gcc/testsuite/gnat.dg/string_merge2.adb	2018-08-26 16:31:12.650271931 +0200
@@ -0,0 +1,19 @@
+-- { dg-do compile }
+-- { dg-options "-O1 -fmerge-all-constants" }
+
+procedure String_Merge2 is
+   procedure Process (X : String);
+   pragma Import (Ada, Process);
+begin
+   Process ("ABCD" & Ascii.NUL);
+end;
+
+-- We expect something like:
+
+-- .section  .rodata.str1.1,"aMS",@progbits,1
+-- .LC1:
+--     .string "ABCD"
+
+-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
+-- { dg-final { scan-assembler-times "\\.string" 1 } }
+-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-14 19:02               ` [PATCHv2] " Bernd Edlinger
@ 2018-09-14 19:06                 ` Jakub Jelinek
  2018-09-14 19:31                   ` Bernd Edlinger
  2018-10-03 16:35                 ` Jeff Law
  2018-10-05 18:54                 ` Andreas Schwab
  2 siblings, 1 reply; 41+ messages in thread
From: Jakub Jelinek @ 2018-09-14 19:06 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jeff Law

On Fri, Sep 14, 2018 at 06:39:38PM +0000, Bernd Edlinger wrote:
> Hi,
> 
> this is an upate of the string-merge section, it is based on the V2-STRING_CST
> semantic patch series, which was finally installed yesterday.
> It merges single-byte string constants with or without terminating NUL.
> The patch has the same Ada and C test cases that were already in the V1 patch.
> 
> Thus there are no pre-requisite patches necessary at this time.
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

I believe SHF_MERGE | SHF_STRINGS sections should only ever contain
zero-terminated strings.
See e.g. SCO ELF documentation:
"The size of each element is specified in the section header's sh_entsize field. If the SHF_STRINGS flag is also set, the data elements consist of null-terminated character strings."
http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html
https://docs.oracle.com/cd/E23824_01/html/819-0690/ggdlu.html

So, please never put non-zero terminated strings into MS sections.

	Jakub

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-14 19:06                 ` Jakub Jelinek
@ 2018-09-14 19:31                   ` Bernd Edlinger
  2018-09-21 11:59                     ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-09-14 19:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jeff Law

On 09/14/18 21:01, Jakub Jelinek wrote:
> On Fri, Sep 14, 2018 at 06:39:38PM +0000, Bernd Edlinger wrote:
>> Hi,
>>
>> this is an upate of the string-merge section, it is based on the V2-STRING_CST
>> semantic patch series, which was finally installed yesterday.
>> It merges single-byte string constants with or without terminating NUL.
>> The patch has the same Ada and C test cases that were already in the V1 patch.
>>
>> Thus there are no pre-requisite patches necessary at this time.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> I believe SHF_MERGE | SHF_STRINGS sections should only ever contain
> zero-terminated strings.
> See e.g. SCO ELF documentation:
> "The size of each element is specified in the section header's sh_entsize field. If the SHF_STRINGS flag is also set, the data elements consist of null-terminated character strings."
> http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html
> https://docs.oracle.com/cd/E23824_01/html/819-0690/ggdlu.html
> 
> So, please never put non-zero terminated strings into MS sections.
> 

Yes, that is of course clear.

The idea is, if the string itself is not zero-terminated
another zero is added, that is only done in at the assembly level.

Strings with embedded zero bytes are not put in the merge section.


Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-14 19:31                   ` Bernd Edlinger
@ 2018-09-21 11:59                     ` Bernd Edlinger
  2018-09-21 12:24                       ` Jakub Jelinek
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-09-21 11:59 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jeff Law

Hi Jakub,

are your concerns addressed with this answer, or do you have objections
to this patch?


Thanks
Bernd.


On 09/14/18 21:06, Bernd Edlinger wrote:
> On 09/14/18 21:01, Jakub Jelinek wrote:
>> On Fri, Sep 14, 2018 at 06:39:38PM +0000, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is an upate of the string-merge section, it is based on the V2-STRING_CST
>>> semantic patch series, which was finally installed yesterday.
>>> It merges single-byte string constants with or without terminating NUL.
>>> The patch has the same Ada and C test cases that were already in the V1 patch.
>>>
>>> Thus there are no pre-requisite patches necessary at this time.
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> I believe SHF_MERGE | SHF_STRINGS sections should only ever contain
>> zero-terminated strings.
>> See e.g. SCO ELF documentation:
>> "The size of each element is specified in the section header's sh_entsize field. If the SHF_STRINGS flag is also set, the data elements consist of null-terminated character strings."
>> http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html
>> https://docs.oracle.com/cd/E23824_01/html/819-0690/ggdlu.html
>>
>> So, please never put non-zero terminated strings into MS sections.
>>
> 
> Yes, that is of course clear.
> 
> The idea is, if the string itself is not zero-terminated
> another zero is added, that is only done in at the assembly level.
> 
> Strings with embedded zero bytes are not put in the merge section.
> > 
> Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-21 11:59                     ` Bernd Edlinger
@ 2018-09-21 12:24                       ` Jakub Jelinek
  0 siblings, 0 replies; 41+ messages in thread
From: Jakub Jelinek @ 2018-09-21 12:24 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jeff Law

On Fri, Sep 21, 2018 at 11:36:53AM +0000, Bernd Edlinger wrote:
> are your concerns addressed with this answer, or do you have objections
> to this patch?

No objections, but no time to review your patch either right now.

	Jakub

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-14 19:02               ` [PATCHv2] " Bernd Edlinger
  2018-09-14 19:06                 ` Jakub Jelinek
@ 2018-10-03 16:35                 ` Jeff Law
  2018-10-09 13:07                   ` Bernd Edlinger
  2018-10-05 18:54                 ` Andreas Schwab
  2 siblings, 1 reply; 41+ messages in thread
From: Jeff Law @ 2018-10-03 16:35 UTC (permalink / raw)
  To: Bernd Edlinger, Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek

On 9/14/18 12:39 PM, Bernd Edlinger wrote:
> Hi,
> 
> this is an upate of the string-merge section, it is based on the V2-STRING_CST
> semantic patch series, which was finally installed yesterday.
> It merges single-byte string constants with or without terminating NUL.
> The patch has the same Ada and C test cases that were already in the V1 patch.
> 
> Thus there are no pre-requisite patches necessary at this time.
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-merge-strings-v2.diff
> 
> gcc:
> 2018-08-27  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* varasm.c (output_constant): Add new parameter merge_strings.
> 	Make strings properly zero terminated in merge string sections.
> 	(mergeable_string_section): Don't fail if the last char is non-zero.
> 	(assemble_variable_contents): Handle merge string sections.
> 	(assemble_variable): Likewise.
> 	(assemble_constant_contents): Likewise.
> 	(output_constant_def_contents): Likewise.
> 
> testsuite:
> 2018-08-27  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* gnat.dg/string_merge1.adb: New test.
> 	* gnat.dg/string_merge2.adb: New test.
> 	* gcc.dg/merge-all-constants-1.c: Adjust test.
> 	* gcc.dg/merge-all-constants-2.c: New test.
Thanks for you patience.


> 
> diff -Npur gcc/varasm.c gcc/varasm.c
> --- gcc/varasm.c	2018-08-26 15:02:43.157905415 +0200
> +++ gcc/varasm.c	2018-08-26 17:57:26.488494866 +0200
> @@ -111,7 +111,8 @@ static int compare_constant (const tree,
>  static void output_constant_def_contents (rtx);
>  static void output_addressed_constants (tree);
>  static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
> -					       unsigned int, bool);
> +					       unsigned int, bool,
> +					       bool = false);
Given there's less than 10 call sites in total, avoid defaulting the
argument and just pass it in unconditionally.  Similarly for
assemble_constant_contents which has just a couple calls.



>  static void globalize_decl (tree);
>  static bool decl_readonly_section_1 (enum section_category);
>  #ifdef BSS_SECTION_ASM_OP
> @@ -804,8 +805,8 @@ mergeable_string_section (tree decl ATTR
>        && TREE_CODE (decl) == STRING_CST
>        && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
>        && align <= 256
> -      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
> -      && TREE_STRING_LENGTH (decl) >= len)
> +      && (len = int_size_in_bytes (TREE_TYPE (decl))) >= 0
> +      && TREE_STRING_LENGTH (decl) == len)
Not sure why you want to test for >= 0 here.  > 0 seems sufficient,
though I guess there's no harm in the = 0 case.


> @@ -835,7 +836,7 @@ mergeable_string_section (tree decl ATTR
>  	      if (j == unit)
>  		break;
>  	    }
> -	  if (i == len - unit)
> +	  if (i == len - unit || (unit == 1 && i == len))
>  	    {
>  	      sprintf (name, "%s.str%d.%d", prefix,
>  		       modesize / 8, (int) (align / 8));
> @@ -2316,7 +2317,9 @@ assemble_variable (tree decl, int top_le
>  	switch_to_section (sect);
>        if (align > BITS_PER_UNIT)
>  	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
> -      assemble_variable_contents (decl, name, dont_output_data);
> +      assemble_variable_contents (decl, name, dont_output_data,
> +				  sect->common.flags & SECTION_MERGE
> +				  && sect->common.flags & SECTION_STRINGS);
I believe our style guidelines would call for parenthesis around the
multi-line expression (where you check SECTION_MERGE and SECTION_STRINGS).


> @@ -3522,10 +3526,13 @@ output_constant_def_contents (rtx symbol
>  		   || (VAR_P (decl) && DECL_IN_CONSTANT_POOL (decl))
>  		   ? DECL_ALIGN (decl)
>  		   : symtab_node::get (decl)->definition_alignment ());
> -      switch_to_section (get_constant_section (exp, align));
> +      section *sect = get_constant_section (exp, align);
> +      switch_to_section (sect);
>        if (align > BITS_PER_UNIT)
>  	ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
> -      assemble_constant_contents (exp, XSTR (symbol, 0), align);
> +      assemble_constant_contents (exp, XSTR (symbol, 0), align,
> +				  sect->common.flags & SECTION_MERGE
> +				  && sect->common.flags & SECTION_STRINGS);
Similarly, you need some parens here.



I think with the changing of the default parm to a normal parm and the
style changes this is OK for the trunk.  Given it's independent of
everything else going on, go ahead and commit it when you've fixed up
those minor issues.

jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-09-14 19:02               ` [PATCHv2] " Bernd Edlinger
  2018-09-14 19:06                 ` Jakub Jelinek
  2018-10-03 16:35                 ` Jeff Law
@ 2018-10-05 18:54                 ` Andreas Schwab
  2018-10-05 20:00                   ` Bernd Edlinger
  2 siblings, 1 reply; 41+ messages in thread
From: Andreas Schwab @ 2018-10-05 18:54 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jakub Jelinek, Jeff Law

On Sep 14 2018, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:

> diff -Npur gcc/testsuite/gnat.dg/string_merge1.adb gcc/testsuite/gnat.dg/string_merge1.adb
> --- gcc/testsuite/gnat.dg/string_merge1.adb	1970-01-01 01:00:00.000000000 +0100
> +++ gcc/testsuite/gnat.dg/string_merge1.adb	2018-08-26 16:31:12.650271931 +0200
> @@ -0,0 +1,19 @@
> +-- { dg-do compile }
> +-- { dg-options "-O1 -fmerge-all-constants" }
> +
> +procedure String_Merge1 is
> +   procedure Process (X : String);
> +   pragma Import (Ada, Process);
> +begin
> +   Process ("ABCD");
> +end;
> +
> +-- We expect something like:
> +
> +-- .section  .rodata.str1.1,"aMS",@progbits,1
> +-- .LC1:
> +--     .string "ABCD"
> +
> +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
> +-- { dg-final { scan-assembler-times "\\.string" 1 } }
> +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }

FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\.string 1

$ grep ABCD string_merge1.s 
        stringz "ABCD"

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-05 18:54                 ` Andreas Schwab
@ 2018-10-05 20:00                   ` Bernd Edlinger
  2018-10-08 11:22                     ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-10-05 20:00 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Olivier Hainque, gcc-patches, Richard Biener, Eric Botcazou,
	Arnaud Charlet, Jakub Jelinek, Jeff Law

On 10/05/18 20:15, Andreas Schwab wrote:
> On Sep 14 2018, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> 
>> diff -Npur gcc/testsuite/gnat.dg/string_merge1.adb gcc/testsuite/gnat.dg/string_merge1.adb
>> --- gcc/testsuite/gnat.dg/string_merge1.adb	1970-01-01 01:00:00.000000000 +0100
>> +++ gcc/testsuite/gnat.dg/string_merge1.adb	2018-08-26 16:31:12.650271931 +0200
>> @@ -0,0 +1,19 @@
>> +-- { dg-do compile }
>> +-- { dg-options "-O1 -fmerge-all-constants" }
>> +
>> +procedure String_Merge1 is
>> +   procedure Process (X : String);
>> +   pragma Import (Ada, Process);
>> +begin
>> +   Process ("ABCD");
>> +end;
>> +
>> +-- We expect something like:
>> +
>> +-- .section  .rodata.str1.1,"aMS",@progbits,1
>> +-- .LC1:
>> +--     .string "ABCD"
>> +
>> +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
>> +-- { dg-final { scan-assembler-times "\\.string" 1 } }
>> +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
> 
> FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\.string 1
> 
> $ grep ABCD string_merge1.s
>          stringz "ABCD"
> 

Ah, thanks.

Turns out there are too much variations, like mentioned stringz, and asciz, and
probably lots more here.

But for the purpose of testing the optimization it should be sufficient to look for
".rodata.str" in the assembler.

So I committed the following as obvious:

Index: gnat.dg/string_merge2.adb
===================================================================
--- gnat.dg/string_merge2.adb	(Revision 264887)
+++ gnat.dg/string_merge2.adb	(Revision 264888)
@@ -15,5 +15,3 @@
  --     .string "ABCD"
  
  -- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
--- { dg-final { scan-assembler-times "\\.string" 1 } }
--- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
Index: gnat.dg/string_merge1.adb
===================================================================
--- gnat.dg/string_merge1.adb	(Revision 264887)
+++ gnat.dg/string_merge1.adb	(Revision 264888)
@@ -15,5 +15,3 @@
  --     .string "ABCD"
  
  -- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
--- { dg-final { scan-assembler-times "\\.string" 1 } }
--- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
Index: ChangeLog
===================================================================
--- ChangeLog	(Revision 264887)
+++ ChangeLog	(Revision 264888)
@@ -1,3 +1,8 @@
+2018-10-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>
+
+	* gnat.dg/string_merge1.adb: Fix test expectations.
+	* gnat.dg/string_merge2.adb: Likewise.
+
  2018-10-05  David Malcolm  <dmalcolm@redhat.com>
  
  	PR c++/56856



Thanks
Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-05 20:00                   ` Bernd Edlinger
@ 2018-10-08 11:22                     ` Rainer Orth
  2018-10-08 13:00                       ` Rainer Orth
                                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Rainer Orth @ 2018-10-08 11:22 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 3010 bytes --]

Hi Bernd,

> On 10/05/18 20:15, Andreas Schwab wrote:
>> On Sep 14 2018, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>> 
>>> diff -Npur gcc/testsuite/gnat.dg/string_merge1.adb
>>> gcc/testsuite/gnat.dg/string_merge1.adb
>>> --- gcc/testsuite/gnat.dg/string_merge1.adb 1970-01-01
>>> 01:00:00.000000000 +0100
>>> +++ gcc/testsuite/gnat.dg/string_merge1.adb 2018-08-26
>>> 16:31:12.650271931 +0200
>>> @@ -0,0 +1,19 @@
>>> +-- { dg-do compile }
>>> +-- { dg-options "-O1 -fmerge-all-constants" }
>>> +
>>> +procedure String_Merge1 is
>>> +   procedure Process (X : String);
>>> +   pragma Import (Ada, Process);
>>> +begin
>>> +   Process ("ABCD");
>>> +end;
>>> +
>>> +-- We expect something like:
>>> +
>>> +-- .section  .rodata.str1.1,"aMS",@progbits,1
>>> +-- .LC1:
>>> +--     .string "ABCD"
>>> +
>>> +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
>>> +-- { dg-final { scan-assembler-times "\\.string" 1 } }
>>> +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
>> 
>> FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\.string 1
>> 
>> $ grep ABCD string_merge1.s
>>          stringz "ABCD"
>> 
>
> Ah, thanks.
>
> Turns out there are too much variations, like mentioned stringz, and asciz, and
> probably lots more here.
>
> But for the purpose of testing the optimization it should be sufficient to look for
> ".rodata.str" in the assembler.
>
> So I committed the following as obvious:

unfortunately that patch is not enough and still shows a fundamental
problem:

* The tests still FAIL on Solaris with /bin/as and Darwin:

FAIL: gcc.dg/merge-all-constants-2.c scan-assembler-not \\\\.rodata[\\n\\r]

FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\\\.rodata\\\\.str 1
FAIL: gnat.dg/string_merge2.adb scan-assembler-times \\\\.rodata\\\\.str 1

  Both targets lack string merging support, so the tests should check
  for that.

* The merge-all-constants-2.c test doesn't FAIL on Solaris/SPARC with
  /bin/as, although it lacks string merging support, too.  The assembler
  output contains

        .section        ".rodata"

  so the pattern currently used to check for .rodata is too
  restrictive.  There is assembler syntax beyond gas on x86 ;-)

The following patch fixes the former issue; tested on
sparc-sun-solaris2.11 with as (no string merging) resp. gas (with string
merging).  Installed on mainline.

Besides, the patch seems to have produced more fallout on Solaris: I see
many new Go testsuite failures on Solaris 10 which probably are related,
and Solaris bootstrap with Ada included is broken due to the extended
usage of string merging.  I'm currently investigating what's going on
there.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-10-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* gcc.dg/merge-all-constants-2.c: Require string_merging support.
	* gnat.dg/string_merge1.adb: Likewise.
	* gnat.dg/string_merge2.adb: Likewise.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: testsuite-string_merge.patch --]
[-- Type: text/x-patch, Size: 1264 bytes --]

# HG changeset patch
# Parent  69da4995a4d01be3b3fb5cbf466648a814da22b1
Require string merging support in gnat.dg/string_merge?.adb etc.

diff --git a/gcc/testsuite/gcc.dg/merge-all-constants-2.c b/gcc/testsuite/gcc.dg/merge-all-constants-2.c
--- a/gcc/testsuite/gcc.dg/merge-all-constants-2.c
+++ b/gcc/testsuite/gcc.dg/merge-all-constants-2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target string_merging } */
 /* { dg-options "-w -O2 -fmerge-all-constants" } */
 
 const char str1[36] = "0123456789abcdefghijklmnopqrstuvwxyz";
diff --git a/gcc/testsuite/gnat.dg/string_merge1.adb b/gcc/testsuite/gnat.dg/string_merge1.adb
--- a/gcc/testsuite/gnat.dg/string_merge1.adb
+++ b/gcc/testsuite/gnat.dg/string_merge1.adb
@@ -1,4 +1,5 @@
 -- { dg-do compile }
+-- { dg-require-effective-target string_merging }
 -- { dg-options "-O1 -fmerge-all-constants" }
 
 procedure String_Merge1 is
diff --git a/gcc/testsuite/gnat.dg/string_merge2.adb b/gcc/testsuite/gnat.dg/string_merge2.adb
--- a/gcc/testsuite/gnat.dg/string_merge2.adb
+++ b/gcc/testsuite/gnat.dg/string_merge2.adb
@@ -1,4 +1,5 @@
 -- { dg-do compile }
+-- { dg-require-effective-target string_merging }
 -- { dg-options "-O1 -fmerge-all-constants" }
 
 procedure String_Merge2 is

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-08 11:22                     ` Rainer Orth
@ 2018-10-08 13:00                       ` Rainer Orth
  2018-10-08 15:42                       ` Bernd Edlinger
  2018-10-08 22:16                       ` Eric Botcazou
  2 siblings, 0 replies; 41+ messages in thread
From: Rainer Orth @ 2018-10-08 13:00 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Bernd,

> Besides, the patch seems to have produced more fallout on Solaris: I see
> many new Go testsuite failures on Solaris 10 which probably are related,
> and Solaris bootstrap with Ada included is broken due to the extended
> usage of string merging.  I'm currently investigating what's going on
> there.

I've filed PR bootstrap/87551 for the libgnat-9.so link failure.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-08 11:22                     ` Rainer Orth
  2018-10-08 13:00                       ` Rainer Orth
@ 2018-10-08 15:42                       ` Bernd Edlinger
  2018-10-09 13:08                         ` Rainer Orth
  2018-10-08 22:16                       ` Eric Botcazou
  2 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-10-08 15:42 UTC (permalink / raw)
  To: Rainer Orth
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

On 10/08/18 13:14, Rainer Orth wrote:
> Hi Bernd,
> 
>> On 10/05/18 20:15, Andreas Schwab wrote:
>>> On Sep 14 2018, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>>>
>>>> diff -Npur gcc/testsuite/gnat.dg/string_merge1.adb
>>>> gcc/testsuite/gnat.dg/string_merge1.adb
>>>> --- gcc/testsuite/gnat.dg/string_merge1.adb 1970-01-01
>>>> 01:00:00.000000000 +0100
>>>> +++ gcc/testsuite/gnat.dg/string_merge1.adb 2018-08-26
>>>> 16:31:12.650271931 +0200
>>>> @@ -0,0 +1,19 @@
>>>> +-- { dg-do compile }
>>>> +-- { dg-options "-O1 -fmerge-all-constants" }
>>>> +
>>>> +procedure String_Merge1 is
>>>> +   procedure Process (X : String);
>>>> +   pragma Import (Ada, Process);
>>>> +begin
>>>> +   Process ("ABCD");
>>>> +end;
>>>> +
>>>> +-- We expect something like:
>>>> +
>>>> +-- .section  .rodata.str1.1,"aMS",@progbits,1
>>>> +-- .LC1:
>>>> +--     .string "ABCD"
>>>> +
>>>> +-- { dg-final { scan-assembler-times "\\.rodata\\.str" 1 } }
>>>> +-- { dg-final { scan-assembler-times "\\.string" 1 } }
>>>> +-- { dg-final { scan-assembler-times "\"ABCD\"" 1 } }
>>>
>>> FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\.string 1
>>>
>>> $ grep ABCD string_merge1.s
>>>           stringz "ABCD"
>>>
>>
>> Ah, thanks.
>>
>> Turns out there are too much variations, like mentioned stringz, and asciz, and
>> probably lots more here.
>>
>> But for the purpose of testing the optimization it should be sufficient to look for
>> ".rodata.str" in the assembler.
>>
>> So I committed the following as obvious:
> 
> unfortunately that patch is not enough and still shows a fundamental
> problem:
> 
> * The tests still FAIL on Solaris with /bin/as and Darwin:
> 
> FAIL: gcc.dg/merge-all-constants-2.c scan-assembler-not \\\\.rodata[\\n\\r]
> 
> FAIL: gnat.dg/string_merge1.adb scan-assembler-times \\\\.rodata\\\\.str 1
> FAIL: gnat.dg/string_merge2.adb scan-assembler-times \\\\.rodata\\\\.str 1
> 
>    Both targets lack string merging support, so the tests should check
>    for that.
> 
> * The merge-all-constants-2.c test doesn't FAIL on Solaris/SPARC with
>    /bin/as, although it lacks string merging support, too.  The assembler
>    output contains
> 
>          .section        ".rodata"
> 
>    so the pattern currently used to check for .rodata is too
>    restrictive.  There is assembler syntax beyond gas on x86 ;-)
> 

For the test that failed with the quotes around .rodata.  I think
instead of looking for a end of line immediately after .rodata, it
would be sufficient to make sure it does not continue with .str, so
could you please try to add something like the following to your patch?

Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
===================================================================
--- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 264888)
+++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
@@ -5,4 +5,4 @@
  const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
  const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
  
-/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
+/* { dg-final { scan-assembler-not "\\.rodata\[^.]" } } */


> The following patch fixes the former issue; tested on
> sparc-sun-solaris2.11 with as (no string merging) resp. gas (with string
> merging).  Installed on mainline.
> 
> Besides, the patch seems to have produced more fallout on Solaris: I see
> many new Go testsuite failures on Solaris 10 which probably are related,
> and Solaris bootstrap with Ada included is broken due to the extended
> usage of string merging.  I'm currently investigating what's going on
> there.
> 
> 	Rainer
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-08 11:22                     ` Rainer Orth
  2018-10-08 13:00                       ` Rainer Orth
  2018-10-08 15:42                       ` Bernd Edlinger
@ 2018-10-08 22:16                       ` Eric Botcazou
  2018-10-09 12:40                         ` Rainer Orth
  2 siblings, 1 reply; 41+ messages in thread
From: Eric Botcazou @ 2018-10-08 22:16 UTC (permalink / raw)
  To: Rainer Orth
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

> Besides, the patch seems to have produced more fallout on Solaris: I see
> many new Go testsuite failures on Solaris 10 which probably are related,
> and Solaris bootstrap with Ada included is broken due to the extended
> usage of string merging.  I'm currently investigating what's going on
> there.

I can bootstrap on SPARC/Solaris 10 and 11 (GNU as and system linker) but I 
have regressions on Solaris 11 (and not 10) under the form of SIGBUSes:

                === acats tests ===
FAIL:   c34005g
FAIL:   c35508c
FAIL:   ce3602d
FAIL:   cxa4005
FAIL:   ee3412c

FAIL: gfortran.dg/allocatable_function_5.f90   -O1  execution test
FAIL: gfortran.dg/allocatable_function_5.f90   -O2  execution test
FAIL: gfortran.dg/allocatable_function_5.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/allocatable_function_5.f90   -O3 -g  execution test
FAIL: gfortran.dg/allocatable_function_5.f90   -Os  execution test
FAIL: gfortran.dg/char_allocation_1.f90   -O1  (test for excess errors)
UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O1  compilation failed to 
produce executable
FAIL: gfortran.dg/char_allocation_1.f90   -O2  (test for excess errors)
UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O2  compilation failed to 
produce executable
FAIL: gfortran.dg/char_allocation_1.f90   -O3 -fomit-frame-pointer -funroll-
loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
produce executable
FAIL: gfortran.dg/char_allocation_1.f90   -O3 -g  (test for excess errors)
UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O3 -g  compilation failed to 
produce executable
FAIL: gfortran.dg/char_pointer_assign_3.f90   -O2  execution test
FAIL: gfortran.dg/char_pointer_assign_3.f90   -O3 -fomit-frame-pointer -
funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/char_pointer_assign_3.f90   -O3 -g  execution test
FAIL: gfortran.dg/implied_do_io_2.f90   -Os  execution test
FAIL: gfortran.dg/namelist_38.f90   -O2  execution test
FAIL: gfortran.dg/namelist_38.f90   -O3 -fomit-frame-pointer -funroll-loops -
fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/namelist_38.f90   -O3 -g  execution test
FAIL: gfortran.dg/namelist_70.f90   -O1  execution test
FAIL: gfortran.dg/namelist_70.f90   -O2  execution test
FAIL: gfortran.dg/namelist_70.f90   -O3 -fomit-frame-pointer -funroll-loops -
fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/namelist_70.f90   -O3 -g  execution test
FAIL: gfortran.dg/namelist_70.f90   -Os  execution test
FAIL: gfortran.dg/nan_6.f90   -O1  execution test
FAIL: gfortran.dg/realloc_on_assign_4.f03   -O1  execution test
FAIL: gfortran.dg/realloc_on_assign_4.f03   -O2  execution test
FAIL: gfortran.dg/realloc_on_assign_4.f03   -O3 -fomit-frame-pointer -funroll-
loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/realloc_on_assign_4.f03   -O3 -g  execution test
FAIL: gfortran.dg/realloc_on_assign_4.f03   -Os  execution test
FAIL: gfortran.dg/streamio_14.f90   -O2  execution test
FAIL: gfortran.dg/streamio_14.f90   -O3 -fomit-frame-pointer -funroll-loops -
fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/streamio_14.f90   -O3 -g  execution test
FAIL: gfortran.dg/substr_7.f90   -O  (test for excess errors)
UNRESOLVED: gfortran.dg/substr_7.f90   -O  compilation failed to produce 
executable
FAIL: gfortran.dg/widechar_2.f90   -O1  execution test
FAIL: gfortran.dg/widechar_2.f90   -Os  execution test
FAIL: gfortran.fortran-torture/execute/adjustr.f90 execution,  -Os

FAIL: go.test/test/fixedbugs/issue4562.go execution,  -O2 -g 
FAIL: go.test/test/nil.go execution,  -O2 -g

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-08 22:16                       ` Eric Botcazou
@ 2018-10-09 12:40                         ` Rainer Orth
  2018-10-09 17:19                           ` Eric Botcazou
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-09 12:40 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Eric,

>> Besides, the patch seems to have produced more fallout on Solaris: I see
>> many new Go testsuite failures on Solaris 10 which probably are related,
>> and Solaris bootstrap with Ada included is broken due to the extended
>> usage of string merging.  I'm currently investigating what's going on
>> there.
>
> I can bootstrap on SPARC/Solaris 10 and 11 (GNU as and system linker) but I 
> have regressions on Solaris 11 (and not 10) under the form of SIGBUSes:

I could bootstrap on Solaris/SPARC with Bernd's patch, too.  The
bootstrap failure only occured on Solaris 11/x86 with gas.

Which version exactly (pkg list entire) of Solaris 11 are you running?
I'm using gas 2.31 and /bin/ld on Solaris 11.4 resp. 11.5 Beta, where
Bernd's patch in PR bootstrap/87551 fixed the remaining regressions.

>                 === acats tests ===
> FAIL:   c34005g
> FAIL:   c35508c
> FAIL:   ce3602d
> FAIL:   cxa4005
> FAIL:   ee3412c
>
> FAIL: gfortran.dg/allocatable_function_5.f90   -O1  execution test
> FAIL: gfortran.dg/allocatable_function_5.f90   -O2  execution test
> FAIL: gfortran.dg/allocatable_function_5.f90   -O3 -fomit-frame-pointer -
> funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/allocatable_function_5.f90   -O3 -g  execution test
> FAIL: gfortran.dg/allocatable_function_5.f90   -Os  execution test
> FAIL: gfortran.dg/char_allocation_1.f90   -O1  (test for excess errors)
> UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O1  compilation failed to 
> produce executable
> FAIL: gfortran.dg/char_allocation_1.f90   -O2  (test for excess errors)
> UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O2  compilation failed to 
> produce executable
> FAIL: gfortran.dg/char_allocation_1.f90   -O3 -fomit-frame-pointer -funroll-
> loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O3 -fomit-frame-pointer -
> funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to 
> produce executable
> FAIL: gfortran.dg/char_allocation_1.f90   -O3 -g  (test for excess errors)
> UNRESOLVED: gfortran.dg/char_allocation_1.f90   -O3 -g  compilation failed to 
> produce executable

Those are fixed by the patch in PR bootstrap/87551...

> FAIL: gfortran.dg/char_pointer_assign_3.f90   -O2  execution test
> FAIL: gfortran.dg/char_pointer_assign_3.f90   -O3 -fomit-frame-pointer -
> funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/char_pointer_assign_3.f90   -O3 -g  execution test
> FAIL: gfortran.dg/implied_do_io_2.f90   -Os  execution test
> FAIL: gfortran.dg/namelist_38.f90   -O2  execution test
> FAIL: gfortran.dg/namelist_38.f90   -O3 -fomit-frame-pointer -funroll-loops -
> fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/namelist_38.f90   -O3 -g  execution test
> FAIL: gfortran.dg/namelist_70.f90   -O1  execution test
> FAIL: gfortran.dg/namelist_70.f90   -O2  execution test
> FAIL: gfortran.dg/namelist_70.f90   -O3 -fomit-frame-pointer -funroll-loops -
> fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/namelist_70.f90   -O3 -g  execution test
> FAIL: gfortran.dg/namelist_70.f90   -Os  execution test
> FAIL: gfortran.dg/nan_6.f90   -O1  execution test
> FAIL: gfortran.dg/realloc_on_assign_4.f03   -O1  execution test
> FAIL: gfortran.dg/realloc_on_assign_4.f03   -O2  execution test
> FAIL: gfortran.dg/realloc_on_assign_4.f03   -O3 -fomit-frame-pointer -funroll-
> loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/realloc_on_assign_4.f03   -O3 -g  execution test
> FAIL: gfortran.dg/realloc_on_assign_4.f03   -Os  execution test
> FAIL: gfortran.dg/streamio_14.f90   -O2  execution test
> FAIL: gfortran.dg/streamio_14.f90   -O3 -fomit-frame-pointer -funroll-loops -
> fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/streamio_14.f90   -O3 -g  execution test

> FAIL: gfortran.dg/substr_7.f90   -O  (test for excess errors)
> UNRESOLVED: gfortran.dg/substr_7.f90   -O  compilation failed to produce 
> executable

... as is this one.

> FAIL: gfortran.dg/widechar_2.f90   -O1  execution test
> FAIL: gfortran.dg/widechar_2.f90   -Os  execution test
> FAIL: gfortran.fortran-torture/execute/adjustr.f90 execution,  -Os

> FAIL: go.test/test/fixedbugs/issue4562.go execution,  -O2 -g 
> FAIL: go.test/test/nil.go execution,  -O2 -g

These two were present before Bernd's patch.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-03 16:35                 ` Jeff Law
@ 2018-10-09 13:07                   ` Bernd Edlinger
  2018-10-10 18:25                     ` Jeff Law
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-10-09 13:07 UTC (permalink / raw)
  To: Jeff Law, Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek, Rainer Orth

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

On 10/03/18 18:31, Jeff Law wrote:
>> -      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
>> -      && TREE_STRING_LENGTH (decl) >= len)
>> +      && (len = int_size_in_bytes (TREE_TYPE (decl))) >= 0
>> +      && TREE_STRING_LENGTH (decl) == len)
> Not sure why you want to test for >= 0 here.  > 0 seems sufficient,
> though I guess there's no harm in the = 0 case.
> 

Aehm, cough...

Sorry Jeff, I need to change that back.  It turns out that
completely empty strings don't work right, because of this
check in output_constant:

output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
                  bool reverse, bool merge_strings)
{
   enum tree_code code;
   unsigned HOST_WIDE_INT thissize;
   rtx cst;

   if (size == 0 || flag_syntax_only)
     return size;

So while my intention was to add a null-termination for all strings, including empty ones,
this does not work for empty strings, which was diagnosed by the solaris assembler/linker.

However since those empty strings do not use any space, there is no improvement
by merging them in the first place.


Rainer bootstrapped the attached patch successfully.
Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-string-merging-fix.diff --]
[-- Type: text/x-patch; name="patch-string-merging-fix.diff", Size: 719 bytes --]

2018-10-09  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* varasm.c (mergeable_string_section): Don't try to move zero-length
	strings to the merge section.

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 264887)
+++ gcc/varasm.c	(working copy)
@@ -804,7 +804,7 @@ mergeable_string_section (tree decl ATTRIBUTE_UNUS
       && TREE_CODE (decl) == STRING_CST
       && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
       && align <= 256
-      && (len = int_size_in_bytes (TREE_TYPE (decl))) >= 0
+      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
       && TREE_STRING_LENGTH (decl) == len)
     {
       scalar_int_mode mode;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-08 15:42                       ` Bernd Edlinger
@ 2018-10-09 13:08                         ` Rainer Orth
  2018-10-22 18:26                           ` Bernd Edlinger
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-09 13:08 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Bernd,

>> * The merge-all-constants-2.c test doesn't FAIL on Solaris/SPARC with
>>    /bin/as, although it lacks string merging support, too.  The assembler
>>    output contains
>> 
>>          .section        ".rodata"
>> 
>>    so the pattern currently used to check for .rodata is too
>>    restrictive.  There is assembler syntax beyond gas on x86 ;-)
>> 
>
> For the test that failed with the quotes around .rodata.  I think
> instead of looking for a end of line immediately after .rodata, it
> would be sufficient to make sure it does not continue with .str, so
> could you please try to add something like the following to your patch?
>
> Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 264888)
> +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
> @@ -5,4 +5,4 @@
>   const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
>   const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
>   
> -/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
> +/* { dg-final { scan-assembler-not "\\.rodata\[^.]" } } */

to do this; I've temporarily disabled the string_merging requirement in
the test and ran it on sparc-sun-solaris2.11:

* With as (no string merging), there's

        .section        ".rodata"

  in the output and I get the expected

FAIL: gcc.dg/merge-all-constants-2.c scan-assembler-not \\.rodata[^.]

* With gas however (string merging supported), the output has the usual

        .section        .rodata.str1.8,"aMS",@progbits,1

  and the test PASSes.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-09 12:40                         ` Rainer Orth
@ 2018-10-09 17:19                           ` Eric Botcazou
  2018-10-10 12:23                             ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Botcazou @ 2018-10-09 17:19 UTC (permalink / raw)
  To: Rainer Orth
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

> Which version exactly (pkg list entire) of Solaris 11 are you running?
> I'm using gas 2.31 and /bin/ld on Solaris 11.4 resp. 11.5 Beta, where
> Bernd's patch in PR bootstrap/87551 fixed the remaining regressions.

Solaris 11.3 with Gas 2.30.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-09 17:19                           ` Eric Botcazou
@ 2018-10-10 12:23                             ` Rainer Orth
  2018-10-10 15:05                               ` Eric Botcazou
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-10 12:23 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Eric,

>> Which version exactly (pkg list entire) of Solaris 11 are you running?
>> I'm using gas 2.31 and /bin/ld on Solaris 11.4 resp. 11.5 Beta, where
>> Bernd's patch in PR bootstrap/87551 fixed the remaining regressions.
>
> Solaris 11.3 with Gas 2.30.

I could now reproduce the regressions on Solaris 11.3 SRU 35.6 (the
latest and last 11.3 update), e.g.

+FAIL: gfortran.dg/allocatable_function_5.f90   -O1  execution test


Program received signal SIGBUS: Access to an undefined portion of a memory object.

Backtrace for this error:

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00010fe0 in foo (_carg=12, carg=..., .__result=0x215d4 <slen.0>, 
    __result=<optimized out>)
    at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gfortran.dg/allocatable_function_5.f90:41
41          res = carg(1:3)
(gdb) where
#0  0x00010fe0 in foo (_carg=12, carg=..., .__result=0x215d4 <slen.0>, 
    __result=<optimized out>)
    at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gfortran.dg/allocatable_function_5.f90:41
#1  MAIN__ ()
    at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gfortran.dg/allocatable_function_5.f90:22

1: x/i $pc
=> 0x10fe0 <MAIN__+24>: lduh  [ %g1 + 0x3a9 ], %g1
(gdb) p/x $g1
$1 = 0x10800

(gdb) x/7i MAIN__
   0x10fc8 <MAIN__>:    save  %sp, -104, %sp
   0x10fcc <MAIN__+4>:  call  0x213c4 <malloc@got.plt>
   0x10fd0 <MAIN__+8>:  mov  3, %o0
   0x10fd4 <MAIN__+12>: mov  %o0, %i5
   0x10fd8 <MAIN__+16>: sethi  %hi(0x10800), %g1
   0x10fdc <MAIN__+20>: or  %g1, 0x3a9, %g2     ! 0x10ba9
=> 0x10fe0 <MAIN__+24>: lduh  [ %g1 + 0x3a9 ], %g1

(gdb) x/s 0x10800+0x3a9
0x10ba9:        'foo calling \000'

Looking at the .rodata.str1.4 section, I see

$ objdump -s -j .rodata.str1.8 allocatable_function_5.exe

allocatable_function_5.exe:     file format elf32-sparc-sol2

Contents of section .rodata.str1.8:
 10ba8 6d666f6f 2063616c 6c696e67 20000000  mfoo calling ...
 10bb8 666f6f00 00000000 6c687300 00000000  foo.....lhs.....

This string table compression can be disabled with ld -z nocomprstrtab:

       -z nocompstrtab

           Disables the compression of ELF string  tables,  and  comment  sec-
           tions. By default, string compression is applied to SHT_STRTAB sec-
           tions, to SHT_PROGBITS  sections  that  have  their  SHF_MERGE  and
           SHF_STRINGS section flags set, and to comment sections.

This isn't necessary on Solaris 11.4, and Solaris 11.3/x86 isn't
affected as well.  I'm still determining what the best course of action
is: disable string merging support before Solaris 11.4 or enable the
workaround above instead.

sparc-sun-solaris2.11 and i386-pc-solaris2.11 bootstraps with
LD_OPTIONS='-z nocompstrtab' are currently running to check if this
fixes all regressions.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-10 12:23                             ` Rainer Orth
@ 2018-10-10 15:05                               ` Eric Botcazou
  2018-10-10 15:14                                 ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Botcazou @ 2018-10-10 15:05 UTC (permalink / raw)
  To: Rainer Orth
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

> Looking at the .rodata.str1.4 section, I see
> 
> $ objdump -s -j .rodata.str1.8 allocatable_function_5.exe
> 
> allocatable_function_5.exe:     file format elf32-sparc-sol2
> 
> Contents of section .rodata.str1.8:
>  10ba8 6d666f6f 2063616c 6c696e67 20000000  mfoo calling ...
>  10bb8 666f6f00 00000000 6c687300 00000000  foo.....lhs.....
> 
> This string table compression can be disabled with ld -z nocomprstrtab:
> 
>        -z nocompstrtab
> 
>            Disables the compression of ELF string  tables,  and  comment 
> sec- tions. By default, string compression is applied to SHT_STRTAB sec-
> tions, to SHT_PROGBITS  sections  that  have  their  SHF_MERGE  and
> SHF_STRINGS section flags set, and to comment sections.

Thanks for tracking this down!

> This isn't necessary on Solaris 11.4, and Solaris 11.3/x86 isn't
> affected as well.  I'm still determining what the best course of action
> is: disable string merging support before Solaris 11.4 or enable the
> workaround above instead.

Out of curiosity, why isn't it necessary on Solaris 11.4?  Is string 
compression disabled or does it respect alignment on Solaris 11.4?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-10 15:05                               ` Eric Botcazou
@ 2018-10-10 15:14                                 ` Rainer Orth
  2018-10-23  8:36                                   ` Eric Botcazou
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-10 15:14 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Eric,

>> This isn't necessary on Solaris 11.4, and Solaris 11.3/x86 isn't
>> affected as well.  I'm still determining what the best course of action
>> is: disable string merging support before Solaris 11.4 or enable the
>> workaround above instead.
>
> Out of curiosity, why isn't it necessary on Solaris 11.4?  Is string 
> compression disabled or does it respect alignment on Solaris 11.4?

it's not disabled (I had to disable it when testing an a /bin/as version
with full SHF_MERGE/SHF_STRINGS suppurt recently), so I suspect the
latter.  In S11.4 .rodata and .rodata.str1.8 are merged, with the
alignment of the larger of the two on the output section.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-09 13:07                   ` Bernd Edlinger
@ 2018-10-10 18:25                     ` Jeff Law
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff Law @ 2018-10-10 18:25 UTC (permalink / raw)
  To: Bernd Edlinger, Olivier Hainque
  Cc: gcc-patches, Richard Biener, Eric Botcazou, Arnaud Charlet,
	Jakub Jelinek, Rainer Orth

On 10/9/18 6:45 AM, Bernd Edlinger wrote:
> On 10/03/18 18:31, Jeff Law wrote:
>>> -      && (len = int_size_in_bytes (TREE_TYPE (decl))) > 0
>>> -      && TREE_STRING_LENGTH (decl) >= len)
>>> +      && (len = int_size_in_bytes (TREE_TYPE (decl))) >= 0
>>> +      && TREE_STRING_LENGTH (decl) == len)
>> Not sure why you want to test for >= 0 here.  > 0 seems sufficient,
>> though I guess there's no harm in the = 0 case.
>>
> 
> Aehm, cough...
> 
> Sorry Jeff, I need to change that back.  It turns out that
> completely empty strings don't work right, because of this
> check in output_constant:
> 
> output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align,
>                   bool reverse, bool merge_strings)
> {
>    enum tree_code code;
>    unsigned HOST_WIDE_INT thissize;
>    rtx cst;
> 
>    if (size == 0 || flag_syntax_only)
>      return size;
> 
> So while my intention was to add a null-termination for all strings, including empty ones,
> this does not work for empty strings, which was diagnosed by the solaris assembler/linker.
> 
> However since those empty strings do not use any space, there is no improvement
> by merging them in the first place.
> 
> 
> Rainer bootstrapped the attached patch successfully.
> Is it OK for trunk?
OK.  Funny I almost called out the zero size stuff as being pointless to
bother handling in the first place.  Oh well.

Jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-09 13:08                         ` Rainer Orth
@ 2018-10-22 18:26                           ` Bernd Edlinger
  2018-10-22 18:29                             ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Bernd Edlinger @ 2018-10-22 18:26 UTC (permalink / raw)
  To: Rainer Orth
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Rainer,

On 10/9/18 3:05 PM, Rainer Orth wrote:
> Hi Bernd,
> 
>>> * The merge-all-constants-2.c test doesn't FAIL on Solaris/SPARC with
>>>     /bin/as, although it lacks string merging support, too.  The assembler
>>>     output contains
>>>
>>>           .section        ".rodata"
>>>
>>>     so the pattern currently used to check for .rodata is too
>>>     restrictive.  There is assembler syntax beyond gas on x86 ;-)
>>>
>>
>> For the test that failed with the quotes around .rodata.  I think
>> instead of looking for a end of line immediately after .rodata, it
>> would be sufficient to make sure it does not continue with .str, so
>> could you please try to add something like the following to your patch?
>>
>> Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 264888)
>> +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
>> @@ -5,4 +5,4 @@
>>    const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
>>    const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
>>    
>> -/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
>> +/* { dg-final { scan-assembler-not "\\.rodata\[^.]" } } */
> 
> to do this; I've temporarily disabled the string_merging requirement in
> the test and ran it on sparc-sun-solaris2.11:
> 
> * With as (no string merging), there's
> 
>          .section        ".rodata"
> 
>    in the output and I get the expected
> 
> FAIL: gcc.dg/merge-all-constants-2.c scan-assembler-not \\.rodata[^.]
> 
> * With gas however (string merging supported), the output has the usual
> 
>          .section        .rodata.str1.8,"aMS",@progbits,1
> 
>    and the test PASSes.
> 

which is okay, right?


Bernd.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-22 18:26                           ` Bernd Edlinger
@ 2018-10-22 18:29                             ` Rainer Orth
  0 siblings, 0 replies; 41+ messages in thread
From: Rainer Orth @ 2018-10-22 18:29 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Andreas Schwab, Olivier Hainque, gcc-patches, Richard Biener,
	Eric Botcazou, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Bernd,

> On 10/9/18 3:05 PM, Rainer Orth wrote:
>> Hi Bernd,
>> 
>>>> * The merge-all-constants-2.c test doesn't FAIL on Solaris/SPARC with
>>>>     /bin/as, although it lacks string merging support, too.  The assembler
>>>>     output contains
>>>>
>>>>           .section        ".rodata"
>>>>
>>>>     so the pattern currently used to check for .rodata is too
>>>>     restrictive.  There is assembler syntax beyond gas on x86 ;-)
>>>>
>>>
>>> For the test that failed with the quotes around .rodata.  I think
>>> instead of looking for a end of line immediately after .rodata, it
>>> would be sufficient to make sure it does not continue with .str, so
>>> could you please try to add something like the following to your patch?
>>>
>>> Index: gcc/testsuite/gcc.dg/merge-all-constants-2.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/merge-all-constants-2.c	(revision 264888)
>>> +++ gcc/testsuite/gcc.dg/merge-all-constants-2.c	(working copy)
>>> @@ -5,4 +5,4 @@
>>>    const char str2[37] = "0123456789abcdefghijklmnopqrstuvwxyz";
>>>    const char str3[10] = "0123456789abcdefghijklmnopqrstuvwxyz";
>>>    
>>> -/* { dg-final { scan-assembler-not "\\.rodata\[\n\r\]" } } */
>>> +/* { dg-final { scan-assembler-not "\\.rodata\[^.]" } } */
>> 
>> to do this; I've temporarily disabled the string_merging requirement in
>> the test and ran it on sparc-sun-solaris2.11:
>> 
>> * With as (no string merging), there's
>> 
>>          .section        ".rodata"
>> 
>>    in the output and I get the expected
>> 
>> FAIL: gcc.dg/merge-all-constants-2.c scan-assembler-not \\.rodata[^.]
>> 
>> * With gas however (string merging supported), the output has the usual
>> 
>>          .section        .rodata.str1.8,"aMS",@progbits,1
>> 
>>    and the test PASSes.
>> 
>
> which is okay, right?

yes, exactly as expected.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-10 15:14                                 ` Rainer Orth
@ 2018-10-23  8:36                                   ` Eric Botcazou
  2018-10-23  8:37                                     ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Botcazou @ 2018-10-23  8:36 UTC (permalink / raw)
  To: Rainer Orth
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

> it's not disabled (I had to disable it when testing an a /bin/as version
> with full SHF_MERGE/SHF_STRINGS suppurt recently), so I suspect the
> latter.  In S11.4 .rodata and .rodata.str1.8 are merged, with the
> alignment of the larger of the two on the output section.

OK.  The regressions are still there on Solaris 11.3 for me.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-23  8:36                                   ` Eric Botcazou
@ 2018-10-23  8:37                                     ` Rainer Orth
  2018-10-23  8:53                                       ` Eric Botcazou
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-23  8:37 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

Hi Eric,

>> it's not disabled (I had to disable it when testing an a /bin/as version
>> with full SHF_MERGE/SHF_STRINGS suppurt recently), so I suspect the
>> latter.  In S11.4 .rodata and .rodata.str1.8 are merged, with the
>> alignment of the larger of the two on the output section.
>
> OK.  The regressions are still there on Solaris 11.3 for me.

I know: a patch to fix this is almost ready, just needs a final round of
testing.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-23  8:37                                     ` Rainer Orth
@ 2018-10-23  8:53                                       ` Eric Botcazou
  2018-10-24 10:03                                         ` Rainer Orth
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Botcazou @ 2018-10-23  8:53 UTC (permalink / raw)
  To: Rainer Orth
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

> I know: a patch to fix this is almost ready, just needs a final round of
> testing.

OK, thanks for the information.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-23  8:53                                       ` Eric Botcazou
@ 2018-10-24 10:03                                         ` Rainer Orth
  2018-10-24 11:08                                           ` Richard Biener
  0 siblings, 1 reply; 41+ messages in thread
From: Rainer Orth @ 2018-10-24 10:03 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Bernd Edlinger, Andreas Schwab, Olivier Hainque,
	Richard Biener, Arnaud Charlet, Jakub Jelinek, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 2968 bytes --]

Hi Eric,

>> I know: a patch to fix this is almost ready, just needs a final round of
>> testing.
>
> OK, thanks for the information.

here's what I've got.  It took me two false starts, unfortunately:

* Initially, I just tried linking with LD_OPTIONS='-z nocompstrtab':

       -z nocompstrtab

           Disables the compression of ELF string  tables,  and  comment  sec-
           tions. By default, string compression is applied to SHT_STRTAB sec-
           tions, to SHT_PROGBITS  sections  that  have  their  SHF_MERGE  and
           SHF_STRINGS section flags set, and to comment sections.

  While that worked, this approach has several disadvantages: if the
  objects are somehow linked without that option, the resulting
  executables will suddenly and mysteriously start to die with SIGBUS.
  It would be far better if the objects themselves are save to use in
  whatever way without relying on a special non-default linker option.
  Besides, it also disable string table compression, so has a far larger
  impact than necessary.

* Next, I tried to disable HAVE_GAS_SHF_MERGE completely before Solaris
  11.4.  This also fixed the regressions you observed.  Unfortunately,
  while this is what's used with as/ld since the Solaris assembler
  doesn't fully support setting SHF_MERGE/SHF_STRINGS yet, when gas is
  in use, hundreds of Go test start to FAIL on both sparc and x86:

runtime stack:
fatal error: DW_FORM_strp out of range in .debug_info at 166762
panic during panic

  The error is from libbacktrace/dwarf.c (read_attribute) which is
  linked into libgo.so.13.  While readelf --debug-dump=info shows no
  problem, dwarfdump -a complains loudly.

  I haven't yet investigated what exactly is happening here.

* Instead I tried an approach that one of the Solaris linker engineers
  suggested which both avoids the need for special linker options and a
  large part of the impact: it just disables string merging for sections
  with alignment > 1 with older (pre-Solaris 11.4) versions of Solaris
  ld.  This one worked just fine and seems the preferred approach.
  While the Solaris 10/SPARC ld didn't show the regressions seen on
  11.3, I've decided to apply the workaround there, too, to avoid
  running into problems if those objects are later relinked on Solaris 11.

Bootstrapped without regressions on i386-pc-solaris2.1[01] and
sparc-sun-solaris2.1[01] (both Solaris 11.3 and 11.4 each) with as/ld,
gas/ld, and x86_64-pc-linux-gnu.

Ok for mainline?

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-10-19  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* configure.ac (gcc_cv_ld_aligned_shf_merge): New test.
	* configure: Regenerate.
	* config.in: Regenerate.
	* varasm.c (mergeable_string_section): Use readonly_data_section
	if linker doesn't support SHF_MERGE with alignment > 8.
	(mergeable_constant_section): Likewise.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: sol2sparc-no-aligned-string-merging.patch --]
[-- Type: text/x-patch, Size: 2162 bytes --]

# HG changeset patch
# Parent  01647481afc4b26a304e22ee00c42bc5a9bbafe3
Disable string merging with alignment > 1 before Solaris 11.4/SPARC

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3050,6 +3050,28 @@ AC_DEFINE_UNQUOTED(HAVE_GAS_SHF_MERGE,
   [`if test $gcc_cv_as_shf_merge = yes; then echo 1; else echo 0; fi`],
 [Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.])
 
+gcc_cv_ld_aligned_shf_merge=yes
+case "$target" in
+  # While Solaris 10/SPARC ld isn't affected, disable to avoid problems
+  # relinking on Solaris 11 < 11.4.
+  sparc*-*-solaris2.10*)
+    if test x"$gnu_ld" = xno; then
+      gcc_cv_ld_aligned_shf_merge=no
+    fi
+    ;;
+  # SHF_MERGE support is broken in Solaris ld up to Solaris 11.3/SPARC for
+  # alignment > 1.
+  sparc*-*-solaris2.11*)
+    if test x"$gnu_ld" = xno \
+       && test "$ld_vers_major" -lt 2 && test "$ld_vers_minor" -lt 3159; then
+      gcc_cv_ld_aligned_shf_merge=no
+    fi
+    ;;
+esac
+AC_DEFINE_UNQUOTED(HAVE_LD_ALIGNED_SHF_MERGE,
+  [`if test $gcc_cv_ld_aligned_shf_merge = yes; then echo 1; else echo 0; fi`],
+[Define 0/1 if your linker supports the SHF_MERGE flag with section alignment > 1.])
+
 gcc_GAS_CHECK_FEATURE([stabs directive], gcc_cv_as_stabs_directive, ,,
 [.stabs "gcc2_compiled.",60,0,0,0],,
 [AC_DEFINE(HAVE_AS_STABS_DIRECTIVE, 1,
diff --git a/gcc/varasm.c b/gcc/varasm.c
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -823,6 +823,9 @@ mergeable_string_section (tree decl ATTR
 	  if (align < modesize)
 	    align = modesize;
 
+	  if (!HAVE_LD_ALIGNED_SHF_MERGE && align > 8)
+	    return readonly_data_section;
+
 	  str = TREE_STRING_POINTER (decl);
 	  unit = GET_MODE_SIZE (mode);
 
@@ -861,7 +864,8 @@ mergeable_constant_section (machine_mode
       && known_le (GET_MODE_BITSIZE (mode), align)
       && align >= 8
       && align <= 256
-      && (align & (align - 1)) == 0)
+      && (align & (align - 1)) == 0
+      && (HAVE_LD_ALIGNED_SHF_MERGE ? 1 : align == 8))
     {
       const char *prefix = function_mergeable_rodata_prefix ();
       char *name = (char *) alloca (strlen (prefix) + 30);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCHv2] Handle not explicitly zero terminated strings in merge sections
  2018-10-24 10:03                                         ` Rainer Orth
@ 2018-10-24 11:08                                           ` Richard Biener
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Biener @ 2018-10-24 11:08 UTC (permalink / raw)
  To: Rainer Orth
  Cc: Eric Botcazou, gcc-patches, Bernd Edlinger, Andreas Schwab,
	Olivier Hainque, Arnaud Charlet, Jakub Jelinek, Jeff Law

On Wed, 24 Oct 2018, Rainer Orth wrote:

> Hi Eric,
> 
> >> I know: a patch to fix this is almost ready, just needs a final round of
> >> testing.
> >
> > OK, thanks for the information.
> 
> here's what I've got.  It took me two false starts, unfortunately:
> 
> * Initially, I just tried linking with LD_OPTIONS='-z nocompstrtab':
> 
>        -z nocompstrtab
> 
>            Disables the compression of ELF string  tables,  and  comment  sec-
>            tions. By default, string compression is applied to SHT_STRTAB sec-
>            tions, to SHT_PROGBITS  sections  that  have  their  SHF_MERGE  and
>            SHF_STRINGS section flags set, and to comment sections.
> 
>   While that worked, this approach has several disadvantages: if the
>   objects are somehow linked without that option, the resulting
>   executables will suddenly and mysteriously start to die with SIGBUS.
>   It would be far better if the objects themselves are save to use in
>   whatever way without relying on a special non-default linker option.
>   Besides, it also disable string table compression, so has a far larger
>   impact than necessary.
> 
> * Next, I tried to disable HAVE_GAS_SHF_MERGE completely before Solaris
>   11.4.  This also fixed the regressions you observed.  Unfortunately,
>   while this is what's used with as/ld since the Solaris assembler
>   doesn't fully support setting SHF_MERGE/SHF_STRINGS yet, when gas is
>   in use, hundreds of Go test start to FAIL on both sparc and x86:
> 
> runtime stack:
> fatal error: DW_FORM_strp out of range in .debug_info at 166762
> panic during panic
> 
>   The error is from libbacktrace/dwarf.c (read_attribute) which is
>   linked into libgo.so.13.  While readelf --debug-dump=info shows no
>   problem, dwarfdump -a complains loudly.
> 
>   I haven't yet investigated what exactly is happening here.
> 
> * Instead I tried an approach that one of the Solaris linker engineers
>   suggested which both avoids the need for special linker options and a
>   large part of the impact: it just disables string merging for sections
>   with alignment > 1 with older (pre-Solaris 11.4) versions of Solaris
>   ld.  This one worked just fine and seems the preferred approach.
>   While the Solaris 10/SPARC ld didn't show the regressions seen on
>   11.3, I've decided to apply the workaround there, too, to avoid
>   running into problems if those objects are later relinked on Solaris 11.
> 
> Bootstrapped without regressions on i386-pc-solaris2.1[01] and
> sparc-sun-solaris2.1[01] (both Solaris 11.3 and 11.4 each) with as/ld,
> gas/ld, and x86_64-pc-linux-gnu.
> 
> Ok for mainline?

LGTM.

Richard.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-10-24  9:38 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 12:07 [PATCH] [Ada] Make middle-end string literals NUL terminated Bernd Edlinger
2018-08-03 16:50 ` Olivier Hainque
2018-08-03 18:38   ` Bernd Edlinger
2018-08-06 18:17     ` Olivier Hainque
2018-08-07 13:08       ` Bernd Edlinger
2018-08-07 16:41         ` Olivier Hainque
2018-08-07 17:10           ` Bernd Edlinger
2018-08-09 13:08             ` Olivier Hainque
2018-08-04 11:33   ` Bernd Edlinger
2018-08-04 15:43     ` [PATCH] Handle not explicitly zero terminated strings in merge sections Bernd Edlinger
2018-08-04 15:44       ` Bernd Edlinger
2018-08-04 16:06         ` Bernd Edlinger
2018-08-07 12:56           ` Bernd Edlinger
2018-08-17 17:35             ` Bernd Edlinger
2018-09-14 19:02               ` [PATCHv2] " Bernd Edlinger
2018-09-14 19:06                 ` Jakub Jelinek
2018-09-14 19:31                   ` Bernd Edlinger
2018-09-21 11:59                     ` Bernd Edlinger
2018-09-21 12:24                       ` Jakub Jelinek
2018-10-03 16:35                 ` Jeff Law
2018-10-09 13:07                   ` Bernd Edlinger
2018-10-10 18:25                     ` Jeff Law
2018-10-05 18:54                 ` Andreas Schwab
2018-10-05 20:00                   ` Bernd Edlinger
2018-10-08 11:22                     ` Rainer Orth
2018-10-08 13:00                       ` Rainer Orth
2018-10-08 15:42                       ` Bernd Edlinger
2018-10-09 13:08                         ` Rainer Orth
2018-10-22 18:26                           ` Bernd Edlinger
2018-10-22 18:29                             ` Rainer Orth
2018-10-08 22:16                       ` Eric Botcazou
2018-10-09 12:40                         ` Rainer Orth
2018-10-09 17:19                           ` Eric Botcazou
2018-10-10 12:23                             ` Rainer Orth
2018-10-10 15:05                               ` Eric Botcazou
2018-10-10 15:14                                 ` Rainer Orth
2018-10-23  8:36                                   ` Eric Botcazou
2018-10-23  8:37                                     ` Rainer Orth
2018-10-23  8:53                                       ` Eric Botcazou
2018-10-24 10:03                                         ` Rainer Orth
2018-10-24 11:08                                           ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).