public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* RFC: Supporting quoted symbol/label names
@ 2015-08-18 17:36 Nick Clifton
  2015-08-19  0:24 ` Matt Rice
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nick Clifton @ 2015-08-18 17:36 UTC (permalink / raw)
  To: binutils

[-- Attachment #1: Type: text/plain, Size: 1857 bytes --]

Hi Guys,

  I have been looking at PR 18581, which complains about the ARM
  assembler not accepting function names containing a dash:

https://sourceware.org/bugzilla/show_bug.cgi?id=18581

  Initially I was going to reject the bug as invalid, since normally
  symbol names do not contain dashes, but two things changed my mind -
  the symbols were being provided inside double quotes, and technically
  there is nothing in any file format standard to forbid such names.
  (It also helped that this output was being produced by LLVM, so this
  is not just a theoretical problem).

  Although the PR talks about the ARM port, the problem is generic, and
  in fact that filer even includes an x86 test case.  So I have been
  looking at a solution.  It turns out however that there are some
  pretty deep assumptions in GAS about how symbol names can be extracted
  from the input stream, and changing them all can be quite daunting.
  Not one to be stymied however I gave it a go and the result is the
  attached patch.

  The solution I have chosen is to modify the get_symbol_end() function
  so that it will allow any text enclosed between double quotes.  This
  is the function that is used in most places in GAS to read a symbol,
  label or operand name, but it does have one drawback: the function is
  expected to return the character that terminated the symbol name -
  typically a colon, comma, newline or space.  But in the case of a
  quoted string, this character will be a double quote.  Much of the
  code that calls get_symbol_end is not expecting this, so I have had
  to do a lot of hacking to handle this eventuality.

  I have tested the patch on 127 different toolchains with no
  regressions, so I think that it probably does work.  But what do
  people think ?  Is supporting "double quoted" symbol names a good idea ?

Cheers
  Nick


[-- Attachment #2: quoted-symbol-names.patch.xz --]
[-- Type: application/x-xz, Size: 24872 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: Supporting quoted symbol/label names
  2015-08-18 17:36 RFC: Supporting quoted symbol/label names Nick Clifton
@ 2015-08-19  0:24 ` Matt Rice
  2015-08-19  8:46 ` Jan Beulich
  2015-08-21 15:47 ` Nick Clifton
  2 siblings, 0 replies; 6+ messages in thread
From: Matt Rice @ 2015-08-19  0:24 UTC (permalink / raw)
  To: Nick Clifton; +Cc: Binutils

On Tue, Aug 18, 2015 at 10:36 AM, Nick Clifton <nickc@redhat.com> wrote:
> Hi Guys,
>
>   I have been looking at PR 18581, which complains about the ARM
>   assembler not accepting function names containing a dash:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=18581
>
>   Initially I was going to reject the bug as invalid, since normally
>   symbol names do not contain dashes, but two things changed my mind -
>   the symbols were being provided inside double quotes, and technically
>   there is nothing in any file format standard to forbid such names.
>   (It also helped that this output was being produced by LLVM, so this
>   is not just a theoretical problem).
>
>   Although the PR talks about the ARM port, the problem is generic, and
>   in fact that filer even includes an x86 test case.  So I have been
>   looking at a solution.  It turns out however that there are some
>   pretty deep assumptions in GAS about how symbol names can be extracted
>   from the input stream, and changing them all can be quite daunting.
>   Not one to be stymied however I gave it a go and the result is the
>   attached patch.
>
>   The solution I have chosen is to modify the get_symbol_end() function
>   so that it will allow any text enclosed between double quotes.  This
>   is the function that is used in most places in GAS to read a symbol,
>   label or operand name, but it does have one drawback: the function is
>   expected to return the character that terminated the symbol name -
>   typically a colon, comma, newline or space.  But in the case of a
>   quoted string, this character will be a double quote.  Much of the
>   code that calls get_symbol_end is not expecting this, so I have had
>   to do a lot of hacking to handle this eventuality.
>
>   I have tested the patch on 127 different toolchains with no
>   regressions, so I think that it probably does work.  But what do
>   people think ?  Is supporting "double quoted" symbol names a good idea ?

From an end user perspective I would have found this useful when I was
working on (and still would if I were to try again), the objective-c
mangling format...

The current mechanism for mangling in objective-c suffers from some
limitations, chiefly it cannot be demangled unambiguously
given a selector of:
    +[Class(Category) foo:bar:]
or -[Class(Category) foo:bar:]
it turns the first plus and minus character into _c_ and _i_ respectively,
followed by Class_Category_
so _i_Class_Category_
where we run into problems is: it then turns the colon's into _'s
so we end up with _i_Class_Category_foo_bar_

which works fine until you get methods like +[Class _foo:bar:] which
contain underscores.

being able to quote them and use the full +[](): characters in the
symbol would be the ideal way to do this.  And I believe this is what
apple does in their symbol naming, so it wouldn't surprise me if this
was the reason behind this llvm behaviour.

there are some characters gas accepts we can use $ among them, but i
wasn't exactly convinced it was worth the effort of attempting to
switch mangling formats and introducing a 3rd mangling format besides
the gnu and the apple ones.

I believe gdb already supports the apple mangling format along side
the gnu one, even if it is untriggered on gnu targets

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: Supporting quoted symbol/label names
  2015-08-18 17:36 RFC: Supporting quoted symbol/label names Nick Clifton
  2015-08-19  0:24 ` Matt Rice
@ 2015-08-19  8:46 ` Jan Beulich
  2015-08-21 15:23   ` Nick Clifton
  2015-08-21 15:47 ` Nick Clifton
  2 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2015-08-19  8:46 UTC (permalink / raw)
  To: Nick Clifton; +Cc: binutils

>>> On 18.08.15 at 19:36, <nickc@redhat.com> wrote:
>   I have tested the patch on 127 different toolchains with no
>   regressions, so I think that it probably does work.  But what do
>   people think ?  Is supporting "double quoted" symbol names a good idea ?

To be honest I never understood why there was a limitation in gas
on the characters usable for symbol names.

Of course (not having looked at the patch in any detail) one question
is how you (want to) deal with non-printable characters (potentially
encode-able via backslash escapes), since some are being used by
gas internally (on the assumption that user provided symbol names
can't contain them).

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: Supporting quoted symbol/label names
  2015-08-19  8:46 ` Jan Beulich
@ 2015-08-21 15:23   ` Nick Clifton
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Clifton @ 2015-08-21 15:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Hi Jan,

> Of course (not having looked at the patch in any detail) one question
> is how you (want to) deal with non-printable characters (potentially
> encode-able via backslash escapes),

The patch currently does not support escape sequences, except for an 
escaped double quote.  Perhaps it should.

> since some are being used by
> gas internally (on the assumption that user provided symbol names
> can't contain them).

Currently only two values are used by gas internally - 0x1 and 0x2. 
Assuming that escape sequences were enabled we could add a check and 
complain if either of these values were used.  Or just allow them and 
warn the user that they really need to know what they are doing.

Cheers
   Nick


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: Supporting quoted symbol/label names
  2015-08-18 17:36 RFC: Supporting quoted symbol/label names Nick Clifton
  2015-08-19  0:24 ` Matt Rice
  2015-08-19  8:46 ` Jan Beulich
@ 2015-08-21 15:47 ` Nick Clifton
  2015-08-26 13:41   ` Alan Modra
  2 siblings, 1 reply; 6+ messages in thread
From: Nick Clifton @ 2015-08-21 15:47 UTC (permalink / raw)
  To: binutils

Hi Guys,

>    I have been looking at PR 18581, which complains about the ARM
>    assembler not accepting function names containing a dash:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=18581

OK, I have gone ahead and applied my patch.  I have not extended it 
(yet) to handle escape sequences inside quoted strings, although that 
may come in the future.

Cheers
   Nick


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: Supporting quoted symbol/label names
  2015-08-21 15:47 ` Nick Clifton
@ 2015-08-26 13:41   ` Alan Modra
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Modra @ 2015-08-26 13:41 UTC (permalink / raw)
  To: Nick Clifton; +Cc: binutils

On Fri, Aug 21, 2015 at 04:46:57PM +0100, Nick Clifton wrote:
> OK, I have gone ahead and applied my patch.  I have not extended it (yet) to
> handle escape sequences inside quoted strings, although that may come in the
> future.

Some fallout from this patch.  fr30-elf gas is broken too, and I
think I'll leave fixing that one to Nick.

	PR gas/18581
	* config/tc-mn10200.c (md_assemble <mdr>): Move restore_line_pointer
	call to where input line used to be restored.
	* config/tc-mn10300.c (md_assemble <usp>): Remove redundant input
	line restore.
	* config/tc-tilepro.c (parse_reg_expression): Add regname var.

diff --git a/gas/config/tc-mn10200.c b/gas/config/tc-mn10200.c
index 4d4f482..cabbcc1 100644
--- a/gas/config/tc-mn10200.c
+++ b/gas/config/tc-mn10200.c
@@ -995,13 +995,14 @@ md_assemble (char *str)
 	      char *start;
 	      char c = get_symbol_name (&start);
 
-	      (void) restore_line_pointer (c);
 	      if (strcmp (start, "mdr") != 0)
 		{
+		  (void) restore_line_pointer (c);
 		  input_line_pointer = hold;
 		  str = hold;
 		  goto error;
 		}
+	      (void) restore_line_pointer (c);
 	      goto keep_going;
 	    }
 	  else if (data_register_name (&ex))
diff --git a/gas/config/tc-mn10300.c b/gas/config/tc-mn10300.c
index 0d13622..da05cac 100644
--- a/gas/config/tc-mn10300.c
+++ b/gas/config/tc-mn10300.c
@@ -1417,14 +1417,12 @@ md_assemble (char *str)
 
 	      if (strcasecmp (start, "usp") != 0)
 		{
-	      (void) restore_line_pointer (c);
-		  *input_line_pointer = c;
+		  (void) restore_line_pointer (c);
 		  input_line_pointer = hold;
 		  str = hold;
 		  goto error;
 		}
 	      (void) restore_line_pointer (c);
-	      *input_line_pointer = c;
 	      goto keep_going;
 	    }
 	  else if (operand->flags & MN10300_OPERAND_SSP)
diff --git a/gas/config/tc-tilepro.c b/gas/config/tc-tilepro.c
index e7c7c64..a979473 100644
--- a/gas/config/tc-tilepro.c
+++ b/gas/config/tc-tilepro.c
@@ -980,6 +980,7 @@ parse_reg_expression (expressionS* expression)
   /* Zero everything to make sure we don't miss any flags.  */
   memset (expression, 0, sizeof *expression);
 
+  char *regname;
   char terminating_char = get_symbol_name (&regname);
 
   void* pval = hash_find (main_reg_hash, regname);

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-26 13:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-18 17:36 RFC: Supporting quoted symbol/label names Nick Clifton
2015-08-19  0:24 ` Matt Rice
2015-08-19  8:46 ` Jan Beulich
2015-08-21 15:23   ` Nick Clifton
2015-08-21 15:47 ` Nick Clifton
2015-08-26 13:41   ` Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).