From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Clifton
To: binutils@sourceware.org
Subject: Commit: GAS: Add option to warn about multibyte characters
Date: Thu, 18 Nov 2021 16:47:51 +0000
Message-ID: <87ee7d8l6w.fsf@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: Binutils mailing list

--=-=-=
Content-Type: text/plain

Hi Guys,

  I am applying the patch below to add a new option to the assembler.
  The option enables the generation of warning messages when multibyte
  characters are found in the input.  It has two modes.  In the first,
  it warns about multibyte characters detected anywhere in the input
  stream.  In the second, warnings are only generated when multibyte
  characters are used in the names of defined symbols.

  The point of the second mode is that whilst there may be good reasons
  for multibyte characters to appear in the input (in comments, output
  strings, and so on), it is rare that they will be used in symbol
  names.

Cheers
  Nick

gas/ChangeLog
2021-11-18  Nick Clifton

	* as.c (parse_args): Add support for --multibyte-handling.
	* as.h (multibyte_handling): Declare.
	* app.c (scan_for_multibyte_characters): New function.
	(do_scrub_chars): Call the new function if multibyte warning is
	enabled.
	* input-scrub.c (input_scrub_next_buffer): Call the multibyte
	scanning function if multibyte warnings are enabled.
	* symbols.c (struct symbol_flags): Add multibyte_warned bit.
	(symbol_init): Call the multibyte scanning function if multibyte
	symbol warnings are enabled.
	(S_SET_SEGMENT): Likewise.
	* NEWS: Mention the new feature.
	* doc/as.texi: Document the new feature.
	* testsuite/gas/all/multibyte.s: New test source file.
	* testsuite/gas/all/multibyte1.d: New test driver file.
	* testsuite/gas/all/multibyte1.l: New test expected output.
	* testsuite/gas/all/multibyte2.d: New test driver file.
	* testsuite/gas/all/multibyte2.l: New test expected output.
	* testsuite/gas/all/gas.exp: Run the new tests.

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline; filename=gas.multibyte.patch
Content-Transfer-Encoding: quoted-printable

diff --git a/gas/NEWS b/gas/NEWS
index aac75220cfe..4288e6213dd 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -13,6 +13,14 @@
=20
 * Add support for Scalable Matrix Extension (SME) for AArch64.
=20 +* The --multibyte-handling=3D[allow|warn|warn-sym-only] option tells the + assembler what to when it encoutners multibyte characters in the input. = The + default is to allow them. Setting the option to "warn" will generate a + warning message whenever any multibyte character is encountered. Using = the + option to "warn-sym-only" will make the assembler generate a warning whe= never a + symbol is defined containing multibyte characters. (References to undef= ined + symbols will not generate warnings). + * Outputs of .ds.x directive and .tfloat directive with hex input from x86 assembler have been reduced from 12 bytes to 10 bytes to match the output of .tfloat directive. diff --git a/gas/app.c b/gas/app.c index 712bffef851..0c15b969007 100644 --- a/gas/app.c +++ b/gas/app.c @@ -345,6 +345,55 @@ process_escape (int ch) } } =20 +#define MULTIBYTE_WARN_COUNT_LIMIT 10 +static unsigned int multibyte_warn_count =3D 0; + +bool +scan_for_multibyte_characters (const unsigned char * start, +=09=09=09 const unsigned char * end, +=09=09=09 bool warn) +{ + if (end <=3D start) + return false; + + if (warn && multibyte_warn_count > MULTIBYTE_WARN_COUNT_LIMIT) + return false; + + bool found =3D false; + + while (start < end) + { + unsigned char c; + + if ((c =3D * start++) <=3D 0x7f) +=09continue; + + if (!warn) +=09return true; + + found =3D true; + + const char * filename; + unsigned int lineno; + + filename =3D as_where (& lineno); + if (filename =3D=3D NULL) +=09as_warn (_("multibyte character (%#x) encountered in input"), c); + else if (lineno =3D=3D 0) +=09as_warn (_("multibyte character (%#x) encountered in %s"), c, filename)= ; + else +=09as_warn (_("multibyte character (%#x) encountered in %s at or near line= %u"), c, filename, lineno); + + if (++ multibyte_warn_count =3D=3D MULTIBYTE_WARN_COUNT_LIMIT) +=09{ +=09 as_warn (_("further multibyte character warnings suppressed")); +=09 break; +=09} + } + + return found; +} + /* This function is called to process input 
characters. The GET parameter is used to retrieve more input characters. GET should set its parameter to point to a buffer, and return the length of @@ -463,6 +512,11 @@ do_scrub_chars (size_t (*get) (char *, size_t), char *= tostart, size_t tolen) =09return 0; from =3D input_buffer; fromend =3D from + fromlen; + + if (multibyte_handling =3D=3D multibyte_warn) +=09(void) scan_for_multibyte_characters ((const unsigned char *) from, +=09=09=09=09=09 (const unsigned char* ) fromend, +=09=09=09=09=09 true /* Generate warnings. */); } =20 while (1) diff --git a/gas/as.c b/gas/as.c index 7de8af246e1..8af04aa85b8 100644 --- a/gas/as.c +++ b/gas/as.c @@ -474,7 +474,7 @@ parse_args (int * pargc, char *** pargv) OPTION_DEBUG_PREFIX_MAP, OPTION_DEFSYM, OPTION_LISTING_LHS_WIDTH, - OPTION_LISTING_LHS_WIDTH2, + OPTION_LISTING_LHS_WIDTH2, /* =3D STD_BASE + 10 */ OPTION_LISTING_RHS_WIDTH, OPTION_LISTING_CONT_LINES, OPTION_DEPFILE, @@ -484,7 +484,7 @@ parse_args (int * pargc, char *** pargv) OPTION_GDWARF_3, OPTION_GDWARF_4, OPTION_GDWARF_5, - OPTION_GDWARF_SECTIONS, + OPTION_GDWARF_SECTIONS, /* =3D STD_BASE + 20 */ OPTION_GDWARF_CIE_VERSION, OPTION_STRIP_LOCAL_ABSOLUTE, OPTION_TRADITIONAL_FORMAT, @@ -494,7 +494,7 @@ parse_args (int * pargc, char *** pargv) OPTION_NOEXECSTACK, OPTION_SIZE_CHECK, OPTION_ELF_STT_COMMON, - OPTION_ELF_BUILD_NOTES, + OPTION_ELF_BUILD_NOTES, /* =3D STD_BASE + 30 */ OPTION_SECTNAME_SUBST, OPTION_ALTERNATE, OPTION_AL, @@ -503,7 +503,8 @@ parse_args (int * pargc, char *** pargv) OPTION_WARN_FATAL, OPTION_COMPRESS_DEBUG, OPTION_NOCOMPRESS_DEBUG, - OPTION_NO_PAD_SECTIONS /* =3D STD_BASE + 40 */ + OPTION_NO_PAD_SECTIONS, + OPTION_MULTIBYTE_HANDLING /* =3D STD_BASE + 40 */ /* When you add options here, check that they do not collide with OPTION_MD_BASE. See as.h. 
*/ }; @@ -581,6 +582,7 @@ parse_args (int * pargc, char *** pargv) ,{"target-help", no_argument, NULL, OPTION_TARGET_HELP} ,{"traditional-format", no_argument, NULL, OPTION_TRADITIONAL_FORMAT} ,{"warn", no_argument, NULL, OPTION_WARN} + ,{"multibyte-handling", required_argument, NULL, OPTION_MULTIBYTE_HAND= LING} }; =20 /* Construct the option lists from the standard list and the target @@ -683,6 +685,19 @@ parse_args (int * pargc, char *** pargv) =09 flag_traditional_format =3D 1; =09 break; =20 +=09case OPTION_MULTIBYTE_HANDLING: +=09 if (strcmp (optarg, "allow") =3D=3D 0) +=09 multibyte_handling =3D multibyte_allow; +=09 else if (strcmp (optarg, "warn") =3D=3D 0) +=09 multibyte_handling =3D multibyte_warn; +=09 else if (strcmp (optarg, "warn-sym-only") =3D=3D 0) +=09 multibyte_handling =3D multibyte_warn_syms; +=09 else if (strcmp (optarg, "warn_sym_only") =3D=3D 0) +=09 multibyte_handling =3D multibyte_warn_syms; +=09 else +=09 as_fatal (_("unexpected argument to --multibyte-input-option: '%s'"= ), optarg); +=09 break; + =09case OPTION_VERSION: =09 /* This output is intended to follow the GNU standards document. */ =09 printf (_("GNU assembler %s\n"), BFD_VERSION_STRING); diff --git a/gas/as.h b/gas/as.h index f3f12fbd2f8..89dae1b6833 100644 --- a/gas/as.h +++ b/gas/as.h @@ -344,6 +344,14 @@ COMMON int linkrelax; =20 COMMON int do_not_pad_sections_to_alignment; =20 +enum multibyte_input_handling +{ + multibyte_allow =3D 0, + multibyte_warn, + multibyte_warn_syms +}; +COMMON enum multibyte_input_handling multibyte_handling; + /* TRUE if we should produce a listing. 
*/ extern int listing; =20 @@ -450,6 +458,7 @@ void input_scrub_insert_file (char *); char * input_scrub_new_file (const char *); char * input_scrub_next_buffer (char **bufp); size_t do_scrub_chars (size_t (*get) (char *, size_t), char *, size_t); +bool scan_for_multibyte_characters (const unsigned char *, const unsigne= d char *, bool); int gen_to_words (LITTLENUM_TYPE *, int, long); int had_err (void); int ignore_input (void); diff --git a/gas/doc/as.texi b/gas/doc/as.texi index 9c1924d4bbd..b83f50b0bfc 100644 --- a/gas/doc/as.texi +++ b/gas/doc/as.texi @@ -245,6 +245,7 @@ gcc(1), ld(1), and the Info entries for @file{binutils}= and @file{ld}. [@b{--sectname-subst}] [@b{--size-check=3D[error|warning]}] [@b{--elf-stt-common=3D[no|yes]}] [@b{--generate-missing-build-notes=3D[no|yes]}] + [@b{--multibyte-handling=3D[allow|warn|warn-sym-only]}] [@b{--target-help}] [@var{target-options}] [@b{--}|@var{files} @dots{}] @c @@ -871,6 +872,18 @@ Set the maximum width of an input source line, as disp= layed in a listing, to Set the maximum number of lines printed in a listing for a single line of = input to @var{number} + 1. =20 +@item --multibyte-handling=3Dallow +@itemx --multibyte-handling=3Dwarn +@itemx --multibyte-handling=3Dwarn-sym-only +Controls how the assembler handles multibyte characters in the input. The +default (which can be restored by using the @option{allow} argument) is to +allow such characters without complaint. Using the @option{warn} argument= will +make the assembler generate a warning message whenever any multibyte chara= cter +is encountered. Using the @option{warn-sym-only} argument will only cause= a +warning to be generated when a symbol is defined with a name that contains +multibyte characters. (References to undefined symbols will not generate = a +warning). + @item --no-pad-sections Stop the assembler for padding the ends of output sections to the alignmen= t of that section. 
The default is to pad the sections, but this can waste s= pace @@ -2966,9 +2979,11 @@ are noted in @ref{Machine Dependencies}. @end ifset No symbol may begin with a digit. Case is significant. There is no length limit; all characters are significant. Multibyte chara= cters -are supported. Symbols are delimited by characters not in that set, or by= the -beginning of a file (since the source program must end with a newline, the= end -of a file is not a possible symbol delimiter). @xref{Symbols}. +are supported, but note that the setting of the +@option{--multibyte-handling} option might prevent their use. Symbols +are delimited by characters not in that set, or by the beginning of a file +(since the source program must end with a newline, the end of a file is no= t a +possible symbol delimiter). @xref{Symbols}. =20 Symbol names may also be enclosed in double quote @code{"} characters. In= such cases any characters are allowed, except for the NUL character. If a doub= le @@ -3858,11 +3873,18 @@ than @code{Foo}. Symbol names do not start with a digit. An exception to this rule is made= for Local Labels. See below. =20 -Multibyte characters are supported. To generate a symbol name containing +Multibyte characters are supported, but note that the setting of the +@option{multibyte-handling} option might prevent their use. +To generate a symbol name containing multibyte characters enclose it within double quotes and use escape codes.= cf @xref{Strings}. Generating a multibyte symbol name from a label is not currently supported. =20 +Since multibyte symbol names are unusual, and could possibly be used +maliciously, @command{@value{AS}} provides a command line option +(@option{--multibyte-handling=3Dwarn-sym-only}) which can be used to gener= ate a +warning message whenever a symbol name containing multibyte characters is = defined. + Each symbol has exactly one name. Each name in an assembly language progr= am refers to exactly one symbol. 
You may use that symbol name any number of = times in a program. diff --git a/gas/input-scrub.c b/gas/input-scrub.c index b93afb26b43..c665402220e 100644 --- a/gas/input-scrub.c +++ b/gas/input-scrub.c @@ -377,6 +377,11 @@ input_scrub_next_buffer (char **bufp) =09 ++p; =09} =20 + if (multibyte_handling =3D=3D multibyte_warn) +=09(void) scan_for_multibyte_characters ((const unsigned char *) p, +=09=09=09=09=09 (const unsigned char *) limit, +=09=09=09=09=09 true /* Generate warnings */); + /* We found a newline in the newly read chars. */ partial_where =3D p; partial_size =3D limit - p; diff --git a/gas/symbols.c b/gas/symbols.c index 3cb9425c4ce..889ec662149 100644 --- a/gas/symbols.c +++ b/gas/symbols.c @@ -82,6 +82,10 @@ struct symbol_flags /* Whether the symbol has been marked to be removed by a .symver directive. */ unsigned int removed : 1; + + /* Set when a warning about the symbol containing multibyte characters + is generated. */ + unsigned int multibyte_warned : 1; }; =20 /* A pointer in the symbol may point to either a complete symbol @@ -198,7 +202,7 @@ static void * symbol_entry_find (htab_t table, const char *name) { hashval_t hash =3D htab_hash_string (name); - symbol_entry_t needle =3D { { { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, + symbol_entry_t needle =3D { { { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, =09=09=09 hash, name, 0, 0, 0 } }; return htab_find_with_hash (table, &needle, hash); } @@ -309,6 +313,18 @@ symbol_init (symbolS *symbolP, const char *name, asect= ion *sec, symbolP->bsym->name =3D name; symbolP->bsym->section =3D sec; =20 + if (multibyte_handling =3D=3D multibyte_warn_syms + && ! symbolP->flags.local_symbol + && sec !=3D undefined_section + && ! symbolP->flags.multibyte_warned + && scan_for_multibyte_characters ((const unsigned char *) name, +=09=09=09=09=09(const unsigned char *) name + strlen (name), +=09=09=09=09=09false /* Do not warn. 
*/)) + { + as_warn (_("symbol '%s' contains multibyte characters"), name); + symbolP->flags.multibyte_warned =3D 1; + } + S_SET_VALUE (symbolP, valu); =20 symbol_clear_list_pointers (symbolP); @@ -2427,7 +2443,21 @@ S_SET_SEGMENT (symbolS *s, segT seg) =09abort (); } else - s->bsym->section =3D seg; + { + if (multibyte_handling =3D=3D multibyte_warn_syms +=09 && ! s->flags.local_symbol +=09 && seg !=3D undefined_section +=09 && ! s->flags.multibyte_warned +=09 && scan_for_multibyte_characters ((const unsigned char *) s->name, +=09=09=09=09=09 (const unsigned char *) s->name + strlen (s->name), +=09=09=09=09=09 false)) +=09{ +=09 as_warn (_("symbol '%s' contains multibyte characters"), s->name); +=09 s->flags.multibyte_warned =3D 1; +=09} + + s->bsym->section =3D seg; + } } =20 void diff --git a/gas/testsuite/gas/all/gas.exp b/gas/testsuite/gas/all/gas.exp index 2c812b1fd79..5eee4f8abfa 100644 --- a/gas/testsuite/gas/all/gas.exp +++ b/gas/testsuite/gas/all/gas.exp @@ -502,3 +502,5 @@ run_dump_test "nop" run_dump_test "asciz" run_dump_test "pr27384" run_dump_test "pr27381" +run_dump_test "multibyte1" +run_dump_test "multibyte2" --- /dev/null=092021-11-18 07:54:08.971751480 +0000 +++ gas/testsuite/gas/all/multibyte.s=092021-11-18 14:56:32.271513699 +0000 @@ -0,0 +1,8 @@ +=09.text +=09.globl=09he=E2=80=AEoll=E2=80=AC +he=E2=80=AEoll=E2=80=AC: +=09.nop +=09 +=09.globl=09hello +hello: +=09.nop --- /dev/null=092021-11-18 07:54:08.971751480 +0000 +++ gas/testsuite/gas/all/multibyte1.d=092021-11-18 15:02:28.440195596 +000= 0 @@ -0,0 +1,3 @@ +#source: multibyte.s +#as: --multibyte-handling=3Dwarn +#warning_output: multibyte1.l --- /dev/null=092021-11-18 07:54:08.971751480 +0000 +++ gas/testsuite/gas/all/multibyte1.l=092021-11-18 15:03:58.343862831 +000= 0 @@ -0,0 +1,12 @@ +[^:]*: Assembler messages: +[^:]*: Warning: multibyte character \(0xe2\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0x80\) encountered in .*multibyte.s +[^:]*: Warning: multibyte 
character \(0xae\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0xe2\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0x80\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0xac\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0xe2\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0x80\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0xae\) encountered in .*multibyte.s +[^:]*: Warning: multibyte character \(0xe2\) encountered in .*multibyte.s +[^:]*: Warning: further multibyte character warnings suppressed --- /dev/null=092021-11-18 07:54:08.971751480 +0000 +++ gas/testsuite/gas/all/multibyte2.l=092021-11-18 15:07:30.934075943 +000= 0 @@ -0,0 +1,2 @@ +[^:]*: Assembler messages: +[^:]*:3: Warning: symbol '.*' contains multibyte characters --- /dev/null=092021-11-18 07:54:08.971751480 +0000 +++ gas/testsuite/gas/all/multibyte2.d=092021-11-18 15:05:16.503573530 +000= 0 @@ -0,0 +1,3 @@ +#source: multibyte.s +#as: --multibyte-handling=3Dwarn-sym-only +#warning_output: multibyte2.l --=-=-=--
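
The two mechanisms described in the message above (mapping the option argument onto a handling mode, and scanning a byte range for values above 0x7f) can be tried in isolation with a small sketch like the one below. It is not the GAS source: `mb_mode`, `parse_mb_mode`, `count_high_bytes`, `name_high_bytes` and `name_has_multibyte` are hypothetical names chosen for illustration, and the sketch leaves out the warning-count limit and the `as_warn` reporting that the patch performs.

```c
/* Illustrative sketch only: every name here is hypothetical and is not
   part of GAS.  It mirrors the idea in the patch, not its exact code.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum mb_mode { MB_ALLOW = 0, MB_WARN, MB_WARN_SYMS, MB_INVALID };

/* Map an option argument onto a mode.  Both spellings of the
   symbol-only mode are accepted, as in the patch; anything else
   yields MB_INVALID (the patch calls as_fatal at that point).  */
enum mb_mode
parse_mb_mode (const char *arg)
{
  if (strcmp (arg, "allow") == 0)
    return MB_ALLOW;
  if (strcmp (arg, "warn") == 0)
    return MB_WARN;
  if (strcmp (arg, "warn-sym-only") == 0
      || strcmp (arg, "warn_sym_only") == 0)
    return MB_WARN_SYMS;
  return MB_INVALID;
}

/* Count the bytes in [START, END) whose value is above 0x7f.  The scan
   is per byte, so one UTF-8 encoded character contributes several.  */
size_t
count_high_bytes (const unsigned char *start, const unsigned char *end)
{
  size_t count = 0;

  while (start < end)
    if (*start++ > 0x7f)
      ++count;
  return count;
}

/* Count the high bytes in a NUL-terminated symbol name.  */
size_t
name_high_bytes (const char *name)
{
  const unsigned char *p;

  assert (name != NULL);
  p = (const unsigned char *) name;
  return count_high_bytes (p, p + strlen (name));
}

/* True if NAME contains at least one byte above 0x7f.  */
bool
name_has_multibyte (const char *name)
{
  return name_high_bytes (name) > 0;
}
```

With the symbol name from multibyte.s written as the C string "he\xe2\x80\xaeoll\xe2\x80\xac", `name_high_bytes` counts six bytes. That is consistent with multibyte1.l, which expects a separate warning for each of 0xe2, 0x80 and 0xae rather than one warning per character.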