public inbox for gcc@gcc.gnu.org
* Re: Gcc Digest, Vol 29, Issue 7
From: Yair Lenga @ 2022-07-05  7:19 UTC
  To: gcc

Hi,

I wanted to get some feedback on an idea I have for addressing the
age-old issue of type checking for variadic functions like 'scanf' and
friends. In my specific case, I'm trying to build code that will parse a
list of values from a SELECT statement into a list of C variables. The types
of the values are known (by inspecting the result set metadata). My ideal
solution would be to implement something like:

int result_set_read(struct result_set *p_result_set, ...);

Which can be called with

int int_var ; float float_var ; char c[20] ;
result_set_read(rs1, &int_var, &float_var, c) ;

The tricky part is verifying the argument types. One possible path
I thought of: why not leverage the ability to describe scanf-like
functions?

int result_set_read(struct result_set *p_result_set, const char *format, ...)
    __attribute__((format(scanf, 2, 3)));

The above call then becomes:
result_set_read(rs1, "%d %f %s", &int_var, &float_var, c);

With the added benefit that GCC will diagnose (via -Wformat) any mismatch
between a variable and its conversion. My function parses the scanf format to
decide on conversions (just the basic ones: '%f', '%d', '%*s', ...).
So far this is a big improvement; the only missing piece is the ability to
enforce checks on string sizes, to better guard against buffer overflow
(side note: I wish there were a way to force inclusion of the max string
size, similar to sscanf_s).
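The approach above can be sketched as follows. This is only an illustration: struct result_set is assumed here to be a simple in-memory array holding one text value per column (the real type is not shown), and only %d, %f and %s are handled. The format(scanf, 2, 3) attribute makes GCC check each call's arguments against the format string, while the function walks the format itself. An optional field width such as %19s caps the string copy, which is one way to get the size check mentioned above; without a width, %s is as unbounded as it is in scanf.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result set: one text value per column (illustration only). */
struct result_set {
    const char *const *values;
    size_t ncols;
};

/* format(scanf, 2, 3): argument 2 is the format string, and type checking
   of the variadic arguments starts at argument 3. */
int result_set_read(struct result_set *rs, const char *format, ...)
    __attribute__((format(scanf, 2, 3)));

/* Walks the format itself, supporting only %d, %f and %s with an optional
   field width. Returns the number of values stored, or -1 on an
   unsupported conversion. */
int result_set_read(struct result_set *rs, const char *format, ...)
{
    va_list ap;
    size_t col = 0;
    int assigned = 0;

    va_start(ap, format);
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;                      /* skip separators such as spaces */
        p++;

        size_t width = 0;                  /* optional field width, e.g. %19s */
        while (*p >= '0' && *p <= '9')
            width = width * 10 + (size_t)(*p++ - '0');

        if (col >= rs->ncols)
            break;                         /* more conversions than columns */
        const char *v = rs->values[col++];

        switch (*p) {
        case 'd':
            *va_arg(ap, int *) = (int)strtol(v, NULL, 10);
            break;
        case 'f':
            *va_arg(ap, float *) = strtof(v, NULL);
            break;
        case 's': {
            char *dst = va_arg(ap, char *);
            size_t n = strlen(v);
            if (width > 0 && n > width)
                n = width;                 /* as in scanf, width excludes the NUL */
            memcpy(dst, v, n);
            dst[n] = '\0';
            break;
        }
        default:
            va_end(ap);
            return -1;                     /* unsupported conversion */
        }
        assigned++;
    }
    va_end(ap);
    return assigned;
}
```

For example, with column values "42", "2.5" and "hello world", the call result_set_read(&rs, "%d %f %7s", &i, &f, c) with char c[8] stores 42, 2.5f and "hello w", and returns 3.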

My question: does anyone know how much effort it would take to add a new GCC
built-in (or extension) that automatically generates a descriptive
format string, consistent with scanf formatting, avoiding the need to
write the format string by hand? This can be thought of as "poor man's
introspection". A simple macro could then be used to generate it:

#define RESULT_SET_READ(rs, ...) \
    result_set_read(rs, __builtin_format(__VA_ARGS__), __VA_ARGS__)

In practice, this would make the function "safe" (with respect to buffer
overflow and type conversions) for most use cases.
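__builtin_format is only a proposal; GCC has no such built-in. As a rough approximation in standard C11, _Generic can map each destination pointer's type to a scanf conversion. The sketch below uses hypothetical names (SCANF_CONV, join_convs, FORMAT_OF3) and handles exactly three destinations. Two caveats: the format is assembled at run time, so -Wformat can no longer check it (the type safety comes instead from _Generic rejecting unsupported types at compile time), and it does not capture string buffer sizes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one step of the proposed __builtin_format:
   maps a destination pointer's type to a scanf conversion. There is no
   default branch, so an unsupported argument type fails to compile. */
#define SCANF_CONV(p) _Generic((p), \
    int *: "%d",                    \
    long *: "%ld",                  \
    float *: "%f",                  \
    double *: "%lf",                \
    char *: "%s")

/* Joins n conversions with single spaces into buf (size bytes, size >= 1),
   truncating if buf is too small. */
static const char *join_convs(char *buf, size_t size,
                              const char *const convs[], size_t n)
{
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, convs[i], size - strlen(buf) - 1);
    }
    return buf;
}

/* Hypothetical fixed-arity macro: builds the format for three destinations
   at run time. buf must be an array so that sizeof gives its size. */
#define FORMAT_OF3(buf, a, b, c)                                    \
    join_convs((buf), sizeof (buf),                                 \
               (const char *const[]){ SCANF_CONV(a), SCANF_CONV(b), \
                                      SCANF_CONV(c) }, 3)
```

For int i, float f and char c[20], FORMAT_OF3(buf, &i, &f, c) leaves "%d %f %s" in buf; the char array decays to char * inside _Generic.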

Any feedback or pointers on how to implement this would be appreciated.

Yair

* Re: Gcc Digest, Vol 29, Issue 7
From: Florian Weimer @ 2022-07-05 12:16 UTC
  To: Yair Lenga via Gcc; +Cc: Yair Lenga

* Yair Lenga via Gcc:

> My question: does anyone know how much effort it would take to add a new GCC
> built-in (or extension) that automatically generates a descriptive
> format string, consistent with scanf formatting, avoiding the need to
> write the format string by hand?

It's already possible to do this with C++.  Can't you just use C++
instead?

Thanks,
Florian


* Re: Gcc Digest, Vol 29, Issue 7
From: Yair Lenga @ 2022-07-05 21:24 UTC
  To: Florian Weimer; +Cc: Yair Lenga via Gcc

I prefer not to get into “flame wars” over the merits of C vs. C++. My projects are (mostly) in “C”, and I am happy with this setup. In addition, given technical/organizational/business issues, switching to C++ is not an option.

Yair.

* Re: Gcc Digest, Vol 29, Issue 7
From: Andrew Pinski @ 2022-07-05 21:52 UTC
  To: Yair Lenga; +Cc: GCC Mailing List

This is all recorded as https://gcc.gnu.org/PR47781 . You could maybe
write a plugin to handle the attribute.

Thanks,
Andrew Pinski

* Re: Gcc Digest, Vol 29, Issue 7
From: David Brown @ 2022-07-06  7:17 UTC
  To: Yair Lenga, gcc

I haven't worked through all the details, but I wonder if this could be 
turned around a bit.  Rather than your function taking a variable number 
of arguments of different types, which as you know can be a risky 
business, have it take an array of (type, void*) pairs (where "type" is 
an enumeration).  Use some variadic macro magic to turn the 
"RESULT_SET_READ" into the creation of a local array that is then passed 
on to the function.
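A minimal C11 sketch of this suggestion follows. All names are hypothetical, and the result set is assumed to be a simple in-memory array of one text value per column. _Generic turns each destination pointer into an enumeration tag at compile time, and a small argument-counting macro (limited to 1 to 4 destinations) wraps the arguments in a local compound-literal array that is passed to an ordinary, non-variadic function. Each pair also records sizeof of its argument, which is the buffer size for a char array and so helps with overflow checks; this only works when an actual array, not a char pointer, is passed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result set: one text value per column (illustration only). */
struct result_set {
    const char *const *values;
    size_t ncols;
};

/* Type tags for the destination types this sketch supports. */
enum rs_type { RS_INT, RS_FLOAT, RS_STR };

/* One (type, pointer) pair, plus sizeof the argument expression. */
struct rs_arg {
    enum rs_type type;
    void *ptr;
    size_t size;   /* buffer size only when a char array (not a char *) is passed */
};

/* Pick the tag from the pointer type at compile time; other types do not compile. */
#define RS_TAG(p) _Generic((p), int *: RS_INT, float *: RS_FLOAT, char *: RS_STR)
#define RS_ARG(p) { RS_TAG(p), (p), sizeof (p) }

/* Minimal argument counting and mapping for 1 to 4 destinations. */
#define RS_NARGS_(a1, a2, a3, a4, N, ...) N
#define RS_NARGS(...) RS_NARGS_(__VA_ARGS__, 4, 3, 2, 1, 0)
#define RS_MAP_1(a) RS_ARG(a)
#define RS_MAP_2(a, b) RS_ARG(a), RS_ARG(b)
#define RS_MAP_3(a, b, c) RS_ARG(a), RS_ARG(b), RS_ARG(c)
#define RS_MAP_4(a, b, c, d) RS_ARG(a), RS_ARG(b), RS_ARG(c), RS_ARG(d)
#define RS_CAT_(a, b) a##b
#define RS_CAT(a, b) RS_CAT_(a, b)
#define RS_MAP(...) RS_CAT(RS_MAP_, RS_NARGS(__VA_ARGS__))(__VA_ARGS__)

/* Build a local array of pairs and pass it to a non-variadic function. */
#define RESULT_SET_READ(rs, ...)                                           \
    result_set_read_args((rs), (const struct rs_arg[]){ RS_MAP(__VA_ARGS__) }, \
                         RS_NARGS(__VA_ARGS__))

/* Stores column i into args[i]; returns the number stored, or -1 if there
   are more destinations than columns. */
static int result_set_read_args(const struct result_set *rs,
                                const struct rs_arg *args, size_t n)
{
    if (n > rs->ncols)
        return -1;
    for (size_t i = 0; i < n; i++) {
        const char *v = rs->values[i];
        switch (args[i].type) {
        case RS_INT:
            *(int *)args[i].ptr = (int)strtol(v, NULL, 10);
            break;
        case RS_FLOAT:
            *(float *)args[i].ptr = strtof(v, NULL);
            break;
        case RS_STR: {
            /* Copy at most size - 1 bytes and always NUL-terminate. */
            size_t len = strlen(v);
            if (len >= args[i].size)
                len = args[i].size - 1;
            memcpy(args[i].ptr, v, len);
            ((char *)args[i].ptr)[len] = '\0';
            break;
        }
        }
    }
    return (int)n;
}
```

Passing an unsupported type, such as a double *, fails to compile because _Generic has no matching association.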


* Re: Gcc Digest, Vol 29, Issue 7
From: Yair Lenga @ 2022-07-06  7:59 UTC
  To: David Brown; +Cc: gcc

Thanks for the suggestion. It's definitely doable, though more verbose than the scanf approach, and it will do the trick. I will use it as my fallback if there is no path forward with my current approach.

Yair.

* Re: Safer vararg calls
From: Florian Weimer @ 2022-07-07 10:02 UTC
  To: Yair Lenga; +Cc: Yair Lenga via Gcc

* Yair Lenga:

> I prefer not to get into “flame wars” over the merits of C vs. C++. My
> projects are (mostly) in “C”, and I am happy with this setup. In
> addition, given technical / organizational / business issues,
> switching to C++ is not an option.

Sure, but you could at least use C++ to prototype the implementation and
even the ABI.

Thanks,
Florian


* Re: Safer vararg calls
From: Yair Lenga @ 2022-06-25 14:27 UTC
  To: Jonathan Wakely; +Cc: gcc

Hi Jonathan, thanks for taking the time to review.

I agree with your comments about the attribute names (va_vector, va_typed).
My best improvement is "va_sametype". Is that better? Maybe "va_matchtype"?
Any other suggestions?

For the sentinel/va_sametype case, I hope the implementation will
recognize the combination of the two and allow NULL to be used as-is,
without having to cast it. Although I believe NULL pointers are
considered compatible with any pointer type, I'm not 100% sure about this.

I'm not sure I understand the question about mixing char * and const char
*. I probably caused confusion with my rushed example, which should have been:

// Join all parameters; return a newly allocated string.
__attribute__ ((malloc(free), va_matchtype, sentinel))
char *delimitedstr(char delim, const char *p1, ...);
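A minimal implementation sketch of delimitedstr using only attributes GCC already provides: the proposed va_matchtype does not exist, so it is left out, and plain malloc is used rather than malloc(free), which only newer GCC releases accept. The sentinel attribute makes GCC warn when a call does not end with a null pointer. The function makes two passes over the arguments, using va_copy: one to measure the result and one to copy it.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Joins p1 and the following strings with delim. The argument list must end
   with a null pointer; the sentinel attribute makes GCC warn when it does not.
   Returns a newly malloc'd string (release with free), or NULL on allocation
   failure. The proposed va_matchtype attribute is hypothetical and omitted. */
__attribute__((malloc, sentinel))
char *delimitedstr(char delim, const char *p1, ...);

char *delimitedstr(char delim, const char *p1, ...)
{
    va_list ap, again;
    size_t total = strlen(p1) + 1;        /* first string + terminating NUL */

    va_start(ap, p1);
    va_copy(again, ap);                   /* second traversal for the copy pass */

    /* Pass 1: measure each remaining string plus one delimiter each. */
    for (const char *s = va_arg(ap, const char *); s != NULL;
         s = va_arg(ap, const char *))
        total += 1 + strlen(s);
    va_end(ap);

    char *out = malloc(total);
    if (out != NULL) {
        /* Pass 2: copy p1, then delimiter + string for each remaining argument. */
        size_t n = strlen(p1);
        char *w = out;
        memcpy(w, p1, n);
        w += n;
        for (const char *s = va_arg(again, const char *); s != NULL;
             s = va_arg(again, const char *)) {
            *w++ = delim;
            n = strlen(s);
            memcpy(w, s, n);
            w += n;
        }
        *w = '\0';
    }
    va_end(again);
    return out;
}
```

With this, delimitedstr(':', "foo", "bar", "zoo", NULL) returns "foo:bar:zoo", which the caller must free.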


* Re: Safer vararg calls
From: Jonathan Wakely @ 2022-06-21 10:43 UTC
  To: Yair Lenga; +Cc: gcc

On Tue, 21 Jun 2022 at 11:17, Yair Lenga via Gcc <gcc@gcc.gnu.org> wrote:
> My proposal is to add a new attribute, ((va_vector)), that checks that all arguments in a vararg list match the type of the last named parameter, so that:

"va_vector" is a bad name IMHO. It tells me nothing about what it
means. Does it have something to do with SIMD vectors?

>
> __attribute__ ((va_typed)) char *delimitedstr(char delim, char *p1, ...);

"va_typed" at least suggests something to do with types, but it
doesn't tell me they have to be the same type.

>
> would flag any call in which an argument after p1 is not a string.

In your example NULL does not have the same type as the earlier
arguments. You would have to write (char*)NULL to suppress a
diagnostic.

I also wonder how a mixture of char* and const char* arguments would
be handled in your example.


* Safer vararg calls
From: Yair Lenga @ 2022-06-21 10:16 UTC
  To: gcc

Hi,

Looking for feedback on adding a new function attribute that would help create safer vararg functions.

Consider the case where a vararg function takes a list of arguments of the same type. In my case, they are terminated with a NULL sentinel.

char *result = delimitedstr(':', "foo", "bar", "zoo", NULL);

The usual prototype
is char *delimitedstr(char delim, char *p1, ...);

which currently allows many incorrect calls:
 delimitedstr(':', "foo", 5, 7.3, 'a');    // bad types + missing sentinel

__attribute__((sentinel)) can require the last argument to be a null pointer.

My proposal is to add a new attribute, ((va_vector)), that checks that all arguments in a vararg list match the type of the last named parameter, so that:

__attribute__ ((va_typed)) char *delimitedstr(char delim, char *p1, ...);

would flag any call in which an argument after p1 is not a string.
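GCC has no va_vector/va_typed attribute today. A workaround that works now (a different technique from the proposed attribute; the names delimitedstr_array and DELIMITEDSTR are hypothetical) is a variadic macro that puts the arguments into a compound-literal array of const char *. Each argument must then convert to const char *, so GCC diagnoses calls such as DELIMITEDSTR(':', "foo", 5, 7.3), and the terminating NULL is appended by the macro, so it cannot be forgotten. Mixing char * and const char * arguments is accepted, since both convert to const char * implicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Non-variadic worker: joins a NULL-terminated array of strings with delim.
   Returns a newly malloc'd string, or NULL on allocation failure. */
static char *delimitedstr_array(char delim, const char *const *parts)
{
    size_t total = 1;                          /* terminating NUL */
    for (size_t i = 0; parts[i] != NULL; i++)
        total += strlen(parts[i]) + (i > 0);   /* delimiter before all but the first */

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *w = out;
    for (size_t i = 0; parts[i] != NULL; i++) {
        if (i > 0)
            *w++ = delim;
        size_t n = strlen(parts[i]);
        memcpy(w, parts[i], n);
        w += n;
    }
    *w = '\0';
    return out;
}

/* Each argument initializes a const char * element, so non-string arguments
   are diagnosed at compile time; the NULL terminator is added automatically.
   At least one string argument is required. */
#define DELIMITEDSTR(delim, ...) \
    delimitedstr_array((delim), (const char *const[]){ __VA_ARGS__, NULL })
```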

This could result in cleaner, safer code, without making the calling sequence more difficult or changing the behavior of the call.

For Java developers, this is basically the same type checking provided by Java's 'datatype...' varargs (without the conversion into an array).

I am looking for feedback and pointers on how to implement this, as I have no experience extending GCC.

Yair
