libctf: new enum-related API functions: request for better names

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* libctf: new enum-related API functions: request for better names
@ 2024-05-17 12:08 Nick Alcock
  2024-05-20 20:47 ` Stephen Brennan
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Alcock @ 2024-05-17 12:08 UTC (permalink / raw)
  To: binutils

So Stephen Brennan pointed out many years ago that libctf's handling of
enumeration constants is needlessly unhelpful: it treats them as if they
are scoped within a given enum: you can only query from constant name ->
value and back within a given enum's scope, so if you don't already know
what enum something is part of you have to walk over every enum in the
dict hunting for it.

Worse yet, we do not consider enum constants with clashing values to be
a sign of a type conflict, so can easily end up with multiple distinct
enums containing enumeration constants with the *same name* appearing
in the shared dict. This definitely violates the principle of least
surprise and the (largely unstated) assumption that the shared dict
should be "as if" the entire C program's non-conflicting types were
declared in a single giant file which was compiled with -gctf: you can't
write a C file that declares the same enumeration constant twice!

Half of this is easy to fix: libctf, and in particular the deduplicator,
should track enumeration constant names just like it does all other
identifiers, and push enums with clashing names into child dicts. (This
might eat a lot of space when the enums have many other enumerators, but
most of that space is identical strings, which means we can win nearly
all the space back in v4 via the string-saving trick that is the second
entry in <https://sourceware.org/binutils/wiki/CTF/Todo/Compactness>.)

But I'm having trouble figuring out names for the new API functions
we'll need for the rest.  Right now libctf has these:

/* Convert the specified value to the corresponding enum tag name, if a
   matching name can be found.  Otherwise NULL is returned.  */

const char *ctf_enum_name (ctf_dict_t *fp, ctf_id_t type, int value);

/* Convert the specified enum tag name to the corresponding value, if a
   matching name can be found.  Otherwise CTF_ERR is returned.  */

int ctf_enum_value (ctf_dict_t *fp, ctf_id_t type, const char *name,
		    int *valp);

/* Iterate over the members of an ENUM.  We pass the string name and
   associated integer value of each enum element to the specified callback
   function.  */

int ctf_enum_iter (ctf_dict_t *fp, ctf_id_t type, ctf_enum_f *func, void *arg);

/* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
   NULL at end of iteration or error, and optionally passing back the
   enumerand's integer VALue.  */

const char *ctf_enum_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
    	                   int *val);

At the very least we want something like dict-wide equivalents of the
first two: but ctf_enum_name has the very annoying behaviour of just
picking the first name if there are multiple conflicting ones with the
same value, and on a dict-wide basis there will be huge numbers of these
(can you imagine how many enumeration constants have the value 1? :) )

But also we want to not completely fail when faced with existing shared
dicts that have many enumeration constants, in different enums, with
thee same name. (These will still be able to happen even in the future,
albeit rarely, because when using custom linkers like the one I'm hoping
to upstream into the Linux kernel, CTF child dicts are not always the
same as C translation units: you can have two enumeration constantss FOO
that are in different translation units in the same kernel module, and
the kernel CTF linker will combine them into the same CTF dict. One enum
will be marked hidden/non-root, but both will be there.)

So functions to iterate over all enumeration constants with a given
value are obviously necessary, but we probably want a non-iterator to
just return the specific enum value that a given name expands to: maybe
we want to just pick one if this is ambiguous (as ctf_enum_value already
does). Below I have the maximally general approach, controlled by a
flag, but this is probably total overkill. I suspect
ctf_enumerator_name_next may still be necessary for existing dicts, but
probably not the flag to ctf_enumerator_value -- but I'm not sure.

For now, I've got this (using int64_t for enum values -- I'm not sure
how to migrate the other enum functions that way in future, but we
clearly have to *somehow*).

I'm very unsatisfied with the naming: to me, ctf_enumerator_* does not
read "like ctf_enum_* but dict-wide": but ctf_dict_enum_* feels wrong
too, as if it were dealing with enums *of* dicts. Suggestions?

/* ... _CTF_ERRORS ... */
  _CTF_ITEM (ECTF_ENUM_NAME_CONFLICT, "Multiple enumeration constants exist with this name.")

/* Flags for ctf_enumerator_value.  */

#define CTF_ENUM_UNIQUE 0x1

/* Get the value of a given named enumerator, dict-wide: also optionally return
   the associated enum type.  It is possible, but rare, for dicts to have
   multiple values for some names, or the same values in multiple distinct enum
   types: if the flags include CTF_ENUM_UNIQUE, fail with
   ECTF_ENUM_NAME_CONFLICT in this case.  Otherwise, return the first
   found.

   There is no function that maps the other way because it is downright common
   (and legal C) to have a single value that many enum constants are defined as
   across an entire dict.  Use ctf_enumerator_name_next instead.  */

int64_t ctf_enumerator_value (ctf_dict_t *fp, const char *name, int flags);

/* Iterate over all enumeration constants with a given value in one
    enum.  */

const char *ctf_enum_name_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
                                int64_t value);

/* Iterate over all enumerands in a dict.  The enum TYPE and the
   enumerand VALUE may optionally be returned as well.  */
const char *ctf_enumerator_next (ctf_dict_t *fp, ctf_next_t **it, ctf_id_t *type,
                                int64_t *value);

/* Iterate over the values of all enumeration constants in a dict with a given
   name.  The enum TYPE may optionally be returned as well.  */
int64_t ctf_enumerator_value_next (ctf_dict_t *fp, const char *name,
                                   ctf_next_t **it, ctf_id_t *type);

/* Iterate over the names of all enumeration constants in a dict with a given
   value.  The enum TYPE may optionally be returned as well.  */
const char *ctf_enumerator_name_next (ctf_dict_t *, int64_t value,
                                      ctf_next_t **, ctf_id_t *type);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: libctf: new enum-related API functions: request for better names
  2024-05-17 12:08 libctf: new enum-related API functions: request for better names Nick Alcock
@ 2024-05-20 20:47 ` Stephen Brennan
  2024-05-21 10:13   ` Nick Alcock
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Brennan @ 2024-05-20 20:47 UTC (permalink / raw)
  To: Nick Alcock; +Cc: binutils

Hi Nick,

I'm not subscribed here but found the mailto: link with In-Reply-To
header set on the archive page; hopefully this reply works as expected.

Nick Alcock writes:
> So Stephen Brennan pointed out many years ago that libctf's handling of
> enumeration constants is needlessly unhelpful: it treats them as if they
> are scoped within a given enum: you can only query from constant name ->
> value and back within a given enum's scope, so if you don't already know
> what enum something is part of you have to walk over every enum in the
> dict hunting for it.
>
> Worse yet, we do not consider enum constants with clashing values to be
> a sign of a type conflict, so can easily end up with multiple distinct
> enums containing enumeration constants with the *same name* appearing
> in the shared dict. This definitely violates the principle of least
> surprise and the (largely unstated) assumption that the shared dict
> should be "as if" the entire C program's non-conflicting types were
> declared in a single giant file which was compiled with -gctf: you can't
> write a C file that declares the same enumeration constant twice!
>
> Half of this is easy to fix: libctf, and in particular the deduplicator,
> should track enumeration constant names just like it does all other
> identifiers, and push enums with clashing names into child dicts. (This
> might eat a lot of space when the enums have many other enumerators, but
> most of that space is identical strings, which means we can win nearly
> all the space back in v4 via the string-saving trick that is the second
> entry in <https://sourceware.org/binutils/wiki/CTF/Todo/Compactness>.)
>
> But I'm having trouble figuring out names for the new API functions
> we'll need for the rest.  Right now libctf has these:
>
> /* Convert the specified value to the corresponding enum tag name, if a
>    matching name can be found.  Otherwise NULL is returned.  */
>
> const char *ctf_enum_name (ctf_dict_t *fp, ctf_id_t type, int value);
>
> /* Convert the specified enum tag name to the corresponding value, if a
>    matching name can be found.  Otherwise CTF_ERR is returned.  */
>
> int ctf_enum_value (ctf_dict_t *fp, ctf_id_t type, const char *name,
> 		    int *valp);
>
> /* Iterate over the members of an ENUM.  We pass the string name and
>    associated integer value of each enum element to the specified callback
>    function.  */
>
> int ctf_enum_iter (ctf_dict_t *fp, ctf_id_t type, ctf_enum_f *func, void *arg);
>
> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>    NULL at end of iteration or error, and optionally passing back the
>    enumerand's integer VALue.  */
>
> const char *ctf_enum_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
>     	                   int *val);
>
> At the very least we want something like dict-wide equivalents of the
> first two: but ctf_enum_name has the very annoying behaviour of just
> picking the first name if there are multiple conflicting ones with the
> same value, and on a dict-wide basis there will be huge numbers of these
> (can you imagine how many enumeration constants have the value 1? :) )

I've never personally had a use-case for ctf_enum_name(), looking up an
enumerator by the integer value. However, I can understand why you might
want to do it if you know the type ID already (e.g. a debugger may want
to represent an enum variable with the symbolic name).

But I can't imagine a case where:

  a. I have an integer value, and I know it's an enum, but
  b. I don't know which enum type it belongs to, and yet
  c. I *do* know which CTF dictionary it belongs to...
  d. And I want to get the list of all possible enumerators it could be

Maybe I'm deficient in imagination. I'm sure this use case could come
up, but is it something that libctf really ought to optimize for?  I'd
argue that there are only two use cases that really ought to be
supported at the dict-wide level:

1. Lookup an enumerator by name. This is something that users of C do
constantly, implicitly, just by using the constant name in their source
code. So libctf really ought to support it efficiently with some sort of
string index. (IMO, it should be supported at the dict and archive
level).

2. Iterate over all enumerators. This one is already supported quite
well in with libctf today:

ctf_next_t *next = NULL, *enum_next;
ctf_id_t id;
int isroot, enum_value;
const char *enum_name;
while ((id = ctf_type_next(dict, &next, &isroot, 1)) != CTF_ERR) {
    if (ctf_type_kind(dict, id) != CTF_K_ENUM)
        continue;
    enum_next = NULL;
    while ((enum_name = ctf_enum_next(dict, id, &enum_next, &enum_value))) {
        /* do something with (dict, id, enum_name, enum_value, isroot) */
    }
}

You could introduce an API to eliminate some of the boilerplate, which
could be nice enough. As an existing user, I probably wouldn't take
advantage of the new function, since I need to support older libctf
versions. New users might appreciate not needing to write this
function. However, a new API for this would be much less flexible... The
above function allows me to run code for each enum type, before and
after handling all of the enumerators for that type. A
ctf_enumerator_next() function cannot really give me that information.
I'd argue it would be better to let users manually do their own
iteration. Especially since they could combine all their iteration needs
into a single ctf_type_next() loop.

So in my humble opinion, there's really only one function which needs
to be added, handling use case #1.

> But also we want to not completely fail when faced with existing shared
> dicts that have many enumeration constants, in different enums, with
> thee same name. (These will still be able to happen even in the future,
> albeit rarely, because when using custom linkers like the one I'm hoping
> to upstream into the Linux kernel, CTF child dicts are not always the
> same as C translation units: you can have two enumeration constantss FOO
> that are in different translation units in the same kernel module, and
> the kernel CTF linker will combine them into the same CTF dict. One enum
> will be marked hidden/non-root, but both will be there.)
> 
> So functions to iterate over all enumeration constants with a given
> value are obviously necessary, but we probably want a non-iterator to
> just return the specific enum value that a given name expands to: maybe
> we want to just pick one if this is ambiguous (as ctf_enum_value already
> does). Below I have the maximally general approach, controlled by a
> flag, but this is probably total overkill. I suspect
> ctf_enumerator_name_next may still be necessary for existing dicts, but
> probably not the flag to ctf_enumerator_value -- but I'm not sure.
>
> For now, I've got this (using int64_t for enum values -- I'm not sure
> how to migrate the other enum functions that way in future, but we
> clearly have to *somehow*).
>
>
> I'm very unsatisfied with the naming: to me, ctf_enumerator_* does not
> read "like ctf_enum_* but dict-wide": but ctf_dict_enum_* feels wrong
> too, as if it were dealing with enums *of* dicts. Suggestions?

Given my (maybe not so humble) opinion above, I think that naming can
become a bit clearer if you don't try to handle so many use cases. To
me, this is a clear case of a "lookup". The word "lookup" to me implies
a wide search, not simply within a single enum type. And if you only
support name lookup, then the functions can be called:

ctf_id_t ctf_lookup_enumerator(ctf_dict_t *, const char *, int64_t *enum_value);
ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
                                    int64_t *enum_value, ctf_next_t **next);

This would also make it easier to perform the (in my opinion, equally
important) archive-level lookup:

ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
                                        int64_t *enum_value, ctf_dict_t **dict,
                                        ctf_next_t **next);

The reason this feels so important for me is that, from a debugger's
perspective, we don't frequently know which dictionary to search.
Frequently a user is just saying "give me the constant FOO", with no
scope or anything to give a hint. Looking up a constant at the dict
level is good enough for the 95% of the time when the constant lives in
the parent dict. But in the remaining 5% it really stinks that you would
need to go through each dict and re-do the search (which would likely
repeat the failed search in the parent dict).

 => Note: with these archive-level lookups, though, it would be really
 nice to avoid returning a brand new ctf_dict_t *. I don't know how the
 semantics would work: only search dictionaries that already have an
 open handle? Reference count the dictionaries and simply return the
 same one if it's already open?

If you do stick with your proposed API, which is good too, then I
suppose I have a slight preference for "ctf_dict_enum_*". It doesn't
make my mind jump to "enums of dicts", it just sounds like it's a
function/method of a dict dealing with enums.

Hopefully something in my ramblings above proves helpful!

Thanks for taking a great stab at this and the documentation efforts too,

Stephen

> /* ... _CTF_ERRORS ... */
>   _CTF_ITEM (ECTF_ENUM_NAME_CONFLICT, "Multiple enumeration constants exist with this name.")
>
> /* Flags for ctf_enumerator_value.  */
>
> #define CTF_ENUM_UNIQUE 0x1
>
> /* Get the value of a given named enumerator, dict-wide: also optionally return
>    the associated enum type.  It is possible, but rare, for dicts to have
>    multiple values for some names, or the same values in multiple distinct enum
>    types: if the flags include CTF_ENUM_UNIQUE, fail with
>    ECTF_ENUM_NAME_CONFLICT in this case.  Otherwise, return the first
>    found.
>
>    There is no function that maps the other way because it is downright common
>    (and legal C) to have a single value that many enum constants are defined as
>    across an entire dict.  Use ctf_enumerator_name_next instead.  */
>
> int64_t ctf_enumerator_value (ctf_dict_t *fp, const char *name, int flags);
>
> /* Iterate over all enumeration constants with a given value in one
>     enum.  */
>
> const char *ctf_enum_name_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
>                                 int64_t value);
>
> /* Iterate over all enumerands in a dict.  The enum TYPE and the
>    enumerand VALUE may optionally be returned as well.  */
> const char *ctf_enumerator_next (ctf_dict_t *fp, ctf_next_t **it, ctf_id_t *type,
>                                 int64_t *value);
>
> /* Iterate over the values of all enumeration constants in a dict with a given
>    name.  The enum TYPE may optionally be returned as well.  */
> int64_t ctf_enumerator_value_next (ctf_dict_t *fp, const char *name,
>                                    ctf_next_t **it, ctf_id_t *type);
>
> /* Iterate over the names of all enumeration constants in a dict with a given
>    value.  The enum TYPE may optionally be returned as well.  */
> const char *ctf_enumerator_name_next (ctf_dict_t *, int64_t value,
>                                       ctf_next_t **, ctf_id_t *type);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-20 20:47 ` Stephen Brennan
@ 2024-05-21 10:13   ` Nick Alcock
  2024-05-22 10:31     ` Nick Alcock
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Nick Alcock @ 2024-05-21 10:13 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: jose.marchesi, indu.bhagat, binutils

[Jose, Indu: your only interest here might be my musing about
 identifying what mysterious values in memory dumps are: search for
 "#define". If this were to happen it wouldn't be soon, would definitely
 not be on by default, but it would need compiler help to do something
 much like -g3 does now.]

On 20 May 2024, Stephen Brennan spake thusly:

> Hi Nick,
>
> I'm not subscribed here but found the mailto: link with In-Reply-To
> header set on the archive page; hopefully this reply works as expected.

It does! I like your suggested API and am switching straight to that
instead.

It's such a good API that my eye skipped over one function and I thought
'oh, we're missing that' and proposed a new one with the exact same name
and parameters in the same order before noticing it was already in your
list. :)

> Nick Alcock writes:
>> So Stephen Brennan pointed out many years ago that libctf's handling of
>> enumeration constants is needlessly unhelpful: it treats them as if they
>> are scoped within a given enum: you can only query from constant name ->
>> value and back within a given enum's scope, so if you don't already know
>> what enum something is part of you have to walk over every enum in the
>> dict hunting for it.
>>
>> Worse yet, we do not consider enum constants with clashing values to be
>> a sign of a type conflict, so can easily end up with multiple distinct
>> enums containing enumeration constants with the *same name* appearing
>> in the shared dict. This definitely violates the principle of least
>> surprise and the (largely unstated) assumption that the shared dict
>> should be "as if" the entire C program's non-conflicting types were
>> declared in a single giant file which was compiled with -gctf: you can't
>> write a C file that declares the same enumeration constant twice!
>>
>> Half of this is easy to fix: libctf, and in particular the deduplicator,
>> should track enumeration constant names just like it does all other
>> identifiers, and push enums with clashing names into child dicts. (This
>> might eat a lot of space when the enums have many other enumerators, but
>> most of that space is identical strings, which means we can win nearly
>> all the space back in v4 via the string-saving trick that is the second
>> entry in <https://sourceware.org/binutils/wiki/CTF/Todo/Compactness>.)
>>
>> But I'm having trouble figuring out names for the new API functions
>> we'll need for the rest.  Right now libctf has these:
>>
>> /* Convert the specified value to the corresponding enum tag name, if a
>>    matching name can be found.  Otherwise NULL is returned.  */
>>
>> const char *ctf_enum_name (ctf_dict_t *fp, ctf_id_t type, int value);
>>
>> /* Convert the specified enum tag name to the corresponding value, if a
>>    matching name can be found.  Otherwise CTF_ERR is returned.  */
>>
>> int ctf_enum_value (ctf_dict_t *fp, ctf_id_t type, const char *name,
>> 		    int *valp);
>>
>> /* Iterate over the members of an ENUM.  We pass the string name and
>>    associated integer value of each enum element to the specified callback
>>    function.  */
>>
>> int ctf_enum_iter (ctf_dict_t *fp, ctf_id_t type, ctf_enum_f *func, void *arg);
>>
>> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>>    NULL at end of iteration or error, and optionally passing back the
>>    enumerand's integer VALue.  */
>>
>> const char *ctf_enum_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
>>     	                   int *val);
>>
>> At the very least we want something like dict-wide equivalents of the
>> first two: but ctf_enum_name has the very annoying behaviour of just
>> picking the first name if there are multiple conflicting ones with the
>> same value, and on a dict-wide basis there will be huge numbers of these
>> (can you imagine how many enumeration constants have the value 1? :) )
>
> I've never personally had a use-case for ctf_enum_name(), looking up an
> enumerator by the integer value. However, I can understand why you might
> want to do it if you know the type ID already (e.g. a debugger may want
> to represent an enum variable with the symbolic name).

I mostly put it there for completeness -- it's something you can't do
without a global view of the type system, which you *can* do with one.

> But I can't imagine a case where:
>
>   a. I have an integer value, and I know it's an enum, but
>   b. I don't know which enum type it belongs to, and yet

Indeed -- this alone suggests you have no idea what it's being passed
to, so where did you get it from? Usually in my case this is "memory
dumps but I'm not quite sure what type it is" and I want to know if some
huge mysterious magic number is actually an enum -- but for this to be
really useful we also need to translate #defines of integers into
something enum-like (a single big "enum" named "#DEFINE" perhaps, or
some other C-invalid name). Hmmmm...

>   c. I *do* know which CTF dictionary it belongs to...
>   d. And I want to get the list of all possible enumerators it could be

... but this is a good point. Sounds like a cross-archive one would be
more useful, but still not very useful :) I'll drop it for now.

> Maybe I'm deficient in imagination. I'm sure this use case could come
> up, but is it something that libctf really ought to optimize for?  I'd
> argue that there are only two use cases that really ought to be
> supported at the dict-wide level:
>
> 1. Lookup an enumerator by name. This is something that users of C do
> constantly, implicitly, just by using the constant name in their source
> code. So libctf really ought to support it efficiently with some sort of
> string index. (IMO, it should be supported at the dict and archive
> level).

And this is something we need to track *anyway* to prevent people adding
the same enumerator identifier repeatedly (in a given dict), you know
like they can now without libctf complaining at all, even though it's
totally impossible in C.

We probably do need to provide an archive-wide iterator version of this
as well, so we can look up all enumerators with a given name -- and that
would also handle the unfortunate existing case of multiple identifiers
with different values, coming from different enums, in the same dict.
Like the one you propose below!

> 2. Iterate over all enumerators. This one is already supported quite
> well in with libctf today:

... of course this too only works on a per-dict level, but that's
probably what you're after, since usually you're looking at things from
the perspective of a particular child dict.

> You could introduce an API to eliminate some of the boilerplate, which
> could be nice enough. As an existing user, I probably wouldn't take
> advantage of the new function, since I need to support older libctf
> versions.

For now :)

> function. However, a new API for this would be much less flexible... The
> above function allows me to run code for each enum type, before and
> after handling all of the enumerators for that type. A
> ctf_enumerator_next() function cannot really give me that information.
[...]

I think it could if properly defined, though the definition would be a
bit odd. Something like:

/* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
   NULL at end of iteration or error, and optionally passing back the
   enumerand's integer VALue.  On end of iteration, sets ECTF_NEXT_END.
   At end of each enum, sets ECTF_NEXT_ENUM_END (and iteration
   continues).  */

const char *ctf_enumerator_next (ctf_dict_t *fp, ctf_next_t **it,
    	                         ctf_id_t *enum, int *val);

With that in place, you can do this:

const char *enumrator;
ctf_id_t enum_id;
int64_t value;

while ((enumerator = ctf_enumerator_next (dict, &next, &enum_id, &value)) != NULL
       || ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
{
	if (ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
		/* end-of-this-enum-type stuff */
		continue;
	}
	/* one-enumerand stuff... */
}
if (ctf_errno (dict) != ECTF_NEXT_END) {
	ctf_next_destroy (next);
	/* oops, error... */
}

Now possibly this is too different from the way other iterators work,
I'm not sure... the repetition of ECTF_NEXT_ENUM_END is ugly but there
is probably a less ugly way if I just thought for a moment :)

Compared to your example, hm, I'm not sure if this new API would buy us
enough to be worth it, but it's at least *possible* to do the same thing
looks pretty similar, and means you don't need to filter out non-enums:

> ctf_next_t *next = NULL, *enum_next;
> ctf_id_t id;
> int isroot, enum_value;
> const char *enum_name;
> while ((id = ctf_type_next(dict, &next, &isroot, 1)) != CTF_ERR) {
>     if (ctf_type_kind(dict, id) != CTF_K_ENUM)
>         continue;
>     enum_next = NULL;
>     while ((enum_name = ctf_enum_next(dict, id, &enum_next, &enum_value))) {
>         /* do something with (dict, id, enum_name, enum_value, isroot) */
>     }
> }

Maybe I'm just overdesigning again. I'm not proposing adding this one
yet, not until you tell me it might be useful.

> I'd argue it would be better to let users manually do their own
> iteration. Especially since they could combine all their iteration needs
> into a single ctf_type_next() loop.

True!

>> I'm very unsatisfied with the naming: to me, ctf_enumerator_* does not
>> read "like ctf_enum_* but dict-wide": but ctf_dict_enum_* feels wrong
>> too, as if it were dealing with enums *of* dicts. Suggestions?
>
> Given my (maybe not so humble) opinion above, I think that naming can
> become a bit clearer if you don't try to handle so many use cases. To

I knew my problem here was wild overdesign, but I wasn't sure what
direction I was overdesigning :) this is very valuable feedback, thank
you.

> me, this is a clear case of a "lookup". The word "lookup" to me implies
> a wide search, not simply within a single enum type. And if you only
> support name lookup, then the functions can be called:

Good point.

> ctf_id_t ctf_lookup_enumerator(ctf_dict_t *, const char *, int64_t *enum_value);
> ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
>                                     int64_t *enum_value, ctf_next_t **next);
>
> This would also make it easier to perform the (in my opinion, equally
> important) archive-level lookup:
>
> ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
>                                         int64_t *enum_value, ctf_dict_t **dict,
>                                         ctf_next_t **next);

Very nice! I hereby ditch my design and switch to these, with one slight
rearrangement of parameters to satisfy the not-terribly-well-documented
parameter ordering rule for _next iterators ("dict, query, iterator,
out"):

ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
                                    ctf_next_t **next, int64_t *enum_value);

ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
                                        ctf_next_t **next, int64_t *enum_value,
                                        ctf_dict_t **dict);

(but do we want to return the *enum type*, or the value? I guess the
type, as you do above, because you can do more with it.)

ctf_lookup_enumerator has an annoying naming difference from the
one-enum-type existing version, ctf_enum_value, but so be it. It's too
common an operation to force people to use iterators every time.

One question remains about ctf_lookup_enumerator: if it finds more than
one enumerator with that name (i.e. an old dict, before the deduplicator
considered such things conflicts), should we error or just pick one to
return? I'm wavering towards erroring with ECTF_DUPLICATE so the caller
can switch to using the iterator (or just choose not to, since such
cases are rare even now and will get rarer).

> The reason this feels so important for me is that, from a debugger's
> perspective, we don't frequently know which dictionary to search.

Yes indeed. I want to provide better "find an appropriate dict holding
this conflicting THING" functions, but I was holding off doing that
until after v4 and until I could think up a way to let the caller impose
an ordering over dicts -- in every case I know of so far the caller
knows that some child dicts, if they exist, should take priority over
others if the THING is found in them, so a function doing the search in
an arbitrary order would be useless.

> Frequently a user is just saying "give me the constant FOO", with no
> scope or anything to give a hint. Looking up a constant at the dict
> level is good enough for the 95% of the time when the constant lives in
> the parent dict. But in the remaining 5% it really stinks that you would
> need to go through each dict and re-do the search (which would likely
> repeat the failed search in the parent dict).

Hell yes.

>  => Note: with these archive-level lookups, though, it would be really
>  nice to avoid returning a brand new ctf_dict_t *. I don't know how the
>  semantics would work: only search dictionaries that already have an
>  open handle? Reference count the dictionaries and simply return the
>  same one if it's already open?

This is already handled :) dicts are alrady refcounted (which is where
the huge memory leak you spotted a while back came from...). We cache
the dicts internally to the ctf_archive_t, so you can act as if they're
new dicts, close them when you're done, and not pay any of the opening
costs except for the first time.

All this is done for you for all dicts returned from ctf_archive_next,
so everything that uses that to walk over dicts will get automatic
caching. See ctf-archive.c:ctf_dict_open_cached.

> If you do stick with your proposed API, which is good too, then I

Nah, yours is much better! I was just seduced by the previously
impossible operation of looking up enumerator constants given values :)
but as noted it's also a pretty useless operation most of the time, so
we can put it off until we actually have a use for it (and a better
interface, and a way to turn #defines into something similar).

> Hopefully something in my ramblings above proves helpful!

Absolutely!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-21 10:13   ` Nick Alcock
@ 2024-05-22 10:31     ` Nick Alcock
  2024-05-22 19:50     ` Indu Bhagat
  2024-05-22 21:09     ` Indu Bhagat
  2 siblings, 0 replies; 8+ messages in thread
From: Nick Alcock @ 2024-05-22 10:31 UTC (permalink / raw)
  To: Stephen Brennan, jose.marchesi, indu.bhagat, binutils

On 21 May 2024, Nick Alcock uttered the following:

>> function. However, a new API for this would be much less flexible... The
>> above function allows me to run code for each enum type, before and
>> after handling all of the enumerators for that type. A
>> ctf_enumerator_next() function cannot really give me that information.
> [...]
>
> I think it could if properly defined, though the definition would be a
> bit odd. Something like:
>
> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>    NULL at end of iteration or error, and optionally passing back the
>    enumerand's integer VALue.  On end of iteration, sets ECTF_NEXT_END.
>    At end of each enum, sets ECTF_NEXT_ENUM_END (and iteration
>    continues).  */
>
> const char *ctf_enumerator_next (ctf_dict_t *fp, ctf_next_t **it,
>     	                         ctf_id_t *enum, int *val);
>
> With that in place, you can do this:
>
> const char *enumrator;
> ctf_id_t enum_id;
> int64_t value;
>
> while ((enumerator = ctf_enumerator_next (dict, &next, &enum_id, &value)) != NULL
>        || ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
> {
> 	if (ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
> 		/* end-of-this-enum-type stuff */
> 		continue;
> 	}
> 	/* one-enumerand stuff... */
> }
> if (ctf_errno (dict) != ECTF_NEXT_END) {
> 	ctf_next_destroy (next);
> 	/* oops, error... */
> }
>
> Now possibly this is too different from the way other iterators work,
> I'm not sure... the repetition of ECTF_NEXT_ENUM_END is ugly but there
> is probably a less ugly way if I just thought for a moment :)

No no that's way too complex! You can do it with a simple
ctf_enumerator_next() that does nothing special at all other than return
different enumeration constants one after the other like any other
iterator would (assuming it does it in a sane way and returns all the
constants for one enum before moving to the next, which, well, yes of
course it will):

const char *enumerator;
ctf_id_t old_enum_id = CTF_ERR, enum_id;
int64_t value;

while ((enumerator = ctf_enumerator_next (dict, &next, &enum_id, &value)) != NULL)
{
	if (old_enum_id != enum_id) {
		/* end-of-this-enum-type stuff, then fall through */
	}
}
if (ctf_errno (dict) != ECTF_NEXT_END) {
	ctf_next_destroy (next);
	/* oops, error... */
}
if (old_enum_id != CTF_ERR) {
	/* end-of-the-last-enum-type stuff */
}

It's a bit clunky because you have to double some stuff up, but that's
common in C loop-like constructs and easily fixed by splitting a bit out
into another function.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-21 10:13   ` Nick Alcock
  2024-05-22 10:31     ` Nick Alcock
@ 2024-05-22 19:50     ` Indu Bhagat
  2024-06-06 16:01       ` Nick Alcock
  2024-05-22 21:09     ` Indu Bhagat
  2 siblings, 1 reply; 8+ messages in thread
From: Indu Bhagat @ 2024-05-22 19:50 UTC (permalink / raw)
  To: Nick Alcock, Stephen Brennan; +Cc: jose.marchesi, indu.bhagat, binutils

On 5/21/24 3:13 AM, Nick Alcock wrote:
> [Jose, Indu: your only interest here might be my musing about
>   identifying what mysterious values in memory dumps are: search for
>   "#define". If this were to happen it wouldn't be soon, would definitely
>   not be on by default, but it would need compiler help to do something
>   much like -g3 does now.]
> 
> On 20 May 2024, Stephen Brennan spake thusly:
> 
>> Hi Nick,
>>
>> I'm not subscribed here but found the mailto: link with In-Reply-To
>> header set on the archive page; hopefully this reply works as expected.
> It does! I like your suggested API and am switching straight to that
> instead.
> 
> It's such a good API that my eye skipped over one function and I thought
> 'oh, we're missing that' and proposed a new one with the exact same name
> and parameters in the same order before noticing it was already in your
> list.:)
> 
>> Nick Alcock writes:
>>> So Stephen Brennan pointed out many years ago that libctf's handling of
>>> enumeration constants is needlessly unhelpful: it treats them as if they
>>> are scoped within a given enum: you can only query from constant name ->
>>> value and back within a given enum's scope, so if you don't already know
>>> what enum something is part of you have to walk over every enum in the
>>> dict hunting for it.
>>>
>>> Worse yet, we do not consider enum constants with clashing values to be
>>> a sign of a type conflict, so can easily end up with multiple distinct
>>> enums containing enumeration constants with the*same name*  appearing
>>> in the shared dict. This definitely violates the principle of least
>>> surprise and the (largely unstated) assumption that the shared dict
>>> should be "as if" the entire C program's non-conflicting types were
>>> declared in a single giant file which was compiled with -gctf: you can't
>>> write a C file that declares the same enumeration constant twice!
>>>
>>> Half of this is easy to fix: libctf, and in particular the deduplicator,
>>> should track enumeration constant names just like it does all other
>>> identifiers, and push enums with clashing names into child dicts. (This
>>> might eat a lot of space when the enums have many other enumerators, but
>>> most of that space is identical strings, which means we can win nearly
>>> all the space back in v4 via the string-saving trick that is the second
>>> entry in<https://sourceware.org/binutils/wiki/CTF/Todo/Compactness>.)
>>>
>>> But I'm having trouble figuring out names for the new API functions
>>> we'll need for the rest.  Right now libctf has these:
>>>
>>> /* Convert the specified value to the corresponding enum tag name, if a
>>>     matching name can be found.  Otherwise NULL is returned.  */
>>>
>>> const char *ctf_enum_name (ctf_dict_t *fp, ctf_id_t type, int value);
>>>
>>> /* Convert the specified enum tag name to the corresponding value, if a
>>>     matching name can be found.  Otherwise CTF_ERR is returned.  */
>>>
>>> int ctf_enum_value (ctf_dict_t *fp, ctf_id_t type, const char *name,
>>> 		    int *valp);
>>>
>>> /* Iterate over the members of an ENUM.  We pass the string name and
>>>     associated integer value of each enum element to the specified callback
>>>     function.  */
>>>
>>> int ctf_enum_iter (ctf_dict_t *fp, ctf_id_t type, ctf_enum_f *func, void *arg);
>>>
>>> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>>>     NULL at end of iteration or error, and optionally passing back the
>>>     enumerand's integer VALue.  */
>>>
>>> const char *ctf_enum_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
>>>      	                   int *val);
>>>
>>> At the very least we want something like dict-wide equivalents of the
>>> first two: but ctf_enum_name has the very annoying behaviour of just
>>> picking the first name if there are multiple conflicting ones with the
>>> same value, and on a dict-wide basis there will be huge numbers of these
>>> (can you imagine how many enumeration constants have the value 1?:)  )
>> I've never personally had a use-case for ctf_enum_name(), looking up an
>> enumerator by the integer value. However, I can understand why you might
>> want to do it if you know the type ID already (e.g. a debugger may want
>> to represent an enum variable with the symbolic name).
> I mostly put it there for completeness -- it's something you can't do
> without a global view of the type system, which you*can*  do with one.
> 
>> But I can't imagine a case where:
>>
>>    a. I have an integer value, and I know it's an enum, but
>>    b. I don't know which enum type it belongs to, and yet
> Indeed -- this alone suggests you have no idea what it's being passed
> to, so where did you get it from? Usually in my case this is "memory
> dumps but I'm not quite sure what type it is" and I want to know if some
> huge mysterious magic number is actually an enum -- but for this to be
> really useful we also need to translate #defines of integers into
> something enum-like (a single big "enum" named "#DEFINE" perhaps, or
> some other C-invalid name). Hmmmm...
> 

I am not convinced this should be done in CTF at all. This would fall in 
the category of supporting "debugging" in general (which opens up a 
whole different pandora box of other things), not "type inspection / 
introspection".  IOW, strictly speaking this isn't type information. 
Having to use fake types is not something palatable either.

Anyway, IIUC, this discussion has already evolved enough to agree that 
we dont need such an API. Thats good :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-21 10:13   ` Nick Alcock
  2024-05-22 10:31     ` Nick Alcock
  2024-05-22 19:50     ` Indu Bhagat
@ 2024-05-22 21:09     ` Indu Bhagat
  2024-06-06 16:00       ` Nick Alcock
  2 siblings, 1 reply; 8+ messages in thread
From: Indu Bhagat @ 2024-05-22 21:09 UTC (permalink / raw)
  To: Nick Alcock, Stephen Brennan; +Cc: jose.marchesi, indu.bhagat, binutils

On 5/21/24 3:13 AM, Nick Alcock wrote:
> [Jose, Indu: your only interest here might be my musing about
>   identifying what mysterious values in memory dumps are: search for
>   "#define". If this were to happen it wouldn't be soon, would definitely
>   not be on by default, but it would need compiler help to do something
>   much like -g3 does now.]
> 
> On 20 May 2024, Stephen Brennan spake thusly:
> 
>> Hi Nick,
>>
>> I'm not subscribed here but found the mailto: link with In-Reply-To
>> header set on the archive page; hopefully this reply works as expected.
> 
> It does! I like your suggested API and am switching straight to that
> instead.
> 
> It's such a good API that my eye skipped over one function and I thought
> 'oh, we're missing that' and proposed a new one with the exact same name
> and parameters in the same order before noticing it was already in your
> list. :)
> 
>> Nick Alcock writes:
>>> So Stephen Brennan pointed out many years ago that libctf's handling of
>>> enumeration constants is needlessly unhelpful: it treats them as if they
>>> are scoped within a given enum: you can only query from constant name ->
>>> value and back within a given enum's scope, so if you don't already know
>>> what enum something is part of you have to walk over every enum in the
>>> dict hunting for it.
>>>
>>> Worse yet, we do not consider enum constants with clashing values to be
>>> a sign of a type conflict, so can easily end up with multiple distinct
>>> enums containing enumeration constants with the *same name* appearing
>>> in the shared dict. This definitely violates the principle of least
>>> surprise and the (largely unstated) assumption that the shared dict
>>> should be "as if" the entire C program's non-conflicting types were
>>> declared in a single giant file which was compiled with -gctf: you can't
>>> write a C file that declares the same enumeration constant twice!
>>>
>>> Half of this is easy to fix: libctf, and in particular the deduplicator,
>>> should track enumeration constant names just like it does all other
>>> identifiers, and push enums with clashing names into child dicts. (This
>>> might eat a lot of space when the enums have many other enumerators, but
>>> most of that space is identical strings, which means we can win nearly
>>> all the space back in v4 via the string-saving trick that is the second
>>> entry in <https://sourceware.org/binutils/wiki/CTF/Todo/Compactness>.)
>>>
>>> But I'm having trouble figuring out names for the new API functions
>>> we'll need for the rest.  Right now libctf has these:
>>>
>>> /* Convert the specified value to the corresponding enum tag name, if a
>>>     matching name can be found.  Otherwise NULL is returned.  */
>>>
>>> const char *ctf_enum_name (ctf_dict_t *fp, ctf_id_t type, int value);
>>>
>>> /* Convert the specified enum tag name to the corresponding value, if a
>>>     matching name can be found.  Otherwise CTF_ERR is returned.  */
>>>
>>> int ctf_enum_value (ctf_dict_t *fp, ctf_id_t type, const char *name,
>>> 		    int *valp);
>>>
>>> /* Iterate over the members of an ENUM.  We pass the string name and
>>>     associated integer value of each enum element to the specified callback
>>>     function.  */
>>>
>>> int ctf_enum_iter (ctf_dict_t *fp, ctf_id_t type, ctf_enum_f *func, void *arg);
>>>
>>> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>>>     NULL at end of iteration or error, and optionally passing back the
>>>     enumerand's integer VALue.  */
>>>
>>> const char *ctf_enum_next (ctf_dict_t *fp, ctf_id_t type, ctf_next_t **it,
>>>      	                   int *val);
>>>
>>> At the very least we want something like dict-wide equivalents of the
>>> first two: but ctf_enum_name has the very annoying behaviour of just
>>> picking the first name if there are multiple conflicting ones with the
>>> same value, and on a dict-wide basis there will be huge numbers of these
>>> (can you imagine how many enumeration constants have the value 1? :) )
>>
>> I've never personally had a use-case for ctf_enum_name(), looking up an
>> enumerator by the integer value. However, I can understand why you might
>> want to do it if you know the type ID already (e.g. a debugger may want
>> to represent an enum variable with the symbolic name).
> 
> I mostly put it there for completeness -- it's something you can't do
> without a global view of the type system, which you *can* do with one.
> 
>> But I can't imagine a case where:
>>
>>    a. I have an integer value, and I know it's an enum, but
>>    b. I don't know which enum type it belongs to, and yet
> 
> Indeed -- this alone suggests you have no idea what it's being passed
> to, so where did you get it from? Usually in my case this is "memory
> dumps but I'm not quite sure what type it is" and I want to know if some
> huge mysterious magic number is actually an enum -- but for this to be
> really useful we also need to translate #defines of integers into
> something enum-like (a single big "enum" named "#DEFINE" perhaps, or
> some other C-invalid name). Hmmmm...
> 
>>    c. I *do* know which CTF dictionary it belongs to...
>>    d. And I want to get the list of all possible enumerators it could be
> 
> ... but this is a good point. Sounds like a cross-archive one would be
> more useful, but still not very useful :) I'll drop it for now.
> 
>> Maybe I'm deficient in imagination. I'm sure this use case could come
>> up, but is it something that libctf really ought to optimize for?  I'd
>> argue that there are only two use cases that really ought to be
>> supported at the dict-wide level:
>>
>> 1. Lookup an enumerator by name. This is something that users of C do
>> constantly, implicitly, just by using the constant name in their source
>> code. So libctf really ought to support it efficiently with some sort of
>> string index. (IMO, it should be supported at the dict and archive
>> level).
> 
> And this is something we need to track *anyway* to prevent people adding
> the same enumerator identifier repeatedly (in a given dict), you know
> like they can now without libctf complaining at all, even though it's
> totally impossible in C.
> 
> We probably do need to provide an archive-wide iterator version of this
> as well, so we can look up all enumerators with a given name -- and that
> would also handle the unfortunate existing case of multiple identifiers
> with different values, coming from different enums, in the same dict.
> Like the one you propose below!
> 
>> 2. Iterate over all enumerators. This one is already supported quite
>> well in with libctf today:
> 
> ... of course this too only works on a per-dict level, but that's
> probably what you're after, since usually you're looking at things from
> the perspective of a particular child dict.
> 
>> You could introduce an API to eliminate some of the boilerplate, which
>> could be nice enough. As an existing user, I probably wouldn't take
>> advantage of the new function, since I need to support older libctf
>> versions.
> 
> For now :)
> 
>> function. However, a new API for this would be much less flexible... The
>> above function allows me to run code for each enum type, before and
>> after handling all of the enumerators for that type. A
>> ctf_enumerator_next() function cannot really give me that information.
> [...]
> 
> I think it could if properly defined, though the definition would be a
> bit odd. Something like:
> 
> /* Iterate over the members of an enum TYPE, returning each enumerand's NAME or
>     NULL at end of iteration or error, and optionally passing back the
>     enumerand's integer VALue.  On end of iteration, sets ECTF_NEXT_END.
>     At end of each enum, sets ECTF_NEXT_ENUM_END (and iteration
>     continues).  */
> 
> const char *ctf_enumerator_next (ctf_dict_t *fp, ctf_next_t **it,
>      	                         ctf_id_t *enum, int *val);
> 
> With that in place, you can do this:
> 
> const char *enumrator;
> ctf_id_t enum_id;
> int64_t value;
> 
> while ((enumerator = ctf_enumerator_next (dict, &next, &enum_id, &value)) != NULL
>         || ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
> {
> 	if (ctf_errno (dict) == ECTF_NEXT_ENUM_END) {
> 		/* end-of-this-enum-type stuff */
> 		continue;
> 	}
> 	/* one-enumerand stuff... */
> }
> if (ctf_errno (dict) != ECTF_NEXT_END) {
> 	ctf_next_destroy (next);
> 	/* oops, error... */
> }
> 
> Now possibly this is too different from the way other iterators work,
> I'm not sure... the repetition of ECTF_NEXT_ENUM_END is ugly but there
> is probably a less ugly way if I just thought for a moment :)
> 
> Compared to your example, hm, I'm not sure if this new API would buy us
> enough to be worth it, but it's at least *possible* to do the same thing
> looks pretty similar, and means you don't need to filter out non-enums:
> 
>> ctf_next_t *next = NULL, *enum_next;
>> ctf_id_t id;
>> int isroot, enum_value;
>> const char *enum_name;
>> while ((id = ctf_type_next(dict, &next, &isroot, 1)) != CTF_ERR) {
>>      if (ctf_type_kind(dict, id) != CTF_K_ENUM)
>>          continue;
>>      enum_next = NULL;
>>      while ((enum_name = ctf_enum_next(dict, id, &enum_next, &enum_value))) {
>>          /* do something with (dict, id, enum_name, enum_value, isroot) */
>>      }
>> }
> 
> Maybe I'm just overdesigning again. I'm not proposing adding this one
> yet, not until you tell me it might be useful.
> 
>> I'd argue it would be better to let users manually do their own
>> iteration. Especially since they could combine all their iteration needs
>> into a single ctf_type_next() loop.
> 
> True!
> 
>>> I'm very unsatisfied with the naming: to me, ctf_enumerator_* does not
>>> read "like ctf_enum_* but dict-wide": but ctf_dict_enum_* feels wrong
>>> too, as if it were dealing with enums *of* dicts. Suggestions?
>>
>> Given my (maybe not so humble) opinion above, I think that naming can
>> become a bit clearer if you don't try to handle so many use cases. To
> 
> I knew my problem here was wild overdesign, but I wasn't sure what
> direction I was overdesigning :) this is very valuable feedback, thank
> you.
> 
>> me, this is a clear case of a "lookup". The word "lookup" to me implies
>> a wide search, not simply within a single enum type. And if you only
>> support name lookup, then the functions can be called:
> 
> Good point.
> 
>> ctf_id_t ctf_lookup_enumerator(ctf_dict_t *, const char *, int64_t *enum_value);
>> ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
>>                                      int64_t *enum_value, ctf_next_t **next);
>>
>> This would also make it easier to perform the (in my opinion, equally
>> important) archive-level lookup:
>>
>> ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
>>                                          int64_t *enum_value, ctf_dict_t **dict,
>>                                          ctf_next_t **next);
> 
> Very nice! I hereby ditch my design and switch to these, with one slight
> rearrangement of parameters to satisfy the not-terribly-well-documented
> parameter ordering rule for _next iterators ("dict, query, iterator,
> out"):
> 

FWIW, I too like the above proposal by Steve better: the naming of the 
API and the aspect of including the minimal necessary ones for 
supporting enumerator name lookup / enumerator iteration look good.

> ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
>                                      ctf_next_t **next, int64_t *enum_value);
> 
> ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
>                                          ctf_next_t **next, int64_t *enum_value,
>                                          ctf_dict_t **dict);
> 
> (but do we want to return the *enum type*, or the value? I guess the
> type, as you do above, because you can do more with it.)
> 

On first look, I, too, had a similar question.  I was about to question 
whether the return type of const char * is more appropriate for 
ctf_lookup_enumerator () (a return value of NULL seemed to indicate 
better when lookup was unsuccessful).   But, then I realized that there 
is precedent to this: most other libctf lookup APIs return a CTF_ERR for 
ctf_id_t if the lookup is unsuccessful with appropriate errno set.  So 
it seems that return type of ctf_id_t not only may work better but also 
helps keep symmetry across lookup APIs.

> ctf_lookup_enumerator has an annoying naming difference from the
> one-enum-type existing version, ctf_enum_value, but so be it. It's too
> common an operation to force people to use iterators every time.
> 
> One question remains about ctf_lookup_enumerator: if it finds more than
> one enumerator with that name (i.e. an old dict, before the deduplicator
> considered such things conflicts), should we error or just pick one to
> return? I'm wavering towards erroring with ECTF_DUPLICATE so the caller
> can switch to using the iterator (or just choose not to, since such
> cases are rare even now and will get rarer).
> 

It's a tricky one.  I thought about this, but didnt manage to settle one 
way or another.  Perhaps the additional complexity the ECTF_DUPLICATE 
(or should it be ECTF_ENUM_DUPLICATE?) support requires is a way to make 
a choice ?

>> The reason this feels so important for me is that, from a debugger's
>> perspective, we don't frequently know which dictionary to search.
> 
> Yes indeed. I want to provide better "find an appropriate dict holding
> this conflicting THING" functions, but I was holding off doing that
> until after v4 and until I could think up a way to let the caller impose
> an ordering over dicts -- in every case I know of so far the caller
> knows that some child dicts, if they exist, should take priority over
> others if the THING is found in them, so a function doing the search in
> an arbitrary order would be useless.
> 
>> Frequently a user is just saying "give me the constant FOO", with no
>> scope or anything to give a hint. Looking up a constant at the dict
>> level is good enough for the 95% of the time when the constant lives in
>> the parent dict. But in the remaining 5% it really stinks that you would
>> need to go through each dict and re-do the search (which would likely
>> repeat the failed search in the parent dict).
> 
> Hell yes.
> 
>>   => Note: with these archive-level lookups, though, it would be really
>>   nice to avoid returning a brand new ctf_dict_t *. I don't know how the
>>   semantics would work: only search dictionaries that already have an
>>   open handle? Reference count the dictionaries and simply return the
>>   same one if it's already open?
> 
> This is already handled :) dicts are alrady refcounted (which is where
> the huge memory leak you spotted a while back came from...). We cache
> the dicts internally to the ctf_archive_t, so you can act as if they're
> new dicts, close them when you're done, and not pay any of the opening
> costs except for the first time.
> 
> All this is done for you for all dicts returned from ctf_archive_next,
> so everything that uses that to walk over dicts will get automatic
> caching. See ctf-archive.c:ctf_dict_open_cached.
> 
>> If you do stick with your proposed API, which is good too, then I
> 
> Nah, yours is much better! I was just seduced by the previously
> impossible operation of looking up enumerator constants given values :)
> but as noted it's also a pretty useless operation most of the time, so
> we can put it off until we actually have a use for it (and a better
> interface, and a way to turn #defines into something similar).
> 
>> Hopefully something in my ramblings above proves helpful!
> 
> Absolutely!
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-22 21:09     ` Indu Bhagat
@ 2024-06-06 16:00       ` Nick Alcock
  0 siblings, 0 replies; 8+ messages in thread
From: Nick Alcock @ 2024-06-06 16:00 UTC (permalink / raw)
  To: Indu Bhagat; +Cc: Stephen Brennan, jose.marchesi, binutils

On 22 May 2024, Indu Bhagat uttered the following:

> On 5/21/24 3:13 AM, Nick Alcock wrote:
>> Very nice! I hereby ditch my design and switch to these, with one slight
>> rearrangement of parameters to satisfy the not-terribly-well-documented
>> parameter ordering rule for _next iterators ("dict, query, iterator,
>> out"):
>> 
>
> FWIW, I too like the above proposal by Steve better: the naming of the API and the aspect of including the minimal necessary ones
> for supporting enumerator name lookup / enumerator iteration look good.

Oh good, that's what I'm implementing :)

>> ctf_id_t ctf_lookup_enumerator_next(ctf_dict_t *, const char *,
>>                                      ctf_next_t **next, int64_t *enum_value);
>> ctf_id_t ctf_arc_lookup_enumerator_next(ctf_archive_t *, const char *,
>>                                          ctf_next_t **next, int64_t *enum_value,
>>                                          ctf_dict_t **dict);
>> (but do we want to return the *enum type*, or the value? I guess the
>> type, as you do above, because you can do more with it.)
>> 
>
> On first look, I, too, had a similar question.  I was about to question whether the return type of const char * is more appropriate
> for ctf_lookup_enumerator () (a return value of NULL seemed to indicate better when lookup was unsuccessful).   But, then I realized
> that there is precedent to this: most other libctf lookup APIs return a CTF_ERR for ctf_id_t if the lookup is unsuccessful with
> appropriate errno set.  So it seems that return type of ctf_id_t not only may work better but also helps keep symmetry across lookup
> APIs.

Yeah -- though, again, iterators should return whatever is "most useful"
as their actual return type, because all the others are clumsier to deal
with in (you *have* to declare an additional variable, it's not the
value of the expression etc). But in this case a type ID is probably
right anyway, so it's moot :)

>> ctf_lookup_enumerator has an annoying naming difference from the
>> one-enum-type existing version, ctf_enum_value, but so be it. It's too
>> common an operation to force people to use iterators every time.
>> One question remains about ctf_lookup_enumerator: if it finds more than
>> one enumerator with that name (i.e. an old dict, before the deduplicator
>> considered such things conflicts), should we error or just pick one to
>> return? I'm wavering towards erroring with ECTF_DUPLICATE so the caller
>> can switch to using the iterator (or just choose not to, since such
>> cases are rare even now and will get rarer).
>
> It's a tricky one. I thought about this, but didnt manage to settle
> one way or another. Perhaps the additional complexity the
> ECTF_DUPLICATE (or should it be ECTF_ENUM_DUPLICATE?) support requires
> is a way to make a choice ?

That's basically the conslusion I came to as well -- this case will just
get rarer and rarer over time, and making it error out in future is a
thing we can do if we need to.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: libctf: new enum-related API functions: request for better names
  2024-05-22 19:50     ` Indu Bhagat
@ 2024-06-06 16:01       ` Nick Alcock
  0 siblings, 0 replies; 8+ messages in thread
From: Nick Alcock @ 2024-06-06 16:01 UTC (permalink / raw)
  To: Indu Bhagat; +Cc: Stephen Brennan, jose.marchesi, binutils

On 22 May 2024, Indu Bhagat said:
> On 5/21/24 3:13 AM, Nick Alcock wrote:
>>> But I can't imagine a case where:
>>>
>>>    a. I have an integer value, and I know it's an enum, but
>>>    b. I don't know which enum type it belongs to, and yet
>>
>> Indeed -- this alone suggests you have no idea what it's being passed
>> to, so where did you get it from? Usually in my case this is "memory
>> dumps but I'm not quite sure what type it is" and I want to know if some
>> huge mysterious magic number is actually an enum -- but for this to be
>> really useful we also need to translate #defines of integers into
>> something enum-like (a single big "enum" named "#DEFINE" perhaps, or
>> some other C-invalid name). Hmmmm...
>
> I am not convinced this should be done in CTF at all. This would fall
> in the category of supporting "debugging" in general (which opens up a
> whole different pandora box of other things), not "type inspection /
> introspection". IOW, strictly speaking this isn't type information.
> Having to use fake types is not something palatable either.

Yeah, true enough!

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-06 16:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-17 12:08 libctf: new enum-related API functions: request for better names Nick Alcock
2024-05-20 20:47 ` Stephen Brennan
2024-05-21 10:13   ` Nick Alcock
2024-05-22 10:31     ` Nick Alcock
2024-05-22 19:50     ` Indu Bhagat
2024-06-06 16:01       ` Nick Alcock
2024-05-22 21:09     ` Indu Bhagat
2024-06-06 16:00       ` Nick Alcock

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).