From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mjambor@suse.cz>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id 90EA83858C83
 for <gcc@gcc.gnu.org>; Wed,  9 Mar 2022 16:15:14 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 90EA83858C83
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=suse.cz
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.cz
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out1.suse.de (Postfix) with ESMTP id 6920121115;
 Wed,  9 Mar 2022 16:15:13 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa;
 t=1646842513; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=jnit/J+cfsVgHi18W/6W6RGXBVOUnsavUt1oEZtmJ0M=;
 b=qrTdCHIG4EK2x0MDBchE6n4sXgEgfg1nvJfsnecNMwqmRSvKUAzXTsKS8oW3i0mo5ZQJku
 pr9jXYqbDM7cLjRi9xlZIbrN9iMM/2gi8io9WamIPlZET5HN7S5epqRIvx/UldXyZ6ikOr
 94bglGUDpawGh7Aui5bFWw2fk9HLseE=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz;
 s=susede2_ed25519; t=1646842513;
 h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
 mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=jnit/J+cfsVgHi18W/6W6RGXBVOUnsavUt1oEZtmJ0M=;
 b=u3MxH2BFpgFEtzY1zqVZ4bDAqYGB1UHiuPr/4Tk62ELEIUZc3oBF9WckDdd5JLZ6IxGctc
 W4thsbGrsOJcwfDA==
Received: from suse.cz (virgil.suse.cz [10.100.13.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 4A409A3B81;
 Wed,  9 Mar 2022 16:15:13 +0000 (UTC)
From: Martin Jambor <mjambor@suse.cz>
To: Juan Scerri <juan.scerri.21@um.edu.mt>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: GSoC (Make cp-demangle non-recursive)
In-Reply-To: <CACJSivvYgNG+C2Y5ph=4yv5y0iaV+Ak7=vh3FGD7W6eQADxk9w@mail.gmail.com>
References: <CACJSivv=BmcxsSoM8o8q_82VQ9LDSRC-2_HBsb_AowP+Scir7A@mail.gmail.com>
 <ri65yowqjzc.fsf@suse.cz>
 <CACJSivvYgNG+C2Y5ph=4yv5y0iaV+Ak7=vh3FGD7W6eQADxk9w@mail.gmail.com>
User-Agent: Notmuch/0.34.1 (https://notmuchmail.org) Emacs/27.2
 (x86_64-suse-linux-gnu)
Date: Wed, 09 Mar 2022 17:15:13 +0100
Message-ID: <ri6zglzm7am.fsf@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Mar 2022 16:15:16 -0000

Hello Juan,

please remember to CC the list when discussing technical problems, ideas
and approaches.  Others may help you also, perhaps even better than I do.

On Thu, Mar 03 2022, Juan Scerri wrote:
> Hi Martin,
>
> Firstly, thank you for your response.
>
> I think I have a good knowledge of C. It is part of my course at
> university.
>
> Moreover, I have figured out that the code which I need to change is in
> libiberty and more specifically cp-demangle.c and cp-demangle.h. Thank you
> for
> pointing out that I can view binutils. I will give the repository a look
> since
> compiling gcc takes a very long time, even with 16 threads.
>
> I have also read some of the code and I have an idea as to what I can do.
>
> In the code there are certain functions which use recursion which are very
> trivial to convert to an iterative function for example:
>
> static int
> has_return_type (struct demangle_component *dc)
> {
>   if (dc == NULL)
>     return 0;
>   switch (dc->type)
>     {
>   default:
>     return 0;
>   case DEMANGLE_COMPONENT_LOCAL_NAME:
>     return has_return_type (d_right (dc));
>   case DEMANGLE_COMPONENT_TEMPLATE:
>     return ! is_ctor_dtor_or_conversion (d_left (dc));
>   FNQUAL_COMPONENT_CASE:
>     return has_return_type (d_left (dc));
>     }
> }
>
> This can be converted into an iterative function through the introduction
> of a
> while-loop or a for-loop (whichever one is preferred in the codebase) as
> shown
> below.
>
> static int
> has_return_type (struct demangle_component *dc)
> {
>   while (1) {
>     if (dc == NULL)
>       return 0;
>     switch (dc->type)
>       {
>       default:
>         return 0;
>       case DEMANGLE_COMPONENT_LOCAL_NAME:
>         dc = d_right (dc);
>         continue;
>       case DEMANGLE_COMPONENT_TEMPLATE:
>         return ! is_ctor_dtor_or_conversion (d_left (dc));
>       FNQUAL_COMPONENT_CASE:
>         dc = d_left (dc);
>         continue;
>       }
>   }
> }

Right, those are the easy cases, the project of course is about the
non-trivial ones.

>
> For the more complicated functions like `d_print_comp` and
> `d_print_comp_inner`, there is very large number of cases. Moreover,
> different
> cases can have multiple recursive calls to `d_print_comp`. My idea to tackle
> these is to keep track of the number of calls which would be made if the
> algorithm
> was recursive. Basically, the count starts at 1 and for:
>
>> a base case we subtract 1
>> a single call we add 0
>> two calls we add 1
>> three calls we add 2
>> and so on.
>
> However, this is not enough. We need to deal with the instructions in
> between
> calls and afterwards. The best structure to keep track of the work which
> needs
> to be done is a stack (as hinted in the website). This is the implementation
> which I came up with (it is heavily inspired by `d_growable_string`):

You are correct about the stack, you definitely need a stack to drive
the whole process, I am not sure I understand the call counting above or
what purpose it would serve.

>
> #define EMPTY 0
>
> /* Stack of operations to do after traversing a full branch. */
> struct d_work_stack {
>   /* Base pointer. */
>   int *bp;
>   /* Current length of data on the stack. */
>   size_t len;
>   /* Allocated size of buffer. */
>   size_t alc;
>   /* Set to 1 if we had a memory allocation failure. */
>   int allocation_failure;
> };
>
> static void d_work_stack_init(struct d_work_stack *dws, size_t estimate) {
>   dws->bp = NULL;
>   dws->len = 0;
>   dws->alc = 0;
>   dws->allocation_failure = 0;
>
>   if (estimate > 0)
>     d_work_stack_resize(dws, estimate);
> }
>
> static inline void d_work_stack_resize(struct d_work_stack *dws, size_t
> need) {
>   size_t newalc;
>   int *newbp;
>
>   if (dws->allocation_failure)
>     return;
>
>   /* Start allocation at two bytes to avoid any possibility of confusion
>      with the special value of 1 used as a retrun to indicate
>      allocation failures.  */
>   newalc = dws->alc > 0 ? dws->alc : 2;
>   while (newalc < need)
>     newalc <<= 1;
>
>   newbp = (int *)realloc(dws->bp, newalc * sizeof(int));
>   if (newbp == NULL) {
>     free(dws->bp);
>     dws->bp = NULL;
>     dws->len = 0;
>     dws->alc = 0;
>     dws->allocation_failure = 1;
>     return;
>   }
>   dws->bp = newbp;
>   dws->alc = newalc;
> }
>
> static void d_work_stack_push(struct d_work_stack *dws, int dw) {
>   size_t need = dws->len + 1;
>   if (need > dws->alc)
>     d_work_stack_resize(dws, need);
>
>   if (dws->allocation_failure)
>     return;
>
>   dws->bp[(dws->len)++] = dw;
> }
>
> static int d_work_stack_pop(struct d_work_stack *dws) {
>   if (dws->len == 0)
>     return EMPTY;
>
>   return dws->bp[--(dws->len)];
> }
>
> static void d_work_stack_deinit(struct d_work_stack *dws) {
>   if (dws->bp != NULL) {
>     free(dws->bp);
>     dws->bp = NULL;
>   }
>
>   dws->len = 0;
>   dws->alc = 0;
>   dws->allocation_failure = 0;
> }

Note that the stack will have to hold intermediate data for many kinds
of mutually recursive functions in the demangler, so (either directly or
conceptually) it will have to hold items - probably structures - of
different types.  I do not think integers will cut it.

In the past when we were discussing the project, it was helpful to look
at function d_parmlist.  Making the indirect recursion explicit means
replacing the recursive calls, such as the call to cplus_demangle_type,
with a push on the stack of some structure encoding that a type has to
be demangled and immediately returning.  But that means that the rest of
the work has to be done later (when this and more stuff pushed on the
stack gets popped and the driving loop gets back to the same point in
the stack) and there is a lot of state - stored in local variables now -
which has to be carried over.  That state needs to be in the stack
(describing the current task) too, hence the need for a richer type.

>
> The only thing I need to iron out is the best way to use this structure. My
> crude idea at the moment is to give code which needs to be executed, after
> or in
> between recursive calls, an index. Then using a switch the algorithm can
> decide
> what to do depending on the index (only if the count is still greater than 0
> and the end of a branch was reached).
>
> Note: that you should think of a `branch` as the branch of the tree formed
> if
> you visualise the recursive calls until completion.
>
> However, this might not be the best approach. I might consider using the
> stack
> to hold function pointers instead.

I agree that function pointers are likely to be useful.  But note that often
you'll need to encode both something to identify the kind of stuff that
is being demangled (type, parmlist, component...) and the state how far
you already got (from what point you need to continue), so you may want
to use both.

Hope this is useful, good luck.

Martin