From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2c.google.com (mail-oa1-x2c.google.com [IPv6:2001:4860:4864:20::2c]) by sourceware.org (Postfix) with ESMTPS id 6C0DB3858D1E for ; Wed, 15 Feb 2023 03:09:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6C0DB3858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-oa1-x2c.google.com with SMTP id 586e51a60fabf-16e55be7c76so122422fac.6 for ; Tue, 14 Feb 2023 19:09:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=D1wGmEY8f3j6xsGuO11I772KqZZpgNuCOGQlCeQUbFI=; b=RxTmFkMp9Fn7/rXmvBDoNANAHevfiiLc8YyW/fiM+vGuoD3dpx8NYIjcPKsE2CloeQ 7aQLCrRCgKUWQJ00vpsNm+qxzp6r+myK0ZBcI5Qw4IWh7XqQq6QxEIsie5NwWAJVp+8+ SC0OrVDbvnXCeGtt2BkIly9gKW89PRkDxy5TsNnSbncpDte8RL5NpUqSk/fA6WmKHjZH OgI62MRzqileX8k0edRVz6iWQTxLy0rFcPym5Y6zUW/znXIUveDdsoFp3nf7oZ6ZZagU /WE1Guj6yu9mtbWapkDSLURV8xv858/uloVADeqehPchs+2qfoWDq+ZuS3278Qg7opS2 tEFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=D1wGmEY8f3j6xsGuO11I772KqZZpgNuCOGQlCeQUbFI=; b=pzCKhHCotH+cxyaj+sI6XJ7rieQ/shfsPxgA9efta0tXb1nVKqq2GnP+btFSkZJLbM pRX76yf99MoGaNppPaSo2GKmeUBH8LxTAOmTvqGp3epd5V0yNXIHdKwc4oc78mJARfgy UGX3Y3BmXZ4HMY/t8gkaXOpIjw6zN4Bvq2zWX8tD9CvJeRNB7iMOsZ/Ykg9WTgOYYODb HPcLmEjL8z1y/Sv8v9850o673aGmIHiNRktT15mHfvgX7foXxjtOX7x/lp8+9y6ZlneJ j4hMuxQThR0bRxgI+RZ6g7M/Pyw9hfHMstC3NIpfL85OrehJ+ZGlS2EE4ulglSl9YAfs OuCQ== X-Gm-Message-State: AO0yUKXpM6L/DBMdTUF+lE6IUqrWaFlr/qXbJD0DqU7R3V84cCJy9BlF VkHyAd+vJpIKcYOp4z3anNbETBLtChEWjnUR3NZaEQ== X-Google-Smtp-Source: AK7set9maTyBkhPcnzsOjTsjeziSur6NYC+itnXrnaNNsOXzF5Tk1THXgHZH96TW2IyeQ0piJNZSTOqEIldkNgXQIlc= X-Received: by 2002:a05:6871:614:b0:16d:ff6a:4870 with SMTP id w20-20020a056871061400b0016dff6a4870mr203098oan.244.1676430551747; Tue, 14 Feb 2023 19:09:11 -0800 (PST) MIME-Version: 1.0 References: <2df3478a-3cd9-d7b5-670e-60683ae3066e@suse.com> <3bb126f6-5bf1-dab0-f540-25c386b9507e@suse.com> In-Reply-To: <3bb126f6-5bf1-dab0-f540-25c386b9507e@suse.com> From: Nelson Chu Date: Wed, 15 Feb 2023 11:09:00 +0800 Message-ID: Subject: Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols To: Jan Beulich Cc: Binutils , Palmer Dabbelt , Andrew Waterman , Jim Wilson Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Though the logic becomes kind of complicated to me, you are more familiar with other targets than me, so maybe this is the right way to do it. Basically, the logic looks correct and fine to me, but I still need to run regressions in case of an accident. Just make sure that - 1. The deferred_sym_rootP are the symbols that have the same names as GPR (or FPR, VPR in the future maybe), but we still need to add them into the symbol table? 2. Even if the expression has the same name as GPR, only my_getSmallExpression is possible to set exp to O_register, but my_getExpression won't have, so the probing_insn_operands is used to distinguish between them? Thanks Nelson On Mon, Feb 13, 2023 at 4:02 PM Jan Beulich wrote: > > Special casing GPR names in my_getSmallExpression() leads to a number of > inconsistencies. Generalize this by utilizing the md_parse_name() hook, > limited to when instruction operands are being parsed (really: probed). > Then both the GPR lookup there and the yet more ad hoc workaround for > PR/gas 29940 can be removed (including its extension needed for making > the compressed form JAL work again). > --- > When a floating point extension (but not [hfdq]inx) is enabled, floating > point registers would perhaps better be recognized by riscv_parse_name() > as well. Same for vector registers and likely also CSRs. Otherwise > inconsistent behavior may (continue to) result when equates come into > play, afaict. > > Considering equates, the new behavior isn't the only sensible one: We > could also omit the symbol_find() call, thus never honoring equates. > Then, however, even with the "i" insn suffix (or sometimes infix) > explicitly specified, equates with names matching a GPR name (or, see > above, potentially also other register names) wouldn't be accepted > anymore. I guess the primary question here is what the intended (and > consistent) behavior is to be. > > Overall the uses of my_getSmallExpression() vs my_getExpression() look > pretty random to me: Is there any underlying principle when which of the > two is expected to be used? The my_getSmallExpression is used when we expect the immediate operand with % prefix, and those % prefixes are used to insert different relocations. If there are other exceptions, then we need to fix them. > --- a/gas/config/tc-riscv.c > +++ b/gas/config/tc-riscv.c > @@ -171,6 +171,8 @@ static enum float_abi float_abi = FLOAT_ > > static unsigned elf_flags = 0; > > +static unsigned probing_insn_operands; Maybe add some comments to mention that "the probing_insn_operands is used to distinguish between my_getSmallExpression and my_getExpression", or other comments, is good for understanding. > + > /* Set the default_isa_spec. Return 0 if the spec isn't supported. > Otherwise, return 1. */ > > @@ -2228,21 +2230,9 @@ my_getSmallExpression (expressionS *ep, > char *str, const struct percent_op_match *percent_op) > { > size_t reloc_index; > - unsigned crux_depth, str_depth, regno; > + unsigned crux_depth, str_depth; > char *crux; > > - /* First, check for integer registers. No callers can accept a reg, but > - we need to avoid accidentally creating a useless undefined symbol below, > - if this is an instruction pattern that can't match. A glibc build fails > - if this is removed. */ > - if (reg_lookup (&str, RCLASS_GPR, ®no)) > - { > - ep->X_op = O_register; > - ep->X_add_number = regno; > - expr_parse_end = str; > - return 0; > - } > - > /* Search for the start of the main expression. > > End the loop with CRUX pointing to the start of the main expression and > @@ -2265,9 +2255,15 @@ my_getSmallExpression (expressionS *ep, > && reloc_index < 1 > && parse_relocation (&str, reloc, percent_op)); > > + if (probing_insn_operands) > + ++probing_insn_operands; > + > my_getExpression (ep, crux); > str = expr_parse_end; > > + if (probing_insn_operands) > + --probing_insn_operands; > + > /* Match every open bracket. */ > while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t')) > if (*str++ == ')') > @@ -2453,6 +2449,13 @@ riscv_is_priv_insn (insn_t insn) > || ((insn ^ MATCH_SFENCE_VM) & MASK_SFENCE_VM) == 0); > } > > +static symbolS *deferred_sym_rootP; > +static symbolS *deferred_sym_lastP; > +/* Since symbols can't easily be freed, try to recycle ones which weren't > + committed. */ > +static symbolS *orphan_sym_rootP; > +static symbolS *orphan_sym_lastP; > + > /* This routine assembles an instruction into its binary format. As a > side effect, it sets the global variable imm_reloc to the type of > relocation to do if one of the operands is an address expression. */ > @@ -2488,6 +2491,8 @@ riscv_ip (char *str, struct riscv_cl_ins > > insn = (struct riscv_opcode *) str_hash_find (hash, str); > > + probing_insn_operands = 1; > + > asargStart = asarg; > for ( ; insn && insn->name && strcmp (insn->name, str) == 0; insn++) > { > @@ -2504,6 +2509,17 @@ riscv_ip (char *str, struct riscv_cl_ins > /* Reset error message of the previous round. */ > error.msg = _("illegal operands"); > error.missing_ext = NULL; > + > + /* Purge deferred symbols from the previous round, if any. */ > + while (deferred_sym_rootP) > + { > + symbolS *sym = deferred_sym_rootP; > + > + symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP); > + symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP, > + &orphan_sym_lastP); > + } > + > create_insn (ip, insn); > > imm_expr->X_op = O_absent; > @@ -2558,9 +2574,22 @@ riscv_ip (char *str, struct riscv_cl_ins > } > if (*asarg != '\0') > break; > + > /* Successful assembly. */ > error.msg = NULL; > insn_with_csr = false; > + > + /* Commit deferred symbols, if any. */ > + while (deferred_sym_rootP) > + { > + symbolS *sym = deferred_sym_rootP; > + > + symbol_remove (sym, &deferred_sym_rootP, > + &deferred_sym_lastP); > + symbol_append (sym, symbol_lastP, &symbol_rootP, > + &symbol_lastP); > + symbol_table_insert (sym); > + } > goto out; > > case 'C': /* RVC */ > @@ -2764,8 +2793,6 @@ riscv_ip (char *str, struct riscv_cl_ins > case 'p': > goto branch; > case 'a': > - if (oparg == insn->args + 1) > - goto jump_check_gpr; > goto jump; > case 'S': /* Floating-point RS1 x8-x15. */ > if (!reg_lookup (&asarg, RCLASS_FPR, ®no) > @@ -3270,18 +3297,6 @@ riscv_ip (char *str, struct riscv_cl_ins > continue; > > case 'a': /* 20-bit PC-relative offset. */ > - /* Like in my_getSmallExpression() we need to avoid emitting > - a stray undefined symbol if the 1st JAL entry doesn't match, > - but the 2nd (with 2 operands) might. */ > - if (oparg == insn->args) > - { > - jump_check_gpr: > - asargStart = asarg; > - if (reg_lookup (&asarg, RCLASS_GPR, NULL) > - && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ','))) > - break; > - asarg = asargStart; > - } > jump: > my_getExpression (imm_expr, asarg); > asarg = expr_parse_end; > @@ -3504,6 +3519,8 @@ riscv_ip (char *str, struct riscv_cl_ins > if (save_c) > *(asargStart - 1) = save_c; > > + probing_insn_operands = 0; > + > return error; > } > > @@ -3800,6 +3817,62 @@ riscv_after_parse_args (void) > flag_dwarf_cie_version = 3; > } The comment of ppc said "This function is called for each symbol seen in an expression. It handles the special parsing ...", looks good and more easy to understand the logic, so could we also have a similar description detailed? > +bool riscv_parse_name (const char *name, struct expressionS *ep, > + enum expr_mode mode) > +{ > + unsigned int regno; > + > + if (!probing_insn_operands) > + return false; > + > + gas_assert (mode == expr_normal); > + > + regno = reg_lookup_internal (name, RCLASS_GPR); > + if (regno == (unsigned int)-1) > + return false; > + > + if (symbol_find (name) != NULL) > + return false; > + > + if (probing_insn_operands > 1) > + { > + ep->X_op = O_register; > + ep->X_add_number = regno; > + } > + else > + { > + /* Create a symbol without adding it to the symbol table yet. > + Insertion will happen only once we commit to using the insn > + we're probing operands for. */ > + symbolS *sym; > + > + for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym)) > + if (strcmp (name, S_GET_NAME (sym)) == 0) > + break; > + if (!sym) > + { > + for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym)) > + if (strcmp (name, S_GET_NAME (sym)) == 0) > + { > + symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP); > + break; > + } > + if (!sym) > + sym = symbol_create (name, undefined_section, > + &zero_address_frag, 0); > + > + symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP, > + &deferred_sym_lastP); > + } > + > + ep->X_op = O_symbol; > + ep->X_add_symbol = sym; > + ep->X_add_number = 0; > + } > + > + return true; > +} > + > long > md_pcrel_from (fixS *fixP) > { > --- a/gas/config/tc-riscv.h > +++ b/gas/config/tc-riscv.h > @@ -123,6 +123,10 @@ extern void riscv_elf_final_processing ( > /* Adjust debug_line after relaxation. */ > #define DWARF2_USE_FIXED_ADVANCE_PC 1 > > +#define md_parse_name(name, exp, mode, c) \ > + riscv_parse_name (name, exp, mode) > +bool riscv_parse_name (const char *, struct expressionS *, enum expr_mode); > + > #define md_finish riscv_md_finish > #define CONVERT_SYMBOLIC_ATTRIBUTE riscv_convert_symbolic_attribute > >