From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x82d.google.com (mail-qt1-x82d.google.com [IPv6:2607:f8b0:4864:20::82d]) by sourceware.org (Postfix) with ESMTPS id 2CFB23858C2C for ; Fri, 8 Jul 2022 07:23:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2CFB23858C2C Received: by mail-qt1-x82d.google.com with SMTP id k14so26138763qtm.3 for ; Fri, 08 Jul 2022 00:23:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hZLKkoTWrMkmwPFSSuwzF2Nb2rhNvI7BibZUSGgcD3E=; b=5NSBCRyvaO4AK3eLnBcKhdOWceoKRYKN5JDdZ8MHfNiwiV/lYu9pwMlbcMGKvG+hlw zBjlbFs9M4eiih49vl6PGk0NEj0E99IM26st6OcVOp9ehfT5UAKezX0Xq8S5NZayxsyF n8ihC2LmfFRrVNR3ZfIfShzCoetDX2ea1odw/6ErNaVv6gZWQ3xZiQlkfJZUEBwVUD/C Rvf9YFhL9BVZYtaqBLhCyo2rM8b9A11TZr0Adp/Bp9C04OwgZKfE5F5bVGylij7tF1Gw j0zuKjy5cmZWN2WlJjFGPXS6WOc8h9DQ/f+0aeIyTvLiyv2NOVSFm4AKpM1oJJ7LH5kv B33g== X-Gm-Message-State: AJIora+cedGDZkx6eF8lUUX+zo4Ip0j5W/q2soZigqFUMcrSOL/74iHf 3K1QNMZqhvCHNcgK9kBoqNl9K5AktBm0d9pH+LcQ7xGu3k0= X-Google-Smtp-Source: AGRyM1sHDR4R6KWW0kXMjkFIzK6ZEk60kwsc6xIM3LGWboh7LVy5oITjgtqtZNlgzGV4IHBiqjn9yecXAJneSJTqKEM= X-Received: by 2002:a05:6214:c83:b0:470:b3e3:c25e with SMTP id r3-20020a0562140c8300b00470b3e3c25emr1538296qvr.1.1657265036503; Fri, 08 Jul 2022 00:23:56 -0700 (PDT) MIME-Version: 1.0 References: <003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com> In-Reply-To: <003b01d8929a$70be2cc0$523a8640$@nextmovesoftware.com> From: Uros Bizjak Date: Fri, 8 Jul 2022 09:23:45 +0200 Message-ID: Subject: Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md. To: Roger Sayle Cc: "gcc-patches@gcc.gnu.org" , Segher Boessenkool Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, MEDICAL_SUBJECT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2022 07:23:59 -0000 On Fri, Jul 8, 2022 at 9:15 AM Roger Sayle wrote: > > > This patch adds support for x86's single-byte encoded stc (set carry flag) > and clc (clear carry flag) instructions to i386.md. > > The motivating example is the simple code snippet: > > unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) > { > return __builtin_ia32_addcarryx_u32 (1, a, b, c); > } > > which uses the target built-in to generate an adc instruction, adding > together A and B with the incoming carry flag already set. Currently > for this mainline GCC generates (with -O2): > > movl $1, %eax > addb $-1, %al > adcl %esi, %edi > setc %al > movl %edi, (%rdx) > movzbl %al, %eax > ret > > where the first two instructions (to load 1 into a byte register and > then add 255 to it) are the idiom used to set the carry flag. This > is a little inefficient as x86 has a "stc" instruction for precisely > this purpose. With the attached patch we now generate: > > stc > adcl %esi, %edi > setc %al > movl %edi, (%rdx) > movzbl %al, %eax > ret > > The central part of the patch is the addition of x86_stc and x86_clc > define_insns, represented as "(set (reg:CCC FLAGS_REG) (const_int 1))" > and "(set (reg:CCC FLAGS_REG) (const_int 0))" respectively, then using > x86_stc appropriately in the ix86_expand_builtin. > > Alas this change exposes two latent bugs/issues in the compiler. > The first is that there are several peephole2s in i386.md that propagate > the flags register, but take its mode from the SET_SRC rather than > preserve the mode of the original SET_DEST. The other, which is > being discussed with Segher, is that the middle-end's simplify-rtx > inappropriately tries to interpret/optimize MODE_CC comparisons, > converting the above adc into an add, as it mistakenly believes > (ltu:SI (const_int 1) (const_int 0))" is always const0_rtx even when > the mode of the comparison is MODE_CCC. > > I believe Segher will review (and hopefully approve) the middle-end > chunk of this patch independently, but hopefully this backend patch > provides the necessary context to explain why that change is needed. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} with > no new failures. Ok for mainline? > > > 2022-07-08 Roger Sayle > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_builtin) : > Use new x86_stc or negqi_ccc_1 instructions to set the carry flag. > * config/i386/i386.md (x86_clc): New define_insn. > (x86_stc): Likewise, new define_insn to set the carry flag. > (*setcc_qi_negqi_ccc_1_): New define_insn_and_split to > recognize (and eliminate) the carry flag being copied to itself. > (neg_ccc_1): Renamed from *neg_ccc_1 for gen function. > (define_peephole2): Use match_operand of flags_reg_operand to > capture and preserve the mode of FLAGS_REG. > (define_peephole2): Likewise. > > * simplify-rtx.cc (simplify_const_relational_operation): Handle > case where both operands of a MODE_CC comparison have been > simplified to constant integers. > > gcc/testsuite/ChangeLog > * gcc.target/i386/stc-1.c: New test case. Please split out the peephole2 part of the patch. This part is pre-approved and should be committed independently of the main part of the patch. Thanks, Uros. > > > Thanks in advance (both Uros and Segher), > Roger > -- > > > -----Original Message----- > > From: Segher Boessenkool > > Sent: 07 July 2022 23:39 > > To: Roger Sayle > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH] Be careful with MODE_CC in > > simplify_const_relational_operation. > > > > Hi! > > > > On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote: > > > I think it's fair to describe RTL's representation of condition flags > > > using MODE_CC as a little counter-intuitive. > > > > "A little challenging", and you should see that as a good thing, as a > puzzle to > > crack :-) > > > > > For example, the i386 > > > backend represents the carry flag (in adc instructions) using RTL of > > > the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to > > > be taken not to treat this like a normal RTX expression, after all LTU > > > (less-than-unsigned) against const0_rtx would normally always be > > > false. > > > > A comparison of a MODE_CC thing against 0 means the result of a > > *previous* comparison (or other cc setter) is looked at. Usually it > simply looks > > at some condition bits in a flags register. It does not do any actual > comparison: > > that has been done before (if at all even). > > > > > Hence, MODE_CC comparisons need to be treated with caution, and > > > simplify_const_relational_operation returns early (to avoid > > > problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC. > > > > Not just to avoid problems: there simply isn't enough information to do a > > correct job. > > > > > However, consider the (currently) hypothetical situation, where the > > > RTL optimizers determine that a previous instruction unconditionally > > > sets or clears the carry flag, and this gets propagated by combine > > > into the above expression, we'd end up with something that looks like > > > (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says. > > > Fortunately, simplify_const_relational_operation is passed the > > > original mode of the comparison (cmp_mode, the original mode of op0) > > > which can be checked for MODE_CC, even when op0 is now VOIDmode > > > (const_int) after the substitution. Defending against this is clearly > > > the right thing to do. > > > > > > More controversially, rather than just abort > > > simplification/optimization in this case, we can use the comparison > > > operator to infer/select the semantics of the CC_MODE flag. > > > Hopefully, whenever a backend uses LTU, it represents the (set) carry > > > flag (and behaves like i386.md), in which case the result of the > simplified > > expression is the first operand. > > > [If there's no standardization of semantics across backends, then we > > > should always just return 0; but then miss potential optimizations]. > > > > On PowerPC, ltu means the result of an unsigned comparison (we have > > instructions for that, cmpl[wd][i] mainly) was "smaller than". It does > not mean > > anything is unsigned smaller than zero. It also has nothing to do with > carries, > > which are done via a different register (the XER). > > > > > + /* Handle MODE_CC comparisons that have been simplified to > > > + constants. */ > > > + if (GET_MODE_CLASS (mode) == MODE_CC > > > + && op1 == const0_rtx > > > + && CONST_INT_P (op0)) > > > + { > > > + /* LTU represents the carry flag. */ > > > + if (code == LTU) > > > + return op0 == const0_rtx ? const0_rtx : const_true_rtx; > > > + return 0; > > > + } > > > + > > > /* We can't simplify MODE_CC values since we don't know what the > > > actual comparison is. */ > > > > ^^^ > > This comment is 100% true. We cannot simplify any MODE_CC comparison > > without having more context. The combiner does have that context when it > > tries to combine the CC setter with the CC consumer, for example. > > > > Do you have some piece of motivating example code? > > > > > > Segher