From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by sourceware.org (Postfix) with ESMTPS id 9EB6E385E019 for ; Wed, 20 Dec 2023 07:02:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9EB6E385E019 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9EB6E385E019 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::432 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703055725; cv=none; b=qg5Si8B17Gc/HWckC3yY2ECZuOXTURSbJYtDBYxuGnLW/CnZKvTOUwJpztSUGPjgPqyHhly3OBP6Rx/lWQkS+CL9XsmyC0DpDUGdqMajcqm4apgvQcjU5Q/OS0cwUqbT4Ncogqdre0ZjtQZTdMvUG4OujIPEuvDfia41ZL+9Hmc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703055725; c=relaxed/simple; bh=VSGSEPCj0Cd3g+1EqXQpXK4e/FWwDuhEyh6+p8xGkoc=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=XNsA94jTO3zRYBMUPP1a9rv/kHkZbJ04S5VnsQgDaZ2kre6lV4d2aRVse6X0DwyCIG95a+/wMnoT6k6cQA0lqY3HTJ41XrJAIOV+RKongrK5PYKpiLfx7cB2rPokO11LT18W37E9iY1kelymwEVpeJXHexl3lVfFuPk6irSZlD0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-x432.google.com with SMTP id ffacd0b85a97d-33660cf2296so4450646f8f.3 for ; Tue, 19 Dec 2023 23:02:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1703055721; x=1703660521; darn=gcc.gnu.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=a46zeiBjdL1ej26OZ9S74DWd9qD/P1HDJWZJezPRFPk=; b=P7WUxQtmaMfxTfa7ZtU3Oo/x/ThS0pRp1hbDY1aOyUvhsrsSzfDlAkAvI8EOX0A1BH 5zkdLEg2+CFt0e+gBK85492e9zV4CgDF/jsD9NspuOYuDbnnuSRIY9jiFnUcxKc3a68y KtDcIpKsWkw+vNAan10qKDB6UZCuOVQDs/HJQfGbZhwChaIka4otEXTo9hmSR1AnTZTP y1DKHdzpElU3bTkWjl54ahfSezgtqO8L0nRnCIwe1Pc1s/Oq9EOr/cXC3TZ+/F89sCRH hjArxKZRp1579tUQJP53pcpwMEpxnCZ703quInAsEfzCopxTPh8bF9j3CwBZ92WoyZaK 0ktw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703055721; x=1703660521; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=a46zeiBjdL1ej26OZ9S74DWd9qD/P1HDJWZJezPRFPk=; b=Fwc0QHK0iFx+UAD/Y01CxaJAzNctu69ytSKWpgxT51c8HS8445Qi1ywNiZCFhrbdOh 15i1dS4U2FZybEtFyWSGTE5Saxwf06UCrV9/BGDxKt/XbFfu5jUG7X95NDBsJeMZbceV 75I8Y3V78YW5b0jaks8POeFE858Dxiel36iso1fMIQhGafok0THA1aeqVvSeL0cBNXXm pN5SqpjnClViOkP5pEfYnP4mNrjiADwv8UtaKsmzzhuQt68IyCWkdQuBCufugPVgQtCs cbb4xzL7qUclIkKe80g3sifTNGjfYIvhU3fhxieLrOXpu0NklYCDrMbFMxYEu06QVS2F fgbQ== X-Gm-Message-State: AOJu0Yy8TmpjKOFLY2oFCKhErrwpLWmKIaLHBpSimoDpWsDTj7102IsZ aCZ5mV09H/Tf1KuOJivLie4cSWnrQyXZc0De1lsFXvkPH8d4ci3cAdU= X-Google-Smtp-Source: AGHT+IFzneEV0nijcr4i+K0Pi1rvGZS+oi5/5N3hTWWYxn1aJGjAXufMtEnp+lpZJHenN3go3QOnucQTJqrOzBKFUfY= X-Received: by 2002:adf:f889:0:b0:336:5ddc:f474 with SMTP id u9-20020adff889000000b003365ddcf474mr4425692wrp.51.1703055720954; Tue, 19 Dec 2023 23:02:00 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Wed, 20 Dec 2023 12:31:24 +0530 Message-ID: Subject: Re: [aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch To: Richard Sandiford , gcc Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 4 Dec 2023 at 14:44, Prathamesh Kulkarni wrote: > > On Thu, 23 Nov 2023 at 17:06, Prathamesh Kulkarni > wrote: > > > > Hi Richard, > > For the test-case mentioned in PR111702, compiling with -O2 > > -frounding-math -fstack-protector-all results in following ICE during > > cse2 pass: > > > > test.c: In function 'foo': > > test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120 > > 119 | } > > | ^ > > 0xb7ebb0 insert_regs > > ../../gcc/gcc/cse.cc:1120 > > 0x1f95134 merge_equiv_classes > > ../../gcc/gcc/cse.cc:1764 > > 0x1f9b9ab cse_insn > > ../../gcc/gcc/cse.cc:4793 > > 0x1f9fe30 cse_extended_basic_block > > ../../gcc/gcc/cse.cc:6577 > > 0x1f9fe30 cse_main > > ../../gcc/gcc/cse.cc:6722 > > 0x1fa0984 rest_of_handle_cse2 > > ../../gcc/gcc/cse.cc:7620 > > 0x1fa0984 execute > > ../../gcc/gcc/cse.cc:7675 > > > > This happens only with interleave+zip1 vector initialization with > > -frounding-math -fstack-protector-all, while it compiles OK without > > -fstack-protector-all. Also, it compiles OK with fallback sequence > > code-gen (with or without -fstack-protector-all). Unfortunately, I > > haven't been able to reduce the test-case further :/ > > > > From the test-case, it seems only the vector initializer for type J > > uses interleave+zip1 approach, while rest of the vector initializers > > use fallback sequence. > > > > J is defined as: > > typedef _Float16 __attribute__((__vector_size__ (16))) J; > > > > and the initializer is: > > (J) { 11654, 4801, 5535, 9743, 61680} > > > > interleave+zip1 sequence for above initializer J: > > mode = V8HF > > > > vals: (parallel:V8HF [ > > (reg:HF 642) > > (reg:HF 645) > > (reg:HF 648) > > (reg:HF 651) > > (reg:HF 654) > > (const_double:HF 0.0 [0x0.0p+0]) repeated x3 > > ]) > > > > target: (reg:V8HF 641) > > seq: > > (insn 1058 0 1059 (set (reg:V4HF 657) > > (const_vector:V4HF [ > > (const_double:HF 0.0 [0x0.0p+0]) repeated x4 > > ])) "test.c":81:8 -1 > > (nil)) > > (insn 1059 1058 1060 (set (reg:V4HF 657) > > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642)) > > (reg:V4HF 657) > > (const_int 1 [0x1]))) "test.c":81:8 -1 > > (nil)) > > (insn 1060 1059 1061 (set (reg:V4HF 657) > > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648)) > > (reg:V4HF 657) > > (const_int 2 [0x2]))) "test.c":81:8 -1 > > (nil)) > > (insn 1061 1060 1062 (set (reg:V4HF 657) > > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654)) > > (reg:V4HF 657) > > (const_int 4 [0x4]))) "test.c":81:8 -1 > > (nil)) > > (insn 1062 1061 1063 (set (reg:V4HF 658) > > (const_vector:V4HF [ > > (const_double:HF 0.0 [0x0.0p+0]) repeated x4 > > ])) "test.c":81:8 -1 > > (nil)) > > (insn 1063 1062 1064 (set (reg:V4HF 658) > > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645)) > > (reg:V4HF 658) > > (const_int 1 [0x1]))) "test.c":81:8 -1 > > (nil)) > > (insn 1064 1063 1065 (set (reg:V4HF 658) > > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651)) > > (reg:V4HF 658) > > (const_int 2 [0x2]))) "test.c":81:8 -1 > > (nil)) > > (insn 1065 1064 0 (set (reg:V8HF 641) > > (unspec:V8HF [ > > (subreg:V8HF (reg:V4HF 657) 0) > > (subreg:V8HF (reg:V4HF 658) 0) > > ] UNSPEC_ZIP1)) "test.c":81:8 -1 > > (nil)) > > > > It seems to me that the above sequence correctly initializes the > > vector into r641 ? > > insns 1058-1061 construct r657 = { r642, r648, r654, 0 } > > insns 1062-1064 construct r658 = { r645, r651, 0, 0 } > > and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 } > > > > For the above test, it seems that with interleave+zip1 approach and > > -fstack-protector-all, > > in cse pass, there are two separate equivalence classes created for > > (const_int 1), that need > > to be merged in cse_insn: > > > > if (elt->first_same_value != src_eqv_elt->first_same_value) > > { > > /* The REG_EQUAL is indicating that two formerly distinct > > classes are now equivalent. So merge them. */ > > merge_equiv_classes (elt, src_eqv_elt); > > > > elt equivalence chain: > > Equivalence chain for (subreg:QI (reg:V16QI 671) 0): > > (subreg:QI (reg:V16QI 671) 0) > > (const_int 1 [0x1]) > > > > src_eqv_elt equivalence chain: > > Equivalence chain for (const_int 1 [0x1]): > > (reg:QI 34 v2) > > (reg:QI 32 v0) > > (reg:QI 34 v2) > > (const_int 1 [0x1]) > > (vec_select:QI (reg:V16QI 671) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > (vec_select:QI (reg:V16QI 32 v0) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > (vec_select:QI (reg:V16QI 33 v1) > > (parallel [ > > (const_int 2 [0x2]) > > ])) > > (vec_select:QI (reg:V16QI 33 v1) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > > > The issue is that merge_equiv_classes doesn't seem to deal correctly with > > multiple occurences of same register in class2 (src_eqv_elt), which > > has two occurrences of > > (reg:QI 34 v2) > > > > In merge_equiv_classes, on first iteration, it will remove (reg:QI 34) > > from reg_equiv_table > > by calling delete_equiv_reg(34), and in insert_regs it will create an > > entry for (reg:QI 34) in qty_table with new quantity number, and > > create new equivalence in reg_eqv_table. > > > > When we again come across (reg:QI 34) in class2, it will > > unconditionally remove the register from reg_eqv_table, thus making > > REG_QTY(34) = -35, even tho (reg:QI 34) is now present in class1 > > chain. > > > > Then in insert_regs, we have: > > x: (reg:QI 34 v2) > > > > classp: > > (subreg:QI (reg:V16QI 671) 0) > > (reg:QI 34 v2) > > (const_int 1 [0x1]) > > > > And while iterating over elements in classp, we end up with regno == > > c_regno == 34. > > However, as mentioned above, merge_equiv_classes has deleted entry for > > (reg:QI 34) from reg_eqv_table, so it's no longer valid, and thus end > > up hitting the following assert: > > gcc_assert (REGNO_QTY_VALID_P (c_regno)); > > > > I am not sure tho why this is triggered only with interleave+zip1 approach with > > -fstack-protector-all. > > > > The attached (untested) patch is a workaround for the above issue -- > > In merge_equiv_classes, > > while iterating over elements in class2, it simply checks if element > > is a reg, and already inserted in class1 with equivalent mode, and > > avoids deleting it from reg_eqv_table in that case. > > > > This avoids hitting the assert, and following is the result of merging > > above two classes: > > Equivalence chain for (subreg:QI (reg:V16QI 671) 0): > > (subreg:QI (reg:V16QI 671) 0) > > (reg:QI 34 v2) > > (reg:QI 32 v0) > > (reg:QI 34 v2) > > (const_int 1 [0x1]) > > (const_int 1 [0x1]) > > (vec_select:QI (reg:V16QI 671) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > (vec_select:QI (reg:V16QI 33 v1) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > (vec_select:QI (reg:V16QI 33 v1) > > (parallel [ > > (const_int 2 [0x2]) > > ])) > > (vec_select:QI (reg:V16QI 32 v0) > > (parallel [ > > (const_int 1 [0x1]) > > ])) > > > > Which seems to be OK (?), but am not sure if this patch is in the > > right direction, > > and is also not efficient. > > Could you please suggest how to proceed ? > Hi, > ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637913.html Hi, ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637913.html Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh