From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 52083 invoked by alias); 23 Jan 2020 08:16:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 52074 invoked by uid 89); 23 Jan 2020 08:16:56 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-7.9 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy= X-HELO: us-smtp-1.mimecast.com Received: from us-smtp-delivery-1.mimecast.com (HELO us-smtp-1.mimecast.com) (205.139.110.120) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 23 Jan 2020 08:16:54 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1579767413; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type; bh=oEzJOu3uGYyOoTkdMTweWLsdSMaUFEEB7dg/jqrV86M=; b=CmEZQGvMBnISXOWL8uSVidyJEKEV6leUgARToTgBZnwF4grHk5AhxWeBADL/KKL8QOBeSA TycquEu67JoQmDkZuv07iJxKRaIv7xRbQ2dsZ5F54iwa0M0dRt0la7vsM8cfiThLp4aST6 1yHDWUx465rxVTjk5o6YSjF26jMaP5o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-426-XvR7czQ3OQiBzkSkTjUpIA-1; Thu, 23 Jan 2020 03:16:50 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E886D18A6ECB; Thu, 23 Jan 2020 08:16:48 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-116-51.ams2.redhat.com [10.36.116.51]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 31AAD5DA7D; Thu, 23 Jan 2020 08:16:48 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id 00N8Gjk0021133; Thu, 23 Jan 2020 09:16:45 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id 00N8GhMT021132; Thu, 23 Jan 2020 09:16:43 +0100 Date: Thu, 23 Jan 2020 09:14:00 -0000 From: Jakub Jelinek To: Richard Biener , Richard Sandiford , Uros Bizjak Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] wide-int: i386: Fix ICEs on TImode signed overflow add/sub patterns [PR93376] Message-ID: <20200123081643.GV10088@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 User-Agent: Mutt/1.11.3 (2019-02-01) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="dnBRaVjvtun1Q0Od" Content-Disposition: inline X-IsSubscribed: yes X-SW-Source: 2020-01/txt/msg01549.txt.bz2 --dnBRaVjvtun1Q0Od Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-length: 3285 Hi! The following testcase ICEs, because during try_combine of i3: (insn 18 17 19 2 (parallel [ (set (reg:CCO 17 flags) (eq:CCO (plus:OI (sign_extend:OI (reg:TI 96)) (const_int 1 [0x1])) (sign_extend:OI (plus:TI (reg:TI 96) (const_int 1 [0x1]))))) (set (reg:TI 98) (plus:TI (reg:TI 96) (const_int 1 [0x1]))) ]) "pr93376.c":8:10 223 {*addvti4_doubleword_1} (expr_list:REG_UNUSED (reg:TI 98) (expr_list:REG_DEAD (reg:TI 96) (nil)))) and i2: (insn 17 37 18 2 (set (reg:TI 96) (const_wide_int 0x7fffffffffffffffffffffffffffffff)) "pr93376.c":8:= 10 65 {*movti_internal} (nil)) the eq in there gets simplified into: (eq:CCO (const_wide_int 0x080000000000000000000000000000000) (const_wide_int 0x80000000000000000000000000000000)) and simplify-rtx.c tries to simplify it by simplifying MINUS of the two operands. Now, i386 defines MAX_BITSIZE_MODE_ANY_INT to 128, because OImode and XImode are used mainly as a placeholder for the vector modes; these new signed overflow patterns are an exception to that, but what they really need is just TImode precision + 1 (maybe 2 worst case) bits at any time. wide-int.h defines WIDE_INT_MAX_ELTS in a way that it contains one more HWI above number of HWIs to cover WIDE_INT_MAX_ELTS, so on i386 that is 3 HWIs, meaning that TImode precision + 1/2 bits is still representable in there. Unfortunately, the way wi::sub_large is implemented, it needs not just those 3 HWIs, but one HWI above the maximum of the lengths of both operands, which means it buffer overflows, overwrites the following precision in wide_int_storage and ICEs later on. The need for 4 HWIs is only temporary, because canonize immediately after it canonicalizes it back to 3 HWIs only. Attached are two patches, the first one is admittedly a hack, but could be considered an optimization too, simply before overwriting the last HWI ({add,sub}_large seem to be the only one that have such len++) perform a cheap check if canonize won't optimize it away immediately and in such case don't store it. Yes, we are playing with fire because full OImode is 256 bits and we can't store that many, but we in the end only need 129 bits. The other patch is something suggested by Richard S., avoid using OImode for this and instead use a partial int mode that is smaller. This is still playing with fire because even the partial int mode is larger than MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3 WIDE_INT_MAX_ELTS wide-int.h uses. The disadvantage of that approach is that it is more costly at compile time, various RA data structures contains number_of_modes^2 sized arrays, and RA initialization will want to compute at runtime various properties of each of the modes. I've tried to think about other ways how to represent signed overflow check in RTL, but didn't find any that would actually properly describe it and not be way too complicated for the patterns. So, my preference is the first patch, but Richard S. doesn't like that much. Both patches have been bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (which one)? Jakub --dnBRaVjvtun1Q0Od Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=S176a Content-Transfer-Encoding: quoted-printable Content-length: 1972 2020-01-23 Jakub Jelinek PR target/93376 * wide-int.cc (wi::add_large, wi::sub_large): Don't extend len by writing SIGN_MASK of the highest element that canonize would remove again. * gcc.dg/pr93376.c: New test. --- gcc/wide-int.cc.jj 2020-01-12 11:54:38.535381394 +0100 +++ gcc/wide-int.cc 2020-01-22 12:05:06.739486117 +0100 @@ -1155,8 +1155,14 @@ wi::add_large (HOST_WIDE_INT *val, const =20 if (len * HOST_BITS_PER_WIDE_INT < prec) { - val[len] =3D mask0 + mask1 + carry; - len++; + HOST_WIDE_INT val_top =3D mask0 + mask1 + carry; + /* Don't extend len unnecessarily when canonize would shrink it + again immediately. */ + if (SIGN_MASK (val[len - 1]) !=3D val_top) + { + val[len] =3D val_top; + len++; + } if (overflow) *overflow =3D (sgn =3D=3D UNSIGNED && carry) ? wi::OVF_OVERFLOW : wi::OVF_NONE; @@ -1566,8 +1572,14 @@ wi::sub_large (HOST_WIDE_INT *val, const =20 if (len * HOST_BITS_PER_WIDE_INT < prec) { - val[len] =3D mask0 - mask1 - borrow; - len++; + HOST_WIDE_INT val_top =3D mask0 - mask1 - borrow; + /* Don't extend len unnecessarily when canonize would shrink it + again immediately. */ + if (SIGN_MASK (val[len - 1]) !=3D val_top) + { + val[len] =3D val_top; + len++; + } if (overflow) *overflow =3D (sgn =3D=3D UNSIGNED && borrow) ? OVF_UNDERFLOW : OVF_NONE; } --- gcc/testsuite/gcc.dg/pr93376.c.jj 2020-01-22 12:16:40.231937796 +0100 +++ gcc/testsuite/gcc.dg/pr93376.c 2020-01-22 12:16:23.953190057 +0100 @@ -0,0 +1,20 @@ +/* PR target/93376 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -finline-functions-called-once" } */ + +unsigned a, b, c; + +int +bar (int x) +{ + short s =3D __builtin_sub_overflow (~x, 0, &b); + a =3D __builtin_ffsll (~x); + return __builtin_add_overflow_p (-(unsigned __int128) a, s, + (unsigned __int128) 0); +} + +void +foo (void) +{ + c =3D bar (0); +} --dnBRaVjvtun1Q0Od Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=S176 Content-Transfer-Encoding: quoted-printable Content-length: 6341 2020-01-23 Jakub Jelinek PR target/93376 * config/i386/i386-modes.def (POImode): New mode. * config/i386/i386.md (DPWI): New mode attribute. (addv4, subv4): Use instead of . (QWI): Rename to... (QPWI): ... this. Use POI instead of OI for TImode. (*addv4_doubleword, *addv4_doubleword_1, *subv4_doubleword, *subv4_doubleword_1): Use instead of . * gcc.dg/pr93376.c: New test. --- gcc/config/i386/i386-modes.def.jj 2020-01-12 11:54:36.323414766 +0100 +++ gcc/config/i386/i386-modes.def 2020-01-22 14:10:44.781367495 +0100 @@ -107,6 +107,13 @@ INT_MODE (XI, 64); PARTIAL_INT_MODE (HI, 16, P2QI); PARTIAL_INT_MODE (SI, 32, P2HI); =20 +/* Mode used for signed overflow checking of TImode. As + MAX_BITSIZE_MODE_ANY_INT is only 128, wide-int.h reserves only that + plus HOST_BITS_PER_WIDE_INT bits in wide_int etc., so OImode is too + large. For the overflow checking we actually need just 1 or 2 bits + beyond TImode precision. */ +PARTIAL_INT_MODE (OI, 192, POI); + /* Keep the OI and XI modes from confusing the compiler into thinking that these modes could actually be used for computation. They are only holders for vectors during data movement. */ --- gcc/config/i386/i386.md.jj 2020-01-22 14:03:52.597593584 +0100 +++ gcc/config/i386/i386.md 2020-01-22 14:31:22.661691613 +0100 @@ -6054,15 +6054,18 @@ (define_insn "*addqi_ext_2" [(set_attr "type" "alu") (set_attr "mode" "QI")]) =20 +;; Like DWI, but use POImode instead of OImode. +(define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")= ]) + ;; Add with jump on overflow. (define_expand "addv4" [(parallel [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus: - (sign_extend: + (plus: + (sign_extend: (match_operand:SWIDWI 1 "nonimmediate_operand")) (match_dup 4)) - (sign_extend: + (sign_extend: (plus:SWIDWI (match_dup 1) (match_operand:SWIDWI 2 ""))))) @@ -6078,7 +6081,7 @@ (define_expand "addv4" if (CONST_SCALAR_INT_P (operands[2])) operands[4] =3D operands[2]; else - operands[4] =3D gen_rtx_SIGN_EXTEND (mode, operands[2]); + operands[4] =3D gen_rtx_SIGN_EXTEND (mode, operands[2]); }) =20 (define_insn "*addv4" @@ -6123,17 +6126,17 @@ (define_insn "addv4_1" (const_string "")))]) =20 ;; Quad word integer modes as mode attribute. -(define_mode_attr QWI [(SI "TI") (DI "OI")]) +(define_mode_attr QPWI [(SI "TI") (DI "POI")]) =20 (define_insn_and_split "*addv4_doubleword" [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus: - (sign_extend: + (plus: + (sign_extend: (match_operand: 1 "nonimmediate_operand" "%0,0")) - (sign_extend: + (sign_extend: (match_operand: 2 "x86_64_hilo_general_operand" "r,o"))) - (sign_extend: + (sign_extend: (plus: (match_dup 1) (match_dup 2))))) (set (match_operand: 0 "nonimmediate_operand" "=3Dro,r") (plus: (match_dup 1) (match_dup 2)))] @@ -6172,11 +6175,11 @@ (define_insn_and_split "*addv4_doub (define_insn_and_split "*addv4_doubleword_1" [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus: - (sign_extend: + (plus: + (sign_extend: (match_operand: 1 "nonimmediate_operand" "%0")) - (match_operand: 3 "const_scalar_int_operand" "")) - (sign_extend: + (match_operand: 3 "const_scalar_int_operand" "")) + (sign_extend: (plus: (match_dup 1) (match_operand: 2 "x86_64_hilo_general_operand" ""))))) @@ -6570,11 +6573,11 @@ (define_insn "*subsi_2_zext" (define_expand "subv4" [(parallel [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus: - (sign_extend: + (minus: + (sign_extend: (match_operand:SWIDWI 1 "nonimmediate_operand")) (match_dup 4)) - (sign_extend: + (sign_extend: (minus:SWIDWI (match_dup 1) (match_operand:SWIDWI 2 ""))))) @@ -6590,7 +6593,7 @@ (define_expand "subv4" if (CONST_SCALAR_INT_P (operands[2])) operands[4] =3D operands[2]; else - operands[4] =3D gen_rtx_SIGN_EXTEND (mode, operands[2]); + operands[4] =3D gen_rtx_SIGN_EXTEND (mode, operands[2]); }) =20 (define_insn "*subv4" @@ -6637,12 +6640,12 @@ (define_insn "subv4_1" (define_insn_and_split "*subv4_doubleword" [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus: - (sign_extend: + (minus: + (sign_extend: (match_operand: 1 "nonimmediate_operand" "0,0")) - (sign_extend: + (sign_extend: (match_operand: 2 "x86_64_hilo_general_operand" "r,o"))) - (sign_extend: + (sign_extend: (minus: (match_dup 1) (match_dup 2))))) (set (match_operand: 0 "nonimmediate_operand" "=3Dro,r") (minus: (match_dup 1) (match_dup 2)))] @@ -6679,11 +6682,11 @@ (define_insn_and_split "*subv4_doub (define_insn_and_split "*subv4_doubleword_1" [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus: - (sign_extend: + (minus: + (sign_extend: (match_operand: 1 "nonimmediate_operand" "0")) - (match_operand: 3 "const_scalar_int_operand" "")) - (sign_extend: + (match_operand: 3 "const_scalar_int_operand" "")) + (sign_extend: (minus: (match_dup 1) (match_operand: 2 "x86_64_hilo_general_operand" ""))))) --- gcc/testsuite/gcc.dg/pr93376.c.jj 2020-01-22 14:04:57.955604949 +0100 +++ gcc/testsuite/gcc.dg/pr93376.c 2020-01-22 14:04:57.955604949 +0100 @@ -0,0 +1,20 @@ +/* PR target/93376 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -finline-functions-called-once" } */ + +unsigned a, b, c; + +int +bar (int x) +{ + short s =3D __builtin_sub_overflow (~x, 0, &b); + a =3D __builtin_ffsll (~x); + return __builtin_add_overflow_p (-(unsigned __int128) a, s, + (unsigned __int128) 0); +} + +void +foo (void) +{ + c =3D bar (0); +} --dnBRaVjvtun1Q0Od--