From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/102948] New: 60% build time regression on gamess in range 2fc2e3917f9c8fd94f5d101477971d16c483ef88...c16f21c7cf97ce48967e42d3b5d22ea169a9c2c8
Date: Tue, 26 Oct 2021 14:53:10 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102948

            Bug ID: 102948
           Summary: 60% build time regression on gamess in range
                    2fc2e3917f9c8fd94f5d101477971d16c483ef88...c16f21c7cf97ce48967e42d3b5d22ea169a9c2c8
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=322.50.8&plot.1=307.50.8&plot.2=343.50.8&plot.3=266.50.8&plot.4=395.50.8&plot.5=412.50.8&plot.6=289.50.8&

this is with -Ofast -march=native -flto on zen

commit c16f21c7cf97ce48967e42d3b5d22ea169a9c2c8
Author: liuhongt
Date:   Wed Aug 4 18:43:22 2021 +0800

    Support cond_{xor,ior,and} for vector integer mode under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_anylogic_d-1.c: New test.
            * gcc.target/i386/cond_op_anylogic_d-2.c: New test.
            * gcc.target/i386/cond_op_anylogic_q-1.c: New test.
            * gcc.target/i386/cond_op_anylogic_q-2.c: New test.

commit f7aa81892eb54bc040ee6f7fd6134d800a5ee89c
Author: liuhongt
Date:   Wed Aug 4 18:15:43 2021 +0800

    Support cond_{smax,smin} for vector float/double modes under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_maxmin_double-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_double-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_float-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_float-2.c: New test.

commit 9a8c3fc2b2cc6d73b2e3006625fca2b588ebc1b0
Author: liuhongt
Date:   Wed Aug 4 16:03:58 2021 +0800

    Support cond_{smax,smin,umax,umin} for vector integer modes under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_maxmin_b-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_b-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_d-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_d-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_q-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_q-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_ub-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_ub-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_ud-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_ud-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_uq-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_uq-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_uw-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_uw-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_w-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_w-2.c: New test.

commit 2697f8324fbb09b0d92036ba6a6b8a2b8d256b23
Author: GCC Administrator
Date:   Thu Aug 5 00:17:03 2021 +0000

    Daily bump.

commit ded2c2c068f6f2825474758cb03a05070a5837e8
Author: David Malcolm
Date:   Wed Aug 4 18:21:21 2021 -0400

    analyzer: initial implementation of asm support [PR101570]

    gcc/ChangeLog:

            PR analyzer/101570
            * Makefile.in (ANALYZER_OBJS): Add analyzer/region-model-asm.o.

    gcc/analyzer/ChangeLog:

            PR analyzer/101570
            * analyzer.cc (maybe_reconstruct_from_def_stmt): Add GIMPLE_ASM
            case.
            * analyzer.h (class asm_output_svalue): New forward decl.
            (class reachable_regions): New forward decl.
            * complexity.cc (complexity::from_vec_svalue): New.
            * complexity.h (complexity::from_vec_svalue): New decl.
            * engine.cc (feasibility_state::maybe_update_for_edge): Handle
            asm stmts by calling on_asm_stmt.
            * region-model-asm.cc: New file.
            * region-model-manager.cc
            (region_model_manager::maybe_fold_asm_output_svalue): New.
            (region_model_manager::get_or_create_asm_output_svalue): New.
            (region_model_manager::log_stats): Log m_asm_output_values_map.
            * region-model.cc (region_model::on_stmt_pre): Handle GIMPLE_ASM.
            * region-model.h (visitor::visit_asm_output_svalue): New.
            (region_model_manager::get_or_create_asm_output_svalue): New decl.
            (region_model_manager::maybe_fold_asm_output_svalue): New decl.
            (region_model_manager::asm_output_values_map_t): New typedef.
            (region_model_manager::m_asm_output_values_map): New field.
            (region_model::on_asm_stmt): New.
            * store.cc (binding_cluster::on_asm): New.
            * store.h (binding_cluster::on_asm): New decl.
            * svalue.cc (svalue::cmp_ptr): Handle SK_ASM_OUTPUT.
            (asm_output_svalue::dump_to_pp): New.
            (asm_output_svalue::dump_input): New.
            (asm_output_svalue::input_idx_to_asm_idx): New.
            (asm_output_svalue::accept): New.
            * svalue.h (enum svalue_kind): Add SK_ASM_OUTPUT.
            (svalue::dyn_cast_asm_output_svalue): New.
            (class asm_output_svalue): New.
            (is_a_helper ::test): New.
            (struct default_hash_traits): New.

    gcc/testsuite/ChangeLog:

            PR analyzer/101570
            * gcc.dg/analyzer/asm-x86-1.c: New test.
            * gcc.dg/analyzer/asm-x86-lp64-1.c: New test.
            * gcc.dg/analyzer/asm-x86-lp64-2.c: New test.
            * gcc.dg/analyzer/pr101570.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
            New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
            New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
            New test.

    Signed-off-by: David Malcolm

commit 5738a64f8b3cf132b88b39af84b9f5f5a9a1554c
Author: H.J. Lu
Date:   Tue Aug 3 06:17:22 2021 -0700

    x86: Update STORE_MAX_PIECES

    Update STORE_MAX_PIECES to allow 16/32/64 bytes only if inter-unit
    move is enabled since vec_duplicate enabled by inter-unit move is
    used to implement store_by_pieces of 16/32/64 bytes.

    gcc/

            PR target/101742
            * config/i386/i386.h (STORE_MAX_PIECES): Allow 16/32/64
            bytes only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.

    gcc/testsuite/

            PR target/101742
            * gcc.target/i386/pr101742a.c: New test.
            * gcc.target/i386/pr101742b.c: Likewise.

commit 09dba016db937e61be21ef1e9581065a9ed2847d
Author: H.J. Lu
Date:   Wed Aug 4 06:15:04 2021 -0700

    x86: Avoid stack realignment when copying data with SSE register

    To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
    scratch SSE register to copy data with SSE register from one memory
    location to another.
    gcc/

            PR target/101772
            * config/i386/i386-expand.c (ix86_expand_vector_move): Call
            ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
            data with SSE register from one memory location to another.

    gcc/testsuite/

            PR target/101772
            * gcc.target/i386/eh_return-2.c: New test.

commit 361da782a25031c6ae3967bf8c10a8119845255c
Author: Andreas Krebbel
Date:   Wed Aug 4 18:40:11 2021 +0200

    IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi

    This patch makes use of the vector permute double immediate
    instruction for constant permute vectors.

    gcc/ChangeLog:

            * config/s390/s390.c (expand_perm_with_vpdi): New function.
            (vectorize_vec_perm_const_1): Call expand_perm_with_vpdi.
            * config/s390/vector.md (*vpdi1, @vpdi1): Enable a
            parameterized expander.
            (*vpdi4, @vpdi4): Likewise.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/perm-vpdi.c: New test.

commit 6dc8c4656444153c9e2f98d382de39728a849672
Author: Andreas Krebbel
Date:   Wed Aug 4 18:40:10 2021 +0200

    IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge

    This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST in the IBM Z
    backend. The initial implementation only exploits the vector merge
    instruction but there is more to come.

    gcc/ChangeLog:

            * config/s390/s390.c (MAX_VECT_LEN): Define macro.
            (struct expand_vec_perm_d): Define struct.
            (expand_perm_with_merge): New function.
            (vectorize_vec_perm_const_1): New function.
            (s390_vectorize_vec_perm_const): New function.
            (TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/perm-merge.c: New test.
            * gcc.target/s390/vector/vec-types.h: New test.

commit 4e34925ef1aeab73e022d80149be8cec92c48667
Author: Andreas Krebbel
Date:   Wed Aug 4 18:40:10 2021 +0200

    IBM Z: Remove redundant V_HW_64 mode iterator.

    gcc/ChangeLog:

            * config/s390/vector.md (V_HW_64): Remove mode iterator.
            (*vec_load_pair): Use V_HW_2 instead of V_HW_64.
            * config/s390/vx-builtins.md
            (vec_scatter_element_SI): Use V_HW_2 instead of
            V_HW_64.
commit 0aa7091befa9fdb67f7013dbd454d336a31ef71d
Author: Andreas Krebbel
Date:   Wed Aug 4 18:40:09 2021 +0200

    IBM Z: Get rid of vpdi unspec

    The patch gets rid of the unspec used for the vector permute double
    immediate instruction and replaces it with generic rtx.

    gcc/ChangeLog:

            * config/s390/s390.md (UNSPEC_VEC_PERMI): Remove constant
            definition.
            * config/s390/vector.md (*vpdi1, *vpdi4): New pattern
            definitions.
            * config/s390/vx-builtins.md (*vec_permi): Emit generic rtx
            instead of an unspec.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/zvector/vec-permi.c: Removed.
            * gcc.target/s390/zvector/vec_permi.c: New test.

commit 5391688acc997e26375e42340cea885fa6ad0d7d
Author: Andreas Krebbel
Date:   Wed Aug 4 18:40:09 2021 +0200

    IBM Z: Get rid of vec merge unspec

    This patch gets rid of the unspecs we were using for the vector merge
    instruction and replaces it with generic rtx.

    gcc/ChangeLog:

            * config/s390/s390-modes.def: Add more vector modes to support
            concatenation of two vectors.
            * config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
            prototype.
            (s390_expand_merge): Likewise.
            * config/s390/s390.c (s390_expand_merge_perm_const): New function.
            (s390_expand_merge): New function.
            * config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
            Remove constant definitions.
            * config/s390/vector.md (V_HW_2): Add mode iterators.
            (VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
            (vec_2x_nelts, vec_2x_wide): New mode attributes.
            (*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
            New pattern definitions.
            (vec_widen_umult_lo_, vec_widen_umult_hi_)
            (vec_widen_smult_lo_, vec_widen_smult_hi_)
            (vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
            (vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX
            for vec merge.
            * config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
            in vector.md.
            (vec_mergeh, vec_mergel): Use s390_expand_merge to emit vec
            merge pattern.
    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
            Instead of vpdi with 0 and 5 vmrlg and vmrhg are used now.
            * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c:
            Likewise.
            * gcc.target/s390/zvector/vec-types.h: New test.
            * gcc.target/s390/zvector/vec_merge.c: New test.

commit 63834c84d43fc2eeeaa054c5e24d1e468e9eddab
Author: Jonathan Wright
Date:   Mon Jul 19 10:19:30 2021 +0100

    aarch64: Don't include vec_select high-half in SIMD multiply cost

    The Neon multiply/multiply-accumulate/multiply-subtract instructions
    can select the top or bottom half of the operand registers. This
    selection does not change the cost of the underlying instruction and
    this should be reflected by the RTL cost function.

    This patch adds RTL tree traversal in the Neon multiply cost function
    to match vec_select high-half of its operands. This traversal
    prevents the cost of the vec_select from being added into the cost of
    the multiply - meaning that these instructions can now be emitted in
    the combine pass as they are no longer deemed prohibitively
    expensive.

    gcc/ChangeLog:

    2021-07-19 Jonathan Wright

            * config/aarch64/aarch64.c (aarch64_strip_extend_vec_half):
            Define.
            (aarch64_rtx_mult_cost): Traverse RTL tree to prevent cost of
            vec_select high-half from being added into Neon multiply
            cost.
            * rtlanal.c (vec_series_highpart_p): Define.
            * rtlanal.h (vec_series_highpart_p): Declare.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/vmul_high_cost.c: New test.

commit 1d65c9d25199264bc8909018df1b0dca71c0b32d
Author: Jonathan Wright
Date:   Mon Jul 19 14:01:52 2021 +0100

    aarch64: Don't include vec_select element in SIMD multiply cost

    The Neon multiply/multiply-accumulate/multiply-subtract instructions
    can take various forms - multiplying full vector registers of values
    or multiplying one vector by a single element of another. Regardless
    of the form used, these instructions have the same cost, and this
    should be reflected by the RTL cost function.
    This patch adds RTL tree traversal in the Neon multiply cost function
    to match the vec_select used by the lane-referencing forms of the
    instructions already mentioned. This traversal prevents the cost of
    the vec_select from being added into the cost of the multiply -
    meaning that these instructions can now be emitted in the combine
    pass as they are no longer deemed prohibitively expensive.

    gcc/ChangeLog:

    2021-07-19 Jonathan Wright

            * config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
            Define.
            (aarch64_rtx_mult_cost): Traverse RTL tree to prevent
            vec_select cost from being added into Neon multiply cost.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/vmul_element_cost.c: New test.

commit 5a1017dc305c49c59129d45536630d02dbc01c45
Author: Richard Sandiford
Date:   Wed Aug 4 16:52:09 2021 +0100

    vect: Tweak comparisons with existing epilogue loops

    This patch uses a more accurate scalar iteration estimate when
    comparing the epilogue of a constant-iteration loop with a candidate
    replacement epilogue.

    In the testcase, the patch prevents a 1-to-3-element SVE epilogue
    from seeming better than a 64-bit Advanced SIMD epilogue.

    gcc/

            * tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in
            which old_loop_vinfo is an epilogue loop that handles a
            constant number of iterations.

    gcc/testsuite/

            * gcc.target/aarch64/sve/cost_model_12.c: New test.

commit 315a1c3756cbc751c4af0ce0da2157a88d7c3b09
Author: Richard Sandiford
Date:   Wed Aug 4 16:52:08 2021 +0100

    vect: Tweak dump messages for vector mode choice

    After vect_analyze_loop has successfully analysed a loop for one base
    vector mode B1, it considers using following base vector modes to
    vectorise an epilogue. However, for VECT_COMPARE_COSTS, a later mode
    B2 might turn out to be better than B1 was. Initially this comparison
    will be between an epilogue loop (for B2) and a main loop (for B1).
    However, in r11-6458 I'd added code to reanalyse the B2 epilogue loop
    as a main loop, partly for correctness and partly for better costing.
    This can lead to a situation in which we think that the B2 epilogue
    loop was better than the B1 main loop, but that the B2 main loop is
    not better than the B1 main loop. There was no dump message to say
    that this had happened, which made it look like B2 had still won.

    gcc/

            * tree-vect-loop.c (vect_analyze_loop): Print a dump message
            when a reanalyzed loop fails to be cheaper than the current
            main loop.

commit eb55b5b0df26e95c98ab59d34e69189d4f61bc0c
Author: Richard Sandiford
Date:   Wed Aug 4 16:52:07 2021 +0100

    aarch64: Fix a typo

    gcc/

            * config/aarch64/aarch64.c: Fix a typo.

commit 929f2cf4105ccf12d0684c6d5838f58f0ee5e7c7
Author: Vincent Lefèvre
Date:   Wed Aug 4 17:25:52 2021 +0200

    gcov: check return code of a fclose

    gcc/ChangeLog:

            PR gcov-profile/101773
            * gcov-io.c (gcov_close): Check return code of a fclose.

commit 96c82a16b2076891a9974d0f0e96a0b85fbc2df4
Author: Bernd Edlinger
Date:   Sat Jul 24 12:53:39 2021 +0200

    Fix debug info for ignored decls at start of assembly

    Ignored functions decls that are compiled at the start of the
    assembly have bogus line numbers until the first .file directive, as
    reported in PR101575.

    The corresponding binutils bug report is
    https://sourceware.org/bugzilla/show_bug.cgi?id=28149

    The work around for this issue is to emit a dummy .file directive
    before the first function is compiled, unless another .file directive
    was already emitted previously.

    2021-08-04 Bernd Edlinger

            PR ada/101575
            * dwarf2out.c (dwarf2out_assembly_start): Emit a dummy
            .file statement when needed.

commit 9fcb8ec60302f5f110f94a885b618993c28d18d3
Author: Tamar Christina
Date:   Wed Aug 4 14:36:15 2021 +0100

    [testsuite] Fix trapping access in test PR101750

    I believe PR101750 to be a testism. Fix it by giving the class a
    name.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/101750
            * g++.dg/vect/pr99149.cc: Name class.
commit 31855ba6b16cd138d7484076a08cd40d609654b8
Author: Richard Biener
Date:   Thu Jul 29 14:14:48 2021 +0200

    Add emulated gather capability to the vectorizer

    This adds a gather vectorization capability to the vectorizer without
    target support by decomposing the offset vector, doing scalar loads
    and then building a vector from the result. This is aimed mainly at
    cases where vectorizing the rest of the loop offsets the cost of
    vectorizing the gather.

    Note it's difficult to avoid vectorizing the offset load, but in some
    cases later passes can turn the vector load + extract into scalar
    loads, see the followup patch.

    On SPEC CPU 2017 510.parest_r this improves runtime from 250s to 219s
    on a Zen2 CPU which has its native gather instructions disabled
    (using those the runtime instead increases to 254s) using -Ofast
    -march=znver2 [-flto]. It turns out the critical loops in this
    benchmark all perform gather operations.

    2021-07-30 Richard Biener

            * tree-vect-data-refs.c (vect_check_gather_scatter):
            Include widening conversions only when the result is
            still handled by native gather or the current offset
            size not already matches the data size.
            Also succeed analysis in case there's no native support,
            noted by a IFN_LAST ifn and a NULL decl.
            (vect_analyze_data_refs): Always consider gathers.
            * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
            Test for no IFN gather rather than decl gather.
            * tree-vect-stmts.c (vect_model_load_cost): Pass in the
            gather-scatter info and cost emulated gathers accordingly.
            (vect_truncate_gather_scatter_offset): Properly test for
            no IFN gather.
            (vect_use_strided_gather_scatters_p): Likewise.
            (get_load_store_type): Handle emulated gathers and its
            restrictions.
            (vectorizable_load): Likewise. Emulate them by extracting
            scalar offsets, doing scalar loads and a vector construct.

            * gcc.target/i386/vect-gather-1.c: New testcase.
            * gfortran.dg/vect/vect-8.f90: Adjust.

commit f2e5d2717d9e249edc5e0d45e49e4f9ef81fc694
Author: H.J. Lu
Date:   Tue Aug 3 06:17:22 2021 -0700

    by_pieces: Pass MAX_PIECES to op_by_pieces_d

    Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
    compare.

            PR target/101742
            * expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
            argument to set m_max_size.
            (move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
            (store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
            (compare_by_pieces_d): Pass COMPARE_MAX_PIECES to op_by_pieces_d.

commit 96146e61cd7aee62c21c2845916ec42152918ab7
Author: Roger Sayle
Date:   Wed Aug 4 14:19:14 2021 +0100

    Fold (X< Marc Glisse

    gcc/ChangeLog

            * match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
            (X*C1)^(X*C2) as X*(C1+C2), and related variants, using
            tree_nonzero_bits to ensure that operands are bit-wise
            disjoint.

    gcc/testsuite/ChangeLog

            * gcc.dg/fold-ior-4.c: New test.

commit 0d04fe49239d91787850036599164788f1c87785
Author: Jonathan Wakely
Date:   Tue Aug 3 20:50:52 2021 +0100

    libstdc++: Add [[nodiscard]] to sequence containers

    ... and container adaptors.

    This adds the [[nodiscard]] attribute to functions with no
    side-effects for the sequence containers and their iterators, and the
    debug versions of those containers, and the container adaptors,

    Signed-off-by: Jonathan Wakely

    libstdc++-v3/ChangeLog:

            * include/bits/forward_list.h: Add [[nodiscard]] to functions
            with no side-effects.
            * include/bits/stl_bvector.h: Likewise.
            * include/bits/stl_deque.h: Likewise.
            * include/bits/stl_list.h: Likewise.
            * include/bits/stl_queue.h: Likewise.
            * include/bits/stl_stack.h: Likewise.
            * include/bits/stl_vector.h: Likewise.
            * include/debug/deque: Likewise.
            * include/debug/forward_list: Likewise.
            * include/debug/list: Likewise.
            * include/debug/safe_iterator.h: Likewise.
            * include/debug/vector: Likewise.
            * include/std/array: Likewise.
            * testsuite/23_containers/array/creation/3_neg.cc: Use
            -Wno-unused-result.
            * testsuite/23_containers/array/debug/back1_neg.cc: Cast result
            to void.
            * testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
            Likewise.
            * testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
            Likewise.
            * testsuite/23_containers/array/tuple_interface/get_neg.cc:
            Adjust dg-error line numbers.
            * testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
            result to void.
            * testsuite/23_containers/deque/debug/invalidation/4.cc:
            Likewise.
            * testsuite/23_containers/deque/types/1.cc: Use
            -Wno-unused-result.
            * testsuite/23_containers/list/types/1.cc: Cast result to void.
            * testsuite/23_containers/priority_queue/members/7161.cc:
            Likewise.
            * testsuite/23_containers/queue/members/7157.cc: Likewise.
            * testsuite/23_containers/vector/59829.cc: Likewise.
            * testsuite/23_containers/vector/ext_pointer/types/1.cc:
            Likewise.
            * testsuite/23_containers/vector/ext_pointer/types/2.cc:
            Likewise.
            * testsuite/23_containers/vector/types/1.cc: Use
            -Wno-unused-result.

commit 240b01b0215f9e46ecf04267c8a3faeb19d4fe3c
Author: Jonathan Wakely
Date:   Tue Aug 3 18:06:27 2021 +0100

    libstdc++: Add [[nodiscard]] to iterators and related utilities

    This adds [[nodiscard]] throughout , as proposed by P2377R0
    (with some minor corrections).

    The attribute is added for all modes from C++11 up, using
    [[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]]
    can't be used directly.

    Signed-off-by: Jonathan Wakely

    libstdc++-v3/ChangeLog:

            * include/bits/iterator_concepts.h (iter_move): Add
            [[nodiscard]].
            * include/bits/range_access.h (begin, end, cbegin, cend)
            (rbegin, rend, crbegin, crend, size, data, ssize): Likewise.
            * include/bits/ranges_base.h (ranges::begin, ranges::end)
            (ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
            (ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
            (ranges::empty, ranges::data, ranges::cdata): Likewise.
            * include/bits/stl_iterator.h (reverse_iterator, __normal_iterator)
            (back_insert_iterator, front_insert_iterator, insert_iterator)
            (move_iterator, move_sentinel, common_iterator)
            (counted_iterator): Likewise.
            * include/bits/stl_iterator_base_funcs.h (distance, next, prev):
            Likewise.
            * include/bits/stream_iterator.h (istream_iterator)
            (ostream_iterator): Likewise.
            * include/bits/streambuf_iterator.h (istreambuf_iterator)
            (ostreambuf_iterator): Likewise.
            * include/std/ranges (views::single, views::iota, views::all)
            (views::filter, views::transform, views::take, views::take_while)
            (views::drop, views::drop_while, views::join, views::lazy_split)
            (views::split, views::counted, views::common, views::reverse)
            (views::elements): Likewise.
            * testsuite/20_util/rel_ops.cc: Use -Wno-unused-result.
            * testsuite/24_iterators/move_iterator/greedy_ops.cc: Likewise.
            * testsuite/24_iterators/normal_iterator/greedy_ops.cc:
            Likewise.
            * testsuite/24_iterators/reverse_iterator/2.cc: Likewise.
            * testsuite/24_iterators/reverse_iterator/greedy_ops.cc:
            Likewise.
            * testsuite/21_strings/basic_string/range_access/char/1.cc:
            Cast result to void.
            * testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/21_strings/basic_string_view/range_access/char/1.cc:
            Likewise.
            * testsuite/21_strings/basic_string_view/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/23_containers/array/range_access.cc: Likewise.
            * testsuite/23_containers/deque/range_access.cc: Likewise.
            * testsuite/23_containers/forward_list/range_access.cc:
            Likewise.
            * testsuite/23_containers/list/range_access.cc: Likewise.
            * testsuite/23_containers/map/range_access.cc: Likewise.
            * testsuite/23_containers/multimap/range_access.cc: Likewise.
            * testsuite/23_containers/multiset/range_access.cc: Likewise.
            * testsuite/23_containers/set/range_access.cc: Likewise.
            * testsuite/23_containers/unordered_map/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_multimap/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_multiset/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_set/range_access.cc:
            Likewise.
            * testsuite/23_containers/vector/range_access.cc: Likewise.
            * testsuite/24_iterators/customization_points/iter_move.cc:
            Likewise.
            * testsuite/24_iterators/istream_iterator/sentinel.cc:
            Likewise.
            * testsuite/24_iterators/istreambuf_iterator/sentinel.cc:
            Likewise.
            * testsuite/24_iterators/move_iterator/dr2061.cc: Likewise.
            * testsuite/24_iterators/operations/prev_neg.cc: Likewise.
            * testsuite/24_iterators/ostreambuf_iterator/2.cc: Likewise.
            * testsuite/24_iterators/range_access/range_access.cc:
            Likewise.
            * testsuite/24_iterators/range_operations/100768.cc: Likewise.
            * testsuite/26_numerics/valarray/range_access2.cc: Likewise.
            * testsuite/28_regex/range_access.cc: Likewise.
            * testsuite/experimental/string_view/range_access/char/1.cc:
            Likewise.
            * testsuite/experimental/string_view/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/ext/vstring/range_access.cc: Likewise.
            * testsuite/std/ranges/adaptors/take.cc: Likewise.
            * testsuite/std/ranges/p2259.cc: Likewise.

commit 2724d1bba6b36451404811fba3244f8897717ef3
Author: Richard Biener
Date:   Fri Jul 30 11:06:50 2021 +0200

    Rewrite more vector loads to scalar loads

    This teaches forwprop to rewrite more vector loads that are only used
    in BIT_FIELD_REFs as scalar loads. This provides the remaining uplift
    to SPEC CPU 2017 510.parest_r on Zen 2 which has CPU gathers
    disabled.

    In particular vector load + vec_unpack + bit-field-ref is turned into
    (extending) scalar loads which avoids costly XMM/GPR transitions. To
    not conflict with vector load + bit-field-ref + vector constructor
    matching to vector load + shuffle the extended transform is only done
    after vector lowering.

    2021-07-30 Richard Biener

            * tree-ssa-forwprop.c (pass_forwprop::execute): Split
            out code to decompose vector loads ...
            (optimize_vector_load): ... here.
            Generalize it to handle intermediate widening and
            TARGET_MEM_REF loads and apply it to loads with a supported
            vector mode as well.

commit 87a0b607e40f8122c7fc45d496ef48799fe11550
Author: Richard Biener
Date:   Wed Aug 4 11:42:41 2021 +0200

    tree-optimization/101756 - avoid vectorizing boolean MAX reductions

    The following avoids vectorizing MIN/MAX reductions on bools which,
    when ending up as vector(2) would need to be adjusted because of the
    sign change. The fix instead avoids any reduction vectorization where
    the result isn't compatible to the original scalar type since we
    don't compensate for that either.

    2021-08-04 Richard Biener

            PR tree-optimization/101756
            * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
            the result of the reduction epilogue is compatible to the
            original scalar result.

            * gcc.dg/vect/bb-slp-pr101756.c: New testcase.

commit af31cab04770f7a1a1da069415ab62ca2ef54fc4
Author: Jakub Jelinek
Date:   Wed Aug 4 11:53:48 2021 +0200

    c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing

    When parsing default arguments, we need to temporarily clear
    parser->omp_declare_simd and parser->oacc_routine, otherwise it can
    clash with further declarations inside of e.g. lambdas inside of
    those default arguments.

    2021-08-04 Jakub Jelinek

            PR c++/101759
            * parser.c (cp_parser_default_argument): Temporarily override
            parser->omp_declare_simd and parser->oacc_routine to NULL.

            * g++.dg/gomp/pr101759.C: New test.
            * g++.dg/goacc/pr101759.C: New test.

commit 8aa14fa7d98b4d641de9c3ea8d0fa094e0a0ec76
Author: Jakub Jelinek
Date:   Wed Aug 4 11:42:59 2021 +0200

    testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x

    The file has two identical halves, seems like twice applied patch.

    2021-08-04 Jakub Jelinek

            * gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied
            patch.

commit 9f26640f7b89c771b0ebffd7e7f5019d0709a955
Author: liuhongt
Date:   Wed Aug 4 10:50:28 2021 +0800

    Refine predicate of peephole2 to general_reg_operand. [PR target/101743]

    The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc
    should only work on general registers, considering that x86 also
    supports mov instructions between gpr, sse reg, mask reg, limiting
    the peephole2 predicate to general_reg_operand.

    gcc/ChangeLog:

            PR target/101743
            * config/i386/i386.md (peephole2): Refine predicate from
            register_operand to general_reg_operand.

commit 7195fa03e7b8dfaff85d122da3b75f0a30ce95f8
Author: Jakub Jelinek
Date:   Wed Aug 4 11:40:52 2021 +0200

    libgcc: Fix duplicated content of config/t-slibgcc-fuchsia

    The file has two identical halves, seems like twice applied patch.

    2021-08-04 Jakub Jelinek

            * config/t-slibgcc-fuchsia: Undo doubly applied patch.

commit 9db0bcd9fdc2e3a659d56435cb18d553f4292edb
Author: Aldy Hernandez
Date:   Wed Aug 4 10:55:12 2021 +0200

    Mark path_range_query::dump as override.

    gcc/ChangeLog:

            * gimple-range-path.h (path_range_query::dump): Mark override.

commit 4d562591018a51f155a2e5d8b9f3e5860111a327
Author: Richard Biener
Date:   Wed Aug 4 09:22:51 2021 +0200

    tree-optimization/101769 - tail recursion creates possibly infinite loop

    This makes tail recursion optimization produce a loop structure
    manually rather than relying on loop fixup. That also allows the loop
    to be marked as finite (it would eventually blow the stack if it were
    not).

    2021-08-04 Richard Biener

            PR tree-optimization/101769
            * tree-tailcall.c (eliminate_tail_call): Add the created loop
            for the first recursion and return it via the new output
            parameter.
            (optimize_tail_call): Pass through new output param.
            (tree_optimize_tail_calls_1): After creating all latches,
            add the created loop to the loop tree. Do not mark loops for
            fixup.

            * g++.dg/tree-ssa/pr101769.C: New testcase.

commit 5c73b94fdc46f03c761ee5c66e30e00a2bf9ee91
Author: Martin Liska
Date:   Wed Aug 4 09:48:05 2021 +0200

    docs: document threader-mode param

    gcc/ChangeLog:

            * doc/invoke.texi: Document threader-mode param.
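
For readers unfamiliar with the tail-recursion change in commit 4d562591 above (PR tree-optimization/101769), here is a minimal sketch of the transformation it concerns. The function names are hypothetical and not taken from the GCC testsuite; the second function is a hand-written loop that only approximates what tail-call elimination produces.

```c
/* Hypothetical example: sum of 1..n written with a self tail call. */
unsigned long sum_to_rec(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    /* Tail call: nothing is left to do after the call returns. */
    return sum_to_rec(n - 1, acc + n);
}

/* Hand-written loop with the same behavior. Tail-recursion
   elimination aims for this shape: the self call becomes a jump
   back to the top with updated arguments.  */
unsigned long sum_to_loop(unsigned long n, unsigned long acc)
{
    for (;;) {
        if (n == 0)
            return acc;
        acc += n;   /* uses the old n, so update acc first */
        n -= 1;
    }
}
```

The recursive form can only run for a bounded number of steps before exhausting the stack, which appears to be the reasoning behind marking the resulting loop as finite in the commit message.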
commit 3ae1468e260bf1f8e8c8637133263010213b6ac9
Author: liuhongt
Date:   Wed Aug 4 13:20:56 2021 +0800

    Add dg-require-effective-target for testcases.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_addsubmul_d-2.c: Add
            dg-require-effective-target for avx512.
            * gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmuldiv_double-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmuldiv_float-2.c: Ditto.
            * gcc.target/i386/cond_op_fma_double-2.c: Ditto.
            * gcc.target/i386/cond_op_fma_float-2.c: Ditto.

commit 2fc2e3917f9c8fd94f5d101477971d16c483ef88
Author: liuhongt
Date:   Wed Aug 4 11:41:37 2021 +0800

    Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_fma): New expander.
            (cond_fms): Ditto.
            (cond_fnma): Ditto.
            (cond_fnms): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_fma_double-1.c: New test.
            * gcc.target/i386/cond_op_fma_double-2.c: New test.
            * gcc.target/i386/cond_op_fma_float-1.c: New test.
            * gcc.target/i386/cond_op_fma_float-2.c: New test.
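
As a rough illustration of the kind of source loop the cond_{fma,...} expanders in the range are aimed at, here is a minimal sketch. The kernel name and signature are hypothetical and not copied from the cond_op_fma tests. It assumes the multiply-add may be contracted into a fused multiply-add, which GCC permits under -Ofast, and that the vectorizer can then use a masked FMA for the guarded update; whether it actually does depends on the target and cost model.

```c
/* Hypothetical kernel, not taken from the GCC testsuite:
   update a[i] with a multiply-add only where mask[i] is nonzero.  */
void cond_fma_double(double *restrict a, const double *restrict b,
                     const double *restrict c, const int *restrict mask,
                     int n)
{
    for (int i = 0; i < n; i++) {
        if (mask[i])
            a[i] = a[i] * b[i] + c[i];  /* candidate for a conditional FMA */
    }
}
```

Compiling such a kernel at each end of the commit range with the flags from the report (-Ofast -march=native -flto) is one way to see whether these expanders change the generated code.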