public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Biener <rguenther@suse.de>,  gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
Date: Mon, 09 Oct 2023 13:54:19 +0100	[thread overview]
Message-ID: <mpty1gcdkhw.fsf@arm.com> (raw)
In-Reply-To: <ZSPcBmLxejYfgyGq@tucnak> (Jakub Jelinek's message of "Mon, 9 Oct 2023 12:55:02 +0200")

Jakub Jelinek <jakub@redhat.com> writes:
> Hi!
>
> As mentioned in the _BitInt support thread, _BitInt(N) is currently limited
> by the wide_int/widest_int maximum precision limitation, which, depending
> on the target, is 191, 319, 575 or 703 bits (one less than
> WIDE_INT_MAX_PRECISION).  That is a fairly low limit for _BitInt,
> especially on targets with the 191-bit limitation.
>
> The following patch bumps that limit to 16319 bits on all arches, which is
> the limit imposed by the INTEGER_CST representation (unsigned char members
> holding the number of HOST_WIDE_INT limbs).
>
> In order to achieve that, wide_int is changed from a trivially copyable type
> which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or
> 11 limbs depending on target) limbs into a non-trivially copy constructible,
> copy assignable and destructible type which for the usual small cases (up
> to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses
> an inline array of limbs, but for larger precisions uses heap allocated
> limb array.  This makes wide_int unusable in GC structures, so for dwarf2out
> which was the only place which needed it there is a new rwide_int type
> (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> inline and is trivially copyable (dwarf2out should never deal with large
> _BitInt constants, those should have been lowered earlier).
>
> Similarly, widest_int has been changed from a trivially copyable type which
> contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike
> wide_int didn't contain precision and assumed that to be
> WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy
> assignable and destructible type which has always WIDEST_INT_MAX_PRECISION
> precision (32640 bits currently, twice as much as INTEGER_CST limitation
> allows) and unlike wide_int decides depending on get_len () value whether
> it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap
> allocated one.  In wide-int.h this means we need to estimate an upper
> bound on how many limbs wide-int.cc (usually; sometimes wide-int.h) will
> need to write, heap allocate based on that estimate if needed, and upon
> the set_len done at the end, if we guessed over WIDE_INT_MAX_INL_ELTS and
> allocated dynamically while actually needing less than that,
> copy/deallocate.  The inexact guesses are needed because the exact
> computation of the length in wide-int.cc is sometimes quite complex and
> especially the canonicalization at the end can decrease it.  widest_int is
> again because of this not usable in GC structures, so cfgloop.h has been
> changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt
> if we'd have larger _BitInt based iterators.  Programs with more than
> 128-bit iterators will hopefully be rare and I think it is fine to treat
> loops with more than 2^127 iterations as effectively possibly infinite.
> omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it
> better should support scores with the same precision on all arches.
>
> Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing
> wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for
> larger lengths.
>
> On x86_64, in an --enable-checking=yes,rtl,extra configured bootstrap the
> patch enlarges the cc1plus .text section by 1.01% - from
> 0x25725a5 to 0x25e5555 - and similarly, at least when compiling
> insn-recog.cc with the usual bootstrap options, slows compilation down by
> 1.01%: user 4m22.046s and 4m22.384s on vanilla trunk vs.
> 4m25.947s and 4m25.581s on patched trunk.  I'm afraid some code size growth
> and compile time slowdown is unavoidable in this case; we use wide_int and
> widest_int everywhere, and while the rare cases are marked with UNLIKELY
> macros, it still means extra checks for them.

Yeah, it's unfortunate, but like you say, it's probably unavoidable.
Having effectively arbitrary-size integers breaks most of the simplifying
assumptions.

> The patch also regresses
> +FAIL: gm2/pim/fail/largeconst.mod,  -O  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O -g  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer -finline-functions  
> +FAIL: gm2/pim/fail/largeconst.mod,  -Os  
> +FAIL: gm2/pim/fail/largeconst.mod,  -g  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O -g  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer -finline-functions  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -Os  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -g  
> tests, which previously were rejected with
> error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range
> kind of errors, but now are accepted.  Seems the FE tries to parse constants
> into widest_int in that case and only diagnoses if widest_int overflows.
> That seems wrong; it should at least punt if stuff doesn't fit into
> WIDE_INT_MAX_PRECISION, but perhaps far less than that, and if it wants
> middle-end support for precisions above 128 bits, it better should be using
> BITINT_TYPE.  Will file a PR and defer to the Modula2 maintainer.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I've additionally built it with the attached incremental patch, and
> make -C gcc check-gcc check-g++ -j32 -k then didn't show any
> wide_int/widest_int heap allocations unless a > 128-bit _BitInt or wb/uwb
> constant needing > 128-bit _BitInt was used in a testcase.

Overall it looks really good to me FWIW.  Some comments about the
wide-int.h changes below.  Will send a separate message about wide-int.cc.

> 2023-10-09  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR c/102989
> 	* wide-int.h: Adjust file comment to mention 4 different kinds
> 	instead of 3 and how they behave.
> 	(WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS.
> 	(WIDE_INT_MAX_INL_PRECISION): Define.
> 	(WIDE_INT_MAX_ELTS): Change to 255.  Assert that WIDE_INT_MAX_INL_ELTS
> 	is smaller than WIDE_INT_MAX_ELTS.
> 	(RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS,
> 	WIDEST_INT_MAX_PRECISION): Define.
> 	(WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers
> 	to pass 0 as a new argument.
> 	(class rwide_int_storage): Forward declare.
> 	(class widest_int_storage): Likewise.
> 	(rwide_int): New typedef.
> 	(widest_int, widest2_int): Change typedefs to use widest_int_storage
> 	rather than fixed_wide_int_storage.
> 	(enum wi::precision_type): Add WIDEST_CONST_PRECISION enumerator.
> 	(struct binary_traits): Add partial specializations for
> 	WIDEST_CONST_PRECISION.
> 	(generic_wide_int): Add needs_write_val_arg static data member.
> 	(int_traits): Likewise.
> 	(wide_int_storage): Replace val non-static data member with a union
> 	u of it and HOST_WIDE_INT *valp.  Declare copy constructor, copy
> 	assignment operator and destructor.  Add unsigned int argument to
> 	write_val.
> 	(wide_int_storage::wide_int_storage): Initialize precision to 0
> 	in the default ctor.  Remove unnecessary {}s around STATIC_ASSERTs.
> 	Assert in non-default ctor T's precision_type is not
> 	WIDEST_CONST_PRECISION and allocate u.valp for large precision.  Add
> 	copy constructor.
> 	(wide_int_storage::~wide_int_storage): New.
> 	(wide_int_storage::operator=): Add copy assignment operator.  In
> 	assignment operator remove unnecessary {}s around STATIC_ASSERTs,
> 	assert ctor T's precision_type is not WIDEST_CONST_PRECISION and
> 	if precision changes, deallocate and/or allocate u.valp.
> 	(wide_int_storage::get_val): Return u.valp rather than u.val for
> 	large precision.
> 	(wide_int_storage::write_val): Likewise.  Add an unused unsigned int
> 	argument.
> 	(wide_int_storage::set_len): Use write_val instead of writing val
> 	directly.
> 	(wide_int_storage::from, wide_int_storage::from_array): Adjust
> 	write_val callers.
> 	(wide_int_storage::create): Allocate u.valp for large precisions.
> 	(wi::int_traits <wide_int_storage>::get_binary_precision): New.
> 	(class rwide_int_storage): New class, copied from old wide_int.
> 	(rwide_int_storage::write_val): New, add unused unsigned int argument.
> 	(wi::int_traits <rwide_int_storage>): New.
> 	(wi::int_traits <rwide_int_storage>::get_binary_precision): New.
> 	(fixed_wide_int_storage::fixed_wide_int_storage): Make default
> 	ctor defaulted.
> 	(fixed_wide_int_storage::write_val): Add unused unsigned int argument.
> 	(fixed_wide_int_storage::from, fixed_wide_int_storage::from_array):
> 	Adjust write_val callers.
> 	(wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New.
> 	(WIDEST_INT): Define.
> 	(widest_int_storage): New template class.
> 	(wi::int_traits <widest_int_storage>): New.
> 	(trailing_wide_int_storage::write_val): Add unused unsigned int
> 	argument.
> 	(wi::get_binary_precision): Use
> 	wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision
> 	rather than get_precision on get_binary_result.
> 	(wi::copy): Adjust write_val callers.  Don't call set_len if
> 	needs_write_val_arg.
> 	(wi::bit_not): If result.needs_write_val_arg, call write_val
> 	again with upper bound estimate of len.
> 	(wi::sext, wi::zext, wi::set_bit): Likewise.
> 	(wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not,
> 	wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc,
> 	wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc,
> 	wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round,
> 	wi::lshift, wi::lrshift, wi::arshift): Likewise.
> 	(wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg
> 	is false.
> 	(gt_ggc_mx, gt_pch_nx): Remove generic template for all
> 	generic_wide_int, instead add functions and templates for each
> 	storage of generic_wide_int.  Make functions for
> 	generic_wide_int <wide_int_storage> and templates for
> 	generic_wide_int <widest_int_storage <N>> deleted.
> 	(wi::mask, wi::shifted_mask): Adjust write_val calls.
> 	* wide-int.cc (zeros): Decrease array size to 1.
> 	(BLOCKS_NEEDED): Use CEIL.
> 	(canonize): Use HOST_WIDE_INT_M1.
> 	(wi::from_buffer): Pass 0 to write_val.
> 	(wi::to_mpz): Use CEIL.
> 	(wi::from_mpz): Likewise.  Pass 0 to write_val.  Use
> 	WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS.
> 	(wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of
> 	MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec
> 	above WIDE_INT_MAX_INL_PRECISION estimate precision from
> 	lengths of operands.  Use XALLOCAVEC allocated buffers for
> 	prec above WIDE_INT_MAX_INL_PRECISION.
> 	(wi::divmod_internal): Likewise.
> 	(wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate
> 	it from xlen and skip.
> 	(rshift_large_common): Remove xprecision argument, add len
> 	argument with len computed in caller.  Don't return anything.
> 	(wi::lrshift_large, wi::arshift_large): Compute len here
> 	and pass it to rshift_large_common, for lengths above
> 	WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible.
> 	(assert_deceq, assert_hexeq): For lengths above
> 	WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
> 	(test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of
> 	WIDE_INT_MAX_PRECISION.
> 	* wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use
> 	WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION.
> 	* wide-int-print.cc (print_decs, print_decu, print_hex): For
> 	lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer.
> 	* tree.h (wi::int_traits<extended_tree <N>>): Change precision_type
> 	to WIDEST_CONST_PRECISION for N > ADDR_MAX_PRECISION.  Add
> 	inl_precision static data member.
> 	(widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of
> 	WIDE_INT_MAX_PRECISION.
> 	(wi::ints_for): Use int_traits <extended_tree <N> >::precision_type
> 	instead of hard coded CONST_PRECISION.
> 	(widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of
> 	WIDE_INT_MAX_PRECISION.
> 	(wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather
> 	than WIDE_INT_MAX_PRECISION.
> 	(wi::ints_for::zero): Use
> 	wi::int_traits <wi::extended_tree <N> >::precision_type instead of
> 	wi::CONST_PRECISION.
> 	* tree.cc (build_replicated_int_cst): Formatting fix.  Use
> 	WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS.
> 	* print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on
> 	INTEGER_CSTs, TREE_VECs or SSA_NAMEs.
> 	* poly-int.h (struct poly_coeff_traits): Add partial specialization
> 	for wi::WIDEST_CONST_PRECISION.
> 	* cfgloop.h (bound_wide_int): New typedef.
> 	(struct nb_iter_bound): Change bound type from widest_int to
> 	bound_wide_int.
> 	(struct loop): Change nb_iterations_upper_bound,
> 	nb_iterations_likely_upper_bound and nb_iterations_estimate type from
> 	widest_int to bound_wide_int.
> 	* cfgloop.cc (record_niter_bound): Return early if wi::min_precision
> 	of i_bound is too large for bound_wide_int.  Adjustments for the
> 	widest_int to bound_wide_int type change in non-static data members.
> 	(get_estimated_loop_iterations, get_max_loop_iterations,
> 	get_likely_max_loop_iterations): Adjustments for the widest_int to
> 	bound_wide_int type change in non-static data members.
> 	* tree-vect-loop.cc (vect_transform_loop): Likewise.
> 	* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
> 	XALLOCAVEC allocated buffer for i_bound len above
> 	WIDE_INT_MAX_INL_ELTS.
> 	(record_estimate): Return early if wi::min_precision of i_bound is too
> 	large for bound_wide_int.  Adjustments for the widest_int to
> 	bound_wide_int type change in non-static data members.
> 	(wide_int_cmp): Use bound_wide_int instead of widest_int.
> 	(bound_index): Use bound_wide_int instead of widest_int.
> 	(discover_iteration_bound_by_body_walk): Likewise.  Use
> 	widest_int::from to convert it to widest_int when passed to
> 	record_niter_bound.
> 	(maybe_lower_iteration_bound): Use widest_int::from to convert it to
> 	widest_int when passed to record_niter_bound.
> 	(estimate_numbers_of_iteration): Don't record upper bound if
> 	loop->nb_iterations has too large precision for bound_wide_int.
> 	(n_of_executions_at_most): Use widest_int::from.
> 	* tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for
> 	the widest_int to bound_wide_int changes.
> 	* match.pd (fold_sign_changed_comparison simplification): Use
> 	wide_int::from on wi::to_wide instead of wi::to_widest.
> 	* value-range.h (irange::maybe_resize): Avoid using memcpy on
> 	non-trivially copyable elements.
> 	* value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated
> 	buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE.
> 	* fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc):
> 	Use wide_int::from on wi::to_wide instead of wi::to_widest.
> 	* tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width
> 	before calling wi::udiv_trunc.
> 	* dwarf2out.h (wide_int_ptr): Remove.
> 	(rwide_int_ptr): New typedef.
> 	(struct dw_val_node): Use rwide_int_ptr for val_wide rather than
> 	wide_int_ptr.
> 	* dwarf2out.cc (get_full_len): Use rwide_int instead of wide_int.
> 	(insert_wide_int, add_AT_wide, mem_loc_descriptor, loc_descriptor,
> 	add_const_value_attribute): Likewise.
> 	* lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to
> 	bound_wide_int type change in non-static data members.
> 	* lto-streamer-in.cc (input_cfg): Likewise.
> 	(lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than
> 	WIDE_INT_MAX_ELTS.  For length above WIDE_INT_MAX_INL_ELTS use
> 	XALLOCAVEC allocated buffer.  Formatting fix.
> 	* data-streamer-in.cc (streamer_read_wide_int,
> 	streamer_read_widest_int): Likewise.
> 	* tree-affine.cc (aff_combination_expand): Use placement new to
> 	construct name_expansion.
> 	(free_name_expansion): Destruct name_expansion.
> 	* gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change
> 	index type from widest_int to offset_int.
> 	(class incr_info_d): Change incr type from widest_int to offset_int.
> 	(alloc_cand_and_find_basis, backtrace_base_for_ref,
> 	restructure_reference, slsr_process_ref, create_mul_ssa_cand,
> 	create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand,
> 	slsr_process_add, cand_abs_increment, replace_mult_candidate,
> 	replace_unconditional_candidate, incr_vec_index,
> 	create_add_on_incoming_edge, create_phi_basis_1,
> 	replace_conditional_candidate, record_increment,
> 	record_phi_increments_1, phi_incr_cost_1, phi_incr_cost,
> 	lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis,
> 	nearest_common_dominator_for_cands, insert_initializers,
> 	all_phi_incrs_profitable_1, replace_one_candidate,
> 	replace_profitable_candidates): Use offset_int rather than widest_int
> 	and wi::to_offset rather than wi::to_widest.
> 	* real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than
> 	2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC
> 	allocated buffer.
> 	* tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new
> 	to construct tree_niter_desc and destruct it on failure.
> 	(free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL.
> 	* gengtype.cc (main): Remove widest_int handling.
> 	* graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use
> 	WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS.
> 	* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use
> 	WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and
> 	assert get_len () fits into it.
> 	* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
> 	For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC
> 	allocated buffer.
> 	* gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use
> 	wide_int::from on wi::to_wide instead of wi::to_widest.
> 	* omp-general.cc (score_wide_int): New typedef.
> 	(omp_context_compute_score): Use score_wide_int instead of widest_int
> 	and adjust for those changes.
> 	(struct omp_declare_variant_entry): Change score and
> 	score_in_declare_simd_clone non-static data member type from widest_int
> 	to score_wide_int.
> 	(omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use
> 	score_wide_int instead of widest_int and adjust for those changes.
> 	(omp_lto_output_declare_variant_alt): Likewise.
> 	(omp_lto_input_declare_variant_alt): Likewise.
> 	* godump.cc (go_output_typedef): Assert get_len () is smaller than
> 	WIDE_INT_MAX_INL_ELTS.
> gcc/c-family/
> 	* c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead
> 	of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS.
> gcc/testsuite/
> 	* gcc.dg/bitint-38.c: New test.
>
> --- gcc/wide-int.h.jj	2023-10-08 16:37:32.095269217 +0200
> +++ gcc/wide-int.h	2023-10-08 17:02:21.083935772 +0200
> @@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.
>     other longer storage GCC representations (rtl and tree).
>  
>     The actual precision of a wide_int depends on the flavor.  There
> -   are three predefined flavors:
> +   are four predefined flavors:
>  
>       1) wide_int (the default).  This flavor does the math in the
>       precision of its input arguments.  It is assumed (and checked)
> @@ -53,6 +53,10 @@ along with GCC; see the file COPYING3.
>       multiply, division, shifts, comparisons, and operations that need
>       overflow detected), the signedness must be specified separately.
>  
> +     For precisions up to WIDE_INT_MAX_INL_PRECISION, it uses an inline
> +     buffer in the type, for larger precisions up to WIDEST_INT_MAX_PRECISION
> +     it uses a pointer to heap allocated buffer.
> +
>       2) offset_int.  This is a fixed-precision integer that can hold
>       any address offset, measured in either bits or bytes, with at
>       least one extra sign bit.  At the moment the maximum address
> @@ -76,11 +80,15 @@ along with GCC; see the file COPYING3.
>         wi::leu_p (a, b) as a more efficient short-hand for
>         "a >= 0 && a <= b". ]
>  
> -     3) widest_int.  This representation is an approximation of
> +     3) rwide_int.  Restricted wide_int.  This is similar to
> +     wide_int, but maximum possible precision is RWIDE_INT_MAX_PRECISION
> +     and it always uses an inline buffer.  offset_int and rwide_int are
> +     GC-friendly, wide_int and widest_int are not.
> +
> +     4) widest_int.  This representation is an approximation of
>       infinite precision math.  However, it is not really infinite
>       precision math as in the GMP library.  It is really finite
> -     precision math where the precision is 4 times the size of the
> -     largest integer that the target port can represent.
> +     precision math where the precision is WIDEST_INT_MAX_PRECISION.
>  
>       Like offset_int, widest_int is wider than all the values that
>       it needs to represent, so the integers are logically signed.
> @@ -231,17 +239,34 @@ along with GCC; see the file COPYING3.
>     can be arbitrarily different from X.  */
>  
>  /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
> -   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
> +   early examination of the target's mode file.  The WIDE_INT_MAX_INL_ELTS
>     can accomodate at least 1 more bit so that unsigned numbers of that
>     mode can be represented as a signed value.  Note that it is still
>     possible to create fixed_wide_ints that have precisions greater than
>     MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
>     double-width multiplication result, for example.  */
> -#define WIDE_INT_MAX_ELTS \
> -  ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT)
> -
> +#define WIDE_INT_MAX_INL_ELTS \
> +  ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) \
> +   / HOST_BITS_PER_WIDE_INT)
> +
> +#define WIDE_INT_MAX_INL_PRECISION \
> +  (WIDE_INT_MAX_INL_ELTS * HOST_BITS_PER_WIDE_INT)
> +
> +/* Precision of wide_int and largest _BitInt precision + 1 we can
> +   support.  */
> +#define WIDE_INT_MAX_ELTS 255
>  #define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
>  
> +#define RWIDE_INT_MAX_ELTS WIDE_INT_MAX_INL_ELTS
> +#define RWIDE_INT_MAX_PRECISION WIDE_INT_MAX_INL_PRECISION
> +
> +/* Precision of widest_int and largest _BitInt precision + 1 we can
> +   support.  */
> +#define WIDEST_INT_MAX_ELTS 510
> +#define WIDEST_INT_MAX_PRECISION (WIDEST_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
> +
> +STATIC_ASSERT (WIDE_INT_MAX_INL_ELTS < WIDE_INT_MAX_ELTS);
> +
>  /* This is the max size of any pointer on any machine.  It does not
>     seem to be as easy to sniff this out of the machine description as
>     it is for MAX_BITSIZE_MODE_ANY_INT since targets may support
> @@ -307,17 +332,19 @@ along with GCC; see the file COPYING3.
>  #define WI_BINARY_RESULT_VAR(RESULT, VAL, T1, X, T2, Y) \
>    WI_BINARY_RESULT (T1, T2) RESULT = \
>      wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_result (X, Y); \
> -  HOST_WIDE_INT *VAL = RESULT.write_val ()
> +  HOST_WIDE_INT *VAL = RESULT.write_val (0)
>  
>  /* Similar for the result of a unary operation on X, which has type T.  */
>  #define WI_UNARY_RESULT_VAR(RESULT, VAL, T, X) \
>    WI_UNARY_RESULT (T) RESULT = \
>      wi::int_traits <WI_UNARY_RESULT (T)>::get_binary_result (X, X); \
> -  HOST_WIDE_INT *VAL = RESULT.write_val ()
> +  HOST_WIDE_INT *VAL = RESULT.write_val (0)
>  
>  template <typename T> class generic_wide_int;
>  template <int N> class fixed_wide_int_storage;
>  class wide_int_storage;
> +class rwide_int_storage;
> +template <int N> class widest_int_storage;
>  
>  /* An N-bit integer.  Until we can use typedef templates, use this instead.  */
>  #define FIXED_WIDE_INT(N) \
> @@ -325,10 +352,9 @@ class wide_int_storage;
>  
>  typedef generic_wide_int <wide_int_storage> wide_int;
>  typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
> -typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION) widest_int;
> -/* Spelled out explicitly (rather than through FIXED_WIDE_INT)
> -   so as not to confuse gengtype.  */
> -typedef generic_wide_int < fixed_wide_int_storage <WIDE_INT_MAX_PRECISION * 2> > widest2_int;
> +typedef generic_wide_int <rwide_int_storage> rwide_int;
> +typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
> +typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
>  
>  /* wi::storage_ref can be a reference to a primitive type,
>     so this is the conservatively-correct setting.  */
> @@ -380,7 +406,11 @@ namespace wi
>  
>      /* The integer has a constant precision (known at GCC compile time)
>         and is signed.  */
> -    CONST_PRECISION
> +    CONST_PRECISION,
> +
> +    /* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger
> +       precision where not all elements of arrays are always present.  */
> +    WIDEST_CONST_PRECISION
>    };

Sorry to bring this up so late, but how about using INL_CONST_PRECISION
for the fully inline case and CONST_PRECISION for the general case?
That seems more consistent with the other naming in the patch.

>  
>    /* This class, which has no default implementation, is expected to
> @@ -390,9 +420,15 @@ namespace wi
>         Classifies the type of T.
>  
>       static const unsigned int precision;
> -       Only defined if precision_type == CONST_PRECISION.  Specifies the
> +       Only defined if precision_type == CONST_PRECISION or
> +       precision_type == WIDEST_CONST_PRECISION.  Specifies the
>         precision of all integers of type T.
>  
> +     static const unsigned int inl_precision;
> +       Only defined if precision_type == WIDEST_CONST_PRECISION.
> +       Specifies precision which is represented in the inline
> +       arrays.
> +
>       static const bool host_dependent_precision;
>         True if the precision of T depends (or can depend) on the host.
>  
> @@ -415,9 +451,10 @@ namespace wi
>    struct binary_traits;
>  
>    /* Specify the result type for each supported combination of binary
> -     inputs.  Note that CONST_PRECISION and VAR_PRECISION cannot be
> -     mixed, in order to give stronger type checking.  When both inputs
> -     are CONST_PRECISION, they must have the same precision.  */
> +     inputs.  Note that CONST_PRECISION, WIDEST_CONST_PRECISION and
> +     VAR_PRECISION cannot be mixed, in order to give stronger type
> +     checking.  When both inputs are CONST_PRECISION or both are
> +     WIDEST_CONST_PRECISION, they must have the same precision.  */
>    template <typename T1, typename T2>
>    struct binary_traits <T1, T2, FLEXIBLE_PRECISION, FLEXIBLE_PRECISION>
>    {
> @@ -447,6 +484,17 @@ namespace wi
>    };
>  
>    template <typename T1, typename T2>
> +  struct binary_traits <T1, T2, FLEXIBLE_PRECISION, WIDEST_CONST_PRECISION>
> +  {
> +    typedef generic_wide_int < widest_int_storage
> +			       <int_traits <T2>::inl_precision> > result_type;
> +    typedef result_type operator_result;
> +    typedef bool predicate_result;
> +    typedef result_type signed_shift_result_type;
> +    typedef bool signed_predicate_result;
> +  };
> +
> +  template <typename T1, typename T2>
>    struct binary_traits <T1, T2, VAR_PRECISION, FLEXIBLE_PRECISION>
>    {
>      typedef wide_int result_type;
> @@ -468,6 +516,17 @@ namespace wi
>    };
>  
>    template <typename T1, typename T2>
> +  struct binary_traits <T1, T2, WIDEST_CONST_PRECISION, FLEXIBLE_PRECISION>
> +  {
> +    typedef generic_wide_int < widest_int_storage
> +			       <int_traits <T1>::inl_precision> > result_type;
> +    typedef result_type operator_result;
> +    typedef bool predicate_result;
> +    typedef result_type signed_shift_result_type;
> +    typedef bool signed_predicate_result;
> +  };
> +
> +  template <typename T1, typename T2>
>    struct binary_traits <T1, T2, CONST_PRECISION, CONST_PRECISION>
>    {
>      STATIC_ASSERT (int_traits <T1>::precision == int_traits <T2>::precision);
> @@ -482,6 +541,18 @@ namespace wi
>    };
>  
>    template <typename T1, typename T2>
> +  struct binary_traits <T1, T2, WIDEST_CONST_PRECISION, WIDEST_CONST_PRECISION>
> +  {
> +    STATIC_ASSERT (int_traits <T1>::precision == int_traits <T2>::precision);

Should this assert for equal inl_precision too?  Although it probably
isn't necessary computationally, it seems a bit arbitrary to pick the
first inl_precision...

> +    typedef generic_wide_int < widest_int_storage
> +			       <int_traits <T1>::inl_precision> > result_type;

...here, and mismatched inl_precisions would break typing commutativity of +.

> +    typedef result_type operator_result;
> +    typedef bool predicate_result;
> +    typedef result_type signed_shift_result_type;
> +    typedef bool signed_predicate_result;
> +  };
> +
> +  template <typename T1, typename T2>
>    struct binary_traits <T1, T2, VAR_PRECISION, VAR_PRECISION>
>    {
>      typedef wide_int result_type;
> @@ -709,8 +780,10 @@ wi::storage_ref::get_val () const
>     Although not required by generic_wide_int itself, writable storage
>     classes can also provide the following functions:
>  
> -   HOST_WIDE_INT *write_val ()
> -     Get a modifiable version of get_val ()
> +   HOST_WIDE_INT *write_val (unsigned int)
> +     Get a modifiable version of get_val ().  The argument should be
> +     upper estimation for LEN (ignored by all storages but
> +     widest_int_storage).
>  
>     unsigned int set_len (unsigned int len)
>       Set the value returned by get_len () to LEN.  */
> @@ -777,6 +850,8 @@ public:
>  
>    static const bool is_sign_extended
>      = wi::int_traits <generic_wide_int <storage> >::is_sign_extended;
> +  static const bool needs_write_val_arg
> +    = wi::int_traits <generic_wide_int <storage> >::needs_write_val_arg;
>  };
>  
>  template <typename storage>
> @@ -1049,6 +1124,7 @@ namespace wi
>      static const enum precision_type precision_type = VAR_PRECISION;
>      static const bool host_dependent_precision = HDP;
>      static const bool is_sign_extended = SE;
> +    static const bool needs_write_val_arg = false;
>    };
>  }
>  
> @@ -1065,7 +1141,11 @@ namespace wi
>  class GTY(()) wide_int_storage
>  {
>  private:
> -  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
> +  union
> +  {
> +    HOST_WIDE_INT val[WIDE_INT_MAX_INL_ELTS];
> +    HOST_WIDE_INT *valp;
> +  } GTY((skip)) u;
>    unsigned int len;
>    unsigned int precision;
>  
> @@ -1073,14 +1153,17 @@ public:
>    wide_int_storage ();
>    template <typename T>
>    wide_int_storage (const T &);
> +  wide_int_storage (const wide_int_storage &);
> +  ~wide_int_storage ();
>  
>    /* The standard generic_wide_int storage methods.  */
>    unsigned int get_precision () const;
>    const HOST_WIDE_INT *get_val () const;
>    unsigned int get_len () const;
> -  HOST_WIDE_INT *write_val ();
> +  HOST_WIDE_INT *write_val (unsigned int);
>    void set_len (unsigned int, bool = false);
>  
> +  wide_int_storage &operator = (const wide_int_storage &);
>    template <typename T>
>    wide_int_storage &operator = (const T &);
>  
> @@ -1099,12 +1182,15 @@ namespace wi
>      /* Guaranteed by a static assert in the wide_int_storage constructor.  */
>      static const bool host_dependent_precision = false;
>      static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = false;
>      template <typename T1, typename T2>
>      static wide_int get_binary_result (const T1 &, const T2 &);
> +    template <typename T1, typename T2>
> +    static unsigned int get_binary_precision (const T1 &, const T2 &);
>    };
>  }
>  
> -inline wide_int_storage::wide_int_storage () {}
> +inline wide_int_storage::wide_int_storage () : precision (0) {}
>  
>  /* Initialize the storage from integer X, in its natural precision.
>     Note that we do not allow integers with host-dependent precision
> @@ -1113,21 +1199,75 @@ inline wide_int_storage::wide_int_storag
>  template <typename T>
>  inline wide_int_storage::wide_int_storage (const T &x)
>  {
> -  { STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision); }
> -  { STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION); }
> +  STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type
> +		 != wi::WIDEST_CONST_PRECISION);
>    WIDE_INT_REF_FOR (T) xi (x);
>    precision = xi.precision;
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, HOST_BITS_PER_WIDE_INT));
>    wi::copy (*this, xi);
>  }
>  
> +inline wide_int_storage::wide_int_storage (const wide_int_storage &x)
> +{
> +  len = x.len;
> +  precision = x.precision;
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    {
> +      u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, HOST_BITS_PER_WIDE_INT));
> +      memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT));
> +    }
> +  else if (LIKELY (precision))
> +    memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT));
> +}

Does the variable-length memcpy pay for itself?  If so, perhaps that's a
sign that we should have a smaller inline buffer for this class (say 2 HWIs).

It would probably be worth having a move constructor too.  I think that
could just memcpy the whole value and then clear u.valp in the argument
(where appropriate).

Same for assignment.

> +
> +inline wide_int_storage::~wide_int_storage ()
> +{
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    XDELETEVEC (u.valp);
> +}
> +
> +inline wide_int_storage&
> +wide_int_storage::operator = (const wide_int_storage &x)
> +{
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    {
> +      if (this == &x)
> +	return *this;
> +      XDELETEVEC (u.valp);
> +    }
> +  len = x.len;
> +  precision = x.precision;
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    {
> +      u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, HOST_BITS_PER_WIDE_INT));
> +      memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT));
> +    }
> +  else if (LIKELY (precision))
> +    memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT));
> +  return *this;
> +}
> +
>  template <typename T>
>  inline wide_int_storage&
>  wide_int_storage::operator = (const T &x)
>  {
> -  { STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision); }
> -  { STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION); }
> +  STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type
> +		 != wi::WIDEST_CONST_PRECISION);
>    WIDE_INT_REF_FOR (T) xi (x);
> -  precision = xi.precision;
> +  if (UNLIKELY (precision != xi.precision))
> +    {
> +      if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +	XDELETEVEC (u.valp);
> +      precision = xi.precision;
> +      if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +	u.valp = XNEWVEC (HOST_WIDE_INT,
> +			  CEIL (precision, HOST_BITS_PER_WIDE_INT));
> +    }
>    wi::copy (*this, xi);
>    return *this;
>  }
> @@ -1141,7 +1281,7 @@ wide_int_storage::get_precision () const
>  inline const HOST_WIDE_INT *
>  wide_int_storage::get_val () const
>  {
> -  return val;
> +  return UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION) ? u.valp : u.val;
>  }
>  
>  inline unsigned int
> @@ -1151,9 +1291,9 @@ wide_int_storage::get_len () const
>  }
>  
>  inline HOST_WIDE_INT *
> -wide_int_storage::write_val ()
> +wide_int_storage::write_val (unsigned int)
>  {
> -  return val;
> +  return UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION) ? u.valp : u.val;
>  }
>  
>  inline void
> @@ -1161,8 +1301,10 @@ wide_int_storage::set_len (unsigned int
>  {
>    len = l;
>    if (!is_sign_extended && len * HOST_BITS_PER_WIDE_INT > precision)
> -    val[len - 1] = sext_hwi (val[len - 1],
> -			     precision % HOST_BITS_PER_WIDE_INT);
> +    {
> +      HOST_WIDE_INT &v = write_val (len)[len - 1];
> +      v = sext_hwi (v, precision % HOST_BITS_PER_WIDE_INT);
> +    }
>  }
>  
>  /* Treat X as having signedness SGN and convert it to a PRECISION-bit
> @@ -1172,7 +1314,7 @@ wide_int_storage::from (const wide_int_r
>  			signop sgn)
>  {
>    wide_int result = wide_int::create (precision);
> -  result.set_len (wi::force_to_size (result.write_val (), x.val, x.len,
> +  result.set_len (wi::force_to_size (result.write_val (x.len), x.val, x.len,
>  				     x.precision, precision, sgn));
>    return result;
>  }
> @@ -1185,7 +1327,7 @@ wide_int_storage::from_array (const HOST
>  			      unsigned int precision, bool need_canon_p)
>  {
>    wide_int result = wide_int::create (precision);
> -  result.set_len (wi::from_array (result.write_val (), val, len, precision,
> +  result.set_len (wi::from_array (result.write_val (len), val, len, precision,
>  				  need_canon_p));
>    return result;
>  }
> @@ -1196,6 +1338,9 @@ wide_int_storage::create (unsigned int p
>  {
>    wide_int x;
>    x.precision = precision;
> +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> +    x.u.valp = XNEWVEC (HOST_WIDE_INT,
> +			CEIL (precision, HOST_BITS_PER_WIDE_INT));
>    return x;
>  }
>  
> @@ -1212,6 +1357,194 @@ wi::int_traits <wide_int_storage>::get_b
>      return wide_int::create (wi::get_precision (x));
>  }
>  
> +template <typename T1, typename T2>
> +inline unsigned int
> +wi::int_traits <wide_int_storage>::get_binary_precision (const T1 &x,
> +							 const T2 &y)
> +{
> +  /* This shouldn't be used for two flexible-precision inputs.  */
> +  STATIC_ASSERT (wi::int_traits <T1>::precision_type != FLEXIBLE_PRECISION
> +		 || wi::int_traits <T2>::precision_type != FLEXIBLE_PRECISION);
> +  if (wi::int_traits <T1>::precision_type == FLEXIBLE_PRECISION)
> +    return wi::get_precision (y);
> +  else
> +    return wi::get_precision (x);
> +}
> +
> +/* The storage used by rwide_int.  */
> +class GTY(()) rwide_int_storage
> +{
> +private:
> +  HOST_WIDE_INT val[RWIDE_INT_MAX_ELTS];
> +  unsigned int len;
> +  unsigned int precision;
> +
> +public:
> +  rwide_int_storage () = default;
> +  template <typename T>
> +  rwide_int_storage (const T &);
> +
> +  /* The standard generic_rwide_int storage methods.  */
> +  unsigned int get_precision () const;
> +  const HOST_WIDE_INT *get_val () const;
> +  unsigned int get_len () const;
> +  HOST_WIDE_INT *write_val (unsigned int);
> +  void set_len (unsigned int, bool = false);
> +
> +  template <typename T>
> +  rwide_int_storage &operator = (const T &);
> +
> +  static rwide_int from (const wide_int_ref &, unsigned int, signop);
> +  static rwide_int from_array (const HOST_WIDE_INT *, unsigned int,
> +			       unsigned int, bool = true);
> +  static rwide_int create (unsigned int);
> +};
> +
> +namespace wi
> +{
> +  template <>
> +  struct int_traits <rwide_int_storage>
> +  {
> +    static const enum precision_type precision_type = VAR_PRECISION;
> +    /* Guaranteed by a static assert in the rwide_int_storage constructor.  */
> +    static const bool host_dependent_precision = false;
> +    static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = false;
> +    template <typename T1, typename T2>
> +    static rwide_int get_binary_result (const T1 &, const T2 &);
> +    template <typename T1, typename T2>
> +    static unsigned int get_binary_precision (const T1 &, const T2 &);
> +  };
> +}
> +
> +/* Initialize the storage from integer X, in its natural precision.
> +   Note that we do not allow integers with host-dependent precision
> +   to become rwide_ints; rwide_ints must always be logically independent
> +   of the host.  */
> +template <typename T>
> +inline rwide_int_storage::rwide_int_storage (const T &x)
> +{
> +  STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type
> +		 != wi::WIDEST_CONST_PRECISION);
> +  WIDE_INT_REF_FOR (T) xi (x);
> +  precision = xi.precision;
> +  gcc_assert (precision <= RWIDE_INT_MAX_PRECISION);
> +  wi::copy (*this, xi);
> +}
> +
> +template <typename T>
> +inline rwide_int_storage&
> +rwide_int_storage::operator = (const T &x)
> +{
> +  STATIC_ASSERT (!wi::int_traits<T>::host_dependent_precision);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type != wi::CONST_PRECISION);
> +  STATIC_ASSERT (wi::int_traits<T>::precision_type
> +		 != wi::WIDEST_CONST_PRECISION);
> +  WIDE_INT_REF_FOR (T) xi (x);
> +  precision = xi.precision;
> +  gcc_assert (precision <= RWIDE_INT_MAX_PRECISION);
> +  wi::copy (*this, xi);
> +  return *this;
> +}
> +
> +inline unsigned int
> +rwide_int_storage::get_precision () const
> +{
> +  return precision;
> +}
> +
> +inline const HOST_WIDE_INT *
> +rwide_int_storage::get_val () const
> +{
> +  return val;
> +}
> +
> +inline unsigned int
> +rwide_int_storage::get_len () const
> +{
> +  return len;
> +}
> +
> +inline HOST_WIDE_INT *
> +rwide_int_storage::write_val (unsigned int)
> +{
> +  return val;
> +}
> +
> +inline void
> +rwide_int_storage::set_len (unsigned int l, bool is_sign_extended)
> +{
> +  len = l;
> +  if (!is_sign_extended && len * HOST_BITS_PER_WIDE_INT > precision)
> +    val[len - 1] = sext_hwi (val[len - 1],
> +			     precision % HOST_BITS_PER_WIDE_INT);
> +}
> +
> +/* Treat X as having signedness SGN and convert it to a PRECISION-bit
> +   number.  */
> +inline rwide_int
> +rwide_int_storage::from (const wide_int_ref &x, unsigned int precision,
> +			 signop sgn)
> +{
> +  rwide_int result = rwide_int::create (precision);
> +  result.set_len (wi::force_to_size (result.write_val (x.len), x.val, x.len,
> +				     x.precision, precision, sgn));
> +  return result;
> +}
> +
> +/* Create a rwide_int from the explicit block encoding given by VAL and
> +   LEN.  PRECISION is the precision of the integer.  NEED_CANON_P is
> +   true if the encoding may have redundant trailing blocks.  */
> +inline rwide_int
> +rwide_int_storage::from_array (const HOST_WIDE_INT *val, unsigned int len,
> +			       unsigned int precision, bool need_canon_p)
> +{
> +  rwide_int result = rwide_int::create (precision);
> +  result.set_len (wi::from_array (result.write_val (len), val, len, precision,
> +				  need_canon_p));
> +  return result;
> +}
> +
> +/* Return an uninitialized rwide_int with precision PRECISION.  */
> +inline rwide_int
> +rwide_int_storage::create (unsigned int precision)
> +{
> +  rwide_int x;
> +  gcc_assert (precision <= RWIDE_INT_MAX_PRECISION);
> +  x.precision = precision;
> +  return x;
> +}
> +
> +template <typename T1, typename T2>
> +inline rwide_int
> +wi::int_traits <rwide_int_storage>::get_binary_result (const T1 &x,
> +						       const T2 &y)
> +{
> +  /* This shouldn't be used for two flexible-precision inputs.  */
> +  STATIC_ASSERT (wi::int_traits <T1>::precision_type != FLEXIBLE_PRECISION
> +		 || wi::int_traits <T2>::precision_type != FLEXIBLE_PRECISION);
> +  if (wi::int_traits <T1>::precision_type == FLEXIBLE_PRECISION)
> +    return rwide_int::create (wi::get_precision (y));
> +  else
> +    return rwide_int::create (wi::get_precision (x));
> +}
> +
> +template <typename T1, typename T2>
> +inline unsigned int
> +wi::int_traits <rwide_int_storage>::get_binary_precision (const T1 &x,
> +							  const T2 &y)
> +{
> +  /* This shouldn't be used for two flexible-precision inputs.  */
> +  STATIC_ASSERT (wi::int_traits <T1>::precision_type != FLEXIBLE_PRECISION
> +		 || wi::int_traits <T2>::precision_type != FLEXIBLE_PRECISION);
> +  if (wi::int_traits <T1>::precision_type == FLEXIBLE_PRECISION)
> +    return wi::get_precision (y);
> +  else
> +    return wi::get_precision (x);
> +}
> +
>  /* The storage used by FIXED_WIDE_INT (N).  */
>  template <int N>
>  class GTY(()) fixed_wide_int_storage
> @@ -1221,7 +1554,7 @@ private:
>    unsigned int len;
>  
>  public:
> -  fixed_wide_int_storage ();
> +  fixed_wide_int_storage () = default;
>    template <typename T>
>    fixed_wide_int_storage (const T &);
>  
> @@ -1229,7 +1562,7 @@ public:
>    unsigned int get_precision () const;
>    const HOST_WIDE_INT *get_val () const;
>    unsigned int get_len () const;
> -  HOST_WIDE_INT *write_val ();
> +  HOST_WIDE_INT *write_val (unsigned int);
>    void set_len (unsigned int, bool = false);
>  
>    static FIXED_WIDE_INT (N) from (const wide_int_ref &, signop);
> @@ -1245,15 +1578,15 @@ namespace wi
>      static const enum precision_type precision_type = CONST_PRECISION;
>      static const bool host_dependent_precision = false;
>      static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = false;
>      static const unsigned int precision = N;
>      template <typename T1, typename T2>
>      static FIXED_WIDE_INT (N) get_binary_result (const T1 &, const T2 &);
> +    template <typename T1, typename T2>
> +    static unsigned int get_binary_precision (const T1 &, const T2 &);
>    };
>  }
>  
> -template <int N>
> -inline fixed_wide_int_storage <N>::fixed_wide_int_storage () {}
> -
>  /* Initialize the storage from integer X, in precision N.  */
>  template <int N>
>  template <typename T>
> @@ -1288,7 +1621,7 @@ fixed_wide_int_storage <N>::get_len () c
>  
>  template <int N>
>  inline HOST_WIDE_INT *
> -fixed_wide_int_storage <N>::write_val ()
> +fixed_wide_int_storage <N>::write_val (unsigned int)
>  {
>    return val;
>  }
> @@ -1308,7 +1641,7 @@ inline FIXED_WIDE_INT (N)
>  fixed_wide_int_storage <N>::from (const wide_int_ref &x, signop sgn)
>  {
>    FIXED_WIDE_INT (N) result;
> -  result.set_len (wi::force_to_size (result.write_val (), x.val, x.len,
> +  result.set_len (wi::force_to_size (result.write_val (x.len), x.val, x.len,
>  				     x.precision, N, sgn));
>    return result;
>  }
> @@ -1323,7 +1656,7 @@ fixed_wide_int_storage <N>::from_array (
>  					bool need_canon_p)
>  {
>    FIXED_WIDE_INT (N) result;
> -  result.set_len (wi::from_array (result.write_val (), val, len,
> +  result.set_len (wi::from_array (result.write_val (len), val, len,
>  				  N, need_canon_p));
>    return result;
>  }
> @@ -1337,6 +1670,244 @@ get_binary_result (const T1 &, const T2
>    return FIXED_WIDE_INT (N) ();
>  }
>  
> +template <int N>
> +template <typename T1, typename T2>
> +inline unsigned int
> +wi::int_traits < fixed_wide_int_storage <N> >::
> +get_binary_precision (const T1 &, const T2 &)
> +{
> +  return N;
> +}
> +
> +#define WIDEST_INT(N) generic_wide_int < widest_int_storage <N> >

FTR: current code used this construct to work within pre-C++11 limitations.
It would be better as a templated using (an alias template) instead.  But I
suppose that's a separate clean-up and that it's better to stick to a single
style until then.
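For comparison, the alias-template spelling would be along these lines
(minimal stub types standing in for the real GCC templates, and
widest_int_t is just an illustrative name):

```cpp
#include <type_traits>

// Stubs standing in for the GCC templates, only to show the alias
// compiling next to the macro it would replace.
template <typename S> struct generic_wide_int { S storage; };
template <int N> struct widest_int_storage {};

// The macro-based spelling used by the patch.
#define WIDEST_INT(N) generic_wide_int < widest_int_storage <N> >

// The C++11 alias-template spelling.
template <int N>
using widest_int_t = generic_wide_int <widest_int_storage <N> >;

static_assert (std::is_same <widest_int_t <128>, WIDEST_INT (128)>::value,
               "alias and macro name the same type");
```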

> +
> +/* The storage used by widest_int.  */
> +template <int N>
> +class GTY(()) widest_int_storage
> +{
> +private:
> +  union
> +  {
> +    HOST_WIDE_INT val[WIDE_INT_MAX_HWIS (N)];
> +    HOST_WIDE_INT *valp;
> +  } GTY((skip)) u;
> +  unsigned int len;
> +
> +public:
> +  widest_int_storage ();
> +  widest_int_storage (const widest_int_storage &);
> +  template <typename T>
> +  widest_int_storage (const T &);
> +  ~widest_int_storage ();
> +  widest_int_storage &operator = (const widest_int_storage &);
> +  template <typename T>
> +  inline widest_int_storage& operator = (const T &);
> +
> +  /* The standard generic_wide_int storage methods.  */
> +  unsigned int get_precision () const;
> +  const HOST_WIDE_INT *get_val () const;
> +  unsigned int get_len () const;
> +  HOST_WIDE_INT *write_val (unsigned int);
> +  void set_len (unsigned int, bool = false);
> +
> +  static WIDEST_INT (N) from (const wide_int_ref &, signop);
> +  static WIDEST_INT (N) from_array (const HOST_WIDE_INT *, unsigned int,
> +				    bool = true);
> +};
> +
> +namespace wi
> +{
> +  template <int N>
> +  struct int_traits < widest_int_storage <N> >
> +  {
> +    static const enum precision_type precision_type = WIDEST_CONST_PRECISION;
> +    static const bool host_dependent_precision = false;
> +    static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = true;
> +    static const unsigned int precision
> +      = N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION;

What's the reasoning behind this calculation?  It would give 0 for
N < WIDE_INT_MAX_INL_PRECISION, and the "MAX" suggests that N
shouldn't be > WIDE_INT_MAX_INL_PRECISION either.

I wonder whether this should be a second template parameter, with an
assert that precision > inl_precision.
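Roughly this shape, that is (illustrative names, not proposing these exact
ones):

```cpp
// Sketch of the two-parameter alternative: pass the logical precision
// explicitly instead of deriving it via N / WIDE_INT_MAX_INL_PRECISION
// * WIDEST_INT_MAX_PRECISION, and assert the expected relationship
// between the two at compile time.
template <int PREC, int INL_PREC>
struct widest_int_storage_sketch
{
  static_assert (PREC > INL_PREC,
                 "logical precision must exceed the inline precision");
  static const unsigned int precision = PREC;
  static const unsigned int inl_precision = INL_PREC;
};
```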

> +    static const unsigned int inl_precision = N;
> +    template <typename T1, typename T2>
> +    static WIDEST_INT (N) get_binary_result (const T1 &, const T2 &);
> +    template <typename T1, typename T2>
> +    static unsigned int get_binary_precision (const T1 &, const T2 &);
> +  };
> +}
> +
> +template <int N>
> +inline widest_int_storage <N>::widest_int_storage () : len (0) {}
> +
> +/* Initialize the storage from integer X, in precision N.  */
> +template <int N>
> +template <typename T>
> +inline widest_int_storage <N>::widest_int_storage (const T &x) : len (0)
> +{
> +  /* Check for type compatibility.  We don't want to initialize a
> +     widest integer from something like a wide_int.  */
> +  WI_BINARY_RESULT (T, WIDEST_INT (N)) *assertion ATTRIBUTE_UNUSED;
> +  wi::copy (*this, WIDE_INT_REF_FOR (T) (x, N / WIDE_INT_MAX_INL_PRECISION
> +					    * WIDEST_INT_MAX_PRECISION));
> +}
> +
> +template <int N>
> +inline
> +widest_int_storage <N>::widest_int_storage (const widest_int_storage &x)
> +{
> +  len = x.len;
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    {
> +      u.valp = XNEWVEC (HOST_WIDE_INT, len);
> +      memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT));
> +    }
> +  else
> +    memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT));
> +}
> +
> +template <int N>
> +inline widest_int_storage <N>::~widest_int_storage ()
> +{
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    XDELETEVEC (u.valp);
> +}
> +
> +template <int N>
> +inline widest_int_storage <N>&
> +widest_int_storage <N>::operator = (const widest_int_storage <N> &x)
> +{
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    {
> +      if (this == &x)
> +	return *this;
> +      XDELETEVEC (u.valp);
> +    }
> +  len = x.len;
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    {
> +      u.valp = XNEWVEC (HOST_WIDE_INT, len);
> +      memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT));
> +    }
> +  else
> +    memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT));
> +  return *this;
> +}
> +
> +template <int N>
> +template <typename T>
> +inline widest_int_storage <N>&
> +widest_int_storage <N>::operator = (const T &x)
> +{
> +  /* Check for type compatibility.  We don't want to assign a
> +     widest integer from something like a wide_int.  */
> +  WI_BINARY_RESULT (T, WIDEST_INT (N)) *assertion ATTRIBUTE_UNUSED;
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    XDELETEVEC (u.valp);
> +  len = 0;
> +  wi::copy (*this, WIDE_INT_REF_FOR (T) (x, N / WIDE_INT_MAX_INL_PRECISION
> +					    * WIDEST_INT_MAX_PRECISION));
> +  return *this;
> +}
> +
> +template <int N>
> +inline unsigned int
> +widest_int_storage <N>::get_precision () const
> +{
> +  return N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION;
> +}
> +
> +template <int N>
> +inline const HOST_WIDE_INT *
> +widest_int_storage <N>::get_val () const
> +{
> +  return UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT) ? u.valp : u.val;
> +}
> +
> +template <int N>
> +inline unsigned int
> +widest_int_storage <N>::get_len () const
> +{
> +  return len;
> +}
> +
> +template <int N>
> +inline HOST_WIDE_INT *
> +widest_int_storage <N>::write_val (unsigned int l)
> +{
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT))
> +    XDELETEVEC (u.valp);
> +  len = l;
> +  if (UNLIKELY (l > N / HOST_BITS_PER_WIDE_INT))
> +    {
> +      u.valp = XNEWVEC (HOST_WIDE_INT, l);
> +      return u.valp;
> +    }
> +  return u.val;
> +}
> +
> +template <int N>
> +inline void
> +widest_int_storage <N>::set_len (unsigned int l, bool)
> +{
> +  gcc_checking_assert (l <= len);
> +  if (UNLIKELY (len > N / HOST_BITS_PER_WIDE_INT)
> +      && l <= N / HOST_BITS_PER_WIDE_INT)
> +    {
> +      HOST_WIDE_INT *valp = u.valp;
> +      memcpy (u.val, valp, l * sizeof (u.val[0]));
> +      XDELETEVEC (valp);
> +    }
> +  len = l;
> +  /* There are no excess bits in val[len - 1].  */
> +  STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0);
> +}
> +
> +/* Treat X as having signedness SGN and convert it to an N-bit number.  */
> +template <int N>
> +inline WIDEST_INT (N)
> +widest_int_storage <N>::from (const wide_int_ref &x, signop sgn)
> +{
> +  WIDEST_INT (N) result;
> +  unsigned int exp_len = x.len;
> +  unsigned int prec = result.get_precision ();
> +  if (sgn == UNSIGNED && prec > x.precision && x.val[x.len - 1] < 0)
> +    exp_len = CEIL (x.precision, HOST_BITS_PER_WIDE_INT) + 1;
> +  result.set_len (wi::force_to_size (result.write_val (exp_len), x.val, x.len,
> +				     x.precision, prec, sgn));
> +  return result;
> +}
> +
> +/* Create a WIDEST_INT (N) from the explicit block encoding given by
> +   VAL and LEN.  NEED_CANON_P is true if the encoding may have redundant
> +   trailing blocks.  */
> +template <int N>
> +inline WIDEST_INT (N)
> +widest_int_storage <N>::from_array (const HOST_WIDE_INT *val,
> +				    unsigned int len,
> +				    bool need_canon_p)
> +{
> +  WIDEST_INT (N) result;
> +  result.set_len (wi::from_array (result.write_val (len), val, len,
> +				  result.get_precision (), need_canon_p));
> +  return result;
> +}
> +
> +template <int N>
> +template <typename T1, typename T2>
> +inline WIDEST_INT (N)
> +wi::int_traits < widest_int_storage <N> >::
> +get_binary_result (const T1 &, const T2 &)
> +{
> +  return WIDEST_INT (N) ();
> +}
> +
> +template <int N>
> +template <typename T1, typename T2>
> +inline unsigned int
> +wi::int_traits < widest_int_storage <N> >::
> +get_binary_precision (const T1 &, const T2 &)
> +{
> +  return N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION;
> +}
> +
>  /* A reference to one element of a trailing_wide_ints structure.  */
>  class trailing_wide_int_storage
>  {
> @@ -1359,7 +1930,7 @@ public:
>    unsigned int get_len () const;
>    unsigned int get_precision () const;
>    const HOST_WIDE_INT *get_val () const;
> -  HOST_WIDE_INT *write_val ();
> +  HOST_WIDE_INT *write_val (unsigned int);
>    void set_len (unsigned int, bool = false);
>  
>    template <typename T>
> @@ -1445,7 +2016,7 @@ trailing_wide_int_storage::get_val () co
>  }
>  
>  inline HOST_WIDE_INT *
> -trailing_wide_int_storage::write_val ()
> +trailing_wide_int_storage::write_val (unsigned int)
>  {
>    return m_val;
>  }
> @@ -1528,6 +2099,7 @@ namespace wi
>      static const enum precision_type precision_type = FLEXIBLE_PRECISION;
>      static const bool host_dependent_precision = true;
>      static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = false;
>      static unsigned int get_precision (T);
>      static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int, T);
>    };
> @@ -1699,6 +2271,7 @@ namespace wi
>         precision of HOST_WIDE_INT.  */
>      static const bool host_dependent_precision = false;
>      static const bool is_sign_extended = true;
> +    static const bool needs_write_val_arg = false;
>      static unsigned int get_precision (const wi::hwi_with_prec &);
>      static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int,
>  				      const wi::hwi_with_prec &);
> @@ -1804,8 +2377,8 @@ template <typename T1, typename T2>
>  inline unsigned int
>  wi::get_binary_precision (const T1 &x, const T2 &y)
>  {
> -  return get_precision (wi::int_traits <WI_BINARY_RESULT (T1, T2)>::
> -			get_binary_result (x, y));
> +  return wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision (x,
> +									   y);

Nit: might format more naturally with:

  using res_traits = wi::int_traits <WI_BINARY_RESULT (T1, T2)>;
  ...

>  }
>  
>  /* Copy the contents of Y to X, but keeping X's current precision.  */
> @@ -1813,14 +2386,17 @@ template <typename T1, typename T2>
>  inline void
>  wi::copy (T1 &x, const T2 &y)
>  {
> -  HOST_WIDE_INT *xval = x.write_val ();
> -  const HOST_WIDE_INT *yval = y.get_val ();
>    unsigned int len = y.get_len ();
> +  HOST_WIDE_INT *xval = x.write_val (len);
> +  const HOST_WIDE_INT *yval = y.get_val ();
>    unsigned int i = 0;
>    do
>      xval[i] = yval[i];
>    while (++i < len);
> -  x.set_len (len, y.is_sign_extended);
> +  /* For widest_int write_val is called with an exact value, not
> +     upper bound for len, so nothing is needed further.  */
> +  if (!wi::int_traits <T1>::needs_write_val_arg)
> +    x.set_len (len, y.is_sign_extended);
>  }
>  
>  /* Return true if X fits in a HOST_WIDE_INT with no loss of precision.  */
> @@ -2162,6 +2738,8 @@ wi::bit_not (const T &x)
>  {
>    WI_UNARY_RESULT_VAR (result, val, T, x);
>    WIDE_INT_REF_FOR (T) xi (x, get_precision (result));
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (xi.len);
>    for (unsigned int i = 0; i < xi.len; ++i)
>      val[i] = ~xi.val[i];
>    result.set_len (xi.len);
> @@ -2203,6 +2781,9 @@ wi::sext (const T &x, unsigned int offse
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T) xi (x, precision);
>  
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len,
> +				 CEIL (offset, HOST_BITS_PER_WIDE_INT)));

Why MAX rather than MIN?

>    if (offset <= HOST_BITS_PER_WIDE_INT)
>      {
>        val[0] = sext_hwi (xi.ulow (), offset);

I wondered for this kind of thing whether we should have:

  if (result.needs_write_val_arg)
    val = result.write_val (1);

and leave the complicated case in the slow path.  But maybe it doesn't
pay for itself.

> @@ -2259,6 +2843,9 @@ wi::set_bit (const T &x, unsigned int bi
>    WI_UNARY_RESULT_VAR (result, val, T, x);
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T) xi (x, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len,
> +				 bit / HOST_BITS_PER_WIDE_INT + 1));
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        val[0] = xi.ulow () | (HOST_WIDE_INT_1U << bit);
> @@ -2280,6 +2867,8 @@ wi::bswap (const T &x)
>    WI_UNARY_RESULT_VAR (result, val, T, x);
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T) xi (x, precision);
> +  if (result.needs_write_val_arg)
> +    gcc_unreachable (); /* bswap on widest_int makes no sense.  */

Doesn't this work as a static_assert?  (You might have covered this
before, sorry.)
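Something like the following sketch, with stub traits in place of
wi::int_traits.  One difference to weigh: the static_assert fires on any
instantiation, even one in never-executed code, whereas gcc_unreachable
only trips at run time.

```cpp
// needs_write_val_arg is a compile-time constant of the traits class,
// so the check can be made at compile time.  Stub names here, not the
// real GCC ones.
template <typename T>
struct int_traits { static const bool needs_write_val_arg = false; };

struct widest_int_like {};
template <>
struct int_traits <widest_int_like>
{ static const bool needs_write_val_arg = true; };

template <typename T>
T bswap_sketch (const T &x)
{
  static_assert (!int_traits <T>::needs_write_val_arg,
                 "bswap on widest_int makes no sense");
  return x;  /* Real byte-swapping elided in this sketch.  */
}
```

Instantiating bswap_sketch on widest_int_like is then rejected at compile
time; ordinary types pass through unaffected.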

>    result.set_len (bswap_large (val, xi.val, xi.len, precision));
>    return result;
>  }
> @@ -2292,6 +2881,8 @@ wi::bitreverse (const T &x)
>    WI_UNARY_RESULT_VAR (result, val, T, x);
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T) xi (x, precision);
> +  if (result.needs_write_val_arg)
> +    gcc_unreachable (); /* bitreverse on widest_int makes no sense.  */
>    result.set_len (bitreverse_large (val, xi.val, xi.len, precision));
>    return result;
>  }
> @@ -2368,6 +2959,8 @@ wi::bit_and (const T1 &x, const T2 &y)
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
>    bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended;
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len));
>    if (LIKELY (xi.len + yi.len == 2))
>      {
>        val[0] = xi.ulow () & yi.ulow ();
> @@ -2389,6 +2982,8 @@ wi::bit_and_not (const T1 &x, const T2 &
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
>    bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended;
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len));
>    if (LIKELY (xi.len + yi.len == 2))
>      {
>        val[0] = xi.ulow () & ~yi.ulow ();
> @@ -2410,6 +3005,8 @@ wi::bit_or (const T1 &x, const T2 &y)
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
>    bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended;
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len));
>    if (LIKELY (xi.len + yi.len == 2))
>      {
>        val[0] = xi.ulow () | yi.ulow ();
> @@ -2431,6 +3028,8 @@ wi::bit_or_not (const T1 &x, const T2 &y
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
>    bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended;
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len));
>    if (LIKELY (xi.len + yi.len == 2))
>      {
>        val[0] = xi.ulow () | ~yi.ulow ();
> @@ -2452,6 +3051,8 @@ wi::bit_xor (const T1 &x, const T2 &y)
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
>    bool is_sign_extended = xi.is_sign_extended && yi.is_sign_extended;
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len));
>    if (LIKELY (xi.len + yi.len == 2))
>      {
>        val[0] = xi.ulow () ^ yi.ulow ();
> @@ -2472,6 +3073,8 @@ wi::add (const T1 &x, const T2 &y)
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len) + 1);
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        val[0] = xi.ulow () + yi.ulow ();
> @@ -2515,6 +3118,8 @@ wi::add (const T1 &x, const T2 &y, signo
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len) + 1);
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        unsigned HOST_WIDE_INT xl = xi.ulow ();
> @@ -2558,6 +3163,8 @@ wi::sub (const T1 &x, const T2 &y)
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len) + 1);
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        val[0] = xi.ulow () - yi.ulow ();
> @@ -2601,6 +3208,8 @@ wi::sub (const T1 &x, const T2 &y, signo
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (MAX (xi.len, yi.len) + 1);
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        unsigned HOST_WIDE_INT xl = xi.ulow ();
> @@ -2643,6 +3252,8 @@ wi::mul (const T1 &x, const T2 &y)
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (xi.len + yi.len + 2);
>    if (precision <= HOST_BITS_PER_WIDE_INT)
>      {
>        val[0] = xi.ulow () * yi.ulow ();

I realise this is deliberately conservative, just curious: why + 2
rather than + 1?
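
For context, a quick empirical sanity check of the bound I'm reasoning from (a hypothetical sketch, not the GCC representation — wide_int's canonical len can carry one redundant limb for values with the top bit of the high limb set, which is why I'd expect + 1 headroom):

```python
import random

BITS = 64  # matches HOST_BITS_PER_WIDE_INT on common hosts

def limbs(v):
    """Smallest number of BITS-bit limbs holding v in two's complement."""
    n = 1
    while not (-(1 << (n * BITS - 1)) <= v < (1 << (n * BITS - 1))):
        n += 1
    return n

random.seed(0)
worst = -10
for _ in range(10000):
    x = random.getrandbits(random.randint(1, 256)) * random.choice((1, -1))
    y = random.getrandbits(random.randint(1, 256)) * random.choice((1, -1))
    worst = max(worst, limbs(x * y) - (limbs(x) + limbs(y)))

# In minimal two's-complement form the product never needs more than
# limbs(x) + limbs(y) limbs, so +2 looks safely conservative.
print(worst <= 0)
```

(So even allowing one redundant canonicalisation limb, xi.len + yi.len + 1 would seem to suffice, hence the question.)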

Thanks,
Richard

> @@ -2664,6 +3275,8 @@ wi::mul (const T1 &x, const T2 &y, signo
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (xi.len + yi.len + 2);
>    result.set_len (mul_internal (val, xi.val, xi.len,
>  				yi.val, yi.len, precision,
>  				sgn, overflow, false));
> @@ -2698,6 +3311,8 @@ wi::mul_high (const T1 &x, const T2 &y,
>    unsigned int precision = get_precision (result);
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y, precision);
> +  if (result.needs_write_val_arg)
> +    gcc_unreachable (); /* mul_high on widest_int doesn't make sense.  */
>    result.set_len (mul_internal (val, xi.val, xi.len,
>  				yi.val, yi.len, precision,
>  				sgn, 0, true));
> @@ -2716,6 +3331,12 @@ wi::div_trunc (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T1) xi (x, precision);
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
> +  if (quotient.needs_write_val_arg)
> +    quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					&& xi.val[xi.len - 1] < 0)
> +				       ? CEIL (precision,
> +					       HOST_BITS_PER_WIDE_INT) + 1
> +				       : xi.len + 1);
>    quotient.set_len (divmod_internal (quotient_val, 0, 0, xi.val, xi.len,
>  				     precision,
>  				     yi.val, yi.len, yi.precision,
> @@ -2753,6 +3374,15 @@ wi::div_floor (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -2795,6 +3425,15 @@ wi::div_ceil (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -2828,6 +3467,15 @@ wi::div_round (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -2871,6 +3519,15 @@ wi::divmod_trunc (const T1 &x, const T2
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -2915,6 +3572,8 @@ wi::mod_trunc (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (remainder.needs_write_val_arg)
> +    remainder_val = remainder.write_val (yi.len);
>    divmod_internal (0, &remainder_len, remainder_val,
>  		   xi.val, xi.len, precision,
>  		   yi.val, yi.len, yi.precision, sgn, overflow);
> @@ -2955,6 +3614,15 @@ wi::mod_floor (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -2991,6 +3659,15 @@ wi::mod_ceil (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -3017,6 +3694,15 @@ wi::mod_round (const T1 &x, const T2 &y,
>    WIDE_INT_REF_FOR (T2) yi (y);
>  
>    unsigned int remainder_len;
> +  if (quotient.needs_write_val_arg)
> +    {
> +      quotient_val = quotient.write_val ((sgn == UNSIGNED
> +					  && xi.val[xi.len - 1] < 0)
> +					 ? CEIL (precision,
> +						 HOST_BITS_PER_WIDE_INT) + 1
> +					 : xi.len + 1);
> +      remainder_val = remainder.write_val (yi.len);
> +    }
>    quotient.set_len (divmod_internal (quotient_val,
>  				     &remainder_len, remainder_val,
>  				     xi.val, xi.len, precision,
> @@ -3086,12 +3772,16 @@ wi::lshift (const T1 &x, const T2 &y)
>    /* Handle the simple cases quickly.   */
>    if (geu_p (yi, precision))
>      {
> +      if (result.needs_write_val_arg)
> +	val = result.write_val (1);
>        val[0] = 0;
>        result.set_len (1);
>      }
>    else
>      {
>        unsigned int shift = yi.to_uhwi ();
> +      if (result.needs_write_val_arg)
> +	val = result.write_val (xi.len + shift / HOST_BITS_PER_WIDE_INT + 1);
>        /* For fixed-precision integers like offset_int and widest_int,
>  	 handle the case where the shift value is constant and the
>  	 result is a single nonnegative HWI (meaning that we don't
> @@ -3130,12 +3820,23 @@ wi::lrshift (const T1 &x, const T2 &y)
>    /* Handle the simple cases quickly.   */
>    if (geu_p (yi, xi.precision))
>      {
> +      if (result.needs_write_val_arg)
> +	val = result.write_val (1);
>        val[0] = 0;
>        result.set_len (1);
>      }
>    else
>      {
>        unsigned int shift = yi.to_uhwi ();
> +      if (result.needs_write_val_arg)
> +	{
> +	  unsigned int est_len = xi.len;
> +	  if (xi.val[xi.len - 1] < 0 && shift)
> +	    /* Logical right shift of sign-extended value might need a very
> +	       large precision e.g. for widest_int.  */
> +	    est_len = CEIL (xi.precision - shift, HOST_BITS_PER_WIDE_INT) + 1;
> +	  val = result.write_val (est_len);
> +	}
>        /* For fixed-precision integers like offset_int and widest_int,
>  	 handle the case where the shift value is constant and the
>  	 shifted value is a single nonnegative HWI (meaning that all
> @@ -3171,6 +3872,8 @@ wi::arshift (const T1 &x, const T2 &y)
>       since the result can be no larger than that.  */
>    WIDE_INT_REF_FOR (T1) xi (x);
>    WIDE_INT_REF_FOR (T2) yi (y);
> +  if (result.needs_write_val_arg)
> +    val = result.write_val (xi.len);
>    /* Handle the simple cases quickly.   */
>    if (geu_p (yi, xi.precision))
>      {
> @@ -3374,25 +4077,56 @@ operator % (const T1 &x, const T2 &y)
>    return wi::smod_trunc (x, y);
>  }
>  
> -template<typename T>
> +void gt_ggc_mx (generic_wide_int <wide_int_storage> *) = delete;
> +void gt_pch_nx (generic_wide_int <wide_int_storage> *) = delete;
> +void gt_pch_nx (generic_wide_int <wide_int_storage> *,
> +		gt_pointer_operator, void *) = delete;
> +
> +inline void
> +gt_ggc_mx (generic_wide_int <rwide_int_storage> *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (generic_wide_int <rwide_int_storage> *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (generic_wide_int <rwide_int_storage> *, gt_pointer_operator, void *)
> +{
> +}
> +
> +template<int N>
>  void
> -gt_ggc_mx (generic_wide_int <T> *)
> +gt_ggc_mx (generic_wide_int <fixed_wide_int_storage <N> > *)
>  {
>  }
>  
> -template<typename T>
> +template<int N>
>  void
> -gt_pch_nx (generic_wide_int <T> *)
> +gt_pch_nx (generic_wide_int <fixed_wide_int_storage <N> > *)
>  {
>  }
>  
> -template<typename T>
> +template<int N>
>  void
> -gt_pch_nx (generic_wide_int <T> *, gt_pointer_operator, void *)
> +gt_pch_nx (generic_wide_int <fixed_wide_int_storage <N> > *,
> +	   gt_pointer_operator, void *)
>  {
>  }
>  
>  template<int N>
> +void gt_ggc_mx (generic_wide_int <widest_int_storage <N> > *) = delete;
> +
> +template<int N>
> +void gt_pch_nx (generic_wide_int <widest_int_storage <N> > *) = delete;
> +
> +template<int N>
> +void gt_pch_nx (generic_wide_int <widest_int_storage <N> > *,
> +		gt_pointer_operator, void *) = delete;
> +
> +template<int N>
>  void
>  gt_ggc_mx (trailing_wide_ints <N> *)
>  {
> @@ -3465,7 +4199,7 @@ inline wide_int
>  wi::mask (unsigned int width, bool negate_p, unsigned int precision)
>  {
>    wide_int result = wide_int::create (precision);
> -  result.set_len (mask (result.write_val (), width, negate_p, precision));
> +  result.set_len (mask (result.write_val (0), width, negate_p, precision));
>    return result;
>  }
>  
> @@ -3477,7 +4211,7 @@ wi::shifted_mask (unsigned int start, un
>  		  unsigned int precision)
>  {
>    wide_int result = wide_int::create (precision);
> -  result.set_len (shifted_mask (result.write_val (), start, width, negate_p,
> +  result.set_len (shifted_mask (result.write_val (0), start, width, negate_p,
>  				precision));
>    return result;
>  }
> @@ -3498,8 +4232,8 @@ wi::mask (unsigned int width, bool negat
>  {
>    STATIC_ASSERT (wi::int_traits<T>::precision);
>    T result;
> -  result.set_len (mask (result.write_val (), width, negate_p,
> -			wi::int_traits <T>::precision));
> +  result.set_len (mask (result.write_val (width / HOST_BITS_PER_WIDE_INT + 1),
> +			width, negate_p, wi::int_traits <T>::precision));
>    return result;
>  }
>  
> @@ -3512,9 +4246,13 @@ wi::shifted_mask (unsigned int start, un
>  {
>    STATIC_ASSERT (wi::int_traits<T>::precision);
>    T result;
> -  result.set_len (shifted_mask (result.write_val (), start, width,
> -				negate_p,
> -				wi::int_traits <T>::precision));
> +  unsigned int prec = wi::int_traits <T>::precision;
> +  unsigned int est_len
> +    = result.needs_write_val_arg
> +      ? ((start + (width > prec - start ? prec - start : width))
> +	 / HOST_BITS_PER_WIDE_INT + 1) : 0;
> +  result.set_len (shifted_mask (result.write_val (est_len), start, width,
> +				negate_p, prec));
>    return result;
>  }
>  


Thread overview: 16+ messages
2023-10-09 10:55 Jakub Jelinek
2023-10-09 12:54 ` Richard Sandiford [this message]
2023-10-09 13:44   ` Jakub Jelinek
2023-10-09 18:28     ` Jakub Jelinek
2023-10-10 17:41       ` Richard Sandiford
2023-10-10 18:13         ` Jakub Jelinek
2023-10-09 14:59 ` [PATCH] wide-int: Remove rwide_int, introduce dw_wide_int Jakub Jelinek
2023-10-10  9:30   ` Richard Biener
2023-10-10  9:49     ` Jakub Jelinek
2023-10-10 13:42     ` [PATCH] dwarf2out: Stop using wide_int in GC structures Jakub Jelinek
2023-10-10 13:43       ` Richard Biener
2023-10-11 16:47 [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989] Jakub Jelinek
2023-10-12 10:54 ` Richard Sandiford
2023-10-12 11:10   ` Jakub Jelinek
2023-10-12 11:34     ` Richard Sandiford
2023-10-12 11:10 ` Richard Biener
