On 05/13/2016 01:13 PM, Andrew Pinski wrote:
> On Fri, May 13, 2016 at 12:58 PM, Richard Biener wrote:
>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis wrote:
>>> The cse_sincos pass tries to optimize sequences such as
>>>
>>>   sin (x);
>>>   cos (x);
>>>
>>> into a single call to sincos, or cexpi, when available. However, the
>>> nvptx target has sin and cos instructions, albeit with some loss of
>>> precision (so it's only enabled with -ffast-math). This patch teaches
>>> cse_sincos pass to ignore sin, cos and cexpi instructions when the
>>> target can expand those calls. This yields a 6x speedup in 314.omriq
>>> from spec accel when running on Nvidia accelerators.
>>>
>>> Is this OK for trunk?
>>
>> Isn't there an optab for sincos?
>
> This is exactly what I was going to suggest. This transformation
> should be done in the back-end back to sin/cos instructions.

I didn't realize that the 387 has sin, cos and sincos instructions, so
yeah, my original patch is bad. Nathan, is this patch ok for trunk and
gcc-6? It adds a new sincos pattern in the nvptx backend.

I hadn't tested a standalone nvptx toolchain prior to this patch, so I'm
not sure whether my test results look sane. I seem to get a different set
of failures each time I test a clean trunk build. I attached my results
below for reference.

Cesar

g++.sum
Tests that now fail, but worked before:

nvptx-none-run: g++.dg/abi/param1.C -std=c++14 execution test

Tests that now work, but didn't before:

nvptx-none-run: g++.dg/opt/pr30590.C -std=gnu++98 execution test
nvptx-none-run: g++.dg/opt/pr36187.C -std=gnu++14 execution test

gfortran.sum
Tests that now fail, but worked before:

nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
nvptx-none-run: gfortran.dg/allocate_with_source_5.f90 -O1 execution test
nvptx-none-run: gfortran.dg/func_assign_3.f90 -O3 -g execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90 -O1 execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90 -O3 -g execution test
nvptx-none-run: gfortran.dg/internal_pack_15.f90 -O2 execution test
nvptx-none-run: gfortran.dg/internal_pack_8.f90 -Os execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90 -O0 execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90 -O3 -g execution test
nvptx-none-run: gfortran.dg/intrinsic_product_1.f90 -O1 execution test
nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90 -O3 -g execution test
nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test

Tests that now work, but didn't before:

nvptx-none-run: gfortran.dg/char_pointer_assign.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90 -O1 execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90 -Os execution test
nvptx-none-run: gfortran.dg/char_result_13.f90 -O3 -g execution test
nvptx-none-run: gfortran.dg/char_result_2.f90 -O1 execution test
nvptx-none-run: gfortran.dg/char_type_len.f90 -Os execution test
nvptx-none-run: gfortran.dg/character_array_constructor_1.f90 -O0 execution test
nvptx-none-run: gfortran.dg/nested_allocatables_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test

gcc.sum
Tests that now fail, but worked before:

nvptx-none-run: gcc.c-torture/execute/20100316-1.c -Os execution test
nvptx-none-run: gcc.c-torture/execute/20100708-1.c -O1 execution test
nvptx-none-run: gcc.c-torture/execute/20100805-1.c -O0 execution test
nvptx-none-run: gcc.dg/torture/pr52028.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
nvptx-none-run: gcc.dg/torture/pr52028.c -O3 -g execution test

Tests that now work, but didn't before:

nvptx-none-run: gcc.c-torture/execute/20091229-1.c -O3 -g execution test
nvptx-none-run: gcc.c-torture/execute/20101013-1.c -Os execution test
nvptx-none-run: gcc.c-torture/execute/20101025-1.c -Os execution test
nvptx-none-run: gcc.c-torture/execute/20120105-1.c -O0 execution test
nvptx-none-run: gcc.c-torture/execute/20120111-1.c -O0 execution test

New tests that PASS:

nvptx-none-run: gcc.target/nvptx/sincos-1.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times cos.approx.f32 1
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times sin.approx.f32 1
nvptx-none-run: gcc.target/nvptx/sincos-2.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-2.c execution test

>> ISTR x87 handles this pass just fine and also can do sin and cos.
>>
>> Richard.
>>
>>> Cesar
>>
>>