Hi,
For the following test-case:

#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}
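For anyone without an ARM toolchain handy, here is a rough portable sketch of the same expression. It uses GCC/Clang generic vector types instead of <arm_neon.h>; `v8s8`, `v8u8` and `f1_generic` are hypothetical stand-ins for `int8x8_t`, `uint8x8_t` and `f1`, and it assumes a compiler that supports the `vector_size` extension. Each result lane is 0xFF when the corresponding `a & b` lane is nonzero and 0x00 otherwise, which is exactly what vtst computes:

```c
#include <stdint.h>

/* Hypothetical stand-ins for int8x8_t / uint8x8_t using the GCC/Clang
   vector_size extension, so no <arm_neon.h> is needed. */
typedef int8_t  v8s8 __attribute__((vector_size(8)));
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Same expression as f1: a vector comparison yields -1 (all-ones) in
   each lane where it holds and 0 elsewhere; the cast reinterprets the
   8-byte result as unsigned lanes. */
v8u8 f1_generic(v8s8 a, v8s8 b) {
  return (v8u8) ((a & b) != 0);
}
```

For example, with a = {1, 2, ...} and b = {1, 1, ...}, lane 0 is 0xFF (1 & 1 != 0) and lane 1 is 0x00 (2 & 1 == 0).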

GCC fails to lower the test operation to vtst and instead emits:
f1:
        vand    d0, d0, d1
        vceq.i8 d0, d0, #0
        vmvn    d0, d0
        bx      lr
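To check that the three instructions above compute the same thing as a single vtst.8, here is a scalar per-lane model, compared exhaustively over every pair of byte values. This is only a sketch: the helper names are hypothetical, and it models individual byte lanes rather than the NEON instructions themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* One byte lane of the current sequence: vand, vceq.i8 #0, vmvn. */
uint8_t lane_and_ceq_mvn(uint8_t a, uint8_t b) {
  uint8_t t = a & b;                     /* vand: bitwise AND        */
  uint8_t eq = (t == 0) ? 0xFF : 0x00;   /* vceq.i8 #0: mask if zero */
  return (uint8_t) ~eq;                  /* vmvn: bitwise NOT        */
}

/* One byte lane of vtst.8: mask if any bit of (a & b) is set. */
uint8_t lane_vtst(uint8_t a, uint8_t b) {
  return (a & b) != 0 ? 0xFF : 0x00;
}

/* Compares the two models for every pair of byte values. */
bool sequences_agree(void) {
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (lane_and_ceq_mvn((uint8_t) a, (uint8_t) b)
          != lane_vtst((uint8_t) a, (uint8_t) b))
        return false;
  return true;
}
```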

The attached patch tries to fix this by adding a pattern to match the following combine attempt:
Trying 7, 8 -> 9:
    7: r120:V8QI=r123:V8QI&r124:V8QI
      REG_DEAD r124:V8QI
      REG_DEAD r123:V8QI
    8: r122:V8QI=-r120:V8QI==const_vector
      REG_DEAD r120:V8QI
    9: r121:V8QI=~r122:V8QI
      REG_DEAD r122:V8QI
Failed to match this instruction:
(set (reg:V8QI 121)
    (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
                (reg:V8QI 124))
            (const_vector:V8QI [
                    (const_int 0 [0]) repeated x8
                ]))
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])))

Essentially it converts:
r120 = (and r123 r124)
r122 = (neg (eq r120 0))
r121 = (not r122)
-->
r121 = vtst r123, r124

(I guess combine simplifies (not (neg X)) to (plus X -1) above, since ~(-X) == X - 1 in two's complement.)
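A quick scalar sanity check of that identity, and of what one lane of the unmatched RTL computes. This is a sketch only: the function names are hypothetical, and it assumes, as implied by insn 8's (neg (eq ...)) form, that eq yields 1 for a true lane.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checks ~(-x) == x + (-1) for every signed 8-bit lane value.
   Computed in int, where the identity holds exactly (two's complement). */
bool not_neg_is_plus_minus_one(void) {
  for (int x = -128; x <= 127; x++)
    if (~(-x) != x + -1)
      return false;
  return true;
}

/* One lane of the unmatched RTL: (plus (eq (and a b) 0) -1).
   Assumes eq produces 1 for a true lane, as implied by insn 8 negating it. */
int8_t rtl_lane(int8_t a, int8_t b) {
  int eq = ((a & b) == 0);   /* 1 if no common bits set, else 0 */
  return (int8_t) (eq + -1); /* 0 if (a & b) == 0, else -1 (0xFF) */
}
```

So a lane ends up all-ones exactly when a & b is nonzero, matching vtst.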

Code-gen after patch:
f1:
        vtst.8  d0, d0, d1
        bx      lr

Bootstrapped + tested on arm-linux-gnueabihf, and
cross-tested on arm*-*-*.
Does it look OK for the next stage 1?

Thanks,
Prathamesh