One remark:
I don't know your exact approach, but I also see a possible source of bias here:
How did you optimize the external model parameters, such as the input space, network structure, and voting scheme? If you just picked these according to your results above, those results are biased. I agree this is a general issue...
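
To make the concern concrete, here is a minimal sketch of the effect, not your actual setup. It assumes pure-noise binary labels and treats each hypothetical "configuration" (standing in for a choice of input space, network structure, or voting scheme) as a random guesser, so no configuration can truly beat 50%. The names `selection_bias_demo`, `n_configs`, and `n_samples` are made up for illustration.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def selection_bias_demo(n_configs=50, n_samples=100, seed=0):
    """Pick the best of n_configs random guessers on one data set,
    then score that same pick on fresh data it was not selected on.

    Returns (accuracy on the selection data, accuracy on fresh data).
    """
    rng = random.Random(seed)
    # Pure-noise labels (assumption): the true accuracy of any guesser is 0.5.
    select_labels = [rng.randint(0, 1) for _ in range(n_samples)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_samples)]

    best_select_acc = -1.0
    best_fresh_acc = 0.0
    for _ in range(n_configs):
        # One hypothetical configuration: random predictions on both sets.
        select_preds = [rng.randint(0, 1) for _ in range(n_samples)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_samples)]
        acc = accuracy(select_preds, select_labels)
        if acc > best_select_acc:
            # Keep the configuration that looks best on the selection data.
            best_select_acc = acc
            best_fresh_acc = accuracy(fresh_preds, fresh_labels)
    return best_select_acc, best_fresh_acc


if __name__ == "__main__":
    selected, fresh = selection_bias_demo()
    print(f"accuracy on the data used for picking: {selected:.2f}")
    print(f"accuracy of the same pick on fresh data: {fresh:.2f}")
```

The score on the data used for picking is inflated purely by the selection step, while the score on fresh data stays near chance. The usual remedy is to choose such parameters on a separate validation split (or an inner cross-validation loop) and report performance only on data that played no part in the choice.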