Parallel is preferable in other ways. ECP mode can transfer data at 500-1000 kB/s (vs. USB's 1500 kB/s). Many transceiver chips (e.g. 74VHC161284, 74ACT1284I) don't require any setup logic. However, the IEEE 1284 protocol is nontrivial as well (see the only chip implementation: W91284PIC), and relying on that chip is risky (cost? long-term availability?).
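To get a feel for what those rates mean in practice, here is a rough sketch of the arithmetic. The 1500 kB/s USB figure is just the 12 Mbit/s full-speed signaling rate divided by 8, so it is an upper bound before any protocol overhead. The function name and the 1000 kB payload are made up for illustration.

```python
# USB full-speed signaling rate is 12 Mbit/s; divide by 8 for bytes,
# by 1000 for kB. This is the raw rate, ignoring protocol overhead.
USB_FULL_SPEED_KBPS = 12_000_000 / 8 / 1000


def transfer_seconds(size_kb, rate_kbps):
    """Time to move size_kb kilobytes at rate_kbps kB/s (hypothetical helper)."""
    return size_kb / rate_kbps


# Example payload of 1000 kB (an arbitrary size, for comparison only).
payload_kb = 1000
ecp_slow = transfer_seconds(payload_kb, 500)    # low end of the ECP range
ecp_fast = transfer_seconds(payload_kb, 1000)   # high end of the ECP range
usb_best = transfer_seconds(payload_kb, USB_FULL_SPEED_KBPS)
```

So at these numbers the gap between ECP and USB is at most a factor of three, and less at the top of the ECP range.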
I don't know how to program a CPLD, nor do I have a gut feeling for how much functionality can be embedded in one, vs. an FPGA. I also don't have a gut feeling for how fast they are, or whether they can really improve much on bit-banging.
Multiple clocks: a clock could come from each of the interfaces... from the PC side, or from the external side. How to consolidate them, or pick one of them?
Multiple voltages: parallel and USB both run at 5V. External interfaces don't necessarily run at this... some clever chip selection can alleviate this problem. Otherwise, a voltage shifter would have to be used. And in the worst case, one or two extra power supplies would have to be used.
I don't know how to solder. Many of these chips come in very small packages, so I could have a very hard time connecting them.
External interface protection might be hard. I don't know how to do it. But there are all sorts of things to guard against, like static discharge, shorts, etc...
JTAG <-> parallel is pretty easy (though I don't understand it well enough to necessarily attach it to a USB interface).
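For reference, this is roughly what the bit-banged version looks like: a minimal sketch that only builds the sequence of bytes to write to the parallel port data register, without touching any hardware. The bit positions for TCK, TMS and TDI are an assumption (real cables each use their own wiring), the function names are made up, and reading TDO back from the status register is left out.

```python
# Hypothetical bit positions in the parallel port data register.
# Real JTAG cables each define their own wiring; adjust to match.
TCK = 0x01
TMS = 0x02
TDI = 0x04


def clock_bit(tms, tdi):
    """Return the two data-register bytes that clock one JTAG bit.

    TMS and TDI are set up while TCK is low, then TCK is raised so the
    target samples them on the rising edge.
    """
    level = (TMS if tms else 0) | (TDI if tdi else 0)
    return [level, level | TCK]


def reset_tap():
    """Five clocks with TMS high reach Test-Logic-Reset from any state."""
    out = []
    for _ in range(5):
        out.extend(clock_bit(tms=1, tdi=0))
    return out


def shift_bits(bits, exit_on_last=True):
    """Byte sequence that shifts `bits` out on TDI, first element first.

    TMS stays low while shifting. If exit_on_last is set, TMS is raised
    on the final bit, which moves the TAP out of its Shift state.
    """
    out = []
    for i, bit in enumerate(bits):
        last = i == len(bits) - 1
        out.extend(clock_bit(tms=last and exit_on_last, tdi=bit))
    return out
```

Each byte in the returned list would be written to the port in order, e.g. `reset_tap() + shift_bits([1, 0, 1, 1])`. Every JTAG bit costs two port writes this way, which is where the speed limit of bit-banging comes from.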