I was doing all my work in the RXNE (receive buffer not empty) handler for SPI. As soon as I moved my sending code to a TXE (transmit buffer empty) handler it seems like everything is working great.
So basically, in my previous interrupt handler I needed to do all my work in the one clock cycle between a byte finishing coming in on MOSI and the next byte starting. That was just a few microseconds, and the timing was too tight for it to work reliably.
With the working approach:
I receive the first character from the master. Then, when I queue up the next byte to be sent, I enable the TXE SPI interrupt. As soon as the transmit buffer is empty (which looks like about 25% of the way through the transmission), the TXE SPI IRQ fires and I can queue up the next byte to be sent with plenty of time to spare. When I know I'm done sending data, I just disable the SPI TXE IRQ.
Then first thing in my handler I have code to figure out what needs to happen:
    if (SPI->SR & 0x01)          /* RXNE: a byte has been received */
        GO_ProcessByte(SPI_ReceiveData());
    else if (SPI->SR & 0x02)     /* TXE: transmit buffer is empty */
        GO_HandleTxe();
So far it seems to work.
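For anyone who wants to see the whole flow in one place, here is a rough sketch of the approach described above. It runs against a mock peripheral rather than real hardware, so everything except the two flag values (0x01 for RXNE, 0x02 for TXE) is an assumption made for illustration: the `MockSpi` struct, the `CMD_READ` command byte, the reply contents, and the function names `process_byte` and `handle_txe` (stand-ins for my `GO_ProcessByte` and `GO_HandleTxe`) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_RXNE 0x01u  /* receive buffer not empty (value from the post) */
#define FLAG_TXE  0x02u  /* transmit buffer empty (value from the post) */
#define CMD_READ  0x10u  /* hypothetical command byte from the master */

/* Hypothetical mock of the SPI peripheral, not a real register map. */
typedef struct {
    uint8_t sr;              /* status flags */
    uint8_t rx_data;         /* last byte received from the master */
    uint8_t tx_data;         /* byte queued for transmission */
    bool    txe_irq_enabled; /* stands in for the TXE interrupt enable bit */
} MockSpi;

static MockSpi spi;

/* Illustrative reply data; the real payload depends on the application. */
static const uint8_t reply[] = { 0xA1, 0xB2, 0xC3 };
static const uint8_t *tx_buf;
static size_t tx_len, tx_pos;

/* Mock data-register access: reading clears RXNE, writing clears TXE. */
static uint8_t spi_read(void)
{
    spi.sr &= (uint8_t)~FLAG_RXNE;
    return spi.rx_data;
}

static void spi_write(uint8_t byte)
{
    spi.tx_data = byte;
    spi.sr &= (uint8_t)~FLAG_TXE;
}

/* Queue the first reply byte and enable the TXE interrupt for the rest. */
static void begin_reply(const uint8_t *data, size_t len)
{
    tx_buf = data;
    tx_len = len;
    tx_pos = 0;
    if (len == 0) {
        return;
    }
    spi_write(tx_buf[tx_pos++]);
    if (tx_pos < tx_len) {
        spi.txe_irq_enabled = true;
    }
}

/* RXNE path: only decide what to do; the sending happens on TXE. */
static void process_byte(uint8_t byte)
{
    if (byte == CMD_READ) {
        begin_reply(reply, sizeof reply);
    }
}

/* TXE path: queue the next byte, and disable the interrupt once done. */
static void handle_txe(void)
{
    if (tx_pos < tx_len) {
        spi_write(tx_buf[tx_pos++]);
    }
    if (tx_pos >= tx_len) {
        spi.txe_irq_enabled = false;
    }
}

/* Single handler shared by both interrupt sources, as in the post. */
static void spi_irq_handler(void)
{
    uint8_t sr = spi.sr;  /* read the status register once */

    if (sr & FLAG_RXNE) {
        process_byte(spi_read());
    } else if (sr & FLAG_TXE) {
        handle_txe();
    }
}
```

On real hardware the mock struct and the `spi_read`/`spi_write` helpers would be replaced by the vendor's register definitions or library calls, and the TXE interrupt enable would be a bit in the peripheral's interrupt control register. The point of the split is that the RXNE path only decides what to send, while the actual byte-by-byte feeding happens on TXE, where there is most of a byte time to spare.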