I'm working on a voice-controlled app, but latency is too high on mobile. I'm currently using a small RNN model. Should I switch to something like Conformer, or use on-device streaming ASR?