In this paper, the authors propose a cycle estimation methodology for fast instruction-level CPU emulators. This methodology suggests achieving accurate software performance estimation at high emulation speed by utilizing a two-phase pipeline scheduling process: a static pipeline scheduling phase performed off-line before runtime, followed by an accuracy refinement phase performed at runtime. The first phase delivers a pre-estimated CPU cycle count while limiting impact on the emulation speed. The second phase refines the pre-estimated cycle count to provide further accuracy. They implemented this methodology on QEMU and compared cycle counts with a physical ARM CPU.