The authors present a high-level analytical model for Chip Multi-Processors (CMPs) that encompasses processors, memory, and communication in an area-constrained, global optimization process. Applying this model to the design of a symmetric CMP for speech recognition, they demonstrate a methodology for estimating model parameters prior to design exploration. They then present an automated approach for finding the optimal high-level CMP architecture, that is, the allocation of silicon resources to each architectural element that maximizes overall system performance. The resulting allocation balances the performance gains from parallelism, processor micro-architecture, and cache memory against the energy-delay costs of computation and communication.