Cycles Per Instruction Measure

From GM-RKB

A Cycles Per Instruction (CPI) Measure is a computer performance measure of the average number of clock cycles consumed per executed instruction.



References

2017a

[math]\displaystyle{ CPI = \frac{\sum_i (IC_i)(CC_i)}{IC} }[/math]
where [math]\displaystyle{ IC_i }[/math] is the number of instructions of a given instruction type [math]\displaystyle{ i }[/math], [math]\displaystyle{ CC_i }[/math] is the number of clock cycles for that instruction type, and [math]\displaystyle{ IC=\sum_i IC_i }[/math] is the total instruction count. The summation runs over all instruction types in a given benchmarking process.
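
As a rough illustration of the weighted sum above, the following Python sketch computes CPI from an instruction mix. The function name, the instruction categories, and all counts and cycle costs are hypothetical values chosen for illustration, not measurements from any real benchmark.

```python
def cycles_per_instruction(mix):
    """Return CPI for a mapping of instruction type -> (IC_i, CC_i).

    IC_i is the number of executed instructions of type i and
    CC_i is the clock cycles each instruction of that type takes.
    """
    total_instructions = sum(ic for ic, _ in mix.values())
    if total_instructions == 0:
        raise ValueError("total instruction count must be positive")
    total_cycles = sum(ic * cc for ic, cc in mix.values())
    return total_cycles / total_instructions


# Hypothetical instruction mix (assumed numbers, for illustration only).
example_mix = {
    "alu": (50_000, 1),
    "load": (20_000, 2),
    "store": (10_000, 2),
    "branch": (20_000, 3),
}

example_cpi = cycles_per_instruction(example_mix)
print(f"CPI = {example_cpi:.2f}")
```

With these assumed numbers the mix takes 170,000 cycles for 100,000 instructions, so the sketch reports a CPI of 1.70.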

2017b

  • (UMN, 2017) ⇒ https://www.d.umn.edu/~gshute/arch/performance-equation.xhtml 2017-06-04
    • The performance equation analyzes execution time as a product of three factors that are relatively independent of each other.

      [math]\displaystyle{ \text{execution time} = IC \times CPI \times CT }[/math]

      This equation remains valid if the time units are changed on both sides of the equation. The left-hand side and the factors on the right-hand side are discussed in the following sections.

      The three factors are, in order, known as the instruction count (IC), clocks per instruction (CPI), and clock time (CT).

      (...) Clocks per instruction (CPI) is an effective average. It is averaged over all of the instruction executions in a program.

      CPI is affected by instruction-level parallelism and by instruction complexity. Without instruction-level parallelism, simple instructions usually take 4 or more cycles to execute. Instructions that execute loops take at least one clock per loop iteration. Pipelining (overlapping execution of instructions) can bring the average for simple instructions down to near 1 clock per instruction. Superscalar pipelining (issuing multiple instructions per cycle) can bring the average down to a fraction of a clock per instruction.

      For computing clocks per instruction as an effective average, the cases are categories of instructions, such as branches, loads, and stores. Frequencies for the categories can be extracted from execution traces. Knowledge of how the architecture handles each category yields the clocks per instruction for that category.
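
The excerpt above describes execution time as the product of instruction count (IC), clocks per instruction (CPI), and clock time (CT), with CPI obtained as a frequency-weighted average over instruction categories. A minimal Python sketch of that calculation follows. The function names, category frequencies, per-category clock counts, instruction count, and clock period are all assumptions made for illustration; real values would come from execution traces and knowledge of the architecture.

```python
def effective_cpi(categories):
    """Frequency-weighted average CPI.

    categories maps a category name to (frequency, clocks_per_instruction),
    where the frequencies are fractions of executed instructions.
    """
    total_frequency = sum(freq for freq, _ in categories.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("category frequencies must sum to 1")
    return sum(freq * clocks for freq, clocks in categories.values())


def execution_time(instruction_count, cpi, clock_time_seconds):
    """Execution time as the product IC * CPI * CT."""
    return instruction_count * cpi * clock_time_seconds


# Hypothetical trace profile (assumed values, not taken from the source).
trace_profile = {
    "branch": (0.20, 2.0),
    "load": (0.25, 2.0),
    "store": (0.10, 1.0),
    "other": (0.45, 1.0),
}

cpi = effective_cpi(trace_profile)
# Assumed workload: one million instructions on a 1 GHz clock (CT = 1 ns).
seconds = execution_time(1_000_000, cpi, 1e-9)
print(f"effective CPI = {cpi:.2f}, execution time = {seconds * 1e3:.2f} ms")
```

Keeping the weighted average and the product in separate functions mirrors the excerpt's claim that the three factors are relatively independent: changing the clock period or the instruction count does not require recomputing the per-category CPI.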