Pentium(R) 4 processor topic: Self-Modifying Code Clear

Thread Specificity: TS

The number of times the entire pipeline of the machine is cleared due to self-modifying code issues.

A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated. This check is based on the physical address of the instruction. In addition, the P6 family and Intel® Pentium® processors check whether a write to a code segment may modify an instruction that has been prefetched for execution. If the write affects a prefetched instruction, the prefetch queue is invalidated. This latter check is based on the linear address of the instruction. For the Pentium 4 processor, a write or a snoop of an instruction in a code segment, where the target instruction is already decoded and resident in the trace cache, invalidates the entire trace cache. As a result, self-modifying code that runs on the Pentium 4 processor can cause severe degradation of performance.

More on snooping:

If processor A has a line in the trace cache, and processor B wants to read that line (either as code or data), it will snoop processor A's caches. This "read snoop" does not impact processor A. However, if processor B attempts to write to the line, it snoops processor A's caches with a "write snoop" and causes processor A's trace cache to be flushed. Therefore, on the Pentium 4 processor, a write to a cache line that is resident in the trace cache, by this processor or any other processor, will flush the trace cache. A read of that line by this processor or any other processor has no performance impact.
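The read-snoop and write-snoop rule above can be sketched as a toy model. This is an illustration only, not a hardware simulator: the class name TraceCacheModel, the 64-byte line size, and the flush counter are assumptions made for the example, and the counter merely stands in for the kind of count this event reports.

```python
class TraceCacheModel:
    """Toy model of the Pentium 4 rule described above (illustrative only)."""

    LINE_SIZE = 64  # assumed line size in bytes, chosen for the example

    def __init__(self):
        self.resident_lines = set()  # lines whose decoded code is in the trace cache
        self.flush_count = 0         # stand-in for a self-modifying code clear count

    def _line(self, address):
        return address // self.LINE_SIZE

    def execute(self, address):
        """Executing code makes its line resident in the trace cache."""
        self.resident_lines.add(self._line(address))

    def read(self, address):
        """A read, or a read snoop from another processor, changes nothing."""
        return self._line(address) in self.resident_lines

    def write(self, address):
        """A write, or a write snoop, to a resident line flushes the whole cache."""
        if self._line(address) in self.resident_lines:
            self.resident_lines.clear()
            self.flush_count += 1
            return True
        return False


tc = TraceCacheModel()
tc.execute(0x1000)
tc.execute(0x2000)
tc.read(0x1000)            # read or read snoop: no effect
missed = tc.write(0x3000)  # line is not resident: no flush
hit = tc.write(0x1010)     # same 64-byte line as 0x1000: entire cache flushed
print(tc.flush_count, sorted(tc.resident_lines))
```

In this sketch a single write to one resident line also discards the unrelated line at 0x2000, which is why the text warns that data placed near frequently executed code can be costly.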

In practice, the check on linear addresses should not create compatibility problems among IA-32 processors. Applications that include self-modifying code use the same linear address for modifying and fetching the instruction. Systems software, such as a debugger, that might modify an instruction using a different linear address than the one used to fetch it will execute a serializing operation, such as a CPUID instruction, before the modified instruction is executed; this automatically resynchronizes the instruction cache and prefetch queue.

For Intel486™ processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction.
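The Intel486 stale-prefetch case described above can be illustrated with a small model. This is a sketch under stated assumptions: the class PrefetchModel, the queue depth of 4, and the string "instructions" are hypothetical and do not reflect the real prefetch unit. It shows only why a jump placed after the modifying write causes the new instruction to be fetched.

```python
from collections import deque


class PrefetchModel:
    """Toy processor with a prefetch queue (hypothetical, for illustration)."""

    def __init__(self, memory, depth=4):
        self.memory = list(memory)
        self.depth = depth    # assumed queue depth
        self.pc = 0           # address of the next instruction to execute
        self.queue = deque()  # instructions already prefetched

    def _fill(self):
        # Prefetch sequentially from memory until the queue is full.
        next_addr = self.pc + len(self.queue)
        while len(self.queue) < self.depth and next_addr < len(self.memory):
            self.queue.append(self.memory[next_addr])
            next_addr += 1

    def step(self):
        """Execute one instruction taken from the prefetch queue."""
        self._fill()
        instr = self.queue.popleft()
        self.pc += 1
        return instr

    def write(self, address, instr):
        """Modify memory; already prefetched copies are left untouched."""
        self.memory[address] = instr

    def jump(self, target):
        """A jump discards the prefetch queue and refetches from the target."""
        self.queue.clear()
        self.pc = target


def run(flush_with_jump):
    cpu = PrefetchModel(["load", "store", "old", "halt"])
    executed = [cpu.step()]  # the remaining instructions are now prefetched
    cpu.write(2, "new")      # self-modifying write to a prefetched instruction
    if flush_with_jump:
        cpu.jump(cpu.pc)     # jump to the next instruction to flush the queue
    executed += [cpu.step(), cpu.step()]
    return executed


print(run(flush_with_jump=False))
print(run(flush_with_jump=True))
```

Without the jump, the model executes the stale "old" entry from the queue; with the jump, the queue is refilled from memory and "new" is executed.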

Handling Self- and Cross-Modifying Code

The act of a processor writing data into a currently executing code segment with the intent of executing that data as code is called self-modifying code. IA-32 processors exhibit model-specific behavior when executing self-modified code, depending upon how far ahead of the current execution pointer the code has been modified. As processor architectures become more complex and start to speculatively execute code ahead of the retirement point (as in the Pentium 4 and P6 family processors), the rules regarding which code should execute, pre- or post-modification, become blurred.

See Also, Additional Stall Events:

Memory Order Machine Clear

Stalls of Store Buffer Resources (non-standard)