A spoiler up front. This is mostly a detective story driven by a misleading measurement that indicated the CPU was running at 10 Mhz. That reading was bogus, but chasing it did lead to several important discoveries.
Banging out GPIO bits to send data to the HUB75 panel is what exposed the issue. Even though I have two ARM cores in the Zynq, and they both can run at 666 Mhz, I discovered that the one I am using is running only at 10 Mhz. Why? And how can I fix this?
The above diagram from the TRM should explain everything. You also need to figure out where the registers are that control the various blocks in the diagram. For that, refer to section 25 of the Zynq TRM (technical reference manual) along with B.28 in the Appendix (page 1570) which lists and documents all the registers in the "slcr" (system level control register) section.
    CCNT for 1 sec: 9993902
    CCNT for 1 sec: 10000467
    CCNT for 1 sec: 10000496
    CCNT for 1 sec: 10000485
    CCNT for 1 sec: 10000483
We get a nice 50 Mhz Fabric Clock 1 in the FPGA. We have routed this to an external pin and measured it with a scope. We have also fiddled with the FPGA clock control registers in the slcr and been able to set the Fabric clock rate as we please, from 25 to 250 Mhz. The Fabric clock is derived from the IO clock, which is running at 1000 Mhz.
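The fabric clock arithmetic can be sketched as below. This assumes the clock passes through two cascaded 6-bit dividers, as I read the TRM clock diagram; the function name and the particular divisor pairs in the usage note are hypothetical, chosen only because they land on the rates mentioned above.

```c
#include <stdint.h>

/*
 * Fabric clock in kHz after two cascaded dividers (assumed layout).
 * Each divisor is treated as a 6-bit field, so valid values are 1..63.
 * Returns 0 for a divisor outside that range.
 */
uint32_t fabric_clk_khz(uint32_t src_khz, unsigned div0, unsigned div1)
{
    if (div0 < 1 || div0 > 63 || div1 < 1 || div1 > 63)
        return 0;
    return src_khz / (div0 * div1);
}
```

With a 1000 Mhz IO clock, divisors of 5 and 4 give 50 Mhz, 4 and 1 give 250 Mhz, and 8 and 5 give 25 Mhz.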
We see a multiplier of 30 set for the IO clock. We also have a 33.3 Mhz crystal on our board. Multiplying 33.3 by 30 gives us 1000 Mhz, which confirms that the PS_CLK in the clock diagram above is indeed running at 33.3 Mhz.
    PLL -- arm: 0x00028008
    PLL -- ddr: 0x00020008
    PLL -- io : 0x0001E008

I see that bit 3 is set (008) in all three. I was confused for some time thinking this was the bypass bit. It is not -- that is bit 4.
Consider the arm PLL register. The multiplier is 0x28 = 40. With a 33.3 Mhz crystal this would give 1332 Mhz. Then if we divide this by 2, we get 666 Mhz, which is exactly what we want.
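The decoding above can be written down as a short sketch. The shift of 12 follows from the register dumps (0x00028008 gives 0x28); the 7-bit width of the multiplier field is my assumption, and the function names are made up for illustration.

```c
#include <stdint.h>

#define PLL_BYPASS_FORCE (1u << 4)  /* the real bypass bit (not bit 3) */
#define PLL_FDIV_SHIFT   12         /* multiplier starts at bit 12 */
#define PLL_FDIV_MASK    0x7Fu      /* field width of 7 bits is assumed */

/* Extract the multiplier from a PLL control register value. */
unsigned pll_fdiv(uint32_t ctrl)
{
    return (ctrl >> PLL_FDIV_SHIFT) & PLL_FDIV_MASK;
}

/* Nonzero if the bypass bit (bit 4) is set. */
int pll_bypassed(uint32_t ctrl)
{
    return (ctrl & PLL_BYPASS_FORCE) != 0;
}

/* PLL output in kHz, given the crystal (PS_CLK) frequency in kHz. */
uint32_t pll_out_khz(uint32_t ctrl, uint32_t ps_clk_khz)
{
    return pll_fdiv(ctrl) * ps_clk_khz;
}
```

Feeding in the three values above with a 33300 kHz crystal gives multipliers of 40, 32 and 30, with none of the PLLs bypassed. The arm PLL comes out at 1332000 kHz before the divide by 2.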
The documentation could be better. As near as I can tell the timer (the "triple timer") is not in the IOP collection. Rather it is special and the Pclk it gets is the cpu_1x signal, which ought to be 111 Mhz in a properly configured system.
The timer has a prescaler of 16 and a preload value of 6666. If we multiply 6666 by 16 we get 106656, which is in the neighborhood of 111,000. This suggests that the clock feeding the timer is the cpu_1x signal, and that this signal is indeed running at about 111 Mhz.
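To make the units explicit, here is a small sketch of that multiplication. It assumes the timer is meant to tick at 1000 Hz and that one tick takes exactly prescale times preload input clocks (I have not checked whether the hardware counts one extra). The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Input clock (Hz) implied by the timer settings, assuming each tick
 * consumes prescale * reload input clocks.
 */
uint32_t implied_pclk_hz(uint32_t prescale, uint32_t reload, uint32_t tick_hz)
{
    return prescale * reload * tick_hz;
}
```

Calling implied_pclk_hz(16, 6666, 1000) yields 106656000, so these settings correspond to an input clock of about 106.7 Mhz. That is much closer to 111 Mhz than to any other clock in the diagram.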
The timer documents call this "pclk". Note that there is a 4 bit prescaler inside the timer. (The one we note above is set to 16). It is not one of the 6 bit programmable dividers shown in the clock diagram above.
Section 8.5 of the TRM talks about the "triple timer". It can select one of three sources for its clock: Pclk, an external clock from MIO, or a clock from the PL (fpga). I see the Pclk versus Extclk selection in the Timer registers, but not the PL clock selection.
But even more important is that this is not the 3 way selection shown for IOP devices in the clock diagram above.
It is as though the processor has some secret way of getting the 10 Mhz clock it is running on. It seems independent of what I am doing to the ARM PLL. Even more interesting, the timer also seems undisturbed. I ask the timer for a 20 second delay (using the Kyu command "i 8") and I get a 20 second delay. I would have expected the ARM PLL output to be halved and the timer to be running at half speed (and thus get a 40 second delay when I asked for 20). See below for more on this!
On page 1578, the TRM says that the PLL must first be bypassed and then put into reset mode before changing the divisor. The reset is the low bit of the same register .... Aha!
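A minimal sketch of that ordering follows. It uses the bit positions discussed earlier (reset in bit 0, bypass in bit 4, multiplier starting at bit 12); the 7-bit field width and the function name are my assumptions. The pointer stands in for the PLL control register, so the same code can be exercised against an ordinary variable. Real hardware would also need to wait for the PLL to lock before the bypass is removed, and that step is only marked with a comment here.

```c
#include <stdint.h>

#define PLL_RESET        (1u << 0)
#define PLL_BYPASS_FORCE (1u << 4)
#define PLL_FDIV_SHIFT   12
#define PLL_FDIV_MASK    (0x7Fu << PLL_FDIV_SHIFT)  /* width assumed */

/* Change the PLL multiplier: bypass, reset, update, release. */
void pll_change_fdiv(volatile uint32_t *ctrl, unsigned fdiv)
{
    uint32_t v = *ctrl;

    v |= PLL_BYPASS_FORCE;              /* 1. bypass the PLL */
    *ctrl = v;

    v |= PLL_RESET;                     /* 2. hold it in reset */
    *ctrl = v;

    v &= ~PLL_FDIV_MASK;                /* 3. install the new multiplier */
    v |= (fdiv << PLL_FDIV_SHIFT) & PLL_FDIV_MASK;
    *ctrl = v;

    v &= ~PLL_RESET;                    /* 4. release reset */
    *ctrl = v;

    /* On real hardware: wait here until the PLL reports lock. */

    v &= ~PLL_BYPASS_FORCE;             /* 5. stop bypassing */
    *ctrl = v;
}
```

Starting from 0x00028008 and asking for a multiplier of 20 leaves 0x00014008 in the register, with the other bits untouched.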
Second is that the 10 Mhz CPU clock does not correspond to anything I am finding in reading about clocks. Could it be some kind of boot setting that needs to be bypassed somewhere else entirely?
I will note that the CCNT has a divide by 64 feature that can be enabled. Suppose it was accidentally enabled (due to some effectively undocumented change in the Cortex A9). That would mean that my processor is really running at 640 Mhz (which is mighty close to the expected 666). This seems awfully suspicious. Maybe I can figure out an independent way to check the clock speed. Something like cranking out pulses on some MIO pin on the Zynq. 666/64 = 10.41 Mhz. Very suspicious.
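The suspicion can be checked with one multiplication. The sketch below assumes the divider is controlled by bit 3 (the D bit) of the ARMv7 performance monitor control register; the function name is hypothetical and the PMCR value is passed in rather than read from the coprocessor, so the arithmetic can be tried anywhere.

```c
#include <stdint.h>

#define PMCR_D (1u << 3)  /* when set, CCNT counts once per 64 CPU cycles */

/* CPU clock in Hz implied by a one-second CCNT reading. */
uint64_t cpu_hz_from_ccnt(uint32_t ccnt_per_sec, uint32_t pmcr)
{
    uint64_t scale = (pmcr & PMCR_D) ? 64u : 1u;
    return (uint64_t)ccnt_per_sec * scale;
}
```

With the divider set, a reading of 10000483 counts per second corresponds to 640030912 Hz, which is the 640 Mhz figure above.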
Note that my 1000 Hz timer was set up expecting a 100 Mhz pClk, but I now know it is a 111 Mhz signal. I could recalibrate my timer knowing this and might learn some new things. But the first thing to do is set up the experiment with MIO pulses. Some decent CCNT performance monitor documentation for the Cortex A9 would certainly be nice.
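A rough sketch of that recalibration, under the same assumption that one tick takes prescale times preload input clocks. The function names are hypothetical, and 111 Mhz is the nominal figure from above rather than a measured one.

```c
#include <stdint.h>

/* Preload value (rounded to nearest) for a wanted tick rate. */
uint32_t ttc_reload_for(uint32_t pclk_hz, uint32_t prescale, uint32_t tick_hz)
{
    uint32_t div = prescale * tick_hz;
    return (pclk_hz + div / 2) / div;
}

/* Tick rate in Hz (truncated) actually produced by given settings. */
uint32_t ttc_actual_tick_hz(uint32_t pclk_hz, uint32_t prescale, uint32_t reload)
{
    return pclk_hz / (prescale * reload);
}
```

For a 111000000 Hz Pclk and a prescaler of 16, a 1000 Hz tick wants a preload of about 6938. With the existing preload of 6666 the same clock would tick at roughly 1040 Hz, about 4 percent fast, so anything timed against it would come out a little short.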
    Kyu (zynq), ready> i 8
    Kyu (zynq), ready> Collecting data for 8 seconds
    CCNT for 1/10 sec: 665453568
    CCNT for 1/10 sec: 666046464
    CCNT for 1/10 sec: 666047296
    CCNT for 1/10 sec: 666046720
    CCNT for 1/10 sec: 666046976
    CCNT for 1/10 sec: 666046976
    CCNT for 1/10 sec: 666046592
    CCNT for 1/10 sec: 666047104

Looks like 666 Mhz CPU clock

So I am calling this case closed. The big surprise was the divide by 64 in the CCNT system. But I learned a lot about the Zynq clocks and also made the useful discovery that the timer is getting a 111 Mhz clock, which allowed me to make adjustments and improve accuracy.
    20004b94: e51b3008 ldr r3, [fp, #-8]
    20004b98: e51b2014 ldr r2, [fp, #-20] @ 0xffffffec
    20004b9c: e5832010 str r2, [r3, #16]
    20004ba0: e51b3008 ldr r3, [fp, #-8]
    20004ba4: e51b2018 ldr r2, [fp, #-24] @ 0xffffffe8
    20004ba8: e5832010 str r2, [r3, #16]
    20004bac: eafffff8 b 20004b94

I measure 1.54 Mhz, with a 330 ns high time and a 320 ns low time. This boils down to 7 instructions in the loop, and the loop taking 650 ns, so we get 92.9 ns per instruction.

    for ( ;; ) {
        gp->output2_low = m_on;
        gp->output2_low = m_off;
    }
This is pretty close to the 100 ns per instruction that a 10 Mhz CPU clock would give us. This is both surprising and disappointing.
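The scope arithmetic is simple enough to write down. This is only a sketch of the calculation, with hypothetical function names; integer units (kHz and picoseconds) are used to avoid floating point.

```c
#include <stdint.h>

/* Toggle frequency in kHz from the measured high and low times in ns. */
uint32_t toggle_khz(uint32_t high_ns, uint32_t low_ns)
{
    return 1000000u / (high_ns + low_ns);
}

/* Average time per instruction in picoseconds for one loop pass. */
uint32_t ps_per_insn(uint32_t high_ns, uint32_t low_ns, uint32_t n_insn)
{
    return (high_ns + low_ns) * 1000u / n_insn;
}
```

For 330 ns high, 320 ns low and 7 instructions this gives 1538 kHz and 92857 ps, matching the 1.54 Mhz and 92.9 ns figures above.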
My take on this is that the 10 Mhz rate is just a coincidence (in that it nearly matches the erroneous 10 Mhz that we sorted out above). My guess is that I need to investigate whether caches are enabled, and related issues involving the caches. Rather than invest time in that now, I am going to transition my work to using the FPGA for better performance sending data to my HUB75 panel.
Tom's Electronics pages / tom@mmto.org