June 26, 2024

Antminer S9 board - NAND memory -- boot without bitstream

Here is what the partition header looks like on our board:

00000c80 00007005 00007005 00007005 00000000
00000c90 00000000 000005c0 00000010 00000001
00000ca0 00000000 00000240 00000000 00000000
00000cb0 00000000 00000000 00000000 fffea7df

00000cc0 0007f2e8 0007f2e8 0007f2e8 00000000
00000cd0 00000000 000075d0 00000020 00000001
00000ce0 00000000 00000250 00000000 00000000
00000cf0 00000000 00000000 00000000 ffe7af06

00000d00 00015b20 00015b20 00015b20 04000000
00000d10 04000000 000868c0 00000010 00000001
00000d20 00000000 00000260 00000000 00000000
00000d30 00000000 00000000 00000000 f7f3836e

00000d40 00000000 00000000 00000000 00000000
00000d50 00000000 00000000 00000000 00000000
00000d60 00000000 00000000 00000000 00000000
00000d70 00000000 00000000 00000000 ffffffff
We have three entries; the middle one is the bitstream. We want to modify its attribute word to indicate that the partition is not owned by the FSBL.

This is the word at 0xcd8 with the value 0x20. We want to make the partition owner non-zero. Just making the entire word 0xffffffff would probably work, but we will use 0x30020 instead.
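
To make the attribute word less opaque, here is a minimal sketch that splits it into image type and partition owner. The mask values are my recollection of the Xilinx FSBL header image_mover.h and should be treated as assumptions to check against the FSBL source; the function name is made up for illustration.

```python
# Assumed masks, recalled from the Xilinx FSBL header image_mover.h.
# Verify them against the FSBL source before relying on them.
ATTR_IMAGE_TYPE_MASK = 0x000000F0
ATTR_PS_IMAGE = 0x10          # code partition
ATTR_PL_IMAGE = 0x20          # bitstream partition
ATTR_OWNER_MASK = 0x00030000  # 0 means the FSBL owns (loads) the partition


def describe_attr(attr):
    """Split a partition attribute word into image kind and owner."""
    image_type = attr & ATTR_IMAGE_TYPE_MASK
    if image_type == ATTR_PL_IMAGE:
        kind = "bitstream (PL)"
    elif image_type == ATTR_PS_IMAGE:
        kind = "code (PS)"
    else:
        kind = "other"
    owner = (attr & ATTR_OWNER_MASK) >> 16
    return {"kind": kind, "owner": owner, "fsbl_loads_it": owner == 0}


for value in (0x20, 0x30020):
    print(hex(value), describe_attr(value))
```

Under these assumptions 0x20 is a bitstream owned by the FSBL, and 0x30020 is the same bitstream with a non-zero owner field.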

We interrupt U-Boot and get to the U-Boot prompt, then:

nand read 0x100 0xcd8 4
NAND read: device 0 offset 0xcd8, size 0x4
Attempt to read non page-aligned data
 0 bytes read: ERROR
Typing "nand info" tells us that the page size is 2048. Some experimenting shows that the count can be any value, but the offset must be on a 2048-byte boundary, i.e. 0, 0x800, 0x1000, and so on. So we do this:
nand read 0x800 0x800 0x800
NAND read: device 0 offset 0x800, size 0x800
 2048 bytes read: OK
For some reason, typing "2048" as the count gets interpreted by U-Boot as hex (you are never sure if U-Boot wants hex or decimal, so the safe approach is to always use hex and always prefix the hex with "0x" just in case U-Boot was expecting decimal).
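
The alignment rule found above can be sketched as a small helper that turns a byte offset into a page-aligned read. It assumes the 2048-byte page size reported by "nand info". The helper name is hypothetical, and rounding the size up to a whole page is a conservative choice of mine, not something the experiment required.

```python
PAGE_SIZE = 0x800  # 2048 bytes, as reported by "nand info" on this board


def page_span(offset, length, page_size=PAGE_SIZE):
    """Return (aligned_offset, aligned_size, offset_within_span)."""
    start = offset - (offset % page_size)
    end = offset + length
    end_aligned = -(-end // page_size) * page_size  # round up to a page
    return start, end_aligned - start, offset - start


start, size, within = page_span(0xCD8, 4)
# Use the flash offset as the RAM address too, always in 0x-prefixed hex.
print(f"nand read {start:#x} {start:#x} {size:#x}")
print(f"word of interest is at RAM address {start + within:#x}")
```

For the word at 0xcd8 this yields the same "nand read 0x800 0x800 0x800" used here.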

I let the DDR address match the flash offset to make my life simple. All of the low addresses are available to us. Then we dump what we obtained.

zynq-uboot> md 0xc80
00000c80: 00007005 00007005 00007005 00000000    .p...p...p......
00000c90: 00000000 000005c0 00000010 00000001    ................
00000ca0: 00000000 00000240 00000000 00000000    ....@...........
00000cb0: 00000000 00000000 00000000 fffea7df    ................
00000cc0: 0007f2e8 0007f2e8 0007f2e8 00000000    ................
00000cd0: 00000000 000075d0 00000020 00000001    .....u.. .......
00000ce0: 00000000 00000250 00000000 00000000    ....P...........
00000cf0: 00000000 00000000 00000000 ffe7af06    ................
00000d00: 00015b20 00015b20 00015b20 04000000     [.. [.. [......
00000d10: 04000000 000868c0 00000010 00000001    .....h..........
00000d20: 00000000 00000260 00000000 00000000    ....`...........
00000d30: 00000000 00000000 00000000 f7f3836e    ............n...
00000d40: 00000000 00000000 00000000 00000000    ................
00000d50: 00000000 00000000 00000000 00000000    ................
00000d60: 00000000 00000000 00000000 00000000    ................
00000d70: 00000000 00000000 00000000 ffffffff    ................
We want to modify the word at 0xcd8.
zynq-uboot> mm 0xcd8
00000cd8: 00000020 ? 30020
00000cdc: 00000001 ? .
Then we want to write it back to flash.

Try it

It fails. I move the JP4 jumper from left (SD) to right (NAND) and power up the board without the SD card. I get:
Xilinx First Stage Boot Loader
Release 2015.4	Mar 29 2018-17:25:31
Devcfg driver initialized
Silicon Version 3.1
Boot mode is NAND
                 InitNand: Geometry = 0x8
Nand driver initialized
NAND Init Done
Flash Base Address: 0xE1000000
Reboot status register: 0x60400000
Multiboot Register: 0x0000C000
Image Start Address: 0x00000000
Partition Header Offset:0x00000C80
Move Image failed
Header Information Load Failed
Partition Header Load Failed
FSBL Status = 0xA00E
This is not what I first thought. When I saw all this, I figured that I needed to recompute checksums or such, but checking these messages against the FSBL code, it seems that the flash read simply failed.

I can't make sense of this. U-Boot itself seems to be able to read the partition header information just fine. Each header entry does have a checksum, but it is never checked that I can see, and nothing like that caused the above error.
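
For reference, the per-entry checksum can be reproduced from the dump. In this sketch the last word of each 16-word entry is the one's complement of the 32-bit sum of the first 15 words; that rule matches all three entries shown above. Whether this FSBL build ever validates it is not something the sketch can tell us. The function and variable names are made up.

```python
def header_checksum(words):
    """One's complement of the 32-bit sum of the first 15 header words."""
    total = sum(words[:15]) & 0xFFFFFFFF
    return ~total & 0xFFFFFFFF


# The three entries exactly as dumped at 0xc80, 0xcc0 and 0xd00.
entries = {
    "first": [0x7005, 0x7005, 0x7005, 0, 0, 0x5C0, 0x10, 1,
              0, 0x240, 0, 0, 0, 0, 0, 0xFFFEA7DF],
    "bitstream": [0x7F2E8, 0x7F2E8, 0x7F2E8, 0, 0, 0x75D0, 0x20, 1,
                  0, 0x250, 0, 0, 0, 0, 0, 0xFFE7AF06],
    "third": [0x15B20, 0x15B20, 0x15B20, 0x04000000, 0x04000000, 0x868C0,
              0x10, 1, 0, 0x260, 0, 0, 0, 0, 0, 0xF7F3836E],
}

for name, words in entries.items():
    print(name, hex(header_checksum(words)), hex(words[15]))

# Checksum the bitstream entry would need with the attribute word patched.
patched = list(entries["bitstream"])
patched[6] = 0x30020  # word index 6 is the attribute word at 0xcd8
print("patched", hex(header_checksum(patched)))
```

If the checksum did matter, the patched bitstream entry would need 0xffe4af06 at 0xcfc in place of 0xffe7af06.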

Stranger yet, the word at 0xcd8 now reads as 0x20 (the original value) not 0x30020 as I patched it to (or thought I did).

Dig a bit deeper

I tried the process again. The mystery is the modification to the word at 0xcd8 not making it to NAND. So I repeat the read, modify the word, and use "md" to verify that the RAM copy has the word modified. Then I do the "nand write" and follow it with a "nand read" -- and the word has not changed.

I do the same, but now write to 0x300000 on flash. This works.

The other mystery (besides the write not writing) is that after the write that doesn't seem to work, the board will launch the FSBL, but the FSBL now has trouble in a different way.

Could some of the first sectors of NAND be write protected?
And why does this break the boot?

Why waste time on bugs?

After sleeping on this and thinking about it with a clear mind in the morning, I find myself asking, "why waste time on bugs?".

The business of saveenv going to 0xe0000 and trashing the bitstream image is clearly a bug. And this business of a NAND write not writing (but somehow trashing the boot) also very much looks like a bug.

But I want to do one last experiment before moving on.

-- get to the U-boot prompt
nand read 0x300000 0x800 0x800
mw 0x800 deadbeef 0x800
nand write 0x800 0x800 0x800
nand read 0x1800 0x800 0x1000
Note that the count for "mw" is in words not bytes.
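
Because the "mw" count is in words, the fill above covers more memory than the single page that gets written. A small sketch of the arithmetic, assuming the default 4-byte width of "mw" (the function name is made up for illustration):

```python
def mw_byte_range(addr, count, width=4):
    """First and last byte address touched by 'mw addr value count'."""
    return addr, addr + count * width - 1


first, last = mw_byte_range(0x800, 0x800)
print(f"mw fills {first:#x}..{last:#x}")

# Destination of the read-back: 0x1000 bytes starting at RAM 0x1800.
read_first, read_last = 0x1800, 0x1800 + 0x1000 - 1
print("read-back lands inside the filled region:",
      first <= read_first and read_last <= last)
```

The fill runs from 0x800 to 0x27ff, so the read-back buffer at 0x1800 starts out full of 0xdeadbeef. Reading back to an address outside that range would make it unambiguous which bytes really came from NAND.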

Whoa! Now look at this. We should see nothing but 0xdeadbeef, but some disaster took place when writing to NAND. I say when writing, since we seem to get coherent information reading from NAND, which allowed us to dump and inspect the boot data.

zynq-uboot> md 0x1800
00001800: deadbeef 00000000 deadbeef 00000000    ................
00001810: deadbeef 00000000 deadbeef 00000000    ................
00001820: deadbeef 00000000 deadbeef 00000000    ................
00001830: deadbeef 00000000 deadbeef 00000000    ................
00001840: deadbeef 00000000 deadbeef 00000000    ................
00001850: deadbeef 00000000 deadbeef 00000000    ................
00001860: deadbeef 00000000 deadbeef 00000000    ................
00001870: deadbeef 00000000 deadbeef 00000000    ................
00001880: deadbeef 00000000 deadbeef 00000000    ................
00001890: deadbeef 00000000 deadbeef 00000000    ................
000018a0: deadbeef deadbeef deadbeef deadbeef    ................
000018b0: deadbeef deadbeef deadbeef deadbeef    ................
000018c0: 00000000 00000003 00000220 00000240    ........ ...@...
000018d0: 00000000 deadbeef deadbeef deadbeef    ................
000018e0: deadbeef deadbeef deadbeef deadbeef    ................
000018f0: deadbeef deadbeef deadbeef deadbeef    ................
zynq-uboot>
00001900: 00000240 00000220 00000000 00000001    @... ...........
00001910: 5a292e61 16203020 5e001443 0e252c66    a.)Z 0 .C..^f,%.
00001920: 00000000 00000000 deadbeef deadbeef    ................
00001930: deadbeef deadbeef deadbeef deadbeef    ................
00001940: 00000260 00000220 00000000 00000001    `... ...........
00001950: 422d1e62 54212266 5e243067 402c2269    b.-Bf"!Tg0$^i",@
00001960: 54000000 00000000 deadbeef deadbeef    ...T............
00001970: deadbeef deadbeef deadbeef deadbeef    ................
00001980: 00000000 00000240 00000000 00000001    ....@...........
00001990: 542d226f 4e242e65 4c240000 00000000    o"-Te.$N..$L....
000019a0: deadbeef deadbeef deadbeef deadbeef    ................
000019b0: deadbeef deadbeef deadbeef deadbeef    ................
000019c0: deadbeef deadbeef deadbeef deadbeef    ................
000019d0: deadbeef deadbeef deadbeef deadbeef    ................
000019e0: deadbeef deadbeef deadbeef deadbeef    ................
000019f0: deadbeef deadbeef deadbeef deadbeef    ................
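
One check that can be run on this dump, offered as a hypothesis and not as something the experiment establishes: NAND programming can only change bits from 1 to 0, and the command sequence above has no erase step. If the page was programmed on top of its old contents, each word read back would be the bitwise AND of the old word and 0xdeadbeef, so it could only have bits set where 0xdeadbeef has them. The sketch tests that necessary condition on every distinct word in the dump.

```python
PATTERN = 0xDEADBEEF

# Every distinct word value that appears in the dump above.
observed = [
    0xDEADBEEF, 0x00000000, 0x00000003, 0x00000220, 0x00000240,
    0x00000001, 0x5A292E61, 0x16203020, 0x5E001443, 0x0E252C66,
    0x00000260, 0x422D1E62, 0x54212266, 0x5E243067, 0x402C2269,
    0x54000000, 0x542D226F, 0x4E242E65, 0x4C240000,
]


def could_be_and_result(word, pattern=PATTERN):
    """True if word has no bit set that the pattern lacks."""
    return (word & pattern) == word


print(all(could_be_and_result(w) for w in observed))

# Same idea applied to the earlier patch attempt at 0xcd8:
# old value 0x20, new value 0x30020.
print(hex(0x20 & 0x30020))
```

Every word in the dump passes, and 0x20 AND 0x30020 is 0x20, which matches the earlier read-back at 0xcd8. This does not rule out a driver bug. It only shows the data is also consistent with programming over an unerased page.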

Conclusion

This final experiment was very much worthwhile. We are chasing our tail with a buggy NAND write in the ancient (circa 2016) U-Boot on the Antminer board.

Tom's Computer Info / tom@mmto.org