June 22, 2024

Antminer S9 board - NAND memory -- some preliminary analysis

The start of NAND looks like this:
00000000 eafffffe eafffffe eafffffe eafffffe
00000010 eafffffe eafffffe eafffffe eafffffe
00000020 aa995566 584c4e58 00000000 01010000   fU  XNLX
00000030 00001700 0001c014 00000000 00000000
00000040 0001c014 00000001 fc15c518 00000000
00000050 00000000 00000000 00000000 00000000
00000060 00000000 00000000 00000000 00000000
00000070 00000000 00000000 00000000 00000000
00000080 00000000 00000000 00000000 00000000
00000090 00000000 00000000 000008c0 00000c80
This is the first part of a Xilinx bootrom header.
Word 0x30 is 0x1700 - which is the offset to the FSBL
Word 0x34 is 0x1c014 - which is the size of the FSBL
Word 0x40 is 0x1c014 - the size again
The two sizes match, since the FSBL is unencrypted. The bootrom header ends at 0x990 and is followed by a run of all-ones data (0xffffffff). Something pops up at 0xc80 (through 0xd7f). Then more ones until we reach 0x1700, where we have ARM code:
00001700 ea000045 ea000025 ea000028 ea000038   E   %   (   8
00001710 ea00002f e320f000 ea000000 ea00000f   /
00001720 e92d500f ed2d0b10 ed6d0b20 eef11a10    P-   -   m
00001730 e52d1004 eef81a10 e52d1004 eb002371     -       - q#
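The header fields picked out above can be pulled from the dump with a few lines of Python. This is only a sketch: the words are typed in from the hexdump, the field names are my own labels, and the checksum rule (invert the 32-bit sum of the words from 0x20 through 0x44) is how I recall the Zynq TRM describing it, so treat that part as an assumption that happens to check out against this dump.

```python
import struct

# First 0xa0 bytes of NAND, entered as the 32-bit words shown in the dump.
HEADER_WORDS = (
    [0xEAFFFFFE] * 8                                     # 0x00-0x1f
    + [0xAA995566, 0x584C4E58, 0x00000000, 0x01010000]   # 0x20
    + [0x00001700, 0x0001C014, 0x00000000, 0x00000000]   # 0x30
    + [0x0001C014, 0x00000001, 0xFC15C518, 0x00000000]   # 0x40
    + [0x00000000] * 16                                  # 0x50-0x8f
    + [0x00000000, 0x00000000, 0x000008C0, 0x00000C80]   # 0x90
)
header = struct.pack("<40I", *HEADER_WORDS)


def word(buf, offset):
    """Return the little-endian 32-bit word at a byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


# Field names are illustrative labels, not official register names.
fields = {
    "image_id": header[0x24:0x28].decode("ascii"),  # bytes as stored
    "fsbl_offset": word(header, 0x30),
    "fsbl_length": word(header, 0x34),
    "total_length": word(header, 0x40),
    "image_header_table": word(header, 0x98),
    "partition_header_table": word(header, 0x9C),
}

# Assumed checksum rule: one's complement of the sum of words 0x20..0x44.
summed = sum(struct.unpack_from("<10I", header, 0x20))
checksum_ok = (~summed & 0xFFFFFFFF) == word(header, 0x48)

for name, value in fields.items():
    shown = value if isinstance(value, str) else hex(value)
    print(f"{name:24s} {shown}")
print("checksum matches:", checksum_ok)
```

With a real dump file, the `header` bytes would come from reading the first 0xa0 bytes of the image instead of the typed-in list.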

The size of 0x1c014 seems big. This is 114708 decimal. We have 3*64K of OCM for the FSBL image, so there is no trouble at all with an image this big.

We have two interesting pointers at the end of the block dumped above:

Word 0x98 - 0x8c0 -- image header table
Word 0x9c - 0xc80 -- partition header table
Both point to information, but neither is documented in the Zynq TRM.

0x1700 + 0x1c014 = 0x1d714, the offset of the first word after the FSBL image. We see a short run of 0xffffffff, then data, then lots of zeros, all the way to offset 0x5a44c.
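The arithmetic on the FSBL size and end offset is easy to double check. A small sketch, using only the numbers quoted in the text (the 3*64K OCM figure is taken as given):

```python
FSBL_OFFSET = 0x1700        # word 0x30 of the header
FSBL_SIZE = 0x1C014         # words 0x34 and 0x40 of the header
OCM_BYTES = 3 * 64 * 1024   # the 3*64K figure used in the text

fsbl_end = FSBL_OFFSET + FSBL_SIZE   # first byte past the FSBL image
headroom = OCM_BYTES - FSBL_SIZE     # OCM left over after loading

print(f"FSBL size : {FSBL_SIZE} bytes ({FSBL_SIZE / 1024:.1f} KiB)")
print(f"OCM       : {OCM_BYTES} bytes, headroom {headroom} bytes")
print(f"FSBL ends : {fsbl_end:#x}")
```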

Searching for strings, I find things that smell like U-Boot around address 0025f400. I see what looks like the U-Boot environment variables at 0025669e, namely:

00256690 02020202 02021002 02020202 6f620202                 bo
002566a0 6d63746f 75723d64 6d24206e 6265646f   otcmd=run $modeb
002566b0 00746f6f 746f6f62 616c6564 00333d79   oot bootdelay=3
002566c0 64756162 65746172 3531313d 00303032   baudrate=115200
002566d0 64617069 313d7264 30312e30 2e30372e   ipaddr=10.10.70.
002566e0 00323031 76726573 70697265 2e30313d   102 serverip=10.
002566f0 372e3031 30312e30 74650031 64646168   10.70.101 ethadd
00256700 30303d72 3a61303a 303a3533 31303a30   r=00:0a:35:00:01
00256710 0032323a 6e72656b 695f6c65 6567616d   :22 kernel_image
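The dump shows 32-bit little-endian words, so each word has to be byte-reversed to read the text. A short sketch that rebuilds the strings from three of the rows above (typed in by hand; the leading "bo" of bootcmd sits in the previous row and is left out here):

```python
import struct

# Three rows from the dump, keyed by address.
ROWS = {
    0x002566A0: "6d63746f 75723d64 6d24206e 6265646f",
    0x002566B0: "00746f6f 746f6f62 616c6564 00333d79",
    0x002566C0: "64756162 65746172 3531313d 00303032",
}


def row_bytes(words):
    """Turn a row of hex words back into bytes in memory order."""
    return b"".join(struct.pack("<I", int(w, 16)) for w in words.split())


blob = b"".join(row_bytes(ROWS[addr]) for addr in sorted(ROWS))

# Environment entries are NUL-terminated "name=value" strings.
entries = [e.decode("ascii") for e in blob.split(b"\x00") if e]
for entry in entries:
    print(entry)
```

The first entry comes out as "otcmd=run $modeboot" only because the dump rows used here start two bytes into the variable name.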

The U-Boot environment variables themselves are informative:

kernel_size=0x800000
devicetree_size=0x20000

nandboot=echo Copying Linux from NAND flash to RAM... &&
nand read 0x2000000 0x1100000 ${kernel_size} &&
nand read 0x3000000 0x1020000 ${devicetree_size} &&
bootm 0x2000000 - 0x3000000

nandroot=/dev/mtdblock2
nandrootfstype=jffs2
Here "nand read" has the form:
nand read ram-address offset size
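To see the actual numbers U-Boot would use, the ${...} references can be substituted by hand. The sketch below does that in Python; it is a toy expansion using a regular expression, not U-Boot's real command parser, and it only understands the simple "nand read ram-address offset size" form described above.

```python
import re

env = {
    "kernel_size": "0x800000",
    "devicetree_size": "0x20000",
}
nandboot = (
    "echo Copying Linux from NAND flash to RAM... && "
    "nand read 0x2000000 0x1100000 ${kernel_size} && "
    "nand read 0x3000000 0x1020000 ${devicetree_size} && "
    "bootm 0x2000000 - 0x3000000"
)


def expand(text, variables):
    """Replace each ${name} with its value from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], text)


def nand_reads(script):
    """Return (ram_address, nand_offset, size) for each 'nand read'."""
    reads = []
    for cmd in script.split("&&"):
        parts = cmd.split()
        if parts[:2] == ["nand", "read"]:
            ram, offset, size = (int(p, 16) for p in parts[2:5])
            reads.append((ram, offset, size))
    return reads


for ram, offset, size in nand_reads(expand(nandboot, env)):
    print(f"NAND {offset:#09x} +{size:#x} -> RAM {ram:#09x}")
```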

What can Linux tell us?

I go ahead and clear the root password on this board, as well as disabling the bitminer software. Now I look around:
rootfs on / type rootfs (rw)
ubi0:rootfs on / type ubifs (rw,relatime)
And more interesting:
 ls -l /dev/mtd*
crw-------    1 root     root       90,   0 Jan  1  1970 /dev/mtd0
crw-------    1 root     root       90,   1 Jan  1  1970 /dev/mtd0ro
crw-------    1 root     root       90,   2 Jan  1  1970 /dev/mtd1
crw-------    1 root     root       90,   3 Jan  1  1970 /dev/mtd1ro
crw-------    1 root     root       90,   4 Jan  1  1970 /dev/mtd2
crw-------    1 root     root       90,   5 Jan  1  1970 /dev/mtd2ro
brw-------    1 root     root       31,   0 Jan  1  1970 /dev/mtdblock0
brw-------    1 root     root       31,   1 Jan  1  1970 /dev/mtdblock1
brw-------    1 root     root       31,   2 Jan  1  1970 /dev/mtdblock2
And ...
root@antMiner:/dev# fdisk -l

Disk /dev/mtdblock0: 33 MB, 33554432 bytes
255 heads, 63 sectors/track, 4 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mtdblock0 doesn't contain a valid partition table

Disk /dev/mtdblock1: 150 MB, 150994944 bytes
255 heads, 63 sectors/track, 18 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mtdblock1 doesn't contain a valid partition table

Disk /dev/mtdblock2: 83 MB, 83886080 bytes
255 heads, 63 sectors/track, 10 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mtdblock2 doesn't contain a valid partition table
And ...
cat /proc/mtd
dev:    size   erasesize  name
mtd0: 02000000 00020000 "BOOT.bin-env-dts-kernel"
mtd1: 09000000 00020000 "angstram-rootfs"
mtd2: 05000000 00020000 "upgrade-rootfs"
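The hex sizes in /proc/mtd are the same numbers fdisk printed in decimal. A quick sketch that parses the text above (pasted in as a string here; on the board it would be read from /proc/mtd) and converts each size:

```python
PROC_MTD = '''dev:    size   erasesize  name
mtd0: 02000000 00020000 "BOOT.bin-env-dts-kernel"
mtd1: 09000000 00020000 "angstram-rootfs"
mtd2: 05000000 00020000 "upgrade-rootfs"
'''


def parse_proc_mtd(text):
    """Parse /proc/mtd lines into dicts; size fields are hex."""
    parts = []
    for line in text.splitlines()[1:]:      # skip the column header
        dev, size, erasesize, name = line.split(None, 3)
        parts.append({
            "dev": dev.rstrip(":"),
            "size": int(size, 16),
            "erasesize": int(erasesize, 16),
            "name": name.strip('"'),
        })
    return parts


mtd = parse_proc_mtd(PROC_MTD)
for p in mtd:
    mib = p["size"] // (1024 * 1024)
    blocks = p["size"] // p["erasesize"]
    print(f'{p["dev"]}: {p["size"]} bytes, {mib} MiB, '
          f'{blocks} erase blocks  {p["name"]}')
```

The erase size of 0x20000 is 128K, which fits a chip with 2048-byte pages and 64 pages per block.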
And in dmesg we see:
[    1.296684] UBI: attaching mtd1 to ubi0
[    1.834499] UBI: attached mtd1 (name "angstram-rootfs", size 144 MiB) to ubi0
The trick is that we haven't found the offsets for NAND partitions. We can actually run vi on /var/log/dmesg and find:
[    1.214942] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    1.221242] nand: Micron MT29F2G08ABAEAWP
[    1.225207] nand: 256MiB, SLC, page size: 2048, OOB size: 64
[    1.231148] Bad block table found at page 131008, version 0x01
[    1.237376] Bad block table found at page 130944, version 0x01
[    1.243427] 3 ofpart partitions found on MTD device pl353-nand
[    1.249214] Creating 3 MTD partitions on "pl353-nand":
[    1.254302] 0x000000000000-0x000002000000 : "BOOT.bin-env-dts-kernel"
[    1.262354] 0x000002000000-0x00000b000000 : "angstram-rootfs"
[    1.269660] 0x00000b000000-0x000010000000 : "upgrade-rootfs"
UBI is a special storage layer for flash devices (the filesystem mounted on top of it is UBIFS):
  • The UBI filesystem
  • Linux MTD partitions
  • more on Linux MTD
It is important to note that the usual "fdisk" partition scheme does not apply in any way to MTD devices. They have their partition structure "baked in" like the early days of Linux -- or so it seems.

    Now, consider these 3 partitions we have discovered.

    0x000000000000-0x000002000000 : "BOOT.bin-env-dts-kernel"
    0x000002000000-0x00000b000000 : "angstram-rootfs"
    0x00000b000000-0x000010000000 : "upgrade-rootfs"
    
    The sizes are 2, 9, and 5 in units of 0x100_0000 (16M chunks), so we have 32M, 144M, and 80M -- for a total of 256M. That is nice, since it totals to the size of our NAND chip.
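The same sums can be done mechanically from the dmesg lines. A sketch, with the three partition lines pasted in as a string and a regular expression of my own to pick out the ranges:

```python
import re

DMESG = '''
0x000000000000-0x000002000000 : "BOOT.bin-env-dts-kernel"
0x000002000000-0x00000b000000 : "angstram-rootfs"
0x00000b000000-0x000010000000 : "upgrade-rootfs"
'''

RANGE = re.compile(r'0x([0-9a-f]+)-0x([0-9a-f]+) : "([^"]+)"')
partitions = [
    (name, int(start, 16), int(end, 16))
    for start, end, name in RANGE.findall(DMESG)
]

CHUNK = 0x1000000   # 16M
MIB = 1024 * 1024

for name, start, end in partitions:
    size = end - start
    print(f"{name:26s} {size // CHUNK} x 16M = {size // MIB}M")

# Each partition should begin where the previous one ended.
contiguous = all(
    prev[2] == nxt[1] for prev, nxt in zip(partitions, partitions[1:])
)
total = sum(end - start for _, start, end in partitions)
print("contiguous:", contiguous, " total:", total // MIB, "M")
```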

    Now consider the offsets that U-Boot uses to fetch the kernel and the dtb for the kernel.

    nand read 0x2000000 0x1100000 0x800000 (kernel)
    nand read 0x3000000 0x1020000 0x20000  (dtb)
    
    These offsets (0x0102_0000 and 0x0110_0000) are well within the first 32M partition. In fact they are pretty much right in the middle of it. So we could have just dumped 32M using U-Boot and had all we really care about to figure out the boot setup. The remaining 224M are Linux filesystems.
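A quick way to confirm where the two reads land is to test each byte range against the partition table. This sketch uses the offsets and sizes from the U-Boot environment and the partition ranges from dmesg; the helper name is made up for the example.

```python
PARTITIONS = [
    ("BOOT.bin-env-dts-kernel", 0x0000000, 0x2000000),
    ("angstram-rootfs",         0x2000000, 0xB000000),
    ("upgrade-rootfs",          0xB000000, 0x10000000),
]

# name -> (NAND offset, size), from the nandboot command
READS = {
    "kernel": (0x1100000, 0x800000),
    "dtb":    (0x1020000, 0x20000),
}


def containing_partition(offset, size):
    """Name of the partition holding the whole range, else None."""
    for name, start, end in PARTITIONS:
        if start <= offset and offset + size <= end:
            return name
    return None


for what, (offset, size) in READS.items():
    end = offset + size
    print(f"{what:6s} {offset:#09x}-{end:#09x} in "
          f"{containing_partition(offset, size)}")

# Space between the end of the dtb and the start of the kernel.
dtb_end = READS["dtb"][0] + READS["dtb"][1]
gap = READS["kernel"][0] - dtb_end
print(f"gap between dtb and kernel: {gap:#x}")
```

The kernel read ends at 0x1900000, so both images sit entirely inside the first partition with room to spare on either side.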

    Where the heck is U-Boot?

    If this were an SD card, we would go digging into bootgen to figure that out. But NAND is different. We are going to have to look at the FSBL source to figure out how it locates U-Boot.

    What about the errant U-Boot saveenv?

    You may remember the following messages, after which my board would no longer launch U-Boot.
    zynq-uboot> saveenv
    Saving Environment to NAND...
    Erasing NAND...
    Erasing at 0xe0000 -- 100% complete.
    Writing to NAND... OK
    
    It launches the FSBL, but the FSBL gives the odd message:
    PCAP_FPGA_DONE_FAIL
    PCAP Bitstream Download Failed
    PARTITION_MOVE_FAIL
    FSBL Status = 0xA00B
    
    What exactly is the message "Erasing at 0xe0000" telling us? Is this a byte offset into the NAND image? If so, it is in the midst of a big block of zeros. But perhaps this is in the midst of a bitstream? Many questions and we need more information.
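If 0xe0000 is a byte offset (an assumption, not something the message itself confirms), it can at least be converted to an erase block and page number and compared with the other offsets seen so far. A sketch using the 0x20000 erase size and 2048-byte page size reported by Linux:

```python
ERASE_SIZE = 0x20000   # from /proc/mtd
PAGE_SIZE = 2048       # from the dmesg nand line


def block_of(offset):
    """Erase block index for a byte offset into NAND."""
    return offset // ERASE_SIZE


def page_of(offset):
    """Page index for a byte offset into NAND."""
    return offset // PAGE_SIZE


# Offsets mentioned elsewhere in these notes.
landmarks = {
    "FSBL start": 0x1700,
    "saveenv erase": 0xE0000,
    "env strings seen": 0x25669E,
    "dtb": 0x1020000,
    "kernel": 0x1100000,
}

for name, offset in sorted(landmarks.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {offset:#09x}  block {block_of(offset):3d}  "
          f"page {page_of(offset)}")

erased_start = block_of(0xE0000) * ERASE_SIZE
erased_end = erased_start + ERASE_SIZE
print(f"erased range would be {erased_start:#x}-{erased_end:#x}")
```

Under that assumption the erase covers one 128K block, 0xe0000 through 0xfffff, which is a different block from any of the other landmarks listed.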
    Feedback? Questions? Drop me a line!

    Tom's Computer Info / tom@mmto.org