June 23, 2024

Antminer S9 board - NAND memory -- partition header and image header

The start of NAND looks like this:
00000000 eafffffe eafffffe eafffffe eafffffe
00000010 eafffffe eafffffe eafffffe eafffffe
00000020 aa995566 584c4e58 00000000 01010000   fU  XNLX
00000030 00001700 0001c014 00000000 00000000
00000040 0001c014 00000001 fc15c518 00000000
00000050 00000000 00000000 00000000 00000000
00000060 00000000 00000000 00000000 00000000
00000070 00000000 00000000 00000000 00000000
00000080 00000000 00000000 00000000 00000000
00000090 00000000 00000000 000008c0 00000c80
We talked about a lot of this in our first analysis page. Here we want to focus on the last two words above.
Word 0x98 is 0x8c0 - a pointer to the image header
Word 0x9c is 0xc80 - a pointer to the partition header
These pointers are part of the bootrom header, but they are not used by the bootrom. They are used by the FSBL and we can learn about them by looking at the FSBL code.
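
For anyone poking at a raw dump on a PC, here is a minimal sketch of fetching those two words. It is not FSBL code: read_le32 and the two offset macros are names made up here, and it assumes the dump sits in a byte buffer with words stored little-endian (which is how the "XNLX" magic shows up in the dump above).

```c
#include <stdint.h>
#include <stddef.h>

/* Offsets of the two pointer words within the bootrom header.
 * The macro names are invented for this sketch. */
#define IMAGE_HDR_PTR_OFFSET 0x98
#define PART_HDR_PTR_OFFSET  0x9c

/* Assemble a 32-bit little-endian word from a byte buffer. */
uint32_t read_le32(const uint8_t *buf, size_t offset)
{
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}

/* Byte offset of the image header, as stored at 0x98. */
uint32_t image_header_offset(const uint8_t *nand)
{
    return read_le32(nand, IMAGE_HDR_PTR_OFFSET);
}

/* Byte offset of the partition header, as stored at 0x9c. */
uint32_t partition_header_offset(const uint8_t *nand)
{
    return read_le32(nand, PART_HDR_PTR_OFFSET);
}
```

Both values are plain byte offsets into the image, which is why the dumps further down start at 0x8c0 and 0xc80.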

Partition Header

Here is what a partition header entry looks like:
typedef struct StructPartHeader {
        u32 ImageWordLen;       /* 0x0 */
        u32 DataWordLen;        /* 0x4 */
        u32 PartitionWordLen;   /* 0x8 */
        u32 LoadAddr;           /* 0xC */
        u32 ExecAddr;           /* 0x10 */
        u32 PartitionStart;     /* 0x14 */
        u32 PartitionAttr;      /* 0x18 */
        u32 SectionCount;       /* 0x1C */
        u32 CheckSumOffset;     /* 0x20 */
        u32 Pads1[1];           /* 0x24 */
        u32 ACOffset;           /* 0x28 */
        u32 Pads2[4];           /* 0x2C */
        u32 CheckSum;           /* 0x3C */
} PartHeader;
We can have up to 14 of these. Each is 16 words in size.
Here is what we see at that offset on NAND:
00000c80 00007005 00007005 00007005 00000000
00000c90 00000000 000005c0 00000010 00000001
00000ca0 00000000 00000240 00000000 00000000
00000cb0 00000000 00000000 00000000 fffea7df

00000cc0 0007f2e8 0007f2e8 0007f2e8 00000000
00000cd0 00000000 000075d0 00000020 00000001
00000ce0 00000000 00000250 00000000 00000000
00000cf0 00000000 00000000 00000000 ffe7af06

00000d00 00015b20 00015b20 00015b20 04000000
00000d10 04000000 000868c0 00000010 00000001
00000d20 00000000 00000260 00000000 00000000
00000d30 00000000 00000000 00000000 f7f3836e

00000d40 00000000 00000000 00000000 00000000
00000d50 00000000 00000000 00000000 00000000
00000d60 00000000 00000000 00000000 00000000
00000d70 00000000 00000000 00000000 ffffffff

00000d80 ffffffff ffffffff ffffffff ffffffff
....
So, our table has 3 entries, then we get a terminating entry with a bunch of zeros and all ones for the checksum. The FSBL scans this array looking carefully for this terminating entry, then concludes we have 3 entries:
Partition Count: 3
It skips the first entry (the FSBL itself).
It recognizes the second entry as a bitstream by the 0x20 attribute value.
It recognizes the third entry as code (an application) by the 0x10 attribute value.
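
The checksum word in each entry is consistent with a simple rule: add up the first 15 words and invert the sum. The sketch below checks that rule against the dump and counts entries up to the terminator. The function names are my own, not the FSBL's, and it assumes the table has already been read into memory as a flat array of 16-word entries.

```c
#include <stdint.h>

#define HEADER_WORDS 16   /* each entry is 16 words */

/* Sum the first 15 words and invert; word 15 should hold the result. */
uint32_t header_checksum(const uint32_t *hdr)
{
    uint32_t sum = 0;
    for (int i = 0; i < HEADER_WORDS - 1; i++)
        sum += hdr[i];
    return ~sum;
}

/* Terminating entry: 15 zero words followed by 0xffffffff. */
int is_terminator(const uint32_t *hdr)
{
    for (int i = 0; i < HEADER_WORDS - 1; i++)
        if (hdr[i] != 0)
            return 0;
    return hdr[HEADER_WORDS - 1] == 0xffffffffu;
}

/* Count entries before the terminator. Returns -1 if no terminator
 * is found within max_entries or an entry fails its checksum. */
int count_partitions(const uint32_t *table, int max_entries)
{
    for (int n = 0; n < max_entries; n++) {
        const uint32_t *hdr = table + n * HEADER_WORDS;
        if (is_terminator(hdr))
            return n;
        if (header_checksum(hdr) != hdr[HEADER_WORDS - 1])
            return -1;
    }
    return -1;
}
```

Note that the terminator is just the same rule applied to 15 zero words: the inverted sum of nothing is all ones.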

We will ignore the bitstream for now (and perhaps forever). We don't have the checksum or RSA attributes set.

The excitement happens in PartitionMove(). Look at these lines:

SourceAddr = ImageBaseAddress;
SourceAddr += Header->PartitionStart << WORD_LENGTH_SHIFT;
The shift is key. This is by 2 (so a multiply by 4). The start address in the partition header is a word count!
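
The conversion is trivial but easy to forget, so here it is spelled out as a sketch. WORD_LENGTH_SHIFT is the FSBL's name and is taken to be 2 as described above; the two function names are invented for illustration.

```c
#include <stdint.h>

#define WORD_LENGTH_SHIFT 2   /* 32-bit words: shift by 2 = multiply by 4 */

/* Convert a word count or word offset from a partition header to bytes. */
uint32_t words_to_bytes(uint32_t words)
{
    return words << WORD_LENGTH_SHIFT;
}

/* Byte address of a partition, given the image base address and the
 * PartitionStart word from its header entry. */
uint32_t partition_source_addr(uint32_t image_base, uint32_t partition_start)
{
    return image_base + words_to_bytes(partition_start);
}
```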

We have 000868c0 for partition 2 (U-Boot). If we shift this by 2, we get 0x21a300. At that address we see this:

0021a2d0 20000000 20000000 20000000 20000000
0021a2e0 ffffffff ffffffff ffffffff ffffffff
0021a2f0 ffffffff ffffffff ffffffff ffffffff
0021a300 ea000013 e59ff014 e59ff014 e59ff014
0021a310 e59ff014 e59ff014 e59ff014 e59ff014
0021a320 04000100 04000160 040001c0 04000220       `
0021a330 04000280 040002e0 04000340 12345678
That certainly looks like the start of executable code. I recently compiled U-Boot from current sources, and when I dump the binary I get:
00000000 ea0000b8 e59ff014 e59ff014 e59ff014
00000010 e59ff014 e59ff014 e59ff014 e59ff014
00000020 04000060 040000c0 04000120 04000180   `
Different, but a close enough family resemblance to persuade me. When the call is finally made to read the image from NAND, it looks like this:
Status = MoveImage(SourceAddr, LoadAddr, (ImageWordLen << WORD_LENGTH_SHIFT));
So the count is also in words and must be adjusted. Let's look at the "chunks" that are defined by the partition header now that we understand the need to multiply by 4.
part 0 (fsbl )   5c0  7005 0x001700 0x01c014 (114,708 bytes)
part 1 (bits )  75d0 7f2e8 0x01d740 0x1fcba0 (2,083,744 bytes)
part 2 (uboot) 868c0 15b20 0x21a300 0x056c80 (355,456 bytes)
The columns are the start and length in words (straight from the partition header), then the same two values in bytes.
The small size of U-Boot is a bit surprising; my current build is over 1M in size.
Also worthy of note is that a 32M partition is set up to hold all of this. Clearly that is excessive. Even 4M would suffice.
Now let's look at some address ranges:
FSBL -- 0x001700 to 0x01d714
Bits -- 0x01d740 to 0x21a2e0
Uboot - 0x21a300 to 0x270f80
32M --- 0x000000 to 0x01ffffff

The end of the FSBL (0x1d714) falls just short of the start of the bitstream (0x1d740), and the end of the bitstream (0x21a2e0) just short of the start of U-Boot (0x21a300), so nothing overlaps. The FSBL start of 0x1700 also agrees with the 0x1700 we see at offset 0x30 in the bootrom header at the top of this page.

Indeed, if we go to 0x270f80 we see all ones from there to the end. I wrote a little program to report blocks of zeros and all ones, and it reports:

One: 00270f80 - 007ffffc (1457184)
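
The little program that reports blocks of zeros and all ones is not shown here; a sketch of such a scanner follows. The output format is modeled on the "One:" line above (first byte address, address of the last word, count in words). The "Zero" label, the function name, and the min_words parameter are my own assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Report each run of identical 0x00000000 or 0xffffffff words.
 * Runs shorter than min_words are ignored.
 * Returns the number of runs reported. */
size_t report_runs(const uint32_t *words, size_t nwords, size_t min_words)
{
    size_t runs = 0;
    size_t i = 0;

    while (i < nwords) {
        uint32_t v = words[i];
        size_t start = i;

        /* Advance to the end of the run of equal words. */
        while (i < nwords && words[i] == v)
            i++;

        if ((v == 0 || v == 0xffffffffu) && i - start >= min_words) {
            printf("%s: %08lx - %08lx (%lu)\n",
                   v ? "One" : "Zero",
                   (unsigned long)(start * 4),
                   (unsigned long)((i - 1) * 4),
                   (unsigned long)(i - start));
            runs++;
        }
    }
    return runs;
}
```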

Also consider the evil business of U-Boot dumping saveenv stuff at offset 0xe0000. This is clearly in the middle of the bitstream area. It is too bad it chose this location, since there is a vast unused area further along in our 32M partition that it could use without difficulty.
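
To check where an offset like 0xe0000 lands, the start and length words from the partition header entries can be turned into byte extents. This is only a sketch: the struct and function names are invented here, and the shift by 2 is the word-to-byte conversion discussed earlier.

```c
#include <stdint.h>

/* One partition's extent in bytes: start inclusive, end exclusive. */
struct extent {
    uint32_t start;
    uint32_t end;
};

/* Build a byte extent from a header's start and length, both in words. */
struct extent make_extent(uint32_t start_words, uint32_t len_words)
{
    struct extent e;
    e.start = start_words << 2;
    e.end = e.start + (len_words << 2);
    return e;
}

/* Nonzero if a byte offset lies inside the extent. */
int extent_contains(struct extent e, uint32_t offset)
{
    return offset >= e.start && offset < e.end;
}

/* Nonzero if two extents share at least one byte. */
int extents_overlap(struct extent a, struct extent b)
{
    return a.start < b.end && b.start < a.end;
}
```

Fed the values from the header dump (0x5c0/0x7005, 0x75d0/0x7f2e8, 0x868c0/0x15b20), extent_contains() places 0xe0000 inside the second extent, the bitstream.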

Why does corrupting the bitstream cause the FSBL to throw a foul?
The messages were:

PCAP_FPGA_DONE_FAIL
PCAP Bitstream Download Failed
PARTITION_MOVE_FAIL
FSBL Status = 0xA00B
The first message is in pcap.c where it is waiting for the DMA transfer of the bitstream to finish. So it has taken the corrupt bitstream and merrily tried to load it via the PCAP.

The second message is in image_mover.c where it checks the return value from PcapLoadPartition(), which is simply relaying the situation detected above. So the FSBL did not do a checksum or anything like that. I don't know why the PCAP gagged on the bad bitstream, but I suppose it is good that it did.

Hot tip

The FSBL loops through the array of partition headers. The first check it makes is that the FSBL "owns" the partition. It does this by masking the PartitionAttr word with 0x30000 and expecting to see 0. If the result is nonzero, it skips the partition. This is something we could hack so that it would skip loading the bitstream.

In other words, if we change the attribute word for the bitstream (partition 1) from 0x20 to 0x10020, it should skip the bitstream. Any nonzero partition owner value would do.

/* Attribute word defines */
#define ATTRIBUTE_IMAGE_TYPE_MASK               0xF0    /* Destination Device type */
#define ATTRIBUTE_PS_IMAGE_MASK                 0x10    /* Code partition */
#define ATTRIBUTE_PL_IMAGE_MASK                 0x20    /* Bit stream partition */
#define ATTRIBUTE_CHECKSUM_TYPE_MASK            0x7000  /* Checksum Type */
#define ATTRIBUTE_RSA_PRESENT_MASK              0x8000  /* RSA Signature Present */
#define ATTRIBUTE_PARTITION_OWNER_MASK          0x30000 /* Partition Owner */
#define ATTRIBUTE_PARTITION_OWNER_FSBL          0x00000 /* FSBL Partition Owner */
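
Using those masks, the owner and type tests can be sketched as below. The function names are invented, and comparing the whole masked type field is my own choice for the sketch, not necessarily how the FSBL tests these bits.

```c
#include <stdint.h>

/* Values copied from the FSBL defines above. */
#define ATTRIBUTE_IMAGE_TYPE_MASK       0xF0
#define ATTRIBUTE_PS_IMAGE_MASK         0x10
#define ATTRIBUTE_PL_IMAGE_MASK         0x20
#define ATTRIBUTE_PARTITION_OWNER_MASK  0x30000
#define ATTRIBUTE_PARTITION_OWNER_FSBL  0x00000

/* Nonzero if the owner field says the FSBL should handle this partition. */
int fsbl_owns_partition(uint32_t attr)
{
    return (attr & ATTRIBUTE_PARTITION_OWNER_MASK)
           == ATTRIBUTE_PARTITION_OWNER_FSBL;
}

/* Nonzero if the image type field marks a bitstream (PL) partition. */
int is_bitstream(uint32_t attr)
{
    return (attr & ATTRIBUTE_IMAGE_TYPE_MASK) == ATTRIBUTE_PL_IMAGE_MASK;
}

/* Nonzero if the image type field marks a code (PS) partition. */
int is_application(uint32_t attr)
{
    return (attr & ATTRIBUTE_IMAGE_TYPE_MASK) == ATTRIBUTE_PS_IMAGE_MASK;
}
```

With the bitstream's attribute word of 0x20 from the header dump, setting an owner bit gives 0x10020: still a bitstream by type, but no longer owned by the FSBL. The checksum word at the end of that header entry would presumably have to be recomputed as well, since the attribute word is one of the words it covers.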

Image Header

What do we see at 0x8c0, where the image header resides?

000008c0 01020000 00000003 00000320 00000240               @
000008d0 00000000 ffffffff ffffffff ffffffff
000008e0 ffffffff ffffffff ffffffff ffffffff
000008f0 ffffffff ffffffff ffffffff ffffffff

00000900 00000250 00000320 00000000 00000001   P
00000910 5a796e71 37303130 5f425443 2e656c66   qnyZ0107CTB_fle.
00000920 00000000 00000000 ffffffff ffffffff
00000930 ffffffff ffffffff ffffffff ffffffff

00000940 00000260 00000330 00000000 00000001   `   0
00000950 626d5f62 74636376 5f667067 612e6269   b_mbvcctgpf_ib.a
00000960 74000000 00000000 ffffffff ffffffff      t
00000970 ffffffff ffffffff ffffffff ffffffff

00000980 00000000 00000340 00000000 00000001       @
00000990 752d626f 6f742e65 6c660000 00000000   ob-ue.to  fl
000009a0 ffffffff ffffffff ffffffff ffffffff
000009b0 ffffffff ffffffff ffffffff ffffffff
The strings have their byte order flipped in groups of 4. With the order fixed, they are:
Zynq7010_BTC.elf
bm_btccv_fpga.bit
u-boot.elf
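
Fixing the order by hand gets old quickly. A sketch of doing it in code follows, assuming the name words have already been read as 32-bit values (as the dump shows them), so that the first character sits in the most significant byte of each word. The function name and signature are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Unpack an image name stored as 32-bit words, first character in the
 * most significant byte of each word. Stops at the first NUL byte, when
 * the words run out, or when the output buffer is full. The output is
 * always NUL-terminated; the return value is the string length. */
size_t unpack_name(const uint32_t *words, size_t nwords,
                   char *out, size_t outsize)
{
    size_t len = 0;

    if (outsize == 0)
        return 0;

    for (size_t i = 0; i < nwords; i++) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            char c = (char)((words[i] >> shift) & 0xff);
            if (c == '\0' || len + 1 >= outsize) {
                out[len] = '\0';
                return len;
            }
            out[len++] = c;
        }
    }
    out[len] = '\0';
    return len;
}
```

For example, the five words starting at 0x950 (626d5f62 74636376 5f667067 612e6269 74000000) unpack to bm_btccv_fpga.bit.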
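It looks like there is a first item (image header table), followed by image header entries. The offsets in here appear to be word counts too: 0x240 shifted by 2 is 0x900 (the first entry) and 0x320 shifted by 2 is 0xc80 (the partition header).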

A comment in the FSBL source says that the first partition is ignored as it is the FSBL itself. The second partition would seem to contain a bitstream and the third partition contains U-Boot.

Looking at the FSBL code, it seems to me that the only code that reads and/or looks at the image header is in GetNAuthImageHeader() and this code gets called only if we have RSA support enabled and RSA is enabled in the eFuse status register. In other words, in our case, none of this applies.

We can verify this by looking at the FSBL messages below. If RSA were enabled, we would see:

RSA enabled for Chip

If it were in the game, it would compute a SHA-2 hash over both the image header and the partition header and validate it.

FSBL messages

The FSBL is pretty chatty. Here is what I see from the FSBL in NAND. All these messages are instructive now that I am looking at the FSBL source. In particular, note the "Silicon Version 3.1"

When it reports "Partition Number: 1" this is the second partition. In other words they are numbered 0, 1, 2 -- Partition 0 is skipped as it is the FSBL.

Xilinx First Stage Boot Loader
Release 2015.4	Mar 29 2018-17:25:31
Devcfg driver initialized
Silicon Version 3.1
Boot mode is NAND
                 InitNand: Geometry = 0x8
Nand driver initialized
NAND Init Done
Flash Base Address: 0xE1000000
Reboot status register: 0x60400000
Multiboot Register: 0x0000C000
Image Start Address: 0x00000000
Partition Header Offset:0x00000C80
Partition Count: 3

Partition Number: 1
Header Dump
Image Word Len: 0x0007F2E8
Data Word Len: 0x0007F2E8
Partition Word Len:0x0007F2E8
Load Addr: 0x00000000
Exec Addr: 0x00000000
Partition Start: 0x000075D0
Partition Attr: 0x00000020
Partition Checksum Offset: 0x00000000
Section Count: 0x00000001
Checksum: 0xFFE7AF06
Bitstream

In FsblHookBeforeBitstreamDload function
PCAP:StatusReg = 0x40000A30
PCAP:device ready
PCAP:Clear done
Level Shifter Value = 0xA
Devcfg Status register = 0x40000A30
PCAP:Fabric is Initialized done
PCAP register dump:
PCAP CTRL 0xF8007000: 0x4C00E07F
PCAP LOCK 0xF8007004: 0x0000001A
PCAP CONFIG 0xF8007008: 0x00000508
PCAP ISR 0xF800700C: 0x0802000B
PCAP IMR 0xF8007010: 0xFFFFFFFF
PCAP STATUS 0xF8007014: 0x50000F30
PCAP DMA SRC ADDR 0xF8007018: 0x00100001
PCAP DMA DEST ADDR 0xF800701C: 0xFFFFFFFF
PCAP DMA SRC LEN 0xF8007020: 0x0007F2E8
PCAP DMA DEST LEN 0xF8007024: 0x0007F2E8
PCAP ROM SHADOW CTRL 0xF8007028: 0xFFFFFFFF
PCAP MBOOT 0xF800702C: 0x0000C000
PCAP SW ID 0xF8007030: 0x00000000
PCAP UNLOCK 0xF8007034: 0x757BDF0D
PCAP MCTRL 0xF8007080: 0x30800100
DMA Done !
FPGA Done !
In FsblHookAfterBitstreamDload function

Partition Number: 2
Header Dump
Image Word Len: 0x00015B20
Data Word Len: 0x00015B20
Partition Word Len:0x00015B20
Load Addr: 0x04000000
Exec Addr: 0x04000000
Partition Start: 0x000868C0
Partition Attr: 0x00000010
Partition Checksum Offset: 0x00000000
Section Count: 0x00000001
Checksum: 0xF7F3836E
Application
Handoff Address: 0x04000000
In FsblHookBeforeHandoff function
SUCCESSFUL_HANDOFF
FSBL Status = 0x1

U-Boot 2014.01-gfb1d3e7-dirty (Mar 09 2018 - 19:36:04)

Tom's Computer Info / tom@mmto.org