August 18, 2018

U-Boot - track down that exception

I am getting this when I try to do any kind of network command.
nanopi3# dhcp
tjt: dw_eth_init: d2:a5:61:c9:2e:e4
In dw_write_hwaddr: d2:a5:61:c9:2e:e4 (tjt)
Speed: 1000, full duplex
BOOTP broadcast 1
"Synchronous Abort" handler, esr 0x96000061
ELR:     7da8bb20
LR:      7da8bb0c
x0 : 0000000000060101 x1 : 000000007daedc30
x2 : 0000000000000006 x3 : 0000000000000006
x4 : 000000007daedc86 x5 : 00000000000000e4
x6 : 00000000000000bd x7 : 000000007daee270
x8 : 0000000000000000 x9 : 0000000039e3b000
x10: 000000007ba36562 x11: 00000000ffffffff
x12: 00000000ffffffff x13: 000000007da9fdc0
x14: 0000000000000001 x15: 000000007ba38800
x16: 0000000000000005 x17: 0000000000000004
x18: 000000007ba36df8 x19: 000000007daedc8e
x20: 000000007daedc8e x21: 000000000000000e
x22: 000000007daef000 x23: 000000007daed000
x24: 0000000000000001 x25: 000000007daef000
x26: 000000007daefadc x27: 0000000000000000
x28: 000000007ba38b30 x29: 000000007ba36a30

Resetting CPU ...
The most valuable piece of information is the ELR register, which tells us where the exception will return to. As the manual says, "The Exception Link Register holds the exception return address." Is this the address of the instruction that caused the exception or the instruction after the bad instruction? As we discover below, it is the address of the instruction that caused the exception. So it is just what we want.

A big problem is that the code is running at a different address from the one u-boot.bin was linked to run at. Some mystery is involved here, and we will ignore the why and how of this and just deal with the offset. I put the following routine into my code and ran it:

static void where ( void )
{
        void (*fp)(void);

        fp = where;
        printf ( "I am at %lx\n", (unsigned long int) fp );
}
This prints out "I am at 7da3dc44". If we disassemble (using objdump) u-boot.bin, we see:
0000000043c02c44 <where>:
    43c02c44:   90000001        adrp    x1, 43c02000
    43c02c48:   d0000300        adrp    x0, 43c64000
    43c02c4c:   91311021        add     x1, x1, #0xc44
    43c02c50:   912f1000        add     x0, x0, #0xbc4
    43c02c54:   14013307        b       43c4f870 
So "where" was linked at 0x43c02c44 but is actually running at 0x7da3dc44. Heaven knows how or why. But the offset is 0x39e3b000, which is the useful thing. Now take the ELR value and subtract this offset. The dump at the top of this page shows a different ELR (0x7da8bb20); the value worked through here is 0x7da8d2cc, which becomes 0x43c522cc, and we can find that in the disassembly.
0000000043c522ac <net_set_ip_header>:
    43c522ac:   a9bd7bfd        stp     x29, x30, [sp, #-48]!
    43c522b0:   910003fd        mov     x29, sp
    43c522b4:   f9000bf3        str     x19, [sp, #16]
    43c522b8:   aa0003f3        mov     x19, x0
    43c522bc:   528008a0        mov     w0, #0x45                       // #69
    43c522c0:   b9002ba1        str     w1, [x29, #40]
    43c522c4:   9100c3a1        add     x1, x29, #0x30
    43c522c8:   72a28000        movk    w0, #0x1400, lsl #16
    43c522cc:   b9000260        str     w0, [x19]		<<<<<< location of our fault
    43c522d0:   7900167f        strh    wzr, [x19, #10]
    43c522d4:   b81f0c22        str     w2, [x1, #-16]!
    43c522d8:   90000302        adrp    x2, 43cb2000 
    43c522dc:   b94bc440        ldr     w0, [x2, #3012]
    43c522e0:   11000403        add     w3, w0, #0x1
    43c522e4:   b90bc443        str     w3, [x2, #3012]
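The address arithmetic that led to this listing can be written down as a minimal sketch. The helper names (reloc_offset, to_link_addr) and the two macros are made up for illustration and are not part of U-Boot; the numbers are the ones quoted in the text.

```c
#include <stdint.h>

/* Values from the text: where() as linked and as observed at run time. */
#define WHERE_LINKED  0x43c02c44ULL
#define WHERE_RUNTIME 0x7da3dc44ULL

/* Relocation offset: how far the running image sits above its link address. */
static uint64_t reloc_offset(uint64_t runtime_addr, uint64_t linked_addr)
{
        return runtime_addr - linked_addr;
}

/* Map a run-time address (such as ELR) back to an address in the disassembly. */
static uint64_t to_link_addr(uint64_t runtime_addr, uint64_t offset)
{
        return runtime_addr - offset;
}
```

Calling reloc_offset(WHERE_RUNTIME, WHERE_LINKED) gives 0x39e3b000, and to_link_addr(0x7da8d2cc, 0x39e3b000) gives 0x43c522cc, the address of the faulting str instruction.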
Here is my take on pseudo code for the disassembly above:
    x29, x30 --> stack (save these two)
    sp --> x29 (fp)
    x19 --> stack
    arg1 (x0) --> x19
    w0 = 0x45
    arg2 (w1) --> fp+40
    x1 = fp + 0x30
    w0 = 0x14000045  (movk sets the upper 16 bits to 0x1400 and keeps the lower 16)
    w0 --> *(x19)  (a 32-bit store, which yields our fault)

The C source for this function looks like this:

void net_set_ip_header(uchar *pkt, struct in_addr dest, struct in_addr source)
{
        struct ip_udp_hdr *ip = (struct ip_udp_hdr *)pkt;

        /*
         *      Construct an IP header.
         */
        /* IP_HDR_SIZE / 4 (not including UDP) */
        ip->ip_hl_v  = 0x45;
        ip->ip_tos   = 0;
        ip->ip_len   = htons(IP_HDR_SIZE);
        ip->ip_id    = htons(net_ip_id++);
        ip->ip_off   = htons(IP_FLAGS_DFRAG);   /* Don't fragment */
        ip->ip_ttl   = 255;
        ip->ip_sum   = 0;
        /* already in network byte order */
        net_copy_ip((void *)&ip->ip_src, &source);
        /* already in network byte order */
        net_copy_ip((void *)&ip->ip_dst, &dest);
}
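The constant 0x14000045 built by the mov/movk pair in the disassembly appears to be the first three assignments above folded into one 32-bit word. The sketch below reproduces that value under two assumptions: IP_HDR_SIZE is 20 (an IPv4 header without options, as the source comment about 0x45 suggests), and ip_hl_v, ip_tos and ip_len occupy bytes 0, 1 and 2-3 of the header. The helper names are hypothetical.

```c
#include <stdint.h>

/* Byte-swap a 16-bit value, as htons() does on a little-endian CPU. */
static uint16_t swap16(uint16_t v)
{
        return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Combine ip_hl_v (byte 0), ip_tos (byte 1) and ip_len (bytes 2-3) into
 * the 32-bit word a little-endian CPU would store at the start of the header.
 */
static uint32_t first_header_word(uint8_t hl_v, uint8_t tos, uint16_t len_host)
{
        uint16_t len_net = swap16(len_host);    /* assumed: htons(IP_HDR_SIZE) */

        return (uint32_t)hl_v | ((uint32_t)tos << 8) | ((uint32_t)len_net << 16);
}
```

With hl_v = 0x45, tos = 0 and len_host = 20, the result is 0x14000045, which matches the value the compiled code stores through x19 with a single 4-byte str.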
Going through the assembly instruction by instruction, the first suspicion is that x19 is bad. The fault is triggered by the store through the "ip" pointer: the compiler has merged the assignments to ip_hl_v, ip_tos and ip_len into a single 32-bit store of 0x14000045. The value of x19 in the register dump is 7daedc8e, which looks OK to me. Further examination, though, shows this to be an alignment issue: the address is a multiple of 2 but not of 4. Using the U-Boot md.w command on this address works fine, but using md.l on it also yields a Synchronous Abort.
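The esr value printed in the dump (0x96000061) is consistent with this. A minimal sketch of decoding it follows, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, and for data aborts WnR in bit 6 and DFSC in bits 5:0). The function names are invented for this example and are not U-Boot code.

```c
#include <stdint.h>

/* Exception class: ESR bits [31:26]. */
static unsigned esr_ec(uint32_t esr)
{
        return (esr >> 26) & 0x3f;
}

/* For data aborts: write-not-read flag, ISS bit 6. */
static unsigned esr_wnr(uint32_t esr)
{
        return (esr >> 6) & 0x1;
}

/* For data aborts: data fault status code, ISS bits [5:0]. */
static unsigned esr_dfsc(uint32_t esr)
{
        return esr & 0x3f;
}

/* Returns 1 if addr is a multiple of size (size must be a power of two). */
static int is_aligned(uint64_t addr, uint64_t size)
{
        return (addr & (size - 1)) == 0;
}
```

For 0x96000061 this yields EC = 0x25 (data abort taken without a change of exception level), WnR = 1 (a write) and DFSC = 0x21 (alignment fault). For the x19 value, is_aligned(0x7daedc8e, 2) is true and is_aligned(0x7daedc8e, 4) is false, which agrees with md.w working and md.l faulting.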

So this is a deeper issue with U-Boot and armv8. Somebody must have tackled this before (unless I am the first person in the world trying to use a network driver with U-Boot on armv8 hardware, which hardly seems likely).

U-Boot and a buffer for the transmit packet

Interestingly, there is only one. U-Boot does not allow any outstanding transmit packets, nor does it queue them, so one buffer will do. The pointer to the buffer is initialized in net/net.c in net_init() as follows:
    net_tx_packet = &net_pkt_buf[0] + (PKTALIGN - 1);
    net_tx_packet -= (ulong)net_tx_packet % PKTALIGN;
We add a print statement to the code that shows us that the value of PKTALIGN is 64. Using grep shows us:
include/net.h:#define PKTALIGN	ARCH_DMA_MINALIGN
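The round-up done in net_init() can be sketched on plain integers as below. The function name align_up and the macros are made up for illustration. The 14-byte Ethernet header length is an assumption about the packet layout, not something printed in the text, and so is any concrete buffer address fed to the function.

```c
#include <stdint.h>

#define PKTALIGN_ASSUMED 64   /* value printed by the debug statement in the text */
#define ETH_HDR_ASSUMED  14   /* assumption: standard Ethernet header length */

/* Round addr up to the next multiple of align, the same way net_init() does. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
        addr += align - 1;
        addr -= addr % align;
        return addr;
}
```

As a usage example, align_up(0x7daedc41, PKTALIGN_ASSUMED) returns 0x7daedc80 (the input address is hypothetical). The faulting pointer 0x7daedc8e sits exactly ETH_HDR_ASSUMED bytes past that 64-byte boundary, which would be consistent with an IP header placed right after an Ethernet header at the start of an aligned buffer. That is an inference from the numbers, not something confirmed above. An address of that form is 2-byte aligned but never 4-byte aligned.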


Have any comments? Questions? Drop me a line!

Tom's electronics pages / tom@mmto.org