January 10, 2023

Kyu networking -- H3 network PHY, fails after switch to 100 Mbit

We have figured out the PHY and autonegotiation business, and a glance at our little network switch tells us we have indeed transitioned to 100 Mbit. But once Kyu is booted up, we are unable to do anything with the network.

At first I thought that I had the PHY autonegotiation business right and that it was all about getting the emac driver to work at the new speed, but there is more involved than that.

When I examine the startup messages more closely, I see that a fair number of rx and tx packets have been handled. Also I see that Kyu is sometimes (but not always) able to fetch the symbol table via TFTP.

The next thought is that something I am doing after the symbol table fetch is the problem. So I need to find where I fetch the symbol table. I look at main.c and see that it calls shell_init(), with notes about that needing the network to fetch the symbol table. This routine is in symbols.c and does little more than use tftp_fetch() to get the symbol table. Initialization rattles along, eventually launching a thread to run user_init(). This in turn launches another thread to run shell_main(), which is in tests.c. It is hard to imagine how any of this could cause the network to get jammed up (and only get jammed up in 100 Mbit mode).

Also, sometimes we get the symbol table and sometimes we don't. I set a "ping" running while I reboot Kyu. We receive and respond to a few ICMP packets for a short interval after the autonegotiation, and even while the symbol table transfer is going on.

This starts to look like some kind of race that is being exacerbated by the faster network speed. But who can say at this point?

Get more information (1-12-2023)

At first I thought I had done something wrong with the switch to 100 Mbit, but now that I have noticed that the symbol table transfer works (some of the time), I am thinking that I have it right and am just provoking some lurking bug. We need more information.

An existing facility, "n 11", calls emac_debug() and displays lots of interesting information. It shows the entirety of both the Rx and Tx buffer rings. I currently have 16 entries in each of the Rx and Tx rings, which keeps the dump manageable (I may drop to 8 while debugging). Linux uses 256.
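
The heart of such a dump is just a walk over the descriptor rings. The H3 emac uses four word DMA descriptors (status, size, buffer address, next); the struct and helper below are a sketch with illustrative names, not the actual driver code:

    struct emac_desc {
        unsigned int status;
        unsigned int size;
        unsigned int buf;
        unsigned int next;
    };

    static void
    show_ring ( struct emac_desc *ring, int num, char *tag )
    {
        int i;

        /* print one line of raw descriptor contents per ring slot */
        for ( i = 0; i < num; i++ )
            printf ( "%s %2d: %08x %08x %08x %08x\n", tag, i,
                ring[i].status, ring[i].size, ring[i].buf, ring[i].next );
    }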

Study the NetBSD driver

This promises to be more straightforward and digestible than the linux driver.
    cd /u1/BSD/NetBSD-git
    ctags -R		(segfaults!!)
    ctags -R .
    cd /u1/BSD/NetBSD-git/sys/arch/arm/sunxi
    vi sunxi_emac.c
This is interesting, but has not yet led me to a solution.

We are dropping the link!

I add code to tell me whether the link is up or down, along with the speed and duplex. I can read the PHY status registers at any time to get the current state of things, so I do this after every Tx interrupt and whenever I send a packet.
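
The check itself is simple, using the standard MII registers: BMSR (register 1) carries the link status bit, and register 5 shows what the link partner advertised (the "peer" register below). phy_read() is an assumed helper name, and speed/duplex actually come from vendor specific registers; treat this as a sketch:

    #define MII_BMSR	1		/* basic status register */
    #define MII_PEER	5		/* link partner ability */
    #define BMSR_LINK	0x0004		/* bit 2: link is up */

    static void
    phy_show_link ( void )
    {
        int bmsr = phy_read ( MII_BMSR );
        int peer = phy_read ( MII_PEER );

        printf ( "link %s, peer = %04x\n",
            (bmsr & BMSR_LINK) ? "UP" : "DOWN", peer );
    }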

What I see is that the link goes from UP to DOWN. Curiously, I am able to transfer a few packets after the PHY status indicates the link is down. My little Netgear switch still shows 100 Mbit even after the PHY decides that the link is down. Along with the link status going to DOWN, the values in the PHY "peer" register all go to zero.

I try a variety of things. What I find most useful is looking at the datasheets for the two PHY chips (the RTL8211 and the 8710A chip) and contemplating the bits in the registers. Some wording in the RTL8211 sheet makes me wonder about my handling of the ANEG_ENABLE bit. My scheme was to set it, trigger autonegotiation, then clear it once autonegotiation finished. I decide not to clear it and just leave it set. Now everything works! My take on this is that leaving it set tells the PHY chip to ignore the speed and duplex bits in the control register and just obey whatever autonegotiation determined.
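
In miniature, the fix looks like this, using the standard MII control register (BMCR, register 0). phy_read() and phy_write() are assumed helper names, not necessarily what the driver calls them:

    #define MII_BMCR		0
    #define BMCR_ANEG		0x1000	/* bit 12: enable autonegotiation */
    #define BMCR_ANEG_RESTART	0x0200	/* bit 9: restart autonegotiation */

    static void
    phy_aneg_start ( void )
    {
        /* Old scheme: set ANEG, restart, wait for completion, then
         * clear ANEG -- that is what was dropping the link.
         * New scheme: set it, restart, and just leave it set.
         */
        phy_write ( MII_BMCR,
            phy_read ( MII_BMCR ) | BMCR_ANEG | BMCR_ANEG_RESTART );
    }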

100 Mbit, but slower

Now everything works, and the LEDs on my network switch claim that I am at 100 Mbit. However, my 1M test transfer via TCP now takes 5 seconds. It used to take 1 second to move 200K of data when I was running at 10 Mbit. So it almost seems as though I am still running at 10 Mbit, if that is possible. This is certainly frustrating.

Wireshark shows Kyu sending two packets, each with a 500 byte payload, back to back. Then an ACK comes almost immediately from the linux side. Then there is a bit of a delay (on the order of .002 to .006 seconds) before Kyu sends another pair. A 1M transfer in 500 byte payloads is roughly 1000 of these pairs, so if the delay averages .005 seconds, that alone accounts for 5 seconds. I wonder if we see this on the BBB, where 100 Mbit seems to run as fast as it should.

It isn't that hard to look at the BBB. I use "kyucon1" to get a console on ttyUSB1, then stop wireshark and change the mac address in the capture filter ("ether host xxxx"), and I am able to get the data I want. The 1M transfer goes in 0.15 seconds. Now I see Kyu sending sometimes 3, sometimes 2, or rarely 1 packet with a 500 byte payload back to back. In general, the delay from when linux sends the ACK to when Kyu again sends a packet is now very small; 0.00005 seconds seems typical. A thousand delays like that total 0.05 seconds rather than 5.0 seconds.

So, the new idea is to just send a bunch of packets and see how long it takes to send them. I try this the stupid way, by calling emac_send in a loop 1000 times. This could never work, because there are only 64 slots in the Tx ring, and when the ring fills the routine will just print a message, do nothing, and return. What actually happens is that I get:

Kyu, ready> n 12
PANIC: do_irq, resume

Kyu, ready> l
  Thread:       name (  &tp   )    state     sem       pc       sp     pri
  Thread:      shell (400a7a28)   READY I          400045fc 40568000   11
  Thread:  n_wrapper (400a737c)   FAULT J          40010c90 40576000   14
  Thread:     net-in (400a7658)     SEM I  net-inq 4001112c 40570000   20
  ....
Kyu, ready> u n_wrapper
Thread n_wrapper, dump (-)
  Thread:  n_wrapper (400a737c)   FAULT J          40010c90 40576000   14
 state:   FAULT
 stack: 40576000
 mode: JMP
CPU: 0
pc : [<40010c90>]	   lr : [<40010c90>]
sp : 40577f20  ip : 60000013	 fp : 40577f3c
r10: 00000002  r9 : 79f38ee0	 r8 : 7dfb4878
r7 : 79f3ac90  r6 : 40000000	 r5 : 00000002  r4 : 4005c100
r3 : 400a73c0  r2 : 00000000	 r1 : 00000000  r0 : 00000001
cpsr: 600000d3  Flags: nZCv  IRQs off  FIQs off  Mode SVC_32
PC = 40010c90 ( change_thread+194 )
Called from resched+230 -- 400110b8
Called from thr_block+7c -- 40010258
Called from kyu_panic+a4 -- 4000e5bc
Called from panic+1c -- 4000e5e4
Called from do_irq+c4 -- 4000117c
Called from irq+34 -- 40000434
Called from board_net_send+18 -- 40002f88
Called from test_fast+94 -- 4000b434
Called from n_wrapper+30 -- 4000a5d0
Called from thr_exit -- 4000ffa0
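
For reference, the naive loop was nothing fancy. The names test_fast() and board_net_send() are real (they appear in the traceback above), but the body here is a from-memory sketch:

    static struct netbuf *nbp;	/* a netbuf prepared beforehand */

    static void
    test_fast ( void )
    {
        int i;

        for ( i = 0; i < 1000; i++ )
            board_net_send ( nbp );	/* overruns the 64 slot Tx ring */
    }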

The smart way to do this involves coding up a new routine in emac.c; maybe call it emac_send_block(). The idea is that it will kick off the send, then block on a semaphore. When a Tx interrupt arrives indicating (I may have choices here) that the Tx has finished, it will unblock the semaphore. I can do several of these in series and time them -- and with the resolution of the CCNT counter, even just doing one would do the job.
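
Here is the shape of the thing as a sketch. The semaphore calls and the netbuf type follow what I use elsewhere in Kyu, but treat the details as assumptions rather than the actual driver code:

    static struct sem *tx_done_sem;

    void
    emac_send_block ( struct netbuf *nbp )
    {
        emac_send ( nbp );		/* kick off the send */
        sem_block ( tx_done_sem );	/* sleep until the Tx interrupt */
    }

    /* ... and in the Tx interrupt handler, once the descriptor
     * indicates the send has finished:
     *     sem_unblock ( tx_done_sem );
     */

Timing it is then just a matter of reading the CCNT cycle counter around a loop of these. get_ccnt() is an assumed wrapper around the ARMv7 PMCCNTR read:

    static inline unsigned int
    get_ccnt ( void )
    {
        unsigned int val;
        asm volatile ( "mrc p15, 0, %0, c9, c13, 0" : "=r" (val) );
        return val;
    }

    unsigned int t0 = get_ccnt ();
    int i;

    for ( i = 0; i < 10; i++ )
        emac_send_block ( nbp );

    printf ( "data sent = %d\n", get_ccnt () - t0 );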

So I did this and asked it to send 10 packets with 1000 byte payloads. They end up 1014 bytes on the wire with ethernet headers. The times in CCNT units were:

data sent = 9467587
data sent = 9464027

So, we sent 10,140 bytes in .0095 seconds (CCNT counts CPU cycles, and at roughly the 1 GHz CPU clock, 9.46 million counts is about .0095 seconds). At 8 bits per byte that is 81,120 bits, which works out to about 8.5 megabits per second. This is disappointing. In fact, the times shown by wireshark indicate it takes just under .001 seconds per 1014 byte packet, confirming this measurement made using the CCNT register.

Some things not to forget (TODO)

When the smoke clears on all of this, I want to look into these things:


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org