May 24, 2019

Grinder - The scoop on ovenp and ovenb

These are started in the "Blue" window as part of the pilot startup. What do they actually do? The documentation (which is entirely correct) states that they listen on a "well known port" and transmit information to the requester.

They do exactly this. The question, though, is who the requester might be (or, more precisely, who the expected requester is in normal operation). The truth is that these are almost certainly seldom, if ever, used. They are part of a fallback scheme that provides an automatic way for parts of the parameter database to be reinitialized under extremely rare circumstances.

Ovenp listens on port 5102 (PORTRP), and ovenb listens on port 5104 (PORTRB). They will transmit the p_database or b_database respectively, exactly as a V computer would if contacted on the same port.

The purpose of this is to answer requests from a V computer that is rebooting for some reason. The intent is to allow a transparent reloading of parameters and biparameters if a V computer should unexpectedly reboot.

When a V computer boots, it calls getparameters(), which calls getdatabase() (all of this in oven.c). The getdatabase() function has a list of places it tries to get B and P data from. The list looks like this:

The first item in the list (the boothost) will currently try to contact ovenb and ovenp on crater. However, if the ovenb and ovenp tasks were not running there, one of the other V computers would likely be available.

The machine astro no longer exists, so we can forget about it. There is a machine "astro.as.arizona.edu", but it no longer runs Solaris and certainly does not run ovenp and ovenb. Not only that, it is on a different IP number now (128.196.208.2).

The "boot host" is specified in the MV147 configuration RAM for VxWorks and may well be corrupt (the V computers boot from EPROM rather than the network, and do not depend on this information in battery-backed RAM). However, if the battery has not gone dead, the boot host may well still be "crater", and its IP will be whatever is stored for the boothost in battery-backed RAM.

Note that the IP numbers for these machines are compiled into the oven code. They are set as follows:

	hostAdd ("oven0v0", "192.168.1.40");
	hostAdd ("oven0v1", "192.168.1.41");
	hostAdd ("oven0v2", "192.168.1.42");
	hostAdd ("oven1v0", "192.168.1.50");
	hostAdd ("crater", "192.168.1.11");
	hostAdd ("dorado", "192.168.1.10");
	hostAdd ("astro", "128.196.176.1");

The fact that these IP numbers are immutable and compiled into code in ROM is important to keep in mind as we migrate the control room software to Linux. The Linux host will need to respond to crater's IP number for this belt-and-suspenders scheme to work.

Starting up ovenp and ovenb

I added some print statements to shmalloc and then saw this when launching either ovenp or ovenb:
shm - cannot get shared memory (key=100, flag=124)
shm - cannot get shared memory (key=120, flag=124)
shm - cannot get shared memory (key=130, flag=124)
shm - cannot get shared memory (key=140, flag=124)
shm - cannot get shared memory (key=150, flag=124)
shm - cannot get shared memory (key=160, flag=124)
shm - cannot get shared memory (key=170, flag=124)
shm - cannot get shared memory (key=180, flag=124)
shm - cannot get shared memory (key=190, flag=124)
poven error -3
Notice that key 110 is missing (that shm segment exists). For some crazy reason, these scan through all computers for a given oven. The last shm segment will win (overwriting any prior winners in the global pointer). The upshot of this is that at the least, I should quietly ignore these failures to find shared memory.

I am not sure about the poven error. If we are going to use cryptic numbers like this, we at least need a list of these error codes in some document. Better yet would be an informative error message.

Digging through the source indicates that error -3 is returned when the bind() call in tportwrite() fails, which would indicate that a server is already started. So either be satisfied with that server, or kill it and try this again.

What about the MM6702 non-volatile RAM board?

These depend on a battery soldered onto the board, and it is hard to imagine that this battery has not gone dead (or begun to leak) after all these years.

Notice in the above list that this is the last resort for reinitializing the database. Even with a dead battery, this ought to work if the board has simply rebooted and power has not been lost. It is worth noting that there is no checksum or magic number. If the MM6702 hardware responds to a bus probe, the contents are copied without question.

This is a last-ditch fallback that has almost certainly never been exercised. These boards could be removed and we could continue to run (the operator would simply have to load the database himself, or get one of the myriad other sources earlier on the list up and running properly).


Have any comments? Questions? Drop me a line!

Tom's home page / tom@mmto.org