June 16, 2019

Grinder - Getting core dumps

For unknown reasons, the "odisp" command in the IRAF mirror package began giving segmentation faults. It works fine on one Linux system, but on another it suddenly began failing for no apparent reason.
vocl> odisp ( date=190520, info="ttmp",asp="b",frame=3 )
ERROR: segmentation violation
A variety of experiments were performed. Using "display dev$pix" demonstrates that ximtool is working just fine. Running "cl < hello.cl" with the following one-line script yields no problems.
[tom@trona Casting]$ cat hello.cl
print "Hello Oven"
I would like to get a core dump and browse it a bit with gdb and see if that yields some clues, but that is not as easy as it sounds.

It turns out, for some unexplainable reason, on systems running systemd (why should that matter?) the core dumps go to /var/lib/systemd/coredump. This was verified by writing and running the following short C program (fail.c):

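int main(void)
{
    /* volatile keeps the compiler from optimizing the bad read away */
    volatile char *p = (volatile char *) 0;
    int val;

    val = *p;    /* read through a null pointer: segmentation fault */
    return val;
}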
Compiling and running this on my Fedora 30 system yields:
ls /var/lib/systemd/coredump
core.fail.1004.627907121ac942628ff05686de6393a4.9681.1560717343000000.lz4
This compressed file can be decompressed using unlz4 (or lz4 -d).

Then run "gdb fail core", where "core" is the decompressed file.

bt - gives a backtrace
disass 0x40116 - disassembles around the given address
disas STARTADDRESS,ENDADDRESS - disassembles a range (gdb wants a comma between the two addresses)

But we ain't getting no core dump?!

After running odisp and getting the message shown above, examining /var/lib/systemd/coredump shows nothing there. One suggestion is the following bash command:
ulimit -c unlimited
Supposedly this will be inherited by child processes. For the case in question, I run a terminal with bash, give it the "ulimit -c unlimited" command, start the cl, and then type "odisp ( date=190520, info="ttmp",asp="b",frame=3 )" and get the segmentation violation message. But no core dump appears.
coredumpctl
No coredumps found.
On this same system (the one yielding the seg fault running odisp) I compile and run the fail.c program:
./fail
Segmentation fault (core dumped)
[tom@linuxpilot Fail]$ coredumpctl
TIME                            PID   UID   GID SIG COREFILE  EXE
Sun 2019-06-16 13:58:56 MST   26008  1002  1002  11 present   /u1/tom/Fail/fail
[tom@linuxpilot Fail]$
[tom@linuxpilot Fail]$ ls /var/lib/systemd/coredump
core.fail.1002.7c67ecda536347a690b162a87c6736ff.26008.1560718736000000.lz4
So, this system is doing what it should be regarding core dumps, but the IRAF cl must be catching signals or doing something obnoxious and brain-damaged to prevent a dump from being written when odisp crashes. The only way to push further on this problem is to figure out how to run processes without the cl getting involved.

Tom's home page / tom@mmto.org