March 30, 2024

Fedora 38 -- FAIL

My office machine (and web server) is still running 37. Fedora 40 is due to come out in April and F38 will be end of life about 4 weeks later. Fedora 37 is already end of life, so this machine needs some prompt attention.

Here are the conclusions up front. My efforts to do a dnf upgrade from 37 to 38 failed because of a failing hard drive. I give all the details here as they give ideas for possible future troubleshooting. What I intend to do with that system is to replace the failing 2T drive, and do a fresh install of Fedora 39 on the new drive. I'll start a new page when I begin doing the install on the new drive. See the end of this for a discussion of the drive replacement.

Get started -- 3-28-2024

dnf update just hangs

last kernel was from 2023 (May 31, 2023)

Fedora 37 went end of life Dec 5, 2023
Fedora 38 end of life is May 14, 2024

dnf upgrade --refresh sort of works, shows 52 packages, but then says "already downloaded, skipped" and then seems to hang.

Full partition

It isn't hung. After I wait a while it tells me:
Disk Requirements:
   At least 2248MB more space needed on the / filesystem.
   At least 11MB more space needed on the /boot filesystem.

Indeed, the root partition is full:

[root@cholla tom]# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    449620      95280  83% /boot
/dev/sda2        51422028  49972992          0 100% /
/dev/sdb5      1853934316 280960004 1478773324  16% /u2
/dev/sda5      1853934228 181976704 1577756540  11% /u1
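
A check like this could be scripted so a full partition gets noticed before dnf stalls on it. What follows is only a sketch: the function name full_filesystems and the 90% default are my own choices, and it assumes the "df -P" output layout (use% in column 5, mount point in column 6).

```shell
# Print mount points whose usage is at or above a threshold (default 90%).
# Reads "df -P" style output on stdin. Name and default are illustrative.
full_filesystems() {
    awk -v t="${1:-90}" '
        NR > 1 {                  # skip the header line
            use = $5
            sub(/%/, "", use)     # "100%" -> "100"
            if (use + 0 >= t) print $6, $5
        }'
}

# Typical use (not run here):
#   df -P | full_filesystems 90
```

Fed the df output above, this would flag only the root filesystem.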

By deleting 2 old kernels, I get:

[root@cholla tom]# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    211008     333892  39% /boot

du -xs / stays on one filesystem

fdisk -l shows:
/dev/sda1  *         2048    1230847    1228800  600M 83 Linux
/dev/sda2         1230848  106088447  104857600   50G 83 Linux
/dev/sda3       106088448  139642879   33554432   16G 82 Linux swap / Solaris
/dev/sda4       139642880 3907028991 3767386112  1.8T  5 Extended
/dev/sda5       139644928 3907028991 3767384064  1.8T 83 Linux

So we have a 50G root (and it is full)

[root@cholla /]# du -xs /
49973308	/

If you do "du -xs *" from /, the -x applies to each argument separately, so /boot, /u1 and /u2 get counted in full and just need to be ignored. Of what is left, these 3 subdirectories are by far the biggest:

[root@cholla /]# du -xs *
446296		opt
14374548	usr
35025468	var
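
One way to get the same ranking without the other filesystems creeping in is to give du a single starting directory and a depth limit. This is a sketch under assumptions: it needs GNU du (for --max-depth), and the function name largest_dirs and its defaults are made up for illustration.

```shell
# List the largest immediate subdirectories of a directory, staying on
# that directory's filesystem. Assumes GNU du (for --max-depth).
# The name largest_dirs and its defaults are illustrative.
largest_dirs() {
    dir=${1:-/}
    count=${2:-10}
    # The awk step drops the grand total that du prints for $dir itself.
    du -xk --max-depth=1 -- "$dir" 2>/dev/null |
        awk -F '\t' -v d="$dir" '$2 != d' |
        sort -rn |
        head -n "$count"
}

# Typical use (as root, not run here):
#   largest_dirs / 5
#   largest_dirs /var 5
```

Running it again one level down (/var, then /var/cache) is how you would walk toward the directory that is eating the space.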

The big thing is /var, and /var/cache, and ultimately /var/cache/dnf (22G)

There is stuff from fc34, fc35, fc37, ... I make a backup as /u1/dnf_cache.tar, then:

cd /var/cache/dnf
rm -rf *
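
Running "rm -rf *" as root depends on the preceding cd having worked. A slightly more defensive version might look like this. It is a sketch only: the function name empty_dir is hypothetical, and it assumes a find that supports -mindepth and -delete (GNU findutils does).

```shell
# Remove everything inside a directory, but refuse to do anything if the
# directory does not exist, so a failed cd can never wipe the wrong place.
# The name empty_dir is illustrative.
empty_dir() {
    if [ ! -d "$1" ]; then
        echo "empty_dir: no such directory: $1" >&2
        return 1
    fi
    # -mindepth 1 keeps the directory itself; hidden files are removed too.
    find "$1" -mindepth 1 -delete
}

# Typical use (as root, not run here):
#   empty_dir /var/cache/dnf
```

Unlike "rm -rf *", this also removes dot files, and it leaves the directory itself in place.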

Update the F37 system and reboot

Now we should have plenty of space (and we do).
dnf upgrade --refresh
This ends up with over 600 packages to update.
I say "yes" and away it goes.
It takes quite a while, but finishes with:

df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    349032     195868  65% /boot
/dev/sda2        51422028  28409760   20374444  59% /

It also has a 6.5.12 kernel installed (I am now running 6.3.4), so ...

sync
reboot

It comes up running 6.5.12!!

Start the actual F38 upgrade

su
dnf -y system-upgrade download --refresh --releasever=38

We hit some conflicts:

xplayer-plparser-1.0.2-7.fc29.x86_64
dnf erase xplayer-plparser

jack-audio-connection-kit-example-clients-1.9.21-3.fc37.x86_64
dnf erase jack-audio-connection-kit-example-clients

and repeat:

dnf -y system-upgrade download --refresh --releasever=38
away it goes for: 3454 packages ...
rm /u1/dnf_cache.tar  (I don't seem to need this backup)
dnf system-upgrade reboot

A failing disk rears its ugly head

At this point I ran into problems caused by my failing disk, but it took me a while to figure out that was the problem.

I am doing this remotely. I type the above at 9:33 PM, and expect that it will take at least an hour. I consider going to bed and checking in the morning. But I check at 9:50 and it responds. Unfortunately it is still running f37.

Some searches suggest looking at:

    dnf system-upgrade log
    /var/log/dnf.log

Looking at dnf.log shows:

2024-03-28T21:38:56-0700 CRITICAL Problem opening package kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm
2024-03-28T21:38:56-0700 CRITICAL Problem opening package adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm
2024-03-28T21:38:56-0700 CRITICAL Problem opening package adobe-source-han-serif-cn-fonts-2.001-3.fc38.noarch.rpm

Then a bunch of python traceback that includes the message:
dnf.exceptions.Error: GPG check FAILED
2024-03-28T21:38:56-0700 CRITICAL Error: GPG check FAILED
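
With thousands of packages in play, it helps to pull just the file names dnf could not open out of the log. The following is a sketch that assumes the log lines look exactly like the ones quoted above; the function name bad_packages is my own invention.

```shell
# Print the unique package files that dnf reported it could not open.
# Reads the log named in $1 (default /var/log/dnf.log).
# The name bad_packages is illustrative.
bad_packages() {
    sed -n 's/^.* CRITICAL Problem opening package \(.*\.rpm\)$/\1/p' \
        "${1:-/var/log/dnf.log}" | sort -u
}

# Typical use (not run here):
#   bad_packages /var/log/dnf.log
```

If the list changes from one attempt to the next, that points away from a bad download or bad key and toward something flaky underneath.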

I do this:

su
updatedb
[root@cholla log]# locate adobe-source-han-sans-cn-fonts
/usr/share/licenses/adobe-source-han-sans-cn-fonts
/usr/share/licenses/adobe-source-han-sans-cn-fonts/LICENSE.txt
/var/lib/dnf/system-upgrade/fedora-376ef8e983c65ce0/packages/adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm
[root@cholla log]# locate kernel-debug-modules-6.7.10
/var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm

[root@cholla licenses]# cd /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages
[root@cholla packages]# rm kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm
[root@cholla packages]# cd /var/lib/dnf/system-upgrade/fedora-376ef8e983c65ce0/packages
[root@cholla packages]# rm adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm

dnf -y system-upgrade download --refresh --releasever=38
It skips 3453 packages and downloads the kernel package again.
It trips over the adobe thing I deleted, then complains
"GPG check FAILED"

Note that you can pass dnf --nogpgcheck

cd /var/lib/dnf
rm -rf system-up*
dnf -y system-upgrade download --refresh --releasever=38
away it goes again 3453 packages

And now I will try the reboot step, but tell it not to perform GPG checks. This will just be a brief leapfrog through f38 to f39 anyway. And I don't care much about the kernel debug modules. It is honestly quite nice that it drops back to 37 on an error, rather than giving me a messed up system.

dnf system-upgrade reboot --nogpgcheck

Away we go again at 10:23 PM. And at 10:29 we are in trouble again.
This time it ends with:

2024-03-28T22:26:04-0700 DEBUG Using rpmkeys executable at /usr/bin/rpmkeys to verify signatures
2024-03-28T22:28:13-0700 CRITICAL Problem opening package edk2-ovmf-20230524-3.fc38.noarch.rpm

2024-03-28T22:28:13-0700 CRITICAL Error: GPG check FAILED

dnf clean all
dnf upgrade --refresh
dnf -y system-upgrade download --refresh --releasever=38

error: /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/edk2-ovmf-20230524-3.fc38.noarch.rpm: Fread failed: Input/output error
error: /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/firefox-langpacks-124.0-1.fc38.x86_64.rpm: Fread failed: Input/output error
Problem opening package edk2-ovmf-20230524-3.fc38.noarch.rpm
Problem opening package firefox-langpacks-124.0-1.fc38.x86_64.rpm
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.

Disk errors

This is erratic enough that I am starting to wonder about disk problems. And indeed, a look at /var/log/messages shows a lot of this (but only for March 28):

ata1.00: status: { DRDY ERR }
Mar 28 22:43:54 cholla kernel: ata1.00: error: { UNC }
Mar 28 22:43:54 cholla kernel: ata1.00: configured for UDMA/133
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Medium Error [current]
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 04 3d ca b8 00 00 08 00
Mar 28 22:43:54 cholla kernel: I/O error, dev sda, sector 71158456 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Mar 28 22:43:54 cholla kernel: ata1: EH complete
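
To see which partition a failing sector belongs to, the sector number from the "I/O error" line can be compared against the start and end sectors in the fdisk -l output earlier on this page. This is a rough sketch: both function names are made up for illustration, and the partition table is fed in as plain "name start end" lines rather than parsed from fdisk.

```shell
# Pull unique "device sector" pairs out of kernel I/O error lines on stdin.
io_error_sectors() {
    sed -n 's/.*I\/O error, dev \([^,]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}

# Given a sector number in $1 and "name start end" lines on stdin,
# print the name of every partition whose range contains that sector.
partition_for_sector() {
    awk -v s="$1" 's + 0 >= $2 + 0 && s + 0 <= $3 + 0 { print $1 }'
}

# Typical use (not run here):
#   io_error_sectors < /var/log/messages
#   printf '%s\n' '/dev/sda1 2048 1230847' '/dev/sda2 1230848 106088447' |
#       partition_for_sector 71158456
```

Sector 71158456 falls between 1230848 and 106088447, which puts it in /dev/sda2, the root partition where the downloaded packages were being stored. The CDB bytes 04 3d ca b8 in the log are the same number in hex. If smartmontools is installed, "smartctl -a /dev/sda" should show the drive's own error counters as well.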

The disk is a WDC WD20EZRZ-00Z 2T disk.
This is a "blue" 5400 rpm drive with 64M of cache.

-- Enough for one night, time for bed.

New disk

I get on Amazon and do some searching. I find a WD20EZBZ for $68. It is a 7200 rpm "blue" with 32M cache. Manufactured Feb 3, 2024 with a 2 year warranty.

Interestingly the system runs just fine for the next 3 days (over the weekend) while I wait for the new disk to arrive and find time in my schedule to work on this some more. No errors in /var/log/messages.

I realize after ordering the new drive, that I have a 2T drive, unused, on my shelf. It is a WD2000F9YZ, "black", 7200 rpm with 64M cache. Manufactured in Dec, 2013!! I see this exact model selling on Amazon for $39 and called a "datacenter drive" with a yellow label. Who knows what all of these colors and words mean. Or why what used to be a "black" is now a "yellow" datacenter drive.

So which drive do I use? I am inclined towards the recently manufactured "blue" with a 2 year warranty. I selected it on Amazon primarily because it was being sold by the "Western Digital Store". The only downside is the 32M cache.

My plan is to install F39 on the new disk at home, then transport it to my office, copy the /u1 files from the failing disk to the new disk, then configure the new disk with the proper IP number and set up ssh and the web server. Honestly if I just set up the IP and get ssh running, I can go home to do everything else.

All of this will be another page. See the following:


Have any comments? Questions? Drop me a line!

Adventures in Computing / tom@mmto.org