Here are the conclusions up front. My efforts to do a dnf upgrade from 37 to 38 failed because of a failing hard drive. I record all the details here because they may give ideas for future troubleshooting. What I intend to do with that system is replace the failing 2T drive and do a fresh install of Fedora 39 on the new drive. I'll start a new page when I begin the install on the new drive. See the end of this page for a discussion of the drive replacement.
dnf update just hangs
last kernel was from 2023 (May 31, 2023)
Fedora 37 went end of life Dec 5, 2023
Fedora 38 end of life is May 14, 2024
dnf upgrade --refresh sort of works, shows 52 packages, but then says "already downloaded, skipped" and then seems to hang.
Disk Requirements: At least 2248MB more space needed on the / filesystem. At least 11MB more space needed on the /boot filesystem.
Indeed, the root partition is full:
[root@cholla tom]# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    449620      95280  83% /boot
/dev/sda2        51422028  49972992          0 100% /
/dev/sdb5      1853934316 280960004 1478773324  16% /u2
/dev/sda5      1853934228 181976704 1577756540  11% /u1
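A check like this could be scripted before starting a big dnf run. What follows is only a sketch: avail_mb and has_space are names I made up, not dnf features. Note that dnf reports how much more space it needs, and since / had nothing available here, 2248 MB is roughly the free space to aim for.

```shell
# Sketch: compare free space on a mount point with a required amount.
# avail_mb and has_space are hypothetical helper names.

# Print the space available on the filesystem holding $1, in whole MB.
avail_mb() {
    # -P forces one output line per filesystem; -k reports 1K blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed if the filesystem holding $1 has at least $2 MB available.
has_space() {
    [ "$(avail_mb "$1")" -ge "$2" ]
}

# Example, using the figures from the dnf message above:
#   has_space / 2248   || echo "/ is too full"
#   has_space /boot 11 || echo "/boot is too full"
```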
By deleting 2 old kernels, I get:
[root@cholla tom]# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    211008     333892  39% /boot
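I did not record how the old kernels were found and deleted. A minimal sketch for listing candidates, assuming the usual vmlinuz-VERSION file naming in /boot, might look like this (other_kernels is a made-up helper, and it only lists, it removes nothing):

```shell
# List kernel versions found in boot directory $1 other than version $2.
# other_kernels is a hypothetical helper name.
other_kernels() {
    for f in "$1"/vmlinuz-*; do
        [ -e "$f" ] || continue        # nothing matched: the glob stays literal
        v=${f##*/vmlinuz-}             # strip the directory and the vmlinuz- prefix
        case "$v" in
            "$2"|*rescue*) ;;          # skip the running kernel and the rescue image
            *) echo "$v" ;;
        esac
    done
}

# Usage: other_kernels /boot "$(uname -r)"
```

The removal itself is probably better left to the package manager (something like "dnf remove kernel-core-VERSION") so that the matching modules are removed as well.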
du -xs / stays on one filesystem (that is what the -x option does).
fdisk -l shows:
/dev/sda1  *         2048    1230847    1228800  600M 83 Linux
/dev/sda2         1230848  106088447  104857600   50G 83 Linux
/dev/sda3       106088448  139642879   33554432   16G 82 Linux swap / Solaris
/dev/sda4       139642880 3907028991 3767386112  1.8T  5 Extended
/dev/sda5       139644928 3907028991 3767384064  1.8T 83 Linux
So we have a 50G root (and it is full)
[root@cholla /]# du -xs /
49973308 /
If you do "du -xs *", the -x does not keep you on the root filesystem, because each argument is treated as its own starting point. Honestly, only /boot, /u1 and /u2 need to be ignored. Of what remains, these 3 directories are by far the biggest:
[root@cholla /]# du -xs *
446296 opt
14374548 usr
35025468 var
The big thing is /var, and /var/cache, and ultimately /var/cache/dnf (22G)
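The drilling down from / to /var to /var/cache can be done with one small helper. This is a sketch, not what I actually typed; biggest_dirs is a made-up name. Giving du a single starting directory with -x avoids the "du -xs *" problem of wandering into other filesystems.

```shell
# Print the $2 largest immediate subdirectories of $1 (sizes in KB),
# followed by the total for $1 itself, staying on one filesystem.
# biggest_dirs is a hypothetical helper name.
biggest_dirs() {
    # -x: do not cross mount points; -d 1: report one level deep only.
    # Permission errors are discarded so the listing stays readable.
    du -x -k -d 1 "$1" 2>/dev/null | sort -n | tail -n "$(( $2 + 1 ))"
}

# Usage:
#   biggest_dirs / 3
#   biggest_dirs /var 3
#   biggest_dirs /var/cache 3
```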
There is stuff from fc34, fc35, fc37, ... I make a backup as /u1/dnf_cache.tar then
cd /var/cache/dnf
rm -rf *

dnf upgrade --refresh

This ends up with over 600 packages to update. I say "yes" and away it goes. It takes quite a while, but finishes with:

df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1          587908    349032     195868  65% /boot
/dev/sda2        51422028  28409760   20374444  59% /
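A somewhat safer way to do the backup and the delete together is sketched below. It is not what I ran; archive_and_clear is a made-up helper. The point is that the directory is emptied only if the tar file was written successfully, and nothing depends on which directory I happen to be sitting in.

```shell
# Archive the contents of directory $1 into tar file $2 and, only if the
# archive was written successfully, empty the directory.
# archive_and_clear is a hypothetical helper name.
# The tar file must live outside the directory being cleared.
archive_and_clear() {
    [ -d "$1" ] || return 1
    tar -cf "$2" -C "$1" . || return 1
    # -mindepth 1 keeps the directory itself. Unlike "rm -rf *" this also
    # removes dot files and does not depend on the current directory.
    find "$1" -mindepth 1 -delete
}

# Usage, with the paths from this page:
#   archive_and_clear /var/cache/dnf /u1/dnf_cache.tar
```

dnf has its own cleanup commands ("dnf clean packages", "dnf clean all"). Whether they would have removed the old fc34 and fc35 leftovers is not something I tested.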
It also has a 6.5.12 kernel installed. (I am now running 6.3.4), so ...
sync
reboot
It comes up running 6.5.12!!
su
dnf -y system-upgrade download --refresh --releasever=38
We hit some conflicts:
xplayer-plparser-1.0.2-7.fc29.x86_64
dnf erase xplayer-plparser
jack-audio-connection-kit-example-clients-1.9.21-3.fc37.x86_64
dnf erase jack-audio-connection-kit-example-clients
and repeat:
dnf -y system-upgrade download --refresh --releasever=38

Away it goes for 3454 packages ...

rm /u1/dnf_cache.tar (I don't seem to need this backup)
dnf system-upgrade reboot
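For future reference, the whole sequence can be written down as a script that only prints the commands unless told to run them. This is a sketch: upgrade_steps and the RUN variable are made up, and the plugin install line comes from the usual Fedora instructions rather than from anything I typed above. I left out -y so that dnf asks before doing anything.

```shell
# Print (default) or run (RUN=1) the dnf system-upgrade sequence for
# Fedora release $1. upgrade_steps and RUN are hypothetical names.
upgrade_steps() {
    rel=$1
    for cmd in \
        "dnf upgrade --refresh" \
        "dnf install dnf-plugin-system-upgrade" \
        "dnf system-upgrade download --refresh --releasever=$rel" \
        "dnf system-upgrade reboot"
    do
        echo "+ $cmd"
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose so the string splits into words.
            # Stop at the first failure, for example a package conflict.
            $cmd || return 1
        fi
    done
}

# Usage:
#   upgrade_steps 38          # just show the commands
#   RUN=1 upgrade_steps 38    # actually run them (as root)
```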
At this point I ran into problems caused by my failing disk, but it took me a while to figure out that was the problem.
I am doing this remotely. I type the above at 9:33 PM, and expect that it will take at least an hour. I consider going to bed and checking in the morning. But I check at 9:50 and it responds. Unfortunately it is still running f37.
Some searches suggest looking at:
dnf system-upgrade log
/var/log/dnf.log
Looking at dnf.log shows:
2024-03-28T21:38:56-0700 CRITICAL Problem opening package kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm
2024-03-28T21:38:56-0700 CRITICAL Problem opening package adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm
2024-03-28T21:38:56-0700 CRITICAL Problem opening package adobe-source-han-serif-cn-fonts-2.001-3.fc38.noarch.rpm

Then a bunch of python traceback that includes the message:

dnf.exceptions.Error: GPG check FAILED
2024-03-28T21:38:56-0700 CRITICAL Error: GPG check FAILED
I do this:
su
updatedb
[root@cholla log]# locate adobe-source-han-sans-cn-fonts
/usr/share/licenses/adobe-source-han-sans-cn-fonts
/usr/share/licenses/adobe-source-han-sans-cn-fonts/LICENSE.txt
/var/lib/dnf/system-upgrade/fedora-376ef8e983c65ce0/packages/adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm
[root@cholla log]# locate kernel-debug-modules-6.7.10
/var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm
[root@cholla licenses]# cd /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages
[root@cholla packages]# rm kernel-debug-modules-6.7.10-100.fc38.x86_64.rpm
[root@cholla packages]# cd /var/lib/dnf/system-upgrade/fedora-376ef8e983c65ce0/packages
[root@cholla packages]# rm adobe-source-han-sans-cn-fonts-2.004-4.fc38.noarch.rpm
dnf -y system-upgrade download --refresh --releasever=38

It skips 3453 packages and downloads the kernel package again. It trips over the adobe thing I deleted, then complains "GPG check FAILED".
Note that you can pass --nogpgcheck to dnf.
cd /var/lib/dnf
rm -rf system-up*
dnf -y system-upgrade download --refresh --releasever=38

Away it goes again: 3453 packages.
And now I will try the reboot step, but tell it not to perform GPG checks. This will just be a brief leapfrog through f38 to f39 anyway. And I don't care much about the kernel debug modules. It is honestly quite nice that it drops back to 37 on an error, rather than giving me a messed up system.
dnf system-upgrade reboot --nogpgcheck

Away we go again at 10:23 PM. And at 10:29 we are in trouble again. This time it ends with:

2024-03-28T22:26:04-0700 DEBUG Using rpmkeys executable at /usr/bin/rpmkeys to verify signatures
2024-03-28T22:28:13-0700 CRITICAL Problem opening package edk2-ovmf-20230524-3.fc38.noarch.rpm
2024-03-28T22:28:13-0700 CRITICAL Error: GPG check FAILED

I try cleaning up and downloading again:

dnf clean all
dnf upgrade --refresh
dnf -y system-upgrade download --refresh --releasever=38

error: /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/edk2-ovmf-20230524-3.fc38.noarch.rpm: Fread failed: Input/output error
error: /var/lib/dnf/system-upgrade/updates-b7ba662710b98f1a/packages/firefox-langpacks-124.0-1.fc38.x86_64.rpm: Fread failed: Input/output error
Problem opening package edk2-ovmf-20230524-3.fc38.noarch.rpm
Problem opening package firefox-langpacks-124.0-1.fc38.x86_64.rpm
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
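In hindsight, "Fread failed: Input/output error" means the file could not be read back from disk, which is a different thing from a bad signature. A quick way to test for that, sketched here with a made-up helper name (check_readable), is simply to read every downloaded package from start to end:

```shell
# Print every file among the arguments that cannot be read end to end.
# check_readable is a hypothetical helper name.
check_readable() {
    for f in "$@"; do
        # Reading the whole file makes the kernel fetch every block,
        # so an unreadable sector shows up as a failed read.
        cat -- "$f" > /dev/null 2>&1 || echo "$f"
    done
}

# Usage:
#   check_readable /var/lib/dnf/system-upgrade/*/packages/*.rpm
```

One caveat: files that were just downloaded may still be sitting in the page cache, so a clean result right after a download does not prove much. "rpm -K FILE" goes further and checks the digests and signatures of each package.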
This is erratic enough I am starting to wonder about disk problems.
And indeed, a look at /var/log/messages shows a lot of this.
(but only for March 28 .....)
ata1.00: status: { DRDY ERR }
Mar 28 22:43:54 cholla kernel: ata1.00: error: { UNC }
Mar 28 22:43:54 cholla kernel: ata1.00: configured for UDMA/133
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Medium Error [current]
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 28 22:43:54 cholla kernel: sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 04 3d ca b8 00 00 08 00
Mar 28 22:43:54 cholla kernel: I/O error, dev sda, sector 71158456 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Mar 28 22:43:54 cholla kernel: ata1: EH complete
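These messages can be picked apart a little. Bytes 2 through 5 of the Read(10) CDB are the block address, and 04 3d ca b8 in hex is the same sector 71158456 that the I/O error line reports. Checked against the fdisk table above, that sector falls inside /dev/sda2, the root partition, which fits with the upgrade downloads being the files that went bad. The helpers below are a sketch with made-up names, not a standard tool:

```shell
# Count "I/O error" lines per device in kernel log text read from stdin.
# All three function names here are hypothetical.
io_errors_by_dev() {
    sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector.*/\1/p' | sort | uniq -c
}

# Convert the 4-byte block address of a Read(10) CDB to a decimal sector.
# Usage: cdb_lba 04 3d ca b8    (prints 71158456)
cdb_lba() {
    printf '%d\n' "0x$1$2$3$4"
}

# Print the partition(s) whose start..end range contains sector $1.
# Reads "name start end" lines on stdin, that is, the first three
# columns of the fdisk -l table with the boot flag (*) removed.
sector_to_part() {
    awk -v s="$1" 's >= $2 && s <= $3 { print $1 }'
}

# Usage:
#   io_errors_by_dev < /var/log/messages
#   printf '%s\n' '/dev/sda2 1230848 106088447' | sector_to_part 71158456
```

For a direct look at the health of the drive itself, "smartctl -a /dev/sda" from the smartmontools package is the usual tool. I did not run it as part of this session.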
The disk is a WDC WD20EZRZ-00Z 2T disk.
This is a "blue" 5400 rpm drive with 64M of cache.
-- Enough for one night, time for bed.
I get on Amazon and do some searching. I find a WD20EZBZ for $68. It is a 7200 rpm "blue" with 32M cache, manufactured Feb 3, 2024, with a 2 year warranty.
Interestingly the system runs just fine for the next 3 days (over the weekend) while I wait for the new disk to arrive and find time in my schedule to work on this some more. No errors in /var/log/messages.
I realize after ordering the new drive, that I have a 2T drive, unused, on my shelf. It is a WD2000F9YZ, "black", 7200 rpm with 64M cache. Manufactured in Dec, 2013!! I see this exact model selling on Amazon for $39 and called a "datacenter drive" with a yellow label. Who knows what all of these colors and words mean. Or why what used to be a "black" is now a "yellow" datacenter drive.
So which drive do I use? I am inclined towards the recently manufactured "blue" with a 2 year warranty. I selected it on Amazon primarily because it was being sold by the "Western Digital Store". The only downside is the 32M cache.
My plan is to install F39 on the new disk at home, then transport it to my office, copy the /u1 files from the failing disk to the new disk, then configure the new disk with the proper IP address and set up ssh and the web server. Honestly, if I just set up the IP address and get ssh running, I can go home to do everything else.
All of this will be another page. See the following:
Adventures in Computing / tom@mmto.org