This new PC is showing serious instability. It has just thrown up
hundreds (yes, hundreds) of kernel warnings like this:
On Tue, 08 Jun 2021 06:28:15 -0400, Grimble <grimble@nomail.afraid.org> wrote:
Please open a bug report so it can be referred to people more experienced with debugging kernel problems.
In the bug report, include the output of inxi -MSGxx and attach the output of
journalctl --no-hostname -b > journal.txt, with everything before the failure deleted.
My less-than-expert experience suggests it may be a GPU (not CPU) heat problem, or a problem with the GPU video driver/firmware. I haven't used BOINC in the last year or two, but IIRC it has an option for whether or not to use the GPU. Try disabling that so it only uses the CPU.
Regards, Dave Hodgins
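Deleting everything before the failure can also be scripted. The example below is a minimal sketch. It assumes the warnings begin with the kernel's usual "------------[ cut here ]------------" banner. That marker, the script name trim_journal.py and the output file name are all assumptions, so adjust them to match what actually appears in journal.txt.

```python
"""Trim a journalctl dump so it starts at the first kernel warning.

A minimal sketch: it assumes the failure is marked by the kernel's
"cut here" banner. Change MARKER if the log shows something else.
"""
import sys
from typing import Iterable, List

# Assumed marker: the banner the kernel prints before a WARNING splat.
MARKER = "------------[ cut here ]------------"


def trim_before_failure(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return the lines from the first one containing `marker` onward.

    If the marker never appears, an empty list is returned so that
    unrelated boot messages are not attached by mistake.
    """
    kept: List[str] = []
    for line in lines:
        # Start keeping at the first marker line, then keep everything after it.
        if kept or marker in line:
            kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 trim_journal.py journal.txt > journal-trimmed.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        sys.stdout.writelines(trim_before_failure(f))
```

For example, run journalctl --no-hostname -b > journal.txt first, then python3 trim_journal.py journal.txt > journal-trimmed.txt. If the marker is never found, the output is empty rather than the whole journal. That way a wrong marker gets noticed instead of silently attaching the entire boot log.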
On 9/6/21 2:39 am, David W. Hodgins wrote:
I am still running BOINC, but disabled it due to high CPU usage. I will watch Grimble's progress with interest. The only other problem I had was a "WONTFIX" bug. It keeps generating lines in the journal every second, similar to the following:
[CODE]
Jun 12 21:44:23 dougshost.douglaidlaw.net boinc[2081]: No protocol specified
[/CODE]
This seems to have been around for a while, and nobody at BOINC knows what to do about it.
Doug, I've noticed the same problem. Googling around, it seems to be
Googled "Mageia 8 iommu" and this popped up first: https://forums.mageia.org/en/viewtopic.php?f=41&t=6936
which describes errors that are very similar to the ones I have been experiencing. (Reminder: this is a Ryzen 6-core processor.) The BIOS had IOMMU = Automatic. I changed it to "Enabled", so let's see what happens.
The post and a linked post also suggest adding "iommu=soft" or "iommu=pt" to the boot menu. I would welcome some informed opinion as to which to use.
On Fri, 18 Jun 2021 08:37:16 -0400, Grimble <grimble@nomail.afraid.org> wrote:
Thanks for the info. The /usr/share/doc/kernel-doc/admin-guide/kernel-parameters.txt file from the kernel-doc package only lists the various possible settings for iommu, with no details on what they do or what they are abbreviations of.
In general, when a device parameter is set to soft, I expect that means the kernel handles the interrupts etc. in software rather than relying on the processor built into the memory management hardware chip. I have no idea what pt would refer to.
Keep watching for any BIOS/UEFI firmware updates from the motherboard manufacturer that fix the IOMMU issues.
Regards, Dave Hodgins
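Before looking for firmware updates, it helps to know which BIOS/UEFI version is currently installed. On Linux, the kernel exports board and BIOS identification as files under /sys/class/dmi/id, and the sketch below simply reads them. The helper name firmware_info is hypothetical. Missing or unreadable fields are reported as "unknown" rather than treated as errors.

```python
"""Print the installed board and BIOS/UEFI firmware version, for
comparison with the motherboard manufacturer's download page."""
from pathlib import Path
from typing import Dict

# DMI attributes the Linux kernel exports under /sys/class/dmi/id.
FIELDS = ("board_vendor", "board_name", "bios_version", "bios_date")


def firmware_info(base: Path = Path("/sys/class/dmi/id")) -> Dict[str, str]:
    """Read the DMI fields that identify the board and its firmware.

    Any field that is missing or unreadable is reported as "unknown".
    """
    info: Dict[str, str] = {}
    for name in FIELDS:
        try:
            info[name] = (base / name).read_text(encoding="utf-8").strip()
        except OSError:
            info[name] = "unknown"
    return info


if __name__ == "__main__":
    for key, value in firmware_info().items():
        print(f"{key}: {value}")
```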
On 2021-06-18, David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
https://unix.stackexchange.com/questions/592538/what-are-the-implication-of-using-iommu-force-in-the-boot-kernel-options
seems to have some very brief explanations. I found it with a Google search for "linux iommu".
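Whichever of iommu=soft or iommu=pt ends up being tried, it is worth checking after a reboot that the option actually reached the kernel. /proc/cmdline lists the parameters the running kernel was booted with. The sketch below assumes Python 3 is available, and the function name iommu_options is hypothetical. It only reports the generic iommu= parameter, not vendor-specific ones such as amd_iommu=.

```python
"""Report which iommu= options the running kernel was booted with."""
from typing import List


def iommu_options(cmdline: str) -> List[str]:
    """Return every value given to the generic iommu= parameter, in order.

    Vendor-specific parameters such as amd_iommu= are deliberately
    ignored; only tokens whose name is exactly "iommu" are reported.
    """
    values: List[str] = []
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == "iommu" and sep:
            values.append(value)
    return values


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the parameters of the currently running kernel.
        with open("/proc/cmdline", encoding="utf-8") as f:
            print("iommu options:", iommu_options(f.read()) or "none")
    except FileNotFoundError:
        print("/proc/cmdline not found (not running on Linux?)")
```

If it prints "none" after a reboot, the option did not make it onto the kernel command line.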