Error log-Parallel UDF

What should I do when hwloc reports "operating system" warnings?

When the operating system reports invalid locality information (because of either software or hardware bugs), hwloc may fail to insert some objects in the topology because they cannot fit in the already built tree of resources. If so, hwloc will report a warning like the following. The object causing this error is ignored and the discovery continues, but the resulting topology will miss some objects and may be asymmetric (see also "What happens if my topology is asymmetric?" in the hwloc FAQ).
****************************************************************************
* hwloc has encountered what looks like an error from the operating system.
*
* L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
* Error occurred in topology.c line 940
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology script.
****************************************************************************
These errors are common on large AMD platforms because of BIOS and/or Linux kernel bugs causing invalid L3 cache information. In the above example, the hardware reports an L3 cache that is shared by 2 cores in the first NUMA node and 4 cores in the second NUMA node. That is wrong: the cache should actually be shared by all 6 cores in a single NUMA node. The resulting topology will miss some L3 caches.
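
The "2 cores" and "4 cores" figures can be read directly from the two cpuset bitmasks in the warning. The following is a minimal sketch, assuming each set bit in a cpuset stands for one core, as in the description above. The helper name `cpus_in_mask` is hypothetical and is not part of hwloc.

```python
def cpus_in_mask(mask):
    """Return the set of bit indexes that are set in a cpuset bitmask."""
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


# Both masks are copied from the warning message above.
l3 = cpus_in_mask(0x000003F0)     # cpuset reported for the L3 cache
numa0 = cpus_in_mask(0x0000003F)  # cpuset reported for NUMANode P#0

inside = sorted(l3 & numa0)   # L3 cores that belong to NUMANode P#0
outside = sorted(l3 - numa0)  # L3 cores that fall outside NUMANode P#0

# "Intersects without inclusion": the sets overlap, but neither contains the other.
intersects_without_inclusion = (
    bool(l3 & numa0) and not l3 <= numa0 and not numa0 <= l3
)

print("L3 cores inside NUMANode P#0: ", inside)
print("L3 cores outside NUMANode P#0:", outside)
print("Intersects without inclusion: ", intersects_without_inclusion)
```

Two of the six cores covered by the L3 mask overlap with the NUMA node and four do not, which is the partial overlap that hwloc cannot place in a tree.
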
If your application does not care about cache sharing, or if you do not plan to request cache-aware binding in your process launcher, you can most likely ignore this error (and hide it by setting HWLOC_HIDE_ERRORS=1 in your environment).
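
When the variable should apply only to one launched program rather than the whole session, it can be set on a copy of the environment. The following is a minimal sketch: the helper name `env_with_hidden_hwloc_errors` is hypothetical, and the child command is a stand-in that only echoes the variable. In practice you would replace it with your own solver or MPI launcher command.

```python
import os
import subprocess
import sys


def env_with_hidden_hwloc_errors(base=None):
    """Return a copy of the environment with HWLOC_HIDE_ERRORS set to "1"."""
    env = dict(os.environ if base is None else base)
    env["HWLOC_HIDE_ERRORS"] = "1"
    return env


# Stand-in child process (an assumption for illustration only):
# it prints the variable so we can see that it was inherited.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['HWLOC_HIDE_ERRORS'])",
]

result = subprocess.run(
    child,
    env=env_with_hidden_hwloc_errors(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Hiding the message does not repair the topology. The affected L3 caches are still missing, so this is only appropriate when cache-aware binding is not needed.
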
Some platforms report similar warnings about conflicting Packages and NUMANodes. Upgrading the BIOS and/or the operating system may help. Otherwise, as explained in the message, reporting the issue to the hwloc developers (by sending the tarball generated by the hwloc-gather-topology script on this platform) is a good way to determine whether this is a software bug (operating system) or a hardware bug (BIOS, etc.).
