Description
This may have something to do with #1678. We had a user come to us with a parallel DisplacedProblem
simulation running with --distributed-mesh
that fails with a PETSc error like:
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Incompatible vector local lengths parameter # 1 local size 4644 != parameter # 2 local size 4632
The problem occurs when we try to sync the displaced_system.current_local_solution
with the undisplaced_system.current_local_solution
. The global and local sizes of these vectors are the same; however, the _send_list
is different. The _send_list
is larger for the undisplaced_system
because the number of elements to ghost is larger by two elements (for both processes: 67 vs 65 for processor 0; 65 vs 63 for processor 1). I latched onto one of the elements (element 1448) that is ghosted for the undisplaced_system
on processor 0 but not for the displaced_system
. 1448 is a neighbor of element 664 in the undisplaced system, but not in the displaced system!
An lldb summary:
(lldb) b EquationSystems::init
Breakpoint 1: where = libmesh_dbg.so.0`libMesh::EquationSystems::init() + 19 at equation_systems.C:98, address = 0x00007f55759b9843
(lldb) c
Process 30667 resuming
Process 30667 stopped
* thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1
frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f5a3d8) at equation_systems.C:98
95
96 void EquationSystems::init ()
97 {
-> 98 const unsigned int n_sys = this->n_systems();
99
100 libmesh_assert_not_equal_to (n_sys, 0);
101
This corresponds to the the undisplaced equation systems:
(lldb) p _mesh.elem_ptr(664) (libMesh::Tri3 *) $0 = 0x0000000001dbb330 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0) (libMesh::Tri3 *) $1 = 0x0000000001e2aa60 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1) (libMesh::Tri3 *) $2 = 0x0000000001e29800 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2) (libMesh::Tri3 *) $3 = 0x0000000001e2a520 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id() (libMesh::dof_id_type) $5 = 1448 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id() (libMesh::dof_id_type) $6 = 678 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id() (libMesh::dof_id_type) $7 = 683 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid() (libMesh::Point) $8 = { libMesh::TypeVector = { _coords = ([0] = -7.607472007162869, [1] = -2.5389266380419335, [2] = -23.063092092672985) } } (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid() (libMesh::Point) $9 = { libMesh::TypeVector = { _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687) } } (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid() (libMesh::Point) $10 = { libMesh::TypeVector = { _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688) } } (lldb) c Process 30667 resuming Process 30667 stopped * thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1 frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f451a8) at equation_systems.C:98 95 96 void EquationSystems::init () 97 { -> 98 const unsigned int n_sys = this->n_systems(); 99 100 libmesh_assert_not_equal_to (n_sys, 0); 101
This corresponds to the displaced systems:
(lldb) p _mesh.elem_ptr(664) (libMesh::Tri3 *) $11 = 0x0000000001e2b960 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0) (libMesh::Tri3 *) $12 = 0x0000000001edbbf0 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1) (libMesh::Tri3 *) $13 = 0x0000000001eda530 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2) (libMesh::Tri3 *) $14 = 0x0000000001eda990 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id() (libMesh::dof_id_type) $15 = 704 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id() (libMesh::dof_id_type) $16 = 678 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id() (libMesh::dof_id_type) $17 = 683 (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid() (libMesh::Point) $18 = { libMesh::TypeVector = { _coords = ([0] = -14.503511334148547, [1] = -0.97677584178745746, [2] = -20.124373932679493) } } (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid() (libMesh::Point) $19 = { libMesh::TypeVector = { _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687) } } (lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid() (libMesh::Point) $20 = { libMesh::TypeVector = { _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688) } } (lldb)
Summary
The zeroth neighbor of element 664 for undisplaced
has a totally different id and centroid than the zeroth neighbor for the displaced
systems.
Important notes
- The displaced mesh is copy constructed from the undisplaced mesh after the latter is partitioned; if I pass
skip_find_neighbors = true
toUnstructuredMesh::copy_nodes_and_elements
, then there is no issue whatsoever, so for me this points to something happening inUnstructuredMesh::find_neighbors
. - This is a mesh whose highest dimensional elements are
TET4
but that (as shown by the lldb session) contains lower dimensionalTRI3
elements. The issue here is happening for aTRI3
element and neighbors, but perhaps the presence of the higher dimensionality elements is playing a role?