Skip to content

find_neighbors finds a different neighbor? #2026

Open
@lindsayad

Description

@lindsayad

This may have something to do with #1678. We had a user come to us with a parallel DisplacedProblem simulation running with --distributed-mesh that fails with a PETSc error like:

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Incompatible vector local lengths parameter # 1 local size 4644 != parameter # 2 local size 4632

The problem occurs when we try to sync the displaced_system.current_local_solution with the undisplaced_system.current_local_solution. The global and local sizes of these vectors are the same; however, the _send_list is different. The _send_list is larger for the undisplaced_system because the number of elements to ghost is larger by two elements (for both processes: 67 vs 65 for processor 0; 65 vs 63 for processor 1). I latched onto one of the elements (element 1448) that is ghosted for the undisplaced_system on processor 0 but not for the displaced_system. 1448 is a neighbor of element 664 in the undisplaced system, but not in the displaced system!

An lldb summary:

(lldb) b EquationSystems::init
Breakpoint 1: where = libmesh_dbg.so.0`libMesh::EquationSystems::init() + 19 at equation_systems.C:98, address = 0x00007f55759b9843
(lldb) c
Process 30667 resuming
Process 30667 stopped
* thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1
    frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f5a3d8) at equation_systems.C:98
   95  
   96   void EquationSystems::init ()
   97   {
-> 98     const unsigned int n_sys = this->n_systems();
   99  
   100    libmesh_assert_not_equal_to (n_sys, 0);
   101 

This corresponds to the the undisplaced equation systems:

(lldb) p _mesh.elem_ptr(664)
(libMesh::Tri3 *) $0 = 0x0000000001dbb330
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)
(libMesh::Tri3 *) $1 = 0x0000000001e2aa60
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)
(libMesh::Tri3 *) $2 = 0x0000000001e29800
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)
(libMesh::Tri3 *) $3 = 0x0000000001e2a520
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id()
(libMesh::dof_id_type) $5 = 1448
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id()
(libMesh::dof_id_type) $6 = 678
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id()
(libMesh::dof_id_type) $7 = 683
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid()
(libMesh::Point) $8 = {
  libMesh::TypeVector = {
    _coords = ([0] = -7.607472007162869, [1] = -2.5389266380419335, [2] = -23.063092092672985)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid()
(libMesh::Point) $9 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid()
(libMesh::Point) $10 = {
  libMesh::TypeVector = {
    _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688)
  }
}
(lldb) c
Process 30667 resuming
Process 30667 stopped
* thread #1, name = 'golem-dbg', stop reason = breakpoint 1.1
    frame #0: 0x00007f55759b9843 libmesh_dbg.so.0`libMesh::EquationSystems::init(this=0x0000000001f451a8) at equation_systems.C:98
   95  
   96   void EquationSystems::init ()
   97   {
-> 98     const unsigned int n_sys = this->n_systems();
   99  
   100    libmesh_assert_not_equal_to (n_sys, 0);
   101 

This corresponds to the displaced systems:

(lldb) p _mesh.elem_ptr(664)
(libMesh::Tri3 *) $11 = 0x0000000001e2b960
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)
(libMesh::Tri3 *) $12 = 0x0000000001edbbf0
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)
(libMesh::Tri3 *) $13 = 0x0000000001eda530
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)
(libMesh::Tri3 *) $14 = 0x0000000001eda990
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->id()
(libMesh::dof_id_type) $15 = 704
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->id()
(libMesh::dof_id_type) $16 = 678
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->id()
(libMesh::dof_id_type) $17 = 683
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(0)->centroid()
(libMesh::Point) $18 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.503511334148547, [1] = -0.97677584178745746, [2] = -20.124373932679493)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(1)->centroid()
(libMesh::Point) $19 = {
  libMesh::TypeVector = {
    _coords = ([0] = -14.014609877951443, [1] = -13.641004916901389, [2] = -5.1979944109916687)
  }
}
(lldb) p _mesh.elem_ptr(664)->neighbor_ptr(2)->centroid()
(libMesh::Point) $20 = {
  libMesh::TypeVector = {
    _coords = ([0] = -24.067560513814289, [1] = -17.952677090962727, [2] = -31.695480346679688)
  }
}
(lldb) 

Summary

The zeroth neighbor of element 664 for undisplaced has a totally different id and centroid than the zeroth neighbor for the displaced systems.

Important notes

  1. The displaced mesh is copy constructed from the undisplaced mesh after the latter is partitioned; if I pass skip_find_neighbors = true to UnstructuredMesh::copy_nodes_and_elements, then there is no issue whatsoever, so for me this points to something happening in UnstructuredMesh::find_neighbors.
  2. This is a mesh whose highest dimensional elements are TET4 but that (as shown by the lldb session) contains lower dimensional TRI3 elements. The issue here is happening for a TRI3 element and neighbors, but perhaps the presence of the higher dimensionality elements is playing a role?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions