MOCFE-Bone: The 3D-MOC Mini-Application for Exascale Research
Last Updated: 08/23/13
==Introduction==
MOCFE-Bone is a mini-application that simulates the main procedures in a 3D method of characteristics (MOC) code for numerical solution of the steady-state neutron transport equation. 3D MOC was chosen as a methodology of interest for exascale computing in nuclear reactor applications because of its heterogeneous geometry capability, high degree of accuracy, and potential for scalability.
The MOCFE-Bone mini-app focuses on emulating the behavior of a production 3D MOC tool without requiring the input overhead associated with such a code. MOCFE-Bone does not provide actual solutions to a given problem as it does not properly connect the systems of equations. However, it simulates the communication and computational requirements in correct proportions. The mini-app only requires a few command line arguments to run.
==Methodology==
The system of equations (Ax=b) is solved using a multigroup Krylov formulation (restarted GMRES). The matrix A is not explicitly formed and is implemented through on-the-fly actions that require much less memory than storing A explicitly.
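The on-the-fly action can be illustrated with a minimal sketch (plain Python, not taken from MOCFE-Bone; the tridiagonal stencil below is a stand-in for the actual transport operator). The point is that A*x is computed from a rule, so the matrix itself is never stored.

```python
# Sketch of a matrix-free operator action (illustrative only; not MOCFE-Bone code).
# For a 1D operator A with stencil (-1, 2, -1), the product A*x can be computed
# on the fly from the stencil, so A is never stored explicitly.

def apply_A(x):
    """Return y = A*x without forming A (O(n) memory)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i + 1 < n:
            y[i] -= x[i + 1]
    return y

def apply_A_explicit(x):
    """Same product via an explicitly stored dense matrix (O(n^2) memory)."""
    n = len(x)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i + 1 < n:
            A[i][i + 1] = -1.0
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0]
assert apply_A(x) == apply_A_explicit(x)
```

A Krylov method such as GMRES only ever needs the action of A on a vector, which is why this memory saving is possible.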
The following discretizations are used.
Energy: Multigroup Approximation
Space: Structured 3D Cartesian
This is more restrictive than a production 3D MOC code; the choice was made to avoid implementing the complex record-keeping needed to trace rays in an unstructured geometry. It also simplifies the communication pattern: a block has at most 6 nearest neighbors (the count can be much higher in general 3D MOC).
Angle: Discrete ordinates
MOCFE-Bone uses a flat source approximation (source within an element is restricted to a flat function in space).
==Obtain the MOCFE-Bone code==
Download the source code from https://svn.mcs.anl.gov/repos/cesar-codes/mini_apps/mocfe_bone
==Linking and Compiling==
MOCFE-Bone is written in Fortran and uses pure MPI parallelism. The makefiles Makefile.txt and Makefile_BGP.txt are provided. The user should specify the installed MPI directory and preferred compilers if they differ from those specified in the makefile.
Copy the appropriate Makefile text file into a new file called Makefile and type make. An executable called emulate_MOC.x will be created.
==Usage==
The mini-app is controlled by command line options; however, the following aspects cannot be changed:
Element Size (Fixed):
The element size is fixed (0.125 cm x 0.125 cm x 0.14 cm) inside the code, which is consistent with typical element sizes in MOC codes that use a flat source approximation. When you specify more elements, you are therefore specifying a physically larger problem. The local domain (per processor) is fixed in shape, but its size can be increased using MeshScale (see below). To elongate the geometry, increase the number of processors in one of the directions (X, Y, Z) with input options <6>, <7>, or <8>.
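Because the element size is fixed, the physical extent of the problem follows directly from MeshScale and the processor counts. A minimal sketch of the arithmetic (plain Python; the function and variable names are illustrative, not from the code):

```python
# Fixed element size hard-wired in MOCFE-Bone (cm).
DX, DY, DZ = 0.125, 0.125, 0.14

def physical_size(mesh_scale, px, py, pz):
    """Physical problem extent (cm) for a given MeshScale and
    processor counts along X, Y, Z."""
    return (mesh_scale * px * DX,
            mesh_scale * py * DY,
            mesh_scale * pz * DZ)

# Increasing the Z-processor count elongates the geometry along Z.
print(physical_size(32, 4, 4, 2))   # -> (16.0, 16.0, 8.96)
```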
Composition/Geometry (Fixed):
Material cross sections are hard-wired such that a well-posed system of equations will be produced for any number of energy groups. For more details, see the user manual.
Command Line Options (User choice):
Example Usage:
emulate_MOC.x <1> <2> <3> <4> <5> <6> <7> <8> <9> <10> <11>
<1> Number of energy groups
<2> Number of energy groups per process (between 1 and <1>)
<3> Number of angles (multiple of 8)
<4> Number of angles per process (must be a factor of <3>)
<5> MeshScale: Number of cells along the X,Y,Z directions (same in all directions) within a decomposed spatial domain ("local subdomain"). MeshScale^3 therefore gives the total number of elements per process.
<6> Number of X-processors (at least 1). Increasing this value to N means that you want to repeat the local subdomain N times along the X-axis. (A local subdomain consists of "MeshScale*MeshScale*MeshScale" elements.)
<7> Number of Y-processors (at least 1)
<8> Number of Z-processors (at least 1)
<9> Trajectory Area (a value less than 0.1 is recommended). Decrease this number if the output reports "missed elements". The resulting trajectory density is 1/(Trajectory Area) per cm^2.
<10> Number of Krylov iterations (use at least 60). This value would be computed automatically in a real MOC code, but because the problem is not fully solved here, it is treated as a parameter. Choose the number of iterations relative to the number of GMRES back vectors (<11>): use a ratio of 60 iterations to 30 back vectors (per outer iteration) in serial, and 120 iterations to 30 back vectors when parallelizing in space.
<11> GMRES Back Vectors (use at least 30).
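The constraints on options <1>-<8> and the quantities they imply can be sketched as a small checker (plain Python; the function is illustrative and assumes that energy, angle, and space parallelism multiply into the total rank count, consistent with the option descriptions above):

```python
def check_and_count(groups, groups_per, angles, angles_per,
                    mesh_scale, px, py, pz):
    """Validate the option constraints and return the implied
    total MPI rank count and global element count (illustrative)."""
    assert 1 <= groups_per <= groups      # <2> between 1 and <1>
    assert angles % 8 == 0                # <3> must be a multiple of 8
    assert angles % angles_per == 0       # <4> must be a factor of <3>
    assert min(px, py, pz) >= 1           # <6>, <7>, <8> at least 1
    # Parallelism multiplies across energy, angle, and space.
    ranks = (groups // groups_per) * (angles // angles_per) * px * py * pz
    elements = (mesh_scale ** 3) * px * py * pz
    return ranks, elements

# The run from the ==Example== section: 32 ranks, 1,048,576 elements.
print(check_and_count(2, 2, 8, 8, 32, 4, 4, 2))   # -> (32, 1048576)
```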
==Example==
An example with parallelization in space.
mpiexec -n 32 emulate_MOC.x 2 2 8 8 32 4 4 2 0.01 120 30
This problem uses 2 energy groups with 2 energy groups per process (no parallelization in energy). It uses 8 angles with 8 angles per process (no parallelization in angle). It uses a MeshScale of 32. This makes the local subdomain equal to (32x32x32) elements.
There are 4 processors in the X direction, 4 processors in the Y direction, and 2 processors in the Z direction, for a total of 32 spatial processors. Therefore, the total number of elements is (32x4) x (32x4) x (32x2), i.e. 1,048,576 elements divided evenly over 32 processors. The physical problem size is, correspondingly, (32x4x0.125) x (32x4x0.125) x (32x2x0.14), or 16 cm x 16 cm x 8.96 cm. Trajectories will be assigned to the domain surfaces with a minimum density of 1/0.01 cm^2, or 100/cm^2. The Krylov method (restarted GMRES) will be executed for 120 iterations, using 30 back vectors. The number of GMRES iterations was chosen to be roughly double the serial recommendation to simulate the iteration growth under spatial decomposition.
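The derived quantities quoted above can be checked with a few lines of arithmetic (plain Python, for illustration only):

```python
# Derived quantities for the example run (illustrative arithmetic only).
traj_area = 0.01          # option <9>: trajectory area in cm^2
iterations = 120          # option <10>: Krylov iterations
back_vectors = 30         # option <11>: GMRES back vectors

density = 1.0 / traj_area                    # trajectories per cm^2 per surface
restart_cycles = iterations // back_vectors  # implied GMRES restart cycles

print(density, restart_cycles)   # -> 100.0 4
```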
==Output==
The output contains an echo of the input options used; counts of global elements, trajectories, and intersections; and a report of any missed elements. If missed elements are reported, rerun with a smaller trajectory area, as missed elements will destroy the load balance.
The number of trajectories broken in (X,Y,Z) indicates the additional degrees of freedom added to the problem by spatial decomposition. For example, if parallelization is specified only along the X direction, broken trajectories will appear only along X. The time reported at the end of the simulation is the time spent in GMRES only. The convergence criterion and norm are irrelevant because the real problem is not being solved.
In addition, a file called "ParallelSetupDetails.out" is created, which indicates how the degrees of freedom are divided among processors (ranks). See the user manual for a further description of this output file.