Preconditioned parallel solvers based on Krylov iterative methods are widely used in scientific and engineering applications.
Communication overhead is a critical issue when these solvers run on large-scale, massively parallel supercomputers.
In previous work, we introduced communication-computation overlapping with OpenMP dynamic loop scheduling into the sparse matrix-vector multiplication (SpMV) of a parallel Conjugate Gradient (CG) iterative solver in a parallel finite element application (GeoFEM/Cube) on multicore and manycore clusters.
In the present work, we first re-evaluated the method on our new system, Wisteria/BDEC-01 (Odyssey), a Fujitsu PRIMEHPC FX1000 with A64FX processors, and obtained a significant performance improvement of 25-30% in the parallel iterative solver on 2,048 nodes (98,304 cores).
Moreover, we proposed a new reordering method for communication-computation overlapping in incomplete Cholesky preconditioned CG (ICCG) solvers of a parallel finite volume application (Poisson3D/Dist), and attained a 5-12% improvement on 1,024 nodes of Odyssey.