The development of hybrid supercomputers provides an opportunity to solve extremely large physics problems. However, current hardware introduces potential bottlenecks, such as launching accelerator kernels and moving data to and from accelerator memory. We investigate ``heavy smoothing'' within a multigrid preconditioner, combining aggressive coarsening with heavy smoothers. This approach can be shown to maintain optimal scaling while allowing a tradeoff between local work and non-local communication, which is often the bottleneck for tightly-coupled problems on today's largest supercomputers. The smoothers are defined as overlapping local smoothers. In particular, we examine the computation and application of the recently developed Chow-Patel fine-grained ILU decomposition, which is amenable to computation on an accelerator. The implementation is done within ViennaCL and PETSc. We present software contributions that provide the required capabilities in a portable way, along with results of applying the smoothers to precondition highly ill-conditioned linear systems arising from mantle convection simulations with highly heterogeneous viscosity structure, using pTatin3D.
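The Chow-Patel approach computes the ILU(0) factors as the fixed point of a set of independent equations, one per nonzero of the sparsity pattern, which is why it suits fine-grained accelerator parallelism. The following is a minimal, serial Python sketch of that fixed-point iteration under stated assumptions: a dense list-of-lists matrix whose nonzeros define the pattern, a nonzero diagonal, and an initial guess taken from the lower and upper parts of A. The function names `chow_patel_ilu0` and `pattern_residual` are hypothetical; this is not the ViennaCL or PETSc implementation described above.

```python
def chow_patel_ilu0(A, sweeps):
    """Fixed-point ILU(0) sweeps in the spirit of Chow & Patel (illustrative sketch).

    Assumptions: A is dense (list of lists), its nonzeros define the pattern S,
    and every diagonal entry is nonzero. Returns (L, U), L unit lower triangular.
    """
    n = len(A)
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
    # Assumed initial guess: L from the strictly lower part of A scaled by the
    # diagonal, U from the upper part (including the diagonal).
    for (i, j) in pattern:
        if i > j:
            L[i][j] = A[i][j] / A[j][j]
        else:
            U[i][j] = A[i][j]
    for _ in range(sweeps):
        # Jacobi-style sweep: every update reads only the previous iterate,
        # so all entries in the pattern could be computed concurrently.
        L_old = [row[:] for row in L]
        U_old = [row[:] for row in U]
        for (i, j) in pattern:
            m = min(i, j)
            s = sum(L_old[i][k] * U_old[k][j] for k in range(m))
            if i > j:
                L[i][j] = (A[i][j] - s) / U_old[j][j]
            else:
                U[i][j] = A[i][j] - s
    return L, U


def pattern_residual(A, L, U):
    """Largest |(LU)_ij - a_ij| over the nonzero pattern of A."""
    n = len(A)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0.0:
                lu = sum(L[i][k] * U[k][j] for k in range(n))
                worst = max(worst, abs(lu - A[i][j]))
    return worst
```

For a tridiagonal matrix ILU(0) produces no fill, so after enough sweeps the product LU matches A on its pattern; in practice only a few sweeps are used and the factors are approximate, which is acceptable when they serve as a smoother.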