In many applications in computational science and engineering, the solution of a partial differential equation is sought. These applications often demand enormous amounts of compute power or memory, thus requiring the use of supercomputers. For many problems, multigrid methods are optimal solvers: the convergence rate is bounded from above independently of the system size, and the number of arithmetic operations grows linearly with the system size. Multigrid methods have been developed especially for the solution of the linear systems that arise when partial differential equations are discretized. They rely on a grid hierarchy that is naturally available when structured grids are used; if this is not the case, other, more expensive techniques such as algebraic multigrid have to be used. Besides allowing for computationally cheaper geometric multigrid methods, the presence of structure also enables more efficient implementations on modern computer architectures, including GPUs and vector units in general. We work on highly scalable multigrid methods for high-performance computers and accelerators. This includes the design and analysis of coarse-grid and grid transfer operators, as well as the development of smoothers with a special emphasis on scalability, e.g., block smoothers that possess a higher arithmetic complexity and better smoothing properties, resulting in a shorter time to solution. Additionally, for time-dependent problems, parallelization in time is employed. In the talk, our work on block smoothers and analysis techniques will be presented. Further, results on different parallel architectures will be shown, including the solution of parabolic PDEs using parallelization in time.
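The ingredients named above (a grid hierarchy, a smoother, and grid transfer operators) can be illustrated with a minimal geometric multigrid V-cycle for the 1D Poisson equation on a structured grid. This is a generic textbook sketch, not the method discussed in the talk; the weighted Jacobi smoother, full-weighting restriction, and linear-interpolation prolongation used here are standard illustrative choices, and all names are hypothetical.

```python
import numpy as np

def apply_A(u, h):
    """Apply the 1D Poisson operator -u'' (central differences,
    homogeneous Dirichlet boundary conditions) to a vector u."""
    up = np.concatenate(([0.0], u, [0.0]))          # pad with boundary zeros
    return (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2

def jacobi(u, f, h, omega=2.0 / 3.0, nu=2):
    """nu sweeps of weighted Jacobi; the diagonal of A is 2/h^2."""
    for _ in range(nu):
        u = u + omega * (h**2 / 2.0) * (f - apply_A(u, h))
    return u

def restrict(r):
    """Full-weighting restriction: fine residual -> next coarser grid."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(ec):
    """Linear-interpolation prolongation: coarse correction -> fine grid."""
    ef = np.zeros(2 * ec.size + 1)
    ef[1::2] = ec                                    # coincident points
    ecp = np.concatenate(([0.0], ec, [0.0]))
    ef[0::2] = 0.5 * (ecp[:-1] + ecp[1:])            # interpolated points
    return ef

def v_cycle(u, f, h, nu=2):
    """One V(nu,nu)-cycle on a hierarchy with n = 2^k - 1 interior points."""
    if u.size == 1:                                  # coarsest grid: exact solve
        return np.array([f[0] * h**2 / 2.0])
    u = jacobi(u, f, h, nu=nu)                       # pre-smoothing
    r = f - apply_A(u, h)                            # fine-grid residual
    ec = v_cycle(np.zeros((u.size - 1) // 2), restrict(r), 2.0 * h, nu)
    u = u + prolong(ec)                              # coarse-grid correction
    return jacobi(u, f, h, nu=nu)                    # post-smoothing

if __name__ == "__main__":
    # Solve -u'' = pi^2 sin(pi x), whose exact solution is u = sin(pi x).
    n = 63
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    f = np.pi**2 * np.sin(np.pi * x)
    u = np.zeros(n)
    for _ in range(10):
        u = v_cycle(u, f, h)
    print("max error:", np.max(np.abs(u - np.sin(np.pi * x))))
```

Because the per-cycle error reduction factor is bounded independently of n, the number of cycles needed for a fixed tolerance stays constant as the grid is refined, and each cycle costs O(n) operations; this is the optimality property stated above. A block smoother would replace the pointwise Jacobi update with local block solves of higher arithmetic cost but stronger smoothing.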