This time, as an invited lecture, we welcome Professor Markus Püschel of ETH Zurich, who will speak on automatic tuning and machine learning. In addition, Dr. Akihiro Fujii of Kogakuin University and Mr. Ikuro Yamazaki of the University of Tsukuba will give talks on automatic tuning in numerical algorithms.
Banquet (with Professor Markus Püschel): from 19:00
The Tenth Advanced Supercomputing Environment (ASE) Seminar
13th October, 2011
Information Technology Center, The University of Tokyo (Hongo)
4th Floor, Telecom Lecture Room
15:00 - 15:05 Introduction
Takahiro Katagiri (The University of Tokyo)
15:05 - 16:05
Professor Markus Püschel
ETH Zurich, Switzerland
Automatic Performance Tuning and Machine Learning
Automatic performance tuning has emerged as a paradigm complementing traditional compilers to port software and performance between platforms. Several techniques have proven useful, including adaptive libraries, program generation, domain-specific languages, and models. However, one technique is shared by almost all approaches: search for the fastest among a set of alternative implementations. Typically the search space is huge and hence the search is costly. This may be bearable in offline tuning (e.g., ATLAS), which is performed during installation, but becomes cumbersome in online tuning (e.g., FFTW), which is performed at runtime since it requires the input size. We argue that machine learning, which has already been studied and used in the compiler community, can solve this problem and should be added to the portfolio of performance tuning tools. As an example, we show a successful approach to automatically convert Spiral-generated online-tunable transform libraries into offline-tunable ones.
About the speaker
Markus Püschel is a Professor of Computer Science at ETH Zurich, Switzerland since 2010. Before, he was a Professor of Electrical and Computer Engineering at Carnegie Mellon University, where he still has an adjunct status. He received his Diploma (M.Sc.) in Mathematics and his Doctorate (Ph.D.) in Computer Science, in 1995 and 1998, respectively, both from the University of Karlsruhe, Germany. He is a recipient of the Outstanding Research Award of the College of Engineering at Carnegie Mellon and the Eta Kappa Nu Award for Outstanding Teaching. He also holds the title of Privatdozent at the University of Technology, Vienna, Austria. In 2009 he cofounded SpiralGen Inc. Much of his research revolves around automating the production of high performance software for applications of a mathematical nature (the Spiral project). He is also interested in fundamental signal processing theory and algorithms.
16:15 - 16:45
Akihiro Fujii (Kogakuin University), Osamu Nakamura (Sumitomo Metal Industries)
Automatic Tuning of an Algebraic Multigrid Solver for Fluid Analysis
This talk presents an online automatic tuning method for an Algebraic Multigrid (AMG) solver for fluid analysis based on the SMAC method. This type of fluid analysis requires solving a pressure Poisson equation at every time step. As the simulation advances in time, the coefficient matrix does not change, but in many cases the right-hand-side vector changes gradually. To shorten the total solver time, we optimize the AMG solver by selecting its parameter setting for each time step. We consider automatic tuning of AMG solver parameters such as the smoother, the multigrid cycle, and the acceleration coefficient of the smoother. Our auto-tuning method narrows the search domain by determining the parameters in a step-by-step manner, based on typical properties of the AMG solver parameters. In addition, the parameters are re-optimized only once every fixed number of time steps (e.g., every 100 steps) to reduce the overhead of measuring the efficiency of the various parameter settings. In our numerical tests, our AMG library with this online auto-tuning mechanism, which selects an appropriate parameter set from among 900 settings, improved the performance of the AMG solver with the default setting by up to 20 percent.
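The step-by-step narrowing and periodic re-tuning described above can be sketched as follows. This is a minimal illustration, not the talk's library: the parameter names, candidate values, and the synthetic cost function are all hypothetical stand-ins; a real tuner would time actual AMG solves.

```python
# Hypothetical AMG parameter space (illustrative stand-ins for the talk's
# smoother / cycle / acceleration-coefficient choices).
SMOOTHERS = ["jacobi", "gauss_seidel", "ilu"]
CYCLES = ["V", "W"]
OMEGAS = [0.8, 1.0, 1.2]

def solve_time(params, step):
    """Stand-in for timing one AMG solve; returns a synthetic cost so the
    sketch runs. A real tuner would call and time the actual solver here."""
    smoother, cycle, omega = params
    cost = {"jacobi": 3.0, "gauss_seidel": 2.0, "ilu": 2.5}[smoother]
    cost *= 1.0 if cycle == "V" else 1.4
    cost *= 1.0 + abs(omega - 1.0)
    return cost

def tune_stepwise(step):
    """Narrow the search one parameter at a time (3 + 2 + 3 = 8 trials)
    instead of scanning the full cross product (3 * 2 * 3 = 18 trials),
    in the spirit of the talk's step-by-step strategy."""
    best_s = min(SMOOTHERS, key=lambda s: solve_time((s, "V", 1.0), step))
    best_c = min(CYCLES, key=lambda c: solve_time((best_s, c, 1.0), step))
    best_o = min(OMEGAS, key=lambda o: solve_time((best_s, best_c, o), step))
    return (best_s, best_c, best_o)

RETUNE_EVERY = 100  # re-tune once per 100 time steps to bound the overhead
params = None
for step in range(300):
    if step % RETUNE_EVERY == 0:
        params = tune_stepwise(step)
    # ... run the AMG-preconditioned solve for this time step with `params` ...
```

Because each parameter is fixed before the next is searched, the trial count grows additively rather than multiplicatively with the parameter space, which is how a 900-setting space stays tractable for online tuning.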
16:45 - 17:15
Ikuro Yamazaki, Hiroto Tadano, Tetsuya Sakurai (Graduate School of Systems and Information Engineering, University of Tsukuba)
Parameter Selection for Cutoff-Based Preconditioners for Krylov Subspace Methods
Large linear systems involving semi-sparse matrices, which have a relatively large number of nonzero elements, appear in nano-science simulations. Preconditioners with a cutoff have been proposed as suitable methods for such semi-sparse linear systems. The performance of these preconditioners depends strongly on the cutoff parameter: a smaller cutoff value increases the computational cost of constructing the preconditioner, while a larger cutoff value yields a less effective preconditioning matrix and a larger number of iterations. Hence, selecting an appropriate cutoff parameter is important. In this talk, we present a strategy for finding an efficient cutoff parameter for such preconditioners. We estimate an appropriate cutoff parameter from residual norms before applying the iterative solver, and verify the validity of our strategy by numerical experiments.
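The trade-off above can be made concrete with a small sketch. This is not the speakers' method: the cutoff rule (drop entries below a fraction of the largest magnitude), the dense test matrix, and the candidate values are all illustrative assumptions; it only shows the idea of ranking cutoff values by early residual norms before committing to a full solve.

```python
import numpy as np

def cutoff_preconditioner(A, tau):
    """Sparsify A by dropping entries below tau * max|A| (a hypothetical
    cutoff rule; the talk's exact definition may differ) and return a
    function that applies M^{-1}."""
    M = np.where(np.abs(A) >= tau * np.abs(A).max(), A, 0.0)
    np.fill_diagonal(M, np.diag(A))  # always keep the diagonal
    return lambda r: np.linalg.solve(M, r)

def pcg_residuals(A, b, apply_Minv, iters=5):
    """Run a few preconditioned CG iterations; return the residual norms."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    norms = []
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        z_new = apply_Minv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
        norms.append(np.linalg.norm(r))
        if norms[-1] < 1e-12:  # already converged
            break
    return norms

# Rank candidate cutoffs by the residual norm after a few cheap iterations,
# before launching the full solve (synthetic SPD test matrix).
rng = np.random.default_rng(0)
B = rng.normal(scale=0.05, size=(50, 50))
A = B @ B.T + 2.0 * np.eye(50)
b = rng.normal(size=50)
candidates = [0.01, 0.1, 0.5]
best_tau = min(candidates,
               key=lambda t: pcg_residuals(A, b, cutoff_preconditioner(A, t))[-1])
```

A smaller `tau` keeps more of `A` in `M`, making each preconditioner application more expensive but each CG iteration more effective, which is exactly the balance the cutoff parameter controls.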
17:15 - 17:30 Break
17:30 - 18:00
Satoshi Ohshima (The University of Tokyo)
Implementation of a 3D FEM Program on GPU
GPUs are now utilized in many scientific applications because they offer high computational performance and high memory bandwidth. FEM (Finite Element Method) is one of the applications expected to be accelerated by GPUs. This talk presents the implementation of a 3D FEM program on a CUDA GPU. The main targets of acceleration are the sparse matrix solver (CG method) and matrix assembly. Our GPU implementation has achieved better performance than a multi-core CPU, and we are now working on a multi-GPU implementation.
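The kernel that dominates CG iterations, and hence the natural GPU offload target, is the sparse matrix-vector product. A minimal NumPy sketch of CSR SpMV follows; the row-wise loop mirrors the common CUDA mapping of one thread (or warp) per row, though the talk's actual kernel and storage format are not specified here.

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """Sparse matrix-vector product in CSR format. Each row is independent,
    which is what makes the kernel parallelize naturally on a GPU."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):  # one row per GPU thread in the usual mapping
        start, end = indptr[row], indptr[row + 1]
        y[row] = data[start:end] @ x[indices[start:end]]
    return y

# 3x3 example: [[4,1,0],[1,4,1],[0,1,4]] stored in CSR form
indptr  = np.array([0, 2, 5, 7])
indices = np.array([0, 1, 0, 1, 2, 1, 2])
data    = np.array([4., 1., 1., 4., 1., 1., 4.])
x = np.array([1., 1., 1.])
y = spmv_csr(indptr, indices, data, x)
```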
18:00 - 18:30
Masae Hayashi (The University of Tokyo)
OpenMP/MPI Hybrid Parallel FEM Based on Extended Hierarchical Interface Decomposition for Multi-core Clusters
The ILU preconditioner is a powerful and popular preconditioning method for Krylov iterative solvers applied to sparse matrices derived from FEM applications. In parallel computation based on domain decomposition, a block-Jacobi-type localized ILU preconditioner is generally employed. Localizing the ILU process is good for parallel efficiency but decreases the effectiveness of the preconditioning, since it neglects contributions from outside each domain. Hierarchical Interface Decomposition (HID) is a robust and efficient parallel preconditioning method, and we propose an extended version of HID (ExHID) in which additional layers of separators are introduced for more robust and efficient computation. We are developing preconditioning methods using OpenMP/MPI hybrid parallel programming models on multicore/multisocket clusters: HID and ExHID are applied both to inter-node parallelization with message passing (MPI) and to intra-node parallelization with multi-threading (OpenMP). We applied this OpenMP/MPI hybrid parallel programming model to an FEM application solving a 3-dimensional linear elasticity problem. The developed code has been tested on the T2K Open Supercomputer (T2K/Tokyo) using up to 8 nodes, 16 cores. We report the scalability and robustness obtained from numerical experiments with a parallel ILU preconditioner with fill-ins and an iterative solver, and show that the developed code provides better performance than that of the multicoloring method used for intra-node parallelization.
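The locality that the abstract describes, and that HID then improves on with separator layers, can be illustrated with a minimal sketch of block-Jacobi preconditioning: each domain solves only with its own diagonal block and ignores couplings to other domains. This is a dense toy illustration (exact local solves standing in for local ILU), not the talk's code.

```python
import numpy as np

def block_jacobi_apply(A, r, block_size):
    """Apply a block-Jacobi localized preconditioner: each 'domain' solves
    with its local diagonal block only, discarding off-block couplings.
    (Exact local solves stand in for local ILU factorizations.)"""
    z = np.empty_like(r)
    for start in range(0, len(r), block_size):
        end = min(start + block_size, len(r))
        z[start:end] = np.linalg.solve(A[start:end, start:end], r[start:end])
    return z

# 4x4 tridiagonal example split into two 2x2 "domains"; the (2,1) and (1,2)
# couplings between the domains are the entries the localization neglects.
A = np.array([[4., 1., 0., 0.],
              [1., 4., 1., 0.],
              [0., 1., 4., 1.],
              [0., 0., 1., 4.]])
r = np.array([1., 0., 0., 1.])
z = block_jacobi_apply(A, r, block_size=2)
```

Embarrassingly parallel across blocks, this is why localization maps so well to MPI ranks and OpenMP threads; the neglected inter-block entries are what HID/ExHID recover through separators.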
18:30 Closing
Takahiro Katagiri (The University of Tokyo)
19:00 Banquet near Nezu station
Note: If you need a receipt, please indicate so here. (Addressed to: )
ASE Seminar Organizer: Project Associate Professor Takahiro Katagiri