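For this seminar, we have invited Dr. Osni Marques of Lawrence Berkeley National Laboratory to give an invited talk on the present and future of application development environments.
In addition, as invited speakers from Japan, Assistant Professor Kensuke Aishima of the University of Tokyo will speak on the dqds method for singular value computation, Associate Professor Tomohiro Suzuki of Yamanashi University will speak on recent progress in QR factorization on modern computers, and Mr. Kazuma Yamamoto of the University of Tsukuba will speak on a parallel algorithm for nonlinear eigenvalue problems.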
12th Advanced Supercomputing Environment (ASE) Seminar
25th April 2012 (Wed)
Information Technology Center, The University of Tokyo (Hongo Campus)
4th Floor, Telecommunication Lecture Room
13:30 - 14:15
Dr. Kensuke Aishima (The University of Tokyo, Japan), Dr. Yuji Nakatsukasa (University of Manchester, UK), Dr. Ichitaro Yamazaki (University of Tennessee, USA)
dqds with aggressive deflation for singular values
Matrix singular values play an important role in many applications. Accordingly, numerical methods for computing singular values are of great importance in practice. To compute the singular values, the given matrix is first transformed to a bidiagonal matrix with suitable orthogonal transformations, and then an iterative method is applied to the bidiagonal matrix. In 1994, Fernando and Parlett discovered the differential quotient difference with shifts (dqds) algorithm for computing singular values of bidiagonal matrices to high relative accuracy. The dqds algorithm is currently implemented in LAPACK as the DLASQ routine. Our objective is to reduce the dqds runtime without loss of high relative accuracy. More specifically, we incorporate into dqds a technique called aggressive deflation, which has been applied successfully to the Hessenberg QR algorithm. We propose an efficient and stable implementation that takes advantage of the bidiagonal structure. Numerical results illustrate that our aggressive deflation strategy often reduces the dqds runtime significantly. In addition, a shift-free version of our algorithm has the potential to be parallelized in a pipelined fashion. Our mixed forward-backward stability analysis proves that, with our proposed deflation strategy, all the singular values are computed to high relative accuracy.
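As a rough illustration of the iteration this abstract builds on, the following minimal sketch implements one textbook dqds sweep and uses it with zero shift on a small bidiagonal matrix. It is not the speakers' implementation: it has no aggressive deflation, no shift strategy, and no convergence test, and the names dqds_sweep and singular_values_zero_shift are hypothetical. The arrays q and e hold the squares of the diagonal and superdiagonal entries.

```python
import math


def dqds_sweep(q, e, shift=0.0):
    """One dqds sweep on q (squared diagonal) and e (squared superdiagonal).

    Choosing a nonzero shift safely (below the smallest eigenvalue of B^T B)
    is outside the scope of this sketch; the default is the zero-shift case.
    """
    n = len(q)
    q_new = [0.0] * n
    e_new = [0.0] * (n - 1)
    d = q[0] - shift
    for i in range(n - 1):
        q_new[i] = d + e[i]
        ratio = q[i + 1] / q_new[i]
        e_new[i] = e[i] * ratio
        d = d * ratio - shift
    q_new[n - 1] = d
    return q_new, e_new


def singular_values_zero_shift(diag, superdiag, sweeps=500):
    """Approximate the singular values of an upper bidiagonal matrix.

    Assumes nonzero diagonal entries. Runs a fixed number of zero-shift
    sweeps (an arbitrary choice for this sketch) and returns the square
    roots of q in descending order.
    """
    q = [a * a for a in diag]
    e = [b * b for b in superdiag]
    for _ in range(sweeps):
        q, e = dqds_sweep(q, e)
    return sorted((math.sqrt(x) for x in q), reverse=True)


if __name__ == "__main__":
    # Bidiagonal matrix with diagonal (4, 3, 2, 1) and superdiagonal (1, 1, 1).
    print(singular_values_zero_shift([4.0, 3.0, 2.0, 1.0], [1.0, 1.0, 1.0]))
```

With zero shift every quantity stays positive, which is one reason the recurrence can deliver high relative accuracy. Convergence is only linear in that case, which is why practical codes rely on shifts and deflation.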
14:25 - 15:10
Associate Professor Tomohiro Suzuki (Yamanashi University, Japan)
On implementations of tile QR factorization algorithm for recent hardware
There are many important matrix factorizations in dense linear algebra. Classic implementations suffer from performance limitations due to the use of Level 1 and Level 2 BLAS operations. A scalability limitation exists even in blocked algorithms, which are rich in Level 3 BLAS. Such limitations are called fork-join bottlenecks. To take advantage of the architectural features of recent multi-core and many-core systems, tile algorithms for matrix factorization have been proposed. In this talk, we present our implementations of the tile QR factorization algorithm for a GPU system and a multi-core CPU cluster system. For the multi-core cluster system, namely the T2K Open Supercomputer (U. Tokyo), it is implemented with a hybrid OpenMP and MPI programming model. For the GPU system, we also show an implementation for a multi-GPU environment. To achieve high performance, it is important to tune each subprogram (kernel) of the tile algorithm. Proper scheduling that checks the dependencies among all kernels is equally important. Some studies on optimized scheduling for the tile QR algorithm are reported.
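To make the kernel structure mentioned above concrete, the following minimal sketch lists the kernel calls of a tile QR factorization on a square grid of tiles, in the order a sequential loop nest would issue them. The kernel names GEQRT, UNMQR, TSQRT and TSMQR follow the convention used in the PLASMA library and are an assumption here, since the abstract does not name its kernels. The function tile_qr_tasks is hypothetical and performs no numerical work.

```python
from collections import Counter


def tile_qr_tasks(nt):
    """List tile QR kernel calls for an nt x nt grid of tiles.

    Each task is a tuple (kernel, i, j, k): k is the panel step, and (i, j)
    is the tile the kernel mainly works on.
    """
    tasks = []
    for k in range(nt):
        # Factor the diagonal tile (k, k).
        tasks.append(("GEQRT", k, k, k))
        for j in range(k + 1, nt):
            # Apply the reflectors from tile (k, k) to tile (k, j).
            tasks.append(("UNMQR", k, j, k))
        for i in range(k + 1, nt):
            # Eliminate tile (i, k) against the triangular factor in (k, k).
            tasks.append(("TSQRT", i, k, k))
            for j in range(k + 1, nt):
                # Update the tile pair (k, j) and (i, j).
                tasks.append(("TSMQR", i, j, k))
    return tasks


if __name__ == "__main__":
    tasks = tile_qr_tasks(3)
    for task in tasks:
        print(task)
    print(Counter(name for name, _, _, _ in tasks))
```

A runtime scheduler can execute these tasks out of order as long as tasks touching the same tile keep their relative order, which is what relaxes the fork-join synchronization of blocked algorithms.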
15:20 - 16:20
Dr. Osni Marques (Lawrence Berkeley National Laboratory, USA)
Dealing with Application Development -- Now and Henceforth
The development of simulation codes is often a costly process that results from the combination of the increasingly complex problems to be solved and the evolution of computer architectures. Practitioners are expected to develop highly efficient codes, although emerging computer architectures pose formidable challenges in achieving adequate levels of performance. Code developers usually have a range of choices for programming (MPI, OpenMP, PGAS languages, CUDA, and the emerging OpenACC), but the benefits of each may not be clear. To ease the development process, scientific software libraries are increasingly used in simulation codes: in many cases, this approach has reduced the development effort, contributed to an optimal usage of the available computational resources, and mitigated issues related to portability and application lifecycle. However, how will advances in programming and hardware impact libraries? This presentation will discuss some of these issues.
16:30 - 17:15
Mr. Kazuma Yamamoto, Mr. Yasuyuki Maeda, Mr. Yasunori Futamura, Professor Tetsuya Sakurai (Department of Computer Science, University of Tsukuba, Japan)
Adaptive parallel algorithm for stochastic estimation of nonlinear eigenvalue density
A numerical method that estimates the eigenvalue density of nonlinear eigenvalue problems in a specified region has been proposed. Nonlinear eigenvalue problems arise in science and engineering. Because eigensolvers require parameter settings that depend on the eigenvalue distribution, accuracy and parallel efficiency can be improved by using the eigenvalue density. In this presentation, we propose an algorithm for efficient execution of the estimation method on parallel computers. The conventional approach requires solving linear systems at each integration point, and these points are distributed uniformly on the complex plane. Because the solution time of the linear systems varies, this causes load imbalance and a large computational cost. The proposed master-worker type adaptive algorithm improves the load balance and reduces the computational cost by placing integration points according to the density of eigenvalues in the specified region. Moreover, we propose a look-ahead algorithm that balances the loads more efficiently by recycling the variables in the linear solver. We evaluate the efficiency of the proposed algorithms through several numerical examples.
17:25 - 18:10
Satoshi Itoh (Information Technology Center, The University of Tokyo, Japan)
Study of plugging-in AT mechanism in OpenFOAM
OpenFOAM is an open source CFD software package. Because it is free software and its intuitive interface lets developers describe the governing equations simply, it is widely used. OpenFOAM is based on the finite volume method (FVM), so its main application is CFD. However, it is difficult to achieve high performance with it on high-end machines such as supercomputers. We are developing ppOpen-AT, an auto-tuning (AT) infrastructure for ppOpen-HPC. ppOpen-HPC is a numerical middleware for the post-petascale era, and one of its features is this auto-tuning mechanism. We chose OpenFOAM as one of our test applications. In this study, we optimize OpenFOAM manually as a first step toward auto-tuning. We show numerical results on T2K and discuss the AT methodology for OpenFOAM.
18:10 Closing Remarks
Takahiro Katagiri (The University of Tokyo)
18:40 Banquet near Nezu station
Remarks: If you need a receipt, please indicate so here. (Addressee: )
ASE Seminar Organizer: Associate Professor Takahiro Katagiri