Abstract: On heterogeneous clusters that use accelerators such as GPUs, inter-node communication among accelerators becomes a bottleneck because of overhead, including latency, along the communication path. To eliminate this overhead, we have proposed the Tightly Coupled Accelerators (TCA) architecture, which enables direct communication between accelerators across nodes, and have developed prototypes using FPGAs. We are also investigating a high-level parallel programming language and several practical application programs based on this concept, as well as enhancements to TCA and its system software stack, in the JST-CREST project "Research and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale Era" (AC-CREST). In this talk, we present our achievements in TCA and the AC-CREST project, along with our future directions.