Itoyori: A Distributed Parallel Runtime with Global Task Parallelism and Global Address Space

Task parallelism has attracted attention for its concise and straightforward expression of irregular parallelism, but scaling it beyond a single node remains a challenge. To this end, we are developing a new runtime called Itoyori, which enables efficient dynamic load balancing for task parallelism over a global view of distributed memory (a global address space). A notable feature of Itoyori is that tasks can be dynamically migrated across nodes and are scheduled by efficient RDMA-based work stealing. Itoyori also provides a software cache for global memory accesses, which reduces redundant communication issued by different tasks scheduled on the same node. We demonstrate its high productivity and performance by porting an existing shared-memory task-parallel implementation of the fast multipole method (FMM) to distributed memory.
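The idea behind a node-level software cache can be shown with a toy model. The Python sketch below is not Itoyori's implementation or API. It is a minimal illustration under stated assumptions. The global address space is split contiguously across nodes. Remote data is fetched in fixed-size blocks of `BLOCK_SIZE` bytes. One-sided RDMA reads are simulated by a counting function. The names `BLOCK_SIZE`, `GlobalMemory`, `NodeCache`, and `rdma_get` are hypothetical. Two tasks on node 0 read overlapping data owned by node 1, and only the first read triggers communication.

```python
BLOCK_SIZE = 64  # hypothetical cache-block granularity in bytes


class GlobalMemory:
    """Toy global address space split contiguously across nodes (hypothetical)."""

    def __init__(self, num_nodes: int, bytes_per_node: int) -> None:
        assert bytes_per_node % BLOCK_SIZE == 0, "blocks must not straddle nodes"
        self.bytes_per_node = bytes_per_node
        self.segments = [bytearray(bytes_per_node) for _ in range(num_nodes)]
        self.remote_gets = 0  # counts simulated one-sided RDMA reads

    def owner(self, addr: int) -> int:
        return addr // self.bytes_per_node

    def local_read(self, addr: int, size: int) -> bytes:
        node = self.owner(addr)
        off = addr - node * self.bytes_per_node
        return bytes(self.segments[node][off:off + size])

    def rdma_get(self, addr: int, size: int) -> bytes:
        """Hypothetical stand-in for a one-sided RDMA read from the owning node."""
        self.remote_gets += 1
        return self.local_read(addr, size)


class NodeCache:
    """Read-only cache shared by all tasks running on one node (sketch only)."""

    def __init__(self, gmem: GlobalMemory, node_id: int) -> None:
        self.gmem = gmem
        self.node_id = node_id
        self.blocks: dict[int, bytes] = {}  # block index -> cached contents

    def read(self, addr: int, size: int) -> bytes:
        out = bytearray()
        end = addr + size
        while addr < end:
            blk = addr // BLOCK_SIZE
            base = blk * BLOCK_SIZE
            n = min(base + BLOCK_SIZE, end) - addr  # bytes taken from this block
            if self.gmem.owner(base) == self.node_id:
                # Data owned by this node: read directly, no communication.
                out += self.gmem.local_read(addr, n)
            else:
                if blk not in self.blocks:
                    # Miss: fetch the whole block once; later reads reuse it.
                    self.blocks[blk] = self.gmem.rdma_get(base, BLOCK_SIZE)
                off = addr - base
                out += self.blocks[blk][off:off + n]
            addr += n
        return bytes(out)


# Usage: two tasks on node 0 read overlapping data owned by node 1.
gmem = GlobalMemory(num_nodes=2, bytes_per_node=256)
gmem.segments[1][0:8] = b"itoyori!"  # global addresses 256..263
cache = NodeCache(gmem, node_id=0)

a = cache.read(256, 8)  # miss: fetches block [256, 320) from node 1
b = cache.read(260, 4)  # hit: served from the cached block
print(a, b, gmem.remote_gets)  # b'itoyori!' b'ori!' 1
```

The sketch covers only reads. A real runtime would also need to handle writes, eviction, and consistency of cached copies across nodes, all of which are omitted here.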