Vancouver, BC
May 21-24, 2018

Please note: All times listed below are in Central Time Zone

Case Study: Large Scale Deployment for Machine Learning with High Speed Storage

Join our presentation to learn how you can build a cluster for your machine learning business. Machine learning and AI are a major recent technology trend. NTT, a large telecommunications company, also has its own AI brand, "Corevo". This presentation shares our experience building and managing a cloud-like computing infrastructure for our company's use case, including how we have been managing a fully open source computing cluster environment built on OpenStack components and container technologies.

In this talk, we introduce our case study: a fully open source reference cluster model automated with Ansible and a container orchestrator. The environment is built on GPU computation and high-speed storage. We use the Chainer and ChainerMN learning frameworks on many NVIDIA GPU nodes, and attach highly scalable OpenStack Swift object storage, accessed through file system APIs, as the high-speed data storage.

What can I expect to learn?

Attendees will learn basic strategies for building their own machine learning cluster for their own use case. In this talk, we will share our software stack and hardware stack considerations, in particular modern machine learning frameworks such as Chainer and ChainerMN, Ansible and Docker container orchestration, and OpenStack Swift storage with a file system API for AI/HPC. We will also summarize the performance and operational efficiency of the cluster.

In the architecture design, we take both operators and users into account (UsersOps) rather than following DevOps, because our machine learning researchers have joined the operations team to build the cluster. Attendees will learn why this perspective matters when building their own cluster, and will be able to connect with us to discuss how we can improve cluster management.


Thursday, May 24, 4:40pm-5:20pm (UTC -5)
Difficulty Level: Intermediate
Kota is a Software Engineer at Nippon Telegraph and Telephone Corporation (NTT). NTT is one of the largest telecommunications companies providing cloud services in Japan. Kota has worked on OpenStack Swift for approximately 6 years. Recently, he has worked on the efficiency of globally distributed clusters and on erasure coding in the Swift community, and he has joined the Swift core team...
Senior Research Engineer
Takeharu Eda is a senior research engineer at NTT Software Innovation Center in Japan. He has been developing a scalable surveillance video system utilizing deep learning-based computer vision techniques. Before joining the center, he launched a web hosting service utilizing CloudStack/OpenStack-based infrastructure and migration tools for it, while managing international development...
Kengo is a Research Engineer at Nippon Telegraph and Telephone Corporation (NTT). NTT is one of the largest telecommunications companies providing cloud services in Japan. Kengo has been studying cloud resource management and scheduling.