Master's student or intern
Enabling Persistent Memory in Modern Data Centers
New non-volatile storage technologies, such as phase-change memory, have enabled the introduction of persistent memory (PM) as a new layer in the memory hierarchy, positioned between DRAM and NAND flash. PM is expected to become ubiquitous in modern data centers, as it promises to significantly improve the performance and reduce the cost of existing cloud services. Compared to DRAM, PM offers data persistence, cost and power savings, and increased capacity. Compared to NAND flash, PM offers byte addressability, nanosecond-level latency, and increased durability. As PM is a complementary technology rather than a drop-in replacement for DRAM or NAND flash, it requires rethinking the design of high-performance data processing and data storage systems. In this context, we propose adding PM support to two open-source systems that are widely used in hybrid cloud deployments and are ideal candidates to benefit from PM. The project can focus on either system, depending on the interests of the candidate.
PM support for Ray
Ray is an open-source computing framework designed to ease the scaling of complex computational tasks across a computing cluster. Ray hides the complexity of building distributed systems and provides a rich set of programming abstractions and primitives that reduce the knowledge required to develop distributed applications. On top of Ray, domain experts can develop applications that solve a particular problem, e.g., machine learning or serverless applications, without also having to become distributed systems experts. One existing limitation of Ray, however, is that all data must be stored in a distributed, shared in-memory store. This ensures high performance and ease of use, but it also limits the size of the data that can be processed and requires over-provisioning DRAM. A promising direction is to extend Ray to transparently spill data from main memory first to persistent memory and then to high-performance NVMe SSDs. This could be accomplished, for example, by extending Plasma, an open-source distributed data store that is part of the Apache Arrow project and is envisioned to become the default Ray data store.
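To make the spilling idea concrete, the following is a minimal sketch of a tiered object store in which objects that no longer fit in the top tier are moved, least recently used first, to the next tier down. It is an illustration only and not Ray's or Plasma's API: the names `Tier` and `TieredStore` are hypothetical, the three tiers (DRAM, PM, NVMe) are simulated with in-process dictionaries, and the LRU eviction and promote-on-read policies are assumptions chosen for simplicity.

```python
from collections import OrderedDict


class Tier:
    """One level of the hierarchy, simulated here with an in-process dict."""

    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.objects = OrderedDict()  # least recently used entry comes first

    def put(self, key, value):
        self.objects[key] = value
        self.used_bytes += len(value)

    def pop(self, key):
        value = self.objects.pop(key)
        self.used_bytes -= len(value)
        return value

    def pop_lru(self):
        key, value = self.objects.popitem(last=False)
        self.used_bytes -= len(value)
        return key, value


class TieredStore:
    """Hypothetical store that spills from the first tier toward the last."""

    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest first, e.g. DRAM, PM, NVMe

    def put(self, key, value):
        for tier in self.tiers:  # drop any stale copy of the key
            if key in tier.objects:
                tier.pop(key)
        self._insert(0, key, value)

    def _insert(self, level, key, value):
        tier = self.tiers[level]
        is_last = level + 1 == len(self.tiers)
        if len(value) > tier.capacity_bytes:
            # The object can never fit in this tier: try the next one.
            if is_last:
                raise MemoryError("object does not fit in any tier")
            self._insert(level + 1, key, value)
            return
        # Spill least recently used objects downward until there is room.
        # Error handling is deliberately minimal in this sketch.
        while tier.used_bytes + len(value) > tier.capacity_bytes:
            if is_last:
                raise MemoryError("all tiers are full")
            old_key, old_value = tier.pop_lru()
            self._insert(level + 1, old_key, old_value)
        tier.put(key, value)

    def get(self, key):
        for tier in self.tiers:
            if key in tier.objects:
                value = tier.pop(key)
                self._insert(0, key, value)  # promote and refresh LRU order
                return value
        raise KeyError(key)

    def location(self, key):
        for tier in self.tiers:
            if key in tier.objects:
                return tier.name
        return None


if __name__ == "__main__":
    store = TieredStore([Tier("dram", 8), Tier("pm", 16), Tier("nvme", 1024)])
    for key in ("a", "b", "c"):
        store.put(key, key.encode() * 4)  # three 4-byte objects
    print(store.location("a"))  # "a" was spilled out of the 8-byte DRAM tier
    store.get("a")
    print(store.location("a"))  # reading "a" promoted it back to the top tier
```

The caller only ever uses `put` and `get`, so where an object currently lives stays hidden, which is the sense in which the spilling is transparent. A real implementation would replace the dictionaries with actual DRAM, PM, and NVMe backends and would need to handle concurrency and failures.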
PM support for Ceph
Cloud deployments rely on storing data in distributed storage systems that provide high performance, fault tolerance, unconstrained scalability, and abstraction of the underlying hardware. Ceph & OpenShift Container Storage (OCS) is the leading open-source storage platform that provides all three common storage interfaces, i.e., object, block, and file-level APIs, and is envisioned to become the storage backbone for both public and private cloud deployments. We propose to extend Ceph so that it can efficiently use PM as a store for object metadata and as a cache for frequently accessed data.
IBM is committed to diversity at the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.
How to apply
We are inviting applications from students to conduct their master’s thesis work or an internship project at the IBM Research lab in Zurich on this exciting new topic. The research focus will be on advancing the state of the art in distributed storage for cloud applications. The work also involves interacting with several researchers who focus on different aspects of the project. The ideal candidate should be well versed in either storage systems or distributed systems and have strong programming skills (Python, C++).
For technical questions, please contact Dr. Radu Stoica (RST@zurich.ibm.com).