Persistent Memory Microconference Notes




Welcome to Linux Plumbers Conference 2015



The structure will be short introductions to an issue or topic followed by a discussion with the audience.


A limit of 3 slides per presentation is enforced to ensure focus and allocate enough time for discussions.




SCHEDULE




Andy Rudoff/Intel



http://pmem.io/ - the NVM Library project: libpmem and related libraries, examples, documentation, etc.

- up to 1000x faster than NAND (but slower than DRAM)
- up to 1000x the endurance of NAND
- roughly half the cost of DRAM
- up to 6TB per 2-socket system
- SSDs will arrive first
- DIMMs on the next-gen platform (NVDIMMs are available today)

Byte-addressable persistence - fast enough for the CPU to load/store directly (see the sketch below)
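
A minimal sketch of that load/store model using libpmem from pmem.io (this uses the pmem_map_file() API from the released NVM Library; the path and size here are made up):

    #include <stdio.h>
    #include <string.h>
    #include <libpmem.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map a file into the address space; with DAX there is no page cache. */
        char *addr = pmem_map_file("/mnt/pmem/example", 4096,
                                   PMEM_FILE_CREATE, 0666,
                                   &mapped_len, &is_pmem);
        if (addr == NULL) {
            perror("pmem_map_file");
            return 1;
        }

        strcpy(addr, "hello, persistent memory");   /* ordinary CPU stores */

        /* Durability = cache flush + fence on real pmem, msync otherwise. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);
        else
            pmem_msync(addr, mapped_len);

        pmem_unmap(addr, mapped_len);
        return 0;
    }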

Working with Dan Williams/Intel on the Linux kernel changes to support PMEM


THE FUTURE

- Some more basics still to be done:
    - RAS
    - replication
    - RDMA
- Microsoft's C compiler has a feature called "based pointers" (__based: pointers stored as offsets from a declared base) - relevant because a pmem mapping may land at a different address on each run; see the sketch below
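
A hypothetical plain-C equivalent of based pointers, storing relocatable offsets instead of absolute pointers (the struct and helper names are invented for illustration):

    #include <stdint.h>
    #include <stddef.h>

    struct node {
        uint64_t next_off;   /* offset of next node from the mapping base; 0 = NULL */
        int      value;
    };

    /* Rebuild a usable pointer for this run's mapping. */
    static inline struct node *off_to_ptr(void *base, uint64_t off)
    {
        return off ? (struct node *)((char *)base + off) : NULL;
    }

    /* Store a relocatable offset instead of an absolute pointer. */
    static inline uint64_t ptr_to_off(void *base, struct node *n)
    {
        return n ? (uint64_t)((char *)n - (char *)base) : 0;
    }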

- Many emerging memory types with different characteristics (not necessarily from Intel)
- Some usage will be application-transparent (the kernel hides pmem behind existing interfaces)
- Some will not be application-transparent (applications program to pmem-aware APIs)



Dan Williams/Intel



PMEM & BLK

- Data path for PMEM - the CPU loads/stores directly to the memory
- Data path for BLK - the CPU writes through aperture registers, which then move the data to the memory

Lots of components in the Linux software stack...
- userspace: ndctl/libndctl (management tooling)
- kernel: DAX, ACPI NFIT, the libnvdimm bus driver, PMEM, BTT, BLK

PMEM: ramdisk-style driver over the persistent range
BLK: block driver that does its I/O through the apertures

BTT layer: "makes the storage look more like a disk" (the Block Translation Table adds single-sector write atomicity, so torn sector writes never surface)

recommendation: use libndctl rather than mucking around with the sysfs files directly (for example, the sketch below)
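
For example, enumerating buses, regions, and namespaces through libndctl instead of walking /sys by hand (a sketch against the libndctl API; error handling trimmed):

    #include <stdio.h>
    #include <ndctl/libndctl.h>

    int main(void)
    {
        struct ndctl_ctx *ctx;
        struct ndctl_bus *bus;

        if (ndctl_new(&ctx) < 0)
            return 1;

        /* Walk every NVDIMM bus, region, and namespace the kernel exposes. */
        ndctl_bus_foreach(ctx, bus) {
            struct ndctl_region *region;
            printf("bus: %s\n", ndctl_bus_get_provider(bus));
            ndctl_region_foreach(bus, region) {
                struct ndctl_namespace *ndns;
                ndctl_namespace_foreach(region, ndns)
                    printf("  namespace: %s\n",
                           ndctl_namespace_get_devname(ndns));
            }
        }

        ndctl_unref(ctx);
        return 0;
    }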

What to do about struct page?

- 6TB of pmem means ~1.5 billion 4KiB pages - roughly 96GB of struct page metadata at 64 bytes apiece
- RDMA, etc. need struct page to work
- kernel v4.3: proposal to allocate struct pages for pmem via the memory hotplug machinery (sketch below)
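
The rough shape of that proposal from the driver's side (kernel sketch; the devm_memremap_pages() signature shown is the v4.3-era one, and the wrapper name is invented):

    #include <linux/device.h>
    #include <linux/ioport.h>
    #include <linux/memremap.h>

    /* Ask the memory-hotplug machinery to create struct pages covering a
     * pmem range, so struct-page consumers (RDMA, O_DIRECT, ...) can use it. */
    static void *pmem_map_with_pages(struct device *dev, struct resource *res)
    {
        return devm_memremap_pages(dev, res);
    }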

Q: What if drivers store PFNs (page frame numbers)?
A: "PFNs will always be relative to a block device" - PFNs are accessed through the PMEM driver. "Offset 17" in the PMEM driver will always give you the same address throughout one boot.

Q: Keith: "I can't afford to hotplug my 320TB of memory" ... "I really am planning on using mmap() for these"
A: You have the choice to never hotplug with PMEM ... "the entire block layer can be done without struct page" ... "we can start removing the struct page from all of the subsystems"  ... "I initially started out doing pageless block IO and not everybody was cheering my name"

What happens if someone is doing RDMA to memory and someone hot-unplugs the device?
"remove always goes through"
Proposal: make remove sleep ... but what if it sleeps indefinitely?

PMEM and NUMA

Q: do we need to be able to query NUMA locality at a granularity finer than that of a device or file?
A: Yes. Keith mentions that he has a system with a single pmem device that spans NUMA nodes, and that the NUMA node of a particular address can be looked up via a table.
    Also, Jeff has preliminary code to add .direct_access() support to device-mapper targets (though it's only useful for dm-linear and maybe thin provisioning)
    
Q: How do you specify an allocation policy for a file?
A: Keith - we used xattrs, not because they're necessarily the best method, but because it was easy (hypothetical sketch below)
    Jeff - FIEMAP may also be useful for querying locality (see the sketch at the end)
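
A hypothetical sketch of the xattr approach Keith described - tagging a file with a policy the filesystem could honor at allocation time (the attribute name, value format, and path are all invented for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(void)
    {
        /* Invented convention: "bind:1" = place this file's blocks on node 1. */
        const char *policy = "bind:1";

        if (setxattr("/mnt/pmem/data", "user.numa_policy",
                     policy, strlen(policy), 0) < 0)
            perror("setxattr");
        return 0;
    }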
    
Bottom line: we need to be able to both set an allocation policy and query the NUMA affinity of data at page/filesystem-block granularity
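
For the query half, FIEMAP already maps file offsets to the physical extents backing them, which a table like Keith's could then translate to NUMA nodes. A minimal sketch of the ioctl (standard Linux interface; the node lookup itself is left out):

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Room for 32 extents in one call. */
        size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
        struct fiemap *fm = calloc(1, sz);
        fm->fm_length = FIEMAP_MAX_OFFSET;     /* whole file */
        fm->fm_extent_count = 32;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

        /* Each extent gives the physical address backing a logical range. */
        for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
            printf("logical %llu -> physical %llu (len %llu)\n",
                   (unsigned long long)fm->fm_extents[i].fe_logical,
                   (unsigned long long)fm->fm_extents[i].fe_physical,
                   (unsigned long long)fm->fm_extents[i].fe_length);

        close(fd);
        free(fm);
        return 0;
    }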