Steps to Enable Memory Spill to Disk on EMR Presto Clusters
By default, Presto is an in-memory query engine: it keeps intermediate query results only in memory. This does not work well for memory-intensive queries, which can fail once they exceed the memory available to them. It is therefore important to enable spill to disk, so that Presto can write temporary data to local disk instead.
To implement this on EMR clusters, perform the following steps.
- Create the following bootstrap action script:
#!/bin/bash
# Create the spill directory
sudo mkdir -p /mnt/spilldisk
sudo chmod -R 777 /mnt/spilldisk
# Raise the open-file and process limits for the presto user.
# The redirect must run as root, so pipe through "sudo tee -a".
echo 'presto - nofile 65536' | sudo tee -a /etc/security/limits.d/presto.conf
echo 'presto - nproc 65536' | sudo tee -a /etc/security/limits.d/presto.conf
- Add the following configuration under the "presto-config" classification (Presto requires task.concurrency to be a power of two, so it is set to 4 here):
{
  "Configurations": [],
  "Properties": {
    "experimental.spill-enabled": "true",
    "experimental.spill-compression-enabled": "true",
    "task.concurrency": "4",
    "experimental.spiller-spill-path": "/mnt/spilldisk"
  },
  "Classification": "presto-config"
}
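The two steps above can be tied together at cluster creation time. The following is a minimal sketch rather than a definitive procedure. It assumes the AWS CLI is installed and that the bootstrap script has already been uploaded to S3. The file name presto-config.json, the function name launch_cluster, the cluster name, the instance settings, and the EMR_RELEASE and BOOTSTRAP_S3_PATH variables are all hypothetical placeholders. task.concurrency is written as 4 on the assumption that Presto only accepts a power of two for this property.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Save the classification as a JSON array, which is the shape
# that "aws emr create-cluster --configurations" expects.
cat > presto-config.json <<'EOF'
[
  {
    "Classification": "presto-config",
    "Properties": {
      "experimental.spill-enabled": "true",
      "experimental.spill-compression-enabled": "true",
      "task.concurrency": "4",
      "experimental.spiller-spill-path": "/mnt/spilldisk"
    },
    "Configurations": []
  }
]
EOF

# Optional sanity check: confirm the file parses as JSON when python3 is available.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool presto-config.json >/dev/null
fi

# Hypothetical helper: launches a cluster with the bootstrap action and the
# configuration file. It is only defined here, never called automatically.
# EMR_RELEASE and BOOTSTRAP_S3_PATH are assumed to be set by the caller.
launch_cluster() {
  aws emr create-cluster \
    --name "presto-spill-to-disk" \
    --release-label "${EMR_RELEASE:?set EMR_RELEASE to your EMR release label}" \
    --applications Name=Presto \
    --bootstrap-actions Path="${BOOTSTRAP_S3_PATH:?set BOOTSTRAP_S3_PATH to the script location in S3}" \
    --configurations file://presto-config.json \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles
}
```

Calling launch_cluster after exporting the two variables starts the cluster. Keeping the launch in a function means the script can be run safely just to generate and check the configuration file.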