EMR Presto – Enable Memory Spill to Disk

Steps to Enable Memory Spill to Disk on EMR Presto Clusters.

Presto is an in-memory query engine: by default, it keeps intermediate query results only in memory. This works poorly for memory-intensive queries, such as large joins and aggregations, which can fail when they exceed the available memory. Enabling spill to disk lets Presto write this intermediate data to local disk when memory runs short, so such queries have a better chance of completing.

To enable spill to disk on an EMR cluster, perform the following steps.

  • Create the following bootstrap action script:

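           #!/bin/bash
           set -e

           # Create the spill-to-disk directory
           sudo mkdir -p /mnt/spilldisk
           sudo chmod -R 777 /mnt/spilldisk

           # Increase the open-file and process limits for the presto user.
           # Bootstrap actions run as the hadoop user, not root, so use sudo tee to append.
           echo 'presto - nofile 65536' | sudo tee -a /etc/security/limits.d/presto.conf > /dev/null
           echo 'presto - nproc 65536' | sudo tee -a /etc/security/limits.d/presto.conf > /dev/null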

  • Add the following configuration for the "presto-config" classification when creating the cluster:

         [
           {
             "Classification": "presto-config",
             "Properties": {
               "experimental.spill-enabled": "true",
               "experimental.spill-compression-enabled": "true",
               "task.concurrency": "4",
               "experimental.spiller-spill-path": "/mnt/spilldisk"
             },
             "Configurations": []
           }
         ]
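To confirm on a running node that both steps took effect, a check along the following lines may help. This is a minimal sketch, not an EMR or Presto tool: the function names check_spill_setup and check_presto_spill_config are hypothetical, and the script assumes that EMR writes the presto-config classification to /etc/presto/conf/config.properties.

```shell
#!/bin/sh
# Hypothetical post-launch check (not part of EMR or Presto): run on a cluster node to
# confirm that the bootstrap action and the presto-config classification took effect.
# Assumption: EMR renders presto-config into /etc/presto/conf/config.properties.

# $1 = spill directory, $2 = limits file written by the bootstrap action
check_spill_setup() {
    spill_dir="$1"
    limits_file="$2"

    # The spill directory must exist and be writable by the current user.
    if [ ! -d "$spill_dir" ] || [ ! -w "$spill_dir" ]; then
        echo "spill directory missing or not writable: $spill_dir" >&2
        return 1
    fi

    if [ ! -f "$limits_file" ]; then
        echo "limits file not found: $limits_file" >&2
        return 1
    fi

    # Both the open-files (nofile) and process (nproc) limits must be set for presto.
    for item in nofile nproc; do
        if ! grep -Eq "^presto[[:space:]]+-[[:space:]]+${item}[[:space:]]+[0-9]+" "$limits_file"; then
            echo "missing '$item' limit for presto in $limits_file" >&2
            return 1
        fi
    done
}

# $1 = Presto config.properties (defaults to the assumed EMR location)
check_presto_spill_config() {
    config_file="${1:-/etc/presto/conf/config.properties}"

    if [ ! -f "$config_file" ]; then
        echo "config file not found: $config_file" >&2
        return 1
    fi

    # Compare with whitespace removed so "key = value" and "key=value" both match.
    for pair in \
        "experimental.spill-enabled=true" \
        "experimental.spill-compression-enabled=true" \
        "experimental.spiller-spill-path=/mnt/spilldisk"; do
        if ! sed 's/[[:space:]]//g' "$config_file" | grep -Fqx "$pair"; then
            echo "missing or different: $pair" >&2
            return 1
        fi
    done
}
```

For example, on a node, check_spill_setup /mnt/spilldisk /etc/security/limits.d/presto.conf && check_presto_spill_config returns 0 only when the spill directory, the limits, and the spill properties are all in place; otherwise it prints the first problem it finds to stderr.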