GC-stall and Page Scan Attacks
by Linux
Cuong Tran
LinkedIn Performance Group
Agenda
• GC attacks by Linux
• Page scan attacks by Linux
• Recommendations
Examples of
GC attacks by Linux
• 2013-10-05T05:01:04.179+0000:…. : 216982K->9328K(256000K), 0.0666320 secs] 377835K->170188K(768000K), 0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs]
• 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark: 703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs]
• GC stopped the world for seconds to minutes, but:
– Did little or no real work (user-mode CPU time at or near 0)
– Spent the pause waiting or burning cycles in the Linux kernel
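
The gap between wall-clock time and CPU time in these log lines can be pulled out mechanically. The following is a minimal sketch, assuming the `[Times: user=… sys=…, real=… secs]` suffix shown above; the function names are illustrative and not part of any JVM tooling.

```python
import re

# Matches the "[Times: user=U sys=S, real=R secs]" suffix of a GC log line.
TIMES_RE = re.compile(
    r"\[Times:\s*user=([\d.]+)\s+sys=([\d.]+),\s*real=([\d.]+)\s*secs\]"
)


def parse_gc_times(line):
    """Return (user, sys, real) in seconds, or None if no Times block is found."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    user, sys_, real = (float(group) for group in match.groups())
    return user, sys_, real


def off_cpu_seconds(user, sys_, real):
    """Wall-clock time not explained by CPU time, floored at 0.

    CPU time is summed over all GC threads, so user + sys can exceed real.
    """
    return max(0.0, real - (user + sys_))


# The two pauses from the slide (log prefixes shortened for the example).
line1 = "0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs]"
line2 = "126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs]"
```

For the first pause almost all of the 3.18 s is off-CPU time, which points at waiting; for the second, system time accounts for the whole pause, which points at work done inside the kernel.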
GC attacks by Linux
• IO starvation
– Symptom: GC log shows “low user time, low system time, long GC pause”
– Cause: GC threads are stuck in the kernel waiting for IO, usually due to journal commits or file system flushes of changes made when gzip compresses rolled logs

• Memory starvation
– Symptom: GC log shows “low user time, high system time, long GC pause”
– Cause: Memory pressure triggers swapping or scanning for free memory
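
The two symptom patterns can be turned into a simple triage rule. This is a sketch only: the thresholds `long_pause` and `high_ratio` are assumptions chosen for illustration, not values from the talk, and the `busy-gc` label is added here to cover pauses where the collector really was working.

```python
def classify_gc_pause(user, sys_, real, long_pause=1.0, high_ratio=0.5):
    """Label one GC pause from its user, sys, and real times (seconds).

    long_pause and high_ratio are illustrative thresholds (assumptions).
    """
    if real < long_pause:
        return "ok"
    if user / real >= high_ratio:
        return "busy-gc"            # the collector was doing real work
    if sys_ / real >= high_ratio:
        return "memory-starvation"  # low user time, high system time
    return "io-starvation"          # low user time, low system time
```

Applied to the two example pauses, `classify_gc_pause(0.17, 0.00, 3.18)` falls into the IO starvation bucket and `classify_gc_pause(0.00, 127.31, 126.10)` into the memory starvation bucket.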

Solutions for GC-attacks
• IO starvation
– Strategy: Even out the workload to the disk drives (flush every 5 s rather than every 30 s)
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500

– In progress: Direct IO with gzip, or gzip as-you-go

• Memory starvation
– Strategy: Pre-allocate memory to the JVM heap and protect it against swapping or scanning
– Turn on the -XX:+AlwaysPreTouch option in the JVM
– sysctl -w vm.swappiness=0 to protect heap and anonymous memory
– JVM startup has a 2-second delay to allocate all memory (17 GB)
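
Before and after a roll-out it helps to confirm what a host is actually running with. The sketch below compares current values against the settings listed above. It assumes the standard procfs layout, where `vm.swappiness` lives at `/proc/sys/vm/swappiness`; the helper names are hypothetical.

```python
from pathlib import Path

# Target values taken from the slide; keys are sysctl names.
RECOMMENDED = {
    "vm.dirty_writeback_centisecs": 500,
    "vm.dirty_expire_centisecs": 500,
    "vm.swappiness": 0,
}


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl through procfs, e.g. vm.swappiness."""
    return int(Path(root, *name.split(".")).read_text().split()[0])


def find_mismatches(current, recommended=RECOMMENDED):
    """Return {name: (current_value, wanted_value)} for settings that differ."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) != wanted
    }
```

On a Linux host, `find_mismatches({n: read_sysctl(n) for n in RECOMMENDED})` returns an empty dict once all three settings are in place.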

Page scan attacks by Linux
Measured: 7,000,000 scans/sec
Stall: 2+ minutes
Goal: 0 scans/sec
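
A scan rate like the one above can be derived from two snapshots of `/proc/vmstat`. This is a rough sketch: it assumes that direct page scans appear as counters whose names start with `pgscan_direct` (split per zone on older kernels, a single line on newer ones), and the function names are illustrative.

```python
def direct_scan_count(vmstat_text):
    """Sum the pgscan_direct* counters found in /proc/vmstat-style text.

    The pgscan_direct_throttle counter is skipped because it counts
    throttling events, not scanned pages.
    """
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        if name.startswith("pgscan_direct") and "throttle" not in name:
            total += int(value)
    return total


def scans_per_second(before_text, after_text, interval_s):
    """Direct page scans per second between two snapshots."""
    delta = direct_scan_count(after_text) - direct_scan_count(before_text)
    return delta / interval_s
```

Taking one snapshot, sleeping for a fixed interval, and taking a second gives a number comparable to the pgscand/s column reported by sar.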

Cause: Page Scan Attacks

Transparent Huge Pages (THP)
• A Red Hat enhancement for performance
– 2 MB huge pages vs. 4 KB regular pages
– Fewer TLB misses and page table walks
– Only works for anonymous memory (malloc)
– Improves performance by 10% for SPECjbb and app server workloads

• But THP can degrade performance severely
– Collapsing, compacting, splitting, migration
– Very high pgscand/s
– Very busy khugepaged
– Very high system time when a process compacts memory or khugepaged runs

• THP optimization can increase GC stall time by minutes
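
A quick calculation shows why huge pages cut TLB pressure and why breaking them up is expensive. The sketch below uses the 17 GB heap mentioned on the earlier slide, read here as 17 GiB (an assumption), purely as an example size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes, page_bytes):
    """Number of pages needed to map a region, rounded up."""
    return -(-region_bytes // page_bytes)


heap_bytes = 17 * GIB  # example size; the slide's "17GB" read as GiB
small_pages = pages_needed(heap_bytes, 4 * KIB)
huge_pages = pages_needed(heap_bytes, 2 * MIB)
pages_per_huge_page = (2 * MIB) // (4 * KIB)

print(small_pages, huge_pages, pages_per_huge_page)  # 4456448 8704 512
```

Each 2 MB page stands in for 512 regular pages, so splitting or reassembling a single huge page touches 512 page-sized pieces.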
Cause: Page Scan Attacks

NUMA Optimization
• A Linux optimization for NUMA
– 2 CPU sockets, each with 12 cores and local memory
– Memory is accessible by all 24 cores, but local memory is faster
– Linux tries to allocate local memory to application threads, i.e., from the local zone
– Best suited for applications that fit in one local zone

• NUMA optimization can degrade performance severely
– Very high pgscand/s
– Linux zone reclaim insists on finding memory in the local zone even though memory is plentiful in the other zone
– Linux migrates memory, including THP, creating a vicious cycle of breaking up 2 MB pages, scanning for free 4 KB pages, and reassembling 4 KB pages into 2 MB pages
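
The "local zone is full while the other zone has plenty" condition can be spotted from per-node memory counters. The sketch below assumes the text format of `/sys/devices/system/node/node<N>/meminfo` (lines such as `Node 0 MemFree: 123456 kB`); the thresholds `low` and `plentiful` are illustrative assumptions, not values from the talk.

```python
import re

# Matches lines such as "Node 0 MemFree:   123456 kB".
FIELD_RE = re.compile(r"^Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB", re.MULTILINE)


def node_free_fraction(meminfo_text):
    """Fraction of one node's memory that is free, from its meminfo text."""
    fields = {name: int(kb) for _node, name, kb in FIELD_RE.findall(meminfo_text)}
    return fields["MemFree"] / fields["MemTotal"]


def starved_nodes(free_by_node, low=0.05, plentiful=0.25):
    """Nodes that are nearly full while some other node has plenty free."""
    has_plenty = [node for node, free in free_by_node.items() if free >= plentiful]
    return sorted(
        node
        for node, free in free_by_node.items()
        if free < low and any(other != node for other in has_plenty)
    )
```

A non-empty result from `starved_nodes` on a host with high pgscand/s suggests that reclaim is working on one zone while the other sits idle.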
Cause: Page Scan Attacks

Solutions
• Turn off THP optimization and thus khugepaged
– echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
– Will not affect file IO or memory-mapped files
– Red Hat, Oracle, and Hadoop recommend turning THP off

• Turn off zone-reclaim optimization
– sysctl -w vm.zone_reclaim_mode=0
– Twitter recommends NUMA interleaving
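
Whether the THP change took effect can be read back from the same sysfs file, which marks the active choice in square brackets (for example `always madvise [never]`). A minimal sketch follows, with hypothetical helper names; the second path is the upstream kernel location, added here as an assumption for hosts that do not use the Red Hat path from the slide.

```python
import re
from pathlib import Path

THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # path used on the slide
    "/sys/kernel/mm/transparent_hugepage/enabled",         # upstream path (assumption)
)


def active_thp_mode(enabled_text):
    """Return the bracketed choice from text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    return match.group(1) if match else None


def read_thp_mode(paths=THP_PATHS):
    """Return the active THP mode from the first path that exists, else None."""
    for candidate in paths:
        path = Path(candidate)
        if path.exists():
            return active_thp_mode(path.read_text())
    return None
```

After the `echo never` command above, `read_thp_mode()` should return `"never"`.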
Recommendations
• Gatekeepers: SRE and SysOps
• Fixes for GC attacks are safe to roll out now
– Linux: Flush changes more frequently and protect the heap
• sysctl -w vm.dirty_writeback_centisecs=500
• sysctl -w vm.dirty_expire_centisecs=500
• sysctl -w vm.swappiness=0
– JVM: Give the JVM heap all the memory it needs at startup
• -XX:+AlwaysPreTouch
• Heap size per AutoTune

• Roll out fixes for page scan attacks gradually
– Best for back-end servers
– Linux: Turn off THP and NUMA optimization
• echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• sysctl -w vm.zone_reclaim_mode=0

– Work with product groups to test on a small group of servers before applying changes to the rest
