After digging into the weeds of Linux file systems (IO schedulers, xfsslower tracing, and friends), we suspect that the performance degradation is due to excessive fsyncs. In order to support Cross-Cluster Replication, Elasticsearch retains historical operations on certain indices. The number of historical operations retained in an index is controlled by a new mechanism called retention leases. The leases are maintained by the primary copy of each shard and synchronized to the replicas. With every synchronization, we issue an fsync to the file system to persist the file where the leases are stored. For simplicity, we currently sync the leases every 30 seconds. Sadly, on clusters that have many shards and run on spinning disks (hello, warm nodes!), this produces a large number of fsyncs. These fsyncs appear to cause heavy IO load on the machines and delay the persistence of cluster state updates to disk. The delays can be so large that the new cluster coordination subsystem
deems the nodes unstable and removes them from the cluster. These fsyncs only arise on indices created since 6.5 with a special index setting
; to support future features, that setting is the default for indices created since 7.0. The Elasticsearch team has already opened a pull request to fix this issue, and we are currently working on confirming the fix in our staging Cloud environment. Once we have confirmed that the pull request fixes the issue, we will take the necessary next steps to roll it out to all impacted Cloud users (and other users of Elasticsearch). We will update you again when we have confirmation of the fix (ETA 6 hours).
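For readers who want to check whether one of their own indices carries the setting described above, the index settings API can show it. This is a minimal sketch, not part of the fix: `my-index` is a placeholder name, and it assumes you can reach your cluster (for example via the Kibana Dev Tools console):

```
GET /my-index/_settings?include_defaults=true&filter_path=**.soft_deletes*
```

Indices where `index.soft_deletes.enabled` reports `true` (the default for indices created on 7.0 and later) are the ones that maintain retention leases and therefore issue the periodic fsyncs discussed here.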