Resize Linux partition while online

Previously I always had to fumble around with fdisk, perform the dangerous task of deleting the partition entry while it was in use, then re-adding it with the new, extended settings!
Too difficult and too error prone (although it never failed on me…)

Anyway, I just stumbled on a far easier process! Introducing ‘growpart’.
The process for resizing an EXT4 partition, ‘/dev/xvda1’, would be:

Backup the partition table (just in case)

sfdisk -d /dev/xvda > partition_backup
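
If the resize ever goes sideways, that dump can be written straight back with sfdisk too (restoring a table onto a live disk carries its own risk, so treat this strictly as a last resort):

sfdisk /dev/xvda < partition_backup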

Resize the partition

growpart -v /dev/xvda 1
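
If you’re nervous, growpart can also do a dry run first, reporting what it would change without touching the disk:

growpart --dry-run /dev/xvda 1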

And finally resize the filesystem to make use of the larger partition

resize2fs /dev/xvda1
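
To confirm the new space is actually usable, check the block device and the mounted filesystem (assuming here it’s mounted at /):

lsblk /dev/xvda
df -h /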

Happy resizing.

Note: From a quick investigation, it seems ‘growpart’ comes from the ‘cloud-utils-growpart’ RPM on Amazon AWS hosted systems, so I’m not sure about general availability.
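
For what it’s worth, on RPM-based distros I’d expect to install it from the cloud-utils-growpart package, and on Debian/Ubuntu from cloud-guest-utils:

# RHEL / CentOS / Amazon Linux
yum install cloud-utils-growpart

# Debian / Ubuntu
apt-get install cloud-guest-utils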

Elasticsearch: search_context_missing_exception - No search context found for id

Consolidating daily logstash indexes into monthly ones sometimes results in the following error:

      {
        "index": "logstash-2018-11-01",
        "shard": 0,
        "node": "aodScdJuQ5OWLyucQ6Px5Q",
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [3708723]"
        }
      }
This error is typically caused by the following:

Reindexing uses the scroll API under the covers to read a “point-in-time” view of the source data.

This point in time consists of a set of segments (Lucene files) that are essentially locked and prevented from being deleted by the usual segment merging process that works in the background to reorganise the index in response to ongoing CUD (create/update/delete) operations.

It is costly to preserve this view of data, which is why users of the scroll API must renew their lock with each new request for a page of results. Locks will time out if the client fails to return within the timespan it said it would.
The error you are seeing is because the reindex function has requested another page of results, but the scroll ID, which represents a lock on a set of files, is either:

* timed out (i.e. the reindex client spent too long indexing the previous page), or
* lost because the node serving the scroll API was restarted or otherwise became unavailable

(Source: https://discuss.elastic.co/t/problem-when-reindexing-large-index/117421)
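
In practice the fix is usually to give the scroll more breathing room. The reindex API takes a scroll query parameter controlling how long each search context is kept alive between batches (the default is 5m), and a smaller source.size means each batch finishes sooner. A rough sketch of a more forgiving monthly consolidation (the index names follow this post’s example and are an assumption):

curl -XPOST 'localhost:9200/_reindex?scroll=30m&wait_for_completion=false' \
  -H 'Content-Type: application/json' -d '
{
  "source": { "index": "logstash-2018-11-*", "size": 500 },
  "dest":   { "index": "logstash-2018-11" }
}'

Running with wait_for_completion=false turns the reindex into a background task, which is why its completion shows up via LoggingTaskListener in the logs below.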

Looking at the Elasticsearch logs I can see:

[INFO ][o.e.t.LoggingTaskListener] 3406201 finished with response BulkByScrollResponse[took=2.1h,timed_out=false,sliceId=null,updated=0,created=37036000,deleted=0,batches=37036,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[{"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102015]"}}, {"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102016]"}}, {"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102023]"}}]]
[INFO ][o.e.m.j.JvmGcMonitorService] [server-ls2.local] [gc][40994] overhead, spent [335ms] collecting in the last [1s]

[WARN ][o.e.t.TransportService   ] [server-ls2.local] Received response for a request that has timed out, sent [51770ms] ago, timed out [21769ms] ago, action [internal:discovery/zen/fd/master_ping], node [{server-ls5.local}{aM_wxa2mTY2XI9P7bsobSg}{gsNCWS-GQ0md1u-yuhxSNw}{192.168.11.55}{192.168.11.55:9300}{site_id=rack1, ml.machine_memory=67279155200, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [11840408]

So there was a timeout issue, but the cause is unknown at this time…
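
One thing worth watching while a long reindex runs is the number of open search contexts on each node; if that counter drops unexpectedly (e.g. a node restart), it lines up with the lost-context error above. The node stats API exposes it:

curl -s 'localhost:9200/_nodes/stats/indices/search?pretty' | grep open_contexts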