Elasticsearch: search_context_missing_exception - No search context found for id

Consolidating daily logstash indexes into monthly logstash indexes sometimes results in the following error:

      {
        "index": "logstash-2018-11-01",
        "shard": 0,
        "node": "aodScdJuQ5OWLyucQ6Px5Q",
        "reason": {
          "type": "search_context_missing_exception",
          "reason": "No search context found for id [3708723]"
        }
      }

This error is typically caused by the following:

Reindexing uses the scroll api under the covers to read a “point-in-time” view of the source data.

This point in time consists of a set of segments (Lucene files) that are essentially locked and prevented from being deleted by the usual segment merging process that works in the background to reorganise the index in response to ongoing CUD (create/update/delete) operations.

It is costly to preserve this view of the data, which is why users of the scroll API must renew their lock with each new request for a page of results. A lock times out if the client fails to return within the timespan it said it would.

The error you are seeing occurs because the reindex function has requested another page of results, but the scroll ID, which represents a lock on a set of files, is either:

* timed out (i.e. the reindex client spent too long indexing the previous page) or
* lost because the node serving the scroll API was restarted or otherwise became unavailable

(Source: https://discuss.elastic.co/t/problem-when-reindexing-large-index/117421)
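If the lock is timing out, two knobs on the reindex API are worth a try: a longer scroll keep-alive, or smaller pages so each one is indexed faster. The sketch below only builds the request and does not send it. The `scroll` query parameter and `source.size` belong to the reindex API; the helper name, the 10m keep-alive, the batch size of 500 and the monthly destination index logstash-2018-11 are assumptions for illustration, not values I have tuned.

```python
import json


def build_reindex_request(source_index, dest_index, scroll="10m", batch_size=500):
    """Return (path, body) for a reindex call with a longer scroll keep-alive.

    Hypothetical helper: the defaults are illustrative, not tuned values.
    """
    # scroll=<time> controls how long each scroll search context is kept alive
    path = "/_reindex?scroll={}".format(scroll)
    body = {
        # source.size is the number of documents read and indexed per batch
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    return path, body


path, body = build_reindex_request("logstash-2018-11-01", "logstash-2018-11")
print("POST " + path)
print(json.dumps(body, indent=2))
```

Neither setting helps when the scroll is lost because the node holding it restarts or drops out of the cluster.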

Looking at the Elasticsearch logs I can see:

[INFO ][o.e.t.LoggingTaskListener] 3406201 finished with response BulkByScrollResponse[took=2.1h,timed_out=false,sliceId=null,updated=0,created=37036000,deleted=0,batches=37036,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[{"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102015]"}}, {"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102016]"}}, {"shard":-1,"reason":{"type":"search_context_missing_exception","reason":"No search context found for id [1102023]"}}]]
[INFO ][o.e.m.j.JvmGcMonitorService] [server-ls2.local] [gc][40994] overhead, spent [335ms] collecting in the last [1s]

[WARN ][o.e.t.TransportService   ] [server-ls2.local] Received response for a request that has timed out, sent [51770ms] ago, timed out [21769ms] ago, action [internal:discovery/zen/fd/master_ping], node [{server-ls5.local}{aM_wxa2mTY2XI9P7bsobSg}{gsNCWS-GQ0md1u-yuhxSNw}{192.168.11.55}{192.168.11.55:9300}{site_id=rack1, ml.machine_memory=67279155200, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [11840408]

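So there was a timeout issue: the logs show a master ping response arriving after the request had already timed out, alongside a GC overhead message. The underlying cause is unknown at this time.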

Debian Lighttpd does infinite redirect loop and fails to connect

Just imagine you're running a blog that requires zero maintenance, and one day you access it and it doesn't load!

You try Firefox, then Chrome and finally Edge (the new IE).

You notice that Firefox and Chrome seem to loop and then finally fail, while Edge works.

You notice that cURL works.

Things are working, but also aren't.

Finally you notice Firefox is trying to do TLS1.3! Interesting. How do I disable that on Debian 9 with Lighttpd? You can't!

What’s the fix?

In lighttpd.conf, in your SSL section, add:

ssl.disable-client-renegotiation = "disable"

ssl.disable-client-renegotiation exists because of a bug back in 2009. This bug has long been patched in newer versions of OpenSSL, so it is safe to turn client renegotiation back on.

Disabling this setting allowed you to find the answer to your troubles :-)
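If you want to check whether the TLS version really is what separates the working clients from the failing ones, you can cap the version a test client is allowed to negotiate and compare. This is a minimal sketch using Python's standard ssl module; the function names are my own and the hostname in the note below is a placeholder. It only reports which version was agreed, it does not follow redirects like a browser does.

```python
import socket
import ssl


def make_context(max_version):
    """Build a client context that will not negotiate above max_version."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = max_version
    return ctx


def negotiated_version(host, max_version, port=443, timeout=5.0):
    """Connect to host:port and return the TLS version string that was agreed."""
    ctx = make_context(max_version)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_version("blog.example.org", ssl.TLSVersion.TLSv1_2) and then again with ssl.TLSVersion.TLSv1_3 (swap in your own hostname) shows what the server agrees to in each case, or raises an error if the handshake fails.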

Disable Fedora Cockpit

Quick and dirty:

# Stop the socket first so socket activation cannot restart the service
systemctl stop cockpit.socket
systemctl stop cockpit
systemctl disable cockpit.socket
systemctl disable cockpit
systemctl mask cockpit.socket
systemctl mask cockpit

Ansible Conditionals and Parentheses evaluate to True

I had fun wasting hours working out how to write correct ‘when’ statements in Ansible. I ended up consulting #ansible on IRC to get the answers.
Anyway I hope the following playbook makes sense to you. Note that ‘admintool’ is a valid group in my situation.

- name: Debug all the things
  hosts: all

  tasks:
    - set_fact: renew_cert="renew"

      # Valid - Should pause
    - name: Test 0 PASS
      pause: prompt="Test" seconds=1
      when: '"admintool" in group_names and renew_cert == "renew"'

      # Valid - Should skip
    - name: Test 1 SKIP
      pause: prompt="Test" seconds=1
      when: '"I-Dont-Exist" in group_names and renew_cert == "renew"'

      # Valid - Should skip
    - name: Test 2 SKIP
      pause: prompt="Test" seconds=1
      when: 
        - "'i-dont-exist' in group_names"
        - renew_cert == "renew"

      # Valid - Should pause
    - name: Test 3 PASS
      pause: prompt="Test" seconds=1
      when: 
        - "'admintool' in group_names"
        - renew_cert == "renew"

      # Invalid - the quoted membership test is just a string, which is always truthy, so the group check is ignored - DON'T USE
    - name: Test 4 SKIP
      pause: prompt="Test" seconds=1
      when: ("'admintool' in group_names" and renew_cert == "renew")

      # Invalid - Should skip, but the quoted membership test is just a string, which is always truthy - DON'T USE
    - name: Test 5 SKIP
      pause: prompt="Test" seconds=1
      when: ("'I-dont-exist' in group_names")

      # Valid - Should pause
    - name: Test 6 PASS
      pause: prompt="Test" seconds=1
      when: ("admintool" in group_names and renew_cert == "renew")

      # Valid - Should skip
    - name: Test 7 SKIP
      pause: prompt="Test" seconds=1
      when: ("I-dont-exist" in group_names and renew_cert == "renew")
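Why do Tests 4 and 5 misbehave? Ansible evaluates ‘when’ as a Jinja2 expression, and Jinja2 follows Python's rule that any non-empty string is truthy. Once the whole membership test sits inside quotes, it is no longer a test, just text. The sketch below is plain Python rather than Ansible, and as_condition is a made-up helper that stands in for the if-test, but it shows the same effect.

```python
group_names = ["admintool"]  # groups the current host belongs to (as in my setup)
renew_cert = "renew"


def as_condition(value):
    """Collapse a value to True/False the way an if-test does."""
    return bool(value)


# Test 5 style: the whole membership test is inside quotes, so it is only text
quoted = as_condition("'I-dont-exist' in group_names")

# Test 7 style: only the group name is quoted, so `in` is really evaluated
unquoted = as_condition("I-dont-exist" in group_names and renew_cert == "renew")

print(quoted)    # True, even though the group does not exist
print(unquoted)  # False, as intended
```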

List comparison and list manipulation in Ansible

I keep saying time and time again that Ansible is not a programming language. It's similar to one and it can do some programming things, but ultimately it's messy and I hate it, BUT I can make it do some strange things.
List manipulation is one of those.

In this example I have two directories that I want to compare, directory one (/tmp/1) and directory two (/tmp/2). Directory one is the Source, that I want directory two to look like.

The use case: I want to sync /tmp/1 to /tmp/2, but first I want to remove only the files in /tmp/2 that are no longer in /tmp/1. After that I can sync (copy/template) the /tmp/1 directory knowing that nothing exists in /tmp/2 that shouldn’t be there.

The Ansible code, with debug statements, is:

- hosts: local
  become: false
  tasks:

    - name: find 1
      find: path=/tmp/1
      register: one
    - debug: msg="{{ one }}"

    - name: find 2
      find: path=/tmp/2
      register: two

    - debug: msg="{{ item.path }}"
      with_items:
        - "{{ two.files }}"

    - set_fact:
        one_list: []
        two_list: []
        new_list: []

    # Concatenate inside a single expression so the result stays a real list
    - name: append basenames from /tmp/1
      set_fact:
        one_list: "{{ one_list + [item.path | basename] }}"
      with_items:
        - "{{ one.files }}"

    - name: append basenames from /tmp/2
      set_fact:
        two_list: "{{ two_list + [item.path | basename] }}"
      with_items:
        - "{{ two.files }}"

    - debug: msg="{{ one_list }}"
    - debug: msg="{{ two_list }}"

    - set_fact: new_list="{{ two_list | difference(one_list) }}"
    - debug: msg="{{ new_list }}"

The final result is that new_list is a list (array) containing what needs to be removed from /tmp/2 to bring it in line with /tmp/1.
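For comparison, here is roughly the same logic in plain Python: reduce both listings to basenames, then keep whatever appears in the second but not in the first. The file names are made up and stale_files is a hypothetical helper name; it is a sketch of the idea, not a replacement for the playbook.

```python
import os


def stale_files(source_paths, target_paths):
    """Return basenames present in target_paths but absent from source_paths."""
    source_names = {os.path.basename(p) for p in source_paths}
    stale = []
    for path in target_paths:
        name = os.path.basename(path)
        # keep the target's order and skip duplicates
        if name not in source_names and name not in stale:
            stale.append(name)
    return stale


one = ["/tmp/1/a.conf", "/tmp/1/b.conf"]  # made-up file names
two = ["/tmp/2/a.conf", "/tmp/2/old.conf", "/tmp/2/b.conf"]
print(stale_files(one, two))  # ['old.conf']
```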