Deleting an index and individual records in Elasticsearch

Introduction

Elasticsearch provides REST API endpoints for deleting individual documents or an entire index. You can delete a single document by specifying its ID, or delete every document that matches a query.

In SQL terms, deleting by ID is equivalent to deleting a row by its rowid or primary key. For example:

DELETE from "employees" WHERE id = 70001;

Deleting by query is equivalent to a DELETE statement whose WHERE clause selects the rows to remove, as below.

DELETE from "employees" WHERE "quit_date" IS NOT NULL;

Deleting a document by ID

To delete a single document by ID from an index named employees.csv, you can use the following cURL command, which deletes the document with the ID 70001:

curl -XDELETE "localhost:9200/employees.csv/doc/70001"
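
The same request can be built from a script. The following is a minimal Python sketch, not an official client: the helper name build_delete_doc_request and the http://localhost:9200 base URL are assumptions made for illustration. It only constructs the request, so it can be inspected without a running cluster.

```python
import urllib.request
from urllib.parse import quote


def build_delete_doc_request(base_url, index, doc_type, doc_id):
    """Build (but do not send) a DELETE request for one document."""
    # Percent-encode each path segment so unusual IDs cannot alter the URL.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return urllib.request.Request(f"{base_url}/{path}", method="DELETE")


# Hypothetical local cluster address; adjust to your environment.
req = build_delete_doc_request("http://localhost:9200", "employees.csv", "doc", 70001)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Passing the type name as a parameter keeps the helper usable whether the index was mapped with doc or _doc.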

Should you use _doc or doc in the DELETE invocation?

The Elasticsearch documentation uses _doc, but the correct value depends on how the index has been mapped. Look up the mapping of the index as follows:

curl -XGET "localhost:9200/employees.csv/_mapping"

In my Elasticsearch database, this reports the following. Note that the mapping type is named doc and not _doc.

{
  "employees.csv": {
    "mappings": {
      "doc": {
        "properties": {
          "emp_no": {
            "type": "long"
          },
...

Using _doc instead of doc in this case produces an error from Elasticsearch:

{
  "status": 400,
  "error": {
    "root_cause": [
      {
        "reason": "Rejecting mapping update to [employees.csv] as the final mapping would have more than 1 type: [_doc, doc]",
        "type": "illegal_argument_exception"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Rejecting mapping update to [employees.csv] as the final mapping would have more than 1 type: [_doc, doc]"
  }
}
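
To avoid guessing the type name, it can be read from the _mapping response before issuing the delete. The sketch below is illustrative only: mapping_type_names is a hypothetical helper, and it assumes the response has already been parsed into a dict shaped like the output shown earlier. It also assumes that a typeless mapping places "properties" directly under "mappings".

```python
def mapping_type_names(mapping_response, index):
    """Return the mapping type names found in a parsed _mapping response."""
    mappings = mapping_response[index]["mappings"]
    # Assumption: a typeless mapping holds "properties" directly under
    # "mappings", in which case there is no custom type name to report.
    if "properties" in mappings:
        return []
    return list(mappings.keys())


# Sample shaped like the mapping output for employees.csv.
sample = {
    "employees.csv": {
        "mappings": {"doc": {"properties": {"emp_no": {"type": "long"}}}}
    }
}
print(mapping_type_names(sample, "employees.csv"))
```

An empty list suggests the index has no custom type, in which case _doc is the path segment to try.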

Deleting documents matching a query

Let us now look at deleting documents that match a query. This example uses the following query, which matches on first_name and restricts hire_date to a date range.

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "hire_date": {
              "lte": "1985-5-13",
              "format": "yyyy-M-d"
            }
          }
        },
        {
          "match_phrase": {
            "first_name": "mayuko"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 25
}

We find 8 records for this query (sample shown below):

"hits": [
  {
    "_index": "employees.csv",
    "_type": "doc",
    "_id": "74444",
    "_score": 8.052864,
    "_source": {
      "emp_no": 84444,
      "birth_date": "1964-05-13",
      "first_name": "Mayuko",
      "last_name": "Rahier",
      "gender": false,
      "hire_date": "1985-05-13"
    }
  },
  {
    "_index": "employees.csv",
    "_type": "doc",
    "_id": "3443",
    "_score": 8.052864,
    "_source": {
      "emp_no": 13443,
      "birth_date": "1954-07-22",
      "first_name": "Mayuko",
      "last_name": "Puppo",
      "gender": null,
      "hire_date": "1985-03-06"
    }
  },
...

To delete these records, run the following cURL command. It assumes the above query is stored in a file called search.json.

curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_delete_by_query"
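
A scripted equivalent might look like the following Python sketch. The helper name build_delete_by_query_request and the base URL are assumptions for illustration. As a design choice of this sketch, only the "query" part of the search body is forwarded: the paging keys "from" and "size" are dropped so that every matching document is targeted rather than a single page of results.

```python
import json
import urllib.request


def build_delete_by_query_request(base_url, index, search_body):
    """Build (but do not send) a POST to the index's _delete_by_query endpoint."""
    # Assumption of this sketch: forward only the "query" part and
    # leave out paging keys such as "from" and "size".
    body = json.dumps({"query": search_body["query"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same query as in search.json above.
search_body = {
    "query": {"bool": {"must": [
        {"range": {"hire_date": {"lte": "1985-5-13", "format": "yyyy-M-d"}}},
        {"match_phrase": {"first_name": "mayuko"}},
    ]}},
    "from": 0,
    "size": 25,
}
req = build_delete_by_query_request("http://localhost:9200", "employees.csv", search_body)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```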

Let us now verify that the records are really gone by running the same query as a search.

curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_search"

The response contains no hits, which indicates that the records have been deleted.

"hits": {
  "hits": [],
  "total": 0,
  "max_score": null
},
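
This check can be automated once the search response has been parsed as JSON. The sketch below uses a hypothetical helper, total_hits. It accepts the integer form of "total" shown above and also the object form ({"value": ..., "relation": ...}) that newer Elasticsearch versions return.

```python
def total_hits(search_response):
    """Return the hit count from a parsed _search response body."""
    total = search_response["hits"]["total"]
    # Newer Elasticsearch versions report {"value": N, "relation": ...}
    # instead of a bare integer; accept both shapes.
    if isinstance(total, dict):
        return total["value"]
    return total


# Response shaped like the one shown above.
after_delete = {"hits": {"hits": [], "total": 0, "max_score": None}}
if total_hits(after_delete) == 0:
    print("no matching documents remain")
```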

Deleting all documents from an index

To delete all documents from an index, you can specify a query that matches all documents. A bool query with no clauses does this, for example:

{
  "query": {
    "bool": {}
  }
}

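With this query saved in search.json, the following command (a POST to the _delete_by_query endpoint) removes all records from the index while keeping the index itself.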

curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_delete_by_query"

Deleting an index

To delete an index completely, including its mapping and all of its documents, send a DELETE request to the index itself.

curl -X DELETE "localhost:9200/employees.csv"
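
Because this operation removes everything, a script may want a safeguard before sending it. The following Python sketch is one possible approach, not a standard API: build_delete_index_request is a hypothetical helper, and the rule of rejecting empty names, _all, wildcards and comma-separated lists is a choice made for this sketch so that only one named index can be targeted.

```python
import urllib.request
from urllib.parse import quote


def build_delete_index_request(base_url, index):
    """Build (but do not send) a DELETE request for exactly one named index."""
    # Guard (a choice of this sketch): refuse names that could address
    # more than one index.
    if not index or index == "_all" or "*" in index or "," in index:
        raise ValueError(f"refusing to delete ambiguous index name: {index!r}")
    return urllib.request.Request(f"{base_url}/{quote(index, safe='')}", method="DELETE")


# Hypothetical local cluster address; adjust to your environment.
req = build_delete_index_request("http://localhost:9200", "employees.csv")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```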

Source: https://www.getargon.io/docs/articles/index/delete.html