Introduction
Elasticsearch provides REST API methods for deleting individual documents or an entire index. To delete documents, you can either specify a document by its ID to remove just that document, or supply a query to delete every document that matches it.
In SQL, the first is equivalent to deleting a row by its rowid or primary key. For example:
DELETE from "employees" WHERE id = 70001;
The second is equivalent to deleting rows with a DELETE statement that has a WHERE clause, as below.
DELETE from "employees" WHERE "quit_date" IS NOT NULL;
Deleting a document by ID
To delete a single document by ID from an index named employees.csv, you can use the following cURL command. It deletes the document with the ID 70001.
curl -XDELETE "localhost:9200/employees.csv/doc/70001"
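The same request can also be issued from a script. The following is a minimal Python sketch using only the standard library. The helper name build_delete_doc_request and the http://localhost:9200 base URL are assumptions made for illustration. The helper only builds the request; pass the result to urllib.request.urlopen to actually send it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_doc_request(index, doc_type, doc_id,
                             base_url="http://localhost:9200"):
    """Build (but do not send) a DELETE request for one document.

    The helper name and base_url default are illustrative assumptions.
    """
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    path = "/".join(quote(str(part), safe="")
                    for part in (index, doc_type, doc_id))
    return Request(f"{base_url}/{path}", method="DELETE")


req = build_delete_doc_request("employees.csv", "doc", 70001)
print(req.get_method(), req.full_url)
```

Encoding each segment separately means an ID that contains a slash stays a single path segment instead of changing the shape of the URL.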
Should you use _doc or doc in the DELETE invocation?
While the Elasticsearch documentation uses _doc, the right value depends on how the index has been mapped. Look up the mapping of the index as follows:
curl -XGET "localhost:9200/employees.csv/_mapping"
In my Elasticsearch database, this reports the following. Note that the mapping type is named doc and not _doc.
{
  "employees.csv": {
    "mappings": {
      "doc": {
        "properties": {
          "emp_no": {
            "type": "long"
          },
          ...
When we attempt to use _doc instead of doc in this case, Elasticsearch returns an error:
{
  "status": 400,
  "error": {
    "root_cause": [
      {
        "reason": "Rejecting mapping update to [employees.csv] as the final mapping would have more than 1 type: [_doc, doc]",
        "type": "illegal_argument_exception"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Rejecting mapping update to [employees.csv] as the final mapping would have more than 1 type: [_doc, doc]"
  }
}
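Choosing between doc and _doc can be scripted by reading the type name out of the _mapping response before building the DELETE URL. The sketch below is a hypothetical Python helper (mapping_type_names is not part of any library) that works on the already-parsed JSON. The branch for responses without a type level reflects how I understand newer Elasticsearch versions to answer and should be treated as an assumption.

```python
def mapping_type_names(mapping_response, index):
    """Return the mapping type names reported for `index`.

    `mapping_response` is the parsed JSON of GET <index>/_mapping.
    """
    mappings = mapping_response[index]["mappings"]
    # Assumption: a typeless response puts "properties" directly under
    # "mappings"; in that case the URL segment to use is _doc.
    if "properties" in mappings:
        return ["_doc"]
    return list(mappings)


sample = {
    "employees.csv": {
        "mappings": {
            "doc": {"properties": {"emp_no": {"type": "long"}}}
        }
    }
}
print(mapping_type_names(sample, "employees.csv"))
```

For the mapping shown earlier, the helper returns a list containing only doc, which is the segment to use in the DELETE URL.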
Deleting documents matching a query
Let us now look into deleting documents matching a query. For the sake of this example, we use the following query, which restricts by first_name and by a date range on hire_date.
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "hire_date": {
              "lte": "1985-5-13",
              "format": "yyyy-M-d"
            }
          }
        },
        {
          "match_phrase": {
            "first_name": "mayuko"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 25
}
We find 8 records for this query (sample shown below):
"hits": [
  {
    "_index": "employees.csv",
    "_type": "doc",
    "_id": "74444",
    "_score": 8.052864,
    "_source": {
      "emp_no": 84444,
      "birth_date": "1964-05-13",
      "first_name": "Mayuko",
      "last_name": "Rahier",
      "gender": false,
      "hire_date": "1985-05-13"
    }
  },
  {
    "_index": "employees.csv",
    "_type": "doc",
    "_id": "3443",
    "_score": 8.052864,
    "_source": {
      "emp_no": 13443,
      "birth_date": "1954-07-22",
      "first_name": "Mayuko",
      "last_name": "Puppo",
      "gender": null,
      "hire_date": "1985-03-06"
    }
  },
  ...
To delete these records, run the following cURL command. It assumes the above query is stored in a file called search.json.
curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_delete_by_query"
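For scripted use, the request can be assembled in Python with the standard library. This is a minimal sketch: the helper name build_delete_by_query_request and the http://localhost:9200 base URL are assumptions. As a precaution it forwards only the query clause, because from and size are paging options for search and whether _delete_by_query accepts them may depend on the Elasticsearch version. The helper builds the request without sending it.

```python
import json
from urllib.request import Request


def build_delete_by_query_request(index, search_body,
                                  base_url="http://localhost:9200"):
    """Build (but do not send) a POST to <index>/_delete_by_query."""
    # Keep only the query clause; paging keys are dropped on purpose.
    body = {"query": search_body["query"]}
    return Request(
        f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


search_body = {
    "query": {"bool": {"must": [
        {"range": {"hire_date": {"lte": "1985-5-13", "format": "yyyy-M-d"}}},
        {"match_phrase": {"first_name": "mayuko"}},
    ]}},
    "from": 0,
    "size": 25,
}
req = build_delete_by_query_request("employees.csv", search_body)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Printing the body before sending gives a chance to confirm that the query is the one that was tested with _search.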
Let us now verify that the records are really gone.
curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_search"
And the response indicates that the records have been deleted.
"hits": {
  "hits": [],
  "total": 0,
  "max_score": null
},
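When this check runs in a script, the count can be read from the parsed response. The helper below is a hypothetical sketch (total_hits is not a library function). It accepts the plain number shown above and also the object form, {"value": ..., "relation": ...}, that Elasticsearch 7 and later return by default.

```python
def total_hits(search_response):
    """Return the hit count from a parsed _search response as an int."""
    total = search_response["hits"]["total"]
    # Older responses report a plain number; newer ones report an
    # object with "value" and "relation" keys.
    if isinstance(total, dict):
        return total["value"]
    return total


response = {"hits": {"hits": [], "total": 0, "max_score": None}}
print(total_hits(response) == 0)
```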
Deleting all documents from an index
To delete all documents from an index, you can specify a query which matches all documents. An example would be:
{
  "query": {
    "bool": {}
  }
}
The following command removes all records from the index. It assumes that search.json now contains the match-all query above.
curl -H "Content-Type: application/json" -X POST -d @search.json "localhost:9200/employees.csv/_delete_by_query"
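An empty bool query works, but match_all states the intent more explicitly. The short Python sketch below prints such a body; saving the printed text as search.json makes it usable with the curl command above. The variable name delete_all_body is only illustrative.

```python
import json

# match_all is the explicit way to select every document in the index.
delete_all_body = {"query": {"match_all": {}}}

# Print the JSON text to save as the request body file.
print(json.dumps(delete_all_body, indent=2))
```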
Deleting an index
To delete an index completely, send a DELETE request to the index itself.
curl -X DELETE "localhost:9200/employees.csv"
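Because a DELETE on a wildcard pattern or on _all can remove several indices at once, unless the cluster is configured to require explicit names, a script may want to refuse such names up front. The following Python sketch shows one way to do that; the helper name build_delete_index_request, the base URL, and the exact set of rejected patterns are assumptions, and the request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_index_request(index, base_url="http://localhost:9200"):
    """Build (but do not send) a DELETE request for exactly one named index."""
    # Refuse names that could address more than one index.
    if index in ("", "_all") or "*" in index or "," in index:
        raise ValueError(f"refusing to delete index pattern: {index!r}")
    return Request(f"{base_url}/{quote(index, safe='')}", method="DELETE")


req = build_delete_index_request("employees.csv")
print(req.get_method(), req.full_url)
```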
Source: https://www.getargon.io/docs/articles/index/delete.html