Importing and updating data in Elasticsearch

Z时代
2024-01-10
Category: Q&A

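We have an existing search function that pulls data from multiple tables in SQL Server. This puts a heavy load on our database, so I am looking for a better way to search this data (it does not change very often). I have been working with Logstash and Elasticsearch for about a week, using an import of 1.2 million records. My question is essentially: how do I update existing documents using a "primary key"?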

The CSV data file (pipe-delimited) looks like this:

369|90045|123 ABC ST|LOS ANGELES|CA
368|90045|PVKA0010|LA|CA
367|90012|20000 Venice Boulvd|Los Angeles|CA
365|90045|ABC ST 123|LOS ANGELES|CA
363|90045|ADHOCTESTPROPERTY|DALES|CA

My Logstash config looks like this:

input {
  stdin {
    type => "stdin-type"
  }
  file {
    path => ["C:/Data/sample/*"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["property_id","postal_code","address_1","city","state_code"]
    separator => "|"
  }
}
output {
  elasticsearch {
    embedded => true
    index => "samples4"
    index_type => "sample"
  }
}
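As a rough illustration of what the csv filter in this config does to each line, here is a small Python sketch (it is not Logstash, and the helper name `parse_line` is hypothetical). It splits one pipe-delimited record into the column names listed in the filter and drops the trailing carriage return left by Windows line endings.

```python
import csv
import io

# Column names copied from the csv filter in the Logstash config.
COLUMNS = ["property_id", "postal_code", "address_1", "city", "state_code"]

def parse_line(line, separator="|"):
    """Split one delimited line into a dict keyed by COLUMNS (hypothetical helper)."""
    # Remove the trailing "\r" / "\n" before parsing.
    reader = csv.reader(io.StringIO(line.rstrip("\r\n")), delimiter=separator)
    values = next(reader)
    return dict(zip(COLUMNS, values))

event = parse_line("369|90045|123 ABC ST|LOS ANGELES|CA\r")
print(event["property_id"], event["city"])
```

Every value stays a string in this sketch, which matches the quoted values in the indexed documents.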

The documents in Elasticsearch then look like this:

{
  "_index": "samples4",
  "_type": "sample",
  "_id": "64Dc0_1eQ3uSln_k-4X26A",
  "_score": 1.4054651,
  "_source": {
    "message": [
      "369|90045|123 ABC ST|LOS ANGELES|CA\r"
    ],
    "@version": "1",
    "@timestamp": "2014-02-11T22:58:38.365Z",
    "host": "[host]",
    "path": "C:/Data/sample/sample.csv",
    "property_id": "369",
    "postal_code": "90045",
    "address_1": "123 ABC ST",
    "city": "LOS ANGELES",
    "state_code": "CA"
  }
}

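I would like to replace the unique ID in the _id field with the value of property_id. The idea is that subsequent data files will contain updates. I do not need to keep previous versions, and we will never add or remove keys from a document.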

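The document_id setting of the elasticsearch output does not put that field's value into _id (it just put in the literal "property_id", and only one document was stored/updated). I know I'm missing something here. Am I just taking the wrong approach?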

Using @rutter's suggestion, I updated the output config to:

output {
  elasticsearch {
    embedded => true
    index => "samples6"
    index_type => "sample"
    document_id => "%{property_id}"
  }
}

Now documents are updated as expected when new files are dropped into the data folder. _id and property_id have the same value.

{
  "_index": "samples6",
  "_type": "sample",
  "_id": "351",
  "_score": 1,
  "_source": {
    "message": [
      "351|90045|Easy as 123 ST|LOS ANGELES|CA\r"
    ],
    "@version": "1",
    "@timestamp": "2014-02-12T16:12:52.102Z",
    "host": "TXDFWL3474",
    "path": "C:/Data/sample/sample_update_3.csv",
    "property_id": "351",
    "postal_code": "90045",
    "address_1": "Easy as 123 ST",
    "city": "LOS ANGELES",
    "state_code": "CA"
  }
}

Answer:

Converted from comments:

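You can overwrite a document by sending another document with the same ID... but that may be tricky with your existing data, since documents get randomized IDs by default.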
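To make the overwrite behavior concrete, here is a minimal Python sketch that models an index as a plain dictionary. It is only a stand-in for Elasticsearch: `index_document` is a hypothetical helper, and `uuid4` merely imitates an auto-generated ID. It shows that two versions of the same record pile up under random IDs but collapse into one document when both are sent with the same explicit ID.

```python
import uuid

def index_document(index, source, doc_id=None):
    """Store source under doc_id; make up a random ID when none is given."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex  # stand-in for an auto-generated ID
    index[doc_id] = source  # same ID: the previous document is replaced
    return doc_id

old = {"property_id": "351", "address_1": "123 ABC ST"}
new = {"property_id": "351", "address_1": "Easy as 123 ST"}

# Without explicit IDs, the update becomes a second document.
auto_ids = {}
index_document(auto_ids, old)
index_document(auto_ids, new)

# With the record's own key as the ID, the update overwrites the original.
keyed = {}
index_document(keyed, old, doc_id=old["property_id"])
index_document(keyed, new, doc_id=new["property_id"])

print(len(auto_ids), len(keyed))
print(keyed["351"]["address_1"])
```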

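You can set the ID using the output plugin's document_id field, but it takes a literal string, not a field name. To use the contents of a field, you can use a sprintf format string, such as %{property_id}.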

Something like this, for example:

output {
  elasticsearch {
    # ... other settings ...
    document_id => "%{property_id}"
  }
}
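The difference between a literal string and a sprintf-style reference can be sketched in a few lines of Python. This is an imitation of the substitution, not Logstash's own code. The `render` helper is hypothetical, and leaving unknown references untouched is an assumption of this sketch.

```python
import re

# Matches sprintf-style field references such as %{property_id}.
FIELD_REF = re.compile(r"%\{([^}]+)\}")

def render(template, event):
    """Replace each %{name} with the event's value; leave unknown names as-is."""
    return FIELD_REF.sub(
        lambda m: str(event.get(m.group(1), m.group(0))), template
    )

events = [
    {"property_id": "369", "city": "LOS ANGELES"},
    {"property_id": "368", "city": "LA"},
]

# A bare field name is a literal: every event gets the same ID.
literal_ids = {render("property_id", e) for e in events}
# A %{...} reference resolves per event: one ID per record.
field_ids = {render("%{property_id}", e) for e in events}

print(sorted(literal_ids))
print(sorted(field_ids))
```

With the bare name, every event maps to the single ID "property_id", which is consistent with only one document being stored in the question's first attempt.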

Source link: utcz.com/qa/433115.html
