Does the Python elasticsearch client library communicate over HTTP or a private TCP protocol?


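I need to bulk-insert documents into ES: more than 7,000 documents in a single operation.

I insert them with helpers.bulk(es, doc_list) from the elasticsearch package, but it is very slow: it takes 38 seconds to finish.

It should not be a network bandwidth problem, because I am using Aliyun's ES service, which has 50 Mbps of inbound bandwidth.

I captured the traffic with tshark:

Command:

sudo tshark  -i ens1f0  -Y "ip.addr==121.199.xxx.xxx"

Partial output: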

56871 48.011406170 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1616693 Ack=85763635 Win=7416 Len=1268 TSval=965894257 TSecr=3255003261 [TCP segment of a reassembled PDU]

56872 48.011419682 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1617961 Win=2087 Len=0 TSval=3255003268 TSecr=965894257

56891 48.017191036 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1617961 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56892 48.017241771 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1619229 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56893 48.017256249 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1620497 Win=2087 Len=0 TSval=3255003273 TSecr=965894264

56894 48.017293804 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1620497 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56895 48.017311081 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1623033 Win=2087 Len=0 TSval=3255003273 TSecr=965894264

56896 48.017339213 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1623033 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56897 48.017354477 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1625569 Win=2087 Len=0 TSval=3255003273 TSecr=965894264

56898 48.017393905 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1625569 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56899 48.017454274 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1626837 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56900 48.017469170 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1629373 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56901 48.017511582 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1629373 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56902 48.017526962 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1631909 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56903 48.017570866 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1631909 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56904 48.017586380 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1634445 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56905 48.017636684 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1634445 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56906 48.017652399 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1636981 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56907 48.017694925 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1636981 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56908 48.017710077 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1639517 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56910 48.017760121 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1639517 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56912 48.017827581 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1640785 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56913 48.017842134 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1642053 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56914 48.017897140 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1642053 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56915 48.017962428 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1643321 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56916 48.017976258 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1644589 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56917 48.018044837 121.199.xxx.xxx → 192.168.60.251 TCP 3870 9200 → 33754 [ACK] Seq=1644589 Ack=85763635 Win=7416 Len=3804 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56918 48.018061739 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1648393 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56919 48.018112941 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1648393 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56920 48.018122579 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1650929 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]

56921 48.018130499 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1650929 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

56922 48.018152407 121.199.xxx.xxx → 192.168.60.251 HTTP 956 HTTP/1.1 200 OK (application/json)

56923 48.018181027 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1653087 Win=2087 Len=0 TSval=3255003274 TSecr=965894264

57309 53.984411706 192.168.60.251 → 121.199.xxx.xxx TCP 66 33916 → 9200 [FIN, ACK] Seq=1011 Ack=269 Win=64128 Len=0 TSval=3255009240 TSecr=965178229

57314 54.020048944 121.199.xxx.xxx → 192.168.60.251 TCP 66 9200 → 33916 [FIN, ACK] Seq=269 Ack=1012 Win=32256 Len=0 TSval=965222829 TSecr=3255009240

57315 54.020094840 192.168.60.251 → 121.199.xxx.xxx TCP 66 33916 → 9200 [ACK] Seq=1012 Ack=270 Win=64128 Len=0 TSval=3255009276 TSecr=965222829

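From this it looks as if the elasticsearch client library did not send the 7,000+ documents to ES in one go, but instead carried out some 7,000 separate TCP exchanges.

How can I optimize bulk insertion of this many documents?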


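Answer:

The size of a TCP packet is limited by the network layers beneath it, so it is generally not very large.

HTTP is an application-layer protocol, so one large HTTP request is necessarily split into many TCP packets on the wire. Seeing many packets in the capture therefore does not mean that many separate requests were made; the "[TCP segment of a reassembled PDU]" lines are pieces of a single larger message.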
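To put rough numbers on this, here is a small sketch. It assumes that tshark is showing relative sequence numbers (its default), so the client's Seq=85763635 approximates the bytes the client has sent on that connection. It also borrows the 1268-byte segment length seen in the server's packets as an illustrative segment size. The function names are made up for this example, and the 50 Mbps figure is the inbound bandwidth quoted in the question; the client's own uplink may well be lower.

```python
import math


def tcp_segments_needed(payload_bytes: int, segment_bytes: int) -> int:
    """Number of TCP segments needed to carry payload_bytes."""
    return math.ceil(payload_bytes / segment_bytes)


def min_transfer_seconds(payload_bytes: int, bandwidth_mbps: float) -> float:
    """Lower bound on transfer time; ignores latency, headers and retransmits."""
    return payload_bytes * 8 / (bandwidth_mbps * 1_000_000)


# Assumption: the client's Seq value in the capture is a relative sequence
# number, i.e. roughly the number of bytes sent so far on this connection.
sent_bytes = 85_763_635

# Assumption: 1268 bytes per segment, taken from the server-side packets.
print(tcp_segments_needed(sent_bytes, 1268))           # 67637
print(round(min_transfer_seconds(sent_bytes, 50), 1))  # 13.7
```

If those assumptions hold, the upload alone would need on the order of 14 seconds at 50 Mbps, so the payload size and the available bandwidth may account for a good part of the 38 seconds, independent of how many TCP packets are involved.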


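Answer:

The Python Elasticsearch client library communicates over HTTP by default. Elasticsearch itself exposes a RESTful API over HTTP, so the client library uses HTTP to talk to the Elasticsearch server. The capture above is consistent with this: the traffic goes to port 9200 and ends with an "HTTP/1.1 200 OK (application/json)" response. Elasticsearch also supports communication over a private TCP protocol, but using it requires the Java API provided by Elasticsearch or a third-party client library.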
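As a rough illustration of what goes over HTTP, the sketch below builds the newline-delimited JSON body that a bulk request carries, and wraps helpers.bulk with an explicit chunk_size. The helper names to_actions, ndjson_bulk_body and bulk_index, the index name and the chunk size are hypothetical choices for this example, not something taken from the question. The elasticsearch package is imported lazily so that the body-building part runs without it.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_actions(docs: Iterable[Dict[str, Any]], index_name: str) -> Iterator[Dict[str, Any]]:
    """Wrap plain documents in the action format that helpers.bulk accepts."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def ndjson_bulk_body(actions: Iterable[Dict[str, Any]]) -> str:
    """Render actions as the body of one bulk request (NDJSON, one JSON per line)."""
    lines = []
    for action in actions:
        lines.append(json.dumps({"index": {"_index": action["_index"]}}))
        lines.append(json.dumps(action["_source"]))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def bulk_index(es, docs, index_name: str, chunk_size: int = 1000):
    """Send docs through helpers.bulk, chunk_size actions per HTTP request."""
    # Imported here so the functions above work without the package installed.
    from elasticsearch import helpers

    return helpers.bulk(es, to_actions(docs, index_name), chunk_size=chunk_size)


# Show the wire format for two tiny documents.
print(ndjson_bulk_body(to_actions([{"title": "a"}, {"title": "b"}], "demo")))
```

helpers.bulk sends the actions in chunks, each chunk being one HTTP request, so 7,000 documents result in a handful of requests rather than 7,000. If the documents are large, it may be worth trying http_compress=True when constructing the Elasticsearch client, or helpers.parallel_bulk to send chunks from several threads; whether either helps depends on where the time is actually being spent.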
