Does the Python elasticsearch client library communicate over HTTP or over a private TCP protocol?
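I need to bulk-insert documents into ES: 7000+ documents in a single operation.
I insert them with helpers.bulk(es, doc_list) from the elasticsearch package,
but it is very slow: it takes 38 seconds to complete!
It should not be a network bandwidth problem, because I am using Aliyun's ES service, which has 50 Mbps of inbound bandwidth.
I captured the traffic with tshark.
Command: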
sudo tshark -i ens1f0 -Y "ip.addr==121.199.xxx.xxx"
Partial output:
56871 48.011406170 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1616693 Ack=85763635 Win=7416 Len=1268 TSval=965894257 TSecr=3255003261 [TCP segment of a reassembled PDU]
56872 48.011419682 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1617961 Win=2087 Len=0 TSval=3255003268 TSecr=965894257
56891 48.017191036 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1617961 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56892 48.017241771 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1619229 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56893 48.017256249 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1620497 Win=2087 Len=0 TSval=3255003273 TSecr=965894264
56894 48.017293804 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1620497 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56895 48.017311081 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1623033 Win=2087 Len=0 TSval=3255003273 TSecr=965894264
56896 48.017339213 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1623033 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56897 48.017354477 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1625569 Win=2087 Len=0 TSval=3255003273 TSecr=965894264
56898 48.017393905 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1625569 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56899 48.017454274 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1626837 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56900 48.017469170 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1629373 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56901 48.017511582 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1629373 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56902 48.017526962 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1631909 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56903 48.017570866 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1631909 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56904 48.017586380 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1634445 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56905 48.017636684 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1634445 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56906 48.017652399 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1636981 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56907 48.017694925 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1636981 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56908 48.017710077 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1639517 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56910 48.017760121 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1639517 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56912 48.017827581 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1640785 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56913 48.017842134 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1642053 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56914 48.017897140 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1642053 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56915 48.017962428 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1643321 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56916 48.017976258 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1644589 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56917 48.018044837 121.199.xxx.xxx → 192.168.60.251 TCP 3870 9200 → 33754 [ACK] Seq=1644589 Ack=85763635 Win=7416 Len=3804 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56918 48.018061739 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1648393 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56919 48.018112941 121.199.xxx.xxx → 192.168.60.251 TCP 2602 9200 → 33754 [ACK] Seq=1648393 Ack=85763635 Win=7416 Len=2536 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56920 48.018122579 121.199.xxx.xxx → 192.168.60.251 TCP 1334 9200 → 33754 [ACK] Seq=1650929 Ack=85763635 Win=7416 Len=1268 TSval=965894264 TSecr=3255003267 [TCP segment of a reassembled PDU]
56921 48.018130499 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1650929 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
56922 48.018152407 121.199.xxx.xxx → 192.168.60.251 HTTP 956 HTTP/1.1 200 OK (application/json)
56923 48.018181027 192.168.60.251 → 121.199.xxx.xxx TCP 66 33754 → 9200 [ACK] Seq=85763635 Ack=1653087 Win=2087 Len=0 TSval=3255003274 TSecr=965894264
57309 53.984411706 192.168.60.251 → 121.199.xxx.xxx TCP 66 33916 → 9200 [FIN, ACK] Seq=1011 Ack=269 Win=64128 Len=0 TSval=3255009240 TSecr=965178229
57314 54.020048944 121.199.xxx.xxx → 192.168.60.251 TCP 66 9200 → 33916 [FIN, ACK] Seq=269 Ack=1012 Win=32256 Len=0 TSval=965222829 TSecr=3255009240
57315 54.020094840 192.168.60.251 → 121.199.xxx.xxx TCP 66 33916 → 9200 [ACK] Seq=1012 Ack=270 Win=64128 Len=0 TSval=3255009276 TSecr=965222829
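As you can see, the elasticsearch client library does not seem to send the 7000+ documents to ES in one go; instead it appears to perform 7000 separate TCP exchanges.
How can I optimize the bulk insertion of this many documents?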
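Answer:
The size of a single TCP packet is capped by the lower network layers (the MTU of the path), so it is never very large.
HTTP is an application-layer protocol, so one large request is necessarily split into many TCP segments on the wire. Seeing thousands of TCP packets in the capture therefore does not mean the library sent thousands of requests.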
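To make the point concrete, here is a rough back-of-the-envelope sketch using numbers read off the capture in the question. It assumes that tshark is printing relative sequence numbers, so that the client's Seq=85763635 approximates the bytes uploaded on that connection, and that the upload direction uses the same 1268-byte segment size seen in the server's packets. The function names are made up for illustration, and the 50 Mbps figure is taken at face value as available end to end, which may not hold.

```python
import math


def segments_needed(payload_bytes: int, mss_bytes: int) -> int:
    """Number of TCP segments needed to carry a payload of the given size."""
    return math.ceil(payload_bytes / mss_bytes)


def min_transfer_seconds(payload_bytes: int, link_mbps: float) -> float:
    """Lower bound on transfer time, ignoring latency and protocol overhead."""
    return payload_bytes * 8 / (link_mbps * 1_000_000)


# Values read from the capture (assumptions, see the paragraph above).
uploaded_bytes = 85_763_635  # client Seq on the 33754 -> 9200 connection
segment_size = 1268          # smallest full segment Len in the capture

print(segments_needed(uploaded_bytes, segment_size))        # 67637
print(round(min_transfer_seconds(uploaded_bytes, 50), 1))   # 13.7
```

Under these assumptions a single upload of roughly 85 MB needs tens of thousands of segments no matter how few HTTP requests carry it, and takes at least about 14 seconds on a 50 Mbps link, so bandwidth may account for a sizeable part of the 38 seconds after all.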
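Answer:
The Python Elasticsearch client library communicates over HTTP by default. Elasticsearch itself exposes a RESTful API over HTTP (port 9200, the port seen in the capture above), so the client library uses HTTP to talk to the Elasticsearch server. Elasticsearch also has a private TCP-based transport protocol, but using it requires the Java API provided by Elasticsearch or some other third-party client library.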
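Since the traffic is plain HTTP, the knobs worth trying are on the HTTP side: smaller request bodies and compression. The following is only a sketch, not a measured fix. `build_actions`, `bulk_index`, the index name and the chunk sizes are hypothetical choices for illustration; `http_compress`, `chunk_size` and `max_chunk_bytes` are existing options of the elasticsearch package, but whether they help depends on the documents and on the link.

```python
from typing import Any, Dict, Iterable, Iterator


def build_actions(docs: Iterable[Dict[str, Any]], index_name: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per document, lazily, without copying the whole list."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def bulk_index(es_url: str, docs: Iterable[Dict[str, Any]], index_name: str) -> int:
    """Index docs through the bulk helper and return the number of successful actions."""
    # Imported here so build_actions stays usable without the package installed.
    from elasticsearch import Elasticsearch, helpers

    # http_compress=True gzips request bodies, which cuts the bytes uploaded.
    es = Elasticsearch(es_url, http_compress=True)

    # helpers.bulk splits the actions into several _bulk requests:
    # at most chunk_size actions and at most max_chunk_bytes per request.
    ok_count, _errors = helpers.bulk(
        es,
        build_actions(docs, index_name),
        chunk_size=500,                    # illustrative value, tune by measuring
        max_chunk_bytes=10 * 1024 * 1024,  # illustrative value, tune by measuring
    )
    return ok_count
```

A call would look like `bulk_index("http://localhost:9200", doc_list, "my-index")`, with the URL and index name replaced by real ones. If a single connection is the limit rather than the link, `helpers.parallel_bulk` accepts a `thread_count` argument and sends chunks concurrently; it returns a generator that has to be iterated for the requests to be sent.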