一尘不染

UTF8 encoding is longer than the max length 32766

elasticsearch

I upgraded my Elasticsearch cluster from 1.1 to 1.2, and now I get an error when indexing a fairly large string.

{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=\"response_body\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[7b 22 58 48 49 5f 48 6f 74 65 6c 41 76 61 69 6c 52 53 22 3a 7b 22 6d 73 67 56 65 72 73 69]...']",
  "status": 500
}
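The hex bytes quoted in the error are the UTF-8 encoding of the start of the offending term. A short Python sketch that decodes them shows that the "immense term" begins with a JSON object, i.e. the raw response body stored in response_body as a single term:

```python
# Prefix of the first immense term, copied verbatim from the error message above.
prefix_hex = (
    "7b 22 58 48 49 5f 48 6f 74 65 6c 41 76 61 69 6c 52 53 22 3a "
    "7b 22 6d 73 67 56 65 72 73 69"
)

# bytes.fromhex skips the spaces between byte pairs; the bytes are UTF-8.
prefix_bytes = bytes.fromhex(prefix_hex)
prefix = prefix_bytes.decode("utf-8")

print(len(prefix_bytes))  # 30
print(prefix)             # {"XHI_HotelAvailRS":{"msgVersi
```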

Index mapping:

{
  "template": "partner_requests-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "request": {
      "properties": {
        "asn_id": { "index": "not_analyzed", "type": "string" },
        "search_id": { "index": "not_analyzed", "type": "string" },
        "partner": { "index": "not_analyzed", "type": "string" },
        "start": { "type": "date" },
        "duration": { "type": "float" },
        "request_method": { "index": "not_analyzed", "type": "string" },
        "request_url": { "index": "not_analyzed", "type": "string" },
        "request_body": { "index": "not_analyzed", "type": "string" },
        "response_status": { "type": "integer" },
        "response_body": { "index": "not_analyzed", "type": "string" }
      }
    }
  }
}

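I searched the documentation but couldn't find anything about a maximum field size. Going by the core types section, I don't understand why I would need to "correct the analyzer" for a not_analyzed field.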


2020-06-22

1 answer

一尘不染

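So you are running into the maximum term size. When you set a field to not_analyzed, its entire value is indexed as a single term, which is why the error talks about the analyzer even for a not_analyzed field. The maximum size of a single term in the underlying Lucene index is 32766 bytes of UTF-8, and I believe that limit is hard-coded. Because the limit is in bytes, a value with fewer than 32766 characters can still exceed it if it contains multi-byte characters.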
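A minimal Python sketch of the check described above, assuming (as the error message states) that the limit applies to the UTF-8 byte length of each term. `not_analyzed_terms`, `immense_terms` and `is_indexable` are hypothetical helpers for illustration, not Elasticsearch or Lucene APIs; the first one models a not_analyzed field by emitting the whole value as one term.

```python
from typing import List

# Lucene's per-term limit, counted in UTF-8 bytes (not characters).
MAX_TERM_BYTES = 32766


def not_analyzed_terms(value: str) -> List[str]:
    """Hypothetical model of a not_analyzed field: the whole value is one term."""
    return [value]


def immense_terms(value: str) -> List[str]:
    """Return the terms whose UTF-8 encoding is longer than the limit."""
    return [
        term
        for term in not_analyzed_terms(value)
        if len(term.encode("utf-8")) > MAX_TERM_BYTES
    ]


def is_indexable(value: str) -> bool:
    """True if no term produced for this value would be rejected as immense."""
    return not immense_terms(value)


# ASCII: one byte per character, so exactly 32766 characters still fits.
print(is_indexable("a" * 32766))  # True
print(is_indexable("a" * 32767))  # False

# "é" is two bytes in UTF-8, so half as many characters hit the limit.
print(is_indexable("é" * 16383))  # True  (32766 bytes)
print(is_indexable("é" * 16384))  # False (32768 bytes)
```

Counting bytes rather than `len(value)` matters here: a check on character count alone would accept the last example even though Lucene would reject it.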

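Your two main options are to change the type to binary, or to keep the type as string but set index to "no". Either way the field is no longer indexed as a term, so you can't search on it, but the value is still returned in _source.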
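Both options only change the response_body entry of the template from the question. The Python sketch below builds the two patched templates from a copy of that mapping; `with_field_mapping` is a hypothetical helper, and the printed JSON is what you would send back as the template. Note that the binary type expects the value to be sent as a Base64-encoded string, so option B also means changing how documents are built.

```python
import copy
import json

# The template from the question, unchanged.
template = {
    "template": "partner_requests-*",
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    "mappings": {
        "request": {
            "properties": {
                "asn_id": {"index": "not_analyzed", "type": "string"},
                "search_id": {"index": "not_analyzed", "type": "string"},
                "partner": {"index": "not_analyzed", "type": "string"},
                "start": {"type": "date"},
                "duration": {"type": "float"},
                "request_method": {"index": "not_analyzed", "type": "string"},
                "request_url": {"index": "not_analyzed", "type": "string"},
                "request_body": {"index": "not_analyzed", "type": "string"},
                "response_status": {"type": "integer"},
                "response_body": {"index": "not_analyzed", "type": "string"},
            }
        }
    },
}


def with_field_mapping(tmpl, doc_type, field, mapping):
    """Hypothetical helper: return a deep copy of tmpl with one field's mapping replaced."""
    patched = copy.deepcopy(tmpl)
    patched["mappings"][doc_type]["properties"][field] = mapping
    return patched


# Option A: keep it a string, but do not index it at all.
fix_no_index = with_field_mapping(
    template, "request", "response_body", {"type": "string", "index": "no"}
)

# Option B: store it as binary (values must be Base64-encoded strings).
fix_binary = with_field_mapping(
    template, "request", "response_body", {"type": "binary"}
)

print(json.dumps(fix_no_index, indent=2))
```

Using a deep copy keeps the original template intact, so both variants can be compared side by side before choosing one.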

2020-06-22