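I want to use the Tire gem as an ElasticSearch client to index PDF attachments. In my mapping I exclude the attachment field from _source, so that the attachment is not stored in the index and is not returned in search results: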
    mapping :_source => { :excludes => ['attachment_original'] } do
      indexes :id, :type => 'integer'
      indexes :folder_id, :type => 'integer'
      indexes :attachment_file_name
      indexes :attachment_updated_at, :type => 'date'
      indexes :attachment_original, :type => 'attachment'
    end
Yet when I run the following curl command, I can still see the attachment content in the search results:
    curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
      "query": {
        "query_string": {
          "query": "rspec"
        }
      }
    }'
I have already posted this problem in this thread:
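But I just noticed that it is not only the attachment that shows up in the search results: all the other fields (including unmapped ones) are returned as well: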
    {
      "took": 20,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.025427073,
        "hits": [
          {
            "_index": "user_files",
            "_type": "user_file",
            "_id": "5",
            "_score": 0.025427073,
            "_source": {
              "user_file": {
                "id": 5,
                "folder_id": 1,
                "updated_at": "2012-08-16T11:32:41Z",
                "attachment_file_size": 179895,
                "attachment_updated_at": "2012-08-16T11:32:41Z",
                "attachment_file_name": "hw4.pdf",
                "attachment_content_type": "application/pdf",
                "created_at": "2012-08-16T11:32:41Z",
                "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
              }
            }
          }
        ]
      }
    }
attachment_file_size and attachment_content_type are not defined in the mapping, yet they are returned in the search results:
    {
      "id": 5,
      "folder_id": 1,
      "updated_at": "2012-08-16T11:32:41Z",
      "attachment_file_size": 179895,               <---------------------
      "attachment_updated_at": "2012-08-16T11:32:41Z",
      "attachment_file_name": "hw4.pdf",
      "attachment_content_type": "application/pdf", <------------------
      "created_at": "2012-08-16T11:32:41Z",
      "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
    }
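To list every field in the returned _source that the mapping block does not declare, a small stdlib-only sketch can diff the two. It assumes the response has the shape shown above, with the document wrapped under a user_file key; the JSON literal is the hit from that response, trimmed to the _source part.

```ruby
require 'json'

# Fields declared in the Tire mapping block from the question.
declared = %w[id folder_id attachment_file_name attachment_updated_at attachment_original]

# The single hit from the search response above, trimmed to its _source.
response = <<~JSON
  {"hits": {"hits": [{"_source": {"user_file": {
    "id": 5, "folder_id": 1, "updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_size": 179895, "attachment_updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_name": "hw4.pdf", "attachment_content_type": "application/pdf",
    "created_at": "2012-08-16T11:32:41Z", "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
  }}}]}}
JSON

# Dig into the first hit's document and keep keys the mapping does not declare,
# preserving the order in which they appear in _source.
source = JSON.parse(response).dig('hits', 'hits', 0, '_source', 'user_file')
undeclared = source.keys - declared
puts undeclared.inspect
```

Besides the two fields marked above, the diff also surfaces created_at and updated_at, which are likewise absent from the mapping block.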
Here is my full implementation:
    include Tire::Model::Search
    include Tire::Model::Callbacks

    def self.search(folder, params)
      tire.search() do
        query { string params[:query], default_operator: "AND"} if params[:query].present?
        #filter :term, folder_id: folder.id
        #highlight :attachment_original, :options => {:tag => "<em>"}
        raise to_curl
      end
    end

    mapping :_source => { :excludes => ['attachment_original'] } do
      indexes :id, :type => 'integer'
      indexes :folder_id, :type => 'integer'
      indexes :attachment_file_name
      indexes :attachment_updated_at, :type => 'date'
      indexes :attachment_original, :type => 'attachment'
    end

    def to_indexed_json
      to_json(:methods => [:attachment_original])
    end

    def attachment_original
      if attachment_file_name.present?
        path_to_original = attachment.path
        Base64.encode64(open(path_to_original) { |f| f.read })
      end
    end
Can someone help me figure out why all the fields are included in _source?
Edit: here is the output of localhost:9200/user_files/_mapping:
    {
      "user_files": {
        "user_file": {
          "_source": {
            "excludes": [
              "attachment_original"
            ]
          },
          "properties": {
            "attachment_content_type": { "type": "string" },
            "attachment_file_name": { "type": "string" },
            "attachment_file_size": { "type": "long" },
            "attachment_original": {
              "type": "attachment",
              "path": "full",
              "fields": {
                "attachment_original": { "type": "string" },
                "author": { "type": "string" },
                "title": { "type": "string" },
                "name": { "type": "string" },
                "date": { "type": "date", "format": "dateOptionalTime" },
                "keywords": { "type": "string" },
                "content_type": { "type": "string" }
              }
            },
            "attachment_updated_at": { "type": "date", "format": "dateOptionalTime" },
            "created_at": { "type": "date", "format": "dateOptionalTime" },
            "folder_id": { "type": "integer" },
            "id": { "type": "integer" },
            "updated_at": { "type": "date", "format": "dateOptionalTime" }
          }
        }
      }
    }
As you can see, for some reason all the fields are included in the mapping!
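In your to_indexed_json you include the attachment_original method, so its output is sent to elasticsearch. This is also why all the other attributes end up in the mapping, and therefore in the source: to_json serializes every attribute of the model, and elasticsearch dynamically adds a mapping for any field it has not seen before.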
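A minimal sketch of one way to act on this, assuming the goal is to send only the mapped fields: instead of calling to_json (which serializes every attribute), to_indexed_json builds a hash of just the fields declared in the mapping plus the encoded attachment. The UserFile Struct, its raw_bytes member and the MAPPED_FIELDS constant are hypothetical stand-ins for the ActiveRecord model, so the example runs without Rails or Tire; in the real model the bytes would be read from attachment.path as in the question.

```ruby
require 'json'

# Fields declared in the question's mapping block (attachment_original is added separately).
MAPPED_FIELDS = %i[id folder_id attachment_file_name attachment_updated_at].freeze

# Hypothetical stand-in for the ActiveRecord UserFile model. raw_bytes replaces
# reading the file from attachment.path.
UserFile = Struct.new(:id, :folder_id, :attachment_file_name, :attachment_updated_at,
                      :attachment_file_size, :attachment_content_type, :raw_bytes,
                      keyword_init: true) do
  # Base64-encode the file body; pack('m0') gives the same result as
  # Base64.strict_encode64 (no line breaks) using only core Ruby.
  def attachment_original
    [raw_bytes].pack('m0') if attachment_file_name
  end

  # Serialize only the mapped fields plus the encoded attachment, so unmapped
  # attributes such as attachment_file_size are never sent to elasticsearch.
  def to_indexed_json
    doc = MAPPED_FIELDS.to_h { |field| [field, public_send(field)] }
    doc[:attachment_original] = attachment_original
    doc.to_json
  end
end

file = UserFile.new(id: 5, folder_id: 1, attachment_file_name: 'hw4.pdf',
                    attachment_updated_at: '2012-08-16T11:32:41Z',
                    attachment_file_size: 179_895,
                    attachment_content_type: 'application/pdf',
                    raw_bytes: '%PDF-1.4')
puts file.to_indexed_json
```

With a real ActiveRecord model, to_json's :only option (for example to_json(:only => [...], :methods => [:attachment_original])) is another way to restrict the serialized attributes.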
For more on this topic, see the question ElasticSearch & Tire: Using Mapping and to_indexed_json.
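It seems that Tire is indeed sending the correct mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is going on, and to use trz to pinpoint the problem at the raw level.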