我在S3存储桶中存储了一些json文件,其中每个文件都有多个具有相同结构的元素。例如,
[{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}},{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}]
我想在雅典娜中创建一个与上述数据相对应的表。
我为创建表编写的查询:
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.elb_logs2 ( `eventId` string, `eventName` string, `eventVersion` string, `eventSource` string, `awsRegion` string, `image` map<string,string> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'field.delim' = ' ' ) LOCATION 's3://<bucketname>/';
但是,如果我按以下方式执行SELECT查询,
SELECT * FROM sampledb.elb_logs4;
我得到以下结果:
1 {"eventid":"1","eventversion":"1.0","image":{"id":"101","message":"New item!"},"eventsource":"aws:dynamodb","eventname":"INSERT","awsregion":"us-west-2"} {"eventid":"2","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"MODIFY","awsregion":"us-west-2"} {"eventid":"3","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"REMOVE","awsregion":"us-west-2"}
json文件的全部内容在此处被选为一项。
如何将json文件的每个元素作为一个条目读取?
编辑:如何读取图像的每个子列,即地图的每个元素?
谢谢。
问题1:在AWS Athena的json文件中存储多个元素
我需要将我的json文件重写为
{“ eventId”:“ 1”,“ eventName”:“ INSERT”,“ eventVersion”:“ 1.0”,“ eventSource”:“ aws:dynamodb”,“ awsRegion”:“ us-west-2”,“ image” :{“ Message”:“新项!”,“ Id”:101}},{“ eventId”:“ 2”,“ eventName”:“ MODIFY”,“ eventVersion”:“ 1.0”,“ eventSource”:“ aws:dynamodb“,” awsRegion“:” us- west-2“,” image“:{” Message“:”此项已更改“,” Id“:101}},{” eventId“:” 3“, “ eventName”:“ REMOVE”,“ eventVersion”:“ 1.0”,“ eventSource”:“ aws:dynamodb”,“ awsRegion”:“ us-west-2”,“ image”:{“ Message”:“此项目已更改“,” Id“:101}}
那意味着
删除方括号[]将每个元素放在一行中
{.....................} {.....................} {.....................}
问题2。 访问非线性json属性
CREATE EXTERNAL TABLE IF NOT EXISTS <tablename> ( `eventId` string, `eventName` string, `eventVersion` string, `eventSource` string, `awsRegion` string, `image` struct <`Id` : string, `Message` : string> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', "dots.in.keys" = "true" ) LOCATION 's3://exampletablewithstream-us-west-2/';
查询:
select image.Id, image.message from <tablename>;
参考:
http://engineering.skybettingandgaming.com/2015/01/20/parsing-json-in- hive/
https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive- keywords