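We have a table that stores answers to questions. We need to be able to find users who have given particular answers to particular questions. So, if our table contains the following data: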
    user_id  question_id  answer_value
    Sally    1            Pooch
    Sally    2            Peach
    John     1            Pooch
    John     2            Duke
and we want to find users who answered 'Pooch' to question 1 and 'Peach' to question 2, the following SQL will (obviously) not work:
    select user_id
    from answers
    where question_id=1 and answer_value = 'Pooch'
      and question_id=2 and answer_value='Peach'
My first thought was to self-join the table once for each answer we are looking for:
    select a.user_id
    from answers a, answers b
    where a.user_id = b.user_id
      and a.question_id=1 and a.answer_value = 'Pooch'
      and b.question_id=2 and b.answer_value='Peach'
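To make the difference between the two queries above concrete, here is a minimal sketch that runs both against the sample data. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; SQLite standing in for the real database is an assumption, and the join is written with explicit `JOIN ... ON` syntax instead of the comma form.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pooch"),
        ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"),
        ("John", 2, "Duke"),
    ],
)

# A single row can never have question_id = 1 and question_id = 2 at once,
# so this WHERE clause matches nothing.
naive_sql = """
    SELECT user_id
    FROM answers
    WHERE question_id = 1 AND answer_value = 'Pooch'
      AND question_id = 2 AND answer_value = 'Peach'
"""
naive_users = [row[0] for row in conn.execute(naive_sql)]

# The self-join pairs each user's question-1 row with their question-2 row,
# so each condition is checked against a different row.
self_join_sql = """
    SELECT a.user_id
    FROM answers a
    JOIN answers b ON a.user_id = b.user_id
    WHERE a.question_id = 1 AND a.answer_value = 'Pooch'
      AND b.question_id = 2 AND b.answer_value = 'Peach'
"""
self_join_users = [row[0] for row in conn.execute(self_join_sql)]

print(naive_users)      # []
print(self_join_users)  # ['Sally']
```

Each additional filter would need one more copy of the table in the join, which is what makes this approach awkward for an arbitrary number of filters.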
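This works, but since we allow an arbitrary number of search filters, we need to find something more efficient. My next solution was something like this: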
    select user_id, count(question_id)
    from answers
    where (
           (question_id=2 and answer_value = 'Peach')
        or (question_id=1 and answer_value = 'Pooch')
    )
    group by user_id
    having count(question_id)>1
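However, we want users to be able to take the same questionnaire twice, so they could potentially have two answers to question 1 in the answers table.
So, now I'm at a loss. What's the best way to approach this? Thanks!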
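The concern about repeated submissions can be reproduced on a small scale. The sketch below is one possible way to handle it, not something proposed in the question: it counts distinct `question_id` values instead of matching rows. The extra row for John is a hypothetical second submission, and SQLite (via Python's `sqlite3`) stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pooch"),
        ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"),
        ("John", 2, "Duke"),
        # Hypothetical second submission: John answers question 1 again.
        ("John", 1, "Pooch"),
    ],
)

filters = (
    "(question_id = 1 AND answer_value = 'Pooch') "
    "OR (question_id = 2 AND answer_value = 'Peach')"
)

# Counting matching rows: John's two 'Pooch' rows for question 1 are enough
# to pass the HAVING test, even though he never answered 'Peach'.
by_row_count = [
    row[0]
    for row in conn.execute(
        f"""
        SELECT user_id FROM answers
        WHERE {filters}
        GROUP BY user_id
        HAVING COUNT(question_id) > 1
        ORDER BY user_id
        """
    )
]

# Counting distinct questions: a user passes only if matching rows exist
# for both filtered questions (2 = number of filters).
by_distinct_questions = [
    row[0]
    for row in conn.execute(
        f"""
        SELECT user_id FROM answers
        WHERE {filters}
        GROUP BY user_id
        HAVING COUNT(DISTINCT question_id) = 2
        ORDER BY user_id
        """
    )
]

print(by_row_count)           # ['John', 'Sally']
print(by_distinct_questions)  # ['Sally']
```

Note that this treats a user as a match if any of their submissions contains each requested answer; the matching answers may come from different submissions. If each submission has to match as a whole, the table would need something that identifies the submission, which the schema shown here does not have.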
I found a clever way to do this query without a self-join.
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
    use test
    DROP TABLE IF EXISTS answers;
    CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
    INSERT INTO answers VALUES
    ('Sally',1,'Pouch'),
    ('Sally',2,'Peach'),
    ('John',1,'Pooch'),
    ('John',2,'Duke');
    INSERT INTO answers VALUES
    ('Sally',1,'Pooch'),
    ('Sally',2,'Peach'),
    ('John',1,'Pooch'),
    ('John',2,'Duck');

    SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
    FROM answers GROUP BY user_id,question_id;

    +---------+-------------+---------------+
    | user_id | question_id | given_answers |
    +---------+-------------+---------------+
    | John    |           1 | Pooch         |
    | John    |           2 | Duke,Duck     |
    | Sally   |           1 | Pouch,Pooch   |
    | Sally   |           2 | Peach         |
    +---------+-------------+---------------+
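This output shows that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions each user answered differently, just place the above query in a subquery and count the commas in the list of given answers to get the number of distinct answers, as follows: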
    SELECT user_id,question_id,given_answers,
    (LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
    FROM (SELECT user_id,question_id,
    GROUP_CONCAT(DISTINCT answer_value) given_answers
    FROM answers GROUP BY user_id,question_id) A;
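The comma-counting expression works because removing every comma shortens the string by exactly the number of separators, and a list of n items has n - 1 separators. Here is a minimal sketch of that arithmetic, run through Python's `sqlite3` instead of MySQL (an assumption made only so the example is self-contained; SQLite also provides `GROUP_CONCAT`, `LENGTH` and `REPLACE`). The row for `Ann` is hypothetical and is there to show a limitation: an answer that itself contains a comma inflates the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"), ("John", 2, "Duck"),
        # Hypothetical row: a single answer that contains a comma.
        ("Ann", 1, "Rex, Jr."),
    ],
)

sql = """
    SELECT user_id, question_id,
           (LENGTH(given_answers)
            - LENGTH(REPLACE(given_answers, ',', ''))) + 1 AS multianswer_count
    FROM (SELECT user_id, question_id,
                 GROUP_CONCAT(DISTINCT answer_value) AS given_answers
          FROM answers
          GROUP BY user_id, question_id) A
"""
# Map (user_id, question_id) -> number of comma-separated pieces.
counts = {(user, question): n for user, question, n in conn.execute(sql)}

print(counts[("John", 2)])   # 2  (Duke and Duck)
print(counts[("Sally", 2)])  # 1  (Peach, given twice, collapsed by DISTINCT)
print(counts[("Ann", 1)])    # 2  (one answer, but it contains a comma)
```

If free-text answers may contain commas, the count is only reliable when the separator is a character that cannot occur in an answer.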
I got this:
    +---------+-------------+---------------+-------------------+
    | user_id | question_id | given_answers | multianswer_count |
    +---------+-------------+---------------+-------------------+
    | John    |           1 | Pooch         |                 1 |
    | John    |           2 | Duke,Duck     |                 2 |
    | Sally   |           1 | Pouch,Pooch   |                 2 |
    | Sally   |           2 | Peach         |                 1 |
    +---------+-------------+---------------+-------------------+
Now just filter out the rows where multianswer_count = 1 using another subquery:
    SELECT * FROM
    (SELECT user_id,question_id,given_answers,
    (LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
    FROM (SELECT user_id,question_id,
    GROUP_CONCAT(DISTINCT answer_value) given_answers
    FROM answers GROUP BY user_id,question_id) A) AA
    WHERE multianswer_count > 1;
This is what I got:
    +---------+-------------+---------------+-------------------+
    | user_id | question_id | given_answers | multianswer_count |
    +---------+-------------+---------------+-------------------+
    | John    |           2 | Duke,Duck     |                 2 |
    | Sally   |           1 | Pouch,Pooch   |                 2 |
    +---------+-------------+---------------+-------------------+
Essentially, I performed three table scans: one on the main table and two on the small subqueries. No joins!
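For comparison, the same two rows can probably be obtained without building and parsing a string at all, by letting the database count distinct answers directly. This is a different technique from the `GROUP_CONCAT` approach above, sketched here under the assumption that only the count is needed and not the list of answers; it again runs on SQLite through Python's `sqlite3` rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (user_id TEXT, question_id INTEGER, answer_value TEXT)"
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"), ("John", 2, "Duke"),
        ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
        ("John", 1, "Pooch"), ("John", 2, "Duck"),
    ],
)

# One grouped query: keep only (user, question) pairs that have more than
# one distinct answer. No subqueries and no comma counting.
sql = """
    SELECT user_id, question_id,
           COUNT(DISTINCT answer_value) AS multianswer_count
    FROM answers
    GROUP BY user_id, question_id
    HAVING COUNT(DISTINCT answer_value) > 1
    ORDER BY user_id, question_id
"""
multi_answer_rows = list(conn.execute(sql))

print(multi_answer_rows)  # [('John', 2, 2), ('Sally', 1, 2)]
```

Because the count never depends on a separator character, this variant is not affected by answers that contain commas.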