SQL GROUP BY distinct rows

I have a Postgres database with one very long table that has 3 columns, like so:

s_id | c_id | a_id
   1 |    1 |    2
   1 |    1 |    3
   1 |    3 |   15
   2 |    1 |    2
   2 |    2 |   23
   3 |    1 |    2
   3 |    3 |   16

I have a query that finds every s_id that has both c_id 1 and c_id 3, and returns each one with its count:

SELECT s_id, COUNT(s_id) AS matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id
HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC

What I get back is the following:

s_id | matching_clusters
   1 |                 3
   3 |                 2

However, I want to count each repeated c_id only once, so the results here should be:

s_id | matching_clusters
   1 |                 2
   3 |                 2

Any suggestions on how to do this? I thought I could put DISTINCT inside the COUNT call, but that didn't work. I could probably join the result back to the table itself on distinct c_id, but I don't want to re-run the query, because querying this table is computationally very expensive.

Best answer

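If I understand correctly, then this will work for the sample data. Be aware that COUNT(c_id) counts rows, so an s_id with two rows for the same c_id and none for the other would also pass the HAVING clause: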

SELECT s_id, 2 AS matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id
HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC;

This may be what you want:

SELECT s_id, COUNT(DISTINCT c_id) AS matching_clusters
FROM test
WHERE c_id IN (1,3)
GROUP BY s_id
HAVING COUNT(DISTINCT c_id) = 2
ORDER BY matching_clusters DESC;

Note the use of DISTINCT in the HAVING clause as well as in the SELECT list.
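If you want to check the difference without touching the big table, here is a minimal sketch that replays the sample rows from the question. It makes a few assumptions that are not in the original post: it uses Python's built-in sqlite3 module as a stand-in for Postgres (both accept COUNT(DISTINCT ...) in SELECT and HAVING), it adds s_id as an ORDER BY tie-breaker so the output order is deterministic, and the helper name run and the extra s_id 4 rows are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (s_id, c_id, a_id).
ROWS = [
    (1, 1, 2), (1, 1, 3), (1, 3, 15),
    (2, 1, 2), (2, 2, 23),
    (3, 1, 2), (3, 3, 16),
]


def run(sql, rows=ROWS):
    """Load rows into an in-memory table named test and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (s_id INTEGER, c_id INTEGER, a_id INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# The query from the question: counts rows, so a repeated c_id is counted twice.
ROW_COUNT_SQL = """
SELECT s_id, COUNT(s_id) AS matching_clusters
FROM test
WHERE c_id IN (1, 3)
GROUP BY s_id
HAVING COUNT(c_id) >= 2
ORDER BY matching_clusters DESC, s_id
"""

# The DISTINCT version: each c_id is counted once per s_id.
DISTINCT_SQL = """
SELECT s_id, COUNT(DISTINCT c_id) AS matching_clusters
FROM test
WHERE c_id IN (1, 3)
GROUP BY s_id
HAVING COUNT(DISTINCT c_id) = 2
ORDER BY matching_clusters DESC, s_id
"""

original = run(ROW_COUNT_SQL)
distinct = run(DISTINCT_SQL)
print(original)  # [(1, 3), (3, 2)]
print(distinct)  # [(1, 2), (3, 2)]

# Hypothetical extra rows: s_id 4 has c_id 1 twice and never has c_id 3.
extra = ROWS + [(4, 1, 7), (4, 1, 8)]
original_extra = run(ROW_COUNT_SQL, extra)
distinct_extra = run(DISTINCT_SQL, extra)
print(original_extra)  # [(1, 3), (3, 2), (4, 2)]
print(distinct_extra)  # [(1, 2), (3, 2)]
```

The last two lines show why DISTINCT matters in the HAVING clause too: with a plain row count, s_id 4 slips through even though it only ever has c_id 1.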
