I'm using NOT IN, but it's slow

Z时代
2024-01-10
Category: Q&A

`crm` example: 
+----+--------+---------------------+-------------------+
| id | name   | date                | status            |
+----+--------+---------------------+-------------------+
| 1  | john   | 2017-12-27 10:58:10 | A status          |
| 2  | steve  | 2017-12-27 10:58:08 | A status          |
| 3  | eric   | 2017-12-27 10:58:04 | Delivery Arranged |
| 4  | phil   | 2017-12-27 10:57:55 | A status          |
| 5  | bob    | 2017-12-27 10:57:52 | A status          |
| 6  | foo    | 2017-12-27 10:57:50 | A status          |
| 7  | steven | 2017-12-27 10:57:48 | Delivery Arranged |
| 8  | paul   | 2017-12-27 10:57:43 | A status          |
| 9  | alex   | 2017-12-27 10:57:31 | Delivery Arranged |

The goal of my query is to return the number of `crm` rows whose `status` is 'Delivery Arranged' and whose `date` falls between 2017-12-01 and 2018-01-01.

So, here is my main query:

SET @from='2017-12-01'; 
SET @to='2018-01-01'; 
SELECT 
     COUNT(*) AS `delivery_arranged` 
    FROM 
     `crm` a 
    WHERE 
     a.`status` = 'Delivery Arranged' 
      AND DATE(a.`date`) BETWEEN @from AND @to

Result:

+-------------------+
| delivery_arranged |
+-------------------+
| 30                |

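All good. But I want to discount the rows that had already been set to 'Delivery Arranged' at some earlier point (that is, outside this date range). I have a `statuslog` table that I can use for this: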

`statuslog` example: 
+--------+-------+---------------------+-----------+--------------------+
| id     | crmid | date                | user      | status             |
+--------+-------+---------------------+-----------+--------------------+
| 818572 | 1     | 2017-12-27 10:58:10 | johnsmith | Some status change |
| 818571 | 2     | 2017-12-27 10:58:08 | johnsmith | Some status change |
| 818570 | 3     | 2017-12-27 10:58:04 | another   | Delivery Arranged  |
| 818569 | 4     | 2017-12-27 10:57:55 | another   | Delivery Arranged  |
| 818568 | 5     | 2017-12-27 10:57:52 | johnsmith | Some status change |
| 818567 | 6     | 2017-12-27 10:57:50 | another   | Some status change |
| 818566 | 7     | 2017-12-27 10:57:48 | johnsmith | Delivery Arranged  |
| 818565 | 8     | 2017-12-27 10:57:43 | another   | Some status change |
| 818564 | 9     | 2017-12-27 10:57:31 | johnsmith | Some status change |

So with this table, I can get the `statuslog` rows that fall outside the date range and then do a NOT IN:

SELECT 
     COUNT(*) AS `delivery_arranged` 
    FROM 
     `crm` a 
    WHERE 
     a.`status` = 'Delivery Arranged' 
      AND DATE(a.`date`) BETWEEN @from AND @to 
      AND a.`id` 
      NOT IN (
      SELECT 
       a.crmid AS `crmid` 
      FROM 
       statuslog a 
      WHERE 
       a.status = 'Delivery Arranged' 
        AND DATE(a.`date`) NOT BETWEEN @from AND @to 
      GROUP BY a.crmid 
      ORDER BY a.`date` DESC 
      )

This works, but depending on the size of the date range it can take a very long time! `statuslog` has more than 2,000,000 rows.

How can I make this query faster?

Answer:

A LEFT JOIN may perform better than the NOT IN subquery:

SELECT 
    COUNT(*) AS `delivery_arranged` 
FROM 
    `crm` a 
LEFT OUTER JOIN 
    (
     SELECT 
      a.crmid AS `crmid` 
     FROM 
      statuslog a 
     WHERE 
      a.status = 'Delivery Arranged' 
       AND DATE(a.`date`) NOT BETWEEN @from AND @to 
     GROUP BY a.crmid 
     -- ORDER BY a.`date` DESC  -- dropped: ordering inside this subquery makes no sense
    ) b 
    on a.`id` = b.crmid 
WHERE 
    b.crmid IS NULL AND -- NOT IN rewritten as a LEFT JOIN that keeps only unmatched rows
    a.`status` = 'Delivery Arranged' 
    AND DATE(a.`date`) BETWEEN @from AND @to

Also, remember to use the right indexes.
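As a rough way to check whether an index is actually being picked up, you can prefix the query with MySQL's EXPLAIN. This is only a sketch: the `s` alias is a name chosen here for readability, and the plan you see will depend on whatever indexes already exist on your tables.

```sql
-- Same date range as in the question.
SET @from = '2017-12-01';
SET @to   = '2018-01-01';

-- EXPLAIN shows the execution plan instead of running the query.
EXPLAIN
SELECT COUNT(*) AS delivery_arranged
FROM crm a
LEFT OUTER JOIN (
    -- crm ids that were logged as 'Delivery Arranged' outside the range
    SELECT s.crmid
    FROM statuslog s
    WHERE s.status = 'Delivery Arranged'
      AND DATE(s.`date`) NOT BETWEEN @from AND @to
    GROUP BY s.crmid
) b ON a.id = b.crmid
WHERE b.crmid IS NULL
  AND a.status = 'Delivery Arranged'
  AND DATE(a.`date`) BETWEEN @from AND @to;
```

In the output, the `key` column names the index chosen for each table (NULL if none), and `rows` is the estimated number of rows examined, so a full scan of `statuslog` shows up immediately.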

Answer:

This will usually be faster if you use LEFT JOIN/WHERE:

SELECT COUNT(*) AS delivery_arranged 
FROM crm c LEFT JOIN 
    statuslog sl 
    ON sl.crmid = c.id AND 
     sl.status = 'Delivery Arranged' AND
     (sl.date < @from OR 
      sl.date >= @to + INTERVAL 1 DAY) 
WHERE c.status = 'Delivery Arranged' AND 
     c.date >= @from AND 
     c.date < @to + INTERVAL 1 DAY AND 
     sl.crmid IS NULL;

For this version, you want indexes on crm(status, date, id) and statuslog(crmid, status, date).
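Creating those two indexes might look something like this. The index names are made up for illustration, and on a table with 2,000,000+ rows the build can take a while, so it is probably worth trying on a copy first.

```sql
-- Hypothetical index names; the column order follows the recommendation above.

-- Equality on status first, then the date range, with id included
-- so the join key can be read from the index as well.
CREATE INDEX idx_crm_status_date_id
    ON crm (status, `date`, id);

-- crmid first for the join lookup, then status and date for the extra filters.
CREATE INDEX idx_statuslog_crmid_status_date
    ON statuslog (crmid, status, `date`);
```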

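Note that this changes the date comparisons to avoid calling a function on the column. That makes it more feasible to use an index that includes the `date` column.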
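To make the difference concrete, here are the two styles of date filter side by side on `crm` alone. This is a sketch assuming `date` is a DATETIME column and that `@from` and `@to` are set as in the question.

```sql
SET @from = '2017-12-01';
SET @to   = '2018-01-01';

-- Function on the column: DATE() has to be evaluated for every row,
-- so an ordinary index on `date` generally cannot be used for a range scan.
SELECT COUNT(*) AS delivery_arranged
FROM crm c
WHERE c.status = 'Delivery Arranged'
  AND DATE(c.`date`) BETWEEN @from AND @to;

-- Bare column against a half-open range that covers the same days:
-- from the start of @from up to, but not including, the day after @to.
SELECT COUNT(*) AS delivery_arranged
FROM crm c
WHERE c.status = 'Delivery Arranged'
  AND c.`date` >= @from
  AND c.`date` <  @to + INTERVAL 1 DAY;
```

Both forms should select the same rows; the second one just leaves the column untouched so the optimizer is free to use the index.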

Source link: utcz.com/qa/264890.html