PostgreSQL query to delete records with overlapping times while keeping the earliest?

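I'm trying to find a way to delete records with overlapping times, but I can't figure out a simple and elegant way to delete all but one of the records that overlap. This question is similar to this one, but with a few differences. Our table looks like this: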

    ╔════╤══════════════════════════════════════╤══════════════════════════════════════╤════════╤═════════╗
    ║ id │ start_time                           │ end_time                             │ bar    │ baz     ║
    ╠════╪══════════════════════════════════════╪══════════════════════════════════════╪════════╪═════════╣
    ║ 0  │ Mon, 18 Dec 2017 16:08:33 UTC +00:00 │ Mon, 18 Dec 2017 17:08:33 UTC +00:00 │ "ham"  │ "eggs"  ║
    ╟────┼──────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
    ║ 1  │ Mon, 18 Dec 2017 16:08:32 UTC +00:00 │ Mon, 18 Dec 2017 17:08:32 UTC +00:00 │ "ham"  │ "eggs"  ║
    ╟────┼──────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
    ║ 2  │ Mon, 18 Dec 2017 16:08:31 UTC +00:00 │ Mon, 18 Dec 2017 17:08:31 UTC +00:00 │ "spam" │ "bacon" ║
    ╟────┼──────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
    ║ 3  │ Mon, 18 Dec 2017 16:08:30 UTC +00:00 │ Mon, 18 Dec 2017 17:08:30 UTC +00:00 │ "ham"  │ "eggs"  ║
    ╚════╧══════════════════════════════════════╧══════════════════════════════════════╧════════╧═════════╝

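In the example above, all of the records have overlapping times, where overlapping simply means that the time range defined by a record's start_time and end_time (inclusive) covers or extends over part of another record's range. For this question, however, we are interested only in records that have overlapping times and also have matching bar and baz columns (rows 0, 1, and 3 above). After finding these records, we want to delete all of them except the earliest, leaving the table with only records 2 and 3: record 2 does not have matching bar and baz columns, and record 3 has the earliest start and end times.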

Here is what I have so far:

    delete from foos where id in (
      select
        foo_one.id
      from
        foos foo_one
      where
        user_id = 42
        and exists (
          select
            1
          from
            foos foo_two
          where
            tsrange(foo_two.start_time::timestamp, foo_two.end_time::timestamp, '[]') &&
              tsrange(foo_one.start_time::timestamp, foo_one.end_time::timestamp, '[]')
            and
              foo_one.bar = foo_two.bar
            and
              foo_one.baz = foo_two.baz
            and
              user_id = 42
            and
              foo_one.id != foo_two.id
        )
    );

Thanks for reading!

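Update: I found a solution that works for me. Basically, I can apply the window function row_number() over partitions of the table grouped by the bar and baz fields, then add a WHERE clause to the DELETE statement that excludes the first entry of each partition (the one with the smallest id):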

    delete from foos where id in (
      select id from (
        select
          foo_one.id,
          row_number() over(partition by
                              bar,
                              baz
                            order by id asc)
        from
          foos foo_one
        where
          user_id = 42
          and exists (
            select
              *
            from
              foos foo_two
            where
              tsrange(foo_two.start_time::timestamp,
                      foo_two.end_time::timestamp,
                      '[]') &&
                tsrange(foo_one.start_time::timestamp,
                        foo_one.end_time::timestamp,
                        '[]')
              and
                foo_one.id != foo_two.id
          )
      ) foos where row_number <> 1
    );
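
Note that ordering by id keeps the row with the smallest id in each partition, which in the sample table is row 0 rather than the earliest row 3. A minimal sketch of a variant that keeps the earliest record instead is shown here. It assumes the foos table and the user_id column used in the queries above, and it assumes that ties on start_time should be broken by id (an assumption, since the question does not say):

```sql
-- Delete every row that overlaps an earlier row with the same bar and baz.
-- "Earlier" means a smaller start_time, with id as a tie-breaker (assumption).
delete from foos f
where f.user_id = 42
  and exists (
    select 1
    from foos e
    where e.user_id = f.user_id
      and e.bar = f.bar
      and e.baz = f.baz
      and e.id <> f.id
      -- inclusive ranges, as in the question
      and tsrange(e.start_time::timestamp, e.end_time::timestamp, '[]')
          && tsrange(f.start_time::timestamp, f.end_time::timestamp, '[]')
      -- row comparison: e sorts strictly before f
      and (e.start_time, e.id) < (f.start_time, f.id)
  );
```

If all four sample rows belong to user 42, this should remove rows 0 and 1 and leave rows 2 and 3. One caveat: in a chain where A overlaps B and B overlaps C but A does not overlap C, both B and C are deleted, because each of them overlaps an earlier row. Whether that is the desired outcome depends on how overlap groups are meant to be defined.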

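Answer:

First, a small note: you really should provide some more information. I understand that you may not want to show some of the real columns from your business, but doing it this way makes it much harder to understand what you want.

Still, I will offer some tips on this subject. I hope this helps you, as well as anyone with a similar problem.

  1. You need to make clear what you define as an overlap. It can mean many different things to different people.

Look at these events: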

    <--a-->
        <---- b ---->
            <---- c ---->
              <-- d -->
                     <---- e ---->
       <------- f -------->
                        <--- g --->

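If you define overlap the way Google's definition does, "extend over so as to cover partly", then events "b", "d", "e", and "f" partly overlap event "c". If you define overlap as covering an entire event, then "c" overlaps "d", and "f" overlaps "b", "c", and "d".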

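  2. Deleting groups can be a problem. In the previous case, what should we do? Should we delete "b", "c", and "d" and keep "f"? Should we sum their values? Maybe take an average? This is a column-by-column decision, and the meaning of each column matters a lot. So I can't help you with "bar" and "baz".

  3. So, trying to guess what you really want, I created a similar table of events with an id, a start, an end, and a user_id: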

    create table events (
        id integer,
        user_id integer,
        start_time timestamp,
        end_time timestamp,
        name varchar(100)
    );

I added example values:

    insert into events
        (id, user_id, start_time, end_time, name) values
        (1, 1000, timestamp '2017-10-09 01:00:00', timestamp '2017-10-09 04:00:00', 'a'),
        (2, 1000, timestamp '2017-10-09 03:00:00', timestamp '2017-10-09 15:00:00', 'b'),
        (3, 1000, timestamp '2017-10-09 07:00:00', timestamp '2017-10-09 19:00:00', 'c'),
        (4, 1000, timestamp '2017-10-09 09:00:00', timestamp '2017-10-09 17:00:00', 'd'),
        (5, 1000, timestamp '2017-10-09 17:00:00', timestamp '2017-10-09 23:00:00', 'e'),
        (6, 1000, timestamp '2017-10-09 02:30:00', timestamp '2017-10-09 22:00:00', 'f'),
        (7, 1000, timestamp '2017-10-09 17:30:00', timestamp '2017-10-10 02:00:00', 'g');

Now we can play with some nice queries:

List all events that fully overlap another event:

    select
        -- EVENT NAME
        event_1.name as event_name,
        -- LIST EVENTS THAT THE EVENT OVERLAPS
        GROUP_CONCAT(event_2.name) as overlaps_names
    from events as event_1
    inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
        (
            -- STARTS AFTER EVENT ONE STARTS
            event_2.start_time >= event_1.start_time and
            -- ENDS BEFORE EVENT ONE ENDS
            event_2.end_time <= event_1.end_time
        )
    group by
        event_1.name

Result:

    +------------+----------------+
    | event_name | overlaps_names |
    +------------+----------------+
    | c          | d              |
    | f          | b,d,c          |
    +------------+----------------+

To detect partial overlaps, you need something like this:

    select
        -- EVENT NAME
        event_1.name as event_name,
        -- LIST EVENTS THAT THE EVENT OVERLAPS
        GROUP_CONCAT(event_2.name) as overlaps_names
    from events as event_1
    inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
        (
            (
                -- STARTS AFTER EVENT ONE STARTS
                event_2.start_time >= event_1.start_time and
                -- STARTS BEFORE EVENT ONE ENDS
                event_2.start_time <= event_1.end_time
            ) or
            (
                -- ENDS AFTER EVENT ONE STARTS
                event_2.end_time >= event_1.start_time and
                -- ENDS BEFORE EVENT ONE ENDS
                event_2.end_time <= event_1.end_time
            )
        )
    group by
        event_1.name

Result:

    +------------+----------------+
    | event_name | overlaps_names |
    +------------+----------------+
    | a          | b,f            |
    | b          | c,d,a          |
    | c          | b,d,e,g        |
    | d          | b,e            |
    | e          | f,g,d,c        |
    | f          | a,g,b,d,c,e    |
    | g          | c,e,f          |
    +------------+----------------+
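
The condition above only reports event_2 when its start or its end falls inside event_1, so an event that completely surrounds event_1 is not listed (the row for d, for example, does not include c or f). A common way to test for any overlap between two closed intervals is to check that each one starts no later than the other ends. A sketch of that variant against the same events table, written without aggregation so that it does not depend on GROUP_CONCAT:

```sql
-- Pairs of events of the same user whose closed intervals share at least one instant.
select
  event_1.name as event_name,
  event_2.name as overlaps_name
from events as event_1
inner join events as event_2
  on event_1.user_id = event_2.user_id
 and event_1.id <> event_2.id
 -- each interval starts no later than the other one ends
 and event_1.start_time <= event_2.end_time
 and event_2.start_time <= event_1.end_time
order by event_1.name, event_2.name;
```

With the sample data, d should now be paired with c and f in addition to b and e.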

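Of course, I'm using "group by" to make the output easier to read. It will also be useful if you want to sum or average the overlapping data to update the parent row before deleting. The "group_concat" function may not exist in Postgres, or it may have a different name. A "standard SQL" version that you can test is: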

    select
        -- EVENT NAME
        event_1.name as event_name,
        -- EVENT THAT THE EVENT OVERLAPS
        event_2.name as overlaps_name
    from events as event_1
    inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
        (
            -- STARTS AFTER EVENT ONE STARTS
            event_2.start_time >= event_1.start_time and
            -- ENDS BEFORE EVENT ONE ENDS
            event_2.end_time <= event_1.end_time
        )

Result:

    +------------+---------------+
    | event_name | overlaps_name |
    +------------+---------------+
    | f          | b             |
    | f          | c             |
    | c          | d             |
    | f          | d             |
    +------------+---------------+
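
In PostgreSQL, the aggregate that corresponds to MySQL's GROUP_CONCAT is string_agg, and the containment test can also be written with the range operator @> (or && for any overlap, as in the question). A sketch of the first grouped query in PostgreSQL syntax, assuming the same events table:

```sql
-- PostgreSQL variant of the "fully overlaps" query.
select
  event_1.name as event_name,
  -- comma-separated list of contained events, sorted by name
  string_agg(event_2.name, ',' order by event_2.name) as overlaps_names
from events as event_1
inner join events as event_2
  on event_1.user_id = event_2.user_id
 and event_1.id <> event_2.id
 -- event_1's closed range contains event_2's closed range
 and tsrange(event_1.start_time, event_1.end_time, '[]')
     @> tsrange(event_2.start_time, event_2.end_time, '[]')
group by event_1.name
order by event_1.name;
```

The '[]' bounds make both ends inclusive, which matches the >= and <= comparisons used in the queries above.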

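If you want to try some math operations, keep in mind the risk of double counting: if you add the values of the overlapped events to one parent, and then add that parent's updated value together with the same events to "f", the value of "f" ends up wrong.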

    // should be
    new f = old f + b + old c + d
    new c = old c + b + d // unnecessary if you are going to delete it

    // very common mistake
    new c = old c + b + d // unnecessary but not wrong yet
    new f = new c + b + d = (old c + b + d) + b + d // wrong!!
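
One way to avoid this mistake is to compute every new total from the original values in a single statement, instead of updating the rows one after another. A sketch, assuming a hypothetical numeric column amount on the events table (it is not part of the table defined above) and using full containment as the overlap rule:

```sql
-- "amount" is a hypothetical column used only for illustration.
select
  parent.name,
  -- the parent's own stored value plus the stored values of the events it contains
  parent.amount + coalesce(sum(child.amount), 0) as new_amount
from events as parent
left join events as child
  on parent.user_id = child.user_id
 and parent.id <> child.id
 and child.start_time >= parent.start_time
 and child.end_time <= parent.end_time
group by parent.id, parent.name, parent.amount;
```

Because the SELECT reads every amount as currently stored, the total for f is built from the old values of b, c, and d, so nothing is counted twice.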

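You can test all of these queries, and create your own against the same database, online at this URL: http://sqlfiddle.com/#!9/1d2455/19. Remember, though, that it is MySQL, not PostgreSQL. It is still very good for testing standard SQL.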

