Complex data restructuring problem in SAS

Z时代
2024-01-10
Category: Q&A

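I have a card history data set like the one below. Each customer may apply for one or more cards on the same day. However, for various reasons, their cards get replaced. Card Issue Date is the date a card was issued. New Card ID is the ID of the replacement card. For example, customer A's card was first issued on 2/1/2017 with card ID 1234. About three months later he lost the card, and a new card (1235) was issued on 5/2/2017.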

Customer ID  First Issue Date  Card Issue Date  Card ID  New Card ID
A            2/1/2017          2/1/2017         1234     1235
A            2/1/2017          5/2/2017         1235
B            5/2/2017          5/2/2017         1245     1248
B            5/2/2017          5/2/2017         1236     1249
B            5/2/2017          10/3/2017        1248     1250
B            5/2/2017          5/3/2017         1249     1251
B            5/2/2017          10/4/2017        1250
B            5/2/2017          5/4/2017         1251

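What I want is to group each original card together with all of its replacements. For example, customer B applied for two cards on 5/2/2017. Card IDs 1245, 1248 and 1250 belong to one group (Seq No 1), and card IDs 1236, 1249 and 1251 belong to another group (Seq No 2).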

Customer ID  Open Date  Card Issue Date  Card ID  Seq No
A            2/1/2017   2/1/2017         1234     1
A            2/1/2017   5/2/2017         1235     1
B            5/2/2017   5/2/2017         1245     1
B            5/2/2017   10/3/2017        1248     1
B            5/2/2017   10/4/2017        1250     1
B            5/2/2017   5/2/2017         1236     2
B            5/2/2017   5/3/2017         1249     2
B            5/2/2017   5/4/2017         1251     2

Please help me with this data transformation.

Here is the input data:

data test;
    infile datalines dsd truncover;
    input Customer :$1.
          First_Issue_Date :ddmmyy10.
          Card_Issue_Date :ddmmyy10.
          Card_ID :$4.
          New_Card_ID :$4.;
    format First_Issue_Date ddmmyy10. Card_Issue_Date ddmmyy10.;
datalines;
A,02/01/2017,02/01/2017,1234,1235,
A,02/01/2017,05/02/2017,1235,,
B,05/02/2017,05/02/2017,1245,1248,
B,05/02/2017,05/02/2017,1236,1249,
B,05/02/2017,10/03/2017,1248,1250,
B,05/02/2017,05/03/2017,1249,1251,
B,05/02/2017,10/04/2017,1250,,
B,05/02/2017,05/04/2017,1251,,
;

Answer:

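Hash objects are very effective for traversing paths in identity-tracking data. Assuming each Card_ID is unique across all customers, and every New_Card_ID value has a corresponding Card_ID row in the data set, the code below assigns a unique path ID to each chain of reissues, however many reissues there are.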

data paths(keep=Card_ID Path_ID);
    if 0 then set test; * prep pdv (test is the input data set from the question);
    call missing(Path_ID);

    * for tracking the tip of the card_id trail (cards with no replacement);
    declare hash currentCard(hashexp: 9);
    currentCard.defineKey('Card_ID');
    currentCard.defineData('Card_ID', 'Path_ID');
    currentCard.defineDone();

    * for tracking everything up to the tip (segments) keyed by the replacement card;
    declare hash replacedCard(hashexp: 10);
    replacedCard.defineKey('New_Card_ID');
    replacedCard.defineData('Card_ID');
    replacedCard.defineDone();

    * fill the two hashes;
    do until (lastrow);
        set test(keep=Card_ID New_Card_ID) end=lastrow;
        if missing(New_Card_ID) then do;
            Path_ID + 1; * each tip starts a new path;
            currentCard.add();
        end;
        else replacedCard.add();
    end;

    * for each tip of a path output the tip and all its segments;
    declare hiter tipIter('currentCard');
    do while (tipIter.next() = 0);
        output; * tip;
        * walk backwards: find the card that was replaced by the current Card_ID;
        do while (replacedCard.find(key: Card_ID) = 0);
            output; * segment;
        end;
    end;
    stop;
run;
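
The two assumptions stated before the code (unique Card_ID values, and a Card_ID row for every New_Card_ID) can be checked up front. The following is only a sketch, assuming the input data set is the question's `test`; both queries should return no rows when the assumptions hold.

```sas
proc sql;
    /* Card_ID values that occur on more than one row (expected: none) */
    select Card_ID, count(*) as n_rows
        from test
        group by Card_ID
        having count(*) > 1;

    /* Replacement IDs that never appear as a Card_ID (expected: none) */
    select a.Card_ID, a.New_Card_ID
        from test as a
        where a.New_Card_ID is not missing
          and a.New_Card_ID not in (select Card_ID from test);
quit;
```

If either query returns rows, the backward walk through the hash may stop early or merge chains that should stay separate, so the data would need cleaning first.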

If you really need Seq = 1..N within each customer, you will have to do an additional sort and merge.
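
One possible way to do that extra step is sketched below. It assumes the `paths` data set produced by the hash step and the question's input data set `test`; the names `joined`, `want_seq` and `Seq_No` are illustrative and not part of the original answer.

```sas
/* Attach Path_ID to the full card records and sort for BY processing */
proc sql;
    create table joined as
    select t.Customer, t.First_Issue_Date, t.Card_Issue_Date,
           t.Card_ID, p.Path_ID
        from test as t
        inner join paths as p
            on t.Card_ID = p.Card_ID
        order by t.Customer, p.Path_ID, t.Card_Issue_Date;
quit;

/* Renumber the paths 1..N within each customer */
data want_seq(drop=Path_ID);
    set joined;
    by Customer Path_ID;
    if first.Customer then Seq_No = 0;  /* restart for each customer */
    if first.Path_ID then Seq_No + 1;   /* sum statement, so Seq_No is retained */
run;
```

Because Path_ID is handed out in the order the final cards of each chain appear in the input, the resulting Seq_No follows that order within a customer, not necessarily the order in which the original cards were issued.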

My 2009 NESUG paper "Using HASH to find a sum over a transactional path" has a similar discussion of linked transactions.

Answer:

What you are looking for is a connected components analysis. If you have it licensed, PROC OPTNET can give you what you want.

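Unfortunately, it does not support a BY statement, so you have to generate the sequence numbers yourself after using it to group the cards.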

First, create the "from/to" link data from the card data.

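data nodes;
    set test;
    node = put(_n_, best12.); * row label, not used by the analysis;
    from = card_id;
    to = new_card_id;
    * a card with no replacement links to itself (to is character so test it with missing);
    if missing(to) then to = from;
run;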

Then run the analysis.

proc optnet data_links=nodes out_nodes=nodes_out; 
concomp; 
run;

This produces a list of the cards and the group each one belongs to (variable concomp).

Join the group back onto the original data and sort the result.

proc sql noprint;
    create table want as
    select a.customer,
           a.First_Issue_Date,
           a.Card_Issue_Date,
           a.Card_ID,
           b.concomp
        from test as a
        left join nodes_out as b
            on a.card_id = b.node
        order by customer, concomp, Card_Issue_Date;
quit;

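Now that everything is sorted the way you want, you can use a data step to take this information and create seq_no as 1, 2, ..., N within each customer.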

data want(drop=concomp); 
set want; 
by customer concomp; 
retain seq_no ; 
if first.customer then 
    seq_no = 0; 
if first.concomp then 
    seq_no = seq_no + 1; 
run;

The above is the full content of "Complex data restructuring problem in SAS". Source link: utcz.com/qa/263381.html
