[Flink] Implementing Count-Window Behavior with Managed Keyed State
First, the code:
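import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.util.Collector;

import java.util.Arrays;

public class WordCountKeyedState {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Build a test stream that emits one line of words per second
        DataStreamSource<String> lineDS = env.addSource(new RichSourceFunction<String>() {
            // volatile: cancel() is called from a different thread than run()
            private volatile boolean isCanceled = false;

            @Override
            public void run(SourceContext<String> ctx) throws Exception {
                while (!isCanceled) {
                    ctx.collect("hadoop flink spark");
                    Thread.sleep(1000);
                }
            }

            @Override
            public void cancel() {
                isCanceled = true;
            }
        });

        // Split each line on spaces and map every word to a (word, 1) tuple
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordTupleDS = lineDS
                .flatMap((String line, Collector<Tuple2<String, Integer>> ctx) -> {
                    Arrays.stream(line.split(" ")).forEach(word -> ctx.collect(Tuple2.of(word, 1)));
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT));

        // Key the stream by word (field f0)
        KeyedStream<Tuple2<String, Integer>, String> keyedWordTupleDS = wordTupleDS.keyBy(t -> t.f0);

        // Count the occurrences of each word
        keyedWordTupleDS.flatMap(new RichFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            // Per-key state: f0 = elements seen since the last emit, f1 = running sum
            private transient ValueState<Tuple2<Integer, Integer>> countSumValueState;

            @Override
            public void open(Configuration parameters) throws Exception {
                // Initialize the ValueState
                ValueStateDescriptor<Tuple2<Integer, Integer>> countSumValueStateDesc =
                        new ValueStateDescriptor<>("countSumValueState",
                                TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {}));
                countSumValueState = getRuntimeContext().getState(countSumValueStateDesc);
            }

            @Override
            public void flatMap(Tuple2<String, Integer> value, Collector<Tuple2<String, Integer>> out) throws Exception {
                // No default value is configured, so initialize the state on first access
                if (countSumValueState.value() == null) {
                    countSumValueState.update(Tuple2.of(0, 0));
                }
                Integer count = countSumValueState.value().f0;
                count++;
                Integer valueSum = countSumValueState.value().f1;
                valueSum += value.f1;
                countSumValueState.update(Tuple2.of(count, valueSum));

                // Once the count exceeds 3 (every 4th element per key), emit downstream
                if (count > 3) {
                    out.collect(Tuple2.of(value.f0, valueSum));
                    // Reset the count; the running sum is kept
                    countSumValueState.update(Tuple2.of(0, valueSum));
                }
            }
        }).print();

        env.execute("KeyedState State");
    }
}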
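Code walkthrough:
1. Build a test source that emits one line of text per second. To keep testing simple, each line contains just three words.
2. Split each line on spaces and convert every word to a tuple, with an initial count of 1 per word.
3. Group the stream by word.
4. Implement a custom FlatMap.
Initialize the ValueState. Note that ValueState can only be used on a KeyedStream, and each ValueState value corresponds to exactly one key. Whenever a parallel subtask accesses the ValueState, Flink takes the current key from the context, so the processing logic always sees the ValueState that belongs to that key. Flink handles this automatically.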
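To make the per-key scoping concrete, here is a minimal plain-Java sketch. It is not Flink code: the class KeyedStateSketch, its HashMap "backend", and the setCurrentKey method are hypothetical stand-ins used only to illustrate the idea that value() and update() always act on the slot of the key currently being processed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of keyed state: one independent value slot per key. Not Flink code. */
final class KeyedStateSketch {
    // Hypothetical stand-in for a state backend: key -> stored value
    private final Map<String, Integer> backend = new HashMap<>();
    // Set by the "runtime" before each element is processed
    private String currentKey;

    void setCurrentKey(String key) {
        this.currentKey = key;
    }

    // Modeled on ValueState.value(): null if nothing is stored for the current key
    Integer value() {
        return backend.get(currentKey);
    }

    // Modeled on ValueState.update(): writes only the current key's slot
    void update(Integer newValue) {
        backend.put(currentKey, newValue);
    }

    /** Processes one word as a keyed operator would: select the key, then read and write state. */
    int countWord(String word) {
        setCurrentKey(word);
        Integer current = value();
        int next = (current == null ? 0 : current) + 1;
        update(next);
        return next;
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch();
        for (String word : "hadoop flink spark flink".split(" ")) {
            // Prints hadoop -> 1, flink -> 1, spark -> 1, flink -> 2
            System.out.println(word + " -> " + state.countWord(word));
        }
    }
}
```

The user logic in countWord never mentions a key when it reads or writes state; the key is chosen beforehand, which is the part Flink takes care of in a real job.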
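Note:
The ValueStateDescriptor constructors that take a default initial value are deprecated. The official Javadoc recommends checking for null manually during processing:
"Use ValueStateDescriptor(String, TypeSerializer) instead and manually manage the default value by checking whether the contents of the state is null."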
/**
* Creates a new {@code ValueStateDescriptor} with the given name, default value, and the specific
* serializer.
*
* @deprecated Use {@link #ValueStateDescriptor(String, TypeSerializer)} instead and manually
* manage the default value by checking whether the contents of the state is {@code null}.
*
* @param name The (unique) name for the state.
* @param typeSerializer The type serializer of the values in the state.
* @param defaultValue The default value that will be set when requesting state without setting
* a value before.
*/
@Deprecated
public ValueStateDescriptor(String name, TypeSerializer<T> typeSerializer, T defaultValue) {
super(name, typeSerializer, defaultValue);
}
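The null check that the Javadoc recommends can be sketched in plain Java as follows. SimpleValueState and valueOrDefault are hypothetical names invented for this illustration; SimpleValueState only imitates the value(), update() and clear() methods of Flink's ValueState and is not part of Flink.

```java
import java.util.function.Supplier;

/** Sketch of managing a default value manually instead of via a descriptor default. */
final class NullDefaultSketch {

    /** Hypothetical minimal stand-in for ValueState<T>; value() is null until update() is called. */
    static final class SimpleValueState<T> {
        private T stored;

        T value() {
            return stored;
        }

        void update(T newValue) {
            stored = newValue;
        }

        void clear() {
            stored = null;
        }
    }

    /** Returns the stored value, or a freshly supplied default if the state is empty. */
    static <T> T valueOrDefault(SimpleValueState<T> state, Supplier<T> defaultValue) {
        T current = state.value();
        return current != null ? current : defaultValue.get();
    }

    public static void main(String[] args) {
        SimpleValueState<Integer> count = new SimpleValueState<>();
        System.out.println(valueOrDefault(count, () -> 0)); // empty state: prints 0
        count.update(5);
        System.out.println(valueOrDefault(count, () -> 0)); // stored value: prints 5
        count.clear();
        System.out.println(valueOrDefault(count, () -> 0)); // cleared again: prints 0
    }
}
```

Supplying the default through a Supplier means a new default object is created on each miss, which avoids accidentally sharing one mutable default instance across keys.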
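5. Processing logic
In flatMap, check whether the ValueState has been initialized; if it has not, set an initial value manually. Then add the incoming value to the state and update it. Whenever count > 3, emit the result downstream and reset the count.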
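The behavior of this logic can be traced outside Flink with a small plain-Java simulation. The class CountWindowSketch and its process method are hypothetical names for this illustration, and a HashMap stands in for the keyed state; the counting rule (emit when count > 3, reset only the count) follows the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java simulation of the count/sum logic described above. Not Flink code. */
final class CountWindowSketch {
    // Stand-in for keyed state: word -> {count since last emit, running sum}
    private final Map<String, int[]> state = new HashMap<>();

    /** Processes one (word, value) element; returns the emitted sum, or null if nothing is emitted. */
    Integer process(String word, int value) {
        // Manual initialization, mirroring the null check in flatMap
        int[] countAndSum = state.computeIfAbsent(word, k -> new int[] {0, 0});
        countAndSum[0]++;
        countAndSum[1] += value;
        if (countAndSum[0] > 3) {
            countAndSum[0] = 0; // reset only the count; the sum keeps accumulating
            return countAndSum[1];
        }
        return null;
    }

    public static void main(String[] args) {
        CountWindowSketch sketch = new CountWindowSketch();
        for (int i = 1; i <= 8; i++) {
            Integer emitted = sketch.process("flink", 1);
            // Elements 4 and 8 print (flink, 4) and (flink, 8); the others print "no output"
            System.out.println("element " + i + " -> "
                    + (emitted == null ? "no output" : "(flink, " + emitted + ")"));
        }
    }
}
```

Two things stand out in the trace. With the condition count > 3, a result is emitted on every fourth element per key, not every third. And because only the count is reset, each emitted value is a running total (4, then 8) rather than a per-window sum; resetting the sum as well would give per-window sums instead.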
Source: utcz.com/z/532143.html