Error when running a Python word-count script on Hadoop

[screenshot: clipboard.png]
This is the article I followed, and I ran the program exactly as it describes: https://blog.csdn.net/wangato...
As shown in the screenshot, the job always fails with the error on the first line below. I searched Baidu for the cause, but none of the suggestions solved it.
1. The Python scripts themselves run without errors and count the words correctly when I follow the article, but as soon as I run them through Hadoop I get:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
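
For reference, I launch the job with a hadoop-streaming command along these lines, following the article (the jar path and the HDFS input/output paths below are stand-ins, not my exact ones):

# roughly the command I run (paths are placeholders)
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -file mapper.py -mapper mapper.py \
    -file reduce.py -reducer reduce.py \
    -input /user/hadoop/input \
    -output /user/hadoop/output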

I'm a beginner and really can't figure out the cause. Here are my Python scripts:
1. mapper.py

#!/usr/bin/env python3
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))
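
Tested on its own, the mapper behaves as expected for me, checked roughly like this (the input string is just arbitrary test data):

# quick local check of the mapper alone; prints one "word<TAB>1" line per word
echo "foo foo quux labs foo bar quux" | python3 mapper.py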

2. reduce.py

#!/usr/bin/env python3
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
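
Outside of Hadoop, the whole pipeline also works for me, tested roughly like this (input.txt is a placeholder for my test file; the sort step stands in for Hadoop's shuffle):

# local end-to-end test: mapper | sort (simulates the shuffle) | reducer
cat input.txt | python3 mapper.py | sort -k1,1 | python3 reduce.py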
