Spark: Loading a Scala ML model into PySpark
I trained an LDA model in Scala Spark.
import org.apache.spark.ml.clustering.LDA
val lda = new LDA().setK(k).setMaxIter(iter).setFeaturesCol(colnames).fit(data)
lda.save(path)
I checked the saved model, and it contains two folders: metadata and data.
However, when I try to load this model into PySpark, I get the following error:
model = LDAModel.load(sc, path = path)
File "/Users/hongbowang/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o33.loadLDAModel.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/hongbowang/Personal/Spark%20Program/Spark%20Project/T1/output_K20_topic/lda/metadata
Does anyone know how I can fix this? Many thanks!
Answer:
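You saved an ml.clustering.LDAModel, but you are trying to read it with mllib.clustering.LDAModel. The two packages use different on-disk formats, so you should import the matching LDAModel class from pyspark.ml.clustering.
For a local model: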
from pyspark.ml.clustering import LocalLDAModel
LocalLDAModel.load(path)
For a distributed model:
from pyspark.ml.clustering import DistributedLDAModel
DistributedLDAModel.load(path)
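If it is unclear which of the two classes was saved, the metadata folder mentioned in the question should record it. The following is a minimal sketch rather than an official API: it assumes the model directory is on the local filesystem and that the metadata folder holds part-* files containing a one-line JSON object with a "class" key. The helper name saved_model_class is hypothetical.

```python
import glob
import json
import os


def saved_model_class(path):
    """Return the JVM class name recorded in a saved spark.ml model's metadata.

    Assumption: `path` is a local directory whose metadata/ folder contains
    part-* files, each holding a JSON object on a single line.
    """
    pattern = os.path.join(glob.escape(path), "metadata", "part-*")
    for part_file in sorted(glob.glob(pattern)):
        with open(part_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    # The first non-empty line is expected to be the metadata JSON.
                    return json.loads(line)["class"]
    raise FileNotFoundError("no metadata part file found under " + path)
```

A class name ending in LocalLDAModel suggests LocalLDAModel.load(path), and one ending in DistributedLDAModel suggests DistributedLDAModel.load(path). In spark.ml the default "online" optimizer produces a local model and the "em" optimizer produces a distributed one, so the training code in the question, which does not set an optimizer, most likely saved a LocalLDAModel.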