The Difference Between Spark TempView and GlobalTempView

TempView and GlobalTempView are both used frequently with Spark DataFrames. How do they differ, and when should each one be used?

The example below compares the two.

from pyspark.sql import SparkSession

import numpy as np

import pandas as pd

spark = SparkSession.builder.getOrCreate()

d = np.random.randint(1,100, 5*5).reshape(5,-1)

data = pd.DataFrame(d, columns=list("abcde"))

df = spark.createDataFrame(data)

df.show()

+---+---+---+---+---+

|  a|  b|  c|  d|  e|

+---+---+---+---+---+

| 17| 30| 61| 61| 33|

| 32| 23| 24|  7|  7|

| 47|  6|  4| 95| 34|

| 50| 69| 83| 21| 46|

| 52| 12| 83| 49| 85|

+---+---+---+---+---+

Querying data from a temp view

# createTempView registers the view and returns None, so there is nothing to assign

df.createTempView("temp")

temp_sql = "select * from temp where a=50"

res = spark.sql(temp_sql)

res.show()

+---+---+---+---+---+

|  a|  b|  c|  d|  e|

+---+---+---+---+---+

| 50| 69| 83| 21| 46|

+---+---+---+---+---+

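Querying data from a global temp view

# Global temp views live in the system database global_temp and must be queried with that prefix

df.createGlobalTempView("glob")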

glob_sql = "select * from global_temp.glob where a = 17"

res2 = spark.sql(glob_sql)

res2.show()

+---+---+---+---+---+

|  a|  b|  c|  d|  e|

+---+---+---+---+---+

| 17| 30| 61| 61| 33|

+---+---+---+---+---+

A global temp view can be shared across multiple SparkSessions

# Create a new SparkSession

spark2 = spark.newSession()

spark2 == spark

False

# The new SparkSession can read data from the global temp view

new_sql = "select * from global_temp.glob where a = 47"

temp = spark2.sql(new_sql)

temp.show()

+---+---+---+---+---+

|  a|  b|  c|  d|  e|

+---+---+---+---+---+

| 47|  6|  4| 95| 34|

+---+---+---+---+---+

# The new SparkSession cannot read data from the temp view

# Spark reports that table temp cannot be found

new_sql2 = "select * from temp where a = 47"

temp = spark2.sql(new_sql2)

temp.show()

# Adding the global_temp prefix does not help either

new_sql2 = "select * from global_temp.temp where a = 47"

temp = spark2.sql(new_sql2)

temp.show()

---------------------------------------------------------------------------

Py4JJavaError Traceback (most recent call last)

# (several lines of the traceback omitted here)

AnalysisException: "Table or view not found: `global_temp`.`temp`; line 1 pos 14;

'Project [*]

+- 'Filter ('a = 47)

   +- 'UnresolvedRelation `global_temp`.`temp`

"

A view cannot be used after it has been dropped

spark.catalog.dropTempView("temp")

spark.catalog.dropGlobalTempView("glob")

# Raises an error: table temp cannot be found

temp_sql2 = "select * from temp where a = 47"

temp = spark.sql(temp_sql2)

# Raises an error: global_temp.glob cannot be found, in both spark and spark2

glob_sql2 = "select * from global_temp.glob where a = 47"

temp = spark.sql(glob_sql2)

temp = spark2.sql(glob_sql2)

Summary

A Spark DataFrame has four methods for creating temp views:

  • df.createGlobalTempView
  • df.createOrReplaceGlobalTempView
  • df.createOrReplaceTempView
  • df.createTempView

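The createOrReplace variants create the view if it does not exist and replace it if it does. The plain create variants raise an error when a view with the same name already exists.

A view cannot be used after it has been dropped.

There are two methods for dropping views: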

spark.catalog.dropTempView("temp")

spark.catalog.dropGlobalTempView("glob")

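Similarities and differences between TempView and GlobalTempView

  1. A temp view can only be used in the SparkSession that created it.
  2. A global temp view is shared by all SparkSessions in the same application and is queried through the global_temp prefix.
  3. Neither of them can be used across Spark applications.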
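The three rules can be pictured with a toy model in plain Python. This is not how Spark implements its catalog; the classes ToyApplication and ToySession and the helper visible are invented here purely to show which registry each kind of view lives in.

```python
class ToyApplication:
    """Stands in for one Spark application: owns the global_temp registry."""

    def __init__(self):
        self.global_views = {}

    def new_session(self):
        return ToySession(self)


class ToySession:
    """Stands in for one SparkSession: owns its own temp-view registry."""

    def __init__(self, app):
        self.app = app
        self.temp_views = {}

    def create_temp_view(self, name, data):
        self.temp_views[name] = data  # session-scoped

    def create_global_temp_view(self, name, data):
        self.app.global_views[name] = data  # application-scoped

    def lookup(self, name):
        prefix = "global_temp."
        if name.startswith(prefix):
            registry, key = self.app.global_views, name[len(prefix):]
        else:
            registry, key = self.temp_views, name
        if key not in registry:
            raise LookupError("Table or view not found: " + name)
        return registry[key]


def visible(session, name):
    """True if the name resolves in the given toy session."""
    try:
        session.lookup(name)
        return True
    except LookupError:
        return False


app = ToyApplication()
s1, s2 = app.new_session(), app.new_session()
s3 = ToyApplication().new_session()  # a session in a different application

s1.create_temp_view("temp", [1, 2, 3])
s1.create_global_temp_view("glob", [4, 5, 6])

print("rule 1:", visible(s1, "temp"), visible(s2, "temp"))
print("rule 2:", visible(s1, "global_temp.glob"), visible(s2, "global_temp.glob"))
print("rule 3:", visible(s3, "temp"), visible(s3, "global_temp.glob"))
```

The point of the model is only the ownership: a temp view hangs off the session object, a global temp view hangs off the application object, and nothing is shared between two application objects.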

Source link: utcz.com/z/534448.html
