
In PySpark, a null value becomes the string 'null' after json.dumps(null)

news 2025/9/18 8:17:16

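When writing data from a Hive data warehouse to MySQL, we sometimes need to convert the data to a JSON string before storing it. However, a null value in Hive turns into the string 'null' once it passes through the JSON function. The fix is to check the value before calling the JSON function: when the value is null, return null directly; when it is not null, apply the JSON function.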

1 Normal case

Run the following code in PySpark:

history_loc_df = spark.sql("""
    SELECT
        user_id,
        null AS active_points,
        '20230405' AS ymd
    FROM tmp.tmp_user
""")
# export_data_mysql is a helper (not shown here) that writes a DataFrame to MySQL
export_data_mysql(history_loc_df)

When history_loc_df is written to MySQL, the null values are stored as NULL (empty).

2 null becomes the string 'null'

After the to_json function is applied, a null value becomes the string 'null':

import json
from pyspark.sql.types import StringType

def to_json(info):
    return json.dumps(info)

# Register the UDF: convert a value to a JSON string
spark.udf.register("to_json", to_json, StringType())

history_loc_df = spark.sql("""
    SELECT
        user_id,
        to_json(null) AS active_points,
        '20230405' AS ymd
    FROM tmp.tmp_user
""")
export_data_mysql(history_loc_df)

When history_loc_df is now written to MySQL, the null values are stored as the string 'null' instead of NULL.
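The cause can be reproduced without Spark. A SQL NULL reaches a Python UDF as None, and json.dumps serializes None to the four-character JSON literal null, which is an ordinary Python string. A minimal sketch (the variable name dumped is only for illustration):

```python
import json

# json.dumps maps Python None to the JSON literal null, returned as a str
dumped = json.dumps(None)

print(repr(dumped))        # 'null'
print(dumped is None)      # False
print(isinstance(dumped, str))  # True
```

Because the UDF returns a non-empty string, Spark and MySQL have no way to tell that the original value was null.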

3 Check for null before calling to_json

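Suppose we still want to use to_json, but it should return a real null when the value is null and a JSON string for any other value.

We only need to check the value before converting it to a JSON string: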

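import json
from pyspark.sql.types import StringType

def to_json(info):
    return json.dumps(info)

# Register the UDF: convert a value to a JSON string
spark.udf.register("to_json", to_json, StringType())

history_loc_df = spark.sql("""
    SELECT
        user_id,
        -- return null as-is; only call to_json on non-null values
        if(active_points IS NULL, null, to_json(active_points)) AS active_points,
        '20230405' AS ymd
    FROM tmp.tmp_user
""")
export_data_mysql(history_loc_df)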
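As an alternative to the SQL-side if(), the same check could live inside the Python function, so that every query using the UDF gets the null-safe behavior. This is only a sketch and not the approach used above; the name to_json_or_none is hypothetical, and the registration line is left as a comment because it assumes an active SparkSession named spark.

```python
import json

def to_json_or_none(info):
    """Return None for a null input, otherwise the JSON string of the value."""
    if info is None:
        # Keep null as null so the database stores NULL, not the string 'null'
        return None
    return json.dumps(info)

# Assumed registration, mirroring the original UDF setup:
# spark.udf.register("to_json", to_json_or_none, StringType())

print(to_json_or_none(None))        # None
print(to_json_or_none({"a": 1}))    # {"a": 1}
```

With this variant the query can call to_json(active_points) directly, at the cost of hiding the null handling inside the UDF rather than in the SQL.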