Accepted answer

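You get the error because the two branches of your if/else return different types: the if branch returns the literal 1, which is an Int, while the else branch returns make in its declared type. Scala then infers their common supertype, Any, as the return type of your UDF, and Spark has no DataFrame type for Any. Note also that bigint in Hive corresponds to Long (LongType) in Spark. The supported data types are listed in the Spark SQL documentation.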

If you inspect df.schema, it shows that the type you actually need is LongType:

val df = sqlContext.sql(" select cast(2 as bigint) as a ")
// df: org.apache.spark.sql.DataFrame = [a: bigint]

df.printSchema
// root
//  |-- a: long (nullable = false)

df.schema
// res16: org.apache.spark.sql.types.StructType = StructType(StructField(a,LongType,false))

Your UDF should look something like:

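import org.apache.spark.sql.functions.udf

// Both branches are Long, so the UDF's return type is LongType.
val makeSIfTesla = udf { (make: Long) => if (make == 0L) 1L else make }
// makeSIfTesla: UserDefinedFunction = UserDefinedFunction(<function1>,LongType,List(LongType))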

However, for something as simple as this, you really don't need a UDF. You can use the when-otherwise construct available in Spark.

import org.apache.spark.sql.functions.{lit, when}

df.withColumn("x", when($"x" === lit(0), lit(1)).otherwise($"x"))

where x is the column you are passing to your UDF makeSIfTesla.
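
To make the when-otherwise suggestion concrete, here is a minimal end-to-end sketch. It assumes Spark 2.x or later with a local SparkSession (in spark-shell, spark already exists and the builder line can be skipped); the sample DataFrame and its values are made up purely for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, when}

// Assumption: a local session for experimenting; not part of the original answer.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("when-otherwise-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: one bigint (Long) column named x.
val sample = Seq(0L, 5L, 7L).toDF("x")

// Replace 0 with 1 and keep every other value unchanged, without a UDF.
val replaced = sample.withColumn("x", when($"x" === 0L, lit(1L)).otherwise($"x"))

replaced.printSchema()
replaced.show()
```

Using Long literals on both sides keeps the column as bigint, and built-in column expressions like these stay visible to Spark's optimizer, which a UDF does not.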


Fix the code as follows:

val makeSIfTesla = udf {(make: BigInt) => if(make == 0) BigInt(1) else make}

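The problem was that 1 is an Int and make is a BigInt, so the function passed to udf returned Any. Any is not supported as a udf return type, hence the error you see. Making the types consistent makes the function return BigInt and fixes the issue. Note that Spark maps Scala's BigInt to decimal(38,0) rather than bigint, so use Long if the column should stay a bigint. You can also declare make as Int, provided the values fit in an Int.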
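
The type inference behind this can be reproduced in plain Scala, without Spark. The sketch below assumes Scala 2 inference rules (Scala 3 infers differently for mixed branches); the function names mixed, consistent and asLong are hypothetical and exist only for this illustration.

```scala
// Mixed branch types: an Int literal and a BigInt.
// With no explicit return type, Scala 2 infers Any for this function.
def mixed(make: BigInt) = if (make == 0) 1 else make

// Both branches are BigInt, so the result type is BigInt.
def consistent(make: BigInt): BigInt = if (make == 0) BigInt(1) else make

// Same idea with Long, which is what a Spark bigint column carries.
def asLong(make: Long): Long = if (make == 0L) 1L else make

// The result of mixed can only be bound as Any;
// `val b: BigInt = mixed(BigInt(0))` would not compile.
val fromMixed: Any = mixed(BigInt(0))
val fromConsistent: BigInt = consistent(BigInt(0))
val fromLong: Long = asLong(0L)
```

Declaring the return type explicitly, as in consistent and asLong, turns an accidental Any into a compile-time error instead of a runtime failure inside Spark.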
