Allow me to start with two short observations:
- I believe it would be safer to avoid naming columns with bare numbers. Consider the expression `1 is not null`: it is ambiguous whether this refers to the column named 1 or to the literal value 1.
- As far as I am aware, it is not performant to store and process the target columns through a DataFrame. That creates overhead which can easily be avoided by using a plain Scala collection (Seq, Array, Set, etc.).
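To make the first observation concrete, here is a minimal pure-Scala sketch of one possible workaround: a hypothetical `quote` helper (not part of the answer's code) that backtick-escapes all-digit column names, since Spark SQL treats a backticked token as an identifier rather than an integer literal.

```scala
// Hypothetical helper: backtick-escape all-digit names so a SQL expression
// like "1 is not null" would target the column instead of the literal 1.
def quote(name: String): String =
  if (name.nonEmpty && name.forall(_.isDigit)) s"`$name`" else name

val numericName = s"${quote("1")} is not null"  // "`1` is not null" -> the column
val normalName  = s"${quote("c1")} is not null" // "c1 is not null" -> unchanged
```

Sticking to alphanumeric names like `c1` avoids the need for any escaping in the first place.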
And here is the solution to your problem:
import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for .toDF; already in scope in spark-shell

val id_df = Seq(
  ("c1", "gender"),
  ("c2", "city"),
  ("c3", "state"),
  ("c4", "age")
).toDF("id", "type")

val main_df = Seq(
  ("male", "los angeles", null),
  ("female", "new york", "new york"),
  ("trans", null, "new york")
).toDF("c1", "c2", "c3")

val targetCols = id_df.collect()
  .map(_.getString(0))              // extract the id value from each row
  .toSet                            // a Set is required for the intersection
  .intersect(main_df.columns.toSet) // keep only columns that exist in main_df
  .map(col(_).isNotNull)            // turn each name into col(name).isNotNull
  .reduce(_ && _)                   // AND all the conditions together
// (((c1 IS NOT NULL) AND (c2 IS NOT NULL)) AND (c3 IS NOT NULL))

main_df.withColumn("meets_conditions", targetCols).show(false)
// +------+-----------+--------+----------------+
// |c1 |c2 |c3 |meets_conditions|
// +------+-----------+--------+----------------+
// |male |los angeles|null |false |
// |female|new york |new york|true |
// |trans |null |new york|false |
// +------+-----------+--------+----------------+
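The collect/intersect/reduce pipeline above can be sketched in pure Scala, without a SparkSession, to show why a plain collection is enough for this step. The `ids` and `mainCols` values below stand in for `id_df.collect()` and `main_df.columns`; the predicate is built as a string here purely for illustration.

```scala
// Stand-ins for id_df.collect() ids and main_df.columns (assumed values
// mirroring the example data above).
val ids      = Seq("c1", "c2", "c3", "c4")
val mainCols = Seq("c1", "c2", "c3")

val predicate = ids.toSet
  .intersect(mainCols.toSet)       // common columns: c1, c2, c3
  .toSeq.sorted                    // fix iteration order for a stable result
  .map(c => s"($c IS NOT NULL)")   // one IS NOT NULL condition per column
  .reduce((a, b) => s"($a AND $b)")// AND the conditions together
// (((c1 IS NOT NULL) AND (c2 IS NOT NULL)) AND (c3 IS NOT NULL))
```

Note that `Set` iteration order is unspecified, which is why the sketch sorts the names; in the Spark version this does not matter, since `&&` is commutative for the final result.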
Source: stackoverflow.com