I have a PySpark dataframe that looks like this:

df

num11   num21
10      10
20      30
5       25

I am filtering the above dataframe on all of its columns, selecting the rows where every value is greater than or equal to 10 (the number of columns can be more than two):

from pyspark.sql.functions import col
col_list = df.schema.names
df_filtered = df.where(col(c) >= 10 for c in col_list)

The desired output is:

num11    num21
10       10
20       30

How can I achieve filtering on multiple columns by iterating over the column list as above? [all efforts are appreciated]

[the error I receive is: condition should be string or Column]

score:1

Accepted answer

where() expects a single Column (or a SQL expression string), not a generator of Columns, which is why your version fails. You can use functools.reduce to combine the per-column conditions into one Column, simulating an all condition; for instance, reduce(lambda x, y: x & y, ...):

import pyspark.sql.functions as F
from functools import reduce

df.where(reduce(lambda x, y: x & y,  (F.col(x) >= 10 for x in df.columns))).show()
+-----+-----+
|num11|num21|
+-----+-----+
|   10|   10|
|   20|   30|
+-----+-----+
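
If you'd rather not spell out the lambda, operator.and_ performs the same combination; a minimal sketch of the same reduce idea (the >= 10 threshold is simply the value from the question):

import operator
from functools import reduce
import pyspark.sql.functions as F

# build one Boolean Column per column, then AND them into a single condition
conditions = [F.col(c) >= 10 for c in df.columns]
df.where(reduce(operator.and_, conditions)).show()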

score:2

As an alternative, if you are not averse to SQL-like snippets of code, the following should work:

df.where("AND".join(["(%s >=10)"%(col) for col in col_list]))