Two small issues with your code:

  1. The htf variable isn't used; I assume it's missing from the pipeline? Since this is the PipelineStage that creates the rawFeatures field required by the next stage, you get the "Field does not exist" error.

  2. Even if we fix this, the last stage (LogisticRegression) will fail because, in addition to the features field, it requires a label field of type DoubleType. You'll need to add such a field to your dataframe before fitting.

Changing the last lines of your code to ...

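// lit is needed to build the constant label column
import org.apache.spark.sql.functions.lit

// Pipeline, with the "htf" stage added
val pipeline = new Pipeline().setStages(Array(tokenizer, htf, idf, lr))
// Model, fitted on a dataframe with an additional (constant) label field
val model = pipeline.fit(dataframe.withColumn("label", lit(1.0)))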

... will make the pipeline finish successfully. Of course, the constant label here is only for the example's sake; create the labels as appropriate for your data.
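As a rough sketch of what real labels might look like, the example below derives a DoubleType label from an existing column instead of using a constant. The column names "text", "category", and "words", the sample rows, and the helper withLabel are assumptions made up for illustration, since your actual schema isn't shown; only rawFeatures, features, and label come from the discussion above.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object LabelSketch {
  // Hypothetical helper: maps an assumed "category" column to a Double label.
  // when(...).otherwise(...) with Double literals yields a DoubleType column.
  def withLabel(df: DataFrame): DataFrame =
    df.withColumn("label", when(col("category") === "pos", 1.0).otherwise(0.0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("label-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; replace with your own dataframe.
    val dataframe = Seq(
      ("spark is great", "pos"),
      ("this failed again", "neg"),
      ("great pipeline api", "pos"),
      ("field does not exist", "neg")
    ).toDF("text", "category")

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // htf produces rawFeatures, which idf then consumes
    val htf = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
    val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
    // LogisticRegression reads "features" and "label" by default
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, htf, idf, lr))
    val model = pipeline.fit(withLabel(dataframe))

    model.transform(withLabel(dataframe))
      .select("text", "label", "prediction")
      .show(false)

    spark.stop()
  }
}
```

If your labels are strings with more than two values, StringIndexer (also in org.apache.spark.ml.feature) is another option: it outputs a double-valued index column and can be added as a pipeline stage.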

