Two small issues with your code:

The `htf` variable isn't used - I assume it's missing from the pipeline? Since this is the `PipelineStage` creating the `rawFeatures` field required by the next stage, you get the `Field does not exist` error.

Even if we fix this - the last stage (`LogisticRegression`) will fail, because it requires a `label` field with type `DoubleType`, in addition to the `features` field. You'll need to add such a field to your dataframe before fitting.
Changing the last lines of your code ..
import org.apache.spark.sql.functions.lit

// pipeline - with "htf" stage added
val pipeline = new Pipeline().setStages(Array(tokenizer, htf, idf, lr))

// Model - with an additional (constant...) label field
val model = pipeline.fit(dataframe.withColumn("label", lit(1.0)))
... will make this finish successfully. Of course, the labeling here is just for the example's sake; create the labels as you see fit.
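For completeness, here is a self-contained sketch of the whole corrected pipeline, runnable in spark-shell or as a script. The toy input data and the constant label are made up for illustration, and the stage input/output column names (`text`, `words`, `rawFeatures`, `features`) are assumptions consistent with the error described above, not taken from the original question.

```scala
// Minimal end-to-end sketch of the corrected pipeline (spark-shell style).
// All data and column names here are illustrative assumptions.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[1]").appName("pipeline-sketch").getOrCreate()
import spark.implicits._

// Toy input - in practice this would be your real dataframe
val dataframe = Seq("spark ml pipeline example", "field does not exist error").toDF("text")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val htf = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val lr = new LogisticRegression() // expects "features" and "label" columns by default

// All four stages wired together, htf included
val pipeline = new Pipeline().setStages(Array(tokenizer, htf, idf, lr))

// Constant label just so fit() succeeds - replace with real labels
val model = pipeline.fit(dataframe.withColumn("label", lit(1.0)))
println(model.stages.length) // 4 fitted stages

spark.stop()
```

Note that `Tokenizer` lowercases and splits on whitespace; if you need regex-based splitting, `RegexTokenizer` is the drop-in alternative.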
Source: stackoverflow.com