Accepted answer (score: -1)
I got the solution; I was able to do it as follows.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, Tokenizer, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression

// Schema for the training CSV: a user id, a date string, and a numeric label.
case class LabeledDocument(userid: Double, date: String, label: Double)

val trainingData = spark.read.option("inferSchema", true).csv("/root/predictiondata10.csv").toDF("userid", "date", "label").as[LabeledDocument]

// Encode the string "date" column as a numeric category.
val dateIndexer = new StringIndexer().setInputCol("date").setOutputCol("dateCat")
val indexed = dateIndexer.fit(trainingData).transform(trainingData)

// Combine the two numeric columns into a single feature vector.
val assembler = new VectorAssembler().setInputCols(Array("dateCat", "userid")).setOutputCol("rawFeatures")
val output = assembler.transform(indexed)

// Collect the rows and rebuild a DataFrame, stringifying the assembled vector.
val rows = output.select("userid", "date", "label", "dateCat", "rawFeatures").collect()
val asTuple = rows.map(a => (a.getInt(0), a.getString(1), a.getDouble(2), a.getDouble(3), a(4).toString))
val r2 = sc.parallelize(asTuple).toDF("userid", "date", "label", "dateCat", "rawFeatures")

val Array(training, testData) = r2.randomSplit(Array(0.7, 0.3))

// Tokenize the stringified vector and hash the tokens back into a fixed-size feature vector.
val tokenizer = new Tokenizer().setInputCol("rawFeatures").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")

val lr = new LinearRegression().setMaxIter(100).setRegParam(0.001).setElasticNetParam(0.0001)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

val model = pipeline.fit(training)
model.transform(testData).show()
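As an aside, the collect/re-tokenize round-trip above isn't strictly required: the vector produced by VectorAssembler can be passed straight to the regressor through setFeaturesCol. A minimal sketch of that variant (untested, reusing trainingData from above; the "2"-suffixed names are just illustrative):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression

// Index the date, assemble both numeric columns, and fit the regression in one pipeline.
val dateIndexer2 = new StringIndexer().setInputCol("date").setOutputCol("dateCat")
val assembler2 = new VectorAssembler().setInputCols(Array("dateCat", "userid")).setOutputCol("features")
val lr2 = new LinearRegression().setMaxIter(100).setRegParam(0.001).setFeaturesCol("features").setLabelCol("label")
val pipeline2 = new Pipeline().setStages(Array(dateIndexer2, assembler2, lr2))

val Array(train2, test2) = trainingData.toDF().randomSplit(Array(0.7, 0.3))
val model2 = pipeline2.fit(train2)
model2.transform(test2).show()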
Source: stackoverflow.com
Related Queries
- How to provide multiple columns to setInputCol()
- How do you update multiple columns using Slick Lifted Embedding?
- How to groupBy using multiple columns in scala collections
- How to explode an array into multiple columns in Spark
- How to concatenate multiple columns into single column (with no prior knowledge on their number)?
- How to use spark quantilediscretizer on multiple columns
- How to do aggregation on multiple columns at once in Spark
- How to do distinct on multiple columns after join and then sort and select latest for each group?
- How to split Comma-separated multiple columns into multiple rows?
- How to add multiple columns in a spark dataframe using SCALA
- How to sort on multiple columns using takeOrdered?
- How to sort the data on multiple columns in apache spark scala?
- How to combine multiple columns in one case class field when using lifted embedding?
- Spark: How to convert a String to multiple columns
- How to drop multiple columns from JSON body using scala
- How to check whether multiple columns values of a row are not null and then add a true/false resulting column in Spark Scala
- How to explode two array fields to multiple columns in Spark?
- How to create generic method to update multiple columns in Slick?
- How do I explode multiple columns of arrays in a Spark Scala dataframe when the columns contain arrays that line up with one another?
- How to partition a dataframe on multiple columns and write the output to xlsx in Apache Spark
- How to convert List to Row with multiple columns
- How to unpivot the table based on the multiple columns
- How do I send multiple columns to a udf from a When Clause in Spark dataframe?
- How to use group by for multiple columns with count?
- In Slick, how to provide default columns for Table
- How to pass multiple columns in setLabelCol to xgboost in spark mllib?
- How to explode multiple columns to multiple rows and add an additional column, based on exploded ones?
- How to concat multiple columns in spark while getting the column names to be concatenated from another table (different for each row)
- How to extract efficiently multiple columns from a single string column RDD?
- How to split a text file into multiple columns with Spark
More Queries from the same tag
- How can I make Eclipse detect Scala JUnit tests where package doesn't match folder?
- Xml attribute rendering oddness
- How to add to set and increase counter in scala?
- Scala actors exception "react on channel belonging to other actor"
- How to combine multiple PNGs into one big PNG file?
- Read XML in spark 2.2 with java and expected output in key value format
- Library-Scala on IntelliJ
- Companion object for Set in Scala
- Get arguments back from partially applied function in scala
- Idiomatic Scala for applying functions in a chain if Option(s) are defined
- Scala generic sort function doesn't work with arrays
- In Scala, when is it necessary to specify laziness?
- Creating graph from text file functionally using scala
- Add element to Seq[String] in Scala
- Scala Spark how to use --files
- Akka Route TestKit could not unmarshall response as String
- What is Scala's Comparable trait?
- Scala, Cake Pattern, and MacWire
- How to perform "Lookup" operation on Spark dataframes given multiple conditions
- Sbt / javaAgents / force jar-with-dependencies
- Use foldLeft to replace occurrences of character in String
- Regular expressions in scala, find a value corresponding to a key
- Select and remove element on list with many term in Scala
- Generate a Confusion Matrix in Spark Scala implementing RandomForest of MLlib
- Implement a new feature with loose coupling
- Spark Scala read CSV which has a comma in the data
- Find all indices of pattern in string?
- How to get String value from Rep[String] using slick 3 in Play scala?
- Get maximum from struct with select on all fields Spark dataframe
- Get the distinct elements of an ArrayType column in a spark dataframe