scala> val df = Seq(("Eric", "Theodore", "Cartman"), ("Butters", "Leopold", "Stotch")).toDF.select(concat($"_1", lit(" "), $"_2") as "first_and_middle_name", $"_3" as "last_name")
df: org.apache.spark.sql.DataFrame = [first_and_middle_name: string, last_name: string]
scala> df.show
+---------------------+---------+
|first_and_middle_name|last_name|
+---------------------+---------+
| Eric Theodore| Cartman|
| Butters Leopold| Stotch|
+---------------------+---------+
scala> val ccnames = df.columns.map { sc =>
     |   val ccn = sc.split("_")
     |   (ccn.head +: ccn.tail.map(_.capitalize)).mkString
     | }
ccnames: Array[String] = Array(firstAndMiddleName, lastName)
scala> df.toDF(ccnames: _*).show
+------------------+--------+
|firstAndMiddleName|lastName|
+------------------+--------+
| Eric Theodore| Cartman|
| Butters Leopold| Stotch|
+------------------+--------+
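
If the DataFrame is already loaded, the same renaming step can also be wrapped in a small DataFrame => DataFrame helper — a sketch of my own (the name toCamelCaseColumns is not from the original answer):

import org.apache.spark.sql.DataFrame

// Hypothetical helper: renames every snake_case column of an existing DataFrame to camelCase.
def toCamelCaseColumns(df: DataFrame): DataFrame = {
  val ccnames = df.columns.map { sc =>
    val ccn = sc.split("_")
    (ccn.head +: ccn.tail.map(_.capitalize)).mkString
  }
  df.toDF(ccnames: _*)
}

// toCamelCaseColumns(df).show would print the same firstAndMiddleName/lastName table as above.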
EDIT: Would this help? Define a single function that takes a loader: String => DataFrame and a path: String:
scala> val parquetloader = spark.read.parquet _
parquetloader: String => org.apache.spark.sql.DataFrame = <function1>
scala> val tableloader = spark.read.table _
tableloader: String => org.apache.spark.sql.DataFrame = <function1>
scala> val textloader = spark.read.text _
textloader: String => org.apache.spark.sql.DataFrame = <function1>
// csv loader and others
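A csv loader can be defined the same way; spark.read.csv has several overloads, so giving the value an explicit function type keeps things unambiguous (a sketch, not part of the original answer):

// Hypothetical csv loader with an explicit type so the single-path overload is chosen.
val csvloader: String => org.apache.spark.sql.DataFrame = path => spark.read.csv(path)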
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.DataFrame

def snakeCaseToCamelCaseDataFrameColumns(path: String, loader: String => DataFrame): DataFrame = {
  val df = loader(path)
  val ccnames = df.columns.map { sc =>
    val ccn = sc.split("_")
    (ccn.head +: ccn.tail.map(_.capitalize)).mkString
  }
  df.toDF(ccnames: _*)
}
// Exiting paste mode, now interpreting.
snakeCaseToCamelCaseDataFrameColumns: (path: String, loader: String => org.apache.spark.sql.DataFrame)org.apache.spark.sql.DataFrame
val oneDF = snakeCaseToCamelCaseDataFrameColumns("/path/to/table", tableloader)
val twoDF = snakeCaseToCamelCaseDataFrameColumns("/path/to/parquet/file", parquetloader)
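
As a possible refinement (my own suggestion, not from the original answer), the loader argument can be partially applied so each format gets a reusable String => DataFrame:

// Hypothetical convenience value: the loader is fixed once, only the path varies per call.
val camelCaseParquet: String => DataFrame = snakeCaseToCamelCaseDataFrameColumns(_, parquetloader)
val threeDF = camelCaseParquet("/path/to/parquet/file")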