score:5
With Spark SQL, each line of the input must contain a separate, self-contained, valid JSON object; otherwise the computation fails.
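For illustration, a file that the default reader accepts has one complete JSON object per line, e.g.:
{"element": "value", "id": "value", "total": []}
{"element": "value", "id": "value", "total": []}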
However, you can try this:
spark.read.json(spark.sparkContext.wholeTextFiles("path to json").values)
or
spark.read.option("wholeFile", true).option("mode", "PERMISSIVE").json("path to json")
This should convert the JSON into a DataFrame.
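Put together, a minimal self-contained sketch (assuming Spark 2.x in local mode; the path is a placeholder):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReadMultiLineJson")
  .master("local[*]")
  .getOrCreate()

// read a file whose JSON objects span multiple lines
val df = spark.read
  .option("multiLine", true)
  .option("mode", "PERMISSIVE")
  .json("path to json")

df.printSchema()
df.show()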
score:0
Adding to this, since it took me some time to understand: when querying for something nested inside total,
you may have to use the explode function:
Dataset<Row> socials = sparkSession
    .read()
    .option("multiLine", true)
    .option("mode", "PERMISSIVE")
    .json("<path to file>")
    .cache();

// explode produces one row per element of the total array
socials.select(org.apache.spark.sql.functions.explode(socials.col("total")).as("t"))
    .where("t.<some nested column under total> = 'foo'")
    .toJSON()
    .collectAsList();
This is the Java Spark API, but I hope the explode function is of some help.
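In the Scala API, a roughly equivalent sketch would be (someNestedField is a made-up placeholder for a column nested under total):
import org.apache.spark.sql.functions.explode

val socials = spark.read
  .option("multiLine", true)
  .option("mode", "PERMISSIVE")
  .json("<path to file>")
  .cache()

// one row per element of the total array
socials.select(explode(socials("total")).as("t"))
  .where("t.someNestedField = 'foo'")
  .toJSON
  .collectAsList()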
score:1
Given an input file with JSON data such as
{
"element" : value,
"id" : value,
"total" : []
}
{
"element" : value,
"id" : value,
"total: []
}
which is not valid JSON as a whole and cannot be converted to a DataFrame directly, so you have to convert the data into a valid, Spark-readable JSON format (one object per line).
// read each file as a single (filename, content) pair
val rdd = sc.wholeTextFiles("path to the json file")
// strip whitespace, quote the bare values, and split the concatenated
// objects so that each JSON object ends up on its own line
val validJsonRdd = rdd.flatMap(_._2.replace(" ", "").replace("\n", "").replace(":value", ":\"value\"").replace("}{", "}\n{").split("\n"))
The step above works only if the element and id fields contain the unquoted string value; otherwise, modify the replacements to match your data.
The next step is to convert it into a DataFrame using the sqlContext:
val df = sqlContext.read.json(validJsonRdd)
which should result in
+-------+-----+-----+
|element|id |total|
+-------+-----+-----+
|value |value|[] |
|value |value|[] |
+-------+-----+-----+
Now you should be able to select the id and the respective total values and work with them.
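For instance, a minimal sketch (filtering on "value" only because that is the placeholder in the sample data):
df.select("id", "total").show()
df.filter(df("id") === "value").select("total").show()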
I hope the answer is helpful.