score:15

Accepted answer

Within the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73

java.net.Socket is a client socket, which means it expects a server to already be running at the address and port you specify. Unless you have some service listening on port 7777 of your local machine, the connection refused error you are seeing is expected.
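As a rough illustration of that client behaviour outside of Spark, the sketch below simply tries to open a java.net.Socket to a host and port and reports whether anything accepted the connection. The object name PortCheck, the helper canConnect and the 2 second timeout are my own choices for the example, not anything taken from Spark.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Hypothetical helper: returns true if a TCP server accepts a connection
  // at host:port within timeoutMs, false on refusal, timeout or lookup failure.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    finally socket.close()
  }

  def main(args: Array[String]): Unit =
    println(s"localhost:7777 reachable: ${canConnect("localhost", 7777)}")
}
```

If this reports false for localhost and 7777, socketTextStream would be expected to fail in the same way, because it makes the same kind of client connection.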

To see what I mean, try the following (you may not need to set master or appName in your environment).

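import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MyStream {
  def main(args: Array[String]): Unit = {
    // local[2]: the socket receiver permanently occupies one thread,
    // so at least one more is needed to process the received batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("socketstream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Connects as a client to an existing server; it does not open a listening port.
    val lines = ssc.socketTextStream("bbc.co.uk", 80)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}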

This doesn't return any content, because the app doesn't speak HTTP to the BBC website, but it does not get a connection refused exception either, since a server is listening on that port.

To run a local server on Linux, I would use netcat with a simple command such as

cat data.txt | ncat -l -p 7777

I'm not sure what the best approach is on Windows. You could write another application which listens as a server on that port and sends some data.
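A minimal sketch of such an application, using only the JDK so that it should run the same way on Windows and Linux, might look like this. The object name LineServer, the helper serveOnce and the sample lines are made up for the example. It sends the lines to each client that connects and then closes that connection.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets

object LineServer {
  // Hypothetical helper: waits for one client on `server`, writes each line
  // to it as UTF-8 text, then closes that client's connection.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept()
    try {
      val writer = new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8)
      val out = new PrintWriter(writer, true) // true = flush on every println
      lines.foreach(line => out.println(line))
    } finally client.close()
  }

  def main(args: Array[String]): Unit = {
    // Port 7777 matches the port used in the question.
    val server = new ServerSocket(7777)
    try {
      while (true) serveOnce(server, Seq("hello", "spark", "streaming"))
    } finally server.close()
  }
}
```

Start this before the streaming job, then point socketTextStream at localhost and 7777. Because the connection is closed after the lines are sent, the receiver will see the stream end and try to reconnect, so treat this as a quick test harness rather than a steady data source.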

score:1

Make sure to start netcat (or whatever else is serving the port) before you run the program, e.g. nc -lk 8080

