score:1

Accepted answer

You can get an RDD[Byte] from an RDD[String] with rdd.flatMap(s => s.getBytes). Beware, though: a character may well take more than one byte, because getBytes with no argument uses the platform default charset. Passing an explicit charset, e.g. s.getBytes("UTF-8"), makes the result predictable.
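To make the byte-count caveat concrete, here is a minimal sketch in plain Scala. It uses an ordinary Seq as a stand-in for the RDD (an assumption made only so the snippet runs without Spark); the flatMap call has the same shape on an RDD[String]. The object name CharsetDemo is made up for illustration.

```scala
import java.nio.charset.StandardCharsets

object CharsetDemo {
  // "\u00e9" is the single character e-acute, written as an escape so the
  // example does not depend on the source file's encoding.
  val accented: String = "\u00e9"

  // The same one-character string encodes to a different number of bytes
  // depending on the charset.
  val utf8Len: Int   = accented.getBytes(StandardCharsets.UTF_8).length
  val latin1Len: Int = accented.getBytes(StandardCharsets.ISO_8859_1).length

  // Stand-in for rdd.flatMap(s => s.getBytes(...)): flattens all strings
  // into one sequence of bytes.
  def toBytes(lines: Seq[String]): Seq[Byte] =
    lines.flatMap(s => s.getBytes(StandardCharsets.UTF_8).toSeq)

  def main(args: Array[String]): Unit = {
    println(s"UTF-8 bytes: $utf8Len, ISO-8859-1 bytes: $latin1Len")
    println(s"flattened size: ${toBytes(Seq("a", accented)).size}")
  }
}
```

If the C code assumes one byte per character, it is probably safest to fix the charset on the Scala side rather than rely on the default.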

Also, once you have an RDD[Byte] you will need to call, for example, mapPartitions to hand your data to your C code as an Array[Byte]. In that case quite large arrays are passed to the C code, but the C app is called only once per partition. Another way is to use rdd.map(s => s.getBytes), which gives you an RDD[Array[Byte]], so the C application runs once per element, i.e. many times per partition.
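The two variants might look roughly like the sketch below. It assumes a local Spark setup, and callNative is a hypothetical placeholder for however you actually reach the C code (JNI, JNA, etc.); here it only returns the array length so the snippet stays self-contained.

```scala
import java.nio.charset.StandardCharsets
import org.apache.spark.{SparkConf, SparkContext}

object BytesToNative {
  // Hypothetical stand-in for the real C entry point; not a Spark API.
  def callNative(bytes: Array[Byte]): Int = bytes.length

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("bytes-to-native").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(Seq("ab", "cde", "f", "gh"), numSlices = 2)

    // Variant 1: RDD[Byte], then one (large) array and one call per partition.
    val perPartition = lines
      .flatMap(s => s.getBytes(StandardCharsets.UTF_8))
      .mapPartitions(iter => Iterator(callNative(iter.toArray)))

    // Variant 2: RDD[Array[Byte]], one (small) array and one call per element.
    val perElement = lines
      .map(s => s.getBytes(StandardCharsets.UTF_8))
      .map(bytes => callNative(bytes))

    println(perPartition.collect().mkString(","))
    println(perElement.collect().mkString(","))
    sc.stop()
  }
}
```

Note that variant 1 materializes the whole partition in memory with iter.toArray, so partition size matters; variant 2 trades that for per-call overhead.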

I think you can also try the pipe() API to launch your C code: it pipes the RDD elements to your C application and gives you the application's output for further processing.
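A rough sketch of the pipe() route, under the assumption that the external program reads lines on stdin and writes lines to stdout. The Unix tr command stands in for your C binary here (so this assumes a Unix-like machine); upperViaPipe is a made-up helper name.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipeDemo {
  // Sends each element to the external process as one line of stdin and
  // returns every line the process prints as an element of the result.
  def upperViaPipe(sc: SparkContext, data: Seq[String]): Array[String] =
    sc.parallelize(data, numSlices = 2)
      .pipe(Seq("tr", "a-z", "A-Z")) // replace with the path to your C executable
      .collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pipe-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    upperViaPipe(sc, Seq("ab", "cde", "f", "gh")).foreach(println)
    sc.stop()
  }
}
```

The process is started once per partition, and data crosses the boundary as text, so this fits best when the C program already works on line-oriented input.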

