score:4

Accepted answer

After revisiting some of the approaches I had tried earlier, I found the following fix:

In my pom.xml, I excluded the hadoop-client dependency that the spark-core jar pulls in transitively. That dependency was version 2.6.5, which conflicted with the cluster's version of Hadoop. Instead, I import the version I require.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <!-- Drop the hadoop-client that spark-core brings in transitively -->
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Declare the hadoop-client version that matches the cluster -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
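The snippet above relies on three Maven properties: scala.version.major, spark.version, and hadoop.version. A minimal sketch of how they might be declared in the same pom.xml follows. The values are placeholders I am assuming for illustration, not the ones from my project; set hadoop.version to whatever your cluster actually runs.

```xml
<properties>
    <!-- Placeholder values (assumptions); adjust to your environment -->
    <scala.version.major>2.11</scala.version.major>
    <spark.version>2.4.0</spark.version>
    <!-- Should match the Hadoop version installed on the cluster -->
    <hadoop.version>2.7.3</hadoop.version>
</properties>
```

Keeping the version in one property means the explicit hadoop-client dependency can be updated in one place if the cluster is upgraded.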

After making this change, I encountered the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0. Further research revealed this was due to a problem with the Hadoop configuration on my local machine. Per this article's advice, I replaced the winutils.exe I had under c://winutils/bin with the version I required and also added the corresponding hadoop.dll. After making these changes, I was able to read data from Blob Storage as expected.

TL;DR: The issue was the transitively imported hadoop-client dependency. I fixed it by excluding that dependency, importing the version I required, and adding the matching winutils.exe and hadoop.dll under c://winutils/bin.

This fix did not require downgrading the Hadoop version within the HDInsight cluster or changing my downloaded Spark version.

score:2

Problem: I was facing the same issue while running a fat jar built with Hadoop 2.7 and Spark 2.4 on a cluster with Hadoop 3.x. I was using the Maven Shade plugin.

Observation: The fat jar build was including the jar org.apache.hadoop:hadoop-hdfs:jar:2.6.5, which contains the class org.apache.hadoop.hdfs.web.HftpFileSystem. This class was causing the problem on Hadoop 3.

Solution: I excluded this jar while building the fat jar, and the issue was resolved.
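The exact configuration from the original screenshot is not available, so the following is only a sketch of one common way to keep hadoop-hdfs out of a shaded jar, assuming the maven-shade-plugin's artifactSet excludes are used. The plugin version shown is an assumption; use the one your build already declares.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- Version is an assumption; keep whatever your build uses -->
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <artifactSet>
                    <excludes>
                        <!-- Leave hadoop-hdfs out of the fat jar so the
                             cluster's own Hadoop 3.x classes are used -->
                        <exclude>org.apache.hadoop:hadoop-hdfs</exclude>
                    </excludes>
                </artifactSet>
            </configuration>
        </execution>
    </executions>
</plugin>
```

An alternative with a similar effect would be to mark the Hadoop dependencies as provided scope, so that they are never bundled in the first place.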

