
Getting Started with Spark 05 - Exception

by java개발자 2016. 4. 3.

16/04/03 21:12:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/03 21:12:57 WARN : Your hostname, MSDN-SPECIAL resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:5%net12, but we couldn't find any external IP address!
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.map(RDD.scala:323)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:96)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
    at org.test.sparkNmachineLearning3.Ch3_JavaApp.proc2(Ch3_JavaApp.java:106)
    at org.test.sparkNmachineLearning3.Ch3_JavaApp.main(Ch3_JavaApp.java:24)
Caused by: java.io.NotSerializableException: org.test.sparkNmachineLearning3.Ch3_JavaApp
Serialization stack:
    - object not serializable (class: org.test.sparkNmachineLearning3.Ch3_JavaApp, value: org.test.sparkNmachineLearning3.Ch3_JavaApp@2dd0f797)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.test.sparkNmachineLearning3.Ch3_JavaApp, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeSpecial org/test/sparkNmachineLearning3/Ch3_JavaApp.lambda$13:(Ljava/lang/String;)Ljava/lang/String;, instantiatedMethodType=(Ljava/lang/String;)Ljava/lang/String;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class org.test.sparkNmachineLearning3.Ch3_JavaApp$$Lambda$6/1234435772, org.test.sparkNmachineLearning3.Ch3_JavaApp$$Lambda$6/1234435772@57562473)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 13 more


A serialization problem...

Solution:

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Exception-Task-Not-Serializable/td-p/30058


When you pass a function to Spark and, inside that function, you call some other external function or method, an exception like the one above often occurs.
Is this a closure thing?
Or is it like the final requirement for anonymous inner classes? A minimal repro is sketched below.
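Here is a minimal sketch of the pattern that triggers the error. The class and method names are made up for illustration (this is not the actual Ch3_JavaApp source): the lambda passed to map() calls an instance method, so it captures `this`, and Spark then has to serialize the entire enclosing object.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NotSerializableDemo {           // note: does NOT implement Serializable

    private final String prefix = ">> ";     // instance field

    // Instance method referenced from the lambda below.
    private String tag(String s) {
        return prefix + s;
    }

    public void proc() {
        SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b"));

        // The lambda calls this.tag(...), so it captures `this`. Spark tries to
        // serialize the closure, drags in the whole NotSerializableDemo instance,
        // and throws "Task not serializable" right here at map().
        JavaRDD<String> tagged = lines.map(s -> tag(s));

        tagged.collect().forEach(System.out::println);
        sc.stop();
    }

    public static void main(String[] args) {
        new NotSerializableDemo().proc();
    }
}
```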


>>> The book [Learning Spark] explicitly warns about this!!! Inside a lambda you should only use classes and functions that are serializable. Why??? Presumably because Spark serializes the lambda (the closure) itself and ships it to the executors, so everything the lambda captures must be serializable too... that's my guess!
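Two common fixes, sketched with the same hypothetical names as above (these are assumptions for illustration, not the book's or the linked thread's exact code): make the enclosing class Serializable, or, better, stop capturing `this` by copying the needed field into a local variable first.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Fix 1: implement Serializable so the captured `this` can be shipped to executors.
public class SerializableFixDemo implements java.io.Serializable {

    private final String prefix = ">> ";

    public void proc() {
        SparkConf conf = new SparkConf().setAppName("demo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b"));

        // Fix 2 (often better): copy the field into a local variable so the
        // lambda captures only a small serializable String, not `this`.
        final String p = prefix;
        JavaRDD<String> tagged = lines.map(s -> p + s);

        tagged.collect().forEach(System.out::println);
        sc.stop();
    }

    public static void main(String[] args) {
        new SerializableFixDemo().proc();
    }
}
```

Copying the field into a local variable is usually the nicer fix, because only the small value the lambda actually needs gets serialized instead of the whole enclosing object.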