HadoopFsRelation

Instance Constructors

new HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession)

location
A FileIndex that can enumerate the locations of all the files that comprise this relation.
partitionSchema
The schema of the columns (if any) that are used to partition the relation.
dataSchema
The schema of any remaining columns. Note that if any partition columns are present in the actual data files as well, they are preserved.
bucketSpec
Describes the bucketing (hash-partitioning of the files by some column values).
fileFormat
A file format that can be used to read and write the data in files.
options
Configuration used when reading / writing data.
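
The parameters above are easiest to see on a live relation. The following is a minimal sketch, assuming Spark SQL is on the classpath and that a Parquet read is planned through a LogicalRelation wrapping a HadoopFsRelation. The RelationInspector object, the temporary path, and the column names are hypothetical and exist only for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

// Hypothetical helper, not part of Spark.
object RelationInspector {

  // Walk the analyzed plan and return the first file-based relation, if any.
  def find(df: DataFrame): Option[HadoopFsRelation] =
    df.queryExecution.analyzed
      .collect { case l: LogicalRelation => l.relation }
      .collectFirst { case r: HadoopFsRelation => r }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("relation-inspector")
      .getOrCreate()
    import spark.implicits._

    // Write a small table partitioned by "year" so that both schemas are non-empty.
    val path = Files.createTempDirectory("hfr-demo").resolve("events").toString
    Seq((1, "a", 2024), (2, "b", 2025)).toDF("id", "name", "year")
      .write.partitionBy("year").parquet(path)

    find(spark.read.parquet(path)).foreach { r =>
      println(s"partition columns: ${r.partitionSchema.fieldNames.mkString(", ")}")
      println(s"data columns:      ${r.dataSchema.fieldNames.mkString(", ")}")
      println(s"file format:       ${r.fileFormat}")
      println(s"bucketing:         ${r.bucketSpec}")
      println(s"root paths:        ${r.location.rootPaths.mkString(", ")}")
    }

    spark.stop()
  }
}
```

Reading the relation back from a plan avoids calling the constructor directly, which would require building a FileIndex by hand.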

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
val bucketSpec: Option[BucketSpec]

Describes the bucketing (hash-partitioning of the files by some column values).
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val dataSchema: StructType

The schema of any remaining columns. Note that if any partition columns are present in the actual data files as well, they are preserved.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
val fileFormat: FileFormat

A file format that can be used to read and write the data in files.
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def inputFiles: Array[String]

Returns the list of files that will be read when scanning this relation.

Definition Classes
HadoopFsRelation → FileRelation
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
val location: FileIndex

A FileIndex that can enumerate the locations of all the files that comprise this relation.
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def needConversion: Boolean

Whether the objects in Row need to be converted to the internal representation, for example java.lang.String to UTF8String and java.math.BigDecimal to Decimal.
If needConversion is false, buildScan() should return an RDD of InternalRow.

Definition Classes
BaseRelation
Since
1.4.0
Note
The internal representation is not stable across releases and thus data sources outside of Spark SQL should leave this as true.
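
To make the two conversions concrete, here is a small sketch, assuming Spark SQL is on the classpath. It builds the internal values by hand with UTF8String.fromString and Decimal.apply. The InternalValues object is hypothetical, and Spark performs this conversion itself whenever needConversion is true.

```scala
import org.apache.spark.sql.types.Decimal
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper that shows the external and internal forms side by side.
object InternalValues {

  // External java.lang.String -> internal UTF8String.
  def toInternalString(s: String): UTF8String = UTF8String.fromString(s)

  // External java.math.BigDecimal -> internal Decimal.
  def toInternalDecimal(d: java.math.BigDecimal): Decimal = Decimal(d)

  def main(args: Array[String]): Unit = {
    val name = toInternalString("spark")
    val price = toInternalDecimal(new java.math.BigDecimal("12.50"))
    println(s"${name.getClass.getSimpleName}: $name")
    println(s"${price.getClass.getSimpleName}: $price")
  }
}
```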
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val options: Map[String, String]

Configuration used when reading / writing data.
val partitionSchema: StructType

The schema of the columns (if any) that are used to partition the relation.
def partitionSchemaOption: Option[StructType]
val schema: StructType

Definition Classes
HadoopFsRelation → BaseRelation
def sizeInBytes: Long

Returns an estimated size of this relation in bytes. This information is used by the planner to decide when it is safe to broadcast a relation and can be overridden by sources that know the size ahead of time. By default, the system will assume that tables are too large to broadcast. This method will be called multiple times during query planning and thus should not perform expensive operations for each invocation.

Definition Classes
HadoopFsRelation → BaseRelation
Since
1.3.0
Note
It is always better to overestimate the size than to underestimate it, because underestimation could lead to suboptimal execution plans (e.g. broadcasting a very large table).
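
The asymmetry between overestimating and underestimating can be sketched with a simplified stand-in for the planner's check. This is an illustration, not Spark's actual planner code. The BroadcastSizing object and canBroadcast function are hypothetical, and the threshold is assumed to match the 10 MB default of spark.sql.autoBroadcastJoinThreshold.

```scala
// Hypothetical model of the broadcast decision; not Spark's planner code.
object BroadcastSizing {

  // Assumption: the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
  val DefaultThresholdBytes: Long = 10L * 1024 * 1024

  // A relation is a broadcast candidate only if its estimate fits under the threshold.
  def canBroadcast(estimatedSizeInBytes: Long,
                   thresholdBytes: Long = DefaultThresholdBytes): Boolean =
    estimatedSizeInBytes >= 0 && estimatedSizeInBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    val trueSize = 2L * 1024 * 1024 * 1024 // the table really holds 2 GB

    // Underestimate: the 2 GB table would be chosen for broadcast.
    println(s"underestimate (1 MB) broadcasts: ${canBroadcast(1L * 1024 * 1024)}")

    // Overestimate: the planner falls back to a non-broadcast join, which is slower but safe.
    println(s"overestimate (4 GB) broadcasts:  ${canBroadcast(2 * trueSize)}")
  }
}
```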
val sparkSession: SparkSession
def sqlContext: SQLContext

Definition Classes
HadoopFsRelation → BaseRelation
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
HadoopFsRelation → AnyRef → Any
def unhandledFilters(filters: Array[Filter]): Array[Filter]

Returns the list of Filters that this datasource may not be able to handle. These returned Filters will be evaluated by Spark SQL after data is output by a scan. By default, this function will return all filters, as it is always safe to double evaluate a Filter. However, specific implementations can override this function to avoid double filtering when they are capable of processing a filter internally.

Definition Classes
BaseRelation
Since
1.6.0
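
As an illustration of such an override, the sketch below defines a hypothetical BaseRelation subclass that claims to evaluate only EqualTo filters internally and hands every other filter back to Spark SQL. It assumes Spark SQL is on the classpath. EqualityOnlyRelation is not part of Spark, and HadoopFsRelation itself inherits the default behaviour.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, GreaterThan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation: assume the underlying source can apply equality predicates itself.
class EqualityOnlyRelation(val sqlContext: SQLContext, val schema: StructType)
    extends BaseRelation {

  // Return only the filters this source cannot handle; Spark SQL re-evaluates those.
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object UnhandledFiltersDemo {
  def main(args: Array[String]): Unit = {
    // A null SQLContext is enough here because only unhandledFilters is called.
    val relation = new EqualityOnlyRelation(null, new StructType())
    val pushed: Array[Filter] = Array(EqualTo("country", "NL"), GreaterThan("age", 30))
    println(relation.unhandledFilters(pushed).mkString(", "))
  }
}
```

Returning a filter that the source does in fact apply is harmless, since it is only evaluated twice. Omitting a filter that the source does not apply would yield wrong results, so the conservative choice is to return it.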
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

case class HadoopFsRelation(location: FileIndex, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String])(sparkSession: SparkSession) extends BaseRelation with FileRelation with Product with Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from FileRelation

Inherited from BaseRelation

Inherited from AnyRef

Inherited from Any