Returns a function that can be used to read a single file in as an Iterator of InternalRow.
The global data schema. It can be either specified by the user, or reconciled/merged from all underlying data files. If any partition columns are contained in the files, they are preserved in this schema.
The schema of the partition column row that will be present in each PartitionedFile. These columns should be appended to the rows that are produced by the iterator.
The schema of the data that should be output for each row. This may be a subset of the columns that are present in the file if column pruning has occurred.
A set of filters that can optionally be used to reduce the number of rows output.
A set of string -> string configuration options.
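A minimal sketch of a `buildReader` override, assuming the standard Spark `FileFormat` signature; the per-file reader body is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.PartitionedFile
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Sketch only: shows how the parameters described above fit together.
override def buildReader(
    sparkSession: SparkSession,
    dataSchema: StructType,       // the global, possibly merged, data schema
    partitionSchema: StructType,  // partition columns carried by each PartitionedFile
    requiredSchema: StructType,   // pruned schema that each output row must match
    filters: Seq[Filter],         // optional filters for reducing the rows output
    options: Map[String, String], // string -> string configuration options
    hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
  // The returned closure runs on executors, once per file;
  // capture only serializable state here.
  (file: PartitionedFile) => {
    // Hypothetical per-file reader: a real format would open file.filePath
    // and emit rows matching requiredSchema.
    Iterator.empty[InternalRow]
  }
}
```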
Exactly the same as buildReader except that the reader function returned by this method appends partition values to the InternalRows produced by the reader function buildReader returns.
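A simplified sketch of the appending step, assuming catalyst's `JoinedRow` and `UnsafeProjection` (the helper name is hypothetical):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{JoinedRow, UnsafeProjection}
import org.apache.spark.sql.types.StructType

// Sketch: append a file's partition values to every row the base reader produces.
def appendPartitionValues(
    rows: Iterator[InternalRow],
    partitionValues: InternalRow,
    requiredSchema: StructType,
    partitionSchema: StructType): Iterator[InternalRow] = {
  // Output attributes: data columns first, partition columns appended after.
  val fullSchema = requiredSchema.toAttributes ++ partitionSchema.toAttributes
  val joinedRow = new JoinedRow()
  val appendPartitionColumns = UnsafeProjection.create(fullSchema, fullSchema)
  rows.map { dataRow =>
    appendPartitionColumns(joinedRow(dataRow, partitionValues))
  }
}
```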
When possible, this method should return the schema of the given files. When the format does not support inference, or when no valid files are given, it should return None. In these cases Spark will require that the user specify the schema manually.
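A sketch of an `inferSchema` override following that contract; the inferred schema here is hypothetical:

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Sketch: infer a schema when possible, otherwise return None so that
// Spark requires the user to supply the schema manually.
override def inferSchema(
    sparkSession: SparkSession,
    options: Map[String, String],
    files: Seq[FileStatus]): Option[StructType] = {
  if (files.isEmpty) {
    None  // no valid files: schema cannot be inferred
  } else {
    // Hypothetical result: a real format would sample the files here.
    Some(StructType(Seq(StructField("value", StringType))))
  }
}
```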
Returns whether a file with the given path can be split or not.
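A sketch of one common `isSplitable` policy, modeled on Spark's text-based formats: a file is splittable unless it is wrapped in a non-splittable compression codec such as gzip.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Sketch: splittable unless compressed with a non-splittable codec.
override def isSplitable(
    sparkSession: SparkSession,
    options: Map[String, String],
    path: Path): Boolean = {
  val codec = new CompressionCodecFactory(
    sparkSession.sessionState.newHadoopConf()).getCodec(path)
  // No codec (plain file) or a splittable codec (e.g. bzip2) => splittable.
  codec == null || codec.isInstanceOf[SplittableCompressionCodec]
}
```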
Prepares a write job and returns an OutputWriterFactory. Client-side job preparation can be put here. For example, a user-defined output committer can be configured here by setting the output committer class in the conf of spark.sql.sources.outputCommitterClass.
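A sketch of a `prepareWrite` override that configures a custom output committer; the committer class and the factory returned at the end are hypothetical:

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.OutputWriterFactory
import org.apache.spark.sql.types.StructType

// Sketch: client-side job preparation before returning the factory.
override def prepareWrite(
    sparkSession: SparkSession,
    job: Job,
    options: Map[String, String],
    dataSchema: StructType): OutputWriterFactory = {
  // Configure a user-defined output committer (class name is hypothetical).
  job.getConfiguration.set(
    "spark.sql.sources.outputCommitterClass",
    "com.example.MyOutputCommitter")
  // A real format would return a factory that creates its OutputWriters.
  new MyOutputWriterFactory(dataSchema)  // hypothetical factory
}
```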
The string that represents the format that this data source provider uses. This is overridden by children to provide a nice alias for the data source. For example:
override def shortName(): String = "parquet"
Since: 1.5.0
Returns whether this format supports returning columnar batches or not.
TODO: we should just have different traits for the different formats.
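A sketch of a `supportBatch` override, modeled on the Parquet format's policy: columnar batches are returned only when whole-stage codegen is enabled and every column has an atomic type.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{AtomicType, StructType}

// Sketch: support columnar batches only for codegen-friendly schemas.
override def supportBatch(
    sparkSession: SparkSession,
    schema: StructType): Boolean = {
  val conf = sparkSession.sessionState.conf
  conf.wholeStageEnabled &&
    schema.length <= conf.wholeStageMaxNumFields &&
    schema.forall(_.dataType.isInstanceOf[AtomicType])
}
```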
Provides access to CSV data from pure SQL statements.
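For example, CSV data can be registered and queried from pure SQL; the file path and options below are hypothetical, and `spark` is assumed to be an active SparkSession:

```scala
// Sketch: expose a CSV file as a view, then query it with plain SQL.
spark.sql(
  """CREATE TEMPORARY VIEW people
    |USING csv
    |OPTIONS (path "examples/src/main/resources/people.csv", header "true")
  """.stripMargin)

spark.sql("SELECT * FROM people").show()
```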