@Evolving
public interface StreamingWrite
The writing procedure is:
1. Create a writer factory by createStreamingWriterFactory(PhysicalWriteInfo), serialize it, and send it to all the partitions of the input data (RDD).
2. For each epoch in each partition, create the data writer, and write the data of the epoch in the partition with this writer. If all the data is written successfully, call DataWriter.commit(). If an exception occurs during the writing, call DataWriter.abort().
3. If the writers in all partitions of one epoch are successfully committed, call commit(long, WriterCommitMessage[]). If some writers are aborted, or the job fails for an unknown reason, call abort(long, WriterCommitMessage[]).
While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should
do it manually in their Spark applications if they want to retry.
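The procedure can be sketched in plain Java as follows. This is a minimal illustration, not Spark code: `CommitMessage`, `EpochWriter`, `EpochWriterFactory`, `EpochWrite`, `LoggingWrite`, and `EpochDriverSketch` are hypothetical stand-ins that only mirror the shape of the interfaces described here, and the sketch assumes the partitions are processed sequentially in one JVM, with no task retries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins mirroring the shape of the Spark interfaces.
// They are NOT the real Spark types; they keep the sketch self-contained.
interface CommitMessage {}

interface EpochWriter {
    void write(String record);
    CommitMessage commit();
    void abort();
}

interface EpochWriterFactory {
    EpochWriter createWriter(int partitionId, long epochId);
}

interface EpochWrite {
    EpochWriterFactory createStreamingWriterFactory();
    void commit(long epochId, CommitMessage[] messages);
    void abort(long epochId, CommitMessage[] messages);
}

// Illustrative implementation that only records which callbacks were invoked.
final class LoggingWrite implements EpochWrite {
    final List<String> log = new ArrayList<>();

    public EpochWriterFactory createStreamingWriterFactory() {
        return (partitionId, epochId) -> new EpochWriter() {
            public void write(String record) {
                // Assumption for the demo: the record "bad" cannot be written.
                if (record.equals("bad")) {
                    throw new IllegalStateException("cannot write " + record);
                }
            }
            public CommitMessage commit() {
                log.add("writer-commit p" + partitionId);
                return new CommitMessage() {};
            }
            public void abort() {
                log.add("writer-abort p" + partitionId);
            }
        };
    }

    public void commit(long epochId, CommitMessage[] messages) {
        log.add("job-commit e" + epochId);
    }

    public void abort(long epochId, CommitMessage[] messages) {
        log.add("job-abort e" + epochId);
    }
}

final class EpochDriverSketch {
    /** Runs one epoch over all partitions; returns true if the epoch was committed. */
    static boolean runEpoch(EpochWrite write, long epochId, List<List<String>> partitions) {
        // Step 1: create the factory once (Spark would serialize and ship it).
        EpochWriterFactory factory = write.createStreamingWriterFactory();
        CommitMessage[] messages = new CommitMessage[partitions.size()];
        boolean allCommitted = true;

        // Step 2: one writer per partition for this epoch.
        for (int p = 0; p < partitions.size(); p++) {
            EpochWriter writer = factory.createWriter(p, epochId);
            try {
                for (String record : partitions.get(p)) {
                    writer.write(record);
                }
                messages[p] = writer.commit();
            } catch (RuntimeException e) {
                writer.abort();
                allCommitted = false; // the slot for this partition stays null
            }
        }

        // Step 3: commit the epoch only if every writer committed; abort otherwise.
        if (!allCommitted) {
            write.abort(epochId, messages);
            return false;
        }
        try {
            write.commit(epochId, messages);
            return true;
        } catch (RuntimeException e) {
            write.abort(epochId, messages); // a failed job commit also leads to abort
            return false;
        }
    }
}
```

A caller that wants a failed job to be retried would have to invoke something like `runEpoch` again itself, which matches the note above that failed writing jobs are not retried automatically.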
Please refer to the documentation of commit/abort methods for detailed specifications.

Modifier and Type | Method and Description |
---|---|
void | abort(long epochId, WriterCommitMessage[] messages): Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job fails for some unknown reason, or commit(long, WriterCommitMessage[]) fails. |
void | commit(long epochId, WriterCommitMessage[] messages): Commits this writing job for the specified epoch with a list of commit messages. |
StreamingDataWriterFactory | createStreamingWriterFactory(PhysicalWriteInfo info): Creates a writer factory which will be serialized and sent to executors. |
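Because the writer factory is serialized and sent to executors, any state it carries has to survive that trip. The sketch below imitates the hop with standard Java serialization. `PathWriterFactory`, `describeWriter`, and `FactoryShippingSketch` are hypothetical names used only for illustration, and the use of `java.io` object streams here is an assumption about the transport, not a statement of how Spark ships the factory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a writer factory; not the real Spark type.
final class PathWriterFactory implements Serializable {
    private static final long serialVersionUID = 1L;

    // Keep only plain, serializable configuration in the factory.
    private final String outputPath;

    PathWriterFactory(String outputPath) {
        this.outputPath = outputPath;
    }

    /** Describes where a writer for the given partition and epoch would write. */
    String describeWriter(int partitionId, long epochId) {
        return outputPath + "/epoch=" + epochId + "/part-" + partitionId;
    }
}

final class FactoryShippingSketch {
    /** Serializes and deserializes the factory, imitating the driver-to-executor hop. */
    static PathWriterFactory roundTrip(PathWriterFactory factory) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(factory);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PathWriterFactory) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("factory is not serializable", e);
        }
    }
}
```

A factory holding a non-serializable field, such as an open connection, would fail in `roundTrip`; resources of that kind are better opened by the data writer once it is created on the executor side.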
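StreamingDataWriterFactory createStreamingWriterFactory(PhysicalWriteInfo info)
Creates a writer factory which will be serialized and sent to executors.
info - Information about the RDD that will be written to this data writer

void commit(long epochId, WriterCommitMessage[] messages)
Commits this writing job for the specified epoch with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit().
If this method fails (by throwing an exception), this writing job is considered to have failed, and the execution engine will attempt to call abort(long, WriterCommitMessage[]).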
The execution engine may call `commit` multiple times for the same epoch in some circumstances.
To support exactly-once data semantics, implementations must ensure that multiple commits for
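the same epoch are idempotent.

void abort(long epochId, WriterCommitMessage[] messages)
Aborts this writing job because some data writers have failed and keep failing when retried, the Spark job fails for some unknown reason, or commit(long, WriterCommitMessage[]) fails.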
If this method fails (by throwing an exception), the underlying data source may require manual
cleanup.
Unless the abort is triggered by the failure of commit, the given messages will have some
null slots, as there may be only a few data writers that were committed before the abort
happens, or some data writers were committed but their commit messages haven't reached the
driver when the abort is triggered. So this is just a "best effort" for data sources to
clean up the data left by data writers.
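The two contracts above, an idempotent commit and a best-effort abort that tolerates null slots, can be sketched as follows. This is one possible approach under stated assumptions, not the required implementation: `StagedFile` is a hypothetical stand-in for a commit message, `EpochSinkSketch` is a hypothetical in-memory sink, and the set of committed epochs is kept in memory only for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a commit message: the path of a staged output file.
final class StagedFile {
    final String path;

    StagedFile(String path) {
        this.path = path;
    }
}

// Hypothetical in-memory sink; names are illustrative and not part of the Spark API.
final class EpochSinkSketch {
    private final Set<Long> committedEpochs = new HashSet<>();
    final List<String> published = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    /** Idempotent commit: a repeated commit for an already committed epoch is a no-op. */
    void commit(long epochId, StagedFile[] messages) {
        if (!committedEpochs.add(epochId)) {
            return; // epoch already committed; publishing again would duplicate data
        }
        for (StagedFile message : messages) {
            published.add(message.path);
        }
    }

    /** Best-effort abort: cleans up only what the received messages describe. */
    void abort(long epochId, StagedFile[] messages) {
        for (StagedFile message : messages) {
            // A null slot means the writer never committed, or its message
            // had not reached the driver when the abort was triggered.
            if (message != null) {
                deleted.add(message.path);
            }
        }
    }
}
```

In a real sink, the record of committed epochs would presumably need to live in the data source itself, written atomically with the data, so that the check still holds after a restart. Output left behind by writers whose slots were null is not visible to `abort`, which is why a separate cleanup pass may still be needed.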