Commit f53154a

Addressed Josh's comments.

tdas committed Dec 11, 2014
1 parent ce299e4 commit f53154a
Showing 4 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/_layouts/global.html
@@ -33,7 +33,7 @@
 <!-- Google analytics script -->
 <script type="text/javascript">
 var _gaq = _gaq || [];
-_gaq.push(['_setAccount', 'UA-32518208-1']);
+_gaq.push(['_setAccount', 'UA-32518208-2']);
 _gaq.push(['_trackPageview']);

 (function() {
6 changes: 3 additions & 3 deletions docs/streaming-custom-receivers.md
@@ -212,7 +212,7 @@ there are two kinds of receivers based on their reliability and fault-tolerance

 To implement a *reliable receiver*, you have to use `store(multiple-records)` to store data.
 This flavour of `store` is a blocking call which returns only after all the given records have
-been stored inside Spark. If replication is enabled receiver's configured storage level
+been stored inside Spark. If the receiver's configured storage level uses replication
 (enabled by default), then this call returns after replication has completed.
 Thus it ensures that the data is reliably stored, and the receiver can now acknowledge the
 source appropriately. This ensures that no data is lost when the receiver fails in the middle
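For context, here is a minimal sketch of a reliable receiver built around the blocking `store(multiple-records)` call discussed in this hunk. The `ReliableSketchReceiver` class and the `fetchBatch`/`ackBatch` helpers are hypothetical illustrations, not part of this commit:

```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch of a reliable receiver: records are buffered, stored with the
// blocking store(multiple-records) call, and only then acknowledged.
class ReliableSketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("Reliable Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = { }  // the receiving thread exits once isStopped() is true

  private def receive(): Unit = {
    while (!isStopped()) {
      val batch: ArrayBuffer[String] = fetchBatch()
      store(batch)     // blocks until the records are stored (and replicated) inside Spark
      ackBatch(batch)  // safe to acknowledge the source only after store() has returned
    }
  }

  // Hypothetical stand-ins for a real source protocol.
  private def fetchBatch(): ArrayBuffer[String] = ArrayBuffer("record")
  private def ackBatch(batch: ArrayBuffer[String]): Unit = ()
}
```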
@@ -226,7 +226,7 @@ not get the reliability guarantees of `store(multiple-records)`, it has the foll
 - The system takes care of chunking that data into appropriate sized blocks (look for block
   interval in the [Spark Streaming Programming Guide](streaming-programming-guide.html)).
 - The system takes care of controlling the receiving rates if the rate limits have been specified.
-- Because of these two, *unreliable receivers are simpler to implement than reliable receivers.
+- Because of these two, unreliable receivers are simpler to implement than reliable receivers.

 The following table summarizes the characteristics of both types of receivers
@@ -240,7 +240,7 @@ The following table summarizes the characteristics of both types of receivers
 <td>
 Simple to implement.<br>
 System takes care of block generation and rate control.
-No fault-tolerance guarantees, can loose data on receiver failure.
+No fault-tolerance guarantees, can lose data on receiver failure.
 </td>
 </tr>
 <tr>
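For contrast with the reliable-receiver sketch above, here is a hedged sketch of an unreliable receiver: it pushes records one at a time with the single-record `store` and leaves block generation and rate control to Spark. `UnreliableSketchReceiver` and `nextRecord` are hypothetical names:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch of an unreliable receiver: store(single-record) returns
// immediately, so records not yet turned into blocks can be lost
// if the receiver fails.
class UnreliableSketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("Unreliable Receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(nextRecord())  // Spark chunks records into blocks and applies any rate limits
        }
      }
    }.start()
  }

  def onStop(): Unit = { }

  private def nextRecord(): String = "record"  // hypothetical source read
}
```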
2 changes: 1 addition & 1 deletion docs/streaming-flume-integration.md
@@ -72,7 +72,7 @@ Instead of Flume pushing data directly to Spark Streaming, this approach runs a
 and transactions to pull data from the sink. Transactions succeed only after data is received and
 replicated by Spark Streaming.

-This ensures that stronger reliability and
+This ensures stronger reliability and
 [fault-tolerance guarantees](streaming-programming-guide.html#fault-tolerance-semantics)
 than the previous approach. However, this requires configuring Flume to run a custom sink.
 Here are the configuration steps.
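As a rough sketch of the Spark side of this pull-based approach (the app name, hostname, and port are placeholders, and the custom Flume sink must already be running as the configuration steps describe):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumePollingSketch")  // placeholder app name
val ssc = new StreamingContext(conf, Seconds(2))

// The receiver polls the custom Flume sink; each transaction commits only
// after the events have been received and replicated by Spark Streaming.
val flumeStream = FlumeUtils.createPollingStream(ssc, "flume-sink-host", 9988)
flumeStream.count().print()  // e.g. report how many events arrived per batch

ssc.start()
ssc.awaitTermination()
```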
8 changes: 4 additions & 4 deletions docs/streaming-programming-guide.md
@@ -1568,15 +1568,15 @@ To run a Spark Streaming application, you need to have the following.


 - *[Experimental in Spark 1.2] Configuring write ahead logs* - In Spark 1.2,
-we have introduced a new experimental feature of write ahead logs for achieved strong
+we have introduced a new experimental feature of write ahead logs for achieving strong
 fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
 a write ahead log in the configured checkpoint directory. This prevents data loss on driver
 recovery, thus ensuring zero data loss (discussed in detail in the
 [Fault-tolerance Semantics](#fault-tolerance-semantics) section). This can be enabled by setting
 the [configuration parameter](configuration.html#spark-streaming)
-`spark.streaming.receiver.writeAheadLogs.enable` to `true`. However, this stronger semantics may
-come at the cost of the receiving throughput of individual receivers. can be corrected by running
-[more receivers in parallel](#level-of-parallelism-in-data-receiving)
+`spark.streaming.receiver.writeAheadLogs.enable` to `true`. However, these stronger semantics may
+come at the cost of the receiving throughput of individual receivers. This can be corrected by
+running [more receivers in parallel](#level-of-parallelism-in-data-receiving)
 to increase aggregate throughput. Additionally, it is recommended that the replication of the
 received data within Spark be disabled when the write ahead log is enabled as the log is already
 stored in a replicated storage system. This can be done by setting the storage level for the
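A minimal sketch of the setup this paragraph describes, assuming the configuration key exactly as quoted in the doc text above; the app name, checkpoint path, host, and port are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("WriteAheadLogSketch")  // placeholder app name
  .set("spark.streaming.receiver.writeAheadLogs.enable", "true")  // key as given in the doc text

val ssc = new StreamingContext(conf, Seconds(1))
ssc.checkpoint("hdfs:///tmp/checkpoints")  // the write ahead log lives in the checkpoint directory

// With the log enabled, in-Spark replication of received data can be dropped
// by choosing a non-replicated storage level for the receiver:
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
```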
