# USGS Water Services - Instantaneous Values Service bridge to Apache Kafka, Azure Event Hubs, and Fabric Event Streams

This container image provides a bridge between the [USGS Water Services](https://waterservices.usgs.gov/) Instantaneous Values
Service and Apache Kafka, Azure Event Hubs, and Fabric Event Streams. The bridge
fetches entries from specified feeds and forwards them to the configured Kafka
endpoints.

## Functionality

The bridge retrieves data from the USGS Instantaneous Values Service and writes the entries to a
Kafka topic as [CloudEvents](https://cloudevents.io/) in a JSON format, which is
documented in [EVENTS.md](EVENTS.md). You can specify multiple feed URLs by
providing them in the configuration.

## Database Schemas and Handling

If you want to build a full data pipeline that ingests all events into a
database, the integration with Fabric Eventhouse and Azure Data Explorer is
described in [DATABASE.md](../DATABASE.md).

## Installing the Container Image

Pull the container image from the GitHub Container Registry:

```shell
$ docker pull ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

To use it as a base image in a Dockerfile:

```dockerfile
FROM ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

## Using the Container Image

The container defines a command that starts the bridge, reading data from the
USGS services and writing it to Kafka, Azure Event Hubs, or
Fabric Event Streams.

### With a Kafka Broker

Ensure you have a Kafka broker configured with TLS and SASL PLAIN
authentication. Run the container with the following command:

```shell
$ docker run --rm \
  -e KAFKA_BOOTSTRAP_SERVERS='<kafka-bootstrap-servers>' \
  -e KAFKA_TOPIC='<kafka-topic>' \
  -e SASL_USERNAME='<sasl-username>' \
  -e SASL_PASSWORD='<sasl-password>' \
  ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```
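
To verify that events are arriving, you can consume the topic with any Kafka client. The snippet below is a minimal sketch using [kcat](https://github.com/edenhill/kcat) with the same TLS and SASL PLAIN settings; the broker, topic, and credential placeholders are assumptions that must match the values passed to the bridge:

```shell
# Read new events from the end of the topic and print them to stdout
$ kcat -C -o end \
  -b '<kafka-bootstrap-servers>' \
  -t '<kafka-topic>' \
  -X security.protocol=SASL_SSL \
  -X sasl.mechanisms=PLAIN \
  -X sasl.username='<sasl-username>' \
  -X sasl.password='<sasl-password>'
```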

### With Azure Event Hubs or Fabric Event Streams

Use the connection string to establish a connection to the service. Obtain the
connection string from the Azure portal, Azure CLI, or the "custom endpoint" of
a Fabric Event Stream.

```shell
$ docker run --rm \
  -e CONNECTION_STRING='<connection-string>' \
  ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```
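
For Azure Event Hubs, the connection string can also be retrieved on the command line. The following Azure CLI call is a sketch; the resource group, namespace, event hub, and rule names are placeholders, and it assumes an authorization rule with send permission already exists on the event hub:

```shell
# Print the primary connection string of an event-hub-scoped authorization rule
$ az eventhubs eventhub authorization-rule keys list \
  --resource-group '<resource-group>' \
  --namespace-name '<eventhubs-namespace>' \
  --eventhub-name '<event-hub>' \
  --name '<authorization-rule>' \
  --query primaryConnectionString --output tsv
```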

### Preserving State Between Restarts

To preserve the state between restarts and avoid reprocessing feed entries,
mount a volume to the container and set the `USGS_LAST_POLLED_FILE` environment variable:

```shell
$ docker run --rm \
  -v /path/to/state:/mnt/state \
  -e USGS_LAST_POLLED_FILE='/mnt/state/usgs_last_polled.json' \
  ... other args ... \
  ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```
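
If you prefer not to manage a host path, a named Docker volume works as well. A minimal sketch, where the volume name `usgs-iv-state` is just an example:

```shell
# Create the volume once, then mount it on every run
$ docker volume create usgs-iv-state
$ docker run --rm \
  -v usgs-iv-state:/mnt/state \
  -e USGS_LAST_POLLED_FILE='/mnt/state/usgs_last_polled.json' \
  ... other args ... \
  ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```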

## Environment Variables

### `CONNECTION_STRING`

An Azure Event Hubs-style connection string used to connect to Azure Event Hubs
or Fabric Event Streams. This replaces the need for `KAFKA_BOOTSTRAP_SERVERS`,
`SASL_USERNAME`, and `SASL_PASSWORD`.
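
The general shape of the connection string is shown below; the namespace, key name, key, and entity placeholders must be replaced with real values, and `EntityPath` names the target event hub (or the equivalent entity exposed by a Fabric Event Stream custom endpoint):

```shell
# Placeholder values; copy the real string from the Azure portal, the Azure CLI, or the custom endpoint
$ export CONNECTION_STRING='Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>;EntityPath=<entity-name>'
```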

### `KAFKA_BOOTSTRAP_SERVERS`

The address of the Kafka bootstrap servers. Provide a comma-separated list of host and port
pairs (e.g., `broker1:9092,broker2:9092`). The client connects to the brokers over TLS.

### `KAFKA_TOPIC`

The Kafka topic where messages will be produced.

### `SASL_USERNAME`

Username for SASL PLAIN authentication. Ensure your Kafka brokers support SASL PLAIN authentication.

### `SASL_PASSWORD`

Password for SASL PLAIN authentication.

### `USGS_LAST_POLLED_FILE`

The file path where the bridge stores the state of processed entries, so that it can
resume fetching after a restart without producing duplicates. Default is
`/mnt/state/usgs_last_polled.json`.

## Deploying into Azure Container Instances

You can deploy the USGS Instantaneous Values Service bridge as a container directly to Azure Container
Instances by providing the information explained above. Just click the button below to get started.

[](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fclemensv%2Freal-time-sources%2Fmain%2Fusgs_iv%2Fazure-template.json)

## Additional Information

- **Source Code**: [GitHub Repository](https://github.com/clemensv/real-time-sources/tree/main/usgs_iv)
- **Documentation**: Refer to [EVENTS.md](EVENTS.md) for the JSON event format.
- **License**: MIT

## Example

To run the bridge so that it fetches entries from multiple feeds every 10 minutes and sends them to an Azure Event Hub, preserving state in a mounted volume:

```shell
$ docker run --rm \
  -e CONNECTION_STRING='Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=...' \
  -v /path/to/state:/mnt/state \
  ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

This setup allows you to integrate USGS services data into your data processing pipelines, enabling real-time data analysis and monitoring.

## Notes

- Ensure that you have network connectivity to the USGS services.
- The bridge efficiently handles data fetching and forwarding, but monitor resource usage if you are fetching data from many feeds at a high frequency.

## Support

For issues or questions, please open an issue on the [GitHub repository](https://github.com/clemensv/real-time-sources/issues).