
Commit a86b6b4

adapt to latest xReg and add mode-s
Signed-off-by: Clemens Vasters <[email protected]>
1 parent 1196b54 commit a86b6b4


50 files changed: +3970 -190 lines

.gitignore

+1

@@ -12,3 +12,4 @@
**/_version.py


+mode-s/kql/aircraft.csv

gtfs/generate_producer.ps1

+31 -31

Large diffs are not rendered by default.

gtfs/xreg/gtfs.xreg.json

+62 -62

Large diffs are not rendered by default.

mode-s/CONTAINER.md

+145

@@ -0,0 +1,145 @@
# USGS Water Services - Instantaneous Values Service bridge to Apache Kafka, Azure Event Hubs, and Fabric Event Streams

This container image provides a bridge between the [USGS Water Services](https://waterservices.usgs.gov/) Instantaneous Values Service and Apache Kafka, Azure Event Hubs, and Fabric Event Streams. The bridge fetches entries from the specified feeds and forwards them to the configured Kafka endpoints.

## Functionality

The bridge retrieves data from the USGS Instantaneous Values Service and writes the entries to a Kafka topic as [CloudEvents](https://cloudevents.io/) in JSON format, which is documented in [EVENTS.md](EVENTS.md). You can specify multiple feed URLs by providing them in the configuration.

## Database Schemas and Handling

If you want to build a full data pipeline with all events ingested into a database, the integration with Fabric Eventhouse and Azure Data Explorer is described in [DATABASE.md](../DATABASE.md).

## Installing the Container Image

Pull the container image from the GitHub Container Registry:

```shell
$ docker pull ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

To use it as a base image in a Dockerfile:

```dockerfile
FROM ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

## Using the Container Image

The container defines a command that starts the bridge, reading data from the USGS services and writing it to Kafka, Azure Event Hubs, or Fabric Event Streams.

### With a Kafka Broker

Ensure you have a Kafka broker configured with TLS and SASL PLAIN authentication. Run the container with the following command:

```shell
$ docker run --rm \
    -e KAFKA_BOOTSTRAP_SERVERS='<kafka-bootstrap-servers>' \
    -e KAFKA_TOPIC='<kafka-topic>' \
    -e SASL_USERNAME='<sasl-username>' \
    -e SASL_PASSWORD='<sasl-password>' \
    ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

### With Azure Event Hubs or Fabric Event Streams

Use the connection string to establish a connection to the service. Obtain the connection string from the Azure portal, the Azure CLI, or the "custom endpoint" of a Fabric Event Stream.

```shell
$ docker run --rm \
    -e CONNECTION_STRING='<connection-string>' \
    ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

### Preserving State Between Restarts

To preserve state between restarts and avoid reprocessing feed entries, mount a volume into the container and set the `USGS_LAST_POLLED_FILE` environment variable:

```shell
$ docker run --rm \
    -v /path/to/state:/mnt/state \
    -e USGS_LAST_POLLED_FILE='/mnt/state/usgs_last_polled.json' \
    ... other args ... \
    ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

## Environment Variables

### `CONNECTION_STRING`

An Azure Event Hubs-style connection string used to connect to Azure Event Hubs or Fabric Event Streams. This replaces the need for `KAFKA_BOOTSTRAP_SERVERS`, `SASL_USERNAME`, and `SASL_PASSWORD`.
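
For reference, a connection string of this kind generally has the following shape; every placeholder below is illustrative:

```
Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy-name>;SharedAccessKey=<access-key>;EntityPath=<event-hub-name>
```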

### `KAFKA_BOOTSTRAP_SERVERS`

The address of the Kafka broker. Provide a comma-separated list of host and port pairs (e.g., `broker1:9092,broker2:9092`). The client communicates with TLS-enabled Kafka brokers.

### `KAFKA_TOPIC`

The Kafka topic where messages will be produced.

### `SASL_USERNAME`

Username for SASL PLAIN authentication. Ensure your Kafka brokers support SASL PLAIN authentication.

### `SASL_PASSWORD`

Password for SASL PLAIN authentication.

### `USGS_LAST_POLLED_FILE`

The file path where the bridge stores the state of processed entries. This allows data fetching to resume without duplication after restarts. The default is `/mnt/state/usgs_last_polled.json`.

## Deploying into Azure Container Instances

You can deploy the USGS Instantaneous Values Service bridge as a container directly to Azure Container Instances, providing the information explained above. Just click the button below to get started.

[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fclemensv%2Freal-time-sources%2Fmain%2Fusgs_iv%2Fazure-template.json)

## Additional Information

- **Source Code**: [GitHub Repository](https://github.com/clemensv/real-time-sources/tree/main/usgs_iv)
- **Documentation**: Refer to [EVENTS.md](EVENTS.md) for the JSON event format.
- **License**: MIT

## Example

To run the bridge fetching entries from multiple feeds every 10 minutes and sending them to an Azure Event Hub:

```shell
$ docker run --rm \
    -e CONNECTION_STRING='Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=...' \
    -v /path/to/state:/mnt/state \
    ghcr.io/clemensv/real-time-sources-usgs-iv:latest
```

This setup allows you to integrate USGS services data into your data processing pipelines, enabling real-time data analysis and monitoring.

## Notes

- Ensure that you have network connectivity to the USGS services.
- The bridge handles data fetching and forwarding efficiently, but monitor resource usage if you are fetching data from many feeds at a high frequency.

## Support

For issues or questions, please open an issue on the [GitHub repository](https://github.com/clemensv/real-time-sources/issues).

mode-s/Dockerfile

+24

@@ -0,0 +1,24 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim

LABEL org.opencontainers.image.source="https://github.com/clemensv/real-time-sources/tree/main/mode_s"
LABEL org.opencontainers.image.title="USGS Instantaneous Values Service bridge to Kafka endpoints"
LABEL org.opencontainers.image.description="This container is a bridge between USGS feeds and Kafka endpoints. It fetches entries from feeds and forwards them to the configured Kafka endpoints."
LABEL org.opencontainers.image.documentation="https://github.com/clemensv/real-time-sources/blob/main/mode_s/CONTAINER.md"
LABEL org.opencontainers.image.licenses="MIT"

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install the required Python packages
RUN pip install .

# Define environment variables (default values)
ENV CONNECTION_STRING=""
ENV LOG_LEVEL="INFO"

# Run the application
CMD ["python", "-m", "mode_s", "feed"]

mode-s/EVENTS.md

+32

@@ -0,0 +1,32 @@
# Mode-S API Bridge Events

This document describes the events that are emitted by the Mode-S API Bridge.

- [Mode_S](#message-group-modes)
  - [Mode_S.Messages](#message-modesmessages)

---

## Message Group: Mode_S

---

### Message: Mode_S.Messages

#### CloudEvents Attributes:

| **Name**      | **Description**     | **Type**      | **Required** | **Value**         |
|---------------|---------------------|---------------|--------------|-------------------|
| `specversion` | CloudEvents version | `string`      | `True`       | `1.0`             |
| `type`        | Event type          | `string`      | `True`       | `Mode_S.Messages` |
| `source`      | Station Identifier  | `uritemplate` | `True`       | `{stationid}`     |

#### Schema:

##### Record: Messages

*A container for multiple Mode-S and ADS-B decoded messages.*

| **Field Name** | **Type** | **Description** |
|----------------|----------|-----------------|
| `messages`     | *array*  | An array of Mode-S and ADS-B decoded message records. |
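
For illustration, a structured-mode event of this type might look as follows. The `specversion`, `type`, and `source` attributes follow the table above; the `id`, `time`, and `source` values are placeholders, and the shape of the individual records inside `messages` is not spelled out here:

```json
{
  "specversion": "1.0",
  "type": "Mode_S.Messages",
  "source": "station-001",
  "id": "00000000-0000-0000-0000-000000000000",
  "time": "2024-09-30T12:34:56Z",
  "datacontenttype": "application/json",
  "data": {
    "messages": [
      { "comment": "decoded Mode-S/ADS-B message record (fields omitted)" }
    ]
  }
}
```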

mode-s/README.md

+111

@@ -0,0 +1,111 @@
# Mode-S Data Poller Usage Guide

## Overview

**Mode-S Data Poller** retrieves ADS-B data from dump1090 and sends updates to a Kafka topic.

### What is dump1090?

[dump1090](https://github.com/antirez/dump1090) is an open-source Mode S decoder specifically designed for RTL-SDR devices. It captures ADS-B messages broadcast by aircraft and decodes them to provide real-time information about aircraft positions, velocities, and other parameters. dump1090 is commonly used with RTL-SDR dongles to set up low-cost ADS-B receivers.

### Recommended Forks and Tools

- **dump1090-fa**: The [FlightAware fork of dump1090](https://github.com/flightaware/dump1090) is the best-maintained version and includes additional features and improvements.
- **PiAware**: [PiAware](https://flightaware.com/adsb/piaware/install) is an easy way to set up dump1090-fa on a Raspberry Pi. It provides a complete package for ADS-B reception and integration with FlightAware.

### How dump1090 Receivers are Operated

dump1090 receivers are typically operated by connecting an RTL-SDR dongle to a computer or a Raspberry Pi. The software listens for ADS-B messages on 1090 MHz, decodes them, and provides the data via various endpoints, including a web interface and network ports.

### BEAST Endpoint

The BEAST endpoint is a binary protocol used by dump1090 to stream raw ADS-B messages over a network. By default, dump1090 provides this endpoint on port 30005. The Mode-S Data Poller connects to this endpoint to retrieve ADS-B data. Other receivers supporting BEAST output may also work with this tool.
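
To see what the poller consumes, here is a minimal sketch in Python that connects to a BEAST endpoint and counts raw frames. The host and port are assumptions for a local dump1090, and the frame splitting is deliberately naive (it ignores escaped 0x1A bytes); the actual decoding is done inside the poller:

```python
import socket

# Assumed host/port for a local dump1090 BEAST output (30005 is the dump1090 default).
DUMP1090_HOST = "localhost"
DUMP1090_PORT = 30005

with socket.create_connection((DUMP1090_HOST, DUMP1090_PORT)) as sock:
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break  # the receiver closed the connection
        buffer += chunk
        # BEAST frames begin with an 0x1A escape byte; this naive split ignores
        # doubled (escaped) 0x1A bytes inside frames and only illustrates the traffic.
        frames = buffer.split(b"\x1a")
        buffer = frames.pop()  # keep any trailing partial frame for the next read
        for frame in frames:
            if frame:
                print(f"frame type {frame[0]:#04x}, {len(frame) - 1} payload bytes")
```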

## Key Features

- **ADS-B Data Fetching**: Retrieve current ADS-B data from dump1090.
- **Kafka Integration**: Send ADS-B data updates as CloudEvents to a Kafka topic, supporting Microsoft Event Hubs and Microsoft Fabric Event Streams.

## Installation

The tool is written in Python and requires Python 3.10 or later. You can download Python from [here](https://www.python.org/downloads/) or get it from the Microsoft Store if you are on Windows.

### Installation Steps

Once Python is installed, you can install the tool from the command line as follows:

```bash
pip install git+https://github.com/clemensv/real-time-sources#subdirectory=mode-s
```

If you clone the repository, you can install the tool as follows:

```bash
git clone https://github.com/clemensv/real-time-sources.git
cd real-time-sources/mode-s
pip install .
```

For a packaged install, consider using the [CONTAINER.md](CONTAINER.md) instructions.

## How to Use

After installation, run `mode_s.py`. It supports multiple subcommands:

- **Feed (`feed`)**: Continuously poll ADS-B data from dump1090 and send updates to a Kafka topic.

### **Feed (`feed`)**

Polls Mode-S data from dump1090 and sends the messages as CloudEvents to a Kafka topic. The events use the CloudEvents structured JSON format and are described in [EVENTS.md](EVENTS.md).

- `--host`: Hostname or IP address of dump1090.
- `--port`: TCP port dump1090 is listening on.
- `--kafka-bootstrap-servers`: Comma-separated list of Kafka bootstrap servers.
- `--kafka-topic`: Kafka topic to send messages to.
- `--sasl-username`: Username for SASL PLAIN authentication.
- `--sasl-password`: Password for SASL PLAIN authentication.
- `--connection-string`: Microsoft Event Hubs or Microsoft Fabric Event Stream [connection string](#connection-string-for-microsoft-event-hubs-or-fabric-event-streams) (overrides the other Kafka parameters).
- `--ref-lat`: Latitude of your receiving antenna, required for decoding Mode-S/ADS-B positions.
- `--ref-lon`: Longitude of your receiving antenna, required for decoding Mode-S/ADS-B positions.

#### Example Usage:

```bash
mode_s.py feed --kafka-bootstrap-servers "<bootstrap_servers>" --kafka-topic "<topic_name>" --sasl-username "<username>" --sasl-password "<password>" --polling-interval 60
```

Alternatively, using a connection string for Microsoft Event Hubs or Microsoft Fabric Event Streams:

```bash
mode_s.py feed --connection-string "<your_connection_string>" --polling-interval 60
```

### Connection String for Microsoft Event Hubs or Fabric Event Streams

The connection string format is as follows:

```
Endpoint=sb://<your-event-hubs-namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy-name>;SharedAccessKey=<access-key>;EntityPath=<event-hub-name>
```

When provided, the connection string is parsed to extract the Kafka configuration parameters (a sketch of this mapping follows the list):

- **Bootstrap Servers**: Derived from the `Endpoint` value.
- **Kafka Topic**: Derived from the `EntityPath` value.
- **SASL Username and Password**: The username is set to `'$ConnectionString'`, and the password is the entire connection string.
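
A rough sketch of this mapping in Python; the function name and the Kafka endpoint port 9093 are illustrative, and the tool's actual parsing may differ in details:

```python
from urllib.parse import urlparse

def kafka_config_from_connection_string(connection_string: str) -> dict:
    """Derive Kafka parameters from an Event Hubs-style connection string."""
    # Split "Key=Value;Key=Value;..." into a dict, keeping any '=' inside values intact.
    parts = dict(
        segment.split("=", 1)
        for segment in connection_string.split(";")
        if "=" in segment
    )
    # Endpoint looks like "sb://<namespace>.servicebus.windows.net/".
    namespace_host = urlparse(parts["Endpoint"]).hostname
    return {
        "bootstrap_servers": f"{namespace_host}:9093",  # Event Hubs' Kafka-compatible endpoint
        "topic": parts["EntityPath"],
        "sasl_username": "$ConnectionString",
        "sasl_password": connection_string,
    }
```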

### Environment Variables

The tool supports the following environment variables to avoid passing them via the command line; an example invocation follows the list:

- `KAFKA_BOOTSTRAP_SERVERS`: Kafka bootstrap servers (comma-separated list).
- `KAFKA_TOPIC`: Kafka topic for publishing.
- `SASL_USERNAME`: SASL username for Kafka authentication.
- `SASL_PASSWORD`: SASL password for Kafka authentication.
- `CONNECTION_STRING`: Microsoft Event Hubs or Microsoft Fabric Event Stream connection string.
- `POLLING_INTERVAL`: Polling interval in seconds.
- `DUMP1090_HOST`: Hostname or IP address of dump1090.
- `DUMP1090_PORT`: TCP port dump1090 listens on.
- `REF_LAT`: Latitude of your receiving antenna, required for decoding positions.
- `REF_LON`: Longitude of your receiving antenna, required for decoding positions.
- `STATIONID`: Station ID for event source attribution.
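
For example, a feed run configured entirely through environment variables might look like this; the hostnames, coordinates, and station ID are placeholders:

```bash
export CONNECTION_STRING='Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<event-hub>'
export DUMP1090_HOST='localhost'
export DUMP1090_PORT='30005'
export REF_LAT='47.64'
export REF_LON='-122.13'
export STATIONID='my-station'
mode_s.py feed
```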

## State Management

The tool handles state internally for efficient API polling and sending updates.