prune fails because of unmounted datasets #35
Comments
Thanks for opening this issue. What does your system config look like, at least (after reboot - redact as necessary)?
`zpool status` shows both pools online. My fix for the issue is:

```sh
#!/bin/sh
# After a hard crash of a nomad node, the remaining pots can't be pruned because their datasets are not mounted.
# This mounts the datasets and prunes the pots.
zfs list -rH -o name zdata/pot/jails | xargs -L 1 zfs mount && logger -t pot_cleanup mounting all pot datasets || \
    logger -t pot_cleanup failed to mount all pot datasets
pot prune && logger -t pot_cleanup pruning pots || logger -t pot_cleanup could not prune all pots
```

This is supported by an RC script which runs it once before nomad.
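For context, a minimal sketch of what such an rc.d wrapper could look like, assuming the script above is installed as /usr/local/sbin/pot_cleanup.sh and that the nomad rc service provides "nomad" (both the path and the service name are assumptions, not taken from the comment above):

```sh
#!/bin/sh
#
# PROVIDE: pot_cleanup
# REQUIRE: zfs
# BEFORE: nomad
#
# Hypothetical rc.d wrapper: runs the cleanup script once at boot,
# after ZFS is up and before nomad starts.

. /etc/rc.subr

name="pot_cleanup"
rcvar="pot_cleanup_enable"
start_cmd="${name}_start"
stop_cmd=":"

pot_cleanup_start()
{
	# Path to the cleanup script is an assumption; adjust to wherever it is installed.
	/usr/local/sbin/pot_cleanup.sh
}

load_rc_config $name
: ${pot_cleanup_enable:="NO"}
run_rc_command "$1"
```

It would then be enabled with `sysrc pot_cleanup_enable=YES` so it runs on every boot.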
@einsiedlerkrebs any reason you didn’t enable zfs in rc.conf? This can, e.g., be done using the service command (see the sketch below). It would take care of mounting ZFS file systems on boot.
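A minimal sketch of that suggestion, assuming a FreeBSD release where service(8) supports the enable verb (otherwise sysrc sets the same rc.conf knob):

```sh
# Enable the zfs rc service so datasets are mounted automatically at boot
service zfs enable          # writes zfs_enable="YES" to /etc/rc.conf
# equivalent: sysrc zfs_enable="YES"

# Mount the datasets right away, without waiting for a reboot
service zfs start
```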
Yes, indeed this solved the issue. Thanks.
I am experiencing the issue that pots are not cleaned up after a hard reboot of a nomad node, and therefore the jobs are failing.
When the system is up again, the ZFS datasets are not mounted in place, so the configuration file of a pot cannot be found.
This leads to a failing prune command and therefore to the inability to run "prepare" in nomad. For this reason the node is not reboot safe.
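One way to confirm that state after the reboot is to check the mounted property; a sketch, using the zdata/pot/jails dataset name from the fix script above (adjust to your pool layout):

```sh
zpool status                                      # pools still report ONLINE
zfs get -r -o name,value mounted zdata/pot/jails  # affected pot datasets show "no"
```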
To reproduce:

1. On a single-server nomad node with running services (via pot), run the `reboot` command.
2. After the system is up, observe that the desired services are not up.
3. Get the pot datasets with `zfs list`.
4. Mount each "service" related dataset and its recursive children (see the command sketch after this list).
5. Run `pot prune`.
6. Trigger a fresh service start on the nomad node (either by setting count to 0 and back to 1, or by removing the database).
7. Now the service should be working again.
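Steps 3-5 as shell commands, sketched under the assumption that the pot datasets live under zdata/pot/jails as in the fix script above:

```sh
zfs list -rH -o name zdata/pot/jails | xargs -L 1 zfs mount   # mount each pot dataset
pot prune                                                     # remove the stale pots
```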