prune fails because of unmounted datasets #35

Closed
einsiedlerkrebs opened this issue Mar 22, 2023 · 4 comments

Comments

@einsiedlerkrebs
Contributor

einsiedlerkrebs commented Mar 22, 2023

I am experiencing an issue where pots are not cleaned up after a hard reboot of a nomad node, and as a result the jobs fail.

When the system comes back up, the ZFS datasets are not mounted in place, so a pot's configuration file cannot be found.
This makes the prune command fail, which in turn makes it impossible to run "prepare" in nomad. For this reason the node is not reboot safe.

To reproduce:

  • on a single-server nomad node with running services (via pot), run the reboot command

  • after the system is back up, observe that the desired services are not running

  • list the pot datasets with zfs list

  • mount each service-related dataset and its child datasets

  • run pot prune (see the command sketch after this list)

  • trigger a fresh service start on the nomad node (either by setting count to 0 and back to 1, or by removing the database)

  • the service should now be working again
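
The mount-and-prune part of this (steps three to five) looks roughly like the sketch below; zdata/pot/jails is the pot ZFS root on my system, and <pot-name> is a placeholder for an affected pot:

# list pot datasets and whether they are mounted (zdata/pot/jails is an example path)
zfs list -r -o name,mounted zdata/pot/jails

# mount a service dataset and everything below it (<pot-name> is a placeholder)
zfs list -rH -o name zdata/pot/jails/<pot-name> | xargs -L 1 zfs mount

# clean up the leftover pots
pot prune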

@einsiedlerkrebs changed the title from "prune fails because of unmounted filesystems" to "prune fails because of unmounted datasets" on Mar 22, 2023
@grembo
Contributor

grembo commented Mar 24, 2023

Thanks for opening this issue.

What does your system config look like? At a minimum:

cat /etc/rc.conf
zpool status

(after reboot - redact as necessary)

@einsiedlerkrebs
Contributor Author

nomad_enable="True"
nomad_user="root"
nomad_debug="YES"
nomad_env="PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin"
nomad_dir="/var/tmp/nomad"
nomad_args="-config=/opt/hashicorp/nomad-agent.hcl"

zpool status shows both pools online.

My fix for the issue is:

#!/bin/sh
# After hard crash of a nomad node, remaining pots can't be pruned since their datasets are not mounted.
# This mounts the datasets and prunes pots.

zfs list -rH -o name zdata/pot/jails | xargs -L 1 zfs mount && logger -t pot_cleanup mounting all pot datasets || \
 logger -t pot_cleanup failed to mount all pot datasets
pot prune && logger -t pot_cleanup pruning pots || logger -t pot_cleanup could not prune all pots

This is supported by an rc script which runs it once before nomad; a sketch follows below.
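
A minimal rc.d sketch of that idea, assuming the mount-and-prune commands above are saved as /usr/local/sbin/pot_cleanup.sh (the script name, path, and service name are placeholders):

#!/bin/sh
# PROVIDE: pot_cleanup
# REQUIRE: zfs
# BEFORE: nomad

. /etc/rc.subr

name="pot_cleanup"
rcvar="pot_cleanup_enable"
start_cmd="pot_cleanup_start"
stop_cmd=":"

pot_cleanup_start()
{
        # mount leftover pot datasets and prune stale pots before nomad starts
        /usr/local/sbin/pot_cleanup.sh
}

load_rc_config $name
: ${pot_cleanup_enable:="NO"}
run_rc_command "$1"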

@grembo
Contributor

grembo commented Mar 28, 2023

@einsiedlerkrebs any reason you didn’t enable zfs in rc.conf? This can, e.g., be done using the service command:

# service zfs enable

It would take care of mounting zfs file systems on boot.
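
For reference, this is equivalent to setting the corresponding flag in /etc/rc.conf directly:

# in /etc/rc.conf
zfs_enable="YES"   # mount ZFS file systems at boot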

@einsiedlerkrebs
Contributor Author

Yes, indeed this solved the issue. Thanks.
