[BUG] Rebuild of Dashy in Proxmox cluster v8.0.4 / TurnKey Linux / Docker with ZFS storage is crashing. #1337
Comments
If you're enjoying Dashy, consider dropping us a ⭐ |
Just to mention that Dashy is for me by far the best dashboard solution to meet my needs, mainly because of the multi-page and easy search functionality. Thanks Liss ;-) |
Hi, can you share your Dashy logs? |
Hi, arch: amd64. I've attached the Dashy logs downloaded from Portainer. Let me know if you need another log file, as I am not able to shell into the Dashy container. (I looked on Google some months ago; this is normal behavior.) Steps done for your log files:
Attached is the log file of the same container (restored in Proxmox) as soon as I move it to local-lvm storage and do a rebuild. Let me know if I can help to solve the issue by providing more information. Thanks so far, |
That is indeed very confusing. Can you share why you use /home/home after
the port in the URL of Dashy?
And can you share your browser console output when Dashy says "not found"?
(Depending on your browser, right-click anywhere, then developer options.)
Some random guide I found:
https://balsamiq.com/support/faqs/browserconsole/#:~:text=To%20open%20the%20developer%20console,(on%20Windows%2FLinux)
LeLuc ***@***.***> wrote on Fri., 20 Oct. 2023, 02:02:
… Hi,
Here is the 600.conf (prio-2 docker running CT=600 = dashy):
arch: amd64
cores: 2
features: keyctl=1,nesting=1
hostname: dockerPrio2Services
memory: 2048
net0: name=eth0,bridge=vmbr0,gw=192.168.178.1,hwaddr=3E:CB:DD:D1:4C:DB,ip=
192.168.178.202/24,type=veth
ostype: debian
rootfs: cluZFS-1:subvol-600-disk-0,size=8G
swap: 512
tags: docker;ha
unprivileged: 1
I've attached the Dashy logs downloaded from Portainer. Let me know if
you need another log file, as I am not able to shell into the Dashy
container. (I looked on Google some months ago; this is normal behavior.)
Steps done for your log files:
- Stopped container in Portainer
- Restarted dashy container
- Checked log file to be "normal"
  See file: "2023-10-20_Dashy-it-4-Home_logs after RESTART.txt"
  <https://github.com/Lissy93/dashy/files/13048532/2023-10-20_Dashy-it-4-Home_logs.after.RESTART.txt>
- Tested dashy. Worked as expected
- Rebuild of dashy without changing any *.yml file. "Just" started a rebuild:
  1. "Config"
  2. "Update Configuration"
  3. "Rebuild Application"
  4. "Start Build"
- Dashy error while rebuilding. Dashy dashboard not displayed anymore (Chrome refresh)
  [image: image]
  <https://user-images.githubusercontent.com/45201013/276776129-295360d3-dac1-4b6f-a3e6-7bd08895befd.png>
  [image: image]
  <https://user-images.githubusercontent.com/45201013/276776319-02c8fa74-c0c4-4de1-ba00-55bd269c4b32.png>
- Downloaded log file of the dashy container via Portainer
  See file: "2023-10-20 ***@***.*** build NO CONFIG FILE CHANGE = BAD.txt"
  <https://github.com/Lissy93/dashy/files/13048535/2023-10-20.Dashy%402.1.1.build.NO.CONFIG.FILE.CHANGE.BAD.txt>
Attached is the log file of the same container (restored in Proxmox), as soon
as I move this container to local-lvm storage and do a rebuild.
***@***.*** build NO CONFIG FILE CHANGE = OK.txt
<https://github.com/Lissy93/dashy/files/13048561/Dashy%402.1.1.build.NO.CONFIG.FILE.CHANGE.OK.txt>
Let me know if I can help to solve the issue by providing more information.
Thanks so far,
Luc
|
Hi CrazyWolf13 ;-) This took some time, as I wanted to make some tests by moving my Proxmox from ZFS to Ceph ... So before digging into the responses I was asked for: Ceph does not give any problem while rebuilding the config in Dashy! I will switch back to ZFS and redo, by tomorrow at the latest, the same rebuild process and see if the same problem still applies. I'll post my results. Here are the answers to the questions. Here is the conf.yml file ... pageInfo:
Just my point of view ... (I am not a developer at all ;-) ): I presume that this bug is more related to how Dashy accesses files on different filesystems, ZFS versus local-lvm versus Ceph ... because other containers have behaved without problems until now, while also running on any of these filesystems, but mainly on ZFS. The only way to understand what is happening will be to have someone on your side debugging the rebuild process step by step on ZFS, especially as this happens on every rebuild, be it with a minimalistic *.yml file or with my more complex multi-page *.yml set-up. Developer console in the next post, as I have switched back to ZFS .. Thanks so far |
Hi guys, I switched back to ZFS .. and the problem reoccurs exactly the same way as before .. So local-lvm and Ceph are working ... Local-lvm is not an option, as I run a Proxmox cluster .. Ceph is, for this configuration, consuming too many resources .. So ZFS is the only possible option for me .. Hope you'll find where this problem comes from ... Hereafter the Chrome console output while refreshing the page .. Thanks so far .. Luc |
Please look again through the guide I sent you and send me the output of the Browser Console. |
Hello, I was not at home for some days ... therefore some delay .. Hereafter the requested information. Last time I sent the Network part, as the Console view showed only a few lines .. This time I activated in the menu "All levels" and the previously unselected "Verbose" mode .. which finally showed more information. Screenshot and log files (preserved mode) are attached. First Console file:
Second Console file:
This time the Dashy page appeared, showing all the different steps as it was trying to build its environment: "Initializing" / "Running Checks" / "Building" / "Finishing Off" / "Almost Done" / "Not Long Left" / "Taking Longer than Expected", and it ended up on the Dashy error page ... with nothing displayed in the console. I've also tested in Edge, with the exact same behavior. Just some information which might be important for you. Log file of a successful rebuild attached: moving it afterwards to ZFS gives a running Dashy container on ZFS. Let me know if you need more .. Luc |
One thought ... to be validated, if this can help on your side to narrow down/pinpoint the issue ... I can export my container after I've regenerated Dashy on ZFS .. so you can import it on your side and should be able to "see" what exactly the problem is, and maybe get a hint where this might come from ..? Just let me know ... |
Hi all, just checking whether this issue is still open and whether someone needs more information. I am here to help get rid of this .. ;-) |
Hi @CrazyWolf13, I have moved the dashy docker to be stored on local-lvm, knowing that this will disable, in my Proxmox cluster, the availability of Dashy in case of failure of the docker host or the Proxmox host. I then moved my dashy docker storage back to cluZFS and tried to apply the modifications I needed to make to the Dashy files. It seems that this time an error message was displayed at the end of the log file in docker, which did not appear in my previous faulty rebuilds. Hope this helps to make Dashy cluZFS compatible .. ;-) Available to help in this tricky issue. Luc. docker-dashy log file: [Dashy ASCII banner] Welcome to Dashy! 🚀 Using Dashy V-2.1.1. Update Check Complete
|
I sadly cannot help you any further with this; the best bet is to hope for Lissy to look into it. However, she is really busy and there are more important things heavily needing her attention, so I guess right now it's a bit out of scope. This is only my opinion, though, and maybe Lissy will look into it :) |
Thanks @CrazyWolf13 for your fast reply. OK, I will keep an eye on this issue, so I can maybe help with some actions on my side once @Lissy93 has some time to take a look at this. All is documented so far in this ticket. I would be happy to help you guys if any information is needed. Thanks so far. Thumbs up for this beautiful dashboard. Really great 🤩 Thank you, Tobias |
The files are too huge to upload! The smallest file is 491 MB, the largest 509 MB; the maximum upload on GitHub is 25 MB. Let me know if we can find another service to transfer these files, like WeTransfer. But then I need an e-mail address where I can send the link. :-( Luc |
Hi, I'm contacting you again to see if you have an idea of how I can send you the files? I will be available until Friday evening ;-) ... and will be back on 22nd January .. I'll adapt to your needs .. Thanks for the support so far. Luc |
Apologies, I was busy the past few days. Feel free to upload them to something like pixeldrain, and I'll take a look from there. Also about the rebuild thing - this is a bug introduced in 2.1.1 that causes the application to not be rebuilt on startup (which it did in 2.1.0). See #1290 (comment) for a full explanation. |
Hi ... No problem at all ;-) .... I am happy that you take the time to look at this problem, especially as I do not know whether it is related to Dashy or even a third-party coincidence .. Hope you'll find out ;-) Amazing tool .. Better than WeTransfer, as no e-mail is needed ;-) .. Perfect .. So here we are .. Here is the link: 2024-01-11-dashyDockerExports pointing to all 3 files as described in my previous e-mail ... Thanks so far for your help ;-) Luc |
This may or may not work, but have you tried creating another ZFS instance to test whether it's occurring there as well? I have had some wacky issues with permissions on my ZFS Proxmox cluster before. |
Hi, thanks for your input.
I have some doubts about this, as I have +/- 19 containers running in my cluster. All are running smoothly! Dashy also runs smoothly on each node of the cluster as I move the LXC around to the other nodes (migrating the LXC = the docker host). The only difference Dashy represents is that it can "rebuild" some internal config. It is exactly this process, while being on the ZFS storage, that corrupts Dashy. Other containers, like Wiki.js / ntopng / grafana / ..., all work correctly.
Depending on the news when I return from holiday on 22 January, I'll give it a try ;-). I'll let you know.
Thanks for your input.
Luc
|
Hi all, just wanted to let you know that I am back from holiday .. ;-) Ready to help if needed. No hurry, as I know you surely have other topics to handle as well. I just wanted to let you know that I am able, as of now, to give any further details you may need .. Your speed will be mine. Thanks for your help so far. Luc |
Alright, so I've finally got a chance to look at this a little bit. I used Meld to take a look at the differences between the three log files; here are the main highlights:
The implications:
- 0-local-lvm
- 1-local-lvm
- 2-zfs
What to do: Unfortunately it's quite hard to diagnose things remotely like this, but my current leading theory is that it has to do with the docker image originally being stored on LVM and then migrated to ZFS. Try deleting the docker image from your LXC completely (I believe you can do it through the Portainer web UI) so it gets re-downloaded when you restart the container; a rough CLI equivalent is sketched below. |
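For readers who prefer the command line over Portainer, here is a minimal sketch of that suggestion. It assumes the stack was deployed with docker compose and uses the default lissy93/dashy image; adjust names to your own setup.

```sh
# Sketch only: force a clean re-download of the Dashy image.
# Assumes a docker compose deployment using the lissy93/dashy image.
docker compose down             # stop and remove the dashy container
docker image rm lissy93/dashy   # delete the locally cached image
docker compose pull             # fetch a fresh copy from Docker Hub
docker compose up -d            # recreate the container from the new image
```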
Hi, thanks for this detailed feedback. I know that debugging this issue is quite complex if you do not have the necessary infrastructure; sorry for that, and thanks for all your help in finding what is not behaving as expected ;-), as this will help us all. Following your feedback, I deleted all the Dashy-related items in docker (via Portainer). This includes:
The docker compose file and dashy *.yml files have not changed for +/- a month. I re-pulled the image from Docker Hub (latest, which is TAG: 2.1.1) via the docker compose file and ended up with the Dashy container running ("Healthy" status displayed in Portainer), but as usual with the same menu items displayed in all Dashy menus. (This behavior has existed since the very beginning; no difference between lvm, ZFS or Ceph, and not really annoying. OK for me, as I know I just need to rebuild to have the entries show up.) I then did a rebuild, to have the different pages/menus recreated while remaining on ZFS. The same error occurred as at the very beginning. BTW: I created a Ceph storage on the same Proxmox cluster and Dashy behaves exactly the same way on Ceph storage as on ZFS storage. :-( I remain available for any information you may need. P.S.: The image I used while creating this bug (TAG: 2.2.1) is the same as the image I just pulled (TAG: 2.1.1). Did I miss some "newer", not yet published image you want me to test? |
Hi @TheRealGramdalf, for every new release you generate, I test the new version on my Proxmox cluster to see whether the error I encounter may have disappeared.
All the prior releases did not change anything about my problem, but this release reacts differently. I hope this may help you pinpoint what the problem is; it is also intended to help other users now facing this new situation. I post my log files hereafter. Details: I was running Dashy 2.1.2 on my ZFS filesystem prior to the update. Just to remember: Dashy on Proxmox with a ZFS filesystem ends up in an error while regenerating its pages/engine whenever any modification is made in any *.yml (config) file(s). New behavior (same LXC container as with the prior release):
The Dashy container does not start anymore (constantly rebooting due to the restart policy "unless-stopped"). Log results while being on ZFS [Dashy crashing]: `$ node server` Welcome to Dashy! 🚀 Using Dashy V-3.0.0. Update Check Complete
Welcome to Dashy! 🚀 Using Dashy V-3.0.0. Update Check Complete
After stopping the LXC container and moving the storage to local-lvm, Dashy was able to start and generate all the pages of my configuration, and it is working like a charm. Log file running the LXC on local-lvm storage [ALL OK]: Using Dashy V-3.0.0. Update Check Complete / Welcome to Dashy! 🚀 Using Dashy V-3.0.0. Update Check Complete
I then wanted to know whether switching back to ZFS would give a working v3.0.0 Dashy. BUT after stopping the LXC container and moving the storage back to ZFS, Dashy this time behaves exactly as mentioned above (restarting continuously and producing the exact same log file entries). Hope this long comment will help you understand why your dashy docker does "not like" the ZFS filesystem. I was not able to test Ceph, as I will be travelling the next weeks ... If you need more details, just ping me ... I will answer as soon as I am back (+/- the last week of May). Thanks for your fabulous Dashy ;-) .... It is still the only solution matching my needs ;-) Luc |
One question @LuxBibi - can you post the output of The output of |
Hi @TheRealGramdalf, first, thanks for the fast reply ;-) ... Here are the requested details.
cluZFS-1 feature@async_destroy enabled local
Available if more needed, Luc |
I believe the problem may be related to some bug fixes in ZFS version 2.2.0, primarily around Linux container support, which fixed some issues with overlay2 - the storage driver you are using according to #1337 (comment):
Based on the feature flags shown in
This suggests to me that zfs-utils have been updated to
My recommended steps:
Warning: As always, take backups and make sure you take proper precautions. The actions described below can be destructive, so follow with care.
I would try that first and see where it takes you, since up to zfs 2.2.0 overlay2 support has been dicey at best, and resulted in many weird issues. |
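As a concrete illustration of the recommendation above, here is a minimal sketch, not from the original comment, of how the ZFS versions and pool feature flags might be checked and upgraded on a Proxmox node. The pool name cluZFS-1 comes from the earlier comments; everything else is an assumption, and enabling new feature flags is one-way, so take backups first.

```sh
# Hedged sketch: check ZFS userland/kernel versions and pool feature flags, then upgrade.
# Pool name cluZFS-1 is taken from the thread; run on each cluster node.
zfs version                               # userland (zfs-utils) and kernel module versions
zpool get all cluZFS-1 | grep feature@    # list the pool's current feature flags
apt update && apt full-upgrade            # pull in newer Proxmox / ZFS packages
zpool upgrade cluZFS-1                    # enable new feature flags (one-way; older ZFS
                                          # releases may no longer import the pool)
```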
Hi @TheRealGramdalf, I really appreciate your help and effort in trying to get rid of this issue. Thanks a lot for this ;-), especially as these updates seem to be the solution to all the problems I have encountered so far with Dashy on my Proxmox cluster!!! Rebuilding, changing *.yml files, migrating .. everything now works like a charm ... ;-) I'll close this issue, as Dashy is now running perfectly on these new Proxmox updates. Thanks for your ideas and hints. As this may be of interest for anyone encountering the same problem, here is the actual status (this took some time, as I was not at home):
I did this for the other remaining nodes in my cluster, ending up with an 8.2.2 Proxmox cluster! I have redeployed the Dashy stack after removing the "old" Dashy image and re-pulling the latest v3.0.1 Dashy image. Docker container log file: Welcome to Dashy! 🚀 Using Dashy V-3.0.1. Update Check Complete. Hereafter the actual ZFS versions.
|
Environment
Self-Hosted (Docker)
System
Proxmox v8.0.4 / 3-node cluster - LXC - Debian GNU/Linux 11 (bullseye) - Docker version 24.0.5
Version
Dashy version 2.1.1
Describe the problem
When I start a rebuild of the Dashy dashboard via icon/menu (because I changed some *.yml files);
-> Config -> Update Configuration -> Rebuild Application
the rebuild process "crashes" and Dashy is no longer usable at all.
Dashy works (using and rebuilding) like a charm on a Synology NAS and on Proxmox with the container on local-lvm storage, with the exact same image version. Problems only appear when I have the LXC (the Linux container where docker is running) on a cluster-wide ZFS storage (shared between all the Proxmox nodes).
I am able to reproduce this error 100% of the time I recreate a working Dashy and rebuild!
As soon as I move the docker container to local storage (local-lvm) on the Proxmox cluster, there is no problem at all.
Workaround if someone encounters the same problem:
Move the LXC container disk to local storage (e.g. local-lvm), make all the changes you need, redeploy a new Dashy image and rebuild Dashy. Once Dashy is up and running again ("Healthy" status in Portainer), rebuild and validate that the actual set-up in the *.yml file(s) matches your needs. When this is OK, move the LXC container back to cluster-wide ZFS storage to have Proxmox cluster redundancy for Dashy. (Not very user-friendly and quite time-consuming, but it works.) A rough CLI version of this workaround is sketched below.
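A minimal sketch of the same workaround on the Proxmox CLI, purely illustrative and not from the original report (the author used the web UI). CT ID 600 and the pool name cluZFS-1 are taken from the config shared above; on older Proxmox releases the subcommand is spelled move_volume rather than move-volume.

```sh
# Sketch only: move the Dashy LXC root disk to local-lvm, rebuild, then move it back.
pct stop 600                            # stop the dashy LXC (CT ID from the thread)
pct move-volume 600 rootfs local-lvm    # migrate the root disk to local-lvm
pct start 600                           # start it, redeploy/rebuild Dashy, verify *.yml
pct stop 600
pct move-volume 600 rootfs cluZFS-1     # move the root disk back to the shared ZFS pool
pct start 600
```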
Additional info
_Dashy-it-4-Home_logs.txt