Wrong Partitioning? #3
Yeah, I haven't updated the readme yet 😅 I switched to hash partitioning by repo to allow for more partitions and to keep them more evenly sized. But even with that, on my hardware I was able to ingest only up to 2k events/s.
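For anyone following along, declarative hash partitioning by repo in Postgres looks roughly like the sketch below; the table layout and partition count are illustrative assumptions, not the indexer's actual migration:

```sql
-- Illustrative sketch only: hash-partition a records-like table by repo.
-- Column names and the bucket count are placeholders.
CREATE TABLE records (
    repo       text NOT NULL,
    collection text NOT NULL,
    rkey       text NOT NULL,
    content    jsonb
) PARTITION BY HASH (repo);

-- One partition per hash bucket; adjust MODULUS for more buckets.
CREATE TABLE records_00 PARTITION OF records
    FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE records_01 PARTITION OF records
    FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ... and so on up to REMAINDER 15.
```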
Hmm. I run the dual partitioning now, with 16 and 16 partitions, i.e., 256 in total. I haven't checked the metrics, but I think it's fully caught up and everything is synced after approx. a week or so. This hardware is (supposedly) a 2TB NVMe SSD, on which I run ZFS with compression. I don't use ScyllaDB because I don't trust these NoSQL databases, and I don't want to start indexing from scratch. The database is a healthy
Oh, nice! Do you run Prometheus and Grafana to keep track of it? What kind of performance numbers did you get? (I recently got a 4TB NVMe SSD, but my indexer is still running on a SATA SSD.)
Nope 🙃 I usually run them in the same docker compose setup, but I haven't added them yet. Which version of Grafana does the dashboard work with? Any plugins etc. that I need? (In the best case, do you have a docker compose snippet flying around for that?)
There's an outdated compose file here: https://github.com/uabluerail/indexer/blob/main/metrics/docker-compose.yml I didn't use it, since I'm running an instance for everything I have at home and just added it to the config. This is my dashboard: https://github.com/uabluerail/indexer/blob/main/dashboards/indexer.json (also quite outdated), with some scripts to manage it in Grafana: https://github.com/uabluerail/indexer/blob/main/Makefile#L95
Aha! I set all that up. The dashboard is very nice! Some of the hardware stuff is specific to your setup, but that's fine :) I noticed I had a delay of approx. 24 hours to some of the official PDSs. Most of them were around a few hours. I turned off the record indexer to catch up overnight, and I'm almost caught up now. In general, performance varies quite a bit:
(This was all without the record indexer running, which will probably slow things down a bit.)

I guess I could cut down on autovacuum times by increasing partitioning by DID, which should make the record table partitions smaller overall? In the first few hours I still had the record indexer running and got up to 150k repos/h total, but usually something more like 70k repos/h.

Unfortunately, the machine sometimes OOM-kills Postgres, which wreaks havoc on everything, and most notably means autovacuum is canceled (and restarted once everything is back up). I've adjusted:

```yaml
command: [
  "-c", "max_connections=500",
  "-c", "max_parallel_workers_per_gather=8",
  "-c", "shared_buffers=8GB",
  "-c", "work_mem=2GB",
  "-c", "max_wal_size=4GB"
]
shm_size: '10gb'
oom_score_adj: -200
```

I've also adjusted the max. number of connections the DB pool of the record indexer keeps to
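Not part of the indexer itself, but a generic Postgres catalog query along these lines can show which tables and partitions autovacuum has (or hasn't) gotten to recently, if you want to watch it alongside the dashboard:

```sql
-- Generic Postgres monitoring query; assumes nothing about the indexer's schema.
-- Lists tables/partitions by last autovacuum time, never-vacuumed ones first.
SELECT relname,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST
LIMIT 20;
```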
Hm, I guess I updated the regex since then to properly categorize PDSs. I've committed the current version of my dashboard just now; you can check it out too.
Yeah, even though ZFS should free up some memory under pressure, to the virtual memory subsystem it looks like "allocated" instead of "buffers", so it can't treat it as available for reuse.
I'm limiting ARC to 4GB on my machine, I don't want ZFS to eat too much RAM. And no, you don't have to reboot, you can adjust it at runtime:

```sh
sudo sh -c 'echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max'
```
Compose file still has
4k commits/s is not too bad, should be good enough at least until the next big influx of users, and maybe even for some time afterwards.
Did record-indexer finish the backfill? If it did, the load from it should be minimal and not impact the consumer too much. What I was doing was taking turns between consumer and record-indexer: let the consumer catch up, switch to record-indexer for the night, rinse and repeat.
Hello again! I've been keeping an eye on the dashboards and tweaked a few settings here and there. I'm now caught up and sailing along (until the disks are full, anyway). This is my current Postgres configuration:
It still died occasionally while backfilling, but it's stable now. I also turned full page writes off, according to some tutorial about Postgres on ZFS:

```sql
ALTER SYSTEM SET full_page_writes = off;
```

Weirdly, while backfilling, the bottleneck seemed to be the disk. I'm running this on some cloud VM, which supposedly has NVMes. I got pretty much exactly 40 MB/s of disk writes, which looked suspiciously like some throttling was going on somewhere. I guess I'll go for bare metal next time. Or maybe I should play around with wider partitioning? Do you know whether I can un-partition and re-partition in place? Otherwise I'm afraid the disk won't be large enough... Anyway, I did some CSV exports, which took a respectable 13 hours for all follows, likes, user profiles, and post languages. 😊
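If it's useful: `full_page_writes` only needs a config reload, not a restart, so after the `ALTER SYSTEM` something like this should apply and confirm it (standard Postgres, nothing indexer-specific):

```sql
-- ALTER SYSTEM writes to postgresql.auto.conf; full_page_writes picks up
-- the new value on a config reload.
SELECT pg_reload_conf();   -- apply the change without restarting
SHOW full_page_writes;     -- should now report "off"
```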
Hm, yeah, it probably makes sense especially with compression enabled, since Postgres's idea of block boundaries wouldn't match reality. Overall you want it to never OOM :) It might be worth tweaking the RAM limits for the other containers:
Hello, it's me again!
I finally got around to giving the whole thing a spin. While partitioning, I noticed that the migration partitions by `repo` instead of `collection` (which is what it says in the README). I played around a bit, with not so many records, but still:

Script to un-partition:
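(A minimal sketch of the general approach, assuming a partitioned table named `records`; this is an illustration only, not the actual script referred to here: copy the rows into a plain table and swap the names.)

```sql
-- Illustration only, not the un-partitioning script referenced above.
-- Assumes a partitioned table called "records"; verify the copy before dropping.
BEGIN;
CREATE TABLE records_flat (LIKE records INCLUDING DEFAULTS);
INSERT INTO records_flat SELECT * FROM records;
ALTER TABLE records RENAME TO records_partitioned;
ALTER TABLE records_flat RENAME TO records;
COMMIT;
-- DROP TABLE records_partitioned;  -- once the copy is verified
```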
and then run `docker compose up update-db-schema` to re-create the indices correctly.

No partitioning
Pros (maybe)
Cons (maybe)
Query performance
Partition by `repo`
Pros (maybe)
Cons (maybe)
Query performance
Partition by `collection`
Pros (maybe)
Cons (maybe)
Query performance
Partition by `repo`, then sub-partition by `collection` (sketched below)
Pros (maybe)
Cons (maybe)
Query performance
Opinions?
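For concreteness, a minimal sketch of what the last option (hash by `repo`, sub-partitioned by `collection`) could look like in Postgres; the table layout, partition counts, and collection names are illustrative assumptions, not the indexer's schema:

```sql
-- Illustration of option 4: hash-partition by repo, then list-sub-partition
-- each bucket by collection. Names and counts are placeholders.
CREATE TABLE records (
    repo       text NOT NULL,
    collection text NOT NULL,
    rkey       text NOT NULL,
    content    jsonb
) PARTITION BY HASH (repo);

CREATE TABLE records_r0 PARTITION OF records
    FOR VALUES WITH (MODULUS 2, REMAINDER 0)
    PARTITION BY LIST (collection);

CREATE TABLE records_r0_posts PARTITION OF records_r0
    FOR VALUES IN ('app.bsky.feed.post');
CREATE TABLE records_r0_rest PARTITION OF records_r0 DEFAULT;
-- ... repeat for the remaining repo buckets and collections.
```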