
WIP: Prospective demo #22

Open · wants to merge 13 commits into main from prospective_demo
Conversation


@picaultj picaultj commented Feb 3, 2025

TODO

  • Bug (Tab 3, analysis): mismatch of topic id between title+summary and topic evolution+analysis
  • Tab 2 (model configuration): parametrize the LLM-based analysis (choice of criterion; stored in a config file but not yet used)
  • Harmonize the storage location of a user's config files (feeds, models)
  • Tab 3 (analysis): in the info about weak and strong signals, add links to the document sources (URLs) to better explore the results
  • Tab 3 (analysis): split the LLM information into tables for better display
  • Tab 4 (report generation): generate a newsletter based on a selection of topics; add the possibility to edit/download/send by mail/subscribe
  • Bug?: choice of reference_timestamp in reference_new_data
  • Add a feature to replay the past in addition to live data (useful for regenerating models from historical data)
  • Intermittent bug in process_new_data with certain data when saving the weak/strong DataFrames to parquet: ArrowTypeError: ("Expected bytes, got a 'tuple' object", 'Conversion failed for column Documents with type object')
  • Refactor the LLM call for detailed analysis: format the output as JSON to avoid passing the HTML template for formatting
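The ArrowTypeError above typically occurs because pyarrow cannot infer a single schema for an object column that mixes tuples with strings or lists. A minimal workaround sketch (the `make_parquet_safe` helper is hypothetical; only the `Documents` column name comes from the error message) is to normalize the column before saving:

```python
import pandas as pd

def make_parquet_safe(df: pd.DataFrame, column: str = "Documents") -> pd.DataFrame:
    """Normalize an object column so pyarrow can infer a uniform type.

    pyarrow raises ArrowTypeError ("Expected bytes, got a 'tuple' object")
    when a column mixes tuples with other value types; converting every
    tuple to a list gives the column a parquet-friendly shape.
    """
    out = df.copy()
    out[column] = out[column].map(
        lambda v: list(v) if isinstance(v, tuple) else v
    )
    return out
```

The copy keeps the original DataFrame untouched, so the fix can be applied only at serialization time (e.g. `make_parquet_safe(df_weak).to_parquet(path)`).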

@picaultj force-pushed the prospective_demo branch 4 times, most recently from a421abe to 2ffd01e on February 7, 2025 at 09:38
@picaultj (Collaborator, Author) commented:

    # FIXME/TODO
    #   - self.topic_models should not be an attribute of BERTrend -- too much memory consumption after a few iterations
    #   - what we did so far:
    #       * create topic models for each period, store them in self.topic_models
    #       * merge the data after preprocessing of each model
    #   - instead, modify the functions as follows:
    #       * train_topic_models: do a combined operation
    #           no need to store anything other than the last topic model (at least temporarily)
    #           for each period:
    #               combine the operations of training the new topic model and merging
    #               optionally store the newly created model (as a BERTopic serialization, using the function "save_topic_model")
    #               merge the new one with the previous data
    #           that way: no need to store BERTopic models inside the BERTrend instance (memory saving)
    #                     we can serialize BERTrend objects simply as a .dill and restore them the same way
    #   - in the demo, modify how the different states of BERTrend (timestamps) are checked: use the BERTrend object's attributes
    #       (e.g. keys of self.doc_groups)
    #       instead of looking on disk for available topic models
