[Bug] ArchiveSelector::getArchiveIdsAndStates can sometimes select old data #23085

Open
4 tasks done
diosmosis opened this issue Feb 28, 2025 · 2 comments
Labels
Potential Bug: Something that might be a bug, but needs validation and confirmation it can be reproduced.
To Triage: An issue awaiting triage by a Matomo core team member.

diosmosis commented Feb 28, 2025

What happened?

In some cases, archive data is written such that an older archive (i.e., one with an older ts_archived value) has a higher idarchive than a newer archive. I don't know what causes archive data like this to be created, but it results in the UI displaying old, inaccurate data instead of up-to-date data.

This happens randomly, but fairly reliably, in the Matomo for WordPress builds, like this one: https://productionresultssa3.blob.core.windows.net/actions-results/702dc1ef-41d7-4581-a299-5545a6d5837a/workflow-job-run-37007988-8392-5f7a-0b46-d397a89a2823/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-02-28T06%3A23%3A50Z&sig=qatnq0Z8jvxy5PModjE4%2FWiHReutsK5oX8M1FjUL5Dk%3D&ske=2025-02-28T14%3A42%3A34Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-02-28T02%3A42%3A34Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-02-28T06%3A13%3A45Z&sv=2025-01-05 (note: there are a lot of logs here; the relevant logs start at the "found visits 3" line).

In the above case, there are 7 visits tracked total, and the archive that contains this correct metric has this data:

[
  {
    idarchive: '19',
    name: 'done',
    idsite: '1',
    date1: '2023-12-20',
    date2: '2023-12-20',
    period: '1',
    ts_archived: '2025-02-28 06:07:45',
    value: '1'
  },
  {
    idarchive: '19',
    name: 'nb_visits',
    idsite: '1',
    date1: '2023-12-20',
    date2: '2023-12-20',
    period: '1',
    ts_archived: '2025-02-28 06:07:45',
    value: '7'
  },
]

The archive that gets used when VisitsSummary.get is called, however, is the older one:

[
  {
    idarchive: '373',
    name: 'done',
    idsite: '1',
    date1: '2023-12-20',
    date2: '2023-12-20',
    period: '1',
    ts_archived: '2025-02-28 06:02:57',
    value: '4'
  },
  {
    idarchive: '373',
    name: 'nb_visits',
    idsite: '1',
    date1: '2023-12-20',
    date2: '2023-12-20',
    period: '1',
    ts_archived: '2025-02-28 06:02:57',
    value: '3'
  }
]

For some reason, the idarchive of the newer archive (19) is lower than that of the older one (373).
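
To make the inversion concrete, here is a minimal sketch that scans archive rows for pairs where the older archive has the higher idarchive. It only uses the idarchive and ts_archived values from the two dumps above; the function name `find_inversions` and the row layout (ids converted to integers) are assumptions made for illustration, not anything from Matomo itself.

```python
from datetime import datetime
from itertools import combinations

# idarchive / ts_archived pairs taken from the two dumps above
# (idarchive converted from string to int for numeric comparison).
rows = [
    {"idarchive": 19, "ts_archived": "2025-02-28 06:07:45"},
    {"idarchive": 373, "ts_archived": "2025-02-28 06:02:57"},
]


def parse_ts(value):
    """Parse a ts_archived string in the 'YYYY-MM-DD HH:MM:SS' format shown above."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def find_inversions(rows):
    """Return (older_id, newer_id) pairs where the older archive has the higher idarchive."""
    inversions = []
    for a, b in combinations(rows, 2):
        older, newer = sorted((a, b), key=lambda r: parse_ts(r["ts_archived"]))
        strictly_older = parse_ts(older["ts_archived"]) < parse_ts(newer["ts_archived"])
        if strictly_older and older["idarchive"] > newer["idarchive"]:
            inversions.append((older["idarchive"], newer["idarchive"]))
    return inversions


print(find_inversions(rows))
```

For the two archives in this report, the only pair flagged is (373, 19): archive 373 was written earlier but carries the larger id.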

What should happen?

The archive with the latest ts_archived value should be selected, rather than the archive with the greatest idarchive. (I believe this can be changed in ArchiveSelector::getArchiveIdsAndStates().)

Or alternatively, newer archive data should always have a greater idarchive value.
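
The difference between the two selection rules can be sketched as follows. This is not the code in ArchiveSelector::getArchiveIdsAndStates(); it is a simplified illustration in which each archive is collapsed into a single record, and the helper names `select_by_idarchive` and `select_by_ts_archived` are hypothetical. The values come from the dumps earlier in this report.

```python
from datetime import datetime

# One record per archive, merging the 'done' and 'nb_visits' rows shown earlier.
# This flattened shape is an assumption made for the example.
archives = [
    {"idarchive": 19, "ts_archived": "2025-02-28 06:07:45", "nb_visits": 7},
    {"idarchive": 373, "ts_archived": "2025-02-28 06:02:57", "nb_visits": 3},
]


def select_by_idarchive(archives):
    """Behaviour described in this report: the greatest idarchive wins."""
    return max(archives, key=lambda a: a["idarchive"])


def select_by_ts_archived(archives):
    """Proposed behaviour: the latest ts_archived wins, idarchive only breaks ties."""
    return max(
        archives,
        key=lambda a: (
            datetime.strptime(a["ts_archived"], "%Y-%m-%d %H:%M:%S"),
            a["idarchive"],
        ),
    )


print(select_by_idarchive(archives)["nb_visits"])    # stale archive 373
print(select_by_ts_archived(archives)["nb_visits"])  # up-to-date archive 19
```

Keeping idarchive as a tie-breaker means archives written within the same second would still resolve the way they do under the id-based rule.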

How can this be reproduced?

It would be very difficult to reproduce this deliberately, since it occurs randomly, though recently I've been seeing it fairly often in the Matomo for WordPress e2e tests.

Matomo version

5.2.2

PHP version

8.1

Server operating system

Linux

What browsers are you seeing the problem on?

No response

Computer operating system

No response

diosmosis added the Potential Bug and To Triage labels on Feb 28, 2025

mneudert commented Mar 4, 2025

Hi @diosmosis,

Can you give us some links pointing directly to the action runs that show this problem? The raw log links have a rather short expiry.

The error looks like the sequence table is getting corrupted. Messing with the sequences is an easy way to force the error you are experiencing, but that should not "just happen" randomly 🤔
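
As a toy illustration of how a disturbed sequence could produce this pattern, consider the sketch below. `ToySequence` is a hypothetical in-memory counter, not Matomo's sequence table or its API, and the start values are chosen only so that the resulting ids mirror the ones in the report. Whether anything like this actually happened in the failing builds is unknown.

```python
class ToySequence:
    """Hypothetical stand-in for an id sequence; NOT Matomo's implementation."""

    def __init__(self, start=0):
        self.current = start

    def next_id(self):
        # Hand out the next id by incrementing the stored counter.
        self.current += 1
        return self.current


seq = ToySequence(start=372)
older_id = seq.next_id()  # id given to the archive written first

# Simulate the stored counter being overwritten with a lower value
# (the "messing with the sequences" scenario, assumed for illustration).
seq.current = 18
newer_id = seq.next_id()  # id given to the archive written later

print(older_id, newer_id)
```

Once the counter has gone backwards, every later archive gets an id below the earlier one until the counter catches up, so any lookup that trusts the greatest idarchive keeps returning the stale archive.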

diosmosis commented

@mneudert Here is a link to the failed action: https://github.com/matomo-org/matomo-for-wordpress/actions/runs/13582282090/job/37970328073

The output has a dump of the archive table, but does not have a dump of the sequence table.

I am currently working around the issue by checking if the archived visit count looks correct, and if not, dropping the table and re-archiving, which seems to work.
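
The shape of that workaround is roughly the following. This is a sketch under stated assumptions rather than the actual test code: `ensure_fresh_archive` and all four callables passed to it are hypothetical placeholders for whatever the e2e suite uses to read the archived visit count, drop the archive table, and trigger archiving.

```python
def ensure_fresh_archive(get_archived_visits, expected_visits,
                         drop_archive_table, rearchive, max_attempts=2):
    """Re-archive from scratch while the archived visit count looks stale.

    All callables are hypothetical hooks supplied by the caller.
    Returns True once the archived count matches expected_visits.
    """
    for _ in range(max_attempts):
        if get_archived_visits() == expected_visits:
            return True
        # Count looks wrong: discard the archive data and rebuild it.
        drop_archive_table()
        rearchive()
    return get_archived_visits() == expected_visits


# Minimal in-memory demo with fake hooks, using the counts from this report.
state = {"visits": 3, "dropped": 0}


def fake_get_archived_visits():
    return state["visits"]


def fake_drop_archive_table():
    state["dropped"] += 1


def fake_rearchive():
    state["visits"] = 7


ok = ensure_fresh_archive(fake_get_archived_visits, 7,
                          fake_drop_archive_table, fake_rearchive)
print(ok, state["dropped"])
```

Bounding the retries keeps a genuinely wrong visit count from looping forever; the function then simply reports failure.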
