Pass instant from aptosdb for calculating latency metric #15678
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
// Copyright © Aptos Foundation | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
use crate::metrics::INDEXER_DB_LATENCY; | ||
use anyhow::Result; | ||
use aptos_config::config::{internal_indexer_db_config::InternalIndexerDBConfig, NodeConfig}; | ||
use aptos_db_indexer::{ | ||
|
@@ -9,13 +10,12 @@ use aptos_db_indexer::{ | |
indexer_reader::IndexerReaders, | ||
}; | ||
use aptos_indexer_grpc_utils::counters::{log_grpc_step, IndexerGrpcStep}; | ||
use aptos_logger::info; | ||
use aptos_storage_interface::DbReader; | ||
use aptos_types::{indexer::indexer_db_reader::IndexerReader, transaction::Version}; | ||
use std::{ | ||
path::{Path, PathBuf}, | ||
sync::Arc, | ||
time::Duration, | ||
time::Instant, | ||
}; | ||
use tokio::{runtime::Handle, sync::watch::Receiver as WatchReceiver}; | ||
|
||
|
@@ -24,14 +24,14 @@ const INTERNAL_INDEXER_DB: &str = "internal_indexer_db"; | |
|
||
pub struct InternalIndexerDBService { | ||
pub db_indexer: Arc<DBIndexer>, | ||
pub update_receiver: WatchReceiver<Version>, | ||
pub update_receiver: WatchReceiver<(Instant, Version)>, | ||
} | ||
|
||
impl InternalIndexerDBService { | ||
pub fn new( | ||
db_reader: Arc<dyn DbReader>, | ||
internal_indexer_db: InternalIndexerDB, | ||
update_receiver: WatchReceiver<Version>, | ||
update_receiver: WatchReceiver<(Instant, Version)>, | ||
) -> Self { | ||
let internal_db_indexer = Arc::new(DBIndexer::new(internal_indexer_db, db_reader)); | ||
Self { | ||
|
@@ -166,31 +166,30 @@ impl InternalIndexerDBService { | |
|
||
pub async fn run(&mut self, node_config: &NodeConfig) -> Result<()> { | ||
let mut start_version = self.get_start_version(node_config).await?; | ||
let mut target_version = self.db_indexer.main_db_reader.ensure_synced_version()?; | ||
let mut step_timer = std::time::Instant::now(); | ||
|
||
loop { | ||
let start_time: std::time::Instant = std::time::Instant::now(); | ||
let next_version = self.db_indexer.process_a_batch(start_version)?; | ||
|
||
if next_version == start_version { | ||
if let Ok(recv_res) = | ||
tokio::time::timeout(Duration::from_millis(100), self.update_receiver.changed()) | ||
.await | ||
{ | ||
if recv_res.is_err() { | ||
info!("update sender is dropped"); | ||
return Ok(()); | ||
} | ||
if target_version <= start_version { | ||
match self.update_receiver.changed().await { | ||
Ok(_) => { | ||
(step_timer, target_version) = *self.update_receiver.borrow(); | ||
}, | ||
Err(e) => { | ||
panic!("Failed to get update from update_receiver: {}", e); | ||
}, | ||
Comment on lines +178 to +180
Comment: This is how this thread knows the main db has quit, right? In that case we should return
And I realized even in the previous logic, this thread doesn't have a chance to quit until hitting the target version? It can be an issue when the first time the indexer is enabled? (granted we usually quit the maindb by killing the whole process, it'd be better if we deal with this case gracefully, if not too complicated.)
Reply: ugh, in that case the indexer loop never quits? That should be an issue in tests? Shall we implement graceful quitting (separately)? @grao1991 ?
Reply: I don't disagree.
Reply: there is a method called "run_with_end_version" for testing
||
} | ||
continue; | ||
}; | ||
} | ||
let next_version = self.db_indexer.process(start_version, target_version)?; | ||
INDEXER_DB_LATENCY.set(step_timer.elapsed().as_millis() as i64); | ||
log_grpc_step( | ||
SERVICE_TYPE, | ||
IndexerGrpcStep::InternalIndexerDBProcessed, | ||
Some(start_version as i64), | ||
Some(next_version as i64), | ||
None, | ||
None, | ||
Some(start_time.elapsed().as_secs_f64()), | ||
Some(step_timer.elapsed().as_secs_f64()), | ||
Comment: Same issue as yesterday: before it's caught up you have already logged the latency, which is not accurate?
Reply: Each loop iteration is now blocked on the notification of a write to the main db. Previously, each loop iteration was only one batch of all the updates, so this should reflect the latency.
Reply: I see.
||
None, | ||
Some((next_version - start_version) as i64), | ||
None, | ||
|
@@ -205,18 +204,14 @@ impl InternalIndexerDBService { | |
node_config: &NodeConfig, | ||
end_version: Option<Version>, | ||
) -> Result<()> { | ||
let mut start_version = self.get_start_version(node_config).await?; | ||
while start_version <= end_version.unwrap_or(std::u64::MAX) { | ||
let next_version = self.db_indexer.process_a_batch(start_version)?; | ||
if next_version == start_version { | ||
tokio::time::sleep(std::time::Duration::from_millis(100)).await; | ||
continue; | ||
} | ||
start_version = next_version; | ||
let start_version = self.get_start_version(node_config).await?; | ||
let end_version = end_version.unwrap_or(std::u64::MAX); | ||
let mut next_version = start_version; | ||
while next_version < end_version { | ||
next_version = self.db_indexer.process(start_version, end_version)?; | ||
// We shouldn't stop the internal indexer so that internal indexer can catch up with the main DB | ||
tokio::time::sleep(std::time::Duration::from_secs(1)).await; | ||
} | ||
// We should never stop the internal indexer | ||
tokio::time::sleep(std::time::Duration::from_secs(100)).await; | ||
|
||
Ok(()) | ||
} | ||
} | ||
|
@@ -230,7 +225,7 @@ impl MockInternalIndexerDBService { | |
pub fn new_for_test( | ||
db_reader: Arc<dyn DbReader>, | ||
node_config: &NodeConfig, | ||
update_receiver: WatchReceiver<Version>, | ||
update_receiver: WatchReceiver<(Instant, Version)>, | ||
end_version: Option<Version>, | ||
) -> Self { | ||
if !node_config | ||
|
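The change above stamps each main DB update with an `Instant` and lets the indexer loop report how long ago that stamp was taken once the update has been processed. A minimal sketch of that idea follows. It is not the PR's code: it assumes `std::sync::mpsc` as a stand-in for the `tokio::sync::watch` channel (a watch channel keeps only the latest value, while `mpsc` queues every message), `Version` is a hypothetical local alias, and the 5 ms sleep merely simulates indexing work.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for `aptos_types::transaction::Version`.
type Version = u64;

/// Drains the channel and returns `(version, latency_ms)` for every update.
/// The latency is measured from the instant the producer attached to the
/// update, not from the moment the consumer picked it up.
fn consume(rx: mpsc::Receiver<(Instant, Version)>) -> Vec<(Version, u128)> {
    let mut observed = Vec::new();
    // `recv` blocks until an update arrives and returns `Err` once every
    // sender has been dropped, which ends the loop.
    while let Ok((committed_at, version)) = rx.recv() {
        // Simulated indexing work (assumption for the sketch).
        thread::sleep(Duration::from_millis(5));
        observed.push((version, committed_at.elapsed().as_millis()));
    }
    observed
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let consumer = thread::spawn(move || consume(rx));

    // Producer side: stamp each update at "commit" time.
    for version in 1..=3u64 {
        tx.send((Instant::now(), version)).unwrap();
    }
    drop(tx); // closing the channel lets the consumer finish

    for (version, latency_ms) in consumer.join().unwrap() {
        println!("version {version}: {latency_ms} ms after the main DB write");
    }
}
```

Because the timestamp travels with the version, time spent waiting in the channel counts toward the reported latency, which is the point the reply in the review thread makes about the new loop.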
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
// Copyright © Aptos Foundation | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
use aptos_metrics_core::{register_int_gauge, IntGauge}; | ||
use once_cell::sync::Lazy; | ||
|
||
pub static INDEXER_DB_LATENCY: Lazy<IntGauge> = Lazy::new(|| { | ||
register_int_gauge!( | ||
"aptos_internal_indexer_latency", | ||
"The latency between main db update and data written to indexer db" | ||
) | ||
.unwrap() | ||
}); |
Comment: this will error out with an empty db (before genesis is put in), is that intentional?
Reply: yes, internal indexer is supposed to start after main db bootstrapped.
Reply: okay..
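The review thread on the `update_receiver.changed()` error arm asks whether the loop could return when the sender is dropped instead of panicking, and the participants leave graceful quitting as possible separate work. The sketch below only illustrates that alternative and is not part of this PR. It assumes `std::sync::mpsc` in place of the tokio watch channel, and `run` and `Version` are hypothetical names introduced for the example.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

// Hypothetical stand-in for `aptos_types::transaction::Version`.
type Version = u64;

/// Hypothetical indexer loop: follows every announced target version and
/// returns the last indexed version once the sending side is gone.
fn run(update_receiver: Receiver<Version>) -> Result<Version, String> {
    let mut indexed: Version = 0;
    loop {
        match update_receiver.recv() {
            Ok(target) => {
                if target < indexed {
                    return Err(format!("target {target} is behind indexed {indexed}"));
                }
                // Real code would process (indexed, target] here.
                indexed = target;
            }
            // All senders were dropped: treat this as a shutdown signal
            // and return normally rather than panicking.
            Err(_) => return Ok(indexed),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || run(rx));

    for version in [10, 20, 30] {
        tx.send(version).unwrap();
    }
    drop(tx); // simulates the main DB going away

    println!("indexer loop exited with {:?}", handle.join().unwrap());
}
```

Returning `Ok` on a closed channel lets a test or an orderly shutdown join the indexer thread, whereas the panic in the diff makes a dropped sender a hard failure.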