-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storages: introduce inverted index file format & writer & reader #9844
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
9dcf222
to
df58e9d
Compare
df58e9d
to
17a1b6c
Compare
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTinyLocalIndexWriter.cpp
Outdated
Show resolved
Hide resolved
RUNTIME_CHECK_MSG(false, "Unsupported index kind: {}", magic_enum::enum_name(index.info.kind)); | ||
break; | ||
} | ||
if (auto builder = LocalIndexBuilder::create(index.info); builder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if builder
is nullptr
? Should we at least print some logs for debugging.
17a1b6c
to
07ebbf2
Compare
619ff99
to
4d11099
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index framework part looks fine
804493d
to
d35a94a
Compare
auto data_size = write_buf.count(); | ||
auto buf = write_buf.tryGetReadBuffer(); | ||
// ColumnFileDataProviderRNLocalPageCache currently does not support read data with fields | ||
options.wbs.log.putPage(index_page_id, 0, buf, data_size, {data_size}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What magnitude is the size of the page for the inverted index going to be? Is it BlockSize
or times of BlockSize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AfterCompressed(MetaSize + BlockCount * BlockSize)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set data_sizes
if we always read the whole page from disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnFileDataProviderRNLocalPageCache currently does not support read data withiout fields
{ | ||
auto & entry = block.entries[i]; | ||
read_buf.read(reinterpret_cast<char *>(entry.row_ids.data()), entry.row_ids.size() * sizeof(RowID)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it is good handle some EOF failures here? Because I can only expect a data corruption happens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use readStrict
instead.
block.entries[i].value = value; | ||
block.entries[i].row_ids.resize(row_ids_size); | ||
} | ||
for (UInt32 i = 0; i < size; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a naive question... Is there going to be some cases where we only need the row values here? For example, if we want a count in given range, then seems we don't need the actual row_id
s then?
If so, we can save some memory here, and make the local_index_cache bigger to reduce its possibilities of being evicted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not support agg now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't because we don't have enough time, or we can't because the arch doesn't support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have enough time. We have stored the size of row ids.
"Inverted index operation duration", \ | ||
Histogram, \ | ||
F(type_build, {{"type", "build"}}, ExpBuckets{0.001, 2, 20}), \ | ||
F(type_download, {{"type", "download"}}, ExpBuckets{0.001, 2, 20}), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where will change this metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build: ~InvertedIndexWriterInternal
download: will used in DMFileInvertedIndexReader
which is not included in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest looks good
dbms/src/Storages/DeltaMerge/Index/InvertedIndex/CommonUtil.cpp
Outdated
Show resolved
Hide resolved
} | ||
|
||
template <typename T> | ||
void Block<T>::search(BitmapFilterPtr & bitmap_filter, ReadBuffer & read_buf, T key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about the performance as it involves a lot of syscall (even though the underlying page is possibly cached).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer size is 1MB by default, so maybe it is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a pure looking-forward scene, so the buffer should be adequate.
{ | ||
UInt32 size; | ||
readIntBinary(size, read_buf); | ||
UInt32 seek_offset = size * (sizeof(T) + sizeof(UInt32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, why not simply use an absolute seek? Could be possibly make it simpler (whence=SEEK_SET)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadBuffer
does not support seek
T real_key = key; | ||
auto it = index.find(real_key); | ||
if (it != index.end()) | ||
bitmap_filter->set(it->second, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing values in bitmap_filter is not cleared. Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
} | ||
|
||
template <typename T> | ||
void InvertedIndexMemoryReader<T>::searchRange(BitmapFilterPtr & bitmap_filter, const Key & begin, const Key & end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used for SQLS like WHERE x >= .. and x <= ..
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
- x > 0 ==> [1, MAX]
- x < 10 ==> [MIN, 9]
- x > 0 & x < 10 ==> [1, 9]
case TypeIndex::MyDate: | ||
case TypeIndex::MyDateTime: | ||
case TypeIndex::MyTimeStamp: | ||
return std::make_shared<InvertedIndexMemoryReader<UInt64>>(buf, index_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do they decay to UInt64? Are there any references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiflash/dbms/src/TiDB/Schema/TiDBTypes.h
Lines 26 to 55 in 3cda2f6
#define COLUMN_TYPES(M) \ | |
M(Decimal, 0, Decimal, Decimal32) \ | |
M(Tiny, 1, VarInt, Int8) \ | |
M(Short, 2, VarInt, Int16) \ | |
M(Long, 3, VarInt, Int32) \ | |
M(Float, 4, Float, Float32) \ | |
M(Double, 5, Float, Float64) \ | |
M(Null, 6, Nil, Nothing) \ | |
M(Timestamp, 7, UInt, MyDateTime) \ | |
M(LongLong, 8, Int, Int64) \ | |
M(Int24, 9, VarInt, Int32) \ | |
M(Date, 10, UInt, MyDate) \ | |
M(Time, 11, Duration, Int64) \ | |
M(Datetime, 12, UInt, MyDateTime) \ | |
M(Year, 13, Int, Int16) \ | |
M(NewDate, 14, Int, MyDate) \ | |
M(Varchar, 15, CompactBytes, String) \ | |
M(Bit, 16, VarInt, UInt64) \ | |
M(JSON, 0xf5, Json, String) \ | |
M(NewDecimal, 0xf6, Decimal, Decimal32) \ | |
M(Enum, 0xf7, VarUInt, Enum16) \ | |
M(Set, 0xf8, VarUInt, UInt64) \ | |
M(TinyBlob, 0xf9, CompactBytes, String) \ | |
M(MediumBlob, 0xfa, CompactBytes, String) \ | |
M(LongBlob, 0xfb, CompactBytes, String) \ | |
M(Blob, 0xfc, CompactBytes, String) \ | |
M(VarString, 0xfd, CompactBytes, String) \ | |
M(String, 0xfe, CompactBytes, String) \ | |
M(Geometry, 0xff, CompactBytes, String) \ | |
M(TiDBVectorFloat32, 0xe1, VectorFloat32, Array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class DataTypeMyTimeBase : public DataTypeNumberBase<UInt64> |
// Only one of the below will be set | ||
.def_vector_index = idx.vector_index, | ||
}); | ||
new_index_infos->emplace_back(LocalIndexInfo(idx.id, column_id, idx.vector_index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the logic for adding inverted index be added in a later PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
auto data_size = write_buf.count(); | ||
auto buf = write_buf.tryGetReadBuffer(); | ||
// ColumnFileDataProviderRNLocalPageCache currently does not support read data with fields | ||
options.wbs.log.putPage(index_page_id, 0, buf, data_size, {data_size}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set data_sizes
if we always read the whole page from disk?
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
d35a94a
to
164d2a3
Compare
Signed-off-by: Lloyd-Pottiger <[email protected]>
What problem does this PR solve?
Issue Number: ref #9843
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note