
Storages: introduce inverted index file format & writer & reader #9844

Open: wants to merge 7 commits into master

@Lloyd-Pottiger (Contributor) commented Feb 6, 2025

What problem does this PR solve?

Issue Number: ref #9843

Problem Summary:

What is changed and how it works?

First part of the inverted index: introduces the inverted index file format, builder, and viewer.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Feb 6, 2025
@ti-chi-bot (bot, Contributor) commented Feb 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lloyd-pottiger, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 6, 2025
```cpp
        RUNTIME_CHECK_MSG(false, "Unsupported index kind: {}", magic_enum::enum_name(index.info.kind));
        break;
    }
    if (auto builder = LocalIndexBuilder::create(index.info); builder)
```

Contributor: What if builder is nullptr? We should at least print some logs for debugging.

@Lloyd-Pottiger changed the title from "Storages: introduce inverted index file format & builder & viewer" to "Storages: introduce inverted index file format & writer & reader" on Mar 3, 2025

@breezewish (Member) left a comment:

Index framework part looks fine.

```cpp
auto data_size = write_buf.count();
auto buf = write_buf.tryGetReadBuffer();
// ColumnFileDataProviderRNLocalPageCache currently does not support read data with fields
options.wbs.log.putPage(index_page_id, 0, buf, data_size, {data_size});
```

@CalvinNeo (Member), Mar 7, 2025: Roughly how large will the inverted index page be? About one BlockSize, or a multiple of BlockSize?

@Lloyd-Pottiger (Contributor Author): The size after compressing MetaSize + BlockCount * BlockSize bytes, i.e. AfterCompressed(MetaSize + BlockCount * BlockSize).

Contributor: Why do we need to set data_sizes if we always read the whole page from disk?

@Lloyd-Pottiger (Contributor Author): ColumnFileDataProviderRNLocalPageCache currently does not support reading data without fields.

```cpp
{
    auto & entry = block.entries[i];
    read_buf.read(reinterpret_cast<char *>(entry.row_ids.data()), entry.row_ids.size() * sizeof(RowID));
}
```

Member: Should we handle EOF failures here? The only cause I can imagine for hitting EOF at this point is data corruption.

@Lloyd-Pottiger (Contributor Author): I will use readStrict instead.
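The difference between a lenient read and a strict one can be sketched with a toy buffer. ToyReadBuffer, read, and readStrict below are hypothetical stand-ins that only mirror the behavior the reply relies on (a strict read throws instead of silently returning fewer bytes); they are not TiFlash's ReadBuffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a forward-only read buffer
// (not TiFlash's ReadBuffer; names and behavior are illustrative).
class ToyReadBuffer
{
public:
    explicit ToyReadBuffer(std::vector<char> data)
        : data_(std::move(data))
    {}

    // Lenient read: copies up to n bytes and returns how many were copied.
    // A short read on a truncated page goes unnoticed unless the caller checks.
    size_t read(char * to, size_t n)
    {
        size_t count = std::min(n, data_.size() - pos_);
        if (count > 0)
            std::memcpy(to, data_.data() + pos_, count);
        pos_ += count;
        return count;
    }

    // Strict read: copies exactly n bytes or throws, so a truncated or
    // corrupted page surfaces as an error instead of as garbage row IDs.
    void readStrict(char * to, size_t n)
    {
        if (read(to, n) != n)
            throw std::runtime_error("Cannot read all data: unexpected EOF");
    }

private:
    std::vector<char> data_;
    size_t pos_ = 0;
};
```

With a 3-byte page, read(out, 8) quietly returns 3, while readStrict(out, 8) throws, which is the failure mode the reviewer expects from corruption.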

```cpp
    block.entries[i].value = value;
    block.entries[i].row_ids.resize(row_ids_size);
}
for (UInt32 i = 0; i < size; ++i)
```

Member: A naive question: will there be cases where we only need the values here and not the row IDs? For example, to count the rows in a given range, it seems we don't need the actual row_ids.
If so, we could save some memory here, letting the local_index_cache hold more and reducing the chance of entries being evicted.

@Lloyd-Pottiger (Contributor Author), Mar 7, 2025: We cannot support aggregation yet.

Member: Is that because we don't have enough time, or because the architecture doesn't support it?

@Lloyd-Pottiger (Contributor Author): We don't have enough time. We already store the size of the row IDs (row_ids_size).
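Because each entry already records row_ids_size, a range count could in principle be answered from the entry metadata alone. A minimal sketch, assuming a block layout modeled on the reader code in this PR (a UInt32 entry count, then one (value, row_ids_size) pair per entry, then the row ID arrays); countInRange, readPod, and appendPod are hypothetical helpers, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Assumed serialized block layout (illustrative only):
//   UInt32 size
//   size x { T value; UInt32 row_ids_size; }
//   row IDs of entry 0, row IDs of entry 1, ...

// Hypothetical helper: reads one trivially copyable value and advances pos.
template <typename T>
T readPod(const std::vector<char> & buf, size_t & pos)
{
    if (pos + sizeof(T) > buf.size())
        throw std::runtime_error("unexpected EOF");
    T v;
    std::memcpy(&v, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

// Hypothetical helper: appends one trivially copyable value (for building test blocks).
template <typename T>
void appendPod(std::vector<char> & buf, T v)
{
    const char * p = reinterpret_cast<const char *>(&v);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Counts rows whose value lies in [begin, end] using only row_ids_size;
// the row ID arrays are never read or allocated.
template <typename T>
uint64_t countInRange(const std::vector<char> & block, T begin, T end)
{
    size_t pos = 0;
    auto size = readPod<uint32_t>(block, pos);
    uint64_t count = 0;
    for (uint32_t i = 0; i < size; ++i)
    {
        auto value = readPod<T>(block, pos);
        auto row_ids_size = readPod<uint32_t>(block, pos);
        if (begin <= value && value <= end)
            count += row_ids_size;
    }
    return count;
}
```

Keeping only (value, row_ids_size) pairs in memory for such queries is also what would let the cache hold more entries, as the reviewer suggested.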

```cpp
"Inverted index operation duration", \
Histogram, \
F(type_build, {{"type", "build"}}, ExpBuckets{0.001, 2, 20}), \
F(type_download, {{"type", "download"}}, ExpBuckets{0.001, 2, 20}), \
```

Member: Where will this metric be updated?

@Lloyd-Pottiger (Contributor Author):
build: in ~InvertedIndexWriterInternal.
download: will be used in DMFileInvertedIndexReader, which is not included in this PR.
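To see what range ExpBuckets{0.001, 2, 20} covers, the bucket bounds can be listed. This assumes the usual Prometheus-style exponential semantics (bound i = start * base^i for i in [0, size)) and that durations are recorded in seconds; expBucketBounds is a hypothetical helper, not a TiFlash function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Upper bounds of an exponential histogram, assuming Prometheus-style
// semantics: bound[i] = start * base^i for i in [0, size).
// expBucketBounds is a hypothetical helper for illustration.
std::vector<double> expBucketBounds(double start, double base, size_t size)
{
    std::vector<double> bounds;
    bounds.reserve(size);
    for (size_t i = 0; i < size; ++i)
        bounds.push_back(start * std::pow(base, static_cast<double>(i)));
    return bounds;
}
```

Under those assumptions the smallest bound is 1 ms and the largest finite bound is 0.001 * 2^19 ≈ 524 s; anything slower falls into the +Inf bucket.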

@breezewish (Member) left a comment:

The rest looks good.

```cpp
}

template <typename T>
void Block<T>::search(BitmapFilterPtr & bitmap_filter, ReadBuffer & read_buf, T key)
```

Member: I'm worried about the performance, as this involves a lot of syscalls (even though the underlying page is possibly cached).

@Lloyd-Pottiger (Contributor Author): The buffer size is 1 MB by default, so it may be acceptable.

Member: I think this is a purely forward-reading scenario, so the buffer should be adequate.
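The buffering argument can be made concrete: in a forward-only scan, small reads are served from memory and the source is touched only when the buffer runs dry. CountingBufferedReader is a hypothetical toy (not TiFlash's file read buffer) that treats each refill as one syscall.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical toy reader: serves reads from a fixed-size buffer and
// refills it from the backing source only when it is exhausted.
// Each refill stands in for one read syscall.
class CountingBufferedReader
{
public:
    CountingBufferedReader(std::vector<char> source, size_t buffer_size)
        : source_(std::move(source))
        , buffer_(std::max<size_t>(buffer_size, 1))
    {}

    // Reads up to n bytes; returns the number of bytes copied.
    size_t read(char * to, size_t n)
    {
        size_t copied = 0;
        while (copied < n)
        {
            if (buf_pos_ == buf_end_ && !refill())
                break; // source exhausted
            size_t chunk = std::min(n - copied, buf_end_ - buf_pos_);
            std::memcpy(to + copied, buffer_.data() + buf_pos_, chunk);
            buf_pos_ += chunk;
            copied += chunk;
        }
        return copied;
    }

    size_t refills() const { return refills_; }

private:
    bool refill()
    {
        if (src_pos_ == source_.size())
            return false;
        size_t len = std::min(buffer_.size(), source_.size() - src_pos_);
        std::memcpy(buffer_.data(), source_.data() + src_pos_, len);
        src_pos_ += len;
        buf_pos_ = 0;
        buf_end_ = len;
        ++refills_;
        return true;
    }

    std::vector<char> source_;
    std::vector<char> buffer_;
    size_t src_pos_ = 0;
    size_t buf_pos_ = 0;
    size_t buf_end_ = 0;
    size_t refills_ = 0;
};
```

With a 1 MiB buffer, scanning a 4 MiB source in 4-byte reads performs about a million read calls but only 4 refills.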

```cpp
{
    UInt32 size;
    readIntBinary(size, read_buf);
    UInt32 seek_offset = size * (sizeof(T) + sizeof(UInt32));
```

Member: Interesting, why not simply use an absolute seek (whence=SEEK_SET)? It could possibly make this simpler.

@Lloyd-Pottiger (Contributor Author): ReadBuffer does not support seeking.
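Without seek, a block can still be searched in one forward pass: scan the entry metadata, sum the row ID counts stored before the matching entry, skip that many bytes, and read only the matching row IDs. This sketch assumes a layout of a UInt32 entry count, then (value, row_ids_size) pairs, then the row ID arrays, with 32-bit row IDs; ForwardReader, ignore, and searchBlock are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical forward-only reader: it can read or skip bytes, never go back.
class ForwardReader
{
public:
    explicit ForwardReader(std::vector<char> data)
        : data_(std::move(data))
    {}

    void readStrict(void * to, size_t n)
    {
        if (n > data_.size() - pos_)
            throw std::runtime_error("unexpected EOF");
        if (n > 0)
            std::memcpy(to, data_.data() + pos_, n);
        pos_ += n;
    }

    // Skips n bytes forward: the only kind of "seek" this reader offers.
    void ignore(size_t n)
    {
        if (n > data_.size() - pos_)
            throw std::runtime_error("unexpected EOF");
        pos_ += n;
    }

private:
    std::vector<char> data_;
    size_t pos_ = 0;
};

// Returns the row IDs stored for key, or an empty vector if key is absent.
template <typename T>
std::vector<uint32_t> searchBlock(ForwardReader & reader, T key)
{
    uint32_t size = 0;
    reader.readStrict(&size, sizeof(size));

    // Scan all metadata (it must be consumed to reach the row IDs), finding
    // the key and counting the row IDs stored before it.
    bool found = false;
    uint32_t match_count = 0;
    uint64_t rows_before = 0;
    for (uint32_t i = 0; i < size; ++i)
    {
        T value{};
        uint32_t row_ids_size = 0;
        reader.readStrict(&value, sizeof(value));
        reader.readStrict(&row_ids_size, sizeof(row_ids_size));
        if (found)
            continue;
        if (value == key)
        {
            found = true;
            match_count = row_ids_size;
        }
        else
            rows_before += row_ids_size;
    }

    std::vector<uint32_t> row_ids;
    if (!found)
        return row_ids;
    // Relative forward skip: the forward-only substitute for an absolute seek.
    reader.ignore(rows_before * sizeof(uint32_t));
    row_ids.resize(match_count);
    reader.readStrict(row_ids.data(), match_count * sizeof(uint32_t));
    return row_ids;
}
```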

```cpp
T real_key = key;
auto it = index.find(real_key);
if (it != index.end())
    bitmap_filter->set(it->second, nullptr);
```

Member: Existing values in bitmap_filter are not cleared. Is that OK?

@Lloyd-Pottiger (Contributor Author): ok

```cpp
}

template <typename T>
void InvertedIndexMemoryReader<T>::searchRange(BitmapFilterPtr & bitmap_filter, const Key & begin, const Key & end)
```

Member: Is this used for SQL like WHERE x >= ... AND x <= ...?

@Lloyd-Pottiger (Contributor Author): Yes.

  1. x > 0 ==> [1, MAX]
  2. x < 10 ==> [MIN, 9]
  3. x > 0 AND x < 10 ==> [1, 9]
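The mappings in the list turn strict integer bounds into inclusive ones by adding or subtracting 1, and an AND becomes an intersection. A minimal sketch over Int64 keys; ClosedRange, greaterThan, lessThan, and intersect are hypothetical helpers, and the MIN/MAX overflow handling is a corner the list above does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// A closed interval [begin, end] over Int64 keys.
struct ClosedRange
{
    int64_t begin = std::numeric_limits<int64_t>::min();
    int64_t end = std::numeric_limits<int64_t>::max();
};

// x > c  ==>  [c + 1, MAX]; std::nullopt means "matches nothing".
std::optional<ClosedRange> greaterThan(int64_t c)
{
    if (c == std::numeric_limits<int64_t>::max())
        return std::nullopt; // no Int64 is greater than MAX
    return ClosedRange{c + 1, std::numeric_limits<int64_t>::max()};
}

// x < c  ==>  [MIN, c - 1]
std::optional<ClosedRange> lessThan(int64_t c)
{
    if (c == std::numeric_limits<int64_t>::min())
        return std::nullopt; // no Int64 is less than MIN
    return ClosedRange{std::numeric_limits<int64_t>::min(), c - 1};
}

// AND of two predicates == intersection of their ranges.
std::optional<ClosedRange> intersect(const std::optional<ClosedRange> & a, const std::optional<ClosedRange> & b)
{
    if (!a || !b)
        return std::nullopt;
    ClosedRange r{std::max(a->begin, b->begin), std::min(a->end, b->end)};
    if (r.begin > r.end)
        return std::nullopt;
    return r;
}
```

For example, intersect(greaterThan(0), lessThan(10)) yields [1, 9], matching case 3, while x > 10 AND x < 5 yields an empty range that can skip the index lookup entirely.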

Comment on lines +99 to +103

```cpp
case TypeIndex::MyDate:
case TypeIndex::MyDateTime:
case TypeIndex::MyTimeStamp:
    return std::make_shared<InvertedIndexMemoryReader<UInt64>>(buf, index_size);
```

Member: Why do they decay to UInt64? Is there a reference for this?

@Lloyd-Pottiger (Contributor Author):

```cpp
#define COLUMN_TYPES(M) \
M(Decimal, 0, Decimal, Decimal32) \
M(Tiny, 1, VarInt, Int8) \
M(Short, 2, VarInt, Int16) \
M(Long, 3, VarInt, Int32) \
M(Float, 4, Float, Float32) \
M(Double, 5, Float, Float64) \
M(Null, 6, Nil, Nothing) \
M(Timestamp, 7, UInt, MyDateTime) \
M(LongLong, 8, Int, Int64) \
M(Int24, 9, VarInt, Int32) \
M(Date, 10, UInt, MyDate) \
M(Time, 11, Duration, Int64) \
M(Datetime, 12, UInt, MyDateTime) \
M(Year, 13, Int, Int16) \
M(NewDate, 14, Int, MyDate) \
M(Varchar, 15, CompactBytes, String) \
M(Bit, 16, VarInt, UInt64) \
M(JSON, 0xf5, Json, String) \
M(NewDecimal, 0xf6, Decimal, Decimal32) \
M(Enum, 0xf7, VarUInt, Enum16) \
M(Set, 0xf8, VarUInt, UInt64) \
M(TinyBlob, 0xf9, CompactBytes, String) \
M(MediumBlob, 0xfa, CompactBytes, String) \
M(LongBlob, 0xfb, CompactBytes, String) \
M(Blob, 0xfc, CompactBytes, String) \
M(VarString, 0xfd, CompactBytes, String) \
M(String, 0xfe, CompactBytes, String) \
M(Geometry, 0xff, CompactBytes, String) \
M(TiDBVectorFloat32, 0xe1, VectorFloat32, Array)
```

@Lloyd-Pottiger (Contributor Author):

```cpp
class DataTypeMyTimeBase : public DataTypeNumberBase<UInt64>
```
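A UInt64 key works for these types because date-time values are stored in a packed integer form whose numeric order matches chronological order, so the [begin, end] range search applies unchanged. The packing below follows TiDB's packed time format as I understand it (year * 13 + month, then day, hour, minute, second, microseconds); treat the exact bit widths as an assumption, and packDateTime as a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Packs a date-time into a UInt64 so that integer order equals
// chronological order. Assumed layout, high to low bits:
//   (year * 13 + month) | day (5) | hour (5) | minute (6) | second (6) | micro (24)
// A date-only value simply has all time fields set to zero.
uint64_t packDateTime(uint64_t year, uint64_t month, uint64_t day,
                      uint64_t hour, uint64_t minute, uint64_t second, uint64_t micro)
{
    uint64_t ymd = ((year * 13 + month) << 5) | day;
    uint64_t hms = (hour << 12) | (minute << 6) | second;
    return (((ymd << 17) | hms) << 24) | micro;
}
```

Since month is at most 12, year * 13 + month never lets December of one year overtake January of the next, which is what keeps the ordering monotonic.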

```cpp
// Only one of the below will be set
.def_vector_index = idx.vector_index,
});
new_index_infos->emplace_back(LocalIndexInfo(idx.id, column_id, idx.vector_index));
```

Contributor: Will the logic for adding inverted indexes come in a later PR?

@Lloyd-Pottiger (Contributor Author): Yes


Signed-off-by: Lloyd-Pottiger <[email protected]>
Labels: release-note-none, size/XXL
Projects: None yet
5 participants