
[Bug] HugeClient batch write fails with: java.lang.IllegalArgumentException: the max length of bytes is 65535, but get 337492. #2291

dongma opened this issue Aug 21, 2023 · 14 comments
dongma commented Aug 21, 2023

Bug Type

rest-api (behavior not as expected)

Before submit

  • I have confirmed and searched that there are no similar problems in the existing Issues / FAQ

Environment

  • Server Version: 1.0.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD
  • OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
  • Data Size: xx vertices, xx edges

Graph database version: 1.0.0
Storage backend: HBase
Data size: 100M+ vertices, 100M+ edges. The vertices were written successfully; the edges carry many property fields and are written in batches of about 1,000.

Expectation: the request body is larger than 65535 bytes; is there a server-side parameter that can be adjusted (which parameter limits the request body size)?

Expected & Actual behavior

When writing data in batches with HugeClient, a single request body exceeded the server's default 65535 limit, so the edge (relationship) data could not be written.
The exception stack trace follows:

java.lang.IllegalArgumentException: the max length of bytes is 65535, but get 337492.
at org.apache.hugegraph.exception.ServerException.fromResponse(ServerException.java:45)
at org.apache.hugegraph.client.RestClient.checkStatus(RestClient.java:91)
at org.apache.hugegraph.rest.AbstractRestClient.post(AbstractRestClient.java:232)
at org.apache.hugegraph.api.graph.EdgeAPI.create(EdgeAPI.java:58)
at org.apache.hugegraph.driver.GraphManager.addEdges(GraphManager.java:262)
at org.apache.hugegraph.driver.GraphManager.addEdges(GraphManager.java:254)
.....

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@dongma dongma added the bug Something isn't working label Aug 21, 2023

imbajin commented Aug 22, 2023

@DanGuge could you take a look at it? It would be better to make the vertex/edge ID length configurable.


DanGuge commented Aug 22, 2023

@DanGuge could you take a look at it? It would be better to make the vertex/edge ID length configurable.

I will check this later


dongma commented Aug 22, 2023

@DanGuge could you take a look at it? It would be better to make the vertex/edge ID length configurable.

Thanks for your reply. I resolved this problem last night 😄. The reason the records failed to write to the HugeGraph server is that the length of the Text (String) type is limited to 65535.

After checking my data rows, I found that in a few rows one property value's length exceeded 65535; the write succeeded after limiting the property length.

Below is the full logic to check the property value length:
(screenshots attached in the original issue)
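The kind of pre-write check described above can be sketched as follows — a minimal standalone sketch, not the actual code from the screenshots; `TextLengthCheck` and `fitsTextLimit` are names invented here. Note that the server error counts bytes, so the UTF-8 byte length is measured rather than `String.length()` (multi-byte characters make the two differ).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a client-side length check before batch writes.
public class TextLengthCheck {
    // The server-side limit reported in the exception message.
    static final int TEXT_MAX_BYTES = 65535;

    // Measure the UTF-8 encoded byte length, not the character count.
    static boolean fitsTextLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= TEXT_MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fitsTextLimit("ok"));              // short value passes
        System.out.println(fitsTextLimit("é".repeat(40000))); // 80000 bytes, rejected
    }
}
```

Rows whose property values fail this check can then be filtered out or truncated before calling `GraphManager.addEdges`, which matches the workaround described in this thread.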


dongma commented Aug 22, 2023

Closing my issue; the length of a Text property value should be less than 65535.

@dongma dongma closed this as completed Aug 22, 2023

imbajin commented Aug 22, 2023

Closing my issue; the length of a Text property value should be less than 65535.

We know the limitation, and will consider adding an option for users to modify it (likewise for the length of the vertex/edge ID).

LiJie20190102 (Contributor) commented:

Closing my issue; the length of a Text property value should be less than 65535.

We know the limitation, and will consider adding an option for users to modify it (likewise for the length of the vertex/edge ID).

Excuse me, is there a PR that solves this issue?

@imbajin imbajin added improvement General improvement todo and removed bug Something isn't working labels Jan 26, 2024
@imbajin imbajin moved this to 🆕 New in HugeGraph Tasks Jan 26, 2024

imbajin commented Jan 26, 2024

Closing my issue; the length of a Text property value should be less than 65535.

We know the limitation, and will consider adding an option for users to modify it (likewise for the length of the vertex/edge ID).

Excuse me, is there a PR that solves this issue?

Thanks for the reminder, I'll address it again.

LiJie20190102 (Contributor) commented:

I want to implement this function; can it be assigned to me?


imbajin commented Jan 26, 2024

I want to implement this function; can it be assigned to me?

@LiJie20190102 If you want to make the property length configurable, we could reopen this issue & link the PR to it.
If you only want to configure the vertex ID length, you could submit a new issue (better).


dongma commented Jan 26, 2024

Closing my issue; the length of a Text property value should be less than 65535.

We know the limitation, and will consider adding an option for users to modify it (likewise for the length of the vertex/edge ID).

Excuse me, is there a PR that solves this issue?

@LiJie20190102 I haven't created a PR in this repository to fix the problem; "extract the limitation into a configuration" is the suggestion from the database developers.
We filtered out the error rows containing longer fields (more than 65535) when importing data into HugeGraph.

LiJie20190102 (Contributor) commented:

I want to implement this function; can it be assigned to me?

@LiJie20190102 If you want to make the property length configurable, we could reopen this issue & link the PR to it. If you only want to configure the vertex ID length, you could submit a new issue (better).

I want to do both

@imbajin imbajin reopened this Jan 26, 2024
@imbajin imbajin added this to the 1.3.0 milestone Jan 26, 2024

LiJie20190102 commented Feb 6, 2024

I think a good solution is not to limit the length of vertex/edge IDs or attributes, but to dynamically change the attribute values of org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, because in business use the length of vertex IDs sometimes really does need to be this long. In such cases we should not limit their length, but rather modify the limit to accept the corresponding value. WDYT @imbajin @dongma


imbajin commented Feb 6, 2024

I think a good solution is not to limit the length of vertex/edge IDs or attributes, but to dynamically change the attribute values of org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, because in business use the length of vertex IDs sometimes really does need to be this long. In such cases we should not limit their length, but rather modify the limit to accept the corresponding value. WDYT @imbajin @dongma

Var-length occupation generally affects read and write performance, including but not limited to the impact of EID filling/occupation and property scanning. Would it be more appropriate to first consider allowing users to customize the configuration? (Then we could try to test the var-length way.)
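A minimal sketch of what such a user-customizable limit might look like — the property name `hugegraph.text.max-bytes` and the class below are invented for illustration and are not real HugeGraph options or code:

```java
// Hypothetical sketch only: read the limit once at startup from a system
// property, falling back to the historical 65535 default, instead of
// hard-coding the constant in the serializer.
public class ConfigurableLimit {
    static final int STRING_LEN_MAX =
            Integer.getInteger("hugegraph.text.max-bytes", 65535);

    // Reproduce the check that produced the error in this issue, but
    // against the configurable limit rather than a fixed constant.
    static void checkStringLength(int encodedBytes) {
        if (encodedBytes > STRING_LEN_MAX) {
            throw new IllegalArgumentException(
                "the max length of bytes is " + STRING_LEN_MAX +
                ", but get " + encodedBytes);
        }
    }
}
```

With this pattern a user could raise the limit via `-Dhugegraph.text.max-bytes=…` without code changes, while the default behavior stays identical to today's.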

@dosu-bot What's your suggestion/analysis?


dosubot bot commented Feb 6, 2024

🤖

Hey @imbajin, great to see you back with another insightful query! Hope you're doing well.

Thank you for your detailed explanation and suggestion. However, allowing users to customize configurations to dynamically change the attribute values of org.apache.hugegraph.backend.serializer.BytesBuffer, such as UINT8_MAX, could potentially lead to issues.

The constants defined in BytesBuffer class like UINT8_MAX, UINT16_MAX, UINT32_MAX, ID_LEN_MAX, BIG_ID_LEN_MAX, STRING_LEN_MAX, BLOB_LEN_MAX, etc. are used to define the maximum size of different data types and identifiers. These constants are used throughout the codebase to ensure data integrity and consistency.

Changing these values dynamically could lead to unpredictable behavior, data corruption, or even system crashes. For example, if you increase the UINT8_MAX value, it could lead to overflow issues when the data is read back because the system expects the data to be within the original range.
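That wrap-around risk can be made concrete with a toy illustration — this assumes a fixed 2-byte length prefix purely for the sake of the example, and is not HugeGraph's actual serialization:

```java
// Toy illustration (assumed fixed-width encoding, not HugeGraph's actual
// serializer): a length written into a 2-byte prefix wraps modulo 65536,
// so a value above 65535 reads back as something much smaller.
public class LengthWrap {
    static int roundTrip2Bytes(int length) {
        byte hi = (byte) (length >>> 8); // only the low 16 bits survive
        byte lo = (byte) length;
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        // The 337492-byte value from this issue would silently come back
        // as 337492 % 65536 = 9812 under such an encoding.
        System.out.println(roundTrip2Bytes(337492));
    }
}
```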

Moreover, these constants are also used to allocate memory for buffers. Dynamically changing these values could lead to inefficient memory usage or even out-of-memory errors if the values are set too high.

Therefore, it's not recommended to allow users to customize these configurations dynamically. If you need to handle larger IDs or attributes, it would be better to consider changing the data model or using a different data type that can accommodate larger values.

If you still want to proceed with this, it would require a significant amount of testing to ensure the system behaves correctly under different configurations. It's also important to provide clear documentation and warnings to users about the potential risks of changing these values.

As for the role of org.apache.hugegraph.backend.serializer.BytesBuffer in the HugeGraph database, it is a utility for reading and writing binary data. It extends the OutputStream class and provides methods for writing and reading various data types to and from a byte buffer. You can find more details about its role in the source code.

I hope this helps. If you have any other questions or need further clarification, feel free to ask.


@imbajin imbajin added the good first issue Good for newcomers label Apr 15, 2024
@imbajin imbajin moved this from 🆕 New to Newcomer Task in HugeGraph Tasks Apr 15, 2024
imbajin added a commit that referenced this issue Aug 10, 2024
…ncode/decode string id length (#2622)

As title, change limit:
- vid max to 16KB
- eid max to 64KB (128KB as backup)
- property max to 10MB (keep consistent)

fix #1593 #2291

---------

Co-authored-by: imbajin <[email protected]>