Fix the issue where the charset from Content-Type is not parsed correctly, and refactor the related code. #15188

Merged
merged 1 commit into from
Mar 3, 2025

Conversation

Stellar1999
Contributor

What is the purpose of the change?

Fix the issue where the charset from Content-Type is not parsed correctly, and refactor the related code.
Ref: issue #15187

Checklist

  • Make sure there is a GitHub_issue field for the change.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the necessary unit tests to verify your logic correction. If a new feature or significant change is committed, please remember to add a sample to the dubbo samples project.
  • Make sure GitHub Actions can pass. Why is the workflow failing, and how can it be fixed?

@codecov-commenter

codecov-commenter commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.76%. Comparing base (403e127) to head (6781c13).

Files with missing lines | Patch % | Lines
...o/remoting/http12/message/DefaultHttpResponse.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #15188      +/-   ##
============================================
+ Coverage     60.74%   60.76%   +0.02%     
  Complexity    10897    10897              
============================================
  Files          1885     1885              
  Lines         86071    86070       -1     
  Branches      12894    12892       -2     
============================================
+ Hits          52286    52303      +17     
+ Misses        28332    28319      -13     
+ Partials       5453     5448       -5     
Flag | Coverage Δ
integration-tests | 33.11% <0.00%> (-0.08%) ⬇️
samples-tests | 29.19% <0.00%> (-0.02%) ⬇️
unit-tests | 58.91% <90.90%> (+0.02%) ⬆️

@Stellar1999
Contributor Author

@AlbumenJ @oxsean PTAL

@oxsean oxsean assigned oxsean and Stellar1999 and unassigned oxsean Feb 28, 2025
@oxsean oxsean self-requested a review February 28, 2025 09:25
@Stellar1999 Stellar1999 force-pushed the FIX_HEADER_PARSE_PROBLEM branch from 1dfff91 to 6781c13 Compare March 1, 2025 10:08
@brookqin

brookqin commented Mar 1, 2025

@Stellar1999
First of all, thank you for responding to this issue so promptly.

rfc7231/3.1.1.1. Media Type
media-type = type "/" subtype *( OWS ";" OWS parameter )

According to the description in the RFC document, the content-type may include one or more parameters in addition to the media-type, with charset being just one of these parameters. There is no stipulated order for these parameters, so in extreme cases, relying on index positioning could potentially lead to issues.

Perhaps it would also be worth defining a new method, String mediaTypeParam(String name), to retrieve the value of a specified parameter.

String mediaTypeParam(String name) {
  paramsStr = contentType.substring(X);
  params = paramsStr.split(";")
                    .filter(str -> str.contain("="))
                    .map(str -> str.split("="));
  return params[name];
}

var charset = mediaTypeParam("charset");
var boundary = mediaTypeParam("boundary");
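
The pseudocode above could be turned into runnable Java roughly as follows. This is only a sketch under stated assumptions: the class name MediaTypeParams is hypothetical, the Content-Type value is passed in as an argument rather than read from a field, and quoted values that themselves contain ";" are not handled. It is not the code that was merged in this PR.

```java
// Hypothetical helper, not part of Dubbo: looks up one media-type parameter by name.
final class MediaTypeParams {

    private MediaTypeParams() {}

    /** Returns the value of the named parameter, or null if it is absent. */
    static String mediaTypeParam(String contentType, String name) {
        if (contentType == null || name == null) {
            return null;
        }
        // Parameters start after the first ';' (before it is type "/" subtype).
        int pos = contentType.indexOf(';');
        while (pos >= 0) {
            int next = contentType.indexOf(';', pos + 1);
            String param = next < 0
                    ? contentType.substring(pos + 1)
                    : contentType.substring(pos + 1, next);
            int eq = param.indexOf('=');
            // Parameter names are compared case-insensitively, in any position.
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase(name)) {
                String value = param.substring(eq + 1).trim();
                // Strip the surrounding double quotes of a simple quoted value.
                if (value.length() >= 2
                        && value.charAt(0) == '"'
                        && value.charAt(value.length() - 1) == '"') {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
            pos = next;
        }
        return null;
    }
}
```

With this sketch, MediaTypeParams.mediaTypeParam("multipart/form-data; boundary=abc; charset=UTF-8", "charset") finds the charset even though it is not the first parameter, which is the ordering concern raised above.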

@Stellar1999
Contributor Author

Thanks for your review.
I think my new code does not rely on an index for positioning. (If you mean indexOf, it's only used to get the charset string.) Like you said, maybe we can find a better general method to retrieve more parameters. However, as I see in the Dubbo code, it only relies on the charset. So, I believe we just need a special method to obtain the charset.

@AlbumenJ AlbumenJ merged commit bcd3fc9 into apache:3.3 Mar 3, 2025
19 checks passed
    charset = StringUtils.EMPTY_STRING;
} else {
    charset = contentType.substring(index + CHARSET_PREFIX.length()).trim();
    charset = charset.split(";")[0];
Collaborator

A tip: avoid split here, since it scans and slices the whole string and has to check whether the delimiter is a regex. Use indexOf(char) for the best performance.
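
A minimal sketch of what the indexOf-based variant could look like is shown below. It is not the follow-up change itself: the class name CharsetParsing is hypothetical, CHARSET_PREFIX is assumed to be "charset=" (the diff references the constant without showing its definition), an empty string stands in for StringUtils.EMPTY_STRING, and the method name parseCharset is used purely for illustration.

```java
// Hypothetical sketch, not the merged Dubbo code.
final class CharsetParsing {

    // Assumed value; the diff uses CHARSET_PREFIX without showing its definition.
    private static final String CHARSET_PREFIX = "charset=";

    private CharsetParsing() {}

    /** Returns the charset parameter value, or an empty string if none is present. */
    static String parseCharset(String contentType) {
        if (contentType == null) {
            return "";
        }
        int index = contentType.indexOf(CHARSET_PREFIX);
        if (index < 0) {
            return "";
        }
        int start = index + CHARSET_PREFIX.length();
        // Find the end of the value with indexOf(char) instead of split(";"),
        // so only the part up to the next ';' is examined and copied.
        int end = contentType.indexOf(';', start);
        String charset = end < 0
                ? contentType.substring(start)
                : contentType.substring(start, end);
        return charset.trim();
    }
}
```

One behavioural difference from the snippet under review: this sketch trims after cutting at the ";", so a value followed by whitespace before the next parameter comes back without the trailing space.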

Contributor Author

OK. This PR has already been merged, so I will submit another PR with this fix.
Thanks for your review.

@@ -81,6 +81,22 @@ public static List<HttpCookie> decodeCookies(String value) {
        return cookies;
    }

    public static String getCharsetFromContentType(String contentType) {
Collaborator

parse or resolve would be better than get, for example:
parseCharset(String contentType)
That is clear enough.
