Commit 214f929

shiv0408, msfroh, hasnain2808, evankielley, ruai0511 authored
Change remote state setting conditions (#16)
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)
  * Optimize global ordinal includes/excludes for prefix matching
    If an aggregation specifies includes or excludes based on a regular expression, and the regular expression has a finite expansion followed by .*, then we can optimize the global ordinal filter. Specifically, in this case, we can expand the matching prefixes, then include/exclude the range of global ordinals that start with each prefix.
    Signed-off-by: Michael Froh <[email protected]>
  * Add unit test
    Signed-off-by: Michael Froh <[email protected]>
  * Add changelog entry
    Signed-off-by: Michael Froh <[email protected]>
  * Improve test coverage
    Updated the unit test to be functionally equivalent, but it covers more of the regex logic.
    Signed-off-by: Michael Froh <[email protected]>
  * Improve test coverage
    Signed-off-by: Michael Froh <[email protected]>
  * Fix bug in exclude-only case with no doc values in segment
    Signed-off-by: Michael Froh <[email protected]>
  * Address comments from @mch2
    Signed-off-by: Michael Froh <[email protected]>
  ---------
  Signed-off-by: Michael Froh <[email protected]>

* Adding access to noSubMatches and noOverlappingMatches in Hyphenation… (opensearch-project#13895)
  * Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter
    Signed-off-by: Evan Kielley <[email protected]>
  * Add Changelog Entry
    Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
  * test: add hyphenation decompounder tests
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * test: refactor tests
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * test: reformat test files
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * chore: add changelog entry for 2.X
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * chore: remove 3.x changelog
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * chore: commonify settingsarr
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * chore: commonify settingsarr
    Signed-off-by: Mohammad Hasnain <[email protected]>
  * chore: linting
    Signed-off-by: Mohammad Hasnain <[email protected]>
  ---------
  Signed-off-by: Evan Kielley <[email protected]>
  Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
  Signed-off-by: Mohammad Hasnain <[email protected]>
  Co-authored-by: Evan Kielley <[email protected]>

* Add Settings related to Workload Management feature (opensearch-project#15028)
  * add QueryGroup Service tests
    Signed-off-by: Ruirui Zhang <[email protected]>
  * add PR to changelog
    Signed-off-by: Ruirui Zhang <[email protected]>
  * change the test directory
    Signed-off-by: Ruirui Zhang <[email protected]>
  * modify comments to be more specific
    Signed-off-by: Ruirui Zhang <[email protected]>
  * add test coverage
    Signed-off-by: Ruirui Zhang <[email protected]>
  * remove QUERY_GROUP_RUN_INTERVAL_SETTING as we'll define it in QueryGroupService
    Signed-off-by: Ruirui Zhang <[email protected]>
  * address comments
    Signed-off-by: Ruirui Zhang <[email protected]>

* Update affiliation for @nknize. (opensearch-project#15322)
  Signed-off-by: dblock <[email protected]>

* Add log when download completes with file size (opensearch-project#15224)
  Signed-off-by: Gaurav Bafna <[email protected]>

* Support Filtering on Large List encoded by Bitmap (version update) (opensearch-project#15352)
  Signed-off-by: Andriy Redko <[email protected]>

* Add support for index level slice count setting (opensearch-project#15336)
  Signed-off-by: Ganesh Ramadurai <[email protected]>

* Adding allowlist setting for ingest-useragent and ingest-geoip processors (opensearch-project#15325)
  * Adding allowlist setting for user-agent, geo-ip and updated tests for ingest-common.
    Signed-off-by: Sarat Vemulapalli <[email protected]>
  * Remove duplicate test in ingest-common
    Signed-off-by: Sarat Vemulapalli <[email protected]>
  * Adding changelog
    Signed-off-by: Sarat Vemulapalli <[email protected]>
  ---------
  Signed-off-by: Sarat Vemulapalli <[email protected]>

* Add Delete QueryGroup API Logic (opensearch-project#14735)
  * Add Delete QueryGroup API Logic
    Signed-off-by: Ruirui Zhang <[email protected]>
  * modify changelog
    Signed-off-by: Ruirui Zhang <[email protected]>
  * include comments from create pr
    Signed-off-by: Ruirui Zhang <[email protected]>
  * remove delete all
    Signed-off-by: Ruirui Zhang <[email protected]>
  * rebase and address comments
    Signed-off-by: Ruirui Zhang <[email protected]>
  * rebase
    Signed-off-by: Ruirui Zhang <[email protected]>
  * address comments
    Signed-off-by: Ruirui Zhang <[email protected]>
  * address comments
    Signed-off-by: Ruirui Zhang <[email protected]>
  * address comments
    Signed-off-by: Ruirui Zhang <[email protected]>
  * add UT coverage
    Signed-off-by: Ruirui Zhang <[email protected]>

* [Star Tree] Lucene Abstractions for Star Tree File Formats (opensearch-project#15278)
  ---------
  Signed-off-by: Sarthak Aggarwal <[email protected]>

* [Star tree] Changes to handle derived metrics such as avg as part of star tree mapping (opensearch-project#15152)
  ---------
  Signed-off-by: Bharathwaj G <[email protected]>

* relaxing the join validation for nodes which have only store disabled but only publication enabled
* relaxing the join validation for nodes which have only store disabled but only publication enabled
  Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>

---------
Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Signed-off-by: dblock <[email protected]>
Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Andriy Redko <[email protected]>
Signed-off-by: Ganesh Ramadurai <[email protected]>
Signed-off-by: Sarat Vemulapalli <[email protected]>
Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
Co-authored-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Co-authored-by: Evan Kielley <[email protected]>
Co-authored-by: Ruirui Zhang <[email protected]>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
Co-authored-by: Gaurav Bafna <[email protected]>
Co-authored-by: Andriy Redko <[email protected]>
Co-authored-by: Ganesh Krishna Ramadurai <[email protected]>
Co-authored-by: Sarat Vemulapalli <[email protected]>
Co-authored-by: Sarthak Aggarwal <[email protected]>
Co-authored-by: Bharathwaj G <[email protected]>
Co-authored-by: Rajiv Kumar Vaidyanathan <[email protected]>
1 parent 08e2b50 commit 214f929
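
The prefix optimization described for #14371 works because global ordinals are assigned in sorted term order, so every term that starts with a given prefix occupies one contiguous run of ordinals. The Java sketch below illustrates that idea over a plain sorted `String[]` standing in for the ordinal-to-term mapping. `PrefixOrdinalFilter` and its methods are hypothetical and are not OpenSearch's `IncludeExclude` code; the sketch assumes the finite part of the regex has already been expanded into prefixes (for example `(ap|ban).*` into `ap` and `ban`), and it compares Java strings by UTF-16 code unit, whereas Lucene orders terms by their UTF-8 bytes.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: sortedTerms[i] plays the role of the term for global ordinal i.
// Prefixes are assumed to be pre-expanded from a regex of the form (p1|p2|...).*
public class PrefixOrdinalFilter {

    // First ordinal whose term is >= key (lower-bound binary search).
    static int lowerBound(String[] sortedTerms, String key) {
        int lo = 0, hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Half-open ordinal range [start, end) of the terms that start with prefix.
    static int[] prefixRange(String[] sortedTerms, String prefix) {
        int start = lowerBound(sortedTerms, prefix);
        // From start on, matching terms form one run that sorts before any
        // larger term lacking the prefix, so binary-search for the end of the run.
        int lo = start, hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].startsWith(prefix)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return new int[] { start, lo };
    }

    // Include mode accepts only the prefix ranges; exclude mode accepts everything else.
    static BitSet acceptedOrdinals(String[] sortedTerms, List<String> prefixes, boolean exclude) {
        BitSet accepted = new BitSet(sortedTerms.length);
        if (exclude) {
            accepted.set(0, sortedTerms.length);
        }
        for (String prefix : prefixes) {
            int[] range = prefixRange(sortedTerms, prefix);
            if (exclude) {
                accepted.clear(range[0], range[1]);
            } else {
                accepted.set(range[0], range[1]);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String[] terms = { "apple", "apricot", "banana", "band", "cherry" };
        System.out.println(acceptedOrdinals(terms, Arrays.asList("ap", "ban"), false)); // {0, 1, 2, 3}
        System.out.println(acceptedOrdinals(terms, Arrays.asList("ap"), true)); // {2, 3, 4}
    }
}
```

In this sketch each prefix costs two binary searches plus one range operation on the bit set, instead of a regex test for every ordinal, which is where a saving would come from when the ordinal space is large.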

73 files changed: +4237 -121 lines

CHANGELOG.md (+6)

@@ -9,6 +9,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - Fix for hasInitiatedFetching to fix allocation explain and manual reroute APIs (([#14972](https://github.com/opensearch-project/OpenSearch/pull/14972))
 - [Workload Management] Add queryGroupId to Task ([14708](https://github.com/opensearch-project/OpenSearch/pull/14708))
 - Add setting to ignore throttling nodes for allocation of unassigned primaries in remote restore ([#14991](https://github.com/opensearch-project/OpenSearch/pull/14991))
+- [Workload Management] Add Delete QueryGroup API Logic ([#14735](https://github.com/opensearch-project/OpenSearch/pull/14735))
 - [Streaming Indexing] Enhance RestClient with a new streaming API support ([#14437](https://github.com/opensearch-project/OpenSearch/pull/14437))
 - Add basic aggregation support for derived fields ([#14618](https://github.com/opensearch-project/OpenSearch/pull/14618))
 - [Workload Management] Add Create QueryGroup API Logic ([#14680](https://github.com/opensearch-project/OpenSearch/pull/14680))- [Workload Management] Add Create QueryGroup API Logic ([#14680](https://github.com/opensearch-project/OpenSearch/pull/14680))
@@ -18,9 +19,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - Add `rangeQuery` and `regexpQuery` for `constant_keyword` field type ([#14711](https://github.com/opensearch-project/OpenSearch/pull/14711))
 - Add took time to request nodes stats ([#15054](https://github.com/opensearch-project/OpenSearch/pull/15054))
 - [Workload Management] Add Get QueryGroup API Logic ([14709](https://github.com/opensearch-project/OpenSearch/pull/14709))
+- [Workload Management] Add Settings for Workload Management feature ([#15028](https://github.com/opensearch-project/OpenSearch/pull/15028))
 - [Workload Management] QueryGroup resource tracking framework changes ([#13897](https://github.com/opensearch-project/OpenSearch/pull/13897))
 - Support filtering on a large list encoded by bitmap ([#14774](https://github.com/opensearch-project/OpenSearch/pull/14774))
 - Add slice execution listeners to SearchOperationListener interface ([#15153](https://github.com/opensearch-project/OpenSearch/pull/15153))
+- Add allowlist setting for ingest-geoip and ingest-useragent ([#15325](https://github.com/opensearch-project/OpenSearch/pull/15325))
+- Adding access to noSubMatches and noOverlappingMatches in Hyphenation ([#13895](https://github.com/opensearch-project/OpenSearch/pull/13895))
+- Add support for index level max slice count setting for concurrent segment search ([#15336](https://github.com/opensearch-project/OpenSearch/pull/15336))
 
 ### Dependencies
 - Bump `netty` from 4.1.111.Final to 4.1.112.Final ([#15081](https://github.com/opensearch-project/OpenSearch/pull/15081))
@@ -44,6 +49,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 ### Changed
 - Add lower limit for primary and replica batch allocators timeout ([#14979](https://github.com/opensearch-project/OpenSearch/pull/14979))
+- Optimize regexp-based include/exclude on aggregations when pattern matches prefixes ([#14371](https://github.com/opensearch-project/OpenSearch/pull/14371))
 - Replace and block usages of org.apache.logging.log4j.util.Strings ([#15238](https://github.com/opensearch-project/OpenSearch/pull/15238))
 
 ### Deprecated

MAINTAINERS.md (+1 -1)

@@ -22,7 +22,7 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
 | Varun Bansal | [linuxpi](https://github.com/linuxpi) | Amazon |
 | Marc Handalian | [mch2](https://github.com/mch2) | Amazon |
 | Michael Froh | [msfroh](https://github.com/msfroh) | Amazon |
-| Nick Knize | [nknize](https://github.com/nknize) | Amazon |
+| Nick Knize | [nknize](https://github.com/nknize) | Lucenia |
 | Owais Kazi | [owaiskazi19](https://github.com/owaiskazi19) | Amazon |
 | Peter Nied | [peternied](https://github.com/peternied) | Amazon |
 | Rishikesh Pasham | [Rishikesh1159](https://github.com/Rishikesh1159) | Amazon |

modules/analysis-common/src/main/java/org/opensearch/analysis/common/HyphenationCompoundWordTokenFilterFactory.java (+8 -1)

@@ -54,11 +54,16 @@
  */
 public class HyphenationCompoundWordTokenFilterFactory extends AbstractCompoundWordTokenFilterFactory {
 
+    private final boolean noSubMatches;
+    private final boolean noOverlappingMatches;
     private final HyphenationTree hyphenationTree;
 
     HyphenationCompoundWordTokenFilterFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
         super(indexSettings, env, name, settings);
 
+        noSubMatches = settings.getAsBoolean("no_sub_matches", false);
+        noOverlappingMatches = settings.getAsBoolean("no_overlapping_matches", false);
+
         String hyphenationPatternsPath = settings.get("hyphenation_patterns_path", null);
         if (hyphenationPatternsPath == null) {
             throw new IllegalArgumentException("hyphenation_patterns_path is a required setting.");
@@ -85,7 +90,9 @@ public TokenStream create(TokenStream tokenStream) {
             minWordSize,
             minSubwordSize,
             maxSubwordSize,
-            onlyLongestMatch
+            onlyLongestMatch,
+            noSubMatches,
+            noOverlappingMatches
         );
     }
 }
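
The factory change only reads two booleans, `no_sub_matches` and `no_overlapping_matches` (both defaulting to `false`), and forwards them to Lucene's `HyphenationCompoundWordTokenFilter`. To make their effect concrete, the sketch below shows one plausible reading of the flags in plain Java: `noSubMatches` drops a subword that lies entirely inside another matched subword, which agrees with `testHyphenationDecompounderNoSubMatches` expecting only `basketballkurv`, `basketball` and `kurv`; reading `noOverlappingMatches` as "drop a subword that overlaps one already kept" is an assumption. `SubwordFilterSketch` is hypothetical, the dictionary (`basket`, `ball`, `basketball`, `kurv`) is assumed because the test's word list is not part of this diff, and candidates come from brute-force substring lookups rather than hyphenation points, so this is not Lucene's algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the two flags' effect on candidate subwords.
// Simplification: candidates come from dictionary lookups of every substring,
// not from hyphenation points as in Lucene's filter.
public class SubwordFilterSketch {

    static List<String> subwords(String word, Set<String> dict, int minSub, int maxSub,
                                 boolean noSubMatches, boolean noOverlappingMatches) {
        // Collect [start, end) spans of dictionary words, longest first per start offset.
        List<int[]> candidates = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            for (int j = Math.min(word.length(), i + maxSub); j >= i + minSub; j--) {
                if (dict.contains(word.substring(i, j))) {
                    candidates.add(new int[] { i, j });
                }
            }
        }
        List<int[]> kept = new ArrayList<>();
        for (int[] c : candidates) {
            if (noSubMatches && enclosedByOther(c, candidates)) {
                continue; // e.g. "ball" lies inside "basketball"
            }
            if (noOverlappingMatches && overlapsAny(c, kept)) {
                continue; // shares characters with a subword kept earlier
            }
            kept.add(c);
        }
        List<String> out = new ArrayList<>();
        for (int[] c : kept) {
            out.add(word.substring(c[0], c[1]));
        }
        return out;
    }

    // True if some other candidate span fully contains c.
    static boolean enclosedByOther(int[] c, List<int[]> all) {
        for (int[] o : all) {
            if (o != c && o[0] <= c[0] && c[1] <= o[1]) {
                return true;
            }
        }
        return false;
    }

    // True if c shares at least one character position with a kept span.
    static boolean overlapsAny(int[] c, List<int[]> kept) {
        for (int[] o : kept) {
            if (o[0] < c[1] && c[0] < o[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("basket", "ball", "basketball", "kurv"));
        System.out.println(subwords("basketballkurv", dict, 2, 15, false, false)); // [basketball, basket, ball, kurv]
        System.out.println(subwords("basketballkurv", dict, 2, 15, true, false)); // [basketball, kurv]
    }
}
```

The original token is not part of the returned list here; the real token filter keeps it in the stream, which is why the test counts three terms rather than two.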

modules/analysis-common/src/test/java/org/opensearch/analysis/common/CompoundAnalysisTests.java (+52 -11)

@@ -50,8 +50,12 @@
 import org.opensearch.test.IndexSettingsModule;
 import org.opensearch.test.OpenSearchTestCase;
 import org.hamcrest.MatcherAssert;
+import org.junit.Before;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
@@ -63,17 +67,27 @@
 import static org.hamcrest.Matchers.instanceOf;
 
 public class CompoundAnalysisTests extends OpenSearchTestCase {
+
+    Settings[] settingsArr;
+
+    @Before
+    public void initialize() throws IOException {
+        final Path home = createTempDir();
+        copyHyphenationPatternsFile(home);
+        this.settingsArr = new Settings[] { getJsonSettings(home), getYamlSettings(home) };
+    }
+
     public void testDefaultsCompoundAnalysis() throws Exception {
-        Settings settings = getJsonSettings();
-        IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
-        AnalysisModule analysisModule = createAnalysisModule(settings);
-        TokenFilterFactory filterFactory = analysisModule.getAnalysisRegistry().buildTokenFilterFactories(idxSettings).get("dict_dec");
-        MatcherAssert.assertThat(filterFactory, instanceOf(DictionaryCompoundWordTokenFilterFactory.class));
+        for (Settings settings : this.settingsArr) {
+            IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
+            AnalysisModule analysisModule = createAnalysisModule(settings);
+            TokenFilterFactory filterFactory = analysisModule.getAnalysisRegistry().buildTokenFilterFactories(idxSettings).get("dict_dec");
+            MatcherAssert.assertThat(filterFactory, instanceOf(DictionaryCompoundWordTokenFilterFactory.class));
+        }
     }
 
     public void testDictionaryDecompounder() throws Exception {
-        Settings[] settingsArr = new Settings[] { getJsonSettings(), getYamlSettings() };
-        for (Settings settings : settingsArr) {
+        for (Settings settings : this.settingsArr) {
             List<String> terms = analyze(settings, "decompoundingAnalyzer", "donaudampfschiff spargelcremesuppe");
             MatcherAssert.assertThat(terms.size(), equalTo(8));
             MatcherAssert.assertThat(
@@ -83,6 +97,26 @@ public void testDictionaryDecompounder() throws Exception {
         }
     }
 
+    // Hyphenation Decompounder tests mimic the behavior of lucene tests
+    // lucene/analysis/common/src/test/org/apache/lucene/analysis/compound/TestHyphenationCompoundWordTokenFilterFactory.java
+    public void testHyphenationDecompounder() throws Exception {
+        for (Settings settings : this.settingsArr) {
+            List<String> terms = analyze(settings, "hyphenationAnalyzer", "min veninde som er lidt af en læsehest");
+            MatcherAssert.assertThat(terms.size(), equalTo(10));
+            MatcherAssert.assertThat(terms, hasItems("min", "veninde", "som", "er", "lidt", "af", "en", "læsehest", "læse", "hest"));
+        }
+    }
+
+    // Hyphenation Decompounder tests mimic the behavior of lucene tests
+    // lucene/analysis/common/src/test/org/apache/lucene/analysis/compound/TestHyphenationCompoundWordTokenFilterFactory.java
+    public void testHyphenationDecompounderNoSubMatches() throws Exception {
+        for (Settings settings : this.settingsArr) {
+            List<String> terms = analyze(settings, "hyphenationAnalyzerNoSubMatches", "basketballkurv");
+            MatcherAssert.assertThat(terms.size(), equalTo(3));
+            MatcherAssert.assertThat(terms, hasItems("basketballkurv", "basketball", "kurv"));
+        }
+    }
+
     private List<String> analyze(Settings settings, String analyzerName, String text) throws IOException {
         IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
         AnalysisModule analysisModule = createAnalysisModule(settings);
@@ -111,21 +145,28 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
         }));
     }
 
-    private Settings getJsonSettings() throws IOException {
+    private void copyHyphenationPatternsFile(Path home) throws IOException {
+        InputStream hyphenation_patterns_path = getClass().getResourceAsStream("da_UTF8.xml");
+        Path config = home.resolve("config");
+        Files.createDirectory(config);
+        Files.copy(hyphenation_patterns_path, config.resolve("da_UTF8.xml"));
+    }
+
+    private Settings getJsonSettings(Path home) throws IOException {
         String json = "/org/opensearch/analysis/common/test1.json";
         return Settings.builder()
             .loadFromStream(json, getClass().getResourceAsStream(json), false)
             .put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
-            .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
+            .put(Environment.PATH_HOME_SETTING.getKey(), home.toString())
             .build();
     }
 
-    private Settings getYamlSettings() throws IOException {
+    private Settings getYamlSettings(Path home) throws IOException {
         String yaml = "/org/opensearch/analysis/common/test1.yml";
         return Settings.builder()
             .loadFromStream(yaml, getClass().getResourceAsStream(yaml), false)
             .put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
-            .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
+            .put(Environment.PATH_HOME_SETTING.getKey(), home.toString())
             .build();
     }
 }
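
As written, `copyHyphenationPatternsFile` never closes the `InputStream` returned by `getResourceAsStream`, and if `da_UTF8.xml` were missing from the test classpath, `Files.copy` would fail with a bare `NullPointerException`. A minimal sketch of a tidier helper, assuming Java 11+ for `Path.of`: `ConfigFileCopier` and both method names are hypothetical, the stream is closed with try-with-resources, and `Files.createDirectories` is used instead of `Files.createDirectory` so that copying several files into the same home directory does not fail because `config` already exists.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical variant of the test helper copyHyphenationPatternsFile.
public class ConfigFileCopier {

    // Looks up a classpath resource and copies it to home/config/<file name>.
    static Path copyResourceToConfig(Class<?> owner, String resource, Path home) throws IOException {
        // Compute the file name first so nothing can throw while the stream is open and unmanaged.
        String fileName = Path.of(resource).getFileName().toString();
        InputStream in = owner.getResourceAsStream(resource);
        if (in == null) {
            // Fail with a descriptive message instead of a bare NullPointerException.
            throw new IOException("classpath resource not found: " + resource);
        }
        return copyToConfig(in, home, fileName);
    }

    // Copies the stream into home/config and always closes it, even if the copy fails.
    static Path copyToConfig(InputStream source, Path home, String fileName) throws IOException {
        try (InputStream in = source) {
            Path config = Files.createDirectories(home.resolve("config"));
            Path target = config.resolve(fileName);
            Files.copy(in, target);
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        Path home = Files.createTempDirectory("opensearch-home");
        // Placeholder bytes standing in for the real hyphenation patterns file.
        InputStream patterns = new ByteArrayInputStream("placeholder".getBytes(StandardCharsets.UTF_8));
        System.out.println(copyToConfig(patterns, home, "da_UTF8.xml"));
    }
}
```

Inside the test class, `initialize()` could then call `copyResourceToConfig(getClass(), "da_UTF8.xml", home)`, which resolves the resource relative to the test's package just as the original `getClass().getResourceAsStream("da_UTF8.xml")` does.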
