Fix Sharepoint Folder Parsing #3791

Merged
yuhongsun96 merged 2 commits into main from sharepoint-folder
Jan 27, 2025


yuhongsun96 (Contributor) commented Jan 27, 2025

Description

https://linear.app/danswer/issue/DAN-1342/sharepoint-cant-index-folders

SharePoint folders aren't being parsed correctly.

How Has This Been Tested?

Verified against our own SharePoint instance (with some hacks, since the data format has changed).

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check


vercel bot commented Jan 27, 2025

Name: internal-search
Status: ✅ Ready
Updated (UTC): Jan 27, 2025 0:40am

filtered_driveitems = [
    item
    for item in driveitems
    if element.folder in item.parent_reference.path
Contributor Author

Previously this was a bit unsafe: if the folder was "test", any item with "test" anywhere in its parent path would be included.
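To illustrate the problem described in this comment, here is a minimal sketch contrasting the substring check with a stricter, segment-based match. The replacement check used in the PR is not shown in this excerpt, so `folder_matches`, the stand-in `DriveItem` and `ParentReference` classes, and the assumed parent path format (`/drives/<id>/root:/A/B`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParentReference:
    path: str


@dataclass
class DriveItem:
    # Hypothetical stand-in for the drive item objects used by the connector.
    name: str
    parent_reference: ParentReference


def folder_matches(parent_path: str, folder: str) -> bool:
    """Return True if `folder` is the item's folder or an ancestor of it."""
    # Assumed format: "/drives/<id>/root:/A/B". Keep only what follows "root:".
    _, _, relative = parent_path.partition("root:")
    path_segments = [s for s in relative.split("/") if s]
    folder_segments = [s for s in folder.split("/") if s]
    # Compare whole path segments, anchored at the drive root.
    return path_segments[: len(folder_segments)] == folder_segments


items = [
    DriveItem("a.docx", ParentReference("/drives/b!x/root:/test")),
    DriveItem("b.docx", ParentReference("/drives/b!x/root:/test/reports")),
    DriveItem("c.docx", ParentReference("/drives/b!x/root:/latest")),
    DriveItem("d.docx", ParentReference("/drives/b!x/root:/archive/test-data")),
]

# Substring check: all four items match, because "latest" and "test-data"
# both contain "test".
substring_hits = [i.name for i in items if "test" in i.parent_reference.path]

# Segment check: only a.docx and b.docx match.
segment_hits = [
    i.name for i in items if folder_matches(i.parent_reference.path, "test")
]
```

This sketch anchors the match at the drive root, so a folder named "test" nested under another folder would not match a configured folder of "test". Whether that is the desired behavior is a design decision this excerpt does not settle.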

@@ -82,8 +83,13 @@ def _extract_site_and_folder(site_urls: list[str]) -> list[SiteData]:
             sites_index = parts.index("sites")
             site_url = "/".join(parts[: sites_index + 2])
             folder = (
-                parts[sites_index + 2] if len(parts) > sites_index + 2 else None
+                "/".join(unquote(part) for part in parts[sites_index + 2 :])
Contributor Author

This now allows nested folders and folders whose names contain %20 (URL-encoded spaces).
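As a rough sketch of the behavior this comment describes, the function below splits a site URL into its site part and an optional, URL-decoded folder path. It is a simplified stand-in, not the PR's `_extract_site_and_folder`: it handles a single URL, returns a tuple instead of `SiteData`, and assumes that a URL with nothing after the site name yields `None` for the folder (the rest of the new expression is not visible in the hunk above). The `contoso` URL in the usage note is made up.

```python
from typing import Optional
from urllib.parse import unquote


def extract_site_and_folder(url: str) -> tuple[str, Optional[str]]:
    """Split a SharePoint URL into (site_url, folder); hypothetical helper."""
    parts = url.strip().rstrip("/").split("/")
    # Raises ValueError if the URL has no "sites" segment.
    sites_index = parts.index("sites")
    # Everything up to and including the site name.
    site_url = "/".join(parts[: sites_index + 2])
    # Everything after the site name is treated as a (possibly nested) folder.
    remainder = parts[sites_index + 2 :]
    # Decode each segment so "Shared%20Documents" becomes "Shared Documents".
    folder = "/".join(unquote(part) for part in remainder) if remainder else None
    return site_url, folder
```

For example, `extract_site_and_folder("https://contoso.sharepoint.com/sites/Marketing/Shared%20Documents/Reports")` returns `("https://contoso.sharepoint.com/sites/Marketing", "Shared Documents/Reports")`, whereas the old expression kept only the first segment, `Shared%20Documents`, still encoded.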

@@ -77,7 +77,9 @@ export function LabelWithTooltip({
 }

 export function SubLabel({ children }: { children: string | JSX.Element }) {
-  return <div className="text-xs text-subtle">{children}</div>;
+  return (
+    <div className="text-xs text-subtle whitespace-pre-line">{children}</div>
Contributor Author

The frontend uses bullet points, but all of them were rendered on the same line; this fixes that. Hopefully there are no unwanted side effects.

all_paths = [
    item.parent_reference.path for item in driveitems
]
logger.warning(
Contributor Author

Couldn't reproduce the issue, so this adds logging.
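A minimal sketch of this kind of diagnostic logging, assuming the warning is meant to fire when folder filtering leaves no items. The excerpt shows only the start of the `logger.warning(` call, so the condition, the message text, and the `filter_with_diagnostics` helper with its `matches` predicate are all hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def filter_with_diagnostics(
    parent_paths: list[str],
    folder: str,
    matches: Callable[[str, str], bool],
) -> list[str]:
    """Keep paths accepted by `matches`; log every path seen if none match."""
    kept = [path for path in parent_paths if matches(path, folder)]
    if not kept:
        # Log the distinct paths that were seen so that an unexpected path
        # format can be diagnosed from the logs alone.
        logger.warning(
            "Nothing found for folder '%s'; parent paths seen: %s",
            folder,
            sorted(set(parent_paths)),
        )
    return kept
```

Logging the distinct parent paths keeps the message bounded by the number of folders rather than the number of files, which matters on large drives.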

@yuhongsun96 yuhongsun96 merged commit 05ab949 into main Jan 27, 2025
10 of 11 checks passed
@yuhongsun96 yuhongsun96 deleted the sharepoint-folder branch January 27, 2025 00:45