Add parser for unsaved Windows Notepad tabs #540

Merged
merged 39 commits into from
Aug 16, 2024
39 commits
c528241
Initial commit
joost-j Jan 23, 2024
3594ff0
Removed unused 'seek_size' function
joost-j Feb 14, 2024
b1bcd69
Refactored the code to work with new LEB128 structure, added some mor…
joost-j Feb 15, 2024
c634987
Added more comments
joost-j Feb 15, 2024
d3d35a1
Refactor c_def to include parsing of both variants
joost-j Feb 19, 2024
cef81d0
Bump dissect.cstruct version to >=4.0.dev for clarity
joost-j Feb 19, 2024
7934f3e
Apply suggestions from code review
joost-j Feb 26, 2024
e6ea019
Removed duplicate brackets and refactor assertion into warning log
joost-j Feb 26, 2024
12fdd4a
Change variable names to fsize1 and fsize2, plus some linting
joost-j Feb 26, 2024
39a34a7
Refactored to work with LEB128 backport
joost-j Mar 4, 2024
8566028
Process feedback
joost-j Mar 4, 2024
56a26fa
Set cstruct dependency to next release
joost-j Mar 4, 2024
b18e975
Restore original shimcache.py file
joost-j Mar 4, 2024
1a1d80d
Move TextEditorTabRecord definition
joost-j Mar 25, 2024
b00bdc3
Remove content_length field from record
joost-j Mar 25, 2024
a124202
Apply suggestions from code review
joost-j Mar 25, 2024
dbaca5d
Change TabEditorTabRecord formatting
joost-j Mar 25, 2024
d66fa54
Black formatting, fix tests, add annotations import
joost-j Mar 25, 2024
bdaccbc
Bump cstruct version again
joost-j Mar 25, 2024
ad78273
Bump dependencies as leb128 is now included in dev release
joost-j Mar 28, 2024
0d9c88f
Implemented deletion of characters, refactored, added new tests
joost-j Mar 28, 2024
304db58
Small comment changes
joost-j Mar 28, 2024
2ca889c
Remove chunked addition of zero bytes
joost-j Mar 28, 2024
74ffb83
Added new test, changed to list insertion instead of appending
joost-j Mar 28, 2024
c148061
Refactored test file and removed fileState enum
joost-j Mar 28, 2024
2bf6e2f
Small comment changes/typos
joost-j Apr 11, 2024
a19c49b
Split plugin from parsing logic, added more tests
joost-j Apr 26, 2024
f808bc7
Removed fh.read() and re-added them to the c_def
joost-j Apr 26, 2024
9b38f3e
Added options and more test cases to support newest version
joost-j Apr 26, 2024
a3b6f27
Added separate records for unsaved/saved tabs, included more data (ti…
joost-j May 8, 2024
677817c
Change cstruct version
joost-j May 13, 2024
9674e37
Remove the --include-deleted-contents arg and make it default
joost-j Aug 14, 2024
06e3f07
Rewrite TabContent records into WindowsNotepadTab class
joost-j Aug 14, 2024
a384fd9
Implement repr for WindowsNotepadTab class
joost-j Aug 14, 2024
914c324
Merge branch 'main' into feature/windows_notepad_tabs
joost-j Aug 14, 2024
e625684
Add typehints and small fixes
Horofic Aug 16, 2024
9bb13c7
Merge branch 'main' into feature/windows_notepad_tabs
Horofic Aug 16, 2024
27fca92
Add suggestions
Horofic Aug 16, 2024
a9b32eb
Merge branch 'feature/windows_notepad_tabs' of github.com:joost-j/dis…
Horofic Aug 16, 2024
13 changes: 13 additions & 0 deletions dissect/target/plugins/apps/texteditor/texteditor.py
@@ -0,0 +1,13 @@
from dissect.target.helpers.descriptor_extensions import UserRecordDescriptorExtension
from dissect.target.helpers.record import create_extended_descriptor
from dissect.target.plugin import NamespacePlugin

GENERIC_TAB_CONTENTS_RECORD_FIELDS = [("string", "content"), ("path", "path"), ("string", "deleted_content")]

TexteditorTabContentRecord = create_extended_descriptor([UserRecordDescriptorExtension])(
    "texteditor/tab", GENERIC_TAB_CONTENTS_RECORD_FIELDS
)


class TexteditorPlugin(NamespacePlugin):
    __namespace__ = "texteditor"
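
The windowsnotepad.py parser in the next file declares most header and block fields as `uleb128`, the unsigned LEB128 variable-length integer type that the commits above pull in through `dissect.cstruct`. As a rough illustration of what such a field decodes to, here is a standalone sketch. It is not the `dissect.cstruct` implementation, and `read_uleb128` is a hypothetical helper name chosen for this example.

```python
import io
from typing import BinaryIO


def read_uleb128(fh: BinaryIO) -> int:
    """Decode one unsigned LEB128 integer from a binary stream (illustrative helper)."""
    result = 0
    shift = 0
    while True:
        raw = fh.read(1)
        if not raw:
            raise EOFError("stream ended inside a ULEB128 value")
        byte = raw[0]
        # The low 7 bits carry payload, least significant group first
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the value
        if not (byte & 0x80):
            return result
        shift += 7


stream = io.BytesIO(bytes([0xE5, 0x8E, 0x26, 0x05]))
first = read_uleb128(stream)   # 624485, encoded in three bytes
second = read_uleb128(stream)  # 5, encoded in one byte
```

Because each value occupies a variable number of bytes, the length fields in the tab headers cannot be read at fixed offsets, which is why the structures are parsed sequentially from the file handle.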
340 changes: 340 additions & 0 deletions dissect/target/plugins/apps/texteditor/windowsnotepad.py
@@ -0,0 +1,340 @@
from __future__ import annotations

import logging
import zlib
from typing import Iterator

from dissect.cstruct import cstruct
from dissect.util.ts import wintimestamp
from flow.record.fieldtypes import digest

from dissect.target.exceptions import UnsupportedPluginError
from dissect.target.helpers.descriptor_extensions import UserRecordDescriptorExtension
from dissect.target.helpers.fsutil import TargetPath
from dissect.target.helpers.record import (
    UnixUserRecord,
    WindowsUserRecord,
    create_extended_descriptor,
)
from dissect.target.plugin import export
from dissect.target.plugins.apps.texteditor.texteditor import (
    GENERIC_TAB_CONTENTS_RECORD_FIELDS,
    TexteditorPlugin,
)
from dissect.target.target import Target

# Thanks to @Nordgaren, @daddycocoaman, @JustArion and @ogmini for their suggestions and feedback in the PR
# thread. This really helped to figure out the last missing bits and pieces
# required for recovering text from these files.

windowstab_def = """
struct file_header {
    char magic[2]; // NP
    uleb128 updateNumber; // increases on every settings update when fileType=9,
                          // doesn't seem to change on fileType 0 or 1
    uleb128 fileType; // 0 if unsaved, 1 if saved, 9 if contains settings?
}

struct tab_header_saved {
    uleb128 filePathLength;
    wchar filePath[filePathLength];
    uleb128 fileSize; // likely similar to fixedSizeBlockLength
    uleb128 encoding;
    uleb128 carriageReturnType;
    uleb128 timestamp; // Windows Filetime format (not unix timestamp)
    char sha256[32];
    char unk0;
    char unk1;
    uleb128 fixedSizeBlockLength;
    uleb128 fixedSizeBlockLengthDuplicate;
    uint8 wordWrap; // 1 if wordwrap enabled, 0 if disabled
    uint8 rightToLeft;
    uint8 showUnicode;
    uint8 optionsVersion;
};

struct tab_header_unsaved {
    char unk0;
    uleb128 fixedSizeBlockLength; // will always be 00 when unsaved because size is not yet known
    uleb128 fixedSizeBlockLengthDuplicate; // will always be 00 when unsaved because size is not yet known
    uint8 wordWrap; // 1 if wordwrap enabled, 0 if disabled
    uint8 rightToLeft;
    uint8 showUnicode;
    uint8 optionsVersion;
};

struct tab_header_crc32_stub {
    char unk1;
    char unk2;
    char crc32[4];
};

struct fixed_size_data_block {
    uleb128 nAdded;
    wchar data[nAdded];
    uint8 hasRemainingVariableDataBlocks; // indicates whether after this single-data block more data will follow
    char crc32[4];
};

struct variable_size_data_block {
    uleb128 offset;
    uleb128 nDeleted;
    uleb128 nAdded;
    wchar data[nAdded];
    char crc32[4];
};

struct options_v1 {
    uleb128 unk;
};

struct options_v2 {
    uleb128 unk1; // likely autocorrect or spellcheck
    uleb128 unk2; // likely autocorrect or spellcheck
};
"""

WINDOWS_SAVED_TABS_EXTRA_FIELDS = [("datetime", "modification_time"), ("digest", "hashes"), ("path", "saved_path")]

WindowsNotepadUnsavedTabRecord = create_extended_descriptor([UserRecordDescriptorExtension])(
    "texteditor/windowsnotepad/tab/unsaved",
    GENERIC_TAB_CONTENTS_RECORD_FIELDS,
)

WindowsNotepadSavedTabRecord = create_extended_descriptor([UserRecordDescriptorExtension])(
    "texteditor/windowsnotepad/tab/saved",
    GENERIC_TAB_CONTENTS_RECORD_FIELDS + WINDOWS_SAVED_TABS_EXTRA_FIELDS,
)

c_windowstab = cstruct().load(windowstab_def)


def _calc_crc32(data: bytes) -> bytes:
    """Perform a CRC32 checksum on the data and return it as bytes."""
    return zlib.crc32(data).to_bytes(length=4, byteorder="big")


class WindowsNotepadTab:
    """Windows notepad tab content parser"""

    def __init__(self, file: TargetPath):
        self.file = file
        self.is_saved = None
        self.content = None
        self.deleted_content = None
        self._process_tab_file()

    def __repr__(self) -> str:
        return (
            f"<{self.__class__.__name__} saved={self.is_saved} "
            f"content_size={len(self.content)} has_deleted_content={self.deleted_content is not None}>"
        )

    def _process_tab_file(self) -> None:
        """Parse a binary tab file and reconstruct the contents."""
        with self.file.open("rb") as fh:
            # Header is the same for all types
            self.file_header = c_windowstab.file_header(fh)

            # fileType: 0 is unsaved, 1 is saved, 9 is settings?
            self.is_saved = self.file_header.fileType == 1

            # Tabs can be saved to a file with a filename on disk, or unsaved (kept in the TabState folder).
            # Depending on the file's saved state, different header fields are present
            self.tab_header = (
                c_windowstab.tab_header_saved(fh) if self.is_saved else c_windowstab.tab_header_unsaved(fh)
            )

            # There appears to be an optionsVersion field that specifies the options that are passed.
            # At the moment of writing, it is not clear whether this specifies a version or a number of bytes
            # to parse, so just going with the 'optionsVersion' name for now.
            # We don't use the options, but since they are required for the CRC32 checksum
            # we store the byte representation
            if self.tab_header.optionsVersion == 0:
                # No options specified
                self.options = b""
            elif self.tab_header.optionsVersion == 1:
                self.options = c_windowstab.options_v1(fh).dumps()
            elif self.tab_header.optionsVersion == 2:
                self.options = c_windowstab.options_v2(fh).dumps()
            else:
                # Raise an error, since we don't know how many bytes future optionsVersions will occupy.
                # Not knowing how many bytes to parse can mess up the alignment and structs.
                raise NotImplementedError("Unknown Windows Notepad tab option version")

            # If the file is not saved to disk and no fixedSizeBlockLength is present, an extra checksum stub
            # is present. So parse that first
            if not self.is_saved and self.tab_header.fixedSizeBlockLength == 0:
                # Two unknown bytes before the CRC32
                tab_header_crc32_stub = c_windowstab.tab_header_crc32_stub(fh)

                # Calculate CRC32 of the header and check if it matches
                actual_header_crc32 = _calc_crc32(
                    self.file_header.dumps()[3:]
                    + self.tab_header.dumps()
                    + self.options
                    + tab_header_crc32_stub.dumps()[:-4]
                )
                if tab_header_crc32_stub.crc32 != actual_header_crc32:
                    logging.warning(
                        "CRC32 mismatch in header of file: %s (expected=%s, actual=%s)",
                        self.file.name,
                        tab_header_crc32_stub.crc32.hex(),
                        actual_header_crc32.hex(),
                    )

            # Used to store the final content
            self.content = ""

            # In the case that a fixedSizeDataBlock is present, this value is set to a nonzero value
            if self.tab_header.fixedSizeBlockLength > 0:
                # So we parse the fixed size data block
                self.data_entry = c_windowstab.fixed_size_data_block(fh)

                # The header (minus the magic) plus all data is included in the checksum
                actual_crc32 = _calc_crc32(
                    self.file_header.dumps()[3:] + self.tab_header.dumps() + self.options + self.data_entry.dumps()[:-4]
                )

                if self.data_entry.crc32 != actual_crc32:
                    logging.warning(
                        "CRC32 mismatch in single-block file: %s (expected=%s, actual=%s)",
                        self.file.name,
                        self.data_entry.crc32.hex(),
                        actual_crc32.hex(),
                    )

                # Add the content of the fixed size data block to the tab content
                self.content += self.data_entry.data

            # Used to store the deleted content, if available
            deleted_content = ""

            # If fixedSizeBlockLength in the header has a value of zero, the entire file consists of
            # variable-length blocks. Furthermore, if there is any remaining data after the
            # first fixed-size block, as indicated by the value of hasRemainingVariableDataBlocks,
            # we also want to continue parsing
            if self.tab_header.fixedSizeBlockLength == 0 or (
                self.tab_header.fixedSizeBlockLength > 0 and self.data_entry.hasRemainingVariableDataBlocks == 1
            ):
                # Here, data is stored in variable-length blocks. This happens, for example, when several
                # additions and deletions of characters have been recorded and these changes have not been 'flushed'

                # Since we don't know the size of the file up front, and offsets don't necessarily have to be in order,
                # a list is used to easily insert text at offsets
                text = []

                while True:
                    # Unfortunately, there is no way of determining how many blocks there are. So just try to parse
                    # until we reach EOF, after which we stop.
                    try:
                        data_entry = c_windowstab.variable_size_data_block(fh)
                    except EOFError:
                        break

                    # Either the nAdded is nonzero, or the nDeleted
                    if data_entry.nAdded > 0:
                        # Check the CRC32 checksum for this block
                        actual_crc32 = _calc_crc32(data_entry.dumps()[:-4])
                        if data_entry.crc32 != actual_crc32:
                            logging.warning(
                                "CRC32 mismatch in multi-block file: %s (expected=%s, actual=%s)",
                                self.file.name,
                                data_entry.crc32.hex(),
                                actual_crc32.hex(),
                            )

                        # Insert the text at the correct offset.
                        for idx in range(data_entry.nAdded):
                            text.insert(data_entry.offset + idx, data_entry.data[idx])

                    elif data_entry.nDeleted > 0:
                        # Create a new slice. Include everything up to the offset,
                        # plus everything after the nDeleted following characters
                        deleted_content += "".join(text[data_entry.offset : data_entry.offset + data_entry.nDeleted])
                        text = text[: data_entry.offset] + text[data_entry.offset + data_entry.nDeleted :]

                # Join all the characters to reconstruct the original text within the variable-length data blocks
                text = "".join(text)

                # Finally, add the reconstructed text to the tab content
                self.content += text

            # Set None if no deleted content was found
            self.deleted_content = deleted_content if deleted_content else None


class WindowsNotepadPlugin(TexteditorPlugin):
    """Windows notepad tab content plugin."""

    __namespace__ = "windowsnotepad"

    GLOB = "AppData/Local/Packages/Microsoft.WindowsNotepad_*/LocalState/TabState/*.bin"

    def __init__(self, target: Target):
        super().__init__(target)
        self.users_tabs: list[tuple[TargetPath, UnixUserRecord | WindowsUserRecord]] = []
        for user_details in self.target.user_details.all_with_home():
            for tab_file in user_details.home_path.glob(self.GLOB):
                # These files seem to contain information on different settings / configurations,
                # and are skipped for now
                if tab_file.name.endswith(".1.bin") or tab_file.name.endswith(".0.bin"):
                    continue

                self.users_tabs.append((tab_file, user_details.user))

    def check_compatible(self) -> None:
        if not self.users_tabs:
            raise UnsupportedPluginError("No Windows Notepad tab files found")

    @export(record=[WindowsNotepadSavedTabRecord, WindowsNotepadUnsavedTabRecord])
    def tabs(self) -> Iterator[WindowsNotepadSavedTabRecord | WindowsNotepadUnsavedTabRecord]:
        """Return contents from Windows 11 Notepad tabs - and its deleted content if available.

        The Windows Notepad application for Windows 11 is now able to restore both saved and unsaved tabs when you
        re-open the application.


        Resources:
            - https://github.com/fox-it/dissect.target/pull/540
            - https://github.com/JustArion/Notepad-Tabs
            - https://github.com/ogmini/Notepad-Tabstate-Buffer
            - https://github.com/ogmini/Notepad-State-Library
            - https://github.com/Nordgaren/tabstate-util
            - https://github.com/Nordgaren/tabstate-util/issues/1
            - https://medium.com/@mahmoudsoheem/new-digital-forensics-artifact-from-windows-notepad-527645906b7b

        Yields a WindowsNotepadSavedTabRecord or WindowsNotepadUnsavedTabRecord with fields:

        .. code-block:: text

            content (string): The content of the tab.
            path (path): The path to the tab file.
            deleted_content (string): The deleted content of the tab, if available.
            hashes (digest): A digest of the tab content.
            saved_path (path): The path where the tab was saved.
            modification_time (datetime): The modification time of the tab.
        """
        for file, user in self.users_tabs:
            # Parse the file
            tab: WindowsNotepadTab = WindowsNotepadTab(file)

            if tab.is_saved:
                yield WindowsNotepadSavedTabRecord(
                    content=tab.content,
                    path=tab.file,
                    deleted_content=tab.deleted_content,
                    hashes=digest((None, None, tab.tab_header.sha256.hex())),
                    saved_path=tab.tab_header.filePath,
                    modification_time=wintimestamp(tab.tab_header.timestamp),
                    _target=self.target,
                    _user=user,
                )
            else:
                yield WindowsNotepadUnsavedTabRecord(
                    content=tab.content,
                    path=tab.file,
                    _target=self.target,
                    _user=user,
                    deleted_content=tab.deleted_content,
                )
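
The variable-size block handling in `_process_tab_file` amounts to replaying a log of insertions and deletions against a list of characters. The standalone sketch below shows that replay in isolation, using plain tuples in place of parsed `variable_size_data_block` structures. The `replay_edits` name and the `(offset, n_deleted, added_text)` tuple layout are assumptions made for illustration and are not part of the PR.

```python
from typing import List, Optional, Tuple


def replay_edits(blocks: List[Tuple[int, int, str]]) -> Tuple[str, Optional[str]]:
    """Replay (offset, n_deleted, added_text) blocks in order (illustrative helper).

    Returns the reconstructed text and the concatenated deleted text,
    or None for the latter when nothing was deleted.
    """
    text: List[str] = []
    deleted = ""
    for offset, n_deleted, added in blocks:
        if added:
            # Insert the added characters at the recorded offset
            text[offset:offset] = list(added)
        elif n_deleted > 0:
            # Keep the removed characters, then cut them out of the buffer
            deleted += "".join(text[offset : offset + n_deleted])
            del text[offset : offset + n_deleted]
    return "".join(text), deleted or None


content, deleted_content = replay_edits(
    [
        (0, 0, "Hello"),   # type "Hello"
        (5, 0, " world"),  # append " world"
        (0, 5, ""),        # delete "Hello"
        (0, 0, "Howdy"),   # type "Howdy" at the start
    ]
)
# content == "Howdy world", deleted_content == "Hello"
```

The slice assignment here has the same effect as the per-character `text.insert` loop in the parser. As in the parser, a block that both adds and deletes characters would only have its addition applied, which matches the observation in the code comments that either `nAdded` or `nDeleted` is nonzero.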