Adding a periodic encoder to the DatetimeEncoder #1235

Open
wants to merge 30 commits into
base: main
Changes from 13 commits
Commits
bc9a5bc
Initial commit for periodic encoder
rcap107 Feb 7, 2025
176b18f
Refactoring code to split Spline and Circular encoders
rcap107 Feb 10, 2025
cc6455c
WIP Refactoring code to split Spline and Circular encoders
rcap107 Feb 10, 2025
99b2d37
Renaming a constant
rcap107 Feb 10, 2025
ee24931
Finished implementation of the circular encoder
rcap107 Feb 10, 2025
1a53da2
Updating changelog
rcap107 Feb 10, 2025
156d397
✅ WIP Improving test coverage
rcap107 Feb 11, 2025
3004442
Fixing tests
rcap107 Feb 12, 2025
9e6e05f
Fixing more bugs with testing
rcap107 Feb 12, 2025
542522e
Fixing a typo
rcap107 Feb 12, 2025
3804dc9
Fixing all tests and docstring shenanigans
rcap107 Feb 12, 2025
a981526
Fixing bugs, adding tests, adding docstrings
rcap107 Feb 13, 2025
3380f83
Delete example_circular_encoding.py
rcap107 Feb 13, 2025
ed5eb46
Adding doc classes
rcap107 Feb 13, 2025
ea209d6
Implementing fixes from comments
rcap107 Feb 13, 2025
af9f363
Update skrub/_datetime_encoder.py
rcap107 Feb 17, 2025
d0ab8bf
Update skrub/_datetime_encoder.py
rcap107 Feb 17, 2025
1ac5caf
Fixing a bug with how the first value is selected
rcap107 Feb 18, 2025
5d98ee3
Fixing a fix that didn't fix
rcap107 Feb 18, 2025
c96d3fc
Adding back default values
rcap107 Feb 18, 2025
4b660e8
Merge remote-tracking branch 'upstream/main' into circular_encoding
rcap107 Mar 10, 2025
4dcae67
Simplifying code and hardcoding defaults
rcap107 Mar 10, 2025
e160dcb
fixing an import
rcap107 Mar 10, 2025
7be8721
Updating the datetime encoder example to add periodic features
rcap107 Mar 10, 2025
2ab6471
Fixing example
rcap107 Mar 10, 2025
aa88cd1
wip debugging
rcap107 Mar 11, 2025
72ae07b
Fixing a bug that was caused by pandas indexin
rcap107 Mar 11, 2025
5bd2ffa
Fixing a bug with pandas indexing, renaming variables
rcap107 Mar 11, 2025
2d28a18
Updating example to add new datetimeencoder features
rcap107 Mar 11, 2025
b2349f5
Removing a debugging script
rcap107 Mar 13, 2025
5 changes: 4 additions & 1 deletion CHANGES.rst
@@ -12,12 +12,15 @@ Ongoing development
 New features
 ------------

-- The :class:`TableReport` now switch it's visual theme between light and dark according to the user preferences.
+- The :class:`TableReport` now switches its visual theme between light and dark according to the user preferences.
   :pr:`1201` by :user:`rouk1 <rouk1>`.

 - Adding a new way to control the location of the data directory, using envar `SKRUB_DATA_DIRECTORY`.
   :pr:`1215` by :user:`Thomas S. <thomass-dev>`

+- The :class:`DatetimeEncoder` now supports periodic encoding of the features using circular (sine/cosine) and spline
+  transformers. :pr:`1235` by :user:`Riccardo Cappuzzo<rcap107>`.
+
 Changes
 -------

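For readers unfamiliar with the circular (sine/cosine) encoding mentioned in the changelog entry, here is a minimal standard-library sketch of the general idea. It is not the code added by this PR: the helper names `circular_encode` and `circular_distance` are hypothetical, and the period of 24 assumes an hour-of-day feature.

```python
import math


def circular_encode(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b, period):
    """Euclidean distance between the circular encodings of a and b."""
    sin_a, cos_a = circular_encode(a, period)
    sin_b, cos_b = circular_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hypothetical hour-of-day feature with period 24.
# Hours 23 and 0 are neighbours on the clock, so their encodings are close,
# while hours 12 and 0 sit on opposite sides of the circle.
near = circular_distance(23, 0, period=24)
far = circular_distance(12, 0, period=24)
```

The point of the two-column encoding is that the end of one period lands next to the start of the next, which a plain integer hour column does not capture.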
4 changes: 3 additions & 1 deletion skrub/__init__.py
@@ -6,7 +6,7 @@
 from ._agg_joiner import AggJoiner, AggTarget
 from ._check_dependencies import check_dependencies
 from ._column_associations import column_associations
-from ._datetime_encoder import DatetimeEncoder
+from ._datetime_encoder import CircularEncoder, DatetimeEncoder, SplineEncoder
 from ._deduplicate import compute_ngram_distance, deduplicate
 from ._fuzzy_join import fuzzy_join
 from ._gap_encoder import GapEncoder
@@ -56,4 +56,6 @@
     "TextEncoder",
     "StringEncoder",
     "column_associations",
+    "SplineEncoder",
+    "CircularEncoder",
 ]
18 changes: 18 additions & 0 deletions skrub/_dataframe/_common.py
@@ -97,6 +97,7 @@
     "unique",
     "filter",
     "where",
+    "where_row",
     "sample",
     "head",
     "slice",
@@ -1182,6 +1183,23 @@ def _where_polars(col, mask, other):
     return col.zip_with(mask, pl.Series(other))


+@dispatch
+def where_row(obj, mask, other):
+    raise NotImplementedError()
+
+
+@where_row.specialize("pandas")
+def _where_row_pandas(obj, mask, other):
+    return obj.apply(pd.Series.where, **{"cond": mask, "other": other})
+
+
+@where_row.specialize("polars")
+def _where_row_polars(obj, mask, other):
+    return obj.with_columns(
+        pl.when(pl.Series(mask)).then(pl.all()).otherwise(pl.Series(other))
+    )
+
+
 @dispatch
 def sample(obj, n, seed=None):
     raise NotImplementedError()
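To illustrate what the pandas branch of `where_row` appears to do, the sketch below applies the same `DataFrame.apply(pd.Series.where, ...)` call to a small frame. The data and the `-1` sentinel are made up for illustration, and the snippet runs outside skrub's dispatch machinery, so treat it as a reading aid rather than the library's API.

```python
import pandas as pd

# Made-up data: two columns, three rows.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1000, 2000, 3000]})

# One boolean per row: True keeps the row, False replaces it.
mask = pd.Series([False, True, False])

# One replacement value per row (a sentinel chosen for this example).
other = pd.Series([-1, -1, -1])

# Same call as the pandas specialization in the diff: Series.where is applied
# to each column with the same row mask, so whole rows are kept or replaced.
out = df.apply(pd.Series.where, cond=mask, other=other)

print(out)
```

Only the middle row keeps its original values here; the other two rows take the value from `other` in every column.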
18 changes: 18 additions & 0 deletions skrub/_dataframe/tests/test_common.py
@@ -778,6 +778,24 @@ def test_where(df_module):
     )


+def test_where_row(df_module):
+    df = df_module.make_dataframe({"col1": [1, 2, 3], "col2": [1000, 2000, 3000]})
+    out = ns.where_row(
+        df,
+        df_module.make_column("", [False, True, False]),  # mask
+        df_module.make_column(
+            "", [None, None, None]
+        ),  # values to put in on the entire row
+    )
+    right = df_module.make_dataframe(
+        {"col1": [None, 2, None], "col2": [None, 2000, None]}
+    )
+    df_module.assert_frame_equal(
+        ns.pandas_convert_dtypes(out),
+        ns.pandas_convert_dtypes(right),
+    )
+
+
 def test_sample(df_module):
     s = ns.pandas_convert_dtypes(df_module.make_column("", [0, 1, 2]))
     sample = ns.sample(s, 2)