Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header whitespace regression in 5.12.0 #1354

Closed
timmc-edx opened this issue Nov 27, 2023 · 7 comments · Fixed by #1357
Closed

Header whitespace regression in 5.12.0 #1354

timmc-edx opened this issue Nov 27, 2023 · 7 comments · Fixed by #1357
Assignees
Labels
Milestone

Comments

@timmc-edx
Copy link

timmc-edx commented Nov 27, 2023

5.12.0 no longer tolerates sheets with filled-in cells that are not under a header. (For example, there might be a summary table off to the side of the main data.)

Repro data

aaa bbb    
w x    
y z   other
worksheet.get_all_records(expected_headers=['aaa'])

Expected (old) behavior:

In 5.11.3 the output is [{'aaa': 'w', 'bbb': 'x', '': ''}, {'aaa': 'y', 'bbb': 'z', '': 'other'}]

Unexpected (new) behavior:

5.12.0 raises gspread.exceptions.GSpreadException: the header row in the worksheet contains multiple empty cells

Environment info

  • Operating System: Ubuntu Linux
  • Python version: 3.8
  • gspread version: 5.12.0
@alifeee
Copy link
Collaborator

alifeee commented Nov 27, 2023

hi! thanks for the issue :)

This bug was fixed by #1353 after issue #1352 was filed.

This fix will be released in 5.12.1, which will release in the next month.

@timmc-edx
Copy link
Author

Oh, thanks! Somehow I missed that in my issue search.

@timmc-edx
Copy link
Author

Hmm... I think this is actually a different issue. I can still repro it with current master (0932358).

@alifeee
Copy link
Collaborator

alifeee commented Nov 27, 2023

ah, yes, I see the issue. Thanks for sticking with it.

The code returning the error is here

values_len = len(values[0])
keys_len = len(keys)
values_wider_than_keys_by = values_len - keys_len
default_blank_in_keys = default_blank in keys
if ((values_wider_than_keys_by > 0) and default_blank_in_keys) or (
values_wider_than_keys_by > 1
):
raise GSpreadException(
"the header row in the worksheet contains multiple empty cells"
)

Keys is obtained with

keys = self.get_values(
"{head}:{head}".format(head=head), value_render_option=value_render_option
)[0]

values is obtained with

values = self.get_values(
"{first_index}:{last_index}".format(
first_index=first_index, last_index=last_index
),
value_render_option=value_render_option,
)

The problem is that #1353 changed the way that the header row is obtained (in f6f0658). Beforehand, it was just the first row in the data

data = self.get_all_values(value_render_option=value_render_option)
# Return an empty list if the sheet doesn't have enough rows
if len(data) <= idx:
return []
keys = data[idx]

Thus, the cells are obtained as thus:

before

image

after

image

This is because the Google Sheets API cuts off the right and bottom of requested ranges if they are empty cells (see #1289)

Thus, the number of values (4) is wider than the number of keys (2), so the condition values_wider_than_keys_by > 1 is True, so the error is thrown (see error code above).

Now, #1353 exists so you can get the data from only rows 100-150 (e.g.) while still using a row 1 as the header row (e.g.). So we can't recombine the requests.

We will probably have to make sure the keys and values arrays have the same number of columns, by using fill_gaps

gspread/gspread/utils.py

Lines 546 to 557 in 0932358

def fill_gaps(L, rows=None, cols=None, padding_value=""):
"""Fill gaps in a list of lists.
e.g.,::
>>> L = [
... [1, 2, 3],
... ]
>>> fill_gaps(L, 2, 4)
[
[1, 2, 3, ""],
["", "", "", ""]
]

If we do that, we can probably remove this code

if ((values_wider_than_keys_by > 0) and default_blank_in_keys) or (
values_wider_than_keys_by > 1
):
raise GSpreadException(
"the header row in the worksheet contains multiple empty cells"
)
elif values_wider_than_keys_by == 1:
keys.append(default_blank)
elif values_wider_than_keys_by < 0:
values = fill_gaps(values, cols=keys_len, padding_value=default_blank)

because it is covered by this logic

header_row_is_unique = len(keys) == len(set(keys))
if not header_row_is_unique:
raise GSpreadException("the header row in the worksheet is not unique")

I do not have much time this week to code. If you would like to try the fix it should be simple enough. Otherwise we will get round to it and it should get into v5.12.1

Thanks again for the report :)

@lavigne958
Copy link
Collaborator

I agree we should fill gaps the headers too, to make sure the header row matches the requested range size and prevent the google default behavior of stripping the empty cells.

Though we should keep in mind that in this particular scenario column C and D have empty headers so would collide in the check and will fail but in this case that is the wanted behavior. to prevent key/value overriding when building the dict.

@alifeee alifeee added this to the 5.12.2 milestone Nov 29, 2023
@alifeee alifeee added the Bug label Nov 29, 2023
@alifeee alifeee self-assigned this Nov 29, 2023
@alifeee
Copy link
Collaborator

alifeee commented Nov 29, 2023

yep

Though we should keep in mind that in this particular scenario column C and D have empty headers so would collide in the check and will fail but in this case that is the wanted behavior. to prevent key/value overriding when building the dict.

Here, expected_headers is provided as [], showing that the user wants to ignore duplicate headers.

@alifeee
Copy link
Collaborator

alifeee commented Dec 7, 2023

Hi. Please read the proposal for changing how get_all_records works for next release 6.0.0 -> #1367. We would enjoy if you would contribute your opinion :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants