Header whitespace regression in 5.12.0 #1354

timmc-edx · 2023-11-27T18:01:55Z

5.12.0 no longer tolerates sheets with filled-in cells that are not under a header. (For example, there might be a summary table off to the side of the main data.)

Repro data

aaa	bbb
w	x
y	z		other

worksheet.get_all_records(expected_headers=['aaa'])

Expected (old) behavior:

In 5.11.3 the output is [{'aaa': 'w', 'bbb': 'x', '': ''}, {'aaa': 'y', 'bbb': 'z', '': 'other'}]

Unexpected (new) behavior:

5.12.0 raises gspread.exceptions.GSpreadException: the header row in the worksheet contains multiple empty cells

Environment info

Operating System: Ubuntu Linux
Python version: 3.8
gspread version: 5.12.0


alifeee · 2023-11-27T18:08:18Z

hi! thanks for the issue :)

This bug was fixed by #1353 after issue #1352 was filed.

The fix will ship in 5.12.1, which should be released within the next month.

timmc-edx · 2023-11-27T18:09:29Z

Oh, thanks! Somehow I missed that in my issue search.

timmc-edx · 2023-11-27T18:13:47Z

Hmm... I think this is actually a different issue. I can still repro it with current master (0932358).

alifeee · 2023-11-27T19:21:42Z

ah, yes, I see the issue. Thanks for sticking with it.

The code returning the error is here

gspread/gspread/worksheet.py

Lines 689 to 699 in 0932358

    
    values_len = len(values[0])
    keys_len = len(keys)
    values_wider_than_keys_by = values_len - keys_len
    default_blank_in_keys = default_blank in keys
    if ((values_wider_than_keys_by > 0) and default_blank_in_keys) or (
        values_wider_than_keys_by > 1
    ):
        raise GSpreadException(
            "the header row in the worksheet contains multiple empty cells"
        )

keys is obtained with

gspread/gspread/worksheet.py

Lines 658 to 660 in 0932358

    
    keys = self.get_values(
        "{head}:{head}".format(head=head), value_render_option=value_render_option
    )[0]

values is obtained with

gspread/gspread/worksheet.py

Lines 682 to 687 in 0932358

    
    values = self.get_values(
        "{first_index}:{last_index}".format(
            first_index=first_index, last_index=last_index
        ),
        value_render_option=value_render_option,
    )

The problem is that #1353 changed the way the header row is obtained (in f6f0658). Previously, it was simply the first row of the data:

gspread/gspread/worksheet.py

Lines 547 to 553 in 3e5976e

    
    data = self.get_all_values(value_render_option=value_render_option)
    # Return an empty list if the sheet doesn't have enough rows
    if len(data) <= idx:
        return []
    keys = data[idx]

As a result, the cells are obtained as follows:

[image: before]

[image: after]

This is because the Google Sheets API trims empty cells from the right and bottom of requested ranges (see #1289).

So the values (4 columns) are wider than the keys (2 columns), the condition values_wider_than_keys_by > 1 is True, and the exception is raised (see the check quoted above).
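The reasoning above can be reproduced without touching the API. The following is a minimal sketch, not gspread's code: `check_widths` is a hypothetical wrapper around the condition quoted from worksheet.py, `GSpreadException` is a local stand-in for the real exception class, and the data rows are assumed to arrive already padded to 4 columns while the header row arrives trimmed to 2.

```python
class GSpreadException(Exception):
    """Local stand-in for gspread.exceptions.GSpreadException."""


def check_widths(keys, values, default_blank=""):
    """Hypothetical wrapper around the width check quoted above."""
    values_wider_than_keys_by = len(values[0]) - len(keys)
    default_blank_in_keys = default_blank in keys
    if (values_wider_than_keys_by > 0 and default_blank_in_keys) or (
        values_wider_than_keys_by > 1
    ):
        raise GSpreadException(
            "the header row in the worksheet contains multiple empty cells"
        )


# Header row as described above: trailing empty cells were cut off.
keys = ["aaa", "bbb"]
# Data rows from the repro sheet, assumed padded to the widest row.
values = [["w", "x", "", ""], ["y", "z", "", "other"]]

try:
    check_widths(keys, values)
    message = None
except GSpreadException as exc:
    message = str(exc)

print(message)
```

With keys of width 2 and values of width 4 the difference is 2, so the second branch of the condition fires even though the header row itself contains no empty cells.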

Now, #1353 exists so that you can fetch data from, say, rows 100-150 only while still using row 1 as the header row. So we can't recombine the two requests.

We will probably have to make sure the keys and values arrays have the same number of columns, by using fill_gaps:

gspread/gspread/utils.py

Lines 546 to 557 in 0932358

    
    def fill_gaps(L, rows=None, cols=None, padding_value=""):
        """Fill gaps in a list of lists.
        e.g.,::
            >>> L = [
            ... [1, 2, 3],
            ... ]
            >>> fill_gaps(L, 2, 4)
            [
                [1, 2, 3, ""],
                ["", "", "", ""]
            ]
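To illustrate the padding idea, here is a minimal sketch that widens the header row to match the data rows. `pad_row` is a hypothetical helper written for this example; it is not gspread's `fill_gaps` and not necessarily how the eventual fix is implemented.

```python
def pad_row(row, width, padding_value=""):
    """Hypothetical helper: right-pad a single row to `width` cells.

    Rows that are already `width` cells or wider are returned unchanged.
    """
    return row + [padding_value] * (width - len(row))


# Header row trimmed by the API, and data rows that are 4 columns wide.
keys = ["aaa", "bbb"]
values = [["w", "x", "", ""], ["y", "z", "", "other"]]

padded_keys = pad_row(keys, len(values[0]))
print(padded_keys)  # ['aaa', 'bbb', '', '']
```

Once the header row is as wide as the data, the width comparison has nothing left to detect, and duplicate blank headers become a question for the uniqueness check instead.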

If we do that, we can probably remove this code

gspread/gspread/worksheet.py

Lines 694 to 703 in 0932358

    
    if ((values_wider_than_keys_by > 0) and default_blank_in_keys) or (
        values_wider_than_keys_by > 1
    ):
        raise GSpreadException(
            "the header row in the worksheet contains multiple empty cells"
        )
    elif values_wider_than_keys_by == 1:
        keys.append(default_blank)
    elif values_wider_than_keys_by < 0:
        values = fill_gaps(values, cols=keys_len, padding_value=default_blank)

because it is covered by this logic

gspread/gspread/worksheet.py

Lines 664 to 666 in 0932358

    
    header_row_is_unique = len(keys) == len(set(keys))
    if not header_row_is_unique:
        raise GSpreadException("the header row in the worksheet is not unique")

I do not have much time to code this week. If you would like to try the fix yourself, it should be simple enough. Otherwise we will get round to it, and it should make it into v5.12.1.

Thanks again for the report :)

lavigne958 · 2023-11-28T15:51:26Z

I agree we should fill the gaps in the headers too, so that the header row matches the requested range size and we avoid Google's default behavior of stripping empty cells.

Though we should keep in mind that in this particular scenario columns C and D both have empty headers, so they would collide in the uniqueness check and it would fail. In this case that is the wanted behavior, to prevent key/value overriding when building the dict.
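The key/value overriding mentioned here is easy to see in plain Python. This is a minimal sketch using the repro data with the header row assumed padded to 4 columns; it only illustrates dict behavior and is not gspread's record-building code.

```python
# Header row padded to the data width: columns C and D both have "" as key.
keys = ["aaa", "bbb", "", ""]
rows = [["w", "x", "", ""], ["y", "z", "", "other"]]

# Same uniqueness test as in the worksheet.py excerpt quoted earlier.
header_row_is_unique = len(keys) == len(set(keys))

# With duplicate keys, the later column silently overwrites the earlier one.
records = [dict(zip(keys, row)) for row in rows]

print(header_row_is_unique)  # False
print(records)
# [{'aaa': 'w', 'bbb': 'x', '': ''}, {'aaa': 'y', 'bbb': 'z', '': 'other'}]
```

The value from column C is dropped in favor of column D, which is consistent with the 5.11.3 output reported at the top of this issue.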

alifeee · 2023-11-29T10:50:34Z

yep

> Though we should keep in mind that in this particular scenario columns C and D both have empty headers, so they would collide in the uniqueness check and it would fail. In this case that is the wanted behavior, to prevent key/value overriding when building the dict.

Here, expected_headers is provided as [], showing that the user wants to ignore duplicate headers.

alifeee · 2023-12-07T11:16:10Z

Hi. Please read the proposal for changing how get_all_records works in the next release, 6.0.0: #1367. We would appreciate it if you shared your opinion :)

alifeee mentioned this issue Nov 28, 2023

fix combine_merged_cells when using from a range that doesn't start at A1 #1335

Merged

1 task

alifeee added this to the 5.12.2 milestone Nov 29, 2023

alifeee added the Bug label Nov 29, 2023

alifeee self-assigned this Nov 29, 2023

lavigne958 mentioned this issue Nov 29, 2023

merge master into v6.0.0 release branch #1356

Closed

alifeee mentioned this issue Nov 30, 2023

Many fixes for get_records #1357

Merged

alifeee linked a pull request Nov 30, 2023 that will close this issue

Many fixes for get_records #1357

Merged

alifeee closed this as completed in #1357 Dec 3, 2023

alifeee mentioned this issue Dec 7, 2023

PROPOSAL: changes to get_records/get_all_records #1367

Closed
