Zum Inhalt der Seite gehen


I use Excel a lot (I mean, who doesn’t, right?) but I am never not horrified by the inexplicable things it does to formatting and actual fucking content, and the monstrous things people feel they have to do to stop it.

Today it’s SIC (Standard Industrial Classification) codes, which are 2, 3 or 5 digit strings, like ‘081’ (Quarrying of stone, sand and clay)… and they’re stored as an assortment of numbers, or strings with a preceding quote. But not even consistently within the Excel file. 😳
It’s really a lot worse than that. There are also quality control ‘flags’ indicated only by manual colour-coding on the cells containing values to which they apply. I don’t know if there’s any way to extract that information?

And this is released by the Office of National Statistics.

Look, I know I am an atypical user. Most people who download these stats want it human-readable, and they might even enjoy a multicoloured spreadsheet 😣

But surely they can make it predictable as well?
In an Excel spreadsheet I use for budgeting, I have a column. It is calculated. 200 rows down, it decides to inform me that the cell is not like the cell to the left. Ah, yeah, it's like the cell above it. I cannot seem to fix that in the next 50-some cells without saying ignore "error." Grrr...
You generally *can* extract cell formatting information with things like openpyxl

but yes, what I would not give for excel to have a "when you open CSV files, treat every value as a string and do not convert anything ever" mode
I can’t recall, are you an #rstats user? Many years ago I remember seeing something (a talk, a paper) by Jenny Bryan about how to extract info from messy Excel files. I’m pretty sure she had a way to grab cell color.
stop using a spreadsheet when you need a database 😀