British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology. 

Staying productive: Further compression of Excel files (xlsx)

I recently ran into a problem. I needed to upload supplemental materials to a journal, but they capped the maximum file size at 100 MB. That is a lot, but sometimes you have files that are too large. This journal needed the file format to be Excel. The newer Excel format, .xlsx, is already compressed: it is a ZIP archive of XML files. This means that an xlsx file is often smaller than even a plain text version of the same data. 

Rather than email the journal and ask for some workaround, which would have involved a major delay given the UK/US time difference, I managed to compress the 128 MB xlsx even further! 

The solution to the problem is a simple trick, described here: 

http://datapigtechnologies.com/blog/index.php/how-to-compress-xlsx-files-to-the-smallest-possible-size/

Basically, this trick works because the compression used by Excel isn't great, but Excel can read files compressed by other tools (depending on the exact settings you choose). The linked post is a good walkthrough, but I wanted to create an even more detailed version using 7-Zip rather than WinZip (7-Zip is freely available, unlike WinZip). In my hands, WinZip did give better compression, 90 MB rather than the 95 MB for 7-Zip, but both would have worked. I did not change the defaults much with either tool; any changes I did make meant that Excel could not read the file. There might be settings that allow further compression and still let Excel open the file. If anyone discovers these, please let me know. 
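To see how much room there is before repacking anything, you can look inside the xlsx directly, since it is an ordinary ZIP archive. The following is a minimal sketch using Python's standard `zipfile` module; the function names `summarize_xlsx` and `print_summary` are made up for illustration and are not part of the linked walkthrough.

```python
import zipfile


def summarize_xlsx(path):
    """Return (name, uncompressed_bytes, compressed_bytes) for each archive member."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in archive.infolist()
        ]


def print_summary(path):
    """Print one line per member showing how well it is currently compressed."""
    for name, raw, packed in summarize_xlsx(path):
        # Empty members (e.g. folder entries) have no meaningful ratio.
        ratio = packed / raw if raw else 1.0
        print(f"{name}: {raw} -> {packed} bytes ({ratio:.0%} of original)")
```

Calling `print_summary("my_file.xlsx")` (a placeholder filename) lists every part of the workbook. In a large workbook the worksheet XML files usually account for most of the size, so that is where stronger compression would be expected to help.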

1. Find your .xlsx file in your file browser (Windows Explorer).

1.PNG

2. Then change the extension from .xlsx to .zip. 

2.PNG
3.PNG

3. Now right click the .zip file, select 7-Zip, and select "Extract Here". 

4.PNG

What you will have now, in addition to your .zip file, are multiple files/folders. 

5.PNG

4. Select all of the extracted files/folders (but not the .zip file itself), right click, select 7-Zip, and select "Add to archive..." to create a new compressed file. 

6.PNG

5. When the 7-Zip box shows up, select how to compress the file. To ensure that Excel can open it after this process is over, set the archive format to "zip" and leave the other options as default. I have not tested all of these options, so others might work and give better compression (if you experiment with this step, again, please let me know how it worked out). 

7.PNG

Now you have a new .zip file as output. Notice that the file size is much smaller than the original Excel (xlsx) file. 

8.PNG

6. Change the file extension from .zip to .xlsx to turn it back into an Excel file, then open it in Excel to check that it still works. 

9.PNG

Hopefully, this has compressed your file enough to help.
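
For anyone who would rather script the steps above than click through them, here is a minimal sketch in Python (3.7 or later). It is not the 7-Zip procedure itself: it swaps in Python's standard `zipfile` module, which uses zlib's Deflate at its maximum level, so the resulting size may well differ from the 7-Zip and WinZip figures quoted above. The function name `recompress_xlsx` and the filenames in the usage note are hypothetical.

```python
import zipfile


def recompress_xlsx(src_path, dst_path, level=9):
    """Copy every member of src_path into a new archive at dst_path.

    Members are written with Deflate in a plain zip container, mirroring the
    "zip" archive format chosen in step 5. level=9 is zlib's maximum setting.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(
        dst_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=level
    ) as dst:
        for info in src.infolist():
            # Keep the original member name and order so the internal
            # layout of the workbook is unchanged.
            # Note: each member is read fully into memory, which may be
            # a lot for a very large worksheet.
            dst.writestr(info.filename, src.read(info.filename))
```

For example, `recompress_xlsx("supplement.xlsx", "supplement_small.xlsx")` writes a new file and leaves the original untouched. As with the manual route, open the result in Excel before uploading it, because a smaller file is no use if Excel refuses to read it.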