Technology

Can Excel or Google spreadsheets handle really large amounts of data?


software

Can Excel or Google spreadsheets handle really large amounts of data?

by Pexels

Spreadsheet software is often used to collect accounting data, build large data sets by machine learning, and categorize a large number of numbers. Computer scientistAditya ParameswaranThe research team led by them and others of Microsoft Excel, Google spreadsheets, open source free spreadsheet softwareLibreOffice CalcAnnounced that the benchmark was done at. Computer research scientist Adrian Colyer explains the results of this research in an easy-to-understand manner.

Benchmarking Spreadsheet Systems
https://people.eecs.berkeley.edu/~adityagp/papers/spreadsheet_bench.pdf

Benchmarking spreadsheet systems – the morning paper
https://blog.acolyer.org/2019/12/06/benchmarking-spreadsheet-systems/

As technology advances, spreadsheets such as Google Spread Seed and Excel are said to support more than 5 million cells. However, Mr. Colyer points out that "actually, spreadsheet software becomes quite heavy when tens of thousands of lines are read, and freezes occur frequently and cannot be put into practical use."

The research team of Parameswaran showed this in research. The researchers conducted a benchmark test using a 50,000-row x 17-column spreadsheet file containing weather data as a sample.

First, the research team prepared spreadsheets with various numbers of rows from 150 to 490,000 from the sample, and performed various operations to check the load time for loading, updating, and querying.

The graph below on the left summarizes Excel and LibreOffice Calc, and the graph on the right summarizes Google spreadsheet delays. The horizontal axis is the number of sample rows, and the vertical axis is the number of seconds it takes to read. F is for reading formula cells and value cells, and V is for reading only value cells. Excel works relatively stably, but reading time is almost doubled when cells are read.

In the case of LibreOffice Calc, the reading time did not change depending on the cell, but you can see from the graph above that the operation is considerably heavy compared to Excel. Also, although Google spreadsheets vary, reading formula cells takes a lot of time.

Colyer evaluates that “Excel is 6000 lines, LibreOffice Calc and Google spreadsheets are over 150 lines, and it is delayed at a level that can not be put to practical use.”

And the graph when narrowing down with the filter function is as follows. If you use Excel to narrow down a sheet that contains formula cells and value cells, you can see that the process took an unusually long time. In addition, when Google Sheet exceeded 20,000 lines, processing took 1.5 seconds, and when it exceeded 50,000 lines, it was found that it waited nearly 5 seconds.

Also,VLOOKUPWhen using the function, LibreOffice Calc and Google Spreadsheet scan the entire spreadsheet, and it seems that there was no action to stop when it finds a value. As a result, Colyer says that working with LibreOffice Calc and Google Sheets can be quite inefficient when the data is sorted.

Overall, Colyer argues that various types of spreadsheet software have not been able to efficiently process large data sets, saying, “This benchmark result is a starting point that contributes to the optimization of spreadsheet software. It will be. "

Copy the title and URL of this article




Source link

Show More

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button
Close
Close