Common problems with large file uploads

- July 18, 2012

Large file uploads can be a problem.

Follow the discussion here on HN

Over the last few decades, Moore’s law has kept storage capacity growing at a breakneck pace, with 1 TB drives now common in PCs. Alongside this rapid increase in storage, average file sizes have gone through the roof, with images in the tens of megabytes and Blu-ray movies topping 40 GB in some cases. And while this is fine for users, if you’re a web developer who wants to upload and process these monolithic files, life can get challenging rather quickly.

For example, here are some snags to watch out for when you are uploading large files:

Browser limitations:

Fortunately this isn’t quite as bad as it used to be, but if you’re looking to support browsers that are more than a few years old, many of them have trouble with files pushing past the 100 MB range. Even browsers as recent as IE8 have had trouble going above 2 GB.

Server configuration issues:

This comes up most frequently with PHP, but make sure that your web server is configured to handle files of the sizes you’re interested in. In PHP the important settings are memory_limit, post_max_size, and upload_max_filesize, but many frameworks have similar configuration options.
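As a rough sketch, a php.ini tuned for uploads up to 2 GB might look like this. The exact values here are illustrative, not recommendations; tune them to the sizes you actually expect.

```ini
; Illustrative values - adjust to the file sizes you expect
upload_max_filesize = 2G
post_max_size = 2G        ; must be at least as large as upload_max_filesize
memory_limit = 256M       ; only needs to cover your processing, not the file itself
max_execution_time = 300  ; long uploads also need a generous time limit
```

Note that post_max_size must be at least as large as upload_max_filesize, or the upload will be rejected before PHP even populates $_FILES.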

Memory issues:

You’ve got to be pushing some seriously big files to run into these problems, but if you are going above 1 GB, Apache and other web servers might start complaining due to the memory required. This is especially problematic if you are processing the file or doing a read-then-write, as opposed to streaming it directly into storage.

Timeout issues:

Another frustrating one: given that your users might have bandwidth in the 100 KB/s range, uploading a file around 100 MB will take a significant amount of time. Many web servers, especially ones on hosting services like Heroku, time out requests after a certain amount of time, which can be as low as 30 seconds, causing the upload to fail.
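If you control the server, the relevant knobs are usually adjustable. For example, in nginx the directives might look like this (the values are illustrative, and hosted platforms like Heroku generally won’t let you change their equivalents):

```nginx
client_max_body_size 2048m;   # reject oversized uploads early
client_body_timeout  300s;    # give slow clients time between reads
send_timeout         300s;
```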

… and there’s plenty more: a quick Google search turns up all the aches and pains developers run into when dealing with this problem.

Fortunately, if you run up against this wall, there are a number of things you can do.

If you need to deal with older browsers, a good old Flash or Java fallback can often do the trick. There are a number of solutions out there; choose one that works for you. Be advised, though, that some of them start behaving oddly around 2 GB because they are built against 32-bit rather than 64-bit versions of Flash.

If you are using Amazon S3, it offers the ability to POST files directly to the service, yielding significant bandwidth savings and sidestepping many of the memory and timeout issues mentioned earlier. You can learn more about this in their POST sample app. Unfortunately it does require pretty serious changes to your server logic, and there are a number of limitations.

Shard the file into more manageable chunks using the JavaScript Blob API, stream them up in parallel, and reassemble them on the server. This isn’t available in all browsers yet, but it elegantly solves almost all of the server-side issues, from timeouts to memory pressure.
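As a sketch of the client side, the slicing step might look like this. sliceIntoChunks and the chunk size are illustrative, and the actual per-chunk upload request is left out:

```javascript
// Split a Blob (e.g. a File from an <input type="file"> element) into
// fixed-size chunks. Blob.slice returns lightweight views into the
// original, so nothing is copied or read into memory until each chunk
// is actually uploaded.
function sliceIntoChunks(blob, chunkSize) {
  const chunks = [];
  for (let start = 0; start < blob.size; start += chunkSize) {
    chunks.push(blob.slice(start, Math.min(start + chunkSize, blob.size)));
  }
  return chunks;
}
```

Each chunk can then go up in its own request, tagged with an index so the server can reassemble them in order, and a failed chunk can be retried on its own without restarting the whole upload.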

Any other ideas?
