Mimetypes versus Extensions

- June 16, 2012


Over the last few days we’ve had some conversations here at Filestack about how content types will be identified in the future: will we all still be using filename extensions like .txt, or are mimetypes the way forward? If you want to brush up on mimetypes, the Wikipedia article is great. Here’s our take:

Overall, mimetypes should be the way to identify content, but given the dominance of file extensions in local filesystems and applications, it is likely that file extensions will become the de facto way to identify content on the web as well.

Mimetypes are Great…

As mentioned, mimetypes should be what is commonly used instead of ugly extensions. One of the biggest strengths of mimetypes is that you can group subtypes together under a single type. The “text/*” mimetype is a great example of this: if I am building a text editing application, it makes much more sense to ask for any text by declaring the “text/*” mimetype than to list out the myriad possible extensions. Mimetypes are also a fair bit longer than extensions, and so are typically more readable. As seen in Google Docs, mimetypes allow you to decouple the name of a file from its type, so that the awkward appendix can be dropped from “My Family.jpg,” making renaming far easier to implement and simplifying filesystem displays.
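To make the “text/*” idea concrete, here is a minimal sketch of wildcard matching in Python. The function name matches is made up for illustration and is not part of any library, and real-world matching may need more care than this.

```python
def matches(mimetype: str, pattern: str) -> bool:
    """Return True if `mimetype` satisfies `pattern`, e.g. "text/*"."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    bare = mimetype.split(";")[0].strip().lower()
    mime_type, _, mime_subtype = bare.partition("/")
    pat_type, _, pat_subtype = pattern.strip().lower().partition("/")
    # A "*" on either side of the slash accepts anything in that position.
    type_ok = pat_type in ("*", mime_type)
    subtype_ok = pat_subtype in ("*", mime_subtype)
    return type_ok and subtype_ok


# A text editor can ask for "text/*" instead of listing every extension.
accepted = [m for m in ("text/plain", "text/html", "image/jpeg")
            if matches(m, "text/*")]
print(accepted)
```

The extension-based equivalent would need an explicit list (.txt, .html, .md, and so on) that has to be updated every time a new text format appears.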

…but riddled with inconsistencies

Unfortunately, we’ve lived in a primarily PC-dominated world where extensions are much more common, and thus mimetypes suffer from the atrophy that comes with infrequent use. Even some common files have multiple, inconsistent mimetypes. For example, .psd files can be identified by either “image/x-photoshop” or “image/vnd.adobe.photoshop” depending on who you ask.

This isn’t to say that extensions are perfectly consistent (consider .jpg, .jpeg, or even .jpe), but over years of use many of these inconsistencies have been overcome. Also, while many of the common mimetypes are clean and readable, some are rather cryptic, as seen with “image/x-photoshop”.
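One way to live with these inconsistencies is to collapse every known variant onto a single canonical spelling. The sketch below uses hand-built alias tables containing only the examples from this post. The table names, the function names, and the choice of which spelling counts as canonical are all assumptions made for illustration.

```python
# Hypothetical alias tables: each known variant maps to one canonical form.
# Which spelling is treated as canonical is an assumption of this sketch.
MIMETYPE_ALIASES = {
    "image/x-photoshop": "image/vnd.adobe.photoshop",
}
EXTENSION_ALIASES = {
    ".jpeg": ".jpg",
    ".jpe": ".jpg",
}


def canonical_mimetype(mimetype: str) -> str:
    """Map a mimetype to its canonical spelling, if an alias is known."""
    key = mimetype.strip().lower()
    return MIMETYPE_ALIASES.get(key, key)


def canonical_extension(extension: str) -> str:
    """Map an extension to its canonical spelling, if an alias is known."""
    key = extension.strip().lower()
    return EXTENSION_ALIASES.get(key, key)


print(canonical_mimetype("image/x-photoshop"))
print(canonical_extension(".JPE"))
```

Anything not in a table passes through unchanged (lowercased), so the tables only need to list the known troublemakers.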

My recommendation? If you are working with user content, use mimetypes as your internal representation because of the benefits described, but have an interface with the outside world that accepts both mimetypes and extensions to handle any inconsistencies that may arise from the use of mimetypes. To help rid the world of the forced, uninformative, and overused “application/*” mimetype, I’d also propose the inclusion of a few more top-level types:

– code/, as a more specific selector than “text/” for content that can be compiled or interpreted as computer instructions

– document/, as a way to specify office documents, PDFs, etc – content you might print out and read that’s not just plain text

– drawing/ or photo/, as a way to distinguish mutable images such as Photoshop documents from immutable images like photos.
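Returning to the first half of the recommendation (mimetypes internally, both forms accepted at the boundary), a minimal sketch might look like the following. The lookup table is a tiny hand-written sample and to_mimetype is a hypothetical name. A real system would want a much fuller table and probably some content sniffing as well.

```python
# Hypothetical lookup table covering only the extensions named in this post.
EXTENSION_TO_MIMETYPE = {
    "txt": "text/plain",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "jpe": "image/jpeg",
    "psd": "image/vnd.adobe.photoshop",
}


def to_mimetype(token: str) -> str:
    """Accept a mimetype or an extension; return the internal mimetype."""
    value = token.strip().lower()
    if "/" in value:
        # Already looks like a mimetype, so pass it through.
        return value
    # Treat ".jpg" and "jpg" the same way.
    extension = value.lstrip(".")
    try:
        return EXTENSION_TO_MIMETYPE[extension]
    except KeyError:
        raise ValueError(f"unrecognized extension: {token!r}") from None


# Callers may pass either form; everything downstream sees mimetypes only.
print(to_mimetype(".JPG"))
print(to_mimetype("text/plain"))
```

Keeping the conversion in one function at the edge means the rest of the application never has to reason about extensions at all.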
