paperlined.org
document updated on Apr 11, 2025

auto-detecting a file's type

... looking only at files' contents, not at their file extension.

The relevant Wikipedia article is "content sniffing".

tools available

I really like using file(1) and magic(5) to auto-detect a file's type based on its contents. The magic database covers MANY different file types, and file(1) generally seems to get things right. However, it IS still a guess, and sometimes guesses are wrong.
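The core idea behind magic(5) is matching known byte signatures ("magic numbers") at fixed offsets in the file. The sketch below illustrates that idea with a small, hand-picked table; the `SIGNATURES` table and the `sniff_type` function are hypothetical illustrations, not the magic(5) database or its matching rules, which are far richer.

```python
from typing import Optional

# (offset, expected bytes, description). A tiny illustrative table,
# NOT the real magic(5) database.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "Zip archive"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x7fELF", "ELF executable"),
    (257, b"ustar", "POSIX tar archive"),  # tar's magic is not at offset 0
]


def sniff_type(data: bytes) -> Optional[str]:
    """Return a description of the first matching signature, or None."""
    for offset, magic, description in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return description
    return None  # no signature matched: the guess is "unknown"
```

Usage: read only the start of the file, e.g. `with open(path, "rb") as f: sniff_type(f.read(512))`. A `None` result means "unknown", not "text".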

Is this file text or binary?

Be aware that auto-detecting whether a file is text is a heuristic guess, and different heuristics often disagree with each other about whether a particular file is text.
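To make the disagreement concrete, here is a sketch of two simple, hypothetical heuristics (the names, the 8000-byte sample size, and the 30% threshold are all assumptions for illustration, not any specific tool's rules). One calls a file binary if it contains a NUL byte; the other calls it binary if too many bytes fall outside printable ASCII. They give opposite answers for UTF-8 Greek text.

```python
SAMPLE_SIZE = 8000  # only inspect the start of the file (an assumption)

# Bytes treated as "plain text": tab, LF, FF, CR, and printable ASCII.
_TEXT_BYTES = frozenset({9, 10, 12, 13}) | frozenset(range(32, 127))


def is_binary_nul(data: bytes) -> bool:
    """Heuristic A: any NUL byte in the sample means 'binary'."""
    return b"\x00" in data[:SAMPLE_SIZE]


def is_binary_ratio(data: bytes, threshold: float = 0.30) -> bool:
    """Heuristic B: 'binary' if more than `threshold` of the sampled
    bytes are outside printable ASCII."""
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False  # treat an empty file as text
    odd = sum(1 for b in sample if b not in _TEXT_BYTES)
    return odd / len(sample) > threshold


greek = "αβγ".encode("utf-8")
print(is_binary_nul(greek), is_binary_ratio(greek))  # A says text, B says binary
```

Neither heuristic is "right"; they simply encode different assumptions about what text looks like.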

Some tools that contain text vs binary heuristics:

What character encoding does this text file use?

Detecting a text file's character encoding is always a heuristic guess; ideally, a file's character encoding should be stated explicitly.

See also: en.wikipedia.org/wiki/Charset_detection
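A minimal sketch of one common strategy, under stated assumptions: trust a byte-order mark if present, call pure-ASCII data "ascii", accept the data as UTF-8 if it decodes strictly, and otherwise fall back to a single-byte encoding. The function `guess_encoding` and its `cp1252` fallback are hypothetical choices for illustration; dedicated detectors instead score many candidate encodings statistically.

```python
# BOMs to check; the 4-byte UTF-32 BOMs come first so that UTF-32LE
# (FF FE 00 00) isn't mistaken for UTF-16LE (FF FE).
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a Python codec name that is a plausible guess for `data`."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")  # strict mode: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Hypothetical default: cannot tell cp1252 from other
        # single-byte encodings (Latin-1, KOI8-R, ...) this way.
        return fallback
```

Usage: `guess_encoding("café".encode("latin-1"))` returns `"cp1252"`, while the same word encoded as UTF-8 returns `"utf-8"`. Checking strict UTF-8 early works because non-ASCII text in other encodings rarely happens to form valid UTF-8 sequences.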

Some tools that can detect a text file's character encoding:

Is this file a Perl source-code file?
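One simple heuristic, sketched here as an assumption rather than how any particular tool works: check whether the first line is a shebang that names `perl` (directly or via `env`), and otherwise look near the top of the file for a few Perl-specific statements such as `use strict;`, `use warnings;`, or `package Foo::Bar;`. The function `looks_like_perl`, the regexes, and the 50-line limit are hypothetical.

```python
import re

# Shebang naming perl, e.g. "#!/usr/bin/perl -w" or "#!/usr/bin/env perl".
_SHEBANG_RE = re.compile(rb"^#![ \t]*(?:\S*/env[ \t]+)?\S*?perl[\d.]*(?=\s|$)")

# Statements that are common at the top of Perl files (hypothetical list).
_MARKER_RE = re.compile(
    rb"^[ \t]*(?:use[ \t]+(?:strict|warnings)|package[ \t]+[A-Za-z_][\w:]*)[ \t]*;",
    re.MULTILINE,
)


def looks_like_perl(data: bytes, max_lines: int = 50) -> bool:
    """Guess whether `data` is Perl source. A heuristic, not a parser."""
    first_line = data.split(b"\n", 1)[0]
    if _SHEBANG_RE.match(first_line):
        return True
    head = b"\n".join(data.split(b"\n")[:max_lines])
    return _MARKER_RE.search(head) is not None
```

This misses Perl files that have neither a shebang nor one of these statements, and it can be fooled by other files that happen to contain them, which is why it is only a guess.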

Some tools that can auto-detect whether a file contains Perl source: