document updated on Feb 19, 2023

auto-detecting a file's type

... looking only at files' contents, not at their file extension.

The relevant Wikipedia article is "content sniffing".

tools available

I really like using file(1), together with the database described in magic(5), to auto-detect a file's type based on its contents. That database covers MANY different file types, and file(1) generally seems to get things right. However, the result IS still a guess, and sometimes guesses are wrong.
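To make the idea of content-based detection concrete, here is a minimal sketch of signature ("magic number") matching in Python. It is not how file(1) is implemented; the names SIGNATURES, sniff, and sniff_file are hypothetical, and the table is a tiny hand-picked sample, whereas the real magic(5) database is far larger and supports much richer tests.

```python
# Toy signature table: (byte offset, magic bytes, label).
# Hand-picked for illustration only; this is an assumption-laden sketch,
# not a copy of the magic(5) database.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]


def sniff(data: bytes) -> str:
    """Return the label of the first signature that matches, else 'unknown'."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Sniff a file by reading only its first 512 bytes."""
    with open(path, "rb") as f:
        return sniff(f.read(512))
```

Even this toy version shows why the answer is only a guess: the ZIP signature also matches formats built on ZIP containers (for example .docx or .jar files), so "ZIP archive" can be technically right and still not the answer you wanted.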

text vs binary

You should be aware that auto-detecting whether a file is text is a guess/heuristic, and different heuristics often disagree with each other about whether a particular file is text.
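As a rough illustration of how two heuristics can disagree, here is a sketch of two simple tests in Python. The first treats any NUL byte near the start of the data as a sign of binary content; the second counts "odd" control characters and calls the data binary when they exceed a threshold. Both are loosely modeled on heuristics found in common tools, but the function names, window sizes, and the 30% threshold here are assumptions for the sake of the example, not any tool's exact behavior.

```python
# Control characters that commonly appear in ordinary text files:
# bell, backspace, tab, line feed, form feed, carriage return, escape.
TEXT_CONTROL = {7, 8, 9, 10, 12, 13, 27}


def looks_binary_nul(data: bytes, window: int = 8000) -> bool:
    """Heuristic A: binary if a NUL byte appears in the first `window` bytes."""
    return b"\x00" in data[:window]


def looks_binary_ratio(data: bytes, window: int = 512,
                       threshold: float = 0.30) -> bool:
    """Heuristic B: binary if more than `threshold` of the sampled bytes
    are control characters that rarely occur in text."""
    sample = data[:window]
    if not sample:
        return False  # treat empty input as text
    odd = sum(1 for b in sample
              if (b < 32 and b not in TEXT_CONTROL) or b == 127)
    return odd / len(sample) > threshold


# Two inputs on which the heuristics disagree.
mostly_text = b"plain text\x00more text"   # a single stray NUL
control_heavy = b"\x01\x02\x03\x04abc"     # no NUL, many control bytes
```

With these inputs, heuristic A calls mostly_text binary while heuristic B calls it text, and the verdicts flip for control_heavy. Neither answer is "correct"; they are just different guesses.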

Some tools that contain text vs binary heuristics:

auto-detecting character encoding within text files

see also —

Some tools that can do this:
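Independent of any particular tool, one simple first step can be sketched in Python: check for a byte order mark (BOM), then test whether the bytes decode as UTF-8. This is an assumption about a reasonable approach, not a description of how any specific tool works, and guess_encoding and BOMS are hypothetical names. It deliberately gives up ("unknown") rather than guessing among legacy 8-bit encodings, which is the genuinely hard part.

```python
import codecs

# BOMs are checked longest first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so order matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Guess an encoding name usable with bytes.decode(), or 'unknown'."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    # Valid UTF-8 with no high-bit bytes is also plain ASCII.
    return "ascii" if all(b < 0x80 for b in data) else "utf-8"
```

Like everything else on this page, the result is still a guess: for example, UTF-16 LE text that starts with a BOM followed by a NUL character would be mistaken for UTF-32 here.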

Perl source files

Some tools that can auto-detect if a file contains Perl source: