... looking only at files' contents, not at their file extension.
The relevant Wikipedia article is "content sniffing".
I really like using file(1) and magic(5) to auto-detect a file's type based on its contents. Its database covers MANY different file types, and it generally seems to get things right. However, it IS still a guess, and sometimes guesses are wrong.
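The basic idea behind content sniffing can be sketched by checking a file's leading bytes against a few well-known signatures. This is only an illustration. It is not the magic(5) rule format, and the table is a tiny hand-picked subset of what file(1) knows. The function names `sniff_bytes` and `sniff_file` are hypothetical.

```perl
use strict;
use warnings;

# Tiny hand-picked table of well-known leading-byte signatures ("magic numbers").
# NOT the magic(5) database format; it only illustrates content-based detection.
my @MAGIC = (
    [ "\x89PNG\r\n\x1a\n", 'PNG image' ],
    [ "GIF87a",            'GIF image' ],
    [ "GIF89a",            'GIF image' ],
    [ "%PDF-",             'PDF document' ],
    [ "\x7fELF",           'ELF executable' ],
    [ "\x1f\x8b",          'gzip compressed data' ],
    [ "PK\x03\x04",        'Zip archive' ],
);

# Guess a type from the first bytes of a buffer; undef means "no idea".
sub sniff_bytes {
    my ($buf) = @_;
    for my $entry (@MAGIC) {
        my ($sig, $name) = @$entry;
        return $name if substr($buf, 0, length $sig) eq $sig;
    }
    return undef;
}

# Read only the first 16 bytes of a file, in binary mode, and sniff them.
sub sniff_file {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    read($fh, my $buf, 16) // die "Cannot read $path: $!";
    close $fh;
    return sniff_bytes($buf);
}

for my $path (@ARGV) {
    my $type = sniff_file($path) // 'unknown (not in this tiny table)';
    print "$path: $type\n";
}
```

The file name plays no part in the guess. Anything outside the seven signatures is reported as unknown.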
You should be aware that auto-detecting whether a file is text is a guess/heuristic, and different heuristics often disagree with each other about whether a particular file is text.
Some tools that contain text vs binary heuristics:
find -type f | perl -nle 'print if -T'
find -type f | perl -nle 'print if qx{file -bi $_} =~ m#^text/#'
grep has a heuristic that allows it to ignore binary files [algorithm]:
grep -rlI ^
grep --recursive --files-with-matches --binary-files=without-match ^
(the "^" means "always match, regardless of file contents")
see also: en.wikipedia.org/wiki/Charset_detection
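The following sketch shows why text-vs-binary heuristics disagree. It approximates two of the rules above and works on an in-memory buffer, not on the first block of a real file. `perl_like_is_text` follows the -T rule documented in perlfunc:

- an empty buffer is text
- any NUL byte makes it binary
- valid UTF-8 containing non-ASCII bytes is text
- otherwise it is binary when more than a third of the bytes are "odd"

Exactly which control codes count as odd is an assumption in this sketch. `grep_like_is_text` follows GNU grep's documented rule that NUL bytes, or bytes improperly encoded for the current locale, mean binary. It assumes a UTF-8 locale. Both function names are hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# True (1) if $bytes is well-formed UTF-8; strict decoding, croak is caught.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    } ? 1 : 0;
}

# Approximation of Perl's documented -T rule (see "-X" in perlfunc).
sub perl_like_is_text {
    my ($buf) = @_;
    return 1 if $buf eq '';                      # empty counts as text
    return 0 if index($buf, "\0") >= 0;          # any NUL byte => binary
    return 1 if $buf =~ /[\x80-\xFF]/ && is_valid_utf8($buf);
    # Assumed "odd" set: control codes other than \t \n \f \r \b ESC,
    # plus DEL and every byte with the high bit set.
    my $odd = () = $buf =~ /[\x00-\x07\x0B\x0E-\x1A\x1C-\x1F\x7F-\xFF]/g;
    return $odd * 3 > length($buf) ? 0 : 1;      # more than 1/3 odd => binary
}

# Approximation of GNU grep's rule, assuming a UTF-8 locale.
sub grep_like_is_text {
    my ($buf) = @_;
    return 0 if index($buf, "\0") >= 0;          # NUL byte => binary
    return is_valid_utf8($buf);                  # encoding error => binary
}

for my $case (['Latin-1 text',  "caf\xE9 au lait\n"],
              ['control codes', "\x01\x02\x03abc"]) {
    my ($label, $buf) = @$case;
    printf "%-14s perl-like: %-7s grep-like: %s\n", $label,
        (perl_like_is_text($buf) ? 'text' : 'binary'),
        (grep_like_is_text($buf) ? 'text' : 'binary');
}
```

On the Latin-1 "café" buffer the Perl-like rule answers text and the grep-like rule answers binary. On the buffer of control codes the answers flip.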
Some tools that can auto-detect if a file is plain ASCII text:
find -type f | perl -nle 'print if qx{file -bk $_} =~ /\bASCII text\b/'
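A much simpler and stricter check than `file -bk` is to require that every byte be printable ASCII or common whitespace. This is a minimal sketch. The helper names are hypothetical, and the accepted byte set (tab, LF, form feed, CR, 0x20 to 0x7E) is an assumption that is close to, but not the same as, file(1)'s notion of "ASCII text". An empty file passes this check, whereas file(1) reports it as "empty".

```perl
use strict;
use warnings;

# Hypothetical helper: true if every byte is printable ASCII (0x20 to 0x7E)
# or one of tab, LF, form feed, CR. The byte set is an assumption, not file(1)'s rule.
sub is_plain_ascii_text {
    my ($bytes) = @_;
    return $bytes =~ /\A[\x09\x0A\x0C\x0D\x20-\x7E]*\z/ ? 1 : 0;
}

# Read the whole file as raw bytes and apply the check.
sub file_is_plain_ascii {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };    # slurp mode
    close $fh;
    return is_plain_ascii_text(defined $bytes ? $bytes : '');
}

# Print each regular file named on the command line that passes the check.
print "$_\n" for grep { -f $_ && file_is_plain_ascii($_) } @ARGV;
```

Saved under a hypothetical name such as ascii-only.pl, it can be driven by find: `find . -type f -print0 | xargs -0 perl ascii-only.pl`.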
Some tools that can auto-detect if a file contains Perl source:
find -type f | perl -MPerl::Metrics::Simple -nle 'print if Perl::Metrics::Simple->is_perl_file($_)'
perl -MFile::Find::Rule::Perl -le 'print join "\n", find(perl_file => 1, in => ".")'
find -type f | perl -nle 'print if qx{file -b $_} =~ /^perl(?! Storable)/i'
find -type f | perl -nle 'open(my $f, "-|", "file", "-b", $_) or die $!; print if <$f> =~ /^perl(?! Storable)/i'
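For comparison, here is a deliberately crude content check for Perl source, shown only as an illustration. It reports a file as Perl if the first line is a `#!` line mentioning perl, or if one of the first 20 lines starts with a `package` declaration, `use strict` or `use warnings`. The function names, the 20-line limit and the 4096-byte read are arbitrary choices for this sketch. It will miss Perl that lacks these markers and can misfire on other files that happen to contain them.

```perl
use strict;
use warnings;

# Hypothetical, deliberately crude heuristic for "looks like Perl source".
sub looks_like_perl {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    splice(@lines, 20) if @lines > 20;           # only look near the top
    return 1 if @lines && $lines[0] =~ /\A#!.*\bperl/;
    return 1 if grep { /\A\s*(?:package\s+[\w:]+|use\s+(?:strict|warnings)\b)/ } @lines;
    return 0;
}

# Read at most the first 4096 bytes of a file and apply the heuristic.
sub file_looks_like_perl {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return 0;
    my $head = '';
    defined read($fh, $head, 4096) or return 0;
    close $fh;
    return looks_like_perl($head);
}

print "$_\n" for grep { -f $_ && file_looks_like_perl($_) } @ARGV;
```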
file often misclassifies extremely short Perl files as just ASCII.
perl -c
does not, across a WIDE variety of files)
find -type f -print0 | xargs -0 -n 1 syncheck
find -type f | perl -nle 'system "syncheck", $_'
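The `perl -c` idea can be sketched directly in Perl. This is not syncheck itself, whose behaviour is not described here. The sketch runs the current perl (`$^X`) with `-c` on each file through the list form of `system`, so no shell is involved, and treats a zero exit status as "compiles". The name `compiles_as_perl` is hypothetical.

Two caveats apply:

- `perl -c` still executes BEGIN blocks and `use` statements, so only run this on files you trust.
- Compiling is a weak signal: short snippets of ordinary text can sometimes compile as Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Sketch in the spirit of `perl -c`. WARNING: BEGIN blocks and `use`
# statements in the checked file still run, so only check trusted files.
sub compiles_as_perl {
    my ($path) = @_;
    # Temporarily point STDERR at the null device to hide "syntax OK" and
    # error messages; the child process inherits the redirected descriptor.
    open(my $saved_err, '>&', \*STDERR) or die "Cannot dup STDERR: $!";
    open(STDERR, '>', File::Spec->devnull) or die "Cannot redirect STDERR: $!";
    my $status = system($^X, '-c', '--', $path);   # list form: no shell involved
    open(STDERR, '>&', $saved_err) or die "Cannot restore STDERR: $!";
    return $status == 0 ? 1 : 0;
}

for my $path (@ARGV) {
    print "$path\n" if -f $path && compiles_as_perl($path);
}
```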