document updated 13 years ago, on Jan 14, 2011
This page examines how Perl's -T operator works.
-T examines a file, and tries to guess if it's an ASCII text file. But what heuristic, exactly, does it use?
To see the implementation, search for "fttext" in pp_sys.c in the Perl source code.
Basically:
- the first 512 bytes of the file are scanned
- it evaluates each character in turn
- if a null character is found anywhere, the file is considered binary
- if no more than 1/3 of the bytes are classified as "odd", the file is considered text; if more than 1/3 are odd, it is considered binary
- "odd" is defined as:
  - UTF-8 characters don't count as odd
  - characters with a value below 32 are odd, with several exceptions:
    - \n, \r, \b, \t, \f, \x1B (ESC)
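The steps above can be sketched in a few lines of code. What follows is a rough Python approximation of the heuristic as described on this page, not a translation of Perl's C source. The function names looks_like_text and utf8_sequence_length are made up for illustration, and the sketch works on a bytes buffer rather than a file handle. It also ignores locale handling, and it counts a UTF-8 sequence cut off at the 512-byte boundary as odd, which is a simplification.

```python
def utf8_sequence_length(buf: bytes, i: int) -> int:
    """Return the length of a valid multi-byte UTF-8 sequence starting at
    buf[i], or 0 if there is none. Hypothetical helper for this sketch."""
    for n in (2, 3, 4):
        chunk = buf[i:i + n]
        if len(chunk) < n:
            break  # ran off the end of the buffer
        try:
            chunk.decode("utf-8")
            return n
        except UnicodeDecodeError:
            continue
    return 0


def looks_like_text(data: bytes, block_size: int = 512) -> bool:
    """Approximate the -T heuristic on an in-memory buffer (a sketch)."""
    block = data[:block_size]  # only the first block is examined

    # A null byte anywhere in the block means "binary".
    if b"\x00" in block:
        return False

    # Control characters that are NOT odd: \n, \r, \b, \t, \f, ESC.
    allowed_controls = {0x0A, 0x0D, 0x08, 0x09, 0x0C, 0x1B}

    odd = 0
    i = 0
    while i < len(block):
        byte = block[i]
        if byte < 32:
            if byte not in allowed_controls:
                odd += 1
            i += 1
        elif byte < 128:
            i += 1  # plain printable ASCII (and DEL) is not counted here
        else:
            n = utf8_sequence_length(block, i)
            if n:
                i += n  # valid UTF-8 characters don't count as odd
            else:
                odd += 1  # assumption: stray high bytes are odd
                i += 1

    # Perl's documentation: more than a third odd means binary (-B),
    # otherwise text (-T).
    return odd * 3 <= len(block)
```

For example, looks_like_text(open(path, "rb").read(512)) would give a rough idea of what -T might say about a file at a given path, though edge cases can differ from Perl's actual behaviour.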