File (command)

Revision as of 20:28, 5 May 2025 by imported>Guy Harris (→‎Usage: That's now Issue 8.)

file is a shell command for reporting the type of data contained in a file. It is commonly supported in Unix and Unix-like operating systems.

Because the command relies on relatively fast heuristics to determine file type, it can report misleading information. It can be fooled, for example, by content that begins with a magic number but whose remainder does not match the format that the magic number indicates. Its report therefore cannot be taken as completely trustworthy.
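
The weakness described above can be illustrated with a minimal Python sketch. It is not how file itself is implemented; the function names looks_like_gzip and is_valid_gzip are hypothetical, and gzip is used only because its two-byte magic number is well known. A check that inspects only the leading bytes accepts content that a full decode rejects:

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Heuristic check (hypothetical helper): inspects only the magic number."""
    return data.startswith(GZIP_MAGIC)


def is_valid_gzip(data: bytes) -> bool:
    """Thorough check (hypothetical helper): actually tries to decompress."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Content that starts with the gzip magic number but is not gzip data.
fake = GZIP_MAGIC + b"this is not really compressed data"
real = gzip.compress(b"hello")

for name, blob in (("fake", fake), ("real", real)):
    print(name, looks_like_gzip(blob), is_valid_gzip(blob))
```

The heuristic check is cheap because it reads two bytes, which is the trade-off the paragraph above describes.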

The Single UNIX Specification (SUS) requires the command to exhibit the following behavior with respect to the file specified via the command-line:

  1. If the file cannot be read, or its Unix file type is undetermined, the command reports that the file was processed but that its type was undetermined
  2. The command must be able to identify the types directory, FIFO, socket, block special file, and character special file
  3. A zero-length file is reported as such
  4. An initial part of the file is examined using position-sensitive tests
  5. The entire file is examined using context-sensitive tests
  6. Otherwise, the file is reported as a data file
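
The order of checks above can be sketched in Python. This is an illustration under stated assumptions, not a conforming implementation: the function name classify, the two-entry POSITION_TESTS table, the ASCII-only context test, and all output labels are hypothetical choices made for this sketch.

```python
import os
import stat
import tempfile

# Hypothetical, tiny stand-in for a magic database: (offset, bytes, label).
POSITION_TESTS = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
]

# Step 2: file types that are decided from the inode alone.
SPECIAL_TYPES = [
    (stat.S_ISDIR, "directory"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISBLK, "block special"),
    (stat.S_ISCHR, "character special"),
]


def classify(path: str) -> str:
    # Step 1: the file cannot be examined at all.
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return "cannot open"
    for predicate, label in SPECIAL_TYPES:
        if predicate(mode):
            return label
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        return "cannot open"
    # Step 3: zero-length file.
    if not data:
        return "empty"
    # Step 4: position-sensitive tests on the leading bytes.
    for offset, magic, label in POSITION_TESTS:
        if data[offset:offset + len(magic)] == magic:
            return label
    # Step 5: a context-sensitive test that looks at the whole content.
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    # Step 6: fallback.
    return "data"


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "note.txt")
    with open(sample, "w") as handle:
        handle.write("hello\n")
    print(classify(tmp), "/", classify(sample))
```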

Position-sensitive tests are normally implemented by matching various locations within the file against a textual database of magic numbers (see the Usage section). This differs from other simpler methods such as file extensions and schemes like MIME.

In the System V implementation, the Ian Darwin implementation, and the OpenBSD implementation, the command uses a database to drive the probing of the lead bytes. That database is stored as a file that is located in /etc/magic, /usr/share/file/magic or similar.
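
A database-driven probe of this kind can be sketched as follows. The four-column text format below is a deliberately simplified, hypothetical imitation and not the real magic file grammar, which supports many more test types and nested continuation lines; parse_magic and identify are names invented for this sketch.

```python
# A simplified, hypothetical text database: offset, test type, value, description.
MAGIC_TEXT = """\
# offset  type    value   description
0         string  GIF8    GIF image data
0         string  %PDF-   PDF document
257       string  ustar   POSIX tar archive
"""


def parse_magic(text):
    """Turn the text database into (offset, bytes, description) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        offset, kind, value, description = line.split(None, 3)
        if kind != "string":
            continue  # this sketch supports only one test type
        rules.append((int(offset), value.encode("ascii"), description))
    return rules


def identify(data, rules):
    """Return the description of the first rule whose bytes match, else None."""
    for offset, value, description in rules:
        if data[offset:offset + len(value)] == value:
            return description
    return None


rules = parse_magic(MAGIC_TEXT)
print(identify(b"%PDF-1.7\n", rules))
```

Keeping the rules in a text file rather than in the program means new formats can be recognized without recompiling, which is the design the implementations above share.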

History

The file command originated in Unix Research Version 4 in 1973. System V brought a major update with several important changes, most notably moving the file type information into an external text file rather than compiling it into the binary itself.

Most major BSD and Linux distributions include a free, open-source implementation that was written from scratch by Ian Darwin in 1986–87. (The early history of this program is recorded in its private CVS repository, in the log of the main program.) It keeps file type information in a text file with a format based on that of the System V version. It was expanded by Geoff Collyer in 1989 and since then has had input from many others, including Guy Harris, Chris Lowth and Eric Fischer. From late 1993 onward, its maintenance has been organized by Christos Zoulas. The OpenBSD system has its own subset implementation written from scratch, but it still uses the Darwin/Zoulas collection of information in magic file format.

The file command was ported to the IBM i operating system.

As of version 4.00 of the Ian Darwin/Christos Zoulas implementation of file, the functionality of the command is implemented in and exposed by the libmagic library, which other programs can use via C (and compatible) linking.
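
As a hedged illustration of such linking, the sketch below loads libmagic through Python's ctypes module and calls its C functions magic_open, magic_load, magic_file, and magic_close. It assumes that a shared libmagic library and its default database are installed; the wrapper name describe_with_libmagic is hypothetical, and it returns None when the library or database is unavailable.

```python
import ctypes
import ctypes.util
import os
from typing import Optional

MAGIC_NONE = 0x000000  # default flags: produce a textual description


def describe_with_libmagic(path: str) -> Optional[str]:
    """Hypothetical wrapper: ask libmagic to describe a file, or return None."""
    libname = ctypes.util.find_library("magic")
    if libname is None:
        return None  # assumption not met: libmagic is not installed
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None

    # Declare the C signatures so pointers are not truncated to int.
    lib.magic_open.restype = ctypes.c_void_p
    lib.magic_open.argtypes = [ctypes.c_int]
    lib.magic_load.restype = ctypes.c_int
    lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_file.restype = ctypes.c_char_p
    lib.magic_file.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_close.restype = None
    lib.magic_close.argtypes = [ctypes.c_void_p]

    cookie = lib.magic_open(MAGIC_NONE)
    if not cookie:
        return None
    try:
        # Passing NULL asks libmagic to load its default database.
        if lib.magic_load(cookie, None) != 0:
            return None
        result = lib.magic_file(cookie, os.fsencode(path))
        if result is None:
            return None
        return result.decode("utf-8", "replace")
    finally:
        lib.magic_close(cookie)


print(describe_with_libmagic(os.devnull))
```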

Usage

The SUS mandates the following command-line options:

  • -M file: replaces the default position-sensitive and context-sensitive tests with the tests specified in a specially formatted file
  • -m file: same as -M, but the specified tests are applied in addition to the default ones
  • -d: selects the default position-sensitive and context-sensitive tests; this is the default behavior unless -M or -m is specified
  • -h: does not dereference symbolic links that point to an existing file or directory
  • -L: dereferences symbolic links that point to an existing file or directory
  • -i: does not classify the file further than reporting it as nonexistent, a block special file, a character special file, a directory, a FIFO, a socket, a symbolic link, or a regular file; the Ian Darwin and OpenBSD versions behave differently with this option and instead output an Internet media type ("MIME type") identifying the recognized file format

Implementations may add extra options. Ian Darwin's implementation adds -s ("special files"), -k ("keep going"), and -r ("raw"), among many others.

Examples

For a C source code file, file main.c reports:

main.c: C program text

For a compiled executable, file program reports information like:

program: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked
    (uses shared libs), stripped
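
The first fields of a report like this come from fixed positions in the ELF header. The following sketch decodes only the word size, byte order, object type, and machine; the lookup tables are intentionally incomplete and the function name describe_elf is hypothetical.

```python
import struct

# Partial lookup tables for a few well-known ELF header values.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB", 2: "MSB"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes):
    """Describe the first 20 bytes of an ELF file, or return None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    ei_class, ei_data = header[4], header[5]
    # The byte-order flag in the identification bytes governs later fields.
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        ELF_CLASS.get(ei_class, "unknown-class"),
        ELF_DATA.get(ei_data, "unknown-order"),
        ELF_TYPE.get(e_type, "unknown-type"),
        ELF_MACHINE.get(e_machine, "unknown-machine"),
    )


# A hand-built header: 32-bit, little-endian, executable, Intel 80386.
sample = b"\x7fELF" + bytes([1, 1, 1, 0]) + b"\x00" * 8 + struct.pack("<HH", 2, 3)
print(describe_elf(sample))
```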

For the block device /dev/hda1, file /dev/hda1 reports:

/dev/hda1: block special (0/0)

By default, file does not try to read a device file, because doing so can have undesirable effects. With the non-standard option -s (available in the Ian Darwin branch), which requests that device files be read to identify their content, file -s /dev/hda1 reports details such as:

/dev/hda1: Linux/i386 ext2 filesystem

With Ian Darwin's non-standard option -k, the command does not stop at the first match but keeps looking for other matching patterns. The -r option, which is available in some versions, causes the newline character to be displayed in its raw form rather than in its octal representation. On Linux, file -k -r libmagic-dev_5.35-4_armhf.deb reports information like:

libmagic-dev_5.35-4_armhf.deb: Debian binary package (format 2.0)
- current ar archive
- data

For a compressed file, file compressed.gz reports information like:

compressed.gz: gzip compressed data, deflated, original filename, `compressed', last
    modified: Thu Jan 26 14:08:23 2006, os: Unix
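
The details in such a report (compression method, original file name, modification time, and originating system) are stored in the gzip header. The sketch below reads them from the fixed header layout; parse_gzip_header is a hypothetical name, the operating-system table is partial, and a complete, untruncated header is assumed.

```python
import struct

FEXTRA = 0x04  # flag bit: an extra field precedes the file name
FNAME = 0x08   # flag bit: the original file name is present
OS_NAMES = {0: "FAT", 3: "Unix", 255: "unknown"}  # partial table


def parse_gzip_header(data: bytes):
    """Extract a few gzip header fields, or return None if the magic is absent."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        return None
    # Bytes 2..9: method, flags, mtime (little-endian), extra flags, OS code.
    method, flags, mtime, _xfl, os_code = struct.unpack_from("<BBIBB", data, 2)
    pos = 10
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen  # skip the extra field
    name = None
    if flags & FNAME:
        end = data.index(b"\x00", pos)  # the name is zero-terminated
        name = data[pos:end].decode("latin-1")
    return {
        "method": "deflate" if method == 8 else "unknown",
        "name": name,
        "mtime": mtime,
        "os": OS_NAMES.get(os_code, "other"),
    }


# A hand-built header with an arbitrary timestamp, for illustration only.
header = (b"\x1f\x8b\x08\x08" + struct.pack("<I", 1138284503)
          + b"\x00\x03" + b"compressed\x00")
print(parse_gzip_header(header))
```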

For the same compressed file, file -i compressed.gz (with the MIME-type behavior of the Ian Darwin version) reports information like:

compressed.gz: application/x-gzip; charset=binary

For a PPM file, file data.ppm reports:

data.ppm: Netpbm PPM "rawbits" image data

For a Mach-O universal binary, file /bin/cat reports information like:

/bin/cat: Mach-O universal binary with 2 architectures
/bin/cat (for architecture ppc7400):	Mach-O executable ppc
/bin/cat (for architecture i386):	Mach-O executable i386

For a symbolic link, file /usr/bin/vi reports:

/usr/bin/vi: symbolic link to vim

Identification of symbolic links is not available on all platforms, and a link is dereferenced if -L is passed or POSIXLY_CORRECT is set.
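
The difference between reporting a link and dereferencing it corresponds to the difference between the lstat and stat system calls. The following sketch shows both behaviors; the function name link_report and its labels are hypothetical, and it assumes a platform where unprivileged symbolic links can be created.

```python
import os
import stat
import tempfile


def link_report(path: str, follow: bool = False) -> str:
    """Report on the link itself (follow=False) or on its target (follow=True)."""
    st = os.stat(path) if follow else os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    return "regular file" if stat.S_ISREG(st.st_mode) else "other"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "vim")
    link = os.path.join(tmp, "vi")
    open(target, "wb").close()
    os.symlink("vim", link)  # relative link, like vi -> vim
    print(link_report(link))               # describes the link itself
    print(link_report(link, follow=True))  # describes the link target
```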

External links

  • Fine Free File Command – homepage for Ian Darwin's version of file used in major BSD and Linux distributions.
  • binwalk, a firmware analysis tool that carves files based on libmagic signatures
  • TrID, an alternative providing ranked answers (instead of just one) based on statistics.
  • Magika, an ML-based tool, by Google Research