File Utility and Libmagic

It’s been a long time since I posted anything, because I’ve been caught up in a busy schedule. Anyway, I decided to set aside
a little time to write this up. This post focuses on the Unix “file” utility and magic numbers.

In Unix and Unix-like operating systems, file extensions are not really necessary; a file’s type is determined from its contents rather than its name, and is often reported as a MIME type. The “file” utility can be used
to do exactly that. It performs three tests, in this order, and the first test that succeeds decides the reported type:

1) Filesystem test
2) Magic number test
3) Language test

Filesystem test

The filesystem test is based on the result of the stat() system call, which reads the object’s inode (the data structure used to store file metadata), and
determines whether the object is a socket, a symbolic link, etc. These file types are defined in “sys/stat.h”.

Magic number test

Magic numbers are file signatures associated with a file type. Unix keeps a database of these magic numbers for comparison.
A magic number is stored at a particular place in a file, and the bytes found there are compared with the database to determine the file type.
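The comparison against a database can be sketched roughly like this. The table below is a tiny hand-picked sample of well-known signatures that all sit at offset 0, and the names struct signature and match_signature() are hypothetical. The real magic database is far richer; its entries also carry offsets, data types and nested tests.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a toy signature table (all signatures start at offset 0). */
struct signature {
    const char *name;   /* human-readable file type */
    const char *bytes;  /* expected leading bytes */
    size_t len;         /* number of bytes to compare */
};

static const struct signature signatures[] = {
    { "ELF executable",       "\x7f" "ELF",         4 },
    { "PNG image",            "\x89PNG\r\n\x1a\n",  8 },
    { "PDF document",         "%PDF-",              5 },
    { "gzip compressed data", "\x1f\x8b",           2 },
};

/* Return the name of the first signature that matches the start of buf,
 * or NULL if no entry matches. */
static const char *match_signature(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < sizeof signatures / sizeof signatures[0]; i++) {
        const struct signature *s = &signatures[i];

        if (n >= s->len && memcmp(buf, s->bytes, s->len) == 0)
            return s->name;
    }
    return NULL;
}
```

To use it, read the first few bytes of a file into a buffer and pass the buffer and the number of bytes read to match_signature().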

Language test

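This test is performed if the other two fail to give any result. It determines whether the given file contains human-readable text and, if so,
what kind of text. It first works out the character encoding (ASCII, UTF-8, and so on) and then tries to guess the language the file is written in by looking for characteristic strings; for example, the keyword “struct” suggests a C program.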
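Here is a rough sketch of the first half of that idea, the character-set guess. It is deliberately simplified and is not the algorithm “file” uses: it only tells ASCII, UTF-8 and “data” apart, and the UTF-8 check does not reject overlong encodings. All function names are my own.

```c
#include <stddef.h>

/* Bytes we are happy to see in plain text: tab, newline, carriage return
 * and the printable ASCII range. */
static int is_text_byte(unsigned char c)
{
    return c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c < 0x7f);
}

/* Length of the multi-byte UTF-8 sequence starting at p (n bytes are
 * available), or 0 if the bytes do not form one. Simplified: overlong
 * encodings are not rejected. */
static size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    size_t len;

    if ((p[0] & 0xe0) == 0xc0)      len = 2;
    else if ((p[0] & 0xf0) == 0xe0) len = 3;
    else if ((p[0] & 0xf8) == 0xf0) len = 4;
    else return 0;

    if (len > n)
        return 0;                   /* sequence is cut off */
    for (size_t i = 1; i < len; i++)
        if ((p[i] & 0xc0) != 0x80)  /* continuation bytes look like 10xxxxxx */
            return 0;
    return len;
}

/* Guess whether a buffer holds ASCII text, UTF-8 text or binary data. */
static const char *guess_text_type(const unsigned char *buf, size_t n)
{
    int saw_multibyte = 0;
    size_t i = 0;

    if (n == 0)
        return "empty";
    while (i < n) {
        if (is_text_byte(buf[i])) {
            i++;
            continue;
        }
        size_t len = utf8_seq_len(buf + i, n - i);
        if (len == 0)
            return "data";          /* neither plain ASCII nor valid UTF-8 */
        saw_multibyte = 1;
        i += len;
    }
    return saw_multibyte ? "UTF-8 text" : "ASCII text";
}
```

A buffer read from the start of a file can be passed straight to guess_text_type(). Keep in mind that a multi-byte character cut off at the end of the buffer makes this sketch answer "data".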

Let’s take a deeper look at the magic number test.

A magic number is a number embedded at or near the beginning of a file that indicates its file format (i.e., the type of file it is). It is also sometimes referred to as a file signature. Magic numbers are generally not visible to users.

However, they can easily be seen with the use of a hex editor, which is a specialized program that shows and allows modification of every byte in a file. For common file formats, the numbers conveniently represent the names of the file types. Thus, for example, the magic number for a Linux ELF file is 0x7F454C46: the byte 0x7F followed by the ASCII characters “ELF”. Since 0x7F is not printable, a hex dump shows it as a dot, giving .ELF.

$ hexdump -C <file name> | more

hexdump can be used to show the magic number of a file. In the picture below I used it on a Linux ELF file to show the concept.

[Image: hexdump output for a Linux ELF file]
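The same bytes can also be looked at from C. The sketch below formats the first bytes of a file as hex plus printable ASCII, loosely in the spirit of hexdump -C; it does not reproduce hexdump’s exact output. The names format_hex_line(), dump_file_header() and HEX_LINE_MAX are my own.

```c
#include <ctype.h>
#include <stdio.h>

/* 16 bytes * 3 hex characters + '|' + 16 ASCII characters + '|' + NUL fits. */
#define HEX_LINE_MAX 72

/* Format up to 16 bytes as hex followed by their printable form,
 * with non-printable bytes shown as '.'. */
static void format_hex_line(const unsigned char *buf, size_t n,
                            char out[HEX_LINE_MAX])
{
    size_t pos = 0;

    if (n > 16)
        n = 16;
    for (size_t i = 0; i < n; i++)
        pos += (size_t)snprintf(out + pos, HEX_LINE_MAX - pos, "%02x ", buf[i]);
    out[pos++] = '|';
    for (size_t i = 0; i < n; i++)
        out[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
    out[pos++] = '|';
    out[pos] = '\0';
}

/* Print the first 16 bytes of a file. Returns 0 on success, -1 on error. */
static int dump_file_header(const char *path)
{
    unsigned char buf[16];
    char line[HEX_LINE_MAX];
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        perror(path);
        return -1;
    }
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    format_hex_line(buf, n, line);
    puts(line);
    return 0;
}
```

For the four bytes 0x7f 0x45 0x4c 0x46, format_hex_line() produces "7f 45 4c 46 |.ELF|", which is exactly the ELF magic number described above.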

The magic number test is not that hard to implement yourself, but why bother when your system ships with the libmagic library, which can perform this kind of
test without much hassle?

Let’s try to utilize libmagic with an example in C

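#include <stdio.h>
#include <magic.h>

/* Compiled magic database; the path differs between systems. */
#define MIME_DB "/usr/share/file/magic.mgc"

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "invalid argument supplied\n");
        return 1;
    }
    const char *file = argv[1];
    const char *mimetype;
    magic_t magic_cookie;

    /* MAGIC_MIME_TYPE: report a MIME type string instead of a description. */
    magic_cookie = magic_open(MAGIC_MIME_TYPE);
    if (magic_cookie == NULL) {
        fprintf(stderr, "error creating magic cookie\n");
        return 1;
    }
    /* magic_load() returns -1 if the database cannot be loaded. */
    if (magic_load(magic_cookie, MIME_DB) != 0) {
        fprintf(stderr, "error loading database: %s\n", magic_error(magic_cookie));
        magic_close(magic_cookie);
        return 1;
    }
    /* magic_file() returns NULL on error, so check before printing. */
    mimetype = magic_file(magic_cookie, file);
    if (mimetype == NULL) {
        fprintf(stderr, "error reading %s: %s\n", file, magic_error(magic_cookie));
        magic_close(magic_cookie);
        return 1;
    }
    printf("%s\n", mimetype);
    magic_close(magic_cookie);  /* free the cookie's resources */
    return 0;
}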

To compile this:

$ gcc -Wall mime.c -o mime -lmagic

And to run it:

$ ./mime mime


As is evident from the source, I defined the location of the magic database as MIME_DB. After checking the arguments, I called magic_open()
with the MAGIC_MIME_TYPE flag so that the other functions return a MIME type string. magic_open() creates and returns a magic cookie.

Then I loaded the database using magic_load() and finally called magic_file(), which returned the MIME type of the file.
magic_close() releases the resources held by the cookie, which avoids a memory leak once we’re done.

The libmagic manual page (man 3 libmagic) has more detailed information about this library.

Even antivirus programs use magic numbers as part of detecting malware, although, needless to say, this is not a very effective technique for the purpose if used alone.