The UNIX command file attempts to identify a
file type based on bytes in a file rather than on the file
extension. For example, every GIF file starts with the letters GIF, and every JPEG file
begins with the value 0xffd8 in big-endian order.

The
file command works by
consulting an ASCII text file that’s usually located at /etc/magic.
You can check a UNIX machine’s man page for magic to explore the
format. You’ll also find a similar file in the Apache Web server.
It’s used to identify files with an unrecognized file extension and
return the correct MIME type. The Apache version is much simpler
and covers files that are more likely to be in a Web server.

The database can use a similar technique to
identify the MIME type of a BLOB column. Most database applications
store some type of identification along with the data. However,
it’s possible for someone to incorrectly identify the data.
Therefore, it would be useful to be able to check the magic value
of a BLOB and return a MIME type and encoding information by
directly examining the data.


Installing magic into the database

The process for installing magic uses a Perl
script to generate data in a normalized format that can be loaded
using SQL*Loader into the database. The data is then scanned by a
single PL/SQL function, which tries to identify the BLOB’s data.
Here are the steps for setting up the data, along with an
example.

Step
1:
Create a Perl script to turn magic into a SQL*Loader data
file:


#!/usr/local/bin/perl
# — magicdata.pl

# scan the “magic” file for file identification rules
$filename = ($#ARGV>=0) ? $ARGV[0] : ‘magic’;
open(FILE,”<$filename”) || die “Couldn’t open file
‘$filename'”;
$i = 0;
while (<FILE>)
{
    next if /^\s*#/;    #
skip comments
    next if /^\s*$/;    #
skip blank lines
    s/[\r\n]*//g;       #
strip trailing cr/lf
    # replace octal escape codes
    s/\\([0-9]{3})/pack(‘C’,oct($1))/eg;
    # split on spaces, except for “\ “
    my ($offset,$dt,$cnt,$mime,$encoding) =
split(/(?<!\\)\s+/);
    $cont = ($offset =~ /^>/) ? ‘Y’ :
undef;
    $offset = substr($offset,1) if
$cont;
    if ($dt eq ‘string’)
    {
        # generate a
HEXTORAW version of the string
        $data =
join(”,map(sprintf(‘%02X’,$_),unpack(‘C*’,$cnt)));
    }
    else
    {
        # handle special
number formats
        if ($cnt =~ /^0x/)
{ $cnt = hex($cnt); }    # hex
        elsif ($cnt =~
/^0/) { $cnt = oct($cnt); }  # octal
        warn “unknown
number: ‘$cnt'” unless $cnt =~ /^([0-9]|[1-9][0-9]*)$/;
        if ($dt eq
‘belong’) {
            $data
= sprintf(‘%02X’ x 4,unpack(‘C4’,pack(‘N’,$cnt)));
        } elsif ($dt eq
‘lelong’) {
            $data
= sprintf(‘%02X’ x 4,unpack(‘C4’,pack(‘V’,$cnt)));
        } elsif ($dt eq
‘beshort’ || $dt eq ‘short’) {
            $data
= sprintf(‘%02X’ x 2,unpack(‘C2’,pack(‘n’,$cnt)));
        } elsif ($dt eq
‘leshort’) {
            $data
= sprintf(‘%02X’ x 2,unpack(‘C2’,pack(‘v’,$cnt)));
        } elsif ($dt eq
‘byte’) {
            $data
= sprintf(‘%02X’,$cnt);
        } else {
            warn
“data type ‘$dt’ not implemented”;
        }
    }
    $i++;
    print
join(‘,’,$i,$cont,$offset,$data,$mime,$encoding),”\n”;
}
close(FILE);

$ perl magicdata.pl $ORACLE_HOME/Apache/conf/magic >
magicdata.dat

Step
2:
Create the table to hold this data using this SQL script in
SQL*Plus:


create table magicdata
(
    line        integer,

    cont        char(1),

    offset      integer,

    data        raw(24),

   
mime        varchar2(24),

    encoding    varchar2(10)

);

Step
3:
Load the data generated above into the database with
SQL*Loader:


load data
truncate
into table magicdata
fields terminated by ‘,’ optionally enclosed by ‘”‘
trailing nullcols
(
    line,
    cont,
    offset,
    data,
    mime,
    encoding
)

$ sqlldr user=scott/tiger control=magicdata.ctl
data=magicdata.dat

Here is some test data; it’s a simple table
with one BLOB column.

drop table magictest;
create table magictest (myblob blob);

Step
4:
Use SQL*Loader to load three images from files into this
table:


load data
infile *
into table magictest
fields terminated by ‘,’
(
    fname filler,
    “MYBLOB” lobfile(fname) terminated by
eof
)
begindata
file.bmp
file.gif
file.jpg

$sqlldr userid=scott/tiger control=magictest.ctl

Here’s a PL/SQL function to scan the magic data
table and the BLOB to attempt to determine the file MIME type:


create or replace function magic(lob_loc blob) return
varchar2
is
    continued boolean := false;
    bdata raw(100);
begin
    for rec in (select * from magicdata order
by line) loop
        if rec.cont = ‘Y’
then
            if
continued then
                bdata
:= dbms_lob.substr
                (

                    lob_loc,

                    utl_raw.length(rec.data),

                    rec.offset+1

                );

                if
utl_raw.compare(bdata,rec.data) = 0 then
                    return
rec.mime;
                end
if;
            end
if;
        else
            bdata
:= dbms_lob.substr
            (

                lob_loc,

                utl_raw.length(rec.data),

                rec.offset+1

            );

            dbms_output.put_line(bdata||’
<=> ‘||rec.data);
            if
utl_raw.compare(bdata,rec.data) = 0 then
                if
rec.mime is null then
                    continued
:= true;
                else

                    return
rec.mime;
                end
if;
            end
if;
        end if;
    end loop;
    return null;
end magic;
/
show errors;

Now, to show it actually works, I run the SQL
statement:


Select magic(myblob) from testdata;

And get the output:


MAGIC(MYBLOB)
————-
image/bmp
image/gif
image/jpeg

TechRepublic’s Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!