Data Management

Oracle Tip: Put some magic into your database

Discover how to check the magic value of a BLOB and return a MIME type and encoding information by directly examining the data.

The UNIX command file attempts to identify a file type based on bytes in a file rather than on the file extension. For example, every GIF file starts with the letters GIF, and every JPEG file begins with the value 0xffd8 in big-endian order.

The file command works by consulting an ASCII text file that's usually located at /etc/magic. You can check a UNIX machine's man page for magic to explore the format. You'll also find a similar file in the Apache Web server. It's used to identify files with an unrecognized file extension and return the correct MIME type. The Apache version is much simpler and covers files that are more likely to be in a Web server.

The database can use a similar technique to identify the MIME type of a BLOB column. Most database applications store some type of identification along with the data. However, it's possible for someone to incorrectly identify the data. Therefore, it would be useful to be able to check the magic value of a BLOB and return a MIME type and encoding information by directly examining the data.

Installing magic into the database

The process for installing magic uses a Perl script to generate data in a normalized format that can be loaded using SQL*Loader into the database. The data is then scanned by a single PL/SQL function, which tries to identify the BLOB's data. Here are the steps for setting up the data, along with an example.

Step 1: Create a Perl script to turn magic into a SQL*Loader data file:


#!/usr/local/bin/perl
# —- magicdata.pl

# scan the "magic" file for file identification rules
$filename = ($#ARGV>=0) ? $ARGV[0] : 'magic';
open(FILE,"<$filename") || die "Couldn't open file '$filename'";
$i = 0;
while (<FILE>)
{
    next if /^\s*#/;    # skip comments
    next if /^\s*$/;    # skip blank lines
    s/[\r\n]*//g;       # strip trailing cr/lf
    # replace octal escape codes
    s/\\([0-9]{3})/pack('C',oct($1))/eg;
    # split on spaces, except for "\ "
    my ($offset,$dt,$cnt,$mime,$encoding) = split(/(?<!\\)\s+/);
    $cont = ($offset =~ /^>/) ? 'Y' : undef;
    $offset = substr($offset,1) if $cont;
    if ($dt eq 'string')
    {
        # generate a HEXTORAW version of the string
        $data = join('',map(sprintf('%02X',$_),unpack('C*',$cnt)));
    }
    else
    {
        # handle special number formats
        if ($cnt =~ /^0x/) { $cnt = hex($cnt); }    # hex
        elsif ($cnt =~ /^0/) { $cnt = oct($cnt); }  # octal
        warn "unknown number: '$cnt'" unless $cnt =~ /^([0-9]|[1-9][0-9]*)$/;
        if ($dt eq 'belong') {
            $data = sprintf('%02X' x 4,unpack('C4',pack('N',$cnt)));
        } elsif ($dt eq 'lelong') {
            $data = sprintf('%02X' x 4,unpack('C4',pack('V',$cnt)));
        } elsif ($dt eq 'beshort' || $dt eq 'short') {
            $data = sprintf('%02X' x 2,unpack('C2',pack('n',$cnt)));
        } elsif ($dt eq 'leshort') {
            $data = sprintf('%02X' x 2,unpack('C2',pack('v',$cnt)));
        } elsif ($dt eq 'byte') {
            $data = sprintf('%02X',$cnt);
        } else {
            warn "data type '$dt' not implemented";
        }
    }
    $i++;
    print join(',',$i,$cont,$offset,$data,$mime,$encoding),"\n";
}
close(FILE);

$ perl magicdata.pl $ORACLE_HOME/Apache/conf/magic > magicdata.dat

Step 2: Create the table to hold this data using this SQL script in SQL*Plus:


create table magicdata
(
    line        integer,
    cont        char(1),
    offset      integer,
    data        raw(24),
    mime        varchar2(24),
    encoding    varchar2(10)
);

Step 3: Load the data generated above into the database with SQL*Loader:


load data
truncate
into table magicdata
fields terminated by ',' optionally enclosed by '"'
trailing nullcols
(
    line,
    cont,
    offset,
    data,
    mime,
    encoding
)

$ sqlldr user=scott/tiger control=magicdata.ctl data=magicdata.dat

Here is some test data; it's a simple table with one BLOB column.

drop table magictest;
create table magictest (myblob blob);

Step 4: Use SQL*Loader to load three images from files into this table:


load data
infile *
into table magictest
fields terminated by ','
(
    fname filler,
    "MYBLOB" lobfile(fname) terminated by eof
)
begindata
file.bmp
file.gif
file.jpg

$sqlldr userid=scott/tiger control=magictest.ctl

Here's a PL/SQL function to scan the magic data table and the BLOB to attempt to determine the file MIME type:


create or replace function magic(lob_loc blob) return varchar2
is
    continued boolean := false;
    bdata raw(100);
begin
    for rec in (select * from magicdata order by line) loop
        if rec.cont = 'Y' then
            if continued then
                bdata := dbms_lob.substr
                (
                    lob_loc,
                    utl_raw.length(rec.data),
                    rec.offset+1
                );
                if utl_raw.compare(bdata,rec.data) = 0 then
                    return rec.mime;
                end if;
            end if;
        else
            bdata := dbms_lob.substr
            (
                lob_loc,
                utl_raw.length(rec.data),
                rec.offset+1
            );
            dbms_output.put_line(bdata||' <=> '||rec.data);
            if utl_raw.compare(bdata,rec.data) = 0 then
                if rec.mime is null then
                    continued := true;
                else
                    return rec.mime;
                end if;
            end if;
        end if;
    end loop;
    return null;
end magic;
/
show errors;

Now, to show it actually works, I run the SQL statement:


Select magic(myblob) from testdata;

And get the output:


MAGIC(MYBLOB)
——————-
image/bmp
image/gif
image/jpeg

TechRepublic's Oracle newsletter covers automating Oracle utilities, generating database alerts, solving directed graph problems, and more. Automatically subscribe today!

Editor's Picks