Commit f6fa8bf8 authored by Thomas Leonard

Added (incomplete) discussion of contents matching.

parent e235408e
......@@ -19,14 +19,14 @@
</authorgroup>
<title>Shared MIME-info Database</title>
<date>15 April 2002</date>
<date>18 April 2002</date>
</articleinfo>
<sect1>
<title>Introduction</title>
<sect2>
<title>Version</title>
<para>
This is version 0.2 of the Shared MIME-info Database spec, last updated 15 April 2002.
This is version 0.3 of the Shared MIME-info Database spec, last updated 18 April 2002.
</para>
</sect2>
<sect2>
......@@ -44,10 +44,10 @@ rules for determining the type apply to all programs.
</para>
<para>
This specification attempts to unify the type-guessing systems currently in
use by GNOME, KDE and ROX. Only the name-to-type mapping is covered by this
spec; other MIME type information, such as the default handler for a particular
type, or the icon to use to display it in a file manager, are not covered since
these are a matter of style.
use by GNOME, KDE and ROX. Only the name-to-type and contents-to-type mappings
are covered by this spec; other MIME type information, such as the default
handler for a particular type, or the icon to use to display it in a file
manager, are not covered since these are a matter of style.
</para>
</sect2>
<sect2>
......@@ -65,8 +65,8 @@ interpreted as described in RFC 2119.
<title>KDE</title>
<para>
KDE uses <filename>.desktop</filename> files, with Type=MimeType, one file
per file. The files are arranges in the filesystem to mirror the two-level
MIME type hierarchy.
per type to determine type from file name. The files are arranged in the
filesystem to mirror the two-level MIME type hierarchy.
The syntax is very similar to other <filename>.desktop</filename> files,
with Name=, Comment= etc.
</para>
......@@ -91,7 +91,23 @@ Value=.kwd
</para>
<para>
KDE does not have a separate system for specifying extension matches, but
uses glob patterns for everything.
uses case-sensitive glob patterns for everything.
</para>
<para>
A single file stores all the rules for recognising files by content. This
is almost identical to <citerefentry><refentrytitle>file</refentrytitle>
<manvolnum>1</manvolnum></citerefentry>'s <filename>magic.mime</filename>
database file, but without the encoding field.
</para>
<para>
The format is described in the file itself as follows:
<programlisting><![CDATA[
# The format is 4-5 columns:
# Column #1: byte number to begin checking from, ">" indicates continuation
# Column #2: type of data to match
# Column #3: contents of data to match
# Column #4: MIME type of result
]]></programlisting>
</para>
</sect2>
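
The column layout quoted above can be illustrated with a small parser. This is only a sketch under stated assumptions: columns are taken to be whitespace-separated, offsets are taken to be decimal, and the names `MagicRule` and `parse_magic_line` as well as the sample rule are hypothetical, not taken from KDE's file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MagicRule:
    offset: int               # column 1: byte number to begin checking from
    continuation: bool        # True when column 1 starts with ">"
    data_type: str            # column 2: type of data to match
    pattern: str              # column 3: contents of data to match
    mime_type: Optional[str]  # column 4: MIME type of result, if present


def parse_magic_line(line: str) -> Optional[MagicRule]:
    """Parse one rule line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Assumption: columns are separated by runs of whitespace.
    fields = stripped.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns: {line!r}")
    continuation = fields[0].startswith(">")
    # Assumption: the offset is written in decimal.
    offset = int(fields[0].lstrip(">"))
    # Only the first four columns are kept; any further column is ignored.
    mime_type = fields[3] if len(fields) > 3 else None
    return MagicRule(offset, continuation, fields[1], fields[2], mime_type)


# Hypothetical sample rule, written for this sketch only.
rule = parse_magic_line("0\tstring\tGIF8\timage/gif")
```

A continuation line such as `>4 byte 1` would parse with `continuation=True` and no MIME type; how continuations combine with the preceding rule is not specified in the quoted header, so the sketch does not attempt it.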
<sect2>
......@@ -194,6 +210,9 @@ The it tries again with the filename in lower-case. It then tries again
from the second '.', and so on. If no type is found, it tries the regular
expressions.
</para>
<para>
ROX has no rules for determining a file's type from its contents.
</para>
</sect2>
</sect1>
<sect1>
......@@ -229,6 +248,7 @@ Comment=HTML document
Comment[af]=...
[... etc. other translations ]
Patterns=*.htm;*.html;
Contents=(starts-with "<HTML")
]]></programlisting>
</para>
<para>
......@@ -256,6 +276,24 @@ one. This is so that <filename>main.C</filename> will be seen as a C++ file,
but <filename>IMAGE.GIF</filename> will still use the *.gif pattern.
</para>
</sect2>
<sect2>
<title>Contents matching</title>
<para>
The value of the Contents attribute is a Scheme expression. If the expression
evaluates to a true value then the file is assumed to be of this type.
Since scanning a file's contents can be very slow, applications may choose
to do pattern matching first and only fall back to content matching, or not
perform it at all.
</para>
<para>
<note>
<para>
This is only a vague proposal at the moment. A list of the functions that
implementations must provide to expressions still needs to be defined.
</para>
</note>
</para>
</sect2>
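
The "patterns first, contents only as a fallback" strategy described in this section could look roughly like the following. It is a sketch, not the proposed mechanism: the Scheme expression is replaced by a plain Python callable, and `starts_with`, `RULES`, `guess_type` and `read_head` are hypothetical names introduced here. The single rule mirrors the HTML example from the diff.

```python
from fnmatch import fnmatchcase


def starts_with(prefix: bytes):
    """Stand-in for the (starts-with "...") expression in the example."""
    def test(data: bytes) -> bool:
        return data.startswith(prefix)
    return test


# Hypothetical in-memory form of the HTML entry shown earlier in the diff.
RULES = [
    {
        "type": "text/html",
        "patterns": ["*.htm", "*.html"],
        "contents": starts_with(b"<HTML"),
    },
]


def guess_type(name, read_head, rules=RULES):
    """Return a MIME type for the file, or None if no rule matches.

    read_head is a callable returning the first bytes of the file; it is
    only called when no name pattern matches, so the cheap test runs first.
    """
    # Step 1: case-sensitive glob matching on the file name.
    for rule in rules:
        if any(fnmatchcase(name, pattern) for pattern in rule["patterns"]):
            return rule["type"]
    # Step 2: fall back to the (potentially slow) contents test.
    head = read_head()
    for rule in rules:
        test = rule.get("contents")
        if test is not None and test(head):
            return rule["type"]
    return None
```

Passing `read_head` as a callable keeps the file unopened whenever the name alone decides the type; an application that chooses not to do content matching at all could simply skip step 2.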
<sect2>
<title>Dealing with conflicts</title>
<para>
......