Commit dc2724f1 authored by Thomas Leonard's avatar Thomas Leonard

Updated for new release.

parent da5d7650
...@@ -3,6 +3,10 @@ shared-mime-info Thomas Leonard <tal00r@ecs.soton.ac.uk> ...@@ -3,6 +3,10 @@ shared-mime-info Thomas Leonard <tal00r@ecs.soton.ac.uk>
Version 0.10 (03-Mar-2003) Version 0.10 (03-Mar-2003)
* Much better validation of input files.
* Added note about the use of extended attributes to store the MIME type.
* Ensure that all changes to generated files happen atomically. * Ensure that all changes to generated files happen atomically.
* Change to half-text, half-binary format to make parsing the magic file * Change to half-text, half-binary format to make parsing the magic file
......
<?xml version="1.0" standalone="no"?> <?xml version="1.0" standalone="no"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"/usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd" [ "/usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd" [
<!ENTITY updated "26 Feb 2003"> <!ENTITY updated "03 Mar 2003">
<!ENTITY version "0.11-pre"> <!ENTITY version "0.10">
]> ]>
<article id="index"> <article id="index">
...@@ -31,7 +31,7 @@ ...@@ -31,7 +31,7 @@
<sect2> <sect2>
<title>Version</title> <title>Version</title>
<para> <para>
This is version &version; of the Shared MIME-info Database spec, last updated &updated;.</para> This is version &version; of the Shared MIME-info Database specification, last updated &updated;.</para>
</sect2> </sect2>
<sect2> <sect2>
<title>What is this spec?</title> <title>What is this spec?</title>
...@@ -374,9 +374,10 @@ The format of these generated files and the source files in <filename>packages</ ...@@ -374,9 +374,10 @@ The format of these generated files and the source files in <filename>packages</
are explained in the following sections. This step serves several purposes. First, it allows are explained in the following sections. This step serves several purposes. First, it allows
applications to quickly get the data they need without parsing all the source XML files (the applications to quickly get the data they need without parsing all the source XML files (the
base package alone is over 700K). Second, it allows the database to be used for other base package alone is over 700K). Second, it allows the database to be used for other
purposes (such as creating the <filename>/etc/mime.types</filename> if desired). Third, it purposes (such as creating the <filename>/etc/mime.types</filename> file if
allows some validation to be performed on the input data, and removes the need for other desired). Third, it allows validation to be performed on the input data,
applications to carefully check the input for errors themselves. and removes the need for other applications to carefully check the input for
errors themselves.
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
...@@ -413,7 +414,8 @@ filename matching anyway. ...@@ -413,7 +414,8 @@ filename matching anyway.
<userinput>priority</userinput> attribute for all of the contained rules. Low <userinput>priority</userinput> attribute for all of the contained rules. Low
numbers should be used for more generic types (such as 'gzip compressed data') numbers should be used for more generic types (such as 'gzip compressed data')
and higher values for specific subtypes (such as a word processor format that and higher values for specific subtypes (such as a word processor format that
happens to use gzip to compress the file). The default priority value is 50. happens to use gzip to compress the file). The default priority value is 50, and
the maximum is 100.
</para><para> </para><para>
Each <userinput>match</userinput> element has a number of attributes: Each <userinput>match</userinput> element has a number of attributes:
...@@ -440,9 +442,9 @@ Each <userinput>match</userinput> element has a number of attributes: ...@@ -440,9 +442,9 @@ Each <userinput>match</userinput> element has a number of attributes:
</entry></row> </entry></row>
<row><entry>mask</entry><entry>No</entry><entry> <row><entry>mask</entry><entry>No</entry><entry>
The number to AND the value in the file with before comparing it to `value'. The The number to AND the value in the file with before comparing it to
mask can start with `0x' to indicate a hexadecimal value, or with `0' to indicate `value'. Masks for numerical types can be any number, while masks for strings
octal. must be in base 16, and start with 0x.
</entry></row> </entry></row>
</tbody></tgroup> </tbody></tgroup>
...@@ -462,10 +464,10 @@ to provide the text in multiple languages. ...@@ -462,10 +464,10 @@ to provide the text in multiple languages.
</itemizedlist> </itemizedlist>
Applications may also define their own elements, provided they are namespaced to prevent collisions. Applications may also define their own elements, provided they are namespaced to prevent collisions.
Unknown elements are copied directly to the output XML files like <userinput>comment</userinput> Unknown elements are copied directly to the output XML files like <userinput>comment</userinput>
elements. elements. A typical use for this would be to indicate the default handler
A typical use for this would be to indicate the default handler application for a particular desktop application for a particular desktop
("Galeon is the GNOME default text/html browser"). Note that this doesn't indicate the user's preferred ("Galeon is the GNOME default text/html browser"). Note that this doesn't
application, only the (fixed) default. indicate the user's preferred application, only the (fixed) default.
</para> </para>
<para> <para>
Here is an example source file, named <filename>diff.xml</filename>: Here is an example source file, named <filename>diff.xml</filename>:
...@@ -486,6 +488,7 @@ Here is an example source file, named <filename>diff.xml</filename>: ...@@ -486,6 +488,7 @@ Here is an example source file, named <filename>diff.xml</filename>:
</mime-type> </mime-type>
</mime-info> </mime-info>
]]></programlisting> ]]></programlisting>
</para><para>
In practice, common types such as text/x-diff are provided by the freedesktop.org shared In practice, common types such as text/x-diff are provided by the freedesktop.org shared
database. Also, only new information needs to be provided, since this information will be merged database. Also, only new information needs to be provided, since this information will be merged
with other information about the same type. with other information about the same type.
...@@ -568,9 +571,9 @@ Where possible, compatible changes only will be made. ...@@ -568,9 +571,9 @@ Where possible, compatible changes only will be made.
All numbers are big-endian, so need to be byte-swapped on little-endian machines. All numbers are big-endian, so need to be byte-swapped on little-endian machines.
</para><para> </para><para>
The rest of the file is made up of a sequence of small sections. The rest of the file is made up of a sequence of small sections.
Each section is introduced by giving the priority and type in brackets. Each section is introduced by giving the priority and type in brackets, followed by
Higher priority entries come first. a newline character. Higher priority entries come first. Example:
<screen>[50:text/x-diff]</screen> <screen>[50:text/x-diff]\n</screen>
Each line in the section takes the form: Each line in the section takes the form:
<screen>[ indent ] ">" start-offset "=" value <screen>[ indent ] ">" start-offset "=" value
[ "&amp;" mask ] [ "~" word-size ] [ "+" range-length ] "\n"</screen> [ "&amp;" mask ] [ "~" word-size ] [ "+" range-length ] "\n"</screen>
...@@ -579,7 +582,7 @@ Each line in the section takes the form: ...@@ -579,7 +582,7 @@ Each line in the section takes the form:
<thead><row><entry>Part</entry><entry>Example</entry><entry>Meaning</entry></row></thead> <thead><row><entry>Part</entry><entry>Example</entry><entry>Meaning</entry></row></thead>
<tbody> <tbody>
<row><entry>indent</entry><entry>0</entry><entry>The nesting <row><entry>indent</entry><entry>1</entry><entry>The nesting
depth of the rule, corresponding to the number of '>' characters in the traditional file format.</entry></row> depth of the rule, corresponding to the number of '>' characters in the traditional file format.</entry></row>
<row><entry>">" start-offset</entry><entry>&gt;4</entry><entry>The offset into the <row><entry>">" start-offset</entry><entry>&gt;4</entry><entry>The offset into the
file to look for a match.</entry></row> file to look for a match.</entry></row>
...@@ -598,8 +601,9 @@ Each line in the section takes the form: ...@@ -598,8 +601,9 @@ Each line in the section takes the form:
</tgroup> </tgroup>
</informaltable> </informaltable>
</para><para> </para><para>
Note that the start-offset, value, value length and mask are all binary, Note that the value, value length and mask are all binary, whereas everything
whereas everything else is textual. else is textual. Each of the elements begins with a single character to
identify it, except for the indent level.
</para><para> </para><para>
The word size is used for byte-swapping. Little-endian systems should reverse The word size is used for byte-swapping. Little-endian systems should reverse
the order of groups of bytes in the value and mask if this is greater than one. the order of groups of bytes in the value and mask if this is greater than one.
...@@ -607,12 +611,18 @@ This only affects `host' matches (`big32' entries still have a word size of 1, ...@@ -607,12 +611,18 @@ This only affects `host' matches (`big32' entries still have a word size of 1,
for example, because no swapping is necessary, whereas `host32' has a word size for example, because no swapping is necessary, whereas `host32' has a word size
of 4). of 4).
</para><para> </para><para>
The range-length, word-size and mask components are optional. If missing, the range-length The indent, range-length, word-size and mask components are optional. If
defaults to 1, the word-size is 1, and the mask is all one bits. missing, indent defaults to 0, range-length to 1, the word-size to 1, and the
mask to all 'one' bits.
</para><para> </para><para>
Indent corresponds to the nesting depth of the rule. Top-level rules have an Indent corresponds to the nesting depth of the rule. Top-level rules have an
indent of zero. The parent of an entry is the preceding entry with an indent indent of zero. The parent of an entry is the preceding entry with an indent
one less than the entry. The test number is an index into the array of tests. one less than the entry.
</para><para>
If an unknown character is found where a newline is expected then the whole
line should be ignored (there will be no binary data after the new
character, so the next line starts after the next "\n" character). This is for
future extensions.
</para><para> </para><para>
The text/x-diff above example would (on its own) create this magic file: The text/x-diff above example would (on its own) create this magic file:
<programlisting><![CDATA[ <programlisting><![CDATA[
...@@ -627,11 +637,11 @@ The text/x-diff above example would (on its own) create this magic file: ...@@ -627,11 +637,11 @@ The text/x-diff above example would (on its own) create this magic file:
<sect2> <sect2>
<title>Storing the MIME type using Extended Attributes</title> <title>Storing the MIME type using Extended Attributes</title>
<para> <para>
An implementation may also get a file's MIME type from the <userinput>user.mime_type</userinput> extended An implementation MAY also get a file's MIME type from the <userinput>user.mime_type</userinput> extended
attribute. <!-- The attr(5) man page documents this name --> The type given here should normally be used attribute. <!-- The attr(5) man page documents this name --> The type given here should normally be used
in preference to any guessed type, since the user is able to set it explicitly. Applications may choose to in preference to any guessed type, since the user is able to set it explicitly. Applications MAY choose to
set the type when saving files. Since many applications and filesystems do not support extended attributes, set the type when saving files. Since many applications and filesystems do not support extended attributes,
implementations should not rely on this method being available. implementations MUST NOT rely on this method being available.
</para> </para>
</sect2> </sect2>
<sect2> <sect2>
......
...@@ -155,6 +155,24 @@ static gboolean validate_magic(xmlNode *parent) ...@@ -155,6 +155,24 @@ static gboolean validate_magic(xmlNode *parent)
return TRUE; return TRUE;
} }
static int get_priority(xmlNode *node)
{
char *prio_string;
int p;
prio_string = xmlGetNsProp(node, "priority", NULL);
if (prio_string)
{
p = atoi(prio_string);
g_free(prio_string);
if (p < 0 || p > 100)
return -1;
return p;
}
else
return 50;
}
/* 'field' was found in the definition of 'type' and has the freedesktop.org /* 'field' was found in the definition of 'type' and has the freedesktop.org
* namespace. If it's a known field, process it and return TRUE, else * namespace. If it's a known field, process it and return TRUE, else
* return FALSE to add it to the output XML document. * return FALSE to add it to the output XML document.
...@@ -167,17 +185,22 @@ static gboolean process_freedesktop_node(Type *type, xmlNode *field) ...@@ -167,17 +185,22 @@ static gboolean process_freedesktop_node(Type *type, xmlNode *field)
pattern = xmlGetNsProp(field, "pattern", NULL); pattern = xmlGetNsProp(field, "pattern", NULL);
g_return_val_if_fail(pattern != NULL, FALSE); if (pattern)
{
g_hash_table_insert(globs_hash, g_strdup(pattern), type); g_hash_table_insert(globs_hash,
g_free(pattern); g_strdup(pattern), type);
g_free(pattern);
}
else
g_print("* Missing 'pattern' attribute in glob element "
"(type %s/%s)\n", type->media, type->subtype);
} }
else if (strcmp(field->name, "magic") == 0) else if (strcmp(field->name, "magic") == 0)
{ {
xmlNode *copy; xmlNode *copy;
gchar *type_name; gchar *type_name;
if (validate_magic(field)) if (get_priority(field) != -1 && validate_magic(field))
{ {
copy = xmlCopyNode(field, 1); copy = xmlCopyNode(field, 1);
type_name = g_strconcat(type->media, "/", type_name = g_strconcat(type->media, "/",
...@@ -188,15 +211,15 @@ static gboolean process_freedesktop_node(Type *type, xmlNode *field) ...@@ -188,15 +211,15 @@ static gboolean process_freedesktop_node(Type *type, xmlNode *field)
g_ptr_array_add(magic, copy); g_ptr_array_add(magic, copy);
} }
else else
g_print("Skipping invalid magic for type '%s/%s'\n", g_print("* Skipping invalid magic for type '%s/%s'\n",
type->media, type->subtype); type->media, type->subtype);
} }
else if (strcmp(field->name, "comment") == 0) else if (strcmp(field->name, "comment") == 0)
return FALSE; /* Copy through */ return FALSE; /* Copy through */
else else
{ {
g_warning("Unknown freedesktop.org field '%s' " g_print("* Unknown freedesktop.org field '%s' "
"in type '%s/%s'\n", "in type '%s/%s'\n",
field->name, type->media, type->subtype); field->name, type->media, type->subtype);
return FALSE; return FALSE;
} }
...@@ -430,7 +453,13 @@ static void write_out_glob(gpointer key, gpointer value, gpointer data) ...@@ -430,7 +453,13 @@ static void write_out_glob(gpointer key, gpointer value, gpointer data)
Type *type = (Type *) value; Type *type = (Type *) value;
FILE *stream = (FILE *) data; FILE *stream = (FILE *) data;
fprintf(stream, "%s/%s:%s\n", type->media, type->subtype, pattern); if (strchr(pattern, '\n'))
g_print("* Glob patterns can't contain literal newlines "
"(%s in type %s/%s)\n", pattern,
type->media, type->subtype);
else
fprintf(stream, "%s/%s:%s\n",
type->media, type->subtype, pattern);
} }
/* Renames pathname by removing the .new extension */ /* Renames pathname by removing the .new extension */
...@@ -472,23 +501,6 @@ static void write_out_type(gpointer key, gpointer value, gpointer data) ...@@ -472,23 +501,6 @@ static void write_out_type(gpointer key, gpointer value, gpointer data)
g_free(filename); g_free(filename);
} }
static int get_priority(xmlNode *node)
{
char *prio_string;
int p;
prio_string = xmlGetNsProp(node, "priority", NULL);
if (prio_string)
{
p = atoi(prio_string);
g_free(prio_string);
g_return_val_if_fail(p >= 0 && p <= 100, 50);
return p;
}
else
return 50;
}
/* (this is really inefficient) */ /* (this is really inefficient) */
static gint cmp_magic(gconstpointer a, gconstpointer b) static gint cmp_magic(gconstpointer a, gconstpointer b)
{ {
...@@ -651,8 +663,7 @@ static void parse_int_value(int bytes, const char *in, const char *in_mask, ...@@ -651,8 +663,7 @@ static void parse_int_value(int bytes, const char *in, const char *in_mask,
value = strtol(in, &end, 0); value = strtol(in, &end, 0);
if (*end != '\0') if (*end != '\0')
{ {
g_set_error(error, MIME_ERROR, g_set_error(error, MIME_ERROR, 0, "Value is not a number");
0, "Value is not a number");
return; return;
} }
...@@ -785,9 +796,8 @@ static void write_magic_children(FILE *stream, xmlNode *parent, int indent, ...@@ -785,9 +796,8 @@ static void write_magic_children(FILE *stream, xmlNode *parent, int indent,
for (node = parent->xmlChildrenNode; node; node = node->next) for (node = parent->xmlChildrenNode; node; node = node->next)
{ {
GError *error = NULL; GError *error = NULL;
char *offset, *mask, *value, *type; char *offset, *mask, *value, *type, *end;
char *parsed_mask = NULL; char *parsed_mask = NULL;
const char *colon;
int word_size = 1; int word_size = 1;
long range_start; long range_start;
int range_length = 1; int range_length = 1;
...@@ -804,23 +814,40 @@ static void write_magic_children(FILE *stream, xmlNode *parent, int indent, ...@@ -804,23 +814,40 @@ static void write_magic_children(FILE *stream, xmlNode *parent, int indent,
g_return_if_fail(value != NULL); g_return_if_fail(value != NULL);
g_return_if_fail(type != NULL); g_return_if_fail(type != NULL);
range_start = atol(offset); range_start = strtol(offset, &end, 10);
colon = strchr(offset, ':'); if (!*offset)
if (colon) g_set_error(&error, MIME_ERROR, 0, "Empty offset");
range_length = atol(colon + 1) - range_start + 1; else if (*end == ':' && end[1])
{
int range_end;
range_end = strtol(end + 1, &end, 10);
if (*end == '\0')
range_length = range_end - range_start + 1;
else
g_set_error(&error, MIME_ERROR, 0,
"Invalid offset");
}
else if (*end != '\0')
g_set_error(&error, MIME_ERROR, 0, "Invalid offset");
if (strcmp(type, "host16") == 0) if (strcmp(type, "host16") == 0)
word_size = 2; word_size = 2;
else if (strcmp(type, "host32") == 0) else if (strcmp(type, "host32") == 0)
word_size = 4; word_size = 4;
else if (strcmp(type, "big16") && strcmp(type, "big32") && else if (!error && strcmp(type, "big16") &&
strcmp(type, "big32") &&
strcmp(type, "little16") && strcmp(type, "little32") && strcmp(type, "little16") && strcmp(type, "little32") &&
strcmp(type, "string") && strcmp(type, "byte")) strcmp(type, "string") && strcmp(type, "byte"))
g_warning("Unknown magic type '%s'\n", type); {
g_set_error(&error, MIME_ERROR, 0,
"Unknown magic type '%s'", type);
}
g_string_truncate(parsed_value, 0); g_string_truncate(parsed_value, 0);
parse_value(type, value, mask, parsed_value, &parsed_mask, if (!error)
&error); parse_value(type, value, mask,
parsed_value, &parsed_mask, &error);
if (error) if (error)
{ {
...@@ -866,6 +893,7 @@ static void write_magic(FILE *stream, xmlNode *node) ...@@ -866,6 +893,7 @@ static void write_magic(FILE *stream, xmlNode *node)
int prio; int prio;
prio = get_priority(node); prio = get_priority(node);
g_return_if_fail(prio >= 0 && prio <= 100);
type = xmlGetNsProp(node, "type", NULL); type = xmlGetNsProp(node, "type", NULL);
g_return_if_fail(type != NULL); g_return_if_fail(type != NULL);
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment