The Parser
The parser is really simple and processes the file line by line. Even though Netscape Bookmarks files are generated by browsers, the parser is very tolerant of errors: it only raises an exception if a folder doesn't have a body closing tag (</DL><p>). In this it behaves like a browser importing bookmarks from a file.
This page describes which parts of the format the parser supports.
Due to the lack of documentation and of a standard for the Netscape Bookmarks file format, the parser was built using HTML exports from browsers as a guide and as test cases. Some things found on the internet were added with full support, others with just legacy support. For the guidelines the parser follows, see the Netscape Bookmarks File Format wiki page.
The meta, made up of the first seven lines of the file, should look like this:
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!-- This is an automatically generated file.
It will be read and overwritten.
DO NOT EDIT! -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
Some parts vary from browser to browser. The most common changes are:
- Comment: the warning is ignored and doesn't need to span three lines, since some examples showed just one line. When the parser finds the <!--, it skips ahead to the line after the -->. However, no empty lines are expected before or after the comment.
- H1 tag: this can hold another name, such as a translation, which is what Opera does.
The parser doesn't require the meta. If it doesn't find it, it just parses the bookmark folder tree.
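A minimal sketch of how the optional meta could be skipped, assuming the file has already been read into a list of lines. The function name skip_meta is hypothetical and not the parser's real API. It returns the index of the first line after the header and treats a comment of any length as a single unit, so one-line and three-line warnings are both accepted; if no meta is present it simply returns 0.

```python
def skip_meta(lines: list[str]) -> int:
    """Return the index of the first line after the (optional) meta header.

    Hypothetical helper: recognizes the DOCTYPE, the comment, META, TITLE and H1.
    """
    i = 0
    n = len(lines)
    while i < n:
        stripped = lines[i].strip()
        if stripped.startswith("<!--"):
            # Skip the whole comment, however many lines it spans.
            while i < n and "-->" not in lines[i]:
                i += 1
            i += 1  # continue on the line after the one holding "-->"
        elif stripped.upper().startswith(("<!DOCTYPE", "<META", "<TITLE", "<H1")):
            i += 1
        else:
            # First line that is not part of the meta (e.g. the root <DL><p>).
            break
    return i
```

Run on the seven header lines above followed by <DL><p>, it returns 7, the index of the <DL><p>.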
After the H1 tag there should be a <DL><p>, but there can be empty lines in between and the parser won't raise an exception. This tag marks the start of the root bookmarks folder. There should also be a </DL><p> at the end of the file marking the end of the root folder, though it doesn't need to be on the last line of the file. Anything after that </DL><p> won't be processed, nor will it be stored in the non_parsed dictionary.
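The root folder rules can be sketched as follows, assuming the meta has already been skipped and start points at the first line after it. root_body is a hypothetical name. Empty lines before the opening tag are tolerated, nested <DL><p>/</DL><p> pairs are counted so that the root's own closing tag is found, and everything after that tag is dropped. Raising an error when no opening tag is found is an assumption of this sketch.

```python
def root_body(lines: list[str], start: int = 0) -> list[str]:
    """Return the lines between the root <DL><p> and its matching </DL><p>."""
    i = start
    # Empty lines between the H1 and the opening tag are accepted.
    while i < len(lines) and not lines[i].strip():
        i += 1
    if i == len(lines) or lines[i].strip().upper() != "<DL><P>":
        raise ValueError("root folder opening <DL><p> not found")
    depth = 0
    for j in range(i, len(lines)):
        tag = lines[j].strip().upper()
        if tag.startswith("<DL>"):
            depth += 1
        elif tag.startswith("</DL>"):
            depth -= 1
            if depth == 0:
                # Root closed: anything after line j is ignored.
                return lines[i + 1 : j]
    raise ValueError("root folder closing </DL><p> not found")
```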
An item can be a folder, a shortcut, etc. The supported items are:
A folder is an item that contains other items. It has an H3 tag, and its tree is wrapped by <DL><p> and </DL><p>. The H3 line should be:
<DT><H3 {attributes}>{name}</H3>
The parser ignores spaces between attributes. The body should be:
<DL><p>
{item}
{item}
{item}
.
.
.
</DL><p>
If a folder has neither a <DL><p> nor a </DL><p>, it's considered empty. If just one of the two tags is present, the parser raises an exception.
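The folder rules can be illustrated with a small recursive sketch. parse_folder and H3_RE are hypothetical names, and the dict layout is an assumption made for brevity: non-folder lines are kept as raw text and the H3 attributes as an unparsed string. A folder with no body is returned as empty, while an opening <DL><p> that is never closed raises an exception.

```python
import re

# Matches "<DT><H3 {attributes}>{name}</H3>"; attributes are kept as raw text.
H3_RE = re.compile(r"<DT><H3\s*(?P<attrs>[^>]*)>(?P<name>.*)</H3>", re.IGNORECASE)


def parse_folder(lines: list[str], i: int) -> tuple[dict, int]:
    """Parse the folder whose H3 line is lines[i].

    Returns the folder and the index of the first line after it.
    """
    match = H3_RE.search(lines[i])
    if match is None:
        raise ValueError(f"line {i} is not a folder header")
    folder = {"name": match.group("name"), "attrs": match.group("attrs"), "items": []}
    i += 1
    # No body at all: the folder is simply empty.
    if i >= len(lines) or lines[i].strip().upper() != "<DL><P>":
        return folder, i
    i += 1
    while i < len(lines):
        tag = lines[i].strip().upper()
        if tag.startswith("</DL>"):
            return folder, i + 1
        if tag.startswith("<DT><H3"):
            child, i = parse_folder(lines, i)  # recurse into subfolder
            folder["items"].append(child)
        else:
            # Shortcuts and other lines are kept as raw text in this sketch.
            folder["items"].append(lines[i].strip())
            i += 1
    raise ValueError("folder body is missing its closing </DL><p>")
```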
A shortcut is an item with a link. It has an A tag that should look like:
<DT><A HREF="{url}" {attributes}>{name}</A>
The parser proceeds normally even if the shortcut doesn't have the HREF attribute or if the attributes aren't separated by spaces.
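One way to get that tolerance is to extract attributes with a regular expression instead of splitting on spaces. This is a sketch only; parse_shortcut, A_RE and ATTR_RE are hypothetical names, and the pattern assumes every attribute value is double-quoted.

```python
import re

A_RE = re.compile(r"<DT><A(?P<attrs>[^>]*)>(?P<name>.*?)</A>", re.IGNORECASE)
# KEY="value" pairs; findall does not care whether spaces separate them.
ATTR_RE = re.compile(r'([A-Z_]+)="([^"]*)"', re.IGNORECASE)


def parse_shortcut(line: str) -> dict:
    """Parse a <DT><A ...>name</A> line into name, href and other attributes."""
    match = A_RE.search(line)
    if match is None:
        raise ValueError("not a shortcut line")
    attrs = {key.upper(): value for key, value in ATTR_RE.findall(match.group("attrs"))}
    # A missing HREF is accepted and reported as None.
    return {"name": match.group("name"), "href": attrs.pop("HREF", None), "attrs": attrs}
```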
Feeds, Web Slices and icons are integrated with shortcuts and are implemented with basic support. If a shortcut has FEED="true" or WEBSLICE="true", a special type of shortcut is created that supports some extra attributes of these types. Icons are also integrated with shortcuts: they are the icon the browser shows for the shortcut. For more info about these, see the Netscape Bookmarks file format wiki page.
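Choosing the special type can be as simple as looking at the attribute names. The sketch below assumes attribute names have already been upper-cased; shortcut_kind and the returned labels are hypothetical. Browsers write FEED="true" and WEBSLICE="true", and the sketch only checks that the attribute is present.

```python
def shortcut_kind(attrs: dict[str, str]) -> str:
    """Pick the shortcut type from its (upper-cased) attribute names."""
    if "FEED" in attrs:  # written as FEED="true"
        return "feed"
    if "WEBSLICE" in attrs:  # written as WEBSLICE="true"
        return "web_slice"
    return "shortcut"
```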
The parser supports the attributes most commonly used by browsers, plus some extras. Every item has a set of attributes: some are inherited, some are specific to its type.
All items have these attributes:
- Number position: this isn't written in the item tag, but the order of the bookmarks matters, since it is what allows bookmarks to be organized in a browser.
- ADD_DATE: the date the item was created, in unix time.
- LAST_MODIFIED: the date of the item's last modification, in unix time.
- Parent: also not present in the item tag. The parent is always the folder that contains the item. Only the root folder doesn't have a parent.
- Name: the tag contents, between the opening and the closing tag.
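Since ADD_DATE and LAST_MODIFIED are unix times (seconds since 1970-01-01 UTC), converting them to datetime objects is a one-liner. unix_to_datetime is a hypothetical helper; treating a missing or empty value as None is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def unix_to_datetime(value: Optional[str]) -> Optional[datetime]:
    """Convert an ADD_DATE / LAST_MODIFIED string to an aware UTC datetime."""
    if not value:
        return None  # attribute absent or empty
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

For example, unix_to_datetime("1600000000") gives 2020-09-13 12:26:40 UTC.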
In addition to the item attributes, folders have these attributes:
- PERSONAL_TOOLBAR_FOLDER: indicates whether the folder is the bookmarks toolbar folder. Normally present only on the toolbar folder, with the value "true"; other folders usually don't have this attribute.
- Items: the items that the folder contains. These items are between <DL><p> and </DL><p>.
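The position and parent attributes are not stored in the file; they come from where an item sits in the tree. A minimal sketch of a folder node that records both when an item is added follows. The Folder class and its add method are hypothetical, not the parser's real classes. The parent field is excluded from repr and comparison so the parent/child cycle doesn't cause infinite recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    personal_toolbar_folder: bool = False
    num: int = 0  # position inside the parent folder
    parent: Optional[Folder] = field(default=None, repr=False, compare=False)
    items: list = field(default_factory=list)

    def add(self, item) -> None:
        """Append an item, recording its number position and its parent."""
        item.num = len(self.items)
        item.parent = self
        self.items.append(item)
```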
In addition to the item attributes, shortcuts have these attributes:
- HREF: the URL of the page the shortcut points to.
- LAST_VISIT: the date of the last visit to the shortcut's site, in unix time. Not present in every browser's exports.
- PRIVATE: probably indicates whether a shortcut is private. Its value is an integer, probably "0" or "1". I've never found it outside of hand-made examples.
- TAGS: the shortcut's tags, separated by commas. These are added by the user in the browser's bookmark manager.
- ICON_URI: the URL of the domain's favicon.ico. If the domain doesn't have one, the value of this attribute will be "fake-favicon-uri:{url}", where {url} is the same as the HREF attribute. Present in Firefox exports.
- ICON: the shortcut's icon, encoded as a string preceded by its format/encoding. Normally the icon is a PNG encoded in base64, indicated by "data:image/png;base64," before the encoded data. If another image format or encoding is used, the value starts with "data:image/{format};{encoding}" instead, though I've never seen one that wasn't png;base64. The format/encoding notation and the encoded data are separated by a comma.
- Comment: not present in the A tag, but on the line after it, in a DD tag without a closing tag.
- SHORTCUTURL: the address bar keyword associated with the bookmark. Used by Firefox: when the keyword is entered in the address bar, Firefox automatically navigates to the bookmarked page.
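Several of these attributes need a little post-processing. The helpers below are a sketch of that under stated assumptions: all four function names are hypothetical, empty tags are dropped, a placeholder ICON_URI is reported as None, and the comment is read from a <DD> on the line right after the shortcut.

```python
from typing import Optional

FAKE_FAVICON_PREFIX = "fake-favicon-uri:"


def split_tags(value: str) -> list[str]:
    """TAGS is comma-separated; strip blanks and drop empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def split_icon(value: str) -> tuple[str, str]:
    """Split an ICON value into ("image/png;base64", "<encoded data>")."""
    header, _, data = value.partition(",")
    if header.startswith("data:"):
        header = header[len("data:"):]
    return header, data


def real_icon_uri(value: str) -> Optional[str]:
    """Return None when Firefox wrote a placeholder instead of a favicon URL."""
    return None if value.startswith(FAKE_FAVICON_PREFIX) else value


def dd_comment(lines: list[str], i: int) -> Optional[str]:
    """Read the comment from an unclosed <DD> on the line after lines[i]."""
    if i + 1 < len(lines):
        nxt = lines[i + 1].lstrip()
        if nxt.upper().startswith("<DD>"):
            return nxt[len("<DD>"):].strip()
    return None
```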
In addition to the shortcut attributes, feeds have the following attributes:
- FEED: present in the A tag, with the value "true".
- FEEDURL: the URL of the feed.
In addition to the shortcut attributes, web slices have the following attributes:
- WEBSLICE: present in the A tag, with the value "true".
- ISLIVEPREVIEW: whether the web slice is a live preview; has a boolean value (commonly "true").
- PREVIEWSIZE: the window size of the web slice, as a string in the format "w x h".
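Because PREVIEWSIZE is a plain "w x h" string, turning it into numbers is straightforward if a consumer needs them. This is a hypothetical helper for illustration only; the parser itself keeps the value as a string.

```python
def parse_preview_size(value: str) -> tuple[int, int]:
    """Turn a PREVIEWSIZE string such as "300 x 200" into (width, height)."""
    width, height = value.lower().split("x")
    # int() ignores the surrounding whitespace around each number.
    return int(width), int(height)
```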
Some things are no longer used by browsers nowadays, but the parser still supports them. These are Feeds and Web Slices: they existed once, but aren't seen anywhere today.
Feeds: probably RSS feeds, which would deliver a website's news headlines in a simple format, with a URL to each article's page. In old versions of Internet Explorer these were normal shortcuts. Nowadays social networks, with their own feeds, have made RSS somewhat obsolete. Some people still use it, but they need an app or a browser extension to handle feeds, which does so even better. The parser knows a shortcut is a feed if the FEED attribute is present, and copies the value of the FEEDURL attribute verbatim.
Web Slices: introduced in Internet Explorer 8 Beta 1. Their purpose was to put a "slice" of a web page in the bookmarks toolbar so it could be viewed quickly. Weather sites used this to show the current weather and the forecast for the next days. However, Web Slices died out. To know more, visit the Wikipedia page. The parser knows a shortcut is a web slice if the WEBSLICE attribute is present. The value of ISLIVEPREVIEW is copied as a boolean, and the value of PREVIEWSIZE is copied verbatim, as a string.
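The copying rules for these legacy attributes fit in a few lines. legacy_fields and its output keys are hypothetical. The sketch assumes upper-cased attribute names and treats ISLIVEPREVIEW as true only when its value is "true".

```python
def legacy_fields(attrs: dict[str, str]) -> dict:
    """Extract the extra feed / web slice fields from a shortcut's attributes."""
    extra = {}
    if "FEED" in attrs:
        extra["feed_url"] = attrs.get("FEEDURL")  # raw copy
    if "WEBSLICE" in attrs:
        extra["is_live_preview"] = attrs.get("ISLIVEPREVIEW", "").lower() == "true"
        extra["preview_size"] = attrs.get("PREVIEWSIZE")  # raw "w x h" string
    return extra
```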