The Parser

FlyingWolFox edited this page Feb 14, 2024 · 3 revisions

The parser is simple and processes the file line by line. Although the Netscape Bookmarks file format is generated by browsers, the parser is very tolerant of errors: it raises an exception only if a folder's body is missing its closing tag (</DL><p>). In this respect it behaves like a browser importing bookmarks from a file. This page describes which parts of the format are supported.

Note

Because the Netscape Bookmarks file format has no formal documentation or standard, the parser was built using HTML exports from browsers as both guide and test cases. Some features found on the internet were added with full support, others with only legacy support. For the guidelines the parser follows, see the Netscape Bookmarks File Format wiki page.

How file meta should be

The meta, the first seven lines of the file, should look like this:

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!-- This is an automatically generated file.
     It will be read and overwritten.
     DO NOT EDIT! -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>

Some parts vary from browser to browser. The most common differences are:

  • Comment: The warning is ignored and doesn't need to span three lines, since some examples have just one line. When the parser finds the <!--, it skips ahead to the line after the -->. However, empty lines before or after the comment are not expected.

  • H1 tag: The title can be different, for example a translation, as Opera does.

The parser doesn't require the meta. If it doesn't find it, it just parses the bookmark folder tree.

After the H1 tag there should be a <DL><p>, which marks the start of the root bookmarks folder. Empty lines in between are allowed and the parser won't raise an exception. There should also be a </DL><p> at the end of the file, marking the end of the root folder; it doesn't need to be on the last line. Anything after that </DL><p> is not processed, and it is not stored in the non_parsed dictionary either.

Items supported

An item can be a folder, a shortcut, etc. The supported items are:

Folder

A folder is an item that contains other items. It has an H3 tag, and its tree is wrapped by <DL><p> and </DL><p>. The H3 should be:

<DT><H3 {attributes}>{name}</H3>

The parser ignores spaces between attributes. The body should be:

    <DL><p>
        {item}
        {item}
        {item}
        .
        .
        .
    </DL><p>

If a folder has neither a <DL><p> nor a </DL><p>, it's considered empty. If only one of the two tags is present, the parser raises an exception.
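The opening/closing rule can be illustrated with a small depth counter. This is only a sketch of the idea, not the parser's actual code: the function name `max_folder_depth` is hypothetical, and it assumes each <DL><p> and </DL><p> sits on its own line, as in browser exports.

```python
def max_folder_depth(lines):
    """Return the deepest folder nesting, or raise ValueError on unbalanced tags.

    Hypothetical helper for illustration; it is not part of the parser.
    """
    depth = 0
    deepest = 0
    for line in lines:
        tag = line.strip().upper()
        if tag.startswith("<DL>"):
            depth += 1
            deepest = max(deepest, depth)
        elif tag.startswith("</DL>"):
            if depth == 0:
                raise ValueError("</DL><p> without a matching <DL><p>")
            depth -= 1
            if depth == 0:
                break  # root folder closed; the rest of the file is ignored
    if depth != 0:
        raise ValueError("folder body is missing its closing </DL><p>")
    return deepest


sample = [
    "<DL><p>",
    "    <DT><H3>Folder</H3>",
    "    <DL><p>",
    "    </DL><p>",
    "</DL><p>",
]
print(max_folder_depth(sample))
```

Stopping as soon as the root folder closes mirrors the behaviour described earlier, where anything after the root </DL><p> is not processed.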

Shortcut

A shortcut is an item with a link. It has an A tag that should be:

<DT><A HREF="{url}" {attributes}>{name}</A>

The parser proceeds normally if the shortcut doesn't have the HREF attribute, or if the attributes have no spaces between them.
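As a rough illustration of this tolerance, the sketch below reads a shortcut line with two regular expressions. The names `parse_shortcut`, `A_TAG` and `ATTR` are made up for this example and are not the parser's API. It also assumes attribute values are double-quoted and contain no ">" character.

```python
import re

# Hypothetical patterns for illustration only.
A_TAG = re.compile(r"<DT>\s*<A\b(?P<attrs>[^>]*)>(?P<name>.*?)</A>", re.IGNORECASE)
ATTR = re.compile(r'([A-Za-z_]+)="([^"]*)"')


def parse_shortcut(line):
    """Return a dict for a shortcut line, or None if the line is not a shortcut."""
    match = A_TAG.search(line)
    if match is None:
        return None
    # findall does not care whether attributes are separated by spaces.
    attrs = {key.upper(): value for key, value in ATTR.findall(match.group("attrs"))}
    return {
        "name": match.group("name"),
        "href": attrs.get("HREF"),  # None when HREF is missing
        "attributes": attrs,
    }


print(parse_shortcut('<DT><A HREF="https://example.com"ADD_DATE="1">Example</A>'))
print(parse_shortcut('<DT><A ADD_DATE="5">No link</A>'))
```

Because the attribute pattern matches each `NAME="value"` pair on its own, missing spaces between attributes and a missing HREF both fall out naturally instead of being special cases.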

Feed, Web Slice and Icon

These are integrated with shortcuts and are implemented with basic support. If a shortcut has FEED="true" or WEBSLICE="true", a special type of shortcut is created that supports some extra attributes of these types. Icons are already integrated with shortcuts: they are the icon that the browser shows for the shortcut. For more information about these, see the Netscape Bookmarks file format wiki page.
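A minimal sketch of how such a type decision could look, assuming the attributes have already been collected into a dict with upper-cased names. The function `classify_shortcut` and its return values are hypothetical, not the parser's API. The sketch requires the value "true"; checking only for the attribute's presence would be a small change.

```python
def classify_shortcut(attributes):
    """Return "feed", "web_slice" or "shortcut" for a dict of attributes.

    Hypothetical helper for illustration; keys are assumed to be upper-case.
    """
    def is_true(name):
        return attributes.get(name, "").strip().lower() == "true"

    if is_true("FEED"):
        return "feed"
    if is_true("WEBSLICE"):
        return "web_slice"
    return "shortcut"


print(classify_shortcut({"HREF": "https://example.com/rss", "FEED": "true"}))
print(classify_shortcut({"HREF": "https://example.com"}))
```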

Attributes supported

The parser supports the attributes most commonly used by browsers, plus some extras. Every item has a set of attributes: some are inherited, some are specific to its type.

Item attributes

All items have these attributes:

  • Number position: Not stored in the item tag, but the order of the items is significant. It is what makes it possible to organize bookmarks in a browser

  • ADD_DATE: The date the item was created, in Unix time

  • LAST_MODIFIED: The date the item was last modified, in Unix time

  • Parent: Also not present in the item tag. The parent is always the folder that contains the item. Only the root folder has no parent

  • Name: The tag's content, between the opening and closing tags.
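The date attributes hold Unix time as a string of digits. As a sketch, a value read from ADD_DATE or LAST_MODIFIED could be turned into a Python datetime like this. The helper `parse_unix_time` is hypothetical, and it assumes the value counts seconds and should be read as UTC.

```python
from datetime import datetime, timezone


def parse_unix_time(value):
    """Convert a Unix-time attribute value to an aware datetime, or None.

    Hypothetical helper; assumes the value is in seconds.
    """
    if value is None or not value.strip().isdigit():
        return None
    return datetime.fromtimestamp(int(value.strip()), tz=timezone.utc)


print(parse_unix_time("86400"))
print(parse_unix_time("not a date"))
```

Returning None for missing or malformed values keeps the tolerant spirit of the parser rather than failing on a single bad attribute.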

Folder attributes

In addition to the item attributes, folders have these attributes:

  • PERSONAL_TOOLBAR_FOLDER: Indicates whether a folder is the bookmarks toolbar folder. Normally present only on the toolbar folder, with the value "true". Regular folders usually don't have this attribute

  • Items: The items that the folder contains. These items are between <DL><p> and </DL><p>

Shortcut attributes

In addition to the item attributes, shortcuts have these attributes:

  • HREF: the URL of the shortcut's page

  • LAST_VISIT: the date of the last visit to the shortcut's site, in Unix time. Not present in all browsers' exports

  • PRIVATE: probably indicates whether a shortcut is private. Its value is an integer, probably "0" or "1". I've never found it outside of hand-made examples

  • TAGS: list of the shortcut's tags, separated by commas. These are entered by the user in the browser's bookmark manager.

  • ICON_URI: URL of the favicon.ico of the domain. If the domain doesn't have one, the value of this attribute will be "fake-favicon-uri:{url}", where {url} is the same as the HREF attribute. Present in Firefox exports

  • ICON: the icon of the shortcut, encoded as a string prefixed with its format and encoding. Normally the icon is a PNG encoded in base64, indicated by the "data:image/png;base64," prefix before the encoded data. If another image format or encoding is used, it will appear at the start of the value as "data:image/{format};{encoding}"; however, I've never seen one that wasn't png;base64. The format/encoding notation and the encoded data are separated by a comma

  • Comment: not present in the A tag, but on the line after it, in a DD tag with no closing tag

  • SHORTCUTURL: the address bar keyword associated with the bookmark. Used by Firefox. When the keyword is entered in the address bar, Firefox automatically navigates to the bookmarked page
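Two of these values have some inner structure: TAGS is a comma-separated list and ICON is a data URI. The sketch below shows one way a consumer could unpack them. The helpers `split_tags` and `split_icon` are hypothetical names, not part of the parser, and only the base64 encoding is decoded.

```python
import base64


def split_tags(value):
    """Split a TAGS value on commas, dropping empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def split_icon(value):
    """Split an ICON value into (media_type, encoding, raw_bytes).

    raw_bytes is None when the encoding is not base64.
    """
    header, _, payload = value.partition(",")
    if not header.startswith("data:"):
        raise ValueError("ICON value is not a data URI")
    media_type, _, encoding = header[len("data:"):].partition(";")
    data = base64.b64decode(payload) if encoding == "base64" else None
    return media_type, encoding, data


# Build a tiny fake payload instead of embedding a real icon.
icon_value = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
print(split_icon(icon_value))
print(split_tags("news, tech,,python"))
```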

Feed attributes

In addition to the shortcut attributes, feeds have the following attributes:

  • FEED: present in the A tag, with value "true"

  • FEEDURL: the URL of the feed

Web Slice attributes

In addition to the shortcut attributes, web slices have the following attributes:

  • WEBSLICE: present in the A tag, with value "true"

  • ISLIVEPREVIEW: whether the web slice is a live preview; has a boolean value (commonly "true")

  • PREVIEWSIZE: contains the window size of the web slice as a string in the format "w x h"
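Since PREVIEWSIZE is kept as a plain string, code that needs the numbers has to split it itself. A minimal sketch, assuming the "w x h" format means two integers around an "x" with optional spaces. Both helper names are hypothetical.

```python
import re

# Assumed shape of the value: "<width> x <height>", spaces optional.
_SIZE = re.compile(r"\s*(\d+)\s*x\s*(\d+)\s*", re.IGNORECASE)


def parse_preview_size(value):
    """Return (width, height) as integers, or None if the value doesn't match."""
    match = _SIZE.fullmatch(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_bool(value):
    """Interpret an attribute value such as ISLIVEPREVIEW as a boolean."""
    return value.strip().lower() == "true"


print(parse_preview_size("320 x 240"))
print(parse_bool("true"))
```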

About Legacy/Basic support

Some things are no longer used by browsers nowadays but are still supported by the parser. Others existed but are no longer seen anywhere. These are Feeds and Web Slices.

Feeds: probably RSS feeds, which deliver a website's news headlines in a simple format with a URL to each article page. In old versions of Internet Explorer, these were normal shortcuts. Nowadays social networks, with their own feeds, have made RSS somewhat obsolete. Some people still use feeds, but they need an app or a browser extension to handle them, which arguably does it better. The parser knows a shortcut is a feed if the FEED attribute is present, and copies the value of the FEEDURL attribute as-is.

Web Slices: introduced in Internet Explorer 8 Beta 1. Their purpose was to put a "slice" of a web page in the bookmarks toolbar so it could be viewed quickly. Weather sites used this to show the current weather and the forecast for the next days. They died out, however. To know more, visit the Wikipedia page. The parser knows a shortcut is a web slice if the WEBSLICE attribute is present. The value of ISLIVEPREVIEW is copied as a boolean, and the value of PREVIEWSIZE is copied as-is, as a string.