User:Tomaswoas/Wsif format

Wiki on a Stick Interchange Format is the preferred format for exporting WoaS content to a single file. This allows you to share it with whomever you like, or to archive your content. This page describes version 1.4.0 of the format.

Format description

Note: WSIF version 1.4.0 introduces some changes, mostly concerned with moving to a convention-over-configuration pattern:

   removes the ability to have external file content for embedded files (Image:: and File:: pages), as most browsers can't write binary files and the concept breaks the whole idea of the WSIF file as a sharing format; this has also led to the removal of some header names and may lead to further changes in the future (removal of the mime page header);
   adds the ability for WoaS to define page defaults in the WSIF information header, including a default boundary value;
   in partnership with the above change, removes the page headers that define a conventional page, page.attributes:0 and page.encoding:8bit/plain (this doesn't prevent these defaults from being changed in some future format version);
   removes the first boundary value as superfluous; the blank line after a header block serves as the first boundary for page content; the boundary search routine has also been improved so that a content/boundary conflict can now only occur if the exact boundary string, including dashes, occurs at the start of a content line;
   removes woas. from in front of the page. namespace as unnecessary; WSIF is for WoaS-like Wikis and has WoaS embedded in its acronym; further branding is surely unnecessary and bloated.

These changes haven't affected this Woas's ability to import older format versions.

The WSIF format is similar to the MIME email attachment format; each line is formatted like an email header definition:

header.name: header-value

Header names must be alphanumeric only, plus the namespace 'dot' (.); they cannot include the underscore character, _ (the code transforms header names internally for efficiency, leading to this limitation). The space after the colon and before the header-value is optional but present in all Woas-generated files.

All WSIF content must be ASCII, so header-value must be ECMA-escaped when necessary. This means escaping backslashes (from \ to \\) and encoding any non-ASCII character as the corresponding \u0000 string, where 0000 is the lowercase hexadecimal value of the character (as in ECMAScript). In WoaS this encoding is done by the woas.ecma_encode() function.
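The escaping rule above can be sketched in Python. This is only an illustration, not the woas.ecma_encode() implementation: the function name ecma_escape is hypothetical, and the handling of characters outside the Basic Multilingual Plane (as a surrogate pair, the way an ECMAScript string would hold them) is an assumption.

```python
def ecma_escape(text: str) -> str:
    """Escape a string to pure ASCII using ECMAScript-style escapes."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")      # a backslash becomes two backslashes
        elif ord(ch) < 0x80:
            out.append(ch)          # plain ASCII passes through unchanged
        else:
            # Assumption: emit one escape per UTF-16 code unit, so characters
            # outside the BMP become a surrogate pair.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(ecma_escape("Café \\ menu"))  # prints: Caf\u00e9 \\ menu
```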

A WSIF file contains header blocks that describe its contents. Header blocks contain one or more header lines, in any order, followed by a blank line (\n\n); the blank line is used to mark the end of both information and page header blocks.

The WSIF information header block (using the wsif namespace) describes the file itself, and the WoaS page header blocks (using the page namespace) describe the pages contained within it. The file starts with the WSIF information header block, including any optional page header defaults (new in 1.4.0), and ends with a blank line. In a conventional (inline content) WSIF file the information block is followed by one or more page definitions. There are two other WSIF file types (index and page) that use a slightly different format.
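As a rough sketch of how a header block might be read, the following Python function collects header lines until the terminating blank line. The name read_header_block and the regular expression are assumptions made for illustration; header values are returned still escaped.

```python
import re

# Header names: alphanumerics plus the namespace dot; no underscore.
# The single space after the colon is optional.
HEADER_RE = re.compile(r"^([A-Za-z0-9.]+):[ ]?(.*)$")


def read_header_block(lines, start=0):
    """Read header lines from `start` until a blank line.

    Returns (headers, next_index), where next_index is the first line
    after the blank line that ends the block.
    """
    headers = {}
    i = start
    while i < len(lines) and lines[i] != "":
        match = HEADER_RE.match(lines[i])
        if match is None:
            raise ValueError(f"not a header line: {lines[i]!r}")
        headers[match.group(1)] = match.group(2)
        i += 1
    return headers, i + 1


sample = "wsif.version: 1.4.0\nwsif.generator:woas\n\npage.title: Main Page\n"
info, next_index = read_header_block(sample.split("\n"))
print(info, next_index)
```

Because header order does not matter, a dictionary is a natural container here; a later duplicate of the same header simply overwrites the earlier one in this sketch.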

Reserved namespaces

wsif namespace

The wsif. namespace defines an information header. The required headers are:

   wsif.version - WSIF format version used by the file;
   wsif.generator - program/library used to generate WSIF content;
   wsif.generator.version - version of generator (usually WoaS) used to produce the WSIF file; this header is currently optional for imported content not generated by a WoaS file, though this is likely to change soon so that WoaS derivatives can also be imported properly.

The optional headers are:

   wsif.author - the name of the person generating the WSIF file;
   wsif.pages - total count of pages stored inside the WSIF file; this is no longer used by Woas, but can be handy when looking at the file;
   wsif.type - only present for external data source use. wsif.type: index lets the system know the page is a special Index listing of the content pages and their associated filenames. This type also allows for an optional page description. wsif.type: page is present when the Use multiple files for WSIF data source option is selected within the WSIF settings section of the Special::Options page.

Versions 1.1.0 and above are currently supported.

PVHL: My Woas will be adding optional hash and signed headers to the wsif namespace in a future Woas and WSIF version, along with the ability to sign individual pages (or, more accurately, to sign their hash value). There will also be date.created, date.modified, and description headers. Hopefully these changes will also be accepted by the WoaS project.
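A reader of WSIF files might check the information header along these lines. This is a hedged sketch: check_info_header and its strict flag are hypothetical, and it assumes wsif.version is always three dot-separated integers.

```python
REQUIRED = ("wsif.version", "wsif.generator", "wsif.generator.version")
MIN_VERSION = (1, 1, 0)


def check_info_header(info, strict=False):
    """Return the WSIF version as a tuple of ints, or raise ValueError."""
    # wsif.generator.version is only enforced in strict mode, since the text
    # describes it as currently optional for content not generated by WoaS.
    needed = REQUIRED if strict else REQUIRED[:2]
    missing = [name for name in needed if name not in info]
    if missing:
        raise ValueError("missing headers: " + ", ".join(missing))
    version = tuple(int(part) for part in info["wsif.version"].split("."))
    if version < MIN_VERSION:
        raise ValueError("unsupported WSIF version: " + info["wsif.version"])
    return version


print(check_info_header({"wsif.version": "1.4.0", "wsif.generator": "woas"}))
```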

page namespace

The page. namespace defines general WoaS content properties and specific WoaS pages. The 1.4.0 WSIF version introduces default content headers placed in the information header block and also conventional values (for attributes and encoding) that don't need to be stated at all. The defaults are for a plain ASCII page that has not been encrypted, escaped, or encoded as base64 content: a normal WoaS page. If a page has the same header values as the default header or conventional values they need not be repeated. This means a conventional page need only have headers for its title and possibly its modification date (currently optional).
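One way to picture the lookup order is: a page's own headers first, then the page.* defaults from the information header block, then the conventional values. The sketch below uses Python's ChainMap for this; the function name effective_headers is hypothetical and not part of WoaS.

```python
from collections import ChainMap

# Conventional values from the text: an unencrypted, plain ASCII page.
CONVENTIONAL = {"page.attributes": "0", "page.encoding": "8bit/plain"}


def effective_headers(page_headers, info_headers):
    """Resolve a page's headers: page > information-block defaults > conventions."""
    defaults = {k: v for k, v in info_headers.items() if k.startswith("page.")}
    return dict(ChainMap(page_headers, defaults, CONVENTIONAL))


info = {"wsif.version": "1.4.0", "page.boundary": "aGQsCTsKed"}
page = {"page.title": "Main Page", "page.date.modified": "1374388808"}
resolved = effective_headers(page, info)
print(resolved["page.encoding"], resolved["page.boundary"])  # 8bit/plain aGQsCTsKed
```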

The following are header definitions specific to WoaS pages:

   page.title - ECMA-escaped page title;
   page.attributes - a non-negative integer value in decimal format specifying the page attributes; attribute values are bitwise OR'd together to form a single decimal value, so an image file has an attribute value of 12 (embedded file and image), or 13 if it is also read-only; the current attributes are:
       0 - a conventional page, and so also the conventional value that need not be stated,
       1 - read only,
       2 - encrypted,
       4 - embedded file (Image:: and File:: namespaces),
       8 - image;
   page.date.modified - a positive integer value in decimal format representing the last modified time in seconds since the Unix epoch (UTC); this header is defined only if the data is available from WoaS, where it is currently optional;
   page.encoding - content encoding format; content and/or header values are encoded in ecma/plain format when UTF-8 sequences are found inside the string, otherwise 8bit/plain is used; 8bit/base64 is instead used for binary files or encrypted pages;
   Note: 8bit is a historical mistake that the official WoaS project has stated it will not be fixing. All the encodings are, in fact, 7bit text encodings.
   The page.encoding header can contain one of the following values:
       8bit/plain - the default for ordinary ASCII text and so also the conventional value that need not be stated; data is ASCII text which can use 7bits to represent characters; only ASCII text should be used and not UTF-8;
       ecma/plain - data is ECMA-escaped UTF-8 text;
       8bit/base64 - data is base64-encoded binary data;
       text/wsif - data is WSIF (not used in version 1.4.0, but will be added back once a WSIF file can be transposed within a conventional WSIF file).
   page.boundary - string used for inline boundary recognition, without leading dashes (not used by the index and page file types); now usually an information header block default header;
   page.mime - mime type of embedded pages that have a defined mime type, such as images.
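The attribute bits and encodings listed above can be modelled as follows. This is a sketch under assumptions: the names PageAttr, ecma_unescape and decode_content are hypothetical, only the two escapes described earlier (doubled backslash and \uXXXX) are reversed, and decryption of encrypted pages is not attempted.

```python
import base64
import re
from enum import IntFlag


class PageAttr(IntFlag):
    """page.attributes bits as listed above (member names are illustrative)."""
    READ_ONLY = 1
    ENCRYPTED = 2
    EMBEDDED = 4
    IMAGE = 8


def ecma_unescape(text: str) -> str:
    """Reverse the doubled-backslash and four-hex-digit escapes."""
    def repl(match):
        token = match.group(1)
        return "\\" if token == "\\" else chr(int(token[1:], 16))
    out = re.sub(r"\\(\\|u[0-9a-fA-F]{4})", repl, text)
    # Join any surrogate pairs produced by two consecutive escapes.
    return out.encode("utf-16", "surrogatepass").decode("utf-16")


def decode_content(raw: str, encoding: str = "8bit/plain"):
    """Decode page content according to its page.encoding value."""
    if encoding == "8bit/plain":
        return raw
    if encoding == "ecma/plain":
        return ecma_unescape(raw)
    if encoding == "8bit/base64":
        return base64.b64decode(raw)    # returns bytes
    raise ValueError(f"unsupported page.encoding: {encoding}")


attrs = PageAttr(int("13"))
print(PageAttr.IMAGE in attrs, PageAttr.ENCRYPTED in attrs)  # True False
```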

All headers are mandatory (under their specific state definition branch), except page.date.modified and the (currently) conventional values for page.attributes (0) and page.encoding (8bit/plain).

PVHL: My Woas will soon be adding date.created (for pages and WSIF files, and also wsif.date.modified) and will also be making dates non-optional (display of dates can still be optional though). page.author will also be added in both the information header block default and page header forms. A description header will also be added in both page and wsif forms, as discussed elsewhere.

I will also be adding several page attributes when the code is refactored. These include attributes such as core, optional, preload, shadow, and help. I also intend to remove the mime header and simply save Woas pages to WSIF files in the same form as they are kept internally: as a data URL complete with mime statement.

WSIF file types

Inline content: the conventional file type

Version 1.4.0 of the WSIF format begins incorporating the convention over configuration design pattern. The conventional WSIF file type therefore does not need to be flagged as such; it is assumed unless a header states otherwise.

Inline boundary

A line starting with two dashes (--) followed by the boundary header value marks the end of the page content; the content does not include the \n required to create the marker. An example page:

page.title: Example
...
page.boundary: my-random-id

this is the inline content
--my-random-id

--my-random-id must be a (usually random) string that does not appear at the start of any line within the page content. It does not need to be unique within the entire WSIF file, which is read sequentially, page by page (as in a state machine). With the advent of default header values, most pages will not have an individual marker, and a future Woas version won't need individual boundary markers at all (for better file creation efficiency); though individual markers will then not be generated by Woas, they will still be allowed by the WSIF format. If a page does need its own marker, Woas generates a random one for that page only.
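The boundary rule can be sketched as two small helpers: one that reads content up to the marker line, and one that picks a random boundary that cannot conflict with the content. Both function names, the boundary length, and the alphabet are assumptions for illustration; Woas may generate its markers differently.

```python
import secrets
import string


def read_inline_content(lines, start, boundary):
    """Collect content lines from `start` up to the boundary marker line."""
    marker = "--" + boundary
    for i in range(start, len(lines)):
        if lines[i].startswith(marker):
            # The newline before the marker is not part of the content.
            return "\n".join(lines[start:i]), i + 1
    raise ValueError("boundary marker not found: " + marker)


def pick_boundary(content, length=10):
    """Generate a random boundary whose marker starts no content line."""
    alphabet = string.ascii_letters + string.digits
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        marker = "--" + candidate
        if not any(line.startswith(marker) for line in content.split("\n")):
            return candidate


lines = ["this is the inline content", "--my-random-id", ""]
print(read_inline_content(lines, 0, "my-random-id"))
```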

A conventional WSIF file example

wsif.version: 1.4.0
wsif.generator: woas
wsif.generator.version: 0.13.0 Alpha 0
wsif.author: PVHL
wsif.pages: 8
page.boundary: aGQsCTsKed

page.title: Main Page
page.date.modified: 1374388808

= Welcome to my Woas!
...
--aGQsCTsKed

page.title: ::Menu

Some menu text
...
--aGQsCTsKed
...

The blank line between the end-of-page boundary marker and the next page header is optional, but always included by Woas for readability by humans.
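Putting the pieces together, a conventional WSIF file can be read sequentially, page by page. The following Python sketch is one possible reading of the rules above, not the WoaS import code; parse_wsif is a hypothetical name, and header values and content are returned undecoded.

```python
def parse_wsif(text):
    """Parse a conventional (inline content) WSIF file into (info, pages)."""
    lines = text.split("\n")

    def read_headers(i):
        headers = {}
        while i < len(lines) and lines[i] != "":
            name, _, value = lines[i].partition(":")
            # The space after the colon is optional.
            headers[name] = value[1:] if value.startswith(" ") else value
            i += 1
        return headers, i + 1          # skip the terminating blank line

    info, i = read_headers(0)
    pages = []
    while i < len(lines):
        if lines[i] == "":             # optional blank line between pages
            i += 1
            continue
        headers, i = read_headers(i)
        # A page-level boundary overrides the information-block default.
        boundary = headers.get("page.boundary", info.get("page.boundary"))
        if boundary is None:
            raise ValueError("no boundary defined for page")
        marker = "--" + boundary
        start = i
        while i < len(lines) and not lines[i].startswith(marker):
            i += 1
        if i == len(lines):
            raise ValueError("unterminated page content")
        pages.append((headers, "\n".join(lines[start:i])))
        i += 1                         # step past the marker line
    return info, pages


sample = (
    "wsif.version: 1.4.0\nwsif.generator: woas\npage.boundary: aGQsCTsKed\n\n"
    "page.title: Main Page\npage.date.modified: 1374388808\n\n"
    "= Welcome to my Woas!\n--aGQsCTsKed\n\n"
    "page.title: ::Menu\n\nSome menu text\n--aGQsCTsKed\n"
)
info, pages = parse_wsif(sample)
print([headers["page.title"] for headers, _ in pages])
```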

Index

An Index WSIF type is marked as such:

wsif.type: index

A WSIF Index file has different content from a conventional WSIF file; the information header is similar, though, apart from the type header. After the normal information block, terminated by a blank line, a simple listing of page titles and their associated file names is given. The block of names should be a continuous block without blank lines separating the pages.

The file names are separated from the titles by a doubled delimiter: ||. These markers will never exist in a page title, even when we introduce character escaping, as intended, and they will also not be present in the file's name. Though the format doesn't require spaces either side of the delimiter, Woas always includes them; any space before or after the delimiter is trimmed off, as is any space at the start and end of the line.

An example index file might look as follows, with associated WSIF page files accompanying it:

wsif.type: index
wsif.version: 1.4.0
wsif.generator: woas
wsif.generator.version: 0.13.0 Alpha 0
wsif.author: PVHL
wsif.pages: 336

Main Page || 0.wsif
Menu || 1.wsif
WoaS::Aliases || 2.wsif
WoaS::Hotkeys || 3.wsif
WoaS::CSS::Custom || 4.wsif
...
Water || 335.wsif

PVHL: A future Woas version is likely to change the file name to one similar to the page title, transformed according to an algorithm being developed; at that time a file name will be optional if it is the conventional value and the file is located in the same directory as the index file.

Although not yet implemented, the format allows a short description to be added after the file name, using the same delimiter. The description would be copied from a future page header, page.description, or from the opening paragraph if no such header is available. This would help with importing, matching a similar capability that will be added to the WSIF import page. It might look like this:

Water || 335.wsif || Information on the various forms water can take

The description will have a maximum length and Woas may also format the description to a fixed width using the standard line continuation character, \.
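An index entry could be split along the lines of the sketch below. The function name parse_index_line is hypothetical; the optional third field follows the not-yet-implemented description idea above, and line continuation with \ is not handled.

```python
def parse_index_line(line):
    """Split 'Title || file.wsif [|| description]' into its parts."""
    # Whitespace around each field and at both ends of the line is trimmed.
    parts = [part.strip() for part in line.strip().split("||", 2)]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not an index entry: {line!r}")
    description = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], description


print(parse_index_line("Main Page || 0.wsif"))
print(parse_index_line("Water||335.wsif || Information on the various forms water can take"))
```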

Page

A Page WSIF type is marked as such:

wsif.type: page

The difference between a conventional WSIF file and a page WSIF file is that only one page is present. All the page headers are placed in the information header block (and are therefore the default values), so there is no separate page header block, and there is no boundary header or marker. An example page file might look as follows:

wsif.type: page
wsif.version: 1.4.0
wsif.generator: woas
wsif.generator.version: 0.13.0 Alpha 1
wsif.author: PVHL
page.title: Main Page
page.date.modified: 1374388808

= Welcome to my Woas!

/(Yep, ...

The content continues to the end of the file and so needs no boundary header or marker, and no wsif.pages header.
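Since a page file has a single header block and no boundary, reading one reduces to splitting at the first blank line. The following is a minimal sketch, assuming \n line endings; parse_page_file is a hypothetical name, and any trailing newline in the file is kept as part of the content.

```python
def parse_page_file(text):
    """Parse a wsif.type: page file into (headers, content)."""
    head, sep, content = text.partition("\n\n")
    if not sep:
        raise ValueError("missing blank line after header block")
    headers = {}
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        # The space after the colon is optional.
        headers[name] = value[1:] if value.startswith(" ") else value
    if headers.get("wsif.type") != "page":
        raise ValueError("not a page-type WSIF file")
    # Everything after the blank line, to the end of the file, is content.
    return headers, content


sample = (
    "wsif.type: page\nwsif.version: 1.4.0\npage.title: Main Page\n\n"
    "= Welcome to my Woas!\n\nMore text"
)
headers, content = parse_page_file(sample)
print(headers["page.title"])
```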

Usage notes

WSIF format is pretty straightforward to produce or interpret. A few more thoughts:

   it is safe to embed a WSIF file inside another, with proper boundaries implemented;
   WSIF files can contain custom headers (using a namespace other than the reserved wsif and page namespaces). For example you could define a new namespace for header definitions called custom:
   custom.x: 100
   custom.y: 200
   custom.content: Hello world!
   It would then be up to you to parse such header-value couplets and give them a proper representation. The JavaScript API to allow this within WoaS is not yet implemented.
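Outside WoaS, picking out custom headers from an already parsed header block is simple. This Python sketch assumes the headers are held in a dictionary of strings; headers_in_namespace is a hypothetical helper, not an existing API.

```python
def headers_in_namespace(headers, namespace):
    """Return headers under `namespace`, keyed by the name without the prefix."""
    prefix = namespace + "."
    return {name[len(prefix):]: value
            for name, value in headers.items() if name.startswith(prefix)}


headers = {"page.title": "Example", "custom.x": "100", "custom.y": "200",
           "custom.content": "Hello world!"}
custom = headers_in_namespace(headers, "custom")
print(int(custom["x"]) + int(custom["y"]))  # 300
```

Header values are always strings, so numeric values such as custom.x need an explicit conversion by whatever code interprets them.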
