Bliki engine (Java Wikipedia API)
Original author(s): Axel Kramer
Developer(s): Axel Kramer, Jan Berkel
Initial release: January 2008
Stable release: 3.0.19 / August 2012
Written in: Java
Website: bitbucket.org/axelclk/info.bliki.wiki/wiki/Home

The Bliki engine (also known as the Java Wikipedia API) is a Java library used for converting between MediaWiki (Wikipedia) syntax and HTML.[1] It also supports converting MediaWiki syntax to plain text and contains helper classes for working with MediaWiki dump files. An example of the syntax equivalence between MediaWiki code and HTML can be found here.

History

The Bliki engine was initially released as an Eclipse plugin (Eclipse Wikipedia Editor plugin, version 2.0.5) in October 2006.[2] The first version of the Bliki engine as a standalone Java library (Java Wikipedia API) was 3.0.1, released on January 20, 2008.[2] The most recent version of the library is 3.0.19. The first commit to the project was made in December 2006 and the most recent commit was made in January 2016.[3]

Installation

If you are using Maven, add the following repository to your pom.xml:[4]

<repository>
  <id>info-bliki-repository</id>
  <url>http://gwtwiki.googlecode.com/svn/maven-repository/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
</repository>

along with the following dependency for the bliki-core jar file:

<dependency>
  <groupId>info.bliki.wiki</groupId>
  <artifactId>bliki-core</artifactId>
  <version>3.0.19</version>
</dependency>

and, if you also want the Bliki addons JAR, the following dependency as well:

<dependency>
  <groupId>info.bliki.wiki</groupId>
  <artifactId>bliki-addons</artifactId>
  <version>3.0.19</version>
</dependency>

If you are not using Maven, download the JAR files directly from the project page. In this case, you may also want to download Bliki's dependency JARs, available as bliki.core.libs.001.zip.[5]

Features

The Bliki engine supports the following list of features:[1]

Wikipedia/MediaWiki Syntax to HTML

The Bliki engine can render MediaWiki syntax to HTML, including wiki tags for bold, italic, headers, source and images. Wiki tables, lists, categories, footnotes and some of the template parser functions are also supported.[1]

HTML syntax to Wiki syntax

The classes used for converting from HTML to Wiki syntax are contained under the info.bliki.html package hierarchy. The following packages can be used for converting to wiki formats:

Convert MediaWiki syntax to plain text

Bliki has a PlainTextConverter class, which can be used to convert MediaWiki syntax to plain text.

APIs for working with MediaWiki XML dump files

Bliki has helper classes (for example, WikiXMLParser) that can be used to parse MediaWiki XML dump files. These can be used to convert the XML dump to plain text or to HTML.[6]

Converter Tool

A Java GUI converter tool is provided[7] which allows the user to experiment with the Bliki conversion methods for Wiki2HTML, Plain2Wiki and HTML2Wiki.

Sample Usage

MediaWiki syntax to HTML

The following code snippet shows a basic example of converting MediaWiki syntax to HTML. The info.bliki.wiki.model.WikiModel class needs to be imported. Then, the WikiModel.toHtml method is called with the MediaWiki code to be converted.[5]

import info.bliki.wiki.model.WikiModel;
...
String htmlText = WikiModel.toHtml("''This is italic text''");
...

htmlText now contains HTML markup <p><i>This is italic text</i></p>.
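
To make the mapping itself concrete without requiring the library, the following is a minimal, self-contained sketch of how inline bold and italic markers correspond to HTML tags. It is a toy illustration only and not how Bliki parses wiki text: the class name InlineMarkupSketch and its toHtml method are hypothetical, it relies on simple regular expressions, and it does not escape HTML or handle nesting, headings, links or templates.

```java
import java.util.regex.Pattern;

/** Toy illustration of wiki-to-HTML inline markup; NOT Bliki's implementation. */
final class InlineMarkupSketch {
    // Bold (''') is replaced before italic (''), because ''' also contains ''.
    private static final Pattern BOLD = Pattern.compile("'''(.+?)'''");
    private static final Pattern ITALIC = Pattern.compile("''(.+?)''");

    /** Wraps the converted text in a single paragraph; no HTML escaping is done. */
    static String toHtml(String wikiText) {
        String html = BOLD.matcher(wikiText).replaceAll("<b>$1</b>");
        html = ITALIC.matcher(html).replaceAll("<i>$1</i>");
        return "<p>" + html + "</p>";
    }

    public static void main(String[] args) {
        // prints <p><i>This is italic text</i></p>
        System.out.println(toHtml("''This is italic text''"));
    }
}
```

A real parser such as Bliki builds a document model instead of applying substitutions, which is what allows it to support tables, lists, footnotes and template functions.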

HTML to MediaWiki syntax

To convert HTML code to MediaWiki syntax, the HTML2WikiConverter and ToWikipedia classes have to be imported. The HTML code is set by calling the setInputHTML method on an HTML2WikiConverter object. Then, the converter's toWiki method is called with a ToWikipedia instance to perform the conversion.[8]

import info.bliki.html.HTML2WikiConverter;
import info.bliki.html.wikipedia.ToWikipedia;
...
// Set the HTML input, then convert it using the Wikipedia (MediaWiki) output format.
HTML2WikiConverter conv = new HTML2WikiConverter();
conv.setInputHTML("<h2>This is a large heading</h2>");
String wikiText = conv.toWiki(new ToWikipedia());
...

wikiText now contains equivalent MediaWiki syntax == This is a large heading ==.

If the HTML input above had instead been <p><i>This is italic text</i></p>, the converter would have returned the wiki text from the first example, ''This is italic text''.

MediaWiki syntax to plain text

To convert MediaWiki text to plain text, import and use the info.bliki.wiki.filter.PlainTextConverter class.[9]

import info.bliki.wiki.filter.PlainTextConverter;
import info.bliki.wiki.model.WikiModel;
...
WikiModel wikiModel = new WikiModel("https://en.wikipedia.org/w/api.php/${image}", 
                                      "https://en.wikipedia.org/w/api.php/${title}");
String wikiText = "This is a [[Hello World]] '''example'''";
String plainText = wikiModel.render(new PlainTextConverter(), wikiText);
System.out.print(plainText);
...

The program above removes the MediaWiki syntax (the link from 'Hello World' to a wiki page and the bold formatting of the word 'example') and prints the plain text This is a Hello World example.
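
For readers without the library at hand, the effect described above can be approximated with a small self-contained sketch. This is an assumption-laden stand-in and not the PlainTextConverter implementation: the class PlainTextSketch and its toPlainText method are hypothetical names, and only internal links and bold or italic markers are handled.

```java
/** Toy stand-in for stripping wiki markup; NOT Bliki's PlainTextConverter. */
final class PlainTextSketch {
    static String toPlainText(String wikiText) {
        return wikiText
                // [[Target|Label]] becomes Label
                .replaceAll("\\[\\[[^\\]|]*\\|([^\\]]*)\\]\\]", "$1")
                // [[Target]] becomes Target
                .replaceAll("\\[\\[([^\\]|]*)\\]\\]", "$1")
                // Runs of two or more apostrophes mark bold or italic; drop them.
                .replaceAll("'{2,}", "");
    }

    public static void main(String[] args) {
        // prints This is a Hello World example
        System.out.println(toPlainText("This is a [[Hello World]] '''example'''"));
    }
}
```

Templates, tables, references and HTML tags are left untouched by this sketch, so it is only suitable for illustrating the idea on simple input.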

Parsing MediaWiki XML dump files

In this example, we will make use of the WikiXMLParser class which iterates through the MediaWiki XML dump file and parses each article in the dump. The dump of Wikipedia articles is available from the Database dump progress page.[6]

import info.bliki.wiki.dump.IArticleFilter;
import info.bliki.wiki.dump.Siteinfo;
import info.bliki.wiki.dump.WikiArticle;
import info.bliki.wiki.dump.WikiXMLParser;
import org.xml.sax.SAXException;
...
...
// Filter that prints the title of every page in the Category namespace.
class TestArticleFilter implements IArticleFilter {

    public void process(WikiArticle page, Siteinfo siteinfo) throws SAXException {
        if (page.isCategory()) {
            System.out.println(page.getTitle());
        }
    }
}

...
...

try {
    // Path to a locally downloaded dump file.
    String dumpFilename = "C:\\dump\\mediawikiwiki-20160203-pages-articles.xml";
    IArticleFilter handler = new TestArticleFilter();
    // The parser calls handler.process(...) once for each page in the dump.
    WikiXMLParser wxp = new WikiXMLParser(dumpFilename, handler);
    wxp.parse();
} catch (Exception e) {
    e.printStackTrace();
}

IArticleFilter is an interface for a filter that processes all articles from a given Wikipedia XML dump file. The TestArticleFilter class here implements the IArticleFilter interface. Its process method is called for each article parsed by the WikiXMLParser, and the article's title is printed if the article belongs to the Category namespace. A sample of the titles printed when the above code is executed is shown below:

Category:Syntax highlighting extensions/en
Category:HTML variables/en
Category:UserLogout extensions/en
Category:History and diffs/en
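
The same streaming idea can be sketched with only the JDK's StAX API, which may help when experimenting without the Bliki jars. This is a simplified illustration and not how WikiXMLParser is implemented: the class DumpTitleSketch and its categoryTitles method are hypothetical, the dump is a tiny inline string, and category pages are detected by the "Category:" title prefix, which is an assumption made here for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Streams over dump XML and collects category page titles; NOT Bliki's parser. */
final class DumpTitleSketch {
    static List<String> categoryTitles(String dumpXml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(dumpXml));
            while (reader.hasNext()) {
                // Only <title> start tags are of interest; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    String title = reader.getElementText();
                    // Assumption: category pages are recognised by their title prefix.
                    if (title.startsWith("Category:")) {
                        titles.add(title);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed dump XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String dump = "<mediawiki>"
                + "<page><title>Category:HTML variables/en</title></page>"
                + "<page><title>Help:Contents</title></page>"
                + "</mediawiki>";
        // prints [Category:HTML variables/en]
        System.out.println(categoryTitles(dump));
    }
}
```

Because the reader pulls one event at a time, memory use stays small even for large dump files, which is the main reason to prefer a streaming parser for this task.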

Alternative tools

The Bliki engine is just one of many tools that can convert MediaWiki syntax into other formats. Some notable alternatives include:[10]

Online Wikipedia Markup Converter.

References

  1. "Official Bliki Wiki". Retrieved 30 January 2016.
  2. "Google Groups". groups.google.com. Retrieved 14 February 2016.
  3. "Bliki engine commit history". Retrieved 3 February 2016.
  4. "Hook into Wikipedia using Java and the MediaWiki API | Integrating Stuff". Retrieved 13 February 2016.
  5. "How to convert Mediawiki text to HTML". Retrieved 3 February 2016.
  6. "Helper classes to work with MediaWiki XML dump files". bitbucket.org. Retrieved 13 February 2016.
  7. "BlikiConverter.java". GitHub. Retrieved 13 February 2016.
  8. "How to convert HTML to Mediawiki text". Retrieved 3 February 2016.
  9. "Mediawiki2PlainText". Retrieved 9 February 2016.
  10. "Alternative parsers - MediaWiki". www.mediawiki.org. Retrieved 13 February 2016.