0 remain
3282 completed
The program's namesake, Hattori Hanzō.

Hanzo is an experimental plug-in for jEdit built to partially automate the process of converting from bad, old-style {{Infobox Ship}} templates to shiny, new {{Infobox Ship Begin}} templates. This is a task undertaken by the Ships Wikiproject and is (briefly) described at Category:Ship articles needing infobox conversion.

It's so named because The Bride was wreaking havoc with a Hattori Hanzo sword on TV while I was searching for a name for a Java class.

The program has absolutely no other use in the universe, and the only way someone else could use it would be to basically do a bit-by-bit copy of my hard drive. It depends on about a zillion other packages. That said, if you want to write a lexer for infoboxes or automate some editing processes in BeanShell, I have some notes below.

Hanzo is 100% finished with the job I created it for, having helped me convert 3,282 of 3,282 infoboxes in 5 days. There are 0 left to go, which represents about 0 hours of work. The last 50 or so infoboxes had to be cleaned up by hand, which cost extra time. Hanzo's current status could best be described as "humming along smoothly for a few hundred edits, then bursting into flames."

Feedback

If you're here, you probably saw an edit summary. I have a watch on the discussion page here. Feedback away.

Project history

3,282 infoboxes were converted in a period of 5 days, 8 hours and 27 minutes, from:

  • 10:35, 27 March 2008 (hist) (diff) USS Patrick Henry (SSBN-599)‎ (replaced infobox using Hanzo) (top)

to

  • 19:02, 1 April 2008 (hist) (diff) HMS Dragon (D35)‎ (Migrating infobox with Hanzo)

This represents about 25.55 conversions per calendar hour over the period of 128.45 hours.
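The rate can be re-derived from the two edit timestamps. The snippet below is only a sketch of that arithmetic: the class and method names are made up for illustration, and it assumes both timestamps are shown in the same time zone with no clock change in between.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Locale;

public class ConversionRate {
    /** Elapsed calendar hours between two timestamps (assumed same fixed offset). */
    static double hoursBetween(LocalDateTime first, LocalDateTime last) {
        return Duration.between(first, last).toMinutes() / 60.0;
    }

    /** Conversions per calendar hour over the whole period. */
    static double perHour(int conversions, LocalDateTime first, LocalDateTime last) {
        return conversions / hoursBetween(first, last);
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2008, 3, 27, 10, 35); // USS Patrick Henry edit
        LocalDateTime last = LocalDateTime.of(2008, 4, 1, 19, 2);    // HMS Dragon edit
        System.out.printf(Locale.ROOT, "%.2f hours, %.2f conversions/hour%n",
                hoursBetween(first, last), perHour(3282, first, last));
    }
}
```

The span is 7,707 minutes, or 128.45 hours, so 3,282 conversions work out to roughly 25.55 per hour.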

Related infobox issues

As of 30 March 2008, about 3,750 pages use {{Infobox Ship Begin}}, listed here. Ship infoboxes requiring conversion include approximately:

  1. 2,500 {{Infobox Ship}} remaining, listed here (of the original 3,282),
  2. 50 table header 01 conversions
  3. 1,000 table header 02 conversions
  4. 2,161 hand-tagged articles, including subst'ed infoboxes

I haven't formally analyzed (2) and (3), but they should be mostly amenable to automation. (4) might not be as easy; it may be something of a head-scratcher.

Technical

Hanzo's main functionality comes from a lexical analyzer written in Java with JFlex. To a large extent, the program lives inside a jEdit environment. A single-purpose program, it just barely functions. It was written in four rather arduous days: about a day to write the lexer (twice, per Raymond's law), half a day to uninstall/reinstall/fix jEdit to work with mwjed, and two and a half days to do stuff like:

  • get communication from WP to the lexer and back
  • preserve UTF-8 characters
  • do automatic page loading
  • do local diffs
  • automate to a 1-click process

It has one goal in life: to translate Ship-specific infoboxes.

Translating these infoboxes with regular expression search-and-replace seemed nuts to me. I couldn't bring myself to hack out code to do it. On the other hand, a small lexer with a dozen rules and 4 parse states seems to do it pretty nicely.
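To give a feel for why a handful of states is enough, here is a minimal hand-written sketch in plain Java. It is not Hanzo's JFlex specification: the class name, the four state names, and the sample field names are all assumptions made for illustration. It pulls the named parameters out of the first template on a page and tracks nesting depth so that a `|` inside a link or nested template is not mistaken for a parameter separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InfoboxLexerSketch {
    // Hypothetical state names; the real JFlex states are not documented here.
    enum State { TEXT, NAME, KEY, VALUE }

    /** Returns the named parameters of the first {{template}} in the text, in order. */
    static Map<String, String> lex(String wikitext) {
        Map<String, String> fields = new LinkedHashMap<>();
        State state = State.TEXT;
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int depth = 0; // nesting depth of {{...}} and [[...]] inside a value

        for (int i = 0; i < wikitext.length(); i++) {
            char c = wikitext.charAt(i);
            // true when the next character repeats this one, e.g. "{{" or "]]"
            boolean pair = i + 1 < wikitext.length() && wikitext.charAt(i + 1) == c;
            switch (state) {
                case TEXT: // outside any template: wait for "{{"
                    if (c == '{' && pair) { state = State.NAME; i++; }
                    break;
                case NAME: // skip the template name up to the first "|"
                    if (c == '|') state = State.KEY;
                    else if (c == '}' && pair) return fields;
                    break;
                case KEY: // collect a parameter name up to "="
                    if (c == '=') state = State.VALUE;
                    else if (c == '|') key.setLength(0); // positional parameter: ignored
                    else if (c == '}' && pair) return fields;
                    else key.append(c);
                    break;
                case VALUE: // collect a value; "|" only ends it at depth 0
                    if ((c == '{' || c == '[') && pair) {
                        depth++; value.append(c).append(c); i++;
                    } else if ((c == '}' || c == ']') && pair && depth > 0) {
                        depth--; value.append(c).append(c); i++;
                    } else if (depth == 0 && (c == '|' || (c == '}' && pair))) {
                        fields.put(key.toString().trim(), value.toString().trim());
                        key.setLength(0);
                        value.setLength(0);
                        if (c == '}') return fields; // closing "}}" of the infobox
                        state = State.KEY;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        return fields;
    }
}
```

Feeding it something like `{{Infobox Ship|name=USS Example|operator=[[United States Navy|US Navy]]}}` yields two fields, with the piped link kept whole in the second value. Keeping that kind of nesting straight is one place where a plain search-and-replace gets awkward.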

Requirements

The BeanShell scripts below need an environment something like this:

  • jEdit version 4.3pre13 or later from http://www.jedit.org
  • the mwjed Wikimedia plugin for jEdit
    • mwjed has some requirements of its own; read the mwjed page carefully
  • the JDiff plugin, available from inside the jEdit Plugin manager (Plugins menu, Plugin manager item, Install tab)

Possibly reusable bits