User talk:Shinhan/AWB BannerSheller

Example code

If you are interested in doing this with C#, here's a rough outline:

 //Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
 string nestedTemplates = "{{WikiProjectBannerShell";
 string otherTemplates = "";
 //Recognize a template as a word enclosed within {{ and }} and optionally followed by
 //"|name=value" params. Since we need neither name nor value, we ignore whatever follows |.
 //[^{}]* keeps a match from running past the closing }} into the next template.
 //The part which matches the name of the template is named for quick retrieval
 Regex tl = new Regex(@"(?<generictl>\{\{(?<tlname>\w+)(\|[^{}]*)?\}\})", RegexOptions.Compiled);

 //A separate regex for WPBiography is created. We need the results of "living" and "activepol"
 //params separately, so name them. The named alternatives must come before the generic
 //\w+=\w+ one, otherwise the generic one wins and the named groups stay empty
 Regex bio = new Regex(@"(?<biotl>\{\{WPBiography((\|living=(?<living>\w+))|(\|activepol=(?<activepol>\w+))|(\|\w+=\w+))*\}\})");

 //Let the source string be named sourceString
 //Match bio with the string and replace the matched part with ""
 //Replace returns a new string, so the result has to be assigned back
 Match bioMatch = bio.Match(sourceString);
 GroupCollection gc = bioMatch.Groups;
 sourceString = bio.Replace(sourceString, "");

 //Shell params go first, then a single "|1=" which holds the nested banners
 if (bioMatch.Success)
 {
     if (gc["living"].Value == "yes")
         nestedTemplates += "|living=yes";
     if (gc["activepol"].Value == "yes")
         nestedTemplates += "|activepol=yes";
 }
 //Plain (not @"") string literals, so that \r\n is a real line break
 nestedTemplates += "|1=\r\n";
 if (bioMatch.Success)
     nestedTemplates += gc["biotl"].Value + "\r\n";

 //Tackle all other templates, match and replace and selectively add
 MatchCollection templates = tl.Matches(sourceString);
 sourceString = tl.Replace(sourceString, "");

 //Let tlList, of type List<string>, contain the names of the templates
 //to be included in the WP banner
 foreach (Match template in templates)
 {
     GroupCollection g = template.Groups;
     if (tlList.Contains(g["tlname"].Value))
         nestedTemplates += g["generictl"].Value + "\r\n";
     else
         otherTemplates += g["generictl"].Value + "\r\n";
 }

 nestedTemplates += "}}\r\n" + otherTemplates;

 //Code is not tested, so there might be bugs
 //Some shorthand could be used, but my intention wasn't to win IOCCC :P

--soum talk 14:58, 11 July 2007 (UTC)

Discussion

Note: The regexes have not been designed to handle multi-word template names, multi-word param names or values, or spaces. A similar technique can be used to convert all GA, FA and PR templates into an article history template. --soum talk 18:12, 11 July 2007 (UTC)

It's much harder with GA/FA/PR. One needs to use the Article History JavaScript to find revision IDs, and some templates record only a revision ID but no date. PR templates record only the PR page, with no date or revision ID. All of this must be gathered by hand and can't be done automatically.
Good idea with the nested regexp. No need for 1100 regexp matches :D
I changed "|=" to "|1=" as that's the proper syntax.
Since multi-word templates are a must (params other than living/activepol are not needed), would this work? Instead of
(?<tlname>\w+)
putting
(?<tlname>[\w\s]+)
Thanks for the help :) -- Shinhan < talk > 21:11, 11 July 2007 (UTC)
I already said the regexes used here were just for demonstration, not comprehensive. So, for real use, they will need modification, like using (\w *)+ to recognize multiple words rather than \w+, which matches only one word. And so on.
As for GA/FA/PR: the newer template revisions directly include the link to the discussion as well as the revision ID, so no JavaScript is required and there is no hunting through the history. A single regexp with named groups can do the job. But yeah, the older versions will need some manual intervention (that can be automated as well, but it's a lot of work).
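To make the multi-word suggestion above concrete, here is a minimal sketch, not tested against real talk pages. It assumes that a template name consists only of word characters and spaces and that the parameters contain no nested braces. The class and method names (TemplateNameDemo, Names) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateNameDemo
{
    // Assumption: names are word characters and spaces; params hold no nested {{...}}.
    // The lazy [\w ]*? plus the following \s* keeps trailing spaces out of "tlname".
    private static readonly Regex Template = new Regex(
        @"\{\{\s*(?<tlname>\w[\w ]*?)\s*(\|[^{}]*)?\}\}",
        RegexOptions.Compiled);

    // Returns the name of every template found in the given wikitext, in order.
    public static List<string> Names(string wikitext)
    {
        var names = new List<string>();
        foreach (Match m in Template.Matches(wikitext))
            names.Add(m.Groups["tlname"].Value);
        return names;
    }
}
```

Calling TemplateNameDemo.Names on "{{WikiProject Military history|class=B}} {{WPBiography|living=yes}}" should give the two names "WikiProject Military history" and "WPBiography". Restricting the name to spaces rather than \s keeps a name from spilling across a line break.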

Digression

When I was writing this code, I got another idea, which I thought out overnight: a dedicated browser for Wikipedia which renders wikimarkup directly, rather than via HTML. It will fetch the wikimarkup and parse it into a graph of objects (the object model). An article will be composed of a TemplateCollection, a SectionCollection, a LinksCollection, a TableCollection, a CategoriesCollection and a ForeignWiki collection, and they will be related to represent the overall structure of a document. A Section will in turn be composed of a ParagraphCollection, a ListCollection and a NumberedListCollection, structured by relationships. With this object model, it can be trivially serialized into XML (XML means easier processing). And since the schema is defined for both XML and wikimarkup, we can derive one from the other. Parsing in this way also means that on a round trip we get a cleanup of the wikimarkup for free (as XML is more well-formed than wikimarkup). Relationships can also be used to give the rendering structure (colors, positioning etc.) to the XML.
With this object model in place, we can:
  • Directly render content, or use XSLT transform to convert to XHTML and have a browser rendering engine display it.
  • Store the XML document locally and make changes to it, rather than on the server. Store the changesets, convert them to wikimarkup and submit them to the server as a batch. Yes, offline editing. Wikipedia itself will detect conflicts, and we can pass those on to the user.
  • Whenever an article is edited, the browser subscribes to the article's RSS feed, so the local copy always stays updated.
  • Not be limited to the editing environment that WP gives, but switch to a GUI-based editor or an advanced editor that can provide autocompletion for template parameters.
  • Automate lots of tasks, like updating articles automatically when closing GA/AFD/FA/PR discussions etc.
  • Integrate AWB functionality.
I have already started work on the object model and XML syntax, and I am thinking about starting work on local caching/RSS integration. Will post it to SourceForge/CodePlex soon. Interested? Though I cannot do the rendering stuff; I need others to take that part at least. --soum talk 09:48, 12 July 2007 (UTC)
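As a rough illustration of the object model described above, here is a minimal sketch, not the actual design. It assumes plain List<T> properties in place of the dedicated collection types, keeps only a few of the collections named above, and uses made-up XML element names (article, section, paragraph and so on).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, heavily simplified section: a heading plus its paragraphs and list items.
public sealed class Section
{
    public string Heading { get; set; } = "";
    public List<string> Paragraphs { get; } = new List<string>();
    public List<string> ListItems { get; } = new List<string>();
}

// Hypothetical, heavily simplified article: templates, sections and categories only.
public sealed class Article
{
    public string Title { get; set; } = "";
    public List<string> Templates { get; } = new List<string>();
    public List<Section> Sections { get; } = new List<Section>();
    public List<string> Categories { get; } = new List<string>();

    // Serializes the object graph to XML; the element names are illustrative assumptions.
    public XElement ToXml()
    {
        return new XElement("article",
            new XAttribute("title", Title),
            new XElement("templates", Templates.Select(t => new XElement("template", t))),
            new XElement("sections", Sections.Select(s =>
                new XElement("section",
                    new XAttribute("heading", s.Heading),
                    s.Paragraphs.Select(p => new XElement("paragraph", p)),
                    s.ListItems.Select(i => new XElement("item", i))))),
            new XElement("categories", Categories.Select(c => new XElement("category", c))));
    }
}
```

Building an Article, adding a Section with a paragraph and calling ToXml() yields an XElement tree that can be saved locally or fed to an XSLT transform. Going back from XML to wikimarkup would need a matching writer, which is left out here.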
Interesting, and I can see it being useful, but I don't have much experience with XML, XSLT transforms and such… -- Shinhan < talk > 10:29, 12 July 2007 (UTC)
No problem, I will be using the same license as AWB so that I can borrow code. The parser will involve regexes, so you can take from there as well. But it's still some time away. Anyone is free to contribute anytime they want. I will probably make it a browser plug-in rather than a full-blown app. But let's move this discussion to SourceForge when it goes online and meanwhile focus on your bot :) --soum talk 11:09, 12 July 2007 (UTC)
If you're committed to doing it right, I'd look at MediaWiki's Parser.php. It's a bitch to work with, but you can be assured that you aren't parsing anything differently than Wikipedia would. -- Madman bum and angel (talkdesk) 18:08, 20 July 2007 (UTC)