User talk:OverlordQBot/Archive/Dev1

June 25, 2007

Update 1

Yay, I have it listening to the IRC channel. Stupid color codes tripped me up a bit, then I shamelessly borrowed and fixed the code from VandalFighter to give me good output. I also have it filtering for talk pages only.
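A minimal Python sketch of stripping those color codes, assuming the standard mIRC control characters (\x03 followed by an optional foreground number and ",background", plus \x02 bold, \x0f reset, \x16 reverse, \x1d italic, \x1f underline). The exact codes the Wikimedia feed emits may differ, and the sample message is made up; this is not the VandalFighter code.

```python
import re

# mIRC formatting: \x03 starts a colour code, optionally followed by a
# foreground number and ",background"; \x02 bold, \x0f reset,
# \x16 reverse, \x1d italic, \x1f underline.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(message: str) -> str:
    """Remove colour and formatting control codes from one IRC message."""
    return IRC_FORMATTING.sub("", message)


# Made-up sample in the general shape of a coloured RC feed line.
raw = "\x0314[[\x0307User talk:LaraLove\x0314]]\x034 N\x0310 \x0302http://example.org\x03"
print(strip_irc_formatting(raw))  # [[User talk:LaraLove]] N http://example.org
```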

Debug output:

ovrlrdq@myhost:~/svn/perlwikipedia$ ./SigBot.pl
Retrieving http://en.wikipedia.org/w/index.php?title=Special%3AUserlogin&action=edit
Login as "OverlordQBot" succeeded.
Connected to irc.wikimedia.org
.#.#.
http://en.wikipedia.org/w/index.php?title=Talk:Good_Samaritan_%28Hellboy%29&diff=140470832&oldid=99140288
!.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.
http://en.wikipedia.org/w/index.php?title=User_talk:Shaunyboy_Brikman&diff=140470852&oldid=137821030
!.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.
http://en.wikipedia.org/w/index.php?title=User_talk:LaraLove&diff=140470882&oldid=140440565
!.#.#.
http://en.wikipedia.org/w/index.php?title=User_talk:Rambutan&diff=140470884&oldid=140467437
!.#.#.#.#.#.
http://en.wikipedia.org/w/index.php?title=User_talk:Deryck_Chan&diff=140470890&oldid=140225824
!.#.#.#.
http://en.wikipedia.org/w/index.php?title=Talk:Good_Samaritan_%28Hellboy%29&diff=140470893&oldid=140470832
!.#.#

A . is printed when the bot receives a message from the RC (recent changes) feed about a new edit. A # indicates it's not a talk page. If it is a talk page, it outputs the URL and then a !.
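The talk-page check behind the # and ! markers could look roughly like this Python sketch. It assumes the namespace is read from the title prefix; TALK_NAMESPACES is an illustrative, possibly incomplete set, and both function names are hypothetical.

```python
# Illustrative, possibly incomplete set of English Wikipedia talk namespaces.
TALK_NAMESPACES = {
    "Talk", "User talk", "Wikipedia talk", "Image talk", "File talk",
    "MediaWiki talk", "Template talk", "Help talk", "Category talk", "Portal talk",
}


def is_talk_page(title: str) -> bool:
    """True if the title's namespace prefix is one of TALK_NAMESPACES."""
    prefix, sep, _ = title.partition(":")
    return bool(sep) and prefix in TALK_NAMESPACES


def progress_marker(title: str) -> str:
    """'!' for a talk page, '#' otherwise, as in the debug output legend."""
    return "!" if is_talk_page(title) else "#"


for t in ["User talk:LaraLove", "Talk:Good Samaritan (Hellboy)", "Hellboy"]:
    print(progress_marker(t), t)
```

Matching against a fixed set, rather than any prefix ending in " talk", avoids treating an article such as "Small talk: a guide" as a talk page.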

Update 2

Getting a diff engine to work has been a pain; I haven't come up with a better way of creating a diff than pulling the two revisions and running a diff algorithm on them. Any input would be greatly appreciated.
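For comparison, the "pull both revisions and diff them" approach is only a few lines with Python's standard difflib, used here as a stand-in for whatever diff library the Perl bot used. old_text and new_text are assumed to be the wikitext of the two revisions.

```python
import difflib


def revision_diff(old_text: str, new_text: str) -> list[str]:
    """Line-based unified diff between two revisions' wikitext."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="oldid",
        tofile="diff",
        lineterm="",  # lines from splitlines() carry no newline
    ))


old = "== Question ==\nHow do I do this?"
new = "== Question ==\nHow do I do this?\n:Like this."
for line in revision_diff(old, new):
    print(line)
```

This still costs two page fetches per edit, which is the overhead the later switch to munging the diff page was meant to avoid.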

myhost:/home/ovrlrdq/svn/perlwikipedia# ./SigBot.pl
Connected to irc.wikimedia.org
.A.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#
.A.S.#.#.S.#.#.#.#.#.#.A.#.#.#.#.#.#.S.#.#.#.A.#.#.#.#.#.#.A.#.#.#.#.A.#.#.#.#.#.#.#.#.#.A.#.S.#.A
.#.A.A.A.#.A.#.#.#.#.#.#.#.#.#.#.A.#.A.S.#.#.S.#.#.#.#.#.A.#.#.#.#.#.#.#.#.#.#.#.A.#.#.#.A.#.S.#.#
.#.#.#.#.#.#.A.#.#.#.#.#.#

Legend:

  • . Revision (followed by one of the following results)
  • # Revision was not on a talk page
  • A Revision only had additions
  • S Revision had subtractions
  • N Revision was non-contiguous

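The A/S/N codes in the legend could be derived from a line diff roughly as follows. This is a Python sketch using difflib.SequenceMatcher; the precedence (any removal wins, then several separate insertions mean non-contiguous) is an assumption, not necessarily the bot's rule, and classify_revision is a hypothetical name.

```python
import difflib
from typing import Optional


def classify_revision(old_text: str, new_text: str) -> Optional[str]:
    """Map a revision to the A/S/N codes; None if nothing changed.

    Assumed precedence: any deleted or replaced line gives 'S'; otherwise
    more than one separate block of inserted lines gives 'N'; one block 'A'.
    """
    matcher = difflib.SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    changes = [op[0] for op in matcher.get_opcodes() if op[0] != "equal"]
    if not changes:
        return None
    if any(tag in ("delete", "replace") for tag in changes):
        return "S"
    return "N" if len(changes) > 1 else "A"


print(classify_revision("a\nb", "a\nb\nc"))  # A
```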
Also fixed a bug in the parser for edits flagged as both a new page and a minor edit. Need sleep.
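One way to keep the flags from interfering is to test each flag letter on its own. The Python sketch below parses one colour-stripped RC feed line, assuming a layout of [[Title]] FLAGS URL * User * (+/-bytes) summary with flag letters N (new), M (minor), B (bot) and ! (unpatrolled). That layout, the regex, RcEdit and parse_rc_line are assumptions for illustration, not taken from the bot.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout of one RC feed message after colour codes are stripped.
RC_LINE = re.compile(
    r"^\[\[(?P<title>[^\]]+)\]\] (?P<flags>[!NMB]*) ?(?P<url>\S+) "
    r"\* (?P<user>.+?) \* \((?P<size>[+-]?\d+)\) ?(?P<summary>.*)$"
)


@dataclass
class RcEdit:
    title: str
    url: str
    user: str
    new_page: bool
    minor: bool
    bot: bool


def parse_rc_line(line: str) -> Optional[RcEdit]:
    """Parse one feed line; None if it doesn't look like an edit."""
    m = RC_LINE.match(line)
    if m is None:
        return None  # e.g. log entries with a different layout
    flags = m["flags"]
    # Each flag is tested independently, so "NM" sets new_page AND minor.
    return RcEdit(m["title"], m["url"], m["user"],
                  "N" in flags, "M" in flags, "B" in flags)
```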

June 30, 2007

Update 1

Completely scrapped the Perlwikipedia module. Using as many POE components as possible. Switched to using the API instead of making a normal request and scraping the HTML; I don't know if it puts any less load on their end, but I'm sure that's why it's there.

Working on writing the XML parser for the calls to the API. Q T C 05:53, 30 June 2007 (UTC)
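For reference, a request for a page's latest revision as XML can be built like this Python sketch. action=query, prop=revisions, titles, rvprop, rvlimit and format=xml are documented MediaWiki API parameters; the particular rvprop fields and the helper name revisions_query_url are assumptions, and the endpoint is given over HTTPS as it is served today.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title: str, limit: int = 1) -> str:
    """api.php URL asking for the latest `limit` revisions of one page, as XML."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|user|timestamp|content",
        "rvlimit": limit,
        "format": "xml",
    }
    return API_URL + "?" + urlencode(params)


print(revisions_query_url("User talk:LaraLove"))
```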

July 2, 2007

Test

Update 1

I hacked myself into a corner with some excessive, similarly named subroutines, so I scrapped what I had and redid those logic portions. It's working well again; the only problem is that some of the HTTP requests are timing out :-/ I added an HTTP Keep-Alive pool, so hopefully that'll smooth those out; otherwise I'll have to write an error handler. I'm still torn on whether diffing the two revisions or pulling the diff page and munging it is the better method. I've gotten the code to a point where it can go either way, so I'll save a copy of what I have now and pursue the diff-page munging method. In the best case, this will cut requests to WP in half. Q T C 00:29, 3 July 2007 (UTC)
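A timeout handler can be as simple as retrying a few times with a back-off. In the Python sketch below, with_retries is hypothetical, and request stands for any zero-argument callable that performs one HTTP request (for example on a kept-alive http.client connection). socket.timeout is an alias of TimeoutError since Python 3.10, but urllib.request may wrap connect timeouts in URLError, which this sketch deliberately does not catch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(request: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call request(), retrying after a TimeoutError, `attempts` tries in total.

    Waits delay, 2*delay, ... between tries (simple linear back-off).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of tries: let the caller log and skip this edit
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")
```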

July 06, 2007

Update 1

Testing new diff parsing routines. Q T C 09:03, 6 July 2007 (UTC)

Hopefully it works :) Q T C 09:04, 6 July 2007 (UTC)

One last test Q T C 09:04, 6 July 2007 (UTC)

July 24, 2007

Update 1

Ouch? Has it been 18 days? Got sidetracked with Real Life (tm). I'm a horrible procrastinator, but I'mma sit down now and finish it off.

  1. Rewrote requests to api.php; it now only sends requests for pages that are new (i.e., not an oldrev/newrev pair from the RC feed).
  2. Switched parsing from diffing the results of two calls to api.php to munging the HTML of the actual diff page.
  3. Parsing 'engines' done.
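The routing in item 1 can be decided from the feed URL alone. This Python sketch assumes that entries for existing pages carry both diff and oldid query parameters (as in the debug output above) and that new-page entries do not; fetch_route and the new-page sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def fetch_route(feed_url: str) -> str:
    """'diff' if the feed URL carries a diff/oldid pair, else 'api' (new page)."""
    query = parse_qs(urlsplit(feed_url).query)
    return "diff" if "diff" in query and "oldid" in query else "api"


print(fetch_route("http://en.wikipedia.org/w/index.php?title=User_talk:LaraLove"
                  "&diff=140470882&oldid=140440565"))  # diff
```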

ToDo:

  1. Logic for whether or not to sign a post.
  2. Submit edit back to WP.

Q T C 05:09, 24 July 2007 (UTC)
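The first ToDo item could start from a heuristic like the following Python sketch. It assumes a signature is a link to User:, User talk: or Special:Contributions/ followed later by a "05:09, 24 July 2007 (UTC)"-style timestamp. The patterns and both function names are hypothetical, and this is not the logic the bot ended up with.

```python
import re

# Assumed signature shape: a user/talk/contribs link, then a UTC timestamp.
USER_LINK = re.compile(r"\[\[(?:User|User talk|Special:Contributions)[:/]", re.IGNORECASE)
TIMESTAMP = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")


def looks_signed(added_text: str) -> bool:
    """True if a user link is followed somewhere later by a UTC timestamp."""
    link = USER_LINK.search(added_text)
    return link is not None and TIMESTAMP.search(added_text, link.end()) is not None


def needs_signature(added_text: str, is_talk: bool, is_bot: bool) -> bool:
    """Sign only non-empty, unsigned, non-bot additions to talk pages."""
    return is_talk and not is_bot and bool(added_text.strip()) and not looks_signed(added_text)
```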

Update 2

Still running into a bug where I get the revision notice from the RC IRC bot and request the page through api.php, but get an error reply saying that the page doesn't exist. Of course this only happens on new pages, but it's still slightly aggravating. Q T C 05:37, 24 July 2007 (UTC)

E.g.:

'<page ns="3" title="User talk:76.210.5.146" missing=""/>'

Q T C 05:46, 24 July 2007 (UTC)
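Detecting that reply in Python's xml.etree needs one subtlety: missing="" has an empty value, so the check has to test for the attribute's presence, not its truthiness. The sketch below assumes revision text arrives as the text of rev elements inside each page element. parse_pages is hypothetical, and the second sample is made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_pages(xml_text: str) -> dict[str, Optional[list[str]]]:
    """Map each <page> title to its revisions' wikitext, or None if missing."""
    root = ET.fromstring(xml_text)
    pages: dict[str, Optional[list[str]]] = {}
    for page in root.iter("page"):
        title = page.get("title", "")
        # missing="" has an empty value, so test for presence, not truthiness.
        if "missing" in page.attrib:
            pages[title] = None
        else:
            pages[title] = [rev.text or "" for rev in page.iter("rev")]
    return pages


missing = ('<api><query><pages><page ns="3" title="User talk:76.210.5.146" missing=""/>'
           '</pages></query></api>')
print(parse_pages(missing))  # {'User talk:76.210.5.146': None}
```

A None entry could, for example, be queued and requested again a little later instead of being dropped; that retry policy is an assumption, not the fix described below.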

Update 3

Fixed the nonexistent-page bug; now filtering out revisions that only add a template is proving to be the pain. In pursuit of the conditional portion of the bot, I figured out I was chewing off one too many letters, so usernames were getting truncated by one letter, which explains why skipping bot edits wasn't working: names became BetacommandBo. Q T C 00:30, 25 July 2007 (UTC)
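Both filters can be expressed as small predicates, as in this Python sketch. is_template_only and has_bot_name are hypothetical names. The template pattern ignores nested templates such as {{a|{{b}}}}, and the name check is only a naming-convention heuristic (a bot flag in the feed, where available, is more reliable). The last assertion shows why a truncated name slips past a suffix check.

```python
import re

# One or more {{...}} templates and nothing else on the line; nested
# templates such as {{a|{{b}}}} are not handled by this simple pattern.
TEMPLATE_ONLY = re.compile(r"^\s*(?:\{\{[^{}]*\}\}\s*)+$")


def is_template_only(added_lines: list[str]) -> bool:
    """True if every non-blank added line consists solely of templates."""
    lines = [line for line in added_lines if line.strip()]
    return bool(lines) and all(TEMPLATE_ONLY.match(line) for line in lines)


def has_bot_name(user: str) -> bool:
    """Heuristic only: bot accounts are conventionally named '...Bot'."""
    return user.lower().endswith("bot")


assert has_bot_name("BetacommandBot")
assert not has_bot_name("BetacommandBo")  # the truncated name is not caught
```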

Update 4

Parsing the HTML is proving to be a PITA. Gotta take a break from this. On the plus side, I went the parse-it-as-XML route, since using regexes to parse HTML is a 'Bad Thing' *winks at wikilinkwatcher* Q T C 01:08, 25 July 2007 (UTC)
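A diff page is HTML and is not guaranteed to be well-formed XML, so a tolerant HTML parser is one alternative. This Python sketch swaps in the standard library's html.parser instead of an XML parser. It assumes added lines sit in td cells with class "diff-addedline", a MediaWiki class name that should be checked against a real diff page. AddedLineCollector is hypothetical, and the sample table is made up.

```python
from html.parser import HTMLParser


class AddedLineCollector(HTMLParser):
    """Collect the text of <td class="diff-addedline"> cells from a diff page."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self.added: list[str] = []
        self._inside = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "diff-addedline" in classes:
            self._inside = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._inside:
            self._inside = False
            self.added.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)


sample = ('<table><tr><td class="diff-marker">+</td><td class="diff-addedline">'
          '<div>Thanks &amp; cheers</div></td></tr><tr><td class="diff-deletedline">'
          '<div>old</div></td></tr></table>')
collector = AddedLineCollector()
collector.feed(sample)
collector.close()
print(collector.added)  # ['Thanks & cheers']
```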

July 25, 2007

Update 1

Sigh, looks like somebody else wrote a similar bot. I'm going to finish up anyway and throw it on my resume. Seems they just looked at the bot account's activity log and assumed I was idle instead of asking me about it. *shrug* It would have been done by now, but I rewrote a lot of the guts to cut the WP server requests in half; guess it's oh so true that nice guys finish last.

Anyways, on a related note, it's kinda hard to test the last little bit of functionality because there are so few unsigned talk page edits going on :) Q T C 10:37, 25 July 2007 (UTC)

Update 2

Arg, looks like I optimized myself into a hole. Back to the drawing board. Q T C 10:48, 25 July 2007 (UTC)

July 27, 2007

Update 1

Development of this strain terminated. Q T C 22:07, 27 July 2007 (UTC)