Comments?

As I am still under development, any comments on this whole endeavor are very welcome below.

For a faster response, leave my human supervisor a message on his talk page.

Verboseness of entry descriptions

I looked over some of Navibot's entries, and one thing that struck me is that the descriptions added alongside the new links are often longer than they need to be. This is an example of unnecessary detail - per Wikipedia:Manual_of_Style_(disambiguation_pages)#Individual_entries, "The description associated with a link should be kept to a minimum, just sufficient to allow the reader to find the correct link." The trend of lengthy descriptions seemed to follow through a number of Navibot's edits (otherwise, it seems that what it is doing is great!). Just try to describe the links as concisely as possible - for albums, artist and year produced is usually more than enough. -- Natalya 03:11, 28 July 2008 (UTC)

It's funny, I'd been thinking the opposite - admiring the bot for not adding a note at all to some cases which didn't need it like this one. But I've just been editing a very verbose dab page, so perhaps it was the contrast! PamD (talk) 09:00, 28 July 2008 (UTC)
Good comments: coming up with good descriptors is the hardest part of this endeavor, and the most interesting to me. Right now, Navibot tries to extract them from the lead sentence, but of course it doesn't really know what the lead is saying, and can't edit it too aggressively. In my day job, I work on computational linguistics, and I eventually hope to apply work in statistical parsing, sentence compression, etc., to these problems. There are also lots of heuristic shortcuts that the bot will eventually use: as Natalya said, albums can be described quite concisely, and I'm working on the bot doing just that with a few bits of information extracted from any occurrences of {{Infobox Album}} found on the target page. There are a few other similar tricks I should get finished before the bot begins running unsupervised.
On the other hand, as PamD points out, sometimes the qualifier (my name for the trailing parenthetical in an article title) is enough for a reader. The early edits don't reflect this, but right now Navibot will add a descriptor only if it can come up with a short enough one (less than 50 characters). The bot is happy to get by with the qualifier, but only if it's more than one word—(Cassius album) is good enough, just (album) is not. And if there's no short descriptor, and the qualifier isn't long enough, the current version of the bot punts entirely on adding such an entry. It'll get back to those additions when it gets smart enough to do a good job.
Thanks for the comments! —johndburger 17:37, 28 July 2008 (UTC)
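
The rule described above (use a descriptor only if it is short enough, otherwise fall back on a multi-word qualifier, otherwise skip the entry) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Navibot's actual code: the function names, the entry format, and the example titles are all hypothetical.

```python
import re
from typing import Optional, Tuple

# "less than 50 characters", per the comment above
MAX_DESCRIPTOR_CHARS = 50


def split_qualifier(title: str) -> Tuple[str, Optional[str]]:
    """Split 'Name (qualifier)' into ('Name', 'qualifier').

    The qualifier is None when the title has no trailing parenthetical.
    """
    match = re.fullmatch(r"(.+?)\s*\(([^()]+)\)", title)
    if match:
        return match.group(1), match.group(2)
    return title, None


def choose_entry(title: str, descriptor: Optional[str]) -> Optional[str]:
    """Return a dab-page line for `title`, or None if the bot should punt.

    `descriptor` stands for whatever was extracted from the lead sentence
    (hypothetical input; the extraction itself is not shown here).
    """
    _, qualifier = split_qualifier(title)
    # 1. A short enough descriptor is used as-is.
    if descriptor and len(descriptor) < MAX_DESCRIPTOR_CHARS:
        return f"* [[{title}]], {descriptor}"
    # 2. Otherwise a qualifier of more than one word is enough on its own.
    if qualifier and len(qualifier.split()) > 1:
        return f"* [[{title}]]"
    # 3. Otherwise skip the entry for now.
    return None
```

With these assumptions, a title like "Example (Cassius album)" gets a bare link even with no descriptor, while "Example (album)" is skipped unless a short descriptor is available.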

Coping with redirects

Looking at the entries so far it looks most impressive and really useful - I can't understand why so many people have managed to create articles, name them with headings with a disambiguation, and not bother to add them to the dab page! Ah well, that's editors.

The one case I've spotted where a human tidying up would have done differently is Baha (given name): it's been added to BAHA, but there's an existing redirect from Baha to Bahá'í symbols, which needs either to be redirected to the BAHA dab page, or to be given a "redirect" hatnote on the Bahá'í symbols page. I wasn't quite sure about DNO/Dno either. Perhaps the whole area where there are matches with different capitalisation needs to be looked at again, or the bot might just have to give up and produce a list for human attention in these cases? They're inconsistently handled anyway, with some cases where one dab page handles both initialism and word, and others where they're split.

Keep up the good work, anyway - I'm sure this bot will be a huge asset to Wikipedia even if we need to do a little tidying up after it. PamD (talk) 08:58, 28 July 2008 (UTC)

Yes, initialisms are another tricky case where I've changed the bot based on these older edits. In its current incarnation, the bot would not have added Baha (given name) to BAHA—the main reason is that such DABs often say something like PQR is an acronym that may mean, so Navibot will currently not add Pqr to PQR. And in the other case you mention, where there are multiple DABs that differ only in case, the bot wimps out entirely. Eventually, I may make it smart enough to decide in some cases which of multiple DABs to add an ambiguous entry to.
As for redirects, one problem is that I can't be sure the redirect is simply an alternate name for its target. In your example, however, the redirect is tagged with {{R from title without diacritics}}, which is indeed evidence that perhaps there should be an entry for Bahá'í symbols added to BAHA. I do keep track of these redirect tags in the database I build to find potential entries to add, and at some point I'll use them for exactly this purpose. —johndburger 19:55, 28 July 2008 (UTC)
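
For illustration, the capitalisation behaviour described in this reply (no Pqr added to PQR, and no edit at all when several DABs differ only in case) might look something like the sketch below. It assumes DAB pages are identified by their base names and ignores MediaWiki's own first-letter case rules; the function name and inputs are hypothetical, not taken from the bot.

```python
from typing import Iterable, Optional


def pick_dab_page(base_title: str, dab_titles: Iterable[str]) -> Optional[str]:
    """Return the DAB page an entry with this base title may be added to.

    Returns None when the bot should leave the case for a human.
    """
    # Collect every DAB whose name matches when case is ignored.
    matches = [t for t in dab_titles if t.lower() == base_title.lower()]
    if len(matches) != 1:
        # Either no DAB at all, or several that differ only in case.
        return None
    dab = matches[0]
    if dab != base_title:
        # Capitalisation differs (e.g. Baha vs BAHA): likely an initialism page.
        return None
    return dab
```

Under these assumptions, `pick_dab_page("Baha", ["BAHA"])` and `pick_dab_page("DNO", ["DNO", "Dno"])` both return `None`, which matches the "wimps out" behaviour described above.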

Primary usage

The Beatles diff here illustrates another problem: could the bot act differently in the situation where it's the primary usage which is missing, so that instead of adding a link at the end of the list it adds it at the top of the page, or flags it up for manual intervention? It might need manual intervention to check that the line below is then appropriately worded, as in fact in this case it is: here the appropriate line had been removed from the top of the dab page in April by an IP editor who only made one other, equally unhelpful, edit. PamD (talk) 09:38, 28 July 2008 (UTC)

Yes, I'm unduly proud of the bot for finding that missing entry. At first I was sure it had made some mistake—how could The Beatles (disambiguation) not point to The Beatles??? I don't know if I feel comfortable having the bot decide whether an addition is the primary usage—what do you think of this edit? Should Navibot have promoted the new entry to first place?
One of the reasons the bot now provides more informative edit summaries is to make such things easier to spot by human reviewers, but you may be right that a flag of some kind is appropriate for oddball cases like that. I suppose I could just add {{disambig-cleanup}} to the page on some edits. Or I could use my own category; I think some bot owners do something like that. The more I think about it, there are in fact several other cases where the bot could make the edit, but also call for human attention in some way. Good idea! —johndburger 20:17, 28 July 2008 (UTC)
It's going to be unambiguous that the primary usage is the primary usage, surely... it's the one without a disambiguation, but which isn't a dab or a redirect page! It'll only be missing from the dab page through human error or vandalism, but there's plenty of both of those around in WP. PamD (talk) 23:08, 28 July 2008 (UTC)
Aaargh - now looked at your Da Hui example, where it all rather falls apart... problems being that the new item the bot found was already half-listed (presumably before it had its own article) so there's duplication to undo, and the other item is not a straight disambiguation but a partial title match. Of which there will be plenty. So although technically the song is the primary usage, it might well not be appropriate to have a standard "primary usage" line here. It gets worse. I'd already wondered what happens about placenames, where you get disambiguation by comma rather than by bracket: is the bot geared up to recognise that it needs to add New York, North Yorkshire (when it exists) to the New York disambiguation page? Then of course there are surnames.... perhaps don't go there as yet! But the bot already seems pretty amazing, congratulations. PamD (talk) 23:15, 28 July 2008 (UTC)
Your New York, North Yorkshire example is definitely something I want to get to. Right now the bot posits possible DAB entries based only on obvious commonalities like Title (disambiguation) and Title or Title (qualifier). That's probably the most prevalent kind of dab-entry link, but there are many others. City, Province and City, Country are the "next frontier" I'd like to explore. Surnames and given names are definitely more complicated, in part because many of those pages are not quite DABs, but genuine articles. I'm starting slow. :) —johndburger 02:29, 29 July 2008 (UTC)
Just to finish the discussion about The Beatles (disambiguation), someone quickly resolved the issue by promoting Navibot's new entry into the primary position. This is a great example of what I hope will happen: the bot does the grunt work, and people do the smart stuff. —johndburger 03:33, 30 July 2008 (UTC)
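
As a rough sketch of the test suggested in this comment (the page with no disambiguator that is neither a dab nor a redirect), something like the following would do. The `Page` record and its fields are hypothetical stand-ins for whatever the bot's database actually stores, and the check covers only the title test; whether a standard primary-usage line is appropriate may still need a human.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical minimal record for a wiki page."""
    title: str
    is_dab: bool = False
    is_redirect: bool = False


def is_primary_usage(page: Page, dab_base: str) -> bool:
    """True if `page` sits at the undisambiguated title and is a real article."""
    return (
        page.title == dab_base      # no qualifier, no comma suffix
        and not page.is_dab         # not itself a disambiguation page
        and not page.is_redirect    # not a redirect to something else
    )
```

For example, `is_primary_usage(Page("The Beatles"), "The Beatles")` is true, while a redirect at the same title, or a page titled "The Beatles (album)", is not.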
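
To make the title patterns mentioned here concrete, the following is a minimal sketch of matching Title or Title (qualifier) against a DAB page, assuming exact string comparison on titles. The helper names are hypothetical, and comma-style titles such as City, Province are deliberately left unmatched, since the comment describes them as future work.

```python
import re

DAB_SUFFIX = " (disambiguation)"


def dab_base(dab_title: str) -> str:
    """Strip a trailing ' (disambiguation)' from a DAB page title, if present."""
    if dab_title.endswith(DAB_SUFFIX):
        return dab_title[: -len(DAB_SUFFIX)]
    return dab_title


def is_candidate(article_title: str, dab_title: str) -> bool:
    """True if the article title is 'Title' or 'Title (qualifier)' for this DAB."""
    if article_title == dab_title:
        return False  # the DAB page itself is never its own entry
    base = dab_base(dab_title)
    if article_title == base:
        return True   # plain 'Title' for a DAB at 'Title (disambiguation)'
    match = re.fullmatch(re.escape(base) + r" \(([^()]+)\)", article_title)
    return match is not None and match.group(1) != "disambiguation"
```

With this sketch, "The Beatles" and "The Beatles (album)" are both candidates for "The Beatles (disambiguation)", whereas "New York, North Yorkshire" is not yet recognised as a candidate for "New York".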