Wikipedia talk:Version 1.0 Editorial Team/Archive 17

WikiProject Mixed drink assessments

The assessments are not working for some reason; could anyone look into it? Their assessment page is here --Jeremy ( Blah blah...) 00:16, 5 January 2009 (UTC)

The problem appears to be with the {{WPMIX}} banner template, as it is not assigning quality and importance categories. Without any articles in the categories, the bot is not finding anything to place in the statistics. Unfortunately I can't help with the template code, as it looks a little too confusing for me to understand. My recommendation would be to redesign the template so that it makes use of {{WPBannerMeta}}, if possible, as the code would be much simplified. However, I am unsure whether the Meta can support the complicated focus= feature that is on the existing template. Road Wizard (talk) 02:54, 7 January 2009 (UTC)

Wikiproject Spirits

The assessment table for WikiProject Spirits doesn't seem to include anything above low importance (i.e. no mid, high or top) even though such articles do exist. Table is here Wikipedia:Version 1.0 Editorial Team/Spirits articles by quality statistics

Thanks, Cabe6403 (TalkSign!) 01:59, 7 January 2009 (UTC)

The affected category pages were missing from the Category:Foo-importance articles structure that the bot uses as the basis for the statistics; however, another editor fixed the error shortly before you left your comment here.[1][2][3] As the bot last checked your project two days ago, the statistics are out of date, but the problem should be resolved on the next run. Road Wizard (talk) 02:37, 7 January 2009 (UTC)

WP Physics is 100% assessed. (~11,000 articles)

I dunno if this is the place to "report" this, but it feels as good as any. Headbomb {ταλκκοντριβς – WP Physics} 19:44, 16 January 2009 (UTC)

Great work. How long do you think till they all reach featured status? ^_^ Road Wizard (talk) 19:57, 16 January 2009 (UTC)
Wow! Fantastic work. - Dan Dank55 (send/receive) 19:59, 16 January 2009 (UTC)

Quality of article is challenged

Culture has been nominated for a good article reassessment. Articles are typically reviewed for one week. Please leave your comments and help us to return the article to good article quality. If concerns are not addressed during the review period, the good article status will be removed from the article. Reviewers' concerns are here. --AlotToLearn (talk) 08:08, 17 January 2009 (UTC)

Delay in Wikipedia:Version 1.0 Editorial Team/Chicago articles by quality log

The bot seems to still be celebrating the inauguration. Can someone get it to run? --TonyTheTiger (t/c/bio/WP:CHICAGO/WP:LOTM) 08:04, 23 January 2009 (UTC)

You'd be better off posting this to Wikipedia talk:Version 1.0 Editorial Team/Assessment, where you'll get a response from the people more associated with the technical aspects. Happymelon 08:50, 23 January 2009 (UTC)
Actually, this is a bot issue, not an assessment issue, so I believe Wikipedia_talk:Version_1.0_Editorial_Team/Index may be better. However, I think Carl and Oleg both watch this page also, so it may get picked up. Walkerma (talk) 09:14, 23 January 2009 (UTC)

Target date for improving articles

What's the target date for improving articles for v 1.0? --Philcha (talk) 11:00, 23 January 2009 (UTC)

sometime :D Happymelon 12:43, 23 January 2009 (UTC)
The dump for Version 0.7 was done several weeks ago, but there are issues with the publisher that are delaying things. Once those are resolved, we will start work on Version 0.8, and we will set the deadline then. If I get the choice, I think I'd like to say the end of July 2009 for Version 0.8. If that goes well, then I could see Version 1.0 itself coming out next year - but a lot depends on the publisher.
A major headache with 0.7 has been putting together the index - for 30,000 articles it would be difficult to do this by hand! (Version 0.5, 2000 articles, needed 40 hours of manual work). I'm pleased to report that we've tested a script that uses keywords (manually selected) to generate the index, and although it's not perfect, it's looking (IMHO) very promising. That means that a summer deadline for V0.8 is looking more realistic, if the publisher is open to that. Walkerma (talk) 16:41, 23 January 2009 (UTC)

Hi. With regard to the stats for WP:YEARS, could you please expand the matrix to include columns for top, high, mid and low importance as we are now using the importance parameter, having set up an assessment dept within the project? Thanks very much. --Orrelly Man (talk) 08:02, 24 January 2009 (UTC)

  Done Ruslik (talk) 10:16, 24 January 2009 (UTC)
Thank you. --Orrelly Man (talk) 17:31, 24 January 2009 (UTC)

Project page needs updating

It says "The 2008/9 selection is due out June 2008". Maybe something else is 8 or more months out of date too? Robin Patterson (talk) 22:32, 1 February 2009 (UTC)

Thanks! I updated all the things that looked like they needed it. That release came out in the fall; Version 0.7 should come out fairly soon. Walkerma (talk) 03:27, 2 February 2009 (UTC)
Great! :) I have done some heavy work on a few articles that were selected, getting them to GA after the deadline, so there should be some better quality for the next release! BOZ (talk) 18:23, 2 February 2009 (UTC)
I'll add that the D&D Wikiproject has changed vastly from the time of the last selection bot, until its current level of focus. We've added many hundreds of articles to the project in the last few months, mostly due to absorbing smaller abandoned related projects, and picking up creative personalities (such as Gary Gygax!), video games, books, and more articles related to D&D which were in existence but which had not yet been assigned to the wikiproject. In fact, I added many of them personally when I noticed how many of our important subjects were not in the selection list! Additionally, we have been hard at work reassessing our articles, and merging smaller ones into more comprehensive lists and articles, so that old assessment data from last autumn is almost completely out of date. As part of this effort, we have been steadily working on getting our most important articles up to GA and better status. Let us know when you are able to get another run of the selection bot, because that data has proven invaluable to our work on the D&D project. :) BOZ (talk) 21:00, 2 February 2009 (UTC)
I'll look forward to seeing it. Once 0.7 is out I'll request a bot run so projects can get an update. If you have a nice set of articles together, the project may want to consider producing a WikiReader - we plan to start helping with those in 2009. Cheers, Walkerma (talk) 06:23, 3 February 2009 (UTC)

CSD tagged assessment pages

Hi
Are those three still needed:

All were tagged for speedy deletion as "old assessment page for a merged WikiProject".
Amalthea 01:23, 24 February 2009 (UTC)

I think they can be deleted. If they are needed, if the project gets recreated, the bot will automatically re-create them anyway. Thanks, Walkerma (talk) 02:10, 24 February 2009 (UTC)
OK, thanks. --Amalthea 03:16, 24 February 2009 (UTC)

Could you take a look at this?

A discussion on the assessment scale is currently taking place at [4]. The discussion would greatly benefit from someone involved with WP 1.0 acting as a link between here and there. Headbomb {ταλκκοντριβς – WP Physics} 02:44, 2 March 2009 (UTC)

Thanks! My bad, I should have posted a notice here ages ago! Walkerma (talk) 02:45, 2 March 2009 (UTC)

Update on release of Version 0.7

I was in touch with our publisher, and they are ready to publish; their search engine etc. are ready. We expect to release Version 0.7 early in 2009. The list of around 31,000 articles was fixed in late 2008, and a complete list with VersionIDs is here. A few of the VersionIDs may change from this, and small spots of obvious vandalism may be removed from text in others, but by and large this is what the final version will look like. The Zeno formatting for final articles is also ready. The index is almost ready, though some work is still needed. If you'd like to help me read lots of bad language (the output from a script that spots "bad words"), checking for things like what we found in the first line of this article, then please let me know! Walkerma (talk) 14:50, 5 February 2009 (UTC)

God knows I'm more than a bit late in responding, but I'm willing to try anyway. Any other interested parties? John Carter (talk) 23:01, 22 February 2009 (UTC)
Thanks! We're still working on that, there's been some issue with running the script, and so I'd really appreciate that help! Good to hear from you! Walkerma (talk) 05:09, 23 February 2009 (UTC)
I'd like to help read through selections, let me know what I can do.
(also, could an admin take a look at the first 3 redlinks in the #Archives section please? If there was any content in those redlinks, they should be undeleted and moved to a subpage of this.
and this page could do with an archiving :) -- Quiddity (talk) 04:32, 2 March 2009 (UTC)
No content in those 3 red links, each had a redirect. - Dan Dank55 (push to talk) 06:04, 2 March 2009 (UTC)
I got the list of "badwords" from Wizzy today, but I can't open it because it's in gz format. Hopefully he'll send me a version I can open soon. Is it OK if I email you folks the file, which is over 1 MB in gz format? (Let's hope they're not all valid vandalism! I'm predicting about 50-100 problem articles.) If so, what format would you like? This would be REALLY helpful for getting the DVD done, as it is the last major wikitask remaining before release (the rest is at the business end of things). I'll do the archiving later, I need sleep right now. Many thanks, Walkerma (talk) 09:53, 4 March 2009 (UTC)
gz is standard Unix gzip - even winzip will open it. If you cannot open it, I will send it zipped ? If anyone would like a copy, I will email it to them. Wizzy 09:59, 4 March 2009 (UTC)
I tried to unzip gz files a lot last year on my Windows PC, trying several different unzip programs - and as then, the file appears briefly then disappears into oblivion. But I can read zip files easily, if you can manage that. Thanks, Walkerma (talk) 10:07, 4 March 2009 (UTC)
StuffIt Expander should be able to handle these files without any problem. ···日本穣? · Talk to Nihonjoe 06:59, 10 March 2009 (UTC)

Over 9000

Could we add in the phrase "over 9000" to the rescan? See User:CharlotteWebb/dubious statistics (and youtube for the explanation). Ta. -- Quiddity (talk) 22:25, 4 March 2009 (UTC)

I ran a scan for that phrase, and it is short enough to post here. I don't really understand the meme, though. Wizzy 09:47, 5 March 2009 (UTC)

 < 4 manuals and :over 9000: pipes, finance>     St. Stephen's Cathedral, Vienna
 <p" and "MMFCL" :over 9000: times you go t>     Shangri-La
 <ate, there are :over 9000: languages in t>     Bantu languages
 <on with 62% of :over 9000: votes in polls>     Nikki and Paulo
 < time (such as :over 9000: years), the te>     Life imprisonment
 <ith details of :over 9000: lighthouses an>     Lighthouse
 <s, climbing to :over 9000: before descend>     Collatz conjecture
 <ed from 100 to :over 9000:.<sup id="cite_>     Hyperinflation
 <, which raised :over 9000: signatures; an>     "Weird Al" Yankovic

PDF versions

The Wikipedia-to-PDF feature, and the agreement with PediaPress to produce "books on demand", are now apparently here. I'd like to ask the question here - how can we use this? I think that perhaps we should consider setting up a scheme whereby a WikiProject can compile a "book" of some of their best work, then offer it on their portals and WikiProject pages. Would this be an appropriate way to go? Do others have better/other ideas? Cheers, Walkerma (talk) 05:58, 9 March 2009 (UTC)

When we discussed this over at the VG project (here) we basically came to the conclusion that the articles we would want to put in a book (the essentials) don't much overlap with the articles we have that are in a publish-worthy state. So all our books focus on a tiny subset of our articles. Nifboy (talk) 08:05, 9 March 2009 (UTC)
Still, this might serve as a focus for an effort to get one coherent set of nice-quality articles together. Please keep us posted on your work, and we might be able to find a bigger publisher for anything that looks nice. Cheers, Walkerma (talk)

Badwords list

I have put the list of pages, words with a little context, and the articles they are found in on my ftp site, both as a zip file, and a gzip file. I have also put a statistics page up, as a zip and gzip, that lists which regexps from Lupin's badwords list were hit the most frequently. The badwords list looks like this :-

<wo teams began :to cool:; it would be a>       Colorado Avalanche
<llow">Winners, :losers:, undecided in >        Colorado Avalanche
<meiosis</a> and: sexual :reproduction.</>      Diatom
<o the top in a :scum :and can be isol> Diatom
<ection port and: cocking: handle (which >      SA80
<he comma shaped: cocking: handle on A2.<>      SA80

Where colons delimit the regexp found.
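
For anyone curious how lines in that format might be produced, here is a minimal sketch (not Wizzy's actual script; the word list, context width, and output layout are made-up assumptions for illustration):

 import re

 # Stand-ins for a Lupin-style badwords list; the real list is much longer.
 BADWORDS = [r"losers", r"cocking", r"scum"]
 PATTERNS = [re.compile(p, re.IGNORECASE) for p in BADWORDS]

 def scan_article(title, text, context=15):
     """Yield one '<before :match: after>  Title' line per regexp hit."""
     for pat in PATTERNS:
         for m in pat.finditer(text):
             before = text[max(0, m.start() - context):m.start()]
             after = text[m.end():m.end() + context]
             yield "<%s:%s:%s>\t%s" % (before, m.group(0), after, title)

 # e.g. for line in scan_article("SA80", open("SA80.txt").read()): print(line)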

Fixing

Can someone explain how we can 'fix' the static version ? Perhaps we could have a wiki (authenticated ?) where we could edit this list (from above) ? Is the final release created from this list, or vice versa ? What do all the numbers mean on that list ? The list is long enough that we need some crowd sourcing to go through it. If there are too many false positives, perhaps we should fix Lupin's list and re-run the badwords scan. Wizzy 12:09, 4 March 2009 (UTC)

OK, User:Kelson asked us just to list the articles containing a problem, and where the problem is, and send them to him. (You should also fix the problem in the online Wikipedia!). We're expecting to find fewer than 100 real problems - for V0.5, there were 2300 badwords found in 2000 articles, but only 6 were real problems; the others were legitimate uses of such words (quotes from pop stars, articles on biology, etc). But I'll ask Kelson to read this thread - he may prefer you to edit them on a mirror wiki. Thanks a lot for posting this zip version! Walkerma (talk) 01:26, 5 March 2009 (UTC)
I've emailed Kelson. I'd like to mention that we can't choose later versions of articles, except perhaps one or two, because the articles have to be synchronised - otherwise templates and images are out of sync with the article and things get messed up. If you're interested - I'll count Quiddity and JohnCarter as part of the team from their earlier posts (thanks!). Walkerma (talk) 03:58, 5 March 2009 (UTC)
I've looked at the file, and there are around 75,000 entries in the file. That's a lot of reading, about three times longer than I expected, based on pro rata from V0.5. The list is quite fast to work through, though, because the vast majority of the "badwords" are clearly legitimate. But we will have to think how to reduce the workload. Wizzy has already removed eponymous entries (in the V0.5 list the article Chicken contained nearly all of the entries of the word "Chicken" - not surprisingly!). Does anyone here have any ideas on how to trim the list still further? Walkerma (talk) 04:44, 5 March 2009 (UTC)

For those wanting a closer look at the static version, those articles are visible at http://tmp.kiwix.org/wp1en-0.7_2/html/articles/<first letter>/<second letter>/<third letter>/<article name><random junk>.html - for instance http://tmp.kiwix.org/wp1en-0.7_2/html/articles/s/a/8/SA80_6e36.html from above, or http://tmp.kiwix.org/wp1en-0.7_2/html/articles/b/i/l/Bill_Cosby_9aa1.html (for some real vandalism). Just browse to the directory, and then search for the article.

Perhaps people that want to help can just post the line from the scan logs to this page, or a sub-page ? Wizzy 10:00, 5 March 2009 (UTC)

OK, an update. User:Kelson has put the 0.7 collection onto a wiki here, so we can fix things directly (you must create an account first).
Secondly, I've been thinking that 75,000 may be just too much for us to handle. I had expected around 20,000 to be on the list, which I think would've been manageable. One option would be to ask Wizzy to run the script on a recent dump, and then ONLY list the entries on the list that had been deleted (I'm calling this the "diff" approach). This would not be as good as checking every one of the 75000, because any long term vandalism would be missed. The articles in 0.7 are all high traffic articles, where obvious vandalism will be found, but a few unobtrusive things may sneak through. Last time (0.5) I noticed one long-term example here in an FA, though it was reverted by an anon after about 2 months. Wizzy - would such a "diff" approach be possible? Or do people think we should still try to do the full batch?
Another approach would be to look at who added the badwords. I will ask Luca de Alfaro if it is possible for the UCSC to help with this; failing that, we might be able to come up with something cruder ourselves. Then we could look at both the "diff" list and the list of badwords added by dubious editors, and those two lists should catch nearly every case of vandalism, I think.
Thirdly, once we agree on a strategy, we need to sign up and start doing the work. If we opt for the complete list, I propose that we each sign up to do a batch of, say, 10,000, then we report back when we've completed them. If we do a "diff" or similar, perhaps we can reduce the signup amount to 1,000 or 2,000. 2000 translates to (if I recall correctly) perhaps 4-8 hours work.
What do people think we should do? Walkerma (talk) 06:21, 7 March 2009 (UTC)
OK, I've talked with Luca de Alfaro, and he will work with us. However, he cannot (at present) generate a trust rating for a version, nor can he generate a list for a certain word. He is going to run Wikitrust on the Version 0.7 articles and let us know what he finds.
In the meantime, could Wizzy run the "diff" with his script to see if we can reduce the number of entries in the list? Wizzy, do you need another (newer) dump of Wikipedia to use for that? Or is that approach completely impossible? Walkerma (talk) 10:38, 9 March 2009 (UTC)
I do not have a good way of getting a recent dump. I am on ADSL with very restrictive cap. I do have access to a server with more bandwidth, but would need to fetch all the articles from wikipedia, and postprocess them in the same way Kelson does before running the script ? Wizzy 07:21, 10 March 2009 (UTC)
I have requested help on this, so hopefully we can get an equivalent collection of recent versions for you to run through. Walkerma (talk) 06:19, 12 March 2009 (UTC)

Featured lists?

Are featured lists going to be added to this listing?



I don't think there are any under our purview yet, but there are plans... ···日本穣? · Talk to Nihonjoe 06:56, 10 March 2009 (UTC)

As you can perhaps see (clear cache if needed), FL now shows up. The bot only generates a column or row for categories that are populated, so FL would have been omitted if you had zero FLs. Walkerma (talk) 06:30, 12 March 2009 (UTC)

Thinking about 0.8

Version 0.7 is well underway (hopefully out soon) and so it's time to start thinking about 0.8, especially if we want to move to a faster release schedule. I have put down some of my experiences with the 0.7 process at User:SelectionBot/0.7/Notes. Here is an executive summary of some of the things that I think need work for 0.8:

  • I thought that there was good response to the SelectionBot's lists of selected articles. However, the static HTML system left a lot to be desired. It could not be updated very easily, and was slow to create. For the next release, I want to integrate this with the new WP 1.0 bot, which will be database-driven. This will add much more flexibility to the process, remove some of the dependencies on database dumps, and allow the list to be updated essentially in real time.
  • The manual selection process had a strange workflow, and was too separate from the automated selection. It was often the case that the same articles were manually selected and automatically selected, under different names. And it was very difficult to integrate the manual selection list into the final release. To remedy this, I would like to create an interface in the new WP 1.0 bot to allow reviewers to mark articles as "manually selected". Then these would show up in the automatically-generated lists just like automatically-selected articles, and it would be easy to see what is going on project-by-project.
  • The index for the release really needs some work. I helped create a very simplistic index for 0.7, but we could really use some editors to specialize in figuring out the best way to set up the index. This would not require a great deal of coding; I think it is more a matter of planning and setting up templates on the wiki. These can later be filled in by the selection bot. What we really need are editors with some graphical design skill to make the index attractive as well as functional.

I am sure I have other minor things on my list, but those are the big three. — Carl (CBM · talk) 02:34, 18 March 2009 (UTC)

Thanks a lot, Carl, that's a really good summary, and your notes are also very helpful. Personally I'm happy with how 0.7 has gone, except for the slowness, but there is much we can do to improve things. I don't think we'll solve everything for 0.8, but we can do better. There are many ideas here that will provide the basis of a great discussion, once 0.7 has been released. Thanks, Walkerma (talk) 02:46, 18 March 2009 (UTC)

Is this going to be sold?

Is there any ethical dilemma about taking the hard work of many people, done for free, and selling it for money? If the CD/DVD will be distributed for free, I apologize. Yesitsnot (talk) 07:39, 21 March 2009 (UTC)

Not particularly, I think, given that Wikipedia's license terms are chosen specifically to permit such. If the intent was to prevent commercial use of the resulting encyclopedia, we'd be using something like cc-by-nc-sa. Kirill [pf] 23:53, 21 March 2009 (UTC)
Kirill is right of course, but the collection will be available for free download, and it's under an open license so people can burn copies of it as they like. With 0.5, copies were used on computers for African schools, and also distributed free to schools in parts of India. That's the main reason we work on these releases. The publisher has spent two years on developing software for an offline reader, and they hope to make their money - not a lot - by selling flash drives containing the collection in shops. I don't see any problem in them receiving money for delivering WP via a different medium, as long as the collection itself continues to be available free. Walkerma (talk) 06:52, 22 March 2009 (UTC)

Time to archive maybe?

This page is supposed to be around 255kb long right now. Maybe we could archive some of it? John Carter (talk) 17:23, 22 March 2009 (UTC)

I've just archived all of the discussions from 2008!!! The MiszaBot template was in the wrong place, which was what prevented automagic archiving. The current MiszaBot settings are to archive after 28 days of inactivity on a thread. Physchim62 (talk) 13:01, 30 March 2009 (UTC)
Thank you! I was wondering how to get the bot to run! Walkerma (talk) 22:00, 30 March 2009 (UTC)
The bot's working now! For future reference, the template which calls the bot needs to be above the first section header on the talk page. Physchim62 (talk) 08:59, 31 March 2009 (UTC)

Central America List and FL class articles

Does anyone know why Wikipedia:WikiProject Central America/Assessment or most of its subprojects (Costa Rica seems to be an exception) do not pick up List-class or FL-class articles? Does anyone know how that can be fixed? Thanks. Rlendog (talk) 15:08, 16 March 2009 (UTC)

I just checked Category:FL-Class Central America articles and Category:List-Class Central America articles and found that both were missing Category:Wikipedia 1.0 assessments, which the bot needs in order to put them into the 1.0 system. I've added this cat to both, and hopefully the bot will pick it up soon. Cheers, Walkerma (talk) 02:37, 18 March 2009 (UTC)
Thanks. That seemed to work for some of the subprojects, such as Wikipedia:WikiProject Nicaragua, but lists and FLs are still excluded from the parent project Wikipedia:WikiProject Central America/Assessment. Rlendog (talk)
In fact, Category:Wikipedia 1.0 assessments isn't needed on the lower categories, only on Category:Central America articles by quality. This is still a bug. I ran the bot by hand, and it finds the list categories OK, but it doesn't count them in the statistics. Physchim62 (talk) 09:40, 31 March 2009 (UTC)
Or rather, the lists are counted as unassessed! And categorized in Category:Unassessed-Class Central America articles as well as in Category:List-Class Central America articles. The bug is in {{WikiProject Central America}} somewhere… can't see it immediately. Maybe now is the moment to consider moving to a banner based on {{WPBannerMeta}}! Physchim62 (talk) 09:56, 31 March 2009 (UTC)
Is it just a matter of adding FL and List to this section of the Central America template? If so, I can do it if I can access the code (it is currently protected):

 |}<includeonly>
 {{#switch:{{Central America/Class|{{{class}}}}}
 |FA=[[Category:FA-Class Central America articles|{{PAGENAME}}]]
 |A=[[Category:A-Class Central America articles|{{PAGENAME}}]] {{#ifeq:{{{A-Class|}}}|pass||[[Category:Incorrectly tagged WikiProject Central America articles|{{PAGENAME}}]]}}
 |GA=[[Category:GA-Class Central America articles|{{PAGENAME}}]]
 |B|b=[[Category:B-Class Central America articles|{{PAGENAME}}]]
 |C|c=[[Category:C-Class Central America articles|{{PAGENAME}}]]
 |Start=[[Category:Start-Class Central America articles|{{PAGENAME}}]]
 |Stub=[[Category:Stub-Class Central America articles|{{PAGENAME}}]]
 |Category|category=[[Category:Category-Class Central America articles|{{PAGENAME}}]]
 |Cat|cat=[[Category:Category-Class Central America articles|{{PAGENAME}}]]
 |NA={{#switch:{{ARTICLESPACE}}
   |Template=[[Category:Template-Class Central America articles|{{PAGENAME}}]]
   |#default=[[Category:Non-article Central America pages|{{PAGENAME}}]] }}
 |#default=[[Category:Unassessed-Class Central America articles|{{PAGENAME}}]] }}</includeonly><noinclude>
 {{Documentation}}{{pp-template|small=yes}}
 <!-- PLEASE ADD CATEGORIES AND INTERWIKIS TO THE /doc SUBPAGE, THANKS -->
 </noinclude>

Rlendog (talk) 15:54, 3 April 2009 (UTC)
No, the bug is more serious than that. The banner is also adding these articles to Category:Unassessed-Class Central America articles and, all the time it's doing that, they won't be properly counted in the statistics. Physchim62 (talk) 17:40, 3 April 2009 (UTC)

Assessment chart upgrade?

Whoever made and updates charts like the one on the left has done an enormous service to projects like WikiProject Oregon. I know they were originally made for the Version 1.0 project, but they are incredibly useful for anyone trying to improve a certain area of the encyclopedia collaboratively.

There's one thing we've always wished we could do, which is link to the cross-section of the categories. Frank Schulenburg recently showed me a tool that makes this possible, and I made the chart on the right so that now we may, for instance, link to all top-importance, B-class articles as part of a collaboration of the week drive.

Obviously, it would be ideal if the two charts were integrated. But the one on the left gets updated so frequently, and I'm not sure of the best way to incorporate the structure I made into it. Any suggestions? Anyone want to work on combining them, maybe on a meta-level that benefits all WikiProjects? -Pete (talk) 18:13, 5 April 2009 (UTC)

See User:WP 1.0 bot/Second generation. It's coming, eventually. Happymelon 18:23, 5 April 2009 (UTC)

Version 0.7 release update

I have spoken with the publisher, and it looks as if we can have a "beta" version of 0.7 ready for the end of March. The main remaining issue is the #Badwords_list, but we are working on getting that down to a manageable size. The publisher, Linterweb, has said that they intend to publish the collection on a USB flash drive rather than a DVD. This has a couple of advantages:

  • It does not need to be fabricated by an outside company - Linterweb can dump the content onto the flash drives themselves.
  • It should be able to be sold for the same cost as an empty flash drive, and in the same shops as an empty flash drive; this would be used as an additional incentive for the buyer to purchase that particular brand of flash drive. It would also be sold at Amazon etc.

The pages will be in Zeno format (see here), unlike Version 0.5 but like the German releases. Walkerma (talk) 06:28, 12 March 2009 (UTC)

unarchive and datebump -- Quiddity (talk) 19:15, 17 April 2009 (UTC)
What's the status of the 0.7 release? Also, approximately what size (in megabytes) do you expect 0.7 to be? Excuse me for being a newbie, but what are "badwords"? Axl ¤ [Talk] 14:55, 12 May 2009 (UTC)

diff approach

Not great news, I am afraid. I ran the badwords scan on the current dump (unfortunately in wiki markup, as opposed to the HTML of the 'reference' dump). I stripped the wiki markup from the current dump to approximate the html version, sorted the results of both, ran a comparison, and then removed the entries that are still the same. I got the list down to 63,700 - not a great improvement. I am not certain that the wikidump I got matches the article list in our reference list very well, so I am going to try directly fetching the list that Kelson used.

Just an update. Wizzy 17:03, 14 March 2009 (UTC)

OK - finished the badwords scan on the directly-fetched list. I sorted the matches within the article, to allow for text re-arrangement within the article, and the list size drops to 22229 - about 1/3 of the original. I have uploaded a gzipped version, and a zipped version. Wizzy 07:51, 17 March 2009 (UTC)
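
To make the "diff" idea concrete, here is a rough sketch of one way it could be done (the tab-separated scan-file layout and the file names are assumptions for illustration, not Wizzy's actual code):

 from collections import defaultdict

 def load_hits(path):
     """Map article title -> sorted list of context snippets from a scan file."""
     hits = defaultdict(list)
     with open(path, encoding="utf-8") as f:
         for line in f:
             snippet, _, title = line.rstrip("\n").rpartition("\t")
             hits[title.strip()].append(snippet)
     return {t: sorted(s) for t, s in hits.items()}

 old = load_hits("badwords_reference_scan.txt")  # scan of the 0.7 reference dump
 new = load_hits("badwords_current_scan.txt")    # scan of the current articles

 # Keep only hits that have since vanished from the live text; those are the
 # ones most likely to have been vandalism that was later reverted.
 for title, snippets in old.items():
     survivors = set(new.get(title, []))
     for s in snippets:
         if s not in survivors:
             print("%s\t%s" % (s, title))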

THANKS!!!! That's great! 22229 is still a lot, but it's much more manageable. Now we need to get some folks to help out going through these. Any volunteers? Walkerma (talk) 02:31, 18 March 2009 (UTC)
I'm still happy to help. Assign me a batch of 4-8hrs work (can work on today and tomorrow), and point me to the directions (where do we fix problems, where do we note unsureness, etc). -- Quiddity (talk) 17:44, 18 March 2009 (UTC)
Thanks Quiddity! I've set up a page at Wikipedia:Version_1.0_Editorial_Team/Badwords_cleanup for this work, and this includes a quick guide to how the work is done (essentially the same as I did for Version 0.5). Please sign up for a section, and work through it as you find time. Please note - I leave home for a 3000+ mile trip to a weeklong conference on Saturday, so if my contribution is small over the next few days, please understand. I hope to be able to work on this while I'm away, but if I can't please don't just think I'm a lazy $%#*. Cheers, Walkerma (talk) 03:19, 19 March 2009 (UTC)

Badwords cleanup questions

I've finished checking the Bs (took about 4 hours), but have a few I wasn't sure about:

  1. <l the cults and: sexes: are devotees o> Bahawalpur
    <ranted and its :erection: started in 837> Bahawalpur - (whole article has been vastly updated since the revision that included these 2)
  2. > Birefringence - (many entries in badwords list, but empty of actual badwords??)
  3. <gists where you: asses: your own perso> Big Five personality traits - (spelling error, should be assess)
  4. <about you that :pisses: Me off!"</li>> Book of Job - (vast changes since this was a part of it - suggest piano's diff instead - see diff)
  5. <ined that it "[:stank:] to high heave> Bride of Frankenstein - (vast changes since this revision, improved to FA quality, suggest using FA promoted version instead [Can we update like that?])

and there were a few that I couldn't find the badwords listed (I checked numerous diffs between mid-Dec and mid-Jan for each (but probably just missed the exact problem diff in each case)). Leaving list here for later rechecking against the kiwix mirror...:

  • <a> He is :cheating on his: wife with Kati> Booker Huffman - (can't find) 6th Dec, added to listing
  • <lass="metadata :dummy:" colspan="2"><> Brandy Norwood - (can't find) (Can't find even on Kiwix)
  • < about, I'd be :stupid :to say no."<sup> Brenda Song - (can't find) (Can't find either, but looks fine in Kiwix - Walkerma)
  • <<p>:itsanhh@aol.com: <sup id="cite_> Brenda Song - (can't find) (Can't find, but REMOVED personal info from Kiwix and added to listing - Walkerma)
  • <dely know that :your mum: goes to this p> Brothel - (can't find) 6th Dec.
  • <ogether to have: sex :with eac other.> Bus - (can't find) 4th Dec.

No more time today. Back tomorrow. -- Quiddity (talk) 23:02, 19 March 2009 (UTC)

Brilliant work! Thanks a lot. I added VersionIDs above when I found them. Here are some quick comments & answers:
  • I spent more time checking exactly when the dump occurred, and it turns out (see above) it was around 4th-6th December. (I had thought it was the end of December, but I was wrong - sorry!). I think it may be easier just to look for the vandalism in the kiwix page rather than on the live Wikipedia, as long as it's not too slow, because these vandalisms only seem to last a very short time.
  • However, note that we do not use versions contributed by anon IPs in 0.7, so the vandalism is usually redlinked account holders, and these often stand out in the History.
  • Unfortunately we can't easily switch to a different version, unless it's only a day or two different - because there is a constant change of templates, photographs, etc. The release includes a complete list of all contributors, too. If you chose the FA you mention, some of the content might be broken. That's why we are working with a 3 month old version of the article.
Must get on, I'm preparing a two hour presentation for tomorrow morning! Thanks again, Walkerma (talk) 03:30, 20 March 2009 (UTC)
unarchive and datebump -- Quiddity (talk) 19:15, 17 April 2009 (UTC)

Proposed addition to assessment table

I have proposed it here. Crash Underride 16:22, 13 May 2009 (UTC)

GA Reassessment of George Washington

I have done a GA Reassessment of the George Washington article as part of the GA Sweeps project. I have found the article to need work on referencing. My review is here. I am notifying all the interested projects that this article is on hold for a week pending work that needs to be done. I don't think it will require too much to satisfy the GA Criteria and I sincerely hope that someone will step forward and take this project on. It would be a shame to delist what is in all senses but one, a good article. If you have any questions please contact me on my talk page. H1nkles (talk) 21:14, 5 June 2009 (UTC)

Question

This may not be the proper place to ask, and if that is true, could you direct me to where I could ask? Where would I go to find the number of articles that were under the purview of one WikiProject at this time last year, so one could see how many new articles have been added to the project in the meantime? Thanks. Wildhartlivie (talk) 20:51, 10 June 2009 (UTC)

Possible issue

The most recent edition of the logs of changes in the philosophy wikiproject have been listing every article, rather than just changes. See June 15:

Wikipedia:Version_1.0_Editorial_Team/Metaphysics_articles_by_quality_log
Wikipedia:Version_1.0_Editorial_Team/Medieval_philosophy_articles_by_quality_log

Pontiff Greg Bard (talk) 19:13, 15 June 2009 (UTC)

Thanks! London Transport had the same problem, I think. Looks like the bot developed a bug. It happened once before, a year or two back. Cheers, Walkerma (talk) 19:33, 15 June 2009 (UTC)
It was apparently caused by some kind of reconfiguration of the servers; CBM is looking into the problem, which apparently messed up at least one other bot also. Walkerma (talk) 05:35, 16 June 2009 (UTC)

Offline reader for mobile devices

I browsed through the archives but could not find an answer. Is there any ongoing effort to create an open source mobile version of Wikipedia 0.x for mobile devices such as Android and iPhone? Such devices are often taken into the subway, where no Internet connection is available, and not everybody can afford to pay for a mobile Internet connection. So an offline copy of Wikipedia would be great, and Version 0.x is much better adapted than the whole Wikipedia for such devices with rather limited storage. A very portable way to produce this would be just simple HTML files and a static thematic index: using the device's HTML browser means there is no software to develop. A more evolved version could include searching and compression, but that would require device-specific development. I am sure this is not a new idea, could anyone point me to a place where people are building this (open source)? Cheers Nicolas1981 (talk) 05:27, 20 June 2009 (UTC)
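
As a sketch of how little software such a static approach would need, assuming the articles were already exported as one plain .html file per title in a single directory (an assumption, not an existing dump format), an alphabetical index page could be generated like this:

 import html
 import os

 def build_index(article_dir, out_path="index.html"):
     """Write a static alphabetical index linking to every article page."""
     names = sorted(f for f in os.listdir(article_dir) if f.endswith(".html"))
     with open(out_path, "w", encoding="utf-8") as out:
         out.write("<html><body><h1>Index</h1><ul>\n")
         for name in names:
             title = name[:-len(".html")].replace("_", " ")
             out.write('<li><a href="%s/%s">%s</a></li>\n'
                       % (article_dir, name, html.escape(title)))
         out.write("</ul></body></html>\n")

 # build_index("articles")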

Actually, I don't know of anything. We're even struggling with the standard offline release right now, because our publisher is no longer offering the reader software as open source. I think it would be a fantastic project, but it would need someone to coordinate it on WP (could you do it?). I will be going to Wikimania this year, I'll try and fish the waters to find a software company that could & would do this. I'll also talk to User:Kelson on the French Wikipedia, since we work a lot together. Walkerma (talk) 06:32, 20 June 2009 (UTC)

Major statistics problem

What the hell has happened in the July 11 stats for the WikiProject Ireland that the total assessed articles has fallen from 23,675 to 10,087? This seems to have been caused by all the stub-class stats being omitted. Can someone please fix it? ww2censor (talk) 22:55, 11 July 2009 (UTC)

OK, I think I found the problem. Someone had vandalised the stub-class category page. ww2censor (talk) 03:55, 12 July 2009 (UTC)
Thanks! I was worried that we had another bot problem, like we had this spring. Walkerma (talk) 14:17, 13 July 2009 (UTC)

WORSHIP OF A DICTATOR ON WIKIPEDIA

Dear Wikipedia:Version 1.0 Editorial Team. Could the editors of wikipedia please do something about that embarrassing feel-good article about the Eastern European Dictator (Joseph Broz - the former Yugoslavia). He is portrayed as some sort of pop star. This is embarrassing considering he was responsible for war crimes, mass massacres, torture & mass imprisonment. One to mention is the Foibe Massacres (there are BBC documentaries). Wikipedia has an article on this so it’s just contradicting itself. You have one feel-good article about a Dictator then you have an article about the Massacres he approved and organized with the Yugoslav Partisan Army. Then there were Death squads in Southern Dalmatia (the Croatians are putting up monuments for the poor victims & their families now). Also it’s important to mention that the Croatian Government is paying compensation to his former victims. Surely a more critical historical article should be written or this present article should be removed altogether. What is next? A Stalin feel-good article? What about the respect towards the poor victims who suffered those awful events? Can the editors please look into this? Sir Floyd (talk) 01:54, 4 August 2009 (UTC)

Please note guys, this is a sock trying to stir up trouble. He's probably User:Luigi 28, but could be User:Brunodam as well. This message was copy-pasted just about everywhere in an effort to create a fake conflict about a non-existing issue. Considering the barely disguised POV-pushing, there's really little doubt it's just another in a long, looong line of socks belonging to one of the so-called "irredentist users". A group of a dozen or so (mostly Italian) POV-pushers that got banned from enWiki for edit-warring, sockpuppeteering, block-evasion, harassment, etc. They surface now and again to make spiteful edits and insult people. --DIREKTOR (TALK) 23:30, 6 August 2009 (UTC)

Category intersections for Wikiproject assessment tables

Hi all, I can't find the page that discussed this being in the works, I recall it being part of the .7 pages, but I can't find it now. Is that work still moving forward? As an alternative, I was considering filing a bot request to use the toolserver tool Cat Scan and then have a bot process the wikitext output into something useful, then make each of the numbers in the Wikiproject tables linkable. For example, now when I want to find which articles are top priority start class articles, I need to do it manually, but if that number in the table linked to a list of the articles instead of just being a number, it would be much easier for projects to prioritize their work. - Taxman Talk 00:27, 28 August 2009 (UTC)
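
In the meantime, the intersection itself is simple to compute offline if you already have one article list per category (for example, saved from CatScan output); a minimal sketch, with made-up file names:

 def read_titles(path):
     with open(path, encoding="utf-8") as f:
         return {line.strip() for line in f if line.strip()}

 # e.g. the two lists a linked table cell would intersect:
 top_importance = read_titles("Top-importance_articles.txt")
 start_class = read_titles("Start-Class_articles.txt")

 # Articles that are both top-importance and Start-Class:
 for title in sorted(top_importance & start_class):
     print(title)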

Martin mentioned at Wikimania that this is still planned in the next version of the bot, but I don't recall a specific date for when that version is expected to be released. Kirill [talk] [pf] 02:24, 28 August 2009 (UTC)
That would be great if they would do that. Rlendog (talk) 02:53, 28 August 2009 (UTC)

OpenSound Control moved to Open Sound Control

OpenSound Control has been moved to Open Sound Control. The old name appears on your page Wikipedia:Version_1.0_Editorial_Team/Computing_articles_by_quality/54. I didn't edit your page because this would have disturbed the alphabetic sorting on it, and I wasn't sure if that would mess things up. HairyWombat (talk) 15:27, 18 August 2009 (UTC)

It's OK now. It often takes a few days for the bot to update things like this. Cheers, Walkerma (talk) 06:25, 4 September 2009 (UTC)

Criteria in deciding Importance levels

When setting importance assessment criteria, a project shouldn't worry about interwikis, in-links, or other such criteria, and should leave those to the selection bot, correct? Also, in terms of the size and scope of the various levels, is there a general rule on how many articles each level should contain, or how large the articles would be (assuming all were comprehensive) if they were bundled together? We're updating the importance scheme at Wikipedia talk:WikiProject California, and are trying to flesh out the balance and size of the top level at the moment. -Optigan13 (talk) 04:49, 28 August 2009 (UTC)

Optigan13 brings up a good point about leaving the objective criteria (e.g., in-links, interwikis, hits, etc.) to be determined by the bot (it is anyway). I would like to see the projects come to terms with the intrinsic value the subject of the article has relative to the mission of the projects. Does the article's subject significantly represent the articles of interest to the project? If yes, it is Top Importance to the project and if no, it would be low or not at all. The value of the Importance criteria should be a message to the readers of the articles and not be limited to a message meant only for the Version 1.0 Editorial Team. Pknkly (talk) 03:57, 29 August 2009 (UTC)
Back when I was rating the WP:VG articles my usual criteria for Top-level was a gut-level "Well DUH". Anything moderately below that was High-level, and anything that stood out amongst the rest was Mid-level. Nifboy (talk) 05:50, 29 August 2009 (UTC)
Thanks, just wanted to confirm that. The duh standard I'm using is did we have to learn about that by the time we finished high school in California. I'm also wanting the importance categories to have some logical flow down the structure and a decent description of what that level is so that criteria for the various levels will be clear and equitable. -Optigan13 (talk) 00:24, 30 August 2009 (UTC)
Yep, I think you have it about right. The objective criteria are easy for a computer to read; what is harder is that human element that says, "This may score low by some measures, but it's really important for this project." An example would be Shakespeare's plays, which has zero interwikis (other languages just have the plays listed under Shakespeare), but it's listed as Top-Class for the Shakespeare project (duh!). As a rough guide, I think a typical sized project (say, 1000 articles) should have roughly 1% of those as Top-Class; many projects seem only to put one or two articles in Top, but that's a bit tighter than we intended. BTW, I loved the comment about "I would like to see the projects come to terms with the intrinsic value.." and "The value of the Importance criteria should be a message to the readers of the articles". The system is supposed to serve the projects and users; I see the 1.0 project as a small beneficiary of the classification work that projects probably need to be doing anyway. Cheers, Walkerma (talk) 02:08, 4 September 2009 (UTC)

Wikimania update

I gave a talk on the 1.0 project at Wikimania 2009, there is a video of my talk available (linked from that page). I'll try to make the slides available soon. I also gave a separate presentation on the assessment & selection scheme - again, there is a video available linked from the abstract.

We had a short, informal "offline releases" meeting - you can see a brief summary of the discussion at m:Offline readers. Walkerma (talk) 02:42, 10 September 2009 (UTC) The main things for WP1.0 that seem new or important to me are:

  • The Foundation seems to be backing the OpenZIM format for offline releases.
  • They like WikiPock as an offline release format. I think they will also support Kiwix as an offline reader once it is available for Windows & Mac (currently only Linux, but Windows version should be available very soon).
  • Linterweb is looking less attractive as a publication partner since they are no longer using an open source reader.
Who gave you this information, Martin? Pmartin (talk) 15:04, 8 October 2009 (UTC)
  • The Foundation will aim to produce a new HTML dump of the English Wikipedia soon, and may even produce an openZIM version.

I also had lots of time to talk with Luca de Alfaro, who is behind the WikiTrust initiative. We are planning to collaborate on getting a WikiTrust-based version selection tool ready to use for the next release - this would reduce our time to publication by about 6 months!

Feel free to ask questions. Walkerma (talk) 02:56, 10 September 2009 (UTC)

Apparently I was misinformed on the Linterweb reader software (see above) - it IS open source. Walkerma (talk) 04:34, 9 October 2009 (UTC)

Wikimedia strategy planning and task forces

As some of you may know, the Wikimedia Foundation is conducting a major initiative to develop a strategy for the next five years. Many people here will be interested in some of the proposed task forces - offline releases (part of ESP 2) and quality (ESP 3). Please take a look at the list of activities, and sign up here if you think you can help. Cheers, Walkerma (talk) 17:44, 29 September 2009 (UTC)

Category:Pages where template include size is exceeded

As you may know Category:Pages where template include size is exceeded has many of Wikipedia:Version 1.0 Editorial Team's assessment pages in it. For this reason I propose simplifying them so that Wikipedia:Version 1.0 Editorial Team/African diaspora articles by quality/2 (scroll to the bottom to see the effect on the page) becomes like User:Rich_Farmbrough/temp38. What do you think? I could change the existing members of Category:Pages where template include size is exceeded, if you are happy for me to do that. Rich Farmbrough, 13:17, 24 September 2009 (UTC).

How much of a problem is this (a) for users and (b) server usage? Is there anything else? I personally don't mind too much if we lose the templates, but we need to know that this is a real problem. Also, we should get the opinions of several people before we make such a change (because hundreds of people are involved in assessment). BTW, even if you were to change the pages, remember the bot updates these pages every 4-5 days or so, so it involves CBM or Oleg re-coding the bot.
One other solution would be to create a plain page like you showed, and add the color directly into the table using HTML coding rather than template transclusion. If that is possible (we need CBM or Oleg to comment here), then I think we could have the best of both worlds - the tables would look the same, but the server load would be much less. Walkerma (talk) 03:15, 25 September 2009 (UTC)
  1. The pages are broken.
  2. Actually while many people may be involved in the assessment, some of these pages are scarcely used. [5] shows that Wikipedia:Version 1.0 Editorial Team/Canadian communities articles by quality/13 got 4 hits in the whole of August. Which is not surprising since we have Category:WikiProject Canadian communities articles, of which High importance got a massive 13 hits in August.
  3. Yes, coding the colour in is fairly trivial. The reason I didn't suggest it is that I don't want to replace the transcluded template with a subst:ed template, and still run into the size problems that mean the page does not render below a certain point.
  4. It is only indirectly about server load. It is fine to have a few hundred or a few thousand rarely viewed pages that are heavy on the server, but the protection is generic and stops the page rendering templates beyond a certain point.
Rich Farmbrough, 18:02, 26 September 2009 (UTC).
Thanks for clarifying, that's very helpful (I'm not very strong on the technical side - sorry!). I'd seen the error, but hadn't realised this problem was the cause. What I meant by HTML coding was that instead of the Bot having a command to say "Paste this Start-Class template into the next cell" we would code the bot to say "Paste the word Start into the next cell, and make the background orange-red", similar to what is done here. We could also do the same with importance assessment templates. That would mean the page would contain ABSOLUTELY NO TEMPLATES at all. Do you think this is feasible? Walkerma (talk) 20:00, 26 September 2009 (UTC)
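
A rough sketch of what that kind of bot output could look like, with placeholder colours (the exact hex values used by the real assessment templates may differ):

 # Placeholder colours for illustration only; the real assessment templates
 # may use different values.
 CLASS_COLOURS = {
     "FA": "#6699ff",
     "GA": "#66ff66",
     "B": "#b2ff66",
     "Start": "#ffaa66",
     "Stub": "#ff6666",
 }

 def class_cell(quality):
     """Return wikitext for one table cell, colour-coded with an inline HTML
     style instead of a transcluded assessment template."""
     colour = CLASS_COLOURS.get(quality, "#ffffff")
     return '| style="background:%s;" | %s' % (colour, quality)

 # class_cell("Start") -> '| style="background:#ffaa66;" | Start'
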
CBM is busy/away, but says he will address this issue when he returns. Thanks again for flagging the problem. Walkerma (talk) 15:24, 28 September 2009 (UTC)

All I need to do is to lower the maximum number of articles that can get listed on a single page. I have lowered the threshold some, and I will see after the next full run if I need to lower it more. If anyone were to edit the pages by hand, their changes would get overwritten the next time the bot updates the pages. But the change in format might also confuse the bot, because it does look at the previous content to help detect articles that were removed from a project category. So any changes in format need to be accomplished within the bot code, not outside of it. — Carl (CBM · talk) 04:20, 30 September 2009 (UTC)

One problem with lowering the max number is that then we end up with even more pages for the "X articles by quality" listing - for bigger projects, this large number is already a problem, and I think it would be better if we could make each page smaller by making it more efficient. Can that be done? Walkerma (talk) 08:32, 30 September 2009 (UTC)
I find it hard to believe the larger projects would browse through a list of thousands of articles. The maximum number of articles per page was 450, and I decreased it to 400; either way, a list that long is not really readable by a human, and a list with 1000 articles would only be worse.
The longterm fix for all this is to switch over to the new WP 1.0 bot, which will make the lists dynamically, so people can search for what they are interested in instead of having to use these lists. That system keeps being delayed, for various reasons, but I do plan to move forward with it after 0.7 is done. — Carl (CBM · talk) 11:04, 30 September 2009 (UTC)

OK, I'm going to do the temporary fix to the problem pages. It will fix the pages and clear them out of the fairly important category, even if it needs to be re-applied in a few days. The first page I hit had all Low-Class Stub-Class articles anyway. Rich Farmbrough, 15:38, 30 September 2009 (UTC).

Please don't do anything by hand! The bot is working on an update, and I need to see which pages are still broken after that. — Carl (CBM · talk) 11:38, 1 October 2009 (UTC)

After the full bot run, I tweaked things and ran a few more by hand. At the moment no WP 1.0 bot tables are in the category. I will check again after the next full run. — Carl (CBM · talk) 11:58, 9 October 2009 (UTC)

Download formats

I would like access to all the articles in the dump as a file structure with html (named by article title) + graphics files.

I specifically would like to be able to put a /wiki directory at the top of an apache root directory and serve everything to a standard browser in a school environment.

The school environments I service have no internet access, and wikipedia == internet for these schools.

I understand that search is a problem here - a statically-generated index may be the best we can do.

I realise that the custom reader + zeno file is okawix's answer, but extra software installation is a problem for me (diverse clients, and me not being present except at install time).

I want /n/ne/Nelson_Mandela.html - not /0/09/7766.html.

Search must be static, or server-side.

I am not denigrating all the work put in so far, nor okawix wishing for a $$ return - I would just like an alternate download format, as above. Wizzy 10:42, 18 October 2009 (UTC)
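
A minimal sketch of the title-to-path mapping described above (illustration only, not part of any existing tool):

 def article_path(title):
     """Map an article title to a path like /n/ne/Nelson_Mandela.html."""
     name = title.replace(" ", "_")
     return "/%s/%s/%s.html" % (name[0].lower(), name[:2].lower(), name)

 # article_path("Nelson Mandela") -> "/n/ne/Nelson_Mandela.html"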

You can use ZimReader to serve a ZIM file with an HTTP server. I have set up one with our file. Please contact me directly by email (or, even better, the official ZIM mailing list, dev-l ((at)) openzim.org) for any additional questions. I would be really pleased if people discovering issues with this dump reported them to http://bugs.kiwix.org. Kelson (talk) 09:12, 9 November 2009 (UTC)

Comments sub pages

I have a vague memory that "Comments" subpages were started by this WikiProject. If anyone can confirm or deny that, commenting at this discussion might be a good idea. I know that some of the worklists set up by this project for the various WikiProject assessment workflows transclude the comments subpages (unless I'm mixing up my assessment WikiProjects). I'll have a look in the talk page archives here. 18:45, 20 October 2009 (UTC) UPDATE: I found the old discussions here: 1, 2, 3 (May to July 2006). I'm going to take this back to the Village Pump discussion. If anyone else knows of any old discussions, please list them here or there. Carcharoth (talk) 18:51, 20 October 2009 (UTC)

ArbCom election reminder: voting closes 14 December

Dear colleagues

This is a reminder that voting is open until 23:59 UTC next Monday 14 December to elect new members of the Arbitration Committee. It is an opportunity for all editors with at least 150 mainspace edits on or before 1 November 2009 to shape the composition of the peak judicial body on the English Wikipedia.

On behalf of the election coordinators. Tony (talk) 09:37, 8 December 2009 (UTC)

Beta test of new WP 1.0 bot

The new version of the WP 1.0 bot is ready for some initial beta testing. More information is at User_talk:WP_1.0_bot/Second_generation#Beta_testing_2009-12-16, where any comments and suggestions will be deeply appreciated. — Carl (CBM · talk) 01:51, 17 December 2009 (UTC)

Excellent - thanks Carl! I know a lot of people have been waiting for this! Walkerma (talk) 02:26, 17 December 2009 (UTC)

Global warming: proposal for discretionary sanctions

At Wikipedia:Administrators' noticeboard/Climate Change there is an ongoing discussion of a proposed measure to encourage administrators to enforce policy more strictly on articles related to climate change. I'm placing this notification here because global warming is a member of this WikiProject. It doesn't belong on the main WikiProject page because it's a user conduct matter and isn't really on topic there. --TS 13:31, 1 January 2010 (UTC)

About the index

Martin Walker asked me to comment here about the index. From the start, I viewed the 0.7 index as an experiment that we would learn from for the future releases. It's a fun project to work on: finding ways to leverage all of the semantic metadata that we already have in order to automatically index articles. Unfortunately, I don't have enough time to work on the index and the WP 1.0 bot, so I had to leave the index for someone else to continue.

The code that I used for the geographic and topic indexes is here in a subversion repository. The code does require a lot of hand-holding to work, partially because I did not have a lot of time to polish it and partially because we were inventing the requirements for the index at the same time that the code was written, so there were no specifications to check the code against. I would view it as a just starting point for the 0.8 index.

Although some of the code looks a little embarrassing from my current perspective, I think that overall the 0.7 index was a successful experiment. I would be glad to answer questions and be in touch about the code for whoever takes up the indexing side of the release versions. — Carl (CBM · talk) 20:48, 28 January 2010 (UTC)

Thanks CBM! This index has its faults, but it is definitely a big step forward compared to what we had. For the 2000 articles in Version 0.5, I did the indexes by hand (scroll to the bottom for the full list). I actually knew almost every article in the collection - and it took me about 16 hours. Doing an index by hand for 31,000 articles seemed a bit daunting! I note that other offline releases typically have not had any index, only an alphabetical listing and possibly a search box, so it's clearly not as simple as it may seem. It will take a lot of manual refinement, but once we have this it will be very valuable, and usable for all future releases (with updating). Walkerma (talk) 23:12, 28 January 2010 (UTC)

Recommendations to the Wikimedia Foundation & the community for offline releases

We have pretty rough drafts for four recommendations now up at strategy:Task_force/Recommendations/Offline. The four recommendations are pretty basic:

  • Make re-use of Wikimedia content easier (e.g. improve the dumps).
  • Use cellphones as a delivery mechanism for Wikimedia content, since these are much more popular in developing countries than the Internet.
  • Use schools as distribution points for Wikimedia content.
  • Produce a variety of selections of content, using automation and human organization to choose suitable articles and article versions.

It would be very helpful if you could review these, and leave comments here. The recommendations have to be finalized by January 12th, so there isn't a lot of time! If you prefer just to make a quick comment here, I'll be sure to take comments here into account also. Thanks, Walkerma (talk) 08:47, 11 January 2010 (UTC)

Vital articles

I'd like to propose that release versions use articles listed at Wikipedia:Vital articles (1,000 articles), if not those at Wikipedia:Vital articles/Expanded (now at about 3,250 articles). Maurreen (talk) 18:15, 10 February 2010 (UTC)

I basically agree. I suspect that nearly all of these are included, but it would be nice to compare. You can see the "worst" of the main 1000 listed here. Even there, nearly all are included in the automatic selection; the handful that aren't (score below 1250) are often quite flawed, or they are redirects. For a few, they are the "wrong article" - i.e., they represent an important idea, but the main article covering that idea is something else. In other words, the few that "fail" usually need something fixing, and this can be used to keep the Vital list "trim". Unfortunately, the expanded list isn't tagged for the bot, so we'd have to use AWB to compare those with the automated lists. Once we have a list of any missing articles (which we can do once the 0.8 selection is done) we can manually review the handful that got overlooked. Cheers, Walkerma (talk) 04:11, 18 February 2010 (UTC)
Thanks. We're reviewing the Wikipedia:Vital articles and in the process of swapping some out. Cheers. :) Maurreen (talk) 04:26, 18 February 2010 (UTC)

Tagging 0.7 articles

We need to do some sort of talk page tagging for the WP 0.7 articles. This would be a good time to discuss the {{WP1.0}} template, and whether we should extend that to handle multiple releases, or use a separate template for each release.

Personally, I would favor reusing the same template, for simplicity. We could simply add a parameter for 0.7 that categorizes the talk pages appropriately.

Also, I am not sure about the quality/importance ratings on the WP10 template. Is the idea that these should simply take the highest value that any wikiproject assigns to the article? — Carl (CBM · talk) 19:56, 13 February 2010 (UTC)

I agree that we should reuse the WP1.0 template - it was always intended that we would use it for ALL releases. Titoxd coded much of the template, so he can advise on changes & updates. As I understand it, the template just has a simple "v0.7=pass" to indicate that the article is included in 0.7. However, at present, only those articles that were manually reviewed have the template; the bot-selected articles don't have any 1.0 template at present on their talk pages.
The importance rating was included in case we needed it, but it is almost never used, and I would recommend removing it altogether. It is far too subjective now that we have specialist project assessments and the "external interest points". The quality rating dates from the time when most articles didn't have a WikiProject assessment, so often the 1.0 template was the only assessment template available. I would say that this quality assessment is now unnecessary, so it could be removed. In the same update, we could add parameters to allow us to start on Version 0.8, and we need to make sure that the main WP1.0 bot can read the version number for 0.7 and 0.8 into the tables (at present only 0.5 shows up).
I think once these issues have been ironed out, we should consider bot tagging all the 0.7 articles with the template. I'd recommend doing this over a week or two, so that people's watchlists don't fill up with these updates. An alternative might be to wait until we do the 0.8 selection (Spring 2010??), and tag for 0.7 and 0.8 simultaneously (where articles are in both, which will be true for much of the 0.7 selection). We'd need to see if such work can handle nesting/collapsing of templates as seen in this example and this example, respectively. Thanks, Walkerma (talk) 04:50, 14 February 2010 (UTC)
I'd like to go ahead and work on the 0.7 tagging so I can get the web system to start listing those articles. That should be useful when we're working on 0.8. I agree with removing the quality and importance parameters, although we'll have to think about what to do in the unusual event that a selected article is not rated by any wikiprojects. I think that there are so many projects that we should be able to add a rating template for some project if this ever comes up.
One thing that we need to talk about before updating the workflow is the "pass"/"held"/"nom" values of the parameters. I think that it will not be efficient to have people nominate articles by just editing the talk page tags. I'll start a section on that below. So I think that we should remove the "nom" parameter from the template. — Carl (CBM · talk) 13:21, 14 February 2010 (UTC)

Manual selection workflow

One thing that I noticed during the selection phase of 0.7 is that the workflow for manually-selected articles was difficult to integrate into the automatic selection. The previous workflow is documented at Wikipedia:Release_Version_Nominations. It works OK except for a few issues:

  1. The list of included articles at Wikipedia:Release Version is not easy to read with a bot. In particular, it's hard to make a dynamic list of all selected articles, both manually-selected and automatically-selected
  2. There is a list and also a talk page tag for selected articles; these can get out of sync. Some people will add just the talk page tag, instead of actually nominating the article.
  3. There was no way to mark articles that should not be included; several wikiprojects wanted to exclude articles because they thought the automatic selection chose poorly.
  4. Over time, the manual selection did not keep up with page moves. I noticed quite a few redirects in the "manual selection".

I have added features to the web tool that should help with this. The tool will let us maintain a "master list" of articles that are either marked as "release" (so they will be included) or "norelease" (so they will not be included even if the automatic selection picks them). Before we start doing 0.8, I'll import all the manually-selected articles from 0.7 into this system, as a starting point. But for now I have left the list empty so people can play around with it by adding and removing random articles.

My idea for the new workflow is something like this:

  1. Article is nominated at Wikipedia:Release_Version_Nominations
  2. If passed, the article is added to the list in the web tool
  3. Whether passed or held (not passed), the talk page tag is edited to record this.

If you want to try out the web tool, it's here. I have created a username 'Test' (capitalized) with password 'wp10test' that you can use to try it out. Of course I will remove this user later.

The web tool does require a password to edit the manual selection, as a small obstacle to vandalism. I will give out passwords upon request. The list is publicly viewable and all changes made with the web tool are logged, along with a timestamp and comment. So the web system should be just as transparent as the wiki-based system would be. — Carl (CBM · talk) 13:55, 14 February 2010 (UTC)

(Sorry I've been "offline" all week - real life getting in the way again!) This looks to be very workable, and it will avoid the problems we had with keeping the template-based list and the manual list in sync for 0.5 and 0.7. It looks very smooth and easy to use - this is necessary, since some reviewers may only review a dozen or so articles. My only comment - we will need to document how this system will work, and who is able to review. Basically anyone with an account can review, but we want them to sign in and read how the process works and the FAQs before they start randomly selecting articles - especially the part about needing both a nominator and a reviewer. We may even want to add some text at the top of the toolserver page. Thanks a lot - this looks very nice! Walkerma (talk) 03:40, 18 February 2010 (UTC)
Since we want to make it easy for people to review, I edited the system so that I don't have to assign passwords by hand. There is a system used by CommonsHelper that lets people register a password and tie it to their Wikipedia account. Now my script uses that system too, so people can register themselves without waiting for me to assign a password.
I agree with the need for documentation. We can certainly add instructions to the screens at various places (particularly on the "add articles" screen). — Carl (CBM · talk) 12:25, 19 February 2010 (UTC)

Fast Java code for counting word frequency (for indexing purposes)

I said I wouldn't do it, but here it is. The main thing is you really need to use a hash table to do it in low computational complexity time, otherwise it will be at least O(N) slower:

import java.util.*;

public class count_words {
	int total_word_count = 0;
	Hashtable<String,Integer> total_word_counts = new Hashtable<String,Integer>();
	Hashtable<String,Integer> article_total_word_counts = new Hashtable<String,Integer>();
	Hashtable<String,Hashtable<String,Integer>> article_word_counts = new Hashtable<String,Hashtable<String,Integer>>();

	public void countArticleWords(String article, String[] words) {
		int total = 0;

		//get article word count entry
		Hashtable<String,Integer> article_word_count = article_word_counts.get(article);
		if( article_word_count == null) {
			article_word_count = new Hashtable<String,Integer>();
			article_word_counts.put(article,article_word_count);
		}
		//can be improved by adding to total word count in a second pass, from iterating through article hash table
		for( int i = 0; i < words.length; i++) {

			//get word and make sure it only contains alphabetic characters
			String word = words[i].toLowerCase();
			byte[] test = word.getBytes();
			boolean ok = true;
			for( int j = 0; j < test.length; j++) {
				if( test[j] < 'a' || test[j] > 'z') {
					ok = false;
					break;
				}
			}
			if( !ok)
				continue; //skip this word but keep counting the rest ("break" here would drop the remainder of the article)

			//add to word counts
			total++;
			Integer tot_count = total_word_counts.get(word);
			if( tot_count == null)
				tot_count = 0;
			Integer art_count = article_word_count.get(word);
			if( art_count == null)
				art_count = 0;
			tot_count++;
			art_count++;
			total_word_counts.put(word,tot_count);
			article_word_count.put(word,art_count);
		}

		//write word count totals
		article_total_word_counts.put(article,total);
		total_word_count += total;
	}

	public Hashtable<String,Integer> getArticleWordCounts(String article) {
		return article_word_counts.get(article);
	}
}

Indexing would be pretty straightforward data manipulation from there, though latent indexing would be more complex - it would require constructing word-count hashtables by category, and then calculating KL divergences of articles from those. I haven't tested it, but given the simplicity all errors are bound to be shallow. Kevin Baastalk 17:57, 27 February 2010 (UTC)
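
For the KL-divergence step, a minimal untested sketch (not part of the class above; the method itself, the crude add-one smoothing, and the idea of building the category tables with the same counting code are all assumptions) might compare an article's word distribution against a per-category one like this:

	//hypothetical sketch: KL divergence D(article || category) over the words occurring in the article,
	//with crude add-one smoothing on the category side so the logarithm never sees a zero
	public double getKLDivergence(Hashtable<String,Integer> article_counts, int article_total,
			Hashtable<String,Integer> category_counts, int category_total) {
		double kl = 0;
		Enumeration<String> e = article_counts.keys();
		while( e.hasMoreElements()) {
			String word = e.nextElement();
			double p = article_counts.get(word)/(double)article_total;
			Integer c = category_counts.get(word);
			double q = ((c == null ? 0 : c) + 1.0)/(category_total + 1.0);
			kl += p*Math.log(p/q);
		}
		return kl;
	}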

A little more:

	public String[] words = null;
	public int[] values = null;
	public double[] freqs = null;
	public void compressMainWordIndex() {
		Vector<String> total_keys = new Vector<String>();
		Enumeration<String> e = total_word_counts.keys();
		while(e.hasMoreElements()) {
			String s = e.nextElement();
			total_keys.add(s);
		}
		words = new String[total_keys.size()];
		values = new int[total_keys.size()];
		freqs = new double[total_keys.size()]; //match the double[] declaration above ("new float[...]" would not compile)
		for( int i = 0; i < words.length; i++) {
			words[i] = total_keys.get(i);
			values[i] = total_word_counts.get(words[i]);
			freqs[i] = values[i]/(double)total_word_count; //cast to avoid integer division truncating every frequency to zero
		}
	}
	public double[] getMeanRegressedArticleWordFreq(String article, int words_to_add) {
		double[] art_freqs = new double[freqs.length];
		Hashtable<String,Integer> arthash = article_word_counts.get(article);
		int count = article_total_word_counts.get(article);
		for( int i = 0; i < words.length; i++) {
			Integer f = arthash.get(words[i]);
			if( f == null)
				f = 0;
			art_freqs[i] = (f+freqs[i]*words_to_add)/(count+words_to_add);
		}
		return art_freqs;
	}
	public double[] getWordSurprise(double[] article_word_frequency) {
		double[] surprise = new double[words.length];
		for( int i = 0; i < words.length; i++)
			surprise[i] = Math.log(article_word_frequency[i]/freqs[i]);
		return surprise;
	}

Roughly speaking, the words that you want to have in your index would be the ones with the highest variance of usage frequency across articles. It really should be something like negentropy, but variance is a decent proxy. Then the articles with the highest surprise for each word are listed under that word in the index. Kevin Baastalk 18:56, 27 February 2010 (UTC)
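
As a minimal sketch of that selection step (untested, and not from the code above; the method name and the plain sum/sum-of-squares variance accumulation are assumptions), one could reuse the fields and methods already defined:

	//hypothetical sketch: score each word by the variance of its regressed frequency
	//across a set of articles, as a cheap proxy for negentropy when choosing index words
	public double[] getWordFrequencyVariance(String[] articles, int words_to_add) {
		double[] sum = new double[words.length];
		double[] sum_sq = new double[words.length];
		for( int a = 0; a < articles.length; a++) {
			double[] f = getMeanRegressedArticleWordFreq(articles[a], words_to_add);
			for( int i = 0; i < words.length; i++) {
				sum[i] += f[i];
				sum_sq[i] += f[i]*f[i];
			}
		}
		double[] variance = new double[words.length];
		for( int i = 0; i < words.length; i++) {
			double mean = sum[i]/articles.length;
			variance[i] = sum_sq[i]/articles.length - mean*mean;
		}
		return variance;
	}

Words would then be ranked by this variance, and for each chosen headword the articles with the largest getWordSurprise values would be listed under it.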

And if I can find a good linear algebra package for my GPU I might try doing SVD or PCA, and then we could build in a latent semantic indexing search engine. (I thought someone had already LSI'd Wikipedia, but I can't seem to find it on the internet.) Kevin Baastalk 20:38, 27 February 2010 (UTC)

Version 0.7 - should FINALLY be published soon!

The lengthy cleanup process was completed a couple of weeks back (it took about 6 months), and there is now a clean ZIM file available for the publisher to use. We had been using the "dirty" version for testing the software. The release will probably be sold on a USB flash drive, not DVD; we hope to have a version available for download via BitTorrent too. So look out for a version being shown at Wikimania (though I can't promise!), and available commercially shortly afterwards.

For the future, we hope to use an automated cleanup tool to speed up the time to publication, but that tool is still being written at the moment. Walkerma (talk) 06:16, 18 August 2009 (UTC)

Awesome news. Is there a link to what articles will be included? ♫ Cricket02 (talk) 13:38, 18 August 2009 (UTC)
Yessss! I would love to seed it when you get it out as a torrent. JoeSmack Talk 01:24, 19 August 2009 (UTC)
The article listing (as a set of indexes) is here. (Added later: The topical index has errors!) Joe - thanks for the offer! I was hoping there was someone available to do that. The question is whether it has free software (an offline reader) with it or not. Cheers, Walkerma (talk) 03:49, 19 August 2009 (UTC)
The index appears to be malformatted and possibly incomplete. There are some very basic articles completely missing, like Plant, Flower, Tree. I don't see the listings for different specific organisms anywhere. --EncycloPetey (talk) 04:21, 19 August 2009 (UTC)
Yes, the problems with the topical index have already been reported, and should have been fixed for the actual release. Sorry, I should have mentioned that earlier (I added a note). Those three articles are in the collection, but they are missing from the index. It's a formatting issue, dependent on the server settings; I think the index looked fine on a different server. You will find those articles listed in the alphabetical index, for example here. Cheers, Walkerma (talk) 18:59, 22 August 2009 (UTC)
OK, I downloaded the Windows version, but I had a problem getting the Okawix program to open the "corpus" (the collection of articles). It may be a memory/size issue - this is a laptop with pretty basic specs. Linterweb are working on that. Also, User:CBM is working on fixing the index pages. Once we can get these to work, we'd like some people to beta test the release before it goes official. Any volunteers? Thanks, Walkerma (talk) 01:59, 4 September 2009 (UTC)
If you need help from someone who has no idea what "Okawix" is for, you can count on me for a beta test. It doesn't have to be completely low tech; I do know the meaning of "download", "memory", and "Windows version". Pknkly (talk) 06:16, 4 September 2009 (UTC)

(Unindent) Sounds great. Okawix is just the publisher's latest name for their offline reader software - you wouldn't know it unless you'd beta tested it already! If you have email via WP enabled, I'll email you the information once the bugs are fixed. Thanks, Walkerma (talk) 06:19, 4 September 2009 (UTC)

I can volunteer, shoot me an email. JoeSmack Talk 03:43, 6 September 2009 (UTC)
I'm still waiting for Linterweb to fix the "slow open" problem; I did get it to open on my desktop PC, but it took a long time to do so. I'll keep you posted. Walkerma (talk) 02:28, 10 September 2009 (UTC)
What will be the size of 0.7 (approximately, in megabytes)? Axl ¤ [Talk] 11:57, 12 September 2009 (UTC)
I heard back from Linterweb - they've done a major rewrite of Okawix. I'm taking a look at the new software, and once we have a stable working version of it I will ask for help. As for the size of the release, it's about 4 GB so it takes a long time to download! (Hence our desire to have a BitTorrent version available). Watch this space! Cheers, Walkerma (talk) 16:54, 16 September 2009 (UTC)
Wow, 4 GB = 4,096 MB!! It will sure take a long time to download that... The availability of a BitTorrent version will certainly help others download it more quickly. Keep us informed then. Also, don't forget to update the info for the next version's job if there is any. Thanks Martin... Ivan Akira (talk) 08:08, 17 September 2009 (UTC)
Any news? JoeSmack Talk 23:40, 5 October 2009 (UTC)
Yes - finally! We have a beta test ready for you to check here. PLEASE NOTE: There is a bug in the topical index which results in some article links being missing or broken, but everything else should be working. And (IMHO) the interface looks much nicer than in the alpha version. I should point out that the Okawix reader software is no longer GPL/GFDL so it can't be shared (not a situation we're at all happy about, but we can't do anything at this stage until Kiwix is ready). Obviously the content file (the "corpus") can be freely shared. Please give feedback here! Once that is done, and the index fixed, Linterweb will publish this on USB keys (flash drives). Thanks, Walkerma (talk) 15:26, 7 October 2009 (UTC)
Hi, I'm one of the Okawix developers. Just to let you know that Okawix is still open source and distributed under the GPL licence. The project is hosted on Sourceforge: http://sourceforge.net/projects/okawix/ with the sources available in a SVN repository. Guillaumito (talk) 12:41, 8 October 2009 (UTC)

There is an error or bug... I don't know whether the wp1-beta1 file is the culprit or the client software... but... if you go to the article titled "Double play", you will see text like "" depth="10"/>" depth="9"/>" depth="9"/>" depth="10"/>" depth="10"/>" at the top of that page, which I believe shouldn't be there. Ivan Akira (talk) 03:50, 16 October 2009 (UTC)

That is very weird! It's not present in the Kiwix version, so it's not just vandalism. BTW: User:Kelson is creating a new ZIM file using the partly-cleaned indexes, and Linterweb will update Version 0.7 using that - then we should be close to publication! Thanks for all your work on the beta tests - it's been very useful so far, I think. Walkerma (talk) 07:16, 23 October 2009 (UTC)
I don't know, but nice to hear about the new release... I'm sorry I can't always follow this discussion (very busy with college tasks), but I always try to check for update news about the 0.7 release. Ivan Akira (talk) 13:24, 23 October 2009 (UTC)
The FINAL version of the Okawix release seems to be working well on my computer; it's nice, Martin... Ivan Akira (talk) 14:31, 18 November 2009 (UTC)
Great, that's what I've been wanting to hear! I haven't seen any problems beyond one broken link. Are there any others who want to comment? Walkerma (talk) 14:46, 18 November 2009 (UTC)
Hmmm... I'm just curious... If someone wants to cite an article from the Wikipedia Version 0.7 release, how can he do that? The Okawix version doesn't include the Cite this page link or any information that explains citing... Ivan Akira (talk) 11:44, 22 November 2009 (UTC)

Links for beta testing

Before you download, please read all the feedback below, and note that the corpus file is huge (ca 1 GB if I recall).

  • Client file is here
  • Corpus (the article collection) is here.

Thanks! Walkerma (talk) 06:07, 13 October 2009 (UTC)

Correction

I apologize - I had been misinformed, which is excellent news! See Guillaume's comment above, and I have heard from two others as well confirming my mistake. I apologize to Linterweb and the Okawix team for this; I hope I haven't caused any problems for them.

So, we have a fully open source, GPL reader for the wiki, ready for publication! This will make the distribution much more powerful, and allow things like BitTorrent releases to work effectively. Thanks for the correction, Walkerma (talk) 04:30, 9 October 2009 (UTC)

Installing corpus

Can't load the corpus into Okawix. Clicking on Install starts the install process but never seems to finish. I let it run in install mode for over 24 hours and it still says "Please wait while your corpus is being installed". I was able to install and remove the Wiki news corpus. Should I have waited more than 24 hours for the install of the WP Ver 0.7 corpus? Pknkly (talk) 15:32, 11 October 2009 (UTC)

What is the size of the downloaded .okawix file? Did Okawix create a repository with some .zeno, .index and .map files in it? Guillaumito (talk) 13:44, 12 October 2009 (UTC)
Here is the information obtained from doing a "properties" on the files:
file wp1-beta1 size = 1.88 GB (2,024,473,024 bytes)
These are within a folder, called wp1-beta1 size= 1.88 GB (2,024,469,606 bytes), that was created during the install process.
file article.index size=27.4 MB (28,789,206 bytes)
file article.map size=1.34 MB (1,408,418 bytes)
file word.index size=40.4 MB (42,429,830 bytes)
file word.map size=16.7 MB (17,611,112 bytes)
file wp1-beta1.zeno size=1.80 GB (1,934,231,040 bytes)
Also, I don't know if it's relevant, but after the 24+ hour install attempt I received a message from Firefox when I opened it. It said something about installing xxxxx (I recognized it may have had something to do with the install). When I clicked on yes, it came back and said it failed because it was not supported by the latest version of Firefox (ver 3.5.3). I could delete everything and try it again so I can get more information, but I will hold onto everything until your feedback. Pknkly (talk) 18:03, 12 October 2009 (UTC)
I received the same message, the addon is 'Linkiwix' and I told it no. Bad bad bad that the offline reader tries to install something (whatever it is) like that without telling. JoeSmack Talk 18:39, 12 October 2009 (UTC)
Yup, the installation has a major problem. Firstly, we must figure out how to install by ourselves, because there is no manual in the software. Next, it installs an add-on in Mozilla, I confirm that. The other one is, after the installation of "wp1-beta1.okawix" has finished, Okawix 0.7 always says "wp1-beta1.okawix vanished, do you want to remove it?", when actually it has not vanished at all. So when I search for any article and try to read it, the application always prompts that message and I must press Cancel to get past it. I don't know what is happening.
Oh yeah Pknkly, the size of the wp1-beta1 folder should be 2.37 GB (2,545,856,301 bytes) [because my wp1-beta1.zeno is 2.28 GB (2,455,617,718 bytes)], and you missed one file, entry.lnk (17 bytes). Ivan Akira (talk) 22:29, 12 October 2009 (UTC)
Thanks for the feedback. I'll look for your size and file after my next attempt (see below). Pknkly (talk) 10:10, 13 October 2009 (UTC)

Firefox add-on / Linkiwix

I have the same problem. My en.wikipedia corpus (with images included, a total of 16 GB) stops downloading at around 2 GB. I think that the problem is that Okawix has some open buffer that has no limit. If we press Ctrl-Alt-Del we see in the task manager how the Memory Usage keeps growing as the program keeps downloading, and the size of the downloaded file is very close to the Memory Size displayed in the task manager. So the question is why does it stop at around 2 GB (1.8 actually)? I think it has to do with the fact that for a PC running Windows XP that has 1 GB of RAM, the total amount of page file allocated by the system is, in my case, 2.5 GB. Even if that weren't true, I think it would be a great workaround to somehow reinitiate the socket every 100 MB or so. Another problem would be on systems that have a FAT32 filesystem: the maximum size of a file is 4 GB. This also needs a workaround. John(from Romania) —Preceding unsigned comment added by 79.116.67.21 (talk) 08:54, 24 October 2009 (UTC)

I just uploaded a new Okawix build for Windows without the Firefox add-on (Linkiwix) that was installed as part of the installation process. As for the issues with installing the Wikipedia 0.7 content, we're trying to reproduce and fix the problem now. Guillaumito (talk) 07:20, 13 October 2009 (UTC)

I just uploaded yet another build that tries to fix the un-archiving problem experienced by some of you... we didn't manage to reproduce it, so we just fixed what we guessed was not working. Guillaumito (talk) 09:17, 13 October 2009 (UTC)
It's too bad you were unable to duplicate my symptoms. That may point to a peculiarity within my PC. Nevertheless, because of the changes, I'm ready to try again. I'll delete my old install of Okawix as well as the old download of the corpus. Then I will download and install Okawix. Do you recommend using Okawix to download and install the corpus, or should I download the corpus outside of Okawix and then from within Okawix point to the downloaded corpus? Should it matter? Pknkly (talk) 09:58, 13 October 2009 (UTC)
I would recommend not downloading again: just remove the wp1-beta1 folder (not the wp1-beta1.okawix file), then use the "add new corpus from local file" feature (found in the drop-down menu) and select the .okawix file. Guillaumito (talk) 11:08, 13 October 2009 (UTC)
Will do. Expect feedback later today. Pknkly (talk) 12:08, 13 October 2009 (UTC)
I will give the new client software a shot... And it appears the new client software is smaller (11,019,800 bytes) than before (11,026,885 bytes), so that should be the removal of the Firefox add-on, I presume. Ivan Akira (talk) 11:57, 13 October 2009 (UTC)
I already tried the new software, so far so good! It runs smoothly on my machine. There is no corpus installation error and there is no add-on anymore... Moreover, the HTML text formatting and JavaScript, along with the images, display well... Well done, guys. But I have not tested any further... Let's see if I encounter any other problems... Ivan Akira (talk) 12:20, 13 October 2009 (UTC)
I'm back home from my conference and this morning I just did a download/install on my laptop - which has given me serious installation problems in the past - and everything went smoothly. This is a PC running XP, with an 80GB hard drive that is almost full (only about 5 GB left), and I think around 1GB of memory. The corpus took a while to install (maybe 20 minutes) but I could see the progress meter, so I wasn't worried that it had frozen. Looking good so far! Walkerma (talk) 15:11, 13 October 2009 (UTC)
Mine was around 20 minutes as well, no install problems. Although it isn't exactly transparent in explaining what to do with the corpus/application - I think most of us did some educated guessing. JoeSmack Talk 16:12, 13 October 2009 (UTC)
Thank you all for the feedback and the benchmark install times. I tried again and received the same symptoms. Clearly, the problem is local to my PC. I'll load all of it on another PC I have available and then see if I can isolate the problem on the PC having the problem. Pknkly (talk) 04:08, 14 October 2009 (UTC)
On the other PC the install worked as well as it did for all of you. Went back to the problem PC, deleted everything, rebooted, reloaded fresh copies of the corpus and Okawix - no more problems. I'm sure I reset whatever caused the original symptoms, so the cause is gone and will remain a mystery. Pknkly (talk) 14:49, 14 October 2009 (UTC)
Can someone be kind enough to post screenshots of the program. Thanks, MahangaTalk 20:13, 15 October 2009 (UTC)

Okawix skin

I have tried to edit the Okawix CSS files to have custom colors, namely high contrast for doing lots of reading, but I had no luck in modifying the skins and colors at all! I changed all the ffffff entries to 888888, for example, and the background stayed white. Any ideas please? Thank you.


I see that the "explorer" skin for Okawix is nice, the colors are great, but how about creating a new skin that has a look and feel like an encyclopedia or Wikipedia style? Or is there any guideline for making a skin ourselves? Ivan Akira (talk) 00:20, 14 October 2009 (UTC)

There are no real guidelines on how to write a new skin: each skin is a CSS file with a bunch of PNG files, located in the chrome/skin folder. To create a new one, just create a new subfolder named after your skin and add a line in chrome/chrome.manifest:
skin interfacewiki yourskinname skin/yourskinname/
Guillaumito (talk) 15:37, 14 October 2009 (UTC)
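As a purely illustrative example (the skin name "highcontrast" is made up, and copying an existing skin subfolder as a starting point is an assumption rather than documented behaviour), one could copy one of the existing subfolders under chrome/skin/ to chrome/skin/highcontrast/, edit the colours in its CSS file, and then register it by adding
skin interfacewiki highcontrast skin/highcontrast/
to chrome/chrome.manifest. If the ffffff edits mentioned above were made to a CSS file belonging to a skin other than the one actually in use, that might explain why the background stayed white - though that is only a guess.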
Oww... so that's how to add a custom skin... Maybe if I have time, I will tweak the skin... It looks interesting to me... Ivan Akira (talk) 06:29, 15 October 2009 (UTC)

Index

Are we supposed to be able to find any article using any index? An example - should I be able to find the article Chicago, Illinois in all three of the indexes? Pknkly (talk) 15:17, 14 October 2009 (UTC)

Index by location for Americas, for United States - this seems incomplete. Only shows articles within the Mathematics section and they are all biographies of mathematicians. I was hoping to see a section on "Cities, towns, villages" within which I thought I would be able to find the article on Chicago, Illinois. Pknkly (talk) 15:17, 14 October 2009 (UTC)

Is there a design spec for how the index was to be set up? At this point I don't know if I'm seeing a problem or if I'm trying to find something (e.g., the Chicago article) in a way that was not within the design. Pknkly (talk) 15:17, 14 October 2009 (UTC)

The indexes in this collection are very old versions, and Linterweb needs to compile new versions - they do know that. I think we're waiting for new ZIM files of the topical index. It's clear that the "by location" index in the current corpus is broken - most of the US content is missing! See the latest version of that page here, and you can see how big that page is! For Chicago under US/geography, we have Chicago, Chicago (2002 film), Chicago Board of Trade, Chicago Cubs, Chicago Loop, Chicago Marathon, Chicago River and Chicago metropolitan area. The topical index also needs fixing. Thanks, Walkerma (talk) 19:30, 15 October 2009 (UTC)
OK, we've had an extensive discussion via email, and reached the conclusion that the broken/hidden links in indexes will be fixed, but the incompleteness of some indexes (especially the topical one) won't be fixed until the next release (probably Version 0.8). Walkerma (talk) 07:08, 23 October 2009 (UTC)

Content categories

Since content categories are encyclopedic, I expected to see them at the bottom of the articles within Okawix. If they were supposed to be there, they are not. If they weren't within the design, were they considered and rejected? Pknkly (talk) 16:00, 14 October 2009 (UTC)

I would love these. JoeSmack Talk 16:29, 14 October 2009 (UTC)

Good things within and about Okawix

  • Very quick presentation of linked pages. Pknkly (talk) 16:00, 14 October 2009 (UTC)
  • Fast search results. Pknkly (talk) 16:00, 14 October 2009 (UTC)
  • Overall look of wikipages is intact w/images and infoboxes. JoeSmack Talk 16:31, 14 October 2009 (UTC)
Agree. Pknkly (talk) 12:25, 15 October 2009 (UTC)

Bad things within and about Okawix

  • Clicking on refs sometimes brings you to the refs section, sometimes not. JoeSmack Talk 16:31, 14 October 2009 (UTC)
    • Do you have some example of not-working ref links? Guillaumito (talk) 07:42, 18 October 2009 (UTC)
      • It was extremely common, look around. I presume it is a certain format of ref tagging that doesn't work. If you still can't find the problem I'll launch my app, but I've got a ton of crap open right now and can't myself. JoeSmack Talk 17:51, 18 October 2009 (UTC)
  • No installation documentation, not very friendly even if there was. JoeSmack Talk 16:31, 14 October 2009 (UTC)
    • Are you talking about Okawix installation or adding content into Okawix? Btw... Okawix is designed so it can "embed" content in its installation; the idea would be to have a custom Okawix installer that would install both Okawix and the Wikipedia selection. Now we have a technical problem: the NSIS installer (and I guess other installer software as well) doesn't support files larger than 2 GB, so we have to find a way to work around that limitation. Guillaumito (talk) 07:42, 18 October 2009 (UTC)
      • I was pretty much in the dark from the beginning. Installing the app wasn't too hard (although the installer could be friendlier about what/where it installs and briefly explain the app); the corpus was the real weird thing to me. Couldn't double-click the file, couldn't find how to load the file from within the application. Ended up dragging and dropping, hoping I was dragging it onto the right application in the install folder. Then it just started loading for twenty minutes (no 'would you like to blah? it'll take a while, yes/no?'). Maybe the installer could have an after-note about installing corpus files too. If you could include the app with the corpus, that'd be more idiot (me) proof. Besides that, a gentle readme would go miles. Also, computer requirements? JoeSmack Talk 17:51, 18 October 2009 (UTC)
  • Interface is largely icon-based (I mean, you can hover over the icons, I suppose) with no menu bar. JoeSmack Talk 16:31, 14 October 2009 (UTC)
  • It really slows down my laptop to a crawl, so I'm guessing it's a memory hog - worse than having Firefox, Word and Acrobat all running simultaneously. That is a real issue if this is to be used in developing countries, where computers may have limited RAM. Walkerma (talk) 19:41, 15 October 2009 (UTC)
    • Martin is right, it sure slows down the client computer. If we're using a high-end processor and a large amount of RAM, we can't see any significant delay. Ivan Akira (talk) 03:43, 16 October 2009 (UTC)

New ZIM file

After several trials, Kelson has finally been able to generate a new ZIM file. This should be the final version ready for publication. I've asked Linterweb to process it, so hopefully we should see what the final version looks like. Thanks to all! Walkerma (talk) 18:49, 1 November 2009 (UTC)

OK, we now have a (hopefully) FINAL version of the Okawix release. Could people please take a look? If you think it looks OK, we can go ahead and publish this. Cheers, Walkerma (talk) 06:19, 10 November 2009 (UTC)
Okawix uses Zeno files, not ZIM files; we have to convert the ZIM file to a Zeno file for it to work with Okawix. Pmartin (talk) 16:35, 19 November 2009 (UTC)
Linterweb is planning to publish the release in the New Year; apparently there wasn't enough time to get that done before the holidays. I'll update here once we have a firm date. Meanwhile, it would be good if people could consider what we need for Version 0.8. Walkerma (talk) 07:21, 15 December 2009 (UTC)
I've downloaded and tried wdb2.okawix and it looks good, generally. However, about "Wikipedia:0.7/0.7geo/Iraq" (the "index of pages related to Iraq"):
  • There are two partly overlapping "Arts, language, and literature" lists.
  • "Babylon 5" and "Paul Anka" seem to appear erroneously on the first of these lists.
Do these issues show a general problem in the categorization of articles?
RickJP (talk) 19:27, 26 December 2009 (UTC)
Another issue: missing image "Gharial.2005-02-26.JPG" on the right of the "Archosauromorphs" box at the bottom of article "Archosauromorphs".
The indexes are a bit of a mess - we know that, but we don't have anyone to work on improving them right now. I know why Babylon 5 went in - it's because the word "Babylon" (or Babylonian etc) is associated with Iraq. I've made a note on our keywords list to exclude Babylon 5 from that. It's an annoying problem - like the fact that Java is mostly associated with 200 million Indonesians living there, but sometimes it refers to a scripting language for computers! As for Paul Anka, I'm baffled! It may be that he performed in Iraq for the US troops, and that got picked up, though the article doesn't mention such a thing now. I also don't understand the reason for the two "Arts..." lists. We also know that many articles have been omitted from indexes that should be in. Hopefully we can do a better job with indexes for Version 0.8. Thanks, Walkerma (talk) 08:46, 12 January 2010 (UTC)

Release date

Has this one finally been released? :) If so, how does one obtain a copy? BOZ (talk) 13:01, 22 January 2010 (UTC)

Any update on this project? Kaldari (talk) 18:58, 26 January 2010 (UTC)
It is now unofficially available for download here (2.4 GB, be warned!). We were told February 1st for an official release date, but that still needs to be confirmed. Cheers, Walkerma (talk) 07:00, 27 January 2010 (UTC)
I confirm 1 February; everything is OK for us if there are no problems with the dump. Pmartin (talk) 10:05, 27 January 2010 (UTC)
The new okawix still has problems in the geographical indices (compare to comments about the previous drop wdb2.okawix):
  • The problem of a duplicate "Arts, language, and literature" entry is indeed fixed in the Iraq index "Wikipedia:0.7/0.7geo/Iraq". However, all other geographical index entries that I checked and that have an "Arts, language, and literature" topic, have the same duplication.
  • The link to the country page itself is missing from all geographical indices that I checked. Is it possible to change the name of the country at the top of the page to a link (e.g., "This is an index of pages related to Cyprus")?
  • "Babylon 5" is out of the "Iraq" list, but "Paul Anka" is not. There also seem to still be numerous other errors – for example, "English language" under "Niue"; "Red Squirrel" and "Bilberry" under "Estonia"; "Serval" under "Morocco".
RickJP (talk) 17:18, 27 January 2010 (UTC)
Thanks for this feedback, every part helps. I think the "Arts..." duplication wasn't an original bug but one that appeared when fixing earlier bugs (before that we had loads of redlinks and broken pages - much worse than now!). But as you point out, there are still loads of errors in the indexes. The person who wrote the code for the indexes is aware that there are some bugs in the code, but he does not have the time to work on fixing them. We really need an experienced programmer to fix the code, but we can't really hold off the release until that happens (we've not had anyone volunteer in the last six months, though I spoke to one or two people who may take it on). What you may not have noticed is that many articles that should be listed are not. Some errors are much more difficult to catch, such as the Babylon 5 one for Iraq, and will depend on a manual process of winnowing out the errors one by one. At this point, we have the choice (a) to put out the release with the index as it is, and hopefully put out a version 0.71 with a (partially) fixed index later on; or (b) to put out the release without one or more indexes. I would suggest that a poor index is better than none at all, as long as most listings are correct (which I think they are) - so I've been working towards option (a). I hate to put out a version with known problems, but this is only a test release (hence the Version 0.x designation), and also - as every day ticks by, these articles get more and more out of date! What is your opinion? Ultimately, though, the decision is made by our publisher - in this case Pmartin (see above).
People working on offline releases often need test collections like this available for them to test out things like offline readers, offline browser-readable versions, and even statistical analysis. This release will be a great help to those people.
If you - or someone you know - has the technical skills to fix bugs in the code, we would love to get you involved in doing this! I would not want to put out a Version 0.8 until we can get better indexing done. Thanks again, Walkerma (talk) 20:37, 27 January 2010 (UTC)
What kind of technical skills do you need, i.e. what is your code written in? Kaldari (talk) 22:24, 27 January 2010 (UTC)
BTW, I've tried installing the Mac version of Okawix several times. Every time it tells me that the installation was successful, but nothing gets installed. Kaldari (talk) 22:31, 27 January 2010 (UTC)
I don't have the technical skill to fix bugs. Since it is a test release, I support option (a) above, even leaving the index as-is, listing the known problems in the release notes.
I've been running Okawix for Windows using WINE on Ubuntu Linux, and have encountered no problems with it. RickJP (talk) 17:26, 28 January 2010 (UTC)
Thanks for the feedback. I think I had heard that they were having problems with the Mac, so maybe that will not be part of the official launch - in which case it needs to be unbundled from the package! I've reviewed emails and requested some information on the indexing code - I'm not sure how it was written. Although some problems are clearly bugs in the code, apparently this work is not difficult from a technical standpoint; it just requires a lot of time and TLC to get all the keywords and other connections right. If you can help, that would be great. Hopefully Pmartin will comment on the above. Cheers, Walkerma (talk) 18:22, 28 January 2010 (UTC)
The weird thing is that the installer thinks that it is installed on my hard drive now, but I can't find it. It's not in the Applications directory and Spotlight can't find anything called Okawix besides the installer. Does it install the program somewhere besides the Applications directory? Kaldari (talk) 19:53, 28 January 2010 (UTC)
Nevermind, I finally found it. It was hidden in a subfolder for some reason. Kaldari (talk) 14:58, 29 January 2010 (UTC)

0.7 bugs

I finally got the Mac version all set up on my machine. A few problems I noticed:

  • The article search doesn't work (although in-page search does)
  • Some biographical articles are sorted by first name (like Abraham Lincoln) and some are sorted by last name (like Woody Allen)
  • "All biographies" only lists A - C.

Kaldari (talk) 17:22, 29 January 2010 (UTC)

This is very useful to know, and we'll have to mention that there are problems with the Mac version in any announcements. Since there has been no reply from Linterweb here, I left a message. Even your installation problem is worth noting - I doubt you would have had the same problem installing Microsoft Word or Adobe Acrobat, which indicates that the installation is not as easy as it should be. I sincerely hope that you can help us with Version 0.8! Thanks, Walkerma (talk) 03:17, 1 February 2010 (UTC)
For those having issues with searching on Macs, we may have fixed the problem. Could you test the following (not yet public) release: http://www.okawix.com/install/okawix-0.7-20100202.dmg and report problems/improvements/etc. ? Guillaumito (talk) 16:32, 2 February 2010 (UTC)
Any news? Once we know this works, we can announce the launch. Walkerma (talk) 07:32, 4 February 2010 (UTC)
Not anymore Pmartin (talk) 08:49, 4 February 2010 (UTC)
The Mac version is still being evaluated. Walkerma (talk) 19:13, 28 February 2010 (UTC)