Looks like it is going to be a fascinating paper. I was just wondering how you were going to operationalize “good edits” and “good articles” in your data collection.
- I wonder that myself. (That happens to be my major problem with the paper.) Jrincayc 05:53, 11 Dec 2003 (UTC)
Production of Wikipedia Content
The first thing I thought of was to use standard production theory and treat “contributions” as a variable input. Contributions could be defined as some combination of “length of article” plus “number of edits”, I guess. Some sort of total product curve, average product curve, and marginal product curve could be created to indicate the areas of increasing returns to contributions, decreasing returns to contributions, the optimum level of contributions, etc. But I don’t know if that really helps you to define “improvement in an article”. I look forward to further installments on this very interesting topic.
- As the number of edits/size of article increases, the onset of diminishing returns to contributions (per article) is complicated by positive network externalities at the systemic level. mydogategodshat 17:45, 12 Dec 2003 (UTC)
- Yes, so somehow any useful model is going to have to take into account some approximation of both the effect on the article and the effect on the encyclopedia. Separating the two effects is going to be hard. Jrincayc 15:37, 14 Dec 2003 (UTC)
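To make the production-theory idea above concrete, here is a minimal sketch of how total, average, and marginal product could be tabulated for one article. Everything in it is an assumption for illustration: the function names are hypothetical, the "output" numbers are invented, and it presumes that some measure of article output per unit of contribution already exists, which is exactly the open question in this thread. It also looks at a single article only, so it says nothing about the network effects at the encyclopedia level.

```python
def product_curves(total_product):
    """Return (input, TP, AP, MP) rows.

    total_product[i] is the assumed output after i + 1 units of
    contribution (for example, after i + 1 edits).
    """
    rows = []
    previous_tp = 0.0
    for units, tp in enumerate(total_product, start=1):
        average_product = tp / units          # AP = TP / input
        marginal_product = tp - previous_tp   # MP = change in TP per extra unit
        rows.append((units, tp, average_product, marginal_product))
        previous_tp = tp
    return rows


def onset_of_diminishing_returns(total_product):
    """First input level whose marginal product is below the previous one."""
    rows = product_curves(total_product)
    for earlier, later in zip(rows, rows[1:]):
        if later[3] < earlier[3]:
            return later[0]
    return None  # marginal product never falls in this data


# Invented output figures for one hypothetical article.
example_output = [2, 5, 9, 12, 14, 15]
for row in product_curves(example_output):
    print(row)
print("diminishing returns start at input", onset_of_diminishing_returns(example_output))
```

With these invented figures the marginal product rises for the first three units and falls from the fourth onward, which is the region the comment above calls decreasing returns to contributions.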
Model V3 Proposal
My next idea for a model (the second version was the one used in the handed-in paper) is to work at the article level. For each article, try to predict the number of edits done to the article. The predictor variables to try will be the number of months since the last edit, the number of previous edits, the number of previous authors, some author/edit interaction terms, and various encyclopedia size statistics (total articles, total edits, articles with more than twenty authors ...). This will hopefully show the encyclopedia's effect on the article, and allow a comparison with the article's effect on itself. This might be able to tease apart the two separate effects. Jrincayc 15:37, 14 Dec 2003 (UTC)
- Just one comment: Using a single regression equation with "number of edits" as the dependent variable may entail validity problems. In particular, can "number of edits" really act as a proxy for "quality of article"? Maybe, but there are many very POV articles that receive heavy editing. I suggest you regress "article size" as well as "number of edits" on the independent variables, that is, do the procedure twice. Neither article size nor number of edits is a really good proxy for article quality, but if you run the regression for each of them, you will be able to compare.
- OK, another comment now that I think of it: I understand you used OLS. What do you think about using a stepwise regression? This might be useful given that some of your independent variables will have very high explanatory power (such as "number of previous edits"), and some will have very low. It would also be useful in checking your interaction terms.
- mydogategodshat 06:58, 12 Feb 2004 (UTC)
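The "do the procedure twice" suggestion above could be sketched roughly as follows: fit the same predictors by OLS once against number of edits and once against article size, then compare the coefficients. This is only an illustration under stated assumptions. The `ols` helper is a hypothetical, bare-bones normal-equations solver, not the code used for the paper; only two of the proposed predictors are included; and the rows are synthetic, generated from known coefficients so the fit can be checked, not taken from Wikipedia.

```python
def ols(predictors, response):
    """Fit response = b0 + b1*x1 + ... by ordinary least squares.

    Solves the normal equations (X'X) b = X'y with Gaussian elimination.
    Returns [b0, b1, ...]. Assumes the predictors are not collinear.
    """
    rows = [[1.0] + [float(v) for v in r] for r in predictors]  # add intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(col + 1, k):
            factor = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= factor * xtx[col][c]
            xty[r] -= factor * xty[col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(xtx[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (xty[i] - tail) / xtx[i][i]
    return beta


# Synthetic rows: (months since last edit, number of previous edits).
articles = [(1, 10), (2, 4), (3, 20), (1, 7), (5, 15), (4, 2)]
# Synthetic responses built from known coefficients, purely to check the fit.
edits = [3 - 0.5 * m + 0.2 * p for m, p in articles]
size = [100 + 10 * m + 50 * p for m, p in articles]

print("edits model:", ols(articles, edits))
print("size model: ", ols(articles, size))
```

On real data the two coefficient vectors would differ in more interesting ways; a predictor that matters for edits but not for size (or the reverse) would be a hint about which proxy is picking up what. A stepwise procedure would wrap a fit like this in a loop that adds or drops predictors one at a time.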