Hi Marjorie,
I was talking about data and resources from the perspective of the maintenance problems they produce rather than their intrinsic qualities. But I would still maintain that there is a fundamental epistemological difference between outputs and processes, or, as I called them, data and projects/resources (in broad terms, I'd say the difference between a project and a resource is that the former has an end date and the latter doesn't; but both are processes).
To me, the important thing about data is that they exist and are used for something (that's even built into the etymology of the term). We can distinguish between primary and secondary data, but data are still the stuff you build your interpretation and analysis from. In English in the humanities we call them "primary and secondary resources," but that is a historical accident: in print, everything is ultimately data, since it is fixed on the page at some point and your maintenance issues become purely archival (something I would say is a defining feature of data as opposed to processes). There are resources in my sense in the print world (journals, for example), but the temptation to build resources is weaker there than in the digital world, since the commitment is so much more obvious.
To give an example: if I am writing an article about the appearance of the phoenix in the OE Phoenix poem, my data are "the poem," the other secondary works I cite, the reference works I use, and so on. I'm guessing your analysis would say that only "the poem" is data in this case. But if so, I would ask: what do you mean by "the poem"? If I use a transcription or an edition, I'm really accessing the text through a secondary source that has analysis built into it. I don't see the difference between basing my reading of a poem on an edition and basing my analysis of a concept on some secondary work on that concept. You might say that this problem goes away if I base my readings on the manuscript itself, but I'd say that doesn't actually change anything: you are still basing your readings on an interpretation. The only difference is that you have done the textual interpretation underlying your analysis yourself instead of relying on someone else's.
However, in this thread we were talking about the project-management implications, and I'd say the distinction is even stronger there. Seen from the producer side, data are the outputs of your project or resource that others use for their own work. Seen from the consumer side, they are information from a project or resource that you inherit, acquire, produce, or extend through further accumulation, and then, presumably, analyse and use for higher research purposes. The key thing is that data ultimately have to be in some sense isolatable in order to be used. Data by their very nature represent a snapshot in time and/or conceptual space.
My argument was that you should take advantage of this property and always strive to ensure that your projects result in something that can be considered in some sense "finished". That's not the same thing as "definitive" or "complete": good data lead to additional questions and revisions. What I mean is that you should always aim for outputs that can exist as a snapshot in time or conceptual space and that could remain useful when you are dead or no longer interested in maintaining them. And that means getting them into a shape where they can be archived by professionals and don't require active maintenance.
The reason this matters is that it is easier in the digital world than in print to turn data accidentally into resources by adding secondary features that impose almost impossible maintenance burdens. One example is deciding that you want to control who extends your data or how they do it: for instance, by refereeing future additions or insisting that others follow your protocols. Another is wrapping your data up so tightly in a specific processing environment or process that you will lose the data if you fail to maintain the processor.
I guess if I had to sum up, I'd say: data are things that can be archived, and resources are things that need to be maintained (and so can't be archived).
My argument was that resources carry incredibly heavy costs and, in the long run, will always fail (Chaucer never did manage to revoke all his licentious tales, after all). So if you decide your data must be published in a form that requires active ongoing maintenance, realise what you are setting yourself up for and try to design it so that it degrades gracefully when people stop maintaining it. Better still, try to distinguish between your resources and your data from the very beginning, and treat all resources as temporary.
-dan
From: dm-l-bounces@uleth.ca [dm-l-bounces@uleth.ca] on behalf of Marjorie Burghart [marjorie.burghart@ehess.fr]
Sent: June-22-13 3:37
To: dm-l, MailList
Subject: Re: [dm-l] Re: How to make your data live forever (and maybe your project?)
Hi Dan!
Maybe this is a bit of a sidetrack, but I would take issue with the definitions of data and resource that you give (of data most of all). To me data is raw, primary material, and I am not comfortable considering articles, monographs, dictionaries or edited texts as data, for instance. They are elaborated, secondary material: knowledge produced from data, but not data themselves.
As for a resource, to me it can be a simple means of accessing data or more elaborate material, but that's not my main definition. I would call a resource any coherent set of material, primary or secondary: for instance, to me the Online Froissart is a resource on Froissart's chronicle. To a certain extent, I would also call a resource a project that provides users with nifty means of processing a set of data. For instance, the project preparing the digital edition of Flaubert's "Bouvard et Pécuchet" put a lot of effort into building an interface that lets users navigate through Flaubert's material for his unfinished novel and form hypotheses about its potential construction, and that interface is fully part of the project.
It seems to me that more projects aim at producing resources than data, which may explain why they are so difficult to maintain: either the coherence would be lost if the material were simply poured and melted into a large data repository, or the data would lose most of its interest if separated from the specific tools a project created to process it.
Maybe this distinction can shed a different light on the issue: curation of secondary material is a long-established tradition, through libraries, but curation of data is a different kettle of fish. There are no powerful pre-existing traditions and models as there are for secondary material, and the digital world has to invent them quickly. As for resources (by my definition), their inherent coherence and the often very strong link between the data/material and the interface created to use it mean that maintaining the interface is often a central issue, and one that is particularly difficult to solve in the long term.