Dear Digital Medievalists and TEI members,
I have a question about the preservation of digital content, especially medieval manuscripts. I am writing a small article on the topic and I have consulted a lot of sources (papers, handbooks), but most of them do not say anything about the "life span" of the data in specific formats. To clarify this: a .doc file created in 1995 will most likely be unreadable in 2010. What about other formats? Has anyone done some research on the "life span" of a specific version of digital formats, and on when it becomes clear that the new version and the old one are not compatible anymore? Here I am talking about pdf, rtf, doc (and all office files), djvu, tiff, jpg, mpg etc. (texts and images especially).
In my work I am also making a small remark on XML as a data container since it is, in my opinion, the best way to go and the standard will surely be around for years. But what kind of steps do you take to ensure the preservation of documents that have been encoded in XML?
I would also like to hear if there are opposing views on XML.
I also have the same question about the media. I found some research about the longevity of CDs and DVDs but I am also interested in other media like older hard disks, zip drives and magnetic media.
I am sorry to bother you with this, but I can use any help I can get.
Thank you in advance
Daniel Mondekar
Stuart Lee and I wrote an article that touched on this a little in Gale Owen-Crocker's book on Anglo-Saxon Manuscripts. And I had a couple of brief columns on different aspects of the problem in The Heroic Age a little while back.
I'm not 100% sure I share your premises, BTW. I think most things from 1995 would still be recoverable, if you knew the right software. I recently helped a colleague restore a whole bunch of very old WordPerfect files using Open Office. And while I've not tried a 1995 .doc file in it, I'd be amazed if it couldn't read it.
My rule of thumb is that anything for a PC or Mac is recoverable, no matter how old, unless it is in a minor proprietary format. So most image files, most WordPerfect and Word files, and I'm guessing most WordStar files should be fine. I'd have my doubts about ChiWriter files, though that might have been a pre-PC program for the SuperPet. As a rule, you're better off in recovery with something like OpenOffice, since the stakes surrounding compatibility are much higher for them than for proprietary software like Word: people use Word whether or not it reads other formats, but nobody would use OpenOffice if it didn't read Word. In fact, in the case of old .doc files, OpenOffice was a better interpreter than Word: when I was typesetting Caedmon's Hymn, which I did in Word from SGML masters, I had some trouble where Word would get confused in displaying complex tables. Opening the files in OpenOffice and then saving them immediately as Word files again was usually enough to solve the problem.
It is useful to read Nicholas Barker on preservation anxiety, BTW. I think a lot of what he says about misplaced fears of obsolescence with regard to 19th Century paper is also true of things like CD-ROMs and file formats. You'd be amazed how much works just fine.
-dan
Digital Medievalist -- http://www.digitalmedievalist.org/ Journal: http://www.digitalmedievalist.org/journal/ Journal Editors: editors _AT_ digitalmedievalist.org News: http://www.digitalmedievalist.org/news/ Wiki: http://www.digitalmedievalist.org/wiki/ Twitter: http://twitter.com/digitalmedieval Facebook: http://www.facebook.com/group.php?gid=49320313760 Discussion list: dm-l@uleth.ca Change list options: http://listserv.uleth.ca/mailman/listinfo/dm-l
On Wed, 28 Jul 2010 "O'Donnell, Dan" daniel.odonnell@uleth.ca wrote:
I'd have my doubts about ChiWriter files, though that might have been a pre-PC program for the SuperPet.
This was a popular PC program for writing scientific, especially mathematical texts:
http://www.horstmann.com/ChiWriter/
Its internal format was rather simple and easy to reverse-engineer; you can find a short description of it e.g. at
http://mirror.ctan.org/support/chi2tex/read.me
I should still have the C source code of a converter to TeX by Horstman
http://www.tug.org/TUGboat/Articles/tb12-3-4/tb33horstman.pdf
but its legal status is not clear to me. I received it indirectly without any conditions, but later its author started to sell the program. This
http://www.ctan.org/tex-archive/support/chi2tex/
unfortunately contains only binaries (which you can, however, still run, at least in principle, under FreeDOS in a virtual machine).
This
http://www.ctan.org/tex-archive/support/chi2ltx/
may also be of some use.
Best regards
JSB
Dear Daniel,
The issue of long-term preservation of digital content coming from medieval manuscripts as the source is, as far as I can tell, exactly the same as it is for any other digital data anywhere, and vast amounts of ink (real and virtual) have been used discussing the issue. Although it's a few years old now, you might start with the article "Architecture and Technologies for Trusted Digital Repositories," Jantz and Giarlo, D-Lib Magazine 2005 (http://www.dlib.org/dlib/june05/jantz/06jantz.html - not really as technical as it may sound, and includes some important definitions), then move your research on from there. Although I'm not sure that there are many real and true TDRs even now, it's a fine ideal to start with. Although Dan O'Donnell is correct that many older file formats are still able to be read far past the time we might expect it, the issue of "digital preservation" is much more extensive than simply "can I still have access to the data on this file" (at least, if you ask a librarian about it :-) ).
Dot
Dear Daniel, Dan, Dot,
I very much agree with Dot and Dan. There's also the issue, in addition to file format and medium, of the preservation of user experience (UX) and user interface (UI), particularly when materials are presented as editions and archives. While care must be taken to choose formats that will withstand the test of time, such as TEI, I would argue that we, as a community, need to address the issue of UX and UI preservation (which to my mind can only be approached through emulation). Upfront decisions about file formats can greatly influence this, though. Take, for example, Dan O'Donnell's edition of Caedmon's Hymn. Since everything in it can be rendered by a standard web browser, it is cross-platform and should be compatible with browsers for some time to come. This is not so with platform-specific editions, such as those done on CD-ROM that only work, for example, on Mac OS 9. (I recently had to acquire a computer running OS 9 so I could take a look at some of these.)
✍ Best, Grant Simpson ¶ Senior Analyst/Programmer, Office of the Registrar ¶ Vice President, IU Bloomington Professional Council ¶ Interim Chair, MEST Graduate Student Advisory Committee ¶ Doctoral Student, Department of English Indiana University Bloomington
If I can give away the game, a little, we have a very interesting article on this in the dm-j pipeline. More hopefully soon.
-dan
On 10-07-29 12:25 AM, Dot Porter wrote:
Although Dan O'Donnell is correct that many older file formats are still able to be read far past the time we might expect it, the issue of "digital preservation" is much more extensive than simply "can I still have access to the data on this file" (at least, if you ask a librarian about it :-) ).
Which is why you never should ;) Seriously, though, you're right, of course.
There's a useful list of some "canonical" references in the discussion on sustainability, a "souvenir" of a paper by Bella Millett to be printed out for reasons of preservation: http://www.i-d-e.de/wordpress/wp-content/uploads/2010/05/bella-millett-sustainable-souvenir-eets-2010.pdf
Franz
Dear Daniel,
I understand your question about 'preservation of digital content'. As far as my memory serves, it seems to be a Microsoft-specific problem. The DOC format, native to MS, has gone through two 'major' changes in the last two decades as MS Word has been improved. A .doc file created by Word 97 was unreadable in older versions, but one created by Word 95 or earlier was readable in Word 97. This is called downward (backward) compatibility, a matter of 'business ethics': newer software should not ruin valuable digital content. The same problem arose again when Word 2007 was released. Word 2007 is downward-compatible, whilst files saved in Word 2007's own format are unreadable in Word 2003. However, I have no idea whether Word 2007 reads .doc files older than Word 97.
Other popular formats have also gone through improvements, though one barely hears of compatibility issues with them. I remember that when I opened a PDF file created by a newer version of Photoshop in an older version of the programme, the file opened properly despite a message along the lines of 'Some information will be lost'. That's why I say the compatibility issue is probably an MS-specific one.
I think the life of a popular file format, e.g. jpg or mpg, is rather long. There was a debate over compatibility when MS was planning its second-generation GUI, i.e. Win 95, to succeed Win 3.x. They seem to have come to an agreement about downward compatibility, as stated. That's why many 20-year-old formats are still in use and 20-year-old files in those formats are still readable. For example, the JPEG format was there when I was in high school. Now I have no difficulty reading those archaic files on this computer, though their 65k-colour palette offends my eyes.
I wish I could say something about XML, which is way too modern for a historian, um, politically. Personally I like databases and the .txt format more than new standards, only because I am used to them.
So far, in my experience, CDs and DVDs are the most reliable media. Their life span is longer than 15 years as long as they are treated tenderly. A hard drive is efficient when it behaves. It can, however, turn into a nightmare when it is naughty. That's why IT experts advise everybody to make backup CDs/DVDs of the hard drive. Older hard disks are usable as long as they have an IDE interface and are NOT broken. The average life span of an older-generation hard disk of, say, 20 GB is about 5 years. Don't shake it and don't feed it water, and it may live longer. It is hard to tell how long the IDE interface will survive, though, as SATA is getting popular. Zip drives! They were on the market for only a few years and were largely gone once CD-R was commercialised. For most ordinary users, magnetic tapes were displaced by CD-R, too.
Whatever media you use, regular backup is the rule. Hope this helps.
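One way to make regular backups checkable is to record a checksum for every file and later compare it against the backup copy. What follows is only a minimal sketch in Python, assuming SHA-256 as the hash; the helper names (`sha256_of`, `build_manifest`, `find_mismatches`) are hypothetical ones chosen for illustration and do not come from any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(original: dict[str, str], backup: dict[str, str]) -> list[str]:
    """List files that are missing from the backup or whose checksum differs."""
    return sorted(
        name for name, digest in original.items() if backup.get(name) != digest
    )
```

Calling `find_mismatches(build_manifest(Path("archive")), build_manifest(Path("backup")))`, with both directory names being placeholders, would list the files whose backup copy is missing or no longer identical. It says nothing about whether the format itself is still readable.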
Best wishes
============================
Gerald Liu
PhD student in medieval history, Durham Working on late medieval manorial management and farm workers. Personal website http://www.durham.ac.uk/gerald.liu/
-----Original Message----- From: dm-l-bounces@uleth.ca on behalf of Daniel Mondekar Sent: Thu 29/07/2010 00:12 To: dm-l@uleth.ca Subject: [dm-l] Question about preservation of digital content