>>> Abdullah.Alger-2(a)postgrad.manchester.ac.uk 06/16/05 11:42 AM
Compared to all of the other concordancing tools, I think that Watt's is
the simplest to use. Also, what's great about it is that it can handle
characters such as <thorn> and <eth>. A nice feature of the program is
that you can save the results as an HTML document, but the drawback is
that you cannot save them as XML or in any other format except as text.
>>>Are there any concordance programs that allow you to convert to xml?
Abdullah Alger
There are several RTF to XML conversion applications online including:
RTF2FO XML Converter
http://www.rtf2fo.com/features.html
RTF to XML
http://www.rtf-to-xml.com/features.html
RTF to XML 5.2.1
http://www.programmersheaven.com/zone16/cat290/35143.htm
I must admit I have not used these, so I cannot vouch for their
effectiveness (or lack thereof); however, opening a text file (from
Watt's Concordance) in Word, saving as RTF, then using a converter to
create an XML file might work.
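
If the concordance output is already plain text, a short script could
wrap it in XML directly and skip the RTF step. The following is only a
minimal Python sketch of that alternative, not a feature of Watt's
Concordance or of the converters listed above. The element names
(concordance, line), the n attribute, and the sample lines are all
assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_to_xml(lines):
    """Wrap each non-empty line of text in a numbered <line> element.

    The element and attribute names are hypothetical; adapt them to
    whatever schema the project actually uses.
    """
    root = ET.Element("concordance")
    non_empty = (line.strip() for line in lines if line.strip())
    for number, text in enumerate(non_empty, start=1):
        entry = ET.SubElement(root, "line", n=str(number))
        entry.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


# Invented sample lines, including thorn, eth and characters that need escaping.
sample = ["þe kyng & þe quene", "", "oðer men <seiden>"]
print(text_to_xml(sample))
```

Because the library does the escaping, characters such as thorn and eth
pass through unchanged, while ampersands and angle brackets in the text
are written as entities.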
---Laurie
Laurie Ringer
Assistant Professor of English
Canadian University College
Lacombe, AB T4L 2E5
(phone) 403.782.3381, ext. 4085
(fax) 403.782.0735
Quoting Godfried Croenen <g.croenen(a)liverpool.ac.uk>:
> Hi Laurie,
>
> I am still using TACT 1.2 when I need a concordance and I am happy to
> answer TACT queries if I can.
>
> I was not aware of R.J.C. Watt's programme Concordance, although I had a
> look at the website and will try it out. But I can see already a number of
> difficult problems with my corpus, as I have often encoded page breaks or
> line breaks in the middle of words, which the programme apparently cannot
> handle.
>
> I have also used WordSmith Tools and find it useful, although it is a
> different kind of programme, mainly aimed at corpus linguistics and
> hence not that good at formatting and referencing text sections.
>
> Maybe you should also try out the TAPOR tool at
> <http://taporware.mcmaster.ca/>
>
> Best,
>
> Godfried
>
>
> --On 15 June 2005 13:25 -0600 Laurie Ringer <lringer(a)CAUC.CA> wrote:
>
>> I am producing a concordance of the English vernacular texts that
>> scholarship allows as Wycliffite or Lollard in persuasion. I would like
>> to add more texts, and am attempting to work out a few issues on which I
>> wondered if anyone might have advice.
>>
>> For information I have recently been using R.J.C. Watt's programme
>> Concordance (http://www.concordancesoftware.co.uk/); however, due to a
>> significant problem with hyphenated words---Watt's Help file
>> specifically states that it does not treat hyphenated words, which are
>> divided between 2 lines, as single words---I am thinking of switching
>> back to TACT or to another programme.
>>
>> It's been some years since I used TACT. Is anyone fluent in TACT and
>> willing to field the odd question or two which Ian Lancashire's book
>> Using TACT with Electronic Texts does not answer? Or, alternatively, can
>> anyone recommend a better programme?
>>
>> Line numbering: Aside from keying in line numbers by hand (which I have
>> been doing), is there a macro or application that can automate the line
>> numbering process in large numbers of texts in Word or Word Pad?
>>
>> Page numbering: As above, is there a macro or application that can
>> automate the page numbering process in large numbers of texts? NB: the
>> end of the printed page in electronic format rarely corresponds with the
>> end of a Word or Word Pad page.
>>
>> Many thanks for any suggestions anyone might be able to make.
>> ---Laurie
>>
>> Laurie Ringer
>> Assistant Professor of English
>> Canadian University College
>> Lacombe, AB T4L 2E5
>> (phone) 403.782.3381, ext. 4085
>> (fax) 403.782.0735
>>
>> _______________________________________________
>> Digital Medievalist Project
>> Homepage: http://www.digitalmedievalist.org
>> Journal (Spring 2005-): http://www.digitalmedievalist.org/journal.cfm
>> RSS (announcements) server: http://www.digitalmedievalist.org/rss/rss2.cfm
>> Wiki: http://sql.uleth.ca/dmorgwiki/index.php
>> Change membership options: http://listserv.uleth.ca/mailman/listinfo/dm-l
>> Submit RSS announcement: http://www.digitalmedievalist.org/newitem.cfm
>> Contact editorial Board: digitalmedievalist(a)uleth.ca
>> dm-l mailing list
>> dm-l(a)uleth.ca
>> http://listserv.uleth.ca/mailman/listinfo/dm-l
>
>
>
> ----------------------
> Dr. Godfried Croenen
> School of Modern Languages, French Section
> University of Liverpool
> Chatham Street
> Liverpool
> L69 7ZR
>
> Tel: +44 (0)151 794 2763
> Fax: +44 (0)151 794 2357
> e-mail: G.Croenen(a)Liverpool.ac.uk
>
>
I am the project director of the Electronic Grosseteste, a research resource that provides access to electronic medieval Latin texts and an integrated bibliography. The textbase is composed of a variety of Latin texts (most of them under copyright but still searchable). Right now the search engine is pretty primitive, and one enhancement I would like to make is to account for orthographical variants in the texts. Some texts were classicized, while other editors either followed the orthography of a single manuscript or attempted to follow some sort of convention based generally on Latin texts in later medieval England (these are the facts, and this post is not about the joy of debating editorial practice). Ideally, I would like searches to return both classical and "medieval" spellings. For example, if a user queried "scientia", the engine would return matches for both "scientia" and "sciencia". (Wildcards are permitted, btw.)
Now I work in Perl5, and so my initial thought was to create a set of hash tables that would map these variants since hashes would allow for more than one variant per entity, and the engine would then perform a lookup for each query element. Now I suppose coding into the engine the "orthographical rules" is another option, but I'll be honest and admit that computational linguistics has never been my thing. And, the beauty of hashes in Perl is that they are compiled very quickly, and don't eat too much memory.
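
The lookup described above can be sketched in a few lines. The example
below is written in Python rather than Perl, with a dict standing in for
the Perl hash. It is only an illustration of the idea, not the
Electronic Grosseteste engine. The variant groups, the function names,
and the sample query are all invented for the example, and a real table
would have to be compiled from the textbase.

```python
# Hypothetical variant groups; each set lists spellings to treat as equivalent.
VARIANT_GROUPS = [
    {"scientia", "sciencia"},
    {"gratia", "gracia"},
]


def build_index(groups):
    """Map every spelling to the sorted list of all spellings in its group."""
    index = {}
    for group in groups:
        forms = sorted(group)
        for form in group:
            index[form] = forms
    return index


def expand_query(query, index):
    """Return, for each query term, the list of spellings to search for.

    Terms with no entry in the index are searched as typed.
    """
    return [index.get(term, [term]) for term in query.lower().split()]


index = build_index(VARIANT_GROUPS)
print(expand_query("Sciencia Dei", index))
```

One design point: because every spelling is a key, the user can type
either the classical or the medieval form and still get the whole group.
Wildcard queries would need separate handling, which this sketch does
not attempt.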
Now before I go and reinvent the wheel with these hash tables, does anyone know of an open-source method or resource that addresses this kind of problem? (I know that Brepols--pardon me, Brepolis...yeesh--has this all figured out, but they don't play well with others, so that's a closed door.) My limited scouring of the web has yielded no joy, and so I seek the sage advice of this community.
Many thanks
Jim
--------------------
Dr James R. Ginther, PhD
Assoc. Professor of Medieval Theology
& Director of Graduate Studies
Dept of Theological Studies
St Louis University
ginthej(a)slu.edu
---------------------------------
dept: http://theology.slu.edu/
research: http://www.grosseteste.com/
I use subscriptions as an informal topic popularity meter: most postings
on dm-l result in an average of 5 new subscriptions and 3
unsubscriptions. Postings on XSLT result in 5 unsubscriptions and 3
subscriptions ;). Interestingly, posting the Unicode website resulted in
no loss and some gain. Perhaps a gauge of usefulness?
-dan
--
Daniel Paul O'Donnell
Associate Professor of English
Director, Digital Medievalist Project
University of Lethbridge
Lethbridge AB T1K 3M4
Vox: +1 403 381-2539
Fax: +1 403 382-7191
URL: http://people.uleth.ca/~daniel.odonnell/
Digital Medievalist Project: http://www.digitalmedievalist.org/
I find it useful - and I was unaware of this site. I usually use the Unicode charts, but that can be tedious (since there are now five charts for the Latin alphabet).
Thanks, Dan, for a great bookmark!
Dot
-----Original Message-----
From: James Cummings <James.Cummings(a)computing-services.oxford.ac.uk>
To: Digital Medievalist Community mailing list <dm-l(a)uleth.ca>
Date: Fri, 10 Jun 2005 10:12:52 +0100
Subject: Re: [dm-l] Letter database: languages, character sets, names etc
Daniel Paul O'Donnell wrote:
> I'm not sure if members of this list would find this type of e-mail
> useful (please let me know if you do... or don't), but here goes:
I find it useful.
>
> A common problem in text encoding is locating the correct codes for
> "unusual letters". There are various utilities for doing this in
> Windows, Mac, and Linux. But here is a useful web-based utility. You can
> use it to look up character names and find their code point (though you
> do have to be fairly precise), and it will produce the correct number in
> hex and decimal formats. It will also tell you everything you ever
> wanted to know about characters required for encoding Estonian.
>
> http://www.eki.ee/letter/
Well, ok, actually I knew about this particular site. I've used that and
of course there is the Unicode site itself, especially the charts page. Also,
most Linux distributions contain a graphical character-map utility that is
searchable.
One of the things out of Unicode recently is their report:
http://www.unicode.org/reports/tr22/
on CharMapML = Character Mapping Markup Language.
Readers might also be interested in drafts of:
TEI P5 Draft Chapter 4: Language and Character Sets:
http://www.tei-c.org/P5/Guidelines/CH.html
and
TEI P5 Draft Chapter 25: Representation of non-standard characters and glyphs
http://www.tei-c.org/P5/Guidelines/WD.html
Just thought I'd add that in to Dan's comment.
-James
--
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk
***************************************
Dorothy Carr Porter, Program Coordinator
Collaboratory for Research in Computing for Humanities
University of Kentucky
351 William T. Young Library
Lexington, KY 40506
dporter(a)uky.edu 859-257-9549
***************************************
I'm not sure if members of this list would find this type of e-mail
useful (please let me know if you do... or don't), but here goes:
A common problem in text encoding is locating the correct codes for
"unusual letters". There are various utilities for doing this in
Windows, Mac, and Linux. But here is a useful web-based utility. You can
use it to look up character names and find their code point (though you
do have to be fairly precise), and it will produce the correct number in
hex and decimal formats. It will also tell you everything you ever
wanted to know about characters required for encoding Estonian.
http://www.eki.ee/letter/
See also the Digital Medievalist Wiki entry for character encoding
http://sql.uleth.ca/dmorgwiki/index.php/Fonts (to which I have just
added information about this site).
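
For anyone who prefers a local lookup, the same name-to-code-point
search can be sketched with Python's standard unicodedata module. This
is only an illustration and has no connection to the eki.ee site. The
helper name describe and the two sample characters are chosen for the
example. As with the web utility, the character name has to be exact.

```python
import unicodedata


def describe(name):
    """Return (character, hex code point, decimal code point) for a Unicode name.

    unicodedata.lookup raises KeyError if the name is not an exact match.
    """
    char = unicodedata.lookup(name)
    code = ord(char)
    return char, f"U+{code:04X}", code


for name in ("LATIN SMALL LETTER THORN", "LATIN SMALL LETTER ETH"):
    print(name, describe(name))
```

Going the other way, unicodedata.name(char) gives the official name for
a character pasted from a text.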
-dan
I sent this message to Medtext-l this morning, but it's of interest to this list, too. I've never used HEML, but it looks neat:
Take a look at the Historical Event Markup and Linking Project (http://www.heml.org/). The stated goal of HEML is "to define XML elements that expose and outline historical events asserted in documents across the web and to parse and display these elements in interesting and useful ways." It's a markup system, not software, and you'd probably have to use some XML editing software to create your HEML documents, rather than any HEML-specific software, but it does appear to enable linking to digital objects using the <Evidence> element.
HEML is designed to be combined with other markup languages, so if you already have your information in some form of XML (a TEI list, for example) you can add HEML markup on top of that using the heml: namespace.
HEML also requires XSLT and/or SVG for viewing.
This may be more or less than what you need, but check out the example files. They're pretty cool.
Dot
-----Original Message-----
From: "Daniel Paul O'Donnell" <daniel.odonnell(a)uleth.ca>
To: dm-l(a)uleth.ca
Date: Wed, 08 Jun 2005 00:06:26 -0600
Subject: [dm-l] [Fwd: Fwd: Re: timelines]
Forwarded from Medtext-l. Does anybody know an answer?
***************************************
Dorothy Carr Porter, Program Coordinator
Collaboratory for Research in Computing for Humanities
University of Kentucky
351 William T. Young Library
Lexington, KY 40506
dporter(a)uky.edu 859-257-9549
***************************************
Hello all,
The editorial e-mail address for the digitalmedievalist project and
journal digitalmedievalist(a)uleth.ca has been hit by spammers who are
e-mailing us at a rate (200+ messages a day) that suggests they may be
trying a denial of service attack (though why they'd pick on us is
beyond me). The result is that we have very likely missed any legitimate
e-mail sent over the last three weeks. Missed e-mails include e-mails
sent directly to us or RSS announcements submitted via our on-line form.
We are trying to work out a way of reopening the address or finding
another way for people to get in touch with us. In the meantime,
correspondence for the digital medievalist project or journal should be
addressed to me personally: daniel.odonnell
the-funny-little-symbol-above-the-2-on-US-keyboards uleth.ca ;)
Sorry for any inconvenience.
-dan
--
Daniel Paul O'Donnell, PhD
Department of English
University of Lethbridge
Lethbridge Alberta T1K 3M4
Canada
Tel: +1 (403) 329-2377
Fax: +1 (403) 382-7191
e-mail: daniel.odonnell(a)uleth.ca
Web-Page: http://home.uleth.ca/~daniel.odonnell
The Electronic Caedmon's Hymn: http://home.uleth.ca/~caedmon