Workshop on Historical Text Mining
Thursday 20th and Friday 21st July 2006, Lancaster University, UK.
http://ucrel.lancs.ac.uk/events/htm06/
Organisers: Paul Rayson (Lancaster University) and Dawn Archer (University of Central Lancashire)
This is one of a series of workshops sponsored by the AHRC ICT Methods Network.
We want to develop a network of scholars interested in 'Historical Text Mining' via a workshop for experts from the various fields: text mining and E-Science, corpus development and annotation, historical linguistics, dialectology and computational linguistics. We believe that a discussion of the effective text mining of historical data is overdue and much needed, because of the growth in (historical) digital resources (e.g. Open Content Alliance, Google Print, Early English Books Online). We particularly want to better define the relationship between the text mining/E-Science community, who are often involved in applying basic techniques to large-scale datasets, and the corpus linguistic community, who tend to apply data-driven linguistic analysis and annotation techniques to relatively small datasets.
The 'Historical Text Mining' workshop will seek:
* to raise awareness of the various techniques utilised and/or tools developed by researchers working within the various fields,
* to make scholars who work with historical data aware of existing text mining techniques that are applicable to their research needs,
* to familiarise such scholars with the use of these techniques and tools, by means of a series of tutorial sessions (e.g. GATE, WordSmith, VARD, VIEW, Wmatrix),
* to investigate the problems of applying some "modern" large-scale corpus annotation and analysis techniques to historical data, and
* to encourage/enable a roundtable discussion, with the ultimate aim of determining what needs to be done to improve historical text mining and (importantly) identifying possible future workshops and collaborative projects.
Participation is free but, since places are limited, we request that potential participants apply in advance, and explain why they wish to attend and what they expect from the workshop. For further details on the application procedure and the workshop programme, please see the website:
http://ucrel.lancs.ac.uk/events/htm06/
Dr Paul Rayson
Computing Department, Infolab21, South Drive, Lancaster University, Lancaster, LA1 4WA, UK.
Tel: +44 (0)1524 510357
Fax: +44 (0)1524 510492
Email: paul@comp.lancs.ac.uk

Dr Dawn Archer
Lecturer in English Language and Linguistics, Department of Humanities, University of Central Lancashire, Preston, Lancashire, PR1 2HE, UK.
Tel: +44 (0)1772 893032
Email: dearcher@uclan.ac.uk