Dear Dominique,
This does indeed look very interesting. I have a couple of
questions.
First, is the software (or some part of it) used to extract the text
from the manuscripts going to be released under a free software
license? While I'm sure there's plenty of domain-specific stuff
there, I'm also sure it could be very useful to other projects which
are doing similar things.
I find it so frustrating when large publicly funded projects
funnel their money into proprietary software that others can't
develop further or build upon (cf. Transkribus). Here's hoping
you're planning to release the code soon!
Second, is there a way to access the plain text directly, or is
there only a search interface at the moment? Direct access to the
plain text would let others carry out further analysis on the
corpus.
Anyway, great work, looks exciting, congratulations!
Nick White
On Tue, Jun 20, 2017 at 03:12:12PM +0200, Dominique Stutzmann wrote:
> Dear all,
>
> Within the HIMANIS project, funded by the “Joint Programming Initiative on
> Cultural Heritage and Global Change” (JPI-CH) of the European Union, the
> partners are developing cost-effective solutions for querying large sets of
> handwritten document images. Bringing together IRHT and A2iA (France), the
> Universities of Valencia (Spain) and Groningen (Netherlands), and the French
> National Archives, the project unites Computer Science, Humanities and
> Cultural Heritage institutions to produce technology that generates new,
> research-based knowledge from historical manuscripts. As a challenging and
> particularly interesting case study, we have indexed the large collection of
> the Trésor des Chartes’ registers produced by the French royal chancery
> (Paris, Archives Nationales, JJ7 – JJ209).
>
> Now we are proud to announce that you can search the plain text in the Trésor
> des Chartes’ registers and provide feedback: It is ready to be used and tested
> by all interested users worldwide!
> http://prhlt-kws.prhlt.upv.es/himanis/
>
>
> This is a prototype in beta, which will be revised over the coming months
> with new features (navigating through hits, displaying abstracts and
> editions) and with additional volumes to be indexed from the French National
> Library and the National Archives.
>
> The project website is: http://www.himanis.org/
> The search interface into the corpus: http://prhlt-kws.prhlt.upv.es/himanis/
> Additional explanations about the interface: https://himanis.hypotheses.org/105
>
> You can search with boolean operators and word sequences (for the syntax,
> see https://himanis.hypotheses.org/105).
>
> You can help us measure the precision of our results:
> - please click on highlighted hits to confirm whether or not the word has
> been correctly spotted;
> - please double-click on a missed hit if you see it on the page (it will be
> added to the index, searchable by all users from the next day).
>
> Two simple examples to begin with:
> - "scriptor" within the whole corpus:
> http://prhlt-kws.prhlt.upv.es/himanis/index.php?q=scriptor&t=10&feedback=1
> - "pelerinage" on one page:
> http://prhlt-kws.prhlt.upv.es/himanis/index.php/ui/show/chancery/147/853?q=pelerinage&t=50&feedback=1
>
> The entire index results from an automated image-analysis process. You
> may find unexpected or false hits: for example, abbreviations are expanded
> automatically, which is, needless to say, error-prone; likewise, place and
> person names are spotted somewhat less reliably. You can refine the hit list
> by setting the "confidence" threshold (between 0 and 100).
>
> We hope that you will be as thrilled by these results as we are to present
> them, and we invite you to test the interface and send feedback, comments,
> criticism and suggestions to himanis@irht.cnrs.fr!
>
> Best regards
>
> Dominique Stutzmann
> ––
> M. Dominique Stutzmann
> Chargé de recherche à l'Institut de Recherche et d'Histoire des Textes (CNRS,
> UPR 841)
> Digital Medievalist -- http://www.digitalmedievalist.org/
> Journal: http://www.digitalmedievalist.org/journal/
> Journal Editors: editors _AT_ digitalmedievalist.org
> News: https://digitalmedievalist.wordpress.com/news/
> Twitter: http://twitter.com/digitalmedieval
> Facebook: http://www.facebook.com/group.php?gid=49320313760
> Discussion list: dm-l@uleth.ca
> Change list options: http://listserv.uleth.ca/mailman/listinfo/dm-l