More than 20 million words available at Finnish Center for Scientific Computing

Espoo 05 Mar 99 A digital resource of over 20 million running words of written standard Finnish has been made available to Finnish language researchers through the Language bank service operated by CSC, the Center for Scientific Computing. The resource is named Finnish text bank, version A, and was created in the EC-funded LE-PAROLE project during 1996-1998.

The texts for the text bank were collected and annotated by the Research Institute for the Languages of Finland and the Department of General Linguistics at the University of Helsinki. The texts were published between 1970 and 1997 and include electronic versions of books as well as major national newspapers and journals.

The texts are annotated according to the TEI (Text Encoding Initiative) P3 recommendations. The TEI is an application of SGML (Standard Generalized Markup Language) that recommends which textual features should be encoded (i.e. made explicit) in an electronic text, and how that encoding should be represented for loss-free, platform-independent interchange.

The next resource to be made available on the Language bank server will be the Oulu corpus of standard written Finnish of the 1960s. Large foreign resources, such as the British National Corpus, will also be licensed and made available on the Language bank server later.

CSC, the Center for Scientific Computing, is a service organization owned by the Finnish Ministry of Education. CSC offers researchers services in computing, databases and the Funet network (the Finnish University and Research Network).

Further information and applications for an account (in Finnish): www.csc.fi/kielipankki


Sandra Wermer