Recent Projects


TCSE: TED Corpus Search Engine

A versatile corpus system for retrieving video and text segments from over 2,300 TED Talks.


jReadability is a Japanese text readability measurement system being developed in collaboration with Jaeho Lee at Waseda University.


RSyntaxTree is a graphical syntax tree generator written in the Ruby programming language.


WP2TXT extracts plain text from Wikipedia dump files, stripping all MediaWiki markup and other metadata.


EngTagger is a probability-based, corpus-trained tagger that assigns POS tags to English text using a lookup dictionary and a set of probability values.


Paradocs is a paragraph-oriented document presentation system created using Reveal.js.

Intro to BYU Corpora [in Japanese]

A tutorial on how to use COCA and other BYU corpora [in Japanese]

Using TCSE [in Japanese]

A tutorial on how to use the TED Corpus Search Engine (TCSE) [in Japanese]


at Doshisha University

Kyotanabe Campus, Doshisha University  

All the classes in Spring Semester 2020 at Doshisha University are taught online.


  • Faculty of Global Communications
    Doshisha University
    1-3 Tatara Miyakodani, Kyotanabe-shi, Kyoto-fu, 610-0394, JAPAN