TCSE is a search engine specializing in exploring transcripts of TED Talks. It was created for educational and scientific purposes. TCSE uses data provided by TED under the Creative Commons BY-NC-ND license, but it is not an official TED service.
TCSE was created by Yoichiro Hasebe at Doshisha University, Kyoto, Japan, and is made available free of charge for non-commercial educational and scientific use. Please cite one of the following when you publish work that uses TCSE.

| Item | Value |
|---|---|
| TCSE Version | 12.1.0 |
| Date of talk data compilation | February 28, 2026 |
| English POS-Tagger / Syntactic Parser | spaCy 3.8 (en_core_web_lg) |
| Number of talks | 6,419 |
| Number of segments | 1,419,926 |
| Number of expanded segments | 677,487 |
| Number of elements | 13,017,589 |
| Number of lexical items | 106,707 |

| Transcript language | Number of talks |
|---|---|
| Arabic | 6,290 talks |
| Bulgarian | 2,344 talks |
| Burmese | 2,102 talks |
| Chinese, Simplified | 6,033 talks |
| Chinese, Traditional | 5,701 talks |
| Croatian | 2,062 talks |
| Czech | 1,792 talks |
| Dutch | 3,263 talks |
| French | 5,894 talks |
| German | 3,722 talks |
| Greek | 3,407 talks |
| Hebrew | 4,869 talks |
| Hindi | 1,202 talks |
| Hungarian | 3,932 talks |
| Indonesian | 3,651 talks |
| Italian | 5,559 talks |
| Japanese | 4,688 talks |
| Korean | 5,600 talks |
| Kurdish, Central | 1,429 talks |
| Kurdish, Northern | 1,144 talks |
| Persian | 4,183 talks |
| Polish | 3,823 talks |
| Portuguese | 5,055 talks |
| Portuguese, Brazilian | 5,400 talks |
| Romanian | 3,989 talks |
| Russian | 5,223 talks |
| Serbian | 3,076 talks |
| Slovak | 1,128 talks |
| Spanish | 6,291 talks |
| Swedish | 1,390 talks |
| Thai | 2,764 talks |
| Turkish | 5,395 talks |
| Ukrainian | 2,356 talks |
| Vietnamese | 5,679 talks |
How to skip to a specific segment
How to adjust sync between video and transcript
Occasionally the video and the transcript fall out of sync. For such cases, TCSE provides the following solution:
In the video playback view, the following text highlights are available:
Keywords of the Talk: Words with a TF-IDF score above 3.0 for the talk are underlined. TF-IDF (Term Frequency–Inverse Document Frequency) measures how important a word is to a particular talk relative to the entire corpus; higher values indicate words that are characteristic of that specific talk.
Discourse Markers: Common discourse markers (e.g. however, in other words, you know, I mean) are highlighted with a colored underline. These are words and phrases that organize speech, signal transitions, or manage the flow of conversation.
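The two highlight rules above can be illustrated with a short Python sketch. It is not TCSE's code: `tfidf_scores`, `keywords`, `find_markers` and `MARKERS` are hypothetical names, the TF-IDF variant (raw term count times ln(N/df)) is an assumption, and the marker list contains only the four examples given in the text.

```python
import math
from collections import Counter

# Keyword highlighting (assumed TF-IDF variant)

def tfidf_scores(talks: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each word of each talk as raw count * ln(N / df).

    This is one common TF-IDF variant, chosen here as an assumption;
    the exact formula behind TCSE's 3.0 threshold is not documented.
    """
    n_talks = len(talks)
    df = Counter()  # number of talks in which each word occurs
    for words in talks.values():
        df.update(set(words))
    return {
        talk_id: {w: c * math.log(n_talks / df[w]) for w, c in Counter(words).items()}
        for talk_id, words in talks.items()
    }

def keywords(scores: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Words whose score exceeds the threshold, i.e. the ones to underline."""
    return sorted(w for w, s in scores.items() if s > threshold)

# Discourse-marker highlighting

# Hypothetical list: only the four examples named in the text.
MARKERS = ["however", "in other words", "you know", "i mean"]

def find_markers(tokens: list[str], markers: list[str] = MARKERS) -> list[tuple[int, int, str]]:
    """Return (start, end, marker) token spans, trying longer markers first."""
    patterns = sorted((tuple(m.split()) for m in markers), key=len, reverse=True)
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        for pat in patterns:
            if tuple(lowered[i:i + len(pat)]) == pat:
                spans.append((i, i + len(pat), " ".join(pat)))
                i += len(pat)  # skip past the matched marker
                break
        else:  # no marker starts here; move one token forward
            i += 1
    return spans
```

With three toy talks in which only talk "a" repeats *ocean* three times, *ocean* scores 3 × ln 3 ≈ 3.30 there and is the only word above 3.0. A threshold like 3.0 is only meaningful for a particular formula (a length-normalized term frequency would give much smaller scores), so these numbers should not be read as TCSE's actual values. Likewise, plain string matching cannot tell the discourse marker *you know* from the question *do you know her*; separating the two would require context.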
Advanced search is available only in English.
Linguistic Reference (POS, Tags, Dependencies, Morphology)
POS keys use spaCy Universal POS names (e.g. {verb}, {noun}). Short aliases are also accepted: {v}=verb, {n}=noun, {a}/{j}=adj, {r}=adv, {pr}=pron.
An advanced search query string cannot consist only of POS keys.
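Here is a minimal sketch of how the alias rule and the POS-only restriction might be applied before a query runs. The alias table is copied from the text. `normalize_pos` and `expand_query` are hypothetical names, and the assumption that query elements are separated by spaces is illustrative; none of this is TCSE's implementation.

```python
import re

# Aliases listed in the help text; targets are lowercased spaCy Universal POS names.
POS_ALIASES = {"v": "verb", "n": "noun", "a": "adj", "j": "adj", "r": "adv", "pr": "pron"}

POS_KEY = re.compile(r"\{(\w+)\}")      # a {POS} key anywhere in an element
POS_ONLY = re.compile(r"^\{(\w+)\}$")   # an element that is nothing but a POS key

def normalize_pos(key: str) -> str:
    """Map a short alias to its full POS name; full names pass through unchanged."""
    key = key.lower()
    return POS_ALIASES.get(key, key)

def expand_query(query: str) -> str:
    """Expand {alias} keys and reject queries made only of POS keys (assumed check)."""
    elements = query.split()
    if elements and all(POS_ONLY.match(e) for e in elements):
        raise ValueError("an advanced search query cannot consist only of POS keys")
    return POS_KEY.sub(lambda m: "{" + normalize_pos(m.group(1)) + "}", query)
```

For example, `expand_query("[help]{n}")` yields `"[help]{noun}"`, while `expand_query("{a} {n}")` raises `ValueError` because the query contains nothing but POS keys.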

| Search item | Notation |
|---|---|
| Lemma | [LEMMA] |
| Part of Speech | {POS} |
| Surface + Part of Speech | SURFACE{POS} (no spaces in between) |
| Lemma + Part of Speech | [LEMMA]{POS} (no spaces in between) |
| Logical Disjunction (OR) | A\|B |
| Segment Onset (Beginning) | ^ |
| Noun Chunk | _ |
| Negative Match | -X |
| Wild Card (matching exactly one element/word) | -_ |
| Wild Card (matching a string of variable length) | * |
| Named Entity (NER) | %PERSON, %ORG, %GPE, %DATE, etc. |

| Query | Example matches |
|---|---|
| [excite] | excite, excites, excited, exciting |
| {noun} | Noun, any kind |
| {verb} | Verb, any kind |
| to * surprise | to our surprise, to his surprise, etc. |
| [read] {det} [news\|paper\|article] | they read these articles, reading the paper or something, I'm reading the news at six, etc. |
| ^ having {verb} | Having started the process, Having said that, etc. |
| [help]{noun} | an aunt offered financial help, we called people for help, etc. |
| [get] -rid of | get outside of, get ahead of, got tired of, etc. |
| [make] _ -_ | made a bad design good, make this happen, make your life miserable, etc. |
| [give] _ _ | give you an example, gave her a gift, give the government any further excuse, etc. |
| %PERSON said | Obama said, Einstein said, etc. |

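To show how single-element patterns combine into a query, here is a toy matcher. It assumes each token is already tagged with its surface form, lemma and Universal POS tag (as spaCy provides). It covers only SURFACE, [LEMMA], {POS}, their combinations, A|B, -X and ^. Noun chunks (_), wildcards and %ENTITY are deliberately left out. `Token`, `element_matches` and `search` are hypothetical names, not TCSE's code.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    surface: str
    lemma: str
    pos: str  # spaCy Universal POS tag, e.g. "VERB"

# SURFACE or [LEMMA], optionally followed by {POS}; each part may hold A|B alternatives.
ELEMENT = re.compile(
    r"^(?:\[(?P<lemma>[^\]]+)\]|(?P<surface>[^\[\]{}]+))?(?:\{(?P<pos>[^}]+)\})?$"
)

def element_matches(element: str, tok: Token) -> bool:
    """Test one query element against one token."""
    negated = element.startswith("-") and len(element) > 1
    if negated:
        element = element[1:]
    if element in ("_", "*") or element.startswith("%"):
        raise NotImplementedError("noun chunks, wildcards and entities are outside this sketch")
    m = ELEMENT.match(element)
    if m is None:
        raise ValueError(f"unsupported element: {element!r}")
    ok = True
    for field, value in (("lemma", tok.lemma), ("surface", tok.surface), ("pos", tok.pos)):
        if m[field]:  # compare case-insensitively against every A|B alternative
            ok = ok and value.lower() in m[field].lower().split("|")
    return ok != negated  # -X flips the result

def search(query: str, tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (start, end) token spans where every query element matches in sequence."""
    elements = query.split()
    anchored = bool(elements) and elements[0] == "^"  # ^ = segment onset
    if anchored:
        elements = elements[1:]
    if not elements:
        return []
    starts = [0] if anchored else range(len(tokens) - len(elements) + 1)
    hits = []
    for s in starts:
        window = tokens[s:s + len(elements)]
        if len(window) == len(elements) and all(
            element_matches(e, t) for e, t in zip(elements, window)
        ):
            hits.append((s, s + len(elements)))
    return hits
```

For instance, `search("[get] -rid of", ...)` matches the tokens of "get ahead of" but not those of "get rid of", mirroring the example row above. POS aliases such as {v} would need to be expanded before this step.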
In Advanced Search mode (check the "Advanced Search" checkbox), you can use %ENTITY notation to search for named entities recognized by spaCy. Multi-token entities (e.g. "New York", "United Nations") are matched as a single unit. NER patterns (e.g. %PERSON) can also be used in N-gram mode. The following entity types are available:

| Notation | Description | Count |
|---|---|---|
| %CARDINAL | Numerals not covered by other types | 73,912 |
| %DATE | Absolute or relative dates or periods | 72,487 |
| %PERSON | People, including fictional | 59,525 |
| %GPE | Countries, cities, states | 48,806 |
| %ORG | Companies, agencies, institutions | 47,748 |
| %ORDINAL | "first", "second", etc. | 21,850 |
| %NORP | Nationalities, religious or political groups | 21,830 |
| %LOC | Non-GPE locations (mountain ranges, bodies of water) | 14,512 |
| %TIME | Times smaller than a day | 9,389 |
| %PERCENT | Percentage (including "%") | 8,184 |
| %QUANTITY | Measurements (weight, distance) | 6,854 |
| %WORK_OF_ART | Titles of books, songs, etc. | 6,046 |
| %MONEY | Monetary values | 5,108 |
| %PRODUCT | Objects, vehicles, foods (not services) | 3,470 |
| %FAC | Buildings, airports, highways, bridges | 2,649 |
| %EVENT | Named hurricanes, battles, wars, sports events | 2,165 |
| %LANGUAGE | Any named language | 1,557 |
| %LAW | Named documents made into laws | 758 |
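The statement that multi-token entities are matched as a single unit can be illustrated by first collapsing BIO-tagged tokens into one element per entity and then matching a pattern such as `%PERSON said`. This is a sketch under assumptions. The `(text, iob, label)` tuples mirror what spaCy exposes per token as `token.ent_iob_` and `token.ent_type_`, but `Element`, `collapse_entities` and `match_ner` are hypothetical helpers, not TCSE's code.

```python
from typing import NamedTuple

class Element(NamedTuple):
    text: str
    ent: str  # entity label such as "PERSON", or "" for non-entity tokens

def collapse_entities(tokens: list[tuple[str, str, str]]) -> list[Element]:
    """Collapse BIO-tagged tokens (text, iob, label) into elements, one per entity."""
    elements: list[Element] = []
    for text, iob, label in tokens:
        if iob == "I" and elements and elements[-1].ent == label:
            # continuation of the current entity: extend it (joined with one space)
            prev = elements[-1]
            elements[-1] = Element(prev.text + " " + text, label)
        elif iob in ("B", "I"):
            elements.append(Element(text, label))
        else:
            elements.append(Element(text, ""))
    return elements

def match_ner(query: str, elements: list[Element]) -> list[str]:
    """Return the matched text for each hit of a query such as '%PERSON said'."""
    pattern = query.split()
    hits = []
    for s in range(len(elements) - len(pattern) + 1):
        window = elements[s:s + len(pattern)]
        if all(
            e.ent == p[1:] if p.startswith("%") else e.text.lower() == p.lower()
            for p, e in zip(pattern, window)
        ):
            hits.append(" ".join(e.text for e in window))
    return hits
```

Because "Barack Obama" becomes one element, `%PERSON said` needs only two elements to match "Barack Obama said", regardless of how many tokens the name spans.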