A cache is like short-term memory. It is typically faster than the original data source because it lives in memory, and accessing data in memory is faster than reading it from a hard drive.
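To make the idea concrete, here is a minimal sketch of an in-memory cache sitting in front of a slower data source. The names `ReadThroughCache` and `slow_lookup` are made up for illustration, and the `time.sleep` call only simulates disk latency; a real cache would also need an eviction and invalidation policy.

```python
import time


class ReadThroughCache:
    """Tiny in-memory cache in front of a slower data source."""

    def __init__(self, load_from_source):
        self._load = load_from_source  # hypothetical slow loader (disk, DB, network)
        self._store = {}               # in-memory storage: key -> value
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Serve from memory when possible.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Otherwise fall back to the slow source and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value


def slow_lookup(key):
    """Stand-in for a slow data source; the sleep simulates disk latency."""
    time.sleep(0.01)
    return key.upper()


cache = ReadThroughCache(slow_lookup)
first = cache.get("user:1")   # miss: goes to the slow source
second = cache.get("user:1")  # hit: served from memory
```

Only the first call pays the cost of the slow source; the second is answered from the dictionary held in memory.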

When discussing system design, we may need to clarify the following questions. …

[ToolKit] VerbNet API Tutorial

(This article is reposted from my previous blog website)

To implement language understanding, we leverage resources such as syntactic frames, and VerbNet is one of them.

What’s VerbNet?
VerbNet is a class-based verb lexicon. Each verb in VerbNet is described by its semantic role…
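As a rough illustration of what "class-based" means, the sketch below models a verb lexicon in which verbs are grouped into classes and each class carries a set of semantic roles. The `VerbClass` and `VerbLexicon` types are hypothetical, and the single entry is typed in by hand for demonstration; it is not loaded from VerbNet and may not match the official release exactly.

```python
from dataclasses import dataclass, field


@dataclass
class VerbClass:
    """One class of the lexicon: verbs that share roles and frames."""
    class_id: str
    members: set
    roles: list
    frames: list = field(default_factory=list)


class VerbLexicon:
    """Hypothetical in-memory lexicon indexed by verb."""

    def __init__(self):
        self._classes = {}  # class_id -> VerbClass
        self._by_verb = {}  # verb -> list of class_ids

    def add(self, verb_class):
        self._classes[verb_class.class_id] = verb_class
        for verb in verb_class.members:
            self._by_verb.setdefault(verb, []).append(verb_class.class_id)

    def classes_of(self, verb):
        """Return every class the verb belongs to (empty list if unknown)."""
        return [self._classes[cid] for cid in self._by_verb.get(verb, [])]


lexicon = VerbLexicon()
# Illustrative, hand-written entry; not read from the VerbNet data files.
lexicon.add(VerbClass(
    class_id="give-13.1",
    members={"give", "lend", "pass"},
    roles=["Agent", "Theme", "Recipient"],
    frames=["NP V NP PP.recipient"],
))

roles = [c.roles for c in lexicon.classes_of("lend")]
```

Because the roles belong to the class rather than to each verb, looking up "lend" returns the same roles as "give".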

(This article is reposted from my previous blog website, which was posted on May 18 2017)

Nowadays we use wiki-data as a resource because of its broad coverage.
To train word representations, I used wiki-data as the input.

The following are the steps to extract wiki-data as preprocessing for machine learning.

Step 1.
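Download wiki-data (choose the file whose name ends with ‘pages-articles.xml.bz2’),
or fetch it from the command line, giving wget the full URL of the dump file (written here as the placeholder <dump-url>): wget <dump-url>/enwiki-20170501-pages-articles.xml.bz2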

Step 2.
Get the wiki extractor:

git clone https://github.com/zhaoshiyu/WikiExtractor.git

Step 3.
bzcat enwiki-20170501-pages-articles.xml.bz2 | python WikiExtractor-zsy.py -b200M -o extracted > vocabulary.txt

(-b 200M means each output file is at most 200M; the default value is 500K.)
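Once the extraction has finished, the output still has to be read back in before training. The sketch below assumes the extractor writes each article wrapped in `<doc id="..." url="..." title="...">` and `</doc>` lines, as the upstream WikiExtractor does; the fork used above may differ, so check one output file first. The function name `iter_articles` and the sample lines are made up for illustration.

```python
import re

# Assumed header format of each extracted article (upstream WikiExtractor style).
DOC_OPEN = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">'
)


def iter_articles(lines):
    """Yield (title, text) pairs from an iterable of extracted lines."""
    title, body = None, []
    for line in lines:
        line = line.rstrip("\n")
        match = DOC_OPEN.match(line)
        if match:
            # A new article starts here.
            title, body = match.group("title"), []
        elif line.startswith("</doc>"):
            # The article is complete; emit it and reset.
            if title is not None:
                yield title, " ".join(body)
            title, body = None, []
        elif title is not None and line.strip():
            body.append(line.strip())


# Made-up sample standing in for one extracted file.
sample = [
    '<doc id="1" url="https://example.org/wiki?curid=1" title="Cache">\n',
    "Cache\n",
    "\n",
    "A cache stores data in memory.\n",
    "</doc>\n",
]
articles = list(iter_articles(sample))
```

For real data, the same function can be fed an open file handle for each file under the `extracted` directory, since it only needs an iterable of lines.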


(This article is reposted from my previous blog website)

This post commemorates the day I shared the concept of Semantic Role Labeling (SRL) with the R-Ladies community in Taiwan.

I will write another post to introduce what SRL is and how syntactic and semantic analysis differ. However, if you are eager to see the difference between the two now, I have written a brief introduction and a module for that.

HaoWei He

Software Engineer. Interested in NLP, algorithm, Data Science & Cycling.
