---
title: Natural Language Processing
localeTitle: Natural Language Processing
---

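## Natural Language Processing (NLP)

As Wikipedia puts it, "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data." Put simply, NLP is the process by which computers take in and make sense of natural language produced by humans.
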
### Challenges in NLP

#### 1. Easy or mostly solved

* Spam detection
* Part-of-speech tagging
* Named entity recognition

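
To make the first task in the list above concrete, here is a minimal sketch of spam detection using a multinomial Naive Bayes classifier with add-one smoothing. It uses only the Python standard library. The class name `NaiveBayesSpamFilter`, the `tokenize` helper, and the toy training messages are all assumptions made for illustration; they do not come from any particular library or dataset, and a real filter would train on far more data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labelled message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text, label):
        """Return log P(label) plus the sum of log P(word | label)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the probability.
            log_prob += math.log((count + 1) / (total_words + len(vocab)))
        return log_prob

    def predict(self, text):
        """Return the label with the higher score."""
        return max(self.LABELS, key=lambda label: self.score(text, label))


# Hypothetical toy training data, invented for illustration only.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("please review the project report", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
for text, label in TRAINING_DATA:
    spam_filter.train(text, label)

print(spam_filter.predict("free prize money"))
print(spam_filter.predict("team meeting tomorrow"))
```

With this toy data the first message should be labelled spam and the second ham, because their words appear only in the corresponding class of training messages.
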

#### 2. Intermediate or making good progress

* Sentiment analysis
* Coreference resolution
* Word sense disambiguation
* Parsing
* Machine translation
* Information extraction


#### 3. Hard or still needs a lot of work

* Text summarization
* Machine dialog systems


### Common Techniques

* Structure extraction
* Identifying and marking sentence, phrase, and paragraph boundaries
* Language identification
* Tokenization
* Acronym normalization and tagging
* Lemmatization / stemming
* Entity extraction
* Phrase extraction

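
Three of the techniques listed above (sentence boundary detection, tokenization, and stemming) can be sketched with nothing but the Python standard library. This is a deliberately naive illustration, not how production libraries do it: the function names and the `SUFFIXES` list are hypothetical, the sentence splitter breaks on abbreviations such as "Dr.", and the stemmer only strips a few English suffixes.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (naive boundary rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


# Hypothetical suffix list for a crude stemmer; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "The cats were chasing mice. Dogs barked loudly!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [[stem(t) for t in sent] for sent in tokens]

print(sentences)
print(tokens)
print(stems)
```

Note that the crude stemmer turns "chasing" into "chas" and leaves the irregular plural "mice" untouched. Lemmatization, which maps words to dictionary forms, is the usual way to handle such cases and needs a vocabulary that this sketch does not include.
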

### Popular Libraries

* NLTK, the most widely mentioned NLP library for Python.
* spaCy, an industrial-strength NLP library built for performance.
* Gensim, a library for document similarity analysis.
* TextBlob, a user-friendly and intuitive interface to NLTK.
* CoreNLP, from the Stanford NLP Group.
* Polyglot, a natural language pipeline that supports massive multilingual applications.


#### More Information:

Further reading:

* See [this article](https://medium.com/@gon.esbuyo/get-started-with-nlp-part-i-d67ca26cc828) for an introduction to NLP.
* See the [Wikipedia article](https://en.wikipedia.org/wiki/Natural_language_processing) on natural language processing.