---
title: Natural Language Processing
localeTitle: Natural Language Processing
---

## Natural Language Processing (NLP)

As Wikipedia puts it, "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data."

Put simply, NLP is the process by which computers make sense of natural language produced by humans.

### Challenges in NLP

#### 1. Easy or mostly solved

* Spam detection
* Part-of-speech tagging
* Named entity recognition

#### 2. Intermediate or making good progress

* Sentiment analysis
* Coreference resolution
* Word sense disambiguation
* Parsing
* Machine translation
* Information extraction

#### 3. Hard or still needs a lot of work

* Text summarization
* Machine dialog systems

### Common techniques

* Structure extraction
* Identifying and marking sentence, phrase, and paragraph boundaries
* Language identification
* Tokenization
* Acronym normalization and tagging
* Lemmatization / stemming
* Entity extraction
* Phrase extraction

### Popular libraries

* NLTK, the most widely mentioned NLP library for Python.
* spaCy, an industrial-strength NLP library built for performance.
* Gensim, a library for document similarity analysis.
* TextBlob, a user-friendly and intuitive interface to NLTK.
* CoreNLP, from the Stanford NLP Group.
* Polyglot, a natural language pipeline that supports massively multilingual applications.

#### More information

* See [this article](https://medium.com/@gon.esbuyo/get-started-with-nlp-part-i-d67ca26cc828) for an introduction to NLP.
* See the [Wikipedia entry](https://en.wikipedia.org/wiki/Natural_language_processing) on natural language processing.
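
### Example: basic text preprocessing

A few of the techniques listed above (sentence boundary detection, tokenization, and stemming) can be illustrated with a rough sketch that uses only the Python standard library. The function names `split_sentences`, `tokenize`, and `naive_stem` are hypothetical and written for this example. The suffix-stripping rule is a toy stand-in, not the algorithm used by any real stemmer. Libraries such as NLTK or spaCy handle the many edge cases (abbreviations, irregular forms, other languages) that this sketch ignores.

```python
import re

# All helper names below are hypothetical; they do not come from any NLP library.

def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase the sentence and return runs of word characters (punctuation is dropped)."""
    return re.findall(r"\w+", sentence.lower())

def naive_stem(token):
    """Strip one common English suffix, keeping at least three characters.

    This is a toy rule for illustration, not a real stemming algorithm.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Computers are processing language. It works well!"
for sentence in split_sentences(text):
    print([naive_stem(t) for t in tokenize(sentence)])
# ['computer', 'are', 'process', 'language']
# ['it', 'work', 'well']
```

Each step feeds the next: the text is first cut into sentences, each sentence into tokens, and each token is reduced to a cruder base form so that variants like "works" and "work" can be counted together.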