title | localeTitle |
---|---|
Natural Language Processing | 自然语言处理 |
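Natural Language Processing (NLP)
As Wikipedia puts it, "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data." Put simply, NLP is the process by which computers take in and interpret language produced by humans.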
Challenges in NLP
1. Easy or mostly solved
*Spam detection
*Part of Speech Tagging
*Named Entity Recognition
2. Intermediate or making good progress
*Sentiment analysis
*Coreference resolution
*Word sense disambiguation
*Parsing
*Machine Translation
*Information Extraction
3. Hard or still needing a lot of work
*Text Summarization
*Machine dialog system
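To make the "mostly solved" end of this list concrete, here is a minimal sketch of spam detection using a naive Bayes classifier with add-one smoothing. It uses only the Python standard library. The training messages, the `spam`/`ham` labels, and the function names `train` and `classify` are all hypothetical, chosen for illustration; a real filter would learn from thousands of messages and use a proper tokenizer.

```python
import math
from collections import Counter

# Hypothetical toy training data, far too small for real use.
TRAIN = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        # Log prior: fraction of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("win a free prize", model))
print(classify("meeting tomorrow at noon", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.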
Common Techniques
*Structure extraction
*Identify and mark sentence, phrase, and paragraph boundaries
*Language identification
*Tokenization
*Acronym normalization and tagging
*Lemmatization / Stemming
*Entity extraction
*Phrase extraction
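Three of the techniques above (sentence boundary detection, tokenization, and stemming) can be sketched with the Python standard library alone. This is a deliberately crude illustration: the function names and the suffix list are hypothetical, and the rules are much simpler than what libraries such as NLTK or spaCy implement.

```python
import re

def split_sentences(text):
    """Toy sentence splitter: break after ., ! or ? followed by whitespace.
    Real splitters must also handle abbreviations such as "Dr." or "e.g."."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Toy tokenizer: lowercase, then keep runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

# Hypothetical suffix list, checked in order; far cruder than the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(token):
    """Toy stemmer: strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Computers are parsing sentences. They tokenized the words!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
stems = [[stem(t) for t in sent] for sent in tokens]
print(stems)
```

Note that stemming produces truncated forms such as `pars` rather than dictionary words; lemmatization, by contrast, maps each word to a real base form and generally needs a vocabulary or a trained model.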
Common Libraries
*NLTK, the most widely mentioned NLP library for Python.
*spaCy, an industrial-strength NLP library built for performance.
*Gensim, a library for document similarity analysis.
*TextBlob, a user-friendly and intuitive NLTK interface.
*CoreNLP, from the Stanford NLP Group.
*Polyglot, a natural language pipeline that supports massive multilingual applications.
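To give a sense of what "document similarity analysis" means in practice, the sketch below computes the cosine similarity of two texts from their bag-of-words counts. It is not Gensim's API: it uses only the Python standard library, and the function name `cosine_similarity` is a hypothetical choice for this illustration. Libraries like Gensim apply the same idea to much richer representations, such as TF-IDF weights or topic vectors.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    # Whitespace tokenization is an assumption made to keep the sketch short.
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)

print(cosine_similarity("the cat sat", "the cat ran"))
print(cosine_similarity("the cat", "a dog"))
```

A score near 1 means the two texts use largely the same words in similar proportions, while 0 means they share no words at all.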
More information:
Further reading: