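Resource overview
A basic Python implementation of the TF-IDF algorithm that is simple and easy to use. It depends on a few third-party libraries (NLTK and scikit-learn).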
Code snippet and file information
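# word_tokenize and the stopword list rely on NLTK data packages;
# fetch them once with nltk.download() if they are missing.
import nltk
import math
import string
from nltk.corpus import stopwords
from collections import Counter
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer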
text1 = "Python is a 2000 made-for-TV horror movie directed by Richard \
Clabaugh. The film features several cult favorite actors, including William \
Zabka of The Karate Kid fame, Wil Wheaton, Casper Van Dien, Jenny McCarthy, \
Keith Coogan, Robert Englund (best known for his role as Freddy Krueger in the \
A Nightmare on Elm Street series of films), Dana Barron, David Bowe, and Sean \
Whalen. The film concerns a genetically engineered snake, a python, that \
escapes and unleashes itself on a small town. It includes the classic final \
girl scenario evident in films like Friday the 13th. It was filmed in Los Angeles, \
California and Malibu, California. Python was followed by two sequels: Python \
II (2002) and Boa vs. Python (2004), both also made-for-TV films."
text2 = "Python, from the Greek word (πύθων/πύθωνας), is a genus of \
nonvenomous pythons[2] found in Africa and Asia. Currently, 7 species are \
recognised.[2] A member of this genus, P. reticulatus, is among the longest \
snakes known."
text3 = "The Colt Python is a .357 Magnum caliber revolver formerly \
manufactured by Colt's Manufacturing Company of Hartford, Connecticut. \
It is sometimes referred to as a \"Combat Magnum\".[1] It was first introduced \
in 1955, the same year as Smith & Wesson's M29 .44 Magnum. The now discontinued \
Colt Python targeted the premium revolver market segment. Some firearm \
collectors and writers such as Jeff Cooper, Ian V. Hogg, Chuck Hawks, Leroy \
Thompson, Renee Smeets and Martin Dougherty have described the Python as the \
finest production revolver ever made."
def get_tokens(text):
    """Lowercase the text, strip punctuation, and split it into word tokens."""
    lowers = text.lower()
    # remove the punctuation using the character deletion step of translate
    remove_punctuation_map = str.maketrans('', '', string.punctuation)
    no_punctuation = lowers.translate(remove_punctuation_map)
    tokens = nltk.word_tokenize(no_punctuation)
    return tokens

tokens = get_tokens(text1)
count = Counter(tokens)
print(count.most_common(10))

def stem_tokens(tokens, stemmer):
    """Return the stem of every token, in the original order."""
    return [stemmer.stem(item) for item in tokens]
# Build the stopword set and the stemmer once and reuse them for every text.
english_stopwords = set(stopwords.words('english'))
stemmer = PorterStemmer()

tokens = get_tokens(text1)
filtered = [w for w in tokens if w not in english_stopwords]
stemmed = stem_tokens(filtered, stemmer)
count1 = Counter(stemmed)
# print(count1)

tokens = get_tokens(text2)
filtered = [w for w in tokens if w not in english_stopwords]
stemmed = stem_tokens(filtered, stemmer)
count2 = Counter(stemmed)
# print(count2)

tokens = get_tokens(text3)
filtered = [w for w in tokens if w not in english_stopwords]
stemmed = stem_tokens(filtered, stemmer)
coun
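
The preview is cut off before the scoring step, so the rest of tf-idf.py is not shown. As a rough illustration only, here is a minimal sketch of how TF-IDF scores could be computed from per-document counters such as count1 and count2. It assumes the textbook definition tf × log(N / df). The function names (tf, n_containing, idf, tfidf, top_terms) and the toy counters are hypothetical and are not taken from the original file.

```python
import math
from collections import Counter


def tf(word, count):
    """Term frequency: occurrences of `word` divided by all tokens in the document."""
    return count[word] / sum(count.values())


def n_containing(word, count_list):
    """Document frequency: number of counters that contain `word`."""
    return sum(1 for count in count_list if word in count)


def idf(word, count_list):
    """Inverse document frequency, log(N / df); 0.0 if no document has the word."""
    df = n_containing(word, count_list)
    if df == 0:
        return 0.0
    return math.log(len(count_list) / df)


def tfidf(word, count, count_list):
    """TF-IDF score of `word` for one document within a small corpus."""
    return tf(word, count) * idf(word, count_list)


def top_terms(count, count_list, n=3):
    """Return the n highest-scoring (word, score) pairs for one document."""
    scores = {word: tfidf(word, count, count_list) for word in count}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy counters standing in for count1, count2, ... (made-up values).
docs = [
    Counter({"python": 2, "film": 1, "snake": 1}),
    Counter({"python": 1, "snake": 2, "genu": 1}),
    Counter({"python": 2, "revolv": 1, "colt": 1}),
]

for i, doc in enumerate(docs, start=1):
    print("Top terms in document", i)
    for word, score in top_terms(doc, docs):
        print("  {}: {:.5f}".format(word, score))
```

With this definition, a word that occurs in every document (here "python") scores 0, while words unique to one document rank highest. The imported TfidfVectorizer from scikit-learn applies a smoothed and normalized variant, so its numbers would differ from this sketch.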
Attribute   Size      Date       Time  Name
----------- --------- ---------- ----- ----
File        3796      2018-06-03 12:01 tf-idf.py
----------- --------- ---------- ----- ----
            3796                       1