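Resource description
A basic, easy-to-use Python implementation of the TF-IDF algorithm. It relies on a few third-party libraries (NLTK and scikit-learn).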
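
For reference, one common textbook formulation of TF-IDF is given below. Variants differ in smoothing and in the base of the logarithm, so this is not necessarily the exact variant used in tf-idf.py. Here f_{t,d} is the number of times term t occurs in document d, and N is the number of documents in the corpus D:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing to the score regardless of how often it occurs.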

Code snippet and file information
import nltk
import math
import string
from nltk.corpus import stopwords
from collections import Counter
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
text1 = "Python is a 2000 made-for-TV horror movie directed by Richard \
Clabaugh. The film features several cult favorite actors, including William \
Zabka of The Karate Kid fame, Wil Wheaton, Casper Van Dien, Jenny McCarthy, \
Keith Coogan, Robert Englund (best known for his role as Freddy Krueger in the \
A Nightmare on Elm Street series of films), Dana Barron, David Bowe, and Sean \
Whalen. The film concerns a genetically engineered snake, a python, that \
escapes and unleashes itself on a small town. It includes the classic final \
girl scenario evident in films like Friday the 13th. It was filmed in Los Angeles, \
California and Malibu, California. Python was followed by two sequels: Python \
II (2002) and Boa vs. Python (2004), both also made-for-TV films."
text2 = "Python, from the Greek word (πύθων/πύθωνας), is a genus of \
nonvenomous pythons[2] found in Africa and Asia. Currently, 7 species are \
recognised.[2] A member of this genus, P. reticulatus, is among the longest \
snakes known."
text3 = "The Colt Python is a .357 Magnum caliber revolver formerly \
manufactured by Colt's Manufacturing Company of Hartford, Connecticut. \
It is sometimes referred to as a \"Combat Magnum\".[1] It was first introduced \
in 1955, the same year as Smith & Wesson's M29 .44 Magnum. The now discontinued \
Colt Python targeted the premium revolver market segment. Some firearm \
collectors and writers such as Jeff Cooper, Ian V. Hogg, Chuck Hawks, Leroy \
Thompson, Renee Smeets and Martin Dougherty have described the Python as the \
finest production revolver ever made."
def get_tokens(text):
    lowers = text.lower()
    # Remove punctuation: str.translate() deletes every character mapped to None
    remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
    no_punctuation = lowers.translate(remove_punctuation_map)
    tokens = nltk.word_tokenize(no_punctuation)
    return tokens

# Raw token counts for the first text (stop words still included)
tokens = get_tokens(text1)
count = Counter(tokens)
print(count.most_common(10))
def stem_tokens(tokens, stemmer):
    stemmed = []
    for item in tokens:
        stemmed.append(stemmer.stem(item))
    return stemmed

# Build the stop-word set once instead of reloading the list for every token
english_stopwords = set(stopwords.words('english'))

# Text 1: tokenize, drop stop words, stem, count
tokens = get_tokens(text1)
filtered = [w for w in tokens if w not in english_stopwords]
stemmer = PorterStemmer()
stemmed = stem_tokens(filtered, stemmer)
count1 = Counter(stemmed)
# print (count)

# Text 2: same pipeline
tokens = get_tokens(text2)
filtered = [w for w in tokens if w not in english_stopwords]
stemmer = PorterStemmer()
stemmed = stem_tokens(filtered, stemmer)
count2 = Counter(stemmed)
# print (count)

# Text 3: same pipeline
tokens = get_tokens(text3)
filtered = [w for w in tokens if w not in english_stopwords]
stemmer = PorterStemmer()
stemmed = stem_tokens(filtered, stemmer)
coun
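
The listing breaks off at this point. As a rough sketch of how per-document counts such as count1 and count2 could be turned into TF-IDF scores, the following uses only the standard library. The helper names (tf, idf, tf_idf, top_terms) and the toy token lists are hypothetical and not taken from tf-idf.py, and the unsmoothed idf = log(N / df) is an assumption; the original file may use a different variant.

```python
import math
from collections import Counter


def tf(term, counts):
    """Term frequency: occurrences of `term` divided by the document's token total."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def idf(term, all_counts):
    """Inverse document frequency: log(N / number of documents containing `term`)."""
    containing = sum(1 for counts in all_counts if term in counts)
    if containing == 0:
        return 0.0  # term absent from the corpus; avoid division by zero
    return math.log(len(all_counts) / containing)


def tf_idf(term, counts, all_counts):
    """TF-IDF score of `term` for one document within the corpus."""
    return tf(term, counts) * idf(term, all_counts)


def top_terms(counts, all_counts, n=3):
    """Return the n highest-scoring (term, score) pairs for one document."""
    scores = {term: tf_idf(term, counts, all_counts) for term in counts}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy stand-ins for the stemmed, stop-word-filtered Counters (hypothetical data)
docs = [
    Counter(["python", "film", "snake", "film"]),
    Counter(["python", "snake", "genus"]),
    Counter(["python", "revolver", "colt"]),
]

for i, counts in enumerate(docs, start=1):
    print("Top terms in document", i)
    for term, score in top_terms(counts, docs):
        print("    {}: {:.5f}".format(term, score))
```

In this toy corpus "python" occurs in all three documents, so its idf is log(3/3) = 0 and it never ranks as a distinguishing term, which is the behavior the three "Python" texts above are meant to illustrate.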
Attribute   Size      Date       Time  Name
----------- --------- ---------- ----- ----
File        3796      2018-06-03 12:01 tf-idf.py
----------- --------- ---------- ----- ----
            3796                       1