- Size: 246KB
- File type: .zip
- Coins: 1
- Downloads: 0
- Published: 2021-08-26
- Language: Other
- Tags: ElasticSearch webmagic java search engine
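Resource description
This system is built on the Spring Boot framework, integrated with several other technologies. The webmagic framework implements a single-node web crawler. Its lifecycle consists of link extraction, page download, content extraction and persistence. Pages are fetched by multiple threads, and a Redis queue and set provide page deduplication and incremental crawling.

The indexing and search subsystem is built on the full-text search engine ElasticSearch, with the IK analyzer handling Chinese word segmentation. ElasticSearch is an enterprise-grade, distributed, highly scalable, near-real-time search and analytics engine. It can search many kinds of documents and offers efficient search, analysis and exploration over very large data sets.

Finally, a simple web search page acts as a mock search engine client.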
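The crawl cycle described above combines page download, link extraction, content extraction and persistence. It runs on several threads and uses a queue plus a set for deduplication. A minimal plain-Java sketch can illustrate how these pieces fit together. It is not the project's code and uses neither webmagic nor Redis. It makes the following assumptions:

- An in-memory `LinkedBlockingQueue` and a concurrent set stand in for the Redis list and set.
- A hypothetical `Map` from URL to HTML replaces real HTTP downloads.
- The persistence step only records each page title in a map, where the real system would index into ElasticSearch.

The names `CrawlSketch`, `push`, `process` and `run` are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-node crawl loop; not webmagic's API. */
public class CrawlSketch {
    // Regex extraction is only for brevity; a real crawler would use an HTML parser.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Assumption: in-memory stand-ins for the Redis LIST (pending URLs) and SET (seen URLs).
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    // URLs pushed but not yet fully processed; workers stop once this reaches 0.
    private final AtomicInteger pending = new AtomicInteger();

    private final Map<String, String> web;                        // hypothetical offline "web": URL -> HTML
    final Map<String, String> store = new ConcurrentHashMap<>();  // persisted URL -> title

    CrawlSketch(Map<String, String> web) { this.web = web; }

    /** Deduplicating enqueue: like Redis SADD, only a never-seen URL is added. */
    boolean push(String url) {
        if (!seen.add(url)) return false;
        pending.incrementAndGet();  // count before queueing so pending never drops to 0 early
        queue.add(url);
        return true;
    }

    private void process(String url) {
        String html = web.get(url);                        // page download (stubbed lookup)
        if (html == null) return;
        Matcher links = LINK.matcher(html);                // link extraction
        while (links.find()) push(links.group(1));
        Matcher title = TITLE.matcher(html);               // content extraction
        if (title.find()) store.put(url, title.group(1));  // persistence (stand-in for indexing)
    }

    /** Runs the crawl on a fixed pool of worker threads until no work remains. */
    void run(int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    while (pending.get() > 0) {
                        String url = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (url == null) continue;  // another worker may still add links
                        try {
                            process(url);
                        } finally {
                            pending.decrementAndGet();  // children were pushed before this
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> web = Map.of(
                "/a", "<title>A</title><a href=\"/b\"></a><a href=\"/c\"></a>",
                "/b", "<title>B</title><a href=\"/a\"></a>",
                "/c", "<title>C</title><a href=\"/b\"></a>");
        CrawlSketch crawler = new CrawlSketch(web);
        crawler.push("/a");
        crawler.run(4);
        System.out.println(crawler.store);  // three entries, one per distinct URL
    }
}
```

The set is what makes deduplication work: a URL is queued only the first time it is added, much as a Redis `SADD` returns 1 only for a new member. The description says the project keeps that set in Redis rather than in memory. Doing so would let a later run skip URLs already crawled, which is the basis of incremental crawling. In webmagic itself these roles are split roughly across its `Downloader`, `PageProcessor`, `Pipeline` and `Scheduler` components.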
Code snippet and file information
/*
* Copyright 2007-present the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.net.*;
import java.io.*;
import java.nio.channels.*;
import java.util.Properties;

public class MavenWrapperDownloader {
    private static final String WRAPPER_VERSION = "0.5.6";
    /**
     * Default URL to download the maven-wrapper.jar from if no 'downloadUrl' is provided.
     */
    private static final String DEFAULT_DOWNLOAD_URL = "https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/"
            + WRAPPER_VERSION + "/maven-wrapper-" + WRAPPER_VERSION + ".jar";
    /**
     * Path to the maven-wrapper.properties file which might contain a downloadUrl property to
     * use instead of the default one.
     */
    private static final String MAVEN_WRAPPER_PROPERTIES_PATH =
            ".mvn/wrapper/maven-wrapper.properties";
    /**
     * Path where the maven-wrapper.jar will be saved to.
     */
    private static final String MAVEN_WRAPPER_JAR_PATH =
            ".mvn/wrapper/maven-wrapper.jar";
    /**
     * Name of the property which should be used to override the default download url for the wrapper.
     */
    private static final String PROPERTY_NAME_WRAPPER_URL = "wrapperUrl";

    public static void main(String args[]) {
        System.out.println("- Downloader started");
        File baseDirectory = new File(args[0]);
        System.out.println("- Using base directory: " + baseDirectory.getAbsolutePath());
        // If the maven-wrapper.properties exists, read it and check if it contains a custom
        // wrapperUrl parameter.
        File mavenWrapperPropertyFile = new File(baseDirectory, MAVEN_WRAPPER_PROPERTIES_PATH);
        String url = DEFAULT_DOWNLOAD_URL;
        if (mavenWrapperPropertyFile.exists()) {
            FileInputStream mavenWrapperPropertyFileInputStream = null;
            try {
                mavenWrapperPropertyFileInputStream = new FileInputStream(mavenWrapperPropertyFile);
                Properties mavenWrapperProperties = new Properties();
                mavenWrapperProperties.load(mavenWrapperPropertyFileInputStream);
                // Fall back to the default URL when the property is absent.
                url = mavenWrapperProperties.getProperty(PROPERTY_NAME_WRAPPER_URL, url);
            } catch (IOException e) {
                System.out.println("- ERROR loading '" + MAVEN_WRAPPER_PROPERTIES_PATH + "'");
            } finally {
                try {
                    if (mavenWrapperPropertyFileInputStream != null) {
Type             Size Date       Time  Name
----------- --------- ---------- ----- ----
Dir                 0 2020-06-14 12:54 search-engine\
Dir                 0 2020-06-14 12:53 search-engine\search-engine\
File              333 2020-04-23 10:57 search-engine\search-engine\.gitignore
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\
File              184 2020-04-29 21:00 search-engine\search-engine\.idea\.gitignore
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\artifacts\
File              485 2020-04-23 11:16 search-engine\search-engine\.idea\artifacts\search_engine_war.xm
File            21281 2020-05-05 00:31 search-engine\search-engine\.idea\artifacts\search_engine_war_exploded.xm
File              830 2020-04-23 11:16 search-engine\search-engine\.idea\compiler.xm
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\schema\
File               76 2020-04-25 17:59 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\schema\information_schema.FNRwLQ.me
File            29120 2020-05-05 01:49 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d.xm
File              984 2020-04-25 18:01 search-engine\search-engine\.idea\dataSources.local.xm
File              525 2020-04-25 17:58 search-engine\search-engine\.idea\dataSources.xm
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\dictionaries\
File              490 2020-04-30 18:16 search-engine\search-engine\.idea\dictionaries\qirui.xm
File              267 2020-04-29 21:01 search-engine\search-engine\.idea\encodings.xm
File              864 2020-04-29 21:08 search-engine\search-engine\.idea\jarRepositories.xm
Dir                 0 2020-06-14 12:53 search-engine\search-engine\.idea\libraries\
File              462 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__antlr_antlr_2_7_7.xm
File              568 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__ch_qos_logback_logback_classic_1_2_3.xm
File              547 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__ch_qos_logback_logback_core_1_2_3.xm
File              543 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_codec_commons_codec_1_13.xm
File              616 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_collections_commons_collections_3_2_2.xm
File              517 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_io_commons_io_1_3_2.xm
File              514 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__com_alibaba_fastjson_1_2_28.xm
File              499 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__com_carrotsearch_hppc_0_7_1.xm
............ (300 more file entries omitted)