资源简介
本系统以SpringBoot基础框架整合其他技术设计和搭建而成,选用webmagic框架实现单节点的网络爬虫系统,爬虫的生命周期为链接提取、页面下载、内容抽取、持久化,多线程抓取机制,Redis队列和集合实现网页去重和增量抓取,Redis队列和集合实现网页去重和增量抓取。搜索引擎的索引和搜索系统是利用全文搜索引擎框架(ElasticSearch)构建,由IK分词器实现语句分词地功能,ElasticSearch是一个企业分布式、高扩展、高实时的搜索与数据技术分析处理引擎,可以用于搜索各种文当,它提供可扩展的搜索,具有高效的海量数据搜索、分析和探索的能力。最后实现一个简单的web搜索页面,来模拟搜索引擎客户端

代码片段和文件信息
/*
* Copyright 2007-present the original author or authors.
*
* Licensed under the Apache License Version 2.0 (the “License“);
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing software
* distributed under the License is distributed on an “AS IS“ BASIS
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.net.*;
import java.io.*;
import java.nio.channels.*;
import java.util.Properties;
public class MavenWrapperDownloader {
private static final String WRAPPER_VERSION = “0.5.6“;
/**
* Default URL to download the maven-wrapper.jar from if no ‘downloadUrl‘ is provided.
*/
private static final String DEFAULT_DOWNLOAD_URL = “https://repo.maven.apache.org/maven2/io/takari/maven-wrapper/“
+ WRAPPER_VERSION + “/maven-wrapper-“ + WRAPPER_VERSION + “.jar“;
/**
* Path to the maven-wrapper.properties file which might contain a downloadUrl property to
* use instead of the default one.
*/
private static final String MAVEN_WRAPPER_PROPERTIES_PATH =
“.mvn/wrapper/maven-wrapper.properties“;
/**
* Path where the maven-wrapper.jar will be saved to.
*/
private static final String MAVEN_WRAPPER_JAR_PATH =
“.mvn/wrapper/maven-wrapper.jar“;
/**
* Name of the property which should be used to override the default download url for the wrapper.
*/
private static final String PROPERTY_NAME_WRAPPER_URL = “wrapperUrl“;
public static void main(String args[]) {
System.out.println(“- Downloader started“);
File baseDirectory = new File(args[0]);
System.out.println(“- Using base directory: “ + baseDirectory.getAbsolutePath());
// If the maven-wrapper.properties exists read it and check if it contains a custom
// wrapperUrl parameter.
File mavenWrapperPropertyFile = new File(baseDirectory MAVEN_WRAPPER_PROPERTIES_PATH);
String url = DEFAULT_DOWNLOAD_URL;
if (mavenWrapperPropertyFile.exists()) {
FileInputStream mavenWrapperPropertyFileInputStream = null;
try {
mavenWrapperPropertyFileInputStream = new FileInputStream(mavenWrapperPropertyFile);
Properties mavenWrapperProperties = new Properties();
mavenWrapperProperties.load(mavenWrapperPropertyFileInputStream);
url = mavenWrapperProperties.getProperty(PROPERTY_NAME_WRAPPER_URL url);
} catch (IOException e) {
System.out.println(“- ERROR loading ‘“ + MAVEN_WRAPPER_PROPERTIES_PATH + “‘“);
} finally {
try {
if (mavenWrapperPropertyFileInputStream != null) {
属性 大小 日期 时间 名称
----------- --------- ---------- ----- ----
目录 0 2020-06-14 12:54 search-engine\
目录 0 2020-06-14 12:53 search-engine\search-engine\
文件 333 2020-04-23 10:57 search-engine\search-engine\.gitignore
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\
文件 184 2020-04-29 21:00 search-engine\search-engine\.idea\.gitignore
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\artifacts\
文件 485 2020-04-23 11:16 search-engine\search-engine\.idea\artifacts\search_engine_war.xm
文件 21281 2020-05-05 00:31 search-engine\search-engine\.idea\artifacts\search_engine_war_exploded.xm
文件 830 2020-04-23 11:16 search-engine\search-engine\.idea\compiler.xm
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\schema\
文件 76 2020-04-25 17:59 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d\storage_v2\_src_\schema\information_schema.FNRwLQ.me
文件 29120 2020-05-05 01:49 search-engine\search-engine\.idea\dataSources\7f4171d0-2398-4e1d-a9d2-9c84daaa0f0d.xm
文件 984 2020-04-25 18:01 search-engine\search-engine\.idea\dataSources.local.xm
文件 525 2020-04-25 17:58 search-engine\search-engine\.idea\dataSources.xm
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\dictionaries\
文件 490 2020-04-30 18:16 search-engine\search-engine\.idea\dictionaries\qirui.xm
文件 267 2020-04-29 21:01 search-engine\search-engine\.idea\encodings.xm
文件 864 2020-04-29 21:08 search-engine\search-engine\.idea\jarRepositories.xm
目录 0 2020-06-14 12:53 search-engine\search-engine\.idea\libraries\
文件 462 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__antlr_antlr_2_7_7.xm
文件 568 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__ch_qos_logback_logback_classic_1_2_3.xm
文件 547 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__ch_qos_logback_logback_core_1_2_3.xm
文件 543 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_codec_commons_codec_1_13.xm
文件 616 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_collections_commons_collections_3_2_2.xm
文件 517 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__commons_io_commons_io_1_3_2.xm
文件 514 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__com_alibaba_fastjson_1_2_28.xm
文件 499 2020-04-29 21:08 search-engine\search-engine\.idea\libraries\Maven__com_carrotsearch_hppc_0_7_1.xm
............此处省略300个文件信息
- 上一篇:往年卷子.zip
- 下一篇:基于时间序列分析的故障诊断
相关资源
- SpringBoot+H2+mybatis-plus59130
- 登录注册界面.zip48872
- 数字华容道
- SSM+Shiro+redis实现单点登陆
- jstl-api-1.2和jstl-impl-1.2
- 基于MVC模式的会员管理系统
- 国内一家大型软件公司内部的正规软
- 仿windows记事本
- GUI银行管理系统
- 超市收银系统eclipse access大学课程设计
- 模拟ATM柜员机系统--连接数据库
- A*算法的2D演示(带源码)
- 代码审查表和代码审查实例
- 仿126 网易 163 邮箱 界面
- Tomcat6.x
- 简单的行编辑器
- 扫雷(MVC架构)
- 302 Found
- window ping命令加时间并记录日志
- 千锋elasticsearch视频教程带笔记
- springboot+rabbitmq项目demo(亲测可正常运
- jxbrowser 所有版本通用的破解包
- 2017年-传智播客-张志君老师-SpringBoo
- Blob.js+Export2Excel.js
- 机会路由源代码+仿真工具(SCORP)
- POI中文帮助文档附带api手册.zip
- 2018双十一阿里供应链服务平台讲座
- es(elasticsearch)整合SpringCloudSpringBo
- 原银在线信贷平台概要设计说明书v
- office_word_api 开发文档
评论
共有 条评论