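Resource overview
Apache Beam: an advanced, unified programming model that lets batch and streaming data processing jobs run on any execution engine.
Code snippet and file information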
"""Script to fix the links in the staged website.

Finds all internal links which do not have index.html at the end and appends
index.html in the appropriate place (preserving anchors etc).

Usage:
  From root directory, after running the jekyll build, execute
  'python .jenkins/append_index_html_to_internal_links.py'.

Dependencies:
  beautifulsoup4
  Installable via pip as 'sudo pip install beautifulsoup4' or apt via
  'sudo apt-get install python-beautifulsoup4'.
"""
import fnmatch
import os
import re

from bs4 import BeautifulSoup

# Original link match. Matches any string which starts with '/' and doesn't
# have a file extension.
linkMatch = r'^\/(.*\.(?!([^\/]+)$))?[^.]*$'

# Regex which matches strings of type /internal/link/#anchor. Breaks into two
# groups for ease of inserting 'index.html'.
anchorMatch1 = r'(.+\/)(#[^\/]+$)'

# Regex which matches strings of type /internal/link#anchor. Breaks into two
# groups for ease of inserting 'index.html'.
anchorMatch2 = r'(.+\/[a-zA-Z0-9]+)(#[^\/]+$)'

matches = []
# Recursively walk content directory and find all html files.
for root, dirnames, filenames in os.walk('content'):
    for filename in fnmatch.filter(filenames, '*.html'):
        # Javadoc does not have the index.html problem, so omit it.
        if 'javadoc' not in root:
            matches.append(os.path.join(root, filename))
print('Matches: ' + str(len(matches)))
# Iterates over each matched file looking for link matches.
for match in matches:
    print('Fixing links in: ' + match)
    # The parser reads the whole file up front, so the handle can be closed
    # as soon as the soup is built.
    with open(match) as mf:
        soup = BeautifulSoup(mf)
    # Iterates over every <a>
    for a in soup.findAll('a'):
        try:
            hr = a['href']
            if re.match(linkMatch, hr) is not None:
                if hr.endswith('/'):
                    # /internal/link/
                    a['href'] = hr + 'index.html'
                elif re.match(anchorMatch1, hr) is not None:
                    # /internal/link/#anchor
                    mat = re.match(anchorMatch1, hr)
                    a['href'] = mat.group(1) + 'index.html' + mat.group(2)
                elif re.match(anchorMatch2, hr) is not None:
                    # /internal/link#anchor
                    mat = re.match(anchorMatch2, hr)
                    a['href'] = mat.group(1) + '/index.html' + mat.group(2)
                else:
                    # /internal/link
                    a['href'] = hr + '/index.html'
                # Serialize the modified tree to UTF-8 bytes.
                html = soup.encode('utf-8')
                # Write back to the file.
                with open(match, 'wb') as f:
                    print('Replacing ' + hr + ' with: ' + a['href'])
                    f.write(html)
        except KeyError:
            # Some <a> tags don't have an href.
            continue
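The rewrite rules in the script can be tried in isolation, without a built site or BeautifulSoup. The following is a minimal sketch under stated assumptions: the helper name `append_index_html` and the sample hrefs are hypothetical and do not appear in the original script; only the three regular expressions are copied from it.

```python
import re

# Patterns copied from the script above.
linkMatch = r'^\/(.*\.(?!([^\/]+)$))?[^.]*$'
anchorMatch1 = r'(.+\/)(#[^\/]+$)'
anchorMatch2 = r'(.+\/[a-zA-Z0-9]+)(#[^\/]+$)'


def append_index_html(href):
    """Hypothetical helper: apply the script's rewrite rules to one href.

    Returns the href unchanged when it is not an extensionless internal link.
    """
    if re.match(linkMatch, href) is None:
        return href
    if href.endswith('/'):
        # /internal/link/
        return href + 'index.html'
    mat = re.match(anchorMatch1, href)
    if mat is not None:
        # /internal/link/#anchor
        return mat.group(1) + 'index.html' + mat.group(2)
    mat = re.match(anchorMatch2, href)
    if mat is not None:
        # /internal/link#anchor
        return mat.group(1) + '/index.html' + mat.group(2)
    # /internal/link
    return href + '/index.html'


if __name__ == '__main__':
    # Illustrative hrefs only; they are not taken from the archive.
    samples = [
        '/get-started/',
        '/get-started/quickstart',
        '/documentation/#pipelines',
        '/documentation/runners#direct',
        '/images/logo.png',
        'https://beam.apache.org/',
    ]
    for sample in samples:
        print(sample + ' -> ' + append_index_html(sample))
```

Keeping the rules in a pure function like this makes it possible to check edge cases before touching any files. For instance, `anchorMatch2` only accepts letters and digits in the last path segment, so a link such as `/a/some-page#x` falls through to the final branch.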
Attribute       Size  Date        Time   Name
---------  ---------  ----------  -----  ----
Directory          0  2018-11-30  02:32  beam-site-zh-master\
File             653  2018-11-30  02:32  beam-site-zh-master\.travis.yml
File           35141  2018-11-30  02:32  beam-site-zh-master\LICENSE
File            2283  2018-11-30  02:32  beam-site-zh-master\README.md
Directory          0  2018-11-30  02:32  beam-site-zh-master\project\
File            1172  2018-11-30  02:32  beam-site-zh-master\project\translate-flow.md
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\
File             657  2018-11-30  02:32  beam-site-zh-master\site\en\.gitattributes
File             117  2018-11-30  02:32  beam-site-zh-master\site\en\.gitignore
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\.jenkins\
File            2557  2018-11-30  02:32  beam-site-zh-master\site\en\.jenkins\append_index_html_to_internal_li
File             472  2018-11-30  02:32  beam-site-zh-master\site\en\Gemfile
File            1925  2018-11-30  02:32  beam-site-zh-master\site\en\Gemfile.lock
File            4050  2018-11-30  02:32  beam-site-zh-master\site\en\README.md
File             331  2018-11-30  02:32  beam-site-zh-master\site\en\Rakefile
File            1878  2018-11-30  02:32  beam-site-zh-master\site\en\_config.yml
File             109  2018-11-30  02:32  beam-site-zh-master\site\en\_config_test.yml
File            1886  2018-11-30  02:32  beam-site-zh-master\site\en\run_with_docker.sh
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\src\
File             484  2018-11-30  02:32  beam-site-zh-master\site\en\src\.htaccess
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\src\_beam_team\
File            4741  2018-11-30  02:32  beam-site-zh-master\site\en\src\_beam_team\team.md
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\src\_data\
File            1013  2018-11-30  02:32  beam-site-zh-master\site\en\src\_data\authors.yml
File           33956  2018-11-30  02:32  beam-site-zh-master\site\en\src\_data\capability-matrix.yml
File             217  2018-11-30  02:32  beam-site-zh-master\site\en\src\_data\logos.yml
File             878  2018-11-30  02:32  beam-site-zh-master\site\en\src\_data\meetings.yml
Directory          0  2018-11-30  02:32  beam-site-zh-master\site\en\src\_includes\
File             358  2018-11-30  02:32  beam-site-zh-master\site\en\src\_includes\authors-list.md
File             300  2018-11-30  02:32  beam-site-zh-master\site\en\src\_includes\capability-matrix-common.md
............ (792 more file entries omitted here)