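Resource overview
This resource optimizes line-by-line reading of very large text files. Three conventional approaches exist today: LineNumberReader; RandomAccessFile; and a memory-mapped file, obtained by calling getChannel().map(...) on a RandomAccessFile. The code provided here builds on RandomAccessFile and adds an internal buffer, which improves throughput. In testing, 10 million lines took 1 second, and 100 million lines took 103 seconds (about 13 times faster than 1438 seconds).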
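To make the three conventional approaches concrete, here is a minimal sketch that counts the lines of a file with each of them. The class name `LineReadApproaches` and its method names are hypothetical and not part of the packaged resource. The memory-mapped variant assumes `\n` line endings and a file smaller than 2 GB, because a single `FileChannel.map` call cannot cover more than `Integer.MAX_VALUE` bytes.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of the packaged resource.
public class LineReadApproaches {

    // Approach 1: LineNumberReader, a BufferedReader subclass with its own buffer.
    static long countWithLineNumberReader(Path file) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new FileReader(file.toFile()))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // Approach 2: plain RandomAccessFile. In OpenJDK, readLine() calls read()
    // once per byte, which is why it is slow without an extra buffer.
    static long countWithRandomAccessFile(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long lines = 0;
            while (raf.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // Approach 3: memory-mapped file via getChannel().map(...).
    // Assumes '\n' line endings and a file smaller than 2 GB.
    static long countWithMappedFile(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            long size = channel.size();
            if (size == 0) {
                return 0;
            }
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long lines = 0;
            byte last = '\n';
            while (map.hasRemaining()) {
                last = map.get();
                if (last == '\n') {
                    lines++;
                }
            }
            // A final line without a trailing newline still counts as a line.
            return last == '\n' ? lines : lines + 1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "first\nsecond\nthird\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countWithLineNumberReader(tmp));
        System.out.println(countWithRandomAccessFile(tmp));
        System.out.println(countWithMappedFile(tmp));
    }
}
```

Timing these three methods on the same large file is one way to reproduce the kind of comparison quoted above; the figures will depend on the machine and on the file.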
Code snippet and file information
package com.gqshao.file.io;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
public class BufferedRandomAccessFile extends RandomAccessFile {
static final int LogBuffSz_ = 16; // 64K buffer
public static final int BuffSz_ = (1 << LogBuffSz_);
static final long BuffMask_ = ~(((long) BuffSz_) - 1L);
private String path_;
/*
* This implementation is based on the buffer implementation in Modula-3's
* "Rd", "Wr", "RdClass", and "WrClass" interfaces.
*/
private boolean dirty_; // true iff unflushed bytes exist
private boolean syncNeeded_; // dirty_ can be cleared by e.g. seek, so track sync separately
private long curr_; // current position in file
private long lo_, hi_; // bounds on characters in "buff"
private byte[] buff_; // local buffer
private long maxHi_; // this.lo + this.buff.length
private boolean hitEOF_; // buffer contains last file block?
private long diskPos_; // disk position
/*
* To describe the above fields, we introduce the following abstractions for
* the file "f":
*
* len(f)    the length of the file
* curr(f)   the current position in the file
* c(f)      the abstract contents of the file
* disk(f)   the contents of f's backing disk file
* closed(f) true iff the file is closed
*
* "curr(f)" is an index in the closed interval [0, len(f)]. "c(f)" is a
* character sequence of length "len(f)". "c(f)" and "disk(f)" may differ if
* "c(f)" contains unflushed writes not reflected in "disk(f)". The flush
* operation has the effect of making "disk(f)" identical to "c(f)".
*
* A file is said to be *valid* if the following conditions hold:
*
* V1. The "closed" and "curr" fields are correct:
*
* f.closed == closed(f)
* f.curr == curr(f)
*
* V2. The current position is either contained in the buffer, or just past
* the buffer:
*
* f.lo <= f.curr <= f.hi
*
* V3. Any (possibly) unflushed characters are stored in "f.buff":
*
* (forall i in [f.lo, f.curr): c(f)[i] == f.buff[i - f.lo])
*
* V4. For all characters not covered by V3, c(f) and disk(f) agree:
*
* (forall i in [f.lo, len(f)): i not in [f.lo, f.curr) => c(f)[i] ==
* disk(f)[i])
*
* V5. "f.dirty" is true iff the buffer contains bytes that should be
* flushed to the file; by V3 and V4, only part of the buffer can be dirty.
*
* f.dirty == (exists i in [f.lo, f.curr): c(f)[i] != f.buff[i - f.lo])
*
* V6. this.maxHi == this.lo + this.buff.length
*
* Note that "f.buff" can be "null" in a valid file, since the range of
* characters in V3 is empty when "f.lo == f.curr".
*
* A file is said to be *ready* if the buffer contai
Attribute   Size      Date       Time  Name
----------- --------- ---------- ----- ----
File        3915      2016-01-17 20:31 FileUtil.java
File        11624     2016-01-17 20:29 BufferedRandomAccessFile.java
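The buffer bookkeeping described in the snippet's comments (the `lo`, `hi`, and `curr` fields, with invariant V2 `lo <= curr <= hi`) can be illustrated with a much smaller read-only sketch. This is not the packaged `BufferedRandomAccessFile`: the class name `BufferedLineReaderSketch` is hypothetical, writes and the dirty/flush logic are omitted, and only `read()`, `seek()`, and `getFilePointer()` go through the buffer. It relies on the inherited `readLine()` calling those overridable methods, as it does in OpenJDK.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical read-only sketch; bulk reads and all writes are not buffered here.
public class BufferedLineReaderSketch extends RandomAccessFile {
    private static final int BUF_SIZE = 1 << 16; // 64K, same size as BuffSz_ in the snippet

    private final byte[] buff = new byte[BUF_SIZE];
    private long lo = 0;   // file offset of buff[0]
    private long hi = 0;   // file offset just past the last valid byte in buff
    private long curr = 0; // logical position; kept within lo <= curr <= hi

    public BufferedLineReaderSketch(File file) throws IOException {
        super(file, "r");
    }

    // Reload the buffer with the block starting at curr; false means end of file.
    private boolean fill() throws IOException {
        super.seek(curr);
        int n = super.read(buff, 0, buff.length);
        lo = curr;
        hi = curr + Math.max(n, 0);
        return n > 0;
    }

    // Serve single bytes from memory; touch the disk only when the buffer is exhausted.
    @Override
    public int read() throws IOException {
        if (curr >= hi && !fill()) {
            return -1;
        }
        return buff[(int) (curr++ - lo)] & 0xFF;
    }

    // Move the logical position; drop the buffer if the target lies outside it.
    @Override
    public void seek(long pos) throws IOException {
        if (pos < lo || pos > hi) {
            lo = pos;
            hi = pos;
        }
        curr = pos;
    }

    // Report the logical position, not the position of the underlying file.
    @Override
    public long getFilePointer() {
        return curr;
    }

    // Count lines through the inherited readLine(), which now reads from the buffer.
    static long countLines(File file) throws IOException {
        try (BufferedLineReaderSketch in = new BufferedLineReaderSketch(file)) {
            long lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("buffered", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "first\nsecond\nthird\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countLines(tmp));
    }
}
```

The design point is that each call to `read()` becomes an array access instead of a system call, with one disk read per 64K block. The full class in the archive additionally handles writes, flushing, and the remaining invariants.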