We have some data coming from a flat file. For example:
EmpCode,Salary,EmpName,...
100,1000,...,...
200,2000,...,...
200,2000,...,...
100,1000,...,...
300,3000,...,...
400,4000,...,...
We want to aggregate the salaries by EmpCode and write the result to the database:
Emp_Code Emp_Salary Updated_Time Updated_User
100 2000 ... ...
200 4000 ... ...
300 3000 ... ...
400 4000 ... ...
I wrote the Spring Batch classes as follows.
ItemReader - to read the employee data into an Employee object
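For completeness, the reader could be wired up roughly like this. This is only a sketch: the file name, the bean id `employeeItemReader`, and the field names are assumptions based on the sample data above, and `targetType` would normally need the fully qualified class name of `Employee`.

```xml
<bean id="employeeItemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="file:employees.csv"/>
    <!-- skip the header row (EmpCode,Salary,EmpName,...) -->
    <property name="linesToSkip" value="1"/>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="empCode,salary,empName"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <!-- assumed: replace with the fully qualified Employee class -->
                    <property name="targetType" value="Employee"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```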
Sample Employee ItemProcessor:
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
    @Override
    public Employee process(Employee employee) throws Exception {
        employee.setUpdatedTime(new Date());
        employee.setUpdatedUser("someuser");
        return employee;
    }
}
Employee ItemWriter:
@Repository
public class EmployeeItemWriter implements ItemWriter<Employee> {

    @Autowired
    private SessionFactory sf;

    @Override
    public void write(List<? extends Employee> employeeList) throws Exception {
        List<Employee> aggEmployeeList = aggregateEmpData(employeeList);
        // write to db using the session factory
    }

    private List<Employee> aggregateEmpData(List<? extends Employee> employeeList) {
        Map<String, Employee> map = new HashMap<String, Employee>();
        for (Employee e : employeeList) {
            String empCode = e.getEmpCode();
            if (map.containsKey(empCode)) {
                // add this employee's salary to the one already in the map
                Employee agg = map.get(empCode);
                agg.setSalary(agg.getSalary() + e.getSalary());
            } else {
                map.put(empCode, e);
            }
        }
        return new ArrayList<Employee>(map.values());
    }
}
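For reference, the aggregation step on its own behaves like this. The sketch below uses a minimal stand-in `Employee` class (the field names and getters/setters are assumptions mirroring the ones used above) and reproduces the same sum-per-EmpCode logic outside of Spring Batch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AggregationSketch {

    // Minimal stand-in for the real Employee entity (assumed fields).
    static class Employee {
        private final String empCode;
        private long salary;

        Employee(String empCode, long salary) {
            this.empCode = empCode;
            this.salary = salary;
        }

        String getEmpCode() { return empCode; }
        long getSalary() { return salary; }
        void setSalary(long salary) { this.salary = salary; }
    }

    // Same algorithm as aggregateEmpData above: sum salaries per EmpCode.
    static List<Employee> aggregate(List<Employee> employees) {
        Map<String, Employee> map = new HashMap<>();
        for (Employee e : employees) {
            Employee agg = map.get(e.getEmpCode());
            if (agg != null) {
                agg.setSalary(agg.getSalary() + e.getSalary());
            } else {
                map.put(e.getEmpCode(), e);
            }
        }
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        // The six rows from the sample flat file above.
        List<Employee> input = new ArrayList<>();
        input.add(new Employee("100", 1000));
        input.add(new Employee("200", 2000));
        input.add(new Employee("200", 2000));
        input.add(new Employee("100", 1000));
        input.add(new Employee("300", 3000));
        input.add(new Employee("400", 4000));

        List<Employee> result = aggregate(input);
        result.sort((a, b) -> a.getEmpCode().compareTo(b.getEmpCode()));
        for (Employee e : result) {
            System.out.println(e.getEmpCode() + " " + e.getSalary());
        }
        // Prints the aggregated rows: 100 2000, 200 4000, 300 3000, 400 4000
    }
}
```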
XML configuration:
...
<batch:job id="employeeJob">
<batch:step id="step1">
<batch:tasklet>
<batch:chunk reader="employeeItemReader"
writer="employeeItemWriter" processor="employeeItemProcessor"
commit-interval="100">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
...
It is working and serves my purpose. However, I have a couple of questions.
1) When I look at the logs, they show the following (commit-interval=100):
status=COMPLETED, exitStatus=COMPLETED, readCount=2652, filterCount=0, writeCount=2652, readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=27, rollbackCount=0
But after aggregation, only 2515 records were written to the database, yet writeCount is 2652. Is that because the number of items reaching the ItemWriter is still 2652? How can I correct this?
2) We iterate over the list twice: once in the ItemProcessor and then again in the ItemWriter for the aggregation. If the number of records is large, this could become a performance problem. Is there a better way to achieve this?