
How to handle multiple input paths in Hadoop

news 2025/7/10 3:59:52

In Hadoop, you can add multiple input paths to a job with the addInputPath method of FileInputFormat. The steps are as follows:

Create a Job object and set the relevant parameters and configuration.

Call FileInputFormat.addInputPath once for each input path. For example:

FileInputFormat.addInputPath(job, new Path("/path/to/input1"));
FileInputFormat.addInputPath(job, new Path("/path/to/input2"));
FileInputFormat.addInputPath(job, new Path("/path/to/input3"));

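You can add any number of input paths this way. Each call appends to the job's existing list of input paths rather than replacing it.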
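When the number of inputs is not fixed, the paths usually come from the command line. The following is a minimal plain-Java sketch of one way to split the driver's arguments, assuming a convention of `<input>... <output>`. The class name JobArgs and its methods are hypothetical and not part of Hadoop; the Hadoop calls appear only in comments so that the sketch runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): splits driver arguments
// into a list of input paths and a single output path.
final class JobArgs {
    final List<String> inputs;
    final String output;

    private JobArgs(List<String> inputs, String output) {
        this.inputs = inputs;
        this.output = output;
    }

    // Assumed convention: <input1> [<input2> ...] <output>
    static JobArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <input>... <output>");
        }
        List<String> inputs =
                new ArrayList<>(Arrays.asList(args).subList(0, args.length - 1));
        return new JobArgs(inputs, args[args.length - 1]);
    }

    // Comma-separated form, the shape accepted by
    // FileInputFormat.addInputPaths(job, String).
    String commaSeparatedInputs() {
        return String.join(",", inputs);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(new String[] {
                "/path/to/input1", "/path/to/input2", "/path/to/output"});
        // In a real driver, each entry would be registered with:
        //   FileInputFormat.addInputPath(job, new Path(input));
        for (String input : parsed.inputs) {
            System.out.println("input:  " + input);
        }
        System.out.println("output: " + parsed.output);
    }
}
```

In a driver, looping over parsed.inputs and calling FileInputFormat.addInputPath for each entry is equivalent to the three explicit calls above. FileInputFormat.addInputPaths(job, parsed.commaSeparatedInputs()) is an alternative, but it treats commas as separators, so the per-path loop is the safer choice if a path might itself contain a comma.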

In the Mapper, you can get the path of the file currently being processed by calling getPath on the FileSplit object. For example:

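import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused across calls to avoid allocating a new Text per record.
    private final Text filename = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The cast assumes a file-based input format such as TextInputFormat.
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        Path path = fileSplit.getPath();
        filename.set(path.getName());
        // Process the file contents here.
        context.write(filename, new IntWritable(1));
    }
}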

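In the code above, the FileSplit object provides the path of the file currently being processed, and filename.set(path.getName()) uses the file name as the output key, so every record is tagged with the input file it came from.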
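One point to watch when several input directories are involved: Path.getName() returns only the final path component, so files with the same name in different directories produce the same key. The sketch below is a plain-Java stand-in that needs no Hadoop. The class FileKeyDemo, its methods, and the part-00000 file names are hypothetical; lastComponent only mimics what Path.getName() returns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (hypothetical class, not Hadoop code) of how keys
// collide when only the last path component is used.
final class FileKeyDemo {
    // Mimics Hadoop's Path.getName(): the final component of the path.
    static String lastComponent(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Alternative key: parent directory name plus file name.
    static String parentAndName(String path) {
        int slash = path.lastIndexOf('/');
        if (slash <= 0) {
            return lastComponent(path);
        }
        String parent = path.substring(0, slash);
        return lastComponent(parent) + "/" + lastComponent(path);
    }

    // Counts how many records would land under each key.
    static Map<String, Integer> countByKey(List<String> recordPaths, boolean useParent) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : recordPaths) {
            String key = useParent ? parentAndName(p) : lastComponent(p);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/path/to/input1/part-00000",
                "/path/to/input2/part-00000");
        System.out.println(countByKey(paths, false)); // one shared key
        System.out.println(countByKey(paths, true));  // one key per directory
    }
}
```

If the inputs need to stay distinguishable in the Mapper, using path.toString() or path.getParent().getName() together with the file name as the key is one option, at the cost of longer keys.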

Finally, submit the MapReduce job and wait for it to complete. For example:

job.setMapperClass(MyMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/path/to/output"));
job.waitForCompletion(true);

With these steps, a single job can process multiple input paths.


http://www.lryc.cn/news/32684.html