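The Parallel.ForEach method (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach) is intended for parallelizing CPU-bound workloads. Downloading files is an I/O-bound workload, so Parallel.ForEach is not ideal for this case, because it needlessly blocks ThreadPool threads. The correct approach is to do the work asynchronously, with async/await. The recommended class for making asynchronous web requests is HttpClient (https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient), and for controlling the level of concurrency a good option is the TPL Dataflow library (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library). For this case it is enough to use the simplest component of that library, the ActionBlock class (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.actionblock-1):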

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    // ExcludeDownloaded and SavePath are not shown here; they are assumed
    // to be defined elsewhere in the surrounding class.

    async Task DownloadListAsync(List<string> list)
    {
        using (var httpClient = new HttpClient())
        {
            var rest = ExcludeDownloaded(list);

            // Process at most 10 links concurrently.
            var block = new ActionBlock<string>(async link =>
            {
                await DownloadFileAsync(httpClient, link);
            }, new ExecutionDataflowBlockOptions()
            {
                MaxDegreeOfParallelism = 10
            });

            // Feed the links to the block.
            foreach (var link in rest)
            {
                await block.SendAsync(link);
            }

            // Signal that no more items are coming, then wait for all downloads.
            block.Complete();
            await block.Completion;
        }
    }

    async Task DownloadFileAsync(HttpClient httpClient, string link)
    {
        var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
        var filePath = Path.Combine(SavePath, fileName);
        if (File.Exists(filePath)) return;

        var response = await httpClient.GetAsync(link);
        response.EnsureSuccessStatusCode();

        // Copy the response body to disk asynchronously.
        using (var contentStream = await response.Content.ReadAsStreamAsync())
        using (var fileStream = new FileStream(filePath, FileMode.Create,
            FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
        {
            await contentStream.CopyToAsync(fileStream);
        }
    }

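The code for downloading a file with HttpClient is not as simple as WebClient.DownloadFile(), but it is what you have to do to keep the whole process asynchronous (both reading from the network and writing to disk).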
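Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET (https://stackoverflow.com/questions/63217657/why-file-readalllinesasync-blocks-the-ui-thread/). For maximum efficiency it may be preferable to avoid the FileOptions.Asynchronous option in the FileStream constructor.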
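As a rough illustration of that caveat, the file-writing step could be written without the FileOptions.Asynchronous flag, as in the sketch below. The FileSaver class and its SaveAsync method are hypothetical names introduced only for this example; whether dropping the flag actually helps depends on the .NET version and the workload, so it is worth measuring.

```csharp
using System.IO;
using System.Threading.Tasks;

static class FileSaver
{
    // Hypothetical helper (not part of the original answer): copies a content
    // stream to a file opened WITHOUT FileOptions.Asynchronous.
    public static async Task SaveAsync(Stream contentStream, string filePath)
    {
        // Same arguments as before, minus the FileOptions parameter.
        using (var fileStream = new FileStream(filePath, FileMode.Create,
            FileAccess.Write, FileShare.None, 32768))
        {
            await contentStream.CopyToAsync(fileStream);
        }
    }
}
```

Inside DownloadFileAsync this would replace the inner using block, for example `await FileSaver.SaveAsync(contentStream, filePath);`.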
.NET 6 update: The preferable way of parallelizing asynchronous work is now the Parallel.ForEachAsync API (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreachasync). A usage example can be found here: https://stackoverflow.com/questions/15136542/parallel-foreach-with-asynchronous-lambda/68901782#68901782.
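For orientation, here is a minimal sketch of how the same download loop might look with Parallel.ForEachAsync on .NET 6 or later. The Downloader class, the SavePath location, and the choice to stream with HttpCompletionOption.ResponseHeadersRead are assumptions made for this example and are not taken from the linked answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class Downloader
{
    // Hypothetical target folder, chosen only for this sketch.
    static readonly string SavePath =
        Path.Combine(Path.GetTempPath(), "downloads");

    public static async Task DownloadListAsync(IEnumerable<string> links,
        CancellationToken cancellationToken = default)
    {
        Directory.CreateDirectory(SavePath);
        using var httpClient = new HttpClient();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 10, // same limit as the ActionBlock version
            CancellationToken = cancellationToken
        };

        // The body runs for each link, with at most 10 running concurrently.
        await Parallel.ForEachAsync(links, options, async (link, ct) =>
        {
            var filePath = Path.Combine(SavePath, Guid.NewGuid().ToString());

            // Start reading as soon as the headers arrive, without buffering the body.
            using var response = await httpClient.GetAsync(link,
                HttpCompletionOption.ResponseHeadersRead, ct);
            response.EnsureSuccessStatusCode();

            await using var contentStream =
                await response.Content.ReadAsStreamAsync(ct);
            await using var fileStream = new FileStream(filePath, FileMode.Create,
                FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous);
            await contentStream.CopyToAsync(fileStream, ct);
        });
    }
}
```

Compared with the ActionBlock version, there is no block to create, feed, and complete: the concurrency limit goes into ParallelOptions, and awaiting the call waits for all downloads.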