Generally speaking, iRon's advice in a comment on the question is worth heeding (the specific question asked is addressed in the section that follows):
To keep memory use low, stream the objects through the pipeline rather than collecting them all in memory first - if feasible.
That is, instead of doing this:
# !! Collects ALL objects in memory, as an array.
$rows = Import-Csv in.csv
foreach ($row in $rows) { ... }
do this:
# Process objects ONE BY ONE.
# As long as you stream to a *file* or some other output stream
# (as opposed to assigning to a *variable*), memory use should remain constant,
# except for temporarily held memory awaiting garbage collection.
Import-Csv in.csv | ForEach-Object { ... } # pipe to Export-Csv, for instance
However, even then you can seemingly run out of memory with large files - see this question - possibly related to the accumulation of memory from no-longer-needed objects that haven't yet been garbage-collected; therefore, periodically calling [GC]::Collect()
in the ForEach-Object
script block may solve the problem - see this answer for an example.
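As a sketch of that approach (the 10,000-row interval and the file names are illustrative assumptions, not taken from the linked answer), periodic collection from inside the ForEach-Object script block could look like this:

```powershell
# Stream rows while periodically forcing garbage collection.
# NOTE: Illustrative sketch - the 10,000-row interval is an arbitrary choice;
#       tune it (or drop the GC calls) based on observed memory behavior.
$i = 0
Import-Csv in.csv |
  ForEach-Object {
    # ... per-row processing goes here ...
    $_  # pass the (possibly modified) row on
    if ((++$i % 10000) -eq 0) { [GC]::Collect() }
  } |
  Export-Csv out.csv -NoTypeInformation
```

Forcing collection has a CPU cost of its own, so it is a trade-off: it only pays off when memory accumulation is the actual bottleneck.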
If you do need to collect all of Import-Csv's output objects in memory at once:
The excessive memory use you're observing comes from how [pscustomobject]
instances (Import-Csv's
output type) are implemented, as discussed in GitHub issue #7603 (emphasis added):
The memory pressure most likely comes from the cost of PSNoteProperty
[which is how [pscustomobject]
properties are implemented]. Each PSNoteProperty
has an overhead of 48 bytes, so when you just store a few bytes per property, that becomes massive.
The same issue proposes a workaround to reduce memory consumption (as also shown in Wasif Hasan's answer):
Note: This workaround comes at the cost of considerably slower execution.
$csvFile = 'C:\top-1m.csv'
# Dynamically define a custom class derived from the *first* row
# read from the CSV file.
# Note: While this is a legitimate use of Invoke-Expression,
# it should generally be avoided.
"class CsvRow {
$((Import-Csv $csvFile | Select-Object -first 1).psobject.properties.Name -replace '^', '[string] $$' -join ";")
}" | Invoke-Expression
# Import all rows and convert them from [pscustomobject] instances
# to [CsvRow] instances to reduce memory consumption.
# Note: Casting the Import-Csv call directly to [CsvRow[]] would be noticeably
# faster, but increases *temporary* memory pressure substantially.
$alexaTopMillion = Import-Csv $csvFile | ForEach-Object { [CsvRow] $_ }
In the long run, a better solution that would also be faster is for Import-Csv
to support outputting the parsed rows with a given output type, say via an -OutputType
parameter, as proposed in GitHub issue #8862.
If that is of interest to you, show your support for the proposal there.
Memory-use benchmark:
The following code compares memory use with a normal Import-Csv
import (array of [pscustomobject]
s) to the workaround (array of custom-class instances).
The measurement isn't exact, as PowerShell's process working memory is simply queried, which can show the effects of background activities, but it gives a rough sense of how much less memory using a custom class requires.
Sample output, which shows that the custom-class workaround requires only about one 5th of the memory - with the 10-column sample CSV input file with about 166,000 rows used below; the specific ratio depends on the number of input rows and columns:
MB Used Command
------- -------
384.50 # normal import…
80.48 # import via custom class…
Benchmark code:
# Create a sample CSV file with 10 columns about 16 MB in size.
$tempCsvFile = [IO.Path]::GetTempFileName()
('"Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9","Col10"' + "`n") | Set-Content -NoNewline $tempCsvFile
('"Col1Val","Col2Val","Col3Val","Col4Val","Col5Val","Col6Val","Col7Val","Col8Val","Col9Val","Col10Val"' + "`n") * 1.662e5 |
Add-Content $tempCsvFile
try {
{ # normal import
$all = Import-Csv $tempCsvFile
},
{ # import via custom class
"class CsvRow {
$((Import-Csv $tempCsvFile | Select-Object -first 1).psobject.properties.Name -replace '^', '[string] $$' -join ";")
}" | Invoke-Expression
$all = Import-Csv $tempCsvFile | ForEach-Object { [CsvRow] $_ }
} | ForEach-Object {
[gc]::Collect(); [gc]::WaitForPendingFinalizers() # garbage-collect first.
Start-Sleep 2 # Wait a little for the freed memory to be reflected in the process object.
$before = (Get-Process -Id $PID).WorkingSet64
# Execute the command.
& $_
# Measure memory consumption and output the result.
[pscustomobject] @{
'MB Used' = ('{0,4:N2}' -f (((Get-Process -Id $PID).WorkingSet64 - $before) / 1mb)).PadLeft(7)
Command = $_
}
}
} finally {
Remove-Item $tempCsvFile
}