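I am building a link checker. In general I can perform HEAD requests, but some sites seem to disable this verb, so on failure I also need to perform a GET request (to double-check that the link is really dead).
I use the following code as my link tester: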
    using System;
    using System.Net;

    public class ValidateResult
    {
        public HttpStatusCode? StatusCode { get; set; }
        public Uri RedirectResult { get; set; }
        public WebExceptionStatus? WebExceptionStatus { get; set; }
    }
    public ValidateResult Validate(Uri uri, bool useHeadMethod = true,
        bool enableKeepAlive = false, int timeoutSeconds = 30)
    {
        ValidateResult result = new ValidateResult();
        HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
        if (useHeadMethod)
        {
            request.Method = "HEAD";
        }
        else
        {
            request.Method = "GET";
        }
        // Always ask for compression: an error body (e.g. a 404 page) can be quite big.
        request.AutomaticDecompression = DecompressionMethods.GZip;
        request.AllowAutoRedirect = false;
        request.UserAgent = UserAgentString; // defined elsewhere in the class
        request.Timeout = timeoutSeconds * 1000;
        request.KeepAlive = enableKeepAlive;
        HttpWebResponse response = null;
        try
        {
            response = request.GetResponse() as HttpWebResponse;
            result.StatusCode = response.StatusCode;
            if (response.StatusCode == HttpStatusCode.Redirect ||
                response.StatusCode == HttpStatusCode.MovedPermanently ||
                response.StatusCode == HttpStatusCode.SeeOther)
            {
                try
                {
                    // Resolve a possibly relative Location header against the requested URI.
                    Uri targetUri = new Uri(uri, response.Headers["Location"]);
                    var scheme = targetUri.Scheme.ToLower();
                    if (scheme == "http" || scheme == "https")
                    {
                        result.RedirectResult = targetUri;
                    }
                    else
                    {
                        // this little gem was born out of http://tinyurl.com/18r
                        // redirecting to about:blank
                        result.StatusCode = HttpStatusCode.SwitchingProtocols;
                        result.WebExceptionStatus = null;
                    }
                }
                catch (UriFormatException)
                {
                    // another gem... people sometimes redirect to http://nonsense:port/yay
                    result.StatusCode = HttpStatusCode.SwitchingProtocols;
                    result.WebExceptionStatus = WebExceptionStatus.NameResolutionFailure;
                }
            }
        }
        catch (WebException ex)
        {
            // 4xx/5xx responses surface as WebException; keep the status if there is one.
            result.WebExceptionStatus = ex.Status;
            response = ex.Response as HttpWebResponse;
            if (response != null)
            {
                result.StatusCode = response.StatusCode;
            }
        }
        finally
        {
            if (response != null)
            {
                response.Close();
            }
        }
        return result;
    }
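This all works fine and dandy, except that when I perform a GET request, the entire payload gets downloaded (I watched this in Wireshark).
Is there any way to configure the underlying ServicePoint or the HttpWebRequest not to buffer or eagerly load the response body at all?
(If I were hand-coding this, I would set the TCP receive window really low, grab only enough packets to get the headers, and stop acknowledging TCP packets as soon as I had enough information.)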
For those wondering what this is meant to achieve: I do not want to download a 40k 404 page every time I get a 404. Doing this a few hundred thousand times is expensive on the network.
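When you do a GET, the server starts sending data from the beginning of the file to the end, unless you interrupt it. Granted, at 10 Mb/sec that is roughly a megabyte per second, so if the file is small you will get the whole thing. You can minimize the amount you actually download in a couple of ways.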
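First, you can call request.Abort after getting the response and before calling response.Close. That ensures the underlying code does not try to download the whole thing before closing the response. Whether this helps with small files, I don't know. I do know that it will prevent your application from hanging when it tries to download a multi-gigabyte file.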
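A minimal sketch of the Abort-before-Close idea follows. The class and method names (HeaderOnlyProbe, GetStatusOnly) are hypothetical and not part of the question's code, and how much of the body has already reached the socket by the time Abort runs is outside this code's control.

```csharp
using System;
using System.Net;

public static class HeaderOnlyProbe
{
    // Hypothetical helper: issues a GET, reads only the status code,
    // then aborts the request so that closing the response does not
    // try to drain the rest of the body.
    public static HttpStatusCode? GetStatusOnly(Uri uri, int timeoutSeconds = 30)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        request.Timeout = timeoutSeconds * 1000;

        HttpWebResponse response = null;
        try
        {
            // The response stream is never read here; only the headers are used.
            response = (HttpWebResponse)request.GetResponse();
            return response.StatusCode;
        }
        catch (WebException ex)
        {
            // 4xx/5xx arrive as exceptions that may still carry a response.
            response = ex.Response as HttpWebResponse;
            return response != null ? response.StatusCode : (HttpStatusCode?)null;
        }
        finally
        {
            // Abort first, then Close, as suggested above.
            request.Abort();
            if (response != null)
            {
                response.Close();
            }
        }
    }
}
```

A null return means no HTTP response was received at all (for example, the connection was refused or timed out).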
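The other thing you can do is request a range rather than the entire file. See the AddRange method (http://msdn.microsoft.com/en-us/library/f2cwk28s) and its overloads. For example, you could write request.AddRange(0, 511), which asks for only the first 512 bytes of the file. (Note that the single-argument form request.AddRange(512) asks for everything from byte 512 to the end, which is not what you want here.) Of course, this depends on the server supporting range queries. Most do. But then, most support HEAD requests, too.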
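As a rough illustration of the range approach, the sketch below builds a GET request limited to the first bytes of the resource. RangeProbe and CreateRangedGet are hypothetical names introduced for this example, and the 512-byte default is just the figure used above.

```csharp
using System;
using System.Net;

public static class RangeProbe
{
    // Hypothetical helper: builds (but does not send) a GET request that
    // asks for only the first byteCount bytes of the resource.
    public static HttpWebRequest CreateRangedGet(Uri uri, int byteCount = 512)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        // The two-argument overload takes inclusive start and end offsets,
        // so byteCount = 512 produces "Range: bytes=0-511".
        request.AddRange(0, byteCount - 1);
        return request;
    }
}
```

A server that honors the header normally answers 206 Partial Content; one that ignores it answers 200 with the full body. Servers generally apply ranges only to successful responses, so an error page such as a large 404 may still arrive in full.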
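You will probably end up having to write a method that tries things in sequence:
- Try a HEAD request. If that works (i.e. does not return a 500), you are done.
- Try a GET with a range query. If that does not return a 500, you are done.
- Do a regular GET, and call request.Abort after GetResponse returns.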
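The sequence above could be sketched roughly as follows. Everything here is hypothetical scaffolding: ProbeKind, FallbackProbe, and the send callback (which would perform one request of the given kind and return its status, or null when no HTTP response arrived) are not part of the original code. Treating any 5xx status or a missing response as "try the next strategy" is an assumption based on the "does not return a 500" rule; a real checker may also want to fall through on 405.

```csharp
using System;
using System.Net;

public enum ProbeKind { Head, RangedGet, FullGetWithAbort }

public static class FallbackProbe
{
    // Tries each strategy in order and stops at the first one that yields
    // a usable status. 'send' is a hypothetical callback that performs one
    // request of the given kind.
    public static HttpStatusCode? Probe(
        Func<ProbeKind, HttpStatusCode?> send, out ProbeKind usedKind)
    {
        ProbeKind[] order =
        {
            ProbeKind.Head, ProbeKind.RangedGet, ProbeKind.FullGetWithAbort
        };
        HttpStatusCode? status = null;
        usedKind = ProbeKind.Head;
        foreach (ProbeKind kind in order)
        {
            usedKind = kind;
            status = send(kind);
            // Assumption: a 5xx status or no response means "try the next strategy".
            bool failed = !status.HasValue || (int)status.Value >= 500;
            if (!failed)
            {
                break;
            }
        }
        // If every strategy failed, this is the result of the last attempt.
        return status;
    }
}
```

Passing the request logic in as a callback keeps the ordering separate from the networking code, so the fallback order can be exercised with a fake callback.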