Is there something like Buffer.LastPositionOf? Finding the last occurrence of a char in a buffer?

2024-02-28

I have a buffer of type ReadOnlySequence<byte>. I want to extract a subsequence from it (which will contain 0 - n messages), knowing that each message ends with 0x1c, 0x0d (as described here: http://healthstandards.com/blog/2007/05/02/hl7-mlp-minimum-layer-protocol-defined/).

I know there is an extension method for buffers, PositionOf (https://learn.microsoft.com/en-us/dotnet/api/system.buffers.buffersextensions.positionof?view=netcore-3.1), but it returns the position of the first occurrence of item in the ReadOnlySequence<T>.

I'm looking for a method that returns the position of the last occurrence. I tried to implement it myself, and this is what I have so far:

private SequencePosition? GetLastPosition(ReadOnlySequence<byte> buffer)
{
    // Do not modify the real buffer
    ReadOnlySequence<byte> temporaryBuffer = buffer;
    SequencePosition? lastPosition = null;

    do
    {
        /*
            Find the first occurrence of the delimiters in the buffer
            This only takes a byte, what to do with the delimiters? { 0x1c, 0x0d }

        */
        SequencePosition? foundPosition = temporaryBuffer.PositionOf(???);

        // Is there still an occurrence?
        if (foundPosition != null)
        {
            lastPosition = foundPosition;

            // cut off the sequence for the next run
            temporaryBuffer = temporaryBuffer.Slice(0, lastPosition.Value);
        }
        else
        {
            // this is required because otherwise this loop is infinite if lastPosition was set once
            break;
        }
    } while (lastPosition != null);

    return lastPosition;
}

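I'm struggling with this. First, the PositionOf method only takes a single byte, but there are two delimiter bytes, so I would need to pass a byte[]. Second, I think the loop could be optimized "somehow".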

Do you have any idea how to find the last occurrence of these delimiters?


I went quite deep into this one, but I managed to come up with an extension method that I think answers your question:

using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlySequenceExtensions
{
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);

        var delimiterFound = false;
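        // Keep reading until we've consumed all delimiters. The discard is
        // typed explicitly so that overload resolution does not depend on
        // which TryReadTo overloads the target framework provides.
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))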
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the position of 
        // the starting byte of the delimiter
        return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
    }
}

Here are some test cases as well:

var cases = new List<byte[]>
{
    // Case 1: Check an empty array
    new byte[0],
    // Case 2: Check an array with no delimiter
    new byte[] { 0xf },
    // Case 3: Check an array with part of the delimiter
    new byte[] { 0x1c },
    // Case 4: Check an array with the other part of the delimiter
    new byte[] { 0x0d },
    // Case 5: Check an array with the delimiter in the wrong order
    new byte[] { 0x0d, 0x1c },
    // Case 6: Check an array with a correct delimiter
    new byte[] { 0x1c, 0x0d },
    // Case 7: Check an array with a byte followed by a correct delimiter
    new byte[] { 0x1, 0x1c, 0x0d },
    // Case 8: Check an array with multiple correct delimiters
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d },
    // Case 9: Check an array with multiple correct delimiters
    // where the delimiter isn't the last byte
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 },
    // Case 10: Check an array with multiple sequential bytes of a delimiter
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x1c, 0x0d, 0x3 },
};

var delimiter = new byte[] { 0x1c, 0x0d };
foreach (var item in cases)
{
    var source = new ReadOnlySequence<byte>(item);
    var result = source.LastPositionOf(delimiter);
} // Put a breakpoint here and examine result

Cases 1 to 5 all correctly return null. Cases 6 to 10 all correctly return a SequencePosition pointing to the first byte of the delimiter (i.e. 0x1c in this case).
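
The cases above each wrap a single array, so every sequence has exactly one segment. A ReadOnlySequence<byte> can also span several segments, and a delimiter may then straddle a segment boundary. The following is a minimal sketch for checking that situation. ChunkSegment, Build and LastOffsetOf are hypothetical helper names (they are not part of the original answer or of .NET), and LastOffsetOf repeats the same TryReadTo loop as LastPositionOf but returns a plain offset so the two results are easy to compare.

```csharp
using System;
using System.Buffers;

// Hypothetical helper type: one link in a chain of buffers,
// used only to build a multi-segment sequence for testing.
public sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(byte[] data)
    {
        Memory = data;
    }

    // Links a new segment after this one and returns the new segment.
    public ChunkSegment Append(byte[] data)
    {
        var next = new ChunkSegment(data);
        next.RunningIndex = RunningIndex + Memory.Length;
        Next = next;
        return next;
    }
}

public static class SegmentedCheck
{
    // Builds a sequence with one segment per chunk.
    public static ReadOnlySequence<byte> Build(params byte[][] chunks)
    {
        var first = new ChunkSegment(chunks[0]);
        var last = first;
        for (var i = 1; i < chunks.Length; i++)
        {
            last = last.Append(chunks[i]);
        }
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }

    // Same loop as LastPositionOf, but returns the offset of the last
    // delimiter from the start of the sequence, or -1 if there is none.
    public static long LastOffsetOf(ReadOnlySequence<byte> source, byte[] delimiter)
    {
        if (delimiter == null || delimiter.Length == 0)
        {
            throw new ArgumentException("Delimiter must not be null or empty.", nameof(delimiter));
        }

        var reader = new SequenceReader<byte>(source);
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            found = true;
        }
        return found ? reader.Consumed - delimiter.Length : -1;
    }

    public static void Main()
    {
        var delimiter = new byte[] { 0x1c, 0x0d };
        var flat = new ReadOnlySequence<byte>(
            new byte[] { 0x01, 0x1c, 0x0d, 0x02, 0x1c, 0x0d, 0x03 });
        // Same bytes, but the second delimiter is split across two segments.
        var split = Build(
            new byte[] { 0x01, 0x1c },
            new byte[] { 0x0d, 0x02, 0x1c },
            new byte[] { 0x0d, 0x03 });

        Console.WriteLine(LastOffsetOf(flat, delimiter));
        Console.WriteLine(LastOffsetOf(split, delimiter));
    }
}
```

Both lines printed by Main should show the same offset, since the two sequences hold identical bytes and differ only in how they are segmented.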

I also tried to create an iterator version that yields a position each time a delimiter is found, like this:

while (reader.TryReadTo(out _, delimiterToFind, true))
{
    yield return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
}

But SequenceReader<T> and ReadOnlySpan<T> can't be used in iterator blocks, so I came up with AllPositionsOf instead:

public static IEnumerable<SequencePosition> AllPositionsOf(
    this ReadOnlySequence<byte> source,
    byte[] delimiter)
{
    if (delimiter == null)
    {
        throw new ArgumentNullException(nameof(delimiter));
    }
    if (!delimiter.Any())
    {
        throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
    }

    var reader = new SequenceReader<byte>(source);
    var delimiterToFind = new ReadOnlySpan<byte>(delimiter);

    var results = new List<SequencePosition>();
    while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
    {
        results.Add(reader.Sequence.GetPosition(reader.Consumed - delimiter.Length));
    }

    return results;
}

The test cases work correctly with this version as well.
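
AllPositionsOf collects every position into a list before returning. If lazy enumeration is still wanted, one possible approach is to keep the SequenceReader<byte> out of the iterator itself and confine it to an ordinary helper method, so that the iterator only holds a ReadOnlySequence<byte> and a running offset. This is a sketch under that assumption; LazyPositionsOf and TryFindNext are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

public static class LazyDelimiterSearch
{
    // Iterator: holds no ref structs, only the remaining sequence and an offset.
    // Note that, being an iterator, the argument check runs on first enumeration.
    public static IEnumerable<SequencePosition> LazyPositionsOf(
        ReadOnlySequence<byte> source, byte[] delimiter)
    {
        if (delimiter == null || delimiter.Length == 0)
        {
            throw new ArgumentException("Delimiter must not be null or empty.", nameof(delimiter));
        }

        var remaining = source;
        long baseOffset = 0;
        while (TryFindNext(remaining, delimiter, out long consumed))
        {
            // consumed counts bytes up to and including the delimiter.
            yield return source.GetPosition(baseOffset + consumed - delimiter.Length);
            baseOffset += consumed;
            remaining = remaining.Slice(consumed);
        }
    }

    // Ordinary (non-iterator) method, so using SequenceReader<byte> and
    // ReadOnlySpan<byte> here is allowed.
    private static bool TryFindNext(
        ReadOnlySequence<byte> source, byte[] delimiter, out long consumed)
    {
        var reader = new SequenceReader<byte>(source);
        var found = reader.TryReadTo(
            out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true);
        consumed = reader.Consumed;
        return found;
    }

    public static void Main()
    {
        var source = new ReadOnlySequence<byte>(
            new byte[] { 0x01, 0x1c, 0x0d, 0x02, 0x1c, 0x0d });
        foreach (var position in LazyPositionsOf(source, new byte[] { 0x1c, 0x0d }))
        {
            // Convert the position to an offset for display: prints 1, then 4.
            Console.WriteLine(source.Slice(0, position).Length);
        }
    }
}
```

The trade-off is that a new reader is created for each step, in exchange for not having to materialize all positions up front.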

Update

Now that I've had some sleep and a chance to think things over, I believe the above can be improved, for the following reasons:

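  1. SequenceReader<T> has a Rewind() method, which makes me think SequenceReader<T> is designed to be reused
  2. SequenceReader<T> seems to be designed to make working with ReadOnlySequence<T> easier in general
  3. Creating an extension method on ReadOnlySequence<T> just to use a SequenceReader<T> to read from that ReadOnlySequence<T> seems backwards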

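Given the above, I think it makes more sense to avoid working directly with ReadOnlySequence<T> where possible, and to prefer (and reuse) SequenceReader<T> instead. With that in mind, here is a different version of LastPositionOf, which is now an extension method on SequenceReader<T>: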

public static class SequenceReaderExtensions
{
    /// <summary>
    /// Finds the last occurrence of a delimiter in a given sequence.
    /// </summary>
    /// <param name="reader">The reader to read from.</param>
    /// <param name="delimiter">The delimiter to look for.</param>
    /// <param name="rewind">If true, rewinds the reader to its position prior to this method being called.</param>
    /// <returns>A SequencePosition if a delimiter is found, otherwise null.</returns>
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }

        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var consumed = reader.Consumed;

        var delimiterFound = false;
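        // Keep reading until we've consumed all delimiters. The discard is
        // typed explicitly so that overload resolution does not depend on
        // which TryReadTo overloads the target framework provides.
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))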
        {
            delimiterFound = true;
        }

        if (!delimiterFound)
        {
            if (rewind)
            {
                reader.Rewind(reader.Consumed - consumed);
            }

            return null;
        }

        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the starting byte
        // of the delimiter
        var result = reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumed);
        }

        return result;
    }
}

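The test cases above continue to pass, but we can now reuse the same reader. It also lets you specify whether the reader should be rewound to the position it was at before the call.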
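
To illustrate how the rewind flag might be used for the original goal of separating complete messages from a trailing partial one, here is a minimal sketch. It contains a condensed copy of the reader-based LastPositionOf (placed in a class named ReaderSearchExtensions, and skipping the rewind when nothing was consumed; both are choices made for this sketch). Split and ConsumedAfterRewind are hypothetical helpers. The idea is that with rewind set to false the reader is left just past the last delimiter, so Consumed is the number of bytes holding complete messages and Remaining is the leftover.

```csharp
using System;
using System.Buffers;

public static class ReaderSearchExtensions
{
    // Condensed copy of the reader-based LastPositionOf from the answer.
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader, byte[] delimiter, bool rewind)
    {
        if (delimiter == null || delimiter.Length == 0)
        {
            throw new ArgumentException("Delimiter must not be null or empty.", nameof(delimiter));
        }

        var start = reader.Consumed;
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            found = true;
        }

        SequencePosition? result = found
            ? reader.Sequence.GetPosition(reader.Consumed - delimiter.Length)
            : (SequencePosition?)null;

        // Only rewind when the reader actually moved.
        if (rewind && reader.Consumed > start)
        {
            reader.Rewind(reader.Consumed - start);
        }
        return result;
    }
}

public static class RewindDemo
{
    // Hypothetical helper: bytes belonging to complete messages (including
    // their delimiters) and bytes left over after the last delimiter.
    public static (long Complete, long Leftover) Split(byte[] data, byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(new ReadOnlySequence<byte>(data));
        reader.LastPositionOf(delimiter, rewind: false);
        return (reader.Consumed, reader.Remaining);
    }

    // Hypothetical helper: shows that rewinding restores the starting position.
    public static long ConsumedAfterRewind(byte[] data, byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(new ReadOnlySequence<byte>(data));
        reader.LastPositionOf(delimiter, rewind: true);
        return reader.Consumed;
    }

    public static void Main()
    {
        var delimiter = new byte[] { 0x1c, 0x0d };
        var data = new byte[] { 0x01, 0x1c, 0x0d, 0x02, 0x1c, 0x0d, 0x03 };

        var (complete, leftover) = Split(data, delimiter);
        Console.WriteLine($"{complete} complete, {leftover} leftover"); // 6 complete, 1 leftover
        Console.WriteLine(ConsumedAfterRewind(data, delimiter));        // 0
    }
}
```

When no delimiter is present the reader never advances, so Split reports zero complete bytes and the whole input as leftover.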
