如何在 Python 中将一个字符串附加到另一个字符串?

2024-03-17

如何有效地将一个字符串附加到另一个字符串?有没有更快的替代方案:

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

For handling multiple strings in a list, see How to concatenate (join) items in a list to a single string https://stackoverflow.com/questions/12453580/.

See How do I put a variable’s value inside a string (interpolate it into the string)? https://stackoverflow.com/questions/2960772 if some inputs are not strings, but the result should still be a string.


如果您只有一个对字符串的引用,并且将另一个字符串连接到末尾,CPython 现在会出现这种特殊情况,并尝试就地扩展该字符串。

最终结果是该操作摊销为 O(n)。

e.g.

s = ""
for i in range(n):
    s += str(i)

以前是O(n^2),现在是O(n)。

更多信息

从源代码(bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}

通过经验验证很容易。



$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
  

这一点很重要但请注意,这种优化不是 Python 规范的一部分。据我所知,它仅在 cPython 实现中。例如,对 pypy 或 jython 进行相同的经验测试可能会显示较旧的 O(n**2) 性能。



$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop
  

到目前为止一切顺利,但是接下来,



$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop
  

哎哟甚至比二次更糟糕。因此 pypy 所做的事情对于短字符串效果很好,但对于较大字符串则表现不佳。

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

如何在 Python 中将一个字符串附加到另一个字符串? 的相关文章

随机推荐