Skip to content

Python string improvements (performance and somewhat readability)

David Jaša requested to merge dj/python-string-improvements into main

While CPython does some magic to avoid quadratic complexity of repeated 'addition' to string-the-immutable-object achieving linear complexity for stuff like:

spam = ''
for i in range(1000):
    spam += 'eat spam!\n'

This is by no means guaranteed as Python docs point out:

and we shouldn't depend on such a behaviour especially in stuff like dump_status() of nmci/ctx.py that join more than a handful of original strings and do it for each and every scenario. Actually, based on a crude benchmark, the example above is about twice as slow as the more pythonic example below (yeah, such a simple thing is even better done by list comprehension):

spam = []
for i in range(1000)
    spam.append('eat spam!')
spam = '\n'.join(spam)

So the main goal of this commit is to switch all instances of strings being built incrementally to building a list (or where suitable, set) instead and .join()-ing it afterwards and to change '{foo}{bar}'.format(foo=foo, bar=bar) string to f'{foo}{bar}' f-strings

I started this as a more thorough effort on removing string addition using + or += operators altogether but after creating a simple benchmark I realized that this, .join() and f-string have about the same performance with '%s%s' % (var1, var2) and '{}{}'.format(var1, var2) being only slightly slower. The only slower outlier is parametrized '{foo}{bar}'.format(foo=foo, bar=bar) so I f-string-ified those in following commits. I however left some changes done before benchmarking that IMO increase code readability.

example benchmark results on my system are:

$ ./python-string-related-perf.py 
join fixed number of strings (4) to a long string (number=2000000):
'command %s --parameter %s' % (strings[0], strings[1])				:	 0.6372304310025356
'command {} --parameter {}'.format(strings[0], strings[1])			:	 0.6514693839999381
'command {arg} --parameter {param}'.format(arg=strings[0], param=strings[1])	:	 1.0831694859989511
'command ' + str[0] + ' --parameter=' + str[1]					:	 0.3938722479979333
' '.join(['command', str[0], '--parameter=', str[1]])				:	 0.40208850300041377
f'command {str[0]} --parameter {str[1]}'					:	 0.4099529650011391

build string from 50 to 80-long preexisting parts (number=5000):
build long string from 10 parts:
'\n'.join(strings)					:	 0.0017692799992801156
for i in strings: msgs.append(i); '\n'.join(msgs)	:	 0.003394540999579476
for i in strings: msg += msg + i + '\n'			:	 0.0055492060018877964
for i in strings: msg = f'{msg}{i}\n'			:	 0.004438264997588703

build long string from 100 parts:
'\n'.join(strings)					:	 0.009715741998661542
for i in strings: msgs.append(i); '\n'.join(msgs)	:	 0.027317003998177825
for i in strings: msg += msg + i + '\n'			:	 0.06035234099908848
for i in strings: msg = f'{msg}{i}\n'			:	 0.08395763999942574

build long string from 1000 parts:
'\n'.join(strings)					:	 0.09142985900325584
for i in strings: msgs.append(i); '\n'.join(msgs)	:	 0.2942033840008662
for i in strings: msg += msg + i + '\n'			:	 0.6274634230030642
for i in strings: msg = f'{msg}{i}\n'			:	 4.329035735998332

join long list to the end of short list:
devs contains 1000 devices
['nmcli', 'd', 'delete'] + devs	:	 2.9230779689969495
['nmcli', 'd', 'delete', *devs]	:	 2.972522068001126

pregenerate list or supply just interator object?
(f'sth{dev}' for dev in vlan_range=1000) 0.13947492800070904
[f'sth{dev}' for dev in vlan_range=1000] 0.12493522400109214


generate commands string using comprehension or commands list using for loop (number=1000)?
generate command string:				 0.11782203500115429
generate command list in for loop:			 3.5407861940002476
generate command list using nested comprehension:	 0.18802540999968187


append to list or add to list ({number=})?
for s in strings: ret.append(s):	 1.9168673099993612
for s in strings: ret += [s]:		 2.878833714999928
Edited by David Jaša

Merge request reports