说我有一个看起来像这样的字符串:
str = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"
您会注意到字符串中有很多与符号的位置,后跟一个字符(例如“&y”和“&c”)。我需要用字典中的适当值替换这些字符,如下所示:
dict = {"&y":"\033[0;30m", "&c":"\033[0;31m", "&b":"\033[0;32m", "&Y":"\033[0;33m", "&u":"\033[0;34m"}
最快的方法是什么?我可以手动找到所有“&”号,然后遍历字典以更改它们,但这似乎很慢。进行一堆正则表达式替换似乎也很慢(我的实际代码中将有大约30-40对字典)。
任何建议表示赞赏,谢谢。
编辑:
正如在该问题的注释中所指出的那样,我的字典是在运行时之前定义的,并且在应用程序生命周期的过程中永远不会改变。它是ANSI转义序列的列表,其中将包含约40个项目。我要比较的平均字符串长度约为500个字符,但是会有不超过5000个字符的字符串(尽管很少见)。我目前也在使用Python 2.6。
编辑#2 我接受了Tor Valamos的回答是正确的,因为它不仅提供了有效的解决方案(尽管它不是 最佳 解决方案),而且考虑了所有其他问题,并且做了大量工作来比较所有这些问题。该答案是我在StackOverflow上遇到的最好,最有用的答案之一。恭喜您。
mydict = {“&y”:”\033[0;30m”, “&c”:”\033[0;31m”, “&b”:”\033[0;32m”, “&Y”:”\033[0;33m”, “&u”:”\033[0;34m”} mystr = “The &yquick &cbrown &bfox &Yjumps over the &ulazy dog”
for k, v in mydict.iteritems(): mystr = mystr.replace(k, v) print mystr The ←[0;30mquick ←[0;31mbrown ←[0;32mfox ←[0;33mjumps over the ←[0;34mlazy dog
我比较了一些解决方案:
mydict = dict([('&' + chr(i), str(i)) for i in list(range(65, 91)) + list(range(97, 123))]) # random inserts between keys from random import randint rawstr = ''.join(mydict.keys()) mystr = '' for i in range(0, len(rawstr), 2): mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars from time import time # How many times to run each solution rep = 10000 print 'Running %d times with string length %d and ' \ 'random inserts of lengths 0-20' % (rep, len(mystr)) # My solution t = time() for x in range(rep): for k, v in mydict.items(): mystr.replace(k, v) #print(mystr) print '%-30s' % 'Tor fixed & variable dict', time()-t from re import sub, compile, escape # Peter Hansen t = time() for x in range(rep): sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict print '%-30s' % 'Peter fixed & variable dict', time()-t # Claudiu def multiple_replace(dict, text): # Create a regular expression from the dictionary keys regex = compile("(%s)" % "|".join(map(escape, dict.keys()))) # For each match, look-up corresponding value in dictionary return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) t = time() for x in range(rep): multiple_replace(mydict, mystr) print '%-30s' % 'Claudio variable dict', time()-t # Claudiu - Precompiled regex = compile("(%s)" % "|".join(map(escape, mydict.keys()))) t = time() for x in range(rep): regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr) print '%-30s' % 'Claudio fixed dict', time()-t # Andrew Y - variable dict def mysubst(somestr, somedict): subs = somestr.split("&") return subs[0] + "".join(map(lambda arg: somedict["&" + arg[0:1]] + arg[1:], subs[1:])) t = time() for x in range(rep): mysubst(mystr, mydict) print '%-30s' % 'Andrew Y variable dict', time()-t # Andrew Y - fixed def repl(s): return mydict["&"+s[0:1]] + s[1:] t = time() for x in range(rep): subs = mystr.split("&") res = subs[0] + "".join(map(repl, subs[1:])) print '%-30s' % 'Andrew Y fixed dict', time()-t
Python 2.6的结果
Running 10000 times with string length 490 and random inserts of lengths 0-20 Tor fixed & variable dict 1.04699993134 Peter fixed & variable dict 0.218999862671 Claudio variable dict 2.48400020599 Claudio fixed dict 0.0940001010895 Andrew Y variable dict 0.0309998989105 Andrew Y fixed dict 0.0310001373291
claudiu和andrew的解决方案都保持为0,因此我不得不将其增加到10000次运行。
我在 Python 3 (由于Unicode)中运行了它,将chars从39替换为1024(38是&符,所以我不想包含它)。字符串长度最大为10.000,包括大约980个替换字符串,长度为0-20的可变随机插入。从39到1024的unicode值会导致字符长度分别为1和2个字节,这可能会影响某些解决方案。
mydict = dict([('&' + chr(i), str(i)) for i in range(39,1024)]) # random inserts between keys from random import randint rawstr = ''.join(mydict.keys()) mystr = '' for i in range(0, len(rawstr), 2): mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars from time import time # How many times to run each solution rep = 10000 print('Running %d times with string length %d and ' \ 'random inserts of lengths 0-20' % (rep, len(mystr))) # Tor Valamo - too long #t = time() #for x in range(rep): # for k, v in mydict.items(): # mystr.replace(k, v) #print('%-30s' % 'Tor fixed & variable dict', time()-t) from re import sub, compile, escape # Peter Hansen t = time() for x in range(rep): sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict print('%-30s' % 'Peter fixed & variable dict', time()-t) # Peter 2 def dictsub(m): return mydict[m.group()] t = time() for x in range(rep): sub(r'(&[a-zA-Z])', dictsub, mystr) print('%-30s' % 'Peter fixed dict', time()-t) # Claudiu - too long #def multiple_replace(dict, text): # # Create a regular expression from the dictionary keys # regex = compile("(%s)" % "|".join(map(escape, dict.keys()))) # # # For each match, look-up corresponding value in dictionary # return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) # #t = time() #for x in range(rep): # multiple_replace(mydict, mystr) #print('%-30s' % 'Claudio variable dict', time()-t) # Claudiu - Precompiled regex = compile("(%s)" % "|".join(map(escape, mydict.keys()))) t = time() for x in range(rep): regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr) print('%-30s' % 'Claudio fixed dict', time()-t) # Separate setup for Andrew and gnibbler optimized dict mydict = dict((k[1], v) for k, v in mydict.items()) # Andrew Y - variable dict def mysubst(somestr, somedict): subs = somestr.split("&") return subs[0] + "".join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:])) def mysubst2(somestr, somedict): subs = somestr.split("&") return subs[0].join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:])) t = time() for x in range(rep): mysubst(mystr, mydict) print('%-30s' % 'Andrew Y variable dict', time()-t) t = time() for x in range(rep): mysubst2(mystr, mydict) print('%-30s' % 'Andrew Y variable dict 2', time()-t) # Andrew Y - fixed def repl(s): return mydict[s[0:1]] + s[1:] t = time() for x in range(rep): subs = mystr.split("&") res = subs[0] + "".join(map(repl, subs[1:])) print('%-30s' % 'Andrew Y fixed dict', time()-t) # gnibbler t = time() for x in range(rep): myparts = mystr.split("&") myparts[1:]=[mydict[x[0]]+x[1:] for x in myparts[1:]] "".join(myparts) print('%-30s' % 'gnibbler fixed & variable dict', time()-t)
结果:
Running 10000 times with string length 9491 and random inserts of lengths 0-20 Tor fixed & variable dict 0.0 # disqualified 329 secs Peter fixed & variable dict 2.07799983025 Peter fixed dict 1.53100013733 Claudio variable dict 0.0 # disqualified, 37 secs Claudio fixed dict 1.5 Andrew Y variable dict 0.578000068665 Andrew Y variable dict 2 0.56299996376 Andrew Y fixed dict 0.56200003624 gnibbler fixed & variable dict 0.530999898911
(**请注意,gnibbler的代码使用了不同的字典,其中的键不包含’&’。Andrew的代码也使用了该备用字典,但并没有太大的区别,也许只是0.01倍的加速。)