我想创建一种算法,将文本分为3个均等大小的组(基于文本长度)。由于这将用于换行符,因此需要保持文本的顺序。
例如,此字符串:
Just testing to see how this works.
将排序为:
Just testing // 12 characters to see how // 10 characters this works. // 11 characters
有任何想法吗?
同样来自Wikipedia上有关自动换行的文章的“最小衣衫program”动态程序可以适应您的需求。设置LineWidth = len(text)/n - 1并忽略有关超过行宽的无限惩罚的注释;与一起使用c(i, j)as 的定义P = 2。
LineWidth = len(text)/n - 1
c(i, j)
P = 2
Code. I took the liberty of modifying the DP always to return exactly n lines, at the cost of increasing the running time from O(#words ** 2) to O(#words ** 2 * n). def minragged(text, n=3): """ >>> minragged('Just testing to see how this works.') ['Just testing', 'to see how', 'this works.'] >>> minragged('Just testing to see how this works.', 10) ['', '', 'Just', 'testing', 'to', 'see', 'how', 'this', 'works.', ''] """ words = text.split() cumwordwidth = [0] # cumwordwidth[-1] is the last element for word in words: cumwordwidth.append(cumwordwidth[-1] + len(word)) totalwidth = cumwordwidth[-1] + len(words) - 1 # len(words) - 1 spaces linewidth = float(totalwidth - (n - 1)) / float(n) # n - 1 line breaks def cost(i, j): """ cost of a line words[i], ..., words[j - 1] (words[i:j]) """ actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]) return (linewidth - float(actuallinewidth)) ** 2 # best[l][k][0] is the min total cost for words 0, ..., k - 1 on l lines # best[l][k][1] is a minimizing index for the start of the last line best = [[(0.0, None)] + [(float('inf'), None)] * len(words)] # xrange(upper) is the interval 0, 1, ..., upper - 1 for l in xrange(1, n + 1): best.append([]) for j in xrange(len(words) + 1): best[l].append(min((best[l - 1][k][0] + cost(k, j), k) for k in xrange(j + 1))) lines = [] b = len(words) # xrange(upper, 0, -1) is the interval upper, upper - 1, ..., 1 for l in xrange(n, 0, -1): a = best[l][b][1] lines.append(' '.join(words[a:b])) b = a lines.reverse() return lines if __name__ == '__main__': import doctest doctest.testmod()