I am using the pytesseract Python library to recognize text in a .png image, but I get this error:
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[2], line 18
     14 pytesseract.tesseract_cmd = path_to_tesseract
     16 # Passing the image object to image_to_string() function
     17 # This function will extract the text from the image
---> 18 text = pytesseract.image_to_string(img)
     20 # Displaying the extracted text
     21 print(text[:-1])

File ~\anaconda3\lib\site-packages\pytesseract\pytesseract.py:423, in image_to_string(image, lang, config, nice, output_type, timeout)
    418 """
    419 Returns the result of a Tesseract OCR run on the provided image to string
    420 """
    421 args = [image, 'txt', lang, config, nice, timeout]
--> 423 return {
    424     Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    425     Output.DICT: lambda: {'text': run_and_get_output(*args)},
    426     Output.STRING: lambda: run_and_get_output(*args),
    427 }[output_type]()

File ~\anaconda3\lib\site-packages\pytesseract\pytesseract.py:426, in image_to_string.<locals>.<lambda>()
    418 """
    419 Returns the result of a Tesseract OCR run on the provided image to string
    420 """
    421 args = [image, 'txt', lang, config, nice, timeout]
    423 return {
    424     Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    425     Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 426     Output.STRING: lambda: run_and_get_output(*args),
    427 }[output_type]()

File ~\anaconda3\lib\site-packages\pytesseract\pytesseract.py:288, in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    277 with save(image) as (temp_name, input_filename):
    278     kwargs = {
    279         'input_filename': input_filename,
    280         'output_filename_base': temp_name,
   (...)
    285         'timeout': timeout,
    286     }
--> 288     run_tesseract(**kwargs)
    289     filename = f"{kwargs['output_filename_base']}{extsep}{extension}"
    290     with open(filename, 'rb') as output_file:

File ~\anaconda3\lib\site-packages\pytesseract\pytesseract.py:255, in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    252     cmd_args.append(extension)
    254 try:
--> 255     proc = subprocess.Popen(cmd_args, **subprocess_args())
    256 except OSError as e:
    257     if e.errno != ENOENT:

File ~\anaconda3\lib\subprocess.py:951, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
    947     if self.text_mode:
    948         self.stderr = io.TextIOWrapper(self.stderr,
    949                 encoding=encoding, errors=errors)
--> 951     self._execute_child(args, executable, preexec_fn, close_fds,
    952                         pass_fds, cwd, env,
    953                         startupinfo, creationflags, shell,
    954                         p2cread, p2cwrite,
    955                         c2pread, c2pwrite,
    956                         errread, errwrite,
    957                         restore_signals,
    958                         gid, gids, uid, umask,
    959                         start_new_session)
    960 except:
    961     # Cleanup if the child failed starting.
    962     for f in filter(None, (self.stdin, self.stdout, self.stderr)):

File ~\anaconda3\lib\subprocess.py:1420, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
   1418 # Start the process
   1419 try:
-> 1420     hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1421                              # no special security
   1422                              None, None,
   1423                              int(not close_fds),
   1424                              creationflags,
   1425                              env,
   1426                              cwd,
   1427                              startupinfo)
   1428 finally:
   1429     # Child is launched. Close the parent's copy of those pipe
   1430     # handles that only the child should have open. You need
   (...)
   1433     # pipe will not close when the child process exits and the
   1434     # ReadFile will hang.
   1435     self._close_pipe_fds(p2cread, p2cwrite,
   1436                          c2pread, c2pwrite,
   1437                          errread, errwrite)

OSError: [WinError 740] The requested operation requires elevation
Is there any way to fix this?
Here is my code:
from PIL import Image
from pytesseract import pytesseract

# Defining paths to tesseract.exe
# and the image we would be using
path_to_tesseract = r"tesseract.exe"
image_path = r"pdf2png/ActaConstitutiva0/ActaConstitutiva0-01.png"

# Opening the image & storing it in an image object
img = Image.open(image_path)

# Providing the tesseract executable
# location to pytesseract library
pytesseract.tesseract_cmd = path_to_tesseract

# Passing the image object to image_to_string() function
# This function will extract the text from the image
text = pytesseract.image_to_string(img)

# Displaying the extracted text
print(text[:-1])
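The error means that Windows refused to start the Tesseract executable because launching it requires administrator privileges (elevation). You can resolve it either by running your Python script as administrator or by running Tesseract itself with administrator privileges. Both approaches are described below: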
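Method 1: Run the Python script as administrator
Open the program you launch Python from (for example Command Prompt or Anaconda Prompt) with "Run as administrator", then run your script from there. This way, both your Python script and the Tesseract command it launches run with administrator privileges.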
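To confirm that the script really is running elevated before it calls pytesseract, a small check like the following may help. This is a minimal sketch, not part of pytesseract: the helper name is_elevated is hypothetical, and it assumes the Windows shell32 function IsUserAnAdmin is an acceptable test of elevation.

```python
import ctypes
import os


def is_elevated() -> bool:
    """Return True if the current process appears to have admin rights.

    Hypothetical helper for illustration only.
    """
    if os.name == "nt":
        try:
            # shell32.IsUserAnAdmin returns a nonzero value when elevated.
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            # If the check itself fails, assume the process is not elevated.
            return False
    # On POSIX systems, treat the root user (uid 0) as elevated.
    return os.geteuid() == 0


if __name__ == "__main__":
    print("Running elevated:", is_elevated())
```

If this prints False in the same session that raises WinError 740, the session was not started with "Run as administrator".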
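Method 2: Run Tesseract with administrator privileges
From a Command Prompt opened as administrator, change to the Tesseract installation directory and run Tesseract directly: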
cd
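tesseract [input image file] [output text file] -l eng
Replace [input image file] and [output text file] with the paths to your input image and your output text file. Tesseract treats the output argument as a base name and appends .txt to it. The -l eng option sets the recognition language; change it as needed.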
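This way, you can perform the text recognition with Tesseract running under administrator privileges, without having to run the entire Python script as administrator.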
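If you prefer to issue the same command from Python rather than typing it, the command line can be assembled with the standard subprocess module. This is a sketch under assumptions: the function names build_tesseract_command and run_tesseract_cli are hypothetical, the default "tesseract" assumes the executable is on PATH, and the process still has to be started from an elevated session if the executable demands elevation.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng",
                            tesseract_cmd="tesseract"):
    """Build the argument list for: tesseract <image> <output base> -l <lang>."""
    return [tesseract_cmd, str(image_path), str(output_base), "-l", lang]


def run_tesseract_cli(image_path, output_base, lang="eng",
                      tesseract_cmd="tesseract"):
    """Run the Tesseract CLI and return the recognized text."""
    cmd = build_tesseract_command(image_path, output_base, lang, tesseract_cmd)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(cmd, check=True)
    # Tesseract writes its result to "<output base>.txt".
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only shows the command; nothing is executed here.
    print(build_tesseract_command("input.png", "output"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.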
Whichever method you choose, it should resolve the "OSError: [WinError 740] The requested operation requires elevation" error.
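To make the failure easier to recognize if it happens again, the OCR call can be wrapped so that this specific error produces a clear message. The sketch below is illustrative only: requires_elevation and run_ocr are hypothetical names, and it relies on OSError exposing the Windows error code through its winerror attribute, which is absent on other platforms.

```python
ERROR_ELEVATION_REQUIRED = 740  # Windows error code shown in the traceback


def requires_elevation(exc):
    """Return True if exc is an OSError carrying WinError 740."""
    return (isinstance(exc, OSError)
            and getattr(exc, "winerror", None) == ERROR_ELEVATION_REQUIRED)


def run_ocr(ocr_func, image):
    """Call ocr_func(image); report WinError 740 instead of a long traceback.

    ocr_func is any callable taking the image, for example
    pytesseract.image_to_string.
    """
    try:
        return ocr_func(image)
    except OSError as exc:
        if requires_elevation(exc):
            print("Tesseract could not be started: "
                  "the operation requires elevation (WinError 740).")
            return None
        # Any other OSError is unrelated to elevation, so re-raise it.
        raise
```

With the script from the question, this would be called as text = run_ocr(pytesseract.image_to_string, img), and a return value of None signals that elevation is still missing.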