Python - Convert tables from .doc/.docx files to .xls


I'm tasked with converting a series of tables from .doc and .docx files to .xls, but I have not managed to find an efficient way to do this. The tables may sit in between other text.

I have looked at pywin32, xlwt, and a couple of other libraries, but it seems I would have to go through a lot of steps.

Any tips for converting tables from a *.doc/*.docx file to an *.xls file?

I'm assuming that you have too many documents to copy/paste by hand, and that you seek a pragmatic solution for internal use. Here is a solution:

  • it opens each file in Word in batch mode
  • you write a little script to cut away everything outside the table tags in the HTML
  • it saves the file as HTML, using the .xls extension
  • the HTML file opens in Excel by default, and you click away the warning.
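The "cut away everything outside the table tags" step could be sketched in Python roughly as follows. This is a minimal sketch, not a full HTML parser: the function names `extract_tables` and `tables_only_document` are made up for illustration, and it assumes no `>` character appears inside a table tag's attribute values.

```python
import re

# Matches opening and closing table tags, e.g. <table class=X> or </TABLE>.
TABLE_TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)

def extract_tables(html):
    """Return the source of every top-level <table>...</table> element."""
    tables = []
    depth = 0       # how many tables we are currently nested inside
    start = None    # offset where the current top-level table begins
    for match in TABLE_TAG.finditer(html):
        if match.group(1) == "":          # opening tag
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:                   # closing tag for an open table
            depth -= 1
            if depth == 0:
                tables.append(html[start:match.end()])
    return tables

def tables_only_document(html):
    """Wrap the extracted tables in a bare HTML skeleton."""
    body = "\n".join(extract_tables(html))
    return "<html><body>\n" + body + "\n</body></html>\n"
```

Nested tables stay inside their enclosing table instead of being listed twice, because only tags seen at depth zero start a new entry.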

Create a macro in Word such as this:

Sub BatchSaveAs()
    ' Set output_dir appropriately
    ChangeFileOpenDirectory "output_dir"

    ' Strip the extension (.doc or .docx) and append .xls
    outDocName = Left(ActiveDocument.Name, InStrRev(ActiveDocument.Name, ".") - 1) & ".xls"

    ActiveDocument.SaveAs FileName:=outDocName, FileFormat:= _
        wdFormatFilteredHTML, LockComments:=False, Password:="", AddToRecentFiles _
        :=True, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts _
        :=False, SaveNativePictureFormat:=False, SaveFormsData:=False, _
        SaveAsAOCELetter:=False

    ActiveWindow.View.Type = wdWebView

    Application.Quit SaveChanges:=wdDoNotSaveChanges
End Sub

Now you can run Word in batch mode through a script that calls it for each input file:

winword file_name /mBatchSaveAs

(You may need to use full path names.)
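The calling script could be written in Python along these lines. It is only a sketch under some assumptions: `winword` is reachable on the PATH (otherwise put its full path in `WINWORD`), the macro is named `BatchSaveAs`, no other Word instance is open while the batch runs, and the helper names are invented for illustration.

```python
import subprocess
from pathlib import Path

WINWORD = "winword"      # assumption: on PATH; otherwise use the full path to winword.exe
MACRO = "BatchSaveAs"    # name of the Word macro to run on each document

def word_command(doc_path, winword=WINWORD, macro=MACRO):
    """Build the command line that opens one document and runs the macro."""
    # Full path names avoid surprises with Word's working directory.
    return [winword, str(Path(doc_path).resolve()), "/m" + macro]

def find_documents(input_dir):
    """Return the .doc and .docx files directly inside input_dir, sorted by path."""
    folder = Path(input_dir)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in {".doc", ".docx"}
        and not p.name.startswith("~$")   # skip Word's temporary lock files
    )

def convert_all(input_dir):
    """Run Word once per document; the macro quits Word when it is done."""
    for doc in find_documents(input_dir):
        subprocess.run(word_command(doc), check=True)
```

Calling `convert_all(r"C:\path\to\docs")` (a placeholder path) would then process one file at a time.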

If the warning on opening the HTML / Excel files is not OK, write a little Python script to run Excel in batch mode. This shows how to run Excel from Python:

Python: COM between Python and Excel

Some tricks I found useful: make sure to clean up afterwards; the code you need looks like VBA code, and if you're not good at VBA, you can record the macro you want and modify it to Python syntax.
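As a rough illustration of such an Excel batch script, the sketch below opens one of the HTML-based .xls files through COM and re-saves it as a native workbook. It assumes pywin32 and Excel are installed; the `_native` suffix and the function names are hypothetical choices, and `try`/`finally` is just one way to handle the clean-up.

```python
from pathlib import Path

XL_EXCEL8 = 56  # Excel's XlFileFormat value for the binary .xls format

def native_xls_path(html_path):
    """Name for the re-saved workbook, e.g. report.xls -> report_native.xls."""
    p = Path(html_path)
    return p.with_name(p.stem + "_native.xls")

def resave_as_xls(html_path):
    """Open an HTML file that carries an .xls extension and save a real .xls copy."""
    import win32com.client  # pywin32; imported here so the helper above works without it

    source = Path(html_path).resolve()
    target = native_xls_path(source)
    excel = win32com.client.Dispatch("Excel.Application")
    try:
        excel.DisplayAlerts = False   # keep interactive prompts from blocking the batch run
        workbook = excel.Workbooks.Open(str(source))
        try:
            workbook.SaveAs(str(target), FileFormat=XL_EXCEL8)
        finally:
            workbook.Close(False)     # close without saving again
    finally:
        excel.Quit()                  # clean-up: do not leave Excel running in the background
    return target
```

The calls mirror what a recorded VBA macro would contain (`Workbooks.Open`, `SaveAs`, `Close`, `Quit`), only written in Python syntax.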

