I'm tasked to convert a series of tables from .doc and .docx files to .xls,
but I have not managed to find an efficient way to do this. The tables may be in between other text.
I have looked at pywin32, xlwt, and a couple of other libraries, but it seems I have to go through a lot of steps.
Any tips for table conversion from a *.doc/*.docx file to an *.xls file?
I'm assuming you have too many documents to copy/paste by hand, and that you seek a pragmatic solution for internal use. My solution:
 - open the file in Word in batch mode
 - optionally, write a little script to cut everything outside the table tags in the HTML
 - save the file as HTML, using an .xls extension
 - the HTML file will open in Excel by default, and you can click away the warning.
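
The "cut everything outside the table tags" step could look roughly like this. It is only a sketch using Python's standard html.parser module; the names TableExtractor and extract_tables are made up for illustration, and it assumes the filtered HTML that Word writes is regular enough for that parser to handle.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the markup of every top-level <table> element, dropping all other content."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0      # nesting level of <table> elements
        self.parts = []     # pieces of the table currently being collected
        self.tables = []    # finished tables, as strings

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is inside a table.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.parts.append(f"</{tag}>")
            if tag == "table":
                self.depth -= 1
                if self.depth == 0:
                    self.tables.append("".join(self.parts))
                    self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_tables(html):
    """Return a minimal HTML document containing only the tables found in html."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return "<html><body>\n" + "\n".join(parser.tables) + "\n</body></html>"
```

You would read each saved file, pass its text through extract_tables, and write the result back. Nested tables stay inside their outer table, and any text between tables is dropped.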
 
Create a macro in Word such as this:

    Sub BatchSaveAs()
        ' Set output_dir appropriately
        ChangeFileOpenDirectory "output_dir"

        ' Strip the extension (.doc or .docx) and append .xls
        OutDocName = Left(ActiveDocument.Name, InStrRev(ActiveDocument.Name, ".") - 1) & ".xls"

        ActiveDocument.SaveAs FileName:=OutDocName, FileFormat:= _
            wdFormatFilteredHTML, LockComments:=False, Password:="", AddToRecentFiles _
            :=True, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts _
            :=False, SaveNativePictureFormat:=False, SaveFormsData:=False, _
            SaveAsAOCELetter:=False

        ActiveWindow.View.Type = wdWebView

        Application.Quit SaveChanges:=wdDoNotSaveChanges
    End Sub

Now you can run Word in batch mode through a script that calls this for each input file:

    winword file_name /mBatchSaveAs

(You may need to use full path names.)
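
The calling script could be a few lines of Python. This is a sketch under assumptions: the function names are invented here, winword is assumed to be on the PATH (otherwise pass the full path to winword.exe), and the BatchSaveAs macro is assumed to be stored where Word can find it at start-up, for example in the Normal template.

```python
import subprocess
from pathlib import Path

WORD_EXTENSIONS = {".doc", ".docx"}


def find_documents(folder):
    """Return the Word documents directly inside folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in WORD_EXTENSIONS)


def build_command(doc_path, winword="winword", macro="BatchSaveAs"):
    """Build the command line: winword <full path> /m<macro name>."""
    return [winword, str(Path(doc_path).resolve()), "/m" + macro]


def convert_all(folder, winword="winword"):
    """Run Word once per document; the macro saves the file and quits Word."""
    for doc in find_documents(folder):
        # check=True raises CalledProcessError if Word exits with an error code.
        subprocess.run(build_command(doc, winword), check=True)
```

Calling convert_all with the folder that holds your documents processes them one after another. Full paths are used throughout, which sidesteps the path problem mentioned above.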
If the warning on opening the HTML/Excel files is not OK, you can write a little Python script that runs Excel in batch mode. This shows how to run Excel from Python:
Python COM between Python and Excel
Some tricks I found useful: take care of clean-up; the code you need looks like VBA code, and if you're not good at VBA, record a macro of what you want and modify it to Python syntax.
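
For what it's worth, such an Excel script might look like the sketch below. It assumes Windows with Excel and pywin32 installed; the function names and the "_converted" suffix are made up for illustration, and setting DisplayAlerts to False is intended to keep Excel from prompting, which I have not verified for every Excel version. The value 56 is Excel's xlExcel8 constant (the .xls format).

```python
from pathlib import Path

XL_EXCEL8 = 56  # value of Excel's xlExcel8 constant (.xls, Excel 97-2003 format)


def converted_name(src):
    """Return the output path: same folder, '_converted' added to the file stem."""
    src = Path(src)
    return src.with_name(src.stem + "_converted.xls")


def resave_as_real_xls(src):
    """Open an HTML file that carries an .xls extension and re-save it as a true workbook."""
    import win32com.client  # pywin32; imported here so the helper above works anywhere

    src = Path(src).resolve()
    dst = converted_name(src)
    excel = win32com.client.Dispatch("Excel.Application")
    excel.DisplayAlerts = False  # intended to suppress Excel's prompts
    workbook = None
    try:
        # These calls mirror what a recorded VBA macro would contain.
        workbook = excel.Workbooks.Open(str(src))
        workbook.SaveAs(str(dst), FileFormat=XL_EXCEL8)
    finally:
        # Clean-up: otherwise a hidden Excel process stays alive after an error.
        if workbook is not None:
            workbook.Close(False)  # False = do not save changes again
        excel.Quit()
    return dst
```

The try/finally block is the clean-up part: Excel is closed even when opening or saving fails.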