Know nothing about C, have sample code, need to input and output file


  1. Posts : 1,772
    Windows 10 Pro
       #1



    My programming skills date back to Fortran and PL/1 (no laughs please), so for all practical purposes I am a complete programming newbie, but I have a specific problem to solve. I have over 5000 "doc" files that are definitely not any form of Word. Word can't import them, and numerous converters just choke or output garbage. Google Docs can't import them either. These files came off several machines running Win 98.

    I found a C program that does what I want, which is to remove all the non-text characters and output a text file. The sample program includes STDIN and STDOUT, which I understand to be standard input and standard output.

    Would something like this command line work in a Windows CMD box? filename.doc < program > filename.txt

    Efficiency is not an issue. I'll just write a script. If it needs to run overnight, that's OK.

    Also, what is an easy-to-use, free C compiler?


  2. Posts : 305
    Win 10 and 11
       #2

    If you have the source code to this C program, please upload it as a zip file and I will take a look.

    I don't think you can run it the way you describe. It probably does run from a command window, but your command-line syntax may not be correct. The source code will probably give us a clue about how to use the program's command line.


  3. Posts : 16,949
    Windows 10 Home x64 Version 22H2 Build 19045.4170
       #3

    x509 said:
    I have over 5000 "doc" files are definitely not any form of Word.
    Why do you say they are not Word files?
    What do you think created them?

    Didn't MisterEd's advice in two of your previous threads on this subject help?
    Need Utility to read/convert older Word docs
    Retrieving all text from DOC files that are NOT MS Word


    Best of luck,
    Denis


  4. Posts : 1,772
    Windows 10 Pro
    Thread Starter
       #4

    Try3 said:
    Why do you say they are not Word files?
    What do you think created them?

    Didn't MisterEd's advice in two of your previous threads on this subject help?
    Need Utility to read/convert older Word docs
    Retrieving all text from DOC files that are NOT MS Word


    Best of luck,
    Denis
    Denis,

    I tried and tried to unlock these files, but they are definitely NOT MS Word. They may be PFS:First Choice, which was a 1980s DOS-based word processor. If there were any way I could have used one of these utilities, I would not be asking for help.

    To be honest, I'm somewhat desperate here, because some of these files may have important legal information on them. The lawyer expects me to unlock these files. I don't mean to come across as weird, but given what may be in these files, I am super reluctant to just upload one to this group. It's a very messy situation, and I wish I didn't have to deal with it, but I do.

    To be clear, I'm not a "crazy man." I'm just an average Joe who has been dragged into this mess. Look at all my posts in this forum. All perfectly normal.

    - - - Updated - - -
    @Catnip

    Catnip said:
    If you have the source code to this C program, please upload it as a zip file and I will take a look.

    I don't think you can run it the way you describe. It probably does run from a command window, but your command-line syntax may not be correct. The source code will probably give us a clue about how to use the program's command line.
    Super thanks.

    Here is the URL for this program, and attached is a (very small) text file with the actual code. If you click on the link below, you will get the page with this program and comments.

    Access to this page has been denied.

    Here is the actual code: p1.c.txt

    If possible, can you remove the 72-column limit?
    Also, change the handling of special characters described in the program information: remove all non-printing ASCII characters and replace them with blank spaces, except for the tab and newline characters.
    Replace all non-ASCII characters with a blank.

    Again, I am extremely grateful to you if you can do this.


  5. Posts : 305
    Win 10 and 11
       #5

    I tell you what. I have nothing to do tonight, so I am going to take a whack at fixing that little mess of C code that you linked and make it so it will actually work.

    I am going to do it in C#, so in the meantime, hold tight whilst I fix this. I will be sending you a little program when I am done. The program will be called Parser.

    The syntax will be: Parser -<name of input file>, for example: Parser -Myfile.txt

    Hopefully this works. Stay tuned...


  6. Posts : 305
    Win 10 and 11
       #6

    @x509

    Well, I tried, but unfortunately it won't work for me. I was trying with Word documents. I am able to load and parse the file contents but can't separate the formatting information from the actual text, which is what I expect is happening to you. I wind up with gobbledygook.

    I realize that what you are dealing with are not Word files, but you are likely to run into the same problem as I did.

    Without actually looking at one of the files in question, there is no way for me to figure this out. I'm sorry.


  7. Posts : 11,247
    Windows / Linux : Arch Linux
       #7

    Hi there

    @x509

    Judging by the references to PL/1 (nothing wrong with it at the time, and I remember REXX, ISPF and TSO too), I wonder if those files have been encoded in IBM's EBCDIC code rather than conventional ASCII.

    Other than that, if you could post one of those doc files, I'll have a go. My money is on these being EBCDIC-encoded files.

    Cheers
    jimbo


  8. Posts : 1,223
    W10-Pro 22H2
       #8

    What are the chances that one of the files (if you were to upload it):
    (i) contains anything significantly confidential (from 1995 !!!), and
    (ii) can be cleaned up by any of us willing to give it a try?

    It's been over a month since you first posted - in your shoes, I'd risk it! Martin


  9. Posts : 11,247
    Windows / Linux : Arch Linux
       #9

    mngerhold said:
    What are the chances that one of the files (if you were to upload it):
    (i) contains anything significantly confidential (from 1995 !!!), and
    (ii) can be cleaned up by any of us willing to give it a try?

    It's been over a month since you first posted - in your shoes, I'd risk it! Martin

    I suspect the format might be IBM EBCDIC code rather than ASCII, but if he could upload one of those files, I'm sure loads of people here would give it a go. I can even run the old MVS/SP2 IBM mainframe OS with working JES2 on a laptop using the HERCULES emulator, which emulates the IBM 303X mainframe series !!!!!

    Cheers
    jimbo.


  10. Posts : 3,274
    Win10
       #10

    A sample file might have helped to unlock the mystery.

    Anyway...
    Just in case anybody is interested in converting obscure legacy files and is looking for alternatives to the usual up-to-date methods:

    A legacy file converter for 64-bit Windows - WordPerfect Universe
    LegacyFileConverter

    Drag and drop works for single files, but the program also has command-line support (which can be used to convert more than one file) and can copy the output to ASCII .TXT files where applicable.

    Virustotal Report : 'Most' engines inc. Defender and MalwareBytes show clean (but may be worth testing/trying in a VM if there are no alternatives left).
    VirusTotal

    pn: By modifying the autoexec and config files for the VBOX in the above installation, and making a few other minor adjustments, it is possible to run the Word For Word program directly and do the conversions through the original Word For Word GUI itself. The only problem with this method is that, although mass conversions are possible, the original DOS ASCII file names need to follow the 8.3 naming convention; otherwise all the output names are automatically converted to 8.3 names.
    Last edited by das10; 18 Jul 2022 at 06:46.


 

Windows 10 Forums is an independent web site and has not been authorized, sponsored, or otherwise approved by Microsoft Corporation. "Windows 10" and related materials are trademarks of Microsoft Corp.

© Designer Media Ltd




Windows 10 Forums