jtd format file conversion analysis
Jun 26, 2017, 09:58 AM

In the project I have been busy with since 2016, the module I am mainly responsible for is file parsing. I made all kinds of mistakes and ran into all kinds of trouble along the way, but it is finally done. Now that all the pieces of the project have been put together, I want to summarize how these files are parsed for future reference. The documents parsed in this project include Office files, pdf, csv, rtf, txt, jtd, emails in eml, msg and pst formats, and rar and zip archives. While unpacking archives we also came across files in the mlf format. Neither I nor the senior engineers at the company could crack that format, so we have given up on it for now. Parsing for all the other formats is done, and I will summarize them one by one later. For file parsing I use Apache Tika.

Today we will start with parsing jtd files. Some of you may not know what a jtd file is, so let me explain first:

The jtd format is the file format produced by Ichitaro (一太郎), a Japanese word-processing program.

You can think of a jtd file as the counterpart of the Word documents we normally use, except that it has to be edited and opened with Ichitaro. This is what the Ichitaro software looks like:

[Image: the Ichitaro software]

I was quite taken aback when I first saw this requirement. How was I supposed to do this? It is Japanese software, so I could not understand the documentation even when I looked it up, and I found nothing on Baidu or Stack Overflow. Fortunately, a senior colleague at the company who reads Japanese found a solution on a Japanese website: http://d.hatena.ne.jp/satorufujimori/20070227/1172549793

The solution is to use a VBS script to convert the jtd file to a txt file and then parse that txt file to get the content. The script from the website is as follows:

' taro2txt.vbs
Set taro = CreateObject("JXW.Application")
taro.Visible = True
taro.Documents.Open "c:\taro\a.jtd"
taro.ActiveDocument.SaveAs "c:\out\a.txt", "", "", "", 10, "ShiftJIS" ' ※1
taro.Quit

Note the 10: it is a format identifier, and 10 means convert the jtd file to a txt file. To convert to another format you would replace 10 with a different identifier. Awkwardly, we could not find any documentation saying which number stands for which format, so I tried every value from 0 to 100. Most of them produced garbage, and the only useful one was 10. In practice, then, a jtd file can only be converted to txt, and all images in the original file are lost. Our business requirement, however, is to read the file content and index it in Solr for search, so losing the images does not matter, and this is the approach we adopted.

The script above converts jtd files that have no password, but our jtd files are password-protected. Fortunately this was solved in the end. I have forgotten how we arrived at it, but the solution is as follows:

' taro2txt.vbs
Set taro = CreateObject("JXW.Application")
taro.Visible = True
taro.Documents.Open "c:\taro\a.jtd", password ' pass the password here
taro.ActiveDocument.SaveAs "c:\out\a.txt", "", "", "", 10, "ShiftJIS" ' ※1
taro.Quit

Once the script is written, just run it to convert the given jtd file to a txt file, then process the txt file and extract its content (extracting content from txt files will be covered in a separate article).
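Since the script saves the text as "ShiftJIS", the Java side has to read the file with a matching charset. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes that Ichitaro's "ShiftJIS" label corresponds to Java's `Shift_JIS` charset. If some characters come out garbled, `windows-31j` (Microsoft's variant) may be worth trying instead.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reads the txt file produced by the conversion script. */
final class ConvertedTextReader {
    // Assumption: "ShiftJIS" in the script maps to Java's "Shift_JIS" charset.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    /** Reads the whole file and decodes it as Shift_JIS. */
    static String readConvertedText(Path txtFile) throws IOException {
        return new String(Files.readAllBytes(txtFile), SHIFT_JIS);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConvertedTextReader c:\out\a.txt
        System.out.println(readConvertedText(Paths.get(args[0])));
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large files a streaming reader would be a better fit.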

That solves the problem above, but another remains: I cannot write a separate script for every jtd file, and I do not know in advance which files the customer has. So I decided to pass parameters to the VBS script. I do not really know VBS syntax, but I wrote it following what I found online. The script is as follows:

Option Explicit

' Command-line arguments: source jtd path, destination txt path, password
Dim a0 : a0 = WScript.Arguments(0)
Dim a1 : a1 = WScript.Arguments(1)
Dim a2 : a2 = WScript.Arguments(2)
Dim taro

ExchangeFile a0, a1, a2

' Opens src in Ichitaro with the given password and saves it to dest as txt (format 10)
Sub ExchangeFile(src, dest, password)
    Set taro = CreateObject("JXW.Application")
    taro.Visible = True
    taro.Documents.Open src, password
    taro.ActiveDocument.SaveAs dest, "", "", "", 10, ""
    taro.Quit
End Sub

Here a0 is the path of the jtd file, a1 is the path of the txt file to generate, and a2 is the password of the jtd file. The script simply passes these parameters to the function call.

With the script finished, the remaining question is how to call a VBS script from Java. I found the answer on Stack Overflow. The call looks like this:

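import java.io.IOException;

public class RunVbs {
    public static void main(String[] args) {
        try {
            // Start the script with Windows Script Host; exec() returns without waiting for it
            Runtime.getRuntime().exec("wscript D:/Send_Mail_updated.vbs");
        } catch (IOException e) {
            System.out.println(e);
            System.exit(1); // non-zero exit code signals the failure
        }
    }
}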
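The Stack Overflow snippet starts a fixed script without arguments. To call the parameterized script shown earlier, the three arguments have to be appended to the command. Below is a rough sketch using `ProcessBuilder`, which passes each argument separately so that paths containing spaces need no manual quoting. The class name, the script path `D:/taro2txt.vbs`, the password and the 60-second timeout are all placeholders, not values from the project.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around the parameterized conversion script. */
final class JtdScriptRunner {
    /** Builds the command: wscript <script> <src> <dest> <password>. */
    static List<String> buildCommand(String script, String src, String dest, String password) {
        return Arrays.asList("wscript", script, src, dest, password);
    }

    /** Starts the command and returns true if the process ended within the timeout. */
    static boolean run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start();
        boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            // Stops wscript only; a running Ichitaro may need separate cleanup.
            process.destroyForcibly();
        }
        return finished;
    }

    public static void main(String[] args) throws Exception {
        // All paths and the password are placeholders.
        List<String> cmd = buildCommand("D:/taro2txt.vbs", "c:\\taro\\a.jtd", "c:\\out\\a.txt", "secret");
        System.out.println("script finished in time: " + run(cmd, 60));
    }
}
```

Note that a process that finishes in time does not by itself prove that the txt file was written, because the script as shown does no error handling of its own.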

With the steps above you can successfully convert jtd files to txt files, but several problems remain:

  1. Calling the VBS script from the Java program returns no value indicating whether the txt file was actually generated. If the password is wrong, no txt file is produced. My approach is to check at intervals whether the txt file exists and, after a certain number of checks, to treat the conversion as failed. The number of checks depends on the file size. For example, a 10M file is checked every 5 seconds, 10 times in total, and if the txt file has not appeared by then the conversion is judged a failure. This wastes time when trying passwords. Also, if the file is large or the machine is slow, the txt file may still be generated after the checks have ended, yet the conversion has already been judged a failure;

  2. Every run of the VBS script opens the Ichitaro software, and when a wrong password is tried, a Windows error dialog pops up on the server where the application is deployed. Although the Ichitaro process is killed in the end, the customer can clearly see the Ichitaro program and the error prompt before that happens, which is very awkward;

  3. If the jtd file is too large, for example 30M, the script converts it very slowly. As mentioned in problem 2, the customer can see the Ichitaro program on the server during conversion, and if the customer kills Ichitaro in that window, the conversion will certainly fail.
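The polling workaround described in problem 1 can be sketched roughly as follows. This is an illustration rather than the project's actual code: the class and method names are hypothetical, and the values in `main` simply reuse the 10M example of 10 checks at 5-second intervals.

```java
import java.io.File;

/** Hypothetical helper that polls for the txt file produced by the script. */
final class OutputFilePoller {
    /**
     * Checks up to maxChecks times whether the file exists,
     * sleeping intervalMillis between consecutive checks.
     */
    static boolean waitForFile(File file, int maxChecks, long intervalMillis)
            throws InterruptedException {
        for (int i = 0; i < maxChecks; i++) {
            if (file.exists()) {
                return true;
            }
            if (i < maxChecks - 1) {
                Thread.sleep(intervalMillis); // no need to sleep after the last check
            }
        }
        return false; // treated as a failed conversion
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder path; 10 checks, 5 seconds apart.
        boolean converted = waitForFile(new File("c:\\out\\a.txt"), 10, 5000);
        System.out.println(converted ? "converted" : "conversion failed");
    }
}
```

An existence check alone cannot tell a finished file from one that is still being written, so this sketch shares the limitations listed above.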

These problems have not been solved yet, and more may turn up depending on how things go after deployment at the customer's site. If the customer's jtd files are all under 10M, there should not be much of a problem. If files exceed 30M, conversion will certainly be slow, and there is always the risk of Ichitaro being killed mid-conversion. How it plays out depends on the customer's trial.

That is all for now on parsing jtd files. I will write later about extracting the content from the txt files produced by the conversion.
