Collaboration Tool


Toaster
2009-01-07, 12:35
Following up on the earlier note about using wikis to aid collaboration: since a wiki is already a well-defined concept and what's needed is something new (though arguably somewhat wiki-like), here's a cut at describing the elements of a useful collaboration tool:
1. Web-based tool with a front end for group communication.
2. Source control supporting check-out of a file or a portion of a file.
3. Support for a well-documented intermediate format that recompilers and editors can consume.
4. Publicly available open-license source code.

IMHO 1, 2 and 4 are the most obvious, so I’ll focus on the third point.
#3: The current project chain seems fairly unstable and dependent on esoteric knowledge of a variety of undocumented game interpreters, and will always be this way. Introducing a standard middle layer that everyone can write to would go a long way toward increasing collaboration and interoperability between the toolsets above and below that layer. The translation/recompilation chain would look like this:

Native Game Files -> [decompiler] -> Annotated Game Files -> [extraction tool] -> Standard Translation File -> [collaboration tool]

(Also note that since nothing downstream of the decompiler is illegal, there is no reason not to open-source everything to the right of it… the decompiler itself may also be legal, just not its use.)

The community already has a big leg up on many of these stages: indexed XML-ish translation files are already in use, the extraction tool is a simple one-pass preprocessor, and a wiki can be supplemented with source control.

As the decompiler will most often be the bottleneck, the annotated game format needs to be simple and reasonably comprehensive. Though we are all probably aware of the horror stories told by those attempting to decipher file formats, keep in mind that the decompiler folks' real pain is not exporting to a new format but rather deciphering the undocumented structure of the compiled source files. I believe such a file description is feasible, since the annotated format can be abstracted as an index file that points out the memory locations of the modifiable portions of the script files, where the scripts' memory pointers are, and how to recalculate them when the game file is rebuilt.
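
To make the index-file idea concrete, here is a minimal sketch of rebuilding a file from such an index. Everything in it is an assumption for illustration: the `Block` and `Pointer` records, the 4-byte little-endian absolute pointers, and the rule that pointers live outside the editable blocks and point at block starts or at unedited data. Real engines will differ.

```python
import struct
from dataclasses import dataclass

@dataclass
class Block:
    offset: int  # where the editable bytes start in the original file
    length: int  # how many bytes they occupy there

@dataclass
class Pointer:
    location: int  # offset of a 4-byte little-endian absolute pointer field

def rebuild(data, blocks, pointers, replacements):
    """Splice replacement bytes into data and fix up the listed pointers.

    replacements maps a block's original offset to its new bytes.
    """
    out = bytearray()
    shifts = []  # (original end offset of a block, cumulative size change)
    cursor = 0
    delta = 0
    for block in sorted(blocks, key=lambda b: b.offset):
        out += data[cursor:block.offset]  # copy untouched bytes
        old = data[block.offset:block.offset + block.length]
        new = replacements.get(block.offset, old)
        out += new
        cursor = block.offset + block.length
        delta += len(new) - block.length
        shifts.append((cursor, delta))
    out += data[cursor:]

    def relocate(address):
        # Anything at or past the end of an edited block moves by the
        # cumulative size change of all blocks before it.
        moved = 0
        for end, cumulative in shifts:
            if address >= end:
                moved = cumulative
        return address + moved

    for pointer in pointers:
        (target,) = struct.unpack_from("<I", data, pointer.location)
        struct.pack_into("<I", out, relocate(pointer.location), relocate(target))
    return bytes(out)
```

With a two-entry pointer table followed by two NUL-terminated strings, replacing the first string with a longer one shifts the second pointer by the size difference while the first stays put.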

zalas
2009-01-07, 19:00
Um... well, pretty much everything I've made tools for exports into an XML format that my editing program, VASTT, reads. The problem with requiring a standard intermediate format is that some games are going to need a lot of extras at the insertion stage, like telling the inserter where the original code was located. File formats vary too much for a single approach to work. This is why I have to hack in things like putting random crap in the "line number" field to tell the inserter how to do certain things.
However, having said that, I'm leaning towards a more basic approach, where the "intermediate" file format is simply an indexed string table. You'd have separate files (or tables in a relational database) for the raw versions and translated versions, and the editing program will display them side by side. This makes it a lot more flexible than the current VASTT format, which only has Line Number, Speaker, Original Line, Translated Line and Comments. Any needed metadata for inserters will be in separate "project" files.
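
A rough sketch of the "indexed string table" layout described above, assuming each table is stored as a JSON object mapping an index to a string. The JSON storage and the function names are assumptions for illustration, not VASTT's actual format.

```python
import json

def load_table(text):
    """Parse a string table stored as a JSON object of index -> string."""
    return {str(index): line for index, line in json.loads(text).items()}

def side_by_side(raw, translated):
    """Yield (index, original, translation) rows for an editor to display.

    Lines with no translation yet come back with an empty string.
    """
    for index in raw:
        yield index, raw[index], translated.get(index, "")
```

The raw and translated tables stay in separate files and are joined only for display; anything the inserter needs would live in a separate project file that this code never touches.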

Toaster
2009-01-08, 00:17
Nice, sounds like a useful program. Apologies, I haven't run across VASTT yet, so I wasn't aware that you've built a lot of the infrastructure. I like the idea of the indexed string table as the intermediate file. Depending on what type of instability/uncertainty you are likely to encounter with the source material, you might consider making the index a compound of the original file ID and the memory offset, so that the indices remain unique.
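
One way the compound index could look, as a sketch only. The `LineKey` name and the `file@offset` text form are made up for illustration.

```python
from typing import NamedTuple

class LineKey(NamedTuple):
    """Compound index: which file a line came from, and where in it."""
    file_id: str
    offset: int

    def __str__(self):
        # Fixed-width hex keeps keys easy to read and compare by eye.
        return f"{self.file_id}@{self.offset:#010x}"

    @classmethod
    def parse(cls, text):
        # Split on the last "@" so file names containing "@" still work.
        file_id, _, offset = text.rpartition("@")
        return cls(file_id, int(offset, 16))
```

Two lines at the same offset in different script files then get distinct keys, and the key round-trips through its text form.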

You may be right that it isn't possible to produce a generic annotated game file given the variety of formats out there. I also agree that certain things you described in AliceSoft's system, e.g. the graphics file formats, aren't practical to capture with a format spec; however, what a re/decompiler needs to know about these formats tends to be fairly consistent and limited. What if the annotation format only described the editable blocks and pointers in the file and tagged each block with a type? The tag would essentially be a MIME type for the given block of data. Given such a format, the top-level tool would be able to manipulate the structure of the game files and presumably the data in blocks with more common MIME types, leaving the esoteric types for others to hack on.
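
A sketch of what type-tagged blocks could look like from the top-level tool's side. The `TaggedBlock` record, the tag strings, and the handler table are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaggedBlock:
    offset: int
    length: int
    mime: str  # hypothetical tag, e.g. "text/plain; charset=shift_jis"

# Decoders for the "common" types; anything else stays opaque.
HANDLERS = {
    "text/plain; charset=shift_jis": lambda raw: raw.decode("shift_jis"),
}

def read_blocks(data, blocks):
    """Yield (block, content): decoded if the tag is known, raw bytes if not."""
    for block in blocks:
        raw = data[block.offset:block.offset + block.length]
        handler = HANDLERS.get(block.mime)
        content = handler(raw) if handler else raw
        yield block, content
```

Blocks with an unknown tag come back as untouched bytes, so the tool can still move them around without understanding them.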

btw – is there a VASTT home page I can visit?

zalas
2009-01-08, 11:13
Files have too many different ways of storing data. There's compression, encryption, pascal strings, C strings and you can have many levels of indirection as well. Furthermore, the line-breaking requirements for strings are also specified very differently. If you only included the common types, you wouldn't be able to access many games. For example, to dump the messages in Sengoku Rance, what you have to do is essentially disassemble the bytecode and figure out how each message is called and then piece together a line from the bytecode. It's the same with Utawarerumono if you want a decent dialogue line-by-line format. Even if it's not that extreme, you'll find that almost every game has a little quirk here or there and I think it's probably best to have simply two types of files -- a string table file for people to translate, and an opaque metadata file that holds information about how to insert.
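
To illustrate just one of the differences listed above, here is a sketch of reading a C string versus a Pascal string from the same buffer. It assumes a one-byte length prefix for the Pascal case and Shift JIS text; compression, encryption, indirection, and bytecode-assembled messages are not addressed at all.

```python
def read_c_string(data, offset, encoding="shift_jis"):
    """Read a NUL-terminated string; return (text, offset after the NUL)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode(encoding), end + 1

def read_pascal_string(data, offset, encoding="shift_jis"):
    """Read a string with a one-byte length prefix; return (text, next offset)."""
    length = data[offset]
    start = offset + 1
    return data[start:start + length].decode(encoding), start + length
```

Even in this tiny case the extractor has to know up front which convention a given file uses, which is the kind of per-game knowledge that would end up in the opaque metadata file.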

And no, there isn't a VASTT homepage; it's in perpetual alpha (beat that, Google! (: )

Agilis
2009-01-08, 15:05
I think it's something of a trap to say "let's make a collaboration tool" -ergo-> wiki-like thing, as if it were obvious. It'd be clearer if we just attacked a finite list of clear workflows that most people seem to use instead. For example:

- people who use mostly IRC/IM to discuss and collaborate. All parties just open the same file and talk in real time, possibly using a pastebin/ftp to throw blobs around
- groups that work mostly in email because they're not around at the same time. Alternatively, they have a forum or something with threads that discuss single topics (wiki discussions tend to be annoying for discussing a few scattered lines in a single script page)
- some groups just leverage the version control and discuss 'in the script'
- some throw everything onto the wiki
- are some groups bureaucratic enough to use trac-like tools or bug trackers?

Off the top of my head, those are the major ones I've experienced, and a wiki factors into only some of the cases.

----

Incidentally, for VASTT, is there any support for instances where it's not a 1:1 mapping between lines of source and translated text? It's rare, but there are definitely use cases where it's useful to be able to add in an extra block of text, or possibly drop one.

If the string table is standardized nicely, I can see some potential uses for anyone who's ambitious enough to want to build some translation memory/CAT like functionality to hook into it. Sharing of common terms, previous history, some language processing magic perhaps and a whole separate tool can be made that'd be really helpful for translators, but only if there's a suitably general source<->target string table framework around already.
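
As a rough idea of how little is needed to hook a basic translation memory onto a source<->target string table, here is a sketch using fuzzy matching from Python's standard `difflib`. The `suggest` function and the dict-based memory are assumptions; a real CAT tool would do considerably more.

```python
import difflib

def suggest(source_line, memory, cutoff=0.6):
    """Return up to three (past_source, past_translation) pairs.

    memory maps previously translated source lines to their translations;
    cutoff is the minimum similarity ratio (0 to 1) for a suggestion.
    """
    matches = difflib.get_close_matches(
        source_line, list(memory), n=3, cutoff=cutoff
    )
    return [(match, memory[match]) for match in matches]
```

A new line that differs only slightly from an already translated one would surface the earlier translation as a starting point.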

Asceai
2009-01-08, 16:20
Incidentally, for VASTT, is there any support for instances where it's not a 1:1 mapping between lines of source and translated text? It's rare, but there are definitely use cases where it's useful to be able to add in an extra block of text, or possibly drop one.

Is it that rare? Because I seem to be doing it an awful lot - is this a bad thing? =P

Agilis
2009-01-08, 16:44
I haven't had to do it very often in my work, but it's mostly a function of the people I translate and how their writing works out, plus me being rather stubborn about it in some cases.

I can easily see cases where if there's lots of long technical/compound words, you'd have to add blocks just from sheer space limitation. What's rare is when you feel compelled to do it for some kind of aesthetic or dramatic presentation reason.

zalas
2009-01-08, 22:07
I can easily see cases where if there's lots of long technical/compound words, you'd have to add blocks just from sheer space limitation. What's rare is when you feel compelled to do it for some kind of aesthetic or dramatic presentation reason.
For just overflowing a line, I usually add in extra dialog boxes in the insertion process if possible. Most of the time, though, it doesn't really work, because the voice file only attaches to the first one. However, it is something I'm planning to add to Sengoku Rance if boxes really start overflowing.
If the string table is standardized nicely, I can see some potential uses for anyone who's ambitious enough to want to build some translation memory/CAT like functionality to hook into it. Sharing of common terms, previous history, some language processing magic perhaps and a whole separate tool can be made that'd be really helpful for translators, but only if there's a suitably general source<->target string table framework around already.
I actually thought about adding that to VASTT at some point, but I got sick of working with wxPython, which is why VASTT is feature-frozen.

Toaster
2009-01-09, 22:22
Lol, I found wxPython a little lacking as well.

Just curious, have you figured out how to shoehorn in proportional fonts? Fixed fonts seem to be the bane of many translations.

zalas
2009-01-10, 01:45
That will depend vastly on the engine in question. Generally I've been too lazy to even bother figuring that one out, since it'll require rewriting most font engines and then adding a hack to make the game automatically line-wrap at line ends (can't really pre-wrap properly for proportional fonts). I figure, if I want to have something that nice looking, I might as well start from scratch and write my own port of the engine -_-;

Toaster
2009-01-11, 01:41
Space always seems to be an issue when moving from Japanese to English, so I looked into it a bit. I am still poking around in Win32, but the solution may be a matter of finding the calls to GetStockObject and changing the parameter from ANSI_FIXED_FONT or SYSTEM_FONT to ANSI_VAR_FONT. More difficult would be messing around with SetFont calls, which would require modifying the ITextFont structure. Depending on how AGTH is implemented, it may be able to intercept these calls and modify the relevant parameters.

The line wrap issue is always a mess and would probably require quite a bit more work... assuming that one is already hacking assembly there is a Win32 function that estimates the text width.

zalas
2009-01-11, 13:01
Space always seems to be an issue when moving from Japanese to English, so I looked into it a bit. I am still poking around in Win32, but the solution may be a matter of finding the calls to GetStockObject and changing the parameter from ANSI_FIXED_FONT or SYSTEM_FONT to ANSI_VAR_FONT. More difficult would be messing around with SetFont calls, which would require modifying the ITextFont structure. Depending on how AGTH is implemented, it may be able to intercept these calls and modify the relevant parameters.
I don't think that will apply to a lot of games, especially newer ones running DirectX9. A lot of games also do their own spacing, regardless of proportionality.

The line wrap issue is always a mess and would probably require quite a bit more work... assuming that one is already hacking assembly there is a Win32 function that estimates the text width.
Most engines line wrap by stepping one character at a time until the line is full. That is not how you wrap English. You have to step words at a time, which means you have to build up a totally new data structure which holds word tokens instead of character tokens and you'd have to write a function to tokenize your input stream into that. It's doable, but I don't think it's really worth the effort.
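
The word-at-a-time approach can be sketched as a greedy wrapper that measures whole words with a per-glyph width function. This only illustrates the algorithm; in a real patch it would have to live inside the engine (or an injected DLL), and the `char_width` callback is a stand-in for whatever the engine's font code reports.

```python
def wrap_words(text, max_width, char_width):
    """Greedy word wrap for proportional fonts.

    char_width(ch) returns the pixel width of a single glyph. A word that
    is wider than max_width on its own is left to overflow its line.
    """
    def width(s):
        return sum(char_width(ch) for ch in s)

    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and width(candidate) > max_width:
            lines.append(current)  # the word does not fit, so start a new line
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

Because the break decision is made per word rather than per character, narrow glyphs such as "i" and "l" let more text fit on a line than a fixed-width count would allow.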

Rasqual Twilight
2009-01-11, 13:41
You have to step words at a time, which means you have to build up a totally new data structure which holds word tokens instead of character tokens and you'd have to write a function to tokenize your input stream into that. It's doable, but I don't think it's really worth the effort.

This is what I did for one of the games I'm working on. I have run into issues with history buffers being allocated statically, which makes the process prone to crashing. Otherwise, the tokenization part is not really the hardest part (since I'm plugging a DLL into the process, and the DLL can be compiled from whatever programming language).

Toaster
2009-01-11, 13:48
I don't think that will apply to a lot of games, especially newer ones running DirectX9. A lot of games also do their own spacing, regardless of proportionality.

Another distinction may be which games are designed to work on both PCs and consoles. I have a feeling most of the PC-only games are using Win32 or DirectX's fonts to display text.

I agree that general line wrapping is probably out of the question. Maybe someone else on the thread will have some good ideas though.

btw - any thoughts on publishing your translation file spec when you feel that it is complete?

Toaster
2009-01-11, 13:54
Otherwise, the tokenization part is not really the hardest part (since I'm plugging a DLL into the process; the DLL being compiled from whatever programming language).

Where are you injecting the DLL code? Are you replacing a DLL that is part of the game engine, or are you doing something like intercepting a function call and inserting your own code between the engine and the original function?

Rasqual Twilight
2009-01-11, 14:09
I modified the executable's entry point to point to a stub function that loads the DLL (and installs call redirections by patching parts of the code section) if said DLL can be found; otherwise it just goes on to the original entry point as if nothing had happened. Of course this requires distributing a modified executable in the patch installer, but I don't think that's uncommon practice.

Toaster
2009-01-12, 13:33
OK. I am still reading up on API hooking, so I can't say anything specific yet. Sophistication-wise, you're not that far from hooking API calls. Code-wise, I think there is a small overlap between your code and API hooking: at the very least, you already have code running inside the target process.

In case you are interested, there is a nice article here: http://www.codeguru.com/cpp/w-p/system/misc/article.php/c5667