
B-$hep SCR decompiler

Posted: 04 Jan 2010, 17:34
by B-$hep
Hi guys.
I have a plan to make an SCR decompiler.
At first, at least a simple one that decompiles very basic scripts.

The primary thing I must figure out is how variables are saved in SCR. Probably every variable has been converted to a pointer to some table item that holds all the data in the script, and GTA2 reads the data from there.
Of course I haven't looked deeply at this part yet, but it seems so. Something like that.

For example:
PLAYER_PED dummy = (112.7, 5.7, 5.7) 1 180


Where has that "dummy" gone in the SCR?! Where is PLAYER_PED?! That kind of stuff must be figured out.

If SCR used simple tokens for each command, it would be pretty easy to make a decompiler. It probably does use tokens, but as I said, I haven't really figured this out yet.

What I need to know, for example: how LEVELSTART and LEVELEND are saved.
If you leave either of them out, the MIS compiler will crash. So this is one problem, just to tell you.


But because I've had so much luck with SCR files already, I'm sure I can figure something out.


The best thing about them is that they are small, only 80.7 KB. Of course a lot of data is packed, but they still contain a lot of empty space too. It depends on the SCR.

Re: Scripts of Tiny Town, Hidden Surprise and Face Off

Posted: 04 Jan 2010, 21:03
by ALPINE
B-$hep wrote: PLAYER_PED dummy = (112.7, 5.7, 5.7) 1 180
Where has that "dummy" gone in the SCR?! Where is PLAYER_PED?!
Variable and pointer variable names are NOT saved in SCR. The game doesn't care about variable names; it saves them only as indexes. For this example, gta2.exe has an array like "player_ped[]". So, if you write:

Code:

PLAYER_PED dummy = (112.7, 5.7, 5.7) 1 180
PLAYER_PED bob = (113.7, 5.7, 5.7) 1 180
PLAYER_PED b_shep = (113.7, 5.7, 5.7) 1 180
after compiling, it would be something like

Code:

player_ped[0] = (112.7, 5.7, 5.7) 1 180
player_ped[1] = (113.7, 5.7, 5.7) 1 180
player_ped[2] = (113.7, 5.7, 5.7) 1 180
without any variable names.

So a decompiler can't regenerate the original .mis code exactly. It can only generate variable names like ped1, ped2, ped3, etc.

If you really want to make a decompiler, you need to look for coordinates in the SCR, not names.
If you write:

Code:

PLAYER_PED dummy = (112.7, 5.7, 5.7) 1 180
you should, for example, open the SCR file in any hex editor and try to find the value "112.7". It might be saved as a float (32-bit floating point), or as an encoded integer, for example (int)round(112.7 * 65536) with the integer part in the 2 higher bytes and the fraction part in the 2 lower bytes, or (short int)round(112.7 * 256) with the integer part in the higher byte and the fraction part in the lower byte. Or anything else. It might be signed or unsigned.
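A minimal Python sketch of that search: build the candidate byte patterns for a coordinate and look for each one in the SCR bytes. Little-endian byte order is an assumption (GTA2 runs on x86), and the three encodings are only the ones listed above; the real format could be something else entirely.

```python
import struct


def candidate_encodings(value: float) -> dict[str, bytes]:
    """Byte patterns a coordinate *might* be stored as (assumed little-endian)."""
    patterns = {
        # 32-bit IEEE float
        "float32": struct.pack("<f", value),
        # 16.16 fixed point: integer part in the high 2 bytes, fraction in the low 2
        "fixed16_16": struct.pack("<i", round(value * 65536)),
    }
    fixed8 = round(value * 256)
    if -32768 <= fixed8 <= 32767:  # signed 8.8 only fits values below 128
        # 8.8 fixed point: integer part in the high byte, fraction in the low byte
        patterns["fixed8_8"] = struct.pack("<h", fixed8)
    return patterns


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Every offset where `pattern` occurs in `data` (overlaps included)."""
    hits, start = [], 0
    while (pos := data.find(pattern, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits
```

Usage: read a test SCR with open(path, "rb").read() and print find_all(data, pattern) for every entry of candidate_encodings(112.7). For 112.7 the 16.16 pattern is 33 B3 70 00 and the 8.8 pattern is B3 70, so a hit on the longer one also shows up as a hit on the shorter one.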

Try...

P.S. Delphi 7 programmer? Pascal fan? :) I hope you know the basics of C++ to understand my explanation :)

Re: Scripts of Tiny Town, Hidden Surprise and Face Off

Posted: 04 Jan 2010, 22:18
by B-$hep
Alpine, I already know all that, except the indexes.

Check out my tool
http://www.gtaforums.com/index.php?showtopic=432158

If I didn't know that, I wouldn't have been able to create such a tool, and I know how coords are saved. That part is easy now since I've already figured it out; otherwise, as I said, I couldn't have made my SCR Tool.

OK, it's actually logical that variable names are not saved and I must generate vars myself.
But how does the game actually find the commands in the SCR? That's the problem.
Basically I can find any coordinate in the SCR (cranes, crushers, destructors, generators, powerups, etc.) and move, remove or hide those objects; it doesn't matter.

Do you actually understand the primary problem?

Take PLAYER_PED, LEVELSTART, LEVELEND, etc. PLAYER_PED requires coordinates, rotation and remap. That part is easy.

But how the hell does the game find that PLAYER_PED in the SCR? That's the biggest problem.


Probably it holds the data as you said, for example player_ped[0].
This zero probably points to some command (token) table from which the game figures out which command is used.

PLAYER_PED is replaced by some token or pointer (for example B5) into a command table, from which the game figures out what command B5 is.

miss2.exe generates some txt files that contain data like:

Code:

1 	PLAYER_PED		EXEC 2	(2916352,6684672,131072)	0	25	
2 	ARROW_DEC		EXEC 3	arrow 
3 	ARROW_DEC		EXEC 4	arrow_2 
4 	ARROW_DEC		EXEC 5	arrow_3 
5 	ARROW_DEC		EXEC 6	arrow_4 
6 	ARROW_DEC		EXEC 7	arrow_end 
7 	MAP_ZONE_SET		 8	B01	0 0 0 0 200 0 0 0 1000 0 
8 	MAP_ZONE_SET		 9	B02	0 0 0 0 300 0 0 0 1000 0 
9 	MAP_ZONE_SET		 10	B03	0 0 0 0 200 0 0 0 1000 0 
10 	MAP_ZONE_SET		 11	B04	0 0 0 0 300 0 0 0 1000 0 
11 	MAP_ZONE_SET		 12	B05	0 0 0 0 400 0 0 0 1000 0 
etc. This will also be handy for figuring things out, I guess. But the MAIN problem still has to be solved.
If you know anything else about the SCR files, please let me know.
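The PLAYER_PED line in that listing seems to show coordinates already converted by miss2: 2916352 / 65536 = 44.5, 6684672 / 65536 = 102.0 and 131072 / 65536 = 2.0, which would fit 16.16 fixed point, one of the encodings ALPINE mentioned. A short Python sketch that decodes the coordinate tuple from such a line, under that assumption:

```python
import re

FIXED_ONE = 65536  # assumption: miss2 prints coordinates as 16.16 fixed point


def decode_listing_coords(line: str) -> tuple[float, ...]:
    """Convert the first "(a,b,c)" group of a miss2 listing line to floats."""
    match = re.search(r"\(([-\d,\s]+)\)", line)
    if match is None:
        raise ValueError("no coordinate tuple in line")
    # int() tolerates surrounding whitespace, so " 6684672" parses fine
    return tuple(int(part) / FIXED_ONE for part in match.group(1).split(","))
```

If the same integers show up in the SCR (as 4-byte little-endian values), that would also tie the listing to the binary layout.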

EDIT: I should send R* a letter asking them to release the SCR file specs.
Maybe they could publish at least this little bit of very old, outdated info?
Nobody needs this but we do.

EDIT2: email sent to R*.


About C++: well, I have used it for 3 years already. Not a problem for me.
But I'm mainly an active Pascal programmer (for more than 6 years now).
It doesn't matter what language I use, actually.

Sorry guys for going off-topic here; this discussion should be moved to my SCR tool thread.

Re: Scripts of Tiny Town, Hidden Surprise and Face Off

Posted: 05 Jan 2010, 12:28
by ALPINE
I recommend trying something like this:

Code:

CHAR_DATA char1
CHAR_DATA char2
CHAR_DATA char3
CHAR_DATA char4
CHAR_DATA char5
CHAR_DATA char6
CHAR_DATA char7
CHAR_DATA char8
Compile it. The SCR should now contain 8 identical blocks, except that 1 byte must differ because of the different variable names. So you should find the position of this byte. Each block is a CHAR_DATA command, and the byte at that position is the variable's index.
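That comparison can be automated. A Python sketch, assuming you already know (or are guessing) where the first block starts and how long each block is; both are inputs here, not known SCR facts. It returns the in-block offsets whose byte differs between blocks, which should be the variable index byte if the theory holds.

```python
def varying_offsets(data: bytes, start: int, block_size: int, count: int) -> list[int]:
    """Cut `count` consecutive blocks of `block_size` bytes from `start` and return
    the block-relative offsets where not all blocks have the same byte."""
    blocks = [data[start + i * block_size : start + (i + 1) * block_size] for i in range(count)]
    if any(len(block) != block_size for block in blocks):
        raise ValueError("data too short for the requested blocks")
    return [off for off in range(block_size) if len({block[off] for block in blocks}) > 1]
```

If more than one offset comes back, the guessed start or block size is probably off, or the blocks hold more than just the index.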

After that write this:

Code:

char1 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char2 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char3 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char4 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char5 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char6 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char7 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
char8 = CREATE_CHAR_INSIDE_CAR  ( car1 ) remap  occupation  END
Again, the SCR should now contain 8 identical blocks with 1 differing byte, and that byte equals the variable index described before. Each block is the implementation of CREATE_CHAR_INSIDE_CAR() with these parameters; the changing byte is the variable index.

By changing the car index, remap value and occupation value, you can find the positions of the bytes describing the parameters of the CREATE_CHAR_INSIDE_CAR() function.

etc...

About functions and commands that don't have any parameters (for example, HAS_PARK_FINISHED), the exploration strategy would be:

Code:

CHAR_DATA char1
HAS_PARK_FINISHED
CHAR_DATA char2
HAS_PARK_FINISHED
CHAR_DATA char3
HAS_PARK_FINISHED
If you have already explored CHAR_DATA, you know how it is implemented in SCR, so the bytes between two CHAR_DATA blocks are exactly the HAS_PARK_FINISHED() function.

About the IF and WHILE operators:
I think the SCR file contains an array of very simple instructions, like assembler code, and every command must be tagged with an index. If an "IF" block is written like this:
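The same idea in code: describe an already-explored block as a byte pattern with a wildcard (None) for the index byte, find its occurrences, and return whatever lies between consecutive ones. The pattern used in the example is a placeholder; a real CHAR_DATA pattern would have to come from your own dumps.

```python
from typing import Optional


def match_at(data: bytes, pos: int, pattern: list[Optional[int]]) -> bool:
    """True if `pattern` matches `data` at `pos`; None entries match any byte."""
    if pos + len(pattern) > len(data):
        return False
    return all(p is None or data[pos + i] == p for i, p in enumerate(pattern))


def gaps_between_blocks(data: bytes, pattern: list[Optional[int]]) -> list[bytes]:
    """Bytes found between each pair of consecutive (non-overlapping) matches."""
    starts, pos = [], 0
    while pos <= len(data) - len(pattern):
        if match_at(data, pos, pattern):
            starts.append(pos)
            pos += len(pattern)  # skip past this block
        else:
            pos += 1
    return [data[a + len(pattern) : b] for a, b in zip(starts, starts[1:])]
```

With the CHAR_DATA / HAS_PARK_FINISHED script above, every returned gap should be the same bytes if the theory is right.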

Code:

IF ( HAS_PARK_FINISHED() )
  MAKE_ALL_CHARS_MUGGERS(ON)
ELSE
  MAKE_ALL_CHARS_MUGGERS(OFF)
ENDIF
it may be implemented like this:

Code:

1 HAS_PARK_FINISHED()
2 GOTO_IF_ZERO 5
3 MAKE_ALL_CHARS_MUGGERS(ON)
4 GOTO 6
5 MAKE_ALL_CHARS_MUGGERS(OFF)
6 next commands... 

GOTO_IF_ZERO N means go to instruction #N if the previous instruction returned zero. In AVR assembler the similar instruction is BREQ (branch if equal).

"IF" could also be implemented some other way, so it may be really hard to figure out...

I think it will be enough to figure out the simple commands for extracting object positions, remaps and car models from the SCR file.
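To make the GOTO_IF_ZERO idea concrete, here is a toy Python interpreter for the 6-instruction listing above. It models ALPINE's guess, not GTA2's actual format: the TEST opcode standing in for a condition such as HAS_PARK_FINISHED() is invented for the example, and instruction numbers are dictionary keys.

```python
Program = dict[int, tuple[str, object]]


def run(program: Program, conditions: dict[str, bool]) -> list[str]:
    """Execute from the lowest instruction number until pc leaves the program.

    TEST sets the 'last result' flag, GOTO always jumps, GOTO_IF_ZERO jumps only
    if the last result was zero; any other opcode is logged as a plain command.
    """
    log: list[str] = []
    pc, last = min(program), 0
    while pc in program:
        op, arg = program[pc]
        if op == "GOTO":
            pc = arg
            continue
        if op == "GOTO_IF_ZERO":
            pc = arg if last == 0 else pc + 1
            continue
        if op == "TEST":
            last = 1 if conditions[arg] else 0
        else:
            log.append(f"{op}({arg})")
        pc += 1
    return log


IF_EXAMPLE: Program = {
    1: ("TEST", "HAS_PARK_FINISHED"),
    2: ("GOTO_IF_ZERO", 5),
    3: ("MAKE_ALL_CHARS_MUGGERS", "ON"),
    4: ("GOTO", 6),
    5: ("MAKE_ALL_CHARS_MUGGERS", "OFF"),
}
```

Instruction 6 ("next commands...") is left out, so jumping to 6 simply ends the run.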

Re: SCR decompiler

Posted: 05 Jan 2010, 16:33
by B-$hep
Very nice info, I will try this.

May I ask how you know all this? Are you some veteran scripter, or a guy who tried to make a similar tool?
You don't have to answer if you don't want. I'm just curious.


I will try out the info you gave me. But you will be in the credits for sure.


Thanks.


EDIT: The first script example you gave me generated this:

Code:

1 	CHAR_DEC		EXEC 2	char1 
2 	CHAR_DEC		EXEC 3	char2 
3 	CHAR_DEC		EXEC 4	char3 
4 	CHAR_DEC		EXEC 5	char4 
5 	CHAR_DEC		EXEC 6	char5 
6 	CHAR_DEC		EXEC 7	char6 
7 	CHAR_DEC		EXEC 8	char7 
8 	CHAR_DEC		EXEC -1	char8 
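In both miss2 listings the third column looks like the number of the next command, and the last CHAR_DEC ends with -1. Assuming that reading is right (it is an inference from these two listings only), a Python sketch that parses such lines and follows the chain:

```python
import re

# number, command name, optional "EXEC", next-command number, rest of the line
LINE_RE = re.compile(r"^\s*(\d+)\s+(\w+)\s+(?:EXEC\s+)?(-?\d+)\s*(.*)$")


def parse_listing(text: str) -> dict[int, tuple[str, int, str]]:
    """Map command number -> (command name, next number, remaining arguments)."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            num, name, nxt, rest = m.groups()
            entries[int(num)] = (name, int(nxt), rest.strip())
    return entries


def walk_chain(entries: dict[int, tuple[str, int, str]], first: int = 1) -> list[str]:
    """Follow the 'next' numbers from `first` until -1, assumed to mark the end."""
    order, current, seen = [], first, set()
    while current != -1:
        if current in seen or current not in entries:
            raise ValueError(f"broken chain at {current}")
        seen.add(current)
        name, nxt, rest = entries[current]
        order.append(f"{name} {rest}".strip())
        current = nxt
    return order
```

If the SCR stores the same next-command numbers, this chain is also how a decompiler could recover command order.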

Re: SCR decompiler

Posted: 05 Jan 2010, 18:19
by Razor
B-$hep wrote: You don't have to answer if you don't want. I'm just curious.
not just you :)

Re: SCR decompiler

Posted: 05 Jan 2010, 22:15
by B-$hep
OK, I decided to start making the basics of the decompiler. Just for fun.
I know maybe I shouldn't hurry, but I can't decompile scripts just by collecting info.
I need to code to make this possible. So...

At the moment it should correctly recognize LEVELSTART, LEVELEND and the first PLAYER_PED in SCR files, and of course its coordinates, remap and rotation.


I guess it's pointless to release this yet? Or not??

Actually it would be nice if you guys tested it on your scripts, to make sure it correctly finds LEVELSTART, LEVELEND and the first player.

These are core commands that should always work. Without them there would be no script, or the script would be invalid.


So, should I release the first version?


EDIT: also one question: should the decompiler be integrated into SCR tool?
Or is it better as an independent tool?

Re: SCR decompiler

Posted: 05 Jan 2010, 22:53
by ALPINE
Are you some veteran scripter, or a guy who tried to make a similar tool?
:D No, I just tried to imagine how SCR files would be interpreted. The same source code should compile into the same instruction blocks. If we know the source code and search for the coordinate values (which are the same in the source code and in the SCR), we should be able to find the whole block that implements that piece of source code.

Think logically. Imagine how the GTA2 programmers would have implemented the things you are researching. No tricks :) And of course I have some programming skill.
release the first version?
No.
Try to develop a program that can recognize every PLAYER_PED block in an SCR file (try what I wrote above).
should the decompiler be integrated into SCR tool?
Being a Linux user, I prefer the "Make each program do one thing and do it well" philosophy :) If I were you, I'd make a console SCR tool, a console SCR decompiler and one GUI-based frontend for both programs. Most free software is built like this.

Or you could do it the easier way: integrate the decompiler into SCR tool and make it work through console parameters (for example, scrtool.exe bil.scr). That would be nice, because the user could simply drag'n'drop an .scr file onto scrtool.exe and immediately get a .mis.
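Windows passes the paths of files dropped onto an .exe as command-line arguments, so drag'n'drop comes for free once the tool accepts file names. A Python sketch of that entry point; the decompile step itself is left out, and the usage text is made up.

```python
import sys
from pathlib import Path


def output_path_for(scr_path: str) -> Path:
    """bil.scr -> bil.mis in the same folder."""
    return Path(scr_path).with_suffix(".mis")


def main(argv: list[str]) -> int:
    """argv[0] is the program; the rest are .scr files (e.g. dropped onto the exe)."""
    if len(argv) < 2:
        print("usage: scrtool file.scr [more.scr ...]", file=sys.stderr)
        return 1  # a combined tool could open its GUI here instead
    for scr in argv[1:]:
        mis = output_path_for(scr)
        # A real tool would decompile `scr` and write the result to `mis` here.
        print(f"{scr} -> {mis}")
    return 0
```

Calling sys.exit(main(sys.argv)) from the script's entry point ties it together; several files dropped at once simply arrive as several arguments.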
But you will be in the credits for sure.
OK :) Can you add a link to my website? (Not ready yet, it will be written in a few months...)

Re: SCR decompiler

Posted: 05 Jan 2010, 23:30
by B-$hep
OK Alpine. Will do as you said.


For example, I played around with your very first script example:

Code:

CHAR_DATA char1
CHAR_DATA char2
CHAR_DATA char3
...etc
And I figured out that it really is as you said: 8 chunks, each with its own index.
The last one is 8, as it should be. They start from 1, like in the script.
Each CHAR_DATA chunk (or block, as you called it) is exactly 35 bytes in size.


I guess I must first extract the chunks and then parse each chunk individually.
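A sketch of that two-step plan in Python: first cut fixed-size chunks, then hand each chunk to a per-command parser. The start offset, the chunk size (35 or 36 bytes for CHAR_DATA, depending on how it is measured) and the parser are all supplied by the caller; nothing here knows the real CHAR_DATA layout.

```python
from typing import Callable, Iterator


def iter_chunks(data: bytes, start: int, chunk_size: int) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, chunk) for consecutive fixed-size chunks from `start`;
    a trailing partial chunk is ignored."""
    for off in range(start, len(data) - chunk_size + 1, chunk_size):
        yield off, data[off : off + chunk_size]


def parse_all(data: bytes, start: int, chunk_size: int,
              parse_chunk: Callable[[bytes], object]) -> list[object]:
    """Step 2: run the (hypothetical) per-command parser on every chunk."""
    return [parse_chunk(chunk) for _, chunk in iter_chunks(data, start, chunk_size)]
```

Keeping the cutting and the parsing apart means a wrong size guess shows up immediately as chunks that don't line up, without touching the parser.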


So now I will try to apply the same method to PLAYER_PED too.

Re: SCR decompiler

Posted: 06 Jan 2010, 08:11
by NTAuthority
B-$hep wrote: Each CHAR_DATA chunk (or block, as you called it) is exactly 35 bytes in size.
Strange non-rounded size, indeed. It could be that it contains an identifier exactly one byte in size, and GTA2's engine stores CHAR values directly in the script's memory, unlike GTA3 and later, which just store an identifier. Or it contains all the data that could possibly be passed as an initializer.

However, working things out separately for each command might not be the best approach. Also, the 'memory' structure and the 'code' structure are likely kept in the same file, as in GTA3script, especially given GTA2's declaration methods, which preload data into memory. What you should try first is to find out the base file structure: where the memory section starts, where the code section starts, what the header offsets do, and so on.

I also expect a code function not to look like a normal CPU assembly format (which is what GTA4's .sco format is) but more like embedded stuff (GTA3 .scm-ish; I don't think they'd go from assembly to direct calls and back to assembly across scripting engine rewrites :p).

For example, in GTA3, a function could be stored in the script like this:

01 00 04 00

This corresponds to 'WAIT 0' in the original .sc language. In the case of GTA3, because a list of the original commands was not available, people got used to naming the calls by their internal identifier (0001, in some kind of chronological order). The parameter type in this case is '04', which stands for '1-byte int', and the value, obviously, is 0. I don't think GBHscript is likely to use fixed-length parameters for each function (but it could!). Either way, you need to know the parameter count of every function for decompilation to work, but since we have the docs, that should be possible.

Still, I recommend finding the global structure first, and only then the meanings of the sub-structures.
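That GTA3 example decoded in Python: a 2-byte little-endian opcode, then one type byte and a value per parameter. Only type 04 (1-byte int) is included because it is the only one given here; treating it as signed is an assumption, and the parameter count has to come from a per-command table, since the bytes themselves don't say how many parameters follow.

```python
import struct

# Parameter type byte -> value size in bytes. Only 0x04 is taken from the post;
# the rest of GTA3's type table is deliberately not guessed here.
PARAM_SIZES = {0x04: 1}


def decode_instruction(code: bytes, pos: int, param_count: int) -> tuple[int, list[int], int]:
    """Decode one GTA3-style instruction at `pos`; returns (opcode, params, next_pos)."""
    (opcode,) = struct.unpack_from("<H", code, pos)  # 01 00 -> 0x0001
    pos += 2
    params = []
    for _ in range(param_count):
        size = PARAM_SIZES[code[pos]]  # KeyError on an unknown type byte
        pos += 1
        params.append(int.from_bytes(code[pos : pos + size], "little", signed=True))
        pos += size
    return opcode, params, pos
```

decode_instruction(bytes([0x01, 0x00, 0x04, 0x00]), 0, 1) gives opcode 1, parameters [0] and next position 4, i.e. 'WAIT 0' followed by whatever instruction starts at byte 4.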

Re: SCR decompiler

Posted: 06 Jan 2010, 09:36
by B-$hep
miss2 also changes something in the SCR header, not always but in some specific cases.

It seems SCR files also have some ID in the header:

Code:

00 00 08 00 24
which almost never changes. But in some new scripts I have seen a different value there.
Maybe because of compiler version changes or something; I can't be sure.

For example, if you take the simplest script and add a bunch of PLAYER_PED declarations:

Code:

PLAYER_PED dummy1 =(13.0,12.0,12.0) 2 180
PLAYER_PED dummy2 =(13.0,12.0,12.0) 2 180
PLAYER_PED dummy3 =(13.0,12.0,12.0) 2 180
PLAYER_PED dummy4 =(13.0,12.0,12.0) 2 180
PLAYER_PED dummy5 =(13.0,12.0,12.0) 2 180

LEVELSTART
LEVELEND
the header changes: miss2 adds some data to it. It probably also stores how many players the SCR file has, or something like that. I don't know.

The bytes added are:

Code:

40 5C 78
With one player, the header has the value 2C. Again, not always; usually when you write your own new small script and compile it. Original scripts have a lot of info stored at the top of the file (i.e. the header). It probably holds a lot of useful info that would help with decompiling, but I don't know exactly what yet.


EDIT: I forgot to mention that PLAYER_PED chunks are 28 bytes in size.



EDIT2: I discovered an interesting thing:

For example, I first created this script (testa.scr):

Code:

LEVELSTART
LEVELEND
Then added more stuff (test.scr):

Code:

LEVELSTART
LEVELEND
LEVELSTART
LEVELEND
LEVELEND
LEVELEND
LEVELEND
LEVELEND
I compiled both and compared them. The result at the top of the SCR file is this:
Image

As you can see, even that 24 changed, and it very rarely changes.

miss2 displays info like:

"Size of Mainscript: xx bytes"
"Max scripts in memory midgame: xx bytes"
"Max commands in memory midgame: xx lines"

This helps with calculating the size of command "blocks".

For example, "Size of Mainscript" is 72 bytes.

Add one LEVELSTART, and after that it reports
"Size of Mainscript: 80 bytes".

Same with the LEVELEND command: its size is 8 bytes.


And miss2 seems to store that in the header of its SCR files.



EDIT3: OK, CHAR_DATA size is actually 36 bytes, if you calculate it using the miss2 info.


Let's say the Mainscript size is 80 bytes. Add one

Code:

CHAR_DATA yyy
and the size becomes 116 bytes.


So the size of CHAR_DATA is 116 - 80 = 36 bytes.
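The same arithmetic as a small Python helper. Adding several copies of one command and dividing is a cheap consistency check: if the growth is not a whole multiple of the count, something besides the command itself (header data, for instance) probably changed too.

```python
def command_size(size_before: int, size_after: int, added: int = 1) -> int:
    """Bytes per command from miss2's "Size of Mainscript" before and after
    adding `added` copies of the same command."""
    delta = size_after - size_before
    if added <= 0 or delta % added:
        raise ValueError("size change is not a whole multiple of the count")
    return delta // added
```

command_size(72, 80) gives 8 for LEVELSTART and command_size(80, 116) gives 36 for CHAR_DATA, matching the numbers above.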

Re: SCR decompiler

Posted: 09 Jan 2010, 03:04
by Sektor
Do you think you could do a proper fix for the ste.scr trailer kill frenzies?

I want to insert this IF statement before the PUT_CAR_ON_TRAILER commands and put the ENDIF afterwards. Each command is numbered in the SCR file, so you'd have to renumber every command that comes after it, and that would break all the GOTOs, so you'd have to change them too. Have you managed to insert any code yet?

IF ( NOT ( IS_CAR_CRUSHED ( KF_5_car ) ) )
PUT_CAR_ON_TRAILER ( KF_5_car , KF_5_trailer_1 )
ENDIF
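The renumbering described above can be modelled in Python. This is a sketch over a simplified program (a list where command N is entry N-1 and jumps hold target numbers), not over real SCR bytes; the GOTO/GOTO_IF_ZERO names come from ALPINE's guess earlier in the thread. A jump that pointed exactly at the insertion point is left alone, so it lands on the new code (for example a guarding IF) instead of skipping it; that is a design choice, not the only option. Real SCR data may also contain next-command links, like the EXEC column in miss2's listings, which would need the same shift.

```python
from typing import Optional

Command = tuple[str, Optional[int]]
JUMPS = {"GOTO", "GOTO_IF_ZERO"}  # opcodes whose argument is a command number


def insert_commands(program: list[Command], at: int, new_cmds: list[Command]) -> list[Command]:
    """Insert new_cmds so the first one becomes command number `at` (1-based).

    Jump targets greater than `at` move by len(new_cmds); a target equal to `at`
    now reaches the first inserted command. new_cmds must use final numbering.
    """
    k = len(new_cmds)

    def shift(cmd: Command) -> Command:
        op, arg = cmd
        if op in JUMPS and arg is not None and arg > at:
            return (op, arg + k)
        return cmd

    shifted = [shift(cmd) for cmd in program]
    return shifted[: at - 1] + list(new_cmds) + shifted[at - 1 :]
```

For Sektor's case the inserted commands would be the IS_CAR_CRUSHED test plus a GOTO_IF_ZERO-style jump past PUT_CAR_ON_TRAILER, with that jump's target already written in the new numbering.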

These are the bytes I changed in ste.SCR to null out the PUT_CAR_ON_TRAILER code, but adding the IF statement would be better.

Comparing files steFIXED.SCR and steORIGINAL.SCR
000092FE: 00 3C
000092FF: 00 01
00009304: 00 FF
00009305: 00 01
00009306: 00 02
00009307: 00 02
0000930A: 00 3C
0000930B: 00 01
00009311: 00 02
00009312: 00 03
00009313: 00 02
00009316: 00 3C
00009317: 00 01
0000931C: 00 01
0000931D: 00 02
0000931E: 00 04
0000931F: 00 02
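That list is in the output format of the Windows fc /b command: offset, then the byte in the first file, then the byte in the second. A Python sketch that parses it and re-applies it, for example to turn a copy of steORIGINAL.SCR into steFIXED.SCR; it refuses to write if a byte doesn't match what the diff expects.

```python
import re

FC_LINE = re.compile(r"^([0-9A-Fa-f]{8}): ([0-9A-Fa-f]{2}) ([0-9A-Fa-f]{2})$")


def parse_fc(text: str) -> list[tuple[int, int, int]]:
    """Parse 'OFFSET: AA BB' lines into (offset, byte_in_first, byte_in_second)."""
    diffs = []
    for line in text.splitlines():
        m = FC_LINE.match(line.strip())
        if m:  # header lines such as "Comparing files ..." are skipped
            diffs.append(tuple(int(g, 16) for g in m.groups()))
    return diffs


def apply_patch(data: bytes, diffs: list[tuple[int, int, int]], use_first: bool = True) -> bytes:
    """Copy `data`, setting each offset to the first file's byte (use_first=True)
    or the second file's byte; the current byte must match the other side."""
    buf = bytearray(data)
    for offset, first, second in diffs:
        expected, new = (second, first) if use_first else (first, second)
        if buf[offset] != expected:
            raise ValueError(f"unexpected byte at {offset:#x}")
        buf[offset] = new
    return bytes(buf)
```

Because the fixed file was named first in the comparison, use_first=True applies the fix and use_first=False undoes it.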

It's not important and I'm not sure making the kill frenzies harder to start is a good thing.

Re: SCR decompiler

Posted: 09 Jan 2010, 10:37
by B-$hep
It's funny.

Just yesterday I was thinking about your fix for that trailer KF bug and thought maybe I could make it better.
And now here you are with the same question. I even read a topic somewhere earlier where you talked about that. Can you find it?

Probably at GTAF, but I can't find the thread. Every bit of info will help.
Also, I would like to know exactly which KF is causing this crash/bug.
And how do you reproduce it?

Is it all the KFs that involve a trailer, or just one specific one?


If it's all the trailer ones, then it's pretty bad coding from R*.

Re: SCR decompiler

Posted: 09 Jan 2010, 10:55
by Sektor
All the frenzies that use a trailer could crash the game when loading a save. See the save game error topic on gtaforums.

There are 3 on Residential and 3 on Industrial. The problem is that the cars meant to be put on the trailer are not spawned if you have already tried that frenzy, so the PUT_CAR_ON_TRAILER command causes a crash when loading a save. I guess it could be fixed a different way, by spawning the cars no matter what. Maybe they do spawn but then get cleared after the game checks the save.

Re: SCR decompiler

Posted: 09 Jan 2010, 12:23
by B-$hep
Well, I quickly put together a small test script by copying some pieces from different places:

Code:

PLAYER_PED player1 = (33.5, 22.5, 2.0) 25 0

CAR_DATA fbicar = (34.5,16.5) -1 000 EDSELFBI
PARKED_CAR_DATA trailer = (38.0,13.0) 19 90 TRUKTRNS

LEVELSTART 
explode(fbicar)

IF (  IS_CAR_CRUSHED ( fbicar ) )
   PUT_CAR_ON_TRAILER ( fbicar , trailer )
ENDIF
LEVELEND
I compiled it, renamed the output, then commented out the explode(fbicar).
Then I compiled again and compared the two files.

A lot of stuff changed because of such a simple command.

Image

Plus some changes in the header.
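A position-by-position comparison flags every byte after an inserted or removed command, because the rest of the file shifts. Python's difflib.SequenceMatcher can align the two files instead and report insertions and deletions as such. A sketch; on whole 80 KB files it may be slow, and autojunk is turned off so that common bytes such as 00 are not ignored.

```python
from difflib import SequenceMatcher


def aligned_diff(a: bytes, b: bytes) -> list[tuple[str, int, int, int, int]]:
    """Non-matching regions as (tag, a_start, a_end, b_start, b_end).

    tag is 'replace', 'delete' or 'insert'; offsets are byte positions.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Run on the with/without explode(fbicar) pair, a single 'delete' or 'insert' region would suggest the command's bytes, while scattered 'replace' regions would point at numbers (like command indexes) that moved because of it.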


I guess I must understand each command (byte) before I can move them around.
I.e. I must know what each byte does. Probably they are numbered, as you said, and also contain commands as simple tokens.

The header is also important, because miss2 almost always saves something in there for GTA2.

But don't worry. I will do my best.
It's a good challenge.

Re: SCR decompiler

Posted: 11 Jan 2010, 15:35
by elypter
I read through this thread and got curious about what might be possible with scripting.
If I got it right, commands are represented by some binary IDs. I don't know if you know enough yet to tell whether it might be possible to address functions inside gta2.exe that cannot be used with the official scripting language. Independently of that, is it conceivable that another compiler with a more flexible scripting language could be written for this bytecode, something Python-like with arrays and such things?

Re: SCR decompiler

Posted: 11 Jan 2010, 15:44
by Sektor
It might be possible to inject code via an overflow in the SCR files, but it would be better to do it by replacing one of the DLL files or modifying the RAM/exe. It's beyond my skill level.

When I was trying to disable the invulnerability in Tiny Town, I accidentally changed bytes that made all the parked cars mimic the movements of the player car. It crashed very soon after; it was probably the game thinking all the cars had the same ID.

Re: SCR decompiler

Posted: 24 Jan 2010, 03:47
by BenMillard
Is there any practical use for a .scr decompiler? All fan-made levels start from scratch, with .mis source code. So the only levels it would give us new information for are the originals. And we don't need any more information about those, do we?

I can see the challenge has excited you. Reminds me of the reverse-engineering we've done for all the other GTA editions.

Thing is, this is extremely time-consuming. You already have some projects which would be hugely useful to the community, such as the map viewer and editor. Maybe writing documentation for those parts which are known about but aren't written down, too.

Re: SCR decompiler

Posted: 24 Jan 2010, 09:43
by B-$hep
Yes, it's time-consuming, and to be honest I don't have the time to mess with it at the moment.
Of course I'm not saying I've abandoned this completely, but I'm working on more useful stuff at the moment, like you said: the map editor, for example.
Hell no. I can't stop that project, no matter the cost or time.
BenMillard wrote: Maybe writing documentation for those parts which are known about but aren't written down, too.
Like?
Examples please (if any).

Any requests for the STY tool? I can modify it (add features or improve existing ones).
I got permission for that some months ago. Actually I had some ideas but have been too lazy to update the STY tool, because I don't use it very often.

But if anybody wants some changes or needs a new feature, let me know.

EDIT: STY Tool requests moved to their own thread.

Re: SCR decompiler

Posted: 24 Jan 2010, 12:31
by Razor
I would like to see the Mafia Town .mis... sb didn't include it :(