Decode ASCII string values from a reverse-engineered Lua file

I decompiled a Lua file with unluac, and it turns out that none of the string values are readable; they are ASCII-encoded instead:

    clues = {
      {
        answer = { "\216\173", "\216\177", "\216\168", "\216\167", "\216\161" },
        text = "\216\173\217\138\217\136\216\167\217\134\032\216\178\216\167\216\173\217\129\032\217\138\216\186\217\138\216\177\032\217\132\217\136\217\134\032\216\172\217\132\216\175\217\135",
        syllables = { {"\216\173", "\216\177"}, {"\216\168", "\216\167"}, {"\216\161"} }

How do I go about decoding the whole file, ignoring any non-ASCII characters, in Python or Java?

Best answer

You have UTF-8 encoded data, not ASCII, with each byte written as a three-digit decimal escape sequence. The actual text consists mainly of Arabic script.
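As a quick check of that claim, here is a minimal sketch that takes the first escaped pair from the question, `\216\173`, and treats the two numbers as raw byte values. The variable name `pair` is only illustrative. Both values are above 127, so they cannot be ASCII, but together they form one valid two-byte UTF-8 sequence.

```python
# The first escaped pair from the question, "\216\173", as raw byte values.
pair = bytes([216, 173])

print(pair.hex())            # d8ad
print(pair.decode('utf-8'))  # ح (U+062D, ARABIC LETTER HAH)
```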

You need to replace each \ddd sequence with the corresponding byte value, then decode as UTF-8. In Python 3:

    utf8_data = bytes([int(data[i + 1:i + 4]) for i in range(0, len(data), 4)])
    print(utf8_data.decode('utf8'))

Demo:

    >>> data = r"\216\173\217\138\217\136\216\167\217\134\032\216\178\216\167\216\173\217\129\032\217\138\216\186\217\138\216\177\032\217\132\217\136\217\134\032\216\172\217\132\216\175\217\135"
    >>> utf8_data = bytes([int(data[i + 1:i + 4]) for i in range(0, len(data), 4)])
    >>> print(utf8_data.decode('utf8'))
    حيوان زاحف يغير لون جلده

Google Translate tells me this means "A creepy animal changes the color of its skin" in English.
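The slicing approach above relies on the string consisting of nothing but four-character `\ddd` groups. If a string mixes plain characters with escapes, a regular expression is one possible alternative. The following is only a sketch: the function name `decode_lua_decimal_escapes` is made up for illustration, and it assumes the only escapes present are decimal ones (it does not handle `\n`, `\"`, or an escaped backslash followed by digits).

```python
import re

# A Lua decimal escape: a backslash followed by one to three digits.
_DECIMAL_ESCAPE = re.compile(rb'\\(\d{1,3})')


def decode_lua_decimal_escapes(raw: bytes) -> str:
    """Replace each decimal escape with its byte value, then decode as UTF-8."""
    def to_byte(match):
        value = int(match.group(1))
        if value > 255:
            raise ValueError('escape out of byte range: %r' % match.group(0))
        return bytes([value])

    # Bytes that are not part of an escape pass through unchanged.
    return _DECIMAL_ESCAPE.sub(to_byte, raw).decode('utf-8')


print(decode_lua_decimal_escapes(rb'id: \216\173\216\177'))
```

Because literal bytes are left alone, a value such as `a\032b` decodes to `a b` without any special casing.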

Alternatively, we can convert the Lua syntax to JSON using a stack-based parser:

    import re
    import json

    def lua_to_python(lua_data):
        return json.loads(''.join(_convert_lua_to_json_chunks(lua_data)))

    def _lua_bytes_to_text(data):
        # each byte is written as a 4-character \ddd group
        return bytes(
            [int(data[i + 1:i + 4]) for i in range(0, len(data), 4)]
        ).decode('utf8')

    def _convert_lua_to_json_chunks(lua_data):
        tokens = re.split(br'(["{},])', lua_data)
        stack = []
        pos_tokens = enumerate(tokens)
        for pos, token in pos_tokens:
            if b'=' in token:
                if not stack:
                    # top-level key-value, produce JSON object syntax
                    stack.append('}')
                    yield '{'
                yield '"{}":'.format(token.strip().rstrip(b' =').decode('utf8'))
            elif token == b'{':
                # array or object?
                next_nonws = next(t for t in tokens[pos + 1:] if t.strip())
                if b'=' in next_nonws:
                    stack.append('}')
                    yield '{'
                else:
                    stack.append(']')
                    yield '['
            elif token == b'}':
                yield stack.pop()
            elif token == b'"':
                yield '"'
                # consume tokens up to the closing quote
                for pos, s in pos_tokens:
                    if s == b'"':
                        yield '"'
                        break
                    yield _lua_bytes_to_text(s)
            else:
                yield token.decode('utf8')
        # close anything still open, innermost first
        yield from reversed(stack)

With two additional } characters at the end, your data then produces:

    >>> from pprint import pprint
    >>> lua_to_python(lua_data)
    {'clues': [{'answer': ['ح', 'ر', 'ب', 'ا', 'ء'], 'text': 'حيوان زاحف يغير لون جلده', 'syllables': [['ح', 'ر'], ['ب', 'ا'], ['ء']]}]}
    >>> pprint(lua_to_python(lua_data))
    {'clues': [{'answer': ['ح', 'ر', 'ب', 'ا', 'ء'],
                'syllables': [['ح', 'ر'], ['ب', 'ا'], ['ء']],
                'text': 'حيوان زاحف يغير لون جلده'}]}

This should give you plenty of options to further process the data.
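If the goal is only to make the decompiled file readable, rather than to load it as data, one possible option is to rewrite the source text and decode the escapes inside each double-quoted string literal. This is a sketch under assumptions, not a Lua parser: the name `make_lua_source_readable` is hypothetical, only double-quoted strings and decimal escapes are considered, and the output is meant for reading, since a decoded quote or newline would no longer be valid Lua.

```python
import re

# A double-quoted Lua string literal, allowing backslash escapes inside it.
_STRING_LITERAL = re.compile(rb'"(?:[^"\\]|\\.)*"', re.DOTALL)
# A Lua decimal escape: a backslash followed by one to three digits.
_DECIMAL_ESCAPE = re.compile(rb'\\(\d{1,3})')


def make_lua_source_readable(source: bytes) -> str:
    """Decode decimal escapes inside string literals; leave other code as-is."""
    def decode_literal(match):
        # bytes([n]) raises ValueError for n > 255, which flags a bad escape.
        return _DECIMAL_ESCAPE.sub(
            lambda m: bytes([int(m.group(1))]), match.group(0)
        )

    return _STRING_LITERAL.sub(decode_literal, source).decode('utf-8')


sample = rb'text = "\216\173\216\177", n = 216'
print(make_lua_source_readable(sample))
```

For a whole file, the bytes could be read with something like `open(path, 'rb').read()`, where `path` points at the decompiled output, and the result written back out as UTF-8.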
