How to cryptographically hash a JSON object?

The following question is more complex than it may first seem.

Assume that I've got an arbitrary JSON object, one that may contain any amount of data, including other nested JSON objects. What I want is a cryptographic hash/digest of the JSON data, without regard to the actual JSON formatting itself (e.g. ignoring newlines and spacing differences between the JSON tokens).

The last part is a requirement, as the JSON will be generated/read by a variety of (de)serializers on a number of different platforms. I know of at least one JSON library for Java that completely removes formatting when reading data during deserialization. As such it will break the hash.

The arbitrary data clause above also complicates things, as it prevents me from taking known fields in a given order and concatenating them prior to hashing (think roughly how Java's non-cryptographic hashCode() method works).

Lastly, hashing the entire JSON String as a chunk of bytes (prior to deserialization) is not desirable either, since there are fields in the JSON that should be ignored when computing the hash.

I'm not sure there is a good solution to this problem, but I welcome any approaches or thoughts =)

Best answer

The problem is a common one when computing hashes for any data format where flexibility is allowed. To solve this, you need to canonicalize the representation.

For example, the OAuth1.0a protocol, which is used by Twitter and other services for authentication, requires a secure hash of the request message. To compute the hash, OAuth1.0a says you need to first sort the request parameters by name, percent-encode them, and join them, together with the HTTP method and URL, into a single signature base string. The signature or hash is computed on the result of that canonicalization.

XML DSIG works the same way - you need to canonicalize the XML before signing it. There is a W3C Recommendation covering this (Canonical XML), because it's such a fundamental requirement for signing. Some people call it c14n.

I don't know of a canonicalization standard for JSON. It's worth researching.

If there isn't one, you can certainly establish a convention for your particular application usage. A reasonable start might be:

- lexicographically sort the properties by name
- double quotes used on all names
- double quotes used on all string values
- no space, or one-space, between names and the colon, and between the colon and the value
- no spaces between values and the following comma
- all other white space collapsed to either a single space or nothing - choose one
- exclude any properties you don't want to sign (one example is, the property that holds the signature itself)
- sign the result, with your chosen algorithm
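
A convention along those lines could be sketched in Python roughly as follows. This is only a sketch under assumptions, not a standard: the names `canonicalize` and `digest`, the `exclude` parameter, and the choice of SHA-256 are made up for illustration. It leans on `json.dumps` with `sort_keys=True` and compact separators, and it only drops excluded properties at the top level.

```python
import hashlib
import json

def canonicalize(obj, exclude=()):
    """Return canonical UTF-8 bytes for a parsed JSON object (a dict)."""
    # Drop top-level properties that must not be hashed (hypothetical policy).
    filtered = {k: v for k, v in obj.items() if k not in exclude}
    # sort_keys orders names at every nesting level; the separators remove all
    # optional whitespace; ensure_ascii=False keeps non-ASCII text as UTF-8.
    text = json.dumps(filtered, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")

def digest(obj, exclude=()):
    """Hex SHA-256 of the canonical form (SHA-256 is an assumed choice)."""
    return hashlib.sha256(canonicalize(obj, exclude)).hexdigest()

# Two differently formatted and differently ordered documents, same data.
a = json.loads('{"b": 1,\n  "a": {"y": [1, 2], "x": "hi"}}')
b = json.loads('{"a":{"x":"hi","y":[1,2]},"b":1}')
assert digest(a) == digest(b)
```

One thing this sketch does not settle is number formatting: different platforms may print the same number differently (for example `1.0` versus `1`), so a cross-platform convention would probably need a rule for that as well.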

You may also want to think about how to pass that signature in the JSON object - possibly establish a well-known property name, like "nichols-hmac" or something, that gets the base64 encoded version of the hash. This property would have to be explicitly excluded by the hashing algorithm. Then, any receiver of the JSON would be able to check the hash.
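
As a rough illustration of carrying the signature inside the object, here is a sketch that stores a base64 HMAC-SHA256 under the "nichols-hmac" property and excludes that property when recomputing. The shared key, the helper names `sign` and `verify`, and the use of HMAC-SHA256 are assumptions; the receiver would need the same key and the same canonicalization rules.

```python
import base64
import hashlib
import hmac
import json

SIG_FIELD = "nichols-hmac"  # property name borrowed from the text above

def _canonical_bytes(obj):
    # Exclude the signature property itself, then serialize deterministically.
    body = {k: v for k, v in obj.items() if k != SIG_FIELD}
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def sign(obj, key):
    """Return a copy of obj with a base64 HMAC-SHA256 stored in SIG_FIELD."""
    mac = hmac.new(key, _canonical_bytes(obj), hashlib.sha256).digest()
    signed = dict(obj)
    signed[SIG_FIELD] = base64.b64encode(mac).decode("ascii")
    return signed

def verify(obj, key):
    """Recompute the HMAC over the canonical form; compare in constant time."""
    expected = hmac.new(key, _canonical_bytes(obj), hashlib.sha256).digest()
    try:
        received = base64.b64decode(obj.get(SIG_FIELD, ""))
    except (ValueError, TypeError):
        return False  # missing, non-string, or malformed signature value
    return hmac.compare_digest(expected, received)

key = b"shared-secret"  # hypothetical key for the example
msg = sign({"user": "alice", "amount": 10}, key)
wire = json.dumps(msg, indent=2)          # any formatting on the wire
assert verify(json.loads(wire), key)      # receiver re-canonicalizes
```

Because the check runs on the re-canonicalized data, the pretty-printed wire form verifies even though its bytes differ from what was signed.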

The canonicalized representation does not need to be the representation you pass around in the application. It only needs to be easily produced given an arbitrary JSON object.
