I'm new to regex. I have a string in this format:
IhaveThisString = "{WebhookName:webhook,RequestBody:somebody,RequestHeader:{emailCallBackUrl:https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM,emailFileContent:hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=,emailFileName:456.csv,emailFrom:amitxxxx@gmail.com,emailSubject:Example,x-ms-workflow-id:6cxxxxxxxxxxxxxxbb900c,x-ms-workflow-version:08xxxxxxxxxxxxxxxx150,x-ms-workflow-name:runbook,x-ms-workflow-system-id:/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c,x-ms-workflow-run-id:08xxxxxxxxxxxxx0,x-ms-workflow-run-tracking-id:c0889f0a-8ef9-5555-x111-77ldkfw98r34c54,x-ms-workflow-operation-name:HTTP_Webhook,x-ms-workflow-repeatitem-scope-name:For_each,x-ms-workflow-repeatitem-index:0,x-ms-workflow-repeatitem-batch-index:0,x-ms-execution-location:xxxxxxxx,x-ms-workflow-subscription-id:hfhfh-d6s6d-d7d9s-7ASassASasas4,x-ms-workflow-resourcegroup-name:rg_poc,x-ms-tracking-id:c3-5xxxx-xxxxx-asdsd-2xxxx21,x-ms-correlation-id:c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx,x-ms-client-request-id:cxxxxxx-xxxx-4xxxx-axxx-2sssssz21,x-ms-client-tracking-id:08wwsswsCU16,x-ms-action-tracking-id:b1sdsd6-85sdsd-4fsdsd-sddc-988888888891,x-ms-zone-redundancy:optional,x-ms-activity-vector:AB.0L.OU.23,Connection:Keep-Alive,Accept-Encoding:gzip,Accept-Language:en,Host:xxxxxxxxxxx0418b.webhook.eus.azure-automation.net,User-Agent:azure-logic-apps/1.0}}"

I need to convert the above into json_object using json.loads:
json_object = json.loads(IhaveThisString)

But the problem is that the strings on either side of each colon are missing quotes.
I need to reformat the string like this:
{'WebhookName':'webhook','RequestBody':'somebody','RequestHeader':{'emailCallBackUrl':'https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM','emailFileContent':'hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=','emailFileName':'456.csv','emailFrom':'amitxxxx@gmail.com','emailSubject':'Example','x-ms-workflow-id':'6cxxxxxxxxxxxxxxbb900c','x-ms-workflow-version':'08xxxxxxxxxxxxxxxx150','x-ms-workflow-name':'runbook','x-ms-workflow-system-id':'/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c','x-ms-workflow-run-id':'08xxxxxxxxxxxxx0','x-ms-workflow-run-tracking-id':'c0889f0a-8ef9-5555-x111-77ldkfw98r34c54','x-ms-workflow-operation-name':'HTTP_Webhook','x-ms-workflow-repeatitem-scope-name':'For_each','x-ms-workflow-repeatitem-index':'0','x-ms-workflow-repeatitem-batch-index':'0','x-ms-execution-location':'xxxxxxxx','x-ms-workflow-subscription-id':'hfhfh-d6s6d-d7d9s-7ASassASasas4','x-ms-workflow-resourcegroup-name':'rg_poc','x-ms-tracking-id':'c3-5xxxx-xxxxx-asdsd-2xxxx21','x-ms-correlation-id':'c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx','x-ms-client-request-id':'cxxxxxx-xxxx-4xxxx-axxx-2sssssz21','x-ms-client-tracking-id':'08wwsswsCU16','x-ms-action-tracking-id':'b1sdsd6-85sdsd-4fsdsd-sddc-988888888891','x-ms-zone-redundancy':'optional','x-ms-activity-vector':'AB.0L.OU.23','Connection':'Keep-Alive','Accept-Encoding':'gzip','Accept-Language':'en','Host':'xxxxxxxxxxx0418b.webhook.eus.azure-automation.net','User-Agent':'azure-logic-apps/1.0'}}

Please let me know how I can achieve this in Python using re.
Note: the data contains URLs such as emailCallBackUrl:https://yyyy-xx.zzzzz, which should be converted to something like 'emailCallBackUrl':'https://yyyy-xx.zzzzz'.
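One detail worth checking before reaching for regex: json.loads only accepts double-quoted keys and strings, so the single-quoted target format above would still be rejected. A minimal check, using hypothetical one-key samples cut down from the question's data (the helper name try_json is illustrative only):

```python
import json

samples = [
    '{WebhookName:webhook}',       # unquoted, as received
    "{'WebhookName':'webhook'}",   # single quotes, as in the desired format above
    '{"WebhookName":"webhook"}',   # double quotes, which JSON requires
]

def try_json(text):
    """Return the parsed object, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

for text in samples:
    print(f"{text!r:30} -> {try_json(text)!r}")
```

Only the double-quoted sample parses, which is why the regex substitutions need to produce double quotes.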
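Posted on 2022-10-01 07:18:53
This was an interesting problem to solve; thanks for the clear explanation and sample input!
The basic approach is to find runs of characters that are not delimiter symbols (:, {, }, or ,):
[^{:},]+

However, this doesn't quite work, because https: gets split off from the rest of the URL.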
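A quick way to see the split is re.findall on a hypothetical miniature of the input (the sample string is made up for illustration):

```python
import re

# Hypothetical miniature input: one URL value and one plain key/value pair
sample = '{url:https://a.b/c,k:v}'

# Naive pattern: any run of characters that is not { : } or ,
tokens = re.findall(r'[^{:},]+', sample)
print(tokens)  # ['url', 'https', '//a.b/c', 'k', 'v']
```

The colon after https is treated as a delimiter, so the URL comes back as two separate tokens.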
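This can be fixed by allowing an optional https: inside the capture group:
((?:https:)?[^{:},]+)

Of course, you can add any other exceptions you need in the same way (for example, ((?:https:|http:)?[^{:},]+) also captures http:).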
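The same kind of hypothetical sample, now with an http: URL added, shows what the optional prefix changes. The compact spelling (?:https?:)? is an equivalent alternative for the http/https alternation; it is not part of the original answer.

```python
import re

# Hypothetical sample with one https: URL and one http: URL
sample = '{url:https://a.b/c,k:v,old:http://d.e}'

# Pattern from the answer: an optional literal "https:" may precede the run
https_only = re.findall(r'((?:https:)?[^{:},]+)', sample)
# Variant from the answer that also accepts "http:"
http_or_https = re.findall(r'((?:https:|http:)?[^{:},]+)', sample)
# Equivalent compact spelling of the variant (not from the answer)
compact = re.findall(r'((?:https?:)?[^{:},]+)', sample)

print(https_only)     # ['url', 'https://a.b/c', 'k', 'v', 'old', 'http', '//d.e']
print(http_or_https)  # ['url', 'https://a.b/c', 'k', 'v', 'old', 'http://d.e']
```

With only https: allowed, the http: URL is still split in two; the alternation keeps both URLs intact.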
Full Python code:
IhaveThisString = "{WebhookName:webhook,RequestBody:somebody,RequestHeader:{emailCallBackUrl:https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM,emailFileContent:hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=,emailFileName:456.csv,emailFrom:amitxxxx@gmail.com,emailSubject:Example,x-ms-workflow-id:6cxxxxxxxxxxxxxxbb900c,x-ms-workflow-version:08xxxxxxxxxxxxxxxx150,x-ms-workflow-name:runbook,x-ms-workflow-system-id:/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c,x-ms-workflow-run-id:08xxxxxxxxxxxxx0,x-ms-workflow-run-tracking-id:c0889f0a-8ef9-5555-x111-77ldkfw98r34c54,x-ms-workflow-operation-name:HTTP_Webhook,x-ms-workflow-repeatitem-scope-name:For_each,x-ms-workflow-repeatitem-index:0,x-ms-workflow-repeatitem-batch-index:0,x-ms-execution-location:xxxxxxxx,x-ms-workflow-subscription-id:hfhfh-d6s6d-d7d9s-7ASassASasas4,x-ms-workflow-resourcegroup-name:rg_poc,x-ms-tracking-id:c3-5xxxx-xxxxx-asdsd-2xxxx21,x-ms-correlation-id:c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx,x-ms-client-request-id:cxxxxxx-xxxx-4xxxx-axxx-2sssssz21,x-ms-client-tracking-id:08wwsswsCU16,x-ms-action-tracking-id:b1sdsd6-85sdsd-4fsdsd-sddc-988888888891,x-ms-zone-redundancy:optional,x-ms-activity-vector:AB.0L.OU.23,Connection:Keep-Alive,Accept-Encoding:gzip,Accept-Language:en,Host:xxxxxxxxxxx0418b.webhook.eus.azure-automation.net,User-Agent:azure-logic-apps/1.0}}"
import re
import json
import pprint
# Wrap every token (optionally prefixed with "https:") in double quotes
with_quotes = re.sub(r'((?:https:)?[^{:},]+)', r'"\1"', IhaveThisString)
my_json = json.loads(with_quotes)
pp = pprint.PrettyPrinter(depth=4)
pp.pprint(my_json)

Output:
{'RequestBody': 'somebody',
'RequestHeader': {'Accept-Encoding': 'gzip',
'Accept-Language': 'en',
'Connection': 'Keep-Alive',
'Host': 'xxxxxxxxxxx0418b.webhook.eus.azure-automation.net',
'User-Agent': 'azure-logic-apps/1.0',
'emailCallBackUrl': 'https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM',
'emailFileContent': 'hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=',
'emailFileName': '456.csv',
'emailFrom': 'amitxxxx@gmail.com',
'emailSubject': 'Example',
'x-ms-action-tracking-id': 'b1sdsd6-85sdsd-4fsdsd-sddc-988888888891',
'x-ms-activity-vector': 'AB.0L.OU.23',
'x-ms-client-request-id': 'cxxxxxx-xxxx-4xxxx-axxx-2sssssz21',
'x-ms-client-tracking-id': '08wwsswsCU16',
'x-ms-correlation-id': 'c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx',
'x-ms-execution-location': 'xxxxxxxx',
'x-ms-tracking-id': 'c3-5xxxx-xxxxx-asdsd-2xxxx21',
'x-ms-workflow-id': '6cxxxxxxxxxxxxxxbb900c',
'x-ms-workflow-name': 'runbook',
'x-ms-workflow-operation-name': 'HTTP_Webhook',
'x-ms-workflow-repeatitem-batch-index': '0',
'x-ms-workflow-repeatitem-index': '0',
'x-ms-workflow-repeatitem-scope-name': 'For_each',
'x-ms-workflow-resourcegroup-name': 'rg_poc',
'x-ms-workflow-run-id': '08xxxxxxxxxxxxx0',
'x-ms-workflow-run-tracking-id': 'c0889f0a-8ef9-5555-x111-77ldkfw98r34c54',
'x-ms-workflow-subscription-id': 'hfhfh-d6s6d-d7d9s-7ASassASasas4',
'x-ms-workflow-system-id': '/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c',
'x-ms-workflow-version': '08xxxxxxxxxxxxxxxx150',
'x-ms-zone-redundancy': 'optional'},
'WebhookName': 'webhook'}

Posted on 2022-10-01 07:15:29
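This is going to be annoying, fragile, and unreliable, because some values also contain : (or other special characters) as part of the value; in the example, https://... contains a colon that should not be treated as a JSON colon.
It gets even worse if fields like emailSubject can contain absolutely anything.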
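To make the emailSubject problem concrete, here is a sketch that runs the first answer's pattern over a hypothetical input whose subject contains a colon ("Re: hello" is made up, not from the question's data):

```python
import json
import re

# Hypothetical input: the subject itself contains a colon
sample = '{emailSubject:Re: hello,Host:example}'

# The first answer's substitution: quote every non-delimiter run
quoted = re.sub(r'((?:https:)?[^{:},]+)', r'"\1"', sample)
print(quoted)  # the colon inside the subject became a JSON delimiter

try:
    json.loads(quoted)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)  # False
```

A comma inside the subject fails in the same way, since commas are also treated as delimiters.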
The example you gave can be handled with the following:
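import json
import pprint
import re

# Wrap every run of delimiter characters ({ } : ,) in double quotes
with_quotes = re.sub(r'[{}:,]+', r'"\g<0>"', IhaveThisString)
# fix up URLs broken by the quoting
with_quotes = re.sub(r'(https?)":"//', r'\g<1>://', with_quotes)
# the outer braces were wrapped too, so the string starts and ends with a stray quote
assert with_quotes[0] == with_quotes[-1] == '"'
print(with_quotes)
json_object = json.loads(with_quotes[1:-1])
pprint.pprint(json_object)

However, this is not a great solution, because any punctuation in the emailSubject field would throw it off.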
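To see what each of the two substitutions does, here is the same pipeline applied to a tiny hypothetical input (the sample and the names step1/step2 are illustrative only):

```python
import json
import re

# Hypothetical miniature input with a URL value and a nested object
sample = '{a:https://x.y,b:{c:d}}'

# Step 1: wrap every run of delimiters in double quotes
step1 = re.sub(r'[{}:,]+', r'"\g<0>"', sample)
# Step 2: re-join "https" and "//..." that step 1 split apart
step2 = re.sub(r'(https?)":"//', r'\g<1>://', step1)
# The outermost braces were wrapped as well, so drop the first and last quote
obj = json.loads(step2[1:-1])

print(step1)  # "{"a":"https":"//x.y","b":{"c":"d"}}"
print(step2)  # "{"a":"https://x.y","b":{"c":"d"}}"
print(obj)    # {'a': 'https://x.y', 'b': {'c': 'd'}}
```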
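Ideally, try to fix the system that is sending this data.
Otherwise, you may have to parse it by hand rather than relying on json.loads, picking out the information you need from the data you have.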
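A minimal sketch of what hand-parsing could look like, assuming two format rules inferred from the sample rather than from any specification: a key runs up to the first colon, and a value is either a nested {...} object or runs up to the next comma or closing brace. Under those assumptions colons inside values (such as https://) survive, but a literal comma or brace inside a value still breaks it. The function name parse_loose_object is hypothetical.

```python
def parse_loose_object(text):
    """Parse '{key:value,key:{...}}' where keys and values are unquoted.

    Assumed (hypothetical) rules: a key ends at the first ':'; a value is a
    nested '{...}' or runs to the next ',' or '}'. Values containing a
    literal ',' or '}' are NOT supported.
    """
    pos = 0

    def parse_object():
        nonlocal pos
        if pos >= len(text) or text[pos] != '{':
            raise ValueError(f"expected '{{' at position {pos}")
        pos += 1
        result = {}
        if pos < len(text) and text[pos] == '}':  # empty object
            pos += 1
            return result
        while True:
            colon = text.find(':', pos)
            if colon == -1:
                raise ValueError(f"missing ':' after position {pos}")
            key = text[pos:colon]
            pos = colon + 1
            if pos < len(text) and text[pos] == '{':
                value = parse_object()  # nested object, e.g. RequestHeader
            else:
                # Scan the value up to the next ',' or '}'; colons are kept
                end = pos
                while end < len(text) and text[end] not in ',}':
                    end += 1
                value = text[pos:end]
                pos = end
            result[key] = value
            if pos >= len(text):
                raise ValueError("unexpected end of input")
            if text[pos] == ',':
                pos += 1
            elif text[pos] == '}':
                pos += 1
                return result
            else:
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")

    obj = parse_object()
    if pos != len(text):
        raise ValueError(f"trailing data at position {pos}")
    return obj


print(parse_loose_object('{a:1,b:{url:https://x.y/z?q=1,c:d}}'))
# {'a': '1', 'b': {'url': 'https://x.y/z?q=1', 'c': 'd'}}
```

Because it returns a plain dict, the result can be used directly as json_object, or passed to json.dumps if a real JSON string is needed.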
https://stackoverflow.com/questions/73916352