
Python string formatting using regex

Stack Overflow user
Asked on 2022-10-01 06:45:31
Answers: 2 · Views: 91 · Followers: 0 · Votes: -1

I am new to regex, and I have a string in this format:

IhaveThisString = "{WebhookName:webhook,RequestBody:somebody,RequestHeader:{emailCallBackUrl:https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM,emailFileContent:hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=,emailFileName:456.csv,emailFrom:amitxxxx@gmail.com,emailSubject:Example,x-ms-workflow-id:6cxxxxxxxxxxxxxxbb900c,x-ms-workflow-version:08xxxxxxxxxxxxxxxx150,x-ms-workflow-name:runbook,x-ms-workflow-system-id:/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c,x-ms-workflow-run-id:08xxxxxxxxxxxxx0,x-ms-workflow-run-tracking-id:c0889f0a-8ef9-5555-x111-77ldkfw98r34c54,x-ms-workflow-operation-name:HTTP_Webhook,x-ms-workflow-repeatitem-scope-name:For_each,x-ms-workflow-repeatitem-index:0,x-ms-workflow-repeatitem-batch-index:0,x-ms-execution-location:xxxxxxxx,x-ms-workflow-subscription-id:hfhfh-d6s6d-d7d9s-7ASassASasas4,x-ms-workflow-resourcegroup-name:rg_poc,x-ms-tracking-id:c3-5xxxx-xxxxx-asdsd-2xxxx21,x-ms-correlation-id:c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx,x-ms-client-request-id:cxxxxxx-xxxx-4xxxx-axxx-2sssssz21,x-ms-client-tracking-id:08wwsswsCU16,x-ms-action-tracking-id:b1sdsd6-85sdsd-4fsdsd-sddc-988888888891,x-ms-zone-redundancy:optional,x-ms-activity-vector:AB.0L.OU.23,Connection:Keep-Alive,Accept-Encoding:gzip,Accept-Language:en,Host:xxxxxxxxxxx0418b.webhook.eus.azure-automation.net,User-Agent:azure-logic-apps/1.0}}"

I need to convert the above into a json_object using json.loads.

json_object = json.loads(IhaveThisString)

The problem is that the strings before and after each colon are missing their single quotes.

I need to reformat the string like this:

{'WebhookName':'webhook','RequestBody':'somebody','RequestHeader':{'emailCallBackUrl':'https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM','emailFileContent':'hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=','emailFileName':'456.csv','emailFrom':'amitxxxx@gmail.com','emailSubject':'Example','x-ms-workflow-id':'6cxxxxxxxxxxxxxxbb900c','x-ms-workflow-version':'08xxxxxxxxxxxxxxxx150','x-ms-workflow-name':'runbook','x-ms-workflow-system-id':'/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c','x-ms-workflow-run-id':'08xxxxxxxxxxxxx0','x-ms-workflow-run-tracking-id':'c0889f0a-8ef9-5555-x111-77ldkfw98r34c54','x-ms-workflow-operation-name':'HTTP_Webhook','x-ms-workflow-repeatitem-scope-name':'For_each','x-ms-workflow-repeatitem-index':'0','x-ms-workflow-repeatitem-batch-index':'0','x-ms-execution-location':'xxxxxxxx','x-ms-workflow-subscription-id':'hfhfh-d6s6d-d7d9s-7ASassASasas4','x-ms-workflow-resourcegroup-name':'rg_poc','x-ms-tracking-id':'c3-5xxxx-xxxxx-asdsd-2xxxx21','x-ms-correlation-id':'c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx','x-ms-client-request-id':'cxxxxxx-xxxx-4xxxx-axxx-2sssssz21','x-ms-client-tracking-id':'08wwsswsCU16','x-ms-action-tracking-id':'b1sdsd6-85sdsd-4fsdsd-sddc-988888888891','x-ms-zone-redundancy':'optional','x-ms-activity-vector':'AB.0L.OU.23','Connection':'Keep-Alive','Accept-Encoding':'gzip','Accept-Language':'en','Host':'xxxxxxxxxxx0418b.webhook.eus.azure-automation.net','User-Agent':'azure-logic-apps/1.0'}}

Please let me know how to achieve this using re in Python.

Note: the data contains URLs such as emailCallBackUrl:https://yyyy-xx.zzzzz, which should be converted to something like 'emailCallBackUrl':'https://yyyy-xx.zzzzz'
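
One detail about the target format is worth checking before writing any regex: JSON strings must be delimited by double quotes, so a single-quoted result would probably still be rejected by json.loads. The sketch below illustrates this with shortened, hypothetical stand-ins for the real payload; the helper name try_parse is an assumption made for the example.

```python
import json

def try_parse(text):
    """Return the parsed object, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Shortened, hypothetical stand-ins for the real payload.
unquoted = "{WebhookName:webhook,RequestBody:somebody}"
single_quoted = "{'WebhookName':'webhook','RequestBody':'somebody'}"
double_quoted = '{"WebhookName":"webhook","RequestBody":"somebody"}'

for label, text in [("unquoted", unquoted),
                    ("single-quoted", single_quoted),
                    ("double-quoted", double_quoted)]:
    print(label, "->", try_parse(text))
```

Only the double-quoted variant parses, which is why the answers insert double quotes rather than single quotes.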


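2 Answers

Stack Overflow user

Accepted answer

Posted on 2022-10-01 07:18:53

This was a fun one to solve. Thanks for the clear explanation and example input!

The basic approach is to look for runs of characters that are not delimiter symbols (:{},):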

[^{:},]+

However, this does not quite work, because https: gets split off from the rest of the URL.
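
To make the failure concrete, here is a minimal sketch on a shortened, hypothetical sample (the host example.invalid and the variable names are assumptions, not part of the original data). It lists the tokens the basic pattern finds and shows that quoting them yields a string that json.loads rejects.

```python
import json
import re

# Shortened, hypothetical sample with one URL value.
sample = "{emailCallBackUrl:https://example.invalid/run,emailFileName:456.csv}"

# The basic pattern treats the colon after "https" as a delimiter.
tokens = re.findall(r'[^{:},]+', sample)
print(tokens)

# Quoting each token therefore splits the URL into two separate strings.
quoted = re.sub(r'[^{:},]+', r'"\g<0>"', sample)
print(quoted)

try:
    json.loads(quoted)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("parsed:", parsed_ok)
```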

This can be fixed by allowing an optional https: inside the capture group:

((?:https:)?[^{:},]+)

You can of course add any other exceptions you need in the same way (for example, ((?:https:|http:)?[^{:},]+) also captures http:).
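
As a rough illustration of the http: and https: variant, the sketch below uses the equivalent shorthand https?: on a shortened, hypothetical sample. The names TOKEN and quote_tokens and the example.invalid URLs are assumptions made for this example. It still assumes that no key or value contains any other colon, comma, or brace.

```python
import json
import re

# A token is a run of non-delimiter characters, optionally prefixed by
# "http:" or "https:" so that URL schemes stay attached to their URL.
TOKEN = re.compile(r'((?:https?:)?[^{:},]+)')

def quote_tokens(text):
    """Wrap every token in double quotes so the result can be parsed as JSON."""
    return TOKEN.sub(r'"\1"', text)

# Shortened, hypothetical sample containing both URL schemes.
sample = ("{Name:webhook,Header:{secure:https://example.invalid/a?x=1,"
          "plain:http://example.invalid/b}}")

parsed = json.loads(quote_tokens(sample))
print(parsed["Header"]["secure"])
print(parsed["Header"]["plain"])
```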

Full Python code:

IhaveThisString = "{WebhookName:webhook,RequestBody:somebody,RequestHeader:{emailCallBackUrl:https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM,emailFileContent:hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=,emailFileName:456.csv,emailFrom:amitxxxx@gmail.com,emailSubject:Example,x-ms-workflow-id:6cxxxxxxxxxxxxxxbb900c,x-ms-workflow-version:08xxxxxxxxxxxxxxxx150,x-ms-workflow-name:runbook,x-ms-workflow-system-id:/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c,x-ms-workflow-run-id:08xxxxxxxxxxxxx0,x-ms-workflow-run-tracking-id:c0889f0a-8ef9-5555-x111-77ldkfw98r34c54,x-ms-workflow-operation-name:HTTP_Webhook,x-ms-workflow-repeatitem-scope-name:For_each,x-ms-workflow-repeatitem-index:0,x-ms-workflow-repeatitem-batch-index:0,x-ms-execution-location:xxxxxxxx,x-ms-workflow-subscription-id:hfhfh-d6s6d-d7d9s-7ASassASasas4,x-ms-workflow-resourcegroup-name:rg_poc,x-ms-tracking-id:c3-5xxxx-xxxxx-asdsd-2xxxx21,x-ms-correlation-id:c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx,x-ms-client-request-id:cxxxxxx-xxxx-4xxxx-axxx-2sssssz21,x-ms-client-tracking-id:08wwsswsCU16,x-ms-action-tracking-id:b1sdsd6-85sdsd-4fsdsd-sddc-988888888891,x-ms-zone-redundancy:optional,x-ms-activity-vector:AB.0L.OU.23,Connection:Keep-Alive,Accept-Encoding:gzip,Accept-Language:en,Host:xxxxxxxxxxx0418b.webhook.eus.azure-automation.net,User-Agent:azure-logic-apps/1.0}}"

import re
import json
import pprint

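# Wrap every token (a run of non-delimiter characters, with an optional
# leading "https:") in double quotes so that the result is valid JSON.
with_quotes = re.sub(r'((?:https:)?[^{:},]+)', r'"\1"', IhaveThisString)
my_json = json.loads(with_quotes)
pp = pprint.PrettyPrinter(depth=4)
pp.pprint(my_json)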

Output:

{'RequestBody': 'somebody',
'RequestHeader': {'Accept-Encoding': 'gzip',
                'Accept-Language': 'en',
                'Connection': 'Keep-Alive',
                'Host': 'xxxxxxxxxxx0418b.webhook.eus.azure-automation.net',
                'User-Agent': 'azure-logic-apps/1.0',
                'emailCallBackUrl': 'https://yyyy-xx.zzzzz.logic.azure.com/workflows/efdbb900c/runs/00268387CU20/actions/HTTP_Webhook/repetitions/000000/run?api-version=2050-02-04&sp=%2Fruns%2F08sssssssssss0%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Frun%2C%2Fruns%2Fxxxxxxxxxxxxxxxxx20%2Factions%2FHTTP_Webhook%2Frepetitions%2F000000%2Fread&sv=1.0&sig=eYmmxxxxxxxxxxxxxxxxxxxhjIM',
                'emailFileContent': 'hDQphc2xxxxxxxxxhpc2gsbWlzaHJhDQo=',
                'emailFileName': '456.csv',
                'emailFrom': 'amitxxxx@gmail.com',
                'emailSubject': 'Example',
                'x-ms-action-tracking-id': 'b1sdsd6-85sdsd-4fsdsd-sddc-988888888891',
                'x-ms-activity-vector': 'AB.0L.OU.23',
                'x-ms-client-request-id': 'cxxxxxx-xxxx-4xxxx-axxx-2sssssz21',
                'x-ms-client-tracking-id': '08wwsswsCU16',
                'x-ms-correlation-id': 'c3xxxxx-5xxxx-4xxxx-xxxxx-25xxxxx',
                'x-ms-execution-location': 'xxxxxxxx',
                'x-ms-tracking-id': 'c3-5xxxx-xxxxx-asdsd-2xxxx21',
                'x-ms-workflow-id': '6cxxxxxxxxxxxxxxbb900c',
                'x-ms-workflow-name': 'runbook',
                'x-ms-workflow-operation-name': 'HTTP_Webhook',
                'x-ms-workflow-repeatitem-batch-index': '0',
                'x-ms-workflow-repeatitem-index': '0',
                'x-ms-workflow-repeatitem-scope-name': 'For_each',
                'x-ms-workflow-resourcegroup-name': 'rg_poc',
                'x-ms-workflow-run-id': '08xxxxxxxxxxxxx0',
                'x-ms-workflow-run-tracking-id': 'c0889f0a-8ef9-5555-x111-77ldkfw98r34c54',
                'x-ms-workflow-subscription-id': 'hfhfh-d6s6d-d7d9s-7ASassASasas4',
                'x-ms-workflow-system-id': '/locations/cxxxxx/scaleunits/xxxxx/workflows/6c14xxxxxxxxxxxxxxxb900c',
                'x-ms-workflow-version': '08xxxxxxxxxxxxxxxx150',
                'x-ms-zone-redundancy': 'optional'},
'WebhookName': 'webhook'}
Votes: 1

Stack Overflow user

Posted on 2022-10-01 07:15:29

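This is going to be annoying, brittle, and unreliable, because some values also contain : (or other special characters) as part of the value. In the example, https://... contains a colon that must not be treated as a JSON colon.

It gets worse if a field such as emailSubject can contain absolutely anything.

The example you gave can be handled with the following: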

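import json
import pprint
import re

# IhaveThisString is the raw string from the question.
# Wrap every run of delimiter characters in double quotes.
with_quotes = re.sub(r'[{}:,]+', r'"\g<0>"', IhaveThisString)

# fix up URLs broken by the quoting
with_quotes = re.sub(r'(https?)":"//', r'\g<1>://', with_quotes)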

assert with_quotes[0] == with_quotes[-1] == '"'

print(with_quotes)

json_object = json.loads(with_quotes[1:-1])

pprint.pprint(json_object)

However, this is not a great solution, because any punctuation in the emailSubject field would throw it off.
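
A small sketch of that failure mode, assuming a hypothetical subject line that contains a colon and a comma. The helper name quote_delimiters and both sample strings are made up for illustration; the two substitutions are the ones shown above.

```python
import json
import re

def quote_delimiters(text):
    """Apply the two substitutions above and strip the outer quotes."""
    with_quotes = re.sub(r'[{}:,]+', r'"\g<0>"', text)
    with_quotes = re.sub(r'(https?)":"//', r'\g<1>://', with_quotes)
    return with_quotes[1:-1]

# Hypothetical inputs: the second subject contains ':' and ','.
clean = "{emailSubject:Example,emailFileName:456.csv}"
messy = "{emailSubject:Re: invoice, final,emailFileName:456.csv}"

print(json.loads(quote_delimiters(clean)))

try:
    json.loads(quote_delimiters(messy))
    messy_ok = True
except json.JSONDecodeError as exc:
    messy_ok = False
    print("failed to parse:", exc)
```

The punctuation inside the subject is indistinguishable from structural punctuation, so the quoted string is no longer valid JSON.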

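Ideally, try to fix the system that sends this data.

Otherwise, you may have to parse it by hand instead of relying on json.loads, picking out just the information you need from the data you have.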
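
One possible shape for such hand parsing, as a sketch only: pull out individual fields by key name and let each value run until the next thing that looks like ",someKey:" or a closing brace. The function extract_field, the sample string, and the assumption that keys consist of letters, digits, underscores, and hyphens are all hypothetical. A value that itself contains text of the form ",word:" would still be cut short.

```python
import re

def extract_field(text, key):
    """Return the raw value for `key`, or None if the key is absent.

    Assumes the key directly follows '{' or ',' and that the value ends
    just before the next ",someKey:" or before a closing brace.
    """
    pattern = (
        r'(?:^|[{,])' + re.escape(key) + r':'  # the key, right after '{' or ','
        r'(.*?)'                               # the value, as short as possible
        r'(?=,[A-Za-z][\w-]*:|\}|$)'           # stop before ",nextKey:" or '}'
    )
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Shortened, hypothetical sample with punctuation inside the subject.
sample = ("{WebhookName:webhook,RequestHeader:{emailCallBackUrl:"
          "https://example.invalid/run?sv=1.0,"
          "emailSubject:Hello, world: again,emailFileName:456.csv}}")

for key in ("emailCallBackUrl", "emailSubject", "emailFileName", "missing"):
    print(key, "->", extract_field(sample, key))
```

This trades generality for robustness on the fields you actually need, and it never has to produce valid JSON.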

Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73916352
