ASR (Automatic Speech Recognition) is a technology that converts human speech into text.
The previous ASR was painful to use, so a team in Switzerland built a new one to replace it, reportedly AI-based and big-data-driven. My job was to test it and give the decision makers something to base their call on. I tested response time only; accuracy (noise robustness, precision, and so on) was out of scope for this round. The developers integrated the ASR SDK into a demo app, and rather than calling the API directly, I simulated a user operating the app repeatedly to judge whether the response time meets the requirement.
They delivered an Android build first. After trying it by hand for a while, it was clear that fully manual testing would waste far too much time.
So I settled on a strategy: use samples I recorded myself, run 4 samples on each device, 30 runs per sample, and record the response time each run, with UI automation fully simulating the human operation.
The samples are these four sentences: Due to delays, we need to reconsider our schedule this week. As we've discussed, we need to put our most experienced staff on this. Can you suggest an alternative to the restructuring? We'll implement quality assurance processes before the final review. I deliberately read them haltingly; each clip is about 13 seconds. The recordings came out in m4a format, though, so I had to convert them, using ffmpeg.
I. Installing ffmpeg
1. Download ffmpeg: http://ffmpeg.org/download.html
2. Extract it to a directory and add the bin directory to the PATH (Computer > Properties > Advanced system settings > Environment Variables > Path > New). Open a command prompt (Win+R, type cmd) and run ffmpeg -version; if it prints version info, the install succeeded.
II. Using ffmpeg
1. Video format conversion: ffmpeg -i num.mp4 -codec copy num2.avi
Copies the streams of num.mp4 into num2.avi without re-encoding.
Note: the file after -i is the input file.
2. Making a GIF: ffmpeg -i num.mp4 -vframes 20 -y -f gif num3.gif
Turns the first 20 frames of num.mp4 into a GIF named num3.gif.
3. Cutting video: ffmpeg -i num.mp4 -ss 0 -t 3 -codec copy cut1.mp4
The number after -ss is the start time; the number after -t is the duration.
Grabbing one moment of a video as an image: ffmpeg -i num.mp4 -y -f image2 -ss 2 -t 0.001 -s 400x300 pic.jpg
Captures the frame at the 2 s mark as a 400x300 image named pic.jpg (the number after -ss is the capture time).
4. Capturing one image per second: ffmpeg -i num.mp4 -r 1 image%d.jpg
Extracts one image per second from num.mp4, named image1.jpg, image2.jpg, and so on.
Note: the number after -r is the output frame rate (images per second).
Then a small script finishes the batch conversion:
import os
# current_path = os.path.dirname(os.path.abspath(__file__))
current_path = "C:\\work\\code\\android"
audio_file = os.path.join(current_path, "audio")
m4a_files = os.listdir(audio_file)
for i, m4a in enumerate(m4a_files):
    if m4a.endswith(".m4a"):
        os.system("ffmpeg -i " + audio_file + "\\" + m4a
                  + " " + audio_file + "\\" + str(i + 1) + ".mp3")
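The string-concatenated os.system call above is fragile when a filename contains spaces. A safer sketch (assuming the same audio folder layout; build_ffmpeg_cmd and convert_all are names I made up, not part of the original script) passes an argument list to subprocess.run instead:

```python
import os
import subprocess

def build_ffmpeg_cmd(src, dst):
    # list form avoids shell-quoting problems with spaces in paths
    return ["ffmpeg", "-y", "-i", src, dst]

def convert_all(audio_dir):
    # mirror of the loop above: every .m4a becomes a numbered .mp3
    for i, name in enumerate(sorted(os.listdir(audio_dir))):
        if name.endswith(".m4a"):
            src = os.path.join(audio_dir, name)
            dst = os.path.join(audio_dir, str(i + 1) + ".mp3")
            subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

check=True makes a failed conversion raise immediately instead of silently producing a missing mp3.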
After a quick debugging pass the script was running. Next step: tap Record, then play the audio into the device. Of the Python audio-playback options I tried, pygame lets me control the playback duration myself. It looks roughly like this:
import pygame
from time import sleep

SLEEPTIME = 12  # each clip is ~13 s; play for 12 s

# play the audio
pygame.mixer.init()
pygame.mixer.music.load(file)
pygame.mixer.music.play()
sleep(SLEEPTIME)
pygame.mixer.music.stop()
# playsound(file)  # alternative: the playsound package
The response time is measured by checking whether the score has appeared on screen, so the script has to keep polling:
start_time = datetime.datetime.now()
score = "textScore"
# get the end time by polling whether the score has appeared
while not is_element_appear(driver, score):
    print("please wait, the response still didn't come back at {}".format(datetime.datetime.now()))
    timeout_diff = datetime.datetime.now() - start_time
    print(timeout_diff.seconds)
    if timeout_diff.seconds >= TIME_OUT:
        print("overtime")
        # end_time = start_time.shift(seconds=+TIME_OUT)
        break
end_time = datetime.datetime.now()
used_time = (end_time - start_time).total_seconds()
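This poll-until-timeout pattern recurs, so it can be factored into a small helper. This is a sketch of mine, not SDK code; it uses time.monotonic, which is safer for measuring durations than datetime.now because it cannot jump when the system clock changes:

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True; give up after timeout seconds.

    Returns the elapsed seconds on success, or None on timeout."""
    start = time.monotonic()
    while not predicate():
        if time.monotonic() - start >= timeout:
            return None  # timed out: the condition never became true
        time.sleep(interval)
    return time.monotonic() - start
```

Measuring the response time then becomes a one-liner: used_time = wait_until(lambda: is_element_appear(driver, "textScore"), TIME_OUT, DURATION).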
Since this runs many times, I write a log so that if something breaks partway through, the earlier results are preserved:
def write_log(content):
    # write a log in case of a crash or any breakout
    with open(log_path, 'a+') as f:
        f.writelines(content)

def get_sentence_audio():
    # map each sentence to its mp3
    with open(sentence_file, "r") as f:
        for index, line in enumerate(f.readlines()):
            line = line.strip("\n")
            file = audio_file_path + "\\" + str(index + 1) + ".mp3"
            mutiply_times(line, file)

def mutiply_times(text, audio_file):
    # play each sample multiple times
    each_audio_play_time = []
    type_text(text)
    for i in range(TIMES):
        start_record(audio_file, each_audio_play_time)
    result.append(each_audio_play_time)
    # in case it crashes
    write_log("current result is {}{} {} {}".format(get_brand(), get_version(), text, each_audio_play_time))
Put together, the whole thing looks roughly like this:
from appium import webdriver
from time import sleep
import datetime
import pygame
import os
import numpy as np
import pandas as pd
from playsound import playsound
TIME_OUT = 30
TIMES = 30
element_time = 10
DURATION = 0.5
SLEEPTIME = 12
current_path = os.path.dirname(os.path.abspath(__file__))
audio_file_path = os.path.join(current_path, "audio")
sentence_file = current_path + "\\sentence.txt"
result_path = current_path + "\\result"
log_path = current_path + "\\result\\log.txt"
import subprocess

def cmd(cmd):
    return subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def get_device():
    for device in cmd('adb devices').stdout.readlines():
        if 'devices' not in str(device):
            device = device.decode('utf-8')
            return device.split('\t')[0]

def get_version():
    # getprop prints lines like: [ro.build.version.release]: [11]
    result = cmd('adb shell getprop | findstr ro.build.version.release').stdout.readline()
    if result:
        result = str(result).split(":")[1]
        return result[result.index("[") + 1:result.index("]")]

def get_brand():
    result = cmd('adb shell getprop | findstr ro.product.brand').stdout.readline()
    if result:
        result = str(result).split(":")[1]
        return result[result.index("[") + 1:result.index("]")]
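The slicing in get_version and get_brand relies on getprop's [key]: [value] line format. As a sketch, that parsing can be pulled into a tiny helper and checked against a sample line (parse_prop_line is a name I made up):

```python
def parse_prop_line(line):
    # adb shell getprop prints lines like: [ro.build.version.release]: [11]
    key_part, _, value_part = line.partition(": ")
    return key_part.strip("[]"), value_part.strip().strip("[]")

print(parse_prop_line("[ro.build.version.release]: [11]"))
# -> ('ro.build.version.release', '11')
```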
desired_caps = {}
desired_caps["platformName"] = "android"
desired_caps["platformVersion"] = get_version()
desired_caps["deviceName"] = get_device()
desired_caps["appPackage"] = "com.library.speechscoringsdk"
desired_caps["appActivity"] = "com.library.speechscoringsdk.RootActivity"
desired_caps["autoGrantPermissions"] = True
desired_caps["automationName"] = "UiAutomator1"
desired_caps["noReset"] = True
driver = webdriver.Remote("http://localhost:4723/wd/hub", desired_caps)
result = []
from selenium.webdriver.support.ui import WebDriverWait  # needed for the explicit wait below

def wait_element_appear(driver, element):
    try:
        # set up an explicit wait: up to element_time seconds, polling every DURATION seconds
        wait = WebDriverWait(driver, element_time, DURATION)
        # an anonymous function as the wait condition
        wait.until(lambda d: d.find_element_by_id(element))
        return True
    except:
        return False

def is_element_appear(driver, element):
    try:
        driver.find_element_by_id(element)
        return True
    except:
        return False
def type_text(text):
    permission = "permission_allow_button"
    if wait_element_appear(driver, permission):
        driver.find_element_by_id("permission_allow_button").click()
    driver.find_element_by_id("settings").click()
    sleep(1)
    driver.find_element_by_id("record").click()
def start_record(file, each_time):
    # this part clears the previous result in the app
    # driver.hide_keyboard()
    driver.find_element_by_id("btnRecord").click()
    # play the audio
    pygame.mixer.init()
    pygame.mixer.music.load(file)
    pygame.mixer.music.play()
    sleep(SLEEPTIME)
    pygame.mixer.music.stop()
    # playsound(file)
    # the clock starts after tapping the send button
    driver.find_element_by_id("btnRecord").click()
    start_time = datetime.datetime.now()
    score = "textScore"
    # get the end time by polling whether the score has appeared
    while not is_element_appear(driver, score):
        print("please wait, the response still didn't come back at {}".format(datetime.datetime.now()))
        timeout_diff = datetime.datetime.now() - start_time
        print(timeout_diff.seconds)
        if timeout_diff.seconds >= TIME_OUT:
            print("overtime")
            # end_time = start_time.shift(seconds=+TIME_OUT)
            break
    end_time = datetime.datetime.now()
    used_time = (end_time - start_time).total_seconds()
    print("time is {}".format(used_time))
    each_time.append(used_time)
    driver.find_element_by_id("settings").click()
    driver.find_element_by_id("record").click()
    # write a log in case a breakout occurs, even though it costs some performance
    # content = get_brand() + get_version() + ": run {}".format(text) + " used time: {}".format(used_time) + "\r\n"
    # write_log(content)
def write_log(content):
    # write a log in case of a crash or any breakout
    with open(log_path, 'a+') as f:
        f.writelines(content)

def get_sentence_audio():
    # map each sentence to its mp3
    with open(sentence_file, "r") as f:
        for index, line in enumerate(f.readlines()):
            line = line.strip("\n")
            file = audio_file_path + "\\" + str(index + 1) + ".mp3"
            mutiply_times(line, file)

def mutiply_times(text, audio_file):
    # play each sample multiple times
    each_audio_play_time = []
    type_text(text)
    for i in range(TIMES):
        start_record(audio_file, each_audio_play_time)
    result.append(each_audio_play_time)
    # in case it crashes
    write_log("current result is {}{} {} {}".format(get_brand(), get_version(), text, each_audio_play_time))
if __name__ == "__main__":
    # start to play and record the times
    get_sentence_audio()
    # get the result for each device
    result_arrary = np.array(result)
    result_pd = pd.DataFrame(result_arrary, columns=range(1, TIMES + 1))
    # add the average result
    result_pd['AVG'] = result_pd.mean(axis=1)  # axis=1: average across each row (axis=0 would be per column)
    # add the sample sentences to the result
    sentences = open(sentence_file, 'r')
    text = sentences.readlines()
    sentences.close()
    sentence_pd = pd.DataFrame(text, columns=["sentence"])
    result_final = sentence_pd.join(result_pd)
    # add the device type to the result
    device_arrary = np.array([get_brand() + get_version()] * len(text)).T
    device_pd = pd.DataFrame(device_arrary)
    result_final = device_pd.join(result_final)
    # write the result to excel
    result_final.to_excel(result_path + "\\" + get_brand() + get_version() + ".xlsx", index=False)
When one Excel file holds multiple sheets, they can be merged together like this:
# step 1: import pandas
import pandas as pd
# step 2: read the data file, with all of its sheets
iris = pd.read_excel("C:\\work\\code\\android\\android.xlsx", None)  # sheet_name=None reads every sheet into a dict
keys = list(iris.keys())
# step 3: merge the data
iris_concat = pd.DataFrame()
for i in keys:
    iris1 = iris[i]
    iris_concat = pd.concat([iris_concat, iris1])
iris_concat.to_excel("C:\\work\\code\\android\\result.xlsx", index=False)  # output path
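Since read_excel with sheet_name=None returns an ordinary dict of DataFrames, the merge loop can also collapse into a single pd.concat. A sketch with toy frames standing in for the real sheets:

```python
import pandas as pd

# toy stand-ins for two sheets of the per-device workbook
sheets = {
    "device_a": pd.DataFrame({"sentence": ["s1", "s2"], "AVG": [2.1, 2.4]}),
    "device_b": pd.DataFrame({"sentence": ["s1", "s2"], "AVG": [3.4, 3.0]}),
}
merged = pd.concat(sheets.values(), ignore_index=True)
print(len(merged))  # 4 rows: two sheets of two rows each
```

ignore_index=True renumbers the rows; without it each sheet keeps its own 0-based index and the merged frame has duplicate index labels.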
My final data-processing file is:
test_path = "C:\\work\\code\\android\\result"
import pandas as pd
import os

def get_result(path):
    final = []
    for index, file in enumerate(os.listdir(path)):
        pds = pd.read_excel(os.path.join(path, file))
        # print(pds)
        final.append(pds)
        # final.append(pds.keys)
    return final

s = get_result(test_path)
print(s)
final_result = pd.concat(s)
current_path = os.path.dirname(os.path.abspath(__file__))
result_file = current_path + "\\result.xlsx"
final_result.to_excel(result_file, index=False)  # write the merged result out
After several rounds of testing, the picture was: the CN server's response time is roughly half of the US server's, offline takes about a quarter of online's time, and better-performing devices are slightly faster, though the difference is small. With these numbers in hand, the boss could make the call.
There is still an iOS build to test, plus other aspects of ASR testing; more on those next time.
Original-work statement: this article was published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.
For infringement concerns, contact cloudcommunity@tencent.com for removal.