Using Python to Measure How Well a Resume Matches a Job Posting

Author: mariolu
Last modified: 2024-03-24 18:00:12
Included in the columns: Python实用主义, CDN及云技术分享

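The March and April peak hiring season is here. Job seekers are sending applications to many openings, and employers are receiving piles of resumes. Here we put Python to practical use and compute which job posting your resume matches best.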

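1. Principle

We start from the simplest possible idea: the posting whose required-skill keywords overlap most with the skills listed on your resume is taken to be the best fit for you. This is only a prediction based on that simple idea. It will not always be accurate, so treat the program's output as a reference.

This may remind you of a similar application, document similarity. The principle is indeed much the same, except that here we add a filter that keeps only computer-science keywords.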

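  • For the principle behind document similarity, see: https://cloud.tencent.com/developer/article/2373441
  • For the computer-science keyword glossary, I have uploaded a dictionary of roughly 872 computer-science terms: https://github.com/lumanyu/ai_app/blob/main/job_hunter/computer_science_glossary.dict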

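There are many ways to compute text similarity, such as cosine similarity, n-grams, and the well-known TF-IDF method.

This article uses cosine similarity, which is the simplest and easiest to understand, and shows hands-on in Python how to compare the similarity of two texts.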
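As a quick illustration of cosine similarity, the sketch below compares two small count vectors with numpy. The vectors and the helper name `cosine_similarity` are made up for this example and are not taken from the article's data.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    # Treat a zero vector (no keywords at all) as "no similarity"
    return float(np.dot(u, v) / norm) if norm else 0.0

# Hypothetical word counts over a three-word vocabulary
doc_a = [1, 1, 0]
doc_b = [1, 0, 1]
print(cosine_similarity(doc_a, doc_b))  # approximately 0.5: one shared word out of two each
print(cosine_similarity(doc_a, doc_a))  # approximately 1.0: same direction
```

A value near 1 means the two texts use the same keywords in similar proportions, and a value near 0 means they share almost none.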

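2. Hands-on with Python

We use numpy to demonstrate the cosine similarity of two documents.

2.1 Vectorizing the documents

Before that, the texts need to be tokenized (split into words).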

  • Job posting job1 = "..."
  • Job posting job2 = "..."
  • Applicant resume resume1
  • Applicant resume resume2
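One simple way to tokenize such a text is to lowercase it and pull out runs of letters, digits, and a few symbols that occur in technical terms. This is only a sketch: the regular expression and the function name `tokenize` are assumptions for illustration, and the full program in section 2.2 simply splits on whitespace.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens without punctuation."""
    # Keep letters and digits, plus '+', '#' and '.' inside tokens like "c++" or "next.js"
    tokens = re.findall(r"[a-z0-9][a-z0-9+#.]*", text.lower())
    # Drop a trailing period that belongs to the sentence, not to the token
    return [t.rstrip(".") for t in tokens]

sample = "Experience with large code bases written in C++ or Next.js."
print(tokenize(sample))
# ['experience', 'with', 'large', 'code', 'bases', 'written', 'in', 'c++', 'or', 'next.js']
```

Lowercasing and stripping punctuation means that "Strong", "strong" and "strong," all count as the same word.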

We build a vocabulary containing every word that appears in these four texts, keeping only the words that also appear in the computer-science glossary.

The resulting vocabulary is filter_vocab = ['a', 'ai', 'base', 'building', 'by', 'client', 'code', 'computer', 'data', 'database', 'design', 'exploratory', 'for', 'framework', 'knowledge', 'language', 'leverage', 'management', 'of', 'on', 'optimization', 'performance', 'programming', 'projects', 'quality', 'query', 'research', 'robust', 'schema', 'set', 'software', 'stack', 'startup', 'strong', 'structure', 'support', 'system', 'the', 'theory', 'trace', 'ui', 'user', 'well']
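The filtering step can be sketched as a set intersection. The tiny glossary and the two texts below are hypothetical stand-ins, and `build_filtered_vocab` is an illustrative name; the real program downloads the glossary file instead.

```python
def build_filtered_vocab(texts, glossary_words):
    """Return the sorted words that occur both in the texts and in the glossary."""
    seen = set()
    for text in texts:
        seen.update(text.lower().split())
    return sorted(seen & set(glossary_words))

# Hypothetical glossary and texts, for illustration only
glossary = {"database", "query", "framework", "software"}
texts = [
    "Experience with query optimization and database design",
    "Built a web framework and a query planner",
]
print(build_filtered_vocab(texts, glossary))
# ['database', 'framework', 'query']
```

Sorting the result gives every text the same fixed word order, which the vectors in the next step rely on.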

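Next we compute an n-dimensional vector for each of the four texts, where n is the number of words in the vocabulary.

Each component is the number of times the corresponding word occurs in the text: 1 if the word occurs once, and the actual count if it occurs several times.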
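A minimal sketch of this counting step, using a made-up three-word vocabulary and sentence (the standalone `vectorize` helper here is illustrative and lowercases the text before counting):

```python
from collections import Counter

def vectorize(text, vocab):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur in the text
    return [counts[word] for word in vocab]

vocab = ["code", "data", "query"]  # hypothetical vocabulary
print(vectorize("query the data then cache the query", vocab))
# [0, 1, 2]
```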

Once the texts are vectorized, we can compute the similarity between each applicant's resume and each job posting.

2.2 The complete program

The program is shown below. It is also available on GitHub at https://github.com/lumanyu/ai_app/blob/main/job_hunter/job_hunter.py

import numpy as np
from math import sqrt
from itertools import chain, product
import requests


def cosine_sim(u, v):
    """Cosine similarity of two count vectors: dot(u, v) / (|u| * |v|)."""
    return np.dot(u, v) / (sqrt(np.dot(u, u)) * sqrt(np.dot(v, v)))


def corpus2vectors(corpus):
    """Vectorize every text in corpus over a vocabulary filtered by the glossary."""
    def vectorize(sentence, vocab):
        # Count how often each vocabulary word occurs in the text.
        # The split is on whitespace and case-sensitive, so "Strong" or
        # "design," does not match the lowercase entries "strong" or "design".
        return [sentence.split().count(i) for i in vocab]

    vectorized_corpus = []
    # Every distinct lowercase word that appears in any of the texts
    vocab = sorted(set(chain(*[i.lower().split() for i in corpus])))

    # Download the computer-science glossary
    url = "https://raw.githubusercontent.com/lumanyu/ai_app/main/job_hunter/computer_science_glossary.dict"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    computer_words_lines = [y.lower() for y in (x.strip() for x in response.text.splitlines()) if y]
    # Split multi-word terms into single words; a set makes lookups fast
    computer_words = set(chain(*[i.split() for i in computer_words_lines]))

    # Keep only the vocabulary words that also occur in the glossary
    filter_vocab = [v for v in vocab if v in computer_words]

    print("-------filter_vocab-------\n")
    print(filter_vocab)

    for i in corpus:
        vectorized_corpus.append((i, vectorize(i, filter_vocab)))
    return vectorized_corpus, filter_vocab


# Job posting 1
job1="""
    **Requirements**
- 5+ years of experience as a full stack engineer building user-facing products
- Proficiency in Typescript and Next.js or similar framework
- Self-driven & comfortable with ambiguity - able to plan, sequence, and execute projects
- Strong at technical trade-off decisions
- Strong communication skills
- Experience at a startup
- Bay Area-based & excited about working in-person 2-3 days per week in SF
"""
# Job posting 2
job2 ="""
10+ years of experience in data management systems, including extensive experience in query optimization
Experience with building production-level code with a large user base, robust design structure and rigorous code quality
Degree in Computer Science or similar field, or equivalent practical experience, with strong competencies in data structures, algorithms, and software design/architecture
Experience with large code bases written in C++ or another systems programming language. You'll need to trace down defects, estimate work complexity, and design evolution and integration strategies as we rewrite different components of the system
Passion for the theory and practice of database query engines, as well as hands-on or academic experience in the domain
"""
# Applicant resume 1
resume1="""
You’ll build features that user automation and AI to make EVC contractors & developers lives easier → design, plan, sell, and work faster.
This means…
1. Building new features from scratch
2. Fixing bugs
3. DB architecture decisions
4. Exploratory AI projects
5. UI polish on frontend components
6. and more!
"""
# Applicant resume 2
resume2="""
Innovate in the area of flexible schema databases. Help us build a world-class query optimization system
Research state-of-the art query systems to inform our design
Leverage deep knowledge of the strength and weakness of the product and the industry trends to provide technical vision and direction 
Set initiative level strategy, architect plan and lead team towards successful execution
Advise management on decisions related to roadmap, processes, architecture and design
Identify, design, implement, test, and support new features related to query performance and robustness, query language enhancements, diagnostics, and integration with other products and tools
Work with other engineers to coordinate seamless changes in a feature-rich, large code base
Work with other teams including client drivers, cloud services, enterprise tools, support, consulting, education, and marketing to coordinate changes or contribute to their projects
Influence and grow team members through active mentoring, coaching and leading by example
"""

jobs = [job1, job2]
resumes = [resume1, resume2]
jobs_and_resumes = jobs + resumes


# Vectorize every text and build the shared vocabulary
def create_test_corpus():
    corpus, vocab = corpus2vectors(jobs_and_resumes)
    return corpus, vocab


def test_cosine():
    corpus, vocab = create_test_corpus()

    # The first two entries are job postings 1 and 2
    jobs_corpus = corpus[0:2]
    # The last two entries are resumes 1 and 2
    resumes_corpus = corpus[2:4]

    # Compare every resume with every job posting
    # (order: resume1-job1, resume1-job2, resume2-job1, resume2-job2)
    for sentx, senty in product(resumes_corpus, jobs_corpus):
        print("cosine =", cosine_sim(sentx[1], senty[1]))


test_cosine()

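This gives:

  • Applicant 1: similarity of 0.18 with position 1 and 0.07 with position 2
  • Applicant 2: similarity of 0.31 with position 1 and 0.69 with position 2

So we conclude that applicant 1 is the better match for position 1, and applicant 2 is the better match for position 2.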
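Picking the best posting for each applicant amounts to taking the largest value in each row of a similarity table. The sketch below uses the four scores reported above; the variable names and the table layout (rows are applicants, columns are positions) are assumptions for illustration.

```python
import numpy as np

# Rows: applicants 1 and 2; columns: positions 1 and 2
scores = np.array([
    [0.18, 0.07],
    [0.31, 0.69],
])

# Index of the highest-scoring position for each applicant
best = scores.argmax(axis=1)
for applicant, position in enumerate(best, start=1):
    print(f"applicant {applicant} -> position {position + 1} "
          f"(cosine = {scores[applicant - 1, position]:.2f})")
```

With more resumes and postings, the same `argmax` call scales to a larger table without any change to the loop.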

2.3 Tencent Cloud API

For comparison, we can also use the text-similarity service provided by Tencent Cloud.

The API can be called here:

https://console.cloud.tencent.com/api/explorer?Product=nlp&Version=2019-04-08&Action=EvaluateSentenceSimilarity

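Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.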
