Spark implicit conversions and toDF: what is an implicit conversion?

全栈程序员站长

Published 2022-11-10 08:24:03

Collected in the column: 全栈程序员必看

1. Background: a production problem

This post grew out of a problem I ran into in production.

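In Spark SQL, a call to
	DF.select()

failed because select could not be resolved:
	spark sql Cannot resolve overloaded method 'select'

After asking a more experienced colleague, the cause turned out to be a missing implicit conversion. Importing Spark's implicits (spark here is the SparkSession instance) fixed it:
	import spark.implicits._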

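2. Breaking ground on implicit conversions

Without an implicit conversion, a value can only be converted automatically from a lower-precision type to a higher-precision one (for example Int to Double).

Going the other way, from a higher-precision type to a lower-precision one (for example Double to Int), is a compile error.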

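2.1 Implicit conversion functions

The fix is to define your own implicit conversion function, double2int. What such an implicit function does must also be unique: there must be only one implicit in scope for a given conversion.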
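
As a minimal sketch of the double2int idea above (the object name ImplicitDoubleToInt is hypothetical, and truncating with toInt is an assumption about what the conversion should do), something like the following would compile:

```scala
import scala.language.implicitConversions

object ImplicitDoubleToInt {
  // Hypothetical conversion named after the double2int mentioned in the text.
  // Assumption: narrowing simply truncates toward zero via toInt.
  implicit def double2int(d: Double): Int = d.toInt

  def main(args: Array[String]): Unit = {
    // Int -> Double is a built-in widening, no implicit function needed
    val widened: Double = 10

    // Double -> Int only compiles because double2int is in scope
    val narrowed: Int = 3.7

    println(widened)  // 10.0
    println(narrowed) // 3
  }
}
```

If double2int is removed, the narrowed line fails with a type mismatch, which is the error described at the start of this section.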

An explicit cast would work just as well here, so are implicit conversions something you can take or leave?

RichFile

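import java.io.File
import scala.io.Source

object implicit2 {
  def main(args: Array[String]): Unit = {
    // java.io.File only wraps the file's metadata; the content has to be read through IO.
    // File has no readContext method, so this line does not compile yet.
    // "" is a placeholder path.
    val context: String = new File("").readContext
  }
}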

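This is where implicit conversions are used far more often. To call readContext directly on a File, you have to write that method yourself in a wrapper class and then provide an implicit conversion to the wrapper.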

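import java.io.File
import scala.io.Source
import scala.language.implicitConversions

object implicit2 {
  def main(args: Array[String]): Unit = {
    // Declare the implicit conversion from File to RichFile
    implicit def file2RichFile(file: File): RichFile = new RichFile(file)

    // java.io.File only wraps the file's metadata; the content has to be read through IO.
    // File itself has no readContext, so the compiler rewrites this call to
    // file2RichFile(new File("")).readContext. "" is a placeholder path.
    val context: String = new File("").readContext
  }
}

class RichFile(file: File) {
  // Our own wrapper, so that readContext can be called on a File
  def readContext: String = {
    val source = Source.fromFile(file)
    try source.mkString finally source.close() // release the file handle
  }
}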

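To summarize the flow:

java.io.File has no readContext method
	the compiler looks for an implicit function
		whose parameter type is File and whose return type has a method named readContext
		this match must be unique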
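
The flow above can be sketched by comparing the implicit call with the explicit call the compiler is expected to rewrite it into. This is only an illustration: the object name ImplicitResolutionDemo and the helper writeTemp are hypothetical, and a temporary file stands in for a real path.

```scala
import java.io.File
import java.nio.file.Files
import scala.io.Source
import scala.language.implicitConversions

object ImplicitResolutionDemo {
  class RichFile(file: File) {
    def readContext: String = {
      val source = Source.fromFile(file)
      try source.mkString finally source.close()
    }
  }

  // The only implicit in scope that takes a File and returns a type with readContext
  implicit def file2RichFile(file: File): RichFile = new RichFile(file)

  // Hypothetical helper: writes text to a temporary file so the demo has something to read
  def writeTemp(text: String): File = {
    val path = Files.createTempFile("implicit-demo", ".txt")
    Files.write(path, text.getBytes("UTF-8"))
    val file = path.toFile
    file.deleteOnExit()
    file
  }

  def main(args: Array[String]): Unit = {
    val file = writeTemp("hello implicits")

    val viaImplicit = file.readContext                // File has no readContext
    val viaExplicit = file2RichFile(file).readContext // what the call is rewritten to

    println(viaImplicit == viaExplicit) // true
  }
}
```

Adding a second implicit function from File to another type that also has readContext would make the first call ambiguous, which is why the match has to be unique.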

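import java.time.LocalDate
import scala.language.implicitConversions

implicit def int2Date(int: Int): RichDate = new RichDate(int)

    val ago: String = "ago"
    val later: String = "later"
    val day2 = 2.days(ago)

class RichDate(day: Int) {
  // Returns the date `day` days before or after today, as a String
  def days(when: String): String = {
    if ("ago" == when)
      LocalDate.now().plusDays(-day).toString
    else if ("later" == when)
      LocalDate.now().plusDays(day).toString
    else
      // Fail loudly instead of printing, so that the method always returns a String
      throw new IllegalArgumentException(s"expected 'ago' or 'later', got: $when")
  }
}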

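2.2 Implicit classes

Note that when you use an implicit function, the IDE highlights the implicit keyword in yellow. What does that mean?

It is telling you, politely, that the code smells.

Since Scala 2.10 there is a further simplification of implicit conversions: write the class directly in place and mark it implicit.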

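import java.time.LocalDate

object implicit3 {
  // The implicit class replaces the separate class plus implicit def.
  // It is declared before it is used and returns a String rather than Unit.
  implicit class RichDate(day: Int) {
    def days(when: String): String = {
      if ("ago" == when) {
        LocalDate.now().plusDays(-day).toString
      } else if ("later" == when) {
        LocalDate.now().plusDays(day).toString
      } else {
        throw new IllegalArgumentException(s"expected 'ago' or 'later', got: $when")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val ago: String = "ago"
    val later: String = "later"

    println(3.days(later))
  }
}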

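An implicit class has two requirements: (1) its constructor must take exactly one parameter; (2) it must be defined inside a class, a companion object, or a package object, that is, an implicit class cannot be top-level.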
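
A small sketch that satisfies both requirements, using a fixed base date so the result is reproducible. The names DateSyntax, RichDays, daysAgoFrom and daysLaterFrom are made up for this illustration and are not part of the examples above.

```scala
import java.time.LocalDate

object DateSyntax {
  // Requirement (2): the implicit class lives inside an object, not at the top level.
  // Requirement (1): the constructor takes exactly one parameter.
  implicit class RichDays(day: Int) {
    def daysAgoFrom(base: LocalDate): LocalDate = base.minusDays(day.toLong)
    def daysLaterFrom(base: LocalDate): LocalDate = base.plusDays(day.toLong)
  }
}

object ImplicitClassDemo {
  import DateSyntax._ // bring the implicit class into scope

  def main(args: Array[String]): Unit = {
    val base = LocalDate.of(2022, 11, 10)
    println(3.daysLaterFrom(base)) // 2022-11-13
    println(3.daysAgoFrom(base))   // 2022-11-07
  }
}
```

Passing the base date as an argument instead of calling LocalDate.now() is a design choice for the sketch only: it keeps the output stable and easy to check.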

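2.3 Implicit resolution

Part of this came up earlier.

(1) The compiler first looks for implicit entities (implicit methods, implicit classes, implicit objects) in the current code scope. This is the usual case.

(2) If rule 1 finds nothing, the compiler goes on to search the scope of the implicit parameter's type. The scope of a type means all companion objects associated with that type, plus the package object of the package the type lives in.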
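
The two rules can be illustrated with an implicit parameter rather than a conversion. In this sketch the type Meters and the method names are hypothetical; sorted is the standard collection method that takes an implicit Ordering.

```scala
object ImplicitScopeDemo {
  case class Meters(value: Int)

  object Meters {
    // Rule 2: an implicit in the companion object of the type involved
    implicit val ascending: Ordering[Meters] = Ordering.by[Meters, Int](_.value)
  }

  // No implicit Ordering[Meters] in the current scope, so the companion one is used
  def sortWithCompanion(xs: List[Meters]): List[Meters] = xs.sorted

  // Rule 1: an implicit in the current scope is found first and wins
  def sortWithLocal(xs: List[Meters]): List[Meters] = {
    implicit val descending: Ordering[Meters] =
      Ordering.by[Meters, Int](_.value).reverse
    xs.sorted
  }

  def main(args: Array[String]): Unit = {
    val xs = List(Meters(3), Meters(1), Meters(2))
    println(sortWithCompanion(xs)) // List(Meters(1), Meters(2), Meters(3))
    println(sortWithLocal(xs))     // List(Meters(3), Meters(2), Meters(1))
  }
}
```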

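3. Back to the topic

The post opened with DF.select. Below are the two relevant pieces of the Spark source: the implicits object that import spark.implicits._ brings into scope, and the select overload that takes Column arguments.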

  /**
   * :: Experimental ::
   * (Scala-specific) Implicit methods available in Scala for converting
   * common Scala objects into `DataFrame`s.
   *
   * {{{
   *   val sparkSession = SparkSession.builder.getOrCreate()
   *   import sparkSession.implicits._
   * }}}
   *
   * @since 2.0.0
   */
  @Experimental
  @InterfaceStability.Evolving
  object implicits extends SQLImplicits with Serializable {
    protected override def _sqlContext: SQLContext = SparkSession.this.sqlContext
  }

  /**
   * Selects a set of column based expressions.
   * {{{
   *   ds.select($"colA", $"colB" + 1)
   * }}}
   *
   * @group untypedrel
   * @since 2.0.0
   */
  @scala.annotation.varargs
  def select(cols: Column*): DataFrame = withPlan {
    Project(cols.map(_.named), logicalPlan)
  }
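
To tie this back to section 2: in Spark, the $"colA" syntax is supplied by an implicit class on StringContext that comes in with the implicits import, so without the import the argument is not a Column and no select overload matches. The sketch below imitates that mechanism in plain Scala. Column, select and the implicits object here are simplified stand-ins written for this post, not Spark's actual classes.

```scala
object MiniSparkImplicits {
  // Hypothetical stand-in for Spark's Column
  final case class Column(name: String)

  // Hypothetical stand-in for select(cols: Column*); returns the selected names
  def select(cols: Column*): List[String] = cols.map(_.name).toList

  object implicits {
    // Adds a `$` string interpolator, so that $"colA" builds a Column
    implicit class StringToColumn(val sc: StringContext) {
      def $(args: Any*): Column = Column(sc.s(args: _*))
    }
  }

  def main(args: Array[String]): Unit = {
    import implicits._ // without this import, $"colA" does not compile

    println(select($"colA", $"colB")) // List(colA, colB)
  }
}
```

Commenting out the import makes the select call fail to compile, which is the same shape of failure as the error in section 1.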