Why did the project need the Boxcox transformation?

Why use the Boxcox transformation: a workflow perspective

Array of differences between the values (sort, then keep only the difference between each value and the one that follows it)

→ Boxcox transformation (solve for the exponent of a complicated power function with Newton's method, then use it to transform the data toward a normal distribution)

→ Normal distribution, 6σ (convenient for picking out reasonable values)

→ The ordinary values: yes, those are the ones we wanted
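The workflow above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `typical_gaps`, the `n_sigma` parameter, and the synthetic sample are assumptions, and "6σ" is read here as the band of mean ± 3σ. It uses `numpy.diff` for the differences and `scipy.stats.boxcox`, which estimates the exponent itself when none is passed.

```python
import numpy as np
from scipy import stats


def typical_gaps(values, n_sigma=3.0):
    """Hypothetical helper: keep the 'ordinary' gaps between sorted values."""
    ordered = np.sort(np.asarray(values, dtype=float))
    gaps = np.diff(ordered)              # difference to the next value
    gaps = gaps[gaps > 0]                # Box-Cox needs strictly positive input
    transformed, lmbda = stats.boxcox(gaps)  # lmbda is estimated from the data
    mean, std = transformed.mean(), transformed.std()
    keep = np.abs(transformed - mean) <= n_sigma * std  # mean +/- n_sigma * std
    return gaps[keep], lmbda


# Synthetic, skewed example data (assumption, for illustration only).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=500).cumsum()
kept, lmbda = typical_gaps(sample)
print(len(kept), lmbda)
```

Filtering on the transformed scale rather than on the raw gaps is the point of the exercise: the raw gaps are skewed, so a symmetric σ band only makes sense after the transformation.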

Definition of the Boxcox transformation

선두 그룹에 값이 λͺ°λ € μžˆλŠ” 경우처럼 ν†΅κ³„μ μœΌλ‘œ μ ‘κ·Όν•˜λŠ” 값듀이 λΆˆκ· μΌν•œ 산포λ₯Ό 이룰 경우,

이 값듀을 μ •κ·œλΆ„ν¬ κ΄€μ μœΌλ‘œ μ ‘κ·Όν•˜κ³ μž ν•  λ•Œ Boxcox transformation 을 μ‚¬μš©ν•©λ‹ˆλ‹€.

μ–‘μˆ˜ 값을 κ°–λŠ” 수치 데이터와 ν•¨κ»˜ μ‚¬μš©ν•©λ‹ˆλ‹€.
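For reference, the one-parameter Box-Cox transform is usually written as below. The symbols y (a data value) and λ (the exponent to be estimated) are notation chosen here, not taken from the project.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln y, & \lambda = 0
\end{cases}
\qquad y > 0
```

The λ = 0 case is the limit of the first branch as λ → 0, which is why the transform is continuous in λ.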

Box-Cox transformation explanation

Newton-Raphson formula

Wikipedia - formula

Implementation of Boxcox Transformation

scipy GitHub link

Boxcox code block

  # 1. scipy/stats/__init__.py

  The functions are listed in the stats module

  Transformations
  ===============
      
  .. autosummary::
    :toctree: generated/

    boxcox
    boxcox_normmax
    boxcox_llf
    yeojohnson

  # 2. scipy/special/functions.json

  The boxcox function is wrapped with Cython ("dd->d": two doubles in, one double out)

      "boxcox": {
          "_boxcox.pxd": {
              "boxcox": "dd->d"
          }
      },
      "boxcox1p": {
          "_boxcox.pxd": {

  # 3. scipy/scipy/special/_boxcox.pxd
  

It is defined in Cython and implemented with fabs and expm1 from the libc math API.

Each value of the 1-D array goes through the transformation below.

libc.math - fabs

fabs

fabs, fabsf, fabsl - absolute value of floating-point number

Returns the absolute value of a floating-point number. In the function below, when the absolute value of lmbda is smaller than 10^(-19), lmbda is treated as 0 and log(x) is returned.

libc.math - expm1

expm1

expm1, expm1f, expm1l - exponential minus 1

You can see that the code uses the formula shown in the link as is.
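A small experiment shows why writing the formula with expm1 matters when the exponent is tiny. This is only an illustrative sketch using Python's standard `math` module. The two function names and the test values are assumptions made for this example.

```python
import math


def boxcox_naive(x, lmbda):
    # Textbook form: (x**lmbda - 1) / lmbda
    return (x ** lmbda - 1.0) / lmbda


def boxcox_expm1(x, lmbda):
    # Same quantity written as expm1(lmbda * log(x)) / lmbda
    return math.expm1(lmbda * math.log(x)) / lmbda


x, lmbda = 2.0, 1e-17
naive = boxcox_naive(x, lmbda)
stable = boxcox_expm1(x, lmbda)
print(naive, stable, math.log(x))
```

With such a small exponent, `x ** lmbda` rounds to exactly 1.0 in double precision, so the naive form collapses to 0.0, while the expm1 form stays close to log(2).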

libc.math Cython package

As expected, C is a programming language for number crunchers.

  cdef inline double boxcox(double x, double lmbda) nogil:
      # if lmbda << 1 and log(x) < 1.0, the lmbda*log(x) product can lose
      # precision, furthermore, expm1(x) == x for x < eps.
      # For doubles, the range of log is -744.44 to +709.78, with eps being
      # the smallest value produced.  This range means that we will have
      # abs(lmbda)*log(x) < eps whenever abs(lmbda) <= eps/-log(min double)
      # which is ~2.98e-19.  
      if fabs(lmbda) < 1e-19:
          return log(x)
      else:
          return expm1(lmbda * log(x)) / lmbda          
      ....
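To experiment with the branch logic without compiling Cython, the function above can be mirrored in plain Python. This is a sketch for illustration only: the name `boxcox_py` is hypothetical, and only the part of the listing shown above is mirrored.

```python
import math


def boxcox_py(x: float, lmbda: float) -> float:
    """Illustrative pure-Python mirror of the Cython boxcox shown above."""
    if abs(lmbda) < 1e-19:
        # lmbda is treated as 0: the transform reduces to log(x)
        return math.log(x)
    return math.expm1(lmbda * math.log(x)) / lmbda


values = [0.5, 1.0, 4.0]
for lmbda in (0.0, 0.5, 2.0):
    print(lmbda, [boxcox_py(v, lmbda) for v in values])
```

If scipy is installed, the results can be compared against `scipy.special.boxcox(x, lmbda)` called with the same arguments.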