Taken from "Deedle in 10 minutes", adapted to Analyze.
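{-# LANGUAGE OverloadedStrings, DeriveGeneric #-}
import Analyze
import qualified Analyze.Frame as Frame
import qualified Analyze.Series as Series
import qualified Data.Time.Calendar -- from the 'time' package
import Data.Function ((&))
import Data.Text (Text)
import GHC.Generics (Generic)

We can create a series from keys and values stored in any type that is an instance of the IsList type class.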
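To make this concrete without relying on Analyze, here is a minimal sketch that models a series as an ordered map from keys to values. This is an assumption for illustration; Analyze's real representation may differ. The functions newSeries, ofObservations and ofValues are hypothetical counterparts of Series.new, Series.ofObservations and Series.ofValues. newSeries accepts any IsList containers by converting them with GHC.Exts.toList.

```haskell
{-# LANGUAGE FlexibleContexts, TypeFamilies #-}
import qualified Data.Map.Strict as Map
import GHC.Exts (IsList (..))

-- Hypothetical toy series: an ordered map from keys to values.
type Series k v = Map.Map k v

-- Pair keys and values positionally from any two IsList containers;
-- surplus elements of the longer container are dropped by zip.
newSeries :: (IsList ks, IsList vs, Ord (Item ks)) => ks -> vs -> Series (Item ks) (Item vs)
newSeries ks vs = Map.fromList (zip (toList ks) (toList vs))

-- Build a series from (key, value) pairs; for a repeated key the last pair wins.
ofObservations :: Ord k => [(k, v)] -> Series k v
ofObservations = Map.fromList

-- Use the implicit ordinal keys 0, 1, 2, ...
ofValues :: [v] -> Series Int v
ofValues = Map.fromList . zip [0 ..]
```

Pairing with zip means that mismatched lengths are silently truncated. A real implementation might prefer to report an error in that case.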
idxes = [ "A"
        , "B"
        , "C"
        ]
values = [10, 20, 30]
firstSeries = Series.new idxes values

We can also create a series from a list of (key, value) tuples:
secondSeries = Series.ofObservations
  [ ("A", 10)
  , ("B", 20)
  , ("C", 30)
  ]

We can also create a series with implicit (ordinal) keys:
thirdSeries = Series.ofValues [10.0, 20.0, 30.0]

Now we can create a Frame from the first and second series, since they share the same keys.
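Building a frame from series that share keys amounts to aligning their values on those keys. The sketch below shows this under an assumption: a frame is modeled as a map from row keys to maps from column keys to values. Both this representation and frameFromColumns are hypothetical, not Analyze's API.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy frame: row key -> (column key -> value).
type Frame r c v = Map.Map r (Map.Map c v)

-- Combine named columns into a frame, aligning values on their row keys.
-- A row key missing from some column simply has no cell in that column.
frameFromColumns :: (Ord r, Ord c) => [(c, Map.Map r v)] -> Frame r c v
frameFromColumns cols =
  Map.unionsWith Map.union
    [ Map.map (Map.singleton name) col | (name, col) <- cols ]
```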
df1 = Frame.new ["first", "second"] [firstSeries, secondSeries]

A Frame has two type parameters, Frame rowKey columnKey, which give the types of the row and column keys used for indexing. The type of the data itself is not part of the Frame's type; instead, we specify it when we get data out of the Frame.
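One way to picture a frame whose type mentions only its keys is to store the cells untyped and let the caller pick the value type at extraction time. The sketch below makes that assumption: cells are kept as raw strings and parsed with readMaybe. The Frame type and getColumnAs are hypothetical and not part of Analyze.

```haskell
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- Hypothetical toy frame: only the row key type r and column key type c
-- appear in its type; every cell is stored as an unparsed string.
newtype Frame r c = Frame (Map.Map (r, c) String)

-- The caller chooses the value type a when extracting a column.
-- Cells that do not parse as an 'a' are left out of the result.
getColumnAs :: (Ord r, Eq c, Read a) => c -> Frame r c -> Map.Map r a
getColumnAs col (Frame cells) =
  Map.fromList
    [ (r, x)
    | ((r, c), raw) <- Map.toList cells
    , c == col
    , Just x <- [readMaybe raw]
    ]
```

The same frame can therefore yield a column of Double in one place and fail to yield Int in another. That is the trade-off of leaving the value type out of Frame rowKey columnKey.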
We can also build a data frame from a list of (key, series) pairs, using each series either as a column or as a row:
df2 = Frame.ofColumns [ ("first", firstSeries), ("second", secondSeries) ]
df3 = Frame.ofRows [ ("first", firstSeries), ("second", secondSeries) ]

We can also specify the row and column key of each value explicitly, as (rowKey, columnKey, value) triples:
df4 = Frame.ofValues
  [ ("Monday", "John", 1.0)
  , ("Tuesday", "Joe", 2.1)
  , ("Tuesday", "John", 4.0)
  , ("Wednesday", "John", -5.4)
  ]

If your data types derive the Generic type class, you can also create a data frame from a list of them:
data Price = Price
  { day      :: Text
  , quantity :: Float
  } deriving (Generic)

instance Serialize Price

prices :: [Price]
prices =
  [ Price "1-1-17" 10.0
  , Price "2-1-17" 12.0
  , Price "3-1-17" 13.0
  ]

df5 = Frame.ofRecords prices

Finally, we can also load a data frame from CSV.
msftCsv = Frame.readCsv "resources/MSFT.csv"
fbCsv = Frame.readCsv "resources/FB.csv"

Column types are not inferred when a CSV file is loaded this way; the user must specify them later.
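To illustrate what "specifying the types later" can mean, here is a small sketch. It loads CSV text as raw string cells and only afterwards interprets the Date column as calendar days, which it then uses as row keys. Because a Map keeps its keys ordered, the rows also come out sorted by date. The sketch assumes there is no CSV quoting and that dates are in ISO yyyy-mm-dd form. parseCsv, parseIsoDay and indexRowsByDate are hypothetical helpers, not Analyze's readCsv, indexRowsDate or sortRowsByKey.

```haskell
import qualified Data.Map.Strict as Map
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Read (readMaybe)

-- Hypothetical helpers for illustration only.

-- Split a string on a separator character (no CSV quoting support).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Load CSV text as a header plus rows of raw, untyped cells.
parseCsv :: String -> Maybe ([String], [[String]])
parseCsv text = case lines text of
  [] -> Nothing
  (header : rowLines) ->
    Just (splitOn ',' header, map (splitOn ',') (filter (not . null) rowLines))

-- Interpret an ISO "yyyy-mm-dd" cell as a calendar day; invalid dates give Nothing.
parseIsoDay :: String -> Maybe Day
parseIsoDay s = case splitOn '-' s of
  [y, m, d] -> do
    year <- readMaybe y
    month <- readMaybe m
    dayOfMonth <- readMaybe d
    fromGregorianValid year month dayOfMonth
  _ -> Nothing

-- Use the named column as a date row key; the remaining cells stay raw strings.
-- Map keys are ordered, so the rows come out sorted by date.
indexRowsByDate :: String -> ([String], [[String]]) -> Maybe (Map.Map Day (Map.Map String String))
indexRowsByDate keyCol (header, rows) = do
  i <- lookup keyCol (zip header [0 ..])
  Map.fromList <$> mapM (toRow i) rows
  where
    toRow i cells = do
      date <- parseIsoDay =<< safeIndex i cells
      pure (date, Map.fromList [ (h, c) | (h, c) <- zip header cells, h /= keyCol ])
    safeIndex i xs = case drop i xs of
      (x : _) -> Just x
      []      -> Nothing
```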
msftOrd <- Frame.withFrame msftCsv $ do
  Frame.indexRowsDate "Date"
  Frame.sortRowsByKey

We can now keep only the open and close prices, and add a new column holding the difference between them.
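The slicing and derived-column steps can be sketched on a plain column-oriented map. The sketch makes one deliberate choice that is not taken from Analyze: the difference is computed with Map.intersectionWith, which pairs Open and Close values by row key, whereas zipWith pairs them by position. sliceColumns and addDifference are hypothetical helpers.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy frame: column name -> (row key -> value).
type Frame r = Map.Map String (Map.Map r Double)

-- Keep only the named columns.
sliceColumns :: [String] -> Frame r -> Frame r
sliceColumns names = Map.filterWithKey (\name _ -> name `elem` names)

-- Add a "Difference" column holding Open - Close for every row key
-- present in both columns; leave the frame unchanged if either is missing.
addDifference :: Ord r => Frame r -> Frame r
addDifference frame =
  case (Map.lookup "Open" frame, Map.lookup "Close" frame) of
    (Just open, Just close) ->
      Map.insert "Difference" (Map.intersectionWith (-) open close) frame
    _ -> frame
```

Pairing by key keeps the subtraction correct even if one column is missing a row. Positional pairing would shift every later value.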
msft <- Frame.withFrame msftOrd $ do
  Frame.sliceColumns [ "Open", "Close" ]
  openColumn <- Frame.getColumn "Open"
  closeColumn <- Frame.getColumn "Close"
  let differenceColumn = zipWith (-) openColumn closeColumn
  Frame.addColumn "Difference" differenceColumn

We can do the same for Facebook:
fb <- Frame.withFrame fbCsv $ do
  Frame.indexRowsDate "Date"
  Frame.sortRowsByKey
  Frame.sliceColumns [ "Open", "Close" ]
  openColumn <- Frame.getColumn "Open"
  closeColumn <- Frame.getColumn "Close"
  let differenceColumn = zipWith (-) openColumn closeColumn
  Frame.addColumn "Difference" differenceColumn

Let's create a single data frame that contains both the Microsoft and the Facebook data. Before joining the two data frames, we have to rename their columns so that the column names are not duplicated.
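A small sketch of the two joins on a row-oriented map shows why the renaming matters. Both joins merge the cells of a shared row with Map.union, which is left-biased: if both frames had a column called Open, the second frame's value would be silently dropped. As an assumption of this sketch, renameColumns takes an explicit old-to-new mapping rather than a positional list of names, because a Map does not remember column insertion order. renameColumns, outerJoin and innerJoin are hypothetical, not Analyze's API.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)

-- Hypothetical toy frame: row key -> (column name -> value).
type Frame r = Map.Map r (Map.Map String Double)

-- Rename columns using an explicit (old, new) mapping; other names are kept.
renameColumns :: [(String, String)] -> Frame r -> Frame r
renameColumns mapping = Map.map (Map.mapKeys rename)
  where
    rename old = fromMaybe old (lookup old mapping)

-- Outer join: keep every row key that appears in either frame.
outerJoin :: Ord r => Frame r -> Frame r -> Frame r
outerJoin = Map.unionWith Map.union

-- Inner join: keep only the row keys that appear in both frames.
innerJoin :: Ord r => Frame r -> Frame r -> Frame r
innerJoin = Map.intersectionWith Map.union
```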
let msftNames = ["MsftOpen", "MsftClose", "MsftDiff"]
msftRenamed <- Frame.withFrame msft $
  Frame.indexColumnsWith msftNames
let fbNames = ["FbOpen", "FbClose", "FbDiff"]
fbRenamed <- Frame.withFrame fb $
  Frame.indexColumnsWith fbNames
joinedOut <- Frame.withFrame msftRenamed $
  Frame.outerJoin fbRenamed
joinedIn <- Frame.withFrame msftRenamed $
  Frame.innerJoin fbRenamed

-- Get the row for 2 January 2013 from the inner-joined frame
let val = Frame.getRow (Data.Time.Calendar.fromGregorian 2013 1 2) joinedIn
-- Look up the FbOpen value in the row for 2 January 2013
let val' = Frame.getRow (Data.Time.Calendar.fromGregorian 2013 1 2) joinedIn
             & Series.get "FbOpen"

-- 'modifying' modifies the original 'Frame' instead of copying it
Frame.modifying joinedOut $ do
  comparison <- Frame.mapRowValues $ \row ->
    if Series.get "MsftOpen" row > Series.get "FbOpen" row
      then "MSFT"
      else "FB"
  Frame.addColumn "Comparison" comparison

We can now count the days on which Microsoft's opening price was above Facebook's, and vice versa:
msftCount <- Frame.withFrame joinedOut $ do
  Frame.getColumn "Comparison" (Series.as :: String)
    >>= Series.filterValues (== "MSFT")
    >>= Series.countValues
-- msftCount = 220
fbCount <- Frame.withFrame joinedOut $ do
  Frame.getColumn "Comparison" (Series.as :: String)
    >>= Series.filterValues (== "FB")
    >>= Series.countValues
-- fbCount = 103

Group rows by month and year:
monthly <- Frame.withFrame joinedIn $
  Frame.groupRowsUsing $ \day _ ->
    let (y, m, _) = Data.Time.Calendar.toGregorian day
    in Data.Time.Calendar.fromGregorian y m 1
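The grouping key above maps every date to the first day of its month. Below is a minimal sketch of the same idea on a date-keyed map, assuming the rows are stored in a Map Day row. monthKey and groupRowsByMonth are hypothetical helpers, not Analyze's groupRowsUsing.

```haskell
import qualified Data.Map.Strict as Map
import Data.Time.Calendar (Day, fromGregorian, toGregorian)

-- Hypothetical helper: map a date to the first day of its month (the group key).
monthKey :: Day -> Day
monthKey day = let (y, m, _) = toGregorian day in fromGregorian y m 1

-- Group date-keyed rows into one sub-map per month, keyed by the
-- first day of that month. Both levels stay sorted by date.
groupRowsByMonth :: Map.Map Day row -> Map.Map Day (Map.Map Day row)
groupRowsByMonth rows =
  Map.fromListWith Map.union
    [ (monthKey day, Map.singleton day row) | (day, row) <- Map.toList rows ]
```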