Specification: select the lowercased body and the timestamp of every tweet that is in English and is not a retweet.
Example input/output:
Input (a table named data):

| language | is_retweet | likes | body | ts |
|---|---|---|---|---|
| en | false | 8 | Some Text | 1604534320 |
| en | true | 8 | some Text | 1604534321 |
| en | false | 8 | some Text | 1604534322 |
| fr | false | 8 | some Text | 1604534322 |
Output:

| body | ts |
|---|---|
| some text | 1604534320 |
| some text | 1604534322 |
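A small harness makes it easy to check a list-based Python implementation against the example above. This is a sketch rather than part of the original comparison: the names `EXAMPLE_INPUT`, `EXPECTED_OUTPUT`, and `check_process_tweets` are hypothetical, and it assumes `data` is a list of dicts whose `is_retweet` field holds the strings `"true"`/`"false"`, which is what the Python implementations compare against.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical fixture: the example input table as a list of dicts.
# Assumption: is_retweet holds the strings "true"/"false", not booleans.
EXAMPLE_INPUT: List[Row] = [
    {"language": "en", "is_retweet": "false", "likes": 8, "body": "Some Text", "ts": 1604534320},
    {"language": "en", "is_retweet": "true", "likes": 8, "body": "some Text", "ts": 1604534321},
    {"language": "en", "is_retweet": "false", "likes": 8, "body": "some Text", "ts": 1604534322},
    {"language": "fr", "is_retweet": "false", "likes": 8, "body": "some Text", "ts": 1604534322},
]

# Expected output: English, non-retweet rows with the body lowercased.
EXPECTED_OUTPUT: List[Row] = [
    {"body": "some text", "ts": 1604534320},
    {"body": "some text", "ts": 1604534322},
]


def check_process_tweets(process_tweets: Callable[[List[Row]], List[Row]]) -> bool:
    """Return True if an implementation reproduces the example output exactly."""
    return process_tweets(EXAMPLE_INPUT) == EXPECTED_OUTPUT
```

Calling `check_process_tweets(process_tweets)` with either the imperative or the list-comprehension version should return `True`. The pandas version takes a DataFrame instead, so it would need the fixture converted first.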
Python - Imperative
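```python
def process_tweets(data):
    # data: list of dicts; is_retweet holds the string "true" or "false".
    result = []
    for value in data:
        # Keep only English tweets that are not retweets.
        if value["language"] == "en" and value["is_retweet"] == "false":
            result.append({
                "body": value["body"].lower(),
                "ts": value["ts"],
            })
    return result
```

Python - Functional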
```python
def process_tweets(data):
    # Filter and project in a single list comprehension.
    return [
        {"body": value["body"].lower(), "ts": value["ts"]}
        for value in data
        if value["language"] == "en" and value["is_retweet"] == "false"
    ]
```

Python - Pandas
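```python
def process_tweets(data):
    # data: DataFrame; is_retweet holds the strings "true"/"false".
    mask = (data.language == "en") & (data.is_retweet == "false")
    # .copy() avoids pandas' SettingWithCopyWarning when modifying the subset.
    result = data[mask].copy()
    # Vectorized lowercasing instead of a per-row lambda.
    result["body"] = result["body"].str.lower()
    return result[["body", "ts"]]
```

R - Tidyverse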
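```r
library(dplyr)  # provides filter, mutate, select and re-exports %>%

process_tweets <- function(data) {
  data %>%
    filter(language == "en" & is_retweet == "false") %>%
    mutate(body = tolower(body)) %>%
    select(body, ts)  # same column order as the expected output
}
```

SQL - SQLite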
```sql
-- Single quotes mark string literals; SQLite treats double quotes as identifiers.
SELECT LOWER(body) AS body, ts
FROM data
WHERE language = 'en' AND is_retweet = 'false';
```
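The query can be tried end to end with Python's built-in `sqlite3` module. This sketch loads the example rows into an in-memory table named `data`, assuming `is_retweet` is stored as TEXT (as the query's comparison implies). It adds `ORDER BY ts`, which the query above does not have, because SQL does not guarantee row order without it.

```python
import sqlite3

# Example rows from the input table.
# Assumption: is_retweet is stored as TEXT "true"/"false".
rows = [
    ("en", "false", 8, "Some Text", 1604534320),
    ("en", "true", 8, "some Text", 1604534321),
    ("en", "false", 8, "some Text", 1604534322),
    ("fr", "false", 8, "some Text", 1604534322),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (language TEXT, is_retweet TEXT, likes INTEGER, body TEXT, ts INTEGER)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

# Same filter and projection as above, plus ORDER BY for a deterministic result.
result = conn.execute(
    "SELECT LOWER(body) AS body, ts FROM data "
    "WHERE language = 'en' AND is_retweet = 'false' "
    "ORDER BY ts"
).fetchall()
conn.close()

print(result)  # [('some text', 1604534320), ('some text', 1604534322)]
```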
Q - kdb+
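```q
/ A function of the table, like the other versions; ~\: matches each string against the literal
process_tweets: {[data] select body: lower body, ts from data where (is_retweet ~\: "false") and (language ~\: "en")}
```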