Specification: Select the lower-case body and timestamp of tweets that are in English and not retweets
Example input/output:
Input:
data
language | is_retweet | likes | body | ts |
---|---|---|---|---|
en | false | 8 | Some Text | 1604534320 |
en | true | 8 | some Text | 1604534321 |
en | false | 8 | some Text | 1604534322 |
fr | false | 8 | some Text | 1604534322 |
Output:
body | ts |
---|---|
some text | 1604534320 |
some text | 1604534322 |
Python - Imperative
def process_tweets(data):
    result = []
    for value in data:
        # Keep only English tweets that are not retweets.
        if (value["language"] == "en" and
                value["is_retweet"] == "false"):
            result.append({
                "body": value["body"].lower(),
                "ts": value["ts"]
            })
    return result
Python - Functional
def process_tweets(data):
    return [
        {"body": value["body"].lower(), "ts": value["ts"]}
        for value in data
        if value["language"] == "en" and value["is_retweet"] == "false"
    ]
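As a quick check, the example input can be run through the pure-Python version. This is a minimal sketch: it assumes each row is a dictionary and that is_retweet is stored as the strings "true"/"false", which is what the string comparisons in the code imply; the specification itself does not state the row representation.

```python
def process_tweets(data):
    # Same logic as the functional version: filter, then project and lower-case.
    return [
        {"body": value["body"].lower(), "ts": value["ts"]}
        for value in data
        if value["language"] == "en" and value["is_retweet"] == "false"
    ]

# Rows from the example input table. Storing is_retweet as a string
# is an assumption made to match the comparisons above.
data = [
    {"language": "en", "is_retweet": "false", "likes": 8, "body": "Some Text", "ts": 1604534320},
    {"language": "en", "is_retweet": "true", "likes": 8, "body": "some Text", "ts": 1604534321},
    {"language": "en", "is_retweet": "false", "likes": 8, "body": "some Text", "ts": 1604534322},
    {"language": "fr", "is_retweet": "false", "likes": 8, "body": "some Text", "ts": 1604534322},
]

result = process_tweets(data)
for row in result:
    print(row["body"], row["ts"])
```

If is_retweet were stored as a real boolean instead, the comparison with "false" would never match and the result would be empty, so the filter would need to become `not value["is_retweet"]`.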
Python - Pandas
def process_tweets(data):
    # Copy the filtered rows so the assignment below does not write to a view of data.
    result = data[
        (data.language == 'en') &
        (data.is_retweet == 'false')].copy()
    result["body"] = result.body.str.lower()
    return result[["body", "ts"]]
R - Tidyverse
process_tweets <- function(data) {
  data %>%
    filter(language == "en" & is_retweet == "false") %>%
    mutate(body = tolower(body)) %>%
    # Column order matches the example output: body, then ts.
    select(body, ts)
}
SQL - SQLite
SELECT LOWER(body) AS body, ts
FROM data
WHERE language = 'en' AND is_retweet = 'false'
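The query can be tried against the example input with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types and the choice to store is_retweet as TEXT are not given in the source, and the string literals are written with single quotes, the standard SQL form.

```python
import sqlite3

# In-memory database; the schema below is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    "language TEXT, is_retweet TEXT, likes INTEGER, body TEXT, ts INTEGER)"
)

# Rows from the example input table.
rows = [
    ("en", "false", 8, "Some Text", 1604534320),
    ("en", "true", 8, "some Text", 1604534321),
    ("en", "false", 8, "some Text", 1604534322),
    ("fr", "false", 8, "some Text", 1604534322),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT LOWER(body) AS body, ts
FROM data
WHERE language = 'en' AND is_retweet = 'false'
"""
sql_rows = conn.execute(query).fetchall()
for body, ts in sql_rows:
    print(body, ts)

conn.close()
```

The query has no ORDER BY, so the row order is not guaranteed by SQL; add `ORDER BY ts` if the order matters.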
Q - kdb+
process_tweets: select lower[body], ts from data where (is_retweet ~\: "false") and (language ~\: "en")