Welcome to Stocktwits collector’s documentation!
This package contains the class for collecting the twits of Stocktwits on your local.
Getting started
Stocktwits collector package is implemented for collecting the twits of Stocktwits on your local.
The goal is to implement this package for each Stocktwits API and to manage how to download many data on more files to avoid networking issues and to require again all data but only the missing part.
It is part of the educational repositories to learn how to write stardard code and common uses of the TDD.
Installation
If you want to use this package into your code, you can install by python3-pip:
pip3 install stocktwits_collector
python3
>>> import stocktwits_collector.collector as Collector
>>> help(Collector)
Development
The package is not self-consistent. So after to have downloaded the package by github and you have to install the requirements:
git clone https://github.com/bilardi/stocktwits-collector
cd stocktwits-collector/
pip3 install --upgrade -r requirements.txt
See the documentation to contribute.
Documentation
Read the documentation on readthedocs for
Usage
Development
Change Log
See CHANGELOG.md for details.
License
This package is released under the MIT license. See LICENSE for details.
API
Collector
The class for collecting twits of Stocktwits
A collection of methods to simplify your downloading
Stocktwits Collector
get history from Stocktwist, default last 30 messages |
|
save history from Stocktwist on files splitted by chunk per day, week or month |
Detailed list
- class stocktwits_collector.collector.Collector
- clean_data(messages, event)
clean data
- Arguments:
- messages (list of dict):
list of messages
- event (dict):
dictionary fully described in save_history() symbols (list of str): names of symbols to fetch users (list of str): names of users to fetch only_combo (bool): if True, fetches only messages of those symbols posted from those users
- Returns:
list of unique dictionaries cleaned
- clean_history(cursor, history, chunk='day')
clean history from messages with different chunk
- Arguments:
- cursor (dict):
dictionary with the keys oldest_date, min ID, earliest_date and max (ID)
- history (list[dict]):
list of messages
- chunk (str):
day, week or month, default day
- Returns:
history cleaned
- get_cursor(messages)
get cursor with oldest date, min ID and max ID
- Arguments:
- messages (list[dict]):
list of messages
- Returns:
a dictionary with oldest_date, min ID, earliest_date and max (ID)
- get_data(event)
get data from Stocktwits, default last 30 messages
- Arguments:
- event (dict):
dictionary fully described in save_history() symbols (list of str): names of symbols to fetch users (list of str): names of users to fetch min (int): optional, min ID max (int): optional, max ID limit (int): optional, defalt 30 messages
- Returns:
list of messages
- get_date(chunk='day', date=None, jump_chunk=False)
get date at midnight about chunk
- Arguments:
- chunk (str):
day, week or month, default day
- date (str):
datetime with format %Y-%m-%dT%H:%M:%SZ
- jump_chunk (bool):
True if you want to jump one chunk
- Returns:
string of date at midnight about that chunk or next one
- get_file_name(history, current_chunk, event)
get filename
- Arguments:
- history (list[dict]):
list of messages
- current_chunk (dict):
dictionary like event
- event (dict):
dictionary fully described in save_history()
- Returns:
the file name
- get_history(event)
get history from Stocktwist, default last 30 messages
- Arguments:
- event (dict):
dictionary fully described in save_history() start (datetime): optional, min datetime is_verbose (bool): optional, if True comments will be printed
- Returns:
list of messages
- get_temporary_event(messages, current_chunk, event)
get temporary chunk event from messages
- Arguments:
- messages (list[dict]):
list of messages
- current_chunk (dict):
dictionary fully described in save_history()
- event (dict):
dictionary fully described in save_history()
- Returns:
the temporary chunk event updated with the partial start and new min
- hold_output()
hold output
This method is temporary until PR approval: https://github.com/p-hiroshige/stockTwitsAPI/pull/1
- Example:
- with hold_output() as (out, err):
method_with_a_print()
captured_output = out.getvalue().strip()
- is_same_chunk(first_date, second_date, chunk='day')
compare a date with a second date
- Argument:
- first_date (str):
datetime with format %Y-%m-%dT%H:%M:%SZ
- second_date (str):
another date with format %Y-%m-%dT%H:%M:%SZ
- chunk (str):
day, week or month, default day
- Returns:
a boolean, True if the dates are of the same chunk
- is_younger(first_date, second_date)
compare a date with a second date
- Argument:
- first_date (str):
datetime with format %Y-%m-%dT%H:%M:%SZ
- second_date (str):
another date with format %Y-%m-%dT%H:%M:%SZ
- Returns:
a boolean, True if first date is younger than second one
- save_data(history, current_chunk, event)
save data
- Arguments:
- history (list[dict]):
list of messages
- current_chunk (dict):
dictionary like event
- event (dict):
dictionary fully described in save_history()
- Returns:
the temporary chunk event updated with the partial start and new max
- save_history(event)
save history from Stocktwist on files splitted by chunk per day, week or month
- Arguments:
- event (dict):
symbols (list[str]): names of symbols to fetch users (list[str]): names of users to fetch only_combo (bool): optional, if True, fetches only messages of those symbols posted from those users min (int): optional, min ID max (int): optional, max ID limit (int): optional, default 30 messages start (str): optional, min datetime chunk (str): optional (day, week or month), default day filename_prefix (str): optional, default “history.” filename_suffix (str): optional, default “.json” is_verbose (bool): optional, if True comments will be printed
- Returns:
last temporary chunk event discarded
- there_is_symbol(symbols_fetched, symbols_target)
check if in the message there are the symbols target
- Arguments:
- symbols_fetched (list of dict):
list of symbols
- symbols_target (list of string):
list of symbols names
- Returns:
a boolean, True if there is at least one symbol of target in the symbols fetched
- update_event(key, value, event)
update a specific key of event
- Arguments:
- key (str):
attribute name of event
- value (mix):
value you want to replace on that key
- event (dict):
dictionary fully described in save_history()
- Returns:
dictionary with the attribute named key changed with value
- walk(event, cursor, history)
walk along the messages like a shrimp
- Arguments:
- event (dict):
dictionary fully described in save_history()
- cursor (dict):
dictionary with the keys oldest_date, min ID, earliest_date and max (ID)
- history (list[dict]):
list of messages
- Returns:
cursor, history
Usage
The package uses the Stocktwits API manages three type of streas: user, symbol and conversation. Now the package manages the user and symbol streams.
There are some parameters that you can use. These are the mandatory parameters:
symbols, you can define a list of symbols that you want to download: this list has to have at least one element or it has to exist the parameter users
users, you can define a list of users that you want to download: this list has to have at least one element or it has to exist the parameter symbols
And these are optionals:
only_combo, when you want to download only the combo between a specific symbol and user, you have to use each previous parameter and this that it is a boolean
min, it is the ID of a specific twit from which you want to start downloading
max, it is the ID of a specific twit where you want to stop downloading
limit, it is the number of messages that you want to download in one shot
start, it is the datetime from which you want to start downloading
chunk, it is the chunk (day, week or month) in which you want to split the data
filename_prefix, it is the prefix name of files where you want to save the data
filename_suffix, it is the suffix name of files where you want to save the data
is_verbose, when you want to print some information to understand what the system is saving, it is a boolean
Without optional parameters, the system downloads the last 30 messages and prints those in the output. If you want to save that on a file (or more files), you have to use at least the chunk parameter.
Examples
Remeber to install the package by pip
pip3 install stocktwits-collector
or by requirements.txt contains one line with stocktwits-collector
pip3 install --upgrade -r requirements.txt
import os
import json
import pandas as pd
from stocktwits_collector.collector import Collector
sc = Collector()
# download last messages up to 30
messages = sc.get_history({'symbols': ['TSLA'], 'limit': 4})
# download the messages from a date to today
messages = sc.get_history({'symbols': ['TSLA'], 'start': '2022-04-04T00:00:00Z'})
# save the messages on files splitted per chunk from a date to max ID
chunk = sc.save_history({'symbols': ['TSLA'], 'start': '2022-04-04T00:00:00Z', 'chunk': 'day'})
# load data from one file
with open('history.20220404.json', 'r') as f:
data = json.loads(f.read())
df = pd.json_normalize(
data,
meta=[
'id', 'body', 'created_at',
['user', 'id'],
['user', 'username'],
['entities', 'sentiment', 'basic']
]
)
twits = df[['id', 'body', 'created_at', 'user.username', 'entities.sentiment.basic']]
# load data from multiple files
frames = []
path = '.'
for file in os.listdir(path):
filename = f"{path}/{file}"
with open(filename, 'r') as f:
data = json.loads(f.read())
frames.append(pd.json_normalize(
data,
meta=[
'id', 'body', 'created_at',
['user', 'id'],
['user', 'username'],
['entities', 'sentiment', 'basic']
]
)
)
df = pd.concat(frames).sort_values(by=['id'])
twits = df[['id', 'body', 'created_at', 'user.username', 'entities.sentiment.basic']]
Development
The package uses the Stocktwits API manages three type of streas: user, symbol and conversation. Now the package manages the user and symbol streams.
You can contribute to implement other functionalities by a Pull Request to master branch.
Run tests
cd stocktwits-collector/
pip3 install --upgrade -r requirements.txt
python3 -m unittest discover -v
There is also a script for integration tests, but it is only for specific changes
# run API with chunk day # around 160s
python3 -m unittest tests/integration_test.py
# run API with chunk week # around 400s
CHUNKS=week python3 -m unittest tests/integration_test.py
# run API with chunk month # around 1600s
CHUNKS=month python3 -m unittest tests/integration_test.py
# run API with chunk day, week and month
CHUNKS=all python3 -m unittest tests/integration_test.py
# run API with verbose and chunk day
VERBOSE=True python3 -m unittest tests/integration_test.py
Run make
Makefile is useful for many actions:
run the unit test by
make unittest
run the doc build by
make doc
Prepare a Pull Request (PR)
You can fork the repository in your space and then you can clone your copy in your local to change and run tests.
cd stocktwits-collector/
pip3 install --upgrade -r requirements.txt
python3 -m unittest discover -v
git checkout -b your-branch
git add files-changed
git commit -m "describe your changes here"
git push origin push your-branch
You can create the PR from your fork.